From zhushisongzhu at yahoo.com  Fri Sep  1 03:26:39 2006
From: zhushisongzhu at yahoo.com (zhu shi song)
Date: Fri, 1 Sep 2006 03:26:39 -0700 (PDT)
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060830045927.GB25478@mellanox.co.il>
Message-ID: <20060901102639.55709.qmail@web36915.mail.mud.yahoo.com>

OFED-1.1-rc3 has passed my tests, but I had to set the post
buffer size to 0x4 and apply your patch myself.
Could these changes be made permanent so that I don't have to apply them manually?

 zhu




From bunk at stusta.de  Fri Sep  1 09:00:23 2006
From: bunk at stusta.de (Adrian Bunk)
Date: Fri, 1 Sep 2006 18:00:23 +0200
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901015818.42767813.akpm@osdl.org>
References: <20060901015818.42767813.akpm@osdl.org>
Message-ID: <20060901160023.GB18276@stusta.de>

On Fri, Sep 01, 2006 at 01:58:18AM -0700, Andrew Morton wrote:
>...
> Changes since 2.6.18-rc4-mm3:
>...
> +amso1100-build-fix.patch
> 
>  Fix git-infiniband.patch
>...

This causes the following compile error on i386:

<--  snip  -->

...
  CC      drivers/infiniband/hw/amso1100/c2.o
/home/bunk/linux/kernel-2.6/linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2.c: In function ‘c2_tx_ring_alloc’:
/home/bunk/linux/kernel-2.6/linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2.c:133: error: implicit declaration of function ‘__raw_writeq’
make[4]: *** [drivers/infiniband/hw/amso1100/c2.o] Error 1

<--  snip  -->

There seems to be some confusion regarding whether __raw_writeq() is 
considered a platform independent API.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


From robert.j.woodruff at intel.com  Fri Sep  1 09:12:42 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 1 Sep 2006 09:12:42 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C853D89@orsmsx418.amr.corp.intel.com>

 Tziporet Wrote,
>Hi,

>OFED 1.1-RC3 is available on
>https://openib.org/svn/gen2/branches/1.1/ofed/releases/
>File: OFED-1.1-rc3.tgz
>Please report any issues in bugzilla http://openib.org/bugzilla/

Hi all, I installed the RC3 package on my Xeon/Lindenhurst platforms
and with the pathscale card I have the following problem
when trying to run Intel MPI and NetPipe.
This is on RedHat EL4-U3 (2.6.9-34EL kernel).
Has anyone else been able to run DAPL/RDMA programs like Intel MPI
over the Pathscale cards with OFED-1.1-RC3 ?


# List of Benchmarks to run:

# PingPong
# PingPing
# Sendrecv
# Exchange
# Allreduce
# Reduce
# Reduce_scatter
# Allgather
# Allgatherv
# Alltoall
# Bcast
# Barrier
[1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with
error. status=0x8. cookie=0x514ed0
rank 1 in job 2  rkl-13-ib0_32788   caused collective abort of all ranks
  exit status of rank 1: return code 255 

I also tried a version of NetPipe that was modified to use DAPL; it
works for messages up to 2048 bytes and then hangs.

Now starting the main loop
  0:       1 bytes   1000 times -->      1.53 Mbps in       4.98 usec
  1:       2 bytes   1000 times -->      3.11 Mbps in       4.91 usec
  2:       3 bytes   1000 times -->      4.72 Mbps in       4.85 usec
  3:       4 bytes   1000 times -->      6.20 Mbps in       4.92 usec
  4:       6 bytes   1000 times -->      9.23 Mbps in       4.96 usec
  5:       8 bytes   1000 times -->     12.41 Mbps in       4.92 usec
  6:      12 bytes   1000 times -->     18.61 Mbps in       4.92 usec
  7:      13 bytes   1000 times -->     19.98 Mbps in       4.96 usec
  8:      16 bytes   1000 times -->     24.48 Mbps in       4.99 usec
  9:      19 bytes   1000 times -->     29.04 Mbps in       4.99 usec
 10:      21 bytes   1000 times -->     32.31 Mbps in       4.96 usec
 11:      24 bytes   1000 times -->     36.72 Mbps in       4.99 usec
 12:      27 bytes   1000 times -->     40.67 Mbps in       5.06 usec
 13:      29 bytes   1000 times -->     44.24 Mbps in       5.00 usec
 14:      32 bytes   1000 times -->     49.15 Mbps in       4.97 usec
 15:      35 bytes   1000 times -->     53.18 Mbps in       5.02 usec
 16:      45 bytes   1000 times -->     67.60 Mbps in       5.08 usec
 17:      48 bytes   1000 times -->     72.32 Mbps in       5.06 usec
 18:      51 bytes   1000 times -->     76.65 Mbps in       5.08 usec
 19:      61 bytes   1000 times -->     90.20 Mbps in       5.16 usec
 20:      64 bytes   1000 times -->     94.50 Mbps in       5.17 usec
 21:      67 bytes   1000 times -->     96.89 Mbps in       5.28 usec
 22:      93 bytes   1000 times -->    134.20 Mbps in       5.29 usec
 23:      96 bytes   1000 times -->    137.11 Mbps in       5.34 usec
 24:      99 bytes   1000 times -->    139.72 Mbps in       5.41 usec
 25:     125 bytes   1000 times -->    175.08 Mbps in       5.45 usec
 26:     128 bytes   1000 times -->    184.12 Mbps in       5.30 usec
 27:     131 bytes   1000 times -->    184.72 Mbps in       5.41 usec
 28:     189 bytes   1000 times -->    258.25 Mbps in       5.58 usec
 29:     192 bytes   1000 times -->    269.32 Mbps in       5.44 usec
 30:     195 bytes   1000 times -->    270.92 Mbps in       5.49 usec
 31:     253 bytes   1000 times -->    339.98 Mbps in       5.68 usec
 32:     256 bytes   1000 times -->    347.01 Mbps in       5.63 usec
 33:     259 bytes   1000 times -->    349.64 Mbps in       5.65 usec
 34:     381 bytes   1000 times -->    491.64 Mbps in       5.91 usec
 35:     384 bytes   1000 times -->    495.59 Mbps in       5.91 usec
 36:     387 bytes   1000 times -->    493.49 Mbps in       5.98 usec
 37:     509 bytes   1000 times -->    621.98 Mbps in       6.24 usec
 38:     512 bytes   1000 times -->    639.37 Mbps in       6.11 usec
 39:     515 bytes   1000 times -->    632.97 Mbps in       6.21 usec
 40:     765 bytes   1000 times -->    854.35 Mbps in       6.83 usec
 41:     768 bytes   1000 times -->    878.14 Mbps in       6.67 usec
 42:     771 bytes   1000 times -->    878.74 Mbps in       6.69 usec
 43:    1021 bytes   1000 times -->   1067.29 Mbps in       7.30 usec
 44:    1024 bytes   1000 times -->   1073.29 Mbps in       7.28 usec
 45:    1027 bytes   1000 times -->   1076.14 Mbps in       7.28 usec
 46:    1533 bytes   1000 times -->   1396.85 Mbps in       8.37 usec
 47:    1536 bytes   1000 times -->   1407.83 Mbps in       8.32 usec
 48:    1539 bytes   1000 times -->   1385.12 Mbps in       8.48 usec
 49:    2045 bytes   1000 times -->   1647.53 Mbps in       9.47 usec
 50:    2048 bytes   1000 times -->   1657.56 Mbps in       9.43 usec
 51:    2051 bytes   1000 times -->
<- hangs here


From vuhuong at mellanox.com  Fri Sep  1 09:25:01 2006
From: vuhuong at mellanox.com (Vu Pham)
Date: Fri, 01 Sep 2006 09:25:01 -0700
Subject: [openib-general] Srp question
In-Reply-To: <C11CADDA.3505%minich@ornl.gov>
References: <C11CADDA.3505%minich@ornl.gov>
Message-ID: <44F85EDD.4090102@mellanox.com>


> 
> By default, we found (via stats from the DDN) that we were only seeing reads
> and writes in the 0-32Kbyte range.  Comparing IBGold and OFED, we found that
> in IBGold srp_sg_tablesize defaulted to 256, but in OFED it defaulted to 12.  So,
> changing this (via modprobe.conf) to 256 in OFED, we were able to see reads
> and writes in the 128Kbyte range (which is what ultimately got us to the
> performance above).  I also noticed that there is a max_sects option you can
> pass to add_target (in the SRP /sys entries) which seemed to be the same
> idea as srp_sg_tablesize, but this didn't seem to affect anything.
> 
> So, my question is, what is the right magic to get SRP up to speed?


I played around with these parameters: srp_sg_tablesize (via 
modprobe.conf or passing it directly), max_sect and 
max_cmd_per_lun.

srp_sg_tablesize={32, 64, and 128}
max_sect={512, 1024, and 2048}
max_cmd_per_lun={1, 2, 4, 8, 16, 32, and default 64} --> 
this really depends on the storage to have the right number

-vu


From caitlin.bestler at gmail.com  Fri Sep  1 10:14:09 2006
From: caitlin.bestler at gmail.com (Caitlin Bestler)
Date: Fri, 1 Sep 2006 10:14:09 -0700
Subject: [openib-general] single rkey
In-Reply-To: <loom.20060831T163651-327@post.gmane.org>
References: <loom.20060831T163651-327@post.gmane.org>
Message-ID: <469958e00609011014y222e10eakd4714d35fed35891@mail.gmail.com>

On 8/31/06, yipee <yipeeyipeeyipeeyipee at yahoo.com> wrote:
> Hi,
>
> Is it possible for several memory registrations (using ibv_reg_mr) to have a
> single rkey?
> Can I add memory registrations to a previous rkey?
>
>

You need to create the Memory Region as large as you think it will
need to be. But there are three things you can keep in mind:

1) You can create multiple Memory Regions. A scatter gather list
    can reference multiple memory regions.
2) You can create Memory Windows within Memory Regions to
    limit the scope exposed to the remote end.
3) The same pages can be registered to multiple Memory Regions.
    So you could create a *new* Memory Region that included the
    prior pages *and* the new pages, use that, and release the old
    Memory Region eventually when all use of it ended.

The goal of making Memory Region lookups by hardware efficient
has encouraged most implementations to use data structures
that are not friendly for dynamic host manipulation while the
Memory Region is already in use. That's why the API is designed
to set the contents of a Memory Region in a single operation
rather than by piecemeal addition.


From akpm at osdl.org  Fri Sep  1 10:13:40 2006
From: akpm at osdl.org (Andrew Morton)
Date: Fri, 1 Sep 2006 10:13:40 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901160023.GB18276@stusta.de>
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
Message-ID: <20060901101340.962150cb.akpm@osdl.org>

On Fri, 1 Sep 2006 18:00:23 +0200
Adrian Bunk <bunk at stusta.de> wrote:

> On Fri, Sep 01, 2006 at 01:58:18AM -0700, Andrew Morton wrote:
> >...
> > Changes since 2.6.18-rc4-mm3:
> >...
> > +amso1100-build-fix.patch
> > 
> >  Fix git-infiniband.patch
> >...
> 
> This causes the following compile error on i386:
> 
> <--  snip  -->
> 
> ...
>   CC      drivers/infiniband/hw/amso1100/c2.o
> /home/bunk/linux/kernel-2.6/linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2.c: In function ‘c2_tx_ring_alloc’:
> /home/bunk/linux/kernel-2.6/linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2.c:133: error: implicit declaration of function ‘__raw_writeq’
> make[4]: *** [drivers/infiniband/hw/amso1100/c2.o] Error 1
> 

That would have been me cheerfully deleting stuff because it didn't build
on powerpc.

> 
> There seems to be some confusion regarding whether __raw_writeq() is 
> considered a platform independent API.
> 

It appears to be undocumented and uncommented hence it's not an API
_at all_, is it?

What's __raw_writeq() supposed to do, anyway?  On alpha it's writeq()
without an mb().  On parisc it's writeq() only the data is byte-reversed. 
On sparc64() it's incomprehensible.  On everything else it's writeq().

What a crock.


From rdreier at cisco.com  Fri Sep  1 10:34:24 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 01 Sep 2006 10:34:24 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901101340.962150cb.akpm@osdl.org> (Andrew Morton's
	message of "Fri, 1 Sep 2006 10:13:40 -0700")
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org>
Message-ID: <adak64nij8f.fsf@cisco.com>

    Andrew> What's __raw_writeq() supposed to do, anyway?  On alpha
    Andrew> it's writeq() without an mb().  On parisc it's writeq()
    Andrew> only the data is byte-reversed.  On sparc64() it's
    Andrew> incomprehensible.  On everything else it's writeq().

My understanding is that __raw_writeq() is like writeq() except not
strongly ordered and without the byte-swap on big-endian
architectures.  The __raw_writeX() variants are convenient to avoid
having to write inefficient code like writel(swab32(foo), ...) when
talking to a PCI device that wants big-endian data.  Without the raw
variant, you end up with a double swap on big-endian architectures.

sparc64 looks wrong, since __raw_writeq() seems identical to writeq(),
which seems to imply it's going to swab what it stores.

 - R.
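
The double swap described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `model_writel()`, `model_raw_writel()`, `bus_word` and `swap_count` are hypothetical names that imitate how writel() and __raw_writel() would behave on a big-endian CPU, and the local `swab32()` merely mimics the kernel helper of the same name. None of this is kernel code.

```c
#include <assert.h>
#include <stdint.h>

static unsigned swap_count;   /* how many byte swaps were executed */
static uint32_t bus_word;     /* what the modelled CPU hands to the bus */

/* Byte-reverse a 32-bit value and count the operation. */
static uint32_t swab32(uint32_t v)
{
    swap_count++;
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Model of writel() on a big-endian CPU: it swaps to little-endian. */
static void model_writel(uint32_t v)     { bus_word = swab32(v); }

/* Model of __raw_writel(): the value is stored unchanged. */
static void model_raw_writel(uint32_t v) { bus_word = v; }

/* Both paths deliver the same word; only the swap count differs. */
static void compare_paths(uint32_t foo)
{
    swap_count = 0;
    model_writel(swab32(foo));           /* driver swap + writel swap */
    assert(bus_word == foo && swap_count == 2);

    swap_count = 0;
    model_raw_writel(foo);               /* no swap at all */
    assert(bus_word == foo && swap_count == 0);
}
```

Calling `compare_paths(0x11223344)` shows that the raw variant reaches the same result with zero swaps where the writel(swab32(foo), ...) idiom needs two.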


From rjwalsh at pathscale.com  Fri Sep  1 11:20:38 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Fri, 01 Sep 2006 11:20:38 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691C853D89@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691C853D89@orsmsx418.amr.corp.intel.com>
Message-ID: <44F879F6.8080601@pathscale.com>

> Hi all, I installed the RC3 package on my Xeon/Lindenhurst platforms
> and with the pathscale card I have the following problem
> when trying to run Intel MPI and NetPipe.

Actually, I've been trying to run Intel MPI myself, but haven't gotten 
very far yet.  My attempts die like this:

   $ mpiexec -n 2 ./mpitest
   I_MPI: [0] set_up_devices(): will use device: libmpi.rdma.so
   I_MPI: [0] set_up_devices(): will use DAPL provider: OpenIB-cma
   I_MPI: [0] set_up_devices(): will use device: libmpi.rdma.so
   I_MPI: [0] set_up_devices(): will use DAPL provider: OpenIB-cma
   [0][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not
create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
   [1][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not
create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
   rank 0 in job 2  ib-idev-05_51713   caused collective abort of all ranks
     exit status of rank 0: return code 254

dapltest seems to work just fine, so I'm a little confused.  Do you have 
any insight on what the DAT_INVALID_PARAMETER(DAT_INVALID_ARG6) stuff is 
referring to?

Regards,
  Robert.


From akpm at osdl.org  Fri Sep  1 11:23:12 2006
From: akpm at osdl.org (Andrew Morton)
Date: Fri, 1 Sep 2006 11:23:12 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <adak64nij8f.fsf@cisco.com>
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
Message-ID: <20060901112312.5ff0dd8d.akpm@osdl.org>

On Fri, 01 Sep 2006 10:34:24 -0700
Roland Dreier <rdreier at cisco.com> wrote:

>     Andrew> What's __raw_writeq() supposed to do, anyway?  On alpha
>     Andrew> it's writeq() without an mb().  On parisc it's writeq()
>     Andrew> only the data is byte-reversed.  On sparc64() it's
>     Andrew> incomprehensible.  On everything else it's writeq().
> 
> My understanding is that __raw_writeq() is like writeq() except not
> strongly ordered and without the byte-swap on big-endian
> architectures.  The __raw_writeX() variants are convenient to avoid
> having to write inefficient code like writel(swab32(foo), ...) when
> talking to a PCI device that wants big-endian data.  Without the raw
> variant, you end up with a double swap on big-endian architectures.
> 
> sparc64 looks wrong, since __raw_writeq() seems identical to writeq(),
> which seems to imply it's going to swab what it stores.
> 

OK.  Can we please stop hacking around this in drivers and

a) work out what it's supposed to do

b) document that (Documentation/DocBook/deviceiobook.tmpl or code
   comment or whatever)

c) tell arch maintainers?


From robert.j.woodruff at intel.com  Fri Sep  1 11:28:03 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 1 Sep 2006 11:28:03 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C85400E@orsmsx418.amr.corp.intel.com>

Robert Walsh wrote,
> [0][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not
>create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
>   [1][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not
>create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
>   rank 0 in job 2  ib-idev-05_51713   caused collective abort of all ranks
>     exit status of rank 0: return code 254

What version of Intel MPI are you running ? This looks like an error
that we saw with the 2.0 release, not sure if this was a DAPL
issue or an MPI issue, Arlin would remember for sure.

You should get the Intel MPI 2.0.1 refresh release
or the 3.0 beta release to make sure that you have all of the latest
MPI fixes.

woody


From rjwalsh at pathscale.com  Fri Sep  1 11:32:38 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Fri, 01 Sep 2006 11:32:38 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691C85400E@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691C85400E@orsmsx418.amr.corp.intel.com>
Message-ID: <44F87CC6.1060606@pathscale.com>

Woodruff, Robert J wrote:
> Robert Walsh wrote,
>> [0][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not
>> create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
>>   [1][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not
>> create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
>>   rank 0 in job 2  ib-idev-05_51713   caused collective abort of all ranks
>>     exit status of rank 0: return code 254
> 
> What version of Intel MPI are you running ? This looks like an error
> that we saw with the 2.0 release, not sure if this was a DAPL
> issue or an MPI issue, Arlin would remember for sure.
> 
> You should get the Intel MPI 2.0.1 refresh release
> or the 3.0 beta release to make sure that you have all of the latest
> MPI fixes.

I'm running 2.0.1.  The package number bit of the tar file was "12". 
I'm running the DAPL that came with OFED-1.1-RC3.

Can you send me a pointer to the 3.0 beta release?

Regards,
  Robert.


From Brian.Cain at ge.com  Fri Sep  1 11:51:16 2006
From: Brian.Cain at ge.com (Cain, Brian (GE Healthcare))
Date: Fri, 1 Sep 2006 14:51:16 -0400
Subject: [openib-general] PXE + infiniband?
Message-ID: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7A8@CINMLVEM11.e2k.ad.ge.com>

A while back
(http://openib.org/pipermail/openib-general/2005-September/010801.html)
there was mention of putting PXE stuff on an HCA.  Has anyone done this
with PXELINUX?  It doesn't seem like it's as straightforward as just
putting the stock PXELINUX image on your HCA.  I'm assuming this image
would have to recognize the HCA and bring up IPoIB in order to use the
conventional TFTP transport?

--
-Brian


From Brian.Cain at ge.com  Fri Sep  1 12:21:24 2006
From: Brian.Cain at ge.com (Cain, Brian (GE Healthcare))
Date: Fri, 1 Sep 2006 15:21:24 -0400
Subject: [openib-general] PXE + infiniband?
In-Reply-To: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7A8@CINMLVEM11.e2k.ad.ge.com>
Message-ID: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7F1@CINMLVEM11.e2k.ad.ge.com>

> -----Original Message-----
> From: openib-general-bounces at openib.org 
> [mailto:openib-general-bounces at openib.org] On Behalf Of Cain, 
> Brian (GE Healthcare)
> Sent: Friday, September 01, 2006 1:51 PM
> To: openib-general at openib.org
> Subject: [openib-general] PXE + infiniband?
> 
> A while back
> (http://openib.org/pipermail/openib-general/2005-September/010
> 801.html)
> there was mention of putting PXE stuff on an HCA.  Has anyone 
> done this
> with PXELINUX?  It doesn't seem like it's as straightforward as just
> putting the stock PXELINUX image on your HCA.  I'm assuming this image
> would have to recognize the HCA and bring up IPoIB in order to use the
> conventional TFTP transport?

Ok, nm -- I found that etherboot has a README.boot_over_ib which looks
like it'll probably work well.

-Brian


From sean.hefty at intel.com  Fri Sep  1 12:37:28 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 1 Sep 2006 12:37:28 -0700
Subject: [openib-general] [PATCH] cma: protect against adding device
 during destruction
In-Reply-To: <20060831193707.GA3859@mellanox.co.il>
Message-ID: <000001c6cdfe$0f4d0190$e598070a@amr.corp.intel.com>

>I'll test some, but the problem hasn't reappeared since.
>The patch looks right, I'd say push it for 2.6.18.

We need the following change, which applies on top of the previous patch, as
well.

Add missing synchronization around acquiring an IB device.

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
Index: cma.c
===================================================================
--- cma.c	(revision 9217)
+++ cma.c	(revision 9218)
@@ -1031,7 +1031,9 @@ static int cma_req_handler(struct ib_cm_
 	}
 
 	atomic_inc(&conn_id->dev_remove);
+	mutex_lock(&lock);
 	ret = cma_acquire_ib_dev(conn_id);
+	mutex_unlock(&lock);
 	if (ret) {
 		ret = -ENODEV;
 		cma_release_remove(conn_id);


From robert.j.woodruff at intel.com  Fri Sep  1 12:40:12 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 1 Sep 2006 12:40:12 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C8540C2@orsmsx418.amr.corp.intel.com>

Tziporet wrote,
>Hi,

>OFED 1.1-RC3 is available on
>https://openib.org/svn/gen2/branches/1.1/ofed/releases/
>File: OFED-1.1-rc3.tgz
>Please report any issues in bugzilla http://openib.org/bugzilla/

I tried running OFED1.1-rc3 on my Itanium machines on RedHat EL4-U3
and got the following error.

[root at iclust-tiger1 woody]# /etc/init.d/openibd start
Loading HCA driver and Access Layer:                       [FAILED]

Please open an issue in the http://openib.org/bugzilla and attach
/tmp/ib_debug_info.log

dmesg

ib_mthca 0000:0d:00.0: HCA FW version 3.3.2 is old (3.4.0 is current).
ib_mthca 0000:0d:00.0: If you have problems, try updating your HCA FW.
ib_uverbs: Unknown symbol hpage_shift      <--- I think this is the problem
divert: not allocating divert_blk for non-ethernet device ib0
divert: not allocating divert_blk for non-ethernet device ib1
ip_tables: (C) 2000-2002 Netfilter core team


From rdreier at cisco.com  Fri Sep  1 12:53:47 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 01 Sep 2006 12:53:47 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901112312.5ff0dd8d.akpm@osdl.org> (Andrew Morton's
	message of "Fri, 1 Sep 2006 11:23:12 -0700")
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org>
Message-ID: <ada8xl3ics4.fsf@cisco.com>

    Roland> My understanding is that __raw_writeq() is like writeq()
    Roland> except not strongly ordered and without the byte-swap on
    Roland> big-endian architectures.  The __raw_writeX() variants are
    Roland> convenient to avoid having to write inefficient code like
    Roland> writel(swab32(foo), ...) when talking to a PCI device that
    Roland> wants big-endian data.  Without the raw variant, you end
    Roland> up with a double swap on big-endian architectures.

Oh, I left one other thing out: writeq() and __raw_writeq() should be
atomic in the sense that no other transactions should be able to get
onto the IO bus in the middle -- so implementing writeq() as two
writel()s in a row is not allowed.

    Andrew> OK.  Can we please stop hacking around this in drivers and

    Andrew> a) work out what it's supposed to do

    Andrew> b) document that (Documentation/DocBook/deviceiobook.tmpl
    Andrew> or code comment or whatever)

    Andrew> c) tell arch maintainers?

Yes, I agree that's a good plan, especially the documentation part.
However I would argue that what's in drivers/infiniband/hw/mthca/mthca_doorbell.h 
is legitimate: the driver uses __raw_writeq() when it exists and uses
two __raw_writel()s properly serialized with a device-specific lock to
get exactly the atomicity it needs on 32-bit archs.

It's an open question what drivers that don't actually need atomicity
but just want a convenient way to write 64 bits at a time should do.

 - R.
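
The lock-serialized fallback described above can be sketched in userspace roughly as follows. This is not the code from mthca_doorbell.h: `struct fake_doorbell` and the `doorbell_*` helpers are hypothetical names, a C11 `atomic_flag` spinlock stands in for the device-specific lock, and a plain array stands in for the 64-bit register.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit doorbell register. */
struct fake_doorbell {
    atomic_flag lock;            /* plays the role of the device-specific lock */
    volatile uint32_t half[2];   /* half[0] is written first, then half[1] */
};

static void doorbell_init(struct fake_doorbell *db)
{
    atomic_flag_clear(&db->lock);
    db->half[0] = 0;
    db->half[1] = 0;
}

static void doorbell_lock(struct fake_doorbell *db)
{
    while (atomic_flag_test_and_set_explicit(&db->lock, memory_order_acquire))
        ;   /* spin until the lock is free */
}

static void doorbell_unlock(struct fake_doorbell *db)
{
    atomic_flag_clear_explicit(&db->lock, memory_order_release);
}

/* Two 32-bit stores that other lock holders can never interleave with. */
static void doorbell_write64(struct fake_doorbell *db, uint32_t hi, uint32_t lo)
{
    assert(db);
    doorbell_lock(db);
    db->half[0] = hi;
    db->half[1] = lo;
    doorbell_unlock(db);
}

/* Read both halves under the same lock so the pair is consistent. */
static uint64_t doorbell_read64(struct fake_doorbell *db)
{
    uint64_t v;

    assert(db);
    doorbell_lock(db);
    v = ((uint64_t)db->half[0] << 32) | db->half[1];
    doorbell_unlock(db);
    return v;
}
```

The lock only makes the pair of stores atomic with respect to other software that takes the same lock; it says nothing about how the device itself sees the two separate bus accesses.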


From akpm at osdl.org  Fri Sep  1 13:04:44 2006
From: akpm at osdl.org (Andrew Morton)
Date: Fri, 1 Sep 2006 13:04:44 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <ada8xl3ics4.fsf@cisco.com>
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org> <ada8xl3ics4.fsf@cisco.com>
Message-ID: <20060901130444.48f19457.akpm@osdl.org>

On Fri, 01 Sep 2006 12:53:47 -0700
Roland Dreier <rdreier at cisco.com> wrote:

>     Roland> My understanding is that __raw_writeq() is like writeq()
>     Roland> except not strongly ordered and without the byte-swap on
>     Roland> big-endian architectures.  The __raw_writeX() variants are
>     Roland> convenient to avoid having to write inefficient code like
>     Roland> writel(swab32(foo), ...) when talking to a PCI device that
>     Roland> wants big-endian data.  Without the raw variant, you end
>     Roland> up with a double swap on big-endian architectures.
> 
> Oh, I left one other thing out: writeq() and __raw_writeq() should be
> atomic in the sense that no other transactions should be able to get
> onto the IO bus in the middle -- so implementing writeq() as two
> writel()s in a row is not allowed.
> 
>     Andrew> OK.  Can we please stop hacking around this in drivers and
> 
>     Andrew> a) work out what it's supposed to do
> 
>     Andrew> b) document that (Documentation/DocBook/deviceiobook.tmpl
>     Andrew> or code comment or whatever)
> 
>     Andrew> c) tell arch maintainers?
> 
> Yes, I agree that's a good plan, especially the documentation part.
> However I would argue that what's in drivers/infiniband/hw/mthca/mthca_doorbell.h 
> is legitimate: the driver uses __raw_writeq() when it exists and uses
> two __raw_writel()s properly serialized with a device-specific lock to
> get exactly the atomicity it needs on 32-bit archs.

No, driver-specific workarounds are not legitimate, sorry.

The driver should simply fail to compile on architectures which do not
implement __raw_writeq().

We can speed up the process by sending helpful emails to architecture
maintainers, but they'll notice either way.

Let's fix it once, and in the correct place.

> It's an open question what drivers that don't actually need atomicity
> but just want a convenient way to write 64 bits at a time should do.

Well yeah.  We should sort out the design issues before implementing
things ;)


From tom at opengridcomputing.com  Fri Sep  1 13:20:59 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Fri, 01 Sep 2006 15:20:59 -0500
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901130444.48f19457.akpm@osdl.org>
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org> <ada8xl3ics4.fsf@cisco.com>
	<20060901130444.48f19457.akpm@osdl.org>
Message-ID: <1157142059.22301.74.camel@trinity.ogc.int>


So to make sure I understand all this...

The purpose of these services is to provide a platform independent API
for reading and writing 16, 32 and 64b values to MMIO devices. The
rationale for needing these services is that there is currently no
platform independent API for efficiently reading and writing these
values on big-endian MMIO PCI devices. Examples are the mthca and
amso1100 devices.

Two classes of service are needed, atomic services that are interrupt
safe and services that either don't require atomicity or are called with
a suitable lock already held.

Does the API look something like this?

void mmio_wr_be16(__be16 val, void __iomem *addr);
void mmio_wr_be32(__be32 val, void __iomem *addr);
void mmio_wr_be64(__be64 val, void __iomem *addr);

void mmio_atomic_wr_be16(__be16 val, void __iomem *addr);
void mmio_atomic_wr_be32(__be32 val, void __iomem *addr);
void mmio_atomic_wr_be64(__be64 val, void __iomem *addr);

__be16 mmio_rd_be16(void __iomem *addr);
__be32 mmio_rd_be32(void __iomem *addr);
__be64 mmio_rd_be64(void __iomem *addr);

__be16 mmio_atomic_rd_be16(void __iomem *addr);
__be32 mmio_atomic_rd_be32(void __iomem *addr);
__be64 mmio_atomic_rd_be64(void __iomem *addr);
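
As a rough illustration of what the non-atomic 32-bit pair in such an API might do, here is a userspace mock. It is a sketch only: `be32_t` is a stand-in for the kernel's `__be32`, an ordinary variable stands in for the `__iomem` register, and the function bodies are an assumption about the intended semantics (store the already-big-endian value with no byte swap), not an agreed design.

```c
#include <arpa/inet.h>   /* htonl/ntohl: host <-> big-endian conversion */
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t be32_t;   /* userspace stand-in for the kernel's __be32 */

/* Store an already-big-endian value as-is, with no byte swap. */
static void mmio_wr_be32(be32_t val, volatile void *addr)
{
    *(volatile be32_t *)addr = val;
}

/* Load the big-endian value back, again without swapping. */
static be32_t mmio_rd_be32(const volatile void *addr)
{
    return *(const volatile be32_t *)addr;
}

/* The register bytes end up in big-endian order on any host. */
static void check_be_layout(void)
{
    uint32_t reg = 0;            /* stands in for a device register */
    unsigned char bytes[4];

    mmio_wr_be32(htonl(0x12345678u), &reg);
    memcpy(bytes, &reg, sizeof(bytes));
    assert(bytes[0] == 0x12 && bytes[1] == 0x34 &&
           bytes[2] == 0x56 && bytes[3] == 0x78);
    assert(ntohl(mmio_rd_be32(&reg)) == 0x12345678u);
}
```

Because the caller converts once with `htonl()` (the userspace analogue of `cpu_to_be32()`), the accessor itself never swaps, which is the efficiency the thread is asking for.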


On Fri, 2006-09-01 at 13:04 -0700, Andrew Morton wrote:
> On Fri, 01 Sep 2006 12:53:47 -0700
> Roland Dreier <rdreier at cisco.com> wrote:
> 
> >     Roland> My understanding is that __raw_writeq() is like writeq()
> >     Roland> except not strongly ordered and without the byte-swap on
> >     Roland> big-endian architectures.  The __raw_writeX() variants are
> >     Roland> convenient to avoid having to write inefficient code like
> >     Roland> writel(swab32(foo), ...) when talking to a PCI device that
> >     Roland> wants big-endian data.  Without the raw variant, you end
> >     Roland> up with a double swap on big-endian architectures.
> > 
> > Oh, I left one other thing out: writeq() and __raw_writeq() should be
> > atomic in the sense that no other transactions should be able to get
> > onto the IO bus in the middle -- so implementing writeq() as two
> > writel()s in a row is not allowed.
> > 
> >     Andrew> OK.  Can we please stop hacking around this in drivers and
> > 
> >     Andrew> a) work out what it's supposed to do
> > 
> >     Andrew> b) document that (Documentation/DocBook/deviceiobook.tmpl
> >     Andrew> or code comment or whatever)
> > 
> >     Andrew> c) tell arch maintainers?
> > 
> > Yes, I agree that's a good plan, especially the documentation part.
> > However I would argue that what's in drivers/infiniband/hw/mthca/mthca_doorbell.h 
> > is legitimate: the driver uses __raw_writeq() when it exists and uses
> > two __raw_writel()s properly serialized with a device-specific lock to
> > get exactly the atomicity it needs on 32-bit archs.
> 
> No, driver-specific workarounds are not legitimate, sorry.
> 
> The driver should simply fail to compile on architectures which do not
> implement __raw_writeq().
> 
> We can speed up the process by sending helpful emails to architecture
> maintainers, but they'll notice either way.
> 
> Let's fix it once, and in the correct place.
> 
> > It's an open question what drivers that don't actually need atomicity
> > but just want a convenient way to write 64 bits at a time should do.
> 
> Well yeah.  We should sort out the design issues before implementing
> things ;)
> 


From bos at pathscale.com  Fri Sep  1 13:45:27 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 01 Sep 2006 13:45:27 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <ada8xl3ics4.fsf@cisco.com>
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org> <ada8xl3ics4.fsf@cisco.com>
Message-ID: <1157143527.20958.8.camel@chalcedony.pathscale.com>

On Fri, 2006-09-01 at 12:53 -0700, Roland Dreier wrote:

> Yes, I agree that's a good plan, especially the documentation part.
> However I would argue that what's in drivers/infiniband/hw/mthca/mthca_doorbell.h 
> is legitimate: the driver uses __raw_writeq() when it exists and uses
> two __raw_writel()s properly serialized with a device-specific lock to
> get exactly the atomicity it needs on 32-bit archs.

On the off chance that you might be arguing that mthca_write64 could be
a candidate drop-in for writeq on 32-bit arches:

That approach might work on mthca hardware, but it's not safe in
general.  The ipath driver requires a proper writeq(), for example,
because the hardware will quite legitimately treat 32-bit writes to some
registers as separate accesses, and screw things up royally.

You get atomicity from the perspective of software with this approach,
but you can do exciting and bad things to hardware.

	<b


From rmk+lkml at arm.linux.org.uk  Fri Sep  1 13:43:43 2006
From: rmk+lkml at arm.linux.org.uk (Russell King)
Date: Fri, 1 Sep 2006 21:43:43 +0100
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901130444.48f19457.akpm@osdl.org>
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org> <ada8xl3ics4.fsf@cisco.com>
	<20060901130444.48f19457.akpm@osdl.org>
Message-ID: <20060901204343.GA4979@flint.arm.linux.org.uk>

On Fri, Sep 01, 2006 at 01:04:44PM -0700, Andrew Morton wrote:
> On Fri, 01 Sep 2006 12:53:47 -0700
> Roland Dreier <rdreier at cisco.com> wrote:
> > Yes, I agree that's a good plan, especially the documentation part.
> > However I would argue that what's in drivers/infiniband/hw/mthca/mthca_doorbell.h 
> > is legitimate: the driver uses __raw_writeq() when it exists and uses
> > two __raw_writel()s properly serialized with a device-specific lock to
> > get exactly the atomicity it needs on 32-bit archs.
> 
> No, driver-specific workarounds are not legitimate, sorry.
> 
> The driver should simply fail to compile on architectures which do not
> implement __raw_writeq().

So, what you're basically saying is that on architectures which can _NOT_
implement an atomic __raw_writeq(), certain drivers simply will not be
available?

> We can speed up the process by sending helpful emails to architecture
> maintainers, but they'll notice either way.

I think you're completely wrong in the context of the message you're
replying to - it's talking about an _atomic_ 64-bit write.

Sure, if you want a _non-atomic_ 64-bit write then that's possible,
but many 32-bit architectures can't do a 64-bit atomic IO write and
that isn't something they can "fix".

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core


From rdreier at cisco.com  Fri Sep  1 13:51:32 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 01 Sep 2006 13:51:32 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901130444.48f19457.akpm@osdl.org> (Andrew Morton's
	message of "Fri, 1 Sep 2006 13:04:44 -0700")
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org> <ada8xl3ics4.fsf@cisco.com>
	<20060901130444.48f19457.akpm@osdl.org>
Message-ID: <ada4pvria3v.fsf@cisco.com>

    Andrew> No, driver-specific workarounds are not legitimate, sorry.

    Andrew> The driver should simply fail to compile on architectures
    Andrew> which do not implement __raw_writeq().

But how should i386 (say) implement __raw_writeq()?  As two
__raw_writel()s protected by a spinlock (that serializes all IO
transactions)?  That seems rather ugly.

 - R.


From rdreier at cisco.com  Fri Sep  1 13:54:04 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 01 Sep 2006 13:54:04 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901204343.GA4979@flint.arm.linux.org.uk> (Russell
	King's message of "Fri, 1 Sep 2006 21:43:43 +0100")
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org> <ada8xl3ics4.fsf@cisco.com>
	<20060901130444.48f19457.akpm@osdl.org>
	<20060901204343.GA4979@flint.arm.linux.org.uk>
Message-ID: <adazmdjgvf7.fsf@cisco.com>

    Russell> Sure, if you want a _non-atomic_ 64-bit write then that's
    Russell> possible, but many 32-bit architectures can't do a 64-bit
    Russell> atomic IO write and that isn't something they can "fix".

I agree completely.  And going one step further: if an architecture
cannot implement a 64-bit write atomically, then the precise
serialization that is required is device-specific knowledge that
belongs in the device driver.

(For example, in the mthca case, the only serialization required is
that no writes go to the same page of MMIO space between the two
32-bit halves of the 64-bit write)
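
A toy userspace model of that requirement, as a sketch only. The device
behaviour here is made up for illustration (a half-written doorbell is
simply dropped if another write hits the page); `struct fake_hca` and every
`fake_*` function are hypothetical names, and a pthread mutex stands in for
the driver's spinlock:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model, not real mthca behaviour. */
struct fake_hca {
	uint32_t pending_hi;		/* first half, latched until the second arrives */
	bool have_hi;
	uint64_t last_doorbell;		/* last complete 64-bit doorbell seen */
	unsigned int rung;		/* number of complete doorbells */
	pthread_mutex_t page_lock;	/* driver lock covering this MMIO page */
};

static void fake_hca_init(struct fake_hca *h)
{
	h->pending_hi = 0;
	h->have_hi = false;
	h->last_doorbell = 0;
	h->rung = 0;
	pthread_mutex_init(&h->page_lock, NULL);
}

/* "Hardware" side: offsets 0 and 4 are the two doorbell halves. */
static void fake_mmio_write32(struct fake_hca *h, unsigned int off, uint32_t v)
{
	if (off == 0) {
		h->pending_hi = v;
		h->have_hi = true;
	} else if (off == 4 && h->have_hi) {
		h->last_doorbell = ((uint64_t)h->pending_hi << 32) | v;
		h->rung++;
		h->have_hi = false;
	} else {
		/* Any other write to the page loses the half-written doorbell. */
		h->have_hi = false;
	}
}

/* Driver side: both halves go out while the page lock is held. */
static void fake_doorbell_write64(struct fake_hca *h, uint32_t hi, uint32_t lo)
{
	pthread_mutex_lock(&h->page_lock);
	fake_mmio_write32(h, 0, hi);
	fake_mmio_write32(h, 4, lo);
	pthread_mutex_unlock(&h->page_lock);
}

/* Every other register write on the same page takes the same lock. */
static void fake_reg_write32(struct fake_hca *h, unsigned int off, uint32_t v)
{
	pthread_mutex_lock(&h->page_lock);
	fake_mmio_write32(h, off, v);
	pthread_mutex_unlock(&h->page_lock);
}
```

The point is that the lock only has to cover writers to this one page of
this one device, which is knowledge a generic writeq() would not have.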

 - R.


From rdreier at cisco.com  Fri Sep  1 13:59:26 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 01 Sep 2006 13:59:26 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <1157143527.20958.8.camel@chalcedony.pathscale.com> (Bryan
	O'Sullivan's message of "Fri, 01 Sep 2006 13:45:27 -0700")
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org> <ada8xl3ics4.fsf@cisco.com>
	<1157143527.20958.8.camel@chalcedony.pathscale.com>
Message-ID: <adaveo7gv69.fsf@cisco.com>

    Roland> Yes, I agree that's a good plan, especially the
    Roland> documentation part.  However I would argue that what's in
    Roland> drivers/infiniband/hw/mthca/mthca_doorbell.h is
    Roland> legitimate: the driver uses __raw_writeq() when it exists
    Roland> and uses two __raw_writel()s properly serialized with a
    Roland> device-specific lock to get exactly the atomicity it needs
    Roland> on 32-bit archs.

    Bryan> On the off chance that you might be arguing that
    Bryan> mthca_write64 could be a candidate drop-in for writeq on
    Bryan> 32-bit arches:

No, quite the opposite.  I'm arguing that the wrappers in mthca do
legitimately belong in a device driver, since they encapsulate
device-specific knowledge about what serialization suffices when an
atomic __raw_writeq() is not available.

    Bryan> That approach might work on mthca hardware, but it's not
    Bryan> safe in general.  The ipath driver requires a proper
    Bryan> writeq(), for example, because the hardware will quite
    Bryan> legitimately treat 32-bit writes to some registers as
    Bryan> separate accesses, and screw things up royally.

Yes, that's an unfortunate feature of the ipath hardware that
apparently makes it impossible to drive on a generic 32-bit architecture.

So perhaps writeq()/__raw_writeq() need to be defined to generate a
single bus cycle to the extent that makes sense.  Which would mean
that it's not possible to implement on all architectures.

 - R.


From bos at pathscale.com  Fri Sep  1 14:01:41 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 01 Sep 2006 14:01:41 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <adazmdjgvf7.fsf@cisco.com>
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org> <ada8xl3ics4.fsf@cisco.com>
	<20060901130444.48f19457.akpm@osdl.org>
	<20060901204343.GA4979@flint.arm.linux.org.uk>
	<adazmdjgvf7.fsf@cisco.com>
Message-ID: <1157144501.20958.12.camel@chalcedony.pathscale.com>

On Fri, 2006-09-01 at 13:54 -0700, Roland Dreier wrote:

> I agree completely.  And going one step further: if an architecture
> cannot implement a 64-bit write atomically, then the precise
> serialization that is required is device-specific knowledge that
> belongs in the device driver.

Absolutely.

	<b


From akpm at osdl.org  Fri Sep  1 13:59:11 2006
From: akpm at osdl.org (Andrew Morton)
Date: Fri, 1 Sep 2006 13:59:11 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901204343.GA4979@flint.arm.linux.org.uk>
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org> <ada8xl3ics4.fsf@cisco.com>
	<20060901130444.48f19457.akpm@osdl.org>
	<20060901204343.GA4979@flint.arm.linux.org.uk>
Message-ID: <20060901135911.bc53d89a.akpm@osdl.org>

On Fri, 1 Sep 2006 21:43:43 +0100
Russell King <rmk+lkml at arm.linux.org.uk> wrote:

> On Fri, Sep 01, 2006 at 01:04:44PM -0700, Andrew Morton wrote:
> > On Fri, 01 Sep 2006 12:53:47 -0700
> > Roland Dreier <rdreier at cisco.com> wrote:
> > > Yes, I agree that's a good plan, especially the documentation part.
> > > However I would argue that what's in drivers/infiniband/hw/mthca/mthca_doorbell.h 
> > > is legitimate: the driver uses __raw_writeq() when it exists and uses
> > > two __raw_writel()s properly serialized with a device-specific lock to
> > > get exactly the atomicity it needs on 32-bit archs.
> > 
> > No, driver-specific workarounds are not legitimate, sorry.
> > 
> > The driver should simply fail to compile on architectures which do not
> > implement __raw_writeq().
> 
> So, what you're basically saying is that on architectures which can _NOT_
> implement an atomic __raw_writeq(), certain drivers simply will not be
> available?

If the driver *requires* an atomic __raw_writeq(), then yes.  The driver
cannot work correctly on that machine.

If, however, there is some way in which we can make the hardware work on
that machine (say, with other locking) then we got the __raw_writeq()
interface design (whatever that is) wrong.

IOW, the best way of tackling this is to work out what we're trying to do,
design an interface, then implement it.

Doing funny workarounds within individual drivers isn't the way to address
this.  In fact it's an indication that something is wrong.

> > We can speed up the process by sending helpful emails to architecture
> > maintainers, but they'll notice either way.
> 
> I think you're completely wrong in the context of the message you're
> replying to - it's talking about an _atomic_ 64-bit write.
> 
> Sure, if you want a _non-atomic_ 64-bit write then that's possible,
> but many 32-bit architectures can't do a 64-bit atomic IO write and
> that isn't something they can "fix".

If the hardware/driver absolutely requires that the 64-bit write be atomic
on-the-bus then sure, the fix is to disable that driver on that
architecture in Kconfig.

If, however, the atomicity requirement is a software thing (we need to be
atomic against other CPU reads and writes) then that can be solved with
locking, and we can design APIs for this which can be implemented
efficiently on all architectures.


From bos at serpentine.com  Fri Sep  1 14:03:57 2006
From: bos at serpentine.com (Bryan O'Sullivan)
Date: Fri, 01 Sep 2006 14:03:57 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <adaveo7gv69.fsf@cisco.com>
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org> <ada8xl3ics4.fsf@cisco.com>
	<1157143527.20958.8.camel@chalcedony.pathscale.com>
	<adaveo7gv69.fsf@cisco.com>
Message-ID: <1157144637.20958.15.camel@chalcedony.pathscale.com>

On Fri, 2006-09-01 at 13:59 -0700, Roland Dreier wrote:

> No, quite the opposite.  I'm arguing that the wrappers in mthca do
> legitimately belong in a device driver, since they encapsulate
> device-specific knowledge about what serialization suffices when an
> atomic __raw_writeq() is not available.

Yes, I figured that out from some later messages.  I think we're
violently in agreement, in that case.

	<b


From akpm at osdl.org  Fri Sep  1 14:03:13 2006
From: akpm at osdl.org (Andrew Morton)
Date: Fri, 1 Sep 2006 14:03:13 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <ada4pvria3v.fsf@cisco.com>
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org> <ada8xl3ics4.fsf@cisco.com>
	<20060901130444.48f19457.akpm@osdl.org> <ada4pvria3v.fsf@cisco.com>
Message-ID: <20060901140313.51cf077b.akpm@osdl.org>

On Fri, 01 Sep 2006 13:51:32 -0700
Roland Dreier <rdreier at cisco.com> wrote:

>     Andrew> No, driver-specific workarounds are not legitimate, sorry.
> 
>     Andrew> The driver should simply fail to compile on architectures
>     Andrew> which do not implement __raw_writeq().
> 
> But how should i386 (say) implement __raw_writeq()?  As two
> __raw_writel()s protected by a spinlock (that serializes all IO
> transactions)?  That seems rather ugly.
> 

If it's a choice between "ugly" and "doesn't work on x86", we'll take
"ugly" ;)


From rdreier at cisco.com  Fri Sep  1 14:05:36 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 01 Sep 2006 14:05:36 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901135911.bc53d89a.akpm@osdl.org> (Andrew Morton's
	message of "Fri, 1 Sep 2006 13:59:11 -0700")
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org> <ada8xl3ics4.fsf@cisco.com>
	<20060901130444.48f19457.akpm@osdl.org>
	<20060901204343.GA4979@flint.arm.linux.org.uk>
	<20060901135911.bc53d89a.akpm@osdl.org>
Message-ID: <adar6yvguvz.fsf@cisco.com>

    Andrew> If the hardware/driver absolutely requires that the 64-bit
    Andrew> write be atomic on-the-bus then sure, the fix is to
    Andrew> disable that driver on that architecture in Kconfig.

    Andrew> If, however, the atomicity requirement is a software thing
    Andrew> (we need to be atomic against other CPU reads and writes)
    Andrew> then that can be solved with locking, and we can design
    Andrew> APIs for this which can be implemented efficiently on all
    Andrew> architectures.

It seems that there are cases of both.  ipath needs actual 64-bit bus
transactions to work properly.  mthca needs to make sure that if
doorbell writes are split into two 32-bit halves, then no other writes
go to the same MMIO page in between the halves.

What do you think the API would look like?  Something along the lines
of mthca_doorbell.h, where we have macros for

DECLARE_WRITEQ_LOCK()
INIT_WRITEQ_LOCK()
GET_WRITEQ_LOCK()

which get stubbed out on architectures where writeq is already atomic,
and then pass the lock into writeq()?

But then you probably need some Kconfig symbol to say if writeq() is
really atomic or just software atomic (for ipath et al to depend on).
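
To make the question concrete, a rough userspace sketch of how such macros
might be stubbed out. `HAVE_ATOMIC_WRITEQ`, `union fake_reg`,
`struct fake_dev` and `fake_writeq()` are placeholders invented for this
example, a pthread mutex stands in for a spinlock, and putting the low half
at the lower address is just an assumption about the device:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/*
 * HAVE_ATOMIC_WRITEQ is a made-up symbol: it would be defined where a
 * 64-bit store reaches the device as a single access.
 */
#ifdef HAVE_ATOMIC_WRITEQ
#define DECLARE_WRITEQ_LOCK(name)
#define INIT_WRITEQ_LOCK(ptr)		do { } while (0)
#define GET_WRITEQ_LOCK(ptr)		((pthread_mutex_t *)NULL)
#else
#define DECLARE_WRITEQ_LOCK(name)	pthread_mutex_t name;
#define INIT_WRITEQ_LOCK(ptr)		pthread_mutex_init((ptr), NULL)
#define GET_WRITEQ_LOCK(ptr)		(ptr)
#endif

/* One 64-bit register, also addressable as two 32-bit halves. */
union fake_reg {
	volatile uint64_t q;
	volatile uint32_t l[2];
};

struct fake_dev {
	union fake_reg doorbell;
	DECLARE_WRITEQ_LOCK(wq_lock)	/* vanishes when the lock is not needed */
};

static void fake_dev_init(struct fake_dev *dev)
{
	dev->doorbell.q = 0;
	INIT_WRITEQ_LOCK(&dev->wq_lock);
}

static void fake_writeq(uint64_t v, union fake_reg *reg, pthread_mutex_t *lock)
{
#ifdef HAVE_ATOMIC_WRITEQ
	(void)lock;			/* no lock exists in this configuration */
	reg->q = v;
#else
	pthread_mutex_lock(lock);
	reg->l[0] = (uint32_t)v;	/* low half at the lower address */
	reg->l[1] = (uint32_t)(v >> 32);
	pthread_mutex_unlock(lock);
#endif
}
```

A caller would then write
`fake_writeq(v, &dev->doorbell, GET_WRITEQ_LOCK(&dev->wq_lock));` in both
configurations, and the lock argument costs nothing where it is stubbed out.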

 - R.


From robert.j.woodruff at intel.com  Fri Sep  1 14:14:44 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 1 Sep 2006 14:14:44 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C88A825@orsmsx418.amr.corp.intel.com>

Woody wrote,
>ib_mthca 0000:0d:00.0: HCA FW version 3.3.2 is old (3.4.0 is current).
>ib_mthca 0000:0d:00.0: If you have problems, try updating your HCA FW.
>ib_uverbs: Unknown symbol hpage_shift      <-----------------------I
think this is the problem

Just a follow up note on this one.
Looks like this is a new bug introduced at RC3, it did not fail at RC2.


From akpm at osdl.org  Fri Sep  1 14:26:06 2006
From: akpm at osdl.org (Andrew Morton)
Date: Fri, 1 Sep 2006 14:26:06 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <adar6yvguvz.fsf@cisco.com>
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org> <ada8xl3ics4.fsf@cisco.com>
	<20060901130444.48f19457.akpm@osdl.org>
	<20060901204343.GA4979@flint.arm.linux.org.uk>
	<20060901135911.bc53d89a.akpm@osdl.org> <adar6yvguvz.fsf@cisco.com>
Message-ID: <20060901142606.4f5c1152.akpm@osdl.org>

On Fri, 01 Sep 2006 14:05:36 -0700
Roland Dreier <rdreier at cisco.com> wrote:

>     Andrew> If the hardware/driver absolutely requires that the 64-bit
>     Andrew> write be atomic on-the-bus then sure, the fix is to
>     Andrew> disable that driver on that architecture in Kconfig.
> 
>     Andrew> If, however, the atomicity requirement is a software thing
>     Andrew> (we need to be atomic against other CPU reads and writes)
>     Andrew> then that can be solved with locking, and we can design
>     Andrew> APIs for this which can be implemented efficiently on all
>     Andrew> architectures.
> 
> It seems that there are cases of both.  ipath needs actual 64-bit bus
> transactions to work properly.

If we define __raw_writeq() to be 64-bit-atomic-on-the-bus then an
appropriate solution for ipath would be to call __raw_writeq() directly. 
If the arch cannot implement __raw_writeq() then build error -> Kconfig fix.

>  mthca needs to make sure that if
> doorbell writes are split into two 32-bit halves, then no other writes
> go to the same MMIO page in between the halves.
> 
> What do you think the API would look like?  Something along the lines
> of mthca_doorbell.h, where we have macros for
> 
> DECLARE_WRITEQ_LOCK()
> INIT_WRITEQ_LOCK()
> GET_WRITEQ_LOCK()
> 
> which get stubbed out on architectures where writeq is already atomic,
> and then pass the lock into writeq()?
> 
> But then you probably need some Kconfig symbol to say if writeq() is
> really atomic or just software atomic (for ipath et al to depend on).
> 

It depends on how many other devices have (or are expected to have)
mthca-like requirements.  If the answer is "very few, maybe none" then
perhaps we don't need to go designing generic interfaces to support such
things.

As for interfaces, umm, something like

#ifdef CONFIG_ARCH_HAS_64BIT_ATOMIC_MMIO_WRITES

struct be64_port {
	void __iomem *addr;
};

static inline void atomic_be64_mmio_write(u64 v, struct be64_port *port)
{
	__raw_writeq(v, port->addr);
}

#define be64_port_init(port, _addr)				\
	do { (port)->addr = (_addr); } while (0)

#define be64_port_init_external_locking(port, _addr, _lockp)	\
	be64_port_init(port, _addr)


#else


struct be64_port {
	void __iomem *addr;
	spinlock_t lock;
	spinlock_t *lockp;
};

static inline void atomic_be64_mmio_write(u64 v, struct be64_port *port)
{
	unsigned long flags;
	
	spin_lock_irqsave(port->lockp, flags);
	__raw_writel(...);
	__raw_writel(...);
	spin_unlock_irqrestore(port->lockp, flags);
}

#define be64_port_init(port, _addr)				\
	do {							\
		spin_lock_init(&(port)->lock);			\
		(port)->lockp = &(port)->lock;			\
		(port)->addr = (_addr);				\
	} while (0)

#define be64_port_init_external_locking(port, _addr, _lockp)	\
	do {							\
		(port)->lockp = (_lockp);			\
		(port)->addr = (_addr);				\
	} while (0)

#endif

perhaps?

btw, 32-bit mthca_write64() is downright scary from an endianness POV.  I
guess it's right, but I wouldn't label it "obviously correct" ;)
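
The endianness worry can be made concrete with a small userspace sketch. It
assumes a device that expects a big-endian 64-bit value with the most
significant 32 bits at the lower address; `store_be32()` and `split_be64()`
are hypothetical helpers, not mthca code:

```c
#include <stdint.h>

/* Store a 32-bit value in big-endian byte order, whatever the host order. */
static void store_be32(uint8_t *dst, uint32_t v)
{
	dst[0] = (uint8_t)(v >> 24);
	dst[1] = (uint8_t)(v >> 16);
	dst[2] = (uint8_t)(v >> 8);
	dst[3] = (uint8_t)v;
}

/*
 * Emit a 64-bit value as two 32-bit big-endian stores.  The high half has
 * to land at offset 0 and the low half at offset 4; swapping the halves, or
 * the bytes within a half, yields a different value on the device side.
 */
static void split_be64(uint8_t dst[8], uint64_t v)
{
	store_be32(dst, (uint32_t)(v >> 32));
	store_be32(dst + 4, (uint32_t)v);
}
```

There are two independent things to get right here (the order of the halves
and the byte order inside each half), which is why a split write is harder
to eyeball than a single 64-bit store.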


From rjwalsh at pathscale.com  Fri Sep  1 14:33:50 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Fri, 01 Sep 2006 14:33:50 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
In-Reply-To: <44F879F6.8080601@pathscale.com>
References: <BAE9DCEF64577A439B3A37F36F9B691C853D89@orsmsx418.amr.corp.intel.com>
	<44F879F6.8080601@pathscale.com>
Message-ID: <44F8A73E.8050502@pathscale.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Walsh wrote:
>> Hi all, I installed the RC3 package on my Xeon/Lindenhurst platforms
>> and with the pathscale card I have the following problem
>> when trying to run Intel MPI and NetPipe.
> 
> Actually, I've been trying to run Intel MPI myself, but haven't gotten 
> very far yet.  My attempts die like this:
> 
>    $ mpiexec -n 2 ./mpitest
>    I_MPI: [0] set_up_devices(): will use device: libmpi.rdma.so
>    I_MPI: [0] set_up_devices(): will use DAPL provider: OpenIB-cma
>    I_MPI: [0] set_up_devices(): will use device: libmpi.rdma.so
>    I_MPI: [0] set_up_devices(): will use DAPL provider: OpenIB-cma
>    [0][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not
> create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
>    [1][rdma_iba_priv_intel.c:429] error(0x60029): OpenIB-cma: Could not
> create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
>    rank 0 in job 2  ib-idev-05_51713   caused collective abort of all ranks
>      exit status of rank 0: return code 254
> 
> dapltest seems to work just fine, so I'm a little confused.  Do you have 
> any insight on what the DAT_INVALID_PARAMETER(DAT_INVALID_ARG6) stuff is 
> referring to?

FWIW: I'm seeing a similar problem with the Intel MPI 3.0 beta release:

  $ mpiexec -n 2 ./a.out
  I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from
registry: OpenIB-cma
  I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from
registry: OpenIB-cma
  I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use
rdma configuration
  will use rdma configuration
  [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma:
could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
  Hello world: rank 0 of 2 running on ib-idev-05
  rank 1 in job 2  ib-idev-05_42160   caused collective abort of all ranks
    exit status of rank 1: killed by signal 9
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRPinPfzvnpzTd9fxAQJnzQf+OYOYjZwUpEQ0OtMiKJW94nAEa2okXh7H
LV/WcyH4p8q0dDmzPaEXh1dEwD+DkPWjTb0uh8r+b1Dt1f5jfC98ZXb/2sMqIW4d
93sSIoDWWPN2R2WuGnsvuQcNQBkk7h0HbCBi5vJELPQcXrQAjYPNtRXCPwjXqiGE
qefmsFXlUa+avWXQ+WbXBR+ldaBePvYGwFk+G4SwibgMhzyFwsSCzSc4FGrRvg7u
YLUIehmV2j0snxbgFK1jVCOQ+QPo8dEhR6OcwXEMbJwUqqslnwK16zUCo2IUTTdN
IROQ+kyuecaXfnH0gA2sDIKzGZxkw5zRU1cWN5cq92HPxnhjsCoa/A==
=mS+M
-----END PGP SIGNATURE-----


From sean.hefty at intel.com  Fri Sep  1 15:33:55 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 1 Sep 2006 15:33:55 -0700
Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding
 device during destruction
In-Reply-To: <000001c6cdfe$0f4d0190$e598070a@amr.corp.intel.com>
Message-ID: <000101c6ce16$b6057090$e598070a@amr.corp.intel.com>

This closes a window where address resolution can attach an rdma_cm_id
to a device during destruction of the rdma_cm_id.  This can result in
the rdma_cm_id remaining in the device list after its memory has been
freed.

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
I generated this patch off the tip of the for-2.6.19 git branch, so
it applies on top of the iWarp changes.

Also, OF is looking at hosting git repositories.  Once available,
I will publish the patches there.

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index c54c55a..2964dab 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -279,7 +279,7 @@ static int cma_acquire_dev(struct rdma_i
 	default:
 		return -ENODEV;
 	}
-	mutex_lock(&lock);
+
 	list_for_each_entry(cma_dev, &dev_list, list) {
 		ret = ib_find_cached_gid(cma_dev->device, &gid,
 					 &id_priv->id.port_num, NULL);
@@ -288,7 +288,6 @@ static int cma_acquire_dev(struct rdma_i
 			break;
 		}
 	}
-	mutex_unlock(&lock);
 	return ret;
 }
 
@@ -712,7 +711,9 @@ void rdma_destroy_id(struct rdma_cm_id *
 	state = cma_exch(id_priv, CMA_DESTROYING);
 	cma_cancel_operation(id_priv, state);
 
+	mutex_lock(&lock);
 	if (id_priv->cma_dev) {
+		mutex_unlock(&lock);
 		switch (rdma_node_get_transport(id->device->node_type)) {
 		case RDMA_TRANSPORT_IB:
 			if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib))
@@ -727,8 +728,8 @@ void rdma_destroy_id(struct rdma_cm_id *
 		}
 		mutex_lock(&lock);
 		cma_detach_from_dev(id_priv);
-		mutex_unlock(&lock);
 	}
+	mutex_unlock(&lock);
 
 	cma_release_port(id_priv);
 	cma_deref_id(id_priv);
@@ -925,7 +926,9 @@ static int cma_req_handler(struct ib_cm_
 	}
 
 	atomic_inc(&conn_id->dev_remove);
+	mutex_lock(&lock);
 	ret = cma_acquire_dev(conn_id);
+	mutex_unlock(&lock);
 	if (ret) {
 		ret = -ENODEV;
 		cma_release_remove(conn_id);
@@ -1097,7 +1100,9 @@ static int iw_conn_req_handler(struct iw
 		goto out;
 	}
 
+	mutex_lock(&lock);
 	ret = cma_acquire_dev(conn_id);
+	mutex_unlock(&lock);
 	if (ret) {
 		cma_release_remove(conn_id);
 		rdma_destroy_id(new_cm_id);
@@ -1507,16 +1512,26 @@ static void addr_handler(int status, str
 	enum rdma_cm_event_type event;
 
 	atomic_inc(&id_priv->dev_remove);
-	if (!id_priv->cma_dev && !status)
+
+	/*
+	 * Grab mutex to block rdma_destroy_id() from removing the device while
+	 * we're trying to acquire it.
+	 */
+	mutex_lock(&lock);
+	if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED)) {
+		mutex_unlock(&lock);
+		goto out;
+	}
+
+	if (!status && !id_priv->cma_dev)
 		status = cma_acquire_dev(id_priv);
+	mutex_unlock(&lock);
 
 	if (status) {
-		if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_BOUND))
+		if (!cma_comp_exch(id_priv, CMA_ADDR_RESOLVED, CMA_ADDR_BOUND))
 			goto out;
 		event = RDMA_CM_EVENT_ADDR_ERROR;
 	} else {
-		if (!cma_comp_exch(id_priv, CMA_ADDR_QUERY, CMA_ADDR_RESOLVED))
-			goto out;
 		memcpy(&id_priv->id.route.addr.src_addr, src_addr,
 		       ip_addr_size(src_addr));
 		event = RDMA_CM_EVENT_ADDR_RESOLVED;
@@ -1740,8 +1755,11 @@ int rdma_bind_addr(struct rdma_cm_id *id
 
 	if (!cma_any_addr(addr)) {
 		ret = rdma_translate_ip(addr, &id->route.addr.dev_addr);
-		if (!ret)
+		if (!ret) {
+			mutex_lock(&lock);
 			ret = cma_acquire_dev(id_priv);
+			mutex_unlock(&lock);
+		}
 		if (ret)
 			goto err;
 	}
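
The ordering the patch relies on can be modelled in userspace. This is an
illustration only: `struct fake_id`, `fake_addr_handler()` and
`fake_destroy_id()` are hypothetical stand-ins, a pthread mutex replaces the
kernel mutex, C11 atomics replace cma_exch()/cma_comp_exch(), and the state
machine is cut down to three states.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { ST_ADDR_QUERY, ST_ADDR_RESOLVED, ST_DESTROYING };

struct fake_id {
	atomic_int state;
	bool attached;	/* models id_priv->cma_dev != NULL; guarded by dev_lock */
};

/* Models the global 'lock' that protects the device list. */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;

/* Address resolution finished: attach only if the id is still querying. */
static bool fake_addr_handler(struct fake_id *id)
{
	int expected = ST_ADDR_QUERY;
	bool ok;

	pthread_mutex_lock(&dev_lock);
	ok = atomic_compare_exchange_strong(&id->state, &expected,
					    ST_ADDR_RESOLVED);
	if (ok)
		id->attached = true;
	pthread_mutex_unlock(&dev_lock);
	return ok;
}

/* Destruction: mark the id first, then detach under the same mutex. */
static void fake_destroy_id(struct fake_id *id)
{
	atomic_exchange(&id->state, ST_DESTROYING);

	pthread_mutex_lock(&dev_lock);
	if (id->attached)
		id->attached = false;
	pthread_mutex_unlock(&dev_lock);
}
```

Either the handler wins the mutex, attaches, and the destroy path then sees
the attachment and undoes it, or the state is already ST_DESTROYING when the
handler checks and it never attaches at all.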


From rdreier at cisco.com  Fri Sep  1 15:42:55 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 01 Sep 2006 15:42:55 -0700
Subject: [openib-general] 2.6.18-rc5-mm1:
 drivers/infiniband/hw/amso1100/c2.c compile error
In-Reply-To: <20060901142606.4f5c1152.akpm@osdl.org> (Andrew Morton's
	message of "Fri, 1 Sep 2006 14:26:06 -0700")
References: <20060901015818.42767813.akpm@osdl.org>
	<20060901160023.GB18276@stusta.de>
	<20060901101340.962150cb.akpm@osdl.org> <adak64nij8f.fsf@cisco.com>
	<20060901112312.5ff0dd8d.akpm@osdl.org> <ada8xl3ics4.fsf@cisco.com>
	<20060901130444.48f19457.akpm@osdl.org>
	<20060901204343.GA4979@flint.arm.linux.org.uk>
	<20060901135911.bc53d89a.akpm@osdl.org> <adar6yvguvz.fsf@cisco.com>
	<20060901142606.4f5c1152.akpm@osdl.org>
Message-ID: <adamz9jgqds.fsf@cisco.com>

    Andrew> It depends on how many other devices have (or are expected
    Andrew> to have) mthca-like requirements.  If the answer is "very
    Andrew> few, maybe none" then perhaps we don't need to go
    Andrew> designing generic interfaces to support such things.

I actually don't know of any others -- not that I'm an expert on the
range of devices that exist...

What's your feeling about drivers like amso1100, which don't
particularly care about atomicity, but just want to write a 64-bit
quantity conveniently?  Should we require writeq()/__raw_writeq() for
all archs, and then define CONFIG_ARCH_HAS_64BIT_ATOMIC_MMIO_WRITES as
appropriate?

I see stuff like this in drivers/dma/ioatdma.c:

#if (BITS_PER_LONG == 64)
	ioatdma_chan_write64(ioat_chan, IOAT_CHAINADDR_OFFSET, desc->phys);
#else
	ioatdma_chan_write32(ioat_chan,
	                     IOAT_CHAINADDR_OFFSET_LOW,
	                     (u32) desc->phys);
	ioatdma_chan_write32(ioat_chan, IOAT_CHAINADDR_OFFSET_HIGH, 0);
#endif

and drivers/char/hpet.c:

#ifndef readq
static inline unsigned long long readq(void __iomem *addr)
{
	return readl(addr) | (((unsigned long long)readl(addr + 4)) << 32LL);
}
#endif

#ifndef writeq
static inline void writeq(unsigned long long v, void __iomem *addr)
{
	writel(v & 0xffffffff, addr);
	writel(v >> 32, addr + 4);
}
#endif

and so on...

 - R.


From robert.j.woodruff at intel.com  Fri Sep  1 16:11:31 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 1 Sep 2006 16:11:31 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C88A986@orsmsx418.amr.corp.intel.com>

Robert wrote,
>  $ mpiexec -n 2 ./a.out
>  I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from
registry: OpenIB-cma
> I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from
registry: OpenIB-cma
> I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use
rdma configuration
>  will use rdma configuration
>  [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma:
>could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
>  Hello world: rank 0 of 2 running on ib-idev-05
>  rank 1 in job 2  ib-idev-05_42160   caused collective abort of all
ranks
>    exit status of rank 1: killed by signal 9

Hmm, if you have a debug version of the DAPL library, can you enable
debug messages,
export DAPL_DBG_TYPE=0xffff

That may give us more information. I will also have Arlin take a look
at this when he returns on Tues.

woody


From rjwalsh at pathscale.com  Fri Sep  1 16:56:22 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Fri, 01 Sep 2006 16:56:22 -0700
Subject: [openib-general] OFED 1.1-rc3 is ready
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691C88A986@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691C88A986@orsmsx418.amr.corp.intel.com>
Message-ID: <44F8C8A6.8090300@pathscale.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Hmm, if you have a debug version of the DAPL library, can you enable
> debug messages,
> export DAPL_DBG_TYPE=0xffff
> 
> That may give us more information. I will also have Arlin take a look
> at this when returns on Tues.

I've been using the stuff from OFED, which I don't think is built with
debugging turned on.  I'll compile up a new version next week with
debugging enabled.

Regards,
 Robert.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRPjIpvzvnpzTd9fxAQIbsgf/R5Z1jqrhzOITZbILw2eW9rwpxEP0JQJE
AWFjlnLXuj3aD/XjbYLQ13t8IXQSJ8KA6TGHcsRLVZYQqmQoVtyyfcMoZp++eKu9
koK0Ttac39ThHgjY7/EQc57WIVyIHoeDQqaS0Q8Y4P+ZwcVXuJT9TlDkCRQ/EtZW
MljpJIa0XlOxyTXW0hiEMAaeMseumXbl/Sjfg5JDPz6m/d7URX6Q14Izt7PlJUly
bBQUPPcukx1Vpg/3SNc/BGUSoqNa7NnMu48EVdfnG0sHBwCkXgwhFkN1bE4AcxGH
Ndwksxzxz8zccu6D6dg7o/J7yOMLZo67iyAoC6c1mVjqhLeXMuZ0kw==
=VbBn
-----END PGP SIGNATURE-----


From G.Rudd at isu.usyd.edu.au  Fri Sep  1 22:19:49 2006
From: G.Rudd at isu.usyd.edu.au (Greg Rudd)
Date: Sat, 02 Sep 2006 15:19:49 +1000
Subject: [openib-general] Have I got something very wrong here?
Message-ID: <1157174389.29049.61.camel@localhost.localdomain>

Hi all, sorry for sounding like a total tool on this list, but after
upgrading one of my boxes to RHEL4 rel 4 and installing the
2.6.9-42.0.2.ELhugemem kernel, my previously working ib interfaces (ib0
and ib1), which used to be able to talk IP, can no longer talk.  The
interfaces can still be brought up OK, and I am starting to get some
interesting messages in dmesg:

ib0: Send unicast ARP to 002b
ib0: Send unicast ARP to 002b
ib0: Send unicast ARP to 002b
ib0: Send unicast ARP to 002b
ib0: Send unicast ARP to 002b
ib1: stopping interface
ib1: downing ib_dev
ib1: Freeing ah e88b1b20
ib1: All sends and receives done.
ip_tables: (C) 2000-2002 Netfilter core team
ib1: bringing up interface
ib1: Created ah e88cf960
ib0: Send unicast ARP to 002b

On bringing up the interfaces, these messages appear in dmesg:

ib0: Start path record lookup for fe80:0000:0000:0000:0013:21ff:ff75:3939
ib0: PathRec LID 0x002a for GID fe80:0000:0000:0000:0013:21ff:ff75:3939
ib0: Created ah e7f26600
ib0: created address handle e84f51c0 for LID 0x002a, SL 0
ib0: Send unicast ARP to 002a
ib0: Start path record lookup for
fe80:0000:0000:0000:0013:21ff:ff75:399d
ib0: PathRec LID 0x002b for GID fe80:0000:0000:0000:0013:21ff:ff75:399d
ib0: Created ah e88806c0
ib0: created address handle e88806a0 for LID 0x002b, SL 0


If I am correct, Red Hat has totally changed the way the InfiniBand
drivers work in RHEL4 rel 4.

What is interesting is that when I run /etc/init.d/openibd status I get
the following:

./openibd status

  HCA driver loaded

Configured devices:
ib0 ib1

Currently active devices:
ib0
ib1

The following modules are also loaded:

        ib_cm
        ib_sdp

I note that ib_ipoib does not appear in this list, but lsmod shows it
loaded into the kernel, as shown below:

[root at hippo init.d]# lsmod |grep -i ib
ib_sdp                 35153  0 
rdma_cm                26181  2 ib_sdp,rdma_ucm
ib_addr                11717  1 rdma_cm
ib_local_sa            15565  2 rdma_ucm,rdma_cm
findex                  8001  1 ib_local_sa
ib_mthca              132969  0 
ib_ipoib               50129  0 
ib_uverbs              40169  1 rdma_ucm
ib_umad                18929  0 
ib_ucm                 20549  0 
ib_sa                  17109  3 rdma_cm,ib_local_sa,ib_ipoib
ib_cm                  38444  2 rdma_cm,ib_ucm
ib_mad                 39385  5 ib_local_sa,ib_mthca,ib_umad,ib_sa,ib_cm
ib_core                49985  11
ib_sdp,rdma_cm,ib_local_sa,ib_mthca,ib_ipoib,ib_uverbs,ib_umad,ib_ucm,ib_sa,ib_cm,ib_mad
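
[Editorial sketch] The comparison described above (a module missing from
the openibd status list but present in lsmod) can be checked mechanically.
The following is a minimal sketch, assuming lsmod-style text in which the
"used by" column may have wrapped onto the next line, as it did for ib_core
in the paste above. The function name parse_lsmod and the SAMPLE text are
illustrative only and are not part of any OFED tool.

```python
def parse_lsmod(text):
    """Return {module: [users]} from lsmod-style text.

    A line holding a single comma-separated token is treated as the
    wrapped "used by" column of the previous module (an assumption
    about how the terminal wrapped the output).
    """
    modules = {}
    last = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":
            continue  # skip blank lines and the header row
        if len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit():
            users = fields[3].split(",") if len(fields) > 3 else []
            modules[fields[0]] = users
            last = fields[0]
        elif last is not None and len(fields) == 1:
            modules[last].extend(fields[0].split(","))
    return modules


# Lines copied from the lsmod output in this message.
SAMPLE = """\
ib_ipoib               50129  0 
ib_sa                  17109  3 rdma_cm,ib_local_sa,ib_ipoib
ib_core                49985  11
ib_sdp,rdma_cm,ib_local_sa,ib_mthca,ib_ipoib,ib_uverbs,ib_umad,ib_ucm,ib_sa,ib_cm,ib_mad
"""

mods = parse_lsmod(SAMPLE)
print("ib_ipoib loaded:", "ib_ipoib" in mods)
print("users of ib_core:", len(mods["ib_core"]))
```

On a live system the same function could be fed the output of lsmod
instead of SAMPLE.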


As to the infiniband rpms installed this is what I have at the moment.

kernel-ib-1.0-1
libmthca-1.0.2-1.i386
libsdp-0.9.0-1.i386
libibverbs-1.0.3-1.i386
libibverbs-utils-1.0.3-1.i386
libibcommon-1.0-1.i386
libibumad-1.0-1.i386
opensm-libs-1.2.0-1.i386
opensm-1.2.0-1.i386
libibcm-0.9.0-1.i386
libibmad-1.0-1.i386
openib-diags-1.0-1.i386
perftest-1.0-1.i386
tvflash-0.9.0-1.i386
srptools-0.0.4-1.i386
librdmacm-0.9.0-1.i386
mstflint-1.0-1.i386

To get the InfiniBand interfaces working as they did before under
2.6.9-34, as both ib0 and ib1, am I missing something very simple, such
as an rpm or a kernel module that has not been loaded? Or is there
something else happening here?
 

Extra details

ib0       Link encap:UNSPEC  HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet addr:10.0.0.1  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:850 errors:0 dropped:0 overruns:0 frame:0
          TX packets:920 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128 
          RX bytes:47600 (46.4 KiB)  TX bytes:55256 (53.9 KiB)

ib1       Link encap:UNSPEC  HWaddr
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  
          inet addr:10.0.0.2  Bcast:10.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:128 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)


Copy of /etc/modprobe.conf

alias eth0 tg3
alias eth1 tg3
alias bond0 bonding options bonding mode=active-backup  miimon=100
alias scsi_hostadapter cciss
alias eth2 e1000
alias eth3 e1000
alias usb-controller ohci-hcd
alias ib0 ib_ipoib
alias ib1 ib_ipoib
alias net-pf-27 ib_sdp
options ib_ipoib debug_level=2
options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
remove qla2xxx /sbin/modprobe -r --first-time --ignore-remove qla2xxx && { /sbin/modprobe -r --ignore-remove qla2xxx_conf; }

alias scsi_hostadapter1 qla2xxx_conf
alias scsi_hostadapter2 qla2xxx
alias scsi_hostadapter3 qla2300
alias scsi_hostadapter4 qla2400
alias scsi_hostadapter5 qla6312
options qla2xxx  ql2xmaxqdepth=16 qlport_down_retry=30 ql2xloginretrycount=16 ql2xfailover=1 ql2xlbType=1 ql2xautorestore=0x80


ifcfg files in /etc/sysconfig/network-scripts

[root at hippo network-scripts]# more ifcfg-ib0 
DEVICE=ib0
BOOTPROTO=static
BROADCAST=10.255.255.255
IPADDR=10.0.0.1
NETMASK=255.0.0.0
ONBOOT=yes

[root at hippo network-scripts]# more ifcfg-ib1
DEVICE=ib1
BOOTPROTO=static
BROADCAST=10.255.255.255
IPADDR=10.0.0.2
NETMASK=255.0.0.0
ONBOOT=yes
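
[Editorial sketch] The two ifcfg files above can be read as plain KEY=VALUE
pairs, and the network each address belongs to follows from IPADDR and
NETMASK. The sketch below only illustrates that arithmetic with Python's
standard ipaddress module; the helper names are made up for the example.
Whether having both interfaces on one network has anything to do with the
reported problem is not established in this thread.

```python
import ipaddress


def parse_ifcfg(text):
    """Parse KEY=VALUE lines of an ifcfg file into a dict."""
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)


def network_of(cfg):
    """Return the IPv4 network implied by IPADDR and NETMASK."""
    return ipaddress.ip_network(
        "{}/{}".format(cfg["IPADDR"], cfg["NETMASK"]), strict=False
    )


# Contents copied from the ifcfg-ib0 and ifcfg-ib1 listings in this message.
IB0 = "DEVICE=ib0\nBOOTPROTO=static\nBROADCAST=10.255.255.255\nIPADDR=10.0.0.1\nNETMASK=255.0.0.0\nONBOOT=yes\n"
IB1 = "DEVICE=ib1\nBOOTPROTO=static\nBROADCAST=10.255.255.255\nIPADDR=10.0.0.2\nNETMASK=255.0.0.0\nONBOOT=yes\n"

net0 = network_of(parse_ifcfg(IB0))
net1 = network_of(parse_ifcfg(IB1))
print("ib0 network:", net0, "broadcast:", net0.broadcast_address)
print("ib1 network:", net1, "broadcast:", net1.broadcast_address)
print("same network:", net0 == net1)
```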


Thanks in advance

-greg


From ogerlitz at voltaire.com  Sat Sep  2 23:57:20 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Sun, 03 Sep 2006 09:57:20 +0300
Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding
 device during destruction
In-Reply-To: <000101c6ce16$b6057090$e598070a@amr.corp.intel.com>
References: <000101c6ce16$b6057090$e598070a@amr.corp.intel.com>
Message-ID: <44FA7CD0.5000506@voltaire.com>

Sean Hefty wrote:
> This closes a window where address resolution can attach an rdma_cm_id
> to a device during destruction of the rdma_cm_id.  This can result in
> the rdma_cm_id remaining in the device list after its memory has been
> freed.

Sean,

Does this patch protect against the case where an rdma_cm_id is being 
destroyed while address resolution related to the **same** id attaches 
it to a device?

If yes, why would someone destroy this id? Is it legal to do so?

If not, does your patch protect against the case where one id is being 
destroyed at the same time another id is being attached to the device?

thanks,

Or.


From tziporet at dev.mellanox.co.il  Sun Sep  3 02:13:19 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Sun, 03 Sep 2006 12:13:19 +0300
Subject: [openib-general] Interrupt Threshold Rate equivalent in
 Infiniband NIC
In-Reply-To: <loom.20060831T011859-102@post.gmane.org>
References: <a33d0a9f0608291446v1399d2cdkeb47c9d4d378e7f0@mail.gmail.com>
	<ada1wqznpjt.fsf@cisco.com> <loom.20060831T011859-102@post.gmane.org>
Message-ID: <44FA9CAF.2060608@dev.mellanox.co.il>

Aaron Fabbri wrote:
>
> I agree there is no equivalent to a rate limiter.  I do recall there is (or was)
> an interrupt timer that you can set when you burn the firmware on the Mellanox
> HCAs.  IIRC, it could be used to limit the interrupt rate, but the way it is
> implemented it can add latency.  If you don't care about latency you could try
> it out.  Ask Mellanox for specifics.
>
> Aaron
>
>
>
>   
It's not implemented for memfree devices, and it is also not fully tested 
for devices with memory.

Tziporet


From christian.guggenberger at rzg.mpg.de  Sun Sep  3 10:53:46 2006
From: christian.guggenberger at rzg.mpg.de (Christian Guggenberger)
Date: Sun, 3 Sep 2006 19:53:46 +0200
Subject: [openib-general] problems to regiser memory as a reglar user on
 SLES9 SP3
In-Reply-To: <44F453FC.4070300@dev.mellanox.co.il>
References: <44F453FC.4070300@dev.mellanox.co.il>
Message-ID: <20060903175345.GA6931@daltons.rzg.mpg.de>

Hi,
On Tue, Aug 29, 2006 at 05:49:32PM +0300, Tziporet Koren wrote:
> Hi All,
> In testing today we found that on SLES9 SP3 memory locking as a regular 
> user fails.
Has any progress been made regarding this?

I'd like to ask if the SLES9 port is really mature yet, because I tried
to go a step ahead and tried some trivial MPI code as root, but failed
and got the involved node locked down hard.
Testing was done on a single x86_64 SMP node (2 CPUs), with a Mellanox
PCI-X HCA (23108, FW-3.5.0). Software Environment SLES9 SP3-latest,
OFED-1.1-rc3 and mvapich2-0.9.5.
Attached is a simple MPI code that causes the hard lock. Also attached
are some Kernel BUGs gathered via serial console - they look garbled,
unfortunately.
Note, everything is fine, if I use recent vanilla kernels on that SLES9
machine.

cheers.
 - Christian

-- 
-----------------------------------------------------------
Phone	+49-89-3299-1306
PGP 	http://www.rzg.mpg.de/~ccg/cg-public_key.asc
S/MIME 	http://ra.rzg.mpg.de
-----------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.c
Type: text/x-csrc
Size: 1260 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060903/f1a48542/attachment.c>
-------------- next part --------------
Kernel BUG at page_alloc:853
invalid operand: 0000 [1] SMP
CPU 0
Pid: 7092, comm: hanger Tainted: PF  U   (2.6.5-7.276-smp SLES9_SP3_BRANCH-20060724104531)
RIP: 0010:[<ffffffff8016ad9e>] <ffffffff8016ad9e>{__free_pages+30}
RSP: 0018:00000100e3fdbbf0  EFLAGS: 00010256
RAX: 0000000000000000 RBX: 00000100e72d1280 RCX: 000001000000d000
RDX: 0000010002a1c4d8 RSI: 0000000000000000 RDI: 0000010002a1c4d8
RBP: 00000100e3fdbcc8 R08: 00000100e3fda000 R09: 0000000000000002
R10: 0000000000000064 R11: 0000000000000001 R12: 0000000000000000
R13: 00000100e72d1280 R14: 000001007e644d90 R15: 00000000000493e0
FS:  0000002a95bb5b00(0000) GS:ffffffff8057dc00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000041b009 CR3: 0000000000101000 CR4: 00000000000006e0
Process hanger (pid: 7092, threadinfo 00000100e3fda000, task 000001007e644d90)
Stack: ffffffff8013bd3f 0000000000000000 ffffffff801395a0 ffffffff803d3400
       0000000000000246 00000000000339b3 0000000000000202 0000010002c1c600
       000000000000006a 0000010002c1d6e0
Call Trace:<ffffffff8013bd3f>{__mmdrop+63} <ffffffff801395a0>{thread_return+108}
       <ffffffff801467b0>{process_timeout+0} <ffffffff80147376>{schedule_timeout+246}
       <ffffffff801467b0>{process_timeout+0} <ffffffffa017f460>{:ib_mthca:mthca_cmd_wait+448}
       <ffffffff80135cd0>{default_wake_function+0} <ffffffff80135cd0>{default_wake_function+0}
       <ffffffffa017f622>{:ib_mthca:mthca_cmd_box+66} <ffffffffa017fd59>{:ib_mthca:mthca_HW2SW_MPT+57}
       <ffffffffa0189423>{:ib_mthca:mthca_free_mr+67} <ffffffffa019014f>{:ib_mthca:mthca_dereg_mr+15}
       <ffffffffa0149e3a>{:ib_core:ib_dereg_mr+26} <ffffffffa01e5543>{:ib_uverbs:ib_uverbs_close+611}
       <ffffffff8018e332>{__fput+98} <ffffffff80189ffe>{filp_close+126}
       <ffffffff8018a105>{sys_close+229} <ffffffff801106b4>{system_call+124}


Code: 0f 0b f4 8b 38 80 ff ff ff ff 55 03 66 66 90 66 66 90 f0 83
RIP <ffffffff8016ad9e>{__free_pages+30} RSP <00000100e3fdbbf0>
 ----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at page_alloc:853
invalid operand: 0000 [2] SMP
CPU 1
Pid: 1, comm: init Tainted: PF  U   (2.6.5-7.276-smp SLES9_SP3_BRANCH-20060724104531)
RIP: 0010:[<ffffffff8016ad9e>] <ffffffff8016ad9e>{__free_pages+30}
RSP: 0018:000001007ff81c80  EFLAGS: 00010256
RAX: 0000000000000000 RBX: 000001007e1e4980 RCX: 0000010080000000
RDX: 00000100815b6068 RSI: 0000000000000000 RDI: 00000100815b6068
RBP: 000001007ff81d58 R08: 000001007ff80000 R09: 0000000000000013
R10: 00000000000493e0 R11: 0000000000002710 R12: 0000000000000001
R13: 000001007e1e4980 R14: 00000100e7f3f2c0 R15: 00000000000493e0
FS:  0000002a95bb5b00(0000) GS:ffffffff8057dc80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000041b009 CR3: 000000007ff82000 CR4: 00000000000006e0
Process init (pid: 1, threadinfo 000001007ff80000, task 00000100e7f3f2c0)
Stack: ffffffff8013bd3f 0000000000000040 ffffffff801395a0 00000100e7f3e9a0
       000000d07f8a1580 0000000000000246 0000000000000001 00000100816f5580
       000000010000007d 00000100816f6660
Call Trace:<ffffffff8013bd3f>{__mmdrop+63} <ffffffff801395a0>{thread_return+108}
       <ffffffff80147376>{schedule_timeout+246} <ffffffff801467b0>{process_timeout+0}
       <ffffffff801a3f61>{do_select+1105} <ffffffff801a35a0>{__pollwait+0}
       <ffffffff801a4366>{sys_select+902} <ffffffff801106b4>{system_call+124}


Code: 0f 0b f4 8b 38 80 ff ff ff ff 55 03 66 66 90 66 66 90 f0 83
RIP <ffffffff8016ad9e>{__free_pages+30} RSP <000001007ff81c80>
 b-<-0>-K--er--ne--l - p[ancuict : hAertte em] pt-e--d --to-- k-i- ll[p ileniatse!
  ite here B] ad-- p--a-ge-- s--ta
roK aert nferl eBe_UhG otat_c poaldge_p_aaglle oc(:in85 p3
0 ceinssv al'hidan ogeper'ra, ndpa: ge00 00000 [0301] 008SM1P5b 6
 68)CP
U f0 la<gs4>:0
x0P50id00:0 58025 m9,ap cpionmmg:: 00kl00og00d 00Ta00in00te0d00: 0 PFma  ppU ed  :(0 2.co6.un5-t:7.0 2p76ri-svampte S:0LxES009_00SP003_00BR
ANBCHac-2kt00r6ac07e:24
104
l3C1)al
t_RTrIPac: e:00<10ff:[ff<ffffffff8ff01ff6a806a168>ad{b9ead>]_p ag<ef+f1f2f0f}ff f80<f16ffadff9eff>{f8__0f16reaae_7fpa>{gefrs+ee30_h}o
  cRolSPd_: pa00ge18+1:0403}00<014>00 e4
e87  d4  0    E FL<fAGffS:ff 0f0ff0180021356bd
3fR>{AX__: mm0d0r0o00p+006300}00<04>00 0<0f RffBXff: ff00f800001310950ea072>{d1th28r0ea Rd_CXre: tu0r00n+0010108}000 00
  0  0
se  RD X:< 04>00<f00ff10ff00ff2af81c014da887 R41SI>{: dp00ut00+30300}00 00<00ff00ff0 ffRDfIf8: 01008900ff0e10>{00fi2alp1c_c4dlo8
10+1RB26P:} 0<400> 00
ff0e  4 e8  7e  18 <R0ff8:ff f00ff00f80101080ea14e0586>{00s0ys R_c09lo: se00+022009}000 00<0ff00ff01ff3
0080R1110:07 01e00>0{s00ys00re00t_04c9ar3eef0 ulR+1113: }0<004>00 0
R10 00  02  71 0
  2:Tr 0yi00ng00 0to00 f00ix00 i00t 00up
, Rbu13t : a0 r00eb00oo10t 0eis72 dne12ed80ed R
02:ha 0ng00er0[01700093e4]:1d sf4egb0f auR1lt5: a 0t 0000000000000200a904579381e03
0  FrSip:  0 0000000000202a9a9575889134b0200 r(s00p 0000) 00GS00:f7fffbfffffffff0f808 5e7drrc0or0( 1004
 0) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000041b009 CR3: 0000000000101000 CR4: 00000000000006e0
Process klogd (pid: 5259, threadinfo 00000100e4e86000, task 00000100e41df4b0)
Stack: ffffffff8013bd3f 0000000000008040 ffffffff801395a0 00000100e395f5b0
       0000000000000002 0000002a9556c010 000000000003ffff 0000000000040000
       000000009566b1c0 0000010002c1d6e0
Call Trace:<ffffffff8013bd3f>{__mmdrop+63} <ffffffff801395a0>{thread_return+108}
       <ffffffff8018d5bd>{do_sync_write+173} <ffffffff8013ea60>{do_syslog+384}
       <ffffffff8013d430>{autoremove_wake_function+0} <ffffffff8013d430>{autoremove_wake_function+0}
       <ffffffff801c9022>{kmsg_read+66} <ffffffff8018db84>{vfs_read+244}
       <ffffffff8018dddd>{sys_read+157} <ffffffff801106b4>{system_call+124}


Code: 0f 0b f4 8b 38 80 ff ff ff ff 55 03 66 66 90 66 66 90 f0 83
RIP <ffffffff8016ad9e>{__free_pages+30} RSP <00000100e4e87d40>
 ----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at page_alloc:853
 <0>invalid operand: 0000 [4] SMP
CPU 1
Pid: 7091, comm: python2.3 Tainted: PF  U B (2.6.5-7.276-smp SLES9_SP3_BRANCH-20060724104531)
RIP: 0010:[<ffffffff8016ad9e>] <ffffffff8016ad9e>{__free_pages+30}
RSP: 0000:00000100e32c3c80  EFLAGS: 00010256
RAX: 0000000000000000 RBX: 000001007e1e4980 RCX: 0000010080000000
RDX: 00000100815b6068 RSI: 0000000000000000 RDI: 00000100815b6068
RBP: 00000100e32c3d58 R08: 00000100e32c2000 R09: 0000000000000013
R10: 00000000000493e0 R11: 0000000000002710 R12: 0000000000000001
R13: 000001007e1e4980 R14: 000001007ec47590 R15: 00000000000493e0
FS:  0000002a96202320(0000) GS:ffffffff8057dc80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000002a95781302 CR3: 000000007ff82000 CR4: 00000000000006e0
Process python2.3 (pid: 7091, threadinfo 00000100e32c2000, task 000001007ec47590)
Stack: ffffffff8013bd3f 0000000000000504 ffffffff801395a0 000001007e7cedb0
       0000010077509c80 0000000000000256 0000000080004380 00000100816f5580
       000000010000007d 00000100816f6660
Call Trace:<ffffffff8013bd3f>{__mmdrop+63} <ffffffff801395a0>{thread_return+108}
       <ffffffff80147376>{schedule_timeout+246} <ffffffff801467b0>{process_timeout+0}
       <ffffffff801a3f61>{do_select+1105} <ffffffff802e4dc6>{sys_sendto+246}
       <ffffffff801a35a0>{__pollwait+0} <ffffffff801a4366>{sys_select+902}
       <ffffffff801106b4>{system_call+124}

Code: 0f 0b f4 8b 38 80 ff ff ff ff 55 03 66 66 90 66 66 90 f0 83
RIP <ffffffff8016ad9e>{__free_pages+30} RSP <00000100e32c3c80>
 <1>Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP:
<ffffffff80137703>{find_busiest_group+659}
PML4 e3371067 PGD e3374067 PMD 0
Oops: 0000 [5] SMP
CPU 1
Pid: 7091, comm: python2.3 Tainted: PF  U B (2.6.5-7.276-smp SLES9_SP3_BRANCH-20060724104531)
RIP: 0010:[<ffffffff80137703>] <ffffffff80137703>{find_busiest_group+659}
RSP: 0000:00000100e7e07df0  EFLAGS: 00010006
RAX: 00000100e7e07eb8 RBX: 0000000000000000 RCX: 0000000000000080
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000000000000040
RBP: 00000100e7e07e90 R08: 0000000000000040 R09: ffffffff805c3200
R10: 0000000000000064 R11: 00000000000002ff R12: 00000000000002ff
R13: ffffffff804aa7a0 R14: 0000000000000001 R15: 0000000000000000
FS:  0000002a96202320(0000) GS:ffffffff8057dc80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 000000007ff82000 CR4: 00000000000006e0
Process python2.3 (pid: 7091, threadinfo 00000100e32c2000, task 000001007ec47590)
Stack: 00000100e7e07e50 0000000000000000 0000000000000001 0000000000000080
       0000000000000000 0000000000017f80 0000000000000000 ffffffff804aa780
       0000000102ebfb80 00000100e7e07eb8
Call Trace:<IRQ> <ffffffff8013a63c>{rebalance_tick+460} <ffffffff8011d674>{smp_apic_timer_interrupt+52}
       <ffffffff80110e27>{apic_timer_interrupt+99} <ffffffff8011c93f>{smp_stop_cpu+31}
       <ffffffff8011c949>{smp_really_stop_cpu+9} <ffffffff8011c8b0>{smp_call_function_interrupt+64}
       <ffffffff80110dbf>{call_function_interrupt+99}  <EOI> <ffffffff80111bf3>{oops_end+35}
       <ffffffff80111be5>{oops_end+21} <ffffffff801124fb>{die+59}
       <ffffffff80112d21>{do_invalid_op+145} <ffffffff8016ad9e>{__free_pages+30}
       <ffffffff8031d197>{tcp_transmit_skb+1479} <ffffffff80110f79>{error_exit+0}
       <ffffffff8016ad9e>{__free_pages+30} <ffffffff8013bd3f>{__mmdrop+63}
       <ffffffff801395a0>{thread_return+108} <ffffffff80147376>{schedule_timeout+246}
       <ffffffff801467b0>{process_timeout+0} <ffffffff801a3f61>{do_select+1105}
       <ffffffff802e4dc6>{sys_sendto+246} <ffffffff801a35a0>{__pollwait+0}
       <ffffffff801a4366>{sys_select+902} <ffffffff801106b4>{system_call+124}


Code: 48 8b 43 18 48 39 c8 48 0f 47 c1 48 0f af d0 48 c1 ea 07 48
RIP <ffffffff80137703>{find_busiest_group+659} RSP <00000100e7e07df0>
CR2: 0000000000000018
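
[Editorial sketch] For the oopses above that survived intact, the call
trace entries have the form <address>{symbol+offset}, with module symbols
written as {:module:symbol+offset}. A small parser makes it easier to see
which modules appear in a trace. This is a minimal sketch under that format
assumption; the names FRAME, parse_frames and modules_in_trace are
illustrative, and the interleaved (garbled) portion of the log is not
something this approach can recover.

```python
import re

# <16 hex digits>{optional ":module:" prefix, symbol, "+", decimal offset}
FRAME = re.compile(r"<([0-9a-f]{16})>\{(?::(\w+):)?(\w+)\+(\d+)\}")


def parse_frames(trace):
    """Return a list of (module, function, offset) tuples in trace order.

    Frames without a module prefix are labelled "kernel".
    """
    frames = []
    for _addr, module, func, offset in FRAME.findall(trace):
        frames.append((module or "kernel", func, int(offset)))
    return frames


def modules_in_trace(frames):
    """Return module names in first-seen order, without duplicates."""
    seen = []
    for module, _func, _offset in frames:
        if module not in seen:
            seen.append(module)
    return seen


# Lines copied from the first oops in this message.
SAMPLE = """\
       <ffffffffa017f622>{:ib_mthca:mthca_cmd_box+66} <ffffffffa017fd59>{:ib_mthca:mthca_HW2SW_MPT+57}
       <ffffffffa0189423>{:ib_mthca:mthca_free_mr+67} <ffffffffa019014f>{:ib_mthca:mthca_dereg_mr+15}
       <ffffffffa0149e3a>{:ib_core:ib_dereg_mr+26} <ffffffffa01e5543>{:ib_uverbs:ib_uverbs_close+611}
       <ffffffff8018e332>{__fput+98} <ffffffff80189ffe>{filp_close+126}
"""

frames = parse_frames(SAMPLE)
for module, func, offset in frames:
    print("{:10s} {}+{}".format(module, func, offset))
print("modules:", modules_in_trace(frames))
```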

From sean.hefty at intel.com  Sun Sep  3 20:30:20 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Sun, 3 Sep 2006 20:30:20 -0700
Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding
 device during destruction
In-Reply-To: <44FA7CD0.5000506@voltaire.com>
Message-ID: <000001c6cfd2$737e0bc0$51d8180a@amr.corp.intel.com>

>Does this patch protect against the case where an rdma_cm_id is being
>destroyed while address resolution related to the **same** id attaches
>it to a device?
>
>If yes, why would someone destroy this id? Is it legal to do so?

Yes - this protects against the user destroying the id while that same id is
being attached to a device.  This is legal.  The user may want to cancel address
resolution by destroying the rdma_cm_id.

The issue is that address resolution is asynchronous, with device attachment
occurring in the address resolution callback handler.  The user isn't aware that
the callback handler has been invoked, and may attempt to destroy the rdma_cm_id
when this occurs.

- Sean
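
[Editorial sketch] The race described above (an asynchronous resolution
callback attaching the id to a device while the user destroys that same id)
can be modelled in a few lines. The following is NOT the patch under
discussion, which is not shown in this message; it is a user-space Python
model with made-up names (CmId, Device, attach_on_resolve, destroy) that
illustrates one general way to close such a window: the callback and the
destroy path take the same per-id lock, and the callback refuses to attach
once destruction has started.

```python
import threading


class Device:
    """Hypothetical device holding a list of attached ids."""

    def __init__(self):
        self.lock = threading.Lock()
        self.id_list = []


class CmId:
    """Hypothetical stand-in for an rdma_cm_id; not the kernel structure."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.destroying = False
        self.device = None


def attach_on_resolve(cm_id, device):
    """Resolution callback: attach only if destroy has not begun."""
    with cm_id.lock:
        if cm_id.destroying:
            return False  # user already destroyed the id; do not attach
        with device.lock:
            device.id_list.append(cm_id)
        cm_id.device = device
        return True


def destroy(cm_id):
    """Destroy path: mark the id, then detach it if it was attached."""
    with cm_id.lock:
        cm_id.destroying = True
        device = cm_id.device
        if device is not None:
            with device.lock:
                device.id_list.remove(cm_id)
            cm_id.device = None


# Run the callback and the destroy concurrently for many ids.
dev = Device()
ids = [CmId("id%d" % i) for i in range(50)]
threads = []
for c in ids:
    threads.append(threading.Thread(target=attach_on_resolve, args=(c, dev)))
    threads.append(threading.Thread(target=destroy, args=(c,)))
for t in threads:
    t.start()
for t in threads:
    t.join()
print("ids left on device list:", len(dev.id_list))
```

Whichever thread wins, no destroyed id is left on the device list: either
the callback sees the destroying flag and backs off, or destroy finds the
id attached and removes it.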


From ogerlitz at voltaire.com  Sun Sep  3 23:11:00 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 04 Sep 2006 09:11:00 +0300
Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding
 device during destruction
In-Reply-To: <000001c6cfd2$737e0bc0$51d8180a@amr.corp.intel.com>
References: <000001c6cfd2$737e0bc0$51d8180a@amr.corp.intel.com>
Message-ID: <44FBC374.8040709@voltaire.com>

Sean Hefty wrote:
>> Does this patch protect against the case where an rdma_cm_id is being
>> destroyed while address resolution related to the **same** id attaches
>> it to a device?
>>
>> If yes, why would someone destroy this id? Is it legal to do so?
> 
> Yes - this protects against the user destroying the id while that same id is
> being attached to a device.  This is legal.  The user may want to cancel address
> resolution by destroying the rdma_cm_id.

OK, thanks for clarifying that. Is cancellation allowed only for address 
resolution, or also for route resolution and/or CM calls? Also, how about 
documenting this?

Or.

diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
index 402c63d..b9e22c8 100644
--- a/include/rdma/rdma_cm.h
+++ b/include/rdma/rdma_cm.h
@@ -117,6 +117,14 @@ struct rdma_cm_id {
  struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
  				  void *context, enum rdma_port_space ps);

+/**
+ * rdma_destroy_id - Destroys an RDMA identifier.
+ *
+ * @id: RDMA identifier.
+ *
+ * Note: calling this function has the effect of canceling in-flight
+ * asynchronous operations associated with the id.
+ */
  void rdma_destroy_id(struct rdma_cm_id *id);

  /**


From ogerlitz at voltaire.com  Mon Sep  4 02:00:09 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 04 Sep 2006 12:00:09 +0300
Subject: [openib-general] [PATCH 0/4] Dispatch communication related
 events to the IB CM
In-Reply-To: <aday7tf19t0.fsf@cisco.com>
References: <000001c6ad0d$e483a290$e598070a@amr.corp.intel.com>
	<44EC8F10.5050806@ichips.intel.com> <aday7tf19t0.fsf@cisco.com>
Message-ID: <44FBEB19.3010606@voltaire.com>

Roland Dreier wrote:
>     Sean> This patch set appears to be the preferred approach.  Any
>     Sean> objection to committing this?
> 
> It's unfortunate that we have to add a special-case event hook for the
> CM, but I guess the iWARP CM changes are so ugly anyway it doesn't
> matter much.  So I think committing this is OK.

Hi Sean,

My thinking is that this needs to be committed somewhere, or at 
least please resubmit to the list the version you intend to merge.

We will be able to test it with an iSER target running on the gen2 stack 
and provide further feedback. I guess it can also be tested over 
SDP under a high-rate connection open/close test, or with whatever 
CM/CMA test supports reconnecting; the data-before-RTU race happens 
a lot there, so the code would be well exercised.

You will then be able to push it for 2.6.19.

What do you think?

Or.


From tziporet at mellanox.co.il  Mon Sep  4 02:29:55 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 4 Sep 2006 12:29:55 +0300
Subject: [openib-general] [openfabrics-ewg]  OFED 1.1-rc3 is ready
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7837@mtlexch01.mtl.com>

Thanks,
We found the problem: it is caused by the huge pages support we added in
RC3.
It has been fixed, and the fix will be in RC4.

Tziporet

-----Original Message-----
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Woodruff,
Robert J
Sent: Saturday, September 02, 2006 12:15 AM
To: Woodruff, Robert J; Tziporet Koren; EWG
Cc: OPENIB
Subject: Re: [openfabrics-ewg] [openib-general] OFED 1.1-rc3 is ready

Woody wrote,
>ib_mthca 0000:0d:00.0: HCA FW version 3.3.2 is old (3.4.0 is current).
>ib_mthca 0000:0d:00.0: If you have problems, try updating your HCA FW.
>ib_uverbs: Unknown symbol hpage_shift      <----------------------- I think this is the problem

Just a follow up note on this one.
Looks like this is a new bug introduced at RC3, it did not fail at RC2.



From ogerlitz at voltaire.com  Mon Sep  4 03:54:26 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 4 Sep 2006 13:54:26 +0300 (IDT)
Subject: [openib-general] IPoIB fails attaching QP to mcast group
Message-ID: <Pine.LNX.4.64.0609041347130.13095@zuben>

Michael, Roland,

Any idea what can cause the below? What actually is the error I am running into here?

ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
ib_mthca 0000:02:00.0: modify QP 3->3 returned status 10.
ib0: failed to modify QP, ret = -22
ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff

This is a kernel.org system (netdev git) with a PCI-X HCA; some details are below.

Or.

# uname -a

Linux dill 2.6.18-rc4-gdac2b3d3-dirty #2 SMP Thu Aug 24 13:21:58 IDT 2006 x86_64 x86_64 x86_64 GNU/Linux

# cat /sys/class/infiniband/mthca0/*

3.4.0
MT23108
a1
0008:f104:0396:51dc
1: CA
0008:f104:0396:51df

# cat /sys/module/ib_mthca/parameters/*
0
0
0
0
0


From mst at mellanox.co.il  Mon Sep  4 04:32:03 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 4 Sep 2006 14:32:03 +0300
Subject: [openib-general] IPoIB fails attaching QP to mcast group
In-Reply-To: <Pine.LNX.4.64.0609041347130.13095@zuben>
References: <Pine.LNX.4.64.0609041347130.13095@zuben>
Message-ID: <20060904113203.GM3440@mellanox.co.il>

Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: [openib-general] IPoIB fails attaching QP to mcast group
> 
> Michael, Roland,
> 
> Any idea what can cause the below, what actually is the error i am running into here?
> 
> ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
> ib_mthca 0000:02:00.0: modify QP 3->3 returned status 10.
> ib0: failed to modify QP, ret = -22
> ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
> 
> This is kernel.org system (netdev git) with PCI X HCA, some details below.
> 
> Or.
> 
> # uname -a
> 
> Linux dill 2.6.18-rc4-gdac2b3d3-dirty #2 SMP Thu Aug 24 13:21:58 IDT 2006 x86_64 x86_64 x86_64 GNU/Linux
> 
> # cat /sys/class/infiniband/mthca0/*
> 
> 3.4.0
> MT23108
> a1
> 0008:f104:0396:51dc
> 1: CA
> 0008:f104:0396:51df
> 
> # cat /sys/module/ib_mthca/parameters/*
> 0
> 0
> 0
> 0
> 0

Looks like the QP is in error state, so modify QP fails.
Is this at all reproducible?
If so could you try with latest firmware please?

-- 
MST
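
[Editorial sketch] The numbers in the quoted log lines can be decoded to
make the diagnosis above easier to follow. The sketch below assumes that
the two numbers in "modify QP 3->3" index the kernel's enum ib_qp_state
(RESET, INIT, RTR, RTS, SQD, SQE, ERR, in that order) and that "ret = -22"
is a negated errno value. The firmware status code is passed through
undecoded, since its meaning is not stated in this thread. The function
name decode_qp_log is illustrative.

```python
import errno
import re

# Order of enum ib_qp_state in the kernel's verbs header (assumed mapping
# for the numbers printed by the driver).
IB_QP_STATES = ["RESET", "INIT", "RTR", "RTS", "SQD", "SQE", "ERR"]


def decode_qp_log(log):
    """Extract and name the QP transition and errno from IPoIB/mthca logs."""
    out = {}
    m = re.search(r"modify QP (\d+)->(\d+) returned status (\w+)", log)
    if m:
        out["from_state"] = IB_QP_STATES[int(m.group(1))]
        out["to_state"] = IB_QP_STATES[int(m.group(2))]
        out["fw_status_raw"] = m.group(3)  # left undecoded on purpose
    m = re.search(r"ret = (-?\d+)", log)
    if m:
        out["errno_name"] = errno.errorcode.get(abs(int(m.group(1))), "unknown")
    return out


# Lines copied from the report quoted in this message.
LOG = """\
ib_mthca 0000:02:00.0: modify QP 3->3 returned status 10.
ib0: failed to modify QP, ret = -22
"""

decoded = decode_qp_log(LOG)
for key in sorted(decoded):
    print(key, "=", decoded[key])
```

Under that mapping the driver believes it is doing an RTS->RTS modify,
which matches the suggestion above that software and hardware disagree
about the QP state.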


From ogerlitz at voltaire.com  Mon Sep  4 05:45:09 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 04 Sep 2006 15:45:09 +0300
Subject: [openib-general] IPoIB fails attaching QP to mcast group
In-Reply-To: <20060904113203.GM3440@mellanox.co.il>
References: <Pine.LNX.4.64.0609041347130.13095@zuben>
	<20060904113203.GM3440@mellanox.co.il>
Message-ID: <44FC1FD5.3030008@voltaire.com>

Michael S. Tsirkin wrote:
> Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:

>> ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
>> ib_mthca 0000:02:00.0: modify QP 3->3 returned status 10.
>> ib0: failed to modify QP, ret = -22
>> ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff

> Looks like the QP is in error state, so modify QP fails.
> Is this at all reproducible?
> If so could you try with latest firmware please?

I am not following you. Are you saying that the SW (IPoIB/mthca) considers 
the QP to be in RTS but the FW/HW says that this QP is actually in the 
error state?

This happened today in an endless loop on a system whose IB link I had 
been playing with; specifically, I also saw the "recv port errors" counter 
getting incremented. Since I stopped and reloaded ipoib it has not 
happened any more.

Or.


From mst at mellanox.co.il  Mon Sep  4 05:48:31 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 4 Sep 2006 15:48:31 +0300
Subject: [openib-general] IPoIB fails attaching QP to mcast group
In-Reply-To: <44FC1FD5.3030008@voltaire.com>
References: <44FC1FD5.3030008@voltaire.com>
Message-ID: <20060904124831.GA28926@mellanox.co.il>

Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: Re: [openib-general] IPoIB fails attaching QP to mcast group
> 
> Michael S. Tsirkin wrote:
> > Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> 
> >> ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
> >> ib_mthca 0000:02:00.0: modify QP 3->3 returned status 10.
> >> ib0: failed to modify QP, ret = -22
> >> ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
> 
> > Looks like the QP is in error state, so modify QP fails.
> > Is this at all reproducible?
> > If so could you try with latest firmware please?
> 
> I am not following you, do you claim that the SW (IPoIB/MTHCA) consider 
> the QP to be in RTS but the FW/HW say that this QP is actually in error 
> state?

It seems so.

> This happened today in endless loop on a system which i have played with 
> its IB link, specifically, i also saw the "recv port errors" counter was 
>   getting incremented. Once i have stopped/reloaded ipoib it does not 
> happen any more.
> 
> Or.
> 
> 

-- 
MST


From johnt1johnt2 at gmail.com  Mon Sep  4 05:56:46 2006
From: johnt1johnt2 at gmail.com (john t)
Date: Mon, 4 Sep 2006 18:26:46 +0530
Subject: [openib-general] MPI Brodcast doubt
Message-ID: <a94efc20609040556h51ad5b60i91219bc5ef39855f@mail.gmail.com>

Hi,

I have 3 nodes connected via IB as shown below:

node1 ---> switch1 ---> node2
                    |----------> node3

If node1 sends a broadcast message to node2 and node3, I want to know if the
message is delivered to the switch twice (first time for node2 and second
time for node3) or just once (where the switch will know by looking at some
headers or so that it's a broadcast message and will send it on all the
outgoing ports)?

Regards,
John T.

From tziporet at dev.mellanox.co.il  Mon Sep  4 07:35:00 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Mon, 04 Sep 2006 17:35:00 +0300
Subject: [openib-general] problems to regiser memory as a reglar user on
 SLES9 SP3
In-Reply-To: <20060903175345.GA6931@daltons.rzg.mpg.de>
References: <44F453FC.4070300@dev.mellanox.co.il>
	<20060903175345.GA6931@daltons.rzg.mpg.de>
Message-ID: <44FC3994.8020509@dev.mellanox.co.il>

Christian Guggenberger wrote:
> Hi,
> On Tue, Aug 29, 2006 at 05:49:32PM +0300, Tziporet Koren wrote:
>   
>> Hi All,
>> In testing today we found that on SLES9 SP3 memory locking as a regular 
>> user fails.
>>     
> has any progress been made regarding this ?
>
> I'd like to ask if the SLES9 port is really mature yet, because I tried
> to go a step ahead and tried some trivial MPI code as root, but failed
> and got the involved node locked down hard.
> Testing was done on a single x86_64 SMP node (2 CPUs), with a Mellanox
> PCI-X HCA (23108, FW-3.5.0). Software Environment SLES9 SP3-latest,
> OFED-1.1-rc3 and mvapich2-0.9.5.
> Attached is a simple MPI code that causes the hard lock. Also attached
> are some Kernel BUGs gathered via serial console - they look garbled,
> unfortunately.
> Note, everything is fine, if I use recent vanilla kernels on that SLES9
> machine.
>
> cheers.
>  - Christian
>   
Hi,
We test SLES9 here, but with the mvapich1 library, version 0.9.7 from OFED.
We tried to run the test you attached on mvapich1 here but have not seen 
this failure.
Can you try to reproduce it with the mvapich1 version?
If not, please send us detailed instructions on how to reproduce it with 
mvapich2 (where to get the sources, how to compile, etc.).
BTW, when searching the SLES9 sources for "Kernel BUG at page_alloc:853", 
we couldn't find it.
Which kernel version are you using? We use 2.6.5-7.244-smp here.

Tziporet & Eli


From christian.guggenberger at rzg.mpg.de  Mon Sep  4 07:44:18 2006
From: christian.guggenberger at rzg.mpg.de (Christian Guggenberger)
Date: Mon, 4 Sep 2006 16:44:18 +0200
Subject: [openib-general] problems to regiser memory as a reglar user on
 SLES9 SP3
In-Reply-To: <44FC3994.8020509@dev.mellanox.co.il>
References: <44F453FC.4070300@dev.mellanox.co.il>
	<20060903175345.GA6931@daltons.rzg.mpg.de>
	<44FC3994.8020509@dev.mellanox.co.il>
Message-ID: <20060904144417.GD7576@daltons.rzg.mpg.de>

Hi,

> >Attached is a simple MPI code that causes the hard lock. Also attached
> >are some Kernel BUGs gathered via serial console - they look garbled,
> >unfortunately.
> >Note, everything is fine, if I use recent vanilla kernels on that SLES9
> >machine.
> >
> >cheers.
> > - Christian
> >  
> Hi,
> We test here SLES9 but with mvapich1 library 0.9.7 version from OFED.
> We tried to run here the test you attached on mvapich1 but have not seen 
> this failure.
> Can you try to reproduce with mvapich1 version?

is it also okay if I tried with plain mvapich1 from OSU ?

> If not please send us detailed instructions how to reproduce with 
> mvapich2 (where to take sources, compile, etc.)
> BTW when searching the SLES9 sources for the: Kernel BUG at page_alloc:853
> 
> We couldn't find it.
> Which kernel version are you using? We use here 2.6.5-7.244-smp.
> 
this is with 2.6.5-7.276-smp

cheers.
 - Christian

-- 
-----------------------------------------------------------
Phone	+49-89-3299-1306
PGP 	http://www.rzg.mpg.de/~ccg/cg-public_key.asc
S/MIME 	http://ra.rzg.mpg.de
-----------------------------------------------------------

From tziporet at dev.mellanox.co.il  Mon Sep  4 08:08:27 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Mon, 04 Sep 2006 18:08:27 +0300
Subject: [openib-general] problems to regiser memory as a reglar user on
 SLES9 SP3
In-Reply-To: <20060904144417.GD7576@daltons.rzg.mpg.de>
References: <44F453FC.4070300@dev.mellanox.co.il>
	<20060903175345.GA6931@daltons.rzg.mpg.de>
	<44FC3994.8020509@dev.mellanox.co.il>
	<20060904144417.GD7576@daltons.rzg.mpg.de>
Message-ID: <44FC416B.6020409@dev.mellanox.co.il>

Christian Guggenberger wrote:
>> Hi,
>> We test here SLES9 but with mvapich1 library 0.9.7 version from OFED.
>> We tried to run here the test you attached on mvapich1 but have not seen 
>> this failure.
>> Can you try to reproduce with mvapich1 version?
>>     
>
> is it also okay if I tried with plain mvapich1 from OSU ?
I guess yes, although we use the one that comes with OFED.
>>     
> this is with 2.6.5-7.276-smp
>
>
>   
I'll see if we can update our kernel version.

Tziporet


From christian.guggenberger at rzg.mpg.de  Mon Sep  4 08:24:49 2006
From: christian.guggenberger at rzg.mpg.de (Christian Guggenberger)
Date: Mon, 4 Sep 2006 17:24:49 +0200
Subject: [openib-general] problems to regiser memory as a reglar user on
 SLES9 SP3
In-Reply-To: <44FC416B.6020409@dev.mellanox.co.il>
References: <44F453FC.4070300@dev.mellanox.co.il>
	<20060903175345.GA6931@daltons.rzg.mpg.de>
	<44FC3994.8020509@dev.mellanox.co.il>
	<20060904144417.GD7576@daltons.rzg.mpg.de>
	<44FC416B.6020409@dev.mellanox.co.il>
Message-ID: <20060904152449.GF7576@daltons.rzg.mpg.de>

> >>We test here SLES9 but with mvapich1 library 0.9.7 version from OFED.
> >>We tried to run here the test you attached on mvapich1 but have not seen 
> >>this failure.
> >>Can you try to reproduce with mvapich1 version?
> >>    
> >
> >is it also okay if I tried with plain mvapich1 from OSU ?
> I guess yes, although we use the one that comes with OFED.

Hmm. With plain mvapich-0.9.7 from OSU, the BUGs/Oopses are not
reproducible. With mvapich2-0.9.5 they happen every time...

cheers.
 - Christian

From bunk at stusta.de  Mon Sep  4 10:03:50 2006
From: bunk at stusta.de (Adrian Bunk)
Date: Mon, 4 Sep 2006 19:03:50 +0200
Subject: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/:
	possible cleanups
In-Reply-To: <20060901015818.42767813.akpm@osdl.org>
References: <20060901015818.42767813.akpm@osdl.org>
Message-ID: <20060904170350.GR4416@stusta.de>

On Fri, Sep 01, 2006 at 01:58:18AM -0700, Andrew Morton wrote:
>...
> Changes since 2.6.18-rc4-mm3:
>...
>  git-infiniband.patch
>...
>  git trees.
>...

This patch contains the following possible cleanups:
- make the following needlessly global functions static:
  - c2_ae.c: to_qp_state_str()
  - c2_cq.c: c2_cq_get()
  - c2_cq.c: c2_cq_put()
  - c2_qp.c: to_ib_state()
  - c2_qp.c: to_ib_state_str()
  - c2_rnic.c: c2_rnic_query()
- #if 0 the following unused global function:
  - c2_mq.c: c2_mq_count()

Signed-off-by: Adrian Bunk <bunk at stusta.de>

---

 drivers/infiniband/hw/amso1100/c2.h      |    1 -
 drivers/infiniband/hw/amso1100/c2_ae.c   |    2 +-
 drivers/infiniband/hw/amso1100/c2_cq.c   |    4 ++--
 drivers/infiniband/hw/amso1100/c2_mq.c   |    3 ++-
 drivers/infiniband/hw/amso1100/c2_mq.h   |    1 -
 drivers/infiniband/hw/amso1100/c2_qp.c   |    4 ++--
 drivers/infiniband/hw/amso1100/c2_rnic.c |    3 +--
 7 files changed, 8 insertions(+), 10 deletions(-)

--- linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_ae.c.old	2006-09-01 21:02:16.000000000 +0200
+++ linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_ae.c	2006-09-01 21:02:23.000000000 +0200
@@ -125,7 +125,7 @@
 	return event_str[event];
 }
 
-const char *to_qp_state_str(int state)
+static const char *to_qp_state_str(int state)
 {
 	switch (state) {
 	case C2_QP_STATE_IDLE:
--- linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_cq.c.old	2006-09-01 21:02:45.000000000 +0200
+++ linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_cq.c	2006-09-01 21:03:06.000000000 +0200
@@ -41,7 +41,7 @@
 
 #define C2_CQ_MSG_SIZE ((sizeof(struct c2wr_ce) + 32-1) & ~(32-1))
 
-struct c2_cq *c2_cq_get(struct c2_dev *c2dev, int cqn)
+static struct c2_cq *c2_cq_get(struct c2_dev *c2dev, int cqn)
 {
 	struct c2_cq *cq;
 	unsigned long flags;
@@ -57,7 +57,7 @@
 	return cq;
 }
 
-void c2_cq_put(struct c2_cq *cq)
+static void c2_cq_put(struct c2_cq *cq)
 {
 	if (atomic_dec_and_test(&cq->refcount))
 		wake_up(&cq->wait);
--- linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_mq.h.old	2006-09-01 21:03:23.000000000 +0200
+++ linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_mq.h	2006-09-01 21:03:30.000000000 +0200
@@ -98,7 +98,6 @@
 extern void c2_mq_produce(struct c2_mq *q);
 extern void *c2_mq_consume(struct c2_mq *q);
 extern void c2_mq_free(struct c2_mq *q);
-extern u32 c2_mq_count(struct c2_mq *q);
 extern void c2_mq_req_init(struct c2_mq *q, u32 index, u32 q_size, u32 msg_size,
 		       u8 __iomem *pool_start, u16 __iomem *peer, u32 type);
 extern void c2_mq_rep_init(struct c2_mq *q, u32 index, u32 q_size, u32 msg_size,
--- linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_mq.c.old	2006-09-01 21:03:37.000000000 +0200
+++ linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_mq.c	2006-09-01 21:03:49.000000000 +0200
@@ -121,7 +121,7 @@
 	}
 }
 
-
+#if 0
 u32 c2_mq_count(struct c2_mq *q)
 {
 	s32 count;
@@ -138,6 +138,7 @@
 
 	return (u32) count;
 }
+#endif  /*  0  */
 
 void c2_mq_req_init(struct c2_mq *q, u32 index, u32 q_size, u32 msg_size,
 		    u8 __iomem *pool_start, u16 __iomem *peer, u32 type)
--- linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_qp.c.old	2006-09-01 21:04:06.000000000 +0200
+++ linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_qp.c	2006-09-01 21:04:22.000000000 +0200
@@ -75,7 +75,7 @@
 	}
 }
 
-int to_ib_state(enum c2_qp_state c2_state)
+static int to_ib_state(enum c2_qp_state c2_state)
 {
 	switch (c2_state) {
 	case C2_QP_STATE_IDLE:
@@ -95,7 +95,7 @@
 	}
 }
 
-const char *to_ib_state_str(int ib_state)
+static const char *to_ib_state_str(int ib_state)
 {
 	static const char *state_str[] = {
 		"IB_QPS_RESET",
--- linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2.h.old	2006-09-01 21:04:49.000000000 +0200
+++ linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2.h	2006-09-01 21:04:54.000000000 +0200
@@ -485,7 +485,6 @@
 extern int c2_rnic_init(struct c2_dev *c2dev);
 extern void c2_rnic_term(struct c2_dev *c2dev);
 extern void c2_rnic_interrupt(struct c2_dev *c2dev);
-extern int c2_rnic_query(struct c2_dev *c2dev, struct ib_device_attr *props);
 extern int c2_del_addr(struct c2_dev *c2dev, u32 inaddr, u32 inmask);
 extern int c2_add_addr(struct c2_dev *c2dev, u32 inaddr, u32 inmask);
 
--- linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_rnic.c.old	2006-09-01 21:05:03.000000000 +0200
+++ linux-2.6.18-rc5-mm1/drivers/infiniband/hw/amso1100/c2_rnic.c	2006-09-01 21:05:17.000000000 +0200
@@ -118,8 +118,7 @@
 /*
  * Query the adapter
  */
-int c2_rnic_query(struct c2_dev *c2dev,
-		  struct ib_device_attr *props)
+static int c2_rnic_query(struct c2_dev *c2dev, struct ib_device_attr *props)
 {
 	struct c2_vq_req *vq_req;
 	struct c2wr_rnic_query_req wr;

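For readers unfamiliar with the cleanup above: marking a function "static"
gives it internal linkage, so it is no longer visible outside its own .c
file. The following is only a minimal sketch of that idea; the names
demo_state_str() and demo_describe_state() and the state values are
hypothetical and are not taken from the amso1100 driver.

```c
/* File-local helper: "static" gives it internal linkage, so other
 * translation units can neither call it nor clash with its name.
 * (Hypothetical example, not amso1100 code.) */
static const char *demo_state_str(int state)
{
	switch (state) {
	case 0:
		return "IDLE";
	case 1:
		return "CONNECTING";
	default:
		return "UNKNOWN";
	}
}

/* Only this function stays globally visible. */
const char *demo_describe_state(int state)
{
	return demo_state_str(state);
}
```

A caller in another file can use demo_describe_state(), while
demo_state_str() stays private to this file, which is the effect the patch
aims for with the listed c2_* functions.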

From sashak at voltaire.com  Mon Sep  4 10:20:06 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Mon, 04 Sep 2006 20:20:06 +0300
Subject: [openib-general] [PATCH] opensm: osm_log_init_v2() - new osm_log
	initializer
Message-ID: <20060904172006.10400.62708.stgit@sashak.voltaire.com>


This adds a new osm_log initializer, osm_log_init_v2(). osm_log_init() is
now a wrapper around it, in order to preserve the existing API.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---

 diags/src/saquery.c          |    4 ++--
 osm/complib/cl_event_wheel.c |    2 +-
 osm/include/opensm/osm_log.h |   29 +++++++++++++++++++++++++----
 osm/opensm/libopensm.map     |    1 +
 osm/opensm/osm_db_files.c    |    2 +-
 osm/opensm/osm_log.c         |   13 ++++++++++++-
 osm/opensm/osm_opensm.c      |    6 +++---
 osm/osmtest/osmtest.c        |    4 ++--
 8 files changed, 47 insertions(+), 14 deletions(-)

diff --git a/diags/src/saquery.c b/diags/src/saquery.c
index 0bb46be..5e4b5f1 100644
--- a/diags/src/saquery.c
+++ b/diags/src/saquery.c
@@ -442,8 +442,8 @@ get_bind_handle(void)
 	complib_init();
 
 	osm_log_construct(&log_osm);
-	if ((status = osm_log_init( &log_osm, TRUE,
-				    0x0001, NULL, 0, TRUE )) != IB_SUCCESS) {
+	if ((status = osm_log_init_v2(&log_osm, TRUE, 0x0001, NULL,
+				      0, TRUE)) != IB_SUCCESS) {
 		fprintf(stderr, "Failed to init osm_log: %s\n",
 			ib_get_err_str(status));
 		exit (-1);
diff --git a/osm/complib/cl_event_wheel.c b/osm/complib/cl_event_wheel.c
index a215f40..e1ab141 100644
--- a/osm/complib/cl_event_wheel.c
+++ b/osm/complib/cl_event_wheel.c
@@ -610,7 +610,7 @@ main ()
   cl_event_wheel_construct( &event_wheel );
 
   /* init */
-  osm_log_init( &log, TRUE, 0xff, NULL, 0, FALSE);
+  osm_log_init_v2( &log, TRUE, 0xff, NULL, 0, FALSE);
   cl_event_wheel_init( &event_wheel, &log );
 
   /* Start Playing */
diff --git a/osm/include/opensm/osm_log.h b/osm/include/opensm/osm_log.h
index 5bfaef5..6f536f3 100644
--- a/osm/include/opensm/osm_log.h
+++ b/osm/include/opensm/osm_log.h
@@ -203,18 +203,18 @@ osm_log_destroy(
 *	osm_log_init
 *********/
 
-/****f* OpenSM: Log/osm_log_init
+/****f* OpenSM: Log/osm_log_init_v2
 * NAME
-*	osm_log_init
+*	osm_log_init_v2
 *
 * DESCRIPTION
-*	The osm_log_init function initializes a
+*	The osm_log_init_v2 function initializes a
 *	Log object for use.
 *
 * SYNOPSIS
 */
 ib_api_status_t
-osm_log_init(
+osm_log_init_v2(
   IN osm_log_t* const p_log,
   IN const boolean_t flush,
   IN const uint8_t log_flags,
@@ -249,6 +249,27 @@ osm_log_init(
 *	osm_log_destroy
 *********/
 
+/****f* OpenSM: Log/osm_log_init
+* NAME
+*	osm_log_init
+*
+* DESCRIPTION
+*	The osm_log_init function initializes a
+*	Log object for use. It is a wrapper for osm_log_init_v2().
+*
+* SYNOPSIS
+*/
+ib_api_status_t
+osm_log_init(
+  IN osm_log_t* const p_log,
+  IN const boolean_t flush,
+  IN const uint8_t log_flags,
+  IN const char *log_file,
+  IN const boolean_t accum_log_file );
+/*
+* Same as osm_log_init_v2() above, but without the max_size parameter
+*/
+
 /****f* OpenSM: Log/osm_log_get_level
 * NAME
 *	osm_log_get_level
diff --git a/osm/opensm/libopensm.map b/osm/opensm/libopensm.map
index c60e3d5..3ac0dc4 100644
--- a/osm/opensm/libopensm.map
+++ b/osm/opensm/libopensm.map
@@ -3,6 +3,7 @@ OPENSM_1.2 {
 		osm_log;
 		osm_is_debug;
 		osm_log_init;
+		osm_log_init_v2;
 		osm_mad_pool_construct;
 		osm_mad_pool_destroy;
 		osm_mad_pool_init;
diff --git a/osm/opensm/osm_db_files.c b/osm/opensm/osm_db_files.c
index 6ae968e..d2f39ac 100644
--- a/osm/opensm/osm_db_files.c
+++ b/osm/opensm/osm_db_files.c
@@ -712,7 +712,7 @@ main(int argc, char **argv)
   cl_list_construct( &keys );
   cl_list_init( &keys, 10 );
 
-  osm_log_init( &log, TRUE, 0xff, "/tmp/test_osm_db.log", FALSE);
+  osm_log_init_v2( &log, TRUE, 0xff, "/tmp/test_osm_db.log", 0, FALSE);
 
   osm_db_construct(&db);
   if (osm_db_init(&db, &log))
diff --git a/osm/opensm/osm_log.c b/osm/opensm/osm_log.c
index a5dac10..45acebc 100644
--- a/osm/opensm/osm_log.c
+++ b/osm/opensm/osm_log.c
@@ -225,7 +225,7 @@ #endif /* defined( _DEBUG_ ) */
 }
 
 ib_api_status_t
-osm_log_init(
+osm_log_init_v2(
   IN osm_log_t* const p_log,
   IN const boolean_t flush,
   IN const uint8_t log_flags,
@@ -279,3 +279,14 @@ osm_log_init(
   else
     return IB_ERROR;
 }
+
+ib_api_status_t
+osm_log_init(
+  IN osm_log_t* const p_log,
+  IN const boolean_t flush,
+  IN const uint8_t log_flags,
+  IN const char *log_file,
+  IN const boolean_t accum_log_file )
+{
+  return osm_log_init_v2(p_log, flush, log_flags, log_file, 0, accum_log_file);
+}
diff --git a/osm/opensm/osm_opensm.c b/osm/opensm/osm_opensm.c
index 0b39d13..19d0412 100644
--- a/osm/opensm/osm_opensm.c
+++ b/osm/opensm/osm_opensm.c
@@ -180,9 +180,9 @@ osm_opensm_init(
    /* Can't use log macros here, since we're initializing the log. */
    osm_opensm_construct( p_osm );
 
-   status = osm_log_init( &p_osm->log, p_opt->force_log_flush,
-                          p_opt->log_flags, p_opt->log_file,
-                          p_opt->log_max_size, p_opt->accum_log_file );
+   status = osm_log_init_v2( &p_osm->log, p_opt->force_log_flush,
+                             p_opt->log_flags, p_opt->log_file,
+                             p_opt->log_max_size, p_opt->accum_log_file );
    if( status != IB_SUCCESS )
       return ( status );
 
diff --git a/osm/osmtest/osmtest.c b/osm/osmtest/osmtest.c
index 4f41e38..7b719a7 100644
--- a/osm/osmtest/osmtest.c
+++ b/osm/osmtest/osmtest.c
@@ -520,8 +520,8 @@ osmtest_init( IN osmtest_t * const p_osm
   /* Can't use log macros here, since we're initializing the log. */
   osmtest_construct( p_osmt );
 
-  status = osm_log_init( &p_osmt->log, p_opt->force_log_flush,
-                         0x0001, p_opt->log_file, 0, TRUE );
+  status = osm_log_init_v2( &p_osmt->log, p_opt->force_log_flush,
+                            0x0001, p_opt->log_file, 0, TRUE );
   if( status != IB_SUCCESS )
     return ( status );
 
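The wrapper pattern used by this patch can be sketched on its own as
follows. This is a simplified illustration under stated assumptions, not
the OpenSM code: demo_log_t, demo_log_init() and demo_log_init_v2() are
hypothetical stand-ins, and treating max_size == 0 as "no limit" is an
assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the log object; not the real osm_log_t. */
typedef struct demo_log {
	int flush;
	unsigned char level;
	const char *file;
	unsigned long max_size;	/* 0 means "no size limit" in this sketch */
	int accumulate;
} demo_log_t;

/* Extended initializer: takes the additional max_size argument. */
int demo_log_init_v2(demo_log_t *log, int flush, unsigned char level,
		     const char *file, unsigned long max_size, int accumulate)
{
	assert(log != NULL);
	log->flush = flush;
	log->level = level;
	log->file = file;
	log->max_size = max_size;
	log->accumulate = accumulate;
	return 0;
}

/* Compatibility wrapper: keeps the older five-argument signature and
 * forwards to the v2 function with a default max_size of 0. */
int demo_log_init(demo_log_t *log, int flush, unsigned char level,
		  const char *file, int accumulate)
{
	return demo_log_init_v2(log, flush, level, file, 0, accumulate);
}
```

Existing callers keep using the five-argument function unchanged, while
new callers that care about the size limit call the v2 variant directly.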

From tziporet at mellanox.co.il  Mon Sep  4 12:55:02 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 4 Sep 2006 22:55:02 +0300
Subject: [openib-general] problems to regiser memory as a reglar user on
 SLES9 SP3
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7843@mtlexch01.mtl.com>

Can you explain to me how to run mvapich2-0.9.5?

Thanks,
Tziporet

-----Original Message-----
From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Christian
Guggenberger
Sent: Monday, September 04, 2006 6:25 PM
To: Tziporet Koren
Cc: Eli Cohen; Christian Guggenberger; OPENIB
Subject: Re: [openib-general] problems to regiser memory as a reglar
user on SLES9 SP3

> >>We test here SLES9 but with mvapich1 library 0.9.7 version from OFED.
> >>We tried to run here the test you attached on mvapich1 but have not seen
> >>this failure.
> >>Can you try to reproduce with mvapich1 version?
> >>    
> >
> >is it also okay if I tried with plain mvapich1 from OSU ?
> I guess yes, although we use the one that comes with OFED.

hmm. Using plain mvapich-0.9.7 from OSU, the BUGs/Ooops are not
reproducible. Using mvapich2-0.9.5 it happens each time...

cheers.
 - Christian


From mamidala at cse.ohio-state.edu  Mon Sep  4 13:15:05 2006
From: mamidala at cse.ohio-state.edu (amith rajith mamidala)
Date: Mon, 4 Sep 2006 16:15:05 -0400 (EDT)
Subject: [openib-general] rdmacm library
In-Reply-To: <42C02E8B.1070506@ichips.intel.com>
Message-ID: <Pine.GSO.4.40.0609041610230.5466-100000@nu.cse.ohio-state.edu>

Hi Sean,

I installed the latest kernel (2.6.17.11) and the latest IB stack (rev 9240).
When I compile programs with the rdmacm library, I get the following warning
(though the programs run fine):

/usr/bin/ld: warning: libibverbs.so.1, needed by
/usr/local/lib/librdmacm.so, may conflict with libibverbs.so.2

Does rdmacm use the older version of ibverbs or do I need to install
rdmacm differently?

Thanks,
Amith


From christian.guggenberger at rzg.mpg.de  Mon Sep  4 13:45:11 2006
From: christian.guggenberger at rzg.mpg.de (Christian Guggenberger)
Date: Mon, 4 Sep 2006 22:45:11 +0200
Subject: [openib-general] problems to regiser memory as a reglar user on
 SLES9 SP3
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7843@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7843@mtlexch01.mtl.com>
Message-ID: <20060904204511.GA7855@daltons.rzg.mpg.de>

Hi Tziporet,
On Mon, Sep 04, 2006 at 10:55:02PM +0300, Tziporet Koren wrote:
> Can you explain me how to run mvapich2-0.9.5?

First, simply compile using the OSU scripts (make.mvapich2.gen2); this
should work out of the box. (The exception is PCI-X HCAs: with those you
have to omit "-DSRQ" in the build script.) Note that python-devel is
needed for the build.

Then, assuming you're doing your tests as root on a single box:

- create /etc/mpd.conf

containing the line "secretword=blabla" - just some non-meaningful
passphrase ;)
(you'll probably also need the same file as ~/.mpd.conf and
~/.mpdpasswd)

- start mpd ring
# mpdboot -n 1 -f hosts
(hosts should contain the hostname)

- check if mpdring is up and running
# mpdtrace

- start application on 2 CPUs
# mpiexec -n 2 ./a.out

- once tests are over, stop the ring
# mpdallexit

hope that helps,

cheers.
 - Christian


From panda at cse.ohio-state.edu  Mon Sep  4 14:06:28 2006
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Mon, 4 Sep 2006 17:06:28 -0400 (EDT)
Subject: [openib-general] problems to regiser memory as a reglar
In-Reply-To: <20060904204511.GA7855@daltons.rzg.mpg.de> from
	"Christian Guggenberger" at Sep 04, 2006 10:45:11 PM
Message-ID: <200609042106.k84L6S2q025644@xi.cse.ohio-state.edu>

Christian - Thanks for sending instructions for running mvapich2-0.9.5
to Tziporet.

Tziporet - Thanks for looking into this problem on SLES9 environment.

Please note that a detailed user guide for running and tuning MVAPICH2
0.9.5 is available from the following URL:

http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html

DK


> Hi Tziporet,
> On Mon, Sep 04, 2006 at 10:55:02PM +0300, Tziporet Koren wrote:
> > Can you explain me how to run mvapich2-0.9.5?
> 
> at first, simple compiling using the OSU scripts (make.mvapich2.gen2) -
> should work out of the box. (except you will use PCI-X HCAs - you'll
> have to omit "-DSRQ" in the build script then). Note, python-devel is
> needed for the build.
> 
> then, assuming you're doing your tests as root on a single box.
> 
> - create /etc/mpd.conf
> 
> containing the line "secretword=blabla" - just some non-meaningful
> passphrase ;)
> (you'll probably also need the same file as ~/.mpd.conf and
> ~/.mpdpasswd , too)
> 
> - start mpd ring
> # mpdboot -n 1 -f hosts
> (hosts should contain the hostname)
> 
> - check if mpdring is up and running
> # mpdtrace
> 
> - start application on 2 CPUs
> # mpiexec -n 2 ./a.out
> 
> - once tests are over, stop the ring
> # mpdallexit
> 
> hope that helps,
> 
> cheers.
>  - Christian


From sean.hefty at intel.com  Mon Sep  4 21:05:31 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 4 Sep 2006 21:05:31 -0700
Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding
 device during destruction
In-Reply-To: <44FBC374.8040709@voltaire.com>
Message-ID: <000001c6d0a0$87bd9050$15248686@amr.corp.intel.com>

>ok, thanks for clarifying that, is cancellation allowed only for address
>resolution or also for route resolving and/or CM calls? also how about
>documenting this?

Cancellation is allowed for any asynchronous operation.  I will pull in your
patch when I get back in the office.  Thanks.

- Sean


From sean.hefty at intel.com  Mon Sep  4 21:08:04 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 4 Sep 2006 21:08:04 -0700
Subject: [openib-general] rdmacm library
In-Reply-To: <Pine.GSO.4.40.0609041610230.5466-100000@nu.cse.ohio-state.edu>
Message-ID: <000101c6d0a0$e3207de0$15248686@amr.corp.intel.com>

>/usr/bin/ld: warning: libibverbs.so.1, needed by
>/usr/local/lib/librdmacm.so, may conflict with libibverbs.so.2
>
>Does rdmacm use the older version of ibverbs or do I need to install
>rdmacm differently?

I keep the RDMA CM updated with the latest version of verbs.  There may be an
issue with the library's build; I'll look into this.

- Sean


From eitan at dev.mellanox.co.il  Mon Sep  4 22:40:59 2006
From: eitan at dev.mellanox.co.il (eitan at dev.mellanox.co.il)
Date: Tue, 5 Sep 2006 08:40:59 +0300 (IDT)
Subject: [openib-general] MPI Brodcast doubt
In-Reply-To: <a94efc20609040556h51ad5b60i91219bc5ef39855f@mail.gmail.com>
References: <a94efc20609040556h51ad5b60i91219bc5ef39855f@mail.gmail.com>
Message-ID: <10677.194.90.237.34.1157434859.squirrel@dev.mellanox.co.il>


>
> I have 3 nodes connected via IB as shown below:
>
> node1 ---> switch1 ---> node2
>                     |----------> node3
>
> If node1 sends a broadcast message to node2 and node3, I want to know if
> the message is delivered to the switch twice (first time for node2 and
> second time for node3) or just once (where the switch will know by looking
> at some headers or so that it's a broadcast message and will send it on all
> the outgoing ports)?
The message is delivered to the switch once; the switch duplicates it.

EZ
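
To illustrate the answer above: with hardware multicast, the sender puts
one copy on its link and the switch replicates it to every other member
port of the group. The sketch below models that replication with a plain
port bitmask. The function names and the mask layout (bit N stands for
port N, ports below 32) are assumptions made for this example, not a
real switch or verbs API.

```c
#include <stdint.h>

/* Ports on which the switch would send a copy: every member port of the
 * group except the one the packet arrived on. Bit N represents port N;
 * ingress_port must be below 32 in this sketch. */
uint32_t demo_egress_mask(uint32_t member_mask, unsigned ingress_port)
{
	return member_mask & ~(UINT32_C(1) << ingress_port);
}

/* Count the set bits, i.e. the number of copies the switch emits. */
unsigned demo_count_ports(uint32_t mask)
{
	unsigned n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```

With node1, node2 and node3 on ports 1, 2 and 3, one packet arriving on
port 1 yields two outgoing copies, one on port 2 and one on port 3.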


From ogerlitz at voltaire.com  Mon Sep  4 23:43:15 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 5 Sep 2006 09:43:15 +0300 (IDT)
Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB
Message-ID: <Pine.LNX.4.64.0609050924180.13095@zuben>

Hi,

While working on getting the Linux bonding driver to run on top of
IPoIB, I ran into LOC_QP_OP_ERR with vendor error 62 (Mellanox PCI-X HCA).

	ib0: failed send event (status=2, wrid=52 vend_err 62)

What does this vendor error mean? It's the same system on which I saw the QP modify error.

There are some more problematic prints here, and I would be happy
to get some idea of their meaning:

 ib1: dev_queue_xmit failed to requeue packet
 ib1: dev_queue_xmit failed to requeue packet

 ???

 ib1: timing out; will leak address handles
 ib1: ib_dealloc_pd failed

(the PD dealloc failure is due to the AH leak), but what causes the leak?

Below is a more detailed snapshot from the time the problems occurred. I was
playing with this HCA's 2 IB links, taking one of them down for about 45 seconds (by
some instrumentation of the SM) and then the other, etc.

The ipoib code is unchanged (other than adding the "ipoib_set_mcast_list called" print).

The bonding code was changed not to set the slave MAC address but rather to use the MAC address
of the active slave, and also to override the ether_setup() settings with those of the active slave.

One thing I think I see is that IPoIB attempts to join the IPv4 broadcast group
even when the port's IB link is down. Am I correct? If so, would it be easy to fix this?

Or.

     1	ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
     2	ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
     3	ib0: starting multicast thread
     4	ib1: stopping multicast thread
     5	ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
     6	ib1: flushing multicast list
     7	ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
     8	ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
     9	ib1: starting multicast thread
    10	ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    11	ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    12	ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
    13	ib1: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff810033c103c0, LID 0xc000, SL 0
    14	ib1: successfully joined all multicast groups
    15	bonding: bond0: link status definitely down for interface ib0, disabling it
    16	bonding: bond0: making interface ib1 the new active one.
    17	ib0: ipoib_set_mcast_list called
    18	ib1: ipoib_set_mcast_list called
    19	ib0: restarting multicast task
    20	ib0: stopping multicast thread
    21	ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    22	ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
    23	ib0: starting multicast thread
    24	ib1: restarting multicast task
    25	ib1: stopping multicast thread
    26	ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    27	ib1: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
    28	ib1: starting multicast thread
    29	ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    30	ib1: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
    31	ib1: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
    32	ib1: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff810037f91d00, LID 0xc001, SL 0
    33	ib1: successfully joined all multicast groups
    34	ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
    35	ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
    36	ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    37	ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
    38	ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
    39	ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    40	ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
    41	ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
    42	ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    43	ib0: stopping multicast thread
    44	ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    45	ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
    46	ib0: flushing multicast list
    47	ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
    48	ib0: starting multicast thread
    49	ib1: stopping multicast thread
    50	ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:0000:0001
    51	ib1: flushing multicast list
    52	ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
    53	ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
    54	ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    55	ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
    56	ib1: starting multicast thread
    57	ib0: stopping multicast thread
    58	ib0: flushing multicast list
    59	ib0: starting multicast thread
    60	ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    61	ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    62	bonding: bond0: link status definitely down for interface ib1, disabling it
    63	ib1: ipoib_set_mcast_list called
    64	bonding: bond0: now running without any active interface !
    65	ib1: restarting multicast task
    66	ib1: stopping multicast thread
    67	ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    68	ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
    69	ib1: starting multicast thread
    70	ib1: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    71	ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
    72	ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff810033c10d80, LID 0xc000, SL 0
    73	ib0: successfully joined all multicast groups
    74	ib1: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
    75	ib1: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff81000b8453c0, LID 0xc000, SL 0
    76	ib1: successfully joined all multicast groups
    77	ib1: dev_queue_xmit failed to requeue packet
    78	ib1: dev_queue_xmit failed to requeue packet
    79	bonding: bond0: link status definitely up for interface ib0.
    80	bonding: bond0: link status definitely up for interface ib1.
    81	bonding: bond0: making interface ib0 the new active one.
    82	ib0: ipoib_set_mcast_list called
    83	bonding: bond0: first active interface up!
    84	ib0: restarting multicast task
    85	ib0: stopping multicast thread
    86	ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
    87	ib0: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
    88	ib0: starting multicast thread
    89	ib0: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
    90	ib0: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
    91	ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff81000099c340, LID 0xc001, SL 0
    92	ib0: successfully joined all multicast groups
    93	ib0: failed send event (status=2, wrid=52 vend_err 62)
    94	ib0: ipoib_set_mcast_list called
    95	ib0: restarting multicast task
    96	ib0: stopping multicast thread
    97	ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:0000:0001
    98	ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
    99	ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
   100	ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
   101	ib0: starting multicast thread
   102	ib0: successfully joined all multicast groups
   103	ib0: stopping multicast thread
   104	ib0: flushing multicast list
   105	ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
   106	ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
   107	ib1: stopping multicast thread
   108	ib1: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
   109	ib1: flushing multicast list
   110	ib1: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
   111	ib1: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
   112	ib1: timing out; will leak address handles
   113	bonding: bond0: released all slaves
   114	ib0: stopping multicast thread
   115	ib0: flushing multicast list
   116	ib1: stopping multicast thread
   117	ib1: flushing multicast list
   118	ib1: ib_dealloc_pd failed
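
As a reading aid for the "join completion ... (status N)" lines in the
log above: assuming the usual Linux errno numbering, -4 corresponds to
-EINTR and -110 to -ETIMEDOUT. The helper below is only a sketch for
decoding those values; demo_join_status_hint() is a hypothetical name,
and the hints are a guess at the likely meaning rather than something
taken from the ipoib code.

```c
#include <errno.h>

/* Map a negative errno-style join status to a short hint. Only the
 * values that appear in the log above are handled explicitly. */
const char *demo_join_status_hint(int status)
{
	if (status == 0)
		return "ok: join completed";
	if (status == -EINTR)
		return "EINTR: request was interrupted or cancelled";
	if (status == -ETIMEDOUT)
		return "ETIMEDOUT: no reply arrived in time";
	return "other: not covered by this sketch";
}
```

Read this way, the repeated -110 results on ib0 would fit join requests
timing out while that port's link was down.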


From mst at mellanox.co.il  Tue Sep  5 00:13:02 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 5 Sep 2006 10:13:02 +0300
Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB
In-Reply-To: <Pine.LNX.4.64.0609050924180.13095@zuben>
References: <Pine.LNX.4.64.0609050924180.13095@zuben>
Message-ID: <20060905071302.GC5401@mellanox.co.il>

Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: getting LOC_QP_OP_ERR with IPoIB
> 
> Hi,
> 
> While doing some work to have linux bonding driver be able to work on top
> of IPoIB i have run into LOC_QP_OP_ERR with vendor (mellanox PCIX HCA) error 62.
> 
> 	ib0: failed send event (status=2, wrid=52 vend_err 62)
> 
> What does this vendor error means? its the same system over which i saw the qp modify error.

vend_err 0x62 is a WQE-fetch failure: either the WQE region does not exist or the PD does not match.

-- 
MST
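
For reference, the "status=2" in the failed send event is the work
completion status, and 2 corresponds to IB_WC_LOC_QP_OP_ERR. The sketch
below decodes the first few status values, listed in the order used by
enum ib_wc_status in the Linux verbs headers. The table and the
demo_wc_status_name() helper are illustrative only; vendor error codes
such as 0x62 are HCA specific and are not covered here.

```c
#include <stddef.h>

/* Names of the first work-completion status codes, indexed by value.
 * Illustrative table; only a prefix of the full enum is listed. */
static const char *const demo_wc_status_names[] = {
	"IB_WC_SUCCESS",		/* 0 */
	"IB_WC_LOC_LEN_ERR",		/* 1 */
	"IB_WC_LOC_QP_OP_ERR",		/* 2 */
	"IB_WC_LOC_EEC_OP_ERR",		/* 3 */
	"IB_WC_LOC_PROT_ERR",		/* 4 */
	"IB_WC_WR_FLUSH_ERR",		/* 5 */
};

/* Return the symbolic name for a status value, or a fallback string
 * for values outside the table above. */
const char *demo_wc_status_name(int status)
{
	size_t n = sizeof(demo_wc_status_names) /
		   sizeof(demo_wc_status_names[0]);

	if (status < 0 || (size_t)status >= n)
		return "unknown (not covered by this sketch)";
	return demo_wc_status_names[status];
}
```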


From leonid at scalemp.com  Tue Sep  5 00:30:41 2006
From: leonid at scalemp.com (Leonid Arsh)
Date: Tue, 5 Sep 2006 10:30:41 +0300
Subject: [openib-general] OpenSM - guid2lid cache file questions
Message-ID: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com>

Hi list,

 I have a question regarding the guid2lid cache file.

  The file is read by OpenSM on the start up.
  OpenSM may reassign LIDs according to the LIDs saved in this file.
 It isn't always acceptable.

 Is it a right policy? Am I missing anything here?
 Is there a way to disable the file reading on start up?

Regards,
   Leonid


From ogerlitz at voltaire.com  Tue Sep  5 00:40:56 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 05 Sep 2006 10:40:56 +0300
Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB
In-Reply-To: <20060905071302.GC5401@mellanox.co.il>
References: <Pine.LNX.4.64.0609050924180.13095@zuben>
	<20060905071302.GC5401@mellanox.co.il>
Message-ID: <44FD2A08.1040708@voltaire.com>

Michael S. Tsirkin wrote:
> Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:

>> While doing some work to have linux bonding driver be able to work on top
>> of IPoIB i have run into LOC_QP_OP_ERR with vendor (mellanox PCIX HCA) error 62.
>> 	ib0: failed send event (status=2, wrid=52 vend_err 62)
>> What does this vendor error means? its the same system over which i saw the qp modify error.


> vend_err 0x62 is WQE-fetch failure due to WQE-region non-exists or PD mismatched

Thanks.

So what's your thinking, am i running into some ipoib bogus scenario?

Or.


From mst at mellanox.co.il  Tue Sep  5 00:48:34 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 5 Sep 2006 10:48:34 +0300
Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB
In-Reply-To: <44FD2A08.1040708@voltaire.com>
References: <Pine.LNX.4.64.0609050924180.13095@zuben>
	<20060905071302.GC5401@mellanox.co.il> <44FD2A08.1040708@voltaire.com>
Message-ID: <20060905074834.GD5401@mellanox.co.il>

Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: Re: getting LOC_QP_OP_ERR with IPoIB
> 
> Michael S. Tsirkin wrote:
> > Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> 
> >> While doing some work to have linux bonding driver be able to work on top
> >> of IPoIB i have run into LOC_QP_OP_ERR with vendor (mellanox PCIX HCA) error 62.
> >> 	ib0: failed send event (status=2, wrid=52 vend_err 62)
> >> What does this vendor error means? its the same system over which i saw the qp modify error.
> 
> 
> > vend_err 0x62 is WQE-fetch failure due to WQE-region non-exists or PD mismatched
> 
> Thanks.
> 
> So what's your thinking, am i running into some ipoib bogus scenario?
> 
> Or.

Dunno, it looks really weird. Could you try firmware 3.5.0 please?

-- 
MST


From halr at voltaire.com  Tue Sep  5 03:57:53 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Sep 2006 06:57:53 -0400
Subject: [openib-general] OpenSM - guid2lid cache file questions
In-Reply-To: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com>
References: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com>
Message-ID: <1157453867.26953.176326.camel@hal.voltaire.com>

Hi Leonid,

On Tue, 2006-09-05 at 03:30, Leonid Arsh wrote:
> Hi list,
> 
>  I have a question regarding the guid2lid cache file.
> 
>   The file is read by OpenSM on the start up.
>   OpenSM may reassign LIDs according to the LIDs saved in this file.
>  It isn't always acceptable.
> 
>  Is it a right policy? Am I missing anything here?
>  Is there a way to disable the file reading on start up?

There is the -r (--reassign_lids) option for this but it is not the
default behavior of OpenSM.

-- Hal

> 
> Regards,
>    Leonid
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 


From halr at voltaire.com  Tue Sep  5 03:58:27 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Sep 2006 06:58:27 -0400
Subject: [openib-general] MPI Brodcast doubt
In-Reply-To: <a94efc20609040556h51ad5b60i91219bc5ef39855f@mail.gmail.com>
References: <a94efc20609040556h51ad5b60i91219bc5ef39855f@mail.gmail.com>
Message-ID: <1157453896.26953.176365.camel@hal.voltaire.com>

John,

On Mon, 2006-09-04 at 08:56, john t wrote:
> Hi,
>  
> I have 3 nodes connected via IB as shown below:
>  
> node1 ---> switch1 ---> node2
>                     |----------> node3
>  
> If node1 sends a brodcast message to node2 and node3, I want to know
> if the message is delivered to the switch twice (first time for node2
> and second time for node3) or just once (where switch will know by
> looking at some headers or so that its a brodcast message and will
> send it on all the outgoing ports) ?

Assuming nodes 1, 2, and 3 are part of the same multicast group, the
multicast send is sent once from node 1. When received at the switch, it
is replicated to all ports which have members in the same group (in this
case, nodes 2 and 3). The switch knows by the header (specifically the
LRH:DLID which is a multicast LID) and uses the MulticastForwardingTable
to determine on which ports to forward it. However, IB multicast is
unreliable so to create reliable multicast, it is sometimes "emulated"
in that the sender tracks the group members and may use serial unicast
sends or augment a multicast send with unicast sends to the receivers
and track their acknowledgements of receipt.
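
A toy model of the switch-side decision described above, for illustration only. It assumes the multicast LID range 0xC000 to 0xFFFE and represents the MulticastForwardingTable as a plain dict from MLID to a set of member ports; a real switch stores port masks in table blocks, and the names here (forward_ports, mft, lft) are made up for the sketch.

```python
# Multicast LIDs occupy 0xC000..0xFFFE; anything below is treated as unicast here.
MCAST_LID_MIN = 0xC000
MCAST_LID_MAX = 0xFFFE


def is_multicast_lid(dlid):
    """True if the LRH:DLID falls in the multicast LID range."""
    return MCAST_LID_MIN <= dlid <= MCAST_LID_MAX


def forward_ports(dlid, in_port, mft, lft):
    """Return the list of output ports for one incoming packet.

    mft: dict MLID -> set of ports with group members (toy MFT).
    lft: dict unicast LID -> single output port (toy linear forwarding table).
    """
    if is_multicast_lid(dlid):
        # One packet in, one copy out per member port, never back to the sender.
        return sorted(p for p in mft.get(dlid, set()) if p != in_port)
    out = lft.get(dlid)
    return [] if out is None else [out]
```

With node1 on port 1 and nodes 2 and 3 on ports 2 and 3, a single packet to the group's MLID arriving on port 1 is forwarded to ports 2 and 3.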

-- Hal

> Regards,
> John T.


From leonid at scalemp.com  Tue Sep  5 05:11:33 2006
From: leonid at scalemp.com (Leonid Arsh)
Date: Tue, 5 Sep 2006 15:11:33 +0300
Subject: [openib-general] OpenSM - guid2lid cache file questions
In-Reply-To: <1157453867.26953.176326.camel@hal.voltaire.com>
References: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com>
	<1157453867.26953.176326.camel@hal.voltaire.com>
Message-ID: <10e223bf0609050511w1ffbab4cx3d7b39e707340ed0@mail.gmail.com>

Hi Hal,

  Thank you for your reply.

Probably I wasn't clear.

I have a problem when OpenSM, being started, reads an out-of-date guid2lid file.
OpenSM changes LIDs in this case.
I don't want the LIDs to be changed.
As I understand it, the '-r' option, on the contrary, causes the SM to
reassign all the LIDs.

I could just remove the file to handle the problem.
I'd like to know if there is a way to do it without touching the file.

Thanks,
    Leonid

On 05 Sep 2006 06:57:53 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> Hi Leonid,
>
> On Tue, 2006-09-05 at 03:30, Leonid Arsh wrote:
> > Hi list,
> >
> >  I have a question regarding the guid2lid cache file.
> >
> >   The file is read by OpenSM on the start up.
> >   OpenSM may reassign LIDs according to the LIDs saved in this file.
> >  It isn't always acceptable.
> >
> >  Is it a right policy? Am I missing anything here?
> >  Is there a way to disable the file reading on start up?
>
> There is the -r (--reassign_lids) option for this but it is not the
> default behavior of OpenSM.
>
> -- Hal
>
> >
> > Regards,
> >    Leonid


From dotanb at dev.mellanox.co.il  Tue Sep  5 05:26:33 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Tue, 05 Sep 2006 15:26:33 +0300
Subject: [openib-general] MPI Brodcast doubt
In-Reply-To: <1157453896.26953.176365.camel@hal.voltaire.com>
References: <a94efc20609040556h51ad5b60i91219bc5ef39855f@mail.gmail.com>
	<1157453896.26953.176365.camel@hal.voltaire.com>
Message-ID: <44FD6CF9.6090805@dev.mellanox.co.il>

Hal Rosenstock wrote:
> John,
>
> On Mon, 2006-09-04 at 08:56, john t wrote:
>   
>> Hi,
>>  
>> I have 3 nodes connected via IB as shown below:
>>  
>> node1 ---> switch1 ---> node2
>>                     |----------> node3
>>  
>> If node1 sends a brodcast message to node2 and node3, I want to know
>> if the message is delivered to the switch twice (first time for node2
>> and second time for node3) or just once (where switch will know by
>> looking at some headers or so that its a brodcast message and will
>> send it on all the outgoing ports) ?
>>     
>
> Assuming nodes 1, 2, and 3 are part of the same multicast group, the
> multicast send is sent once from node 1. When received at the switch, it
> is replicated to all ports which have members in the same group (in this
> case, nodes 2 and 3). The switch knows by the header (specifically the
> LRH:DLID which is a multicast LID) and uses the MulticastForwardingTable
> to determine on which ports to forward it. However, IB multicast is
> unreliable so to create reliable multicast, it is sometimes "emulated"
> in that the sender tracks the group members and may use serial unicast
> sends or augment a multicast send with unicast sends to the receivers
> and track their acknowledgements of receipt.
>
> -- Hal
>   
All of the above is true for IB multicast (there isn't any broadcast in IB).

If the question was "what happens when one sends a message using
MPI_Bcast?", then the answer is: it depends on the MPI implementation.
I know that in MVAPICH, by default, the MPI library handles the duplication
itself (so the switch gets two messages, not one).
MVAPICH has an option to use IB multicast, but it is disabled by default.
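
To put numbers on the difference, here is a small sketch under simplifying assumptions: a single switch, one packet per send, and no retransmissions. The function name is hypothetical and this is not how any MPI library is implemented; it only counts packets entering the switch for one broadcast.

```python
def packets_into_switch(num_receivers, use_ib_multicast):
    """Count packets entering a single switch for one broadcast.

    With unicast emulation every receiver needs its own packet, whichever
    rank sends it; with IB multicast the switch replicates a single packet.
    """
    if num_receivers < 0:
        raise ValueError("num_receivers must be non-negative")
    if num_receivers == 0:
        return 0
    return 1 if use_ib_multicast else num_receivers
```

For John's three-node setup (two receivers) this gives 2 packets with unicast emulation and 1 with IB multicast.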

Dotan


From halr at voltaire.com  Tue Sep  5 05:46:22 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Sep 2006 08:46:22 -0400
Subject: [openib-general] OpenSM - guid2lid cache file questions
In-Reply-To: <10e223bf0609050511w1ffbab4cx3d7b39e707340ed0@mail.gmail.com>
References: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com>
	<1157453867.26953.176326.camel@hal.voltaire.com>
	<10e223bf0609050511w1ffbab4cx3d7b39e707340ed0@mail.gmail.com>
Message-ID: <1157460382.26953.179764.camel@hal.voltaire.com>

Hi Leonid,

On Tue, 2006-09-05 at 08:11, Leonid Arsh wrote:
> Hi Hal,
> 
>   Thank you for your reply.
> 
> Probably I wasn't clear.
> 
> I have a problem when OpenSM, being started, reads an out-of-date guid2lid file.
> OpenSM changes LIDs in this case.

How do you know the file is "out of date" ?

> I don't want  the LIDs to be changed.

Oh, it's the other way you were asking about.

> As I understand it, the '-r' option, on the contrary, causes the SM to
> reassign all the LIDs.
> 
> I could just remove the file to handle the problem.

or move it aside.

> I'd like to know if there is a way to do it without touching the file.

Not currently. There is the -x (--honor_guid2lid) which will do this
(ignore the guid2lid file) when OpenSM is coming out of STANDBY though.

-- Hal

> Thanks,
>     Leonid
> 
> On 05 Sep 2006 06:57:53 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> > Hi Leonid,
> >
> > On Tue, 2006-09-05 at 03:30, Leonid Arsh wrote:
> > > Hi list,
> > >
> > >  I have a question regarding the guid2lid cache file.
> > >
> > >   The file is read by OpenSM on the start up.
> > >   OpenSM may reassign LIDs according to the LIDs saved in this file.
> > >  It isn't always acceptable.
> > >
> > >  Is it a right policy? Am I missing anything here?
> > >  Is there a way to disable the file reading on start up?
> >
> > There is the -r (--reassign_lids) option for this but it is not the
> > default behavior of OpenSM.
> >
> > -- Hal
> >
> > >
> > > Regards,
> > >    Leonid
> > >
> > > _______________________________________________
> > > openib-general mailing list
> > > openib-general at openib.org
> > > http://openib.org/mailman/listinfo/openib-general
> > >
> > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> > >
> >
> >
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org
> > http://openib.org/mailman/listinfo/openib-general
> >
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >
> >


From ogerlitz at voltaire.com  Tue Sep  5 05:51:43 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 05 Sep 2006 15:51:43 +0300
Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint
	question
In-Reply-To: <20060905074834.GD5401@mellanox.co.il>
References: <Pine.LNX.4.64.0609050924180.13095@zuben>
	<20060905071302.GC5401@mellanox.co.il> <44FD2A08.1040708@voltaire.com>
	<20060905074834.GD5401@mellanox.co.il>
Message-ID: <44FD72DF.4000708@voltaire.com>

Michael S. Tsirkin wrote:
> Dunno, it looks really weird. Could you try firmware 3.5.0 please?

I just noticed that mstflint does not work if the mthca driver is not
loaded. I think this was not the case with the gen1 tools; am I correct?

Is this connected to this print

	ACPI: PCI interrupt for device 0000:02:00.0 disabled

i see once the mthca driver is unloaded?

Or.

> dill:/tmp # modprobe -r ib_mthca

> dill:/tmp # ./mstflint -d 00:02:00.0 q
> *** ERROR *** Read a corrupted device id (0xffff). Probably HW/PCI access problem
> *** ERROR *** Device type 65535 not supported.
> *** ERROR *** Can not get flash type using device 00:02:00.0

> dill:/tmp # modprobe ib_mthca

> dill:/tmp # ./mstflint -d 00:02:00.0 q
> Image type:      Failsafe
> I.S. Version:    1
> Chip Revision:   A1
> GUID Des:        Node             Port1            Port2            Sys image
> GUIDs:           0008f104039651dc 0008f104039651dd 0008f104039651de 0008f104039651df
> Board ID:         (VLT0010010001)
> VSD:
> PSID:            VLT0010010001

> dill:/tmp # dmesg

> ACPI: PCI interrupt for device 0000:02:00.0 disabled

> ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
> ib_mthca: Initializing 0000:02:00.0
> PCI: Enabling device 0000:02:00.0 (0110 -> 0112)
> ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 29 (level, low) -> IRQ 193


From tziporet at dev.mellanox.co.il  Tue Sep  5 05:57:17 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Tue, 05 Sep 2006 15:57:17 +0300
Subject: [openib-general] problems to regiser memory as a reglar
In-Reply-To: <200609042106.k84L6S2q025644@xi.cse.ohio-state.edu>
References: <200609042106.k84L6S2q025644@xi.cse.ohio-state.edu>
Message-ID: <44FD742D.10506@dev.mellanox.co.il>

Dhabaleswar Panda wrote:
> Christian - Thanks for sending instructions for running mvapich2-0.9.5
> to Tziporet.
>
> Tziporet - Thanks for looking into this problem on SLES9 environment.
>
> Please note that a detailed user guide for running and tuning MVAPICH2
> 0.9.5 is available from the following URL:
>
> http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich2/mvapich2_user_guide.html
>
> DK
>   
Thanks to all,
We found the bug that was in memory registration flow of SLES9 only.
A fix will be available in OFED 1.1 RC4

Tziporet


From oibleo at gmail.com  Tue Sep  5 06:13:00 2006
From: oibleo at gmail.com (Leonid Arsh)
Date: Tue, 5 Sep 2006 16:13:00 +0300
Subject: [openib-general] OpenSM - guid2lid cache file questions
In-Reply-To: <1157460382.26953.179764.camel@hal.voltaire.com>
References: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com>
	<1157453867.26953.176326.camel@hal.voltaire.com>
	<10e223bf0609050511w1ffbab4cx3d7b39e707340ed0@mail.gmail.com>
	<1157460382.26953.179764.camel@hal.voltaire.com>
Message-ID: <10e223bf0609050613w68de2491r9f4e1c0a37f5a5b6@mail.gmail.com>

Thanks,

On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> > I have a problem when OpenSM, being started, reads an out-of-date guid2lid file.
> > OpenSM changes LIDs in this case.
>
> How do you know the file is "out of date" ?
>
Actually, the LIDs were assigned by another SM.
When I start my new OpenSM, the old SM is already dead.
Before starting the new OpenSM, the  ibnetdiscover utility shows LIDs different
from ones in the file.
When I start OpenSM, the LIDs are reassigned on the fabric.


From bugzilla-daemon at openib.org  Tue Sep  5 06:16:24 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue,  5 Sep 2006 06:16:24 -0700 (PDT)
Subject: [openib-general] [Bug 131] working with huge pages may crash the
	kernel on Suse10
Message-ID: <20060905131624.21B162283D8@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=131


tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from tziporet at mellanox.co.il  2006-09-05 06:16 -------
was fixed in 1.1-rc3


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at openib.org  Tue Sep  5 06:18:58 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue,  5 Sep 2006 06:18:58 -0700 (PDT)
Subject: [openib-general] [Bug 145] IB Core unable to communicate IPoIB on
	Fedora Core 4
Message-ID: <20060905131858.0ED0E228423@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=145


tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX


------- Comment #2 from tziporet at mellanox.co.il  2006-09-05 06:18 -------
this is not a bug in OFED


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From halr at voltaire.com  Tue Sep  5 06:18:03 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Sep 2006 09:18:03 -0400
Subject: [openib-general] OpenSM - guid2lid cache file questions
In-Reply-To: <10e223bf0609050613w68de2491r9f4e1c0a37f5a5b6@mail.gmail.com>
References: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com>
	<1157453867.26953.176326.camel@hal.voltaire.com>
	<10e223bf0609050511w1ffbab4cx3d7b39e707340ed0@mail.gmail.com>
	<1157460382.26953.179764.camel@hal.voltaire.com>
	<10e223bf0609050613w68de2491r9f4e1c0a37f5a5b6@mail.gmail.com>
Message-ID: <1157462283.26953.180804.camel@hal.voltaire.com>

Leonid,

On Tue, 2006-09-05 at 09:13, Leonid Arsh wrote:
> Thanks,
> 
> On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> > > I have a problem when OpenSM, being started, reads an out-of-date guid2lid file.
> > > OpenSM changes LIDs in this case.
> >
> > How do you know the file is "out of date" ?
> >
> Actually, the LIDs were assigned by another SM.

Different (vendor) SMs have different LID assignment and pathing
(routing) policies. It is inadvisable to failover across vendor SMs for
this and other reasons.

-- Hal

> When I start my new OpenSM, the old SM is already dead.
> Before starting the new OpenSM, the  ibnetdiscover utility shows LIDs different
> from ones in the file.
> When I start OpenSM, the LIDs are reassigned on the fabric.


From halr at voltaire.com  Tue Sep  5 06:25:28 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Sep 2006 09:25:28 -0400
Subject: [openib-general] [PATCH] opensm: osm_log_init_v2() - new
	osm_log initializer
In-Reply-To: <20060904172006.10400.62708.stgit@sashak.voltaire.com>
References: <20060904172006.10400.62708.stgit@sashak.voltaire.com>
Message-ID: <1157462724.26953.181066.camel@hal.voltaire.com>

On Mon, 2006-09-04 at 13:20, Sasha Khapyorsky wrote:
> There is new osm_log initializer osm_log_init_v2(), this is wrapped
> by osm_log_init() in order to preserve existing API.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks. Applied (to trunk and 1.1).

-- Hal


From eitan at dev.mellanox.co.il  Tue Sep  5 06:25:44 2006
From: eitan at dev.mellanox.co.il (Eitan Zahavi)
Date: Tue, 5 Sep 2006 16:25:44 +0300
Subject: [openib-general] OpenSM - guid2lid cache file questions
In-Reply-To: <1157462283.26953.180804.camel@hal.voltaire.com>
References: <10e223bf0609050030t593557fcy8b0d654e3ca79ba8@mail.gmail.com>
	<1157453867.26953.176326.camel@hal.voltaire.com>
	<10e223bf0609050511w1ffbab4cx3d7b39e707340ed0@mail.gmail.com>
	<1157460382.26953.179764.camel@hal.voltaire.com>
	<10e223bf0609050613w68de2491r9f4e1c0a37f5a5b6@mail.gmail.com>
	<1157462283.26953.180804.camel@hal.voltaire.com>
Message-ID: <000001c6d0ee$cb29dcb0$617d9610$@mellanox.co.il>

Hi Leonid,

The best approach when switching from another vendor SM to 
OpenSM is to delete the /var/cache/osm/guid2lid file.
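
If you would rather keep a copy than delete it (as Hal suggested, "move it aside"), something like the following could be run before starting OpenSM. This is only a sketch: the default path is the one given above, while the function name and the timestamped ".bak" suffix are arbitrary choices for the example.

```python
import os
import shutil
import time


def move_guid2lid_aside(path="/var/cache/osm/guid2lid"):
    """Rename the guid2lid cache file so OpenSM will not find it.

    Returns the backup path, or None if there was no file to move.
    """
    if not os.path.exists(path):
        return None
    # Timestamped suffix so repeated runs do not overwrite older backups.
    backup = "%s.%s.bak" % (path, time.strftime("%Y%m%d-%H%M%S"))
    shutil.move(path, backup)
    return backup
```

Run it as root before launching opensm; the old file stays available for comparison with what ibnetdiscover reports.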

> -----Original Message-----
> From: openib-general-bounces at openib.org [mailto:openib-general-
> bounces at openib.org] On Behalf Of Hal Rosenstock
> Sent: Tuesday, September 05, 2006 4:18 PM
> To: Leonid Arsh
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] OpenSM - guid2lid cache file questions
> 
> Leonid,
> 
> On Tue, 2006-09-05 at 09:13, Leonid Arsh wrote:
> > Thanks,
> >
> > On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> > > > I have a problem when OpenSM, being started, reads an out-of-date guid2lid file.
> > > > OpenSM changes LIDs in this case.
> > >
> > > How do you know the file is "out of date" ?
> > >
> > Actually, the LIDs were assigned by another SM.
> 
> Different (vendor) SMs have different LID assignment and pathing
> (routing) policies. It is inadvisable to failover across vendor SMs for
> this and other reasons.
> 
> -- Hal
> 
> > When I start my new OpenSM, the old SM is already dead.
> > Before starting the new OpenSM, the  ibnetdiscover utility shows LIDs
> > different from ones in the file.
> > When I start OpenSM, the LIDs are reassigned on the fabric.


From mst at mellanox.co.il  Tue Sep  5 06:36:50 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 5 Sep 2006 16:36:50 +0300
Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint
	question
In-Reply-To: <44FD72DF.4000708@voltaire.com>
References: <44FD72DF.4000708@voltaire.com>
Message-ID: <20060905133650.GL5401@mellanox.co.il>

Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: Re: getting LOC_QP_OP_ERR with IPoIB - mstflint question
> 
> Michael S. Tsirkin wrote:
> > Dunno, it looks really weird. Could you try firmware 3.5.0 please?
> 
> I just noted that you can not work with mstflint if the mthca driver is 
> not loaded, i think it was not the case in the gen1 tools, am i correct.

Yes, recent kernels disable device access once driver is unloaded:

mstflint -d 08:00.0 q
*** ERROR *** Read a corrupted device id (0xffff). Probably HW/PCI access problem
*** ERROR *** Device type 65535 not supported.
*** ERROR *** Can not get flash type using device 08:00.0

mstflint should work without driver using /proc:
mstflint -d /proc/bus/pci/08/00.0 q
Image type:      Failsafe
I.S. Version:    1
Chip Revision:   A0


In gen1 flint had a separate driver which you had to load.
I am not sure whether this would work on 2.6.18
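
For scripting, here is a small sketch that turns a PCI address such as 08:00.0 into the /proc path form used above. It assumes PCI domain 0 only (an optional leading "0000:" is accepted and dropped); the helper name is made up and it is not part of mstflint.

```python
import re

# Accepts "bb:dd.f" or "0000:bb:dd.f" (hex bus and device, function 0-7).
_PCI_RE = re.compile(
    r"^(?:(?P<dom>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):(?P<dev>[0-9a-fA-F]{2})\.(?P<fn>[0-7])$"
)


def proc_pci_path(addr):
    """Map a PCI address to its /proc/bus/pci/<bus>/<dev>.<fn> path."""
    m = _PCI_RE.match(addr)
    if m is None:
        raise ValueError("not a PCI address: %r" % addr)
    dom = m.group("dom")
    if dom is not None and int(dom, 16) != 0:
        # Assumption: only domain 0 is handled in this sketch.
        raise ValueError("non-zero PCI domain not handled: %r" % addr)
    return "/proc/bus/pci/%s/%s.%s" % (
        m.group("bus").lower(),
        m.group("dev").lower(),
        m.group("fn"),
    )
```

The result can then be passed to mstflint with -d, as in the second command above.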

> Is this connected to this print
> 
> 	ACPI: PCI interrupt for device 0000:02:00.0 disabled
> 
> i see once the mthca driver is unloaded?
> 
> Or.

Probably not.

-- 
MST


From thomas.bub at thomson.net  Tue Sep  5 07:22:28 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Tue, 5 Sep 2006 16:22:28 +0200
Subject: [openib-general] libibcm can't connect/talk to libicm on other
	machine.
Message-ID: <B79FAF8BB536314E859EA1963CFFD22201FBD388@wdtssmail01.eu.thmulti.com>

I'm still in the process of migrating my gen1 application to gen2.
Actually I CAN connect a gen2 application to a gen2 listener application
on the same machine but NOT to a gen 2 listener on another machine.
Any hints where to look at?
Is there anything in the architecture that might prevent a libibcm
connection to another machine?
I'm using an old Voltaire switch to connect the machines. Can this be
the reason?
The switch didn't cause problems using gen1 clients.
Thanks
Thomas Bub

From dotanb at dev.mellanox.co.il  Tue Sep  5 08:12:08 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Tue, 05 Sep 2006 18:12:08 +0300
Subject: [openib-general] libibcm can't connect/talk to libicm on other
 machine.
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD22201FBD388@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD22201FBD388@wdtssmail01.eu.thmulti.com>
Message-ID: <44FD93C8.1020604@dev.mellanox.co.il>

Hi bub.


Bub Thomas wrote:
>
> I’m still in the process of migrating my gen1 application to gen2.
>
> Actually I CAN connect a gen2 application to a gen2 listener 
> application on the same machine but NOT to a gen 2 listener on another 
> machine.
>
> Any hints where to look at?
>
> Is there anything in the architecture that might prevent a libibcm 
> connection to another machine?
>
> I’m using an old Voltaire switch to connect the machines. Can this be 
> the reason?
>
> The switch didn’t cause problems using gen1 clients.
>
What is the problem that you see?
There are some examples that come with libibcm that show how to use
the library.

There can be several reasons for your problem:
1) side A sends a REQ when side B is not ready, and there is a timeout failure
2) the ib_ucm kernel module is loaded only on side A
3) the SM is not working (well)
4) host A cannot reach host B over IB
5) endianness issues?
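
Item 2 is easy to check on both hosts. A minimal sketch, assuming the module names of interest are ib_ucm and ib_cm (adjust the list for your setup) and that the text passed in is the content of /proc/modules; the function names are illustrative only.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules content."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_modules(proc_modules_text, required=("ib_ucm", "ib_cm")):
    """Return the required modules that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]
```

On each host, read /proc/modules, pass its contents to missing_modules(), and compare the results; an empty list on both sides rules out this item.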

i tried to use the libibcm and i don't have any problem (but i don't 
have any Voltaire switch, so i can't check your scenario).

Dotan


From halr at voltaire.com  Tue Sep  5 08:20:00 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 05 Sep 2006 11:20:00 -0400
Subject: [openib-general] libibcm can't connect/talk to libicm on other
 machine.
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD22201FBD388@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD22201FBD388@wdtssmail01.eu.thmulti.com>
Message-ID: <1157469599.26953.184762.camel@hal.voltaire.com>

Hi Bub,

On Tue, 2006-09-05 at 10:22, Bub Thomas wrote:
> I’m still in the process of migrating my gen1 application to gen2.
> 
> Actually I CAN connect a gen2 application to a gen2 listener
> application on the same machine but NOT to a gen 2 listener on another
> machine.
> 
> Any hints where to look at?

What are you using for SM ? OpenSM or vendor SM ?

> Is there anything in the architecture that might prevent a libibcm
> connection to another machine?

I don't think this is an architectural issue.

-- Hal

> I’m using an old Voltaire switch to connect the machines. Can this be
> the reason?
> 
> The switch didn’t cause problems using gen1 clients.
> 
> Thanks
> 
> Thomas Bub


From thomas.bub at thomson.net  Tue Sep  5 09:11:13 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Tue, 5 Sep 2006 18:11:13 +0200
Subject: [openib-general] libibcm can't connect/talk to libicm on other
 machine.
Message-ID: <B79FAF8BB536314E859EA1963CFFD22201FBD389@wdtssmail01.eu.thmulti.com>

Dotan,
the ibv_rc_pingpong example works for me so I can exclude the
architecture.
I never got the libibcm example compiled.
Which is your example and which architecture x86 vs. x86_64 did you
compile it for?
Can you share your libibcm example code (if it is not the standard one,
which I can't get to compile)?
Thomas

-----Original Message-----
From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Dotan Barak
Sent: Tuesday, September 05, 2006 5:12 PM
To: Bub Thomas
Cc: openib-general at openib.org
Subject: Re: [openib-general] libibcm can't connect/talk to libicm on
other machine.

Hi bub.


Bub Thomas wrote:
>
> I'm still in the process of migrating my gen1 application to gen2.
>
> Actually I CAN connect a gen2 application to a gen2 listener 
> application on the same machine but NOT to a gen 2 listener on another
> machine.
>
> Any hints where to look at?
>
> Is there anything in the architecture that might prevent a libibcm 
> connection to another machine?
>
> I'm using an old Voltaire switch to connect the machines. Can this be 
> the reason?
>
> The switch didn't cause problems using gen1 clients.
>
What is the problem that you see?
there are some examples that comes with the libibcm that can show you 
how to use the library.

there can be several reasons for your problem:
1) side A send a req when side B is not ready and there is a timeout
failure
2) only in side A the ib_ucm kernel module enabled
3) SM is not working (well)
4) host A cannot be reached to host B using IB
5) endianess issues?

i tried to use the libibcm and i don't have any problem (but i don't 
have any Voltaire switch, so i can't check your scenario).

Dotan



From dlpaktor at us.ibm.com  Tue Sep  5 10:30:43 2006
From: dlpaktor at us.ibm.com (David L Paktor)
Date: Tue, 5 Sep 2006 10:30:43 -0700
Subject: [openib-general] New development tool for boot-time drivers (FCode,
 IEE-1275, IBM/Sun)
Message-ID: <OF0A9CBD94.25BBED8A-ON882571E0.005FBA0A-882571E0.0060324F@us.ibm.com>


If anyone is interested in developing boot-time device drivers for plug-in
devices, conformant to the IEEE-1275 (Open Firmware) specification, using
FCode (tokenized Forth source), which is compatible with both IBM and Sun
platforms (and is platform-independent, so that a driver written once is
compatible with all Open Firmware platforms ... but you already know all
this if you're using Open Firmware), then you will need a Tokenizer to
translate from your Forth source to FCode tokens, which are the "medium
of exchange" between the device and the platform.

I am writing to announce that a new FCode Tokenizer, capable of running
on IBM equipment (and that can be compiled on any other host that supports
the GnuCC compiler, and others as well) is freely available at the web-site
of the OpenBIOS project,  www.openbios.org   (and just follow the links
about the New FCODE suite)

If you have any questions, please direct them to the OpenBIOS Mailing List.

Thank you.

-----

David L. Paktor                  System Firmware Developer
System and Technology Group      Global Firmware Division
dlpaktor at us.ibm.com              David L Paktor/Almaden/IBM at IBMUS

18880 Homestead Rd.              Building 9945
Cupertino CA 95014               Room 1026
408-342-6110                     T/L 560-6110

"The Bug Stops Here"

From mshefty at ichips.intel.com  Tue Sep  5 11:14:41 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 05 Sep 2006 11:14:41 -0700
Subject: [openib-general] libibcm can't connect/talk to libicm on other
 machine.
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD22201FBD389@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD22201FBD389@wdtssmail01.eu.thmulti.com>
Message-ID: <44FDBE91.4020005@ichips.intel.com>

Bub Thomas wrote:
> Dotan,
> the ibv_rc_pingpong example works for me so I can exclude the
> architecture.
> I never got the libibcm example compiled.
> Which is your example and which architecture x86 vs. x86_64 did you
> compile it for?
> Can you share your libibcm example code (if it is not the standard one
> that I can't get compiled)?
> Thomas

Did you try applying the following patch?

http://openib.org/pipermail/openib-general/2006-August/025005.html

I should also mention that I have a version of cmpost that works with the new 
libibsa, but I am waiting for the review of the kernel sa_query changes before 
committing.

- Sean


From jwm at systemfabricworks.com  Tue Sep  5 11:43:25 2006
From: jwm at systemfabricworks.com (JWM)
Date: Tue, 5 Sep 2006 13:43:25 -0500
Subject: [openib-general] libibcm can't connect/talk to libicm on other
 machine.
References: <B79FAF8BB536314E859EA1963CFFD22201FBD388@wdtssmail01.eu.thmulti.com>
Message-ID: <004201c6d11b$2be1d360$7401a8c0@Maelstrom>

I know this sounds simple, but have you checked the routing tables?
    ....JW
  ----- Original Message ----- 
  From: Bub Thomas 
  To: openib-general at openib.org 
  Sent: Tuesday, September 05, 2006 9:22 AM
  Subject: [openib-general] libibcm can't connect/talk to libicm on other machine.


  I'm still in the process of migrating my gen1 application to gen2.

  Actually I CAN connect a gen2 application to a gen2 listener application on the same machine but NOT to a gen2 listener on another machine.

  Any hints on where to look?

  Is there anything in the architecture that might prevent a libibcm connection to another machine?

  I'm using an old Voltaire switch to connect the machines. Can this be the reason?

  The switch didn't cause problems using gen1 clients.

  Thanks

  Thomas Bub



From arlin.r.davis at intel.com  Tue Sep  5 14:16:15 2006
From: arlin.r.davis at intel.com (Arlin Davis)
Date: Tue, 5 Sep 2006 14:16:15 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <44F8E006.5030607@pathscale.com>
Message-ID: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com>

Robert,

Here is a slightly modified patch for your attributes issue. Can you give it a try?

Signed-off-by: Arlin Davis <ardavis at ichips.intel.com>

Index: dapl/openib/dapl_ib_util.c
===================================================================
--- dapl/openib/dapl_ib_util.c	(revision 9106)
+++ dapl/openib/dapl_ib_util.c	(working copy)
@@ -446,6 +446,7 @@
 		return(dapl_convert_errno(errno,"ib_query_hca"));
 
 	if (ia_attr != NULL) {
+		(void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
 		ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
 		ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
 		ia_attr->ia_address_ptr = 
@@ -470,7 +471,12 @@
 		/* ia_attr->hardware_version_minor   = dev_attr.fw_ver; */
 		ia_attr->max_eps                  = dev_attr.max_qp;
 		ia_attr->max_dto_per_ep           = dev_attr.max_qp_wr;
-		ia_attr->max_rdma_read_per_ep     = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_in         = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_out        = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_per_ep_in  = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_per_ep_in_guaranteed  = DAT_TRUE;
+		ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE;
 		ia_attr->max_evds                 = dev_attr.max_cq;
 		ia_attr->max_evd_qlen             = dev_attr.max_cqe;
 		ia_attr->max_iov_segments_per_dto = dev_attr.max_sge;
@@ -501,6 +507,7 @@
 	}
 	
 	if (ep_attr != NULL) {
+		(void) dapl_os_memzero(ep_attr, sizeof(*ep_attr));
 		ep_attr->max_mtu_size     = port_attr.max_msg_sz;
 		ep_attr->max_rdma_size    = port_attr.max_msg_sz;
 		ep_attr->max_recv_dtos    = dev_attr.max_qp_wr;
Index: dapl/openib_cma/dapl_ib_util.c
===================================================================
--- dapl/openib_cma/dapl_ib_util.c	(revision 9106)
+++ dapl/openib_cma/dapl_ib_util.c	(working copy)
@@ -424,6 +424,7 @@
 		return(dapl_convert_errno(errno,"ib_query_hca"));
 
 	if (ia_attr != NULL) {
+		(void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
 		ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
 		ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
 		ia_attr->ia_address_ptr = 
@@ -446,6 +447,8 @@
 		ia_attr->hardware_version_major = dev_attr.hw_ver;
 		ia_attr->max_eps                  = dev_attr.max_qp;
 		ia_attr->max_dto_per_ep           = dev_attr.max_qp_wr;
+		ia_attr->max_rdma_read_in         = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_out        = dev_attr.max_qp_rd_atom;
 		ia_attr->max_rdma_read_per_ep_in  = dev_attr.max_qp_rd_atom;
 		ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
 		ia_attr->max_rdma_read_per_ep_in_guaranteed  = DAT_TRUE;
@@ -481,6 +484,7 @@
 	}
 	
 	if (ep_attr != NULL) {
+		(void) dapl_os_memzero(ep_attr, sizeof(*ep_attr));
 		ep_attr->max_mtu_size     = port_attr.max_msg_sz;
 		ep_attr->max_rdma_size    = port_attr.max_msg_sz;
 		ep_attr->max_recv_dtos    = dev_attr.max_qp_wr;
Index: dapl/openib_scm/dapl_ib_util.c
===================================================================
--- dapl/openib_scm/dapl_ib_util.c	(revision 9106)
+++ dapl/openib_scm/dapl_ib_util.c	(working copy)
@@ -373,6 +373,7 @@
 		return(dapl_convert_errno(errno,"ib_query_hca"));
 
 	if (ia_attr != NULL) {
+		(void) dapl_os_memzero(ia_attr, sizeof(*ia_attr));
 		ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
 		ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
 		ia_attr->ia_address_ptr = (DAT_IA_ADDRESS_PTR)&hca_ptr->hca_address;
@@ -390,7 +391,12 @@
 		/* ia_attr->hardware_version_minor   = dev_attr.fw_ver; */
 		ia_attr->max_eps                  = dev_attr.max_qp;
 		ia_attr->max_dto_per_ep           = dev_attr.max_qp_wr;
-		ia_attr->max_rdma_read_per_ep     = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_in         = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_out        = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_per_ep_in  = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom;
+		ia_attr->max_rdma_read_per_ep_in_guaranteed  = DAT_TRUE;
+		ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE;
 		ia_attr->max_evds                 = dev_attr.max_cq;
 		ia_attr->max_evd_qlen             = dev_attr.max_cqe;
 		ia_attr->max_iov_segments_per_dto = dev_attr.max_sge;
@@ -422,6 +428,7 @@
 	}
 	
 	if (ep_attr != NULL) {
+		(void) dapl_os_memzero(ep_attr, sizeof(*ep_attr));
 		ep_attr->max_mtu_size     = port_attr.max_msg_sz;
 		ep_attr->max_rdma_size    = port_attr.max_msg_sz;
 		ep_attr->max_recv_dtos    = dev_attr.max_qp_wr;


From rjwalsh at pathscale.com  Tue Sep  5 14:30:05 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 05 Sep 2006 14:30:05 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com>
References: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com>
Message-ID: <44FDEC5D.8050508@pathscale.com>

Arlin Davis wrote:
> Robert,
> 
> Here is a slightly modified patch for your attributes issue. Can you give it a try?
> 

I'll give it a spin this afternoon: it looks quite a bit more
comprehensive than the small patch I did.

Regards,
 Robert.


From mvharish at gmail.com  Tue Sep  5 14:49:27 2006
From: mvharish at gmail.com (harish)
Date: Tue, 5 Sep 2006 14:49:27 -0700
Subject: [openib-general] Question about interrupt generation
Message-ID: <a33d0a9f0609051449x79e6c4bidc41fc5031a86d76@mail.gmail.com>

Hi All,

I tried the following simple experiment and am not able to understand the
results:

I calculated the number of interrupts generated by the InfiniBand adapter
[with little or no traffic to the NIC] over a period of 10 seconds and saw
around 10-20 interrupts/sec. I then ran a netperf test and saw around 100K+
interrupts/sec.  This screwed up my host machine. To reduce the impact of
the interrupts, I added a callback, scheduled to run every few microseconds,
that masks the IRQ line used by the NIC and a little later unmasks it.
With that in place and no traffic, I see anywhere between 30-50K
interrupts/sec. With the netperf traffic, I see around 120K+
interrupts/sec.
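
For reference, a rate like the ones quoted above can be derived from two
samples of a per-IRQ counter taken a known interval apart. The following
is a minimal sketch in C, assuming the counters come from a line of
/proc/interrupts in the usual "IRQ: count count ... controller device"
layout. The helper names sum_irq_line() and irq_rate() and the sample
line are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts style line, e.g.
 *   " 16:   1000   2000   IO-APIC-level  ib_mthca"   (made-up sample)
 * Parsing stops at the first field that is not a decimal number.
 * Returns 0 on success, -1 if the line has no "IRQ:" prefix.
 */
static int sum_irq_line(const char *line, unsigned long long *total)
{
    const char *p = strchr(line, ':');
    char *end;

    if (p == NULL)
        return -1;
    p++;                                  /* skip the ':' */
    *total = 0;
    for (;;) {
        unsigned long long v = strtoull(p, &end, 10);
        if (end == p)                     /* no digits: counters are done */
            break;
        *total += v;
        p = end;
    }
    return 0;
}

/* Interrupts per second between two samples taken 'seconds' apart. */
static double irq_rate(unsigned long long before,
                       unsigned long long after, double seconds)
{
    assert(seconds > 0.0 && after >= before);
    return (double)(after - before) / seconds;
}
```

Sampling the same line before and after a 10 second sleep and passing
both totals to irq_rate() gives a figure comparable to the ones above.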

I am a newbie to InfiniBand technology and so do not understand why so many
interrupts are generated when my callback is called periodically. Could it
be that the InfiniBand adapter supports MSI? Or is what I am seeing IPIs?
Or does InfiniBand generate interrupts based on types of events, and is
masking/unmasking the interrupt line one such event?
Any information/suggestions would be useful.

Thanks in advance,
harish

From robert.j.woodruff at intel.com  Tue Sep  5 14:51:37 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Tue, 5 Sep 2006 14:51:37 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C8DEB9A@orsmsx418.amr.corp.intel.com>

Robert Walsh wrote,
>I'll give it a spin this afternoon: it looks quite a bit more
>comprehensive than the small patch I did.

I also just tried running the ib_rdma_bw test and it seems to
be flaky if you stress it. If you just run the defaults, it seems to
work, but if you crank up the iterations and the message size,
it sometimes fails with.....

[woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12
4730: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 |
iters=10000 | duplex=0 | cma=0 |
4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400
VAddr 0x00002a95dd3480
4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500
VAddr 0x00002a95c85480
4730:main: Completion with error at client:
4730:main: Failed status 9: wr_id 3
4730:main: scnt=7584, ccnt=6584
[woody at rkl-13 bin]$  
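
The numeric status in the output above can be read against the work
completion status enum in libibverbs. The sketch below is only an
illustration: the table is a hand-copied excerpt of the first entries of
enum ibv_wc_status (worth checking against the installed verbs.h), and
wc_status_name() and outstanding() are hypothetical helpers, not part of
ib_rdma_bw.

```c
#include <assert.h>
#include <stddef.h>

/* First entries of enum ibv_wc_status, in declaration order (excerpt). */
static const char *const wc_status_names[] = {
    "IBV_WC_SUCCESS",            /* 0 */
    "IBV_WC_LOC_LEN_ERR",        /* 1 */
    "IBV_WC_LOC_QP_OP_ERR",      /* 2 */
    "IBV_WC_LOC_EEC_OP_ERR",     /* 3 */
    "IBV_WC_LOC_PROT_ERR",       /* 4 */
    "IBV_WC_WR_FLUSH_ERR",       /* 5 */
    "IBV_WC_MW_BIND_ERR",        /* 6 */
    "IBV_WC_BAD_RESP_ERR",       /* 7 */
    "IBV_WC_LOC_ACCESS_ERR",     /* 8 */
    "IBV_WC_REM_INV_REQ_ERR",    /* 9 */
    "IBV_WC_REM_ACCESS_ERR",     /* 10 */
    "IBV_WC_REM_OP_ERR",         /* 11 */
    "IBV_WC_RETRY_EXC_ERR",      /* 12 */
    "IBV_WC_RNR_RETRY_EXC_ERR",  /* 13 */
};

/* Map a numeric status to its enum name; values past the excerpt
 * (or negative ones) come back as "unknown". */
static const char *wc_status_name(int status)
{
    size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

    if (status < 0 || (size_t)status >= n)
        return "unknown";
    return wc_status_names[status];
}

/* Sends posted but not yet completed: scnt minus ccnt in the output. */
static int outstanding(int scnt, int ccnt)
{
    assert(scnt >= ccnt);
    return scnt - ccnt;
}
```

Under that reading, status 9 would be IBV_WC_REM_INV_REQ_ERR, and
scnt=7584 with ccnt=6584 leaves 1000 sends outstanding, which matches the
tx_depth=1000 shown in the same output.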

woody


From rdreier at cisco.com  Tue Sep  5 14:57:29 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 05 Sep 2006 14:57:29 -0700
Subject: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/:
 possible cleanups
In-Reply-To: <20060904170350.GR4416@stusta.de> (Adrian Bunk's message of
	"Mon, 4 Sep 2006 19:03:50 +0200")
References: <20060901015818.42767813.akpm@osdl.org>
	<20060904170350.GR4416@stusta.de>
Message-ID: <adalkoydliu.fsf@cisco.com>

Thanks, I've rolled this up in the amso1100 patch I have queued up.

 > - #if 0 the following unused global function:
 >  - c2_mq.c: c2_mq_count()

Tom/Steve, any reason to keep c2_mq_count() at all?

 - R.


From rdreier at cisco.com  Tue Sep  5 14:58:56 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 05 Sep 2006 14:58:56 -0700
Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding
 device during destruction
In-Reply-To: <000101c6ce16$b6057090$e598070a@amr.corp.intel.com> (Sean
	Hefty's message of "Fri, 1 Sep 2006 15:33:55 -0700")
References: <000101c6ce16$b6057090$e598070a@amr.corp.intel.com>
Message-ID: <adahczmdlgf.fsf@cisco.com>

Thanks, queued for 2.6.19.


From ardavis at ichips.intel.com  Tue Sep  5 15:07:59 2006
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Tue, 05 Sep 2006 15:07:59 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <44FDEC5D.8050508@pathscale.com>
References: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com>
	<44FDEC5D.8050508@pathscale.com>
Message-ID: <44FDF53F.3040601@ichips.intel.com>

Robert Walsh wrote:

>Arlin Davis wrote:
>>Robert,
>>
>>Here is a slightly modified patch for your attributes issue. Can you give it a try?
>
>I'll give it a spin this afternoon: it looks quite a bit more
>comprehensive than the small patch I did.
>
>Regards,
> Robert.

Just added all appropriate RDMA in/out fields and some code to zero out 
the structure to avoid uninitialized data fields.
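
The zero-then-fill pattern described here can be sketched in a few lines
of plain C. This is an illustration only: struct demo_ia_attr,
DEMO_NAME_MAX and demo_query() are hypothetical stand-ins, not the real
DAT_IA_ATTR or the dapl_ib_util.c code, and memset() stands in for
dapl_os_memzero().

```c
#include <assert.h>
#include <string.h>

#define DEMO_NAME_MAX 8            /* stands in for DAT_NAME_MAX_LENGTH */

struct demo_ia_attr {              /* hypothetical, not the real DAT_IA_ATTR */
    char adapter_name[DEMO_NAME_MAX];
    int  max_rdma_read_per_ep_in;
    int  max_rdma_read_per_ep_out;
    int  some_unset_field;         /* a field the provider never fills in */
};

/* Fill a caller-supplied attribute struct from device limits. */
static void demo_query(struct demo_ia_attr *attr, const char *name,
                       int max_rd_atom)
{
    assert(attr != NULL && name != NULL);

    /* Zero everything first so fields that are not set below read as 0
     * instead of whatever happened to be in the caller's memory. */
    memset(attr, 0, sizeof(*attr));

    /* Copy at most DEMO_NAME_MAX - 1 bytes; the last byte stays '\0'
     * because of the memset above. */
    strncpy(attr->adapter_name, name, DEMO_NAME_MAX - 1);
    attr->max_rdma_read_per_ep_in  = max_rd_atom;
    attr->max_rdma_read_per_ep_out = max_rd_atom;
}
```

A caller that passes in a stack variable then sees well-defined values
in every field, including the ones the provider does not know about.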

-arlin


From rjwalsh at pathscale.com  Tue Sep  5 15:13:25 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 05 Sep 2006 15:13:25 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <44FDF53F.3040601@ichips.intel.com>
References: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com>
	<44FDEC5D.8050508@pathscale.com> <44FDF53F.3040601@ichips.intel.com>
Message-ID: <44FDF685.8020205@pathscale.com>

> Just added all appropriate RDMA in/out fields and some code to zero out
> the structure to avoid uninitialized data fields.

Yup.  By "comprehensive", I meant "better" :-)


From swise at opengridcomputing.com  Tue Sep  5 15:14:25 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 05 Sep 2006 17:14:25 -0500
Subject: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/:
 possible cleanups
In-Reply-To: <adalkoydliu.fsf@cisco.com>
References: <20060901015818.42767813.akpm@osdl.org>
	<20060904170350.GR4416@stusta.de> <adalkoydliu.fsf@cisco.com>
Message-ID: <1157494465.9086.45.camel@stevo-desktop>


It's old debug code that isn't used anywhere.  It would be nice to keep
it around, but if you really don't want it, nuke it...


On Tue, 2006-09-05 at 14:57 -0700, Roland Dreier wrote:
> Thanks, I've rolled this up in the amso1100 patch I have queued up.
> 
>  > - #if 0 the following unused global function:
>  >  - c2_mq.c: c2_mq_count()
> 
> Tom/Steve, any reason to keep c2_mq_count() at all?
> 
>  - R.


From rjwalsh at pathscale.com  Tue Sep  5 15:33:38 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 05 Sep 2006 15:33:38 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com>
References: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com>
Message-ID: <44FDFB42.3040305@pathscale.com>

Arlin Davis wrote:
> Robert,
> 
> Here is a slightly modified patch for your attributes issue. Can you give it a try?

Oddly enough, I'm back to the same problem with your new patch as I saw
with the unpatched version:

  $ mpiexec -n 2 ./a.out
  I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from
registry: OpenIB-cma
  I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from
registry: OpenIB-cma
  I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use
rdma configuration
  will use rdma configuration
  [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma:
could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
  Hello world: rank 0 of 2 running on ib-idev-05
  rank 1 in job 1  ib-idev-05_51891   caused collective abort of all ranks
    exit status of rank 1: killed by signal 9

Still tracking this one down.  I noticed in the patch you removed a
couple of lines, too:

  - ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;

Any particular reason why you did this?

Regards,
 Robert.


From rdreier at cisco.com  Tue Sep  5 15:39:50 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 05 Sep 2006 15:39:50 -0700
Subject: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/:
 possible cleanups
In-Reply-To: <1157494465.9086.45.camel@stevo-desktop> (Steve Wise's
	message of "Tue, 05 Sep 2006 17:14:25 -0500")
References: <20060901015818.42767813.akpm@osdl.org>
	<20060904170350.GR4416@stusta.de> <adalkoydliu.fsf@cisco.com>
	<1157494465.9086.45.camel@stevo-desktop>
Message-ID: <ada8xkydjk9.fsf@cisco.com>

    Steve> It's old debug code that isn't used anywhere.  It would be
    Steve> nice to keep it around, but if you really don't want it,
    Steve> nuke it...

No, that's fine, I'll leave it inside the #if 0.

 - R.


From arlin.r.davis at intel.com  Tue Sep  5 15:51:46 2006
From: arlin.r.davis at intel.com (Arlin Davis)
Date: Tue, 5 Sep 2006 15:51:46 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <44FDFB42.3040305@pathscale.com>
Message-ID: <000101c6d13d$dd9975f0$bb97070a@amr.corp.intel.com>


>Oddly enough, I'm back to the same problem with your new patch as I saw
>with the unpatched version:
 
Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your adapter and it worked.

Did you ever pick up the Intel MPI 3.0 beta?

>
>  $ mpiexec -n 2 ./a.out
>  I_MPI: [1] MPIDI_CH3I_RDMA_init(): will use DAPL provider from
>registry: OpenIB-cma
>  I_MPI: [0] MPIDI_CH3I_RDMA_init(): will use DAPL provider from
>registry: OpenIB-cma
>  I_MPI: [0] MPIDI_CH3_Init(): I_MPI: [1] MPIDI_CH3_Init(): will use
>rdma configuration
>  will use rdma configuration
>  [1:ib-idev-06][rdma_iba_init_d.c:154] error(0x60029): OpenIB-cma:
>could not create DAPL endpoint: DAT_INVALID_PARAMETER(DAT_INVALID_ARG6)
>  Hello world: rank 0 of 2 running on ib-idev-05
>  rank 1 in job 1  ib-idev-05_51891   caused collective abort of all ranks
>    exit status of rank 1: killed by signal 9
>
>Still tracking this one down.  I noticed in the patch you removed a
>couple of lines, too:
>
>  - ia_attr->max_rdma_read_per_ep = dev_attr.max_qp_rd_atom;
>
>Any particular reason why you did this?

max_rdma_read_per_ep is the same as max_rdma_read_per_ep_in. 

Look at dat.h line #369

/* To support backwards compatibility for DAPL-1.0 */
#define max_rdma_read_per_ep 		max_rdma_read_per_ep_in
#define DAT_IA_FIELD_IA_MAX_DTO_PER_OP  DAT_IA_FIELD_IA_MAX_DTO_PER_EP_IN

/* To support backwards compatibility for DAPL-1.0 & DAPL-1.1 */
#define max_mtu_size max_message_size
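
The effect of such an alias macro can be shown with a small standalone
sketch. struct demo_attr and set_legacy() are hypothetical and exist only
for this illustration; the #define mirrors the one quoted from dat.h
above.

```c
#include <assert.h>
#include <stddef.h>

struct demo_attr {                 /* hypothetical, not the real DAT_IA_ATTR */
    int max_rdma_read_per_ep_in;
    int max_rdma_read_per_ep_out;
};

/* Same shape as the backwards compatibility alias quoted above. */
#define max_rdma_read_per_ep max_rdma_read_per_ep_in

/* Code written against the old field name still compiles ... */
static void set_legacy(struct demo_attr *a, int v)
{
    assert(a != NULL);
    /* ... because the preprocessor rewrites this to
     * a->max_rdma_read_per_ep_in = v; */
    a->max_rdma_read_per_ep = v;
}
```

So an assignment through the old name and one through
max_rdma_read_per_ep_in write the same member, and keeping both lines
would set it twice.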


-arlin


From rjwalsh at pathscale.com  Tue Sep  5 16:07:24 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 05 Sep 2006 16:07:24 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <000101c6d13d$dd9975f0$bb97070a@amr.corp.intel.com>
References: <000101c6d13d$dd9975f0$bb97070a@amr.corp.intel.com>
Message-ID: <44FE032C.4010108@pathscale.com>

>> Oddly enough, I'm back to the same problem with your new patch as I saw
>> with the unpatched version:
>  
> Hmmm. We ran this with OFED 1.1 RC3 and MPI 3.0b on an EM64T server with your adapter and it worked.

Weird - it's not working for me at all.  Maybe I'm messing up somewhere.
 I've got a meeting for the next hour or so - I'll check again when I
get back.

> Did you ever pick up the Intel MPI 3.0 beta?

Yup.

> max_rdma_read_per_ep is the same as max_rdma_read_per_ep_in. 

Ah - fair enough.


From bugzilla-daemon at openib.org  Tue Sep  5 16:22:29 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue,  5 Sep 2006 16:22:29 -0700 (PDT)
Subject: [openib-general] [Bug 218] New: Call usage verifier is detecting
	reinitialization of spinlocks already in use
Message-ID: <20060905232229.F083B2283D4@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=218

           Summary: Call usage verifier is detecting reinitialization of
                    spinlocks already in use
           Product: OpenFabrics Windows
           Version: unspecified
          Platform: X86
        OS/Version: Other
            Status: NEW
          Severity: major
          Priority: P2
         Component: mthca driver
        AssignedTo: bugzilla at openib.org
        ReportedBy: jbottorff at xsigo.com


I built a debug version of revision 467 and turned on call usage verifier (CUV)
for the mthca driver. It's detecting many cases of spinlocks being initialized
after they have already been used. This is usually bad. To build with CUV all
you have to do is add the following line to the sources file.

VERIFIER_DDK_EXTENSIONS=1

My experience is CUV tends to detect a different set of bugs from driver
verifier, and it might be useful to turn on CUV for all the drivers and see
what's reported.
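
One common way to avoid this class of report is to separate one-time lock
initialization from the state reset that may run many times. The sketch
below illustrates that idea only. It is not the mthca code and not a
proposed fix; demo_lock, demo_wq and their functions are hypothetical,
and demo_lock_init() refusing a second call simply mimics what a verifier
would flag.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_lock {                 /* hypothetical stand-in for a spinlock */
    bool initialized;
    bool held;
};

/* Returns false when asked to initialize a lock that is already live. */
static bool demo_lock_init(struct demo_lock *l)
{
    if (l->initialized)
        return false;
    l->initialized = true;
    l->held = false;
    return true;
}

struct demo_wq {                   /* hypothetical work queue */
    struct demo_lock lock;
    unsigned head;
    unsigned tail;
};

/* Called once per object lifetime: the only place the lock is set up. */
static bool demo_wq_create(struct demo_wq *wq)
{
    wq->lock.initialized = false;
    wq->head = 0;
    wq->tail = 0;
    return demo_lock_init(&wq->lock);
}

/* May be called on every state transition: resets the indices and
 * leaves the already initialized lock alone. */
static void demo_wq_reset(struct demo_wq *wq)
{
    assert(wq->lock.initialized);
    wq->head = 0;
    wq->tail = 0;
}
```

With this split, a path that is reached repeatedly would call only
demo_wq_reset(), so the lock is never initialized after first use.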

CUV Driver Error: Calling KeInitializeSpinLock(...) at File
k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_spinlock.h, Line 57
                  The Spin lock specified as parameter 1 [0x87840EDC]
                  has been previously initialized and used as
                  a In-Stack Queued Spin lock by this driver.
Break, Ignore, Zap, Remove, Disable all, H for help (bizrdh)? b
b
Breaking in... (press g<enter> to return to assert menu)
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPoint:
8075cc00 cc               int     3
0: kd> k 50
ChildEBP RetAddr  
f7926438 baeab189 nt!DbgBreakPoint
f7926450 baeaa814 mthca!DDKExtPrompt+0x10a
[d:\dnsrv\sdktools\ddk\ddk_ext\verifier\messages.cpp @ 709]
f7926468 baea990e mthca!DDKExtVInitializeItem+0x98
[d:\dnsrv\sdktools\ddk\ddk_ext\verifier\validate.cpp @ 195]
f7926490 bae81635 mthca!DDK_KeInitializeSpinLock+0x35
[d:\dnsrv\sdktools\ddk\ddk_ext\verifier\locks.cpp @ 298]
f79264a4 baea42ee mthca!spin_lock_init+0x15
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_spinlock.h @ 58]
f79264b0 baea4057 mthca!mthca_wq_init+0xe
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\mthca_qp.c @ 383]
f792653c bae7eaac mthca!mthca_modify_qp+0xe97
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\mthca_qp.c @ 853]
f7926550 bae76eaa mthca!ibv_modify_qp+0x1c
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\mt_verbs.c @ 467]
f7926628 ba99e0f3 mthca!mlnx_modify_qp+0x11a
[k:\windows-openib\src\winib-467b\hw\mthca\kernel\hca_verbs.c @ 955]
f792673c ba99df12 ibbus!al_modify_qp+0x113
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1346]
f7926760 ba99d7b8 ibbus!modify_qp+0x502
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1313]
f7926778 ba99eef5 ibbus!ib_modify_qp+0x18
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1288]
f7926848 ba99ec9e ibbus!init_dgrm_svc+0x175
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1453]
f7926870 ba96d005 ibbus!ib_init_dgrm_svc+0x73e
[k:\windows-openib\src\winib-467b\core\al\al_qp.c @ 1395]
f7926c4c ba969fd8 ibbus!create_spl_qp_svc+0x18a5
[k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 718]
f7926c78 ba969a45 ibbus!spl_qp_agent_pnp+0x128
[k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 476]
f7926c8c ba98f071 ibbus!spl_qp0_agent_pnp_cb+0x95
[k:\windows-openib\src\winib-467b\core\al\kernel\al_smi.c @ 429]
f7926cf4 ba98f2e8 ibbus!__pnp_notify_user+0x561
[k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 523]
f7926d38 ba990e7c ibbus!__pnp_port_notify+0x118
[k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 612]
f7926d70 ba94d8a4 ibbus!__pnp_process_add_ca+0x2dc
[k:\windows-openib\src\winib-467b\core\al\kernel\al_pnp.c @ 943]
f7926d8c ba953b94 ibbus!__cl_async_proc_worker+0x94
[k:\windows-openib\src\winib-467b\core\complib\cl_async_proc.c @ 153]
f7926da0 ba955c4c ibbus!__cl_thread_pool_routine+0x54
[k:\windows-openib\src\winib-467b\core\complib\cl_threadpool.c @ 67]
f7926dac 80a07678 ibbus!__thread_callback+0x2c
[k:\windows-openib\src\winib-467b\core\complib\kernel\cl_thread.c @ 49]
f7926ddc 80781346 nt!PspSystemThreadStartup+0x2e
00000000 00000000 nt!KiThreadStartup+0x16
0: kd> g


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mvharish at gmail.com  Tue Sep  5 16:53:10 2006
From: mvharish at gmail.com (harish)
Date: Tue, 5 Sep 2006 16:53:10 -0700
Subject: [openib-general] Question about interrupt generation
In-Reply-To: <a33d0a9f0609051449x79e6c4bidc41fc5031a86d76@mail.gmail.com>
References: <a33d0a9f0609051449x79e6c4bidc41fc5031a86d76@mail.gmail.com>
Message-ID: <a33d0a9f0609051653q6223cebcrf5719c8f1838672f@mail.gmail.com>

Hi,
One more question. What kind of event mask helps mask the interrupts?
thanks
harish

On 9/5/06, harish <mvharish at gmail.com> wrote:
>
> Hi All,
>
> I tried the following simple experiment and am not able to understand the
> results:
>
> I calculated the number of interrupts generated by the InfiniBand adapter
> [with little or no traffic to the NIC] over a period of 10 seconds and saw
> around 10-20 interrupts/sec. I then ran a netperf test and saw around 100K+
> interrupts/sec.  This screwed up my host machine. To reduce the impact of
> the interrupts, I added a callback, scheduled to run every few microseconds,
> that masks the IRQ line used by the NIC and a little later unmasks it.
> With that in place and no traffic, I see anywhere between 30-50K
> interrupts/sec. With the netperf traffic, I see around 120K+
> interrupts/sec.
>
> I am a newbie to InfiniBand technology and so do not understand why so many
> interrupts are generated when my callback is called periodically. Could it
> be that the InfiniBand adapter supports MSI? Or is what I am seeing IPIs?
> Or does InfiniBand generate interrupts based on types of events, and is
> masking/unmasking the interrupt line one such event?
> Any information/suggestions would be useful.
>
> Thanks in advance,
> harish
>

From rjwalsh at pathscale.com  Tue Sep  5 17:45:45 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 05 Sep 2006 17:45:45 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691C8DEB9A@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691C8DEB9A@orsmsx418.amr.corp.intel.com>
Message-ID: <44FE1A39.3060108@pathscale.com>

Woodruff, Robert J wrote:
> Robert Walsh wrote,
>> I'll give it a spin this afternoon: it looks quite a bit more
>> comprehensive than the small patch I did.
> 
> I also just tried running the ib_rdma_bw test and it seems to
> be flaky if you stress it. If you just run the defaults, it seems to
> work, but if you crank up the iterations and the message size,
> it sometimes fails with.....
> 
> [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12
> 4730: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 |
> iters=10000 | duplex=0 | cma=0 |
> 4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400
> VAddr 0x00002a95dd3480
> 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500
> VAddr 0x00002a95c85480
> 4730:main: Completion with error at client:
> 4730:main: Failed status 9: wr_id 3
> 4730:main: scnt=7584, ccnt=6584
> [woody at rkl-13 bin]$  

This looks like a known bug, the fix to which didn't make it into OFED
1.1-RC3.  Hopefully we can still get this into 1.1-RC4.

Regards,
 Robert.


From rjwalsh at pathscale.com  Tue Sep  5 18:21:18 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 05 Sep 2006 18:21:18 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com>
References: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com>
Message-ID: <44FE228E.9050402@pathscale.com>

> Here is a slightly modified patch for your attributes issue. Can you give it a try?

I rebuilt OFED from scratch with the patch, and ran successfully on
Intel MPI 2.0.1 with the refresh patch.  I could not get it to run on
Intel MPI 3.0b.  If you could verify that the fix you mentioned that is
in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it.
If you have a later beta version you could send me, that would be great,
too.

Regards,
 Robert.


From sweitzen at cisco.com  Tue Sep  5 21:57:30 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Tue, 5 Sep 2006 21:57:30 -0700
Subject: [openib-general] [openfabrics-ewg] OFED 1.1-rc2 is ready (how
 do I enable madeye)?
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023884A4@xmb-sjc-216.amer.cisco.com>

> 5. Added Madeye utility

How do I build madeye?  I don't see any reference to it in install.sh.
Is there any documentation for madeye?

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

From bugzilla-daemon at openib.org  Tue Sep  5 23:39:21 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Tue,  5 Sep 2006 23:39:21 -0700 (PDT)
Subject: [openib-general] [Bug 218] Call usage verifier is detecting
	reinitialization of spinlocks already in use
Message-ID: <20060906063921.1E4AC2283D8@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=218


tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org         |rolandd at cisco.com


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From moshek at voltaire.com  Wed Sep  6 02:26:47 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Wed, 6 Sep 2006 12:26:47 +0300
Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint
 question
Message-ID: <D4F8F0B3820E754C887699BEF26A8940EB8563@taurus.voltaire.com>

I have tested the mstflint problem with two different ppc64 machines :

- On sles 10 PPC64 PowerMac G5  ->  mstflint -d 0001:07:00.0 q    works
o.k. with and without the ib_mthca loaded

- On sles10 PPC64 IBM JS21   ->  mstflint -d 0001:07:00.0 q    DOESN'T
work with and without the ib_mthca loaded and I have to use
/proc/bus/pci/.....

Is it time to create a workaround that opens /proc/bus/pci/ .... and
always works?

Moshe

____________________________________________________________
Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
 
Voltaire - The Grid Backbone
 
 www.voltaire.com

  
-----Original Message-----
From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Michael S.
Tsirkin
Sent: Tuesday, September 05, 2006 4:37 PM
To: Or Gerlitz
Cc: Roland Dreier; openib-general at openib.org
Subject: Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB -
mstflint question


Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: Re: getting LOC_QP_OP_ERR with IPoIB - mstflint question
> 
> Michael S. Tsirkin wrote:
> > Donnu, it looks really weird. Could you try firmware 3.5.0 please?
> 
> I just noted that you can not work with mstflint if the mthca driver is
> not loaded, i think it was not the case in the gen1 tools, am i correct.

Yes, recent kernels disable device access once driver is unloaded:

mstflint -d 08:00.0 q
*** ERROR *** Read a corrupted device id (0xffff). Probably HW/PCI access problem
*** ERROR *** Device type 65535 not supported.
*** ERROR *** Can not get flash type using device 08:00.0

mstflint should work without driver using /proc:
mstflint -d /proc/bus/pci/08/00.0 q
Image type:      Failsafe
I.S. Version:    1
Chip Revision:   A0


In gen1 flint had a separate driver which you had to load.
I am not sure whether this would work on 2.6.18
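
For illustration, here is a minimal C sketch of what the 0xffff above means. It assumes only the standard PCI configuration space layout (vendor ID in bytes 0-1, device ID in bytes 2-3, little-endian) and that a file such as /proc/bus/pci/08/00.0 exposes that space. The function names are hypothetical and this is not how mstflint itself is implemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode vendor and device ID from the first four bytes of PCI
 * configuration space. Both fields are 16-bit little-endian.
 * Returns 0 on success, -1 if fewer than four bytes are available. */
static int pci_ids_from_config(const unsigned char *cfg, size_t len,
			       uint16_t *vendor, uint16_t *device)
{
	if (!cfg || len < 4)
		return -1;
	*vendor = (uint16_t)(cfg[0] | (cfg[1] << 8));
	*device = (uint16_t)(cfg[2] | (cfg[3] << 8));
	return 0;
}

/* A read from a device that does not respond comes back as all ones,
 * so an ID of 0xffff is treated as "no usable access". */
static int pci_id_is_valid(uint16_t id)
{
	return id != 0xffff;
}

/* Read the IDs from a config-space file, e.g. /proc/bus/pci/08/00.0.
 * Returns 0 on success, -1 on open failure or short read. */
static int pci_ids_from_file(const char *path,
			     uint16_t *vendor, uint16_t *device)
{
	unsigned char cfg[4];
	size_t n;
	FILE *fp = fopen(path, "rb");

	if (!fp)
		return -1;
	n = fread(cfg, 1, sizeof(cfg), fp);
	fclose(fp);
	return pci_ids_from_config(cfg, n, vendor, device);
}
```

Calling pci_ids_from_file() on the /proc path and checking the device ID with pci_id_is_valid() would be a quick way to tell a dead access path from a real flash problem.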

> Is this connected to this print
> 
> 	ACPI: PCI interrupt for device 0000:02:00.0 disabled
> 
> i see once the mthca driver is unloaded?
> 
> Or.

Probably not.

-- 
MST

_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general


From tziporet at dev.mellanox.co.il  Wed Sep  6 04:23:29 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Wed, 06 Sep 2006 14:23:29 +0300
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <44FE228E.9050402@pathscale.com>
References: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com>
	<44FE228E.9050402@pathscale.com>
Message-ID: <44FEAFB1.3040902@dev.mellanox.co.il>

Robert Walsh wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>   
>> Here is a slightly modified patch for your attributes issue. Can you give it a try?
>>     
>
> I rebuilt OFED from scratch with the patch, and ran successfully on
> Intel MPI 2.0.1 with the refresh patch.  I could not get it to run on
> Intel MPI 3.0b.  If you could verify that the fix you mentioned that is
> in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it.
> If you have a later beta version you could send me, that would be great,
> too.
>
> Regards,
>  Robert.
>   
I added this patch under fixes to OFED 1.1. Will be in RC4

Tziporet


From tzachid at mellanox.co.il  Wed Sep  6 04:37:17 2006
From: tzachid at mellanox.co.il (Tzachi Dar)
Date: Wed, 6 Sep 2006 14:37:17 +0300
Subject: [openib-general] [Openib-windows] File transfer performance
	options
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302D8674E@mtlexch01.mtl.com>

Hi Paul,

In the beginning of this mail thread you have described a problem of
passing files from a Linux server to windows server. You have described
many experiments that you did and the fact that the performance that you
received was not as good as expected.

In reply I advised you to consider using SDP for these file
transfers. To summarize your answer in one sentence, you said that SDP
is still not ready. I would have loved to tell you that SDP is ready,
but unfortunately the Windows SDP is not a product yet. However, it is
mature enough to start doing some measurements, which is what I did.

I changed a simple benchmark program that I had so that it also writes its
data to disk. As a disk I used AMT Ramdisk (512 MB). I ran two
instances of this program and got 578 MB/sec, which is
considerably higher than the results you achieved in your other
experiments (one client gave me 450 MB/sec).
Please note that since data is being copied 3 times in this scenario, we
are standing near the theoretical speed of the machine (one copy is from
the HCA to the kernel buffer, another is from the kernel buffer to the
application buffer, and the last copy is from the application buffer to
the Ram Disk).

It is true that the development road of your application might force you
not to use SDP, as SDP is not in production right now, but if you can
wait the extra time then please note that SDP can supply the BW.

Thanks
Tzachi

> -----Original Message-----
> From: Paul Baxter [mailto:paul.baxter at dsl.pipex.com] 
> Sent: Friday, September 01, 2006 1:11 AM
> To: openib-windows at openib.org; Tzachi Dar
> Subject: Re: [Openib-windows] File transfer performance options
> 
> >
> From: "Tzachi Dar" <tzachid at mellanox.co.il> There is one 
> thing that is missing from your mail, and that is if you want 
> to see the windows machine as some file server (for example 
> SAMBA, NFS, SRP), or are you ready to accept it as a normal 
> server. The big difference is that on the second option the 
> server can be running at user mode (for example FTP server).
> <
> 
> The windows machine has to list and then choose amongst a set 
> of files from our Linux system and retrieve only relevant 
> files e.g. those whose filename relates to particular time slots.
> We prefer not to write a Linux 'client' application to do 
> this explicitly but would rather have the windows machine's 
> application access our data files directly.
> A few application-level locks are in place so that we won't 
> be writing new files to our local disks at the same time as 
> the remote archiving accesses them.
> 
> Other than that the main goal is to make the inter-OS (and 
> inter-company) interface as simple as possible. It currently 
> doesn't seem that there is a proven solution to support this 
> at any transfer rate that takes significant advantage of Infiniband.
> 
> I've specced my disks for 200 MB/s and we have DDR cards etc. 
> (for other reasons!), just no means to flex their muscles too 
> easily using existing COTS infrastructure.
> 
> >
> When (the server application is) running at user mode, SDP 
> can be used as a socket provider.  This means that 
> theoretically every socket application should run and enjoy 
> the speed of Infiniband. Currently there are two projects of 
> SDP under development: one is for Linux and the other for 
> Windows, so SDP can be used to allow machines from both types 
> to connect.
> <
> 
> The key here is 'theoretical'. IMHO, Linux-Linux and 
> Windows-Windows get a lot more testing and priority than a 
> Linux-Windows combination. (Which is fair enough if that's 
> where the market is.)
> 
> We've been burnt by this not being robustly tested and proven 
> in reality in cross-platform cases. (Note that this was 
> before the current openfabrics windows driver initiative).
> 
> >
> Performance that we have measured on the windows platform, 
> using DDR cards was bigger than 1200 MB/Sec. (of course, this 
> data was from host memory, and not from disks).
> <
> 
> We've used SDP previously in our Linux message interface and 
> were very happy with the results. Then someone included an 
> old (v9 ) Solaris machine into the mix so even before we 
> tested on Windows, we ended up using sockets/gigabit ethernet 
> for command transfers.
> 
> SDP as an option for other parts of our application (large 
> data transfers) took a big turn for the worse when the 
> previous Linux SDP implementation was mothballed without a 
> mature replacement. We've ended up writing our application to 
> use RDMA write directly now.
> 
> Note that I'm not too critical of the way SDP went away since 
> I can appreciate the need to greatly simplify the Linux SDP 
> implementation, it did leave people like me in the lurch 
> however. I really appreciate the effort put into these things 
> by Michael Tsirkin et al. and look forward to the new code in OFED 1.1
> 
> 
> I'm also not sure that cross-platform operation of 
> high-performance Infiniband is near the top of anyone's 
> agenda. Inside the windows world and inside the Linux world 
> things are looking rosy, but I'm largely stuck with IPoIB or 
> low-level verbs for cross-platform use.
> 
> SRP looks promising, but as a user, I see lots of statements 
> that this SRP initiator only works with that SRP target. 
> Support for cross-platform high-speed operation is 'coming soon'.
> 
> I'd love to know whether there has been significant testing 
> between the windows openfabrics  SRP initiator and the openIB 
> Linux SRP target? Is this on anyone's agenda. (Distinct from 
> any windows SRP 'WHQL-certification
> issues.) Is there even an 'inter-operable' standard that both 
> implementations can aspire to match?
> 
> >
> So, if all you need to do is to pass files from one side to 
> the other, I would recommend that you will check this option.
> <
> Thanks for the tip. Maybe now the dust is settling on Linux 
> SDP we may well revisit this option.
> 
> >
> One note about your experiments: when using ram disks, this 
> probably means that there is one more copy from the ram disk 
> to the application buffer. A real disk, has its DMA engine, 
> while a ram disk doesn't.
> Another copy is probably not a problem when you are talking 
> about 100MB/sec, but it would become a problem once you will 
> use SDP (I hope).
> <
> We were only using these as a sanity check that physical 
> disks weren't the cause of the bottleneck.
> 
> >
> Thanks
> Tzachi
> <
> Thanks to you, Tzachi, and everyone helping to develop robust 
> infiniband support across a range of platforms.
> 


From moshek at voltaire.com  Wed Sep  6 06:01:44 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Wed, 6 Sep 2006 16:01:44 +0300
Subject: [openib-general] problems to regiser memory as a reglar user on
 SLES9 SP3
Message-ID: <D4F8F0B3820E754C887699BEF26A8940EB8567@taurus.voltaire.com>

Hi Tziporet,

I'm trying Ofed 1.1 rc3 on IBM js21 sles9sp3 ppc64.

Install is stopped at the very beginning as 64-bit udev is missing.

I tried to compile the udev...src.rpm supplied on sles9sp3 cd3, and it
failed with a compilation error.

Did you test ofed 1.1 rc3 on ppc64? Can you advise me how to get 64-bit
udev?

Moshe 

____________________________________________________________
Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
 
Voltaire - The Grid Backbone
 
 www.voltaire.com

  
-----Original Message-----
From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Tziporet Koren
Sent: Tuesday, August 29, 2006 5:50 PM
To: OPENIB
Subject: [openib-general] problems to regiser memory as a reglar user on
SLES9 SP3


Hi All,
In testing today we found that on SLES9 SP3 memory locking as a regular
user fails, even though I changed /etc/security/limits.conf and added
the following two lines:
* soft memlock <number>
* hard memlock <number>

Note that the same change does work in SLES10.

Another change I tried (that worked in gen1) was to add the following
line to the file /etc/sysctl.conf:
vm.disable_cap_mlock=1

However, nothing helped in SLES9.
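
One way to narrow this down is to check which limit the failing process actually sees. The sketch below is only an illustration, with hypothetical helper names. It uses getrlimit() with RLIMIT_MEMLOCK and assumes a Linux-style libc. Note that getrlimit() reports bytes, while the memlock values in limits.conf are given in KB.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Fetch the soft and hard locked-memory limits of the calling process.
 * Returns 0 on success, -1 if getrlimit() fails. */
static int get_memlock_limit(rlim_t *soft, rlim_t *hard)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	*soft = rl.rlim_cur;
	*hard = rl.rlim_max;
	return 0;
}

/* Print one limit, handling the "unlimited" value explicitly. */
static void print_limit(const char *name, rlim_t value)
{
	if (value == RLIM_INFINITY)
		printf("%s memlock limit: unlimited\n", name);
	else
		printf("%s memlock limit: %llu bytes\n", name,
		       (unsigned long long)value);
}

/* Report both limits as seen by this process. */
static void print_memlock_limits(void)
{
	rlim_t soft, hard;

	if (get_memlock_limit(&soft, &hard) != 0) {
		perror("getrlimit");
		return;
	}
	print_limit("soft", soft);
	print_limit("hard", hard);
}
```

Running print_memlock_limits() from the same login session as the failing test would show whether the limits.conf values reached that session at all.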

Does anyone have any idea how to solve this?

Thanks,
Tziporet



From mst at mellanox.co.il  Wed Sep  6 06:24:48 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 6 Sep 2006 16:24:48 +0300
Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint
 question
In-Reply-To: <D4F8F0B3820E754C887699BEF26A8940EB8563@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A8940EB8563@taurus.voltaire.com>
Message-ID: <20060906132448.GA6928@mellanox.co.il>

Quoting r. Moshe Kazir <moshek at voltaire.com>:
> Is it time to create a workaround that opens /proc/bus/pci/ .... and
> always works?

But why isn't the driver loaded?

-- 
MST


From halr at voltaire.com  Wed Sep  6 06:21:25 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Sep 2006 09:21:25 -0400
Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather
 than polluting namespace
Message-ID: <1157548884.12940.5019.camel@hal.voltaire.com>

OpenSM/osm_log API: Rather than polluting the namespace with needless
symbols, use symbol versions and have a versioned osm_log_init rather
than adding osm_log_init_v2 as an additional API

This patch is intended to be applied to both trunk and 1.1 versions.

Signed-off-by: Doug Ledford <dledford at redhat.com>
Signed-off-by: Hal Rosenstock <halr at voltaire.com>

Index: osm/opensm/libopensm.map
===================================================================
--- osm/opensm/libopensm.map	(revision 9253)
+++ osm/opensm/libopensm.map	(working copy)
@@ -3,7 +3,6 @@ OPENSM_1.3 {
 		osm_log;
 		osm_is_debug;
 		osm_log_init;
-		osm_log_init_v2;
 		osm_mad_pool_construct;
 		osm_mad_pool_destroy;
 		osm_mad_pool_init;
@@ -55,3 +54,8 @@ OPENSM_1.3 {
 		osm_get_sm_mgr_state_str;
 	local: *;
 };
+
+OPENSM_1.3.1 {
+	global:
+		osm_log_init;
+} OPENSM_1.3;
Index: osm/opensm/libopensm.ver
===================================================================
--- osm/opensm/libopensm.ver	(revision 9158)
+++ osm/opensm/libopensm.ver	(working copy)
@@ -6,4 +6,4 @@
 # API_REV - advance on any added API
 # RUNNING_REV - advance any change to the vendor files
 # AGE - number of backward versions the API still supports
-LIBVERSION=2:0:1
+LIBVERSION=2:1:1
Index: osm/include/opensm/osm_log.h
===================================================================
--- osm/include/opensm/osm_log.h	(revision 9251)
+++ osm/include/opensm/osm_log.h	(working copy)
@@ -152,13 +152,13 @@ osm_log_construct(
 *	This function does not return a value.
 *
 * NOTES
-*	Allows calling osm_log_init, osm_log_init_v2, osm_log_destroy
+*	Allows calling osm_log_init, osm_log_destroy
 *
 *	Calling osm_log_construct is a prerequisite to calling any other
-*	method except osm_log_init or osm_log_init_v2.
+*	method except osm_log_init.
 *
 * SEE ALSO
-*	Log object, osm_log_init, osm_log_init_v2,
+*	Log object, osm_log_init,
 *	osm_log_destroy
 *********/
 
@@ -196,25 +196,25 @@ osm_log_destroy(
 *	Log object.
 *	Further operations should not be attempted on the destroyed object.
 *	This function should only be called after a call to
-*	osm_log_construct, osm_log_init, or osm_log_init_v2.
+*	osm_log_construct, osm_log_init.
 *
 * SEE ALSO
 *	Log object, osm_log_construct,
-*	osm_log_init, osm_log_init_v2
+*	osm_log_init
 *********/
 
-/****f* OpenSM: Log/osm_log_init_v2
+/****f* OpenSM: Log/osm_log_init
 * NAME
-*	osm_log_init_v2
+*	osm_log_init
 *
 * DESCRIPTION
-*	The osm_log_init_v2 function initializes a
+*	The osm_log_init function initializes a
 *	Log object for use.
 *
 * SYNOPSIS
 */
 ib_api_status_t
-osm_log_init_v2(
+osm_log_init(
   IN osm_log_t* const p_log,
   IN const boolean_t flush,
   IN const uint8_t log_flags,
@@ -249,27 +249,6 @@ osm_log_init_v2(
 *	osm_log_destroy
 *********/
 
-/****f* OpenSM: Log/osm_log_init
-* NAME
-*	osm_log_init
-*
-* DESCRIPTION
-*	The osm_log_init function initializes a
-*	Log object for use. It is a wrapper for osm_log_init_v2().
-*
-* SYNOPSIS
-*/
-ib_api_status_t
-osm_log_init(
-  IN osm_log_t* const p_log,
-  IN const boolean_t flush,
-  IN const uint8_t log_flags,
-  IN const char *log_file,
-  IN const boolean_t accum_log_file );
-/*
- * Same as osm_log_init_v2() but without max_size parameter
- */
-
 /****f* OpenSM: Log/osm_log_get_level
 * NAME
 *	osm_log_get_level
Index: osm/opensm/osm_log.c
===================================================================
--- osm/opensm/osm_log.c	(revision 9257)
+++ osm/opensm/osm_log.c	(working copy)
@@ -225,7 +225,7 @@ osm_is_debug(void)
 }
 
 ib_api_status_t
-osm_log_init_v2(
+osm_log_init_1_3_1(
   IN osm_log_t* const p_log,
   IN const boolean_t flush,
   IN const uint8_t log_flags,
@@ -280,13 +280,18 @@ osm_log_init_v2(
     return IB_ERROR;
 }
 
+__asm__(".symver osm_log_init_1_3_1, osm_log_init@@OPENSM_1.3.1");
+
 ib_api_status_t
-osm_log_init(
+osm_log_init_1_3(
   IN osm_log_t* const p_log,
   IN const boolean_t flush,
   IN const uint8_t log_flags,
   IN const char *log_file,
   IN const boolean_t accum_log_file )
 {
-  return osm_log_init_v2( p_log, flush, log_flags, log_file, 0, accum_log_file );
+  return osm_log_init_1_3_1( p_log, flush, log_flags, log_file, 0, accum_log_file );
 }
+
+__asm__(".symver osm_log_init_1_3, osm_log_init@OPENSM_1.3");
+
Index: osm/opensm/osm_opensm.c
===================================================================
--- osm/opensm/osm_opensm.c	(revision 9251)
+++ osm/opensm/osm_opensm.c	(working copy)
@@ -180,9 +180,9 @@ osm_opensm_init(
    /* Can't use log macros here, since we're initializing the log. */
    osm_opensm_construct( p_osm );
 
-   status = osm_log_init_v2( &p_osm->log, p_opt->force_log_flush,
-                             p_opt->log_flags, p_opt->log_file,
-                             p_opt->log_max_size, p_opt->accum_log_file );
+   status = osm_log_init( &p_osm->log, p_opt->force_log_flush,
+                          p_opt->log_flags, p_opt->log_file,
+                          p_opt->log_max_size, p_opt->accum_log_file );
    if( status != IB_SUCCESS )
       return ( status );
 
Index: osm/opensm/osm_db_files.c
===================================================================
--- osm/opensm/osm_db_files.c	(revision 9275)
+++ osm/opensm/osm_db_files.c	(working copy)
@@ -712,7 +712,7 @@ main(int argc, char **argv)
   cl_list_construct( &keys );
   cl_list_init( &keys, 10 );
 
-  osm_log_init_v2( &log, TRUE, 0xff, "/var/log/osm_db_test.log", 0, FALSE);
+  osm_log_init( &log, TRUE, 0xff, "/var/log/osm_db_test.log", 0, FALSE );
 
   osm_db_construct(&db);
   if (osm_db_init(&db, &log))
Index: osm/osmtest/osmtest.c
===================================================================
--- osm/osmtest/osmtest.c	(revision 9251)
+++ osm/osmtest/osmtest.c	(working copy)
@@ -520,8 +520,8 @@ osmtest_init( IN osmtest_t * const p_osm
   /* Can't use log macros here, since we're initializing the log. */
   osmtest_construct( p_osmt );
 
-  status = osm_log_init_v2( &p_osmt->log, p_opt->force_log_flush,
-                            0x0001, p_opt->log_file, 0, TRUE );
+  status = osm_log_init( &p_osmt->log, p_opt->force_log_flush,
+                         0x0001, p_opt->log_file, 0, TRUE );
   if( status != IB_SUCCESS )
     return ( status );
 
Index: osm/complib/cl_event_wheel.c
===================================================================
--- osm/complib/cl_event_wheel.c	(revision 9251)
+++ osm/complib/cl_event_wheel.c	(working copy)
@@ -610,7 +610,7 @@ main ()
   cl_event_wheel_construct( &event_wheel );
 
   /* init */
-  osm_log_init_v2( &log, TRUE, 0xff, NULL, 0, FALSE);
+  osm_log_init( &log, TRUE, 0xff, NULL, 0, FALSE );
   cl_event_wheel_init( &event_wheel, &log );
 
   /* Start Playing */
Index: diags/src/saquery.c
===================================================================
--- diags/src/saquery.c	(revision 9251)
+++ diags/src/saquery.c	(working copy)
@@ -442,8 +442,8 @@ get_bind_handle(void)
 	complib_init();
 
 	osm_log_construct(&log_osm);
-	if ((status = osm_log_init_v2(&log_osm, TRUE, 0x0001, NULL,
-				      0, TRUE)) != IB_SUCCESS) {
+	if ((status = osm_log_init(&log_osm, TRUE, 0x0001, NULL,
+				   0, TRUE)) != IB_SUCCESS) {
 		fprintf(stderr, "Failed to init osm_log: %s\n",
 			ib_get_err_str(status));
 		exit(-1);


From tziporet at dev.mellanox.co.il  Wed Sep  6 06:33:56 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Wed, 06 Sep 2006 16:33:56 +0300
Subject: [openib-general] problems to regiser memory as a reglar user on
 SLES9 SP3
In-Reply-To: <D4F8F0B3820E754C887699BEF26A8940EB8567@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A8940EB8567@taurus.voltaire.com>
Message-ID: <44FECE44.4070003@dev.mellanox.co.il>

Moshe Kazir wrote:
> Hi Tziporet,
>
> I'm trying Ofed 1.1 rc3 on IBM js21 sles9sp3 ppc64.
>
> Install is stopped at the very beginning as 64-bit udev is missing.
>
> I tried to compile the udev...src.rpm supplied on sles9sp3 cd3, and it
> failed with a compilation error.
>
> Did you test ofed 1.1 rc3 on ppc64? Can you advise me how to get 64-bit
> udev?
>
>   
We have only one Mac PPC64 machine here, and it can run only Fedora Core 4,
so this is the only system we check.
Maybe Vlad can help, but I think it is best if you approach Novell (Mois is
their contact for OFED).

Tziporet


From mst at mellanox.co.il  Wed Sep  6 06:42:06 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 6 Sep 2006 16:42:06 +0300
Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather
 than polluting namespace
In-Reply-To: <1157548884.12940.5019.camel@hal.voltaire.com>
References: <1157548884.12940.5019.camel@hal.voltaire.com>
Message-ID: <20060906134206.GB6928@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> 
> OpenSM/osm_log API: Rather than polluting the namespace with needless
> symbols, use symbol versions and have a versioned osm_log_init rather
> than adding osm_log_init_v2 as an additional API
> 
> This patch is intended to be applied to both trunk and 1.1 versions.
> 
> Signed-off-by: Doug Ledford <dledford at redhat.com>
> Signed-off-by: Hal Rosenstock <halr at voltaire.com>

This preserves the ABI, but would this not break the API?

Anyway, frankly, I do not see why was osm_log_init_v2 added into 1.1 at all.
We are in code freeze, only critical fixes are supposed to be applied to branch
at this stage. How was adding osm_log_init_v2 critical?

Nor is this feature uncontroversial. Would not support for log rotation
be better?

So - why are all these changes going into 1.1 branch?

-- 
MST


From halr at voltaire.com  Wed Sep  6 07:14:30 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Sep 2006 10:14:30 -0400
Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather
 than polluting namespace
In-Reply-To: <20060906134206.GB6928@mellanox.co.il>
References: <1157548884.12940.5019.camel@hal.voltaire.com>
	<20060906134206.GB6928@mellanox.co.il>
Message-ID: <1157552070.12940.6861.camel@hal.voltaire.com>

On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > Subject: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> > 
> > OpenSM/osm_log API: Rather than polluting the namespace with needless
> > symbols, use symbol versions and have a versioned osm_log_init rather
> > than adding osm_log_init_v2 as an additional API
> > 
> > This patch is intended to be applied to both trunk and 1.1 versions.
> > 
> > Signed-off-by: Doug Ledford <dledford at redhat.com>
> > Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> 
> This preserves the ABI, but would this not break the API?

Yes, this patch changes the API (in a most trivial way).
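
To spell out the ABI/API distinction, here is a small stand-alone C sketch of the wrapper pattern used in the patch. The demo_* names and the reduced argument lists are hypothetical stand-ins, not the real osm_log interface. In the patch itself the two functions are both exported as osm_log_init, distinguished only by the .symver version tags OPENSM_1.3 and OPENSM_1.3.1. The sketch leaves symbol versioning out so that it builds without a version script.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for osm_log_t. */
struct demo_log {
	int flush;              /* flush after every message? */
	unsigned long max_size; /* size limit in bytes; 0 means unlimited */
};

/* New entry point: takes the extra max_size argument. This corresponds
 * to the default version that newly built callers bind to. */
static int demo_log_init_new(struct demo_log *log, int flush,
			     unsigned long max_size)
{
	if (!log)
		return -1;
	log->flush = flush;
	log->max_size = max_size;
	return 0;
}

/* Old entry point, kept so that binaries built against the previous
 * library keep working (ABI preserved). It forwards to the new one
 * with max_size 0, i.e. no limit. */
static int demo_log_init_old(struct demo_log *log, int flush)
{
	return demo_log_init_new(log, flush, 0);
}
```

Because the header now declares only the new signature, source code written for the old argument list has to add the max_size argument before it compiles again. That is the API change. Binaries already linked against the old version keep resolving to the wrapper.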

> Anyway, frankly, I do not see why was osm_log_init_v2 added into 1.1 at all.
> We are in code freeze, only critical fixes are supposed to be applied to branch
> at this stage. How was adding osm_log_init_v2 critical?

There was a bug reported when the log filled up which started motivating
these changes. We had just missed the rc3 window for this. It is an
upward compatible change so is low risk.

> Nor is this feature uncontroversial. Would not support for log rotation
> be better?

Were there comments on the list before to this effect ?

> So - why are all these changes going into 1.1 branch?

See answers above.

-- Hal


From dledford at redhat.com  Wed Sep  6 07:58:09 2006
From: dledford at redhat.com (Doug Ledford)
Date: Wed, 06 Sep 2006 10:58:09 -0400
Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather
 than polluting namespace
In-Reply-To: <1157552070.12940.6861.camel@hal.voltaire.com>
References: <1157548884.12940.5019.camel@hal.voltaire.com>
	<20060906134206.GB6928@mellanox.co.il>
	<1157552070.12940.6861.camel@hal.voltaire.com>
Message-ID: <1157554690.2569.6.camel@fc6.xsintricity.com>

On Wed, 2006-09-06 at 10:14 -0400, Hal Rosenstock wrote:
> On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:

> > Nor is this feature uncontroversial. Would not support for log rotation
> > be better?

If you are just going to do log rotation, then no need to change opensm,
just add an appropriate logrotate.d/opensm file to the distribution.
But, that doesn't address what to do if you hit a full filesystem
condition, nor how to limit the size of a log file between rotations
(which, as I understand it, is really only an issue because opensm can
log so much), which is what this entire patch series was designed to
address.  They are two different problem spaces.
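
As a rough illustration of the second problem space, the sketch below caps a log file by truncating it once a byte limit would be exceeded. This is only one possible policy under stated assumptions. The capped_log names are hypothetical, and nothing in this thread shows that the OpenSM implementation works this way.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical capped log; not the OpenSM implementation. */
struct capped_log {
	const char *path;    /* file name, needed to reopen on truncation */
	FILE *fp;            /* open log stream */
	unsigned long count; /* bytes written since the last truncation */
	unsigned long max;   /* size limit in bytes; 0 means unlimited */
};

/* Open (and empty) the log file. Returns 0 on success, -1 on failure. */
static int capped_log_open(struct capped_log *log, const char *path,
			   unsigned long max)
{
	assert(log && path);
	log->path = path;
	log->max = max;
	log->count = 0;
	log->fp = fopen(path, "w");
	return log->fp ? 0 : -1;
}

/* Append msg. If the write would push the file past the limit, the file
 * is truncated first by reopening it in "w" mode. A single message
 * longer than the limit is still written in full.
 * Returns 0 on success, -1 on I/O error. */
static int capped_log_write(struct capped_log *log, const char *msg)
{
	size_t len;

	assert(log && log->fp && msg);
	len = strlen(msg);
	if (log->max && log->count + len > log->max) {
		FILE *fp = freopen(log->path, "w", log->fp);

		if (!fp) {
			log->fp = NULL; /* freopen closed the old stream */
			return -1;
		}
		log->fp = fp;
		log->count = 0;
	}
	if (fwrite(msg, 1, len, log->fp) != len)
		return -1;
	log->count += len;
	return fflush(log->fp) == 0 ? 0 : -1;
}
```

A logrotate.d file alone would not give this behaviour between rotations, which is the point being made above.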

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From dotanb at dev.mellanox.co.il  Wed Sep  6 08:07:54 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Wed, 06 Sep 2006 18:07:54 +0300
Subject: [openib-general] [librdmacm] execuation of the the test udaddy is
	failing
Message-ID: <44FEE44A.60308@dev.mellanox.co.il>

Here are the machine/driver props:
*************************************************************
Host Architecture : x86_64
Linux Distribution: SUSE Linux Enterprise Server 10 (x86_64) VERSION = 10
Kernel Version    : 2.6.16.21-0.8-smp
GCC Version       : gcc (GCC) 4.1.0 (SUSE Linux)
Memory size       : 4045720 kB
Driver Version    : gen2_linux-20060905-1700 (REV=9264)
HCA ID(s)         : mthca0
HCA model(s)      : 25218
FW version(s)     : 5.1.927
Board(s)          : MT_0150000001
*************************************************************

Here is the output of the test:

 # udaddy
udaddy: starting server
librdmacm: Kernel ABI does not support requested port space.
udaddy: listen request failed
test complete
return status -93


executing the test mckey fails too because of the same problem:
 # mckey recv 239.0.0.2
librdmacm: Kernel ABI does not support requested port space.

The tests rping and the ucmatose are passing with no problem.

thanks
Dotan


From halr at voltaire.com  Wed Sep  6 08:09:06 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Sep 2006 11:09:06 -0400
Subject: [openib-general] [PATCH] OpenSM/osm_base.h: Change
 OSM_DEFAULT_TMP_DIR to /var/log for Linux
Message-ID: <1157555341.12940.8796.camel@hal.voltaire.com>

OpenSM/osm_base.h: Change OSM_DEFAULT_TMP_DIR to /var/log for Linux

Signed-off-by: Hal Rosenstock <halr at voltaire.com>

Index: include/opensm/osm_base.h
===================================================================
--- include/opensm/osm_base.h	(revision 9158)
+++ include/opensm/osm_base.h	(working copy)
@@ -177,15 +177,14 @@ BEGIN_C_DECLS
 *
 * DESCRIPTION
 *	Specifies the default temporary directory for the log file, subnet.lst
-*  and the other log files (with the exception of osm.log for Linux being 
-*  in /var/log).
+*  and the other log files.
 *
 * SYNOPSIS
 */
 #ifdef __WIN__
 #define OSM_DEFAULT_TMP_DIR GetOsmTempPath()
 #else
-#define OSM_DEFAULT_TMP_DIR "/tmp/"
+#define OSM_DEFAULT_TMP_DIR "/var/log/"
 #endif
 /***********/
 

From mst at mellanox.co.il  Wed Sep  6 08:16:59 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 6 Sep 2006 18:16:59 +0300
Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather
 than polluting namespace
In-Reply-To: <1157552070.12940.6861.camel@hal.voltaire.com>
References: <1157548884.12940.5019.camel@hal.voltaire.com>
	<20060906134206.GB6928@mellanox.co.il>
	<1157552070.12940.6861.camel@hal.voltaire.com>
Message-ID: <20060906151659.GC6928@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> 
> On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > Subject: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> > > 
> > > OpenSM/osm_log API: Rather than polluting the namespace with needless
> > > symbols, use symbol versions and have a versioned osm_log_init rather
> > > than adding osm_log_init_v2 as an additional API
> > > 
> > > This patch is intended to be applied to both trunk and 1.1 versions.
> > > 
> > > Signed-off-by: Doug Ledford <dledford at redhat.com>
> > > Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> > 
> > This preserves the ABI, but would this not break the API?
> 
> Yes, this patch changes the API (in a most trivial way).

So all users need to change code or they won't compile against the new
library?

> > Anyway, frankly, I do not see why was osm_log_init_v2 added into 1.1 at all.
> > We are in code freeze, only critical fixes are supposed to be applied to branch
> > at this stage. How was adding osm_log_init_v2 critical?
> 
> There was a bug reported when the log filled up which started motivating
> these changes.

As I see it, a bugzilla ticket does not automatically convert feature request
into a bug report.  The issue is not exactly new, and people seem to have been
able to live with this.

The enhancement will keep opensm friendly to appliance-like devices that are
single-task subnet managers. Fine, but OFED by default will activate opensm
without this switch.

Given all of the above, I don't see how can this be considered a critical bug
fix.

> We had just missed the rc3 window for this.

So that's a reason not to apply on branch unless it is critical.

> It is an upward compatible change so is low risk.

Not sure what you mean by upward compatible. This API change does not seem to
be backward compatible - won't it break building dependent applications?
If so, it is not something you should do after code freeze.

If we care about namespace pollution that much (and I don't really see an issue)
do the changes on trunk.

> > Nor is this feature uncontroversial. Would not support for log rotation
> > be better?
> 
> Were there comments on the list before to this effect ?

Hmm. Not explicitly. There were comments that this is non-standard and
will surprise system administrators if activated.
http://thread.gmane.org/gmane.linux.drivers.openib/29195/focus=29199

-- 
MST


From eitan at mellanox.co.il  Wed Sep  6 08:28:42 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Wed, 6 Sep 2006 18:28:42 +0300
Subject: [openib-general] [PATCH] OpenSM/osm_base.h: Change
 OSM_DEFAULT_TMP_DIR to /var/log for Linux
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302D8686F@mtlexch01.mtl.com>

OK. I will need to update the ibdmchk utility to look by default for
osm.{fdbs,mcfdbs} and subnet.lst in the /var/tmp ...

I hope this is not targeting OFED 1.1, as it is surely not critical ...

Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL


> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Wednesday, September 06, 2006 6:09 PM
> To: openib-general at openib.org
> Cc: Eitan Zahavi
> Subject: [PATCH] OpenSM/osm_base.h: Change OSM_DEFAULT_TMP_DIR to
> /var/log for Linux
> 
> OpenSM/osm_base.h: Change OSM_DEFAULT_TMP_DIR to /var/log for Linux
> 
> Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> 
> Index: include/opensm/osm_base.h
> ===================================================================
> --- include/opensm/osm_base.h	(revision 9158)
> +++ include/opensm/osm_base.h	(working copy)
> @@ -177,15 +177,14 @@ BEGIN_C_DECLS
>  *
>  * DESCRIPTION
>  *	Specifies the default temporary directory for the log file, subnet.lst
> -*  and the other log files (with the exception of osm.log for Linux being
> -*  in /var/log).
> +*  and the other log files.
>  *
>  * SYNOPSIS
>  */
>  #ifdef __WIN__
>  #define OSM_DEFAULT_TMP_DIR GetOsmTempPath()
>  #else
> -#define OSM_DEFAULT_TMP_DIR "/tmp/"
> +#define OSM_DEFAULT_TMP_DIR "/var/log/"
>  #endif
>  /***********/
> 
> 


From mst at mellanox.co.il  Wed Sep  6 08:27:29 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 6 Sep 2006 18:27:29 +0300
Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather
 than polluting namespace
In-Reply-To: <1157554690.2569.6.camel@fc6.xsintricity.com>
References: <1157548884.12940.5019.camel@hal.voltaire.com>
	<20060906134206.GB6928@mellanox.co.il>
	<1157552070.12940.6861.camel@hal.voltaire.com>
	<1157554690.2569.6.camel@fc6.xsintricity.com>
Message-ID: <20060906152729.GD6928@mellanox.co.il>

Quoting r. Doug Ledford <dledford at redhat.com>:
> Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> 
> On Wed, 2006-09-06 at 10:14 -0400, Hal Rosenstock wrote:
> > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> 
> > > Nor is this feature uncontroversial. Would not support for log rotation
> > > be better?
> 
> If you are just going to do log rotation, then no need to change opensm,
> just add an appropriate logrotate.d/opensm file to the distribution.

I guess opensm will need to be signalled to close/reopen the log file though.
No?
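
A common pattern for this, sketched here in generic C, is to have the rotation
tool send a signal and let the daemon reopen its log from the main loop. This
is only a sketch of the general technique and not opensm code: the choice of
SIGHUP, and the names log_install_sighup and log_reopen_if_requested, are
assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set from the signal handler, consumed by the main loop. */
static volatile sig_atomic_t reopen_requested;

static void on_sighup(int signo)
{
	(void)signo;
	reopen_requested = 1;	/* only async-signal-safe work here */
}

/* Install the SIGHUP handler; returns 0 on success, -1 on failure. */
int log_install_sighup(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sighup;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGHUP, &sa, NULL);
}

/*
 * Call periodically from the daemon's main loop (not from signal context).
 * If a reopen was requested, open the path again in append mode and close
 * the old handle, so the daemon stops writing to the rotated file.
 */
FILE *log_reopen_if_requested(FILE *log, const char *path)
{
	FILE *fresh;

	assert(path != NULL);
	if (!reopen_requested)
		return log;
	reopen_requested = 0;

	fresh = fopen(path, "a");
	if (fresh == NULL)
		return log;	/* keep the old handle rather than lose logging */
	if (log != NULL)
		fclose(log);
	return fresh;
}
```

A rotation tool would then rename the old file and signal the daemon, which
picks up the fresh file on its next pass through the loop.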

> But, that doesn't address what to do if you hit a full filesystem
> condition,

Since logs are compressed, this should at least alleviate that.
What do other daemons do?

> nor how to limit the size of a log file between rotations

Again, what do other daemons do?

> (which, as I understand it, is really only an issue because opensm can
> log so much),
> which is what this entire patch series was designed to
> address.  They are two different problem spaces.

So ... wouldn't it be better to address the real issue?
As I see it, the problem only appears if you run opensm in verbose
mode. And the only reason to run it that way for a long time is that you
suspect you'll want to debug something later without killing opensm.

So the ability to control verbosity at runtime would be a better solution,
it seems, and there are patches that do that.

-- 
MST


From thomas.bub at thomson.net  Wed Sep  6 08:29:44 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Wed, 6 Sep 2006 17:29:44 +0200
Subject: [openib-general] ibv_poll_cq after ibv_post_send does not work
Message-ID: <B79FAF8BB536314E859EA1963CFFD22201FBD38B@wdtssmail01.eu.thmulti.com>

I'm still in the process of porting my gen1 code to gen2.
As mentioned yesterday, I can connect to a listener on the same machine
using libibcm.
Doing this, I have to call ibv_modify_qp myself to move the QPs from
INIT via RTR to RTS on both sides.
At least ibv_modify_qp does not complain after the connection has been
established via libibcm.
So my assumption is that my two QPs are successfully connected.

The first action after the connection is for the listener to wait on its
receive CQ for an IBV_WR_SEND done by the connector.
Here is the problem:
*	The listener never gets a completion.
*	The connector doing the IBV_WR_SEND gets errors on the send CQ:
	opcode=0x7f status=0x5 vendor_err=129 for the first IBV_WR_SEND and
	opcode=0x7f status=0xc vendor_err=129 for all subsequent attempts to
	send the data.

Can anyone help me understand the error codes and/or what is wrong?
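
As a reading aid for the status values quoted above, here is a small lookup
sketch. It assumes the printed status is the raw value of enum ibv_wc_status
and that the names follow the declaration order in libibverbs' verbs.h;
wc_status_name is a hypothetical helper written for this example, not a
libibverbs call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Names in the declaration order of enum ibv_wc_status (assumed). */
static const char *const wc_status_names[] = {
	"IBV_WC_SUCCESS",		/* 0x00 */
	"IBV_WC_LOC_LEN_ERR",		/* 0x01 */
	"IBV_WC_LOC_QP_OP_ERR",		/* 0x02 */
	"IBV_WC_LOC_EEC_OP_ERR",	/* 0x03 */
	"IBV_WC_LOC_PROT_ERR",		/* 0x04 */
	"IBV_WC_WR_FLUSH_ERR",		/* 0x05 */
	"IBV_WC_MW_BIND_ERR",		/* 0x06 */
	"IBV_WC_BAD_RESP_ERR",		/* 0x07 */
	"IBV_WC_LOC_ACCESS_ERR",	/* 0x08 */
	"IBV_WC_REM_INV_REQ_ERR",	/* 0x09 */
	"IBV_WC_REM_ACCESS_ERR",	/* 0x0a */
	"IBV_WC_REM_OP_ERR",		/* 0x0b */
	"IBV_WC_RETRY_EXC_ERR",		/* 0x0c */
	"IBV_WC_RNR_RETRY_EXC_ERR",	/* 0x0d */
};

/* Hypothetical helper: map a raw completion status to its enum name. */
const char *wc_status_name(unsigned int status)
{
	size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

	return status < n ? wc_status_names[status] : "unknown";
}
```

Under that assumption, status=0x5 reads as IBV_WC_WR_FLUSH_ERR and status=0xc
as IBV_WC_RETRY_EXC_ERR.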

Thanks in advance from Germany

Thomas

............................................................
Thomas Bub
Grass Valley Germany GmbH
Brunnenweg 9
64331 Weiterstadt, Germany
Tel: +49 6150 104 147
Fax: +49 6150 104 656
Email: Thomas.Bub at thomson.net
www.GrassValley.com  <http://www.grassvalley.com> 
............................................................



From halr at voltaire.com  Wed Sep  6 08:34:25 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Sep 2006 11:34:25 -0400
Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather
 than polluting namespace
In-Reply-To: <20060906151659.GC6928@mellanox.co.il>
References: <1157548884.12940.5019.camel@hal.voltaire.com>
	<20060906134206.GB6928@mellanox.co.il>
	<1157552070.12940.6861.camel@hal.voltaire.com>
	<20060906151659.GC6928@mellanox.co.il>
Message-ID: <1157556861.12940.9754.camel@hal.voltaire.com>

On Wed, 2006-09-06 at 11:16, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> > 
> > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > Subject: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> > > > 
> > > > OpenSM/osm_log API: Rather than polluting the namespace with needless
> > > > symbols, use symbol versions and have a versioned osm_log_init rather
> > > > than adding osm_log_init_v2 as an additional API
> > > > 
> > > > This patch is intended to be applied to both trunk and 1.1 versions.
> > > > 
> > > > Signed-off-by: Doug Ledford <dledford at redhat.com>
> > > > Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> > > 
> > > This preserves the ABI, but would this not break the API?
> > 
> > Yes, this patch changes the API (in a most trivial way).
> 
> So all users need to change code or they won't compile against the new
> library?
> 
> > > Anyway, frankly, I do not see why was osm_log_init_v2 added into 1.1 at all.
> > > We are in code freeze, only critical fixes are supposed to be applied to branch
> > > at this stage. How was adding osm_log_init_v2 critical?
> > 
> > There was a bug reported when the log filled up which started motivating
> > these changes.
> 
> As I see it, a bugzilla ticket does not automatically convert feature request
> into a bug report.  The issue is not exactly new, and people seem to have been
> able to live with this.
> 
> The enhancement will keep opensm friendly to appliance like devices that are
> single task subnet managers. fine, but OFED by default will activate opensm
> without this switch.

It is another feature when this situation is encountered. It has been
encountered and will be again.

> Given all of the above, I don't see how can this be considered a critical bug
> fix.
> 
> > We had just missed the rc3 window for this.
> 
> So that's a reason not to apply on branch unless it is critical.

I've also seen other patches which do not meet these criteria go into
1.1. I know that's not a reason either.

> > It is an upward compatible change so is low risk.
> 
> Not sure what do you mean by upward compatible. This API change does not seem to
> be backward compatible - won't it break building dependent applications?

We are talking about 2 different changes. I was responding to your
comment about the addition of osm_log_init_v2 not being a bug fix, not
the symver patch on top of that.

> If so is not something you should do after code freeze.
> 
> If we care about namespace pollution that much (and I don't really see an issue)
> do the changes on trunk.
> 
> > > Nor is this feature uncontroversial. Would not support for log rotation
> > > be better?
> > 
> > Were there comments on the list before to this effect ?
> 
> Hmm. Not explicitly. There were comments this is non-standard and
> will surprise system administrators if activated.
> http://thread.gmane.org/gmane.linux.drivers.openib/29195/focus=29199

Not sure exactly what email (and comment) you are referring to here.

-- Hal


From mst at mellanox.co.il  Wed Sep  6 08:46:26 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 6 Sep 2006 18:46:26 +0300
Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather
 than polluting namespace
In-Reply-To: <1157556861.12940.9754.camel@hal.voltaire.com>
References: <1157556861.12940.9754.camel@hal.voltaire.com>
Message-ID: <20060906154626.GE6928@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > It is an upward compatible change so is low risk.
> > 
> > Not sure what do you mean by upward compatible. This API change does not
> > seem to be backward compatible - won't it break building dependent
> > applications?
> 
> We are talking about 2 different changes. I was responding to your
> comment about the addition of osm_log_init_v2 not being a bug fix, not
> the symver patch on top of that.

I'm mostly concerned with the symver patch. I think we can't do API changes
at this stage in the release process.

-- 
MST


From halr at voltaire.com  Wed Sep  6 08:51:58 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Sep 2006 11:51:58 -0400
Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather
 than polluting namespace
In-Reply-To: <20060906152729.GD6928@mellanox.co.il>
References: <1157548884.12940.5019.camel@hal.voltaire.com>
	<20060906134206.GB6928@mellanox.co.il>
	<1157552070.12940.6861.camel@hal.voltaire.com>
	<1157554690.2569.6.camel@fc6.xsintricity.com>
	<20060906152729.GD6928@mellanox.co.il>
Message-ID: <1157557918.12940.10426.camel@hal.voltaire.com>

On Wed, 2006-09-06 at 11:27, Michael S. Tsirkin wrote:
> Quoting r. Doug Ledford <dledford at redhat.com>:
> > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> > 
> > On Wed, 2006-09-06 at 10:14 -0400, Hal Rosenstock wrote:
> > > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> > 
> > > > Nor is this feature uncontroversial. Would not support for log rotation
> > > > be better?
> > 
> > If you are just going to do log rotation, then no need to change opensm,
> > just add an appropriate logrotate.d/opensm file to the distribution.
> 
> I guess opensm will need to be signalled to close/reopen the log file though.
> No?
> 
> > But, that doesn't address what to do if you hit a full filesystem
> > condition,
> 
> Since logs are compressed this should at least alleviate that.
> what do other daemons do?
> 
> > nor how to limit the size of a log file between rotations
> 
> again, what do other daemons do?
> 
> > (which, as I understand it, is really only an issue because opensm can
> > log so much),
> > which is what this entire patch series was designed to
> > address.  They are two different problem spaces.
> 
> So ... wouldn't it be better to address the real issue?
> As I see it, the problem only appears if you activate opensm in the verbose
> mode. And the reason to run so for a long time is only if you suspect you'll
> want to debug something later, without killing opensm.

Those patches are still pending and won't be in OFED 1.1, right?

> So the ability to control verbosity at runtime 


There already is a way to do that.

-- Hal

> will be a better solution
> it seems, and there are patches that do that.


From mst at mellanox.co.il  Wed Sep  6 09:10:54 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 6 Sep 2006 19:10:54 +0300
Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather
 than polluting namespace
In-Reply-To: <1157557918.12940.10426.camel@hal.voltaire.com>
References: <1157548884.12940.5019.camel@hal.voltaire.com>
	<20060906134206.GB6928@mellanox.co.il>
	<1157552070.12940.6861.camel@hal.voltaire.com>
	<1157554690.2569.6.camel@fc6.xsintricity.com>
	<20060906152729.GD6928@mellanox.co.il>
	<1157557918.12940.10426.camel@hal.voltaire.com>
Message-ID: <20060906161054.GF6928@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > (which, as I understand it, is really only an issue because opensm can
> > > log so much),
> > > which is what this entire patch series was designed to
> > > address.  They are two different problem spaces.
> > 
> > So ... wouldn't it be better to address the real issue?
> > As I see it, the problem only appears if you activate opensm in the verbose
> > mode. And the reason to run so for a long time is only if you suspect you'll
> > want to debug something later, without killing opensm.
> 
> Those patches are still pending and won't be in OFED 1,1, right ?

Well, I dunno. If it's assumed reducing log size is important enough
for 1.1, then maybe applying these patches is the way to go?

-- 
MST


From halr at voltaire.com  Wed Sep  6 09:13:22 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 06 Sep 2006 12:13:22 -0400
Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather
 than polluting namespace
In-Reply-To: <20060906161054.GF6928@mellanox.co.il>
References: <1157548884.12940.5019.camel@hal.voltaire.com>
	<20060906134206.GB6928@mellanox.co.il>
	<1157552070.12940.6861.camel@hal.voltaire.com>
	<1157554690.2569.6.camel@fc6.xsintricity.com>
	<20060906152729.GD6928@mellanox.co.il>
	<1157557918.12940.10426.camel@hal.voltaire.com>
	<20060906161054.GF6928@mellanox.co.il>
Message-ID: <1157559199.12940.11246.camel@hal.voltaire.com>

On Wed, 2006-09-06 at 12:10, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > (which, as I understand it, is really only an issue because opensm can
> > > > log so much),
> > > > which is what this entire patch series was designed to
> > > > address.  They are two different problem spaces.
> > > 
> > > So ... wouldn't it be better to address the real issue?
> > > As I see it, the problem only appears if you activate opensm in the verbose
> > > mode. And the reason to run so for a long time is only if you suspect you'll
> > > want to debug something later, without killing opensm.
> > 
> > Those patches are still pending and won't be in OFED 1,1, right ?
> 
> Well, I donnu. If it's assumed reducing log size is important enough
> for 1.1, then maybe applying these patches are the way to go?

Right now these also involve an API change and that patch is being
reworked (so they couldn't possibly make OFED 1.1 rc4).

-- Hal


From mst at mellanox.co.il  Wed Sep  6 09:34:01 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 6 Sep 2006 19:34:01 +0300
Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather
 than polluting namespace
In-Reply-To: <1157559199.12940.11246.camel@hal.voltaire.com>
References: <1157559199.12940.11246.camel@hal.voltaire.com>
Message-ID: <20060906163401.GG6928@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> 
> On Wed, 2006-09-06 at 12:10, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > > (which, as I understand it, is really only an issue because opensm can
> > > > > log so much),
> > > > > which is what this entire patch series was designed to
> > > > > address.  They are two different problem spaces.
> > > > 
> > > > So ... wouldn't it be better to address the real issue?
> > > > As I see it, the problem only appears if you activate opensm in the verbose
> > > > mode. And the reason to run so for a long time is only if you suspect you'll
> > > > want to debug something later, without killing opensm.
> > > 
> > > Those patches are still pending and won't be in OFED 1,1, right ?
> > 
> > Well, I donnu. If it's assumed reducing log size is important enough
> > for 1.1, then maybe applying these patches are the way to go?
> 
> Right now these also involve an API change and that patch is being
> reworked (so they couldn't possibly make OFED 1.1 rc4).

Actually I was under the impression that that patch preserved the existing API
(extension only).

I hope we all agree API breakage isn't an option at this point.

-- 
MST


From sean.hefty at intel.com  Wed Sep  6 09:34:54 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 6 Sep 2006 09:34:54 -0700
Subject: [openib-general] [librdmacm] execuation of the the test udaddy
 is failing
In-Reply-To: <44FEE44A.60308@dev.mellanox.co.il>
Message-ID: <000201c6d1d2$625590f0$51c8180a@amr.corp.intel.com>

> # udaddy
>udaddy: starting server
>librdmacm: Kernel ABI does not support requested port space.
>udaddy: listen request failed
>test complete
>return status -93

UD QP and multicast support require kernel ABI version 2.  It appears that the
running kernel provides ABI version 1.

- Sean


From Don.Dhondt at Bull.com  Wed Sep  6 10:56:37 2006
From: Don.Dhondt at Bull.com (Don.Dhondt at Bull.com)
Date: Wed, 6 Sep 2006 10:56:37 -0700
Subject: [openib-general] Latency Problem with MT25204 HCAs
Message-ID: <OFF9ECC268.4943B445-ON072571E1.005C3A9A-072571E1.006291A8@us-phx1.az05.bull.com>

We are seeing a latency problem that seems to be specific to the Mellanox 
MT25204 HCA.
We do not see the same problem with MT25208 HCAs running in MT23108 
compatibility mode.
The problem is demonstrated running multiple streams of ib_rdma_lat.

On the SDR MT25208 HCA:
                typical latency
1 stream        3.70 usec
2 streams       4.47 usec
4 streams       6.74 usec

On the DDR MT25204 HCA:
                typical latency
1 stream        3.03 usec
2 streams       7.36 usec
4 streams       22.4 usec
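
To put the two tables side by side, the slowdown relative to a single stream
can be computed directly from the figures above. The array and function names
below are made up for this sketch; the numbers are copied from the tables.

```c
#include <assert.h>

/* Typical ib_rdma_lat latencies in usec, copied from the tables above:
 * index 0 = 1 stream, 1 = 2 streams, 2 = 4 streams. */
const double lat_mt25208[] = { 3.70, 4.47, 6.74 };
const double lat_mt25204[] = { 3.03, 7.36, 22.4 };

/* Slowdown of the given row relative to the single-stream row. */
double slowdown(const double *lat, int idx)
{
	assert(idx >= 0 && idx <= 2);
	return lat[idx] / lat[0];
}
```

At four streams this works out to roughly 1.8x on the MT25208 and roughly 7.4x
on the MT25204.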

Can anyone explain this behavior?

We are running OFED 1.0 release on a pair of EM64T dual CPU nodes.

ibstat output:
CA 'mthca0'
        CA type: MT25208 (MT23108 compat mode)
        Number of ports: 2
        Firmware version: 4.7.400
        Hardware version: a0
        Node GUID: 0x0005ad0000035950
        System image GUID: 0x0005ad000100d050
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 21
                LMC: 0
                SM lid: 16
                Capability mask: 0x02500a68
                Port GUID: 0x0005ad0000035951
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02500a68
                Port GUID: 0x0005ad0000035952
CA 'mthca1'
        CA type: MT25204
        Number of ports: 1
        Firmware version: 1.0.800
        Hardware version: a0
        Node GUID: 0x0002c90200216e40
        System image GUID: 0x0002c90200216e43
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 22
                LMC: 0
                SM lid: 22
                Capability mask: 0x02500a6a
                Port GUID: 0x0002c90200216e41

-Don

From rjwalsh at pathscale.com  Wed Sep  6 11:39:29 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Wed, 06 Sep 2006 11:39:29 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <44FEAFB1.3040902@dev.mellanox.co.il>
References: <000001c6d130$85e0dea0$bb97070a@amr.corp.intel.com>
	<44FE228E.9050402@pathscale.com> <44FEAFB1.3040902@dev.mellanox.co.il>
Message-ID: <44FF15E1.4040704@pathscale.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tziporet Koren wrote:
> Robert Walsh wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>>  
>>> Here is a slightly modified patch for your attributes issue. Can you
>>> give it a try?
>>>     
>>
>> I rebuilt OFED from scratch with the patch, and ran successfully on
>> Intel MPI 2.0.1 with the refresh patch.  I could not get it to run on
>> Intel MPI 3.0b.  If you could verify that the fix you mentioned that is
>> in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it.
>> If you have a later beta version you could send me, that would be great,
>> too.
>>
>> Regards,
>>  Robert.
>>   
> I added this patch under fixes to OFED 1.1. Will be in RC4

Excellent.  Thanks, Tziporet.

Regards,
 Robert.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP8V4fzvnpzTd9fxAQLZVAf+IYtLA2c7cBCbzih2Suy4AHUdD1CghC0U
XL+iWjLo4TFbcUhBIrzwG4M72VQanqhNr2Qs3ZtfU2+qN6qKnSZXdejd7nYYOAsz
5LnrWa6Y+9Jfy3K/JOQ4wpjc3lWs3rvuzPTBhmEPcNHZk5+/m0gbfzYLdrc2djPp
soyFSQpyLdpF0J5iY12EWiPYnFK7ConoqYHkTODZV8IjBJIImvDoScouIC+Uzi+x
HlANIlneKa4/zQHNaK+3vZ6N7ZUq30quMZU6ICMI2gzFEzsEe/HxbtnraXfnXH1J
NQ4mMOJNXwPVveNn1E9zA7IgFTMYsnGH080O5saloj2S6P6jb3PLXw==
=mDD0
-----END PGP SIGNATURE-----


From arlin.r.davis at intel.com  Wed Sep  6 13:43:47 2006
From: arlin.r.davis at intel.com (Davis, Arlin R)
Date: Wed, 6 Sep 2006 13:43:47 -0700
Subject: [openib-general] uDAPL patch
Message-ID: <B0095134066CC94FBC80973103FFA1FE01838BC2@orsmsx416.amr.corp.intel.com>

>Hi James,
>
>I don't know if you've been following the back-and-forth on
>openib-general concerning the problems we've had running Intel MPI on
>QLogic's adapters.  Basically, between Arlin Davis and myself, we've
>come up with a patch to uDAPL to fix some uninitialized fields returned
>by dat_ia_query() that allows the InfiniPath adapters to work correctly.

Committed in OpenFabrics (svn9315) and SourceForge (svn1411). 

Thanks,

-arlin


From bos at pathscale.com  Wed Sep  6 15:54:12 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Wed, 06 Sep 2006 15:54:12 -0700
Subject: [openib-general] [PATCH] Reduce packet loss in receive path,
	OFED 1.1
Message-ID: <1157583252.22887.62.camel@sardonyx>

Hi, Tziporet -

This is another patch for RC4, which reduces the likelihood of packet
loss when the receiver is being saturated with packets.  Please apply.

Thanks,

	<b
-------------- next part --------------
IB/ipath - use memcpy_cachebypass to reduce packet loss

In cases where a large incoming RDMA is being received, we have to
copy data inside the interrupt handler before we can ACK each packet.
The source is DMAed to by the hardware, which means that the CPU won't
have it cached.  We only read the source this one time; using normal load
instructions pollutes the dcache with useless data, reducing performance
to the point where we can lose a significant number of packets.

We use memcpy_cachebypass to try to not fill the dcache with useless data.
Avoiding the cache refill penalty lets us keep up better with the sender,
resulting in many fewer dropped packets.

Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r d8eed27eaaa2 drivers/infiniband/hw/ipath/Makefile
--- a/drivers/infiniband/hw/ipath/Makefile	Wed Sep 06 13:26:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/Makefile	Wed Sep 06 15:48:34 2006 -0700
@@ -31,4 +31,5 @@ ib_ipath-y := \
 	ipath_verbs.o
 
 ib_ipath-$(CONFIG_X86_64) += ipath_wc_x86_64.o
+ib_ipath-$(CONFIG_X86_64) += memcpy_cachebypass_x86_64.o
 ib_ipath-$(CONFIG_PPC64) += ipath_wc_ppc64.o
diff -r d8eed27eaaa2 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c	Wed Sep 06 13:26:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c	Wed Sep 06 15:48:45 2006 -0700
@@ -40,6 +40,12 @@
 #include "ipath_verbs.h"
 #include "ipath_common.h"
 
+#ifdef __x86_64__
+void *memcpy_cachebypass(void *, const void *, __kernel_size_t);
+#else
+#define memcpy_cachebypass(a,b,c) memcpy((a),(b),(c))
+#endif
+
 static unsigned int ib_ipath_qp_table_size = 251;
 module_param_named(qp_table_size, ib_ipath_qp_table_size, uint, S_IRUGO);
 MODULE_PARM_DESC(qp_table_size, "QP table size");
@@ -167,7 +173,7 @@ void ipath_copy_sge(struct ipath_sge_sta
 		BUG_ON(len == 0);
 		if (len > length)
 			len = length;
-		memcpy(sge->vaddr, data, len);
+		memcpy_cachebypass(sge->vaddr, data, len);
 		sge->vaddr += len;
 		sge->length -= len;
 		sge->sge_length -= len;
diff -r d8eed27eaaa2 drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S	Wed Sep 06 15:48:34 2006 -0700
@@ -0,0 +1,115 @@
+	.text
+	.p2align 4,,15
+	/* rdi  destination, rsi source, rdx count */
+	.globl	memcpy_cachebypass
+	.type	memcpy_cachebypass, @function
+memcpy_cachebypass:
+	movq	%rdi, %rax
+.L5:
+	cmpq	$15, %rdx
+	ja	.L34
+.L3:
+	cmpl	$8, %edx	/* rdx is 0..15 */
+	jbe	.L9
+.L6:
+	testb	$8, %dxl	/* rdx is 3,5,6,7,9..15 */
+	je	.L13
+	movq	(%rsi), %rcx
+	addq	$8, %rsi
+	movq	%rcx, (%rdi)
+	addq	$8, %rdi
+.L13:
+	testb	$4, %dxl
+	je	.L15
+	movl	(%rsi), %ecx
+	addq	$4, %rsi
+	movl	%ecx, (%rdi)
+	addq	$4, %rdi
+.L15:
+	testb	$2, %dxl
+	je	.L17
+	movzwl	(%rsi), %ecx
+	addq	$2, %rsi
+	movw	%cx, (%rdi)
+	addq	$2, %rdi
+.L17:
+	testb	$1, %dxl
+	je	.L33
+.L1:
+	movzbl	(%rsi), %ecx
+	movb	%cl, (%rdi)
+.L33:
+	ret
+.L34:
+	cmpq	$63, %rdx	/* rdx is > 15 */
+	ja	.L64
+	movl	$16, %ecx	/* rdx is 16..63 */
+.L25:
+	movq	8(%rsi), %r8
+	movq	(%rsi), %r9
+	addq	%rcx, %rsi
+	movq	%r8, 8(%rdi)
+	movq	%r9, (%rdi)
+	addq	%rcx, %rdi
+	subq	%rcx, %rdx
+	cmpl	%edx, %ecx	/* is rdx >= 16? */
+	jbe	.L25
+	jmp	.L3		/* rdx is 0..15 */
+	.p2align 4,,7
+.L64:
+	movl	$64, %ecx
+.L42:
+	prefetchnta	128(%rsi)
+	movq	(%rsi), %r8
+	movq	8(%rsi), %r9
+	movq	16(%rsi), %r10
+	movq	24(%rsi), %r11
+	subq	%rcx, %rdx
+	movq	%r8, (%rdi)
+	movq	32(%rsi), %r8
+	movq	%r9, 8(%rdi)
+	movq	40(%rsi), %r9
+	movq	%r10, 16(%rdi)
+	movq	48(%rsi), %r10
+	movq	%r11, 24(%rdi)
+	movq	56(%rsi), %r11
+	addq	%rcx, %rsi
+	movq	%r8, 32(%rdi)
+	movq	%r9, 40(%rdi)
+	movq	%r10, 48(%rdi)
+	movq	%r11, 56(%rdi)
+	addq	%rcx, %rdi
+	cmpq	%rdx, %rcx	/* is rdx >= 64? */
+	jbe	.L42
+	sfence
+	orl	%edx, %edx
+	je	.L33
+	jmp	.L5
+.L9:
+	jmp	*.L12(,%rdx,8)	/* rdx is 0..8 */
+	.section	.rodata
+	.align 8
+	.align 4
+.L12:
+	.quad	.L33
+	.quad	.L1
+	.quad	.L2
+	.quad	.L6
+	.quad	.L4
+	.quad	.L6
+	.quad	.L6
+	.quad	.L6
+	.quad	.L8
+	.text
+.L2:
+	movzwl	(%rsi), %ecx
+	movw	%cx, (%rdi)
+	ret
+.L4:
+	movl	(%rsi), %ecx
+	movl	%ecx, (%rdi)
+	ret
+.L8:
+	movq	(%rsi), %rcx
+	movq	%rcx, (%rdi)
+	ret
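
The idea behind the 64-byte main loop in the assembly above can be sketched in
C. This is a rough analogue under stated assumptions, not the routine from the
patch: copy_stream_once is a hypothetical name, and it relies on the GCC/Clang
builtin __builtin_prefetch, whose locality argument 0 asks for a prefetch with
no temporal locality (typically prefetchnta on x86-64).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy len bytes from a source that will be read only once.
 * The source is prefetched two 64-byte blocks ahead with a
 * "no temporal locality" hint; the stores themselves are ordinary
 * stores, as in the assembly above.
 */
void *copy_stream_once(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	assert(dst != NULL && src != NULL);
	while (len >= 64) {
		/* Only form the look-ahead address while it is inside the buffer. */
		if (len > 128)
			__builtin_prefetch(s + 128, 0, 0);
		memcpy(d, s, 64);	/* fixed size, usually inlined by the compiler */
		d += 64;
		s += 64;
		len -= 64;
	}
	if (len)
		memcpy(d, s, len);	/* tail of 1..63 bytes */
	return dst;
}
```

Like the macro fallback in ipath_verbs.c, a caller can treat this as a drop-in
for memcpy; the prefetch hint only affects cache behaviour, not the result.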

From bugzilla-daemon at openib.org  Wed Sep  6 16:01:45 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Wed,  6 Sep 2006 16:01:45 -0700 (PDT)
Subject: [openib-general] [Bug 222] New: ib_uverbs fails to load on ia64,
	OFED 1.1 - rc3
Message-ID: <20060906230145.640032283D4@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=222

           Summary: ib_uverbs fails to load on ia64, OFED 1.1 - rc3
           Product: OpenFabrics Linux
           Version: 1.1rc3
          Platform: IA64
        OS/Version: RHEL 4
            Status: NEW
          Severity: blocker
          Priority: P1
         Component: Verbs
        AssignedTo: bugzilla at openib.org
        ReportedBy: robert.j.woodruff at intel.com


OFED 1.1-rc3 ib_uverbs fails to load on Itanium on RHEL4-U3,
due to unknown symbol hpage_shift.

This is a new bug that did not happen with OFED 1.1-rc2.
/etc/init.d/openibd start
Loading HCA driver and Access Layer:                       [FAILED]

Please open an issue in the http://openib.org/bugzilla and attach
/tmp/ib_debug_info.log

> dmesg
ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
ib_mthca: Initializing 0000:0d:00.0
ACPI: PCI interrupt 0000:0d:00.0[A] -> GSI 76 (level, low) -> IRQ 58
ib_mthca 0000:0d:00.0: HCA FW version 3.3.2 is old (3.4.0 is current).
ib_mthca 0000:0d:00.0: If you have problems, try updating your HCA FW.
ib_uverbs: Unknown symbol hpage_shift
divert: not allocating divert_blk for non-ethernet device ib0
divert: not allocating divert_blk for non-ethernet device ib1
ip_tables: (C) 2000-2002 Netfilter core team
ib0: no IPv6 routers present


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From robert.j.woodruff at intel.com  Wed Sep  6 16:56:55 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Wed, 6 Sep 2006 16:56:55 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C9147CE@orsmsx418.amr.corp.intel.com>

 Robert Walsh wrote,
>> I rebuilt OFED from scratch with the patch, and ran successfully on
>> Intel MPI 2.0.1 with the refresh patch.  I could not get it to run on
>> Intel MPI 3.0b.  If you could verify that the fix you mentioned that is
>> in the 2.0.1 refresh patch also made it into 3.0b, I'd appreciate it.
>> If you have a later beta version you could send me, that would be great,
>> too.
>>
>> Regards,
>>  Robert.

I spoke with our MPI team lead and it is very likely that the fix that
is in 2.0.1-refresh did not make it into 3.0 beta, but it should be
in the 3.0 release, scheduled to be completed in a couple of weeks.

woody


From rjwalsh at pathscale.com  Wed Sep  6 17:16:09 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Wed, 06 Sep 2006 17:16:09 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691C9147CE@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691C9147CE@orsmsx418.amr.corp.intel.com>
Message-ID: <44FF64C9.4090608@pathscale.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> I spoke with our MPI team lead and it is very likely that the fix that
> is in 2.0.1-refresh did not make it into 3.0 beta, but it should be
> in the 3.0 release schedule to be completed in a couple of weeks.

OK then - I'll wait for that.

Regards,
 Robert.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRP9kyfzvnpzTd9fxAQJu/wf+PEjyS1xAKzmXD+oZJxUNNeaW7QpqKz3h
zc370m74yIWjI+8GianGN4VM6Zx4InPdsRbGNGTd+FRhmZvYDhuuo8VBQUDdAZdB
Tkm+PomDIWdftj8cWCsiah4UkhzRv//83TiIkGZ5+zk25qOvQ6VAW4fy6vpJhKvo
uTW9Sow/G/BAIuMZ8wwg5Jyz5kbYxDxr+21jzQ+nblM/6YdGVco3GI1/z/dXwK5V
JEPIEu4ZxExOU9yGqS/hculq2Z9WFyGTBYoll67KkhpOuLUxiCxCxStA8Z0x52fG
OIhL0vKYgiOWLZnxZONRsy89OR/mUV7SNZeOZVqJSqMh7SpeLWWYHQ==
=SRiy
-----END PGP SIGNATURE-----


From tom at opengridcomputing.com  Wed Sep  6 20:51:11 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Wed, 06 Sep 2006 22:51:11 -0500
Subject: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/:
 possible cleanups
In-Reply-To: <ada8xkydjk9.fsf@cisco.com>
Message-ID: <C125015F.84B2%tom@opengridcomputing.com>


Roland:

Is there anything we know about that is still unresolved at this point?
We've got a bunch of balls up in the air here and I want to make sure we
haven't dropped one.

Thanks,
Tom

On 9/5/06 5:39 PM, "Roland Dreier" <rdreier at cisco.com> wrote:

>     Steve> Its old debug code that isn't used anywhere.  It would be
>     Steve> nice to keep it around, but if you really don't want it,
>     Steve> nuke it...
> 
> No, that's fine, I'll leave it inside the #if 0.
> 
>  - R.


From dledford at redhat.com  Wed Sep  6 21:16:00 2006
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 07 Sep 2006 00:16:00 -0400
Subject: [openib-general] OpenSM/osm_log API: Use symbol versions rather
 than polluting namespace
In-Reply-To: <20060906151659.GC6928@mellanox.co.il>
References: <1157548884.12940.5019.camel@hal.voltaire.com>
	<20060906134206.GB6928@mellanox.co.il>
	<1157552070.12940.6861.camel@hal.voltaire.com>
	<20060906151659.GC6928@mellanox.co.il>
Message-ID: <1157602561.4652.53.camel@fc6.xsintricity.com>

On Wed, 2006-09-06 at 18:16 +0300, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> > 
> > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > Subject: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> > > > 
> > > > OpenSM/osm_log API: Rather than polluting the namespace with needless
> > > > symbols, use symbol versions and have a versioned osm_log_init rather
> > > > than adding osm_log_init_v2 as an additional API
> > > > 
> > > > This patch is intended to be applied to both trunk and 1.1 versions.
> > > > 
> > > > Signed-off-by: Doug Ledford <dledford at redhat.com>
> > > > Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> > > 
> > > This preserves the ABI, but would this not break the API?
> > 
> > Yes, this patch changes the API (in a most trivial way).
> 
> So all users need to change code or they won't compile against the new
> library?

Yes, and that is the correct way to handle this change.  I could see
leaving the whole log init change out entirely, but if it's going to go
in, this is the right way to do it.

> Not sure what do you mean by upward compatible. This API change does not seem to
> be backward compatible - won't it break building dependent applications?
> If so is not something you should do after code freeze.

APIs change.  Any app you can build can compensate.  The goal is to keep
apps that aren't recompiled working, and to make apps that are
recompiled compliant with the latest version of the function.


-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From dotanb at dev.mellanox.co.il  Wed Sep  6 22:31:40 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 07 Sep 2006 08:31:40 +0300
Subject: [openib-general] ibv_poll_cq after ibv_post_send does not work
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD22201FBD38B@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD22201FBD38B@wdtssmail01.eu.thmulti.com>
Message-ID: <44FFAEBC.9020109@dev.mellanox.co.il>

Hi Thomas.


Bub Thomas wrote:
>
> I’m still in the process of porting my gen1 code to gen2.
>
> As mentioned yesterday I can connect to a listener on the same machine 
> using libibcm.
>
> Doing this I have to do the ibv_modify_qp by myself to get the qp’s 
> from INIT via RTR to RTS on both sides.
>
> At least ibv_modify_qp does not complain after the connection has been
> made via libibcm.
>
> So my assumption is I have my two qp’s successfully connected.
>
> The first action after the connection is for the listener to wait on its
> receive cq for an IBV_WR_SEND done by the connector.
>
> Here is now the problem:
>
> · The listener never gets a completion
>
> · The connector doing the IBV_WR_SEND gets errors on the send cq:
> opcode=0x7f status=0x5 vendor_err=129 for the first IBV_WR_SEND and
> opcode=0x7f status=0xc vendor_err=129 for all subsequent attempts to
> send the data
>
> Is there anyone out there who can help me out to understand the error 
> codes and or to understand what is wrong?
>
> Thanks in advance from Germany
>
> Thomas
>
Which QP type do you use (RC/UC/UD)?

Do you get any completion on the connector side?

If you are using an RC QP, these are possible reasons for not getting any
completion in the CQ:

Did you post any RR (Receive Request) on the listener side?
rnr_retry = 7 means that on an RNR NAK the sender will retry indefinitely.

If timeout = 0 and the remote QP is not ready, there won't be any
retransmission.
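
For reading the numeric codes in the report above, here is a minimal sketch
in C. It assumes the status values follow enum ibv_wc_status from
<infiniband/verbs.h>; the table is copied by hand so that the example builds
without libibverbs, and the helper names wc_status_name and ack_timeout_usec
are made up for this illustration. The timeout helper assumes the usual
InfiniBand encoding of the QP "timeout" attribute (4.096 usec * 2^timeout,
with 0 meaning no timeout at all).

```c
#include <stddef.h>

/*
 * Names indexed by completion status. The order is assumed to mirror
 * enum ibv_wc_status in <infiniband/verbs.h>; it is copied here so the
 * sketch does not depend on libibverbs being installed.
 */
static const char *const wc_status_names[] = {
    "IBV_WC_SUCCESS",           /* 0x0 */
    "IBV_WC_LOC_LEN_ERR",       /* 0x1 */
    "IBV_WC_LOC_QP_OP_ERR",     /* 0x2 */
    "IBV_WC_LOC_EEC_OP_ERR",    /* 0x3 */
    "IBV_WC_LOC_PROT_ERR",      /* 0x4 */
    "IBV_WC_WR_FLUSH_ERR",      /* 0x5 */
    "IBV_WC_MW_BIND_ERR",       /* 0x6 */
    "IBV_WC_BAD_RESP_ERR",      /* 0x7 */
    "IBV_WC_LOC_ACCESS_ERR",    /* 0x8 */
    "IBV_WC_REM_INV_REQ_ERR",   /* 0x9 */
    "IBV_WC_REM_ACCESS_ERR",    /* 0xa */
    "IBV_WC_REM_OP_ERR",        /* 0xb */
    "IBV_WC_RETRY_EXC_ERR",     /* 0xc */
    "IBV_WC_RNR_RETRY_EXC_ERR", /* 0xd */
};

/* Map a numeric completion status to its symbolic name. */
static const char *wc_status_name(unsigned int status)
{
    size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

    return status < n ? wc_status_names[status] : "unknown status";
}

/*
 * Local ACK timeout in microseconds for the 5-bit "timeout" QP attribute:
 * 4.096 usec * 2^timeout. A value of 0 means "no timeout", which this
 * helper reports as 0.0 (as it does for out-of-range input).
 */
static double ack_timeout_usec(unsigned int timeout)
{
    if (timeout == 0 || timeout > 31)
        return 0.0;
    return 4.096 * (double)(1UL << timeout);
}
```

Under these assumptions, status=0x5 reads as IBV_WC_WR_FLUSH_ERR and
status=0xc as IBV_WC_RETRY_EXC_ERR, and a timeout value of 14 corresponds to
roughly 67 msec per attempt.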

Dotan


From bugzilla-daemon at openib.org  Wed Sep  6 23:04:17 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Wed,  6 Sep 2006 23:04:17 -0700 (PDT)
Subject: [openib-general] [Bug 222] ib_uverbs fails to load on ia64,
	OFED 1.1 - rc3
Message-ID: <20060907060417.DBC4C2283D4@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=222


tziporet at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from tziporet at mellanox.co.il  2006-09-06 23:04 -------
A fix was made to the way the page shift is calculated on Itanium.
It will be part of RC4.


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From sweitzen at cisco.com  Wed Sep  6 23:22:48 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 6 Sep 2006 23:22:48 -0700
Subject: [openib-general] Cisco SQA results so far for OFED 1.1 rc3
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023889E1@xmb-sjc-216.amer.cisco.com>

Testing is still continuing, we have not started testing RHEL4 U4, SLES
10, IPoIB HA, or SRP HA yet.
 
Some high points of rc3 (all testing done with Mellanox HCAs and Cisco
switches):

*	We have migrated from Intel 9.0 compilers to 9.1.
*	Are seeing 2 million msg/sec with MVAPICH.
*	Sinai 1.1.000 firmware fixes SDP scalability.
*	Open MPI 1.1.1 is working better than 1.1.
*	We see up to 3.5 Gb/sec max throughput with IPoIB on latest
Intel Xeon and AMD Opteron processors.

Many bug fixes have been tested:

*	193   OFED 1.1 rc1: openib-diags should not be linked with
opensm libs
*	197   OFED 1.1rc1: Open MPI fails on RHEL4 64-bit
*	74    OFED 1.0 rc4: Open MPI Pallas test hangs
*	109   OFED 1.0 rc5: SDP can't sustain 100+ concurrent SDP
connections (mem leak?)
*	101   OFED 1.0: need documentation on openib-diags
*	80    OFED 1.0 rc4: Open MPI fails on RHEL4 U3 ppc64
*	135   OFED 1.0: MVAPICH doesn't work on RHEL4 U3 ppc64
*	176   OFED 1.0: mpicc fails with Intel C on RHEL4 IA64
*	179   move /usr/local/ofed/sbin binaries to /usr/local/ofed/bin
*	103   OFED 1.0: change ibutils to not depend on opensm

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ofed_sqa_results.xls
Type: application/vnd.ms-excel
Size: 81408 bytes
Desc: ofed_sqa_results.xls
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060906/1e754dbe/attachment.xls>

From mst at mellanox.co.il  Wed Sep  6 23:22:43 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 7 Sep 2006 09:22:43 +0300
Subject: [openib-general] OpenSM/osm_log API: Use symbol versionsrather
 than polluting namespace
In-Reply-To: <1157602561.4652.53.camel@fc6.xsintricity.com>
References: <1157602561.4652.53.camel@fc6.xsintricity.com>
Message-ID: <20060907062243.GH6928@mellanox.co.il>

Quoting r. Doug Ledford <dledford at redhat.com>:
> Subject: Re: [openib-general] OpenSM/osm_log API: Use symbol versionsrather than polluting namespace
> 
> On Wed, 2006-09-06 at 18:16 +0300, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> > > 
> > > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> > > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > > Subject: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> > > > > 
> > > > > OpenSM/osm_log API: Rather than polluting the namespace with needless
> > > > > symbols, use symbol versions and have a versioned osm_log_init rather
> > > > > than adding osm_log_init_v2 as an additional API
> > > > > 
> > > > > This patch is intended to be applied to both trunk and 1.1 versions.
> > > > > 
> > > > > Signed-off-by: Doug Ledford <dledford at redhat.com>
> > > > > Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> > > > 
> > > > This preserves the ABI, but would this not break the API?
> > > 
> > > Yes, this patch changes the API (in a most trivial way).
> > 
> > So all users need to change code or they won't compile against the new
> > library?
> 
> Yes, and that is the correct way to handle this change.

I disagree.

In my opinion, asking all users to add a parameter they don't care about is
worse than having multiple functions with a convenient set of options.  And if
there is a low cost way to help apps compile without code change, I don't see
why it makes sense to create work.

Even if this were a good idea, I don't think introducing a flag day for all
users without warning is a good way to extend library APIs. I would expect at
least one release where both new and old functions are available.

> I could see
> leaving the whole log init change out entirely, but if it's going to go
> in, this is the right way to do it.

Maybe it should be left out. Whether the issue this addresses is critical
for release is Hal's call. But if the change affects other modules
I think it's clear we won't be able to take the fix.

> > Not sure what do you mean by upward compatible. This API change does not seem to
> > be backward compatible - won't it break building dependent applications?
> > If so is not something you should do after code freeze.
> 
> APIs change.

APIs should not change with every release.

> Any app you can build can compensate.

Sure it seems simple if you are RedHat and rebuild the whole OS.
However, let us look at an application vendor that cares about portability.
What this "trivial" change involves is:

1. add a configure hook to check library version installed
2. define an appropriate macro
3. add a wrapper in header file to call the appropriate function
4. update the application to use the wrapper instead of the function
   directly

All this after a supposed code freeze.
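
A rough sketch in C of what steps 2-4 above could look like on the
application side. Every name here (DEMO_LOG_INIT_HAS_EXTRA_ARG,
demo_log_init, app_log_init, app_start) is hypothetical and stands in for
the real library call; the stub bodies only exist so that the example builds
on its own, and the macro that a configure check (step 1) would normally
define is set by hand.

```c
#include <stddef.h>

/*
 * Step 2: the macro a configure check (step 1) would define. It is set by
 * hand here so the sketch builds without a configure script.
 */
#ifndef DEMO_LOG_INIT_HAS_EXTRA_ARG
#define DEMO_LOG_INIT_HAS_EXTRA_ARG 1
#endif

/*
 * Hypothetical stand-in for a library call whose signature changed between
 * releases. Only one of the two definitions exists in any given library
 * version. Returns 0 on success, -1 on failure.
 */
#if DEMO_LOG_INIT_HAS_EXTRA_ARG
static int demo_log_init(const char *path, int extra_flag)
{
    return (path != NULL && extra_flag >= 0) ? 0 : -1;
}
#else
static int demo_log_init(const char *path)
{
    return path != NULL ? 0 : -1;
}
#endif

/* Step 3: wrapper that hides the signature difference from the application. */
static int app_log_init(const char *path)
{
#if DEMO_LOG_INIT_HAS_EXTRA_ARG
    /* Pass a default for the argument the application does not care about. */
    return demo_log_init(path, 0);
#else
    return demo_log_init(path);
#endif
}

/* Step 4: application code calls the wrapper, never the library directly. */
static int app_start(void)
{
    return app_log_init("demo.log");
}
```

The point of the sketch is only the amount of plumbing: the application
itself never wanted the extra argument, yet it has to carry the macro and
the wrapper for as long as both library versions are in the field.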

> The goal is to keep
> apps that aren't recompiled working, and to make apps that are
> recompiled compliant with the latest version of the function.

We are past code freeze. I agree with Hal that it might be hard to draw a line
between a critical and a non-critical bugfix. However, an API change
that
1. is purely cosmetic
2. requires code changes in dependent applications
3. is not uncontroversial
is, for me, obviously beyond that line.

-- 
MST


From eli at dev.mellanox.co.il  Wed Sep  6 23:28:58 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Thu, 07 Sep 2006 09:28:58 +0300
Subject: [openib-general] PXE + infiniband?
In-Reply-To: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7A8@CINMLVEM11.e2k.ad.ge.com>
References: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7A8@CINMLVEM11.e2k.ad.ge.com>
Message-ID: <1157610538.30038.35.camel@localhost>

On Fri, 2006-09-01 at 14:51 -0400, Cain, Brian (GE Healthcare) wrote:
> A while back
> (http://openib.org/pipermail/openib-general/2005-September/010801.html)
> there was mention of putting PXE stuff on an HCA.  Has anyone done this
> with PXELINUX?  It doesn't seem like it's as straightforward as just
> putting the stock PXELINUX image on your HCA.  I'm assuming this image
> would have to recognize the HCA and bring up IPoIB in order to use the
> conventional TFTP transport?

There is an implementation of PXE for Mellanox's HCAs that can be found
here: http://sourceforge.net/forum/forum.php?forum_id=494529


From paul.baxter at dsl.pipex.com  Thu Sep  7 00:19:15 2006
From: paul.baxter at dsl.pipex.com (Paul Baxter)
Date: Thu, 7 Sep 2006 08:19:15 +0100
Subject: [openib-general] PXE + infiniband?
References: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7A8@CINMLVEM11.e2k.ad.ge.com>
	<1157610538.30038.35.camel@localhost>
Message-ID: <003e01c6d24d$f19caed0$8000a8c0@blorp>

> There is an implementation of PXE for Mellanox's HCAs that can be found
> here: http://sourceforge.net/forum/forum.php?forum_id=494529

Thanks for the tip

I, too, am interested in this.

Do you have a more direct link as I wandered around etherboot's project site 
and couldn't find anything IB-specific.

Paul Baxter 


From paul.baxter at dsl.pipex.com  Thu Sep  7 00:28:39 2006
From: paul.baxter at dsl.pipex.com (Paul Baxter)
Date: Thu, 7 Sep 2006 08:28:39 +0100
Subject: [openib-general] PXE + infiniband?
References: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7A8@CINMLVEM11.e2k.ad.ge.com>
	<1157610538.30038.35.camel@localhost>
	<003e01c6d24d$f19caed0$8000a8c0@blorp>
Message-ID: <005701c6d24f$3cf2c3f0$8000a8c0@blorp>

>> There is an implementation of PXE for Mellanox's HCAs that can be found
>> here: http://sourceforge.net/forum/forum.php?forum_id=494529
>
> Thanks for the tip
>
> I, too, am interested in this.
>
> Do you have a more direct link as I wandered around etherboot's project 
> site
> and couldn't find anything IB-specific.


I must have been having a 'special moment' before, because I couldn't find 
the mailing lists

Here they are!

http://sourceforge.net/search/?ml_name=etherboot-developers&type_of_search=mlists&group_id=4233&words=infiniband 


From dotanb at dev.mellanox.co.il  Thu Sep  7 01:01:53 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 07 Sep 2006 11:01:53 +0300
Subject: [openib-general] [librdmacm] execuation of the the test udaddy
 is failing
In-Reply-To: <000201c6d1d2$625590f0$51c8180a@amr.corp.intel.com>
References: <000201c6d1d2$625590f0$51c8180a@amr.corp.intel.com>
Message-ID: <44FFD1F1.7050204@dev.mellanox.co.il>

Sean Hefty wrote:
>> # udaddy
>> udaddy: starting server
>> librdmacm: Kernel ABI does not support requested port space.
>> udaddy: listen request failed
>> test complete
>> return status -93
>>     
>
> UD QP and multicast support requires kernel ABI version 2.  It appears that the
> kernel version running is 1.
>
>   
Thanks, that was the problem.
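
For anyone hitting the same error, a small C sketch of checking the version
before running the test. The sysfs path is an assumption about where the
rdma_cm module exposes its ABI version and should be verified on the system
in question; the minimum of 2 is taken from Sean's reply above, and the
helper names are invented for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed sysfs location of the rdma_cm ABI version; verify on your system. */
#define RDMA_CM_ABI_PATH "/sys/class/misc/rdma_cm/abi_version"

/* Minimum kernel ABI version for UD QP and multicast, per the reply above. */
#define MIN_ABI_FOR_UD 2

/* Parse ABI version text such as "2\n"; returns -1 if no number is found. */
static long parse_abi_version(const char *text)
{
    char *end;
    long version = strtol(text, &end, 10);

    return end == text ? -1 : version;
}

/* Non-zero if the given ABI version is new enough for UD QP support. */
static int abi_supports_ud(long version)
{
    return version >= MIN_ABI_FOR_UD;
}

/* Read the ABI version from a file; returns -1 if it cannot be read. */
static long read_abi_version(const char *path)
{
    char buf[16];
    long version = -1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) != NULL)
        version = parse_abi_version(buf);
    fclose(f);
    return version;
}
```

A caller would pass RDMA_CM_ABI_PATH to read_abi_version() and refuse to
start the UD test when abi_supports_ud() returns 0.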

Dotan


From dotanb at dev.mellanox.co.il  Thu Sep  7 01:11:38 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 07 Sep 2006 11:11:38 +0300
Subject: [openib-general] libibcm can't connect/talk to libicm on other
 machine.
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD22201FBD389@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD22201FBD389@wdtssmail01.eu.thmulti.com>
Message-ID: <44FFD43A.4020108@dev.mellanox.co.il>

Bub Thomas wrote:
> Dotan,
> the ibv_rc_pingpong example works for me so I can exclude the
> architecture.
> I never got the libibcm example compiled.
> Which is your example and which architecture x86 vs. x86_64 did you
> compile it for?
> Can you share your libibcm the example code? (if it is not the standard
> that I can't get compiled)
> Thomas
>   
I started to modify the qp_test (a test that can be found in 
https://openib.org/svn/trunk/contrib/mellanox/ibtp/gen2/userspace/useraccess/qp_test/).
Here is the main file that deals with libibcm.

I'm sorry, but if you add this file to the qp_test it won't compile
(because of some other changes in the code).
When I finish cleaning up the code I will commit the full version (with
the libibcm support) to the openib svn.

I hope that this code will help you ...
Dotan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: connect_qp.c
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060907/0a63e22a/attachment.c>

From moshek at voltaire.com  Thu Sep  7 02:31:59 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Thu, 7 Sep 2006 12:31:59 +0300
Subject: [openib-general] getting LOC_QP_OP_ERR with IPoIB - mstflint
 question
Message-ID: <D4F8F0B3820E754C887699BEF26A8940EB856E@taurus.voltaire.com>


Let's assume that the HCA has the wrong firmware, or that something else
causes the driver load to fail.

We have to check what's going on in this case, and mstflint is one of
our tools for that.

Moshe.


____________________________________________________________
Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
 
Voltaire - The Grid Backbone
 
 www.voltaire.com

  
-----Original Message-----
From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] 
Sent: Wednesday, September 06, 2006 4:25 PM
To: Moshe Kazir
Cc: Or Gerlitz; Roland Dreier; openib-general at openib.org; Yiftah Shahar;
Tseng-hui Lin
Subject: Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB -
mstflint question


Quoting r. Moshe Kazir <moshek at voltaire.com>:
> Is it time to create a  work arround that opens /proc/bus/pci/ .... 
> And always work ?

But why isn't the driver loaded?

-- 
MST


From tziporet at dev.mellanox.co.il  Thu Sep  7 03:50:11 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Thu, 07 Sep 2006 13:50:11 +0300
Subject: [openib-general] [PATCH] Reduce packet loss in receive path,
 OFED 1.1
In-Reply-To: <1157583252.22887.62.camel@sardonyx>
References: <1157583252.22887.62.camel@sardonyx>
Message-ID: <44FFF963.6030508@dev.mellanox.co.il>

Bryan O'Sullivan wrote:
> Hi, Tziporet -
>
> This is another patch for RC4, which reduces the likelihood of packet
> loss when the receiver is being saturated with packets.  Please apply.
>
>   
this patch is in for RC4
Tziporet


From ishai at mellanox.co.il  Thu Sep  7 04:05:39 2006
From: ishai at mellanox.co.il (Ishai Rabinovitz)
Date: Thu, 7 Sep 2006 14:05:39 +0300
Subject: [openib-general] FW: OFED 1.1 rc3 srp driver panic
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E302D86A99@mtlexch01.mtl.com>

Here is an Oops in ib_srp

________________________________

From: Dachepalli, Sudhir [mailto:Sudhir.Dachepalli at lsil.com] 
Sent: Tuesday, September 05, 2006 11:09 PM
To: Vu Pham
Cc: Richard, Bill
Subject: OFED 1.1 rc3 srp driver panic


Hello Vu,
 
I am trying to integrate MPP and OFED 1.1 rc3 srp.  
 
Status on the following 2 issues:

*	New host number allocation for controller offline / online - MPP
will handle this without the need to run hot_add. We need to use
srp_daemon.
*	SCSI error handler invocation - we need to figure out how to
cleanly exit the error handler after cleaning up all the I/Os - THIS
IS THE BIGGEST ISSUE NOW.

 
Panic
 
 I noticed the following panic when I performed sysreboot on controller
A while IO is going on :
 
ib_srp: SRP reset_host called
ib_srp: QP event 1
ib_srp: QP event 1
ib_srp: connection closed
Unable to handle kernel NULL pointer dereference at 0000000000000000
RIP:
[<0000000000000000>]
PML4 214f0d067 PGD 214657067 PMD 0
Oops: 0010 [1] SMP
CPU 1
Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc
rdma_ucm(U) rdma_cm(U) ib_addr(U) ib_srp(U) ds yenta_socket pcmcia_core
dm_mirror dm_mod button battery ac md5 ipv6 uhci_hcd ehci_hcd ib_mthca(U)
ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U)
e1000 ext3 jbd mppVhba(U) qla2400 qla2322 qla2xxx scsi_transport_fc
mptscsih mptsas mptspi mptfc mptscsi mptbase ata_piix libata mppUpper(U)
sg sd_mod scsi_mod
Pid: 4991, comm: scsi_eh_7 Not tainted 2.6.9-34.ELsmp
RIP: 0010:[<0000000000000000>] [<0000000000000000>]
RSP: 0018:000001021202dd70  EFLAGS: 00010006
RAX: 0000010210234100 RBX: 00000102114b0a28 RCX: 00000102114b08a0
RDX: 00000102114b08b0 RSI: 00000102114b0a28 RDI: 0000010210234100
RBP: 00000102114b0c08 R08: 0000000000000000 R09: 0000000210d6c000
R10: 00000102114b03c8 R11: 00000102114b03c8 R12: 00000102114b03c8
R13: 000001021202dee8 R14: 00000102114b0000 R15: 000001021202ded8
FS:  0000002a9589a760(0000) GS:ffffffff804d7b80(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000cfe58000 CR4: 00000000000006e0
Process scsi_eh_7 (pid: 4991, threadinfo 000001021202c000, task
0000010211e9b030)
Stack: ffffffffa02a1792 00000102114b08b0 00000102114b03c8
0000000000000000
       ffffffffa02a18bf 0000003000000008 000001021202de78
000001021202ddb8
       0000010211e9b030 ffffffff801333c8
Call Trace:<ffffffffa02a1792>{:ib_srp:srp_reset_req+37}
<ffffffffa02a18bf>{:ib_srp:srp_reconnect_target+288}
       <ffffffff801333c8>{default_wake_function+0}
<ffffffff801e6fb2>{kobject_release+0}
       <ffffffffa02a2fdb>{:ib_srp:srp_reset_host+51}
<ffffffffa02a2fe3>{:ib_srp:srp_reset_host+59}
       <ffffffffa0005943>{:scsi_mod:scsi_try_host_reset+118}
       <ffffffffa00062f9>{:scsi_mod:scsi_error_handler+2347}
       <ffffffff80110e17>{child_rip+8}
<ffffffffa00059ce>{:scsi_mod:scsi_error_handler+0}
       <ffffffff80110e0f>{child_rip+0}
 
Code:  Bad RIP value.
RIP [<0000000000000000>] RSP <000001021202dd70>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Oops
 
 
Sudhir Dachepalli 
Engenio Storage Group 
LSI Logic Corporation 
12331 Riata Trace Parkway 
Suite B200 
Austin , Texas   78727 
512 794 3706 phone
512 794 3702 fax
sudhir.dachepalli at lsil.com 
www.lsilogic.com/engenio
 
 

From tziporet at mellanox.co.il  Thu Sep  7 04:24:05 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Thu, 7 Sep 2006 14:24:05 +0300
Subject: [openib-general] [openfabrics-ewg] Cisco SQA results so far for
	OFED 1.1 rc3
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA787B@mtlexch01.mtl.com>

Hi Scott,

Thanks for the detailed report.

This is the RC4 status of the bugs that are not Cisco specific (tvflash,
for example, is Cisco specific):

219       OFED 1.1rc3 contains prerelease unstable libibverbs code - fixed

221       SRP on V40Z and Sun T4 gets Kernel BUG at spinlock:118 - open;
          Roland should work on this

222       ib_uverbs fails to load on ia64, OFED 1.1 - rc3 (opened by Bob
          Woodruff, but we saw it too) - fixed

 
Tziporet

 

From ishai at dev.mellanox.co.il  Thu Sep  7 04:18:22 2006
From: ishai at dev.mellanox.co.il (ishai at dev.mellanox.co.il)
Date: Thu, 7 Sep 2006 14:18:22 +0300 (IDT)
Subject: [openib-general] FW: OFED 1.1 rc3 srp driver panic
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E302D86A99@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E302D86A99@mtlexch01.mtl.com>
Message-ID: <58551.194.90.237.34.1157627902.squirrel@dev.mellanox.co.il>

I think I found the race that causes this NULL Dereference.

1) There is a connection error.

2) srp_completion gets bad status and schedules a call to srp_reconnect_work.

3) srp_reconnect_work is scheduled to run and calls srp_reconnect_target.

4) srp_reconnect_target starts to run, changes the target state to
SRP_TARGET_CONNECTING but there is a context switch before it gets to
execute srp_reset_req.

5) The SCSI error handler calls srp_reset_host.

6) srp_reset_host calls srp_reconnect_target that returns -EAGAIN
(because the target state is not SRP_TARGET_LIVE).

7) srp_reset_host returns FAILED and therefore the device goes offline.

8) Because the device goes offline the commands are being freed (In the
scsi mid-layer).

9) The first execution of srp_reconnect_target resumes and calls
srp_reset_req, which tries to access the commands that were freed.

10) NULL deref.
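
The interleaving above can be replayed with a small single-threaded model in
C. This is only a sketch of the sequence of events, not driver code: the
struct, the two-phase split of the reconnect, and all function names are
simplifications invented for the illustration, and "freed" commands are
tracked with a flag instead of real memory.

```c
#include <errno.h>
#include <stdbool.h>

enum target_state { TARGET_LIVE, TARGET_CONNECTING };

struct model {
    enum target_state state;
    bool commands_freed; /* mid-layer took the device offline, freed commands */
    bool touched_freed;  /* the reconnect path walked over freed commands */
};

/* First half of a reconnect: claim the target (step 4, before preemption). */
static int reconnect_begin(struct model *m)
{
    if (m->state != TARGET_LIVE)
        return -EAGAIN;
    m->state = TARGET_CONNECTING;
    return 0;
}

/* Second half of a reconnect: reset outstanding requests (step 9). */
static void reconnect_finish(struct model *m)
{
    if (m->commands_freed)
        m->touched_freed = true; /* this is where the real driver crashes */
    m->state = TARGET_LIVE;
}

/* Error handler path (steps 5-8). Returns true on success. */
static bool reset_host(struct model *m)
{
    if (reconnect_begin(m) != 0) {
        /* FAILED: device goes offline and its commands are freed. */
        m->commands_freed = true;
        return false;
    }
    reconnect_finish(m);
    return true;
}

/* Replay steps 1-10: the work item is preempted between its two halves. */
static bool replay_race(void)
{
    struct model m = { TARGET_LIVE, false, false };

    reconnect_begin(&m);  /* steps 2-4: work item starts, then is preempted */
    reset_host(&m);       /* steps 5-8: error handler fails, commands freed */
    reconnect_finish(&m); /* step 9: work item resumes */
    return m.touched_freed;
}

/* Same error, but with the error handler as the only reconnect path. */
static bool replay_without_work(void)
{
    struct model m = { TARGET_LIVE, false, false };

    reset_host(&m);
    return m.touched_freed;
}
```

In this model replay_race() ends with the freed commands being touched,
while replay_without_work() does not, which is the reason to look at
whether two independent reconnect paths are really needed.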

Ishai


From mst at mellanox.co.il  Thu Sep  7 05:00:01 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 7 Sep 2006 15:00:01 +0300
Subject: [openib-general] [PATCH] IB/srp: don't schedule reconnect from srp,
 scsi does it for us (was Re: FW: OFED 1.1 rc3 srp driver panic)
In-Reply-To: <58551.194.90.237.34.1157627902.squirrel@dev.mellanox.co.il>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E302D86A99@mtlexch01.mtl.com>
	<58551.194.90.237.34.1157627902.squirrel@dev.mellanox.co.il>
Message-ID: <20060907120001.GO6928@mellanox.co.il>

Quoting r. ishai at dev.mellanox.co.il <ishai at dev.mellanox.co.il>:
> Subject: Re: FW: OFED 1.1 rc3 srp driver panic
> 
> I think I found the race that causes this NULL Dereference.
> 
> 1) There is a connection error.
> 
> 2) srp_completion gets bad status and schedules a call to srp_reconnect_work.
> 
> 3) srp_reconnect_work is scheduled to run and calls srp_reconnect_target.
> 
> 4) srp_reconnect_target starts to run, changes the target state to
> SRP_TARGET_CONNECTING but there is a context switch before it gets to
> execute srp_reset_req.
> 
> 5) The scsi error handling calls to srp_reset_host.
> 
> 6) srp_reset_host calls srp_reconnect_target that returns -EAGAIN
> (because the target state is not SRP_TARGET_LIVE).
> 
> 7) srp_reset_host returns FAILED and therefore the device goes offline.
> 
> 8) Because the device goes offline the commands are being freed (In the
> scsi mid-layer).
> 
> 9) The first execution of srp_reconnect_target resumes and calls to
> srp_reset_req that tries to access the commands that were freed.
> 
> 10) NULL deref.
> 
> Ishai

It seems that we don't really need to schedule srp_reconnect_work on
error since it will be called later anyway.
So it seems we can address these crashes and simplify srp
in the following way:

---

IB/srp: don't schedule reconnect from srp, scsi does it for us

If there is a problem in the connection, the scsi mid-layer will
eventually call reset_host that will call srp_reconnect, so we
do not need to schedule srp_reconnect_work from srp_completion.

Removing this prevents srp_reset_host from failing if srp_completion
was in progress, which in turn was causing crashes as both scsi
midlayer and srp_reconnect were cancelling commands.

Signed-off-by: Ishai Rabinovitz <ishai at mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===================================================================
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c	2006-09-06 15:37:50.000000000 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c	2006-09-07 11:16:28.000000000 +0300
@@ -799,13 +799,6 @@ static void srp_process_rsp(struct srp_t
 	spin_unlock_irqrestore(target->scsi_host->host_lock, flags);
 }
 
-static void srp_reconnect_work(void *target_ptr)
-{
-	struct srp_target_port *target = target_ptr;
-
-	srp_reconnect_target(target);
-}
-
 static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
 {
 	struct srp_iu *iu;
@@ -858,7 +851,6 @@ static void srp_completion(struct ib_cq 
 {
 	struct srp_target_port *target = target_ptr;
 	struct ib_wc wc;
-	unsigned long flags;
 
 	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 	while (ib_poll_cq(cq, 1, &wc) > 0) {
@@ -866,10 +858,6 @@ static void srp_completion(struct ib_cq 
 			printk(KERN_ERR PFX "failed %s status %d\n",
 			       wc.wr_id & SRP_OP_RECV ? "receive" : "send",
 			       wc.status);
-			spin_lock_irqsave(target->scsi_host->host_lock, flags);
-			if (target->state == SRP_TARGET_LIVE)
-				schedule_work(&target->work);
-			spin_unlock_irqrestore(target->scsi_host->host_lock, flags);
 			break;
 		}
 
@@ -1724,8 +1712,6 @@ static ssize_t srp_create_target(struct 
 	target->scsi_host  = target_host;
 	target->srp_host   = host;
 
-	INIT_WORK(&target->work, srp_reconnect_work, target);
-
 	INIT_LIST_HEAD(&target->free_reqs);
 	INIT_LIST_HEAD(&target->req_queue);
 	for (i = 0; i < SRP_SQ_SIZE; ++i) {

-- 
MST


From vlad at mellanox.co.il  Thu Sep  7 05:05:08 2006
From: vlad at mellanox.co.il (Vladimir Sokolovsky)
Date: Thu, 07 Sep 2006 15:05:08 +0300
Subject: [openib-general] [openfabrics-ewg] OFED 1.1-rc2 is ready (how
 do I enable madeye)?
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023884A4@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023884A4@xmb-sjc-216.amer.cisco.com>
Message-ID: <45000AF4.8040103@mellanox.co.il>

Madeye build will be available in OFED-1.1-rc4.

To build madeye run:
    export OPENIB_PARAMS="--with-madeye-mod"

(or put it into the ofed.conf file for unattended installation)

Then run ./install.sh
(or ./install.sh -c openib.conf for unattended installation)

Regards,
Vladimir


Scott Weitzenkamp (sweitzen) wrote:
>> 5. Added Madeye utility
>>     
>
> How do I build madeye?  I don't see any reference to it to install.sh.
> Is there any documentation for madeye?
>
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>  
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>   


From Brian.Cain at ge.com  Thu Sep  7 06:32:10 2006
From: Brian.Cain at ge.com (Cain, Brian (GE Healthcare))
Date: Thu, 7 Sep 2006 09:32:10 -0400
Subject: [openib-general] PXE + infiniband?
In-Reply-To: <005701c6d24f$3cf2c3f0$8000a8c0@blorp>
Message-ID: <2376B63A5AF8564F8A2A2D76BC6DB033DC1296@CINMLVEM11.e2k.ad.ge.com>

> -----Original Message-----
> From: openib-general-bounces at openib.org 
> [mailto:openib-general-bounces at openib.org] On Behalf Of Paul Baxter
> Sent: Thursday, September 07, 2006 2:29 AM
> To: openib-general at openib.org; Eli cohen
> Subject: Re: [openib-general] PXE + infiniband?
> 
> >> There is an implementation of PXE for Mellanox's HCAs that can be found
> >> here: http://sourceforge.net/forum/forum.php?forum_id=494529
> >
> > Thanks for the tip
> >
> > I, too, am interested in this.
> >
> > Do you have a more direct link as I wandered around etherboot's project
> > site and couldn't find anything IB-specific.
> 
> 
> I must have been having a 'special moment' before, because I couldn't find
> the mailing lists
> 
> Here they are!
> 
> http://sourceforge.net/search/?ml_name=etherboot-developers&type_of_search=mlists&group_id=4233&words=infiniband

I was able to follow the procedure outlined in Eli's README and I
achieved some mixed results.  On one hand, lspci now shows "Expansion
ROM at ed700000 [disabled] [size=1M]" whereas it didn't indicate that
before ("disabled" means it's zeroed out, maybe?).  The BIOS seems to
confirm the whole disabled thing since it doesn't list the HCA in the
boot priority list.

After making this change, IPoIB seems to work via this HCA, but SRP
(initiation, anyways) no longer does.  "ibsrpdm -c" no longer produces
any output, even though I can see the target via the ibnetdiscover.
Accessing the SRP target from another host on the fabric works fine.

-Brian


From rdreier at cisco.com  Thu Sep  7 07:20:22 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 07 Sep 2006 07:20:22 -0700
Subject: [openib-general] [-mm patch] drivers/infiniband/hw/amso1100/:
 possible cleanups
In-Reply-To: <C125015F.84B2%tom@opengridcomputing.com> (Tom Tucker's
	message of "Wed, 06 Sep 2006 22:51:11 -0500")
References: <C125015F.84B2%tom@opengridcomputing.com>
Message-ID: <adak64fbvx5.fsf@cisco.com>

    Tom> Is there anything we know about that is still unresolved at
    Tom> this point?  We've got a bunch of balls up in the air here
    Tom> and I want to make sure we haven't dropped one.

nope, I think we're good.


From tom at opengridcomputing.com  Thu Sep  7 07:56:22 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Thu, 07 Sep 2006 09:56:22 -0500
Subject: [openib-general] RDMA CMA and C++
Message-ID: <1157640982.20399.5.camel@trinity.ogc.int>

Sean:

The user-mode cm header files don't have the C++ stuff to identify all
the declarations as C. The verbs.h file has it and works fine if you
wanted to copy it, but all you really need is ...

#ifdef __cplusplus
extern "C" {
#endif

		at the top and,

#ifdef __cplusplus
}
#endif

		at the bottom of the file.
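
For illustration only, here is a minimal sketch of the guard pattern in a
single file. The names demo_cm.h and demo_cm_add() are made up for this
example and are not taken from the cm headers. Note that the macro a C++
compiler predefines is spelled __cplusplus; a misspelled name in the #ifdef
is never defined, so the guard would silently compile away.

```c
/* demo_cm.h: hypothetical header, not part of libibcm or librdmacm. */
#ifndef DEMO_CM_H
#define DEMO_CM_H

#ifdef __cplusplus
extern "C" {
#endif

/* Declarations placed here get C linkage when included from C++. */
int demo_cm_add(int a, int b);

#ifdef __cplusplus
}
#endif

#endif /* DEMO_CM_H */

/* demo_cm.c: trivial definition so that the sketch builds on its own. */
int demo_cm_add(int a, int b)
{
	return a + b;
}
```

A C compiler never defines __cplusplus, so it sees only the plain
declaration; a C++ compiler sees the same declaration wrapped in
extern "C" and does not mangle the symbol name.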

Thanks,
Tom


From dotanb at dev.mellanox.co.il  Thu Sep  7 08:13:21 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 07 Sep 2006 18:13:21 +0300
Subject: [openib-general] RDMA CMA and C++
In-Reply-To: <1157640982.20399.5.camel@trinity.ogc.int>
References: <1157640982.20399.5.camel@trinity.ogc.int>
Message-ID: <45003711.3040108@dev.mellanox.co.il>

Tom Tucker wrote:
> Sean:
>
> The user-mode cm header files don't have the C++ stuff to identify all
> the declarations as C. The verbs.h file has it and works fine if you
> wanted to copy it, but all you really need is ...
>
> #ifdef __cplusplus
> extern "C" {
> #endif
>
> 		at the top and,
>
> #ifdef __cplusplus
> }
> #endif
>
> 		at the bottom of the file.
>
> Thanks,
> Tom
>   


Sean, please add those definitions to the libibcm header as well.

Dotan


From thomas.bub at thomson.net  Thu Sep  7 08:20:15 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Thu, 7 Sep 2006 17:20:15 +0200
Subject: [openib-general] libibcm can't connect/talk to libicm on other
 machine.
Message-ID: <B79FAF8BB536314E859EA1963CFFD22201FBD38D@wdtssmail01.eu.thmulti.com>


Sean,
Finally I could compile the cmpost example.
The solution was:
1.) Use OFED-1.1-rc3 instead of OFED-1.0.1. This resolved some missing
defines. As an "End-User" ;-) I'm not following the SVN tree but
installing releases.
2.) Add "#include <infiniband/sa.h>" to cmpost.c.

Now I can compile and use the example, at least on one machine.
The issues with client and server on one machine that I reported
yesterday are no longer visible either.
So I am now able to debug my connection establishment and the initial
data exchange.
Next week I can debug cmpost on two different machines; my second
machine has been stolen by a developer colleague until the middle of
next week. ;-)

Could the cmpost.c example, with the missing include from above added,
be integrated into the next OFED release?
Thanks
Thomas


-----Original Message-----
From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Sean Hefty
Sent: Tuesday, September 05, 2006 8:15 PM
To: Bub Thomas
Cc: openib-general at openib.org
Subject: Re: [openib-general] libibcm can't connect/talk to libicm on
other machine.

Bub Thomas wrote:
> Dotan,
> the ibv_rc_pingpong example works for me so I can exclude the
> architecture.
> I never got the libibcm example compiled.
> Which is your example and which architecture x86 vs. x86_64 did you
> compile it for?
> Can you share your libibcm example code? (if it is not the standard
> one that I can't get compiled)
> Thomas

Did you try applying the following patch?

http://openib.org/pipermail/openib-general/2006-August/025005.html

I should also mention that I have a version of cmpost that works with
the new libibsa, but I am waiting for the review of the kernel sa_query
changes before committing.

- Sean



From thomas.bub at thomson.net  Thu Sep  7 08:29:55 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Thu, 7 Sep 2006 17:29:55 +0200
Subject: [openib-general] ibv_poll_cq after ibv_post_send does not work
Message-ID: <B79FAF8BB536314E859EA1963CFFD222029AC431@wdtssmail01.eu.thmulti.com>


Dotan,
Find my answers inline.
Since I could get the cmpost example from Sean compiled and running I
will try to compare cmpost.c with my code and find the bugs in my code
this way.
I will keep your connect_qp example for the case that I can't find the
problems the other way.
Thanks
Thomas


Which QP do you use (RC/UC/UD)?

[Bub] RC

do you get any completion in the connector side?

[Bub] Only the errors

If you are using RC QP:
the reason for not getting any completion in the CQ is that

Did you post any RR (Receive Request) at the listener side?

[Bub] yes

rnr_retry = 7 means that in case of RNR retry there will be infinite
retries

[Bub] rnr_retry is at 4

if the timeout = 0 and the remote QP is not ready then there won't be
any retransmission.

[Bub] I have the timeout at 254
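
As a rough aid for reading the values above, here is a small sketch of how
these two QP attributes are usually interpreted. It assumes the InfiniBand
encoding in which the local ACK timeout is a 5-bit field meaning
4.096 usec * 2^timeout (with 0 meaning "wait forever") and rnr_retry is a
3-bit field in which 7 means "retry forever". The helper names are made up
for this example, and nothing here calls libibverbs.

```c
#include <stdint.h>

/* Assumed field limits: 5-bit timeout, 3-bit rnr_retry. */
#define DEMO_TIMEOUT_MAX        31u
#define DEMO_RNR_RETRY_INFINITE 7u

/*
 * Hypothetical helper: convert an encoded ACK timeout to nanoseconds.
 * Returns 0 for the special value 0 (no timeout, i.e. wait forever)
 * and -1 if the value does not fit the assumed 5-bit field.
 */
int64_t demo_ack_timeout_ns(unsigned int timeout)
{
	if (timeout > DEMO_TIMEOUT_MAX)
		return -1;
	if (timeout == 0)
		return 0;
	/* 4.096 usec == 4096 ns, doubled for each step of the encoding. */
	return (int64_t)4096 << timeout;
}

/* Hypothetical helper: nonzero if rnr_retry requests infinite retries. */
int demo_rnr_retry_is_infinite(unsigned int rnr_retry)
{
	return rnr_retry == DEMO_RNR_RETRY_INFINITE;
}
```

Under these assumptions a timeout of 14 is roughly 67 ms, and a value such
as 254 does not fit the field at all; how a particular provider treats an
out-of-range value is not something this sketch can tell.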

Dotan


From swise at opengridcomputing.com  Thu Sep  7 08:50:45 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 07 Sep 2006 10:50:45 -0500
Subject: [openib-general] missing dtest program evdtest.c
Message-ID: <1157644245.28308.55.camel@stevo-desktop>


Is dapl/test/dtest missing evdtest.c?  It's in the makefile...

Steve.


From toralf.foerster at gmx.de  Thu Sep  7 10:02:56 2006
From: toralf.foerster at gmx.de (Toralf =?iso-8859-1?q?F=F6rster?=)
Date: Thu, 7 Sep 2006 19:02:56 +0200
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
Message-ID: <200609071902.57379.toralf.foerster@gmx.de>

The compile test of the attached .config failed:
...

drivers/built-in.o: In function `iser_connect':
drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id'
drivers/infiniband/ulp/iser/iser_verbs.c:525: undefined reference to `rdma_resolve_addr'
drivers/built-in.o: In function `iscsi_transport_init':
drivers/scsi/scsi_transport_iscsi.c:1636: undefined reference to `netlink_register_notifier'
drivers/scsi/scsi_transport_iscsi.c:1640: undefined reference to `netlink_kernel_create'
drivers/scsi/scsi_transport_iscsi.c:1652: undefined reference to `sock_release'
drivers/scsi/scsi_transport_iscsi.c:1654: undefined reference to `netlink_unregister_notifier'
drivers/built-in.o: In function `iscsi_transport_exit':
drivers/scsi/scsi_transport_iscsi.c:1669: undefined reference to `sock_release'
drivers/scsi/scsi_transport_iscsi.c:1670: undefined reference to `netlink_unregister_notifier'
make: *** [.tmp_vmlinux1] Error 1

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.18-rc6-git1
# Thu Sep  7 18:29:08 2006
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
# CONFIG_EXPERIMENTAL is not set
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_SYSCTL=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_RELAY=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_UID16=y
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_RT_MUTEXES=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
# CONFIG_MODULES is not set

#
# Block layer
#
CONFIG_LBD=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_LSF=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
CONFIG_MPENTIUMM=y
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_HPET_TIMER=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
# CONFIG_X86_UP_APIC is not set
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_NONFATAL is not set
CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
CONFIG_X86_REBOOTFIXUPS=y
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y

#
# Firmware Drivers
#
# CONFIG_EDD is not set
# CONFIG_EFI_VARS is not set
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
CONFIG_PAGE_OFFSET=0xC0000000
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_RESOURCES_64BIT is not set
CONFIG_MATH_EMULATION=y
# CONFIG_MTRR is not set
CONFIG_EFI=y
CONFIG_BOOT_IOREMAP=y
CONFIG_REGPARM=y
CONFIG_SECCOMP=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_PHYSICAL_START=0x100000
# CONFIG_COMPAT_VDSO is not set

#
# Power management options (ACPI, APM)
#
CONFIG_PM=y
CONFIG_PM_LEGACY=y
# CONFIG_PM_DEBUG is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_SLEEP_PROC_FS=y
# CONFIG_ACPI_SLEEP_PROC_SLEEP is not set
# CONFIG_ACPI_AC is not set
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=y
CONFIG_ACPI_FAN=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_THERMAL=y
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
CONFIG_ACPI_DEBUG=y
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y

#
# APM (Advanced Power Management) BIOS Support
#
# CONFIG_APM is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# Bus options (PCI, PCMCIA, EISA, MCA, ISA)
#
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
CONFIG_PCI_GOMMCONFIG=y
# CONFIG_PCI_GODIRECT is not set
# CONFIG_PCI_GOANY is not set
CONFIG_PCI_MMCONFIG=y
# CONFIG_PCIEPORTBUS is not set
CONFIG_PCI_DEBUG=y
CONFIG_ISA_DMA_API=y
# CONFIG_ISA is not set
# CONFIG_MCA is not set
CONFIG_SCx200=y
CONFIG_SCx200HR_TIMER=y

#
# PCCARD (PCMCIA/CardBus) support
#
CONFIG_PCCARD=y
CONFIG_PCMCIA_DEBUG=y
CONFIG_PCMCIA=y
CONFIG_PCMCIA_IOCTL=y
# CONFIG_CARDBUS is not set

#
# PC-card bridges
#
# CONFIG_YENTA is not set
CONFIG_PD6729=y
# CONFIG_I82092 is not set
CONFIG_PCCARD_NONSTATIC=y

#
# PCI Hotplug Support
#

#
# Executable file formats
#
# CONFIG_BINFMT_ELF is not set
CONFIG_BINFMT_AOUT=y
# CONFIG_BINFMT_MISC is not set

#
# Networking
#
# CONFIG_NET is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_DEBUG_DRIVER=y
# CONFIG_SYS_HYPERVISOR is not set

#
# Connector - unified userspace <-> kernelspace linker
#

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
CONFIG_PARPORT=y
CONFIG_PARPORT_PC=y
CONFIG_PARPORT_SERIAL=y
# CONFIG_PARPORT_PC_PCMCIA is not set
CONFIG_PARPORT_NOT_PC=y
# CONFIG_PARPORT_GSC is not set
CONFIG_PARPORT_AX88796=y
CONFIG_PARPORT_1284=y

#
# Plug and Play support
#
# CONFIG_PNP is not set

#
# Block devices
#
CONFIG_BLK_DEV_FD=y
CONFIG_PARIDE=y
CONFIG_PARIDE_PARPORT=y

#
# Parallel IDE high-level drivers
#
# CONFIG_PARIDE_PD is not set
# CONFIG_PARIDE_PCD is not set
# CONFIG_PARIDE_PF is not set
# CONFIG_PARIDE_PT is not set
CONFIG_PARIDE_PG=y

#
# Parallel IDE protocol modules
#
CONFIG_PARIDE_ATEN=y
CONFIG_PARIDE_BPCK=y
CONFIG_PARIDE_BPCK6=y
CONFIG_PARIDE_COMM=y
CONFIG_PARIDE_DSTR=y
# CONFIG_PARIDE_FIT2 is not set
# CONFIG_PARIDE_FIT3 is not set
# CONFIG_PARIDE_EPAT is not set
CONFIG_PARIDE_EPIA=y
CONFIG_PARIDE_FRIQ=y
CONFIG_PARIDE_FRPW=y
# CONFIG_PARIDE_KBIC is not set
CONFIG_PARIDE_KTTI=y
CONFIG_PARIDE_ON20=y
# CONFIG_PARIDE_ON26 is not set
CONFIG_BLK_CPQ_DA=y
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
# CONFIG_BLK_DEV_LOOP is not set
CONFIG_BLK_DEV_SX8=y
# CONFIG_BLK_DEV_RAM is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CDROM_PKTCDVD is not set

#
# ATA/ATAPI/MFM/RLL support
#
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y

#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_IDE_SATA is not set
# CONFIG_BLK_DEV_HD_IDE is not set
CONFIG_BLK_DEV_IDEDISK=y
# CONFIG_IDEDISK_MULTI_MODE is not set
# CONFIG_BLK_DEV_IDECS is not set
# CONFIG_BLK_DEV_IDECD is not set
CONFIG_BLK_DEV_IDEFLOPPY=y
# CONFIG_BLK_DEV_IDESCSI is not set
CONFIG_IDE_TASK_IOCTL=y

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
# CONFIG_BLK_DEV_CMD640 is not set
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_OFFBOARD=y
CONFIG_BLK_DEV_GENERIC=y
CONFIG_BLK_DEV_RZ1000=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_IDEDMA_FORCED is not set
CONFIG_IDEDMA_PCI_AUTO=y
# CONFIG_IDEDMA_ONLYDISK is not set
CONFIG_BLK_DEV_AEC62XX=y
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
CONFIG_BLK_DEV_ATIIXP=y
CONFIG_BLK_DEV_CMD64X=y
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_CS5535 is not set
# CONFIG_BLK_DEV_HPT34X is not set
CONFIG_BLK_DEV_HPT366=y
CONFIG_BLK_DEV_SC1200=y
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_IT821X is not set
# CONFIG_BLK_DEV_NS87415 is not set
CONFIG_BLK_DEV_PDC202XX_OLD=y
# CONFIG_PDC202XX_BURST is not set
CONFIG_BLK_DEV_PDC202XX_NEW=y
# CONFIG_BLK_DEV_SVWKS is not set
CONFIG_BLK_DEV_SIIMAGE=y
# CONFIG_BLK_DEV_SIS5513 is not set
CONFIG_BLK_DEV_SLC90E66=y
CONFIG_BLK_DEV_TRM290=y
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_IDE_ARM is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_IVB is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_BLK_DEV_HD is not set

#
# SCSI device support
#
CONFIG_RAID_ATTRS=y
CONFIG_SCSI=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=y
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=y
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=y
CONFIG_CHR_DEV_SCH=y

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
CONFIG_SCSI_MULTI_LUN=y
# CONFIG_SCSI_CONSTANTS is not set
CONFIG_SCSI_LOGGING=y

#
# SCSI Transport Attributes
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
CONFIG_SCSI_ISCSI_ATTRS=y
CONFIG_SCSI_SAS_ATTRS=y

#
# SCSI low-level drivers
#
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
CONFIG_SCSI_3W_9XXX=y
# CONFIG_SCSI_ACARD is not set
CONFIG_SCSI_AACRAID=y
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_CMDS_PER_DEVICE=32
CONFIG_AIC7XXX_RESET_DELAY_MS=5000
# CONFIG_AIC7XXX_DEBUG_ENABLE is not set
CONFIG_AIC7XXX_DEBUG_MASK=0
# CONFIG_AIC7XXX_REG_PRETTY_PRINT is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
CONFIG_SCSI_AIC79XX=y
CONFIG_AIC79XX_CMDS_PER_DEVICE=32
CONFIG_AIC79XX_RESET_DELAY_MS=5000
# CONFIG_AIC79XX_ENABLE_RD_STRM is not set
CONFIG_AIC79XX_DEBUG_ENABLE=y
CONFIG_AIC79XX_DEBUG_MASK=0
# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
# CONFIG_SCSI_DPT_I2O is not set
CONFIG_SCSI_ADVANSYS=y
# CONFIG_MEGARAID_NEWGEN is not set
CONFIG_MEGARAID_LEGACY=y
CONFIG_MEGARAID_SAS=y
# CONFIG_SCSI_SATA is not set
CONFIG_SCSI_HPTIOP=y
CONFIG_SCSI_BUSLOGIC=y
# CONFIG_SCSI_OMIT_FLASHPOINT is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
CONFIG_SCSI_FUTURE_DOMAIN=y
CONFIG_SCSI_GDTH=y
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
CONFIG_SCSI_PPA=y
CONFIG_SCSI_IMM=y
# CONFIG_SCSI_IZIP_EPP16 is not set
CONFIG_SCSI_IZIP_SLOW_CTR=y
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
CONFIG_SCSI_QLOGIC_1280=y
CONFIG_SCSI_QLA_FC=y
# CONFIG_SCSI_LPFC is not set
CONFIG_SCSI_DC390T=y
CONFIG_SCSI_NSP32=y
# CONFIG_SCSI_DEBUG is not set

#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set

#
# Fusion MPT device support
#
# CONFIG_FUSION is not set
# CONFIG_FUSION_SPI is not set
# CONFIG_FUSION_FC is not set
# CONFIG_FUSION_SAS is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_IEEE1394 is not set

#
# I2O device support
#
CONFIG_I2O=y
CONFIG_I2O_LCT_NOTIFY_ON_CHANGES=y
CONFIG_I2O_EXT_ADAPTEC=y
CONFIG_I2O_CONFIG=y
CONFIG_I2O_CONFIG_OLD_IOCTL=y
CONFIG_I2O_BUS=y
CONFIG_I2O_BLOCK=y
CONFIG_I2O_SCSI=y
CONFIG_I2O_PROC=y

#
# ISDN subsystem
#

#
# Telephony Support
#
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_TSDEV is not set
CONFIG_INPUT_EVDEV=y
CONFIG_INPUT_EVBUG=y

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
CONFIG_KEYBOARD_LKKBD=y
CONFIG_KEYBOARD_XTKBD=y
CONFIG_KEYBOARD_NEWTON=y
# CONFIG_INPUT_MOUSE is not set
CONFIG_INPUT_JOYSTICK=y
CONFIG_JOYSTICK_ANALOG=y
# CONFIG_JOYSTICK_A3D is not set
# CONFIG_JOYSTICK_ADI is not set
# CONFIG_JOYSTICK_COBRA is not set
CONFIG_JOYSTICK_GF2K=y
# CONFIG_JOYSTICK_GRIP is not set
CONFIG_JOYSTICK_GRIP_MP=y
CONFIG_JOYSTICK_GUILLEMOT=y
CONFIG_JOYSTICK_INTERACT=y
# CONFIG_JOYSTICK_SIDEWINDER is not set
# CONFIG_JOYSTICK_TMDC is not set
# CONFIG_JOYSTICK_IFORCE is not set
# CONFIG_JOYSTICK_WARRIOR is not set
# CONFIG_JOYSTICK_MAGELLAN is not set
# CONFIG_JOYSTICK_SPACEORB is not set
CONFIG_JOYSTICK_SPACEBALL=y
# CONFIG_JOYSTICK_STINGER is not set
CONFIG_JOYSTICK_TWIDJOY=y
CONFIG_JOYSTICK_DB9=y
CONFIG_JOYSTICK_GAMECON=y
# CONFIG_JOYSTICK_TURBOGRAFX is not set
CONFIG_JOYSTICK_JOYDUMP=y
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
# CONFIG_INPUT_PCSPKR is not set
# CONFIG_INPUT_WISTRON_BTNS is not set
# CONFIG_INPUT_UINPUT is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
CONFIG_SERIO_PARKBD=y
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
CONFIG_GAMEPORT=y
CONFIG_GAMEPORT_NS558=y
# CONFIG_GAMEPORT_L4 is not set
# CONFIG_GAMEPORT_EMU10K1 is not set
CONFIG_GAMEPORT_FM801=y

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_COMPUTONE is not set
# CONFIG_ROCKETPORT is not set
CONFIG_CYCLADES=y
# CONFIG_DIGIEPCA is not set
# CONFIG_MOXA_INTELLIO is not set
CONFIG_MOXA_SMARTIO=y
CONFIG_ISI=y
CONFIG_SYNCLINK=y
CONFIG_SYNCLINKMP=y
CONFIG_SYNCLINK_GT=y
CONFIG_N_HDLC=y
# CONFIG_RISCOM8 is not set
# CONFIG_SPECIALIX is not set
CONFIG_SX=y
# CONFIG_RIO is not set
CONFIG_STALDRV=y
# CONFIG_STALLION is not set
# CONFIG_ISTALLION is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_PCI=y
# CONFIG_SERIAL_8250_CS is not set
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
CONFIG_PRINTER=y
# CONFIG_LP_CONSOLE is not set
# CONFIG_PPDEV is not set
# CONFIG_TIPAR is not set

#
# IPMI
#
CONFIG_IPMI_HANDLER=y
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=y
CONFIG_IPMI_SI=y
CONFIG_IPMI_WATCHDOG=y
# CONFIG_IPMI_POWEROFF is not set

#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_INTEL is not set
CONFIG_HW_RANDOM_AMD=y
CONFIG_HW_RANDOM_GEODE=y
CONFIG_HW_RANDOM_VIA=y
CONFIG_NVRAM=y
# CONFIG_RTC is not set
CONFIG_GEN_RTC=y
# CONFIG_GEN_RTC_X is not set
# CONFIG_DTLK is not set
CONFIG_R3964=y
# CONFIG_APPLICOM is not set

#
# Ftape, the floppy tape device driver
#
# CONFIG_FTAPE is not set
# CONFIG_AGP is not set
# CONFIG_DRM is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
# CONFIG_CARDMAN_4000 is not set
# CONFIG_CARDMAN_4040 is not set
CONFIG_MWAVE=y
# CONFIG_SCx200_GPIO is not set
# CONFIG_PC8736x_GPIO is not set
# CONFIG_NSC_GPIO is not set
CONFIG_CS5535_GPIO=y
# CONFIG_RAW_DRIVER is not set
CONFIG_HPET=y
# CONFIG_HPET_RTC_IRQ is not set
# CONFIG_HPET_MMAP is not set
# CONFIG_HANGCHECK_TIMER is not set

#
# TPM devices
#

#
# I2C support
#
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=y

#
# I2C Algorithms
#
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_ALGOPCF=y
# CONFIG_I2C_ALGOPCA is not set

#
# I2C Hardware Bus support
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_I801 is not set
CONFIG_I2C_I810=y
CONFIG_I2C_PIIX4=y
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_PARPORT is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
CONFIG_I2C_PROSAVAGE=y
# CONFIG_SCx200_ACB is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIAPRO is not set
# CONFIG_I2C_VOODOO3 is not set
# CONFIG_I2C_PCA_ISA is not set

#
# Miscellaneous I2C Chip support
#
# CONFIG_I2C_DEBUG_CORE is not set
CONFIG_I2C_DEBUG_ALGO=y
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set

#
# SPI support
#
# CONFIG_SPI is not set
# CONFIG_SPI_MASTER is not set

#
# Dallas's 1-wire bus
#

#
# Hardware Monitoring support
#
# CONFIG_HWMON is not set
# CONFIG_HWMON_VID is not set

#
# Misc devices
#

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set
CONFIG_VIDEO_V4L2=y

#
# Digital Video Broadcasting Devices
#

#
# Graphics support
#
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_MACMODES is not set
# CONFIG_FB_BACKLIGHT is not set
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
CONFIG_FB_CYBER2000=y
CONFIG_FB_ARC=y
CONFIG_FB_ASILIANT=y
CONFIG_FB_IMSTT=y
CONFIG_FB_VGA16=y
# CONFIG_FB_VESA is not set
# CONFIG_FB_IMAC is not set
# CONFIG_FB_HGA is not set
CONFIG_FB_S1D13XXX=y
# CONFIG_FB_NVIDIA is not set
CONFIG_FB_RIVA=y
CONFIG_FB_RIVA_I2C=y
CONFIG_FB_RIVA_DEBUG=y
CONFIG_FB_MATROX=y
# CONFIG_FB_MATROX_MILLENIUM is not set
CONFIG_FB_MATROX_MYSTIQUE=y
CONFIG_FB_MATROX_G=y
CONFIG_FB_MATROX_I2C=y
# CONFIG_FB_MATROX_MAVEN is not set
CONFIG_FB_MATROX_MULTIHEAD=y
CONFIG_FB_RADEON=y
# CONFIG_FB_RADEON_I2C is not set
CONFIG_FB_RADEON_DEBUG=y
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_SIS is not set
CONFIG_FB_NEOMAGIC=y
CONFIG_FB_KYRO=y
CONFIG_FB_3DFX=y
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_CYBLA is not set
CONFIG_FB_TRIDENT=y
CONFIG_FB_VIRTUAL=y

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
CONFIG_VIDEO_SELECT=y
CONFIG_DUMMY_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE is not set

#
# Logo configuration
#
# CONFIG_LOGO is not set
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set

#
# Sound
#
CONFIG_SOUND=y

#
# Advanced Linux Sound Architecture
#
CONFIG_SND=y
CONFIG_SND_TIMER=y
CONFIG_SND_PCM=y
CONFIG_SND_HWDEP=y
CONFIG_SND_RAWMIDI=y
CONFIG_SND_SEQUENCER=y
CONFIG_SND_SEQ_DUMMY=y
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=y
CONFIG_SND_PCM_OSS=y
# CONFIG_SND_PCM_OSS_PLUGINS is not set
# CONFIG_SND_SEQUENCER_OSS is not set
# CONFIG_SND_DYNAMIC_MINORS is not set
CONFIG_SND_SUPPORT_OLD_API=y
CONFIG_SND_VERBOSE_PROCFS=y
# CONFIG_SND_VERBOSE_PRINTK is not set
# CONFIG_SND_DEBUG is not set

#
# Generic devices
#
CONFIG_SND_MPU401_UART=y
CONFIG_SND_OPL3_LIB=y
CONFIG_SND_VX_LIB=y
CONFIG_SND_AC97_CODEC=y
CONFIG_SND_AC97_BUS=y
# CONFIG_SND_DUMMY is not set
CONFIG_SND_VIRMIDI=y
# CONFIG_SND_MTPAV is not set
# CONFIG_SND_SERIAL_U16550 is not set
CONFIG_SND_MPU401=y

#
# PCI devices
#
# CONFIG_SND_AD1889 is not set
CONFIG_SND_ALS300=y
CONFIG_SND_ALS4000=y
CONFIG_SND_ALI5451=y
# CONFIG_SND_ATIIXP is not set
CONFIG_SND_ATIIXP_MODEM=y
CONFIG_SND_AU8810=y
CONFIG_SND_AU8820=y
CONFIG_SND_AU8830=y
CONFIG_SND_BT87X=y
# CONFIG_SND_BT87X_OVERCLOCK is not set
# CONFIG_SND_CA0106 is not set
CONFIG_SND_CMIPCI=y
# CONFIG_SND_CS4281 is not set
# CONFIG_SND_CS46XX is not set
CONFIG_SND_CS5535AUDIO=y
# CONFIG_SND_DARLA20 is not set
# CONFIG_SND_GINA20 is not set
# CONFIG_SND_LAYLA20 is not set
CONFIG_SND_DARLA24=y
# CONFIG_SND_GINA24 is not set
CONFIG_SND_LAYLA24=y
# CONFIG_SND_MONA is not set
# CONFIG_SND_MIA is not set
# CONFIG_SND_ECHO3G is not set
CONFIG_SND_INDIGO=y
# CONFIG_SND_INDIGOIO is not set
CONFIG_SND_INDIGODJ=y
# CONFIG_SND_EMU10K1 is not set
CONFIG_SND_EMU10K1X=y
# CONFIG_SND_ENS1370 is not set
# CONFIG_SND_ENS1371 is not set
CONFIG_SND_ES1938=y
# CONFIG_SND_ES1968 is not set
# CONFIG_SND_FM801 is not set
# CONFIG_SND_HDA_INTEL is not set
CONFIG_SND_HDSP=y
CONFIG_SND_HDSPM=y
# CONFIG_SND_ICE1712 is not set
CONFIG_SND_ICE1724=y
CONFIG_SND_INTEL8X0=y
CONFIG_SND_INTEL8X0M=y
CONFIG_SND_KORG1212=y
CONFIG_SND_MAESTRO3=y
# CONFIG_SND_MIXART is not set
CONFIG_SND_NM256=y
# CONFIG_SND_PCXHR is not set
CONFIG_SND_RIPTIDE=y
CONFIG_SND_RME32=y
CONFIG_SND_RME96=y
# CONFIG_SND_RME9652 is not set
CONFIG_SND_SONICVIBES=y
CONFIG_SND_TRIDENT=y
# CONFIG_SND_VIA82XX is not set
# CONFIG_SND_VIA82XX_MODEM is not set
# CONFIG_SND_VX222 is not set
CONFIG_SND_YMFPCI=y

#
# PCMCIA devices
#
CONFIG_SND_VXPOCKET=y
# CONFIG_SND_PDAUDIOCF is not set

#
# Open Sound System
#
# CONFIG_SOUND_PRIME is not set

#
# USB support
#
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
# CONFIG_USB is not set

#
# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support'
#

#
# USB Gadget Support
#
CONFIG_USB_GADGET=y
CONFIG_USB_GADGET_DEBUG_FILES=y
CONFIG_USB_GADGET_SELECTED=y
CONFIG_USB_GADGET_NET2280=y
CONFIG_USB_NET2280=y
# CONFIG_USB_GADGET_PXA2XX is not set
# CONFIG_USB_GADGET_GOKU is not set
# CONFIG_USB_GADGET_LH7A40X is not set
# CONFIG_USB_GADGET_OMAP is not set
# CONFIG_USB_GADGET_AT91 is not set
# CONFIG_USB_GADGET_DUMMY_HCD is not set
CONFIG_USB_GADGET_DUALSPEED=y
# CONFIG_USB_ZERO is not set
# CONFIG_USB_ETH is not set
# CONFIG_USB_GADGETFS is not set
CONFIG_USB_FILE_STORAGE=y
CONFIG_USB_FILE_STORAGE_TEST=y
# CONFIG_USB_G_SERIAL is not set

#
# MMC/SD Card support
#
CONFIG_MMC=y
CONFIG_MMC_DEBUG=y
# CONFIG_MMC_BLOCK is not set
CONFIG_MMC_WBSD=y

#
# LED devices
#
# CONFIG_NEW_LEDS is not set

#
# LED drivers
#

#
# LED Triggers
#

#
# InfiniBand support
#
CONFIG_INFINIBAND=y
CONFIG_INFINIBAND_USER_MAD=y
CONFIG_INFINIBAND_USER_ACCESS=y
CONFIG_INFINIBAND_MTHCA=y
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_SRP=y
CONFIG_INFINIBAND_ISER=y

#
# EDAC - error detection and reporting (RAS) (EXPERIMENTAL)
#

#
# Real Time Clock
#

#
# DMA Engine support
#
CONFIG_DMA_ENGINE=y

#
# DMA Clients
#

#
# DMA Devices
#
CONFIG_INTEL_IOATDMA=y

#
# File systems
#
# CONFIG_EXT2_FS is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_JBD=y
CONFIG_JBD_DEBUG=y
CONFIG_REISERFS_FS=y
# CONFIG_REISERFS_CHECK is not set
CONFIG_REISERFS_PROC_INFO=y
# CONFIG_REISERFS_FS_XATTR is not set
CONFIG_JFS_FS=y
CONFIG_JFS_POSIX_ACL=y
# CONFIG_JFS_SECURITY is not set
# CONFIG_JFS_DEBUG is not set
# CONFIG_JFS_STATISTICS is not set
CONFIG_FS_POSIX_ACL=y
# CONFIG_XFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_INOTIFY is not set
# CONFIG_QUOTA is not set
CONFIG_DNOTIFY=y
CONFIG_AUTOFS_FS=y
CONFIG_AUTOFS4_FS=y
CONFIG_FUSE_FS=y

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_ZISOFS_FS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_RAMFS=y

#
# Miscellaneous filesystems
#
# CONFIG_HFSPLUS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
CONFIG_HPFS_FS=y
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
CONFIG_UFS_FS=y
# CONFIG_UFS_DEBUG is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
CONFIG_ACORN_PARTITION=y
# CONFIG_ACORN_PARTITION_CUMANA is not set
# CONFIG_ACORN_PARTITION_EESOX is not set
# CONFIG_ACORN_PARTITION_ICS is not set
# CONFIG_ACORN_PARTITION_ADFS is not set
# CONFIG_ACORN_PARTITION_POWERTEC is not set
CONFIG_ACORN_PARTITION_RISCIX=y
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
# CONFIG_MINIX_SUBPARTITION is not set
CONFIG_SOLARIS_X86_PARTITION=y
# CONFIG_UNIXWARE_DISKLABEL is not set
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
CONFIG_EFI_PARTITION=y

#
# Native Language Support
#
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
# CONFIG_NLS_CODEPAGE_437 is not set
CONFIG_NLS_CODEPAGE_737=y
CONFIG_NLS_CODEPAGE_775=y
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
CONFIG_NLS_CODEPAGE_855=y
CONFIG_NLS_CODEPAGE_857=y
CONFIG_NLS_CODEPAGE_860=y
CONFIG_NLS_CODEPAGE_861=y
CONFIG_NLS_CODEPAGE_862=y
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
CONFIG_NLS_CODEPAGE_865=y
# CONFIG_NLS_CODEPAGE_866 is not set
CONFIG_NLS_CODEPAGE_869=y
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
CONFIG_NLS_CODEPAGE_932=y
# CONFIG_NLS_CODEPAGE_949 is not set
CONFIG_NLS_CODEPAGE_874=y
CONFIG_NLS_ISO8859_8=y
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
# CONFIG_NLS_ASCII is not set
# CONFIG_NLS_ISO8859_1 is not set
CONFIG_NLS_ISO8859_2=y
# CONFIG_NLS_ISO8859_3 is not set
CONFIG_NLS_ISO8859_4=y
CONFIG_NLS_ISO8859_5=y
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
CONFIG_NLS_ISO8859_13=y
# CONFIG_NLS_ISO8859_14 is not set
CONFIG_NLS_ISO8859_15=y
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_UTF8 is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_MAGIC_SYSRQ=y
CONFIG_UNUSED_SYMBOLS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_LOG_BUF_SHIFT=15
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_SCHEDSTATS=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
CONFIG_DEBUG_SPINLOCK=y
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_RWSEMS is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_DEBUG_KOBJECT=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_VM=y
CONFIG_FRAME_POINTER=y
# CONFIG_UNWIND_INFO is not set
# CONFIG_FORCED_INLINING is not set
CONFIG_RCU_TORTURE_TEST=y
CONFIG_EARLY_PRINTK=y
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_RODATA is not set
CONFIG_4KSTACKS=y
CONFIG_DOUBLEFAULT=y

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_DEBUG_PROC_KEYS is not set
# CONFIG_SECURITY is not set

#
# Cryptographic options
#
CONFIG_CRYPTO=y
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_NULL is not set
CONFIG_CRYPTO_MD4=y
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_SHA1=y
# CONFIG_CRYPTO_SHA256 is not set
CONFIG_CRYPTO_SHA512=y
# CONFIG_CRYPTO_WP512 is not set
# CONFIG_CRYPTO_TGR192 is not set
# CONFIG_CRYPTO_DES is not set
# CONFIG_CRYPTO_BLOWFISH is not set
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_SERPENT=y
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_586=y
CONFIG_CRYPTO_CAST5=y
# CONFIG_CRYPTO_CAST6 is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_ARC4 is not set
CONFIG_CRYPTO_KHAZAD=y
# CONFIG_CRYPTO_ANUBIS is not set
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_MICHAEL_MIC=y
# CONFIG_CRYPTO_CRC32C is not set

#
# Hardware crypto devices
#
# CONFIG_CRYPTO_DEV_PADLOCK is not set

#
# Library routines
#
CONFIG_CRC_CCITT=y
# CONFIG_CRC16 is not set
CONFIG_CRC32=y
# CONFIG_LIBCRC32C is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_PLIST=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_KTIME_SCALAR=y


-------------------------------------------------------

-- 
MfG/Sincerely
Toralf Förster
-------------- next part --------------
The compile test of the attached .config failed:
...

drivers/built-in.o: In function `iser_connect':
drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id'
drivers/infiniband/ulp/iser/iser_verbs.c:525: undefined reference to `rdma_resolve_addr'
drivers/built-in.o: In function `iscsi_transport_init':
drivers/scsi/scsi_transport_iscsi.c:1636: undefined reference to `netlink_register_notifier'
drivers/scsi/scsi_transport_iscsi.c:1640: undefined reference to `netlink_kernel_create'
drivers/scsi/scsi_transport_iscsi.c:1652: undefined reference to `sock_release'
drivers/scsi/scsi_transport_iscsi.c:1654: undefined reference to `netlink_unregister_notifier'
drivers/built-in.o: In function `iscsi_transport_exit':
drivers/scsi/scsi_transport_iscsi.c:1669: undefined reference to `sock_release'
drivers/scsi/scsi_transport_iscsi.c:1670: undefined reference to `netlink_unregister_notifier'
make: *** [.tmp_vmlinux1] Error 1
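
The undefined references above are consistent with the posted .config: it has CONFIG_NET unset while CONFIG_INFINIBAND_ISER=y and CONFIG_SCSI_ISCSI_ATTRS=y, and the netlink and socket symbols live in the networking core. A minimal sketch of a checker for that kind of mismatch follows. The dependency table is an assumption made for illustration (it is not read from any Kconfig file, and the CONFIG_INET entry in particular is a guess about what the RDMA CM code needs), and the function names are hypothetical.

```python
import re

# Hypothetical dependency table, for illustration only. It is NOT derived
# from the kernel's Kconfig files; each enabled symbol maps to the symbols
# it is assumed to need.
ASSUMED_DEPENDENCIES = {
    "CONFIG_INFINIBAND_ISER": ["CONFIG_NET", "CONFIG_INET"],
    "CONFIG_SCSI_ISCSI_ATTRS": ["CONFIG_NET"],
}


def parse_config(text):
    """Return the set of symbols set to y or m in .config text."""
    enabled = set()
    for line in text.splitlines():
        match = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=(y|m)$", line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled


def find_unmet(text, dependencies=ASSUMED_DEPENDENCIES):
    """List (symbol, missing_dependencies) pairs for enabled symbols."""
    enabled = parse_config(text)
    problems = []
    for symbol, needed in sorted(dependencies.items()):
        if symbol not in enabled:
            continue
        missing = [dep for dep in needed if dep not in enabled]
        if missing:
            problems.append((symbol, missing))
    return problems


# A few lines in the same form as the posted .config.
SAMPLE = """\
# CONFIG_NET is not set
CONFIG_SCSI_ISCSI_ATTRS=y
CONFIG_INFINIBAND=y
CONFIG_INFINIBAND_ISER=y
"""

if __name__ == "__main__":
    for symbol, missing in find_unmet(SAMPLE):
        print(f"{symbol} is enabled but {', '.join(missing)} is not set")
```

Running find_unmet() over a full .config file only reports what the table lists, so it is a quick sanity check rather than a replacement for the Kconfig dependency rules themselves.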


From ardavis at ichips.intel.com  Thu Sep  7 11:06:55 2006
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Thu, 07 Sep 2006 11:06:55 -0700
Subject: [openib-general] missing dtest program evdtest.c
In-Reply-To: <1157644245.28308.55.camel@stevo-desktop>
References: <1157644245.28308.55.camel@stevo-desktop>
Message-ID: <45005FBF.5080902@ichips.intel.com>

Steve Wise wrote:

>Is dapl/test/dtest missing evdtest.c?  It's in the makefile...
>
>Steve.
>
>  
>
It was inadvertently included with the last update when I was testing 
the fix for dat_evd_set_unwaitable. I will update the makefile.


From or.gerlitz at gmail.com  Thu Sep  7 12:52:54 2006
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Thu, 7 Sep 2006 21:52:54 +0200
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
In-Reply-To: <200609071902.57379.toralf.foerster@gmx.de>
References: <200609071902.57379.toralf.foerster@gmx.de>
Message-ID: <15ddcffd0609071252o477eeabfl31366719d0d3d9f0@mail.gmail.com>

On 9/7/06, Toralf Förster <toralf.foerster at gmx.de> wrote:
> The compile test of the attached .config failed :
> ...
>
> drivers/built-in.o: In function `iser_connect':
> drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id'
> drivers/infiniband/ulp/iser/iser_verbs.c:525: undefined reference to `rdma_resolve_addr'
> drivers/built-in.o: In function `iscsi_transport_init':
> drivers/scsi/scsi_transport_iscsi.c:1636: undefined reference to `netlink_register_notifier'
> drivers/scsi/scsi_transport_iscsi.c:1640: undefined reference to `netlink_kernel_create'
> drivers/scsi/scsi_transport_iscsi.c:1652: undefined reference to `sock_release'
> drivers/scsi/scsi_transport_iscsi.c:1654: undefined reference to `netlink_unregister_notifier'
> drivers/built-in.o: In function `iscsi_transport_exit':
> drivers/scsi/scsi_transport_iscsi.c:1669: undefined reference to `sock_release'
> drivers/scsi/scsi_transport_iscsi.c:1670: undefined reference to `netlink_unregister_notifier'
> make: *** [.tmp_vmlinux1] Error 1

You need to have CONFIG_INFINIBAND_ADDR_TRANS=m defined. Also, I think
you are missing CONFIG_INET=m.

Or.


From tziporet at mellanox.co.il  Thu Sep  7 13:01:30 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Thu, 7 Sep 2006 23:01:30 +0300
Subject: [openib-general] OFED 1.1 status
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7892@mtlexch01.mtl.com>

Hi,

OFED 1.1 RC4 will be published on Monday 11-Sep.

We are currently working on several showstoppers:

1.	223: mthca.so not properly linked to libibverbs - Vlad & Jack
2.	221: SRP on V40Z and Sun T4 gets Kernel BUG at spinlock:118 - Roland
3.	219: OFED 1.1rc3 contains prerelease unstable libibverbs code - Vlad & Jack

Thus the final release date will be delayed to the end of next week.

 
Tziporet Koren

Software Director

Mellanox Technologies

mailto: tziporet at mellanox.co.il
Tel +972-4-9097200, ext 380

 

From rsalmon at tulane.edu  Thu Sep  7 13:06:40 2006
From: rsalmon at tulane.edu (Rene Salmon)
Date: Thu, 07 Sep 2006 15:06:40 -0500
Subject: [openib-general] PXE + infiniband?
In-Reply-To: <2376B63A5AF8564F8A2A2D76BC6DB033DC1296@CINMLVEM11.e2k.ad.ge.com>
References: <2376B63A5AF8564F8A2A2D76BC6DB033DC1296@CINMLVEM11.e2k.ad.ge.com>
Message-ID: <45007BD0.9040101@tulane.edu>

Hi,

We are also interested in either PXE or etherboot over IB.
We also run LinuxBIOS.  If anyone manages to get this working, could you
post some notes, maybe on a wiki or as a howto?


thanks
Rene


Cain, Brian (GE Healthcare) wrote:
>> -----Original Message-----
>> From: openib-general-bounces at openib.org 
>> [mailto:openib-general-bounces at openib.org] On Behalf Of Paul Baxter
>> Sent: Thursday, September 07, 2006 2:29 AM
>> To: openib-general at openib.org; Eli cohen
>> Subject: Re: [openib-general] PXE + infiniband?
>>
>>>> There is an implementation of PXE for Mellanox's HCAs that can be found
>>>> here: http://sourceforge.net/forum/forum.php?forum_id=494529
>>> Thanks for the tip
>>>
>>> I, too, am interested in this.
>>>
>>> Do you have a more direct link, as I wandered around etherboot's project
>>> site and couldn't find anything IB-specific.
>>
>> I must have been having a 'special moment' before, because I couldn't find
>> the mailing lists.
>>
>> Here they are!
>>
>> http://sourceforge.net/search/?ml_name=etherboot-developers&type_of_search=mlists&group_id=4233&words=infiniband
> 
> I was able to follow the procedure outlined in Eli's README and I
> achieved some mixed results.  On one hand, lspci now shows "Expansion
> ROM at ed700000 [disabled] [size=1M]" whereas it didn't indicate that
> before ("disabled" means it's zeroed out, maybe?).  The BIOS seems to
> confirm the whole disabled thing since it doesn't list the HCA in the
> boot priority list.
> 
> After making this change, IPoIB seems to work via this HCA, but SRP
> (initiation, anyways) no longer does.  "ibsrpdm -c" no longer produces
> any output, even though I can see the target via the ibnetdiscover.
> Accessing the SRP target from another host on the fabric works fine.
> 
> -Brian
> 


From rdreier at cisco.com  Thu Sep  7 13:11:34 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 07 Sep 2006 13:11:34 -0700
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
In-Reply-To: <15ddcffd0609071252o477eeabfl31366719d0d3d9f0@mail.gmail.com>
	(Or Gerlitz's message of "Thu, 7 Sep 2006 21:52:54 +0200")
References: <200609071902.57379.toralf.foerster@gmx.de>
	<15ddcffd0609071252o477eeabfl31366719d0d3d9f0@mail.gmail.com>
Message-ID: <aday7sva13d.fsf@cisco.com>

    Or> you need to have CONFIG_INFINIBAND_ADDR_TRANS=m defined also i
    Or> think you are missing CONFIG_INET=m

Seems like a bug in the iSER Kconfig -- it shouldn't be possible to
select iSER without everything it needs to compile.


From HNGUYEN at de.ibm.com  Thu Sep  7 14:42:58 2006
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Thu, 7 Sep 2006 23:42:58 +0200
Subject: [openib-general] [PATCH] ehca for OFED 1.1-rc4
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7892@mtlexch01.mtl.com>
Message-ID: <OF609606CA.F2DB04D1-ONC12571E2.00719DBA-C12571E2.0076F34D@de.ibm.com>

Hello Tziporet!
Below is an ehca patch against the ehca-branch branch of the OFED git tree.
It brings the driver up to the same code level as the for-2.6.19 branch of
Roland's git tree, which has been posted for a while. The main code changes are:
- Replace the "huge" EDEB macro with a simpler wrapper based on dev_err/dbg
- Remove superfluous variable initialization and argument checking
- Replace struct ehca_module with static variables in the files that use them
- Rename the module to ib_ehca.ko
Thanks!
Nam Nguyen
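
For readers unfamiliar with the wrapper idea mentioned in the change list, the
following is a minimal userspace sketch, not the driver's actual code. The real
ehca_err()/ehca_dbg() definitions live in ehca_tools.h and are not reproduced
here; the names demo_device, demo_log and demo_err are hypothetical, and
snprintf() stands in for the kernel's dev_err(). It only illustrates how one
thin variadic macro can prefix each message with the device and the calling
function.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for struct ib_device; only a name is needed here. */
struct demo_device {
	const char *name;
};

/*
 * Write "<device>: <function>: <message>" into buf.
 * Returns the length the full message needs, snprintf-style.
 */
static int demo_log(char *buf, size_t len, const struct demo_device *dev,
		    const char *func, const char *fmt, ...)
{
	va_list ap;
	int n, m;

	n = snprintf(buf, len, "%s: %s: ", dev->name, func);
	if (n < 0 || (size_t)n >= len)
		return n;

	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
	va_end(ap);

	return m < 0 ? m : n + m;
}

/* One thin macro per severity; the caller's function name is filled in. */
#define demo_err(buf, len, dev, ...) \
	demo_log(buf, len, dev, __func__, __VA_ARGS__)
```

A caller would write something like demo_err(buf, sizeof(buf), &dev, "Out of
memory pd=%d", 7), in the same way the patch below calls ehca_err(pd->device,
...) in place of EDEB_ERR(4, ...).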


Signed-off-by: Hoang-Nam Nguyen <hnguyen at de.ibm.com>
---

 Kconfig         |   14 
 Makefile        |    9 
 ehca_av.c       |  128 ++----
 ehca_classes.h  |   27 -
 ehca_cq.c       |  222 +++++------
 ehca_eq.c       |   71 ---
 ehca_hca.c      |  103 +----
 ehca_irq.c      |  221 +++--------
 ehca_main.c     |  491 ++++++++----------------
 ehca_mcast.c    |  119 +----
 ehca_mrmw.c     | 1113 ++++++++++++++++++++++----------------------------------
 ehca_mrmw.h     |    3 
 ehca_pd.c       |   60 +--
 ehca_qp.c       |  572 ++++++++++++----------------
 ehca_reqs.c     |  219 ++++-------
 ehca_sqp.c      |   50 --
 ehca_tools.h    |  337 ++--------------
 ehca_uverbs.c   |  278 ++++++-------
 hcp_if.c        |  834 ++++++++++++-----------------------------
 hcp_phyp.c      |   26 -
 hcp_phyp.h      |   10 
 hipz_fns_core.h |   44 --
 ipz_pt_fn.c     |   37 -
 ipz_pt_fn.h     |    7 
 24 files changed, 1781 insertions(+), 3214 deletions(-)


diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/Kconfig linux-2.6/drivers/infiniband/hw/ehca/Kconfig
--- linux-2.6_orig/drivers/infiniband/hw/ehca/Kconfig   2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/Kconfig        2006-08-30 20:00:16.000000000 +0200
@@ -1,12 +1,16 @@
 config INFINIBAND_EHCA
-       tristate "eHCA support"
-       depends on IBMEBUS && INFINIBAND
-       ---help---
-       This is a low level device driver for the IBM GX based Host channel
-       adapters (HCAs).
+       tristate "eHCA support"
+       depends on IBMEBUS && INFINIBAND
+       ---help---
+       This driver supports the IBM pSeries eHCA InfiniBand adapter.
+
+       To compile the driver as a module, choose M here. The module
+       will be called ib_ehca.
 
 config INFINIBAND_EHCA_SCALING
        bool "Scaling support (EXPERIMENTAL)"
        depends on IBMEBUS && INFINIBAND_EHCA && HOTPLUG_CPU && EXPERIMENTAL
        ---help---
        eHCA scaling support schedules the CQ callbacks to different CPUs.
+
+       To enable this feature choose Y here.
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/Makefile linux-2.6/drivers/infiniband/hw/ehca/Makefile
--- linux-2.6_orig/drivers/infiniband/hw/ehca/Makefile  2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/Makefile       2006-08-30 20:00:17.000000000 +0200
@@ -8,11 +8,10 @@
 #
 #  This source code is distributed under a dual license of GPL v2.0 and OpenIB BSD.
 
-obj-$(CONFIG_INFINIBAND_EHCA) += hcad_mod.o
+obj-$(CONFIG_INFINIBAND_EHCA) += ib_ehca.o
 
 
-hcad_mod-objs  = ehca_main.o ehca_hca.o ehca_mcast.o ehca_pd.o ehca_av.o ehca_eq.o \
-                ehca_cq.o ehca_qp.o ehca_sqp.o ehca_mrmw.o ehca_reqs.o ehca_irq.o \
-                ehca_uverbs.o ipz_pt_fn.o hcp_if.o hcp_phyp.o
+ib_ehca-objs  = ehca_main.o ehca_hca.o ehca_mcast.o ehca_pd.o ehca_av.o ehca_eq.o \
+               ehca_cq.o ehca_qp.o ehca_sqp.o ehca_mrmw.o ehca_reqs.o ehca_irq.o \
+               ehca_uverbs.o ipz_pt_fn.o hcp_if.o hcp_phyp.o
 
-CFLAGS += -DEHCA_USE_HCALL -DEHCA_USE_HCALL_KERNEL
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_av.c linux-2.6/drivers/infiniband/hw/ehca/ehca_av.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_av.c 2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_av.c      2006-08-30 20:00:16.000000000 +0200
@@ -42,34 +42,26 @@
  */
 
 
-#define DEB_PREFIX "ehav"
-
 #include <asm/current.h>
 
 #include "ehca_tools.h"
 #include "ehca_iverbs.h"
 #include "hcp_if.h"
 
+static struct kmem_cache *av_cache;
+
 struct ib_ah *ehca_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
 {
-       extern struct ehca_module ehca_module;
-       extern int ehca_static_rate;
-       int ret = 0;
-       struct ehca_av *av = NULL;
-       struct ehca_shca *shca = NULL;
-
-       EHCA_CHECK_PD_P(pd);
-       EHCA_CHECK_ADR_P(ah_attr);
+       int ret;
+       struct ehca_av *av;
+       struct ehca_shca *shca = container_of(pd->device, struct ehca_shca,
+                                             ib_device);
 
-       shca = container_of(pd->device, struct ehca_shca, ib_device);
-
-       EDEB_EN(7, "pd=%p ah_attr=%p", pd, ah_attr);
-
-       av = kmem_cache_alloc(ehca_module.cache_av, SLAB_KERNEL);
+       av = kmem_cache_alloc(av_cache, SLAB_KERNEL);
        if (!av) {
-               EDEB_ERR(4, "Out of memory pd=%p ah_attr=%p", pd, ah_attr);
-               ret = -ENOMEM;
-               goto create_ah_exit0;
+               ehca_err(pd->device, "Out of memory pd=%p ah_attr=%p",
+                        pd, ah_attr);
+               return ERR_PTR(-ENOMEM);
        }
 
        av->av.sl = ah_attr->sl;
@@ -89,10 +81,6 @@ struct ib_ah *ehca_create_ah(struct ib_p
        } else
                av->av.ipd = ehca_static_rate;
 
-       EDEB(7, "IPD av->av.ipd set =%x  ah_attr->static_rate=%x "
-            "shca_ib_rate=%x ",av->av.ipd, ah_attr->static_rate,
-            shca->sport[ah_attr->port_num].rate);
-
        av->av.lnh = ah_attr->ah_flags;
        av->av.grh.word_0 = EHCA_BMASK_SET(GRH_IPVERSION_MASK, 6);
        av->av.grh.word_0 |= EHCA_BMASK_SET(GRH_TCLASS_MASK,
@@ -104,7 +92,7 @@ struct ib_ah *ehca_create_ah(struct ib_p
        av->av.grh.word_0 |= EHCA_BMASK_SET(GRH_NEXTHEADER_MASK, 0x1B);
        /* set sgid in grh.word_1 */
        if (ah_attr->ah_flags & IB_AH_GRH) {
-               int rc = 0;
+               int rc;
                struct ib_port_attr port_attr;
                union ib_gid gid;
                memset(&port_attr, 0, sizeof(port_attr));
@@ -112,7 +100,7 @@ struct ib_ah *ehca_create_ah(struct ib_p
                                     &port_attr);
                if (rc) { /* invalid port number */
                        ret = -EINVAL;
-                       EDEB_ERR(4, "Invalid port number "
+                       ehca_err(pd->device, "Invalid port number "
                                 "ehca_query_port() returned %x "
                                 "pd=%p ah_attr=%p", rc, pd, ah_attr);
                        goto create_ah_exit1;
@@ -123,7 +111,7 @@ struct ib_ah *ehca_create_ah(struct ib_p
                                    ah_attr->grh.sgid_index, &gid);
                if (rc) {
                        ret = -EINVAL;
-                       EDEB_ERR(4, "Failed to retrieve sgid "
+                       ehca_err(pd->device, "Failed to retrieve sgid "
                                 "ehca_query_gid() returned %x "
                                 "pd=%p ah_attr=%p", rc, pd, ah_attr);
                        goto create_ah_exit1;
@@ -137,37 +125,24 @@ struct ib_ah *ehca_create_ah(struct ib_p
        memcpy(&av->av.grh.word_3, &ah_attr->grh.dgid,
               sizeof(ah_attr->grh.dgid));
 
-       EHCA_REGISTER_AV(device, pd);
-
-       EDEB_EX(7, "pd=%p ah_attr=%p av=%p", pd, ah_attr, av);
        return &av->ib_ah;
 
 create_ah_exit1:
-       kmem_cache_free(ehca_module.cache_av, av);
-
-create_ah_exit0:
-       EDEB_EX(7, "ret=%x pd=%p ah_attr=%p", ret, pd, ah_attr);
+       kmem_cache_free(av_cache, av);
 
        return ERR_PTR(ret);
 }
 
 int ehca_modify_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr)
 {
-       struct ehca_av *av = NULL;
+       struct ehca_av *av;
        struct ehca_ud_av new_ehca_av;
-       struct ehca_pd *my_pd = NULL;
+       struct ehca_pd *my_pd = container_of(ah->pd, struct ehca_pd, ib_pd);
        u32 cur_pid = current->tgid;
-       int ret = 0;
-
-       EHCA_CHECK_AV(ah);
-       EHCA_CHECK_ADR(ah_attr);
 
-       EDEB_EN(7, "ah=%p ah_attr=%p", ah, ah_attr);
-
-       my_pd = container_of(ah->pd, struct ehca_pd, ib_pd);
        if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context &&
            my_pd->ownpid != cur_pid) {
-               EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+               ehca_err(ah->device, "Invalid caller pid=%x ownpid=%x",
                         cur_pid, my_pd->ownpid);
                return -EINVAL;
        }
@@ -189,33 +164,31 @@ int ehca_modify_ah(struct ib_ah *ah, str
 
        /* set sgid in grh.word_1 */
        if (ah_attr->ah_flags & IB_AH_GRH) {
-               int rc = 0;
+               int rc;
                struct ib_port_attr port_attr;
                union ib_gid gid;
                memset(&port_attr, 0, sizeof(port_attr));
                rc = ehca_query_port(ah->device, ah_attr->port_num,
                                     &port_attr);
                if (rc) { /* invalid port number */
-                       ret = -EINVAL;
-                       EDEB_ERR(4, "Invalid port number "
+                       ehca_err(ah->device, "Invalid port number "
                                 "ehca_query_port() returned %x "
                                 "ah=%p ah_attr=%p port_num=%x",
                                 rc, ah, ah_attr, ah_attr->port_num);
-                       goto modify_ah_exit1;
+                       return -EINVAL;
                }
                memset(&gid, 0, sizeof(gid));
                rc = ehca_query_gid(ah->device,
                                    ah_attr->port_num,
                                    ah_attr->grh.sgid_index, &gid);
                if (rc) {
-                       ret = -EINVAL;
-                       EDEB_ERR(4, "Failed to retrieve sgid "
+                       ehca_err(ah->device, "Failed to retrieve sgid "
                                 "ehca_query_gid() returned %x "
                                 "ah=%p ah_attr=%p port_num=%x "
                                 "sgid_index=%x",
                                 rc, ah, ah_attr, ah_attr->port_num,
                                 ah_attr->grh.sgid_index);
-                       goto modify_ah_exit1;
+                       return -EINVAL;
                }
                memcpy(&new_ehca_av.grh.word_1, &gid, sizeof(gid));
        }
@@ -228,33 +201,22 @@ int ehca_modify_ah(struct ib_ah *ah, str
        av = container_of(ah, struct ehca_av, ib_ah);
        av->av = new_ehca_av;
 
-modify_ah_exit1:
-       EDEB_EX(7, "ret=%x ah=%p ah_attr=%p", ret, ah, ah_attr);
-
-       return ret;
+       return 0;
 }
 
 int ehca_query_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr)
 {
-       int ret = 0;
-       struct ehca_av *av = NULL;
-       struct ehca_pd *my_pd = NULL;
+       struct ehca_av *av = container_of(ah, struct ehca_av, ib_ah);
+       struct ehca_pd *my_pd = container_of(ah->pd, struct ehca_pd, ib_pd);
        u32 cur_pid = current->tgid;
 
-       EHCA_CHECK_AV(ah);
-       EHCA_CHECK_ADR(ah_attr);
-
-       EDEB_EN(7, "ah=%p ah_attr=%p", ah, ah_attr);
-
-       my_pd = container_of(ah->pd, struct ehca_pd, ib_pd);
        if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context &&
            my_pd->ownpid != cur_pid) {
-               EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+               ehca_err(ah->device, "Invalid caller pid=%x ownpid=%x",
                         cur_pid, my_pd->ownpid);
                return -EINVAL;
        }
 
-       av = container_of(ah, struct ehca_av, ib_ah);
        memcpy(&ah_attr->grh.dgid, &av->av.grh.word_3,
               sizeof(ah_attr->grh.dgid));
        ah_attr->sl = av->av.sl;
@@ -271,33 +233,39 @@ int ehca_query_ah(struct ib_ah *ah, stru
        ah_attr->grh.flow_label = EHCA_BMASK_GET(GRH_FLOWLABEL_MASK,
                                                 av->av.grh.word_0);
 
-       EDEB_EX(7, "ah=%p ah_attr=%p ret=%x", ah, ah_attr, ret);
-       return ret;
+       return 0;
 }
 
 int ehca_destroy_ah(struct ib_ah *ah)
 {
-       extern struct ehca_module ehca_module;
-       struct ehca_pd *my_pd = NULL;
+       struct ehca_pd *my_pd = container_of(ah->pd, struct ehca_pd, ib_pd);
        u32 cur_pid = current->tgid;
-       int ret = 0;
-
-       EHCA_CHECK_AV(ah);
-       EHCA_DEREGISTER_AV(ah);
-
-       EDEB_EN(7, "ah=%p", ah);
 
-       my_pd = container_of(ah->pd, struct ehca_pd, ib_pd);
        if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context &&
            my_pd->ownpid != cur_pid) {
-               EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+               ehca_err(ah->device, "Invalid caller pid=%x ownpid=%x",
                         cur_pid, my_pd->ownpid);
                return -EINVAL;
        }
 
-       kmem_cache_free(ehca_module.cache_av,
-                       container_of(ah, struct ehca_av, ib_ah));
+       kmem_cache_free(av_cache, container_of(ah, struct ehca_av, ib_ah));
 
-       EDEB_EX(7, "ret=%x ah=%p", ret, ah);
-       return ret;
+       return 0;
+}
+
+int ehca_init_av_cache(void)
+{
+       av_cache = kmem_cache_create("ehca_cache_av",
+                                  sizeof(struct ehca_av), 0,
+                                  SLAB_HWCACHE_ALIGN,
+                                  NULL, NULL);
+       if (!av_cache)
+               return -ENOMEM;
+       return 0;
+}
+
+void ehca_cleanup_av_cache(void)
+{
+       if (av_cache)
+               kmem_cache_destroy(av_cache);
 }
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_classes.h linux-2.6/drivers/infiniband/hw/ehca/ehca_classes.h
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_classes.h    2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_classes.h 2006-08-30 20:00:16.000000000 +0200
@@ -63,18 +63,6 @@ struct ehca_av;
 
 #include "ehca_irq.h"
 
-struct ehca_module {
-       struct list_head shca_list;
-       spinlock_t shca_lock;
-       struct timer_list timer;
-       kmem_cache_t *cache_pd;
-       kmem_cache_t *cache_cq;
-       kmem_cache_t *cache_qp;
-       kmem_cache_t *cache_av;
-       kmem_cache_t *cache_mr;
-       kmem_cache_t *cache_mw;
-};
-
 struct ehca_eq {
        u32 length;
        struct ipz_queue ipz_queue;
@@ -274,11 +262,26 @@ int ehca_shca_delete(struct ehca_shca *m
 
 struct ehca_sport *ehca_sport_new(struct ehca_shca *anchor);
 
+int ehca_init_pd_cache(void);
+void ehca_cleanup_pd_cache(void);
+int ehca_init_cq_cache(void);
+void ehca_cleanup_cq_cache(void);
+int ehca_init_qp_cache(void);
+void ehca_cleanup_qp_cache(void);
+int ehca_init_av_cache(void);
+void ehca_cleanup_av_cache(void);
+int ehca_init_mrmw_cache(void);
+void ehca_cleanup_mrmw_cache(void);
+
 extern spinlock_t ehca_qp_idr_lock;
 extern spinlock_t ehca_cq_idr_lock;
 extern struct idr ehca_qp_idr;
 extern struct idr ehca_cq_idr;
 
+extern int ehca_static_rate;
+extern int ehca_port_act_time;
+extern int ehca_use_hp_mr;
+
 struct ipzu_queue_resp {
        u64 queue;        /* points to first queue entry */
        u32 qe_size;      /* queue entry size */
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_cq.c linux-2.6/drivers/infiniband/hw/ehca/ehca_cq.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_cq.c 2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_cq.c      2006-08-30 20:00:17.000000000 +0200
@@ -43,8 +43,6 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-#define DEB_PREFIX "e_cq"
-
 #include <asm/current.h>
 
 #include "ehca_iverbs.h"
@@ -52,17 +50,20 @@
 #include "ehca_irq.h"
 #include "hcp_if.h"
 
+static struct kmem_cache *cq_cache;
+
 int ehca_cq_assign_qp(struct ehca_cq *cq, struct ehca_qp *qp)
 {
        unsigned int qp_num = qp->real_qp_num;
        unsigned int key = qp_num & (QP_HASHTAB_LEN-1);
-       unsigned long spl_flags = 0;
+       unsigned long spl_flags;
 
        spin_lock_irqsave(&cq->spinlock, spl_flags);
        hlist_add_head(&qp->list_entries, &cq->qp_hashtab[key]);
        spin_unlock_irqrestore(&cq->spinlock, spl_flags);
 
-       EDEB(7, "cq_num=%x real_qp_num=%x", cq->cq_number, qp_num);
+       ehca_dbg(cq->ib_cq.device, "cq_num=%x real_qp_num=%x",
+                cq->cq_number, qp_num);
 
        return 0;
 }
@@ -71,26 +72,27 @@ int ehca_cq_unassign_qp(struct ehca_cq *
 {
        int ret = -EINVAL;
        unsigned int key = real_qp_num & (QP_HASHTAB_LEN-1);
-       struct hlist_node *iter = NULL;
-       struct ehca_qp *qp = NULL;
-       unsigned long spl_flags = 0;
+       struct hlist_node *iter;
+       struct ehca_qp *qp;
+       unsigned long spl_flags;
 
        spin_lock_irqsave(&cq->spinlock, spl_flags);
        hlist_for_each(iter, &cq->qp_hashtab[key]) {
                qp = hlist_entry(iter, struct ehca_qp, list_entries);
                if (qp->real_qp_num == real_qp_num) {
                        hlist_del(iter);
-                       EDEB(7, "removed qp from cq .cq_num=%x real_qp_num=%x",
-                            cq->cq_number, real_qp_num);
+                       ehca_dbg(cq->ib_cq.device,
+                                "removed qp from cq .cq_num=%x real_qp_num=%x",
+                                cq->cq_number, real_qp_num);
                        ret = 0;
                        break;
                }
        }
        spin_unlock_irqrestore(&cq->spinlock, spl_flags);
-       if (ret) {
-               EDEB_ERR(4, "qp not found cq_num=%x real_qp_num=%x",
+       if (ret)
+               ehca_err(cq->ib_cq.device,
+                        "qp not found cq_num=%x real_qp_num=%x",
                         cq->cq_number, real_qp_num);
-       }
 
        return ret;
 }
@@ -99,8 +101,8 @@ struct ehca_qp* ehca_cq_get_qp(struct eh
 {
        struct ehca_qp *ret = NULL;
        unsigned int key = real_qp_num & (QP_HASHTAB_LEN-1);
-       struct hlist_node *iter = NULL;
-       struct ehca_qp *qp = NULL;
+       struct hlist_node *iter;
+       struct ehca_qp *qp;
        hlist_for_each(iter, &cq->qp_hashtab[key]) {
                qp = hlist_entry(iter, struct ehca_qp, list_entries);
                if (qp->real_qp_num == real_qp_num) {
@@ -115,37 +117,28 @@ struct ib_cq *ehca_create_cq(struct ib_d
                             struct ib_ucontext *context,
                             struct ib_udata *udata)
 {
-       extern struct ehca_module ehca_module;
-       struct ib_cq *cq = NULL;
-       struct ehca_cq *my_cq = NULL;
-       struct ehca_shca *shca = NULL;
+       static const u32 additional_cqe = 20;
+       struct ib_cq *cq;
+       struct ehca_cq *my_cq;
+       struct ehca_shca *shca =
+               container_of(device, struct ehca_shca, ib_device);
        struct ipz_adapter_handle adapter_handle;
-       /* h_call's out parameters */
-       struct ehca_alloc_cq_parms param;
-       u32 counter = 0;
-       void *vpage = NULL;
-       u64 rpage = 0;
+       struct ehca_alloc_cq_parms param; /* h_call's out parameters */
        struct h_galpa gal;
-       u64 cqx_fec = 0;
-       u64 h_ret = 0;
-       int ipz_rc = 0;
-       int ret = 0;
-       const u32 additional_cqe=20;
-       int i= 0;
+       void *vpage;
+       u32 counter;
+       u64 rpage, cqx_fec, h_ret;
+       int ipz_rc, ret, i;
        unsigned long flags;
 
-       EHCA_CHECK_DEVICE_P(device);
-       EDEB_EN(7,  "device=%p cqe=%x context=%p", device, cqe, context);
-
        if (cqe >= 0xFFFFFFFF - 64 - additional_cqe)
                return ERR_PTR(-EINVAL);
 
-       my_cq = kmem_cache_alloc(ehca_module.cache_cq, SLAB_KERNEL);
+       my_cq = kmem_cache_alloc(cq_cache, SLAB_KERNEL);
        if (!my_cq) {
-               cq = ERR_PTR(-ENOMEM);
-               EDEB_ERR(4, "Out of memory for ehca_cq struct device=%p",
+               ehca_err(device, "Out of memory for ehca_cq struct device=%p",
                         device);
-               goto create_cq_exit0;
+               return ERR_PTR(-ENOMEM);
        }
 
        memset(my_cq, 0, sizeof(struct ehca_cq));
@@ -158,17 +151,14 @@ struct ib_cq *ehca_create_cq(struct ib_d
 
        cq = &my_cq->ib_cq;
 
-       shca = container_of(device, struct ehca_shca, ib_device);
        adapter_handle = shca->ipz_hca_handle;
        param.eq_handle = shca->eq.ipz_eq_handle;
 
-
        do {
                if (!idr_pre_get(&ehca_cq_idr, GFP_KERNEL)) {
                        cq = ERR_PTR(-ENOMEM);
-                       EDEB_ERR(4,
-                                "Can't reserve idr resources. "
-                                "device=%p", device);
+                       ehca_err(device, "Can't reserve idr nr. device=%p",
+                                device);
                        goto create_cq_exit1;
                }
 
@@ -180,9 +170,8 @@ struct ib_cq *ehca_create_cq(struct ib_d
 
        if (ret) {
                cq = ERR_PTR(-ENOMEM);
-               EDEB_ERR(4,
-                        "Can't allocate new idr entry. "
-                        "device=%p", device);
+               ehca_err(device, "Can't allocate new idr entry. device=%p",
+                        device);
                goto create_cq_exit1;
        }
 
@@ -194,7 +183,7 @@ struct ib_cq *ehca_create_cq(struct ib_d
        h_ret = hipz_h_alloc_resource_cq(adapter_handle, my_cq, &param);
 
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4,"hipz_h_alloc_resource_cq() failed "
+               ehca_err(device, "hipz_h_alloc_resource_cq() failed "
                         "h_ret=%lx device=%p", h_ret, device);
                cq = ERR_PTR(ehca2ib_return_code(h_ret));
                goto create_cq_exit2;
@@ -203,9 +192,8 @@ struct ib_cq *ehca_create_cq(struct ib_d
        ipz_rc = ipz_queue_ctor(&my_cq->ipz_queue, param.act_pages,
                                EHCA_PAGESIZE, sizeof(struct ehca_cqe), 0);
        if (!ipz_rc) {
-               EDEB_ERR(4,
-                        "ipz_queue_ctor() failed "
-                        "ipz_rc=%x device=%p", ipz_rc, device);
+               ehca_err(device, "ipz_queue_ctor() failed ipz_rc=%x device=%p",
+                        ipz_rc, device);
                cq = ERR_PTR(-EINVAL);
                goto create_cq_exit3;
        }
@@ -213,7 +201,7 @@ struct ib_cq *ehca_create_cq(struct ib_d
        for (counter = 0; counter < param.act_pages; counter++) {
                vpage = ipz_qpageit_get_inc(&my_cq->ipz_queue);
                if (!vpage) {
-                       EDEB_ERR(4, "ipz_qpageit_get_inc() "
+                       ehca_err(device, "ipz_qpageit_get_inc() "
                                 "returns NULL device=%p", device);
                        cq = ERR_PTR(-EAGAIN);
                        goto create_cq_exit4;
@@ -231,10 +219,9 @@ struct ib_cq *ehca_create_cq(struct ib_d
                                                 kernel);
 
                if (h_ret < H_SUCCESS) {
-                       EDEB_ERR(4, "hipz_h_register_rpage_cq() failed "
-                                "ehca_cq=%p cq_num=%x h_ret=%lx "
-                                "counter=%i act_pages=%i",
-                                my_cq, my_cq->cq_number,
+                       ehca_err(device, "hipz_h_register_rpage_cq() failed "
+                                "ehca_cq=%p cq_num=%x h_ret=%lx counter=%i "
+                                "act_pages=%i", my_cq, my_cq->cq_number,
                                 h_ret, counter, param.act_pages);
                        cq = ERR_PTR(-EINVAL);
                        goto create_cq_exit4;
@@ -243,16 +230,16 @@ struct ib_cq *ehca_create_cq(struct ib_d
                if (counter == (param.act_pages - 1)) {
                        vpage = ipz_qpageit_get_inc(&my_cq->ipz_queue);
                        if ((h_ret != H_SUCCESS) || vpage) {
-                               EDEB_ERR(4, "Registration of pages not "
+                               ehca_err(device, "Registration of pages not "
                                         "complete ehca_cq=%p cq_num=%x "
-                                        "h_ret=%lx",
-                                        my_cq, my_cq->cq_number, h_ret);
+                                        "h_ret=%lx", my_cq, my_cq->cq_number,
+                                        h_ret);
                                cq = ERR_PTR(-EAGAIN);
                                goto create_cq_exit4;
                        }
                } else {
                        if (h_ret != H_PAGE_REGISTERED) {
-                               EDEB_ERR(4, "Registration of page failed "
+                               ehca_err(device, "Registration of page failed "
                                         "ehca_cq=%p cq_num=%x h_ret=%lx"
                                         "counter=%i act_pages=%i",
                                         my_cq, my_cq->cq_number,
@@ -267,8 +254,8 @@ struct ib_cq *ehca_create_cq(struct ib_d
 
        gal = my_cq->galpas.kernel;
        cqx_fec = hipz_galpa_load(gal, CQTEMM_OFFSET(cqx_fec));
-       EDEB(8, "ehca_cq=%p cq_num=%x CQX_FEC=%lx",
-            my_cq, my_cq->cq_number, cqx_fec);
+       ehca_dbg(device, "ehca_cq=%p cq_num=%x CQX_FEC=%lx",
+                my_cq, my_cq->cq_number, cqx_fec);
 
        my_cq->ib_cq.cqe = my_cq->nr_of_entries =
                param.act_nr_of_entries - additional_cqe;
@@ -280,7 +267,7 @@ struct ib_cq *ehca_create_cq(struct ib_d
        if (context) {
                struct ipz_queue *ipz_queue = &my_cq->ipz_queue;
                struct ehca_create_cq_resp resp;
-               struct vm_area_struct *vma = NULL;
+               struct vm_area_struct *vma;
                memset(&resp, 0, sizeof(resp));
                resp.cq_number = my_cq->cq_number;
                resp.token = my_cq->token;
@@ -294,7 +281,7 @@ struct ib_cq *ehca_create_cq(struct ib_d
                                       (void**)&resp.ipz_queue.queue,
                                       &vma);
                if (ret) {
-                       EDEB_ERR(4, "Could not mmap queue pages");
+                       ehca_err(device, "Could not mmap queue pages");
                        cq = ERR_PTR(ret);
                        goto create_cq_exit4;
                }
@@ -304,19 +291,17 @@ struct ib_cq *ehca_create_cq(struct ib_d
 (void**)&resp.galpas.kernel.fw_handle,
                                         &vma);
                if (ret) {
-                       EDEB_ERR(4, "Could not mmap fw_handle");
+                       ehca_err(device, "Could not mmap fw_handle");
                        cq = ERR_PTR(ret);
                        goto create_cq_exit5;
                }
                my_cq->uspace_fwh = (u64)resp.galpas.kernel.fw_handle;
                if (ib_copy_to_udata(udata, &resp, sizeof(resp))) {
-                       EDEB_ERR(4,  "Copy to udata failed.");
+                       ehca_err(device, "Copy to udata failed.");
                        goto create_cq_exit6;
                }
        }
 
-       EDEB_EX(7,"retcode=%p ehca_cq=%p cq_num=%x cq_size=%x",
-               cq, my_cq, my_cq->cq_number, param.act_nr_of_entries);
        return cq;
 
 create_cq_exit6:
@@ -331,8 +316,8 @@ create_cq_exit4:
 create_cq_exit3:
        h_ret = hipz_h_destroy_cq(adapter_handle, my_cq, 1);
        if (h_ret != H_SUCCESS)
-               EDEB(4, "hipz_h_destroy_cq() failed ehca_cq=%p cq_num=%x "
-                    "h_ret=%lx", my_cq, my_cq->cq_number, h_ret);
+               ehca_err(device, "hipz_h_destroy_cq() failed ehca_cq=%p "
+                        "cq_num=%x h_ret=%lx", my_cq, my_cq->cq_number, h_ret);
 
 create_cq_exit2:
        spin_lock_irqsave(&ehca_cq_idr_lock, flags);
@@ -340,36 +325,24 @@ create_cq_exit2:
        spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
 
 create_cq_exit1:
-       kmem_cache_free(ehca_module.cache_cq, my_cq);
+       kmem_cache_free(cq_cache, my_cq);
 
-create_cq_exit0:
-       EDEB_EX(4, "An error has occured retcode=%p", cq);
        return cq;
 }
 
 int ehca_destroy_cq(struct ib_cq *cq)
 {
-       extern struct ehca_module ehca_module;
-       u64 h_ret = 0;
-       int ret = 0;
-       struct ehca_cq *my_cq = NULL;
-       int cq_num = 0;
-       struct ib_device *device = NULL;
-       struct ehca_shca *shca = NULL;
-       struct ipz_adapter_handle adapter_handle;
+       u64 h_ret;
+       int ret;
+       struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq);
+       int cq_num = my_cq->cq_number;
+       struct ib_device *device = cq->device;
+       struct ehca_shca *shca = container_of(device, struct ehca_shca,
+                                             ib_device);
+       struct ipz_adapter_handle adapter_handle = shca->ipz_hca_handle;
        u32 cur_pid = current->tgid;
        unsigned long flags;
 
-       EHCA_CHECK_CQ(cq);
-       my_cq = container_of(cq, struct ehca_cq, ib_cq);
-       cq_num = my_cq->cq_number;
-       device = cq->device;
-       EHCA_CHECK_DEVICE(device);
-       shca = container_of(device, struct ehca_shca, ib_device);
-       adapter_handle = shca->ipz_hca_handle;
-       EDEB_EN(7, "ehca_cq=%p cq_num=%x",
-               my_cq, my_cq->cq_number);
-
        spin_lock_irqsave(&ehca_cq_idr_lock, flags);
        while (my_cq->nr_callbacks)
                yield();
@@ -378,7 +351,7 @@ int ehca_destroy_cq(struct ib_cq *cq)
        spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
 
        if (my_cq->uspace_queue && my_cq->ownpid != cur_pid) {
-               EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+               ehca_err(device, "Invalid caller pid=%x ownpid=%x",
                         cur_pid, my_cq->ownpid);
                return -EINVAL;
        }
@@ -386,64 +359,69 @@ int ehca_destroy_cq(struct ib_cq *cq)
        /* un-mmap if vma alloc */
        if (my_cq->uspace_queue ) {
                ret = ehca_munmap(my_cq->uspace_queue,
-                                     my_cq->ipz_queue.queue_length);
+                                 my_cq->ipz_queue.queue_length);
+               if (ret)
+                       ehca_err(device, "Could not munmap queue ehca_cq=%p "
+                                "cq_num=%x", my_cq, cq_num);
                ret = ehca_munmap(my_cq->uspace_fwh, EHCA_PAGESIZE);
+               if (ret)
+                       ehca_err(device, "Could not munmap fwh ehca_cq=%p "
+                                "cq_num=%x", my_cq, cq_num);
        }
 
        h_ret = hipz_h_destroy_cq(adapter_handle, my_cq, 0);
        if (h_ret == H_R_STATE) {
                /* cq in err: read err data and destroy it forcibly */
-               EDEB(4, "ehca_cq=%p cq_num=%x ressource=%lx in err state. "
-                    "Try to delete it forcibly.",
-                    my_cq, my_cq->cq_number, my_cq->ipz_cq_handle.handle);
+               ehca_dbg(device, "ehca_cq=%p cq_num=%x ressource=%lx in err "
+                        "state. Try to delete it forcibly.",
+                        my_cq, cq_num, my_cq->ipz_cq_handle.handle);
                ehca_error_data(shca, my_cq, my_cq->ipz_cq_handle.handle);
                h_ret = hipz_h_destroy_cq(adapter_handle, my_cq, 1);
                if (h_ret == H_SUCCESS)
-                       EDEB(4, "ehca_cq=%p cq_num=%x deleted successfully.",
-                            my_cq, my_cq->cq_number);
+                       ehca_dbg(device, "cq_num=%x deleted successfully.",
+                                cq_num);
        }
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4,"hipz_h_destroy_cq() failed "
-                        "h_ret=%lx ehca_cq=%p cq_num=%x",
-                        h_ret, my_cq, my_cq->cq_number);
-               ret = ehca2ib_return_code(h_ret);
-               goto destroy_cq_exit0;
+               ehca_err(device, "hipz_h_destroy_cq() failed h_ret=%lx "
+                        "ehca_cq=%p cq_num=%x", h_ret, my_cq, cq_num);
+               return ehca2ib_return_code(h_ret);
        }
        ipz_queue_dtor(&my_cq->ipz_queue);
-       kmem_cache_free(ehca_module.cache_cq, my_cq);
+       kmem_cache_free(cq_cache, my_cq);
 
-destroy_cq_exit0:
-       EDEB_EX(7, "ehca_cq=%p cq_num=%x ret=%x ",
-               my_cq, cq_num, ret);
-       return ret;
+       return 0;
 }
 
 int ehca_resize_cq(struct ib_cq *cq, int cqe, struct ib_udata *udata)
 {
-       int ret = 0;
-       struct ehca_cq *my_cq = NULL;
+       struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq);
        u32 cur_pid = current->tgid;
 
-       if (unlikely(!cq)) {
-               EDEB_ERR(4, "cq is NULL");
-               return -EFAULT;
-       }
-
-       my_cq = container_of(cq, struct ehca_cq, ib_cq);
-       EDEB_EN(7, "ehca_cq=%p cq_num=%x",
-               my_cq, my_cq->cq_number);
-
        if (my_cq->uspace_queue && my_cq->ownpid != cur_pid) {
-               EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+               ehca_err(cq->device, "Invalid caller pid=%x ownpid=%x",
                         cur_pid, my_cq->ownpid);
                return -EINVAL;
        }
 
        /* TODO: proper resize needs to be done */
-       ret = -EFAULT;
-       EDEB_ERR(4, "not implemented yet");
+       ehca_err(cq->device, "not implemented yet");
 
-       EDEB_EX(7, "ehca_cq=%p cq_num=%x",
-               my_cq, my_cq->cq_number);
-       return ret;
+       return -EFAULT;
+}
+
+int ehca_init_cq_cache(void)
+{
+       cq_cache = kmem_cache_create("ehca_cache_cq",
+                                    sizeof(struct ehca_cq), 0,
+                                    SLAB_HWCACHE_ALIGN,
+                                    NULL, NULL);
+       if (!cq_cache)
+               return -ENOMEM;
+       return 0;
+}
+
+void ehca_cleanup_cq_cache(void)
+{
+       if (cq_cache)
+               kmem_cache_destroy(cq_cache);
 }
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_eq.c linux-2.6/drivers/infiniband/hw/ehca/ehca_eq.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_eq.c 2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_eq.c      2006-08-30 20:00:16.000000000 +0200
@@ -43,8 +43,6 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-#define DEB_PREFIX "e_eq"
-
 #include "ehca_classes.h"
 #include "ehca_irq.h"
 #include "ehca_iverbs.h"
@@ -56,24 +54,21 @@ int ehca_create_eq(struct ehca_shca *shc
                   struct ehca_eq *eq,
                   const enum ehca_eq_type type, const u32 length)
 {
-       u64 ret = H_SUCCESS;
-       u32 nr_pages = 0;
+       u64 ret;
+       u32 nr_pages;
        u32 i;
-       void *vpage = NULL;
-
-       EDEB_EN(7, "shca=%p eq=%p length=%x", shca, eq, length);
-       EHCA_CHECK_ADR(shca);
-       EHCA_CHECK_ADR(eq);
+       void *vpage;
+       struct ib_device *ib_dev = &shca->ib_device;
 
        spin_lock_init(&eq->spinlock);
        eq->is_initialized = 0;
 
        if (type != EHCA_EQ && type != EHCA_NEQ) {
-               EDEB_ERR(4, "Invalid EQ type %x. eq=%p", type, eq);
+               ehca_err(ib_dev, "Invalid EQ type %x. eq=%p", type, eq);
                return -EINVAL;
        }
-       if (length == 0) {
-               EDEB_ERR(4, "EQ length must not be zero. eq=%p", eq);
+       if (!length) {
+               ehca_err(ib_dev, "EQ length must not be zero. eq=%p", eq);
                return -EINVAL;
        }
 
@@ -86,14 +81,14 @@ int ehca_create_eq(struct ehca_shca *shc
                                       &nr_pages, &eq->ist);
 
        if (ret != H_SUCCESS) {
-               EDEB_ERR(4, "Can't allocate EQ / NEQ. eq=%p", eq);
+               ehca_err(ib_dev, "Can't allocate EQ/NEQ. eq=%p", eq);
                return -EINVAL;
        }
 
        ret = ipz_queue_ctor(&eq->ipz_queue, nr_pages,
                             EHCA_PAGESIZE, sizeof(struct ehca_eqe), 0);
        if (!ret) {
-               EDEB_ERR(4, "Can't allocate EQ pages. eq=%p", eq);
+               ehca_err(ib_dev, "Can't allocate EQ pages eq=%p", eq);
                goto create_eq_exit1;
        }
 
@@ -130,7 +125,7 @@ int ehca_create_eq(struct ehca_shca *shc
                                          SA_INTERRUPT, "ehca_eq",
                                          (void *)shca);
                if (ret < 0)
-                       EDEB_ERR(4, "Can't map interrupt handler.");
+                       ehca_err(ib_dev, "Can't map interrupt handler.");
 
                tasklet_init(&eq->interrupt_task, ehca_tasklet_eq, (long)shca);
        } else if (type == EHCA_NEQ) {
@@ -138,15 +133,13 @@ int ehca_create_eq(struct ehca_shca *shc
                                          SA_INTERRUPT, "ehca_neq",
                                          (void *)shca);
                if (ret < 0)
-                       EDEB_ERR(4, "Can't map interrupt handler.");
+                       ehca_err(ib_dev, "Can't map interrupt handler.");
 
                tasklet_init(&eq->interrupt_task, ehca_tasklet_neq, (long)shca);
        }
 
        eq->is_initialized = 1;
 
-       EDEB_EX(7, "ret=%lx", ret);
-
        return 0;
 
 create_eq_exit2:
@@ -155,53 +148,25 @@ create_eq_exit2:
 create_eq_exit1:
        hipz_h_destroy_eq(shca->ipz_hca_handle, eq);
 
-       EDEB_EX(7, "ret=%lx", ret);
-
        return -EINVAL;
 }
 
 void *ehca_poll_eq(struct ehca_shca *shca, struct ehca_eq *eq)
 {
-       unsigned long flags = 0;
-       void *eqe = NULL;
-
-       EDEB_EN(7, "shca=%p  eq=%p", shca, eq);
-       EHCA_CHECK_ADR_P(shca);
-       EHCA_CHECK_EQ_P(eq);
+       unsigned long flags;
+       void *eqe;
 
        spin_lock_irqsave(&eq->spinlock, flags);
        eqe = ipz_eqit_eq_get_inc_valid(&eq->ipz_queue);
        spin_unlock_irqrestore(&eq->spinlock, flags);
 
-       EDEB_EX(7, "eq=%p eqe=%p", eq, eqe);
-
        return eqe;
 }
 
-void ehca_poll_eqs(unsigned long data)
-{
-       struct ehca_shca *shca;
-       struct ehca_module *module = (struct ehca_module*)data;
-
-       spin_lock(&module->shca_lock);
-       list_for_each_entry(shca, &module->shca_list, shca_list) {
-               if (shca->eq.is_initialized)
-                       ehca_tasklet_eq((unsigned long)(void*)shca);
-       }
-       mod_timer(&module->timer, jiffies + HZ);
-       spin_unlock(&module->shca_lock);
-
-       return;
-}
-
 int ehca_destroy_eq(struct ehca_shca *shca, struct ehca_eq *eq)
 {
-       unsigned long flags = 0;
-       u64 h_ret = H_SUCCESS;
-
-       EDEB_EN(7, "shca=%p  eq=%p", shca, eq);
-       EHCA_CHECK_ADR(shca);
-       EHCA_CHECK_EQ(eq);
+       unsigned long flags;
+       u64 h_ret;
 
        spin_lock_irqsave(&eq->spinlock, flags);
        ibmebus_free_irq(NULL, eq->ist, (void *)shca);
@@ -211,12 +176,10 @@ int ehca_destroy_eq(struct ehca_shca *sh
        spin_unlock_irqrestore(&eq->spinlock, flags);
 
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4, "Can't free EQ resources.");
+               ehca_err(&shca->ib_device, "Can't free EQ resources.");
                return -EINVAL;
        }
        ipz_queue_dtor(&eq->ipz_queue);
 
-       EDEB_EX(7, "h_ret=%lx", h_ret);
-
-       return h_ret;
+       return 0;
 }
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_hca.c linux-2.6/drivers/infiniband/hw/ehca/ehca_hca.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_hca.c        2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_hca.c     2006-08-30 20:00:16.000000000 +0200
@@ -39,36 +39,29 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-#undef DEB_PREFIX
-#define DEB_PREFIX "shca"
-
 #include "ehca_tools.h"
-
 #include "hcp_if.h"
 
 int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
 {
        int ret = 0;
-       struct ehca_shca *shca;
+       struct ehca_shca *shca = container_of(ibdev, struct ehca_shca,
+                                             ib_device);
        struct hipz_query_hca *rblock;
 
-       EDEB_EN(7, "");
-
-       memset(props, 0, sizeof(struct ib_device_attr));
-       shca = container_of(ibdev, struct ehca_shca, ib_device);
-
        rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);
        if (!rblock) {
-               EDEB_ERR(4, "Can't allocate rblock memory.");
-               ret = -ENOMEM;
-               goto query_device0;
+               ehca_err(&shca->ib_device, "Can't allocate rblock memory.");
+               return -ENOMEM;
        }
 
        if (hipz_h_query_hca(shca->ipz_hca_handle, rblock) != H_SUCCESS) {
-               EDEB_ERR(4, "Can't query device properties");
+               ehca_err(&shca->ib_device, "Can't query device properties");
                ret = -EINVAL;
                goto query_device1;
        }
+
+       memset(props, 0, sizeof(struct ib_device_attr));
        props->fw_ver          = rblock->hw_ver;
        props->max_mr_size     = rblock->max_mr_size;
        props->vendor_id       = rblock->vendor_id >> 8;
@@ -105,9 +98,6 @@ int ehca_query_device(struct ib_device *
 query_device1:
        kfree(rblock);
 
-query_device0:
-       EDEB_EX(7, "ret=%x", ret);
-
        return ret;
 }
 
@@ -115,27 +105,23 @@ int ehca_query_port(struct ib_device *ib
                    u8 port, struct ib_port_attr *props)
 {
        int ret = 0;
-       struct ehca_shca *shca;
+       struct ehca_shca *shca = container_of(ibdev, struct ehca_shca,
+                                             ib_device);
        struct hipz_query_port *rblock;
 
-       EDEB_EN(7, "port=%x", port);
-
-       memset(props, 0, sizeof(struct ib_port_attr));
-       shca = container_of(ibdev, struct ehca_shca, ib_device);
-
        rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);
        if (!rblock) {
-               EDEB_ERR(4, "Can't allocate rblock memory.");
-               ret = -ENOMEM;
-               goto query_port0;
+               ehca_err(&shca->ib_device, "Can't allocate rblock memory.");
+               return -ENOMEM;
        }
 
        if (hipz_h_query_port(shca->ipz_hca_handle, port, rblock) != H_SUCCESS) {
-               EDEB_ERR(4, "Can't query port properties");
+               ehca_err(&shca->ib_device, "Can't query port properties");
                ret = -EINVAL;
                goto query_port1;
        }
 
+       memset(props, 0, sizeof(struct ib_port_attr));
        props->state = rblock->state;
 
        switch (rblock->max_mtu) {
@@ -155,7 +141,9 @@ int ehca_query_port(struct ib_device *ib
                props->active_mtu = props->max_mtu = IB_MTU_4096;
                break;
        default:
-               EDEB_ERR(4, "Unknown MTU size: %x.", rblock->max_mtu);
+               ehca_err(&shca->ib_device, "Unknown MTU size: %x.",
+                        rblock->max_mtu);
+               break;
        }
 
        props->gid_tbl_len     = rblock->gid_tbl_len;
@@ -176,37 +164,28 @@ int ehca_query_port(struct ib_device *ib
 query_port1:
        kfree(rblock);
 
-query_port0:
-       EDEB_EX(7, "ret=%x", ret);
-
        return ret;
 }
 
 int ehca_query_pkey(struct ib_device *ibdev, u8 port, u16 index, u16 *pkey)
 {
        int ret = 0;
-       struct ehca_shca *shca;
+       struct ehca_shca *shca = container_of(ibdev, struct ehca_shca, ib_device);
        struct hipz_query_port *rblock;
 
-       EDEB_EN(7, "port=%x index=%x", port, index);
-
        if (index > 16) {
-               EDEB_ERR(4, "Invalid index: %x.", index);
-               ret = -EINVAL;
-               goto query_pkey0;
+               ehca_err(&shca->ib_device, "Invalid index: %x.", index);
+               return -EINVAL;
        }
 
-       shca = container_of(ibdev, struct ehca_shca, ib_device);
-
        rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);
        if (!rblock) {
-               EDEB_ERR(4,  "Can't allocate rblock memory.");
-               ret = -ENOMEM;
-               goto query_pkey0;
+               ehca_err(&shca->ib_device,  "Can't allocate rblock memory.");
+               return -ENOMEM;
        }
 
        if (hipz_h_query_port(shca->ipz_hca_handle, port, rblock) != H_SUCCESS) {
-               EDEB_ERR(4, "Can't query port properties");
+               ehca_err(&shca->ib_device, "Can't query port properties");
                ret = -EINVAL;
                goto query_pkey1;
        }
@@ -216,9 +195,6 @@ int ehca_query_pkey(struct ib_device *ib
 query_pkey1:
        kfree(rblock);
 
-query_pkey0:
-       EDEB_EX(7, "ret=%x", ret);
-
        return ret;
 }
 
@@ -226,28 +202,23 @@ int ehca_query_gid(struct ib_device *ibd
                   int index, union ib_gid *gid)
 {
        int ret = 0;
-       struct ehca_shca *shca;
+       struct ehca_shca *shca = container_of(ibdev, struct ehca_shca,
+                                             ib_device);
        struct hipz_query_port *rblock;
 
-       EDEB_EN(7, "port=%x index=%x", port, index);
-
        if (index > 255) {
-               EDEB_ERR(4, "Invalid index: %x.", index);
-               ret = -EINVAL;
-               goto query_gid0;
+               ehca_err(&shca->ib_device, "Invalid index: %x.", index);
+               return -EINVAL;
        }
 
-       shca = container_of(ibdev, struct ehca_shca, ib_device);
-
        rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);
        if (!rblock) {
-               EDEB_ERR(4, "Can't allocate rblock memory.");
-               ret = -ENOMEM;
-               goto query_gid0;
+               ehca_err(&shca->ib_device, "Can't allocate rblock memory.");
+               return -ENOMEM;
        }
 
        if (hipz_h_query_port(shca->ipz_hca_handle, port, rblock) != H_SUCCESS) {
-               EDEB_ERR(4, "Can't query port properties");
+               ehca_err(&shca->ib_device, "Can't query port properties");
                ret = -EINVAL;
                goto query_gid1;
        }
@@ -258,11 +229,6 @@ int ehca_query_gid(struct ib_device *ibd
 query_gid1:
        kfree(rblock);
 
-query_gid0:
-       EDEB_EX(7, "ret=%x GID=%lx%lx", ret,
-               *(u64 *) & gid->raw[0],
-               *(u64 *) & gid->raw[8]);
-
        return ret;
 }
 
@@ -270,13 +236,6 @@ int ehca_modify_port(struct ib_device *i
                     u8 port, int port_modify_mask,
                     struct ib_port_modify *props)
 {
-       int ret = 0;
-
-       EDEB_EN(7, "port=%x", port);
-
-       /* Not implemented yet. */
-
-       EDEB_EX(7, "ret=%x", ret);
-
-       return ret;
+       /* Not implemented yet */
+       return -EFAULT;
 }
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_irq.c linux-2.6/drivers/infiniband/hw/ehca/ehca_irq.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_irq.c        2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_irq.c     2006-08-30 20:00:16.000000000 +0200
@@ -39,8 +39,6 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-#define DEB_PREFIX "eirq"
-
 #include "ehca_classes.h"
 #include "ehca_irq.h"
 #include "ehca_iverbs.h"
@@ -64,15 +62,17 @@
 #define ERROR_DATA_LENGTH      EHCA_BMASK_IBM(52,63)
 #define ERROR_DATA_TYPE        EHCA_BMASK_IBM(0,7)
 
+#ifdef CONFIG_INFINIBAND_EHCA_SCALING
+
 static void queue_comp_task(struct ehca_cq *__cq);
 
 static struct ehca_comp_pool* pool;
 static struct notifier_block comp_pool_callback_nb;
 
+#endif
+
 static inline void comp_event_callback(struct ehca_cq *cq)
 {
-       EDEB_EN(7, "cq=%p", cq);
-
        if (!cq->ib_cq.comp_handler)
                return;
 
@@ -80,8 +80,6 @@ static inline void comp_event_callback(s
        cq->ib_cq.comp_handler(&cq->ib_cq, cq->ib_cq.cq_context);
        spin_unlock(&cq->cb_lock);
 
-       EDEB_EX(7, "cq=%p", cq);
-
        return;
 }
 
@@ -91,9 +89,6 @@ static void print_error_data(struct ehca
        u64 type = EHCA_BMASK_GET(ERROR_DATA_TYPE, rblock[2]);
        u64 resource = rblock[1];
 
-       EDEB_EN(7, "shca=%p data=%p rblock=%p length=%x",
-               shca, data, rblock, length);
-
        switch (type) {
        case 0x1: /* Queue Pair */
        {
@@ -103,7 +98,8 @@ static void print_error_data(struct ehca
                if (rblock[6] == 0)
                        return;
 
-               EDEB_ERR(4, "QP 0x%x (resource=%lx) has errors.",
+               ehca_err(&shca->ib_device,
+                        "QP 0x%x (resource=%lx) has errors.",
                         qp->ib_qp.qp_num, resource);
                break;
        }
@@ -111,25 +107,25 @@ static void print_error_data(struct ehca
        {
                struct ehca_cq *cq = (struct ehca_cq*)data;
 
-               EDEB_ERR(4, "CQ 0x%x (resource=%lx) has errors.",
+               ehca_err(&shca->ib_device,
+                        "CQ 0x%x (resource=%lx) has errors.",
                         cq->cq_number, resource);
                break;
        }
        default:
-               EDEB_ERR(4, "Unknown errror type: %lx on %s.",
+               ehca_err(&shca->ib_device,
+                        "Unknown error type: %lx on %s.",
                         type, shca->ib_device.name);
                break;
        }
 
-       EDEB_ERR(4, "Error data is available: %lx.", resource);
-       EDEB_ERR(4, "EHCA ----- error data begin "
+       ehca_err(&shca->ib_device, "Error data is available: %lx.", resource);
+       ehca_err(&shca->ib_device, "EHCA ----- error data begin "
                 "---------------------------------------------------");
-       EDEB_DMP(4, rblock, length, "resource=%lx", resource);
-       EDEB_ERR(4, "EHCA ----- error data end "
+       ehca_dmp(rblock, length, "resource=%lx", resource);
+       ehca_err(&shca->ib_device, "EHCA ----- error data end "
                 "----------------------------------------------------");
 
-       EDEB_EX(7, "");
-
        return;
 }
 
@@ -137,15 +133,13 @@ int ehca_error_data(struct ehca_shca *sh
                    u64 resource)
 {
 
-       unsigned long ret = 0;
+       unsigned long ret;
        u64 *rblock;
        unsigned long block_count;
 
-       EDEB_EN(7, "shca=%p data=%p resource=%lx", shca, data, resource);
-
        rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);
        if (!rblock) {
-               EDEB_ERR(4, "Cannot allocate rblock memory.");
+               ehca_err(&shca->ib_device, "Cannot allocate rblock memory.");
                ret = -ENOMEM;
                goto error_data1;
        }
@@ -156,7 +150,8 @@ int ehca_error_data(struct ehca_shca *sh
                                &block_count);
 
        if (ret == H_R_STATE) {
-               EDEB_ERR(4, "No error data is available: %lx.", resource);
+               ehca_err(&shca->ib_device,
+                        "No error data is available: %lx.", resource);
        }
        else if (ret == H_SUCCESS) {
                int length;
@@ -169,7 +164,8 @@ int ehca_error_data(struct ehca_shca *sh
                print_error_data(shca, data, rblock, length);
        }
        else {
-               EDEB_ERR(4, "Error data could not be fetched: %lx", resource);
+               ehca_err(&shca->ib_device,
+                        "Error data could not be fetched: %lx", resource);
        }
 
        kfree(rblock);
@@ -188,8 +184,6 @@ static void qp_event_callback(struct ehc
        unsigned long flags;
        u32 token = EHCA_BMASK_GET(EQE_QP_TOKEN, eqe);
 
-       EDEB_EN(7, "eqe=%lx", eqe);
-
        spin_lock_irqsave(&ehca_qp_idr_lock, flags);
        qp = idr_find(&ehca_qp_idr, token);
        spin_unlock_irqrestore(&ehca_qp_idr_lock, flags);
@@ -209,8 +203,6 @@ static void qp_event_callback(struct ehc
 
        qp->ib_qp.event_handler(&event, qp->ib_qp.qp_context);
 
-       EDEB_EX(7, "qp=%p", qp);
-
        return;
 }
 
@@ -221,8 +213,6 @@ static void cq_event_callback(struct ehc
        unsigned long flags;
        u32 token = EHCA_BMASK_GET(EQE_CQ_TOKEN, eqe);
 
-       EDEB_EN(7, "eqe=%lx", eqe);
-
        spin_lock_irqsave(&ehca_cq_idr_lock, flags);
        cq = idr_find(&ehca_cq_idr, token);
        spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
@@ -232,8 +222,6 @@ static void cq_event_callback(struct ehc
 
        ehca_error_data(shca, cq, cq->ipz_cq_handle.handle);
 
-       EDEB_EX(7, "cq=%p", cq);
-
        return;
 }
 
@@ -241,8 +229,6 @@ static void parse_identifier(struct ehca
 {
        u8 identifier = EHCA_BMASK_GET(EQE_EE_IDENTIFIER, eqe);
 
-       EDEB_EN(7, "shca=%p eqe=%lx", shca, eqe);
-
        switch (identifier) {
        case 0x02: /* path migrated */
                qp_event_callback(shca, eqe, IB_EVENT_PATH_MIG);
@@ -262,41 +248,39 @@ static void parse_identifier(struct ehca
                cq_event_callback(shca, eqe);
                break;
        case 0x09: /* MRMWPTE error */
-               EDEB_ERR(4, "MRMWPTE error.");
+               ehca_err(&shca->ib_device, "MRMWPTE error.");
                break;
        case 0x0A: /* port event */
-               EDEB_ERR(4, "Port event.");
+               ehca_err(&shca->ib_device, "Port event.");
                break;
        case 0x0B: /* MR access error */
-               EDEB_ERR(4, "MR access error.");
+               ehca_err(&shca->ib_device, "MR access error.");
                break;
        case 0x0C: /* EQ error */
-               EDEB_ERR(4, "EQ error.");
+               ehca_err(&shca->ib_device, "EQ error.");
                break;
        case 0x0D: /* P/Q_Key mismatch */
-               EDEB_ERR(4, "P/Q_Key mismatch.");
+               ehca_err(&shca->ib_device, "P/Q_Key mismatch.");
                break;
        case 0x10: /* sampling complete */
-               EDEB_ERR(4, "Sampling complete.");
+               ehca_err(&shca->ib_device, "Sampling complete.");
                break;
        case 0x11: /* unaffiliated access error */
-               EDEB_ERR(4, "Unaffiliated access error.");
+               ehca_err(&shca->ib_device, "Unaffiliated access error.");
                break;
        case 0x12: /* path migrating error */
-               EDEB_ERR(4, "Path migration error.");
+               ehca_err(&shca->ib_device, "Path migration error.");
                break;
        case 0x13: /* interface trace stopped */
-               EDEB_ERR(4, "Interface trace stopped.");
+               ehca_err(&shca->ib_device, "Interface trace stopped.");
                break;
        case 0x14: /* first error capture info available */
        default:
-               EDEB_ERR(4, "Unknown identifier: %x on %s.",
+               ehca_err(&shca->ib_device, "Unknown identifier: %x on %s.",
                         identifier, shca->ib_device.name);
                break;
        }
 
-       EDEB_EX(7, "eqe=%lx identifier=%x", eqe, identifier);
-
        return;
 }
 
@@ -306,21 +290,19 @@ static void parse_ec(struct ehca_shca *s
        u8 ec   = EHCA_BMASK_GET(NEQE_EVENT_CODE, eqe);
        u8 port = EHCA_BMASK_GET(NEQE_PORT_NUMBER, eqe);
 
-       EDEB_EN(7, "shca=%p eqe=%lx", shca, eqe);
-
        switch (ec) {
        case 0x30: /* port availability change */
                if (EHCA_BMASK_GET(NEQE_PORT_AVAILABILITY, eqe)) {
-                       EDEB(4, "%s: port %x is active.",
-                            shca->ib_device.name, port);
+                       ehca_info(&shca->ib_device,
+                                 "port %x is active.", port);
                        event.device = &shca->ib_device;
                        event.event = IB_EVENT_PORT_ACTIVE;
                        event.element.port_num = port;
                        shca->sport[port - 1].port_state = IB_PORT_ACTIVE;
                        ib_dispatch_event(&event);
                } else {
-                       EDEB(4, "%s: port %x is inactive.",
-                            shca->ib_device.name, port);
+                       ehca_info(&shca->ib_device,
+                                 "port %x is inactive.", port);
                        event.device = &shca->ib_device;
                        event.event = IB_EVENT_PORT_ERR;
                        event.element.port_num = port;
@@ -333,19 +315,19 @@ static void parse_ec(struct ehca_shca *s
                 * disruptive change is caused by
                 * LID, PKEY or SM change
                 */
-                EDEB(4, "EHCA disruptive port %x "
-                    "configuration change.", port);
+               ehca_warn(&shca->ib_device,
+                         "disruptive port %x configuration change", port);
 
-               EDEB(4, "%s: port %x is inactive.",
-                    shca->ib_device.name, port);
+               ehca_info(&shca->ib_device,
+                        "port %x is inactive.", port);
                event.device = &shca->ib_device;
                event.event = IB_EVENT_PORT_ERR;
                event.element.port_num = port;
                shca->sport[port - 1].port_state = IB_PORT_DOWN;
                ib_dispatch_event(&event);
 
-               EDEB(4, "%s: port %x is active.",
-                            shca->ib_device.name, port);
+               ehca_info(&shca->ib_device,
+                        "port %x is active.", port);
                event.device = &shca->ib_device;
                event.event = IB_EVENT_PORT_ACTIVE;
                event.element.port_num = port;
@@ -353,34 +335,27 @@ static void parse_ec(struct ehca_shca *s
                ib_dispatch_event(&event);
                break;
        case 0x32: /* adapter malfunction */
-               EDEB_ERR(4, "Adapter malfunction.");
+               ehca_err(&shca->ib_device, "Adapter malfunction.");
                break;
        case 0x33:  /* trace stopped */
-               EDEB_ERR(4, "Traced stopped.");
+               ehca_err(&shca->ib_device, "Traced stopped.");
                break;
        default:
-               EDEB_ERR(4, "Unknown event code: %x on %s.",
+               ehca_err(&shca->ib_device, "Unknown event code: %x on %s.",
                         ec, shca->ib_device.name);
                break;
        }
 
-       EDEB_EN(7, "eqe=%lx ec=%x", eqe, ec);
-
        return;
 }
 
 static inline void reset_eq_pending(struct ehca_cq *cq)
 {
-       u64 CQx_EP = 0;
+       u64 CQx_EP;
        struct h_galpa gal = cq->galpas.kernel;
 
-       EDEB_EN(7, "cq=%p", cq);
-
        hipz_galpa_store_cq(gal, cqx_ep, 0x0);
        CQx_EP = hipz_galpa_load(gal, CQTEMM_OFFSET(cqx_ep));
-       EDEB(7, "CQx_EP=%lx", CQx_EP);
-
-       EDEB_EX(7, "cq=%p", cq);
 
        return;
 }
@@ -389,12 +364,8 @@ irqreturn_t ehca_interrupt_neq(int irq, 
 {
        struct ehca_shca *shca = (struct ehca_shca*)dev_id;
 
-       EDEB_EN(7, "dev_id=%p", dev_id);
-
        tasklet_hi_schedule(&shca->neq.interrupt_task);
 
-       EDEB_EX(7, "");
-
        return IRQ_HANDLED;
 }
 
@@ -402,9 +373,7 @@ void ehca_tasklet_neq(unsigned long data
 {
        struct ehca_shca *shca = (struct ehca_shca*)data;
        struct ehca_eqe *eqe;
-       u64 ret = H_SUCCESS;
-
-       EDEB_EN(7, "shca=%p", shca);
+       u64 ret;
 
        eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->neq);
 
@@ -419,9 +388,7 @@ void ehca_tasklet_neq(unsigned long data
                                 shca->neq.ipz_eq_handle, 0xFFFFFFFFFFFFFFFFL);
 
        if (ret != H_SUCCESS)
-               EDEB_ERR(4, "Can't clear notification events.");
-
-       EDEB_EX(7, "shca=%p", shca);
+               ehca_err(&shca->ib_device, "Can't clear notification events.");
 
        return;
 }
@@ -430,12 +397,8 @@ irqreturn_t ehca_interrupt_eq(int irq, v
 {
        struct ehca_shca *shca = (struct ehca_shca*)dev_id;
 
-       EDEB_EN(7, "dev_id=%p", dev_id);
-
        tasklet_hi_schedule(&shca->eq.interrupt_task);
 
-       EDEB_EX(7, "");
-
        return IRQ_HANDLED;
 }
 
@@ -446,8 +409,6 @@ void ehca_tasklet_eq(unsigned long data)
        int int_state;
        int query_cnt = 0;
 
-       EDEB_EN(7, "shca=%p", shca);
-
        do {
                eqe = (struct ehca_eqe *)ehca_poll_eq(shca, &shca->eq);
 
@@ -460,17 +421,18 @@ void ehca_tasklet_eq(unsigned long data)
                        while (eqe) {
                                u64 eqe_value = eqe->entry;
 
-                               EDEB(7, "eqe_value=%lx", eqe_value);
+                               ehca_dbg(&shca->ib_device,
+                                        "eqe_value=%lx", eqe_value);
 
                                /* TODO: better structure */
                                if (EHCA_BMASK_GET(EQE_COMPLETION_EVENT,
                                                   eqe_value)) {
-                                       extern struct idr ehca_cq_idr;
                                        unsigned long flags;
                                        u32 token;
                                        struct ehca_cq *cq;
 
-                                       EDEB(6, "... completion event");
+                                       ehca_dbg(&shca->ib_device,
+                                                "... completion event");
                                        token =
 EHCA_BMASK_GET(EQE_CQ_TOKEN,
                                                               eqe_value);
@@ -494,7 +456,8 @@ void ehca_tasklet_eq(unsigned long data)
                                        comp_event_callback(cq);
 #endif
                                } else {
-                                       EDEB(6, "... non completion event");
+                                       ehca_dbg(&shca->ib_device,
+                                                "... non completion event");
                                        parse_identifier(shca, eqe_value);
                                }
                                eqe =
@@ -518,29 +481,25 @@ void ehca_tasklet_eq(unsigned long data)
                }
        } while (int_state != 0);
 
-       EDEB_EX(7, "shca=%p", shca);
-
        return;
 }
 
+#ifdef CONFIG_INFINIBAND_EHCA_SCALING
+
 static inline int find_next_online_cpu(struct ehca_comp_pool* pool)
 {
        unsigned long flags_last_cpu;
 
-       EDEB_DMP(7, &cpu_online_map, sizeof(cpumask_t), "");
+       if (ehca_debug_level)
+               ehca_dmp(&cpu_online_map, sizeof(cpumask_t), "");
 
        spin_lock_irqsave(&pool->last_cpu_lock, flags_last_cpu);
        pool->last_cpu = next_cpu(pool->last_cpu, cpu_online_map);
-
        if (pool->last_cpu == NR_CPUS)
-               pool->last_cpu = 0;
-       if (!cpu_online(pool->last_cpu))
-               pool->last_cpu = next_cpu(pool->last_cpu, cpu_online_map);
-
+               pool->last_cpu = first_cpu(cpu_online_map);
        spin_unlock_irqrestore(&pool->last_cpu_lock, flags_last_cpu);
 
-       // return pool->last_cpu;
-       return 1;
+       return pool->last_cpu;
 }
 
 static void __queue_comp_task(struct ehca_cq *__cq,
@@ -549,8 +508,6 @@ static void __queue_comp_task(struct ehc
        unsigned long flags_cct;
        unsigned long flags_cq;
 
-       EDEB_EN(7, "__cq=%p cct=%p", __cq, cct);
-
        spin_lock_irqsave(&cct->task_lock, flags_cct);
        spin_lock_irqsave(&__cq->task_lock, flags_cq);
 
@@ -565,10 +522,6 @@ static void __queue_comp_task(struct ehc
 
        spin_unlock_irqrestore(&__cq->task_lock, flags_cq);
        spin_unlock_irqrestore(&cct->task_lock, flags_cct);
-
-
-       EDEB_EX(7, "");
-
 }
 
 static void queue_comp_task(struct ehca_cq *__cq)
@@ -580,10 +533,6 @@ static void queue_comp_task(struct ehca_
        cpu = get_cpu();
        cpu_id = find_next_online_cpu(pool);
 
-       EDEB_EN(7, "pool=%p cq=%p cq_nr=%x CPU=%x:%x:%x:%x",
-               pool, __cq, __cq->cq_number,
-               cpu, cpu_id, num_online_cpus(), num_possible_cpus());
-
        BUG_ON(!cpu_online(cpu_id));
 
        cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu_id);
@@ -597,20 +546,15 @@ static void queue_comp_task(struct ehca_
 
        put_cpu();
 
-       EDEB_EX(7, "cct=%p", cct);
-
        return;
 }
 
 static void run_comp_task(struct ehca_cpu_comp_task* cct)
 {
-       struct ehca_cq *cq = NULL;
+       struct ehca_cq *cq;
        unsigned long flags_cct;
        unsigned long flags_cq;
 
-
-       EDEB_EN(7, "cct=%p", cct);
-
        spin_lock_irqsave(&cct->task_lock, flags_cct);
 
        while (!list_empty(&cct->cq_list)) {
@@ -631,8 +575,6 @@ static void run_comp_task(struct ehca_cp
 
        spin_unlock_irqrestore(&cct->task_lock, flags_cct);
 
-       EDEB_EX(7, "cct=%p cq=%p", cct, cq);
-
        return;
 }
 
@@ -641,8 +583,6 @@ static int comp_task(void *__cct)
        struct ehca_cpu_comp_task* cct = __cct;
        DECLARE_WAITQUEUE(wait, current);
 
-       EDEB_EN(7, "cct=%p", cct);
-
        set_current_state(TASK_INTERRUPTIBLE);
        while(!kthread_should_stop()) {
                add_wait_queue(&cct->wait_queue, &wait);
@@ -661,8 +601,6 @@ static int comp_task(void *__cct)
        }
        __set_current_state(TASK_RUNNING);
 
-       EDEB_EX(7, "");
-
        return 0;
 }
 
@@ -671,16 +609,12 @@ static struct task_struct *create_comp_t
 {
        struct ehca_cpu_comp_task *cct;
 
-       EDEB_EN(7, "cpu=%d:%d", cpu, NR_CPUS);
-
        cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
        spin_lock_init(&cct->task_lock);
        INIT_LIST_HEAD(&cct->cq_list);
        init_waitqueue_head(&cct->wait_queue);
        cct->task = kthread_create(comp_task, cct, "ehca_comp/%d", cpu);
 
-       EDEB_EX(7, "cct/%d=%p", cpu, cct);
-
        return cct->task;
 }
 
@@ -691,8 +625,6 @@ static void destroy_comp_task(struct ehc
        struct task_struct *task;
        unsigned long flags_cct;
 
-       EDEB_EN(7, "pool=%p cpu=%d:%d", pool, cpu, NR_CPUS);
-
        cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
 
        spin_lock_irqsave(&cct->task_lock, flags_cct);
@@ -706,8 +638,6 @@ static void destroy_comp_task(struct ehc
        if (task)
                kthread_stop(task);
 
-       EDEB_EX(7, "");
-
        return;
 }
 
@@ -719,8 +649,6 @@ static void take_over_work(struct ehca_c
        struct ehca_cq *cq;
        unsigned long flags_cct;
 
-       EDEB_EN(7, "cpu=%x", cpu);
-
        spin_lock_irqsave(&cct->task_lock, flags_cct);
 
        list_splice_init(&cct->cq_list, &list);
@@ -735,8 +663,6 @@ static void take_over_work(struct ehca_c
 
        spin_unlock_irqrestore(&cct->task_lock, flags_cct);
 
-       EDEB_EX(7, "");
-
 }
 
 static int comp_pool_callback(struct notifier_block *nfb,
@@ -746,55 +672,50 @@ static int comp_pool_callback(struct not
        unsigned int cpu = (unsigned long)hcpu;
        struct ehca_cpu_comp_task *cct;
 
-       EDEB_EN(7, "CPU number changed (action=%lx)", action);
-
        switch (action) {
        case CPU_UP_PREPARE:
-               EDEB(4, "CPU: %x (CPU_PREPARE)", cpu);
+               ehca_gen_dbg("CPU: %x (CPU_PREPARE)", cpu);
                if(!create_comp_task(pool, cpu)) {
-                       EDEB_ERR(4, "Can't create comp_task for cpu: %x", cpu);
+                       ehca_gen_err("Can't create comp_task for cpu: %x", cpu);
                        return NOTIFY_BAD;
                }
                break;
        case CPU_UP_CANCELED:
-               EDEB(4, "CPU: %x (CPU_CANCELED)", cpu);
+               ehca_gen_dbg("CPU: %x (CPU_CANCELED)", cpu);
                cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
                kthread_bind(cct->task, any_online_cpu(cpu_online_map));
                destroy_comp_task(pool, cpu);
                break;
        case CPU_ONLINE:
-               EDEB(4, "CPU: %x (CPU_ONLINE)", cpu);
+               ehca_gen_dbg("CPU: %x (CPU_ONLINE)", cpu);
                cct = per_cpu_ptr(pool->cpu_comp_tasks, cpu);
                kthread_bind(cct->task, cpu);
                wake_up_process(cct->task);
                break;
        case CPU_DOWN_PREPARE:
-               EDEB(4, "CPU: %x (CPU_DOWN_PREPARE)", cpu);
+               ehca_gen_dbg("CPU: %x (CPU_DOWN_PREPARE)", cpu);
                break;
        case CPU_DOWN_FAILED:
-               EDEB(4, "CPU: %x (CPU_DOWN_FAILED)", cpu);
+               ehca_gen_dbg("CPU: %x (CPU_DOWN_FAILED)", cpu);
                break;
        case CPU_DEAD:
-               EDEB(4, "CPU: %x (CPU_DEAD)", cpu);
+               ehca_gen_dbg("CPU: %x (CPU_DEAD)", cpu);
                destroy_comp_task(pool, cpu);
                take_over_work(pool, cpu);
                break;
        }
 
-       EDEB_EX(7, "CPU number changed");
-
        return NOTIFY_OK;
 }
 
+#endif
+
 int ehca_create_comp_pool(void)
 {
 #ifdef CONFIG_INFINIBAND_EHCA_SCALING
        int cpu;
        struct task_struct *task;
 
-       EDEB_EN(7, "");
-
-
        pool = kzalloc(sizeof(struct ehca_comp_pool), GFP_KERNEL);
        if (pool == NULL)
                return -ENOMEM;
@@ -819,8 +740,6 @@ int ehca_create_comp_pool(void)
        comp_pool_callback_nb.notifier_call = comp_pool_callback;
        comp_pool_callback_nb.priority =0;
        register_cpu_notifier(&comp_pool_callback_nb);
-
-       EDEB_EX(7, "pool=%p", pool);
 #endif
 
        return 0;
@@ -831,16 +750,12 @@ void ehca_destroy_comp_pool(void)
 #ifdef CONFIG_INFINIBAND_EHCA_SCALING
        int i;
 
-       EDEB_EN(7, "pool=%p", pool);
-
        unregister_cpu_notifier(&comp_pool_callback_nb);
 
        for (i = 0; i < NR_CPUS; i++) {
                if (cpu_online(i))
                        destroy_comp_task(pool, i);
        }
-
-       EDEB_EN(7, "");
 #endif
 
        return;
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_main.c linux-2.6/drivers/infiniband/hw/ehca/ehca_main.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_main.c       2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_main.c    2006-08-30 20:00:17.000000000 +0200
@@ -4,6 +4,7 @@
  *  module start stop, hca detection
  *
  *  Authors: Heiko J Schick <schickhj at de.ibm.com>
+ *           Hoang-Nam Nguyen <hnguyen at de.ibm.com>
  *
  *  Copyright (c) 2005 IBM Corporation
  *
@@ -38,8 +39,6 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-#define DEB_PREFIX "shca"
-
 #include "ehca_classes.h"
 #include "ehca_iverbs.h"
 #include "ehca_mrmw.h"
@@ -49,10 +48,10 @@
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch <raisch at de.ibm.com>");
 MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver");
-MODULE_VERSION("SVNEHCA_0012");
+MODULE_VERSION("SVNEHCA_0015");
 
 int ehca_open_aqp1     = 0;
-int ehca_debug_level   = -1;
+int ehca_debug_level   = 0;
 int ehca_hw_level      = 0;
 int ehca_nr_ports      = 2;
 int ehca_use_hp_mr     = 0;
@@ -73,7 +72,7 @@ MODULE_PARM_DESC(open_aqp1,
                 "AQP1 on startup (0: no (default), 1: yes)");
 MODULE_PARM_DESC(debug_level,
                 "debug level"
-                " (0: node, 6: only errors (default), 9: all)");
+                " (0: no debug traces (default), 1: with debug traces)");
 MODULE_PARM_DESC(hw_level,
                 "hardware level"
                 " (0: autosensing (default), 1: v. 0.20, 2: v. 0.21)");
@@ -89,170 +88,74 @@ MODULE_PARM_DESC(poll_all_eqs,
 MODULE_PARM_DESC(static_rate,
                 "set permanent static rate (default: disabled)");
 
-/*
- * This external trace mask controls what will end up in the
- * kernel ring buffer. Number 6 means, that everything between
- * 0 and 5 will be stored.
- */
-u8 ehca_edeb_mask[EHCA_EDEB_TRACE_MASK_SIZE]={6, 6, 6, 6,
-                                             6, 6, 6, 6,
-                                             6, 6, 6, 6,
-                                             6, 6, 6, 6,
-                                             6, 6, 6, 6,
-                                             6, 6, 6, 6,
-                                             6, 6, 6, 6,
-                                             6, 6, 0, 0};
-
 spinlock_t ehca_qp_idr_lock;
 spinlock_t ehca_cq_idr_lock;
 DEFINE_IDR(ehca_qp_idr);
 DEFINE_IDR(ehca_cq_idr);
 
-struct ehca_module ehca_module;
-
-void ehca_init_trace(void)
-{
-       EDEB_EN(7, "");
+static struct list_head shca_list; /* list of all registered ehcas */
+static spinlock_t shca_list_lock;
 
-       if (ehca_debug_level != -1) {
-               int i;
-               for (i = 0; i < EHCA_EDEB_TRACE_MASK_SIZE; i++)
-                       ehca_edeb_mask[i] = ehca_debug_level;
-       }
-
-       EDEB_EX(7, "");
-}
+static struct timer_list poll_eqs_timer;
 
-int ehca_create_slab_caches(struct ehca_module *ehca_module)
+static int ehca_create_slab_caches(void)
 {
-       int ret = 0;
-
-       EDEB_EN(7, "");
+       int ret;
 
-       ehca_module->cache_pd =
-               kmem_cache_create("ehca_cache_pd",
-                                 sizeof(struct ehca_pd),
-                                 0, SLAB_HWCACHE_ALIGN,
-                                 NULL, NULL);
-       if (!ehca_module->cache_pd) {
-               EDEB_ERR(4, "Cannot create PD SLAB cache.");
-               ret = -ENOMEM;
-               goto create_slab_caches1;
+       ret = ehca_init_pd_cache();
+       if (ret) {
+               ehca_gen_err("Cannot create PD SLAB cache.");
+               return ret;
        }
 
-       ehca_module->cache_cq =
-               kmem_cache_create("ehca_cache_cq",
-                                 sizeof(struct ehca_cq),
-                                 0, SLAB_HWCACHE_ALIGN,
-                                 NULL, NULL);
-       if (!ehca_module->cache_cq) {
-               EDEB_ERR(4, "Cannot create CQ SLAB cache.");
-               ret = -ENOMEM;
+       ret = ehca_init_cq_cache();
+       if (ret) {
+               ehca_gen_err("Cannot create CQ SLAB cache.");
                goto create_slab_caches2;
        }
 
-       ehca_module->cache_qp =
-               kmem_cache_create("ehca_cache_qp",
-                                 sizeof(struct ehca_qp),
-                                 0, SLAB_HWCACHE_ALIGN,
-                                 NULL, NULL);
-       if (!ehca_module->cache_qp) {
-               EDEB_ERR(4, "Cannot create QP SLAB cache.");
-               ret = -ENOMEM;
+       ret = ehca_init_qp_cache();
+       if (ret) {
+               ehca_gen_err("Cannot create QP SLAB cache.");
                goto create_slab_caches3;
        }
 
-       ehca_module->cache_av =
-               kmem_cache_create("ehca_cache_av",
-                                 sizeof(struct ehca_av),
-                                 0, SLAB_HWCACHE_ALIGN,
-                                 NULL, NULL);
-       if (!ehca_module->cache_av) {
-               EDEB_ERR(4, "Cannot create AV SLAB cache.");
-               ret = -ENOMEM;
+       ret = ehca_init_av_cache();
+       if (ret) {
+               ehca_gen_err("Cannot create AV SLAB cache.");
                goto create_slab_caches4;
        }
 
-       ehca_module->cache_mw =
-               kmem_cache_create("ehca_cache_mw",
-                                 sizeof(struct ehca_mw),
-                                 0, SLAB_HWCACHE_ALIGN,
-                                 NULL, NULL);
-       if (!ehca_module->cache_mw) {
-               EDEB_ERR(4, "Cannot create MW SLAB cache.");
-               ret = -ENOMEM;
+       ret = ehca_init_mrmw_cache();
+       if (ret) {
+               ehca_gen_err("Cannot create MR&MW SLAB cache.");
                goto create_slab_caches5;
        }
 
-       ehca_module->cache_mr =
-               kmem_cache_create("ehca_cache_mr",
-                                 sizeof(struct ehca_mr),
-                                 0, SLAB_HWCACHE_ALIGN,
-                                 NULL, NULL);
-       if (!ehca_module->cache_mr) {
-               EDEB_ERR(4, "Cannot create MR SLAB cache.");
-               ret = -ENOMEM;
-               goto create_slab_caches6;
-       }
-
-       EDEB_EX(7, "ret=%x", ret);
-
-       return ret;
-
-create_slab_caches6:
-       kmem_cache_destroy(ehca_module->cache_mw);
+       return 0;
 
 create_slab_caches5:
-       kmem_cache_destroy(ehca_module->cache_av);
+       ehca_cleanup_av_cache();
 
 create_slab_caches4:
-       kmem_cache_destroy(ehca_module->cache_qp);
+       ehca_cleanup_qp_cache();
 
 create_slab_caches3:
-       kmem_cache_destroy(ehca_module->cache_cq);
+       ehca_cleanup_cq_cache();
 
 create_slab_caches2:
-       kmem_cache_destroy(ehca_module->cache_pd);
-
-create_slab_caches1:
-       EDEB_EX(7, "ret=%x", ret);
+       ehca_cleanup_pd_cache();
 
        return ret;
 }
 
-int ehca_destroy_slab_caches(struct ehca_module *ehca_module)
+static void ehca_destroy_slab_caches(void)
 {
-       int ret;
-
-       EDEB_EN(7, "");
-
-       ret = kmem_cache_destroy(ehca_module->cache_pd);
-       if (ret)
-               EDEB_ERR(4, "Cannot destroy PD SLAB cache. ret=%x", ret);
-
-       ret = kmem_cache_destroy(ehca_module->cache_cq);
-       if (ret)
-               EDEB_ERR(4, "Cannot destroy CQ SLAB cache. ret=%x", ret);
-
-       ret = kmem_cache_destroy(ehca_module->cache_qp);
-       if (ret)
-               EDEB_ERR(4, "Cannot destroy QP SLAB cache. ret=%x", ret);
-
-       ret = kmem_cache_destroy(ehca_module->cache_av);
-       if (ret)
-               EDEB_ERR(4, "Cannot destroy AV SLAB cache. ret=%x", ret);
-
-       ret = kmem_cache_destroy(ehca_module->cache_mw);
-       if (ret)
-               EDEB_ERR(4, "Cannot destroy MW SLAB cache. ret=%x", ret);
-
-       ret = kmem_cache_destroy(ehca_module->cache_mr);
-       if (ret)
-               EDEB_ERR(4, "Cannot destroy MR SLAB cache. ret=%x", ret);
-
-       EDEB_EX(7, "");
-
-       return 0;
+       ehca_cleanup_mrmw_cache();
+       ehca_cleanup_av_cache();
+       ehca_cleanup_qp_cache();
+       ehca_cleanup_cq_cache();
+       ehca_cleanup_pd_cache();
 }
 
 #define EHCA_HCAAVER  EHCA_BMASK_IBM(32,39)
@@ -260,22 +163,20 @@ int ehca_destroy_slab_caches(struct ehca
 
 int ehca_sense_attributes(struct ehca_shca *shca)
 {
-       int ret = -EINVAL;
-       u64 h_ret = H_SUCCESS;
+       int ret = 0;
+       u64 h_ret;
        struct hipz_query_hca *rblock;
 
-       EDEB_EN(7, "shca=%p", shca);
-
        rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);
        if (!rblock) {
-               EDEB_ERR(4, "Cannot allocate rblock memory.");
-               ret = -ENOMEM;
-               goto num_ports0;
+               ehca_gen_err("Cannot allocate rblock memory.");
+               return -ENOMEM;
        }
 
        h_ret = hipz_h_query_hca(shca->ipz_hca_handle, rblock);
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4, "Cannot query device properties. h_ret=%lx", h_ret);
+               ehca_gen_err("Cannot query device properties. h_ret=%lx",
+                            h_ret);
                ret = -EPERM;
                goto num_ports1;
        }
@@ -285,7 +186,7 @@ int ehca_sense_attributes(struct ehca_sh
        else
                shca->num_ports = (u8)rblock->num_ports;
 
-       EDEB(6, " ... found %x ports", rblock->num_ports);
+       ehca_gen_dbg(" ... found %x ports", rblock->num_ports);
 
        if (ehca_hw_level == 0) {
                u32 hcaaver;
@@ -294,8 +195,7 @@ int ehca_sense_attributes(struct ehca_sh
                hcaaver = EHCA_BMASK_GET(EHCA_HCAAVER, rblock->hw_ver);
                revid   = EHCA_BMASK_GET(EHCA_REVID, rblock->hw_ver);
 
-               EDEB(6, " ... hardware version=%x:%x",
-                    hcaaver, revid);
+               ehca_gen_dbg(" ... hardware version=%x:%x", hcaaver, revid);
 
                if ((hcaaver == 1) && (revid == 0))
                        shca->hw_level = 0;
@@ -304,58 +204,43 @@ int ehca_sense_attributes(struct ehca_sh
                else if ((hcaaver == 1) && (revid == 2))
                        shca->hw_level = 2;
        }
-       EDEB(6, " ... hardware level=%x", shca->hw_level);
+       ehca_gen_dbg(" ... hardware level=%x", shca->hw_level);
 
        shca->sport[0].rate = IB_RATE_30_GBPS;
        shca->sport[1].rate = IB_RATE_30_GBPS;
 
-       ret = 0;
-
 num_ports1:
        kfree(rblock);
-
-num_ports0:
-       EDEB_EX(7, "ret=%x", ret);
-
        return ret;
 }
 
-static int init_node_guid(struct ehca_shca* shca)
+static int init_node_guid(struct ehca_shca *shca)
 {
        int ret = 0;
        struct hipz_query_hca *rblock;
 
-       EDEB_EN(7, "");
-
        rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);
        if (!rblock) {
-               EDEB_ERR(4, "Can't allocate rblock memory.");
-               ret = -ENOMEM;
-               goto init_node_guid0;
+               ehca_err(&shca->ib_device, "Can't allocate rblock memory.");
+               return -ENOMEM;
        }
 
        if (hipz_h_query_hca(shca->ipz_hca_handle, rblock) != H_SUCCESS) {
-               EDEB_ERR(4, "Can't query device properties");
+               ehca_err(&shca->ib_device, "Can't query device properties");
                ret = -EINVAL;
                goto init_node_guid1;
        }
 
-       memcpy(&shca->ib_device.node_guid, &rblock->node_guid, (sizeof(u64)));
+       memcpy(&shca->ib_device.node_guid, &rblock->node_guid, sizeof(u64));
 
 init_node_guid1:
        kfree(rblock);
-
-init_node_guid0:
-       EDEB_EX(7, "node_guid=%lx ret=%x", shca->ib_device.node_guid, ret);
-
        return ret;
 }
 
 int ehca_register_device(struct ehca_shca *shca)
 {
-       int ret = 0;
-
-       EDEB_EN(7, "shca=%p", shca);
+       int ret;
 
        ret = init_node_guid(shca);
        if (ret)
@@ -383,7 +268,7 @@ int ehca_register_device(struct ehca_shc
                (1ull << IB_USER_VERBS_CMD_ATTACH_MCAST)        |
                (1ull << IB_USER_VERBS_CMD_DETACH_MCAST);
 
-       shca->ib_device.node_type           = IB_NODE_CA;
+       shca->ib_device.node_type           = RDMA_NODE_IB_CA;
        shca->ib_device.phys_port_cnt       = shca->num_ports;
        shca->ib_device.dma_device          = &shca->ibmebus_dev->ofdev.dev;
        shca->ib_device.query_device        = ehca_query_device;
@@ -432,38 +317,35 @@ int ehca_register_device(struct ehca_shc
        shca->ib_device.mmap                = ehca_mmap;
 
        ret = ib_register_device(&shca->ib_device);
-
-       EDEB_EX(7, "ret=%x", ret);
+       if (ret)
+               ehca_err(&shca->ib_device,
+                        "ib_register_device() failed ret=%x", ret);
 
        return ret;
 }
 
 static int ehca_create_aqp1(struct ehca_shca *shca, u32 port)
 {
-       struct ehca_sport *sport;
+       struct ehca_sport *sport = &shca->sport[port - 1];
        struct ib_cq *ibcq;
        struct ib_qp *ibqp;
        struct ib_qp_init_attr qp_init_attr;
-       int ret = 0;
-
-       EDEB_EN(7, "shca=%p port=%x", shca, port);
-
-       sport = &shca->sport[port - 1];
+       int ret;
 
        if (sport->ibcq_aqp1) {
-               EDEB_ERR(4, "AQP1 CQ is already created.");
+               ehca_err(&shca->ib_device, "AQP1 CQ is already created.");
                return -EPERM;
        }
 
        ibcq = ib_create_cq(&shca->ib_device, NULL, NULL, (void*)(-1), 10);
        if (IS_ERR(ibcq)) {
-               EDEB_ERR(4, "Cannot create AQP1 CQ.");
+               ehca_err(&shca->ib_device, "Cannot create AQP1 CQ.");
                return PTR_ERR(ibcq);
        }
        sport->ibcq_aqp1 = ibcq;
 
        if (sport->ibqp_aqp1) {
-               EDEB_ERR(4, "AQP1 QP is already created.");
+               ehca_err(&shca->ib_device, "AQP1 QP is already created.");
                ret = -EPERM;
                goto create_aqp1;
        }
@@ -484,84 +366,62 @@ static int ehca_create_aqp1(struct ehca_
 
        ibqp = ib_create_qp(&shca->pd->ib_pd, &qp_init_attr);
        if (IS_ERR(ibqp)) {
-               EDEB_ERR(4, "Cannot create AQP1 QP.");
+               ehca_err(&shca->ib_device, "Cannot create AQP1 QP.");
                ret = PTR_ERR(ibqp);
                goto create_aqp1;
        }
        sport->ibqp_aqp1 = ibqp;
 
-       goto create_aqp0;
+       return 0;
 
 create_aqp1:
        ib_destroy_cq(sport->ibcq_aqp1);
-
-create_aqp0:
-       EDEB_EX(7, "ret=%x", ret);
-
        return ret;
 }
 
 static int ehca_destroy_aqp1(struct ehca_sport *sport)
 {
-       int ret = 0;
-
-       EDEB_EN(7, "sport=%p", sport);
+       int ret;
 
        ret = ib_destroy_qp(sport->ibqp_aqp1);
        if (ret) {
-               EDEB_ERR(4, "Cannot destroy AQP1 QP. ret=%x", ret);
-               goto destroy_aqp1;
+               ehca_gen_err("Cannot destroy AQP1 QP. ret=%x", ret);
+               return ret;
        }
 
        ret = ib_destroy_cq(sport->ibcq_aqp1);
        if (ret)
-               EDEB_ERR(4, "Cannot destroy AQP1 CQ. ret=%x", ret);
-
-destroy_aqp1:
-       EDEB_EX(7, "ret=%x", ret);
+               ehca_gen_err("Cannot destroy AQP1 CQ. ret=%x", ret);
 
        return ret;
 }
 
-static ssize_t ehca_show_debug_mask(struct device_driver *ddp, char *buf)
+static ssize_t ehca_show_debug_level(struct device_driver *ddp, char *buf)
 {
-       int i;
-       int total = 0;
-       total += snprintf(buf + total, PAGE_SIZE - total, "%d",
-                         ehca_edeb_mask[0]);
-       for (i = 1; i < EHCA_EDEB_TRACE_MASK_SIZE; i++) {
-               total += snprintf(buf + total, PAGE_SIZE - total, "%d",
-                                 ehca_edeb_mask[i]);
-       }
-
-       total += snprintf(buf + total, PAGE_SIZE - total, "\n");
-
-       return total;
+       return  snprintf(buf, PAGE_SIZE, "%d\n",
+                        ehca_debug_level);
 }
 
-static ssize_t ehca_store_debug_mask(struct device_driver *ddp,
-                                    const char *buf, size_t count)
+static ssize_t ehca_store_debug_level(struct device_driver *ddp,
+                                     const char *buf, size_t count)
 {
-       int i;
-       for (i = 0; i < EHCA_EDEB_TRACE_MASK_SIZE; i++) {
-               char value = buf[i] - '0';
-               if ((value <= 9) && (count >= i)) {
-                       ehca_edeb_mask[i] = value;
-               }
-       }
-       return count;
+       int value = (*buf) - '0';
+       if (value >= 0 && value <= 9)
+               ehca_debug_level = value;
+       return 1;
 }
-DRIVER_ATTR(debug_mask, S_IRUSR | S_IWUSR,
-           ehca_show_debug_mask, ehca_store_debug_mask);
+
+DRIVER_ATTR(debug_level, S_IRUSR | S_IWUSR,
+           ehca_show_debug_level, ehca_store_debug_level);
 
 void ehca_create_driver_sysfs(struct ibmebus_driver *drv)
 {
-       driver_create_file(&drv->driver, &driver_attr_debug_mask);
+       driver_create_file(&drv->driver, &driver_attr_debug_level);
 }
 
 void ehca_remove_driver_sysfs(struct ibmebus_driver *drv)
 {
-       driver_remove_file(&drv->driver, &driver_attr_debug_mask);
+       driver_remove_file(&drv->driver, &driver_attr_debug_level);
 }
 
 #define EHCA_RESOURCE_ATTR(name)  \
@@ -577,14 +437,14 @@ static ssize_t  ehca_show_##name(struct 
 \
        rblock = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL); \
        if (!rblock) { \
-               EDEB_ERR(4, "Can't allocate rblock memory."); \
+               dev_err(dev, "Can't allocate rblock memory."); \
                return 0; \
        } \
 \
        if (hipz_h_query_hca(shca->ipz_hca_handle, rblock) != H_SUCCESS) { \
-                       EDEB_ERR(4, "Can't query device properties"); \
-                       kfree(rblock); \
-                       return 0; \
+               dev_err(dev, "Can't query device properties"); \
+               kfree(rblock); \
+               return 0; \
        } \
  \
        data = rblock->name; \
@@ -669,26 +529,24 @@ static int __devinit ehca_probe(struct i
        struct ehca_shca *shca;
        u64 *handle;
        struct ib_pd *ibpd;
-       int ret = 0;
-
-       EDEB_EN(7, "");
+       int ret;
 
        handle = (u64 *)get_property(dev->ofdev.node, "ibm,hca-handle", NULL);
        if (!handle) {
-               EDEB_ERR(4, "Cannot get eHCA handle for adapter: %s.",
-                        dev->ofdev.node->full_name);
+               ehca_gen_err("Cannot get eHCA handle for adapter: %s.",
+                            dev->ofdev.node->full_name);
                return -ENODEV;
        }
 
        if (!(*handle)) {
-               EDEB_ERR(4, "Wrong eHCA handle for adapter: %s.",
-                        dev->ofdev.node->full_name);
+               ehca_gen_err("Wrong eHCA handle for adapter: %s.",
+                            dev->ofdev.node->full_name);
                return -ENODEV;
        }
 
        shca = (struct ehca_shca *)ib_alloc_device(sizeof(*shca));
-       if (shca == NULL) {
-               EDEB_ERR(4, "Cannot allocate shca memory.");
+       if (!shca) {
+               ehca_gen_err("Cannot allocate shca memory.");
                return -ENOMEM;
        }
 
@@ -698,29 +556,35 @@ static int __devinit ehca_probe(struct i
 
        ret = ehca_sense_attributes(shca);
        if (ret < 0) {
-               EDEB_ERR(4, "Cannot sense eHCA attributes.");
+               ehca_gen_err("Cannot sense eHCA attributes.");
+               goto probe1;
+       }
+
+       ret = ehca_register_device(shca);
+       if (ret) {
+               ehca_gen_err("Cannot register Infiniband device");
                goto probe1;
        }
 
        /* create event queues */
        ret = ehca_create_eq(shca, &shca->eq, EHCA_EQ, 2048);
        if (ret) {
-               EDEB_ERR(4, "Cannot create EQ.");
-               goto probe1;
+               ehca_err(&shca->ib_device, "Cannot create EQ.");
+               goto probe2;
        }
 
        ret = ehca_create_eq(shca, &shca->neq, EHCA_NEQ, 513);
        if (ret) {
-               EDEB_ERR(4, "Cannot create NEQ.");
-               goto probe2;
+               ehca_err(&shca->ib_device, "Cannot create NEQ.");
+               goto probe3;
        }
 
        /* create internal protection domain */
        ibpd = ehca_alloc_pd(&shca->ib_device, (void*)(-1), NULL);
        if (IS_ERR(ibpd)) {
-               EDEB_ERR(4, "Cannot create internal PD.");
+               ehca_err(&shca->ib_device, "Cannot create internal PD.");
                ret = PTR_ERR(ibpd);
-               goto probe3;
+               goto probe4;
        }
 
        shca->pd = container_of(ibpd, struct ehca_pd, ib_pd);
@@ -730,13 +594,8 @@ static int __devinit ehca_probe(struct i
        ret = ehca_reg_internal_maxmr(shca, shca->pd, &shca->maxmr);
 
        if (ret) {
-               EDEB_ERR(4, "Cannot create internal MR. ret=%x", ret);
-               goto probe4;
-       }
-
-       ret = ehca_register_device(shca);
-       if (ret) {
-               EDEB_ERR(4, "Cannot register Infiniband device.");
+               ehca_err(&shca->ib_device, "Cannot create internal MR ret=%x",
+                        ret);
                goto probe5;
        }
 
@@ -745,7 +604,8 @@ static int __devinit ehca_probe(struct i
                shca->sport[0].port_state = IB_PORT_DOWN;
                ret = ehca_create_aqp1(shca, 1);
                if (ret) {
-                       EDEB_ERR(4, "Cannot create AQP1 for port 1.");
+                       ehca_err(&shca->ib_device,
+                                "Cannot create AQP1 for port 1.");
                        goto probe6;
                }
        }
@@ -755,54 +615,56 @@ static int __devinit ehca_probe(struct i
                shca->sport[1].port_state = IB_PORT_DOWN;
                ret = ehca_create_aqp1(shca, 2);
                if (ret) {
-                       EDEB_ERR(4, "Cannot create AQP1 for port 2.");
+                       ehca_err(&shca->ib_device,
+                                "Cannot create AQP1 for port 2.");
                        goto probe7;
                }
        }
 
        ehca_create_device_sysfs(dev);
 
-       spin_lock(&ehca_module.shca_lock);
-       list_add(&shca->shca_list, &ehca_module.shca_list);
-       spin_unlock(&ehca_module.shca_lock);
-
-       EDEB_EX(7, "ret=%x", ret);
+       spin_lock(&shca_list_lock);
+       list_add(&shca->shca_list, &shca_list);
+       spin_unlock(&shca_list_lock);
 
        return 0;
 
 probe7:
        ret = ehca_destroy_aqp1(&shca->sport[0]);
        if (ret)
-               EDEB_ERR(4, "Cannot destroy AQP1 for port 1. ret=%x", ret);
+               ehca_err(&shca->ib_device,
+                        "Cannot destroy AQP1 for port 1. ret=%x", ret);
 
 probe6:
-       ib_unregister_device(&shca->ib_device);
+       ret = ehca_dereg_internal_maxmr(shca);
+       if (ret)
+               ehca_err(&shca->ib_device,
+                        "Cannot destroy internal MR. ret=%x", ret);
 
 probe5:
-       ret = ehca_dereg_internal_maxmr(shca);
+       ret = ehca_dealloc_pd(&shca->pd->ib_pd);
        if (ret)
-               EDEB_ERR(4, "Cannot destroy internal MR. ret=%x", ret);
+               ehca_err(&shca->ib_device,
+                        "Cannot destroy internal PD. ret=%x", ret);
 
 probe4:
-       ret = ehca_dealloc_pd(&shca->pd->ib_pd);
-       if (ret != 0)
-               EDEB_ERR(4, "Cannot destroy internal PD. ret=%x", ret);
+       ret = ehca_destroy_eq(shca, &shca->neq);
+       if (ret)
+               ehca_err(&shca->ib_device,
+                        "Cannot destroy NEQ. ret=%x", ret);
 
 probe3:
-       ret = ehca_destroy_eq(shca, &shca->neq);
-       if (ret != 0)
-               EDEB_ERR(4, "Cannot destroy NEQ. ret=%x", ret);
+       ret = ehca_destroy_eq(shca, &shca->eq);
+       if (ret)
+               ehca_err(&shca->ib_device,
+                        "Cannot destroy EQ. ret=%x", ret);
 
 probe2:
-       ret = ehca_destroy_eq(shca, &shca->eq);
-       if (ret != 0)
-               EDEB_ERR(4, "Cannot destroy EQ. ret=%x", ret);
+       ib_unregister_device(&shca->ib_device);
 
 probe1:
        ib_dealloc_device(&shca->ib_device);
 
-       EDEB_EX(4, "ret=%x", ret);
-
        return -EINVAL;
 }
 
@@ -811,18 +673,16 @@ static int __devexit ehca_remove(struct 
        struct ehca_shca *shca = dev->ofdev.dev.driver_data;
        int ret;
 
-       EDEB_EN(7, "shca=%p", shca);
-
        ehca_remove_device_sysfs(dev);
 
        if (ehca_open_aqp1 == 1) {
                int i;
-
                for (i = 0; i < shca->num_ports; i++) {
                        ret = ehca_destroy_aqp1(&shca->sport[i]);
-                       if (ret != 0)
-                               EDEB_ERR(4, "Cannot destroy AQP1 for port %x."
-                                        " ret=%x", ret, i);
+                       if (ret)
+                               ehca_err(&shca->ib_device,
+                                        "Cannot destroy AQP1 for port %x "
+                                        "ret=%x", i, ret);
                }
        }
 
@@ -830,27 +690,27 @@ static int __devexit ehca_remove(struct 
 
        ret = ehca_dereg_internal_maxmr(shca);
        if (ret)
-               EDEB_ERR(4, "Cannot destroy internal MR. ret=%x", ret);
+               ehca_err(&shca->ib_device,
+                        "Cannot destroy internal MR. ret=%x", ret);
 
        ret = ehca_dealloc_pd(&shca->pd->ib_pd);
        if (ret)
-               EDEB_ERR(4, "Cannot destroy internal PD. ret=%x", ret);
+               ehca_err(&shca->ib_device,
+                        "Cannot destroy internal PD. ret=%x", ret);
 
        ret = ehca_destroy_eq(shca, &shca->eq);
        if (ret)
-               EDEB_ERR(4, "Cannot destroy EQ. ret=%x", ret);
+               ehca_err(&shca->ib_device, "Cannot destroy EQ. ret=%x", ret);
 
        ret = ehca_destroy_eq(shca, &shca->neq);
        if (ret)
-               EDEB_ERR(4, "Canot destroy NEQ. ret=%x", ret);
+               ehca_err(&shca->ib_device, "Cannot destroy NEQ. ret=%x", ret);
 
        ib_dealloc_device(&shca->ib_device);
 
-       spin_lock(&ehca_module.shca_lock);
+       spin_lock(&shca_list_lock);
        list_del(&shca->shca_list);
-       spin_unlock(&ehca_module.shca_lock);
-
-       EDEB_EX(7, "ret=%x", ret);
+       spin_unlock(&shca_list_lock);
 
        return ret;
 }
@@ -871,37 +731,46 @@ static struct ibmebus_driver ehca_driver
        .remove   = ehca_remove,
 };
 
+void ehca_poll_eqs(unsigned long data)
+{
+       struct ehca_shca *shca;
+
+       spin_lock(&shca_list_lock);
+       list_for_each_entry(shca, &shca_list, shca_list) {
+               if (shca->eq.is_initialized)
+                       ehca_tasklet_eq((unsigned long)(void*)shca);
+       }
+       mod_timer(&poll_eqs_timer, jiffies + HZ);
+       spin_unlock(&shca_list_lock);
+}
+
 int __init ehca_module_init(void)
 {
-       int ret = 0;
+       int ret;
 
        printk(KERN_INFO "eHCA Infiniband Device Driver "
-                        "(Rel.: SVNEHCA_0012)\n");
-       EDEB_EN(7, "");
-
+                        "(Rel.: SVNEHCA_0015)\n");
        idr_init(&ehca_qp_idr);
        idr_init(&ehca_cq_idr);
        spin_lock_init(&ehca_qp_idr_lock);
        spin_lock_init(&ehca_cq_idr_lock);
 
-       INIT_LIST_HEAD(&ehca_module.shca_list);
-       spin_lock_init(&ehca_module.shca_lock);
-
-       ehca_init_trace();
+       INIT_LIST_HEAD(&shca_list);
+       spin_lock_init(&shca_list_lock);
 
        if ((ret = ehca_create_comp_pool())) {
-               EDEB_ERR(4, "Cannot create comp pool.");
-               goto module_init0;
+               ehca_gen_err("Cannot create comp pool.");
+               return ret;
        }
 
-       if ((ret = ehca_create_slab_caches(&ehca_module))) {
-               EDEB_ERR(4, "Cannot create SLAB caches");
+       if ((ret = ehca_create_slab_caches())) {
+               ehca_gen_err("Cannot create SLAB caches");
                ret = -ENOMEM;
                goto module_init1;
        }
 
        if ((ret = ibmebus_register_driver(&ehca_driver))) {
-               EDEB_ERR(4, "Cannot register eHCA device driver");
+               ehca_gen_err("Cannot register eHCA device driver");
                ret = -EINVAL;
                goto module_init2;
        }
@@ -909,49 +778,39 @@ int __init ehca_module_init(void)
        ehca_create_driver_sysfs(&ehca_driver);
 
        if (ehca_poll_all_eqs != 1) {
-               EDEB_ERR(4, "WARNING!!!");
-               EDEB_ERR(4, "It is possible to lose interrupts.");
+               ehca_gen_err("WARNING!!!");
+               ehca_gen_err("It is possible to lose interrupts.");
        } else {
-               init_timer(&ehca_module.timer);
-               ehca_module.timer.function = ehca_poll_eqs;
-               ehca_module.timer.data = (unsigned long)&ehca_module;
-               ehca_module.timer.expires = jiffies + HZ;
-               add_timer(&ehca_module.timer);
+               init_timer(&poll_eqs_timer);
+               poll_eqs_timer.function = ehca_poll_eqs;
+               poll_eqs_timer.expires = jiffies + HZ;
+               add_timer(&poll_eqs_timer);
        }
 
-       goto module_init0;
+       return 0;
 
 module_init2:
-       ehca_destroy_slab_caches(&ehca_module);
+       ehca_destroy_slab_caches();
 
 module_init1:
        ehca_destroy_comp_pool();
-
-module_init0:
-       EDEB_EX(7, "ret=%x", ret);
-
        return ret;
 };
 
 void __exit ehca_module_exit(void)
 {
-       EDEB_EN(7, "");
-
        if (ehca_poll_all_eqs == 1)
-               del_timer_sync(&ehca_module.timer);
+               del_timer_sync(&poll_eqs_timer);
 
        ehca_remove_driver_sysfs(&ehca_driver);
        ibmebus_unregister_driver(&ehca_driver);
 
-       if (ehca_destroy_slab_caches(&ehca_module) != 0)
-               EDEB_ERR(4, "Cannot destroy SLAB caches");
+       ehca_destroy_slab_caches();
 
        ehca_destroy_comp_pool();
 
        idr_destroy(&ehca_cq_idr);
        idr_destroy(&ehca_qp_idr);
-
-       EDEB_EX(7, "");
 };
 
 module_init(ehca_module_init);
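The reordered probeN labels above follow the usual goto unwind ladder: each setup step that can fail jumps to a label that undoes only the steps already completed, in reverse order. Moving ehca_register_device() ahead of the EQ creation is why ib_unregister_device() moved from probe6 down to probe2. Below is a minimal standalone sketch of that pattern. The step names, do_step(), undo_step() and demo_probe() are hypothetical stand-ins, not part of the ehca driver. Unlike ehca_probe(), which returns -EINVAL from its error path, the sketch passes the original error code through.

```c
#include <assert.h>

/* Hypothetical setup steps; these are NOT the real ehca calls. */
enum step { STEP_ALLOC, STEP_REGISTER, STEP_EQ, STEP_NEQ, STEP_COUNT };

static int undo_log[STEP_COUNT];	/* records the order of teardown */
static int undo_count;

/* Pretend to run a setup step; fail only at the requested step. */
static int do_step(enum step s, enum step fail_at)
{
	return s == fail_at ? -1 : 0;
}

/* Pretend to undo a setup step and remember that we did. */
static void undo_step(enum step s)
{
	assert(undo_count < STEP_COUNT);
	undo_log[undo_count++] = s;
}

/* Acquire in order; on failure unwind in exactly the reverse order. */
int demo_probe(enum step fail_at)
{
	int ret;

	undo_count = 0;

	ret = do_step(STEP_ALLOC, fail_at);
	if (ret)
		return ret;		/* nothing to undo yet */

	ret = do_step(STEP_REGISTER, fail_at);
	if (ret)
		goto err_alloc;

	ret = do_step(STEP_EQ, fail_at);
	if (ret)
		goto err_register;

	ret = do_step(STEP_NEQ, fail_at);
	if (ret)
		goto err_eq;

	return 0;

err_eq:
	undo_step(STEP_EQ);
err_register:
	undo_step(STEP_REGISTER);
err_alloc:
	undo_step(STEP_ALLOC);
	return ret;			/* keep the first error code */
}
```

Calling demo_probe(STEP_NEQ) fails at the last step and records the teardown order EQ, REGISTER, ALLOC. Calling demo_probe(STEP_COUNT) never fails and records nothing.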
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_mcast.c linux-2.6/drivers/infiniband/hw/ehca/ehca_mcast.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_mcast.c      2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_mcast.c   2006-08-30 20:00:16.000000000 +0200
@@ -42,54 +42,38 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-#define DEB_PREFIX "mcas"
-
 #include <linux/module.h>
 #include <linux/err.h>
 #include "ehca_classes.h"
 #include "ehca_tools.h"
 #include "ehca_qes.h"
 #include "ehca_iverbs.h"
-
 #include "hcp_if.h"
 
 #define MAX_MC_LID 0xFFFE
 #define MIN_MC_LID 0xC000      /* Multicast limits */
 #define EHCA_VALID_MULTICAST_GID(gid)  ((gid)[0] == 0xFF)
-#define EHCA_VALID_MULTICAST_LID(lid)  (((lid) >= MIN_MC_LID) && ((lid) <= MAX_MC_LID))
+#define EHCA_VALID_MULTICAST_LID(lid) \
+       (((lid) >= MIN_MC_LID) && ((lid) <= MAX_MC_LID))
 
 int ehca_attach_mcast(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 {
-       struct ehca_qp *my_qp = NULL;
-       struct ehca_shca *shca = NULL;
+       struct ehca_qp *my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
+       struct ehca_shca *shca = container_of(ibqp->device, struct ehca_shca,
+                                             ib_device);
        union ib_gid my_gid;
-       u64 subnet_prefix;
-       u64 interface_id;
-       u64 h_ret = H_SUCCESS;
-       int ret = 0;
-
-       EHCA_CHECK_ADR(ibqp);
-       EHCA_CHECK_ADR(gid);
-
-       my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
+       u64 subnet_prefix, interface_id, h_ret;
 
-       EHCA_CHECK_QP(my_qp);
        if (ibqp->qp_type != IB_QPT_UD) {
-               EDEB_ERR(4, "invalid qp_type %x gid, ret=%x",
-                        ibqp->qp_type, EINVAL);
+               ehca_err(ibqp->device, "invalid qp_type=%x", ibqp->qp_type);
                return -EINVAL;
        }
 
-       shca = container_of(ibqp->pd->device, struct ehca_shca, ib_device);
-       EHCA_CHECK_ADR(shca);
-
        if (!(EHCA_VALID_MULTICAST_GID(gid->raw))) {
-               EDEB_ERR(4, "gid is not valid mulitcast gid ret=%x",
-                        EINVAL);
+               ehca_err(ibqp->device, "invalid multicast gid");
                return -EINVAL;
        } else if ((lid < MIN_MC_LID) || (lid > MAX_MC_LID)) {
-               EDEB_ERR(4, "lid=%x is not valid mulitcast lid ret=%x",
-                        lid, EINVAL);
+               ehca_err(ibqp->device, "invalid multicast lid=%x", lid);
                return -EINVAL;
        }
 
@@ -101,100 +85,47 @@ int ehca_attach_mcast(struct ib_qp *ibqp
                                   my_qp->ipz_qp_handle,
                                   my_qp->galpas.kernel,
                                   lid, subnet_prefix, interface_id);
-       if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4,
+       if (h_ret != H_SUCCESS)
+               ehca_err(ibqp->device,
                         "ehca_qp=%p qp_num=%x hipz_h_attach_mcqp() failed "
                         "h_ret=%lx", my_qp, ibqp->qp_num, h_ret);
-       }
-       ret = ehca2ib_return_code(h_ret);
 
-       EDEB_EX(7, "mcast attach ret=%x\n"
-                  "ehca_qp=%p qp_num=%x  lid=%x\n"
-                  "my_gid=  %x %x %x %x\n"
-                  "         %x %x %x %x\n"
-                  "         %x %x %x %x\n"
-                  "         %x %x %x %x\n",
-                  ret, my_qp, ibqp->qp_num, lid,
-                  my_gid.raw[0], my_gid.raw[1],
-                  my_gid.raw[2], my_gid.raw[3],
-                  my_gid.raw[4], my_gid.raw[5],
-                  my_gid.raw[6], my_gid.raw[7],
-                  my_gid.raw[8], my_gid.raw[9],
-                  my_gid.raw[10], my_gid.raw[11],
-                  my_gid.raw[12], my_gid.raw[13],
-                  my_gid.raw[14], my_gid.raw[15]);
-
-       return ret;
+       return ehca2ib_return_code(h_ret);
 }
 
 int ehca_detach_mcast(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 {
-       struct ehca_qp *my_qp = NULL;
-       struct ehca_shca *shca = NULL;
+       struct ehca_qp *my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
+       struct ehca_shca *shca = container_of(ibqp->pd->device,
+                                             struct ehca_shca, ib_device);
        union ib_gid my_gid;
-       u64 subnet_prefix;
-       u64 interface_id;
-       u64 h_ret = H_SUCCESS;
-       int ret = 0;
-
-       EHCA_CHECK_ADR(ibqp);
-       EHCA_CHECK_ADR(gid);
+       u64 subnet_prefix, interface_id, h_ret;
 
-       my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
-
-       EHCA_CHECK_QP(my_qp);
        if (ibqp->qp_type != IB_QPT_UD) {
-               EDEB_ERR(4, "invalid qp_type %x gid, ret=%x",
-                        ibqp->qp_type, EINVAL);
+               ehca_err(ibqp->device, "invalid qp_type %x", ibqp->qp_type);
                return -EINVAL;
        }
 
-       shca = container_of(ibqp->pd->device, struct ehca_shca, ib_device);
-       EHCA_CHECK_ADR(shca);
-
        if (!(EHCA_VALID_MULTICAST_GID(gid->raw))) {
-               EDEB_ERR(4, "gid is not valid mulitcast gid ret=%x",
-                        EINVAL);
+               ehca_err(ibqp->device, "invalid multicast gid");
                return -EINVAL;
        } else if ((lid < MIN_MC_LID) || (lid > MAX_MC_LID)) {
-               EDEB_ERR(4, "lid=%x is not valid mulitcast lid ret=%x",
-                        lid, EINVAL);
+               ehca_err(ibqp->device, "invalid multicast lid=%x", lid);
                return -EINVAL;
        }
 
-       EDEB_EN(7, "dgid=%p qp_numl=%x lid=%x",
-               gid, ibqp->qp_num, lid);
-
        memcpy(&my_gid.raw, gid->raw, sizeof(union ib_gid));
 
        subnet_prefix = be64_to_cpu(my_gid.global.subnet_prefix);
        interface_id = be64_to_cpu(my_gid.global.interface_id);
        h_ret = hipz_h_detach_mcqp(shca->ipz_hca_handle,
-                                    my_qp->ipz_qp_handle,
-                                    my_qp->galpas.kernel,
-                                    lid, subnet_prefix, interface_id);
-       if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4,
+                                  my_qp->ipz_qp_handle,
+                                  my_qp->galpas.kernel,
+                                  lid, subnet_prefix, interface_id);
+       if (h_ret != H_SUCCESS)
+               ehca_err(ibqp->device,
                         "ehca_qp=%p qp_num=%x hipz_h_detach_mcqp() failed "
                         "h_ret=%lx", my_qp, ibqp->qp_num, h_ret);
-       }
-       ret = ehca2ib_return_code(h_ret);
-
-       EDEB_EX(7, "mcast detach ret=%x\n"
-               "ehca_qp=%p qp_num=%x  lid=%x\n"
-               "my_gid=  %x %x %x %x\n"
-               "         %x %x %x %x\n"
-               "         %x %x %x %x\n"
-               "         %x %x %x %x\n",
-               ret, my_qp, ibqp->qp_num, lid,
-               my_gid.raw[0], my_gid.raw[1],
-               my_gid.raw[2], my_gid.raw[3],
-               my_gid.raw[4], my_gid.raw[5],
-               my_gid.raw[6], my_gid.raw[7],
-               my_gid.raw[8], my_gid.raw[9],
-               my_gid.raw[10], my_gid.raw[11],
-               my_gid.raw[12], my_gid.raw[13],
-               my_gid.raw[14], my_gid.raw[15]);
 
-       return ret;
+       return ehca2ib_return_code(h_ret);
 }
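Both ehca_attach_mcast() and ehca_detach_mcast() above reject a request unless the first GID byte is 0xFF and the LID lies between MIN_MC_LID and MAX_MC_LID inclusive. The standalone sketch below copies the four macros from the patch so the boundary cases can be tried outside the kernel. The helper mcast_args_valid() is a hypothetical name used only for illustration; the driver itself spells the LID comparison out inline rather than calling such a function.

```c
#include <assert.h>
#include <stdint.h>

/* Macros as they appear in ehca_mcast.c after this patch. */
#define MAX_MC_LID 0xFFFE
#define MIN_MC_LID 0xC000      /* Multicast limits */
#define EHCA_VALID_MULTICAST_GID(gid)  ((gid)[0] == 0xFF)
#define EHCA_VALID_MULTICAST_LID(lid) \
	(((lid) >= MIN_MC_LID) && ((lid) <= MAX_MC_LID))

/*
 * Hypothetical helper, not part of the driver: returns 1 when both the
 * raw 16-byte GID and the LID pass the multicast checks, 0 otherwise.
 */
int mcast_args_valid(const uint8_t gid_raw[16], uint16_t lid)
{
	assert(gid_raw != 0);

	return EHCA_VALID_MULTICAST_GID(gid_raw) &&
	       EHCA_VALID_MULTICAST_LID(lid);
}
```

With a GID whose first byte is 0xFF, LIDs 0xC000 and 0xFFFE are accepted, while 0xBFFF and 0xFFFF are rejected. Any GID whose first byte is not 0xFF is rejected regardless of the LID.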
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_mrmw.c linux-2.6/drivers/infiniband/hw/ehca/ehca_mrmw.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_mrmw.c       2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_mrmw.c    2006-08-30 20:00:16.000000000 +0200
@@ -39,9 +39,6 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-#undef DEB_PREFIX
-#define DEB_PREFIX "mrmw"
-
 #include <asm/current.h>
 
 #include "ehca_iverbs.h"
@@ -49,78 +46,62 @@
 #include "hcp_if.h"
 #include "hipz_hw.h"
 
-extern int ehca_use_hp_mr;
+static struct kmem_cache *mr_cache;
+static struct kmem_cache *mw_cache;
 
 static struct ehca_mr *ehca_mr_new(void)
 {
-       extern struct ehca_module ehca_module;
        struct ehca_mr *me;
 
-       me = kmem_cache_alloc(ehca_module.cache_mr, SLAB_KERNEL);
+       me = kmem_cache_alloc(mr_cache, SLAB_KERNEL);
        if (me) {
                memset(me, 0, sizeof(struct ehca_mr));
                spin_lock_init(&me->mrlock);
-               EDEB_EX(7, "ehca_mr=%p sizeof(ehca_mr_t)=%x", me,
-                       (u32) sizeof(struct ehca_mr));
-       } else {
-               EDEB_ERR(3, "alloc failed");
-       }
+       } else
+               ehca_gen_err("alloc failed");
 
        return me;
 }
 
 static void ehca_mr_delete(struct ehca_mr *me)
 {
-       extern struct ehca_module ehca_module;
-
-       kmem_cache_free(ehca_module.cache_mr, me);
+       kmem_cache_free(mr_cache, me);
 }
 
 static struct ehca_mw *ehca_mw_new(void)
 {
-       extern struct ehca_module ehca_module;
        struct ehca_mw *me;
 
-       me = kmem_cache_alloc(ehca_module.cache_mw, SLAB_KERNEL);
+       me = kmem_cache_alloc(mw_cache, SLAB_KERNEL);
        if (me) {
                memset(me, 0, sizeof(struct ehca_mw));
                spin_lock_init(&me->mwlock);
-               EDEB_EX(7, "ehca_mw=%p sizeof(ehca_mw_t)=%x", me,
-                       (u32) sizeof(struct ehca_mw));
-       } else {
-               EDEB_ERR(3, "alloc failed");
-       }
+       } else
+               ehca_gen_err("alloc failed");
 
        return me;
 }
 
 static void ehca_mw_delete(struct ehca_mw *me)
 {
-       extern struct ehca_module ehca_module;
-
-       kmem_cache_free(ehca_module.cache_mw, me);
+       kmem_cache_free(mw_cache, me);
 }
 
 /*----------------------------------------------------------------------*/
 
 struct ib_mr *ehca_get_dma_mr(struct ib_pd *pd, int mr_access_flags)
 {
-       struct ib_mr *ib_mr = NULL;
-       int ret = 0;
-       struct ehca_mr *e_maxmr = NULL;
-       struct ehca_pd *e_pd = NULL;
-       struct ehca_shca *shca = NULL;
-
-       EDEB_EN(7, "pd=%p mr_access_flags=%x", pd, mr_access_flags);
-
-       EHCA_CHECK_PD_P(pd);
-       e_pd = container_of(pd, struct ehca_pd, ib_pd);
-       shca = container_of(pd->device, struct ehca_shca, ib_device);
+       struct ib_mr *ib_mr;
+       int ret;
+       struct ehca_mr *e_maxmr;
+       struct ehca_pd *e_pd = container_of(pd, struct ehca_pd, ib_pd);
+       struct ehca_shca *shca =
+               container_of(pd->device, struct ehca_shca, ib_device);
 
        if (shca->maxmr) {
                e_maxmr = ehca_mr_new();
                if (!e_maxmr) {
-                       EDEB_ERR(4, "out of memory");
+                       ehca_err(&shca->ib_device, "out of memory");
                        ib_mr = ERR_PTR(-ENOMEM);
                        goto get_dma_mr_exit0;
                }
@@ -135,18 +116,15 @@ struct ib_mr *ehca_get_dma_mr(struct ib_
                }
                ib_mr = &e_maxmr->ib.ib_mr;
        } else {
-               EDEB_ERR(4, "no internal max-MR exist!");
+               ehca_err(&shca->ib_device, "no internal max-MR exist!");
                ib_mr = ERR_PTR(-EINVAL);
                goto get_dma_mr_exit0;
        }
 
 get_dma_mr_exit0:
        if (IS_ERR(ib_mr))
-               EDEB_EX(4, "rc=%lx pd=%p mr_access_flags=%x ",
-                       PTR_ERR(ib_mr), pd, mr_access_flags);
-       else
-               EDEB_EX(7, "ib_mr=%p lkey=%x rkey=%x",
-                       ib_mr, ib_mr->lkey, ib_mr->rkey);
+               ehca_err(&shca->ib_device, "rc=%lx pd=%p mr_access_flags=%x ",
+                        PTR_ERR(ib_mr), pd, mr_access_flags);
        return ib_mr;
 } /* end ehca_get_dma_mr() */
 
@@ -158,23 +136,20 @@ struct ib_mr *ehca_reg_phys_mr(struct ib
                               int mr_access_flags,
                               u64 *iova_start)
 {
-       struct ib_mr *ib_mr = NULL;
-       int ret = 0;
-       struct ehca_mr *e_mr = NULL;
-       struct ehca_shca *shca = NULL;
-       struct ehca_pd *e_pd = NULL;
-       u64 size = 0;
+       struct ib_mr *ib_mr;
+       int ret;
+       struct ehca_mr *e_mr;
+       struct ehca_shca *shca =
+               container_of(pd->device, struct ehca_shca, ib_device);
+       struct ehca_pd *e_pd = container_of(pd, struct ehca_pd, ib_pd);
+
+       u64 size;
        struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0};
-       u32 num_pages_mr = 0;
-       u32 num_pages_4k = 0; /* 4k portion "pages" */
+       u32 num_pages_mr;
+       u32 num_pages_4k; /* 4k portion "pages" */
 
-       EDEB_EN(7, "pd=%p phys_buf_array=%p num_phys_buf=%x "
-               "mr_access_flags=%x iova_start=%p", pd, phys_buf_array,
-               num_phys_buf, mr_access_flags, iova_start);
-
-       EHCA_CHECK_PD_P(pd);
-       if ((num_phys_buf <= 0) || ehca_adr_bad(phys_buf_array)) {
-               EDEB_ERR(4, "bad input values: num_phys_buf=%x "
+       if ((num_phys_buf <= 0) || !phys_buf_array) {
+               ehca_err(pd->device, "bad input values: num_phys_buf=%x "
                         "phys_buf_array=%p", num_phys_buf, phys_buf_array);
                ib_mr = ERR_PTR(-EINVAL);
                goto reg_phys_mr_exit0;
@@ -187,7 +162,7 @@ struct ib_mr *ehca_reg_phys_mr(struct ib
                 * Remote Write Access requires Local Write Access
                 * Remote Atomic Access requires Local Write Access
                 */
-               EDEB_ERR(4, "bad input values: mr_access_flags=%x",
+               ehca_err(pd->device, "bad input values: mr_access_flags=%x",
                         mr_access_flags);
                ib_mr = ERR_PTR(-EINVAL);
                goto reg_phys_mr_exit0;
@@ -202,18 +177,15 @@ struct ib_mr *ehca_reg_phys_mr(struct ib
        }
        if ((size == 0) ||
            (((u64)iova_start + size) < (u64)iova_start)) {
-               EDEB_ERR(4, "bad input values: size=%lx iova_start=%p",
+               ehca_err(pd->device, "bad input values: size=%lx iova_start=%p",
                         size, iova_start);
                ib_mr = ERR_PTR(-EINVAL);
                goto reg_phys_mr_exit0;
        }
 
-       e_pd = container_of(pd, struct ehca_pd, ib_pd);
-       shca = container_of(pd->device, struct ehca_shca, ib_device);
-
        e_mr = ehca_mr_new();
        if (!e_mr) {
-               EDEB_ERR(4, "out of memory");
+               ehca_err(pd->device, "out of memory");
                ib_mr = ERR_PTR(-ENOMEM);
                goto reg_phys_mr_exit0;
        }
@@ -253,20 +225,16 @@ struct ib_mr *ehca_reg_phys_mr(struct ib
        }
 
        /* successful registration of all pages */
-       ib_mr = &e_mr->ib.ib_mr;
-       goto reg_phys_mr_exit0;
+       return &e_mr->ib.ib_mr;
 
 reg_phys_mr_exit1:
        ehca_mr_delete(e_mr);
 reg_phys_mr_exit0:
        if (IS_ERR(ib_mr))
-               EDEB_EX(4, "rc=%lx pd=%p phys_buf_array=%p "
-                       "num_phys_buf=%x mr_access_flags=%x iova_start=%p",
-                       PTR_ERR(ib_mr), pd, phys_buf_array,
-                       num_phys_buf, mr_access_flags, iova_start);
-       else
-               EDEB_EX(7, "ib_mr=%p lkey=%x rkey=%x",
-                       ib_mr, ib_mr->lkey, ib_mr->rkey);
+               ehca_err(pd->device, "rc=%lx pd=%p phys_buf_array=%p "
+                        "num_phys_buf=%x mr_access_flags=%x iova_start=%p",
+                        PTR_ERR(ib_mr), pd, phys_buf_array,
+                        num_phys_buf, mr_access_flags, iova_start);
        return ib_mr;
 } /* end ehca_reg_phys_mr() */
 
@@ -277,21 +245,22 @@ struct ib_mr *ehca_reg_user_mr(struct ib
                               int mr_access_flags,
                               struct ib_udata *udata)
 {
-       struct ib_mr *ib_mr = NULL;
-       struct ehca_mr *e_mr = NULL;
-       struct ehca_shca *shca = NULL;
-       struct ehca_pd *e_pd = NULL;
+       struct ib_mr *ib_mr;
+       struct ehca_mr *e_mr;
+       struct ehca_shca *shca =
+               container_of(pd->device, struct ehca_shca, ib_device);
+       struct ehca_pd *e_pd = container_of(pd, struct ehca_pd, ib_pd);
        struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0};
-       int ret = 0;
-       u32 num_pages_mr = 0;
-       u32 num_pages_4k = 0; /* 4k portion "pages" */
-
-       EDEB_EN(7, "pd=%p region=%p mr_access_flags=%x udata=%p",
-               pd, region, mr_access_flags, udata);
-
-       EHCA_CHECK_PD_P(pd);
-       if (ehca_adr_bad(region)) {
-               EDEB_ERR(4, "bad input values: region=%p", region);
+       int ret;
+       u32 num_pages_mr;
+       u32 num_pages_4k; /* 4k portion "pages" */
+
+       if (!pd) {
+               ehca_gen_err("bad pd=%p", pd);
+               return ERR_PTR(-EFAULT);
+       }
+       if (!region) {
+               ehca_err(pd->device, "bad input values: region=%p", region);
                ib_mr = ERR_PTR(-EINVAL);
                goto reg_user_mr_exit0;
        }
@@ -303,36 +272,29 @@ struct ib_mr *ehca_reg_user_mr(struct ib
                 * Remote Write Access requires Local Write Access
                 * Remote Atomic Access requires Local Write Access
                 */
-               EDEB_ERR(4, "bad input values: mr_access_flags=%x",
+               ehca_err(pd->device, "bad input values: mr_access_flags=%x",
                         mr_access_flags);
                ib_mr = ERR_PTR(-EINVAL);
                goto reg_user_mr_exit0;
        }
-       EDEB(7, "user_base=%lx virt_base=%lx length=%lx offset=%x page_size=%x "
-            "chunk_list.next=%p",
-            region->user_base, region->virt_base, region->length,
-            region->offset, region->page_size, region->chunk_list.next);
        if (region->page_size != PAGE_SIZE) {
-               EDEB_ERR(4, "page size not supported, region->page_size=%x",
-                        region->page_size);
+               ehca_err(pd->device, "page size not supported, "
+                        "region->page_size=%x", region->page_size);
                ib_mr = ERR_PTR(-EINVAL);
                goto reg_user_mr_exit0;
        }
 
        if ((region->length == 0) ||
            ((region->virt_base + region->length) < region->virt_base)) {
-               EDEB_ERR(4, "bad input values: length=%lx virt_base=%lx",
-                        region->length, region->virt_base);
+               ehca_err(pd->device, "bad input values: length=%lx "
+                        "virt_base=%lx", region->length, region->virt_base);
                ib_mr = ERR_PTR(-EINVAL);
                goto reg_user_mr_exit0;
        }
 
-       e_pd = container_of(pd, struct ehca_pd, ib_pd);
-       shca = container_of(pd->device, struct ehca_shca, ib_device);
-
        e_mr = ehca_mr_new();
        if (!e_mr) {
-               EDEB_ERR(4, "out of memory");
+               ehca_err(pd->device, "out of memory");
                ib_mr = ERR_PTR(-ENOMEM);
                goto reg_user_mr_exit0;
        }
@@ -362,19 +324,15 @@ struct ib_mr *ehca_reg_user_mr(struct ib
        }
 
        /* successful registration of all pages */
-       ib_mr = &e_mr->ib.ib_mr;
-       goto reg_user_mr_exit0;
+       return &e_mr->ib.ib_mr;
 
 reg_user_mr_exit1:
        ehca_mr_delete(e_mr);
 reg_user_mr_exit0:
        if (IS_ERR(ib_mr))
-               EDEB_EX(4, "rc=%lx pd=%p region=%p mr_access_flags=%x "
-                       "udata=%p",
-                       PTR_ERR(ib_mr), pd, region, mr_access_flags, udata);
-       else
-               EDEB_EX(7, "ib_mr=%p lkey=%x rkey=%x",
-                       ib_mr, ib_mr->lkey, ib_mr->rkey);
+               ehca_err(pd->device, "rc=%lx pd=%p region=%p mr_access_flags=%x"
+                        " udata=%p",
+                        PTR_ERR(ib_mr), pd, region, mr_access_flags, udata);
        return ib_mr;
 } /* end ehca_reg_user_mr() */
 
@@ -388,32 +346,26 @@ int ehca_rereg_phys_mr(struct ib_mr *mr,
                       int mr_access_flags,
                       u64 *iova_start)
 {
-       int ret = 0;
-       struct ehca_shca *shca = NULL;
-       struct ehca_mr *e_mr = NULL;
-       u64 new_size = 0;
-       u64 *new_start = NULL;
-       u32 new_acl = 0;
-       struct ehca_pd *new_pd = NULL;
-       u32 tmp_lkey = 0;
-       u32 tmp_rkey = 0;
+       int ret;
+
+       struct ehca_shca *shca =
+               container_of(mr->device, struct ehca_shca, ib_device);
+       struct ehca_mr *e_mr = container_of(mr, struct ehca_mr, ib.ib_mr);
+       struct ehca_pd *my_pd = container_of(mr->pd, struct ehca_pd, ib_pd);
+       u64 new_size;
+       u64 *new_start;
+       u32 new_acl;
+       struct ehca_pd *new_pd;
+       u32 tmp_lkey, tmp_rkey;
        unsigned long sl_flags;
        u32 num_pages_mr = 0;
        u32 num_pages_4k = 0; /* 4k portion "pages" */
        struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0};
-       struct ehca_pd *my_pd = NULL;
        u32 cur_pid = current->tgid;
 
-       EDEB_EN(7, "mr=%p mr_rereg_mask=%x pd=%p phys_buf_array=%p "
-               "num_phys_buf=%x mr_access_flags=%x iova_start=%p",
-               mr, mr_rereg_mask, pd, phys_buf_array, num_phys_buf,
-               mr_access_flags, iova_start);
-
-       EHCA_CHECK_MR(mr);
-       my_pd = container_of(mr->pd, struct ehca_pd, ib_pd);
        if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context &&
            (my_pd->ownpid != cur_pid)) {
-               EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+               ehca_err(mr->device, "Invalid caller pid=%x ownpid=%x",
                         cur_pid, my_pd->ownpid);
                ret = -EINVAL;
                goto rereg_phys_mr_exit0;
@@ -421,15 +373,19 @@ int ehca_rereg_phys_mr(struct ib_mr *mr,
 
        if (!(mr_rereg_mask & IB_MR_REREG_TRANS)) {
                /* TODO not supported, because PHYP rereg hCall needs pages */
-               EDEB_ERR(4, "rereg without IB_MR_REREG_TRANS not supported yet,"
-                        " mr_rereg_mask=%x", mr_rereg_mask);
+               ehca_err(mr->device, "rereg without IB_MR_REREG_TRANS not "
+                        "supported yet, mr_rereg_mask=%x", mr_rereg_mask);
                ret = -EINVAL;
                goto rereg_phys_mr_exit0;
        }
 
-       e_mr = container_of(mr, struct ehca_mr, ib.ib_mr);
        if (mr_rereg_mask & IB_MR_REREG_PD) {
-               EHCA_CHECK_PD(pd);
+               if (!pd) {
+                       ehca_err(mr->device, "rereg with bad pd, pd=%p "
+                                "mr_rereg_mask=%x", pd, mr_rereg_mask);
+                       ret = -EINVAL;
+                       goto rereg_phys_mr_exit0;
+               }
        }
 
        if ((mr_rereg_mask &
@@ -439,12 +395,10 @@ int ehca_rereg_phys_mr(struct ib_mr *mr,
                goto rereg_phys_mr_exit0;
        }
 
-       shca = container_of(mr->device, struct ehca_shca, ib_device);
-
        /* check other parameters */
        if (e_mr == shca->maxmr) {
                /* should be impossible, however reject to be sure */
-               EDEB_ERR(3, "rereg internal max-MR impossible, mr=%p "
+               ehca_err(mr->device, "rereg internal max-MR impossible, mr=%p "
                         "shca->maxmr=%p mr->lkey=%x",
                         mr, shca->maxmr, mr->lkey);
                ret = -EINVAL;
@@ -452,14 +406,14 @@ int ehca_rereg_phys_mr(struct ib_mr *mr,
        }
        if (mr_rereg_mask & IB_MR_REREG_TRANS) { /* transl., i.e. addr/size */
                if (e_mr->flags & EHCA_MR_FLAG_FMR) {
-                       EDEB_ERR(4, "not supported for FMR, mr=%p flags=%x",
-                                mr, e_mr->flags);
+                       ehca_err(mr->device, "not supported for FMR, mr=%p "
+                                "flags=%x", mr, e_mr->flags);
                        ret = -EINVAL;
                        goto rereg_phys_mr_exit0;
                }
-               if (ehca_adr_bad(phys_buf_array) || num_phys_buf <= 0) {
-                       EDEB_ERR(4, "bad input values: mr_rereg_mask=%x "
-                                "phys_buf_array=%p num_phys_buf=%x",
+               if (!phys_buf_array || num_phys_buf <= 0) {
+                       ehca_err(mr->device, "bad input values: mr_rereg_mask=%x"
+                                " phys_buf_array=%p num_phys_buf=%x",
                                 mr_rereg_mask, phys_buf_array, num_phys_buf);
                        ret = -EINVAL;
                        goto rereg_phys_mr_exit0;
@@ -474,7 +428,7 @@ int ehca_rereg_phys_mr(struct ib_mr *mr,
                 * Remote Write Access requires Local Write Access
                 * Remote Atomic Access requires Local Write Access
                 */
-               EDEB_ERR(4, "bad input values: mr_rereg_mask=%x "
+               ehca_err(mr->device, "bad input values: mr_rereg_mask=%x "
                         "mr_access_flags=%x", mr_rereg_mask, mr_access_flags);
                ret = -EINVAL;
                goto rereg_phys_mr_exit0;
@@ -497,7 +451,7 @@ int ehca_rereg_phys_mr(struct ib_mr *mr,
                        goto rereg_phys_mr_exit1;
                if ((new_size == 0) ||
                    (((u64)iova_start + new_size) < (u64)iova_start)) {
-                       EDEB_ERR(4, "bad input values: new_size=%lx "
+                       ehca_err(mr->device, "bad input values: new_size=%lx "
                                 "iova_start=%p", new_size, iova_start);
                        ret = -EINVAL;
                        goto rereg_phys_mr_exit1;
@@ -519,10 +473,6 @@ int ehca_rereg_phys_mr(struct ib_mr *mr,
        if (mr_rereg_mask & IB_MR_REREG_PD)
                new_pd = container_of(pd, struct ehca_pd, ib_pd);
 
-       EDEB(7, "mr=%p new_start=%p new_size=%lx new_acl=%x new_pd=%p "
-            "num_pages_mr=%x num_pages_4k=%x", e_mr, new_start, new_size,
-            new_acl, new_pd, num_pages_mr, num_pages_4k);
-
        ret = ehca_rereg_mr(shca, e_mr, new_start, new_size, new_acl,
                            new_pd, &pginfo, &tmp_lkey, &tmp_rkey);
        if (ret)
@@ -538,17 +488,11 @@ rereg_phys_mr_exit1:
        spin_unlock_irqrestore(&e_mr->mrlock, sl_flags);
 rereg_phys_mr_exit0:
        if (ret)
-               EDEB_EX(4, "ret=%x mr=%p mr_rereg_mask=%x pd=%p "
-                       "phys_buf_array=%p num_phys_buf=%x mr_access_flags=%x "
-                       "iova_start=%p",
-                       ret, mr, mr_rereg_mask, pd, phys_buf_array,
-                       num_phys_buf, mr_access_flags, iova_start);
-       else
-               EDEB_EX(7, "mr=%p mr_rereg_mask=%x pd=%p phys_buf_array=%p "
-                       "num_phys_buf=%x mr_access_flags=%x iova_start=%p",
-                       mr, mr_rereg_mask, pd, phys_buf_array, num_phys_buf,
-                       mr_access_flags, iova_start);
-
+               ehca_err(mr->device, "ret=%x mr=%p mr_rereg_mask=%x pd=%p "
+                        "phys_buf_array=%p num_phys_buf=%x mr_access_flags=%x "
+                        "iova_start=%p",
+                        ret, mr, mr_rereg_mask, pd, phys_buf_array,
+                        num_phys_buf, mr_access_flags, iova_start);
        return ret;
 } /* end ehca_rereg_phys_mr() */
 
@@ -557,47 +501,36 @@ rereg_phys_mr_exit0:
 int ehca_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr)
 {
        int ret = 0;
-       u64 h_ret = H_SUCCESS;
-       struct ehca_shca *shca = NULL;
-       struct ehca_mr *e_mr = NULL;
-       struct ehca_pd *my_pd = NULL;
+       u64 h_ret;
+       struct ehca_shca *shca =
+               container_of(mr->device, struct ehca_shca, ib_device);
+       struct ehca_mr *e_mr = container_of(mr, struct ehca_mr, ib.ib_mr);
+       struct ehca_pd *my_pd = container_of(mr->pd, struct ehca_pd, ib_pd);
        u32 cur_pid = current->tgid;
        unsigned long sl_flags;
        struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0};
 
-       EDEB_EN(7, "mr=%p mr_attr=%p", mr, mr_attr);
-
-       EHCA_CHECK_MR(mr);
-
-       my_pd = container_of(mr->pd, struct ehca_pd, ib_pd);
        if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context &&
            (my_pd->ownpid != cur_pid)) {
-               EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+               ehca_err(mr->device, "Invalid caller pid=%x ownpid=%x",
                         cur_pid, my_pd->ownpid);
                ret = -EINVAL;
                goto query_mr_exit0;
        }
 
-       e_mr = container_of(mr, struct ehca_mr, ib.ib_mr);
-       if (ehca_adr_bad(mr_attr)) {
-               EDEB_ERR(4, "bad input values: mr_attr=%p", mr_attr);
-               ret = -EINVAL;
-               goto query_mr_exit0;
-       }
        if ((e_mr->flags & EHCA_MR_FLAG_FMR)) {
-               EDEB_ERR(4, "not supported for FMR, mr=%p e_mr=%p "
+               ehca_err(mr->device, "not supported for FMR, mr=%p e_mr=%p "
                         "e_mr->flags=%x", mr, e_mr, e_mr->flags);
                ret = -EINVAL;
                goto query_mr_exit0;
        }
 
-       shca = container_of(mr->device, struct ehca_shca, ib_device);
        memset(mr_attr, 0, sizeof(struct ib_mr_attr));
        spin_lock_irqsave(&e_mr->mrlock, sl_flags);
 
        h_ret = hipz_h_query_mr(shca->ipz_hca_handle, e_mr, &hipzout);
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4, "hipz_mr_query failed, h_ret=%lx mr=%p "
+               ehca_err(mr->device, "hipz_mr_query failed, h_ret=%lx mr=%p "
                         "hca_hndl=%lx mr_hndl=%lx lkey=%x",
                         h_ret, mr, shca->ipz_hca_handle.handle,
                         e_mr->ipz_mr_handle.handle, mr->lkey);
@@ -615,13 +548,8 @@ query_mr_exit1:
        spin_unlock_irqrestore(&e_mr->mrlock, sl_flags);
 query_mr_exit0:
        if (ret)
-               EDEB_EX(4, "ret=%x mr=%p mr_attr=%p", ret, mr, mr_attr);
-       else
-               EDEB_EX(7, "pd=%p device_virt_addr=%lx size=%lx "
-                       "mr_access_flags=%x lkey=%x rkey=%x",
-                       mr_attr->pd, mr_attr->device_virt_addr,
-                       mr_attr->size, mr_attr->mr_access_flags,
-                       mr_attr->lkey, mr_attr->rkey);
+               ehca_err(mr->device, "ret=%x mr=%p mr_attr=%p",
+                        ret, mr, mr_attr);
        return ret;
 } /* end ehca_query_mr() */
 
@@ -630,35 +558,29 @@ query_mr_exit0:
 int ehca_dereg_mr(struct ib_mr *mr)
 {
        int ret = 0;
-       u64 h_ret = H_SUCCESS;
-       struct ehca_shca *shca = NULL;
-       struct ehca_mr *e_mr = NULL;
-       struct ehca_pd *my_pd = NULL;
+       u64 h_ret;
+       struct ehca_shca *shca =
+               container_of(mr->device, struct ehca_shca, ib_device);
+       struct ehca_mr *e_mr = container_of(mr, struct ehca_mr, ib.ib_mr);
+       struct ehca_pd *my_pd = container_of(mr->pd, struct ehca_pd, ib_pd);
        u32 cur_pid = current->tgid;
 
-       EDEB_EN(7, "mr=%p", mr);
-
-       EHCA_CHECK_MR(mr);
-       my_pd = container_of(mr->pd, struct ehca_pd, ib_pd);
        if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context &&
            (my_pd->ownpid != cur_pid)) {
-               EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+               ehca_err(mr->device, "Invalid caller pid=%x ownpid=%x",
                         cur_pid, my_pd->ownpid);
                ret = -EINVAL;
                goto dereg_mr_exit0;
        }
 
-       e_mr = container_of(mr, struct ehca_mr, ib.ib_mr);
-       shca = container_of(mr->device, struct ehca_shca, ib_device);
-
        if ((e_mr->flags & EHCA_MR_FLAG_FMR)) {
-               EDEB_ERR(4, "not supported for FMR, mr=%p e_mr=%p "
+               ehca_err(mr->device, "not supported for FMR, mr=%p e_mr=%p "
                         "e_mr->flags=%x", mr, e_mr, e_mr->flags);
                ret = -EINVAL;
                goto dereg_mr_exit0;
        } else if (e_mr == shca->maxmr) {
                /* should be impossible, however reject to be sure */
-               EDEB_ERR(3, "dereg internal max-MR impossible, mr=%p "
+               ehca_err(mr->device, "dereg internal max-MR impossible, mr=%p "
                         "shca->maxmr=%p mr->lkey=%x",
                         mr, shca->maxmr, mr->lkey);
                ret = -EINVAL;
@@ -668,8 +590,8 @@ int ehca_dereg_mr(struct ib_mr *mr)
        /* TODO: BUSY: MR still has bound window(s) */
        h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_mr);
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4, "hipz_free_mr failed, h_ret=%lx shca=%p e_mr=%p"
-                        " hca_hndl=%lx mr_hndl=%lx mr->lkey=%x",
+               ehca_err(mr->device, "hipz_free_mr failed, h_ret=%lx shca=%p "
+                        "e_mr=%p hca_hndl=%lx mr_hndl=%lx mr->lkey=%x",
                         h_ret, shca, e_mr, shca->ipz_hca_handle.handle,
                         e_mr->ipz_mr_handle.handle, mr->lkey);
                ret = ehca_mrmw_map_hrc_free_mr(h_ret);
@@ -681,9 +603,7 @@ int ehca_dereg_mr(struct ib_mr *mr)
 
 dereg_mr_exit0:
        if (ret)
-               EDEB_EX(4, "ret=%x mr=%p", ret, mr);
-       else
-               EDEB_EX(7, "");
+               ehca_err(mr->device, "ret=%x mr=%p", ret, mr);
        return ret;
 } /* end ehca_dereg_mr() */
 
@@ -691,19 +611,14 @@ dereg_mr_exit0:
 
 struct ib_mw *ehca_alloc_mw(struct ib_pd *pd)
 {
-       struct ib_mw *ib_mw = NULL;
-       u64 h_ret = H_SUCCESS;
-       struct ehca_shca *shca = NULL;
-       struct ehca_mw *e_mw = NULL;
-       struct ehca_pd *e_pd = NULL;
+       struct ib_mw *ib_mw;
+       u64 h_ret;
+       struct ehca_mw *e_mw;
+       struct ehca_pd *e_pd = container_of(pd, struct ehca_pd, ib_pd);
+       struct ehca_shca *shca =
+               container_of(pd->device, struct ehca_shca, ib_device);
        struct ehca_mw_hipzout_parms hipzout = {{0},0};
 
-       EDEB_EN(7, "pd=%p", pd);
-
-       EHCA_CHECK_PD_P(pd);
-       e_pd = container_of(pd, struct ehca_pd, ib_pd);
-       shca = container_of(pd->device, struct ehca_shca, ib_device);
-
        e_mw = ehca_mw_new();
        if (!e_mw) {
                ib_mw = ERR_PTR(-ENOMEM);
@@ -713,25 +628,22 @@ struct ib_mw *ehca_alloc_mw(struct ib_pd
        h_ret = hipz_h_alloc_resource_mw(shca->ipz_hca_handle, e_mw,
                                         e_pd->fw_pd, &hipzout);
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4, "hipz_mw_allocate failed, h_ret=%lx shca=%p "
-                        "hca_hndl=%lx mw=%p", h_ret, shca,
-                        shca->ipz_hca_handle.handle, e_mw);
+               ehca_err(pd->device, "hipz_mw_allocate failed, h_ret=%lx "
+                        "shca=%p hca_hndl=%lx mw=%p",
+                        h_ret, shca, shca->ipz_hca_handle.handle, e_mw);
                ib_mw = ERR_PTR(ehca_mrmw_map_hrc_alloc(h_ret));
                goto alloc_mw_exit1;
        }
        /* successful MW allocation */
        e_mw->ipz_mw_handle = hipzout.handle;
        e_mw->ib_mw.rkey    = hipzout.rkey;
-       ib_mw = &e_mw->ib_mw;
-       goto alloc_mw_exit0;
+       return &e_mw->ib_mw;
 
 alloc_mw_exit1:
        ehca_mw_delete(e_mw);
 alloc_mw_exit0:
        if (IS_ERR(ib_mw))
-               EDEB_EX(4, "rc=%lx pd=%p", PTR_ERR(ib_mw), pd);
-       else
-               EDEB_EX(7, "ib_mw=%p rkey=%x", ib_mw, ib_mw->rkey);
+               ehca_err(pd->device, "rc=%lx pd=%p", PTR_ERR(ib_mw), pd);
        return ib_mw;
 } /* end ehca_alloc_mw() */
 
@@ -741,55 +653,32 @@ int ehca_bind_mw(struct ib_qp *qp,
                 struct ib_mw *mw,
                 struct ib_mw_bind *mw_bind)
 {
-       int ret = 0;
-
        /* TODO: not supported up to now */
-       EDEB_ERR(4, "bind MW currently not supported by HCAD");
-       ret = -EPERM;
-       goto bind_mw_exit0;
+       ehca_gen_err("bind MW currently not supported by HCAD");
 
-bind_mw_exit0:
-       if (ret)
-               EDEB_EX(4, "ret=%x qp=%p mw=%p mw_bind=%p",
-                       ret, qp, mw, mw_bind);
-       else
-               EDEB_EX(7, "qp=%p mw=%p mw_bind=%p", qp, mw, mw_bind);
-       return ret;
+       return -EPERM;
 } /* end ehca_bind_mw() */
 
 /*----------------------------------------------------------------------*/
 
 int ehca_dealloc_mw(struct ib_mw *mw)
 {
-       int ret = 0;
-       u64 h_ret = H_SUCCESS;
-       struct ehca_shca *shca = NULL;
-       struct ehca_mw *e_mw = NULL;
-
-       EDEB_EN(7, "mw=%p", mw);
-
-       EHCA_CHECK_MW(mw);
-       e_mw = container_of(mw, struct ehca_mw, ib_mw);
-       shca = container_of(mw->device, struct ehca_shca, ib_device);
+       u64 h_ret;
+       struct ehca_shca *shca =
+               container_of(mw->device, struct ehca_shca, ib_device);
+       struct ehca_mw *e_mw = container_of(mw, struct ehca_mw, ib_mw);
 
        h_ret = hipz_h_free_resource_mw(shca->ipz_hca_handle, e_mw);
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4, "hipz_free_mw failed, h_ret=%lx shca=%p mw=%p "
-                        "rkey=%x hca_hndl=%lx mw_hndl=%lx",
+               ehca_err(mw->device, "hipz_free_mw failed, h_ret=%lx shca=%p "
+                        "mw=%p rkey=%x hca_hndl=%lx mw_hndl=%lx",
                         h_ret, shca, mw, mw->rkey, shca->ipz_hca_handle.handle,
                         e_mw->ipz_mw_handle.handle);
-               ret = ehca_mrmw_map_hrc_free_mw(h_ret);
-               goto dealloc_mw_exit0;
+               return ehca_mrmw_map_hrc_free_mw(h_ret);
        }
        /* successful deallocation */
        ehca_mw_delete(e_mw);
-
-dealloc_mw_exit0:
-       if (ret)
-               EDEB_EX(4, "ret=%x mw=%p", ret, mw);
-       else
-               EDEB_EX(7, "");
-       return ret;
+       return 0;
 } /* end ehca_dealloc_mw() */
 
 /*----------------------------------------------------------------------*/
@@ -798,28 +687,15 @@ struct ib_fmr *ehca_alloc_fmr(struct ib_
                              int mr_access_flags,
                              struct ib_fmr_attr *fmr_attr)
 {
-       struct ib_fmr *ib_fmr = NULL;
-       struct ehca_shca *shca = NULL;
-       struct ehca_mr *e_fmr = NULL;
-       int ret = 0;
-       struct ehca_pd *e_pd = NULL;
-       u32 tmp_lkey = 0;
-       u32 tmp_rkey = 0;
+       struct ib_fmr *ib_fmr;
+       struct ehca_shca *shca =
+               container_of(pd->device, struct ehca_shca, ib_device);
+       struct ehca_pd *e_pd = container_of(pd, struct ehca_pd, ib_pd);
+       struct ehca_mr *e_fmr;
+       int ret;
+       u32 tmp_lkey, tmp_rkey;
        struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0};
 
-       EDEB_EN(7, "pd=%p mr_access_flags=%x fmr_attr=%p",
-               pd, mr_access_flags, fmr_attr);
-
-       EHCA_CHECK_PD_P(pd);
-       if (ehca_adr_bad(fmr_attr)) {
-               EDEB_ERR(4, "bad input values: fmr_attr=%p", fmr_attr);
-               ib_fmr = ERR_PTR(-EINVAL);
-               goto alloc_fmr_exit0;
-       }
-
-       EDEB(7, "max_pages=%x max_maps=%x page_shift=%x",
-            fmr_attr->max_pages, fmr_attr->max_maps, fmr_attr->page_shift);
-
        /* check other parameters */
        if (((mr_access_flags & IB_ACCESS_REMOTE_WRITE) &&
             !(mr_access_flags & IB_ACCESS_LOCAL_WRITE)) ||
@@ -829,19 +705,19 @@ struct ib_fmr *ehca_alloc_fmr(struct ib_
                 * Remote Write Access requires Local Write Access
                 * Remote Atomic Access requires Local Write Access
                 */
-               EDEB_ERR(4, "bad input values: mr_access_flags=%x",
+               ehca_err(pd->device, "bad input values: mr_access_flags=%x",
                         mr_access_flags);
                ib_fmr = ERR_PTR(-EINVAL);
                goto alloc_fmr_exit0;
        }
        if (mr_access_flags & IB_ACCESS_MW_BIND) {
-               EDEB_ERR(4, "bad input values: mr_access_flags=%x",
+               ehca_err(pd->device, "bad input values: mr_access_flags=%x",
                         mr_access_flags);
                ib_fmr = ERR_PTR(-EINVAL);
                goto alloc_fmr_exit0;
        }
        if ((fmr_attr->max_pages == 0) || (fmr_attr->max_maps == 0)) {
-               EDEB_ERR(4, "bad input values: fmr_attr->max_pages=%x "
+               ehca_err(pd->device, "bad input values: fmr_attr->max_pages=%x "
                         "fmr_attr->max_maps=%x fmr_attr->page_shift=%x",
                         fmr_attr->max_pages, fmr_attr->max_maps,
                         fmr_attr->page_shift);
@@ -850,15 +726,12 @@ struct ib_fmr *ehca_alloc_fmr(struct ib_
        }
        if (((1 << fmr_attr->page_shift) != EHCA_PAGESIZE) &&
            ((1 << fmr_attr->page_shift) != PAGE_SIZE)) {
-               EDEB_ERR(4, "unsupported fmr_attr->page_shift=%x",
+               ehca_err(pd->device, "unsupported fmr_attr->page_shift=%x",
                         fmr_attr->page_shift);
                ib_fmr = ERR_PTR(-EINVAL);
                goto alloc_fmr_exit0;
        }
 
-       e_pd = container_of(pd, struct ehca_pd, ib_pd);
-       shca = container_of(pd->device, struct ehca_shca, ib_device);
-
        e_fmr = ehca_mr_new();
        if (!e_fmr) {
                ib_fmr = ERR_PTR(-ENOMEM);
@@ -881,19 +754,15 @@ struct ib_fmr *ehca_alloc_fmr(struct ib_
        e_fmr->fmr_max_pages = fmr_attr->max_pages;
        e_fmr->fmr_max_maps = fmr_attr->max_maps;
        e_fmr->fmr_map_cnt = 0;
-       ib_fmr = &e_fmr->ib.ib_fmr;
-       goto alloc_fmr_exit0;
+       return &e_fmr->ib.ib_fmr;
 
 alloc_fmr_exit1:
        ehca_mr_delete(e_fmr);
 alloc_fmr_exit0:
        if (IS_ERR(ib_fmr))
-               EDEB_EX(4, "rc=%lx pd=%p mr_access_flags=%x "
-                       "fmr_attr=%p", PTR_ERR(ib_fmr), pd,
-                       mr_access_flags, fmr_attr);
-       else
-               EDEB_EX(7, "ib_fmr=%p tmp_lkey=%x tmp_rkey=%x",
-                       ib_fmr, tmp_lkey, tmp_rkey);
+               ehca_err(pd->device, "rc=%lx pd=%p mr_access_flags=%x "
+                        "fmr_attr=%p", PTR_ERR(ib_fmr), pd,
+                        mr_access_flags, fmr_attr);
        return ib_fmr;
 } /* end ehca_alloc_fmr() */
 
@@ -904,24 +773,16 @@ int ehca_map_phys_fmr(struct ib_fmr *fmr
                      int list_len,
                      u64 iova)
 {
-       int ret = 0;
-       struct ehca_shca *shca = NULL;
-       struct ehca_mr *e_fmr = NULL;
-       struct ehca_pd *e_pd = NULL;
+       int ret;
+       struct ehca_shca *shca =
+               container_of(fmr->device, struct ehca_shca, ib_device);
+       struct ehca_mr *e_fmr = container_of(fmr, struct ehca_mr, ib.ib_fmr);
+       struct ehca_pd *e_pd = container_of(fmr->pd, struct ehca_pd, ib_pd);
        struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0};
-       u32 tmp_lkey = 0;
-       u32 tmp_rkey = 0;
-
-       EDEB_EN(7, "fmr=%p page_list=%p list_len=%x iova=%lx",
-               fmr, page_list, list_len, iova);
-
-       EHCA_CHECK_FMR(fmr);
-       e_fmr = container_of(fmr, struct ehca_mr, ib.ib_fmr);
-       shca = container_of(fmr->device, struct ehca_shca, ib_device);
-       e_pd = container_of(fmr->pd, struct ehca_pd, ib_pd);
+       u32 tmp_lkey, tmp_rkey;
 
        if (!(e_fmr->flags & EHCA_MR_FLAG_FMR)) {
-               EDEB_ERR(4, "not a FMR, e_fmr=%p e_fmr->flags=%x",
+               ehca_err(fmr->device, "not a FMR, e_fmr=%p e_fmr->flags=%x",
                         e_fmr, e_fmr->flags);
                ret = -EINVAL;
                goto map_phys_fmr_exit0;
@@ -931,16 +792,16 @@ int ehca_map_phys_fmr(struct ib_fmr *fmr
                goto map_phys_fmr_exit0;
        if (iova % e_fmr->fmr_page_size) {
                /* only whole-numbered pages */
-               EDEB_ERR(4, "bad iova, iova=%lx fmr_page_size=%x",
+               ehca_err(fmr->device, "bad iova, iova=%lx fmr_page_size=%x",
                         iova, e_fmr->fmr_page_size);
                ret = -EINVAL;
                goto map_phys_fmr_exit0;
        }
        if (e_fmr->fmr_map_cnt >= e_fmr->fmr_max_maps) {
                /* HCAD does not limit the maps, however trace this anyway */
-               EDEB(6, "map limit exceeded, fmr=%p e_fmr->fmr_map_cnt=%x "
-                    "e_fmr->fmr_max_maps=%x",
-                    fmr, e_fmr->fmr_map_cnt, e_fmr->fmr_max_maps);
+               ehca_info(fmr->device, "map limit exceeded, fmr=%p "
+                         "e_fmr->fmr_map_cnt=%x e_fmr->fmr_max_maps=%x",
+                         fmr, e_fmr->fmr_map_cnt, e_fmr->fmr_max_maps);
        }
 
        pginfo.type      = EHCA_MR_PGI_FMR;
@@ -960,14 +821,13 @@ int ehca_map_phys_fmr(struct ib_fmr *fmr
        e_fmr->fmr_map_cnt++;
        e_fmr->ib.ib_fmr.lkey = tmp_lkey;
        e_fmr->ib.ib_fmr.rkey = tmp_rkey;
+       return 0;
 
 map_phys_fmr_exit0:
        if (ret)
-               EDEB_EX(4, "ret=%x fmr=%p page_list=%p list_len=%x iova=%lx",
-                       ret, fmr, page_list, list_len, iova);
-       else
-               EDEB_EX(7, "lkey=%x rkey=%x",
-                       e_fmr->ib.ib_fmr.lkey, e_fmr->ib.ib_fmr.rkey);
+               ehca_err(fmr->device, "ret=%x fmr=%p page_list=%p list_len=%x "
+                        "iova=%lx",
+                        ret, fmr, page_list, list_len, iova);
        return ret;
 } /* end ehca_map_phys_fmr() */
 
@@ -976,31 +836,34 @@ map_phys_fmr_exit0:
 int ehca_unmap_fmr(struct list_head *fmr_list)
 {
        int ret = 0;
-       struct ib_fmr *ib_fmr = NULL;
+       struct ib_fmr *ib_fmr;
        struct ehca_shca *shca = NULL;
-       struct ehca_shca *prev_shca = NULL;
-       struct ehca_mr *e_fmr = NULL;
+       struct ehca_shca *prev_shca;
+       struct ehca_mr *e_fmr;
        u32 num_fmr = 0;
        u32 unmap_fmr_cnt = 0;
 
-       EDEB_EN(7, "fmr_list=%p", fmr_list);
-
        /* check all FMR belong to same SHCA, and check internal flag */
        list_for_each_entry(ib_fmr, fmr_list, list) {
                prev_shca = shca;
+               if (!ib_fmr) {
+                       ehca_gen_err("bad fmr=%p in list", ib_fmr);
+                       ret = -EINVAL;
+                       goto unmap_fmr_exit0;
+               }
                shca = container_of(ib_fmr->device, struct ehca_shca,
                                    ib_device);
-               EHCA_CHECK_FMR(ib_fmr);
                e_fmr = container_of(ib_fmr, struct ehca_mr, ib.ib_fmr);
                if ((shca != prev_shca) && prev_shca) {
-                       EDEB_ERR(4, "SHCA mismatch, shca=%p prev_shca=%p "
-                                "e_fmr=%p", shca, prev_shca, e_fmr);
+                       ehca_err(&shca->ib_device, "SHCA mismatch, shca=%p "
+                                "prev_shca=%p e_fmr=%p",
+                                shca, prev_shca, e_fmr);
                        ret = -EINVAL;
                        goto unmap_fmr_exit0;
                }
                if (!(e_fmr->flags & EHCA_MR_FLAG_FMR)) {
-                       EDEB_ERR(4, "not a FMR, e_fmr=%p e_fmr->flags=%x",
-                                e_fmr, e_fmr->flags);
+                       ehca_err(&shca->ib_device, "not a FMR, e_fmr=%p "
+                                "e_fmr->flags=%x", e_fmr, e_fmr->flags);
                        ret = -EINVAL;
                        goto unmap_fmr_exit0;
                }
@@ -1016,20 +879,18 @@ int ehca_unmap_fmr(struct list_head *fmr
                ret = ehca_unmap_one_fmr(shca, e_fmr);
                if (ret) {
                        /* unmap failed, stop unmapping of rest of FMRs */
-                       EDEB_ERR(4, "unmap of one FMR failed, stop rest, "
-                                "e_fmr=%p num_fmr=%x unmap_fmr_cnt=%x lkey=%x",
-                                e_fmr, num_fmr, unmap_fmr_cnt,
-                                e_fmr->ib.ib_fmr.lkey);
+                       ehca_err(&shca->ib_device, "unmap of one FMR failed, "
+                                "stop rest, e_fmr=%p num_fmr=%x "
+                                "unmap_fmr_cnt=%x lkey=%x", e_fmr, num_fmr,
+                                unmap_fmr_cnt, e_fmr->ib.ib_fmr.lkey);
                        goto unmap_fmr_exit0;
                }
        }
 
 unmap_fmr_exit0:
        if (ret)
-               EDEB_EX(4, "ret=%x fmr_list=%p num_fmr=%x unmap_fmr_cnt=%x",
-                       ret, fmr_list, num_fmr, unmap_fmr_cnt);
-       else
-               EDEB_EX(7, "num_fmr=%x", num_fmr);
+               ehca_gen_err("ret=%x fmr_list=%p num_fmr=%x unmap_fmr_cnt=%x",
+                            ret, fmr_list, num_fmr, unmap_fmr_cnt);
        return ret;
 } /* end ehca_unmap_fmr() */
 
@@ -1037,19 +898,14 @@ unmap_fmr_exit0:
 
 int ehca_dealloc_fmr(struct ib_fmr *fmr)
 {
-       int ret = 0;
-       u64 h_ret = H_SUCCESS;
-       struct ehca_shca *shca = NULL;
-       struct ehca_mr *e_fmr = NULL;
-
-       EDEB_EN(7, "fmr=%p", fmr);
-
-       EHCA_CHECK_FMR(fmr);
-       e_fmr = container_of(fmr, struct ehca_mr, ib.ib_fmr);
-       shca = container_of(fmr->device, struct ehca_shca, ib_device);
+       int ret;
+       u64 h_ret;
+       struct ehca_shca *shca =
+               container_of(fmr->device, struct ehca_shca, ib_device);
+       struct ehca_mr *e_fmr = container_of(fmr, struct ehca_mr, ib.ib_fmr);
 
        if (!(e_fmr->flags & EHCA_MR_FLAG_FMR)) {
-               EDEB_ERR(4, "not a FMR, e_fmr=%p e_fmr->flags=%x",
+               ehca_err(fmr->device, "not a FMR, e_fmr=%p e_fmr->flags=%x",
                         e_fmr, e_fmr->flags);
                ret = -EINVAL;
                goto free_fmr_exit0;
@@ -1057,21 +913,20 @@ int ehca_dealloc_fmr(struct ib_fmr *fmr)
 
        h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_fmr);
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4, "hipz_free_mr failed, h_ret=%lx e_fmr=%p "
+               ehca_err(fmr->device, "hipz_free_mr failed, h_ret=%lx e_fmr=%p "
                         "hca_hndl=%lx fmr_hndl=%lx fmr->lkey=%x",
                         h_ret, e_fmr, shca->ipz_hca_handle.handle,
                         e_fmr->ipz_mr_handle.handle, fmr->lkey);
-               ehca_mrmw_map_hrc_free_mr(h_ret);
+               ret = ehca_mrmw_map_hrc_free_mr(h_ret);
                goto free_fmr_exit0;
        }
        /* successful deregistration */
        ehca_mr_delete(e_fmr);
+       return 0;
 
 free_fmr_exit0:
        if (ret)
-               EDEB_EX(4, "ret=%x fmr=%p", ret, fmr);
-       else
-               EDEB_EX(7, "");
+               ehca_err(&shca->ib_device, "ret=%x fmr=%p", ret, fmr);
        return ret;
 } /* end ehca_dealloc_fmr() */
 
@@ -1087,15 +942,11 @@ int ehca_reg_mr(struct ehca_shca *shca,
                u32 *lkey, /*OUT*/
                u32 *rkey) /*OUT*/
 {
-       int ret = 0;
-       u64 h_ret = H_SUCCESS;
-       u32 hipz_acl = 0;
+       int ret;
+       u64 h_ret;
+       u32 hipz_acl;
        struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0};
 
-       EDEB_EN(7, "shca=%p e_mr=%p iova_start=%p size=%lx acl=%x e_pd=%p "
-               "pginfo=%p num_pages=%lx num_4k=%lx", shca, e_mr, iova_start,
-               size, acl, e_pd, pginfo, pginfo->num_pages, pginfo->num_4k);
-
        ehca_mrmw_map_acl(acl, &hipz_acl);
        ehca_mrmw_set_pgsize_hipz_acl(&hipz_acl);
        if (ehca_use_hp_mr == 1)
@@ -1105,8 +956,8 @@ int ehca_reg_mr(struct ehca_shca *shca,
                                         (u64)iova_start, size, hipz_acl,
                                         e_pd->fw_pd, &hipzout);
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4, "hipz_alloc_mr failed, h_ret=%lx hca_hndl=%lx",
-                        h_ret, shca->ipz_hca_handle.handle);
+               ehca_err(&shca->ib_device, "hipz_alloc_mr failed, h_ret=%lx "
+                        "hca_hndl=%lx", h_ret, shca->ipz_hca_handle.handle);
                ret = ehca_mrmw_map_hrc_alloc(h_ret);
                goto ehca_reg_mr_exit0;
        }
@@ -1125,26 +976,27 @@ int ehca_reg_mr(struct ehca_shca *shca,
        e_mr->acl       = acl;
        *lkey = hipzout.lkey;
        *rkey = hipzout.rkey;
-       goto ehca_reg_mr_exit0;
+       return 0;
 
 ehca_reg_mr_exit1:
        h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_mr);
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(1, "h_ret=%lx shca=%p e_mr=%p iova_start=%p "
-                        "size=%lx acl=%x e_pd=%p lkey=%x pginfo=%p "
-                        "num_pages=%lx num_4k=%lx ret=%x", h_ret, shca, e_mr,
-                        iova_start, size, acl, e_pd, hipzout.lkey, pginfo,
-                        pginfo->num_pages, pginfo->num_4k, ret);
-               EDEB_ERR(1, "internal error in ehca_reg_mr, not recoverable");
+               ehca_err(&shca->ib_device, "h_ret=%lx shca=%p e_mr=%p "
+                        "iova_start=%p size=%lx acl=%x e_pd=%p lkey=%x "
+                        "pginfo=%p num_pages=%lx num_4k=%lx ret=%x",
+                        h_ret, shca, e_mr, iova_start, size, acl, e_pd,
+                        hipzout.lkey, pginfo, pginfo->num_pages,
+                        pginfo->num_4k, ret);
+               ehca_err(&shca->ib_device, "internal error in ehca_reg_mr, "
+                        "not recoverable");
        }
 ehca_reg_mr_exit0:
        if (ret)
-               EDEB_EX(4, "ret=%x shca=%p e_mr=%p iova_start=%p size=%lx "
-                       "acl=%x e_pd=%p pginfo=%p num_pages=%lx num_4k=%lx",
-                       ret, shca, e_mr, iova_start, size, acl, e_pd, pginfo,
-                       pginfo->num_pages, pginfo->num_4k);
-       else
-               EDEB_EX(7, "ret=%x lkey=%x rkey=%x", ret, *lkey, *rkey);
+               ehca_err(&shca->ib_device, "ret=%x shca=%p e_mr=%p "
+                        "iova_start=%p size=%lx acl=%x e_pd=%p pginfo=%p "
+                        "num_pages=%lx num_4k=%lx",
+                        ret, shca, e_mr, iova_start, size, acl, e_pd, pginfo,
+                        pginfo->num_pages, pginfo->num_4k);
        return ret;
 } /* end ehca_reg_mr() */
 
@@ -1155,18 +1007,15 @@ int ehca_reg_mr_rpages(struct ehca_shca 
                       struct ehca_mr_pginfo *pginfo)
 {
        int ret = 0;
-       u64 h_ret = H_SUCCESS;
-       u32 rnum = 0;
-       u64 rpage = 0;
+       u64 h_ret;
+       u32 rnum;
+       u64 rpage;
        u32 i;
-       u64 *kpage = NULL;
-
-       EDEB_EN(7, "shca=%p e_mr=%p pginfo=%p num_pages=%lx num_4k=%lx",
-               shca, e_mr, pginfo, pginfo->num_pages, pginfo->num_4k);
+       u64 *kpage;
 
        kpage = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);
        if (!kpage) {
-               EDEB_ERR(4, "kpage alloc failed");
+               ehca_err(&shca->ib_device, "kpage alloc failed");
                ret = -ENOMEM;
                goto ehca_reg_mr_rpages_exit0;
        }
@@ -1184,29 +1033,29 @@ int ehca_reg_mr_rpages(struct ehca_shca 
                if (rnum > 1) {
                        ret = ehca_set_pagebuf(e_mr, pginfo, rnum, kpage);
                        if (ret) {
-                               EDEB_ERR(4, "ehca_set_pagebuf bad rc, ret=%x "
-                                        "rnum=%x kpage=%p", ret, rnum, kpage);
+                               ehca_err(&shca->ib_device, "ehca_set_pagebuf "
+                                        "bad rc, ret=%x rnum=%x kpage=%p",
+                                        ret, rnum, kpage);
                                ret = -EFAULT;
                                goto ehca_reg_mr_rpages_exit1;
                        }
                        rpage = virt_to_abs(kpage);
                        if (!rpage) {
-                               EDEB_ERR(4, "kpage=%p i=%x", kpage, i);
+                               ehca_err(&shca->ib_device, "kpage=%p i=%x",
+                                        kpage, i);
                                ret = -EFAULT;
                                goto ehca_reg_mr_rpages_exit1;
                        }
                } else {  /* rnum==1 */
                        ret = ehca_set_pagebuf_1(e_mr, pginfo, &rpage);
                        if (ret) {
-                               EDEB_ERR(4, "ehca_set_pagebuf_1 bad rc, "
-                                        "ret=%x i=%x", ret, i);
+                               ehca_err(&shca->ib_device, "ehca_set_pagebuf_1 "
+                                        "bad rc, ret=%x i=%x", ret, i);
                                ret = -EFAULT;
                                goto ehca_reg_mr_rpages_exit1;
                        }
                }
 
-               EDEB(9, "i=%x rnum=%x rpage=%lx", i, rnum, rpage);
-
                h_ret = hipz_h_register_rpage_mr(shca->ipz_hca_handle, e_mr,
                                                 0, /* pagesize 4k */
                                                 0, rpage, rnum);
@@ -1217,9 +1066,10 @@ int ehca_reg_mr_rpages(struct ehca_shca 
                         * and for 'page registered'==H_PAGE_REGISTERED
                         */
                        if (h_ret != H_SUCCESS) {
-                               EDEB_ERR(4, "last hipz_reg_rpage_mr failed, "
-                                        "h_ret=%lx e_mr=%p i=%x hca_hndl=%lx "
-                                        "mr_hndl=%lx lkey=%x", h_ret, e_mr, i,
+                               ehca_err(&shca->ib_device, "last "
+                                        "hipz_reg_rpage_mr failed, h_ret=%lx "
+                                        "e_mr=%p i=%x hca_hndl=%lx mr_hndl=%lx"
+                                        " lkey=%x", h_ret, e_mr, i,
                                         shca->ipz_hca_handle.handle,
                                         e_mr->ipz_mr_handle.handle,
                                         e_mr->ib.ib_mr.lkey);
@@ -1228,8 +1078,8 @@ int ehca_reg_mr_rpages(struct ehca_shca 
                        } else
                                ret = 0;
                } else if (h_ret != H_PAGE_REGISTERED) {
-                       EDEB_ERR(4, "hipz_reg_rpage_mr failed, h_ret=%lx "
-                                "e_mr=%p i=%x lkey=%x hca_hndl=%lx "
+                       ehca_err(&shca->ib_device, "hipz_reg_rpage_mr failed, "
+                                "h_ret=%lx e_mr=%p i=%x lkey=%x hca_hndl=%lx "
                                 "mr_hndl=%lx", h_ret, e_mr, i,
                                 e_mr->ib.ib_mr.lkey,
                                 shca->ipz_hca_handle.handle,
@@ -1245,11 +1095,9 @@ ehca_reg_mr_rpages_exit1:
        kfree(kpage);
 ehca_reg_mr_rpages_exit0:
        if (ret)
-               EDEB_EX(4, "ret=%x shca=%p e_mr=%p pginfo=%p num_pages=%lx "
-                       "num_4k=%lx", ret, shca, e_mr, pginfo,
-                       pginfo->num_pages, pginfo->num_4k);
-       else
-               EDEB_EX(7, "ret=%x", ret);
+               ehca_err(&shca->ib_device, "ret=%x shca=%p e_mr=%p pginfo=%p "
+                        "num_pages=%lx num_4k=%lx", ret, shca, e_mr, pginfo,
+                        pginfo->num_pages, pginfo->num_4k);
        return ret;
 } /* end ehca_reg_mr_rpages() */
 
@@ -1265,25 +1113,20 @@ inline int ehca_rereg_mr_rereg1(struct e
                                u32 *lkey, /*OUT*/
                                u32 *rkey) /*OUT*/
 {
-       int ret = 0;
-       u64 h_ret = H_SUCCESS;
-       u32 hipz_acl = 0;
-       u64 *kpage = NULL;
-       u64 rpage = 0;
+       int ret;
+       u64 h_ret;
+       u32 hipz_acl;
+       u64 *kpage;
+       u64 rpage;
        struct ehca_mr_pginfo pginfo_save;
        struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0};
 
-       EDEB_EN(7, "shca=%p e_mr=%p iova_start=%p size=%lx acl=%x "
-               "e_pd=%p pginfo=%p num_pages=%lx num_4k=%lx", shca, e_mr,
-               iova_start, size, acl, e_pd, pginfo, pginfo->num_pages,
-               pginfo->num_4k);
-
        ehca_mrmw_map_acl(acl, &hipz_acl);
        ehca_mrmw_set_pgsize_hipz_acl(&hipz_acl);
 
        kpage = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);
        if (!kpage) {
-               EDEB_ERR(4, "kpage alloc failed");
+               ehca_err(&shca->ib_device, "kpage alloc failed");
                ret = -ENOMEM;
                goto ehca_rereg_mr_rereg1_exit0;
        }
@@ -1291,14 +1134,15 @@ inline int ehca_rereg_mr_rereg1(struct e
        pginfo_save = *pginfo;
        ret = ehca_set_pagebuf(e_mr, pginfo, pginfo->num_4k, kpage);
        if (ret) {
-               EDEB_ERR(4, "set pagebuf failed, e_mr=%p pginfo=%p type=%x "
-                        "num_pages=%lx num_4k=%lx kpage=%p", e_mr, pginfo,
-                        pginfo->type, pginfo->num_pages, pginfo->num_4k,kpage);
+               ehca_err(&shca->ib_device, "set pagebuf failed, e_mr=%p "
+                        "pginfo=%p type=%x num_pages=%lx num_4k=%lx kpage=%p",
+                        e_mr, pginfo, pginfo->type, pginfo->num_pages,
+                        pginfo->num_4k,kpage);
                goto ehca_rereg_mr_rereg1_exit1;
        }
        rpage = virt_to_abs(kpage);
        if (!rpage) {
-               EDEB_ERR(4, "kpage=%p", kpage);
+               ehca_err(&shca->ib_device, "kpage=%p", kpage);
                ret = -EFAULT;
                goto ehca_rereg_mr_rereg1_exit1;
        }
@@ -1311,13 +1155,13 @@ inline int ehca_rereg_mr_rereg1(struct e
                 * e.g. this is required in case H_MR_CONDITION
                 * (MW bound or MR is shared)
                 */
-               EDEB(6, "hipz_h_reregister_pmr failed (Rereg1), h_ret=%lx "
-                    "e_mr=%p", h_ret, e_mr);
+               ehca_warn(&shca->ib_device, "hipz_h_reregister_pmr failed "
+                         "(Rereg1), h_ret=%lx e_mr=%p", h_ret, e_mr);
                *pginfo = pginfo_save;
                ret = -EAGAIN;
        } else if ((u64*)hipzout.vaddr != iova_start) {
-               EDEB_ERR(4, "PHYP changed iova_start in rereg_pmr, "
-                        "iova_start=%p iova_start_out=%lx e_mr=%p "
+               ehca_err(&shca->ib_device, "PHYP changed iova_start in "
+                        "rereg_pmr, iova_start=%p iova_start_out=%lx e_mr=%p "
                         "mr_handle=%lx lkey=%x lkey_out=%x", iova_start,
                         hipzout.vaddr, e_mr, e_mr->ipz_mr_handle.handle,
                         e_mr->ib.ib_mr.lkey, hipzout.lkey);
@@ -1340,13 +1184,10 @@ ehca_rereg_mr_rereg1_exit1:
        kfree(kpage);
 ehca_rereg_mr_rereg1_exit0:
        if ( ret && (ret != -EAGAIN) )
-               EDEB_EX(4, "ret=%x h_ret=%lx lkey=%x rkey=%x pginfo=%p "
-                       "num_pages=%lx num_4k=%lx", ret, h_ret, *lkey, *rkey,
-                       pginfo, pginfo->num_pages, pginfo->num_4k);
-       else
-               EDEB_EX(7, "ret=%x h_ret=%lx lkey=%x rkey=%x pginfo=%p "
-                       "num_pages=%lx num_4k=%lx", ret, h_ret, *lkey, *rkey,
-                       pginfo, pginfo->num_pages, pginfo->num_4k);
+               ehca_err(&shca->ib_device, "ret=%x lkey=%x rkey=%x "
+                        "pginfo=%p num_pages=%lx num_4k=%lx",
+                        ret, *lkey, *rkey, pginfo, pginfo->num_pages,
+                        pginfo->num_4k);
        return ret;
 } /* end ehca_rereg_mr_rereg1() */
 
@@ -1363,20 +1204,15 @@ int ehca_rereg_mr(struct ehca_shca *shca
                  u32 *rkey)
 {
        int ret = 0;
-       u64 h_ret = H_SUCCESS;
+       u64 h_ret;
        int rereg_1_hcall = 1; /* 1: use hipz_h_reregister_pmr directly */
        int rereg_3_hcall = 0; /* 1: use 3 hipz calls for reregistration */
 
-       EDEB_EN(7, "shca=%p e_mr=%p iova_start=%p size=%lx acl=%x "
-               "e_pd=%p pginfo=%p num_pages=%lx num_4k=%lx", shca, e_mr,
-               iova_start, size, acl, e_pd, pginfo, pginfo->num_pages,
-               pginfo->num_4k);
-
        /* first determine reregistration hCall(s) */
        if ((pginfo->num_4k > 512) || (e_mr->num_4k > 512) ||
            (pginfo->num_4k > e_mr->num_4k)) {
-               EDEB(7, "Rereg3 case, pginfo->num_4k=%lx "
-                    "e_mr->num_4k=%x", pginfo->num_4k, e_mr->num_4k);
+               ehca_dbg(&shca->ib_device, "Rereg3 case, pginfo->num_4k=%lx "
+                        "e_mr->num_4k=%x", pginfo->num_4k, e_mr->num_4k);
                rereg_1_hcall = 0;
                rereg_3_hcall = 1;
        }
@@ -1385,7 +1221,8 @@ int ehca_rereg_mr(struct ehca_shca *shca
                rereg_1_hcall = 0;
                rereg_3_hcall = 1;
                e_mr->flags &= ~EHCA_MR_FLAG_MAXMR;
-               EDEB(4, "Rereg MR for max-MR! e_mr=%p", e_mr);
+               ehca_err(&shca->ib_device, "Rereg MR for max-MR! e_mr=%p",
+                        e_mr);
        }
 
        if (rereg_1_hcall) {
@@ -1405,8 +1242,9 @@ int ehca_rereg_mr(struct ehca_shca *shca
                /* first deregister old MR */
                h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_mr);
                if (h_ret != H_SUCCESS) {
-                       EDEB_ERR(4, "hipz_free_mr failed, h_ret=%lx e_mr=%p "
-                                "hca_hndl=%lx mr_hndl=%lx mr->lkey=%x",
+                       ehca_err(&shca->ib_device, "hipz_free_mr failed, "
+                                "h_ret=%lx e_mr=%p hca_hndl=%lx mr_hndl=%lx "
+                                "mr->lkey=%x",
                                 h_ret, e_mr, shca->ipz_hca_handle.handle,
                                 e_mr->ipz_mr_handle.handle,
                                 e_mr->ib.ib_mr.lkey);
@@ -1436,18 +1274,12 @@ int ehca_rereg_mr(struct ehca_shca *shca
 
 ehca_rereg_mr_exit0:
        if (ret)
-               EDEB_EX(4, "ret=%x shca=%p e_mr=%p iova_start=%p size=%lx "
-                       "acl=%x e_pd=%p pginfo=%p num_pages=%lx lkey=%x rkey=%x"
-                       " rereg_1_hcall=%x rereg_3_hcall=%x", ret, shca, e_mr,
-                       iova_start, size, acl, e_pd, pginfo, pginfo->num_pages,
-                       *lkey, *rkey, rereg_1_hcall, rereg_3_hcall);
-       else
-               EDEB_EX(7, "ret=%x shca=%p e_mr=%p iova_start=%p size=%lx "
-                       "acl=%x e_pd=%p pginfo=%p num_pages=%lx lkey=%x rkey=%x"
-                       " rereg_1_hcall=%x rereg_3_hcall=%x", ret, shca, e_mr,
-                       iova_start, size, acl, e_pd, pginfo, pginfo->num_pages,
-                       *lkey, *rkey, rereg_1_hcall, rereg_3_hcall);
-
+               ehca_err(&shca->ib_device, "ret=%x shca=%p e_mr=%p "
+                        "iova_start=%p size=%lx acl=%x e_pd=%p pginfo=%p "
+                        "num_pages=%lx lkey=%x rkey=%x rereg_1_hcall=%x "
+                        "rereg_3_hcall=%x", ret, shca, e_mr, iova_start, size,
+                        acl, e_pd, pginfo, pginfo->num_pages, *lkey, *rkey,
+                        rereg_1_hcall, rereg_3_hcall);
        return ret;
 } /* end ehca_rereg_mr() */
 
@@ -1457,26 +1289,22 @@ int ehca_unmap_one_fmr(struct ehca_shca 
                       struct ehca_mr *e_fmr)
 {
        int ret = 0;
-       u64 h_ret = H_SUCCESS;
+       u64 h_ret;
        int rereg_1_hcall = 1; /* 1: use hipz_mr_reregister directly */
        int rereg_3_hcall = 0; /* 1: use 3 hipz calls for unmapping */
-       struct ehca_pd *e_pd = NULL;
+       struct ehca_pd *e_pd =
+               container_of(e_fmr->ib.ib_fmr.pd, struct ehca_pd, ib_pd);
        struct ehca_mr save_fmr;
-       u32 tmp_lkey = 0;
-       u32 tmp_rkey = 0;
+       u32 tmp_lkey, tmp_rkey;
        struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0};
        struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0};
 
-       EDEB_EN(7, "shca=%p e_fmr=%p", shca, e_fmr);
-
        /* first check if reregistration hCall can be used for unmap */
        if (e_fmr->fmr_max_pages > 512) {
                rereg_1_hcall = 0;
                rereg_3_hcall = 1;
        }
 
-       e_pd = container_of(e_fmr->ib.ib_fmr.pd, struct ehca_pd, ib_pd);
-
        if (rereg_1_hcall) {
                /*
                 * note: after using rereg hcall with len=0,
@@ -1489,10 +1317,10 @@ int ehca_unmap_one_fmr(struct ehca_shca 
                         * should not happen, because length checked above,
                         * FMRs are not shared and no MW bound to FMRs
                         */
-                       EDEB_ERR(4, "hipz_reregister_pmr failed (Rereg1), "
-                                "h_ret=%lx e_fmr=%p hca_hndl=%lx mr_hndl=%lx "
-                                "lkey=%x lkey_out=%x", h_ret, e_fmr,
-                                shca->ipz_hca_handle.handle,
+                       ehca_err(&shca->ib_device, "hipz_reregister_pmr failed "
+                                "(Rereg1), h_ret=%lx e_fmr=%p hca_hndl=%lx "
+                                "mr_hndl=%lx lkey=%x lkey_out=%x",
+                                h_ret, e_fmr, shca->ipz_hca_handle.handle,
                                 e_fmr->ipz_mr_handle.handle,
                                 e_fmr->ib.ib_fmr.lkey, hipzout.lkey);
                        rereg_3_hcall = 1;
@@ -1511,9 +1339,10 @@ int ehca_unmap_one_fmr(struct ehca_shca 
                /* first free old FMR */
                h_ret = hipz_h_free_resource_mr(shca->ipz_hca_handle, e_fmr);
                if (h_ret != H_SUCCESS) {
-                       EDEB_ERR(4, "hipz_free_mr failed, h_ret=%lx e_fmr=%p "
-                                "hca_hndl=%lx mr_hndl=%lx lkey=%x", h_ret,
-                                e_fmr, shca->ipz_hca_handle.handle,
+                       ehca_err(&shca->ib_device, "hipz_free_mr failed, "
+                                "h_ret=%lx e_fmr=%p hca_hndl=%lx mr_hndl=%lx "
+                                "lkey=%x",
+                                h_ret, e_fmr, shca->ipz_hca_handle.handle,
                                 e_fmr->ipz_mr_handle.handle,
                                 e_fmr->ib.ib_fmr.lkey);
                        ret = ehca_mrmw_map_hrc_free_mr(h_ret);
@@ -1547,9 +1376,11 @@ int ehca_unmap_one_fmr(struct ehca_shca 
        }
 
 ehca_unmap_one_fmr_exit0:
-       EDEB_EX(7, "ret=%x tmp_lkey=%x tmp_rkey=%x fmr_max_pages=%x "
-               "rereg_1_hcall=%x rereg_3_hcall=%x", ret, tmp_lkey, tmp_rkey,
-               e_fmr->fmr_max_pages, rereg_1_hcall, rereg_3_hcall);
+       if (ret)
+               ehca_err(&shca->ib_device, "ret=%x tmp_lkey=%x tmp_rkey=%x "
+                        "fmr_max_pages=%x rereg_1_hcall=%x rereg_3_hcall=%x",
+                        ret, tmp_lkey, tmp_rkey, e_fmr->fmr_max_pages,
+                        rereg_1_hcall, rereg_3_hcall);
        return ret;
 } /* end ehca_unmap_one_fmr() */
 
@@ -1565,13 +1396,10 @@ int ehca_reg_smr(struct ehca_shca *shca,
                 u32 *rkey) /*OUT*/
 {
        int ret = 0;
-       u64 h_ret = H_SUCCESS;
-       u32 hipz_acl = 0;
+       u64 h_ret;
+       u32 hipz_acl;
        struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0};
 
-       EDEB_EN(7,"shca=%p e_origmr=%p e_newmr=%p iova_start=%p acl=%x e_pd=%p",
-               shca, e_origmr, e_newmr, iova_start, acl, e_pd);
-
        ehca_mrmw_map_acl(acl, &hipz_acl);
        ehca_mrmw_set_pgsize_hipz_acl(&hipz_acl);
 
@@ -1579,10 +1407,11 @@ int ehca_reg_smr(struct ehca_shca *shca,
                                    (u64)iova_start, hipz_acl, e_pd->fw_pd,
                                    &hipzout);
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4, "hipz_reg_smr failed, h_ret=%lx shca=%p e_origmr=%p"
-                        " e_newmr=%p iova_start=%p acl=%x e_pd=%p hca_hndl=%lx"
-                        " mr_hndl=%lx lkey=%x", h_ret, shca, e_origmr, e_newmr,
-                        iova_start, acl, e_pd, shca->ipz_hca_handle.handle,
+               ehca_err(&shca->ib_device, "hipz_reg_smr failed, h_ret=%lx "
+                        "shca=%p e_origmr=%p e_newmr=%p iova_start=%p acl=%x "
+                        "e_pd=%p hca_hndl=%lx mr_hndl=%lx lkey=%x",
+                        h_ret, shca, e_origmr, e_newmr, iova_start, acl, e_pd,
+                        shca->ipz_hca_handle.handle,
                         e_origmr->ipz_mr_handle.handle,
                         e_origmr->ib.ib_mr.lkey);
                ret = ehca_mrmw_map_hrc_reg_smr(h_ret);
@@ -1597,15 +1426,13 @@ int ehca_reg_smr(struct ehca_shca *shca,
        e_newmr->ipz_mr_handle = hipzout.handle;
        *lkey = hipzout.lkey;
        *rkey = hipzout.rkey;
-       goto ehca_reg_smr_exit0;
+       return 0;
 
 ehca_reg_smr_exit0:
        if (ret)
-               EDEB_EX(4, "ret=%x shca=%p e_origmr=%p e_newmr=%p "
-                       "iova_start=%p acl=%x e_pd=%p",
-                       ret, shca, e_origmr, e_newmr, iova_start, acl, e_pd);
-       else
-               EDEB_EX(7, "ret=%x lkey=%x rkey=%x", ret, *lkey, *rkey);
+               ehca_err(&shca->ib_device, "ret=%x shca=%p e_origmr=%p "
+                        "e_newmr=%p iova_start=%p acl=%x e_pd=%p",
+                        ret, shca, e_origmr, e_newmr, iova_start, acl, e_pd);
        return ret;
 } /* end ehca_reg_smr() */
 
@@ -1617,27 +1444,18 @@ int ehca_reg_internal_maxmr(
        struct ehca_pd *e_pd,
        struct ehca_mr **e_maxmr)  /*OUT*/
 {
-       int ret = 0;
-       struct ehca_mr *e_mr = NULL;
-       u64 *iova_start = NULL;
-       u64 size_maxmr = 0;
+       int ret;
+       struct ehca_mr *e_mr;
+       u64 *iova_start;
+       u64 size_maxmr;
        struct ehca_mr_pginfo pginfo={0,0,0,0,0,0,0,NULL,0,NULL,NULL,0,NULL,0};
        struct ib_phys_buf ib_pbuf;
-       u32 num_pages_mr = 0;
-       u32 num_pages_4k = 0; /* 4k portion "pages" */
-
-       EDEB_EN(7, "shca=%p e_pd=%p e_maxmr=%p", shca, e_pd, e_maxmr);
-
-       if (ehca_adr_bad(shca) || ehca_adr_bad(e_pd) || ehca_adr_bad(e_maxmr)) {
-               EDEB_ERR(4, "bad input values: shca=%p e_pd=%p e_maxmr=%p",
-                        shca, e_pd, e_maxmr);
-               ret = -EINVAL;
-               goto ehca_reg_internal_maxmr_exit0;
-       }
+       u32 num_pages_mr;
+       u32 num_pages_4k; /* 4k portion "pages" */
 
        e_mr = ehca_mr_new();
        if (!e_mr) {
-               EDEB_ERR(4, "out of memory");
+               ehca_err(&shca->ib_device, "out of memory");
                ret = -ENOMEM;
                goto ehca_reg_internal_maxmr_exit0;
        }
@@ -1645,7 +1463,6 @@ int ehca_reg_internal_maxmr(
 
        /* register internal max-MR on HCA */
        size_maxmr = (u64)high_memory - PAGE_OFFSET;
-       EDEB(7, "high_memory=%p PAGE_OFFSET=%lx", high_memory, PAGE_OFFSET);
        iova_start = (u64*)KERNELBASE;
        ib_pbuf.addr = 0;
        ib_pbuf.size = size_maxmr;
@@ -1664,8 +1481,8 @@ int ehca_reg_internal_maxmr(
                          &pginfo, &e_mr->ib.ib_mr.lkey,
                          &e_mr->ib.ib_mr.rkey);
        if (ret) {
-               EDEB_ERR(4, "reg of internal max MR failed, e_mr=%p "
-                        "iova_start=%p size_maxmr=%lx num_pages_mr=%x "
+               ehca_err(&shca->ib_device, "reg of internal max MR failed, "
+                        "e_mr=%p iova_start=%p size_maxmr=%lx num_pages_mr=%x "
                         "num_pages_4k=%x", e_mr, iova_start, size_maxmr,
                         num_pages_mr, num_pages_4k);
                goto ehca_reg_internal_maxmr_exit1;
@@ -1678,18 +1495,14 @@ int ehca_reg_internal_maxmr(
        atomic_inc(&(e_pd->ib_pd.usecnt));
        atomic_set(&(e_mr->ib.ib_mr.usecnt), 0);
        *e_maxmr = e_mr;
-       goto ehca_reg_internal_maxmr_exit0;
+       return 0;
 
 ehca_reg_internal_maxmr_exit1:
        ehca_mr_delete(e_mr);
 ehca_reg_internal_maxmr_exit0:
        if (ret)
-               EDEB_EX(4, "ret=%x shca=%p e_pd=%p e_maxmr=%p",
-                       ret, shca, e_pd, e_maxmr);
-       else
-               EDEB_EX(7, "*e_maxmr=%p lkey=%x rkey=%x",
-                       *e_maxmr, (*e_maxmr)->ib.ib_mr.lkey,
-                       (*e_maxmr)->ib.ib_mr.rkey);
+               ehca_err(&shca->ib_device, "ret=%x shca=%p e_pd=%p e_maxmr=%p",
+                        ret, shca, e_pd, e_maxmr);
        return ret;
 } /* end ehca_reg_internal_maxmr() */
 
@@ -1703,15 +1516,11 @@ int ehca_reg_maxmr(struct ehca_shca *shc
                   u32 *lkey,
                   u32 *rkey)
 {
-       int ret = 0;
-       u64 h_ret = H_SUCCESS;
+       u64 h_ret;
        struct ehca_mr *e_origmr = shca->maxmr;
-       u32 hipz_acl = 0;
+       u32 hipz_acl;
        struct ehca_mr_hipzout_parms hipzout = {{0},0,0,0,0,0};
 
-       EDEB_EN(7,"shca=%p e_origmr=%p e_newmr=%p iova_start=%p acl=%x e_pd=%p",
-               shca, e_origmr, e_newmr, iova_start, acl, e_pd);
-
        ehca_mrmw_map_acl(acl, &hipz_acl);
        ehca_mrmw_set_pgsize_hipz_acl(&hipz_acl);
 
@@ -1719,13 +1528,12 @@ int ehca_reg_maxmr(struct ehca_shca *shc
                                    (u64)iova_start, hipz_acl, e_pd->fw_pd,
                                    &hipzout);
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4, "hipz_reg_smr failed, h_ret=%lx e_origmr=%p "
-                        "hca_hndl=%lx mr_hndl=%lx lkey=%x",
+               ehca_err(&shca->ib_device, "hipz_reg_smr failed, h_ret=%lx "
+                        "e_origmr=%p hca_hndl=%lx mr_hndl=%lx lkey=%x",
                         h_ret, e_origmr, shca->ipz_hca_handle.handle,
                         e_origmr->ipz_mr_handle.handle,
                         e_origmr->ib.ib_mr.lkey);
-               ret = ehca_mrmw_map_hrc_reg_smr(h_ret);
-               goto ehca_reg_maxmr_exit0;
+               return ehca_mrmw_map_hrc_reg_smr(h_ret);
        }
        /* successful registration */
        e_newmr->num_pages     = e_origmr->num_pages;
@@ -1736,24 +1544,19 @@ int ehca_reg_maxmr(struct ehca_shca *shc
        e_newmr->ipz_mr_handle = hipzout.handle;
        *lkey = hipzout.lkey;
        *rkey = hipzout.rkey;
-
-ehca_reg_maxmr_exit0:
-       EDEB_EX(7, "ret=%x lkey=%x rkey=%x", ret, *lkey, *rkey);
-       return ret;
+       return 0;
 } /* end ehca_reg_maxmr() */
 
 /*----------------------------------------------------------------------*/
 
 int ehca_dereg_internal_maxmr(struct ehca_shca *shca)
 {
-       int ret = 0;
-       struct ehca_mr *e_maxmr = NULL;
-       struct ib_pd *ib_pd = NULL;
-
-       EDEB_EN(7, "shca=%p shca->maxmr=%p", shca, shca->maxmr);
+       int ret;
+       struct ehca_mr *e_maxmr;
+       struct ib_pd *ib_pd;
 
        if (!shca->maxmr) {
-               EDEB_ERR(4, "bad call, shca=%p", shca);
+               ehca_err(&shca->ib_device, "bad call, shca=%p", shca);
                ret = -EINVAL;
                goto ehca_dereg_internal_maxmr_exit0;
        }
@@ -1764,7 +1567,7 @@ int ehca_dereg_internal_maxmr(struct ehc
 
        ret = ehca_dereg_mr(&e_maxmr->ib.ib_mr);
        if (ret) {
-               EDEB_ERR(3, "dereg internal max-MR failed, "
+               ehca_err(&shca->ib_device, "dereg internal max-MR failed, "
                         "ret=%x e_maxmr=%p shca=%p lkey=%x",
                         ret, e_maxmr, shca, e_maxmr->ib.ib_mr.lkey);
                shca->maxmr = e_maxmr;
@@ -1775,10 +1578,8 @@ int ehca_dereg_internal_maxmr(struct ehc
 
 ehca_dereg_internal_maxmr_exit0:
        if (ret)
-               EDEB_EX(4, "ret=%x shca=%p shca->maxmr=%p",
-                       ret, shca, shca->maxmr);
-       else
-               EDEB_EX(7, "");
+               ehca_err(&shca->ib_device, "ret=%x shca=%p shca->maxmr=%p",
+                        ret, shca, shca->maxmr);
        return ret;
 } /* end ehca_dereg_internal_maxmr() */
 
@@ -1798,34 +1599,35 @@ int ehca_mr_chk_buf_and_calc_size(struct
        u32 i;
 
        if (num_phys_buf == 0) {
-               EDEB_ERR(4, "bad phys buf array len, num_phys_buf=0");
+               ehca_gen_err("bad phys buf array len, num_phys_buf=0");
                return -EINVAL;
        }
        /* check first buffer */
        if (((u64)iova_start & ~PAGE_MASK) != (pbuf->addr & ~PAGE_MASK)) {
-               EDEB_ERR(4, "iova_start/addr mismatch, iova_start=%p "
-                        "pbuf->addr=%lx pbuf->size=%lx",
-                        iova_start, pbuf->addr, pbuf->size);
+               ehca_gen_err("iova_start/addr mismatch, iova_start=%p "
+                            "pbuf->addr=%lx pbuf->size=%lx",
+                            iova_start, pbuf->addr, pbuf->size);
                return -EINVAL;
        }
        if (((pbuf->addr + pbuf->size) % PAGE_SIZE) &&
            (num_phys_buf > 1)) {
-               EDEB_ERR(4, "addr/size mismatch in 1st buf, pbuf->addr=%lx "
-                        "pbuf->size=%lx", pbuf->addr, pbuf->size);
+               ehca_gen_err("addr/size mismatch in 1st buf, pbuf->addr=%lx "
+                            "pbuf->size=%lx", pbuf->addr, pbuf->size);
                return -EINVAL;
        }
 
        for (i = 0; i < num_phys_buf; i++) {
                if ((i > 0) && (pbuf->addr % PAGE_SIZE)) {
-                       EDEB_ERR(4, "bad address, i=%x pbuf->addr=%lx "
-                                "pbuf->size=%lx", i, pbuf->addr, pbuf->size);
+                       ehca_gen_err("bad address, i=%x pbuf->addr=%lx "
+                                    "pbuf->size=%lx",
+                                    i, pbuf->addr, pbuf->size);
                        return -EINVAL;
                }
                if (((i > 0) && /* not 1st */
                     (i < (num_phys_buf - 1)) &&        /* not last */
                     (pbuf->size % PAGE_SIZE)) || (pbuf->size == 0)) {
-                       EDEB_ERR(4, "bad size, i=%x pbuf->size=%lx",
-                                i, pbuf->size);
+                       ehca_gen_err("bad size, i=%x pbuf->size=%lx",
+                                    i, pbuf->size);
                        return -EINVAL;
                }
                size_count += pbuf->size;
@@ -1844,17 +1646,12 @@ int ehca_fmr_check_page_list(struct ehca
                             int list_len)
 {
        u32 i;
-       u64 *page = NULL;
-
-       if (ehca_adr_bad(page_list)) {
-               EDEB_ERR(4, "bad page_list, page_list=%p fmr=%p",
-                        page_list, e_fmr);
-               return -EINVAL;
-       }
+       u64 *page;
 
        if ((list_len == 0) || (list_len > e_fmr->fmr_max_pages)) {
-               EDEB_ERR(4, "bad list_len, list_len=%x e_fmr->fmr_max_pages=%x "
-                        "fmr=%p", list_len, e_fmr->fmr_max_pages, e_fmr);
+               ehca_gen_err("bad list_len, list_len=%x "
+                            "e_fmr->fmr_max_pages=%x fmr=%p",
+                            list_len, e_fmr->fmr_max_pages, e_fmr);
                return -EINVAL;
        }
 
@@ -1862,9 +1659,9 @@ int ehca_fmr_check_page_list(struct ehca
        page = page_list;
        for (i = 0; i < list_len; i++) {
                if (*page % e_fmr->fmr_page_size) {
-                       EDEB_ERR(4, "bad page, i=%x *page=%lx page=%p "
-                                "fmr=%p fmr_page_size=%x",
-                                i, *page, page, e_fmr, e_fmr->fmr_page_size);
+                       ehca_gen_err("bad page, i=%x *page=%lx page=%p fmr=%p "
+                                    "fmr_page_size=%x", i, *page, page, e_fmr,
+                                    e_fmr->fmr_page_size);
                        return -EINVAL;
                }
                page++;
@@ -1882,24 +1679,14 @@ int ehca_set_pagebuf(struct ehca_mr *e_m
                     u64 *kpage)
 {
        int ret = 0;
-       struct ib_umem_chunk *prev_chunk = NULL;
-       struct ib_umem_chunk *chunk      = NULL;
-       struct ib_phys_buf *pbuf         = NULL;
-       u64 *fmrlist = NULL;
-       u64 num4k  = 0;
-       u64 pgaddr = 0;
-       u64 offs4k = 0;
+       struct ib_umem_chunk *prev_chunk;
+       struct ib_umem_chunk *chunk;
+       struct ib_phys_buf *pbuf;
+       u64 *fmrlist;
+       u64 num4k, pgaddr, offs4k;
        u32 i = 0;
        u32 j = 0;
 
-       EDEB_EN(7, "pginfo=%p type=%x num_pages=%lx num_4k=%lx next_buf=%lx "
-               "next_4k=%lx number=%x kpage=%p page_cnt=%lx page_4k_cnt=%lx "
-               "next_listelem=%lx region=%p next_chunk=%p next_nmap=%lx",
-               pginfo, pginfo->type, pginfo->num_pages, pginfo->num_4k,
-               pginfo->next_buf, pginfo->next_4k, number, kpage,
-               pginfo->page_cnt, pginfo->page_4k_cnt, pginfo->next_listelem,
-               pginfo->region, pginfo->next_chunk, pginfo->next_nmap);
-
        if (pginfo->type == EHCA_MR_PGI_PHYS) {
                /* loop over desired phys_buf_array entries */
                while (i < number) {
@@ -1911,23 +1698,27 @@ int ehca_set_pagebuf(struct ehca_mr *e_m
                                /* sanity check */
                                if ((pginfo->page_cnt >= pginfo->num_pages) ||
                                    (pginfo->page_4k_cnt >= pginfo->num_4k)) {
-                                       EDEB_ERR(4, "page_cnt >= num_pages, "
-                                                "page_cnt=%lx num_pages=%lx "
-                                                "page_4k_cnt=%lx num_4k=%lx "
-                                                "i=%x", pginfo->page_cnt,
-                                                pginfo->num_pages,
-                                                pginfo->page_4k_cnt,
-                                                pginfo->num_4k, i);
+                                       ehca_gen_err("page_cnt >= num_pages, "
+                                                    "page_cnt=%lx "
+                                                    "num_pages=%lx "
+                                                    "page_4k_cnt=%lx "
+                                                    "num_4k=%lx i=%x",
+                                                    pginfo->page_cnt,
+                                                    pginfo->num_pages,
+                                                    pginfo->page_4k_cnt,
+                                                    pginfo->num_4k, i);
                                        ret = -EFAULT;
+                                       goto ehca_set_pagebuf_exit0;
                                }
                                *kpage = phys_to_abs(
                                        (pbuf->addr & EHCA_PAGEMASK)
                                        + (pginfo->next_4k * EHCA_PAGESIZE));
                                if ( !(*kpage) && pbuf->addr ) {
-                                       EDEB_ERR(4, "pbuf->addr=%lx "
-                                                "pbuf->size=%lx next_4k=%lx",
-                                                pbuf->addr, pbuf->size,
-                                                pginfo->next_4k);
+                                       ehca_gen_err("pbuf->addr=%lx "
+                                                    "pbuf->size=%lx "
+                                                    "next_4k=%lx", pbuf->addr,
+                                                    pbuf->size,
+                                                    pginfo->next_4k);
                                        ret = -EFAULT;
                                        goto ehca_set_pagebuf_exit0;
                                }
@@ -1952,23 +1743,21 @@ int ehca_set_pagebuf(struct ehca_mr *e_m
                list_for_each_entry_continue(chunk,
                                             (&(pginfo->region->chunk_list)),
                                             list) {
-                       EDEB(9, "chunk->page_list[0]=%lx",
-                            (u64)sg_dma_address(&chunk->page_list[0]));
                        for (i = pginfo->next_nmap; i < chunk->nmap; ) {
                                pgaddr = ( page_to_pfn(chunk->page_list[i].page)
                                           << PAGE_SHIFT );
                                *kpage = phys_to_abs(pgaddr +
                                                     (pginfo->next_4k *
                                                      EHCA_PAGESIZE));
-                               EDEB(9,"pgaddr=%lx *kpage=%lx next_4k=%lx",
-                                    pgaddr, *kpage, pginfo->next_4k);
                                if ( !(*kpage) ) {
-                                       EDEB_ERR(4, "pgaddr=%lx "
-                                                "chunk->page_list[i]=%lx i=%x "
-                                                "next_4k=%lx mr=%p", pgaddr,
-                                                (u64)sg_dma_address(
- &chunk->page_list[i]),
-                                                i, pginfo->next_4k, e_mr);
+                                       ehca_gen_err("pgaddr=%lx "
+                                                    "chunk->page_list[i]=%lx "
+                                                    "i=%x next_4k=%lx mr=%p",
+                                                    pgaddr,
+                                                    (u64)sg_dma_address(
+                                                            &chunk->
+ page_list[i]),
+                                                    i, pginfo->next_4k, e_mr);
                                        ret = -EFAULT;
                                        goto ehca_set_pagebuf_exit0;
                                }
@@ -2009,10 +1798,11 @@ int ehca_set_pagebuf(struct ehca_mr *e_m
                        *kpage = phys_to_abs((*fmrlist & EHCA_PAGEMASK) +
                                             pginfo->next_4k * EHCA_PAGESIZE);
                        if ( !(*kpage) ) {
-                               EDEB_ERR(4, "*fmrlist=%lx fmrlist=%p "
-                                        "next_listelem=%lx next_4k=%lx",
-                                        *fmrlist, fmrlist,
-                                        pginfo->next_listelem,pginfo->next_4k);
+                               ehca_gen_err("*fmrlist=%lx fmrlist=%p "
+                                            "next_listelem=%lx next_4k=%lx",
+                                            *fmrlist, fmrlist,
+                                            pginfo->next_listelem,
+                                            pginfo->next_4k);
                                ret = -EFAULT;
                                goto ehca_set_pagebuf_exit0;
                        }
@@ -2028,32 +1818,23 @@ int ehca_set_pagebuf(struct ehca_mr *e_m
                        }
                }
        } else {
-               EDEB_ERR(4, "bad pginfo->type=%x", pginfo->type);
+               ehca_gen_err("bad pginfo->type=%x", pginfo->type);
                ret = -EFAULT;
                goto ehca_set_pagebuf_exit0;
        }
 
 ehca_set_pagebuf_exit0:
        if (ret)
-               EDEB_EX(4, "ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx "
-                       "num_4k=%lx next_buf=%lx next_4k=%lx number=%x "
-                       "kpage=%p page_cnt=%lx page_4k_cnt=%lx i=%x "
-                       "next_listelem=%lx region=%p next_chunk=%p "
-                       "next_nmap=%lx", ret, e_mr, pginfo, pginfo->type,
-                       pginfo->num_pages, pginfo->num_4k, pginfo->next_buf,
-                       pginfo->next_4k, number, kpage, pginfo->page_cnt,
-                       pginfo->page_4k_cnt, i, pginfo->next_listelem,
-                       pginfo->region, pginfo->next_chunk, pginfo->next_nmap);
-       else
-               EDEB_EX(7, "ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx "
-                       "num_4k=%lx next_buf=%lx next_4k=%lx number=%x "
-                       "kpage=%p page_cnt=%lx page_4k_cnt=%lx i=%x "
-                       "next_listelem=%lx region=%p next_chunk=%p "
-                       "next_nmap=%lx", ret, e_mr, pginfo, pginfo->type,
-                       pginfo->num_pages, pginfo->num_4k, pginfo->next_buf,
-                       pginfo->next_4k, number, kpage, pginfo->page_cnt,
-                       pginfo->page_4k_cnt, i, pginfo->next_listelem,
-                       pginfo->region, pginfo->next_chunk, pginfo->next_nmap);
+               ehca_gen_err("ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx "
+                            "num_4k=%lx next_buf=%lx next_4k=%lx number=%x "
+                            "kpage=%p page_cnt=%lx page_4k_cnt=%lx i=%x "
+                            "next_listelem=%lx region=%p next_chunk=%p "
+                            "next_nmap=%lx", ret, e_mr, pginfo, pginfo->type,
+                            pginfo->num_pages, pginfo->num_4k,
+                            pginfo->next_buf, pginfo->next_4k, number, kpage,
+                            pginfo->page_cnt, pginfo->page_4k_cnt, i,
+                            pginfo->next_listelem, pginfo->region,
+                            pginfo->next_chunk, pginfo->next_nmap);
        return ret;
 } /* end ehca_set_pagebuf() */
 
@@ -2065,30 +1846,20 @@ int ehca_set_pagebuf_1(struct ehca_mr *e
                       u64 *rpage)
 {
        int ret = 0;
-       struct ib_phys_buf *tmp_pbuf = NULL;
-       u64 *fmrlist = NULL;
-       struct ib_umem_chunk *chunk = NULL;
-       struct ib_umem_chunk *prev_chunk = NULL;
-       u64 pgaddr = 0;
-       u64 num4k = 0;
-       u64 offs4k = 0;
-
-       EDEB_EN(7, "pginfo=%p type=%x num_pages=%lx num_4k=%lx next_buf=%lx "
-               "next_4k=%lx rpage=%p page_cnt=%lx page_4k_cnt=%lx "
-               "next_listelem=%lx region=%p next_chunk=%p next_nmap=%lx",
-               pginfo, pginfo->type, pginfo->num_pages, pginfo->num_4k,
-               pginfo->next_buf, pginfo->next_4k, rpage, pginfo->page_cnt,
-               pginfo->page_4k_cnt, pginfo->next_listelem, pginfo->region,
-               pginfo->next_chunk, pginfo->next_nmap);
+       struct ib_phys_buf *tmp_pbuf;
+       u64 *fmrlist;
+       struct ib_umem_chunk *chunk;
+       struct ib_umem_chunk *prev_chunk;
+       u64 pgaddr, num4k, offs4k;
 
        if (pginfo->type == EHCA_MR_PGI_PHYS) {
                /* sanity check */
                if ((pginfo->page_cnt >= pginfo->num_pages) ||
                    (pginfo->page_4k_cnt >= pginfo->num_4k)) {
-                       EDEB_ERR(4, "page_cnt >= num_pages, page_cnt=%lx "
-                                "num_pages=%lx page_4k_cnt=%lx num_4k=%lx",
-                                pginfo->page_cnt, pginfo->num_pages,
-                                pginfo->page_4k_cnt, pginfo->num_4k);
+                       ehca_gen_err("page_cnt >= num_pages, page_cnt=%lx "
+                                    "num_pages=%lx page_4k_cnt=%lx num_4k=%lx",
+                                    pginfo->page_cnt, pginfo->num_pages,
+                                    pginfo->page_4k_cnt, pginfo->num_4k);
                        ret = -EFAULT;
                        goto ehca_set_pagebuf_1_exit0;
                }
@@ -2099,10 +1870,10 @@ int ehca_set_pagebuf_1(struct ehca_mr *e
                *rpage = phys_to_abs((tmp_pbuf->addr & EHCA_PAGEMASK) +
                                     (pginfo->next_4k * EHCA_PAGESIZE));
                if ( !(*rpage) && tmp_pbuf->addr ) {
-                       EDEB_ERR(4, "tmp_pbuf->addr=%lx"
-                                " tmp_pbuf->size=%lx next_4k=%lx",
-                                tmp_pbuf->addr, tmp_pbuf->size,
-                                pginfo->next_4k);
+                       ehca_gen_err("tmp_pbuf->addr=%lx"
+                                    " tmp_pbuf->size=%lx next_4k=%lx",
+                                    tmp_pbuf->addr, tmp_pbuf->size,
+                                    pginfo->next_4k);
                        ret = -EFAULT;
                        goto ehca_set_pagebuf_1_exit0;
                }
@@ -2125,16 +1896,15 @@ int ehca_set_pagebuf_1(struct ehca_mr *e
                                   << PAGE_SHIFT);
                        *rpage = phys_to_abs(pgaddr +
                                             (pginfo->next_4k * EHCA_PAGESIZE));
-                       EDEB(9,"pgaddr=%lx *rpage=%lx next_4k=%lx", pgaddr,
-                            *rpage, pginfo->next_4k);
                        if ( !(*rpage) ) {
-                               EDEB_ERR(4, "pgaddr=%lx chunk->page_list[]=%lx "
-                                        "next_nmap=%lx next_4k=%lx mr=%p",
-                                        pgaddr, (u64)sg_dma_address(
-                                                &chunk->page_list[
- pginfo->next_nmap]),
-                                        pginfo->next_nmap, pginfo->next_4k,
-                                        e_mr);
+                               ehca_gen_err("pgaddr=%lx chunk->page_list[]=%lx"
+                                            " next_nmap=%lx next_4k=%lx mr=%p",
+                                            pgaddr, (u64)sg_dma_address(
+                                                    &chunk->page_list[
+                                                            pginfo->
+                                                            next_nmap]),
+                                            pginfo->next_nmap, pginfo->next_4k,
+                                            e_mr);
                                ret = -EFAULT;
                                goto ehca_set_pagebuf_1_exit0;
                        }
@@ -2161,9 +1931,10 @@ int ehca_set_pagebuf_1(struct ehca_mr *e
                *rpage = phys_to_abs((*fmrlist & EHCA_PAGEMASK) +
                                     pginfo->next_4k * EHCA_PAGESIZE);
                if ( !(*rpage) ) {
-                       EDEB_ERR(4, "*fmrlist=%lx fmrlist=%p next_listelem=%lx "
-                                "next_4k=%lx", *fmrlist, fmrlist,
-                                pginfo->next_listelem, pginfo->next_4k);
+                       ehca_gen_err("*fmrlist=%lx fmrlist=%p "
+                                    "next_listelem=%lx next_4k=%lx",
+                                    *fmrlist, fmrlist, pginfo->next_listelem,
+                                    pginfo->next_4k);
                        ret = -EFAULT;
                        goto ehca_set_pagebuf_1_exit0;
                }
@@ -2176,32 +1947,22 @@ int ehca_set_pagebuf_1(struct ehca_mr *e
                        pginfo->next_4k = 0;
                }
        } else {
-               EDEB_ERR(4, "bad pginfo->type=%x", pginfo->type);
+               ehca_gen_err("bad pginfo->type=%x", pginfo->type);
                ret = -EFAULT;
                goto ehca_set_pagebuf_1_exit0;
        }
 
 ehca_set_pagebuf_1_exit0:
        if (ret)
-               EDEB_EX(4, "ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx "
-                       "num_4k=%lx next_buf=%lx next_4k=%lx rpage=%p "
-                       "page_cnt=%lx page_4k_cnt=%lx next_listelem=%lx "
-                       "region=%p next_chunk=%p next_nmap=%lx", ret, e_mr,
-                       pginfo, pginfo->type, pginfo->num_pages, pginfo->num_4k,
-                       pginfo->next_buf, pginfo->next_4k, rpage,
-                       pginfo->page_cnt, pginfo->page_4k_cnt,
-                       pginfo->next_listelem, pginfo->region,
-                       pginfo->next_chunk, pginfo->next_nmap);
-       else
-               EDEB_EX(7, "ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx "
-                       "num_4k=%lx next_buf=%lx next_4k=%lx rpage=%p "
-                       "page_cnt=%lx page_4k_cnt=%lx next_listelem=%lx "
-                       "region=%p next_chunk=%p next_nmap=%lx", ret, e_mr,
-                       pginfo, pginfo->type, pginfo->num_pages, pginfo->num_4k,
-                       pginfo->next_buf, pginfo->next_4k, rpage,
-                       pginfo->page_cnt, pginfo->page_4k_cnt,
-                       pginfo->next_listelem, pginfo->region,
-                       pginfo->next_chunk, pginfo->next_nmap);
+               ehca_gen_err("ret=%x e_mr=%p pginfo=%p type=%x num_pages=%lx "
+                            "num_4k=%lx next_buf=%lx next_4k=%lx rpage=%p "
+                            "page_cnt=%lx page_4k_cnt=%lx next_listelem=%lx "
+                            "region=%p next_chunk=%p next_nmap=%lx", ret, e_mr,
+                            pginfo, pginfo->type, pginfo->num_pages,
+                            pginfo->num_4k, pginfo->next_buf, pginfo->next_4k,
+                            rpage, pginfo->page_cnt, pginfo->page_4k_cnt,
+                            pginfo->next_listelem, pginfo->region,
+                            pginfo->next_chunk, pginfo->next_nmap);
        return ret;
 } /* end ehca_set_pagebuf_1() */
 
@@ -2217,7 +1978,7 @@ int ehca_mr_is_maxmr(u64 size,
        /* a MR is treated as max-MR only if it fits following: */
        if ((size == ((u64)high_memory - PAGE_OFFSET)) &&
            (iova_start == (void*)KERNELBASE)) {
-               EDEB(6, "this is a max-MR");
+               ehca_gen_dbg("this is a max-MR");
                return 1;
        } else
                return 0;
@@ -2470,3 +2231,31 @@ void ehca_mr_deletenew(struct ehca_mr *m
        mr->nr_of_pages   = 0;
        mr->pagearray     = NULL;
 } /* end ehca_mr_deletenew() */
+
+int ehca_init_mrmw_cache(void)
+{
+       mr_cache = kmem_cache_create("ehca_cache_mr",
+                                    sizeof(struct ehca_mr), 0,
+                                    SLAB_HWCACHE_ALIGN,
+                                    NULL, NULL);
+       if (!mr_cache)
+               return -ENOMEM;
+       mw_cache = kmem_cache_create("ehca_cache_mw",
+                                    sizeof(struct ehca_mw), 0,
+                                    SLAB_HWCACHE_ALIGN,
+                                    NULL, NULL);
+       if (!mw_cache) {
+               kmem_cache_destroy(mr_cache);
+               mr_cache = NULL;
+               return -ENOMEM;
+       }
+       return 0;
+}
+
+void ehca_cleanup_mrmw_cache(void)
+{
+       if (mr_cache)
+               kmem_cache_destroy(mr_cache);
+       if (mw_cache)
+               kmem_cache_destroy(mw_cache);
+}
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_mrmw.h linux-2.6/drivers/infiniband/hw/ehca/ehca_mrmw.h
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_mrmw.h       2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_mrmw.h    2006-08-30 20:00:16.000000000 +0200
@@ -42,9 +42,6 @@
 #ifndef _EHCA_MRMW_H_
 #define _EHCA_MRMW_H_
 
-#undef DEB_PREFIX
-#define DEB_PREFIX "mrmw"
-
 int ehca_reg_mr(struct ehca_shca *shca,
                struct ehca_mr *e_mr,
                u64 *iova_start,
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_pd.c linux-2.6/drivers/infiniband/hw/ehca/ehca_pd.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_pd.c 2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_pd.c      2006-08-30 20:00:16.000000000 +0200
@@ -38,29 +38,22 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-
-#define DEB_PREFIX "vpd "
-
 #include <asm/current.h>
 
 #include "ehca_tools.h"
 #include "ehca_iverbs.h"
 
+static struct kmem_cache *pd_cache;
+
 struct ib_pd *ehca_alloc_pd(struct ib_device *device,
                            struct ib_ucontext *context, struct ib_udata *udata)
 {
-       extern struct ehca_module ehca_module;
-       struct ib_pd *mypd = NULL;
-       struct ehca_pd *pd = NULL;
-
-       EDEB_EN(7, "device=%p context=%p udata=%p", device, context, udata);
+       struct ehca_pd *pd;
 
-       EHCA_CHECK_DEVICE_P(device);
-
-       pd = kmem_cache_alloc(ehca_module.cache_pd, SLAB_KERNEL);
+       pd = kmem_cache_alloc(pd_cache, SLAB_KERNEL);
        if (!pd) {
-               EDEB_ERR(4, "ERROR device=%p context=%p pd=%p"
-                        " out of memory", device, context, mypd);
+               ehca_err(device, "device=%p context=%p out of memory",
+                        device, context);
                return ERR_PTR(-ENOMEM);
        }
 
@@ -82,39 +75,40 @@ struct ib_pd *ehca_alloc_pd(struct ib_de
        } else
                pd->fw_pd.value = (u64)pd;
 
-       mypd = &pd->ib_pd;
-
-       EHCA_REGISTER_PD(device, pd);
-
-       EDEB_EX(7, "device=%p context=%p pd=%p", device, context, mypd);
-
-       return mypd;
+       return &pd->ib_pd;
 }
 
 int ehca_dealloc_pd(struct ib_pd *pd)
 {
-       extern struct ehca_module ehca_module;
-       int ret = 0;
        u32 cur_pid = current->tgid;
-       struct ehca_pd *my_pd = NULL;
+       struct ehca_pd *my_pd = container_of(pd, struct ehca_pd, ib_pd);
 
-       EDEB_EN(7, "pd=%p", pd);
-
-       EHCA_CHECK_PD(pd);
-       my_pd = container_of(pd, struct ehca_pd, ib_pd);
        if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context &&
            my_pd->ownpid != cur_pid) {
-               EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+               ehca_err(pd->device, "Invalid caller pid=%x ownpid=%x",
                         cur_pid, my_pd->ownpid);
                return -EINVAL;
        }
 
-       EHCA_DEREGISTER_PD(pd);
-
-       kmem_cache_free(ehca_module.cache_pd,
+       kmem_cache_free(pd_cache,
                        container_of(pd, struct ehca_pd, ib_pd));
 
-       EDEB_EX(7, "pd=%p", pd);
+       return 0;
+}
 
-       return ret;
+int ehca_init_pd_cache(void)
+{
+       pd_cache = kmem_cache_create("ehca_cache_pd",
+                                    sizeof(struct ehca_pd), 0,
+                                    SLAB_HWCACHE_ALIGN,
+                                    NULL, NULL);
+       if (!pd_cache)
+               return -ENOMEM;
+       return 0;
+}
+
+void ehca_cleanup_pd_cache(void)
+{
+       if (pd_cache)
+               kmem_cache_destroy(pd_cache);
 }
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_qp.c linux-2.6/drivers/infiniband/hw/ehca/ehca_qp.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_qp.c 2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_qp.c      2006-08-30 20:00:16.000000000 +0200
@@ -42,8 +42,6 @@
  */
 
 
-#define DEB_PREFIX "e_qp"
-
 #include <asm/current.h>
 
 #include "ehca_classes.h"
@@ -53,6 +51,8 @@
 #include "hcp_if.h"
 #include "hipz_fns.h"
 
+static struct kmem_cache *qp_cache;
+
 /*
  * attributes not supported by query qp
  */
@@ -114,7 +114,7 @@ static inline enum ehca_qp_state ib2ehca
        case IB_QPS_ERR:
                return EHCA_QPS_ERR;
        default:
-               EDEB_ERR(4, "invalid ib_qp_state=%x", ib_qp_state);
+               ehca_gen_err("invalid ib_qp_state=%x", ib_qp_state);
                return -EINVAL;
        }
 }
@@ -142,7 +142,7 @@ static inline enum ib_qp_state ehca2ib_q
        case EHCA_QPS_ERR:
                return IB_QPS_ERR;
        default:
-               EDEB_ERR(4,"invalid ehca_qp_state=%x",ehca_qp_state);
+               ehca_gen_err("invalid ehca_qp_state=%x", ehca_qp_state);
                return -EINVAL;
        }
 }
@@ -176,7 +176,7 @@ static inline enum ehca_qp_type ib2ehcaq
        case IB_QPT_UD:
                return QPT_UD;
        default:
-               EDEB_ERR(4,"Invalid ibqptype=%x", ibqptype);
+               ehca_gen_err("Invalid ibqptype=%x", ibqptype);
                return -EINVAL;
        }
 }
@@ -190,24 +190,34 @@ static inline enum ib_qp_statetrans get_
                index = IB_QPST_ANY2RESET;
                break;
        case IB_QPS_INIT:
-               if (ib_fromstate == IB_QPS_RESET)
+               switch (ib_fromstate) {
+               case IB_QPS_RESET:
                        index = IB_QPST_RESET2INIT;
-               else if (ib_fromstate == IB_QPS_INIT)
+                       break;
+               case IB_QPS_INIT:
                        index = IB_QPST_INIT2INIT;
+                       break;
+               }
                break;
        case IB_QPS_RTR:
                if (ib_fromstate == IB_QPS_INIT)
                        index = IB_QPST_INIT2RTR;
                break;
        case IB_QPS_RTS:
-               if (ib_fromstate == IB_QPS_RTR)
+               switch (ib_fromstate) {
+               case IB_QPS_RTR:
                        index = IB_QPST_RTR2RTS;
-               else if (ib_fromstate == IB_QPS_RTS)
+                       break;
+               case IB_QPS_RTS:
                        index = IB_QPST_RTS2RTS;
-               else if (ib_fromstate == IB_QPS_SQD)
+                       break;
+               case IB_QPS_SQD:
                        index = IB_QPST_SQD2RTS;
-               else if (ib_fromstate == IB_QPS_SQE)
+                       break;
+               case IB_QPS_SQE:
                        index = IB_QPST_SQE2RTS;
+                       break;
+               }
                break;
        case IB_QPS_SQD:
                if (ib_fromstate == IB_QPS_RTS)
@@ -252,7 +262,7 @@ static inline int ibqptype2servicetype(e
        case IB_QPT_RAW_ETY:
                return -EINVAL;
        default:
-               EDEB_ERR(4, "Invalid ibqptype=%x", ibqptype);
+               ehca_gen_err("Invalid ibqptype=%x", ibqptype);
                return -EINVAL;
        }
 }
@@ -260,7 +270,7 @@ static inline int ibqptype2servicetype(e
 /*
  * init_qp_queues initializes/constructs r/squeue and registers queue pages.
  */
-static inline int init_qp_queues(struct ipz_adapter_handle ipz_hca_handle,
+static inline int init_qp_queues(struct ehca_shca *shca,
                                 struct ehca_qp *my_qp,
                                 int nr_sq_pages,
                                 int nr_rq_pages,
@@ -268,28 +278,26 @@ static inline int init_qp_queues(struct 
                                 int rwqe_size,
                                 int nr_send_sges, int nr_receive_sges)
 {
-       int ret = -EINVAL;
-       int cnt = 0;
-       void *vpage = NULL;
-       u64 rpage = 0;
-       int ipz_rc = -1;
-       u64 h_ret = H_PARAMETER;
+       int ret, cnt, ipz_rc;
+       void *vpage;
+       u64 rpage, h_ret;
+       struct ib_device *ib_dev = &shca->ib_device;
+       struct ipz_adapter_handle ipz_hca_handle = shca->ipz_hca_handle;
 
        ipz_rc = ipz_queue_ctor(&my_qp->ipz_squeue,
                                nr_sq_pages,
                                EHCA_PAGESIZE, swqe_size, nr_send_sges);
        if (!ipz_rc) {
-               EDEB_ERR(4, "Cannot allocate page for squeue. ipz_rc=%x",
+               ehca_err(ib_dev,"Cannot allocate page for squeue. ipz_rc=%x",
                         ipz_rc);
-               ret = -EBUSY;
-               return ret;
+               return -EBUSY;
        }
 
        ipz_rc = ipz_queue_ctor(&my_qp->ipz_rqueue,
                                nr_rq_pages,
                                EHCA_PAGESIZE, rwqe_size, nr_receive_sges);
        if (!ipz_rc) {
-               EDEB_ERR(4, "Cannot allocate page for rqueue. ipz_rc=%x",
+               ehca_err(ib_dev, "Cannot allocate page for rqueue. ipz_rc=%x",
                         ipz_rc);
                ret = -EBUSY;
                goto init_qp_queues0;
@@ -298,7 +306,7 @@ static inline int init_qp_queues(struct 
        for (cnt = 0; cnt < nr_sq_pages; cnt++) {
                vpage = ipz_qpageit_get_inc(&my_qp->ipz_squeue);
                if (!vpage) {
-                       EDEB_ERR(4, "SQ ipz_qpageit_get_inc() "
+                       ehca_err(ib_dev, "SQ ipz_qpageit_get_inc() "
                                 "failed p_vpage= %p", vpage);
                        ret = -EINVAL;
                        goto init_qp_queues1;
@@ -311,8 +319,8 @@ static inline int init_qp_queues(struct 
                                                 rpage, 1,
                                                 my_qp->galpas.kernel);
                if (h_ret < H_SUCCESS) {
-                       EDEB_ERR(4,"SQ  hipz_qp_register_rpage() faield "
-                                "rc=%lx", h_ret);
+                       ehca_err(ib_dev, "SQ hipz_qp_register_rpage()"
+                                " failed rc=%lx", h_ret);
                        ret = ehca2ib_return_code(h_ret);
                        goto init_qp_queues1;
                }
@@ -324,9 +332,8 @@ static inline int init_qp_queues(struct 
        for (cnt = 0; cnt < nr_rq_pages; cnt++) {
                vpage = ipz_qpageit_get_inc(&my_qp->ipz_rqueue);
                if (!vpage) {
-                       EDEB_ERR(4,"RQ ipz_qpageit_get_inc() "
+                       ehca_err(ib_dev, "RQ ipz_qpageit_get_inc() "
                                 "failed p_vpage = %p", vpage);
-                       h_ret = H_RESOURCE;
                        ret = -EINVAL;
                        goto init_qp_queues1;
                }
@@ -338,29 +345,28 @@ static inline int init_qp_queues(struct 
                                                 &my_qp->pf, 0, 1,
                                                 rpage, 1,my_qp->galpas.kernel);
                if (h_ret < H_SUCCESS) {
-                       EDEB_ERR(4, "RQ hipz_qp_register_rpage() failed "
+                       ehca_err(ib_dev, "RQ hipz_qp_register_rpage() failed "
                                 "rc=%lx", h_ret);
                        ret = ehca2ib_return_code(h_ret);
                        goto init_qp_queues1;
                }
                if (cnt == (nr_rq_pages - 1)) { /* last page! */
                        if (h_ret != H_SUCCESS) {
-                               EDEB_ERR(4,"RQ hipz_qp_register_rpage() "
+                               ehca_err(ib_dev, "RQ hipz_qp_register_rpage() "
                                         "h_ret= %lx ", h_ret);
                                ret = ehca2ib_return_code(h_ret);
                                goto init_qp_queues1;
                        }
                        vpage = ipz_qpageit_get_inc(&my_qp->ipz_rqueue);
                        if (vpage) {
-                               EDEB_ERR(4,"ipz_qpageit_get_inc() "
-                                        "should not succeed vpage=%p",
-                                        vpage);
+                               ehca_err(ib_dev, "ipz_qpageit_get_inc() "
+                                        "should not succeed vpage=%p", vpage);
                                ret = -EINVAL;
                                goto init_qp_queues1;
                        }
                } else {
                        if (h_ret != H_PAGE_REGISTERED) {
-                               EDEB_ERR(4,"RQ hipz_qp_register_rpage() "
+                               ehca_err(ib_dev, "RQ hipz_qp_register_rpage() "
                                         "h_ret= %lx ", h_ret);
                                ret = ehca2ib_return_code(h_ret);
                                goto init_qp_queues1;
@@ -379,37 +385,30 @@ init_qp_queues0:
        return ret;
 }
 
-
 struct ib_qp *ehca_create_qp(struct ib_pd *pd,
                             struct ib_qp_init_attr *init_attr,
                             struct ib_udata *udata)
 {
-       extern struct ehca_module ehca_module;
-       static int da_msg_size[]={ 128, 256, 512, 1024, 2048, 4096 };
-       int ret = -EINVAL;
-
-       struct ehca_qp *my_qp = NULL;
-       struct ehca_pd *my_pd = NULL;
-       struct ehca_shca *shca = NULL;
+       static int da_rc_msg_size[]={ 128, 256, 512, 1024, 2048, 4096 };
+       static int da_ud_sq_msg_size[]={ 128, 384, 896, 1920, 3968 };
+       struct ehca_qp *my_qp;
+       struct ehca_pd *my_pd = container_of(pd, struct ehca_pd, ib_pd);
+       struct ehca_shca *shca = container_of(pd->device, struct ehca_shca,
+                                             ib_device);
        struct ib_ucontext *context = NULL;
-       u64 h_ret = H_PARAMETER;
-       int max_send_sge;
-       int max_recv_sge;
+       u64 h_ret;
+       int max_send_sge, max_recv_sge, ret;
 
        /* h_call's out parameters */
        struct ehca_alloc_qp_parms parms;
-       u32 qp_nr = 0, swqe_size = 0, rwqe_size = 0;
+       u32 swqe_size = 0, rwqe_size = 0;
        u8 daqp_completion, isdaqp;
        unsigned long flags;
 
-       EDEB_EN(7,"pd=%p init_attr=%p", pd, init_attr);
-       EHCA_CHECK_PD_P(pd);
-       EHCA_CHECK_ADR_P(init_attr);
-
        if (init_attr->sq_sig_type != IB_SIGNAL_REQ_WR &&
                init_attr->sq_sig_type != IB_SIGNAL_ALL_WR) {
-               EDEB_ERR(4, "init_attr->sg_sig_type=%x not allowed",
-                       init_attr->sq_sig_type);
+               ehca_err(pd->device, "init_attr->sg_sig_type=%x not allowed",
+                        init_attr->sq_sig_type);
                return ERR_PTR(-EINVAL);
        }
 
@@ -424,20 +423,36 @@ struct ib_qp *ehca_create_qp(struct ib_p
            init_attr->qp_type != IB_QPT_GSI &&
            init_attr->qp_type != IB_QPT_UC &&
            init_attr->qp_type != IB_QPT_RC) {
-               EDEB_ERR(4,"wrong QP Type=%x",init_attr->qp_type);
+               ehca_err(pd->device, "wrong QP Type=%x", init_attr->qp_type);
                return ERR_PTR(-EINVAL);
        }
-       if (init_attr->qp_type != IB_QPT_RC && isdaqp != 0) {
-               EDEB_ERR(4,"unsupported LL QP Type=%x",init_attr->qp_type);
+       if ((init_attr->qp_type != IB_QPT_RC && init_attr->qp_type != IB_QPT_UD)
+           && isdaqp) {
+               ehca_err(pd->device, "unsupported LL QP Type=%x",
+                        init_attr->qp_type);
+               return ERR_PTR(-EINVAL);
+       } else if (init_attr->qp_type == IB_QPT_RC && isdaqp &&
+                  (init_attr->cap.max_send_wr > 255 ||
+                   init_attr->cap.max_recv_wr > 255 )) {
+                      ehca_err(pd->device, "Invalid Number of max_sq_wr =%x "
+                               "or max_rq_wr=%x for QP Type=%x",
+                               init_attr->cap.max_send_wr,
+                               init_attr->cap.max_recv_wr,init_attr->qp_type);
+                      return ERR_PTR(-EINVAL);
+       } else if (init_attr->qp_type == IB_QPT_UD && isdaqp &&
+                 init_attr->cap.max_send_wr > 255) {
+               ehca_err(pd->device,
+                        "Invalid Number of max_send_wr=%x for UD QP_TYPE=%x",
+                        init_attr->cap.max_send_wr, init_attr->qp_type);
                return ERR_PTR(-EINVAL);
        }
 
        if (pd->uobject && udata)
                context = pd->uobject->context;
 
-       my_qp = kmem_cache_alloc(ehca_module.cache_qp, SLAB_KERNEL);
+       my_qp = kmem_cache_alloc(qp_cache, SLAB_KERNEL);
        if (!my_qp) {
-               EDEB_ERR(4, "pd=%p not enough memory to alloc qp", pd);
+               ehca_err(pd->device, "pd=%p not enough memory to alloc qp", pd);
                return ERR_PTR(-ENOMEM);
        }
 
@@ -446,9 +461,6 @@ struct ib_qp *ehca_create_qp(struct ib_p
        spin_lock_init(&my_qp->spinlock_s);
        spin_lock_init(&my_qp->spinlock_r);
 
-       my_pd = container_of(pd, struct ehca_pd, ib_pd);
-
-       shca = container_of(pd->device, struct ehca_shca, ib_device);
        my_qp->recv_cq =
                container_of(init_attr->recv_cq, struct ehca_cq, ib_cq);
        my_qp->send_cq =
@@ -459,7 +471,7 @@ struct ib_qp *ehca_create_qp(struct ib_p
        do {
                if (!idr_pre_get(&ehca_qp_idr, GFP_KERNEL)) {
                        ret = -ENOMEM;
-                       EDEB_ERR(4, "Can't reserve idr resources.");
+                       ehca_err(pd->device, "Can't reserve idr resources.");
                        goto create_qp_exit0;
                }
 
@@ -471,14 +483,14 @@ struct ib_qp *ehca_create_qp(struct ib_p
 
        if (ret) {
                ret = -ENOMEM;
-               EDEB_ERR(4, "Can't allocate new idr entry.");
+               ehca_err(pd->device, "Can't allocate new idr entry.");
                goto create_qp_exit0;
        }
 
        parms.servicetype = ibqptype2servicetype(init_attr->qp_type);
        if (parms.servicetype < 0) {
                ret = -EINVAL;
-               EDEB_ERR(4, "Invalid qp_type=%x", init_attr->qp_type);
+               ehca_err(pd->device, "Invalid qp_type=%x", init_attr->qp_type);
                goto create_qp_exit0;
        }
 
@@ -497,8 +509,6 @@ struct ib_qp *ehca_create_qp(struct ib_p
                max_recv_sge += 2;
        }
 
-       EDEB(7, "isdaqp=%x daqp_completion=%x", isdaqp, daqp_completion);
-
        parms.ipz_eq_handle = shca->eq.ipz_eq_handle;
        parms.daqp_ctrl = isdaqp | daqp_completion;
        parms.pd = my_pd->fw_pd;
@@ -508,7 +518,8 @@ struct ib_qp *ehca_create_qp(struct ib_p
        h_ret = hipz_h_alloc_resource_qp(shca->ipz_hca_handle, my_qp, &parms);
 
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4, "h_alloc_resource_qp() failed h_ret=%lx", h_ret);
+               ehca_err(pd->device, "h_alloc_resource_qp() failed h_ret=%lx",
+                        h_ret);
                ret = ehca2ib_return_code(h_ret);
                goto create_qp_exit1;
        }
@@ -521,8 +532,8 @@ struct ib_qp *ehca_create_qp(struct ib_p
                        rwqe_size = offsetof(struct ehca_wqe, u.nud.sg_list[
                                             (parms.act_nr_recv_sges)]);
                } else { /* for daqp we need to use msg size, not wqe size */
-                       swqe_size = da_msg_size[max_send_sge];
-                       rwqe_size = da_msg_size[max_recv_sge];
+                       swqe_size = da_rc_msg_size[max_send_sge];
+                       rwqe_size = da_rc_msg_size[max_recv_sge];
                        parms.act_nr_send_sges = 1;
                        parms.act_nr_recv_sges = 1;
                }
@@ -540,10 +551,17 @@ struct ib_qp *ehca_create_qp(struct ib_p
                /* UD circumvention */
                parms.act_nr_recv_sges -= 2;
                parms.act_nr_send_sges -= 2;
-                swqe_size = offsetof(struct ehca_wqe,
-                                     u.ud_av.sg_list[parms.act_nr_send_sges]);
-               rwqe_size = offsetof(struct ehca_wqe,
-                                    u.ud_av.sg_list[parms.act_nr_recv_sges]);
+               if (isdaqp) {
+                       swqe_size = da_ud_sq_msg_size[max_send_sge];
+                       rwqe_size = da_rc_msg_size[max_recv_sge];
+                       parms.act_nr_send_sges = 1;
+                       parms.act_nr_recv_sges = 1;
+               } else {
+                       swqe_size = offsetof(struct ehca_wqe,
+                                            u.ud_av.sg_list[parms.act_nr_send_sges]);
+                       rwqe_size = offsetof(struct ehca_wqe,
+                                            u.ud_av.sg_list[parms.act_nr_recv_sges]);
+               }
 
                if (IB_QPT_GSI == init_attr->qp_type ||
                    IB_QPT_SMI == init_attr->qp_type) {
@@ -562,13 +580,13 @@ struct ib_qp *ehca_create_qp(struct ib_p
        }
 
        /* initializes r/squeue and registers queue pages */
-       ret = init_qp_queues(shca->ipz_hca_handle, my_qp,
+       ret = init_qp_queues(shca, my_qp,
                             parms.nr_sq_pages, parms.nr_rq_pages,
                             swqe_size, rwqe_size,
                             parms.act_nr_send_sges, parms.act_nr_recv_sges);
        if (ret) {
-               EDEB_ERR(4,"Couldn't initialize r/squeue and pages ret=%x",
-                        ret);
+               ehca_err(pd->device,
+                        "Couldn't initialize r/squeue and pages ret=%x", ret);
                goto create_qp_exit2;
        }
 
@@ -597,7 +615,8 @@ struct ib_qp *ehca_create_qp(struct ib_p
        if (init_attr->qp_type == IB_QPT_GSI) {
                h_ret = ehca_define_sqp(shca, my_qp, init_attr);
                if (h_ret != H_SUCCESS) {
-                       EDEB_ERR(4, "ehca_define_sqp() failed rc=%lx",h_ret);
+                       ehca_err(pd->device, "ehca_define_sqp() failed rc=%lx",
+                                h_ret);
                        ret = ehca2ib_return_code(h_ret);
                        goto create_qp_exit3;
                }
@@ -607,7 +626,7 @@ struct ib_qp *ehca_create_qp(struct ib_p
                                                  struct ehca_cq, ib_cq);
                ret = ehca_cq_assign_qp(cq, my_qp);
                if (ret) {
-                       EDEB_ERR(4, "Couldn't assign qp to send_cq ret=%x",
+                       ehca_err(pd->device, "Couldn't assign qp to send_cq ret=%x",
                                 ret);
                        goto create_qp_exit3;
                }
@@ -637,7 +656,7 @@ struct ib_qp *ehca_create_qp(struct ib_p
                                       (void**)&resp.ipz_rqueue.queue,
                                       &vma);
                if (ret) {
-                       EDEB_ERR(4, "Could not mmap rqueue pages");
+                       ehca_err(pd->device, "Could not mmap rqueue pages");
                        goto create_qp_exit3;
                }
                my_qp->uspace_rqueue = resp.ipz_rqueue.queue;
@@ -652,7 +671,7 @@ struct ib_qp *ehca_create_qp(struct ib_p
                                       (void**)&resp.ipz_squeue.queue,
                                       &vma);
                if (ret) {
-                       EDEB_ERR(4, "Could not mmap squeue pages");
+                       ehca_err(pd->device, "Could not mmap squeue pages");
                        goto create_qp_exit4;
                }
                my_qp->uspace_squeue = resp.ipz_squeue.queue;
@@ -662,20 +681,18 @@ struct ib_qp *ehca_create_qp(struct ib_p
                                         (void**)&resp.galpas.kernel.fw_handle,
                                         &vma);
                if (ret) {
-                       EDEB_ERR(4, "Could not mmap fw_handle");
+                       ehca_err(pd->device, "Could not mmap fw_handle");
                        goto create_qp_exit5;
                }
                my_qp->uspace_fwh = (u64)resp.galpas.kernel.fw_handle;
 
                if (ib_copy_to_udata(udata, &resp, sizeof resp)) {
-                       EDEB_ERR(4, "Copy to udata failed");
+                       ehca_err(pd->device, "Copy to udata failed");
                        ret = -EINVAL;
                        goto create_qp_exit6;
                }
        }
 
-       EDEB_EX(7, "ehca_qp=%p qp_num=%x, token=%x",
-               my_qp, qp_nr, my_qp->token);
        return &my_qp->ib_qp;
 
 create_qp_exit6:
@@ -700,10 +717,8 @@ create_qp_exit1:
        spin_unlock_irqrestore(&ehca_qp_idr_lock, flags);
 
 create_qp_exit0:
-       kmem_cache_free(ehca_module.cache_qp, my_qp);
-       EDEB_EX(4, "failed ret=%x", ret);
+       kmem_cache_free(qp_cache, my_qp);
        return ERR_PTR(ret);
-
 }
 
 /*
@@ -714,48 +729,45 @@ create_qp_exit0:
 static int prepare_sqe_rts(struct ehca_qp *my_qp, struct ehca_shca *shca,
                           int *bad_wqe_cnt)
 {
-       int ret = 0;
-       u64 h_ret = H_SUCCESS;
-       struct ipz_queue *squeue = NULL;
-       void *bad_send_wqe_p = NULL;
-       void *bad_send_wqe_v = NULL;
-       void *squeue_start_p = NULL;
-       void *squeue_end_p = NULL;
-       void *squeue_start_v = NULL;
-       void *squeue_end_v = NULL;
-       struct ehca_wqe *wqe = NULL;
+       u64 h_ret;
+       struct ipz_queue *squeue;
+       void *bad_send_wqe_p, *bad_send_wqe_v;
+       void *squeue_start_p, *squeue_end_p;
+       void *squeue_start_v, *squeue_end_v;
+       struct ehca_wqe *wqe;
        int qp_num = my_qp->ib_qp.qp_num;
 
-       EDEB_EN(7, "ehca_qp=%p qp_num=%x ", my_qp, qp_num);
-
        /* get send wqe pointer */
        h_ret = hipz_h_disable_and_get_wqe(shca->ipz_hca_handle,
                                           my_qp->ipz_qp_handle, &my_qp->pf,
                                           &bad_send_wqe_p, NULL, 2);
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4, "hipz_h_disable_and_get_wqe() failed "
-                        "ehca_qp=%p qp_num=%x h_ret=%lx",my_qp, qp_num, h_ret);
-               ret = ehca2ib_return_code(h_ret);
-               goto prepare_sqe_rts_exit1;
+               ehca_err(&shca->ib_device, "hipz_h_disable_and_get_wqe() failed"
+                        " ehca_qp=%p qp_num=%x h_ret=%lx",
+                        my_qp, qp_num, h_ret);
+               return ehca2ib_return_code(h_ret);
        }
        bad_send_wqe_p = (void*)((u64)bad_send_wqe_p & (~(1L<<63)));
-       EDEB(7, "qp_num=%x bad_send_wqe_p=%p", qp_num, bad_send_wqe_p);
+       ehca_dbg(&shca->ib_device, "qp_num=%x bad_send_wqe_p=%p",
+                qp_num, bad_send_wqe_p);
        /* convert wqe pointer to vadr */
        bad_send_wqe_v = abs_to_virt((u64)bad_send_wqe_p);
-       EDEB_DMP(6, bad_send_wqe_v, 32, "qp_num=%x bad_wqe", qp_num);
+       if (ehca_debug_level)
+               ehca_dmp(bad_send_wqe_v, 32, "qp_num=%x bad_wqe", qp_num);
        squeue = &my_qp->ipz_squeue;
        squeue_start_p = (void*)virt_to_abs(ipz_qeit_calc(squeue, 0L));
        squeue_end_p = squeue_start_p+squeue->queue_length;
        squeue_start_v = abs_to_virt((u64)squeue_start_p);
        squeue_end_v = abs_to_virt((u64)squeue_end_p);
-       EDEB(6, "qp_num=%x squeue_start_v=%p squeue_end_v=%p",
-            qp_num, squeue_start_v, squeue_end_v);
+       ehca_dbg(&shca->ib_device, "qp_num=%x squeue_start_v=%p squeue_end_v=%p",
+                qp_num, squeue_start_v, squeue_end_v);
 
        /* loop sets wqe's purge bit */
        wqe = (struct ehca_wqe*)bad_send_wqe_v;
        *bad_wqe_cnt = 0;
        while (wqe->optype != 0xff && wqe->wqef != 0xff) {
-               EDEB_DMP(6, wqe, 32, "qp_num=%x wqe", qp_num);
+               if (ehca_debug_level)
+                       ehca_dmp(wqe, 32, "qp_num=%x wqe", qp_num);
                wqe->nr_of_data_seg = 0; /* suppress data access */
                wqe->wqef = WQEF_PURGE; /* WQE to be purged */
                wqe = (struct ehca_wqe*)((u8*)wqe+squeue->qe_size);
@@ -768,13 +780,11 @@ static int prepare_sqe_rts(struct ehca_q
         * bad wqe will be reprocessed and ignored when pol_cq() is called,
         *  i.e. nr of wqes with flush error status is one less
         */
-       EDEB(6, "qp_num=%x flusherr_wqe_cnt=%x", qp_num, (*bad_wqe_cnt)-1);
+       ehca_dbg(&shca->ib_device, "qp_num=%x flusherr_wqe_cnt=%x",
+                qp_num, (*bad_wqe_cnt)-1);
        wqe->wqef = 0;
 
-prepare_sqe_rts_exit1:
-
-       EDEB_EX(7, "ehca_qp=%p qp_num=%x ret=%x", my_qp, qp_num, ret);
-       return ret;
+       return 0;
 }
 
 /*
@@ -787,34 +797,25 @@ static int internal_modify_qp(struct ib_
                              struct ib_qp_attr *attr,
                              int attr_mask, int smi_reset2init)
 {
-       enum ib_qp_state qp_cur_state = 0, qp_new_state = 0;
-       int cnt = 0, qp_attr_idx = 0, ret = 0;
-
+       enum ib_qp_state qp_cur_state, qp_new_state;
+       int cnt, qp_attr_idx, ret = 0;
        enum ib_qp_statetrans statetrans;
-       struct hcp_modify_qp_control_block *mqpcb = NULL;
-       struct ehca_qp *my_qp = NULL;
-       struct ehca_shca *shca = NULL;
-       u64 update_mask = 0;
-       u64 h_ret = H_SUCCESS;
+       struct hcp_modify_qp_control_block *mqpcb;
+       struct ehca_qp *my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
+       struct ehca_shca *shca =
+               container_of(ibqp->pd->device, struct ehca_shca, ib_device);
+       u64 update_mask;
+       u64 h_ret;
        int bad_wqe_cnt = 0;
        int squeue_locked = 0;
        unsigned long spl_flags = 0;
 
-       my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
-       shca = container_of(ibqp->pd->device, struct ehca_shca, ib_device);
-
-       EDEB_EN(7, "ehca_qp=%p qp_num=%x ibqp_type=%x "
-               "new qp_state=%x attribute_mask=%x",
-               my_qp, ibqp->qp_num, ibqp->qp_type,
-               attr->qp_state, attr_mask);
-
        /* do query_qp to obtain current attr values */
        mqpcb = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL);
        if (mqpcb == NULL) {
-               ret = -ENOMEM;
-               EDEB_ERR(4, "Could not get zeroed page for mqpcb "
+               ehca_err(ibqp->device, "Could not get zeroed page for mqpcb "
                         "ehca_qp=%p qp_num=%x ", my_qp, ibqp->qp_num);
-               goto modify_qp_exit0;
+               return -ENOMEM;
        }
 
        h_ret = hipz_h_query_qp(shca->ipz_hca_handle,
@@ -822,20 +823,18 @@ static int internal_modify_qp(struct ib_
                                &my_qp->pf,
                                mqpcb, my_qp->galpas.kernel);
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4, "hipz_h_query_qp() failed "
+               ehca_err(ibqp->device, "hipz_h_query_qp() failed "
                         "ehca_qp=%p qp_num=%x h_ret=%lx",
                         my_qp, ibqp->qp_num, h_ret);
                ret = ehca2ib_return_code(h_ret);
                goto modify_qp_exit1;
        }
-       EDEB(7, "ehca_qp=%p qp_num=%x ehca_qp_state=%x",
-            my_qp, ibqp->qp_num, mqpcb->qp_state);
 
        qp_cur_state = ehca2ib_qp_state(mqpcb->qp_state);
 
        if (qp_cur_state == -EINVAL) {  /* invalid qp state */
                ret = -EINVAL;
-               EDEB_ERR(4, "Invalid current ehca_qp_state=%x "
+               ehca_err(ibqp->device, "Invalid current ehca_qp_state=%x "
                         "ehca_qp=%p qp_num=%x",
                         mqpcb->qp_state, my_qp, ibqp->qp_num);
                goto modify_qp_exit1;
@@ -860,37 +859,38 @@ static int internal_modify_qp(struct ib_
                int smirc = internal_modify_qp(
                        ibqp, &smiqp_attr, smiqp_attr_mask, 1);
                if (smirc) {
-                       EDEB_ERR(4, "SMI RESET -> INIT failed. "
+                       ehca_err(ibqp->device, "SMI RESET -> INIT failed. "
                                 "ehca_modify_qp() rc=%x", smirc);
                        ret = H_PARAMETER;
                        goto modify_qp_exit1;
                }
                qp_cur_state = IB_QPS_INIT;
-               EDEB(7, "SMI RESET -> INIT succeeded");
+               ehca_dbg(ibqp->device, "SMI RESET -> INIT succeeded");
        }
        /* is transmitted current state  equal to "real" current state */
        if ((attr_mask & IB_QP_CUR_STATE) &&
            qp_cur_state != attr->cur_qp_state) {
                ret = -EINVAL;
-               EDEB_ERR(4, "Invalid IB_QP_CUR_STATE attr->curr_qp_state=%x <>"
+               ehca_err(ibqp->device,
+                        "Invalid IB_QP_CUR_STATE attr->curr_qp_state=%x <>"
                         " actual cur_qp_state=%x. ehca_qp=%p qp_num=%x",
                         attr->cur_qp_state, qp_cur_state, my_qp, ibqp->qp_num);
                goto modify_qp_exit1;
        }
 
-       EDEB(7, "ehca_qp=%p qp_num=%x current qp_state=%x "
-            "new qp_state=%x attribute_mask=%x",
-            my_qp, ibqp->qp_num, qp_cur_state, attr->qp_state, attr_mask);
+       ehca_dbg(ibqp->device,"ehca_qp=%p qp_num=%x current qp_state=%x "
+                "new qp_state=%x attribute_mask=%x",
+                my_qp, ibqp->qp_num, qp_cur_state, attr->qp_state, attr_mask);
 
        qp_new_state = attr_mask & IB_QP_STATE ? attr->qp_state : qp_cur_state;
        if (!smi_reset2init &&
            !ib_modify_qp_is_ok(qp_cur_state, qp_new_state, ibqp->qp_type,
                                attr_mask)) {
                ret = -EINVAL;
-               EDEB_ERR(4, "Invalid qp transition new_state=%x cur_state=%x "
-                        "ehca_qp=%p qp_num=%x attr_mask=%x",
-                        qp_new_state, qp_cur_state, my_qp, ibqp->qp_num,
-                        attr_mask);
+               ehca_err(ibqp->device,
+                        "Invalid qp transition new_state=%x cur_state=%x "
+                        "ehca_qp=%p qp_num=%x attr_mask=%x", qp_new_state,
+                        qp_cur_state, my_qp, ibqp->qp_num, attr_mask);
                goto modify_qp_exit1;
        }
 
@@ -898,7 +898,7 @@ static int internal_modify_qp(struct ib_
                update_mask = EHCA_BMASK_SET(MQPCB_MASK_QP_STATE, 1);
        else {
                ret = -EINVAL;
-               EDEB_ERR(4, "Invalid new qp state=%x "
+               ehca_err(ibqp->device, "Invalid new qp state=%x "
                         "ehca_qp=%p qp_num=%x",
                         qp_new_state, my_qp, ibqp->qp_num);
                goto modify_qp_exit1;
@@ -908,10 +908,9 @@ static int internal_modify_qp(struct ib_
        statetrans = get_modqp_statetrans(qp_cur_state, qp_new_state);
        if (statetrans < 0) {
                ret = -EINVAL;
-               EDEB_ERR(4, "<INVALID STATE CHANGE> qp_cur_state=%x "
-                        "new_qp_state=%x State_xsition=%x "
-                        "ehca_qp=%p qp_num=%x",
-                        qp_cur_state, qp_new_state,
+               ehca_err(ibqp->device, "<INVALID STATE CHANGE> qp_cur_state=%x "
+                        "new_qp_state=%x State_xsition=%x ehca_qp=%p "
+                        "qp_num=%x", qp_cur_state, qp_new_state,
                         statetrans, my_qp, ibqp->qp_num);
                goto modify_qp_exit1;
        }
@@ -920,13 +919,15 @@ static int internal_modify_qp(struct ib_
 
        if (qp_attr_idx < 0) {
                ret = qp_attr_idx;
-               EDEB_ERR(4, "Invalid QP type=%x ehca_qp=%p qp_num=%x",
+               ehca_err(ibqp->device,
+                        "Invalid QP type=%x ehca_qp=%p qp_num=%x",
                         ibqp->qp_type, my_qp, ibqp->qp_num);
                goto modify_qp_exit1;
        }
 
-       EDEB(7, "ehca_qp=%p qp_num=%x <VALID STATE CHANGE> qp_state_xsit=%x",
-            my_qp, ibqp->qp_num, statetrans);
+       ehca_dbg(ibqp->device,
+                "ehca_qp=%p qp_num=%x <VALID STATE CHANGE> qp_state_xsit=%x",
+                my_qp, ibqp->qp_num, statetrans);
 
        /* sqe -> rts: set purge bit of bad wqe before actual trans */
        if ((my_qp->qp_type == IB_QPT_UD ||
@@ -935,7 +936,7 @@ static int internal_modify_qp(struct ib_
            statetrans == IB_QPST_SQE2RTS) {
                /* mark next free wqe if kernel */
                if (my_qp->uspace_squeue == 0) {
-                       struct ehca_wqe *wqe = NULL;
+                       struct ehca_wqe *wqe;
                        /* lock send queue */
                        spin_lock_irqsave(&my_qp->spinlock_s, spl_flags);
                        squeue_locked = 1;
@@ -943,12 +944,12 @@ static int internal_modify_qp(struct ib_
                        wqe = (struct ehca_wqe*)
                                ipz_qeit_get(&my_qp->ipz_squeue);
                        wqe->optype = wqe->wqef = 0xff;
-                       EDEB(7, "qp_num=%x next_free_wqe=%p",
-                            ibqp->qp_num, wqe);
+                       ehca_dbg(ibqp->device, "qp_num=%x next_free_wqe=%p",
+                                ibqp->qp_num, wqe);
                }
                ret = prepare_sqe_rts(my_qp, shca, &bad_wqe_cnt);
                if (ret) {
-                       EDEB_ERR(4, "prepare_sqe_rts() failed "
+                       ehca_err(ibqp->device, "prepare_sqe_rts() failed "
                                 "ehca_qp=%p qp_num=%x ret=%x",
                                 my_qp, ibqp->qp_num, ret);
                        goto modify_qp_exit2;
@@ -977,14 +978,11 @@ static int internal_modify_qp(struct ib_
        if (attr_mask & IB_QP_PKEY_INDEX) {
                mqpcb->prim_p_key_idx = attr->pkey_index;
                update_mask |= EHCA_BMASK_SET(MQPCB_MASK_PRIM_P_KEY_IDX, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x "
-                    "IB_QP_PKEY_INDEX update_mask=%lx",
-                    my_qp, ibqp->qp_num, update_mask);
        }
        if (attr_mask & IB_QP_PORT) {
                if (attr->port_num < 1 || attr->port_num > shca->num_ports) {
                        ret = -EINVAL;
-                       EDEB_ERR(4, "Invalid port=%x. "
+                       ehca_err(ibqp->device, "Invalid port=%x. "
                                 "ehca_qp=%p qp_num=%x num_ports=%x",
                                 attr->port_num, my_qp, ibqp->qp_num,
                                 shca->num_ports);
@@ -992,14 +990,10 @@ static int internal_modify_qp(struct ib_
                }
                mqpcb->prim_phys_port = attr->port_num;
                update_mask |= EHCA_BMASK_SET(MQPCB_MASK_PRIM_PHYS_PORT, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_PORT update_mask=%lx",
-                    my_qp, ibqp->qp_num, update_mask);
        }
        if (attr_mask & IB_QP_QKEY) {
                mqpcb->qkey = attr->qkey;
                update_mask |= EHCA_BMASK_SET(MQPCB_MASK_QKEY, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_QKEY update_mask=%lx",
-                    my_qp, ibqp->qp_num, update_mask);
        }
        if (attr_mask & IB_QP_AV) {
                int ah_mult = ib_rate_to_mult(attr->ah_attr.static_rate);
@@ -1013,18 +1007,12 @@ static int internal_modify_qp(struct ib_
                mqpcb->service_level = attr->ah_attr.sl;
                update_mask |= EHCA_BMASK_SET(MQPCB_MASK_SERVICE_LEVEL, 1);
 
-                if (ah_mult < ehca_mult)
+               if (ah_mult < ehca_mult)
                        mqpcb->max_static_rate = (ah_mult > 0) ?
                        ((ehca_mult - 1) / ah_mult) : 0;
                else
                        mqpcb->max_static_rate = 0;
 
-               EDEB(7, " ipd=mqpcb->max_static_rate set %x "
-                       " ah_mult=%x  ehca_mult=%x "
-                       " attr->ah_attr.static_rate=%x",
-                    mqpcb->max_static_rate,ah_mult,ehca_mult,
-                    attr->ah_attr.static_rate);
-
                update_mask |= EHCA_BMASK_SET(MQPCB_MASK_MAX_STATIC_RATE, 1);
 
                /*
@@ -1052,48 +1040,33 @@ static int internal_modify_qp(struct ib_
                        update_mask |=
                                EHCA_BMASK_SET(MQPCB_MASK_TRAFFIC_CLASS, 1);
                }
-
-               EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_AV update_mask=%lx",
-                    my_qp, ibqp->qp_num, update_mask);
        }
 
        if (attr_mask & IB_QP_PATH_MTU) {
                mqpcb->path_mtu = attr->path_mtu;
                update_mask |= EHCA_BMASK_SET(MQPCB_MASK_PATH_MTU, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_PATH_MTU update_mask=%lx",
-                    my_qp, ibqp->qp_num, update_mask);
        }
        if (attr_mask & IB_QP_TIMEOUT) {
                mqpcb->timeout = attr->timeout;
                update_mask |= EHCA_BMASK_SET(MQPCB_MASK_TIMEOUT, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_TIMEOUT update_mask=%lx",
-                    my_qp, ibqp->qp_num, update_mask);
        }
        if (attr_mask & IB_QP_RETRY_CNT) {
                mqpcb->retry_count = attr->retry_cnt;
                update_mask |= EHCA_BMASK_SET(MQPCB_MASK_RETRY_COUNT, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_RETRY_CNT update_mask=%lx",
-                    my_qp, ibqp->qp_num, update_mask);
        }
        if (attr_mask & IB_QP_RNR_RETRY) {
                mqpcb->rnr_retry_count = attr->rnr_retry;
                update_mask |= EHCA_BMASK_SET(MQPCB_MASK_RNR_RETRY_COUNT, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_RNR_RETRY update_mask=%lx",
-                    my_qp, ibqp->qp_num, update_mask);
        }
        if (attr_mask & IB_QP_RQ_PSN) {
                mqpcb->receive_psn = attr->rq_psn;
                update_mask |= EHCA_BMASK_SET(MQPCB_MASK_RECEIVE_PSN, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_RQ_PSN update_mask=%lx",
-                    my_qp, ibqp->qp_num, update_mask);
        }
        if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) {
                mqpcb->rdma_nr_atomic_resp_res = attr->max_dest_rd_atomic < 3 ?
-                       attr->max_dest_rd_atomic : 2; /* max is 2 */
+                       attr->max_dest_rd_atomic : 2;
                update_mask |=
                        EHCA_BMASK_SET(MQPCB_MASK_RDMA_NR_ATOMIC_RESP_RES, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_MAX_DEST_RD_ATOMIC "
-                    "update_mask=%lx", my_qp, ibqp->qp_num, update_mask);
        }
        if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC) {
                mqpcb->rdma_atomic_outst_dest_qp = attr->max_rd_atomic < 3 ?
@@ -1101,8 +1074,6 @@ static int internal_modify_qp(struct ib_
                update_mask |=
                        EHCA_BMASK_SET
                        (MQPCB_MASK_RDMA_ATOMIC_OUTST_DEST_QP, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_MAX_QP_RD_ATOMIC "
-                    "update_mask=%lx", my_qp, ibqp->qp_num, update_mask);
        }
        if (attr_mask & IB_QP_ALT_PATH) {
                int ah_mult = ib_rate_to_mult(attr->alt_ah_attr.static_rate);
@@ -1123,10 +1094,6 @@ static int internal_modify_qp(struct ib_
                else
                        mqpcb->max_static_rate_al = 0;
 
-               EDEB(7, " ipd=mqpcb->max_static_rate set %x,"
-                       " ah_mult=%x ehca_mult=%x",
-                    mqpcb->max_static_rate,ah_mult,ehca_mult);
-
                update_mask |= EHCA_BMASK_SET(MQPCB_MASK_MAX_STATIC_RATE_AL, 1);
 
                /*
@@ -1159,43 +1126,28 @@ static int internal_modify_qp(struct ib_
                        update_mask |=
                                EHCA_BMASK_SET(MQPCB_MASK_TRAFFIC_CLASS_AL, 1);
                }
-
-               EDEB(7, "ehca_qp=%p qp_num=%x IB_QP_ALT_PATH update_mask=%lx",
-                    my_qp, ibqp->qp_num, update_mask);
        }
 
        if (attr_mask & IB_QP_MIN_RNR_TIMER) {
                mqpcb->min_rnr_nak_timer_field = attr->min_rnr_timer;
                update_mask |=
                        EHCA_BMASK_SET(MQPCB_MASK_MIN_RNR_NAK_TIMER_FIELD, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x "
-                    "IB_QP_MIN_RNR_TIMER update_mask=%lx",
-                    my_qp, ibqp->qp_num, update_mask);
        }
 
        if (attr_mask & IB_QP_SQ_PSN) {
                mqpcb->send_psn = attr->sq_psn;
                update_mask |= EHCA_BMASK_SET(MQPCB_MASK_SEND_PSN, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x "
-                    "IB_QP_SQ_PSN update_mask=%lx",
-                    my_qp, ibqp->qp_num, update_mask);
        }
 
        if (attr_mask & IB_QP_DEST_QPN) {
                mqpcb->dest_qp_nr = attr->dest_qp_num;
                update_mask |= EHCA_BMASK_SET(MQPCB_MASK_DEST_QP_NR, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x "
-                    "IB_QP_DEST_QPN update_mask=%lx",
-                    my_qp, ibqp->qp_num, update_mask);
        }
 
        if (attr_mask & IB_QP_PATH_MIG_STATE) {
                mqpcb->path_migration_state = attr->path_mig_state;
                update_mask |=
                        EHCA_BMASK_SET(MQPCB_MASK_PATH_MIGRATION_STATE, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x "
-                    "IB_QP_PATH_MIG_STATE update_mask=%lx", my_qp,
-                    ibqp->qp_num, update_mask);
        }
 
        if (attr_mask & IB_QP_CAP) {
@@ -1205,13 +1157,11 @@ static int internal_modify_qp(struct ib_
                mqpcb->max_nr_outst_recv_wr = attr->cap.max_recv_wr+1;
                update_mask |=
                        EHCA_BMASK_SET(MQPCB_MASK_MAX_NR_OUTST_RECV_WR, 1);
-               EDEB(7, "ehca_qp=%p qp_num=%x "
-                    "IB_QP_CAP update_mask=%lx",
-                    my_qp, ibqp->qp_num, update_mask);
                /* no support for max_send/recv_sge yet */
        }
 
-       EDEB_DMP(7, mqpcb, 4*70, "ehca_qp=%p qp_num=%x", my_qp, ibqp->qp_num);
+       if (ehca_debug_level)
+               ehca_dmp(mqpcb, 4*70, "qp_num=%x", ibqp->qp_num);
 
        h_ret = hipz_h_modify_qp(shca->ipz_hca_handle,
                                 my_qp->ipz_qp_handle,
@@ -1221,9 +1171,8 @@ static int internal_modify_qp(struct ib_
 
        if (h_ret != H_SUCCESS) {
                ret = ehca2ib_return_code(h_ret);
-               EDEB_ERR(4, "hipz_h_modify_qp() failed rc=%lx "
-                        "ehca_qp=%p qp_num=%x",
-                        h_ret, my_qp, ibqp->qp_num);
+               ehca_err(ibqp->device, "hipz_h_modify_qp() failed rc=%lx "
+                        "ehca_qp=%p qp_num=%x",h_ret, my_qp, ibqp->qp_num);
                goto modify_qp_exit2;
        }
 
@@ -1234,7 +1183,7 @@ static int internal_modify_qp(struct ib_
                /* doorbell to reprocessing wqes */
                iosync(); /* serialize GAL register access */
                hipz_update_sqa(my_qp, bad_wqe_cnt-1);
-               EDEB(6, "doorbell for %x wqes", bad_wqe_cnt);
+               ehca_gen_dbg("doorbell for %x wqes", bad_wqe_cnt);
        }
 
        if (statetrans == IB_QPST_RESET2INIT ||
@@ -1244,10 +1193,6 @@ static int internal_modify_qp(struct ib_
                update_mask = 0;
                update_mask = EHCA_BMASK_SET(MQPCB_MASK_QP_ENABLE, 1);
 
-               EDEB(7, "ehca_qp=%p qp_num=%x "
-                    "RESET_2_INIT needs an additional enable "
-                    "-> update_mask=%lx", my_qp, ibqp->qp_num, update_mask);
-
                h_ret = hipz_h_modify_qp(shca->ipz_hca_handle,
                                         my_qp->ipz_qp_handle,
                                         &my_qp->pf,
@@ -1257,10 +1202,9 @@ static int internal_modify_qp(struct ib_
 
                if (h_ret != H_SUCCESS) {
                        ret = ehca2ib_return_code(h_ret);
-                       EDEB_ERR(4, "ENABLE in context of "
-                                "RESET_2_INIT failed! "
-                                "Maybe you didn't get a LID"
-                                "h_ret=%lx ehca_qp=%p qp_num=%x",
+                       ehca_err(ibqp->device, "ENABLE in context of "
+                                "RESET_2_INIT failed! Maybe you didn't get "
+                                "a LID h_ret=%lx ehca_qp=%p qp_num=%x",
                                 h_ret, my_qp, ibqp->qp_num);
                        goto modify_qp_exit2;
                }
@@ -1283,91 +1227,60 @@ modify_qp_exit2:
 modify_qp_exit1:
        kfree(mqpcb);
 
-modify_qp_exit0:
-       EDEB_EX(7, "ehca_qp=%p qp_num=%x ibqp_type=%x ret=%x",
-               my_qp, ibqp->qp_num, ibqp->qp_type, ret);
        return ret;
 }
 
 int ehca_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask)
 {
-       int ret = 0;
-       struct ehca_qp *my_qp = NULL;
-       struct ehca_pd *my_pd = NULL;
+       struct ehca_qp *my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
+       struct ehca_pd *my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd,
+                                            ib_pd);
        u32 cur_pid = current->tgid;
 
-       EHCA_CHECK_ADR(ibqp);
-       EHCA_CHECK_ADR(attr);
-       EHCA_CHECK_ADR(ibqp->device);
-
-       my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
-
-       EDEB_EN(7, "ehca_qp=%p qp_num=%x ibqp_type=%x attr_mask=%x",
-               my_qp, ibqp->qp_num, ibqp->qp_type, attr_mask);
-
-       my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd, ib_pd);
        if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context &&
            my_pd->ownpid != cur_pid) {
-               EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+               ehca_err(ibqp->pd->device, "Invalid caller pid=%x ownpid=%x",
                         cur_pid, my_pd->ownpid);
-               ret = -EINVAL;
-       } else
-               ret = internal_modify_qp(ibqp, attr, attr_mask, 0);
+               return -EINVAL;
+       }
 
-       EDEB_EX(7, "ehca_qp=%p qp_num=%x ibqp_type=%x ret=%x",
-               my_qp, ibqp->qp_num, ibqp->qp_type, ret);
-       return ret;
+       return internal_modify_qp(ibqp, attr, attr_mask, 0);
 }
 
 int ehca_query_qp(struct ib_qp *qp,
                  struct ib_qp_attr *qp_attr,
                  int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr)
 {
-       struct ehca_qp *my_qp = NULL;
-       struct ehca_shca *shca = NULL;
-       struct hcp_modify_qp_control_block *qpcb = NULL;
-       struct ipz_adapter_handle adapter_handle;
-       struct ehca_pd *my_pd = NULL;
+       struct ehca_qp *my_qp = container_of(qp, struct ehca_qp, ib_qp);
+       struct ehca_pd *my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd,
+                                            ib_pd);
+       struct ehca_shca *shca = container_of(qp->device, struct ehca_shca,
+                                             ib_device);
+       struct ipz_adapter_handle adapter_handle = shca->ipz_hca_handle;
+       struct hcp_modify_qp_control_block *qpcb;
        u32 cur_pid = current->tgid;
-       int cnt = 0, ret = 0;
-       u64 h_ret = H_SUCCESS;
+       int cnt, ret = 0;
+       u64 h_ret;
 
-       EHCA_CHECK_ADR(qp);
-       EHCA_CHECK_ADR(qp_attr);
-       EHCA_CHECK_DEVICE(qp->device);
-
-       my_qp = container_of(qp, struct ehca_qp, ib_qp);
-
-       EDEB_EN(7, "ehca_qp=%p qp_num=%x "
-               "qp_attr=%p qp_attr_mask=%x qp_init_attr=%p",
-               my_qp, qp->qp_num, qp_attr, qp_attr_mask, qp_init_attr);
-
-       my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd, ib_pd);
        if (my_pd->ib_pd.uobject  && my_pd->ib_pd.uobject->context  &&
            my_pd->ownpid != cur_pid) {
-               EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+               ehca_err(qp->device, "Invalid caller pid=%x ownpid=%x",
                         cur_pid, my_pd->ownpid);
-               ret = -EINVAL;
-               goto query_qp_exit0;
+               return -EINVAL;
        }
 
-       shca = container_of(qp->device, struct ehca_shca, ib_device);
-       adapter_handle = shca->ipz_hca_handle;
-
        if (qp_attr_mask & QP_ATTR_QUERY_NOT_SUPPORTED) {
-               ret = -EINVAL;
-               EDEB_ERR(4,"Invalid attribute mask "
+               ehca_err(qp->device,"Invalid attribute mask "
                         "ehca_qp=%p qp_num=%x qp_attr_mask=%x ",
                         my_qp, qp->qp_num, qp_attr_mask);
-               goto query_qp_exit0;
+               return -EINVAL;
        }
 
        qpcb = kzalloc(H_CB_ALIGNMENT, GFP_KERNEL );
        if (!qpcb) {
-               ret = -ENOMEM;
-               EDEB_ERR(4,"Out of memory for qpcb "
+               ehca_err(qp->device,"Out of memory for qpcb "
                         "ehca_qp=%p qp_num=%x", my_qp, qp->qp_num);
-               goto query_qp_exit0;
+               return -ENOMEM;
        }
 
        h_ret = hipz_h_query_qp(adapter_handle,
@@ -1377,7 +1290,7 @@ int ehca_query_qp(struct ib_qp *qp,
 
        if (h_ret != H_SUCCESS) {
                ret = ehca2ib_return_code(h_ret);
-               EDEB_ERR(4,"hipz_h_query_qp() failed "
+               ehca_err(qp->device,"hipz_h_query_qp() failed "
                         "ehca_qp=%p qp_num=%x h_ret=%lx",
                         my_qp, qp->qp_num, h_ret);
                goto query_qp_exit1;
@@ -1385,9 +1298,10 @@ int ehca_query_qp(struct ib_qp *qp,
 
        qp_attr->cur_qp_state = ehca2ib_qp_state(qpcb->qp_state);
        qp_attr->qp_state = qp_attr->cur_qp_state;
+
        if (qp_attr->cur_qp_state == -EINVAL) {
                ret = -EINVAL;
-               EDEB_ERR(4,"Got invalid ehca_qp_state=%x "
+               ehca_err(qp->device,"Got invalid ehca_qp_state=%x "
                         "ehca_qp=%p qp_num=%x",
                         qpcb->qp_state, my_qp, qp->qp_num);
                goto query_qp_exit1;
@@ -1482,54 +1396,33 @@ int ehca_query_qp(struct ib_qp *qp,
        if (qp_init_attr)
                *qp_init_attr = my_qp->init_attr;
 
-       EDEB(7, "ehca_qp=%p qp_number=%x dest_qp_number=%x "
-            "dlid=%x path_mtu=%x dest_gid=%lx_%lx "
-            "service_level=%x qp_state=%x",
-            my_qp, qpcb->qp_number, qpcb->dest_qp_nr,
-            qpcb->dlid, qpcb->path_mtu,
-            qpcb->dest_gid.dw[0], qpcb->dest_gid.dw[1],
-            qpcb->service_level, qpcb->qp_state);
-
-       EDEB_DMP(7, qpcb, 4*70, "ehca_qp=%p qp_num=%x", my_qp, qp->qp_num);
+       if (ehca_debug_level)
+               ehca_dmp(qpcb, 4*70, "qp_num=%x", qp->qp_num);
 
 query_qp_exit1:
        kfree(qpcb);
 
-query_qp_exit0:
-       EDEB_EX(7, "ehca_qp=%p qp_num=%x ret=%x",
-               my_qp, qp->qp_num, ret);
        return ret;
 }
 
 int ehca_destroy_qp(struct ib_qp *ibqp)
 {
-       extern struct ehca_module ehca_module;
-       struct ehca_qp *my_qp = NULL;
-       struct ehca_shca *shca = NULL;
-       struct ehca_pfqp *qp_pf = NULL;
-       struct ehca_pd *my_pd = NULL;
+       struct ehca_qp *my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
+       struct ehca_shca *shca = container_of(ibqp->device, struct ehca_shca,
+                                             ib_device);
+       struct ehca_pd *my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd,
+                                            ib_pd);
        u32 cur_pid = current->tgid;
-       u32 qp_num = 0;
-       int ret = 0;
-       u64 h_ret = H_SUCCESS;
-       u8 port_num = 0;
+       u32 qp_num = ibqp->qp_num;
+       int ret;
+       u64 h_ret;
+       u8 port_num;
        enum ib_qp_type qp_type;
        unsigned long flags;
 
-       EHCA_CHECK_ADR(ibqp);
-
-       my_qp = container_of(ibqp, struct ehca_qp, ib_qp);
-       qp_num = ibqp->qp_num;
-       qp_pf = &my_qp->pf;
-
-       shca = container_of(ibqp->device, struct ehca_shca, ib_device);
-
-       EDEB_EN(7, "ehca_qp=%p qp_num=%x", my_qp, ibqp->qp_num);
-
-       my_pd = container_of(my_qp->ib_qp.pd, struct ehca_pd, ib_pd);
        if (my_pd->ib_pd.uobject && my_pd->ib_pd.uobject->context &&
            my_pd->ownpid != cur_pid) {
-               EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+               ehca_err(ibqp->device, "Invalid caller pid=%x ownpid=%x",
                         cur_pid, my_pd->ownpid);
                return -EINVAL;
        }
@@ -1538,11 +1431,10 @@ int ehca_destroy_qp(struct ib_qp *ibqp)
                ret = ehca_cq_unassign_qp(my_qp->send_cq,
                                              my_qp->real_qp_num);
                if (ret) {
-                       EDEB_ERR(4, "Couldn't unassign qp from send_cq "
-                                "ret=%x qp_num=%x cq_num=%x",
-                                ret, my_qp->ib_qp.qp_num,
-                                my_qp->send_cq->cq_number);
-                       goto destroy_qp_exit0;
+                       ehca_err(ibqp->device, "Couldn't unassign qp from "
+                                "send_cq ret=%x qp_num=%x cq_num=%x", ret,
+                                my_qp->ib_qp.qp_num, my_qp->send_cq->cq_number);
+                       return ret;
                }
        }
 
@@ -1554,17 +1446,25 @@ int ehca_destroy_qp(struct ib_qp *ibqp)
        if (my_qp->uspace_rqueue) {
                ret = ehca_munmap(my_qp->uspace_rqueue,
                                  my_qp->ipz_rqueue.queue_length);
+               if (ret)
+                       ehca_err(ibqp->device, "Could not munmap rqueue "
+                                "qp_num=%x", qp_num);
                ret = ehca_munmap(my_qp->uspace_squeue,
                                  my_qp->ipz_squeue.queue_length);
+               if (ret)
+                       ehca_err(ibqp->device, "Could not munmap squeue "
+                                "qp_num=%x", qp_num);
                ret = ehca_munmap(my_qp->uspace_fwh, EHCA_PAGESIZE);
+               if (ret)
+                       ehca_err(ibqp->device, "Could not munmap fwh qp_num=%x",
+                                qp_num);
        }
 
        h_ret = hipz_h_destroy_qp(shca->ipz_hca_handle, my_qp);
        if (h_ret != H_SUCCESS) {
-               EDEB_ERR(4, "hipz_h_destroy_qp() failed "
-                        "rc=%lx ehca_qp=%p qp_num=%x",
-                        h_ret, qp_pf, qp_num);
-               goto destroy_qp_exit0;
+               ehca_err(ibqp->device, "hipz_h_destroy_qp() failed rc=%lx "
+                        "ehca_qp=%p qp_num=%x", h_ret, my_qp, qp_num);
+               return ehca2ib_return_code(h_ret);
        }
 
        port_num = my_qp->init_attr.port_num;
@@ -1573,9 +1473,8 @@ int ehca_destroy_qp(struct ib_qp *ibqp)
        /* no support for IB_QPT_SMI yet */
        if (qp_type == IB_QPT_GSI) {
                struct ib_event event;
-
-               EDEB(4, "device %s: port %x is inactive.",
-                    shca->ib_device.name, port_num);
+               ehca_info(ibqp->device, "device %s: port %x is inactive.",
+                         shca->ib_device.name, port_num);
                event.device = &shca->ib_device;
                event.event = IB_EVENT_PORT_ERR;
                event.element.port_num = port_num;
@@ -1585,10 +1484,23 @@ int ehca_destroy_qp(struct ib_qp *ibqp)
 
        ipz_queue_dtor(&my_qp->ipz_rqueue);
        ipz_queue_dtor(&my_qp->ipz_squeue);
-       kmem_cache_free(ehca_module.cache_qp, my_qp);
+       kmem_cache_free(qp_cache, my_qp);
+       return 0;
+}
 
-destroy_qp_exit0:
-       ret = ehca2ib_return_code(h_ret);
-       EDEB_EX(7,"ret=%x", ret);
-       return ret;
+int ehca_init_qp_cache(void)
+{
+       qp_cache = kmem_cache_create("ehca_cache_qp",
+                                    sizeof(struct ehca_qp), 0,
+                                    SLAB_HWCACHE_ALIGN,
+                                    NULL, NULL);
+       if (!qp_cache)
+               return -ENOMEM;
+       return 0;
+}
+
+void ehca_cleanup_qp_cache(void)
+{
+       if (qp_cache)
+               kmem_cache_destroy(qp_cache);
 }
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_reqs.c linux-2.6/drivers/infiniband/hw/ehca/ehca_reqs.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_reqs.c       2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_reqs.c    2006-08-30 20:00:16.000000000 +0200
@@ -41,8 +41,6 @@
  */
 
 
-#define DEB_PREFIX "reqs"
-
 #include <asm-powerpc/system.h>
 #include "ehca_classes.h"
 #include "ehca_tools.h"
@@ -58,7 +56,7 @@ static inline int ehca_write_rwqe(struct
        u8 cnt_ds;
        if (unlikely((recv_wr->num_sge < 0) ||
                     (recv_wr->num_sge > ipz_rqueue->act_nr_of_sg))) {
-               EDEB_ERR(4, "Invalid number of WQE SGE. "
+               ehca_gen_err("Invalid number of WQE SGE. "
                         "num_sqe=%x max_nr_of_sg=%x",
                         recv_wr->num_sge, ipz_rqueue->act_nr_of_sg);
                return -EINVAL; /* invalid SG list length */
@@ -79,9 +77,9 @@ static inline int ehca_write_rwqe(struct
                        recv_wr->sg_list[cnt_ds].length;
        }
 
-       if (IS_EDEB_ON(7)) {
-               EDEB(7, "RECEIVE WQE written into ipz_rqueue=%p", ipz_rqueue);
-               EDEB_DMP(7, wqe_p, 16*(6 + wqe_p->nr_of_data_seg), "recv wqe");
+       if (ehca_debug_level) {
+               ehca_gen_dbg("RECEIVE WQE written into ipz_rqueue=%p", ipz_rqueue);
+               ehca_dmp( wqe_p, 16*(6 + wqe_p->nr_of_data_seg), "recv wqe");
        }
 
        return 0;
@@ -94,31 +92,35 @@ static inline int ehca_write_rwqe(struct
 
 static void trace_send_wr_ud(const struct ib_send_wr *send_wr)
 {
-       int idx = 0;
-       int j = 0;
+       int idx = 0;
+       int j;
        while (send_wr) {
                struct ib_mad_hdr *mad_hdr = send_wr->wr.ud.mad_hdr;
                struct ib_sge *sge = send_wr->sg_list;
-               EDEB(4, "send_wr#%x wr_id=%lx num_sge=%x "
-                    "send_flags=%x opcode=%x",idx, send_wr->wr_id,
-                    send_wr->num_sge, send_wr->send_flags, send_wr->opcode);
+               ehca_gen_dbg("send_wr#%x wr_id=%lx num_sge=%x "
+                            "send_flags=%x opcode=%x",idx, send_wr->wr_id,
+                            send_wr->num_sge, send_wr->send_flags,
+                            send_wr->opcode);
                if (mad_hdr) {
-                       EDEB(4, "send_wr#%x mad_hdr base_version=%x "
-                            "mgmt_class=%x class_version=%x method=%x "
-                            "status=%x class_specific=%x tid=%lx attr_id=%x "
-                            "resv=%x attr_mod=%x",
-                            idx, mad_hdr->base_version, mad_hdr->mgmt_class,
-                            mad_hdr->class_version, mad_hdr->method,
-                            mad_hdr->status, mad_hdr->class_specific,
-                            mad_hdr->tid, mad_hdr->attr_id, mad_hdr->resv,
-                            mad_hdr->attr_mod);
+                       ehca_gen_dbg("send_wr#%x mad_hdr base_version=%x "
+                                    "mgmt_class=%x class_version=%x method=%x "
+                                    "status=%x class_specific=%x tid=%lx "
+                                    "attr_id=%x resv=%x attr_mod=%x",
+                                    idx, mad_hdr->base_version,
+                                    mad_hdr->mgmt_class,
+                                    mad_hdr->class_version, mad_hdr->method,
+                                    mad_hdr->status, mad_hdr->class_specific,
+                                    mad_hdr->tid, mad_hdr->attr_id,
+                                    mad_hdr->resv,
+                                    mad_hdr->attr_mod);
                }
                for (j = 0; j < send_wr->num_sge; j++) {
                        u8 *data = (u8 *) abs_to_virt(sge->addr);
-                       EDEB(4, "send_wr#%x sge#%x addr=%p length=%x lkey=%x",
-                            idx, j, data, sge->length, sge->lkey);
+                       ehca_gen_dbg("send_wr#%x sge#%x addr=%p length=%x "
+                                    "lkey=%x",
+                                    idx, j, data, sge->length, sge->lkey);
                        /* assume length is n*16 */
-                       EDEB_DMP(4, data, sge->length, "send_wr#%x sge#%x",
+                       ehca_dmp(data, sge->length, "send_wr#%x sge#%x",
                                 idx, j);
                        sge++;
                } /* eof for j */
@@ -140,7 +142,7 @@ static inline int ehca_write_swqe(struct
 
        if (unlikely((send_wr->num_sge < 0) ||
                     (send_wr->num_sge > qp->ipz_squeue.act_nr_of_sg))) {
-               EDEB_ERR(4, "Invalid number of WQE SGE. "
+               ehca_gen_err("Invalid number of WQE SGE. "
                         "num_sqe=%x max_nr_of_sg=%x",
                         send_wr->num_sge, qp->ipz_squeue.act_nr_of_sg);
                return -EINVAL; /* invalid SG list length */
@@ -164,7 +166,7 @@ static inline int ehca_write_swqe(struct
                wqe_p->optype = WQE_OPTYPE_RDMAREAD;
                break;
        default:
-               EDEB_ERR(4, "Invalid opcode=%x", send_wr->opcode);
+               ehca_gen_err("Invalid opcode=%x", send_wr->opcode);
                return -EINVAL; /* invalid opcode */
        }
 
@@ -196,7 +198,7 @@ static inline int ehca_write_swqe(struct
                wqe_p->destination_qp_number = send_wr->wr.ud.remote_qpn << 8;
                wqe_p->local_ee_context_qkey = remote_qkey;
                if (!send_wr->wr.ud.ah) {
-                       EDEB_ERR(4, "wr.ud.ah is NULL. qp=%p", qp);
+                       ehca_gen_err("wr.ud.ah is NULL. qp=%p", qp);
                        return -EINVAL;
                }
                my_av = container_of(send_wr->wr.ud.ah, struct ehca_av, ib_ah);
@@ -254,13 +256,13 @@ static inline int ehca_write_swqe(struct
                break;
 
        default:
-               EDEB_ERR(4, "Invalid qptype=%x", qp->qp_type);
+               ehca_gen_err("Invalid qptype=%x", qp->qp_type);
                return -EINVAL;
        }
 
-       if (IS_EDEB_ON(7)) {
-               EDEB(7, "SEND WQE written into queue qp=%p ", qp);
-               EDEB_DMP(7, wqe_p, 16*(6 + wqe_p->nr_of_data_seg), "send wqe");
+       if (ehca_debug_level) {
+               ehca_gen_dbg("SEND WQE written into queue qp=%p ", qp);
+               ehca_dmp( wqe_p, 16*(6 + wqe_p->nr_of_data_seg), "send wqe");
        }
        return 0;
 }
@@ -355,19 +357,12 @@ int ehca_post_send(struct ib_qp *qp,
                   struct ib_send_wr *send_wr,
                   struct ib_send_wr **bad_send_wr)
 {
-       struct ehca_qp *my_qp = NULL;
-       struct ib_send_wr *cur_send_wr = NULL;
-       struct ehca_wqe *wqe_p = NULL;
+       struct ehca_qp *my_qp = container_of(qp, struct ehca_qp, ib_qp);
+       struct ib_send_wr *cur_send_wr;
+       struct ehca_wqe *wqe_p;
        int wqe_cnt = 0;
        int ret = 0;
-       unsigned long spl_flags = 0;
-
-       EHCA_CHECK_ADR(qp);
-       my_qp = container_of(qp, struct ehca_qp, ib_qp);
-       EHCA_CHECK_QP(my_qp);
-       EHCA_CHECK_ADR(send_wr);
-       EDEB_EN(7, "ehca_qp=%p qp_num=%x send_wr=%p bad_send_wr=%p",
-               my_qp, qp->qp_num, send_wr, bad_send_wr);
+       unsigned long spl_flags;
 
        /* LOCK the QUEUE */
        spin_lock_irqsave(&my_qp->spinlock_s, spl_flags);
@@ -384,8 +379,8 @@ int ehca_post_send(struct ib_qp *qp,
                                *bad_send_wr = cur_send_wr;
                        if (wqe_cnt == 0) {
                                ret = -ENOMEM;
-                               EDEB_ERR(4, "Too many posted WQEs qp_num=%x",
-                                        qp->qp_num);
+                               ehca_err(qp->device, "Too many posted WQEs "
+                                        "qp_num=%x", qp->qp_num);
                        }
                        goto post_send_exit0;
                }
@@ -400,14 +395,14 @@ int ehca_post_send(struct ib_qp *qp,
                        *bad_send_wr = cur_send_wr;
                        if (wqe_cnt == 0) {
                                ret = -EINVAL;
-                               EDEB_ERR(4, "Could not write WQE qp_num=%x",
-                                        qp->qp_num);
+                               ehca_err(qp->device, "Could not write WQE "
+                                        "qp_num=%x", qp->qp_num);
                        }
                        goto post_send_exit0;
                }
                wqe_cnt++;
-               EDEB(7, "ehca_qp=%p qp_num=%x wqe_cnt=%d",
-                    my_qp, qp->qp_num, wqe_cnt);
+               ehca_dbg(qp->device, "ehca_qp=%p qp_num=%x wqe_cnt=%d",
+                        my_qp, qp->qp_num, wqe_cnt);
        } /* eof for cur_send_wr */
 
 post_send_exit0:
@@ -415,8 +410,6 @@ post_send_exit0:
        spin_unlock_irqrestore(&my_qp->spinlock_s, spl_flags);
        iosync(); /* serialize GAL register access */
        hipz_update_sqa(my_qp, wqe_cnt);
-       EDEB_EX(7, "ehca_qp=%p qp_num=%x ret=%x wqe_cnt=%d",
-               my_qp, qp->qp_num, ret, wqe_cnt);
        return ret;
 }
 
@@ -424,19 +417,12 @@ int ehca_post_recv(struct ib_qp *qp,
                   struct ib_recv_wr *recv_wr,
                   struct ib_recv_wr **bad_recv_wr)
 {
-       struct ehca_qp *my_qp = NULL;
-       struct ib_recv_wr *cur_recv_wr = NULL;
-       struct ehca_wqe *wqe_p = NULL;
+       struct ehca_qp *my_qp = container_of(qp, struct ehca_qp, ib_qp);
+       struct ib_recv_wr *cur_recv_wr;
+       struct ehca_wqe *wqe_p;
        int wqe_cnt = 0;
        int ret = 0;
-       unsigned long spl_flags = 0;
-
-       EHCA_CHECK_ADR(qp);
-       my_qp = container_of(qp, struct ehca_qp, ib_qp);
-       EHCA_CHECK_QP(my_qp);
-       EHCA_CHECK_ADR(recv_wr);
-       EDEB_EN(7, "ehca_qp=%p qp_num=%x recv_wr=%p bad_recv_wr=%p",
-               my_qp, qp->qp_num, recv_wr, bad_recv_wr);
+       unsigned long spl_flags;
 
        /* LOCK the QUEUE */
        spin_lock_irqsave(&my_qp->spinlock_r, spl_flags);
@@ -453,14 +439,13 @@ int ehca_post_recv(struct ib_qp *qp,
                                *bad_recv_wr = cur_recv_wr;
                        if (wqe_cnt == 0) {
                                ret = -ENOMEM;
-                               EDEB_ERR(4, "Too many posted WQEs qp_num=%x",
-                                        qp->qp_num);
+                               ehca_err(qp->device, "Too many posted WQEs "
+                                        "qp_num=%x", qp->qp_num);
                        }
                        goto post_recv_exit0;
                }
                /* write a RECV WQE into the QUEUE */
-               ret = ehca_write_rwqe(&my_qp->ipz_rqueue, wqe_p,
-                                         cur_recv_wr);
+               ret = ehca_write_rwqe(&my_qp->ipz_rqueue, wqe_p, cur_recv_wr);
                /*
                 * if something failed,
                 * reset the free entry pointer to the start value
@@ -470,13 +455,13 @@ int ehca_post_recv(struct ib_qp *qp,
                        *bad_recv_wr = cur_recv_wr;
                        if (wqe_cnt == 0) {
                                ret = -EINVAL;
-                               EDEB_ERR(4, "Could not write WQE qp_num=%x",
-                                        qp->qp_num);
+                               ehca_err(qp->device, "Could not write WQE "
+                                        "qp_num=%x", qp->qp_num);
                        }
                        goto post_recv_exit0;
                }
                wqe_cnt++;
-               EDEB(7, "ehca_qp=%p qp_num=%x wqe_cnt=%d",
+               ehca_gen_dbg("ehca_qp=%p qp_num=%x wqe_cnt=%d",
                     my_qp, qp->qp_num, wqe_cnt);
        } /* eof for cur_recv_wr */
 
@@ -484,8 +469,6 @@ post_recv_exit0:
        spin_unlock_irqrestore(&my_qp->spinlock_r, spl_flags);
        iosync(); /* serialize GAL register access */
        hipz_update_rqa(my_qp, wqe_cnt);
-       EDEB_EX(7, "ehca_qp=%p qp_num=%x ret=%x wqe_cnt=%d",
-               my_qp, qp->qp_num, ret, wqe_cnt);
        return ret;
 }
 
@@ -510,18 +493,16 @@ static inline int ehca_poll_cq_one(struc
 {
        int ret = 0;
        struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq);
-       struct ehca_cqe *cqe = NULL;
+       struct ehca_cqe *cqe;
        int cqe_count = 0;
 
-       EDEB_EN(7, "ehca_cq=%p cq_num=%x wc=%p", my_cq, my_cq->cq_number, wc);
-
 poll_cq_one_read_cqe:
        cqe = (struct ehca_cqe *)
                ipz_qeit_get_inc_valid(&my_cq->ipz_queue);
        if (!cqe) {
                ret = -EAGAIN;
-               EDEB(7, "Completion queue is empty ehca_cq=%p cq_num=%x "
-                    "ret=%x", my_cq, my_cq->cq_number, ret);
+               ehca_dbg(cq->device, "Completion queue is empty ehca_cq=%p "
+                        "cq_num=%x ret=%x", my_cq, my_cq->cq_number, ret);
                goto  poll_cq_one_exit0;
        }
 
@@ -531,13 +512,13 @@ poll_cq_one_read_cqe:
        cqe_count++;
        if (unlikely(cqe->status & WC_STATUS_PURGE_BIT)) {
                struct ehca_qp *qp=ehca_cq_get_qp(my_cq, cqe->local_qp_number);
-               int purgeflag = 0;
-               unsigned long spl_flags = 0;
+               int purgeflag;
+               unsigned long spl_flags;
                if (!qp) {
-                       EDEB_ERR(4, "cq_num=%x qp_num=%x "
+                       ehca_err(cq->device, "cq_num=%x qp_num=%x "
                                 "could not find qp -> ignore cqe",
                                 my_cq->cq_number, cqe->local_qp_number);
-                       EDEB_DMP(4, cqe, 64, "cq_num=%x qp_num=%x",
+                       ehca_dmp(cqe, 64, "cq_num=%x qp_num=%x",
                                 my_cq->cq_number, cqe->local_qp_number);
                        /* ignore this purged cqe */
                        goto poll_cq_one_read_cqe;
@@ -547,10 +528,13 @@ poll_cq_one_read_cqe:
                spin_unlock_irqrestore(&qp->spinlock_s, spl_flags);
 
                if (purgeflag) {
-                       EDEB(6, "Got CQE with purged bit qp_num=%x src_qp=%x",
-                            cqe->local_qp_number, cqe->remote_qp_number);
-                       EDEB_DMP(6, cqe, 64, "qp_num=%x src_qp=%x",
+                       ehca_dbg(cq->device, "Got CQE with purged bit qp_num=%x "
+                                "src_qp=%x",
                                 cqe->local_qp_number, cqe->remote_qp_number);
+                       if (ehca_debug_level)
+                               ehca_dmp(cqe, 64, "qp_num=%x src_qp=%x",
+                                        cqe->local_qp_number,
+                                        cqe->remote_qp_number);
                        /*
                         * ignore this to avoid double cqes of bad wqe
                         * that caused sqe and turn off purge flag
@@ -561,13 +545,15 @@ poll_cq_one_read_cqe:
        }
 
        /* tracing cqe */
-       if (IS_EDEB_ON(7)) {
-               EDEB(7, "Received COMPLETION ehca_cq=%p cq_num=%x -----",
-                    my_cq, my_cq->cq_number);
-               EDEB_DMP(7, cqe, 64, "ehca_cq=%p cq_num=%x",
+       if (ehca_debug_level) {
+               ehca_dbg(cq->device,
+                        "Received COMPLETION ehca_cq=%p cq_num=%x -----",
+                        my_cq, my_cq->cq_number);
+               ehca_dmp(cqe, 64, "ehca_cq=%p cq_num=%x",
+                        my_cq, my_cq->cq_number);
+               ehca_dbg(cq->device,
+                        "ehca_cq=%p cq_num=%x -------------------------",
                         my_cq, my_cq->cq_number);
-               EDEB(7, "ehca_cq=%p cq_num=%x -------------------------",
-                    my_cq, my_cq->cq_number);
        }
 
        /* we got a completion! */
@@ -576,11 +562,11 @@ poll_cq_one_read_cqe:
        /* eval ib_wc_opcode */
        wc->opcode = ib_wc_opcode[cqe->optype]-1;
        if (unlikely(wc->opcode == -1)) {
-               EDEB_ERR(4, "Invalid cqe->OPType=%x cqe->status=%x "
+               ehca_err(cq->device, "Invalid cqe->OPType=%x cqe->status=%x "
                         "ehca_cq=%p cq_num=%x",
                         cqe->optype, cqe->status, my_cq, my_cq->cq_number);
                /* dump cqe for other infos */
-               EDEB_DMP(4, cqe, 64, "ehca_cq=%p cq_num=%x",
+               ehca_dmp(cqe, 64, "ehca_cq=%p cq_num=%x",
                         my_cq, my_cq->cq_number);
                /* update also queue adder to throw away this entry!!! */
                goto poll_cq_one_exit0;
@@ -604,49 +590,35 @@ poll_cq_one_read_cqe:
        wc->sl = cqe->service_level;
 
        if (wc->status != IB_WC_SUCCESS)
-               EDEB(6, "ehca_cq=%p cq_num=%x WARNING unsuccessful cqe "
-                    "OPType=%x status=%x qp_num=%x src_qp=%x wr_id=%lx cqe=%p",
-                    my_cq, my_cq->cq_number, cqe->optype, cqe->status,
-                    cqe->local_qp_number, cqe->remote_qp_number,
-                    cqe->work_request_id, cqe);
+               ehca_dbg(cq->device,
+                        "ehca_cq=%p cq_num=%x WARNING unsuccessful cqe "
+                        "OPType=%x status=%x qp_num=%x src_qp=%x wr_id=%lx "
+                        "cqe=%p", my_cq, my_cq->cq_number, cqe->optype,
+                        cqe->status, cqe->local_qp_number,
+                        cqe->remote_qp_number, cqe->work_request_id, cqe);
 
 poll_cq_one_exit0:
        if (cqe_count > 0)
                hipz_update_feca(my_cq, cqe_count);
 
-       EDEB_EX(7, "ret=%x ehca_cq=%p cq_number=%x wc=%p "
-               "status=%x opcode=%x qp_num=%x byte_len=%x",
-               ret, my_cq, my_cq->cq_number, wc, wc->status,
-               wc->opcode, wc->qp_num, wc->byte_len);
-
        return ret;
 }
 
 int ehca_poll_cq(struct ib_cq *cq, int num_entries, struct ib_wc *wc)
 {
-       struct ehca_cq *my_cq = NULL;
-       int nr = 0;
-       struct ib_wc *current_wc = NULL;
+       struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq);
+       int nr;
+       struct ib_wc *current_wc = wc;
        int ret = 0;
-       unsigned long spl_flags = 0;
-
-       EHCA_CHECK_CQ(cq);
-       EHCA_CHECK_ADR(wc);
-
-       my_cq = container_of(cq, struct ehca_cq, ib_cq);
-       EHCA_CHECK_CQ(my_cq);
-
-       EDEB_EN(7, "ehca_cq=%p cq_num=%x num_entries=%d wc=%p",
-               my_cq, my_cq->cq_number, num_entries, wc);
+       unsigned long spl_flags;
 
        if (num_entries < 1) {
-               EDEB_ERR(4, "Invalid num_entries=%d ehca_cq=%p cq_num=%x",
-                        num_entries, my_cq, my_cq->cq_number);
+               ehca_err(cq->device, "Invalid num_entries=%d ehca_cq=%p "
+                        "cq_num=%x", num_entries, my_cq, my_cq->cq_number);
                ret = -EINVAL;
                goto poll_cq_exit0;
        }
 
-       current_wc = wc;
        spin_lock_irqsave(&my_cq->spinlock, spl_flags);
        for (nr = 0; nr < num_entries; nr++) {
                ret = ehca_poll_cq_one(cq, current_wc);
@@ -659,22 +631,12 @@ int ehca_poll_cq(struct ib_cq *cq, int n
                ret = nr;
 
 poll_cq_exit0:
-       EDEB_EX(7, "ehca_cq=%p cq_num=%x ret=%x wc=%p nr_entries=%d",
-               my_cq, my_cq->cq_number, ret, wc, nr);
-
        return ret;
 }
 
 int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify)
 {
-       struct ehca_cq *my_cq = NULL;
-       int ret = 0;
-
-       EHCA_CHECK_CQ(cq);
-       my_cq = container_of(cq, struct ehca_cq, ib_cq);
-       EHCA_CHECK_CQ(my_cq);
-       EDEB_EN(7, "ehca_cq=%p cq_num=%x cq_notif=%x",
-               my_cq, my_cq->cq_number, cq_notify);
+       struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq);
 
        switch (cq_notify) {
        case IB_CQ_SOLICITED:
@@ -687,8 +649,5 @@ int ehca_req_notify_cq(struct ib_cq *cq,
                return -EINVAL;
        }
 
-       EDEB_EX(7, "ehca_cq=%p cq_num=%x ret=%x",
-               my_cq, my_cq->cq_number, ret);
-
-       return ret;
+       return 0;
 }
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_sqp.c linux-2.6/drivers/infiniband/hw/ehca/ehca_sqp.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_sqp.c        2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_sqp.c     2006-08-30 20:00:16.000000000 +0200
@@ -40,8 +40,6 @@
  */
 
 
-#define DEB_PREFIX "e_qp"
-
 #include <linux/module.h>
 #include <linux/err.h>
 #include "ehca_classes.h"
@@ -51,11 +49,6 @@
 #include "hcp_if.h"
 
 
-extern int ehca_create_aqp1(struct ehca_shca *shca, struct ehca_sport *sport);
-extern int ehca_destroy_aqp1(struct ehca_sport *sport);
-
-extern int ehca_port_act_time;
-
 /**
  * ehca_define_sqp - Defines special queue pair 1 (GSI QP). When special queue
  * pair is created successfully, the corresponding port gets active.
@@ -69,15 +62,10 @@ u64 ehca_define_sqp(struct ehca_shca *sh
                    struct ehca_qp *ehca_qp,
                    struct ib_qp_init_attr *qp_init_attr)
 {
-
-       u32 pma_qp_nr = 0;
-       u32 bma_qp_nr = 0;
-       u64 ret = H_SUCCESS;
+       u32 pma_qp_nr, bma_qp_nr;
+       u64 ret;
        u8 port = qp_init_attr->port_num;
-       int counter = 0;
-
-       EDEB_EN(7, "port=%x qp_type=%x",
-               port, qp_init_attr->qp_type);
+       int counter;
 
        shca->sport[port - 1].port_state = IB_PORT_DOWN;
 
@@ -93,31 +81,31 @@ u64 ehca_define_sqp(struct ehca_shca *sh
                                         &pma_qp_nr, &bma_qp_nr);
 
                if (ret != H_SUCCESS) {
-                       EDEB_ERR(4, "Can't define AQP1 for port %x. rc=%lx",
-                                   port, ret);
-                       goto ehca_define_aqp1;
+                       ehca_err(&shca->ib_device,
+                                "Can't define AQP1 for port %x. rc=%lx",
+                                port, ret);
+                       return ret;
                }
                break;
        default:
-               ret = H_PARAMETER;
-               goto ehca_define_aqp1;
+               ehca_err(&shca->ib_device, "invalid qp_type=%x",
+                        qp_init_attr->qp_type);
+               return H_PARAMETER;
        }
 
-       while ((shca->sport[port - 1].port_state != IB_PORT_ACTIVE) &&
-              (counter < ehca_port_act_time)) {
-               EDEB(6, "... wait until port %x is active",
-                       port);
+       for (counter = 0;
+            shca->sport[port - 1].port_state != IB_PORT_ACTIVE &&
+                    counter < ehca_port_act_time;
+            counter++) {
+               ehca_dbg(&shca->ib_device, "... wait until port %x is active",
+                        port);
                msleep_interruptible(1000);
-               counter++;
        }
 
        if (counter == ehca_port_act_time) {
-               EDEB_ERR(4, "Port %x is not active.", port);
-               ret = H_HARDWARE;
+               ehca_err(&shca->ib_device, "Port %x is not active.", port);
+               return H_HARDWARE;
        }
 
-ehca_define_aqp1:
-       EDEB_EX(7, "ret=%lx", ret);
-
-       return ret;
+       return H_SUCCESS;
 }
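
Note on the wait loop in ehca_define_sqp() above: the timeout is detected by
comparing counter with ehca_port_act_time after the loop. As far as can be
read from this hunk, a port that turns active during the last sleep also
leaves the loop with counter == ehca_port_act_time and would be reported as
H_HARDWARE. Below is a minimal user-space sketch of a variant that re-tests
the state instead of the counter. The names demo_wait_active, demo_poll_fn,
struct demo_port and demo_port_is_active are hypothetical and not part of the
driver, and the one-second sleep is left out.

```c
#include <stdbool.h>

/* Hypothetical callback: returns true once the port is active. */
typedef bool (*demo_poll_fn)(void *ctx);

/* Hypothetical port model: turns active after 'active_after' polls. */
struct demo_port {
	int polls;
	int active_after;
};

static bool demo_port_is_active(void *ctx)
{
	struct demo_port *p = ctx;

	return p->polls++ >= p->active_after;
}

/*
 * Poll up to max_polls times, then test the state one more time.
 * Returns 0 if the port was seen active, -1 on timeout.
 */
static int demo_wait_active(demo_poll_fn is_active, void *ctx, int max_polls)
{
	int counter;

	for (counter = 0; counter < max_polls; counter++) {
		if (is_active(ctx))
			return 0;
		/* the driver would sleep here, e.g. msleep_interruptible(1000) */
	}

	/* final check: decide on the state, not on the counter value */
	return is_active(ctx) ? 0 : -1;
}
```

With this shape a port that comes up during the final interval is still
reported as active; whether that matters for the real hardware is an open
question for the driver authors.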
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_tools.h linux-2.6/drivers/infiniband/hw/ehca/ehca_tools.h
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_tools.h      2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_tools.h   2006-08-30 20:00:17.000000000 +0200
@@ -57,195 +57,70 @@
 #include <linux/version.h>
 #include <linux/notifier.h>
 #include <linux/cpu.h>
+#include <linux/device.h>
 
 #include <asm/abs_addr.h>
 #include <asm/ibmebus.h>
 #include <asm/io.h>
 #include <asm/pgtable.h>
 
-#define EHCA_EDEB_TRACE_MASK_SIZE 32
-extern u8 ehca_edeb_mask[EHCA_EDEB_TRACE_MASK_SIZE];
-#define EDEB_ID_TO_U32(str4) (str4[3] | (str4[2] << 8) | (str4[1] << 16) | \
-                             (str4[0] << 24))
+extern int ehca_debug_level;
 
-static inline u64 ehca_edeb_filter(const u32 level,
-                                  const u32 id, const u32 line)
-{
-       u64 ret = 0;
-       u32 filenr = 0;
-       u32 filter_level = 9;
-       u32 dynamic_level = 0;
-
-       /*
-        * This is code written for the gcc -O2 optimizer
-        * which should collapse  to two single ints.
-        * Filter_level is the first level kicked out by
-        * compiler and  means trace everything below 6.
-        */
-
-       if (id == EDEB_ID_TO_U32("ehav")) {
-               filenr = 0x01;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("clas")) {
-               filenr = 0x02;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("cqeq")) {
-               filenr = 0x03;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("shca")) {
-               filenr = 0x05;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("eirq")) {
-               filenr = 0x06;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("lMad")) {
-               filenr = 0x07;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("mcas")) {
-               filenr = 0x08;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("mrmw")) {
-               filenr = 0x09;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("vpd ")) {
-               filenr = 0x0a;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("e_qp")) {
-               filenr = 0x0b;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("uqes")) {
-               filenr = 0x0c;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("PHYP")) {
-               filenr = 0x0d;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("hcpi")) {
-               filenr = 0x0e;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("iptz")) {
-               filenr = 0x0f;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("spta")) {
-               filenr = 0x10;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("simp")) {
-               filenr = 0x11;
-               filter_level = 8;
-       }
-       if (id == EDEB_ID_TO_U32("reqs")) {
-               filenr = 0x12;
-               filter_level = 8;
-       }
-
-       if ((filenr - 1) > sizeof(ehca_edeb_mask)) {
-               filenr = 0;
-       }
-
-       if (filenr == 0) {
-               filter_level = 9;
-       } /* default */
-       ret = filenr * 0x10000 + line;
-       if (filter_level <= level) {
-               return ret | 0x100000000L; /* this is the flag to not trace */
-       }
-       dynamic_level = ehca_edeb_mask[filenr];
-       if (likely(dynamic_level <= level)) {
-               ret = ret | 0x100000000L;
-       };
-       return ret;
-}
-
-#ifdef EHCA_USE_HCALL_KERNEL
-#ifdef CONFIG_PPC_PSERIES
-
-#include <asm/paca.h>
+#define ehca_dbg(ib_dev, format, arg...) \
+       do { \
+               if (unlikely(ehca_debug_level)) \
+                       dev_printk(KERN_DEBUG, (ib_dev)->dma_device, \
+                                  "PU%04x EHCA_DBG:%s " format "\n", \
+                                  get_paca()->paca_index, __FUNCTION__, \
+                                  ## arg); \
+       } while (0)
 
-/*
- * IS_EDEB_ON - Checks if debug is on for the given level.
- */
-#define IS_EDEB_ON(level) \
-((ehca_edeb_filter(level, EDEB_ID_TO_U32(DEB_PREFIX), __LINE__) & \
-  0x100000000L) == 0)
-
-#define EDEB_P_GENERIC(level,idstring,format,args...) \
-do { \
-       u64 ehca_edeb_filterresult =                                    \
-               ehca_edeb_filter(level, EDEB_ID_TO_U32(DEB_PREFIX), __LINE__);\
-       if ((ehca_edeb_filterresult & 0x100000000L) == 0)               \
-               printk("PU%04x %08x:%s " idstring " "format "\n",       \
-                      get_paca()->paca_index, (u32)(ehca_edeb_filterresult), \
-                      __func__,  ##args);                              \
-} while (1 == 0)
-
-#elif REAL_HCALL
-
-#define EDEB_P_GENERIC(level,idstring,format,args...) \
-do { \
-       u64 ehca_edeb_filterresult =                                    \
-               ehca_edeb_filter(level, EDEB_ID_TO_U32(DEB_PREFIX), __LINE__); \
-       if ((ehca_edeb_filterresult & 0x100000000L) == 0)               \
-               printk("%08x:%s " idstring " "format "\n",      \
-                       (u32)(ehca_edeb_filterresult), \
-                       __func__,  ##args); \
-} while (1 == 0)
-
-#endif
-#else
-
-#define IS_EDEB_ON(level) (1)
-
-#define EDEB_P_GENERIC(level,idstring,format,args...) \
-do { \
-       printk("%s " idstring " "format "\n",   \
-              __func__,  ##args);              \
-} while (1 == 0)
+#define ehca_info(ib_dev, format, arg...) \
+       dev_info((ib_dev)->dma_device, "PU%04x EHCA_INFO:%s " format "\n", \
+                get_paca()->paca_index, __FUNCTION__, ## arg)
+
+#define ehca_warn(ib_dev, format, arg...) \
+       dev_warn((ib_dev)->dma_device, "PU%04x EHCA_WARN:%s " format "\n", \
+                get_paca()->paca_index, __FUNCTION__, ## arg)
+
+#define ehca_err(ib_dev, format, arg...) \
+       dev_err((ib_dev)->dma_device, "PU%04x EHCA_ERR:%s " format "\n", \
+               get_paca()->paca_index, __FUNCTION__, ## arg)
+
+/* use this one only if no ib_dev available */
+#define ehca_gen_dbg(format, arg...) \
+       do { \
+               if (unlikely(ehca_debug_level)) \
+                       printk(KERN_DEBUG "PU%04x EHCA_DBG:%s " format "\n",\
+                              get_paca()->paca_index, __FUNCTION__, ## arg); \
+       } while (0)
 
-#endif
+#define ehca_gen_warn(format, arg...) \
+       do { \
+               if (unlikely(ehca_debug_level)) \
+                       printk(KERN_INFO "PU%04x EHCA_WARN:%s " format "\n",\
+                              get_paca()->paca_index, __FUNCTION__, ## arg); \
+       } while (0)
 
-/**
- * EDEB - Trace output macro.
- * @level: tracelevel
- * @format: optional format string, use "" if not desired
- * @args: printf like arguments for trace
- */
-#define EDEB(level,format,args...) \
-       EDEB_P_GENERIC(level,"",format,##args)
-#define EDEB_ERR(level,format,args...) \
-       EDEB_P_GENERIC(level,"HCAD_ERROR ",format,##args)
-#define EDEB_EN(level,format,args...) \
-       EDEB_P_GENERIC(level,">>>",format,##args)
-#define EDEB_EX(level,format,args...) \
-       EDEB_P_GENERIC(level,"<<<",format,##args)
+#define ehca_gen_err(format, arg...) \
+       printk(KERN_ERR "PU%04x EHCA_ERR:%s " format "\n", \
+               get_paca()->paca_index, __FUNCTION__, ## arg)
 
 /**
- * EDEB_DMP - macro to dump a memory block, whose length is n*8 bytes.
+ * ehca_dmp - printk a memory block, whose length is n*8 bytes.
  * Each line has the following layout:
  * <format string> adr=X ofs=Y <8 bytes hex> <8 bytes hex>
  */
-#define EDEB_DMP(level,adr,len,format,args...) \
+#define ehca_dmp(adr, len, format, args...) \
        do {                                   \
                unsigned int x;                       \
                unsigned int l = (unsigned int)(len); \
                unsigned char *deb = (unsigned char*)(adr);     \
                for (x = 0; x < l; x += 16) { \
-                       EDEB(level, format " adr=%p ofs=%04x %016lx %016lx", \
-                            ##args, deb, x, \
-                            *((u64 *)&deb[0]), *((u64 *)&deb[8])); \
+                       printk("EHCA_DMP:%s" format \
+                              " adr=%p ofs=%04x %016lx %016lx\n", \
+                              __FUNCTION__, ##args, deb, x, \
+                              *((u64 *)&deb[0]), *((u64 *)&deb[8])); \
                        deb += 16; \
                } \
        } while (0)
@@ -275,129 +150,8 @@ do { \
  * EHCA_BMASK_GET - extract a parameter from value by mask
  */
 #define EHCA_BMASK_GET(mask,value) \
-       ( EHCA_BMASK_MASK(mask)& (((u64)(value))>>EHCA_BMASK_SHIFTPOS(mask)))
-
-#define PARANOIA_MODE
-#ifdef PARANOIA_MODE
+       (EHCA_BMASK_MASK(mask)& (((u64)(value))>>EHCA_BMASK_SHIFTPOS(mask)))
 
-#define EHCA_CHECK_ADR_P(adr)                                  \
-       if (unlikely(adr == 0)) {                               \
-               EDEB_ERR(4, "adr=%p check failed line %i", adr, \
-                        __LINE__);                             \
-               return ERR_PTR(-EFAULT); }
-
-#define EHCA_CHECK_ADR(adr)                                    \
-       if (unlikely(adr == 0)) {                               \
-               EDEB_ERR(4, "adr=%p check failed line %i", adr, \
-                        __LINE__);                             \
-               return -EFAULT; }
-
-#define EHCA_CHECK_DEVICE_P(device)                            \
-       if (unlikely(device == 0)) {                            \
-               EDEB_ERR(4, "device=%p check failed", device);  \
-               return ERR_PTR(-EFAULT); }
-
-#define EHCA_CHECK_DEVICE(device)                              \
-       if (unlikely(device == 0)) {                            \
-               EDEB_ERR(4, "device=%p check failed", device);  \
-               return -EFAULT; }
-
-#define EHCA_CHECK_PD(pd)                              \
-       if (unlikely(pd == 0)) {                        \
-               EDEB_ERR(4, "pd=%p check failed", pd);  \
-               return -EFAULT; }
-
-#define EHCA_CHECK_PD_P(pd)                            \
-       if (unlikely(pd == 0)) {                        \
-               EDEB_ERR(4, "pd=%p check failed", pd);  \
-               return ERR_PTR(-EFAULT); }
-
-#define EHCA_CHECK_AV(av)                              \
-       if (unlikely(av == 0)) {                        \
-               EDEB_ERR(4, "av=%p check failed", av);  \
-               return -EFAULT; }
-
-#define EHCA_CHECK_AV_P(av)                            \
-       if (unlikely(av == 0)) {                        \
-               EDEB_ERR(4, "av=%p check failed", av);  \
-               return ERR_PTR(-EFAULT); }
-
-#define EHCA_CHECK_CQ(cq)                              \
-       if (unlikely(cq == 0)) {                        \
-               EDEB_ERR(4, "cq=%p check failed", cq);  \
-               return -EFAULT; }
-
-#define EHCA_CHECK_CQ_P(cq)                            \
-       if (unlikely(cq == 0)) {                        \
-               EDEB_ERR(4, "cq=%p check failed", cq);  \
-               return ERR_PTR(-EFAULT); }
-
-#define EHCA_CHECK_EQ(eq)                              \
-       if (unlikely(eq == 0)) {                        \
-               EDEB_ERR(4, "eq=%p check failed", eq);  \
-               return -EFAULT; }
-
-#define EHCA_CHECK_EQ_P(eq)                            \
-       if (unlikely(eq == 0)) {                        \
-               EDEB_ERR(4, "eq=%p check failed", eq);  \
-               return ERR_PTR(-EFAULT); }
-
-#define EHCA_CHECK_QP(qp)                              \
-       if (unlikely(qp == 0)) {                        \
-               EDEB_ERR(4, "qp=%p check failed", qp);  \
-               return -EFAULT; }
-
-#define EHCA_CHECK_QP_P(qp)                            \
-       if (unlikely(qp == 0)) {                        \
-               EDEB_ERR(4, "qp=%p check failed", qp);  \
-               return ERR_PTR(-EFAULT); }
-
-#define EHCA_CHECK_MR(mr)                              \
-       if (unlikely(mr == 0)) {                        \
-               EDEB_ERR(4, "mr=%p check failed", mr);  \
-               return -EFAULT; }
-
-#define EHCA_CHECK_MR_P(mr)                            \
-       if (unlikely(mr == 0)) {                        \
-               EDEB_ERR(4, "mr=%p check failed", mr);  \
-               return ERR_PTR(-EFAULT); }
-
-#define EHCA_CHECK_MW(mw)                              \
-       if (unlikely(mw == 0)) {                        \
-               EDEB_ERR(4, "mw=%p check failed", mw);  \
-               return -EFAULT; }
-
-#define EHCA_CHECK_MW_P(mw)                            \
-       if (unlikely(mw == 0)) {                        \
-               EDEB_ERR(4, "mw=%p check failed", mw);  \
-               return ERR_PTR(-EFAULT); }
-
-#define EHCA_CHECK_FMR(fmr)                                    \
-       if (unlikely(fmr == 0)) {                               \
-               EDEB_ERR(4, "fmr=%p check failed", fmr);        \
-               return -EFAULT; }
-
-#define EHCA_CHECK_FMR_P(fmr)                                  \
-       if (unlikely(fmr == 0)) {                               \
-               EDEB_ERR(4, "fmr=%p check failed", fmr);        \
-               return ERR_PTR(-EFAULT); }
-
-#define EHCA_REGISTER_PD(device,pd)
-#define EHCA_REGISTER_AV(pd,av)
-#define EHCA_DEREGISTER_PD(PD)
-#define EHCA_DEREGISTER_AV(av)
-#else
-#define EHCA_CHECK_DEVICE_P(device)
-
-#define EHCA_CHECK_PD(pd)
-#define EHCA_REGISTER_PD(device,pd)
-#define EHCA_DEREGISTER_PD(PD)
-#endif
-
-static inline int ehca_adr_bad(void *adr)
-{
-       return !adr;
-}
 
 /* Converts ehca to ib return code */
 static inline int ehca2ib_return_code(u64 ehca_rc)
@@ -414,4 +168,5 @@ static inline int ehca2ib_return_code(u6
        }
 }
 
+
 #endif /* EHCA_TOOLS_H */
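
The new ehca_dbg() and ehca_gen_dbg() macros above wrap a level check in
do { ... } while (0) so that each macro expands to exactly one statement.
Below is a minimal user-space sketch of that pattern, assuming a plain int
flag in place of ehca_debug_level and fprintf() in place of dev_printk() or
printk(). All demo_* names are hypothetical. The sketch uses C99 __VA_ARGS__
rather than the GNU "arg..." form used in the patch.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the ehca_debug_level module parameter. */
static int demo_debug_level;

/* Counts emitted messages so the gating can be observed. */
static int demo_msg_count;

/* Prints "DEMO_DBG:<function> <message>" to stderr and counts the call. */
static void demo_emit(const char *func, const char *fmt, ...)
{
	va_list ap;

	fprintf(stderr, "DEMO_DBG:%s ", func);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	demo_msg_count++;
}

/* Emits only when the debug level is non-zero; first argument is the format. */
#define demo_dbg(...) \
	do { \
		if (demo_debug_level) \
			demo_emit(__func__, __VA_ARGS__); \
	} while (0)

/* Sets the level, logs once, and returns how many messages were emitted. */
static int demo_run(int level)
{
	demo_debug_level = level;
	demo_msg_count = 0;

	/* unbraced if/else: compiles only because the macro is one statement */
	if (level >= 0)
		demo_dbg("level=%d", level);
	else
		demo_dbg("negative level");

	return demo_msg_count;
}
```

Without the do/while wrapper, the trailing semicolon after the macro call
would end the if statement early and the else branch in demo_run() would not
compile.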
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_uverbs.c linux-2.6/drivers/infiniband/hw/ehca/ehca_uverbs.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ehca_uverbs.c     2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ehca_uverbs.c  2006-08-30 20:00:16.000000000 +0200
@@ -40,9 +40,6 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-#undef DEB_PREFIX
-#define DEB_PREFIX "uver"
-
 #include <asm/current.h>
 
 #include "ehca_classes.h"
@@ -54,30 +51,20 @@
 struct ib_ucontext *ehca_alloc_ucontext(struct ib_device *device,
                                        struct ib_udata *udata)
 {
-       struct ehca_ucontext *my_context = NULL;
-
-       EHCA_CHECK_ADR_P(device);
-       EDEB_EN(7, "device=%p name=%s", device, device->name);
+       struct ehca_ucontext *my_context;
 
        my_context = kzalloc(sizeof *my_context, GFP_KERNEL);
        if (!my_context) {
-               EDEB_ERR(4, "Out of memory device=%p", device);
+               ehca_err(device, "Out of memory device=%p", device);
                return ERR_PTR(-ENOMEM);
        }
 
-       EDEB_EX(7, "device=%p ucontext=%p", device, my_context);
-
        return &my_context->ib_ucontext;
 }
 
 int ehca_dealloc_ucontext(struct ib_ucontext *context)
 {
-       struct ehca_ucontext *my_context = NULL;
-       EHCA_CHECK_ADR(context);
-       EDEB_EN(7, "ucontext=%p", context);
-       my_context = container_of(context, struct ehca_ucontext, ib_ucontext);
-       kfree(my_context);
-       EDEB_EN(7, "ucontext=%p", context);
+       kfree(container_of(context, struct ehca_ucontext, ib_ucontext));
        return 0;
 }
 
@@ -91,83 +78,88 @@ struct page *ehca_nopage(struct vm_area_
        u32 rsrc_type = (fileoffset >> 24) & 0xF; /* sq,rq,cmnd_window */
        u32 cur_pid = current->tgid;
        unsigned long flags;
+       struct ehca_cq *cq;
+       struct ehca_qp *qp;
+       struct ehca_pd *pd;
+       u64 offset;
+       void *vaddr;
 
-       EDEB_EN(7, "vm_start=%lx vm_end=%lx vm_page_prot=%lx vm_fileoff=%lx "
-               "address=%lx",
-               vma->vm_start, vma->vm_end, vma->vm_page_prot, fileoffset,
-               address);
-
-       if (q_type == 1) { /* CQ */
-               struct ehca_cq *cq = NULL;
-               u64 offset;
-               void *vaddr = NULL;
-
+       switch (q_type) {
+       case 1: /* CQ */
                spin_lock_irqsave(&ehca_cq_idr_lock, flags);
                cq = idr_find(&ehca_cq_idr, idr_handle);
                spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
 
-               if (cq->ownpid != cur_pid) {
-                       EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
-                                cur_pid, cq->ownpid);
+               /* make sure this mmap really belongs to the authorized user */
+               if (!cq) {
+                       ehca_gen_err("cq is NULL ret=NOPAGE_SIGBUS");
                        return NOPAGE_SIGBUS;
                }
 
-               /* make sure this mmap really belongs to the authorized user */
-               if (!cq) {
-                       EDEB_ERR(4, "cq is NULL ret=NOPAGE_SIGBUS");
+               if (cq->ownpid != cur_pid) {
+                       ehca_err(cq->ib_cq.device,
+                                "Invalid caller pid=%x ownpid=%x",
+                                cur_pid, cq->ownpid);
                        return NOPAGE_SIGBUS;
                }
+
                if (rsrc_type == 2) {
-                       EDEB(6, "cq=%p cq queuearea", cq);
+                       ehca_dbg(cq->ib_cq.device, "cq=%p cq queuearea", cq);
                        offset = address - vma->vm_start;
                        vaddr = ipz_qeit_calc(&cq->ipz_queue, offset);
-                       EDEB(6, "offset=%lx vaddr=%p", offset, vaddr);
+                       ehca_dbg(cq->ib_cq.device, "offset=%lx vaddr=%p",
+                                offset, vaddr);
                        mypage = virt_to_page(vaddr);
                }
-       } else if (q_type == 2) { /* QP */
-               struct ehca_qp *qp = NULL;
-               struct ehca_pd *pd = NULL;
-               u64 offset;
-               void *vaddr = NULL;
+               break;
 
+       case 2: /* QP */
                spin_lock_irqsave(&ehca_qp_idr_lock, flags);
                qp = idr_find(&ehca_qp_idr, idr_handle);
                spin_unlock_irqrestore(&ehca_qp_idr_lock, flags);
 
+               /* make sure this mmap really belongs to the authorized user */
+               if (!qp) {
+                       ehca_gen_err("qp is NULL ret=NOPAGE_SIGBUS");
+                       return NOPAGE_SIGBUS;
+               }
 
                pd = container_of(qp->ib_qp.pd, struct ehca_pd, ib_pd);
                if (pd->ownpid != cur_pid) {
-                       EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+                       ehca_err(qp->ib_qp.device,
+                                "Invalid caller pid=%x ownpid=%x",
                                 cur_pid, pd->ownpid);
                        return NOPAGE_SIGBUS;
                }
 
-               /* make sure this mmap really belongs to the authorized user */
-               if (!qp) {
-                       EDEB_ERR(4, "qp is NULL ret=NOPAGE_SIGBUS");
-                       return NOPAGE_SIGBUS;
-               }
                if (rsrc_type == 2) {   /* rqueue */
-                       EDEB(6, "qp=%p qp rqueuearea", qp);
+                       ehca_dbg(qp->ib_qp.device, "qp=%p qp rqueuearea", qp);
                        offset = address - vma->vm_start;
                        vaddr = ipz_qeit_calc(&qp->ipz_rqueue, offset);
-                       EDEB(6, "offset=%lx vaddr=%p", offset, vaddr);
+                       ehca_dbg(qp->ib_qp.device, "offset=%lx vaddr=%p",
+                                offset, vaddr);
                        mypage = virt_to_page(vaddr);
                } else if (rsrc_type == 3) {    /* squeue */
-                       EDEB(6, "qp=%p qp squeuearea", qp);
+                       ehca_dbg(qp->ib_qp.device, "qp=%p qp squeuearea", qp);
                        offset = address - vma->vm_start;
                        vaddr = ipz_qeit_calc(&qp->ipz_squeue, offset);
-                       EDEB(6, "offset=%lx vaddr=%p", offset, vaddr);
+                       ehca_dbg(qp->ib_qp.device, "offset=%lx vaddr=%p",
+                                offset, vaddr);
                        mypage = virt_to_page(vaddr);
                }
+               break;
+
+       default:
+               ehca_gen_err("bad queue type %x", q_type);
+               return NOPAGE_SIGBUS;
        }
 
        if (!mypage) {
-               EDEB_ERR(4, "Invalid page adr==NULL ret=NOPAGE_SIGBUS");
+               ehca_gen_err("Invalid page adr==NULL ret=NOPAGE_SIGBUS");
                return NOPAGE_SIGBUS;
        }
        get_page(mypage);
-       EDEB_EX(7, "page adr=%p", mypage);
+
        return mypage;
 }
 
@@ -181,159 +173,161 @@ int ehca_mmap(struct ib_ucontext *contex
        u32 idr_handle = fileoffset >> 32;
        u32 q_type = (fileoffset >> 28) & 0xF;    /* CQ, QP,...        */
        u32 rsrc_type = (fileoffset >> 24) & 0xF; /* sq,rq,cmnd_window */
-       u32 ret = -EFAULT;      /* assume the worst             */
-       u64 vsize = 0;          /* must be calculated/set below */
-       u64 physical = 0;       /* must be calculated/set below */
        u32 cur_pid = current->tgid;
+       u32 ret;
+       u64 vsize, physical;
        unsigned long flags;
+       struct ehca_cq *cq;
+       struct ehca_qp *qp;
+       struct ehca_pd *pd;
 
-       EDEB_EN(7, "vm_start=%lx vm_end=%lx vm_page_prot=%lx vm_fileoff=%lx",
-               vma->vm_start, vma->vm_end, vma->vm_page_prot, fileoffset);
-
-       if (q_type == 1) { /* CQ */
-               struct ehca_cq *cq;
-
+       switch (q_type) {
+       case  1: /* CQ */
                spin_lock_irqsave(&ehca_cq_idr_lock, flags);
                cq = idr_find(&ehca_cq_idr, idr_handle);
                spin_unlock_irqrestore(&ehca_cq_idr_lock, flags);
 
+               /* make sure this mmap really belongs to the authorized user */
+               if (!cq)
+                       return -EINVAL;
+
                if (cq->ownpid != cur_pid) {
-                       EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+                       ehca_err(cq->ib_cq.device,
+                                "Invalid caller pid=%x ownpid=%x",
                                 cur_pid, cq->ownpid);
                        return -ENOMEM;
                }
 
-               /* make sure this mmap really belongs to the authorized user */
-               if (!cq)
-                       return -EINVAL;
-               if (!cq->ib_cq.uobject)
-                       return -EINVAL;
-               if (cq->ib_cq.uobject->context != context)
+               if (!cq->ib_cq.uobject || cq->ib_cq.uobject->context != context)
                        return -EINVAL;
-               if (rsrc_type == 1) {   /* galpa fw handle */
-                       EDEB(6, "cq=%p cq triggerarea", cq);
+
+               switch (rsrc_type) {
+               case 1: /* galpa fw handle */
+                       ehca_dbg(cq->ib_cq.device, "cq=%p cq triggerarea", cq);
                        vma->vm_flags |= VM_RESERVED;
                        vsize = vma->vm_end - vma->vm_start;
                        if (vsize != EHCA_PAGESIZE) {
-                               EDEB_ERR(4, "invalid vsize=%lx",
+                               ehca_err(cq->ib_cq.device, "invalid vsize=%lx",
                                         vma->vm_end - vma->vm_start);
-                               ret = -EINVAL;
-                               goto mmap_exit0;
+                               return -EINVAL;
                        }
 
                        physical = cq->galpas.user.fw_handle;
                        vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
                        vma->vm_flags |= VM_IO | VM_RESERVED;
 
-                       EDEB(6, "vsize=%lx physical=%lx", vsize, physical);
+                       ehca_dbg(cq->ib_cq.device,
+                                "vsize=%lx physical=%lx", vsize, physical);
                        ret = remap_pfn_range(vma, vma->vm_start,
                                              physical >> PAGE_SHIFT, vsize,
                                              vma->vm_page_prot);
                        if (ret) {
-                               EDEB_ERR(4, "remap_pfn_range() failed ret=%x",
+                               ehca_err(cq->ib_cq.device,
+                                        "remap_pfn_range() failed ret=%x",
                                         ret);
-                               ret = -ENOMEM;
+                               return -ENOMEM;
                        }
-                       goto mmap_exit0;
-               } else if (rsrc_type == 2) {    /* cq queue_addr */
-                       EDEB(6, "cq=%p cq q_addr", cq);
+                       break;
+
+               case 2: /* cq queue_addr */
+                       ehca_dbg(cq->ib_cq.device, "cq=%p cq q_addr", cq);
                        vma->vm_flags |= VM_RESERVED;
                        vma->vm_ops = &ehcau_vm_ops;
-                       ret = 0;
-                       goto mmap_exit0;
-               } else {
-                       EDEB_ERR(6, "bad resource type %x", rsrc_type);
-                       ret = -EINVAL;
-                       goto mmap_exit0;
+                       break;
+
+               default:
+                       ehca_err(cq->ib_cq.device, "bad resource type %x",
+                                rsrc_type);
+                       return -EINVAL;
                }
-       } else if (q_type == 2) { /* QP */
-               struct ehca_qp *qp = NULL;
-               struct ehca_pd *pd = NULL;
+               break;
 
+       case 2: /* QP */
                spin_lock_irqsave(&ehca_qp_idr_lock, flags);
                qp = idr_find(&ehca_qp_idr, idr_handle);
                spin_unlock_irqrestore(&ehca_qp_idr_lock, flags);
 
+               /* make sure this mmap really belongs to the authorized user */
+               if (!qp)
+                       return -EINVAL;
+
                pd = container_of(qp->ib_qp.pd, struct ehca_pd, ib_pd);
                if (pd->ownpid != cur_pid) {
-                       EDEB_ERR(4, "Invalid caller pid=%x ownpid=%x",
+                       ehca_err(qp->ib_qp.device,
+                                "Invalid caller pid=%x ownpid=%x",
                                 cur_pid, pd->ownpid);
                        return -ENOMEM;
                }
 
-               /* make sure this mmap really belongs to the authorized user */
-               if (!qp || !qp->ib_qp.uobject ||
-                   qp->ib_qp.uobject->context != context) {
-                       EDEB(6, "qp=%p, uobject=%p, context=%p",
-                            qp, qp->ib_qp.uobject, qp->ib_qp.uobject->context);
-                       ret = -EINVAL;
-                       goto mmap_exit0;
-               }
-               if (rsrc_type == 1) {   /* galpa fw handle */
-                       EDEB(6, "qp=%p qp triggerarea", qp);
+               if (!qp->ib_qp.uobject || qp->ib_qp.uobject->context != context)
+                       return -EINVAL;
+
+               switch (rsrc_type) {
+               case 1: /* galpa fw handle */
+                       ehca_dbg(qp->ib_qp.device, "qp=%p qp triggerarea", qp);
                        vma->vm_flags |= VM_RESERVED;
                        vsize = vma->vm_end - vma->vm_start;
                        if (vsize != EHCA_PAGESIZE) {
-                               EDEB_ERR(4, "invalid vsize=%lx",
+                               ehca_err(qp->ib_qp.device, "invalid vsize=%lx",
                                         vma->vm_end - vma->vm_start);
-                               ret = -EINVAL;
-                               goto mmap_exit0;
+                               return -EINVAL;
                        }
 
                        physical = qp->galpas.user.fw_handle;
                        vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
                        vma->vm_flags |= VM_IO | VM_RESERVED;
 
-                       EDEB(6, "vsize=%lx physical=%lx", vsize, physical);
+                       ehca_dbg(qp->ib_qp.device, "vsize=%lx physical=%lx",
+                                vsize, physical);
                        ret = remap_pfn_range(vma, vma->vm_start,
                                              physical >> PAGE_SHIFT, vsize,
                                              vma->vm_page_prot);
                        if (ret) {
-                               EDEB_ERR(4, "remap_pfn_range() failed ret=%x",
+                               ehca_err(qp->ib_qp.device,
+                                        "remap_pfn_range() failed ret=%x",
                                         ret);
-                               ret = -ENOMEM;
+                               return -ENOMEM;
                        }
-                       goto mmap_exit0;
-               } else if (rsrc_type == 2) {    /* qp rqueue_addr */
-                       EDEB(6, "qp=%p qp rqueue_addr", qp);
+                       break;
+
+               case 2: /* qp rqueue_addr */
+                       ehca_dbg(qp->ib_qp.device, "qp=%p qp rqueue_addr", qp);
                        vma->vm_flags |= VM_RESERVED;
                        vma->vm_ops = &ehcau_vm_ops;
-                       ret = 0;
-                       goto mmap_exit0;
-               } else if (rsrc_type == 3) {    /* qp squeue_addr */
-                       EDEB(6, "qp=%p qp squeue_addr", qp);
+                       break;
+
+               case 3: /* qp squeue_addr */
+                       ehca_dbg(qp->ib_qp.device, "qp=%p qp squeue_addr", qp);
                        vma->vm_flags |= VM_RESERVED;
                        vma->vm_ops = &ehcau_vm_ops;
-                       ret = 0;
-                       goto mmap_exit0;
-               } else {
-                       EDEB_ERR(4, "bad resource type %x", rsrc_type);
-                       ret = -EINVAL;
-                       goto mmap_exit0;
+                       break;
+
+               default:
+                       ehca_err(qp->ib_qp.device, "bad resource type %x",
+                                rsrc_type);
+                       return -EINVAL;
                }
-       } else {
-               EDEB_ERR(4, "bad queue type %x", q_type);
-               ret = -EINVAL;
-               goto mmap_exit0;
+               break;
+
+       default:
+               ehca_gen_err("bad queue type %x", q_type);
+               return -EINVAL;
        }
 
-mmap_exit0:
-       EDEB_EX(7, "ret=%x", ret);
-       return ret;
+       return 0;
 }
 
-int ehca_mmap_nopage(u64 foffset, u64 length, void ** mapped,
-                    struct vm_area_struct ** vma)
+int ehca_mmap_nopage(u64 foffset, u64 length, void **mapped,
+                    struct vm_area_struct **vma)
 {
-       EDEB_EN(7, "foffset=%lx length=%lx", foffset, length);
        down_write(&current->mm->mmap_sem);
        *mapped = (void*)do_mmap(NULL,0, length, PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS,
                                 foffset);
        up_write(&current->mm->mmap_sem);
        if (!(*mapped)) {
-               EDEB_ERR(4, "couldn't mmap foffset=%lx length=%lx",
-                        foffset, length);
+               ehca_gen_err("couldn't mmap foffset=%lx length=%lx",
+                            foffset, length);
                return -EINVAL;
        }
 
@@ -342,49 +336,47 @@ int ehca_mmap_nopage(u64 foffset, u64 le
                down_write(&current->mm->mmap_sem);
                do_munmap(current->mm, 0, length);
                up_write(&current->mm->mmap_sem);
-               EDEB_ERR(4, "couldn't find vma queue=%p", *mapped);
+               ehca_gen_err("couldn't find vma queue=%p", *mapped);
                return -EINVAL;
        }
        (*vma)->vm_flags |= VM_RESERVED;
        (*vma)->vm_ops = &ehcau_vm_ops;
 
-       EDEB_EX(7, "mapped=%p", *mapped);
        return 0;
 }
 
-int ehca_mmap_register(u64 physical, void ** mapped,
-                      struct vm_area_struct ** vma)
+int ehca_mmap_register(u64 physical, void **mapped,
+                      struct vm_area_struct **vma)
 {
-       int ret = 0;
+       int ret;
        unsigned long vsize;
        /* ehca hw supports only 4k page */
        ret = ehca_mmap_nopage(0, EHCA_PAGESIZE, mapped, vma);
        if (ret) {
-               EDEB(4, "could'nt mmap physical=%lx", physical);
+               ehca_gen_err("could'nt mmap physical=%lx", physical);
                return ret;
        }
 
        (*vma)->vm_flags |= VM_RESERVED;
        vsize = (*vma)->vm_end - (*vma)->vm_start;
        if (vsize != EHCA_PAGESIZE) {
-               EDEB_ERR(4, "invalid vsize=%lx",
-                        (*vma)->vm_end - (*vma)->vm_start);
-               ret = -EINVAL;
-               return ret;
+               ehca_gen_err("invalid vsize=%lx",
+                            (*vma)->vm_end - (*vma)->vm_start);
+               return -EINVAL;
        }
 
        (*vma)->vm_page_prot = pgprot_noncached((*vma)->vm_page_prot);
        (*vma)->vm_flags |= VM_IO | VM_RESERVED;
 
-       EDEB(6, "vsize=%lx physical=%lx", vsize, physical);
        ret = remap_pfn_range((*vma), (*vma)->vm_start,
                              physical >> PAGE_SHIFT, vsize,
                              (*vma)->vm_page_prot);
        if (ret) {
-               EDEB_ERR(4, "remap_pfn_range() failed ret=%x", ret);
-               ret = -ENOMEM;
+               ehca_gen_err("remap_pfn_range() failed ret=%x", ret);
+               return -ENOMEM;
        }
-       return ret;
+
+       return 0;
 
 }
 
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/hcp_if.c linux-2.6/drivers/infiniband/hw/ehca/hcp_if.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/hcp_if.c  2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/hcp_if.c       2006-08-30 20:00:17.000000000 +0200
@@ -41,13 +41,12 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-#define DEB_PREFIX "hcpi"
-
 #include <asm/hvcall.h>
 #include "ehca_tools.h"
 #include "hcp_if.h"
 #include "hcp_phyp.h"
 #include "hipz_fns.h"
+#include "ipz_pt_fn.h"
 
 #define H_ALL_RES_QP_ENHANCED_OPS       EHCA_BMASK_IBM(9,11)
 #define H_ALL_RES_QP_PTE_PIN            EHCA_BMASK_IBM(12,12)
@@ -112,12 +111,12 @@ static long ehca_hcall_7arg_7ret(unsigne
                                 unsigned long *out6,
                                 unsigned long *out7)
 {
-       long ret = H_SUCCESS;
+       long ret;
        int i, sleep_msecs;
 
-       EDEB_EN(7, "opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx arg5=%lx"
-               " arg6=%lx arg7=%lx", opcode, arg1, arg2, arg3, arg4, arg5,
-               arg6, arg7);
+       ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx arg5=%lx "
+                    "arg6=%lx arg7=%lx", opcode, arg1, arg2, arg3, arg4, arg5,
+                    arg6, arg7);
 
        for (i = 0; i < 5; i++) {
                ret = plpar_hcall_7arg_7ret(opcode,
@@ -133,26 +132,24 @@ static long ehca_hcall_7arg_7ret(unsigne
                }
 
                if (ret < H_SUCCESS)
-                       EDEB_ERR(4, "opcode=%lx ret=%lx"
-                                " arg1=%lx arg2=%lx arg3=%lx arg4=%lx"
-                                " arg5=%lx arg6=%lx arg7=%lx"
-                                " out1=%lx out2=%lx out3=%lx out4=%lx"
-                                " out5=%lx out6=%lx out7=%lx",
-                                opcode, ret,
-                                arg1, arg2, arg3, arg4,
-                                arg5, arg6, arg7,
-                                *out1, *out2, *out3, *out4,
-                                *out5, *out6, *out7);
-
-               EDEB_EX(7, "opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
-                       "out4=%lx out5=%lx out6=%lx out7=%lx",
-                       opcode, ret, *out1, *out2, *out3, *out4, *out5,
-                       *out6, *out7);
+                       ehca_gen_err("opcode=%lx ret=%lx"
+                                    " arg1=%lx arg2=%lx arg3=%lx arg4=%lx"
+                                    " arg5=%lx arg6=%lx arg7=%lx"
+                                    " out1=%lx out2=%lx out3=%lx out4=%lx"
+                                    " out5=%lx out6=%lx out7=%lx",
+                                    opcode, ret,
+                                    arg1, arg2, arg3, arg4,
+                                    arg5, arg6, arg7,
+                                    *out1, *out2, *out3, *out4,
+                                    *out5, *out6, *out7);
+
+               ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
+                            "out4=%lx out5=%lx out6=%lx out7=%lx",
+                            opcode, ret, *out1, *out2, *out3, *out4, *out5,
+                            *out6, *out7);
                return ret;
        }
 
-       EDEB_EX(7, "opcode=%lx ret=H_BUSY", opcode);
-
        return H_BUSY;
 }
 
@@ -176,14 +173,13 @@ static long ehca_hcall_9arg_9ret(unsigne
                                 unsigned long *out8,
                                 unsigned long *out9)
 {
-       long ret = H_SUCCESS;
+       long ret;
        int i, sleep_msecs;
 
-       EDEB_EN(7, "opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx "
-               "arg5=%lx arg6=%lx arg7=%lx arg8=%lx arg9=%lx",
-               opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
-               arg8, arg9);
-
+       ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx "
+                    "arg5=%lx arg6=%lx arg7=%lx arg8=%lx arg9=%lx",
+                    opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7,
+                    arg8, arg9);
 
        for (i = 0; i < 5; i++) {
                ret = plpar_hcall_9arg_9ret(opcode,
@@ -201,32 +197,32 @@ static long ehca_hcall_9arg_9ret(unsigne
                }
 
                if (ret < H_SUCCESS)
-                       EDEB_ERR(4, "opcode=%lx ret=%lx"
-                                " arg1=%lx arg2=%lx arg3=%lx arg4=%lx"
-                                " arg5=%lx arg6=%lx arg7=%lx arg8=%lx"
-                                " arg9=%lx"
-                                " out1=%lx out2=%lx out3=%lx out4=%lx"
-                                " out5=%lx out6=%lx out7=%lx out8=%lx"
-                                " out9=%lx",
-                                opcode, ret,
-                                arg1, arg2, arg3, arg4,
-                                arg5, arg6, arg7, arg8,
-                                arg9,
-                                *out1, *out2, *out3, *out4,
-                                *out5, *out6, *out7, *out8,
-                                *out9);
-
-               EDEB_EX(7, "opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
-                       "out4=%lx out5=%lx out6=%lx out7=%lx out8=%lx out9=%lx",
-                       opcode, ret,*out1, *out2, *out3, *out4, *out5, *out6,
-                       *out7, *out8, *out9);
+                       ehca_gen_err("opcode=%lx ret=%lx"
+                                    " arg1=%lx arg2=%lx arg3=%lx arg4=%lx"
+                                    " arg5=%lx arg6=%lx arg7=%lx arg8=%lx"
+                                    " arg9=%lx"
+                                    " out1=%lx out2=%lx out3=%lx out4=%lx"
+                                    " out5=%lx out6=%lx out7=%lx out8=%lx"
+                                    " out9=%lx",
+                                    opcode, ret,
+                                    arg1, arg2, arg3, arg4,
+                                    arg5, arg6, arg7, arg8,
+                                    arg9,
+                                    *out1, *out2, *out3, *out4,
+                                    *out5, *out6, *out7, *out8,
+                                    *out9);
+
+               ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
+                            "out4=%lx out5=%lx out6=%lx out7=%lx out8=%lx "
+                            "out9=%lx", opcode, ret,*out1, *out2, *out3, *out4,
+                            *out5, *out6, *out7, *out8, *out9);
                return ret;
 
        }
 
-       EDEB_EX(7, "opcode=%lx ret=H_BUSY", opcode);
        return H_BUSY;
 }
+
 u64 hipz_h_alloc_resource_eq(const struct ipz_adapter_handle adapter_handle,
                             struct ehca_pfeq *pfeq,
                             const u32 neq_control,
@@ -236,18 +232,10 @@ u64 hipz_h_alloc_resource_eq(const struc
                             u32 * act_pages,
                             u32 * eq_ist)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
        u64 dummy;
-       u64 act_nr_of_entries_out = 0;
-       u64 act_pages_out         = 0;
-       u64 eq_ist_out            = 0;
-       u64 allocate_controls     = 0;
-       u32 x = (u64)(&x);
-
-       EDEB_EN(7, "pfeq=%p adapter_handle=%lx  new_control=%x"
-               " number_of_entries=%x",
-               pfeq, adapter_handle.handle, neq_control,
-               number_of_entries);
+       u64 allocate_controls;
+       u64 act_nr_of_entries_out, act_pages_out, eq_ist_out;
 
        /* resource type */
        allocate_controls = 3ULL;
@@ -276,10 +264,7 @@ u64 hipz_h_alloc_resource_eq(const struc
        *eq_ist            = (u32)eq_ist_out;
 
        if (ret == H_NOT_ENOUGH_RESOURCES)
-               EDEB_ERR(4, "Not enough resource - ret=%lx ", ret);
-
-       EDEB_EX(7, "act_nr_of_entries=%x act_pages=%x eq_ist=%x",
-               *act_nr_of_entries, *act_pages, *eq_ist);
+               ehca_gen_err("Not enough resource - ret=%lx ", ret);
 
        return ret;
 }
@@ -288,45 +273,30 @@ u64 hipz_h_reset_event(const struct ipz_
                       struct ipz_eq_handle eq_handle,
                       const u64 event_mask)
 {
-       u64 ret = H_SUCCESS;
        u64 dummy;
 
-       EDEB_EN(7, "eq_handle=%lx, adapter_handle=%lx  event_mask=%lx",
-               eq_handle.handle, adapter_handle.handle, event_mask);
-
-       ret = ehca_hcall_7arg_7ret(H_RESET_EVENTS,
-                                  adapter_handle.handle, /* r4 */
-                                  eq_handle.handle,      /* r5 */
-                                  event_mask,            /* r6 */
-                                  0, 0, 0, 0,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy);
-
-       EDEB(7, "ret=%lx", ret);
-
-       return ret;
+       return ehca_hcall_7arg_7ret(H_RESET_EVENTS,
+                                   adapter_handle.handle, /* r4 */
+                                   eq_handle.handle,      /* r5 */
+                                   event_mask,            /* r6 */
+                                   0, 0, 0, 0,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy);
 }
 
 u64 hipz_h_alloc_resource_cq(const struct ipz_adapter_handle adapter_handle,
                             struct ehca_cq *cq,
                             struct ehca_alloc_cq_parms *param)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
        u64 dummy;
-       u64 act_nr_of_entries_out;
-       u64 act_pages_out;
-       u64 g_la_privileged_out;
-       u64 g_la_user_out;
-
-       EDEB_EN(7, "Adapter_handle=%lx eq_handle=%lx cq_token=%x"
-               " cq_number_of_entries=%x",
-               adapter_handle.handle, param->eq_handle.handle,
-               cq->token, param->nr_cqe);
+       u64 act_nr_of_entries_out, act_pages_out;
+       u64 g_la_privileged_out, g_la_user_out;
 
        ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
                                   adapter_handle.handle,     /* r4  */
@@ -350,10 +320,7 @@ u64 hipz_h_alloc_resource_cq(const struc
                hcp_galpas_ctor(&cq->galpas, g_la_privileged_out, g_la_user_out);
 
        if (ret == H_NOT_ENOUGH_RESOURCES)
-               EDEB_ERR(4, "Not enough resources. ret=%lx", ret);
-
-       EDEB_EX(7, "cq_handle=%lx act_nr_of_entries=%x act_pages=%x",
-               cq->ipz_cq_handle.handle, param->act_nr_of_entries, param->act_pages);
+               ehca_gen_err("Not enough resources. ret=%lx", ret);
 
        return ret;
 }
@@ -362,32 +329,13 @@ u64 hipz_h_alloc_resource_qp(const struc
                             struct ehca_qp *qp,
                             struct ehca_alloc_qp_parms *parms)
 {
-       u64 ret = H_SUCCESS;
-       u64 allocate_controls;
-       u64 max_r10_reg;
-       u64 dummy         = 0;
-       u64 qp_nr_out     = 0;
-       u64 r6_out        = 0;
-       u64 r7_out        = 0;
-       u64 r8_out        = 0;
-       u64 g_la_user_out = 0;
-       u64 r11_out       = 0;
+       u64 ret;
+       u64 dummy, allocate_controls, max_r10_reg;
+       u64 qp_nr_out, r6_out, r7_out, r8_out, g_la_user_out, r11_out;
        u16 max_nr_receive_wqes = qp->init_attr.cap.max_recv_wr + 1;
        u16 max_nr_send_wqes = qp->init_attr.cap.max_send_wr + 1;
        int daqp_ctrl = parms->daqp_ctrl;
 
-       EDEB_EN(7, "Adapter_handle=%lx servicetype=%x signalingtype=%x"
-               " ud_av_l_key=%x send_cq_handle=%lx receive_cq_handle=%lx"
-               " async_eq_handle=%lx qp_token=%x pd=%x max_nr_send_wqes=%x"
-               " max_nr_receive_wqes=%x max_nr_send_sges=%x"
-               " max_nr_receive_sges=%x ud_av_l_key=%x galpa.pid=%x",
-               adapter_handle.handle, parms->servicetype, parms->sigtype,
-               parms->ud_av_l_key_ctl, qp->send_cq->ipz_cq_handle.handle,
-               qp->recv_cq->ipz_cq_handle.handle, parms->ipz_eq_handle.handle,
-               qp->token, parms->pd.value, max_nr_send_wqes,
-               max_nr_receive_wqes, parms->max_send_sge, parms->max_recv_sge,
-               parms->ud_av_l_key_ctl, qp->galpas.pid);
-
        allocate_controls =
                EHCA_BMASK_SET(H_ALL_RES_QP_ENHANCED_OPS,
                               (daqp_ctrl & DAQP_CTRL_ENABLE) ? 1 : 0)
@@ -453,17 +401,7 @@ u64 hipz_h_alloc_resource_qp(const struc
                hcp_galpas_ctor(&qp->galpas, g_la_user_out, g_la_user_out);
 
        if (ret == H_NOT_ENOUGH_RESOURCES)
-               EDEB_ERR(4, "Not enough resources. ret=%lx",ret);
-
-       EDEB_EX(7, "qp_nr=%x act_nr_send_wqes=%x"
-               " act_nr_receive_wqes=%x act_nr_send_sges=%x"
-               " act_nr_receive_sges=%x nr_sq_pages=%x"
-               " nr_rq_pages=%x galpa.user=%lx galpa.kernel=%lx",
-               qp->real_qp_num, parms->act_nr_send_wqes,
-               parms->act_nr_recv_wqes, parms->act_nr_send_sges,
-               parms->act_nr_recv_sges, parms->nr_sq_pages,
-               parms->nr_rq_pages, qp->galpas.user.fw_handle,
-               qp->galpas.kernel.fw_handle);
+               ehca_gen_err("Not enough resources. ret=%lx",ret);
 
        return ret;
 }
@@ -472,20 +410,15 @@ u64 hipz_h_query_port(const struct ipz_a
                      const u8 port_id,
                      struct hipz_query_port *query_port_response_block)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
        u64 dummy;
-       u64 r_cb;
-
-       EDEB_EN(7, "adapter_handle=%lx port_id %x",
-               adapter_handle.handle, port_id);
+       u64 r_cb = virt_to_abs(query_port_response_block);
 
-       if (((u64)query_port_response_block) & 0xfff) {
-               EDEB_ERR(4, "response block not page aligned");
+       if (r_cb & (EHCA_PAGESIZE-1)) {
+               ehca_gen_err("response block not page aligned");
                return H_PARAMETER;
        }
 
-       r_cb = virt_to_abs(query_port_response_block);
-
        ret = ehca_hcall_7arg_7ret(H_QUERY_PORT,
                                   adapter_handle.handle, /* r4 */
                                   port_id,               /* r5 */
@@ -499,19 +432,8 @@ u64 hipz_h_query_port(const struct ipz_a
                                   &dummy,
                                   &dummy);
 
-       EDEB_DMP(7, query_port_response_block, 64, "query_port_response_block");
-       EDEB(7, "offset31=%x offset35=%x offset36=%x",
-            ((u32*)query_port_response_block)[32],
-            ((u32*)query_port_response_block)[36],
-            ((u32*)query_port_response_block)[37]);
-       EDEB(7, "offset200=%x offset201=%x offset202=%x "
-            "offset203=%x",
-            ((u32*)query_port_response_block)[0x200],
-            ((u32*)query_port_response_block)[0x201],
-            ((u32*)query_port_response_block)[0x202],
-            ((u32*)query_port_response_block)[0x203]);
-
-       EDEB_EX(7, "ret=%lx", ret);
+       if (ehca_debug_level)
+               ehca_dmp(query_port_response_block, 64, "response_block");
 
        return ret;
 }
@@ -519,62 +441,26 @@ u64 hipz_h_query_port(const struct ipz_a
 u64 hipz_h_query_hca(const struct ipz_adapter_handle adapter_handle,
                     struct hipz_query_hca *query_hca_rblock)
 {
-       u64 ret = H_SUCCESS;
        u64 dummy;
-       u64 r_cb;
-       EDEB_EN(7, "adapter_handle=%lx", adapter_handle.handle);
+       u64 r_cb = virt_to_abs(query_hca_rblock);
 
-       if (((u64)query_hca_rblock) & 0xfff) {
-               EDEB_ERR(4, "response_block=%p not page aligned",
-                        query_hca_rblock);
+       if (r_cb & (EHCA_PAGESIZE-1)) {
+               ehca_gen_err("response_block=%p not page aligned",
+                            query_hca_rblock);
                return H_PARAMETER;
        }
 
-       r_cb = virt_to_abs(query_hca_rblock);
-
-       ret = ehca_hcall_7arg_7ret(H_QUERY_HCA,
-                                  adapter_handle.handle, /* r4 */
-                                  r_cb,                  /* r5 */
-                                  0, 0, 0, 0, 0,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy);
-
-       EDEB(7, "offset0=%x offset1=%x offset2=%x offset3=%x",
-            ((u32*)query_hca_rblock)[0],
-            ((u32*)query_hca_rblock)[1],
-            ((u32*)query_hca_rblock)[2], ((u32*)query_hca_rblock)[3]);
-       EDEB(7, "offset4=%x offset5=%x offset6=%x offset7=%x",
-            ((u32*)query_hca_rblock)[4],
-            ((u32*)query_hca_rblock)[5],
-            ((u32*)query_hca_rblock)[6], ((u32*)query_hca_rblock)[7]);
-       EDEB(7, "offset8=%x offset9=%x offseta=%x offsetb=%x",
-            ((u32*)query_hca_rblock)[8],
-            ((u32*)query_hca_rblock)[9],
-            ((u32*)query_hca_rblock)[10], ((u32*)query_hca_rblock)[11]);
-       EDEB(7, "offsetc=%x offsetd=%x offsete=%x offsetf=%x",
-            ((u32*)query_hca_rblock)[12],
-            ((u32*)query_hca_rblock)[13],
-            ((u32*)query_hca_rblock)[14], ((u32*)query_hca_rblock)[15]);
-       EDEB(7, "offset136=%x offset192=%x offset204=%x",
-            ((u32*)query_hca_rblock)[32],
-            ((u32*)query_hca_rblock)[48], ((u32*)query_hca_rblock)[51]);
-       EDEB(7, "offset231=%x offset235=%x",
-            ((u32*)query_hca_rblock)[57], ((u32*)query_hca_rblock)[58]);
-       EDEB(7, "offset200=%x offset201=%x offset202=%x offset203=%x",
-            ((u32*)query_hca_rblock)[0x201],
-            ((u32*)query_hca_rblock)[0x202],
-            ((u32*)query_hca_rblock)[0x203],
-            ((u32*)query_hca_rblock)[0x204]);
-
-       EDEB_EX(7, "ret=%lx adapter_handle=%lx",
-               ret, adapter_handle.handle);
-
-       return ret;
+       return ehca_hcall_7arg_7ret(H_QUERY_HCA,
+                                   adapter_handle.handle, /* r4 */
+                                   r_cb,                  /* r5 */
+                                   0, 0, 0, 0, 0,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy);
 }
 
 u64 hipz_h_register_rpage(const struct ipz_adapter_handle adapter_handle,
@@ -584,32 +470,22 @@ u64 hipz_h_register_rpage(const struct i
                          const u64 logical_address_of_page,
                          u64 count)
 {
-       u64 ret = H_SUCCESS;
        u64 dummy;
 
-       EDEB_EN(7, "adapter_handle=%lx pagesize=%x queue_type=%x"
-               " resource_handle=%lx logical_address_of_page=%lx count=%lx",
-               adapter_handle.handle, pagesize, queue_type,
-               resource_handle, logical_address_of_page, count);
-
-       ret = ehca_hcall_7arg_7ret(H_REGISTER_RPAGES,
-                                  adapter_handle.handle,      /* r4  */
-                                  queue_type | pagesize << 8, /* r5  */
-                                  resource_handle,            /* r6  */
-                                  logical_address_of_page,    /* r7  */
-                                  count,                      /* r8  */
-                                  0, 0,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy);
-
-       EDEB_EX(7, "ret=%lx", ret);
-
-       return ret;
+       return ehca_hcall_7arg_7ret(H_REGISTER_RPAGES,
+                                   adapter_handle.handle,      /* r4  */
+                                   queue_type | pagesize << 8, /* r5  */
+                                   resource_handle,            /* r6  */
+                                   logical_address_of_page,    /* r7  */
+                                   count,                      /* r8  */
+                                   0, 0,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy);
 }
 
 u64 hipz_h_register_rpage_eq(const struct ipz_adapter_handle adapter_handle,
@@ -620,34 +496,22 @@ u64 hipz_h_register_rpage_eq(const struc
                             const u64 logical_address_of_page,
                             const u64 count)
 {
-       u64 ret = H_SUCCESS;
-
-       EDEB_EN(7, "pfeq=%p adapter_handle=%lx eq_handle=%lx pagesize=%x"
-               " queue_type=%x logical_address_of_page=%lx count=%lx",
-               pfeq, adapter_handle.handle, eq_handle.handle, pagesize,
-               queue_type,logical_address_of_page, count);
-
        if (count != 1) {
-               EDEB_ERR(4, "Ppage counter=%lx", count);
+               ehca_gen_err("Ppage counter=%lx", count);
                return H_PARAMETER;
        }
-       ret = hipz_h_register_rpage(adapter_handle,
-                                   pagesize,
-                                   queue_type,
-                                   eq_handle.handle,
-                                   logical_address_of_page, count);
-       EDEB_EX(7, "ret=%lx", ret);
-
-       return ret;
+       return hipz_h_register_rpage(adapter_handle,
+                                    pagesize,
+                                    queue_type,
+                                    eq_handle.handle,
+                                    logical_address_of_page, count);
 }
 
 u32 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle,
                           u32 ist)
 {
-       u32 ret = H_SUCCESS;
-       u64 dummy = 0;
-
-       EDEB_EN(7, "ist=%x", ist);
+       u32 ret;
+       u64 dummy;
 
        ret = ehca_hcall_7arg_7ret(H_QUERY_INT_STATE,
                                   adapter_handle.handle, /* r4 */
@@ -662,9 +526,7 @@ u32 hipz_h_query_int_state(const struct 
                                   &dummy);
 
        if (ret != H_SUCCESS && ret != H_BUSY)
-               EDEB_ERR(4, "Could not query interrupt state.");
-
-       EDEB_EX(7, "interrupt state: %x", ret);
+               ehca_gen_err("Could not query interrupt state.");
 
        return ret;
 }
@@ -678,24 +540,14 @@ u64 hipz_h_register_rpage_cq(const struc
                             const u64 count,
                             const struct h_galpa gal)
 {
-       u64 ret = H_SUCCESS;
-
-       EDEB_EN(7, "pfcq=%p adapter_handle=%lx cq_handle=%lx pagesize=%x"
-               " queue_type=%x logical_address_of_page=%lx count=%lx",
-               pfcq, adapter_handle.handle, cq_handle.handle, pagesize,
-               queue_type, logical_address_of_page, count);
-
        if (count != 1) {
-               EDEB_ERR(4, "Page counter=%lx", count);
+               ehca_gen_err("Page counter=%lx", count);
                return H_PARAMETER;
        }
 
-       ret = hipz_h_register_rpage(adapter_handle, pagesize, queue_type,
-                                   cq_handle.handle, logical_address_of_page,
-                                   count);
-       EDEB_EX(7, "ret=%lx", ret);
-
-       return ret;
+       return hipz_h_register_rpage(adapter_handle, pagesize, queue_type,
+                                    cq_handle.handle, logical_address_of_page,
+                                    count);
 }
 
 u64 hipz_h_register_rpage_qp(const struct ipz_adapter_handle adapter_handle,
@@ -707,24 +559,14 @@ u64 hipz_h_register_rpage_qp(const struc
                             const u64 count,
                             const struct h_galpa galpa)
 {
-       u64 ret = H_SUCCESS;
-
-       EDEB_EN(7, "pfqp=%p adapter_handle=%lx qp_handle=%lx pagesize=%x"
-               " queue_type=%x logical_address_of_page=%lx count=%lx",
-               pfqp, adapter_handle.handle, qp_handle.handle, pagesize,
-               queue_type, logical_address_of_page, count);
-
        if (count != 1) {
-               EDEB_ERR(4, "Page counter=%lx", count);
+               ehca_gen_err("Page counter=%lx", count);
                return H_PARAMETER;
        }
 
-       ret = hipz_h_register_rpage(adapter_handle,pagesize,queue_type,
-                                   qp_handle.handle,logical_address_of_page,
-                                   count);
-       EDEB_EX(7, "ret=%lx", ret);
-
-       return ret;
+       return hipz_h_register_rpage(adapter_handle,pagesize,queue_type,
+                                    qp_handle.handle,logical_address_of_page,
+                                    count);
 }
 
 u64 hipz_h_disable_and_get_wqe(const struct ipz_adapter_handle adapter_handle,
@@ -734,36 +576,25 @@ u64 hipz_h_disable_and_get_wqe(const str
                               void **log_addr_next_rq_wqe2processed,
                               int dis_and_get_function_code)
 {
-       u64 ret = H_SUCCESS;
-       u8 function_code = 1;
        u64 dummy, dummy1, dummy2;
 
-       EDEB_EN(7, "pfqp=%p adapter_handle=%lx function=%x qp_handle=%lx",
-               pfqp, adapter_handle.handle, function_code, qp_handle.handle);
-
        if (!log_addr_next_sq_wqe2processed)
                log_addr_next_sq_wqe2processed = (void**)&dummy1;
        if (!log_addr_next_rq_wqe2processed)
                log_addr_next_rq_wqe2processed = (void**)&dummy2;
 
-       ret = ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
-                                  adapter_handle.handle,     /* r4 */
-                                  dis_and_get_function_code, /* r5 */
-                                  qp_handle.handle,          /* r6 */
-                                  0, 0, 0, 0,
-                                  (void*)log_addr_next_sq_wqe2processed,
-                                  (void*)log_addr_next_rq_wqe2processed,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy);
-       EDEB_EX(7, "ret=%lx ladr_next_rq_wqe_out=%p"
-               " ladr_next_sq_wqe_out=%p", ret,
-               *log_addr_next_sq_wqe2processed,
-               *log_addr_next_rq_wqe2processed);
-
-       return ret;
+       return ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
+                                   adapter_handle.handle,     /* r4 */
+                                   dis_and_get_function_code, /* r5 */
+                                   qp_handle.handle,          /* r6 */
+                                   0, 0, 0, 0,
+                                   (void*)log_addr_next_sq_wqe2processed,
+                                   (void*)log_addr_next_rq_wqe2processed,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy);
 }
 
 u64 hipz_h_modify_qp(const struct ipz_adapter_handle adapter_handle,
@@ -773,22 +604,15 @@ u64 hipz_h_modify_qp(const struct ipz_ad
                     struct hcp_modify_qp_control_block *mqpcb,
                     struct h_galpa gal)
 {
-       u64 ret = H_SUCCESS;
-       u64 invalid_attribute_identifier = 0;
-       u64 rc_attrib_mask = 0;
-       u64 dummy;
-       u64 r_cb;
-       EDEB_EN(7, "pfqp=%p adapter_handle=%lx qp_handle=%lx"
-               " update_mask=%lx qp_state=%x mqpcb=%p",
-               pfqp, adapter_handle.handle, qp_handle.handle,
-               update_mask, mqpcb->qp_state, mqpcb);
+       u64 ret;
+       u64 dummy;
+       u64 invalid_attribute_identifier, rc_attrib_mask;
 
-       r_cb = virt_to_abs(mqpcb);
        ret = ehca_hcall_7arg_7ret(H_MODIFY_QP,
                                   adapter_handle.handle,         /* r4 */
                                   qp_handle.handle,              /* r5 */
                                   update_mask,                   /* r6 */
-                                  r_cb,                          /* r7 */
+                                  virt_to_abs(mqpcb),            /* r7 */
                                   0, 0, 0,
                                   &invalid_attribute_identifier, /* r4 */
                                   &dummy,                        /* r5 */
@@ -797,12 +621,9 @@ u64 hipz_h_modify_qp(const struct ipz_ad
                                   &dummy,                        /* r8 */
                                   &rc_attrib_mask,               /* r9 */
                                   &dummy);
-       if (ret == H_NOT_ENOUGH_RESOURCES)
-               EDEB_ERR(4, "Insufficient resources ret=%lx", ret);
 
-       EDEB_EX(7, "ret=%lx invalid_attribute_identifier=%lx"
-               " invalid_attribute_MASK=%lx", ret,
-               invalid_attribute_identifier, rc_attrib_mask);
+       if (ret == H_NOT_ENOUGH_RESOURCES)
+               ehca_gen_err("Insufficient resources ret=%lx", ret);
 
        return ret;
 }
@@ -813,47 +634,32 @@ u64 hipz_h_query_qp(const struct ipz_ada
                    struct hcp_modify_qp_control_block *qqpcb,
                    struct h_galpa gal)
 {
-       u64 ret = H_SUCCESS;
        u64 dummy;
-       u64 r_cb;
-       EDEB_EN(7, "adapter_handle=%lx qp_handle=%lx",
-               adapter_handle.handle, qp_handle.handle);
-
-       r_cb = virt_to_abs(qqpcb);
-       EDEB(7, "r_cb=%lx", r_cb);
-
-       ret = ehca_hcall_7arg_7ret(H_QUERY_QP,
-                                  adapter_handle.handle, /* r4 */
-                                  qp_handle.handle,      /* r5 */
-                                  r_cb,                  /* r6 */
-                                  0, 0, 0, 0,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy);
-
-       EDEB_EX(7, "ret=%lx", ret);
 
-       return ret;
+       return ehca_hcall_7arg_7ret(H_QUERY_QP,
+                                   adapter_handle.handle, /* r4 */
+                                   qp_handle.handle,      /* r5 */
+                                   virt_to_abs(qqpcb),    /* r6 */
+                                   0, 0, 0, 0,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy);
 }
 
 u64 hipz_h_destroy_qp(const struct ipz_adapter_handle adapter_handle,
                      struct ehca_qp *qp)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
        u64 dummy;
-       u64 ladr_next_sq_wqe_out;
-       u64 ladr_next_rq_wqe_out;
-
-       EDEB_EN(7, "qp=%p ipz_qp_handle=%lx adapter_handle=%lx",
-               qp, qp->ipz_qp_handle.handle, adapter_handle.handle);
+       u64 ladr_next_sq_wqe_out, ladr_next_rq_wqe_out;
 
        ret = hcp_galpas_dtor(&qp->galpas);
        if (ret) {
-               EDEB_ERR(4, "Could not destruct qp->galpas");
+               ehca_gen_err("Could not destruct qp->galpas");
                return H_RESOURCE;
        }
        ret = ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
@@ -870,7 +676,7 @@ u64 hipz_h_destroy_qp(const struct ipz_a
                                   &dummy,
                                   &dummy);
        if (ret == H_HARDWARE)
-               EDEB_ERR(4, "HCA not operational. ret=%lx", ret);
+               ehca_gen_err("HCA not operational. ret=%lx", ret);
 
        ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
                                   adapter_handle.handle,     /* r4 */
@@ -885,9 +691,7 @@ u64 hipz_h_destroy_qp(const struct ipz_a
                                   &dummy);
 
        if (ret == H_RESOURCE)
-               EDEB_ERR(4, "Resource still in use. ret=%lx", ret);
-
-       EDEB_EX(7, "ret=%lx", ret);
+               ehca_gen_err("Resource still in use. ret=%lx", ret);
 
        return ret;
 }
@@ -897,28 +701,20 @@ u64 hipz_h_define_aqp0(const struct ipz_
                       struct h_galpa gal,
                       u32 port)
 {
-       u64 ret = H_SUCCESS;
        u64 dummy;
 
-       EDEB_EN(7, "port=%x ipz_qp_handle=%lx adapter_handle=%lx",
-               port, qp_handle.handle, adapter_handle.handle);
-
-       ret = ehca_hcall_7arg_7ret(H_DEFINE_AQP0,
-                                  adapter_handle.handle, /* r4 */
-                                  qp_handle.handle,      /* r5 */
-                                  port,                  /* r6 */
-                                  0, 0, 0, 0,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy);
-
-       EDEB_EX(7, "ret=%lx", ret);
-
-       return ret;
+       return ehca_hcall_7arg_7ret(H_DEFINE_AQP0,
+                                   adapter_handle.handle, /* r4 */
+                                   qp_handle.handle,      /* r5 */
+                                   port,                  /* r6 */
+                                   0, 0, 0, 0,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy);
 }
 
 u64 hipz_h_define_aqp1(const struct ipz_adapter_handle adapter_handle,
@@ -927,13 +723,9 @@ u64 hipz_h_define_aqp1(const struct ipz_
                       u32 port, u32 * pma_qp_nr,
                       u32 * bma_qp_nr)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
        u64 dummy;
-       u64 pma_qp_nr_out;
-       u64 bma_qp_nr_out;
-
-       EDEB_EN(7, "port=%x qp_handle=%lx adapter_handle=%lx",
-               port, qp_handle.handle, adapter_handle.handle);
+       u64 pma_qp_nr_out, bma_qp_nr_out;
 
        ret = ehca_hcall_7arg_7ret(H_DEFINE_AQP1,
                                   adapter_handle.handle, /* r4 */
@@ -952,10 +744,7 @@ u64 hipz_h_define_aqp1(const struct ipz_
        *bma_qp_nr = (u32)bma_qp_nr_out;
 
        if (ret == H_ALIAS_EXIST)
-               EDEB_ERR(4, "AQP1 already exists. ret=%lx", ret);
-
-       EDEB_EX(7, "ret=%lx pma_qp_nr=%i bma_qp_nr=%i",
-               ret, (int)*pma_qp_nr, (int)*bma_qp_nr);
+               ehca_gen_err("AQP1 already exists. ret=%lx", ret);
 
        return ret;
 }
@@ -966,23 +755,8 @@ u64 hipz_h_attach_mcqp(const struct ipz_
                       u16 mcg_dlid,
                       u64 subnet_prefix, u64 interface_id)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
        u64 dummy;
-       u8 *dgid_sp = (u8*)&subnet_prefix;
-       u8 *dgid_ii = (u8*)&interface_id;
-
-       EDEB_EN(7, "qp_handle=%lx adapter_handle=%lx\nMCG_DGID ="
-               " %d.%d.%d.%d.%d.%d.%d.%d."
-               " %d.%d.%d.%d.%d.%d.%d.%d",
-               qp_handle.handle, adapter_handle.handle,
-               dgid_sp[0], dgid_sp[1],
-               dgid_sp[2], dgid_sp[3],
-               dgid_sp[4], dgid_sp[5],
-               dgid_sp[6], dgid_sp[7],
-               dgid_ii[0], dgid_ii[1],
-               dgid_ii[2], dgid_ii[3],
-               dgid_ii[4], dgid_ii[5],
-               dgid_ii[6], dgid_ii[7]);
 
        ret = ehca_hcall_7arg_7ret(H_ATTACH_MCQP,
                                   adapter_handle.handle,     /* r4 */
@@ -1000,9 +774,7 @@ u64 hipz_h_attach_mcqp(const struct ipz_
                                   &dummy);
 
        if (ret == H_NOT_ENOUGH_RESOURCES)
-               EDEB_ERR(4, "Not enough resources. ret=%lx", ret);
-
-       EDEB_EX(7, "ret=%lx", ret);
+               ehca_gen_err("Not enough resources. ret=%lx", ret);
 
        return ret;
 }
@@ -1013,56 +785,34 @@ u64 hipz_h_detach_mcqp(const struct ipz_
                       u16 mcg_dlid,
                       u64 subnet_prefix, u64 interface_id)
 {
-       u64 ret = H_SUCCESS;
        u64 dummy;
-       u8 *dgid_sp = (u8*)&subnet_prefix;
-       u8 *dgid_ii = (u8*)&interface_id;
 
-       EDEB_EN(7, "qp_handle=%lx adapter_handle=%lx\nMCG_DGID ="
-               " %d.%d.%d.%d.%d.%d.%d.%d."
-               " %d.%d.%d.%d.%d.%d.%d.%d",
-               qp_handle.handle, adapter_handle.handle,
-               dgid_sp[0], dgid_sp[1],
-               dgid_sp[2], dgid_sp[3],
-               dgid_sp[4], dgid_sp[5],
-               dgid_sp[6], dgid_sp[7],
-               dgid_ii[0], dgid_ii[1],
-               dgid_ii[2], dgid_ii[3],
-               dgid_ii[4], dgid_ii[5],
-               dgid_ii[6], dgid_ii[7]);
-       ret = ehca_hcall_7arg_7ret(H_DETACH_MCQP,
-                                  adapter_handle.handle, /* r4 */
-                                  qp_handle.handle,      /* r5 */
-                                  mcg_dlid,              /* r6 */
-                                  interface_id,          /* r7 */
-                                  subnet_prefix,         /* r8 */
-                                  0, 0,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy);
-
-       EDEB(7, "ret=%lx", ret);
-
-       return ret;
+       return ehca_hcall_7arg_7ret(H_DETACH_MCQP,
+                                   adapter_handle.handle, /* r4 */
+                                   qp_handle.handle,      /* r5 */
+                                   mcg_dlid,              /* r6 */
+                                   interface_id,          /* r7 */
+                                   subnet_prefix,         /* r8 */
+                                   0, 0,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy);
 }
 
 u64 hipz_h_destroy_cq(const struct ipz_adapter_handle adapter_handle,
                      struct ehca_cq *cq,
                      u8 force_flag)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
        u64 dummy;
 
-       EDEB_EN(7, "cq->pf=%p cq=.%p ipz_cq_handle=%lx adapter_handle=%lx",
-               &cq->pf, cq, cq->ipz_cq_handle.handle, adapter_handle.handle);
-
        ret = hcp_galpas_dtor(&cq->galpas);
        if (ret) {
-               EDEB_ERR(4, "Could not destruct cp->galpas");
+               ehca_gen_err("Could not destruct cp->galpas");
                return H_RESOURCE;
        }
 
@@ -1080,9 +830,7 @@ u64 hipz_h_destroy_cq(const struct ipz_a
                                   &dummy);
 
        if (ret == H_RESOURCE)
-               EDEB(4, "ret=%lx ", ret);
-
-       EDEB_EX(7, "ret=%lx", ret);
+               ehca_gen_err("H_FREE_RESOURCE failed ret=%lx ", ret);
 
        return ret;
 }
@@ -1090,16 +838,12 @@ u64 hipz_h_destroy_cq(const struct ipz_a
 u64 hipz_h_destroy_eq(const struct ipz_adapter_handle adapter_handle,
                      struct ehca_eq *eq)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
        u64 dummy;
 
-       EDEB_EN(7, "eq->pf=%p eq=%p ipz_eq_handle=%lx adapter_handle=%lx",
-               &eq->pf, eq, eq->ipz_eq_handle.handle,
-               adapter_handle.handle);
-
        ret = hcp_galpas_dtor(&eq->galpas);
        if (ret) {
-               EDEB_ERR(4, "Could not destruct eq->galpas");
+               ehca_gen_err("Could not destruct eq->galpas");
                return H_RESOURCE;
        }
 
@@ -1117,9 +861,7 @@ u64 hipz_h_destroy_eq(const struct ipz_a
 
 
        if (ret == H_RESOURCE)
-               EDEB_ERR(4, "Resource in use. ret=%lx ", ret);
-
-       EDEB_EX(7, "ret=%lx", ret);
+               ehca_gen_err("Resource in use. ret=%lx ", ret);
 
        return ret;
 }
@@ -1132,16 +874,11 @@ u64 hipz_h_alloc_resource_mr(const struc
                             const struct ipz_pd pd,
                             struct ehca_mr_hipzout_parms *outparms)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
        u64 dummy;
        u64 lkey_out;
        u64 rkey_out;
 
-       EDEB_EN(7, "adapter_handle=%lx mr=%p vaddr=%lx length=%lx"
-               " access_ctrl=%x pd=%x",
-               adapter_handle.handle, mr, vaddr, length, access_ctrl,
-               pd.value);
-
        ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
                                   adapter_handle.handle,            /* r4 */
                                   5,                                /* r5 */
@@ -1160,9 +897,6 @@ u64 hipz_h_alloc_resource_mr(const struc
        outparms->lkey = (u32)lkey_out;
        outparms->rkey = (u32)rkey_out;
 
-       EDEB_EX(7, "ret=%lx mr_handle=%lx lkey=%x rkey=%x",
-               ret, outparms->handle.handle, outparms->lkey, outparms->rkey);
-
        return ret;
 }
 
@@ -1173,27 +907,22 @@ u64 hipz_h_register_rpage_mr(const struc
                             const u64 logical_address_of_page,
                             const u64 count)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
 
-       EDEB_EN(7, "adapter_handle=%lx mr=%p mr_handle=%lx pagesize=%x"
-               " queue_type=%x logical_address_of_page=%lx count=%lx",
-               adapter_handle.handle, mr, mr->ipz_mr_handle.handle, pagesize,
-               queue_type, logical_address_of_page, count);
-
-       if ((count > 1) && (logical_address_of_page & 0xfff)) {
-               EDEB_ERR(4, "logical_address_of_page not on a 4k boundary "
-                        "adapter_handle=%lx mr=%p mr_handle=%lx "
-                        "pagesize=%x queue_type=%x logical_address_of_page=%lx"
-                        " count=%lx",
-                        adapter_handle.handle, mr, mr->ipz_mr_handle.handle,
-                        pagesize, queue_type, logical_address_of_page, count);
+       if ((count > 1) && (logical_address_of_page & (EHCA_PAGESIZE-1))) {
+               ehca_gen_err("logical_address_of_page not on a 4k boundary "
+                            "adapter_handle=%lx mr=%p mr_handle=%lx "
+                            "pagesize=%x queue_type=%x "
+                            "logical_address_of_page=%lx count=%lx",
+                            adapter_handle.handle, mr,
+                            mr->ipz_mr_handle.handle, pagesize, queue_type,
+                            logical_address_of_page, count);
                ret = H_PARAMETER;
        } else
                ret = hipz_h_register_rpage(adapter_handle, pagesize,
                                            queue_type,
                                            mr->ipz_mr_handle.handle,
                                             logical_address_of_page, count);
-       EDEB_EX(7, "ret=%lx", ret);
 
        return ret;
 }
@@ -1202,15 +931,9 @@ u64 hipz_h_query_mr(const struct ipz_ada
                    const struct ehca_mr *mr,
                    struct ehca_mr_hipzout_parms *outparms)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
        u64 dummy;
-       u64 remote_len_out;
-       u64 remote_vaddr_out;
-       u64 acc_ctrl_pd_out;
-       u64 r9_out;
-
-       EDEB_EN(7, "adapter_handle=%lx mr=%p mr_handle=%lx",
-               adapter_handle.handle, mr, mr->ipz_mr_handle.handle);
+       u64 remote_len_out, remote_vaddr_out, acc_ctrl_pd_out, r9_out;
 
        ret = ehca_hcall_7arg_7ret(H_QUERY_MR,
                                   adapter_handle.handle,     /* r4 */
@@ -1228,38 +951,25 @@ u64 hipz_h_query_mr(const struct ipz_ada
        outparms->lkey = (u32)(r9_out >> 32);
        outparms->rkey = (u32)(r9_out & (0xffffffff));
 
-       EDEB_EX(7, "ret=%lx mr_local_length=%lx mr_local_vaddr=%lx "
-               "mr_remote_length=%lx mr_remote_vaddr=%lx access_ctrl=%x "
-               "pd=%x lkey=%x rkey=%x", ret, outparms->len,
-               outparms->vaddr, remote_len_out, remote_vaddr_out,
-               outparms->acl, outparms->acl, outparms->lkey, outparms->rkey);
-
        return ret;
 }
 
 u64 hipz_h_free_resource_mr(const struct ipz_adapter_handle adapter_handle,
                            const struct ehca_mr *mr)
 {
-       u64 ret = H_SUCCESS;
        u64 dummy;
 
-       EDEB_EN(7, "adapter_handle=%lx mr=%p mr_handle=%lx",
-               adapter_handle.handle, mr, mr->ipz_mr_handle.handle);
-
-       ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-                                  adapter_handle.handle,    /* r4 */
-                                  mr->ipz_mr_handle.handle, /* r5 */
-                                  0, 0, 0, 0, 0,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy);
-       EDEB_EX(7, "ret=%lx", ret);
-
-       return ret;
+       return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
+                                   adapter_handle.handle,    /* r4 */
+                                   mr->ipz_mr_handle.handle, /* r5 */
+                                   0, 0, 0, 0, 0,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy);
 }
 
 u64 hipz_h_reregister_pmr(const struct ipz_adapter_handle adapter_handle,
@@ -1271,15 +981,9 @@ u64 hipz_h_reregister_pmr(const struct i
                          const u64 mr_addr_cb,
                          struct ehca_mr_hipzout_parms *outparms)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
        u64 dummy;
-       u64 lkey_out;
-       u64 rkey_out;
-
-       EDEB_EN(7, "adapter_handle=%lx mr=%p mr_handle=%lx vaddr_in=%lx "
-               "length=%lx access_ctrl=%x pd=%x mr_addr_cb=%lx",
-               adapter_handle.handle, mr, mr->ipz_mr_handle.handle, vaddr_in,
-               length, access_ctrl, pd.value, mr_addr_cb);
+       u64 lkey_out, rkey_out;
 
        ret = ehca_hcall_7arg_7ret(H_REREGISTER_PMR,
                                   adapter_handle.handle,    /* r4 */
@@ -1301,8 +1005,6 @@ u64 hipz_h_reregister_pmr(const struct i
        outparms->lkey = (u32)lkey_out;
        outparms->rkey = (u32)rkey_out;
 
-       EDEB_EX(7, "ret=%lx vaddr=%lx lkey=%x rkey=%x",
-               ret, outparms->vaddr, outparms->lkey, outparms->rkey);
        return ret;
 }
 
@@ -1314,16 +1016,9 @@ u64 hipz_h_register_smr(const struct ipz
                        const struct ipz_pd pd,
                        struct ehca_mr_hipzout_parms *outparms)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
        u64 dummy;
-       u64 lkey_out;
-       u64 rkey_out;
-
-       EDEB_EN(7, "adapter_handle=%lx orig_mr=%p orig_mr_handle=%lx "
-               "vaddr_in=%lx access_ctrl=%x pd=%x", adapter_handle.handle,
-               orig_mr, orig_mr->ipz_mr_handle.handle, vaddr_in, access_ctrl,
-               pd.value);
-
+       u64 lkey_out, rkey_out;
 
        ret = ehca_hcall_7arg_7ret(H_REGISTER_SMR,
                                   adapter_handle.handle,            /* r4 */
@@ -1342,9 +1037,6 @@ u64 hipz_h_register_smr(const struct ipz
        outparms->lkey = (u32)lkey_out;
        outparms->rkey = (u32)rkey_out;
 
-       EDEB_EX(7, "ret=%lx mr_handle=%lx lkey=%x rkey=%x",
-               ret, outparms->handle.handle, outparms->lkey, outparms->rkey);
-
        return ret;
 }
 
@@ -1353,13 +1045,10 @@ u64 hipz_h_alloc_resource_mw(const struc
                             const struct ipz_pd pd,
                             struct ehca_mw_hipzout_parms *outparms)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
        u64 dummy;
        u64 rkey_out;
 
-       EDEB_EN(7, "adapter_handle=%lx mw=%p pd=%x",
-               adapter_handle.handle, mw, pd.value);
-
        ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
                                   adapter_handle.handle,      /* r4 */
                                   6,                          /* r5 */
@@ -1375,8 +1064,6 @@ u64 hipz_h_alloc_resource_mw(const struc
 
        outparms->rkey = (u32)rkey_out;
 
-       EDEB_EX(7, "ret=%lx mw_handle=%lx rkey=%x",
-               ret, outparms->handle.handle, outparms->rkey);
        return ret;
 }
 
@@ -1384,13 +1071,9 @@ u64 hipz_h_query_mw(const struct ipz_ada
                    const struct ehca_mw *mw,
                    struct ehca_mw_hipzout_parms *outparms)
 {
-       u64 ret = H_SUCCESS;
+       u64 ret;
        u64 dummy;
-       u64 pd_out;
-       u64 rkey_out;
-
-       EDEB_EN(7, "adapter_handle=%lx mw=%p mw_handle=%lx",
-               adapter_handle.handle, mw, mw->ipz_mw_handle.handle);
+       u64 pd_out, rkey_out;
 
        ret = ehca_hcall_7arg_7ret(H_QUERY_MW,
                                   adapter_handle.handle,    /* r4 */
@@ -1405,34 +1088,25 @@ u64 hipz_h_query_mw(const struct ipz_ada
                                   &dummy);
        outparms->rkey = (u32)rkey_out;
 
-       EDEB_EX(7, "ret=%lx rkey=%x pd=%lx", ret, outparms->rkey, pd_out);
-
        return ret;
 }
 
 u64 hipz_h_free_resource_mw(const struct ipz_adapter_handle adapter_handle,
                            const struct ehca_mw *mw)
 {
-       u64 ret = H_SUCCESS;
        u64 dummy;
 
-       EDEB_EN(7, "adapter_handle=%lx mw=%p mw_handle=%lx",
-               adapter_handle.handle, mw, mw->ipz_mw_handle.handle);
-
-       ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-                                  adapter_handle.handle,    /* r4 */
-                                  mw->ipz_mw_handle.handle, /* r5 */
-                                  0, 0, 0, 0, 0,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy);
-       EDEB_EX(7, "ret=%lx", ret);
-
-       return ret;
+       return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
+                                   adapter_handle.handle,    /* r4 */
+                                   mw->ipz_mw_handle.handle, /* r5 */
+                                   0, 0, 0, 0, 0,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy);
 }
 
 u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle,
@@ -1440,34 +1114,24 @@ u64 hipz_h_error_data(const struct ipz_a
                      void *rblock,
                      unsigned long *byte_count)
 {
-       u64 ret = H_SUCCESS;
        u64 dummy;
-       u64 r_cb;
-
-       EDEB_EN(7, "adapter_handle=%lx ressource_handle=%lx rblock=%p",
-               adapter_handle.handle, ressource_handle, rblock);
+       u64 r_cb = virt_to_abs(rblock);
 
-       if (((u64)rblock) & 0xfff) {
-               EDEB_ERR(4, "rblock not page aligned.");
+       if (r_cb & (EHCA_PAGESIZE-1)) {
+               ehca_gen_err("rblock not page aligned.");
                return H_PARAMETER;
        }
 
-       r_cb = virt_to_abs(rblock);
-
-       ret = ehca_hcall_7arg_7ret(H_ERROR_DATA,
-                                  adapter_handle.handle,
-                                  ressource_handle,
-                                  r_cb,
-                                  0, 0, 0, 0,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy,
-                                  &dummy);
-
-       EDEB_EX(7, "ret=%lx", ret);
-
-       return ret;
+       return ehca_hcall_7arg_7ret(H_ERROR_DATA,
+                                   adapter_handle.handle,
+                                   ressource_handle,
+                                   r_cb,
+                                   0, 0, 0, 0,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy,
+                                   &dummy);
 }
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/hcp_phyp.c linux-2.6/drivers/infiniband/hw/ehca/hcp_phyp.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/hcp_phyp.c        2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/hcp_phyp.c     2006-08-30 20:00:16.000000000 +0200
@@ -39,22 +39,17 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-#define DEB_PREFIX "PHYP"
-
 #include "ehca_classes.h"
 #include "hipz_hw.h"
 
 int hcall_map_page(u64 physaddr, u64 *mapaddr)
 {
        *mapaddr = (u64)(ioremap(physaddr, EHCA_PAGESIZE));
-
-       EDEB(7, "ioremap physaddr=%lx mapaddr=%lx", physaddr, *mapaddr);
        return 0;
 }
 
 int hcall_unmap_page(u64 mapaddr)
 {
-       EDEB(7, "mapaddr=%lx", mapaddr);
        iounmap((volatile void __iomem*)mapaddr);
        return 0;
 }
@@ -68,25 +63,18 @@ int hcp_galpas_ctor(struct h_galpas *gal
 
        galpas->user.fw_handle = paddr_user;
 
-       EDEB(7, "paddr_kernel=%lx paddr_user=%lx galpas->kernel=%lx"
-            " galpas->user=%lx",
-            paddr_kernel, paddr_user, galpas->kernel.fw_handle,
-            galpas->user.fw_handle);
-
-       return ret;
+       return 0;
 }
 
 int hcp_galpas_dtor(struct h_galpas *galpas)
 {
-       int ret = 0;
-
-       if (galpas->kernel.fw_handle)
-               ret = hcall_unmap_page(galpas->kernel.fw_handle);
-
-       if (ret)
-               return ret;
+       if (galpas->kernel.fw_handle) {
+               int ret = hcall_unmap_page(galpas->kernel.fw_handle);
+               if (ret)
+                       return ret;
+       }
 
        galpas->user.fw_handle = galpas->kernel.fw_handle = 0;
 
-       return ret;
+       return 0;
 }
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/hcp_phyp.h linux-2.6/drivers/infiniband/hw/ehca/hcp_phyp.h
--- linux-2.6_orig/drivers/infiniband/hw/ehca/hcp_phyp.h        2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/hcp_phyp.h     2006-08-30 20:00:16.000000000 +0200
@@ -69,19 +69,13 @@ struct h_galpas {
 static inline u64 hipz_galpa_load(struct h_galpa galpa, u32 offset)
 {
        u64 addr = galpa.fw_handle + offset;
-       u64 out;
-       EDEB_EN(7, "addr=%lx offset=%x ", addr, offset);
-       out = *(u64 *) addr;
-       EDEB_EX(7, "addr=%lx value=%lx", addr, out);
-       return out;
+       return *(volatile u64 __force *)addr;
 }
 
 static inline void hipz_galpa_store(struct h_galpa galpa, u32 offset, u64 value)
 {
        u64 addr = galpa.fw_handle + offset;
-       EDEB(7, "addr=%lx offset=%x value=%lx", addr,
-            offset, value);
-       *(u64 *) addr = value;
+       *(volatile u64 __force *)addr = value;
 }
 
 int hcp_galpas_ctor(struct h_galpas *galpas,
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/hipz_fns_core.h linux-2.6/drivers/infiniband/hw/ehca/hipz_fns_core.h
--- linux-2.6_orig/drivers/infiniband/hw/ehca/hipz_fns_core.h   2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/hipz_fns_core.h        2006-08-30 20:00:16.000000000 +0200
@@ -60,63 +60,41 @@
 
 static inline void hipz_update_sqa(struct ehca_qp *qp, u16 nr_wqes)
 {
-       struct h_galpa gal;
-
-       EDEB_EN(7, "qp=%p", qp);
-       gal = qp->galpas.kernel;
        /*  ringing doorbell :-) */
-       hipz_galpa_store_qp(gal, qpx_sqa, EHCA_BMASK_SET(QPX_SQADDER, nr_wqes));
-       EDEB_EX(7, "qp=%p QPx_SQA = %i", qp, nr_wqes);
+       hipz_galpa_store_qp(qp->galpas.kernel, qpx_sqa,
+                           EHCA_BMASK_SET(QPX_SQADDER, nr_wqes));
 }
 
 static inline void hipz_update_rqa(struct ehca_qp *qp, u16 nr_wqes)
 {
-       struct h_galpa gal;
-
-       EDEB_EN(7, "qp=%p", qp);
-       gal = qp->galpas.kernel;
        /*  ringing doorbell :-) */
-       hipz_galpa_store_qp(gal, qpx_rqa, EHCA_BMASK_SET(QPX_RQADDER, nr_wqes));
-       EDEB_EX(7, "qp=%p QPx_RQA = %i", qp, nr_wqes);
+       hipz_galpa_store_qp(qp->galpas.kernel, qpx_rqa,
+                           EHCA_BMASK_SET(QPX_RQADDER, nr_wqes));
 }
 
 static inline void hipz_update_feca(struct ehca_cq *cq, u32 nr_cqes)
 {
-       struct h_galpa gal;
-
-       EDEB_EN(7, "cq=%p", cq);
-       gal = cq->galpas.kernel;
-       hipz_galpa_store_cq(gal, cqx_feca,
+       hipz_galpa_store_cq(cq->galpas.kernel, cqx_feca,
                            EHCA_BMASK_SET(CQX_FECADDER, nr_cqes));
-       EDEB_EX(7, "cq=%p CQx_FECA = %i", cq, nr_cqes);
 }
 
 static inline void hipz_set_cqx_n0(struct ehca_cq *cq, u32 value)
 {
-       struct h_galpa gal;
-       u64 CQx_N0_reg = 0;
+       u64 cqx_n0_reg;
 
-       EDEB_EN(7, "cq=%p event on solicited completion -- write CQx_N0", cq);
-       gal = cq->galpas.kernel;
-       hipz_galpa_store_cq(gal, cqx_n0,
+       hipz_galpa_store_cq(cq->galpas.kernel, cqx_n0,
 EHCA_BMASK_SET(CQX_N0_GENERATE_SOLICITED_COMP_EVENT,
                                           value));
-       CQx_N0_reg = hipz_galpa_load_cq(gal, cqx_n0);
-       EDEB_EX(7, "cq=%p loaded CQx_N0=%lx", cq, (unsigned long)CQx_N0_reg);
+       cqx_n0_reg = hipz_galpa_load_cq(cq->galpas.kernel, cqx_n0);
 }
 
 static inline void hipz_set_cqx_n1(struct ehca_cq *cq, u32 value)
 {
-       struct h_galpa gal;
-       u64 CQx_N1_reg = 0;
+       u64 cqx_n1_reg;
 
-       EDEB_EN(7, "cq=%p event on completion -- write CQx_N1",
-               cq);
-       gal = cq->galpas.kernel;
-       hipz_galpa_store_cq(gal, cqx_n1,
+       hipz_galpa_store_cq(cq->galpas.kernel, cqx_n1,
                            EHCA_BMASK_SET(CQX_N1_GENERATE_COMP_EVENT, value));
-       CQx_N1_reg = hipz_galpa_load_cq(gal, cqx_n1);
-       EDEB_EX(7, "cq=%p loaded CQx_N1=%lx", cq, (unsigned long)CQx_N1_reg);
+       cqx_n1_reg = hipz_galpa_load_cq(cq->galpas.kernel, cqx_n1);
 }
 
 #endif /* __HIPZ_FNC_CORE_H__ */
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ipz_pt_fn.c linux-2.6/drivers/infiniband/hw/ehca/ipz_pt_fn.c
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ipz_pt_fn.c       2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ipz_pt_fn.c    2006-08-30 20:00:16.000000000 +0200
@@ -38,13 +38,9 @@
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
-#define DEB_PREFIX "iptz"
-
 #include "ehca_tools.h"
 #include "ipz_pt_fn.h"
 
-extern int ehca_hwlevel;
-
 void *ipz_qpageit_get_inc(struct ipz_queue *queue)
 {
        void *ret = ipz_qeit_get(queue);
@@ -54,10 +50,9 @@ void *ipz_qpageit_get_inc(struct ipz_que
                ret = NULL;
        }
        if (((u64)ret) % EHCA_PAGESIZE) {
-               EDEB(4, "ERROR!! not at PAGE-Boundary");
+               ehca_gen_err("ERROR!! not at PAGE-Boundary");
                return NULL;
        }
-       EDEB(7, "queue=%p ret=%p", queue, ret);
        return ret;
 }
 
@@ -65,15 +60,13 @@ void *ipz_qeit_eq_get_inc(struct ipz_que
 {
        void *ret = ipz_qeit_get(queue);
        u64 last_entry_in_q = queue->queue_length - queue->qe_size;
+
        queue->current_q_offset += queue->qe_size;
        if (queue->current_q_offset > last_entry_in_q) {
                queue->current_q_offset = 0;
                queue->toggle_state = (~queue->toggle_state) & 1;
        }
 
-       EDEB(7, "queue=%p ret=%p new current_q_offset=%lx qe_size=%x",
-            queue, ret, queue->current_q_offset, queue->qe_size);
-
        return ret;
 }
 
@@ -84,22 +77,20 @@ int ipz_queue_ctor(struct ipz_queue *que
        int pages_per_kpage = PAGE_SIZE >> EHCA_PAGESHIFT;
        int f;
 
-       EDEB_EN(7, "nr_of_pages=%x pagesize=%x qe_size=%x pages_per_kpage=%x",
-               nr_of_pages, pagesize, qe_size, pages_per_kpage);
        if (pagesize > PAGE_SIZE) {
-               EDEB_ERR(4, "FATAL ERROR: pagesize=%x is greater than "
-                        "kernel page size", pagesize);
+               ehca_gen_err("FATAL ERROR: pagesize=%x is greater "
+                            "than kernel page size", pagesize);
                return 0;
        }
        if (!pages_per_kpage) {
-               EDEB_ERR(4, "FATAL ERROR: invalid kernel page size. "
-                       "pages_per_kpage=%x", pages_per_kpage);
+               ehca_gen_err("FATAL ERROR: invalid kernel page size. "
+                            "pages_per_kpage=%x", pages_per_kpage);
                return 0;
        }
        queue->queue_length = nr_of_pages * pagesize;
        queue->queue_pages = vmalloc(nr_of_pages * sizeof(void *));
        if (!queue->queue_pages) {
-               EDEB(4, "ERROR!! didn't get the memory");
+               ehca_gen_err("ERROR!! didn't get the memory");
                return 0;
        }
        memset(queue->queue_pages, 0, nr_of_pages * sizeof(void *));
@@ -126,14 +117,11 @@ int ipz_queue_ctor(struct ipz_queue *que
        queue->act_nr_of_sg = nr_of_sg;
        queue->pagesize = pagesize;
        queue->toggle_state = 1;
-       EDEB_EX(7, "queue_length=%x queue_pages=%p qe_size=%x"
-               " act_nr_of_sg=%x", queue->queue_length, queue->queue_pages,
-               queue->qe_size, queue->act_nr_of_sg);
        return 1;
 
  ipz_queue_ctor_exit0:
-       EDEB_ERR(4, "Couldn't get alloc pages queue=%p f=%x nr_of_pages=%x",
-                queue, f, nr_of_pages);
+       ehca_gen_err("Couldn't get alloc pages queue=%p f=%x nr_of_pages=%x",
+                    queue, f, nr_of_pages);
        for (f = 0; f < nr_of_pages; f += pages_per_kpage) {
                if (!(queue->queue_pages)[f])
                        break;
@@ -148,19 +136,14 @@ int ipz_queue_dtor(struct ipz_queue *que
        int g;
        int nr_pages;
 
-       EDEB_EN(7, "ipz_queue pointer=%p", queue);
        if (!queue || !queue->queue_pages) {
-               EDEB_ERR(4, "queue or queue_pages is NULL");
+               ehca_gen_dbg("queue or queue_pages is NULL");
                return 0;
        }
-       EDEB(7, "destructing a queue with the following "
-            "properties:\n nr_of_pages=%x pagesize=%x qe_size=%x",
-            queue->act_nr_of_sg, queue->pagesize, queue->qe_size);
        nr_pages = queue->queue_length / queue->pagesize;
        for (g = 0; g < nr_pages; g += pages_per_kpage)
                free_page((unsigned long)(queue->queue_pages)[g]);
        vfree(queue->queue_pages);
 
-       EDEB_EX(7, "queue freed!");
        return 1;
 }
diff -Nurp linux-2.6_orig/drivers/infiniband/hw/ehca/ipz_pt_fn.h linux-2.6/drivers/infiniband/hw/ehca/ipz_pt_fn.h
--- linux-2.6_orig/drivers/infiniband/hw/ehca/ipz_pt_fn.h       2006-09-08 00:16:13.000000000 +0200
+++ linux-2.6/drivers/infiniband/hw/ehca/ipz_pt_fn.h    2006-08-30 20:00:17.000000000 +0200
@@ -43,7 +43,6 @@
 #ifndef __IPZ_PT_FN_H__
 #define __IPZ_PT_FN_H__
 
-#include "ehca_qes.h"
 #define EHCA_PAGESHIFT   12
 #define EHCA_PAGESIZE   4096UL
 #define EHCA_PAGEMASK   (~(EHCA_PAGESIZE-1))
@@ -76,7 +75,7 @@ struct ipz_queue {
  */
 static inline void *ipz_qeit_calc(struct ipz_queue *queue, u64 q_offset)
 {
-       struct ipz_page *current_page = NULL;
+       struct ipz_page *current_page;
        if (q_offset >= queue->queue_length)
                return NULL;
        current_page = (queue->queue_pages)[q_offset >> EHCA_PAGESHIFT];
@@ -118,9 +117,6 @@ static inline void *ipz_qeit_get_inc(str
                queue->toggle_state = (~queue->toggle_state) & 1;
        }
 
-       EDEB(7, "queue=%p ret=%p new current_q_addr=%lx qe_size=%x",
-            queue, ret, queue->current_q_offset, queue->qe_size);
-
        return ret;
 }
 
@@ -230,7 +226,6 @@ static inline void *ipz_eqit_eq_get_inc_
 {
        void *ret = ipz_qeit_get(queue);
        u32 qe = *(u8 *) ret;
-       EDEB(7, "ipz_QEit_EQ_get_inc_valid qe=%x", qe);
        if ((qe >> 7) == (queue->toggle_state & 1))
                ipz_qeit_eq_get_inc(queue); /* this is a good one */
        else

From mst at mellanox.co.il  Thu Sep  7 14:45:24 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 8 Sep 2006 00:45:24 +0300
Subject: [openib-general] [PATCH] IB/cma: add rdma_establish
Message-ID: <20060907214524.GA14791@mellanox.co.il>

OK, we are hitting the lost RTU case quite a lot in OFED.
So the following patch will ship with OFED.

Sean, did we decide what to do for upstream yet?
I would say we need something like the below for 2.6.19 too
(probably just need to update node type check).
I also like that this approach leaves all matters of policy
to users (such as whether to move the QP to RTS after an asynchronous
event or after a completion event).

As a side note, reasons for frequent loss of RTU must be investigated.
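
For illustration only (this is not part of the patch): a minimal
user-space model of how a ULP might use such a call. The passive side
sees a receive completion before the CM has reported the connection as
established, so it forces the transition itself. All names here (conn,
conn_establish, on_recv_completion) are hypothetical stand-ins; the real
entry point is the rdma_establish() added by the patch.

```c
/* Hypothetical connection states; not the kernel's cma states. */
enum conn_state { CONN_CONNECTING, CONN_ESTABLISHED };

struct conn {
	enum conn_state state;
	int establish_calls;	/* how often the transition was forced */
	int msgs_handled;
};

/* Stand-in for rdma_establish(): only legal while still connecting. */
static int conn_establish(struct conn *c)
{
	if (c->state != CONN_CONNECTING)
		return -1;
	c->state = CONN_ESTABLISHED;
	c->establish_calls++;
	return 0;
}

/* Normal path: the RTU arrived and the CM reports ESTABLISHED. */
static void on_rtu(struct conn *c)
{
	c->state = CONN_ESTABLISHED;
}

/* Receive completion handler: the policy decision stays with the ULP. */
static int on_recv_completion(struct conn *c)
{
	/* Data arrived before the RTU: force the state forward first. */
	if (c->state == CONN_CONNECTING && conn_establish(c))
		return -1;
	c->msgs_handled++;
	return 0;
}
```

In this model a late RTU is harmless: once the state has been forced,
on_rtu() changes nothing and later completions skip the establish call.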

---

IB/cma: add rdma_establish

Make it possible for ULPs to handle RTU loss by calling
rdma_establish.

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

Index: a/include/rdma/rdma_cm.h
===================================================================
--- a/include/rdma/rdma_cm.h	(revision 8822)
+++ a/include/rdma/rdma_cm.h	(working copy)
@@ -256,6 +256,16 @@ int rdma_listen(struct rdma_cm_id *id, i
 int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param);
 
 /**
+ * rdma_establish - Forces a connection state to established.
+ * @id: Connection identifier to transition to established.
+ *
+ * This routine should be invoked by users who receive messages on a
+ * QP before being notified that the connection has been established by the
+ * RDMA CM.
+ */
+int rdma_establish(struct rdma_cm_id *id);
+
+/**
  * rdma_reject - Called to reject a connection request or response.
  */
 int rdma_reject(struct rdma_cm_id *id, const void *private_data,
Index: a/drivers/infiniband/core/cm.c
===================================================================
--- a/drivers/infiniband/core/cm.c	(revision 8823)
+++ a/drivers/infiniband/core/cm.c	(working copy)
@@ -3207,6 +3207,10 @@ static int cm_init_qp_rts_attr(struct cm
 
 	spin_lock_irqsave(&cm_id_priv->lock, flags);
 	switch (cm_id_priv->id.state) {
+	/* Allow transition to RTS before sending REP */
+	case IB_CM_REQ_RCVD:
+	case IB_CM_MRA_REQ_SENT:
+
 	case IB_CM_REP_RCVD:
 	case IB_CM_MRA_REP_SENT:
 	case IB_CM_REP_SENT:
Index: a/drivers/infiniband/core/cma.c
===================================================================
--- a/drivers/infiniband/core/cma.c	(revision 8822)
+++ a/drivers/infiniband/core/cma.c	(working copy)
@@ -840,22 +840,6 @@ static int cma_verify_rep(struct rdma_id
 	return 0;
 }
 
-static int cma_rtu_recv(struct rdma_id_private *id_priv)
-{
-	int ret;
-
-	ret = cma_modify_qp_rts(&id_priv->id);
-	if (ret)
-		goto reject;
-
-	return 0;
-reject:
-	cma_modify_qp_err(&id_priv->id);
-	ib_send_cm_rej(id_priv->cm_id.ib, IB_CM_REJ_CONSUMER_DEFINED,
-		       NULL, 0, NULL, 0);
-	return ret;
-}
-
 static int cma_ib_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event)
 {
 	struct rdma_id_private *id_priv = cm_id->context;
@@ -886,9 +870,8 @@ static int cma_ib_handler(struct ib_cm_i
 		private_data_len = IB_CM_REP_PRIVATE_DATA_SIZE;
 		break;
 	case IB_CM_RTU_RECEIVED:
-		status = cma_rtu_recv(id_priv);
-		event = status ? RDMA_CM_EVENT_CONNECT_ERROR :
-				 RDMA_CM_EVENT_ESTABLISHED;
+	case IB_CM_USER_ESTABLISHED:
+		event = RDMA_CM_EVENT_ESTABLISHED;
 		break;
 	case IB_CM_DREQ_ERROR:
 		status = -ETIMEDOUT; /* fall through */
@@ -1981,11 +1964,25 @@ static int cma_accept_ib(struct rdma_id_
 			 struct rdma_conn_param *conn_param)
 {
 	struct ib_cm_rep_param rep;
-	int ret;
+	struct ib_qp_attr qp_attr;
+	int qp_attr_mask, ret;
 
-	ret = cma_modify_qp_rtr(&id_priv->id);
-	if (ret)
-		return ret;
+	if (id_priv->id.qp) {
+		ret = cma_modify_qp_rtr(&id_priv->id);
+		if (ret)
+			goto out;
+
+		qp_attr.qp_state = IB_QPS_RTS;
+		ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, &qp_attr,
+					 &qp_attr_mask);
+		if (ret)
+			goto out;
+
+		qp_attr.max_rd_atomic = conn_param->initiator_depth;
+		ret = ib_modify_qp(id_priv->id.qp, &qp_attr, qp_attr_mask);
+		if (ret)
+			goto out;
+	}
 
 	memset(&rep, 0, sizeof rep);
 	rep.qp_num = id_priv->qp_num;
@@ -2000,7 +1997,9 @@ static int cma_accept_ib(struct rdma_id_
 	rep.rnr_retry_count = conn_param->rnr_retry_count;
 	rep.srq = id_priv->srq ? 1 : 0;
 
-	return ib_send_cm_rep(id_priv->cm_id.ib, &rep);
+	ret = ib_send_cm_rep(id_priv->cm_id.ib, &rep);
+out:
+	return ret;
 }
 
 static int cma_send_sidr_rep(struct rdma_id_private *id_priv,
@@ -2058,6 +2057,27 @@ reject:
 }
 EXPORT_SYMBOL(rdma_accept);
 
+int rdma_establish(struct rdma_cm_id *id)
+{
+	struct rdma_id_private *id_priv;
+	int ret;
+
+	id_priv = container_of(id, struct rdma_id_private, id);
+	if (!cma_comp(id_priv, CMA_CONNECT))
+		return -EINVAL;
+
+	switch (id->device->node_type) {
+	case IB_NODE_CA:
+		ret = ib_cm_establish(id_priv->cm_id.ib);
+		break;
+	default:
+		ret = 0;
+		break;
+	}
+	return ret;
+}
+EXPORT_SYMBOL(rdma_establish);
+
 int rdma_reject(struct rdma_cm_id *id, const void *private_data,
 		u8 private_data_len)
 {


-- 
MST


From mst at mellanox.co.il  Thu Sep  7 14:46:15 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 8 Sep 2006 00:46:15 +0300
Subject: [openib-general] [PATCH for-2.6.19] IB/ipoib: Fix flush/start xmit
 race take 2 (from code review)
Message-ID: <20060907214614.GB14791@mellanox.co.il>

Hello, Roland!
The following patch in the for-2.6.19 series:

    IB/ipoib: Fix flush/start xmit race (from code review)

introduces a sleep-under-spinlock condition: we don't drop tx_lock while
scanning remove_list (look at ipoib_flush_paths, I think it'll be obvious).
Here's a fixed version, please queue it in for-2.6.19.
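
To spell out the pattern the fix follows, here is a toy user-space
sketch, assuming nothing about the real ipoib structures: both locks are
taken outer-then-inner, both are dropped before anything that may sleep,
and both are retaken in the same order afterwards. The toy_lock type and
blocking_wait() are made up for the example; blocking_wait() stands in
for wait_for_completion() and records a bug if a lock is still held.

```c
#include <assert.h>

/* Toy lock: only records whether it is held. Not a real spinlock. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held); l->held = 0; }

struct dev {
	struct toy_lock tx_lock;	/* outer lock */
	struct toy_lock lock;		/* inner lock */
	int sleeps_with_lock_held;
	int freed;
};

/* Stand-in for a sleeping call: counts sleeps made with a lock held. */
static void blocking_wait(struct dev *d)
{
	if (d->tx_lock.held || d->lock.held)
		d->sleeps_with_lock_held++;
}

static void flush_paths(struct dev *d, int npaths)
{
	int i;

	toy_lock_acquire(&d->tx_lock);
	toy_lock_acquire(&d->lock);
	for (i = 0; i < npaths; i++) {
		/* Drop both locks, inner first, before anything that may sleep. */
		toy_lock_release(&d->lock);
		toy_lock_release(&d->tx_lock);
		blocking_wait(d);
		d->freed++;
		/* Retake in the same outer-then-inner order. */
		toy_lock_acquire(&d->tx_lock);
		toy_lock_acquire(&d->lock);
	}
	toy_lock_release(&d->lock);
	toy_lock_release(&d->tx_lock);
}
```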

--

ipoib race reported after code review by Eitan Rabin:
http://openib.org/pipermail/openib-general/2006-June/022916.html

Prevent the flush task from freeing the ipoib_neigh pointer
while ipoib_start_xmit is accessing the ipoib_neigh through
the pointer it has loaded from the hardware address.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index cf71d2a..31c4b05 100644
Index: ofed_1_1/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- ofed_1_1.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2006-08-01 15:29:48.000000000 +0300
+++ ofed_1_1/drivers/infiniband/ulp/ipoib/ipoib_main.c	2006-09-05 11:47:07.000000000 +0300
@@ -336,7 +336,8 @@ void ipoib_flush_paths(struct net_device
 	struct ipoib_path *path, *tp;
 	LIST_HEAD(remove_list);
 
-	spin_lock_irq(&priv->lock);
+	spin_lock_irq(&priv->tx_lock);
+	spin_lock(&priv->lock);
 
 	list_splice(&priv->path_list, &remove_list);
 	INIT_LIST_HEAD(&priv->path_list);
@@ -347,12 +348,15 @@ void ipoib_flush_paths(struct net_device
 	list_for_each_entry_safe(path, tp, &remove_list, list) {
 		if (path->query)
 			ib_sa_cancel_query(path->query_id, path->query);
-		spin_unlock_irq(&priv->lock);
+		spin_unlock(&priv->lock);
+		spin_unlock_irq(&priv->tx_lock);
 		wait_for_completion(&path->done);
 		path_free(dev, path);
-		spin_lock_irq(&priv->lock);
+		spin_lock_irq(&priv->tx_lock);
+		spin_lock(&priv->lock);
 	}
-	spin_unlock_irq(&priv->lock);
+	spin_unlock(&priv->lock);
+	spin_unlock_irq(&priv->tx_lock);
 }
 
 static void path_rec_completion(int status,

-- 
MST


From rdreier at cisco.com  Thu Sep  7 14:49:51 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 07 Sep 2006 14:49:51 -0700
Subject: [openib-general] [PATCH for-2.6.19] IB/ipoib: Fix flush/start
 xmit race take 2 (from code review)
In-Reply-To: <20060907214614.GB14791@mellanox.co.il> (Michael S.
	Tsirkin's message of "Fri, 8 Sep 2006 00:46:15 +0300")
References: <20060907214614.GB14791@mellanox.co.il>
Message-ID: <ada7j0f9wjk.fsf@cisco.com>

Thanks, I replaced the patch in my tree.


From ralphc at pathscale.com  Thu Sep  7 16:03:33 2006
From: ralphc at pathscale.com (Ralph Campbell)
Date: Thu, 07 Sep 2006 16:03:33 -0700
Subject: [openib-general] [PATCH] IB/ipath Fix RPM build for libipathverbs
Message-ID: <1157670213.8759.117.camel@brick.pathscale.com>

A minor change to fix RPM builds for libipathverbs.

Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>

Index: src/userspace/libipathverbs/Makefile.am
===================================================================
--- src/userspace/libipathverbs/Makefile.am	(revision 9347)
+++ src/userspace/libipathverbs/Makefile.am	(working copy)
@@ -49,6 +49,7 @@ src_ipathverbs_la_LDFLAGS = -avoid-versi
     $(ipathverbs_version_script)
 
 EXTRA_DIST = src/ipathverbs.h \
+    src/ipath-abi.h \
     src/ipathverbs.map \
     libipathverbs.spec.in
 

From rjwalsh at pathscale.com  Thu Sep  7 17:04:15 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Thu, 07 Sep 2006 17:04:15 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691C8DEB9A@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691C8DEB9A@orsmsx418.amr.corp.intel.com>
Message-ID: <4500B37F.3080705@pathscale.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Woodruff, Robert J wrote:
> Robert Walsh wrote,
>> I'll give it a spin this afternoon: it looks quite a bit more
>> comprehensive than the small patch I did.
> 
> I also just tried running the ib_rdma_bw test and it seems to
> be flaky if you stress it. If you just run the defaults, it seems to
> work, but if you crank up the iterations and the message size,
> it sometimes fails with.....
> 
> [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12
> 4730: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 |
> iters=10000 | duplex=0 | cma=0 |
> 4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400
> VAddr 0x00002a95dd3480
> 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500
> VAddr 0x00002a95c85480
> 4730:main: Completion with error at client:
> 4730:main: Failed status 9: wr_id 3
> 4730:main: scnt=7584, ccnt=6584
> [woody at rkl-13 bin]$  

Hi Woody,

When RC4 is available, there should be a patch in there that will fix
this.  Can you let us know if you continue to see problems?

Regards,
 Robert.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRQCzfvzvnpzTd9fxAQLfoAf+JWrBo/pPf/tAvTRFckCqjOn3dpH59mJK
n1KuN/M9lsP0UobIOEAMAR3KLvTfFe2czEb7ThMxcKjYgJHiikxuiSomB3pbsRK5
W0qTEqMmS5QYFXfpPlvVof4xxdvWZDDUzzkxG0bve4zBVjeJMUnu/8jVTTBmGbqd
nmqfLrIP+N8n876x1RZade3DTz0NEDDYRT5d25asbUVuoiF7ldVtbX5RmK6rRdFZ
1ym6fIyHT+fTZ5wnVoTJRdjV8icrR9JpPj/BFL6OoxDQvgMksplDnJaTGc4XinFl
WdwZV2NfImYvwSB4QUgqe4Me/BS1xl4gj+OpaviE2TzP7U6tqQVaHQ==
=OLHZ
-----END PGP SIGNATURE-----


From dledford at redhat.com  Thu Sep  7 17:44:06 2006
From: dledford at redhat.com (Doug Ledford)
Date: Thu, 07 Sep 2006 20:44:06 -0400
Subject: [openib-general] OpenSM/osm_log API: Use symbol versionsrather
 than polluting namespace
In-Reply-To: <20060907062243.GH6928@mellanox.co.il>
References: <1157602561.4652.53.camel@fc6.xsintricity.com>
	<20060907062243.GH6928@mellanox.co.il>
Message-ID: <1157676246.15761.182.camel@fc6.xsintricity.com>

On Thu, 2006-09-07 at 09:22 +0300, Michael S. Tsirkin wrote:
> Quoting r. Doug Ledford <dledford at redhat.com>:
> > Subject: Re: [openib-general] OpenSM/osm_log API: Use symbol versionsrather than polluting namespace
> > 
> > On Wed, 2006-09-06 at 18:16 +0300, Michael S. Tsirkin wrote:
> > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> > > > 
> > > > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> > > > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > > > Subject: OpenSM/osm_log API: Use symbol versions rather than polluting namespace
> > > > > > 
> > > > > > OpenSM/osm_log API: Rather than polluting the namespace with needless
> > > > > > symbols, use symbol versions and have a versioned osm_log_init rather
> > > > > > than adding osm_log_init_v2 as an additional API
> > > > > > 
> > > > > > This patch is intended to be applied to both trunk and 1.1 versions.
> > > > > > 
> > > > > > Signed-off-by: Doug Ledford <dledford at redhat.com>
> > > > > > Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> > > > > 
> > > > > This preserves the ABI, but would this not break the API?
> > > > 
> > > > Yes, this patch changes the API (in a most trivial way).
> > > 
> > > So all users need to change code or they won't compile against the new
> > > library?
> > 
> > Yes, and that is the correct way to handle this change.
> 
> I disagree.
> 
> In my opinion, asking all users to add a parameter they don't care about is
> worse than having multiple functions with a convenient set of options.

Dude, you can't do that.

Ulrich Drepper, long time Cygnus/Red Hat employee, upstream maintainer
of glibc, and probably the most all around authoritative person I know
when it comes to open source library management, keeps most of the
papers he has delivered to different conventions on his web site:

http://people.redhat.com/drepper/

Amongst those papers are several best practices papers on shared library
design and maintenance.  Two things in particular jump out as reasons to
keep the list of exported symbols in a DSO to the absolute bare minimum:
1) every symbol in the DSO is part of the global symbol table for the
app, which has to be searched through during run time symbol resolution,
so the more symbols you have, the more dynamic linking slows down the
application (ever wondered why it takes from 20s to 1min to start
OpenOffice?  It's because of the proliferation of DSO symbol exports and
symbol table relocations and lookups in the OO libs) and 2) if you keep
the exported symbols at the bare minimum needed to implement the API, it
helps to free up the library to change and evolve behind the scenes
without requiring as many changes to linked applications, whereas the
more you expose to the applications, the more likely you are to have to
change that exposed interface at some point in time.

What you just argued for is the opposite of both of those accepted and
commonly used practices.  Not only that, but since all the old code
*could* be made to work with the new API using nothing more than a macro
in a header file, to argue for an extra symbol export is *really* the
wrong thing to do.  It's violating those two best practices above when
you could achieve the same goal without violating either.
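
A rough sketch of what I mean by a macro in a header file, with made-up
names (mylog, mylog_init, MYLOG_INIT_OLD are hypothetical and are not
the real osm_log API): the library exports exactly one init function
with the new parameter, and old two-argument call sites are mapped onto
it by the preprocessor. The default chosen for the missing argument is
an assumption for the example.

```c
#include <stddef.h>

/* Hypothetical logger; not the real osm_log structure. */
struct mylog {
	const char *path;
	int accumulate;	/* 1 = append to an existing file, 0 = truncate */
};

/* New API: one function, carrying the extra parameter. */
static int mylog_init(struct mylog *log, const char *path, int accumulate)
{
	if (!log || !path)
		return -1;
	log->path = path;
	log->accumulate = accumulate;
	return 0;
}

/*
 * Source-compatibility shim for callers written against the old
 * two-argument form. It lives in the header only and adds no symbol
 * to the library.
 */
#define MYLOG_INIT_OLD(log, path) mylog_init((log), (path), 0)
```

Old callers keep compiling through the shim, new callers pass the third
argument, and the exported symbol list does not grow.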

>   And if
> there is a low cost way to help apps compile without code change, I don't see
> why it makes sense to create work.

Dude, you can't do that either.  You have to keep the API clean.  If you
try to push the maintenance burden into the libraries instead of making
the apps carry their share of the maintenance, then the library just
ends up imploding under the impossible complexity of keeping all those
different API code bases working.  This particular issue certainly
wouldn't have been the end of the world, but when enough things like
this creep in over time, you'll eventually need to make a gen3 stack
because this one is unusable.  You have to put your foot down and just
say no on stuff like this.


> > APIs change.
> 
> APIs should not change with every release.

With a mature product, no.  This is hardly a mature product.


> > Any app you can build can compensate.
> 
> Sure it seems simple if you are RedHat and rebuild the whole OS.

OFED == whole stack == same thing.  If you are allowing some out of
stack software configuration issue to cause you as OFED maintainer to
put hacks into OFED, then you need to put your OFED interests in front
of your Mellanox interests when making OFED decisions.

> We are past code freeze. I agree with Hal that it might be hard to
> draw a line
> between a critical and a non-critical bugfix. However, an API change
> that
> 1. is purely cosmetical
> 2. requires code changes in dependent applications
> 3. is not uncontroversial
> is, for me, obviously beyond that line.

While I agree that we are past code freeze and would support yanking the
entire log file truncation change for that reason, the preservation of a
clean API, proper library symbol versions, and DSO best practices are
far from "cosmetical".

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From maheshbarve at gmail.com  Thu Sep  7 21:23:38 2006
From: maheshbarve at gmail.com (Mahesh Barve)
Date: Fri, 8 Sep 2006 09:53:38 +0530
Subject: [openib-general] Multicast: help needed
Message-ID: <507df10d0609072123y7348a115q558bcdb83d3347d6@mail.gmail.com>

 Hi,

 I am trying to perform multicast over Infiniband. Can someone let me know
where I can get some sample code for it?

Awaiting your reply,
-Mahesh Barve

From krkumar2 at in.ibm.com  Thu Sep  7 22:13:01 2006
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Fri, 08 Sep 2006 10:43:01 +0530
Subject: [openib-general] [PATCH] Modify callers of cma_get_net_info for
 better error handling.
Message-ID: <20060908051301.5221.63041.sendpatchset@K50wks273895wss.in.ibm.com>

Re-organize code relating to cma_get_net_info() and rdma_create_id() to
optimize error case handling (no need to alloc memory/etc as part of
rdma_create_id() if input parameters are wrong).
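
As an illustration of the ordering (a standalone user-space sketch with
made-up types, not the cma code): the request is validated before
anything is allocated, and the error labels unwind only what has
actually been set up. The names request, conn_id, parse_request and
new_conn are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct request { int ip_ver; const char *payload; };
struct conn_id { int ip_ver; char *path_rec; };

static int live_ids;	/* outstanding conn_id allocations, for illustration */

static int parse_request(const struct request *req, int *ip_ver)
{
	if (!req || (req->ip_ver != 4 && req->ip_ver != 6))
		return -1;
	*ip_ver = req->ip_ver;
	return 0;
}

static struct conn_id *new_conn(const struct request *req)
{
	struct conn_id *id;
	int ip_ver;

	/* Validate first: a bad request costs no allocation at all. */
	if (parse_request(req, &ip_ver))
		goto out;

	id = calloc(1, sizeof *id);
	if (!id)
		goto out;
	live_ids++;
	id->ip_ver = ip_ver;

	if (!req->payload)
		goto destroy_id;
	id->path_rec = malloc(strlen(req->payload) + 1);
	if (!id->path_rec)
		goto destroy_id;
	strcpy(id->path_rec, req->payload);
	return id;

destroy_id:
	free(id);
	live_ids--;
out:
	return NULL;
}
```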

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>

diff -ruNp org/core/cma.c new/core/cma.c
--- org/core/cma.c	2006-09-08 09:51:40.000000000 +0530
+++ new/core/cma.c	2006-09-08 09:52:05.000000000 +0530
@@ -939,23 +939,24 @@ static struct rdma_id_private* cma_new_c
 	__u16 port;
 	u8 ip_ver;
 
+	if (cma_get_net_info(ib_event->private_data, listen_id->ps,
+			     &ip_ver, &port, &src, &dst))
+		goto out;
 	id = rdma_create_id(listen_id->event_handler, listen_id->context,
 			    listen_id->ps);
 	if (IS_ERR(id))
-		return NULL;
+		goto out;
+
+	cma_save_net_info(&id->route.addr, &listen_id->route.addr,
+			  ip_ver, port, src, dst);
 
 	rt = &id->route;
 	rt->num_paths = ib_event->param.req_rcvd.alternate_path ? 2 : 1;
-	rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, GFP_KERNEL);
+	rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths,
+			       GFP_KERNEL);
 	if (!rt->path_rec)
-		goto err;
+		goto destroy_id;
 
-	if (cma_get_net_info(ib_event->private_data, listen_id->ps,
-			     &ip_ver, &port, &src, &dst))
-		goto err;
-
-	cma_save_net_info(&id->route.addr, &listen_id->route.addr,
-			  ip_ver, port, src, dst);
 	rt->path_rec[0] = *ib_event->param.req_rcvd.primary_path;
 	if (rt->num_paths == 2)
 		rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path;
@@ -968,8 +969,10 @@ static struct rdma_id_private* cma_new_c
 	id_priv = container_of(id, struct rdma_id_private, id);
 	id_priv->state = CMA_CONNECT;
 	return id_priv;
-err:
+
+destroy_id:
 	rdma_destroy_id(id);
+out :
 	return NULL;
 }
 
@@ -981,31 +984,30 @@ static struct rdma_id_private* cma_new_u
 	union cma_ip_addr *src, *dst;
 	__u16 port;
 	u8 ip_ver;
-	int ret;
+
+	if (cma_get_net_info(ib_event->private_data, listen_id->ps,
+			     &ip_ver, &port, &src, &dst))
+		goto out;
 
 	id = rdma_create_id(listen_id->event_handler, listen_id->context,
 			    listen_id->ps);
 	if (IS_ERR(id))
-		return NULL;
-
-
-	if (cma_get_net_info(ib_event->private_data, listen_id->ps,
-			     &ip_ver, &port, &src, &dst))
-		goto err;
+		goto out;
 
 	cma_save_net_info(&id->route.addr, &listen_id->route.addr,
 			  ip_ver, port, src, dst);
 
-	ret = rdma_translate_ip(&id->route.addr.src_addr,
-				&id->route.addr.dev_addr);
-	if (ret)
-		goto err;
+	if (rdma_translate_ip(&id->route.addr.src_addr,
+				&id->route.addr.dev_addr))
+		goto destroy_id;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
 	id_priv->state = CMA_CONNECT;
 	return id_priv;
-err:
+
+destroy_id:
 	rdma_destroy_id(id);
+out:
 	return NULL;
 }
 

From krkumar2 at in.ibm.com  Thu Sep  7 22:13:13 2006
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Fri, 08 Sep 2006 10:43:13 +0530
Subject: [openib-general] [PATCH] cma_connect_ib leaks memory in failure
	cases.
Message-ID: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com>

cma_connect_ib leaks a struct ib_cm_id* in failure cases.

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>

diff -ruNp org/core/cma.c new/core/cma.c
--- org/core/cma.c	2006-09-08 09:52:05.000000000 +0530
+++ new/core/cma.c	2006-09-08 09:52:30.000000000 +0530
@@ -1954,6 +1954,11 @@ static int cma_connect_ib(struct rdma_id
 
 	ret = ib_send_cm_req(id_priv->cm_id.ib, &req);
 out:
+	if (ret && !IS_ERR(id_priv->cm_id.ib)) {
+		ib_destroy_cm_id(id_priv->cm_id.ib);
+		id_priv->cm_id.ib = NULL;
+	}
+
 	kfree(private_data);
 	return ret;
 }


From krkumar2 at in.ibm.com  Thu Sep  7 22:14:39 2006
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Fri, 08 Sep 2006 10:44:39 +0530
Subject: [openib-general] [PATCH] Optimize cma_process_remove()
Message-ID: <20060908051439.5229.71544.sendpatchset@K50wks273895wss.in.ibm.com>

Optimize cma_process_remove() by using the remove_list.
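
The general two-phase shape, as a simplified user-space sketch (the node
type and function names are made up, and unlike the real code the
"internal" entries here are simply left on the shared list): phase one
would run under the lock and only moves entries onto a private list,
phase two walks the private list without the lock.

```c
#include <stddef.h>

struct node {
	struct node *next;
	int internal;	/* entries that are not moved to the private list */
	int removed;
};

/* Phase 1 (would run under the lock): move every non-internal node
 * from *head onto a private list, preserving order. */
static struct node *detach_external(struct node **head)
{
	struct node *priv = NULL, **tail = &priv, **pp = head;

	while (*pp) {
		struct node *n = *pp;

		if (n->internal) {
			pp = &n->next;		/* stays on the shared list */
		} else {
			*pp = n->next;		/* unlink from the shared list */
			n->next = NULL;
			*tail = n;		/* append to the private list */
			tail = &n->next;
		}
	}
	return priv;
}

/* Phase 2 (lock already dropped): process the private list. */
static int process(struct node *priv)
{
	int count = 0;

	while (priv) {
		struct node *next = priv->next;	/* save before touching the node */

		priv->removed = 1;
		count++;
		priv = next;
	}
	return count;
}
```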

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>

diff -ruNp org/core/cma.c new/core/cma.c
--- org/core/cma.c	2006-09-08 09:52:30.000000000 +0530
+++ new/core/cma.c	2006-09-08 09:57:03.000000000 +0530
@@ -2332,7 +2332,7 @@ static int cma_remove_id_dev(struct rdma
 static void cma_process_remove(struct cma_device *cma_dev)
 {
 	struct list_head remove_list;
-	struct rdma_id_private *id_priv;
+	struct rdma_id_private *id_priv, *tmp;
 	int ret;
 
 	INIT_LIST_HEAD(&remove_list);
@@ -2344,22 +2344,20 @@ static void cma_process_remove(struct cm
 
 		if (cma_internal_listen(id_priv)) {
 			cma_destroy_listen(id_priv);
-			continue;
+		} else {
+			list_del(&id_priv->list);
+			list_add_tail(&id_priv->list, &remove_list);
 		}
+	}
+	mutex_unlock(&lock);
 
-		list_del(&id_priv->list);
-		list_add_tail(&id_priv->list, &remove_list);
+	list_for_each_entry_safe(id_priv, tmp, &remove_list, list) {
 		atomic_inc(&id_priv->refcount);
-		mutex_unlock(&lock);
-
 		ret = cma_remove_id_dev(id_priv);
 		cma_deref_id(id_priv);
 		if (ret)
 			rdma_destroy_id(&id_priv->id);
-
-		mutex_lock(&lock);
 	}
-	mutex_unlock(&lock);
 
 	cma_deref_dev(cma_dev);
 	wait_for_completion(&cma_dev->comp);


From or.gerlitz at gmail.com  Thu Sep  7 22:21:36 2006
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Fri, 8 Sep 2006 07:21:36 +0200
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
In-Reply-To: <aday7sva13d.fsf@cisco.com>
References: <200609071902.57379.toralf.foerster@gmx.de>
	<15ddcffd0609071252o477eeabfl31366719d0d3d9f0@mail.gmail.com>
	<aday7sva13d.fsf@cisco.com>
Message-ID: <15ddcffd0609072221y1151d8dey48fdffc287660fbc@mail.gmail.com>

On 9/7/06, Roland Dreier <rdreier at cisco.com> wrote:
>     Or> you need to have CONFIG_INFINIBAND_ADDR_TRANS=m defined also i
>     Or> think you are missing CONFIG_INET=m
>
> Seems like a bug in the iSER Kconfig -- it shouldn't be possible to
> select iSER without everything it needs to compile.

OK, this makes sense. We will look into it and send a patch early next week.

Or.


From mst at mellanox.co.il  Fri Sep  8 02:29:39 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 8 Sep 2006 12:29:39 +0300
Subject: [openib-general] OpenSM/osm_log API: Use symbol
 versionsratherthan polluting namespace
In-Reply-To: <1157676246.15761.182.camel@fc6.xsintricity.com>
References: <1157676246.15761.182.camel@fc6.xsintricity.com>
Message-ID: <20060908092939.GA10741@mellanox.co.il>

Quoting r. Doug Ledford <dledford at redhat.com>:
> What you just argued for is the opposite of both of those accepted and
> commonly used practices.  Not only that, but since all the old code
> *could* be made to work with the new API using nothing more than a macro
> in a header file, to argue for an extra symbol export is *really* the
> wrong thing to do.

I didn't argue about exporting symbols at all.
A macro in a header file to make existing code work would be fine, but it is
not present in the patch that was posted.

-- 
MST


From johnt1johnt2 at gmail.com  Fri Sep  8 03:19:57 2006
From: johnt1johnt2 at gmail.com (john t)
Date: Fri, 8 Sep 2006 15:49:57 +0530
Subject: [openib-general] HCAs with and without memory
Message-ID: <a94efc20609080319w2fa92499lee9cfb3758bdaa13@mail.gmail.com>

Hi OpenIB group,

What is the difference between HCAs with memory and without memory? How is
the on-board memory used by HCAs? Is it that data is first copied into this
memory and then into physical memory?

Regards,
John T.

From thomas.bub at thomson.net  Fri Sep  8 04:07:47 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Fri, 8 Sep 2006 13:07:47 +0200
Subject: [openib-general] Wrong byte order in lid of struct ibv_port_attr
 reported by ibv_query port!?
Message-ID: <B79FAF8BB536314E859EA1963CFFD22201FBD38E@wdtssmail01.eu.thmulti.com>

Sean,
with the help of your modified cmpost.c example I found out that the
byte order of the lid that query_for_path in cmpost.c puts into
the ib_sa_path_rec is the opposite of the one reported by
ibv_query_port.
Since I'm doing the connection establishment based on lid, GUID and
subnetID, this explains why I can't connect a client to another
machine.
Can you tell me which one is wrong, or am I doing something wrong here?

Thomas
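
A likely explanation, stated here as an assumption rather than a confirmed
answer: neither value is wrong. ibv_query_port reports the lid in host byte
order, while the lid fields of a path record are kept in network (big-endian)
byte order, so one of the two has to be converted before they are compared.
A minimal sketch of that conversion follows; the helper names are
hypothetical and are not part of libibverbs or cmpost.c.

```c
#include <arpa/inet.h>	/* htons(), ntohs() */
#include <stdint.h>

/* Hypothetical helper: host-order lid (as in ibv_port_attr.lid) to the
 * network-order form assumed to be stored in a path record. */
uint16_t lid_to_wire(uint16_t host_lid)
{
	return htons(host_lid);
}

/* Hypothetical helper: network-order lid back to host order. */
uint16_t lid_from_wire(uint16_t wire_lid)
{
	return ntohs(wire_lid);
}

/* Compare a port lid (host order) with a path record lid (network order).
 * Returns nonzero when both name the same lid. */
int lid_matches(uint16_t port_lid, uint16_t path_rec_lid)
{
	return port_lid == ntohs(path_rec_lid);
}
```

On a little-endian host, lid 0x0012 from ibv_query_port corresponds to 0x1200
in the path record. On a big-endian host the two values are identical, which
can hide the problem during testing.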


From devesh28 at gmail.com  Fri Sep  8 05:01:01 2006
From: devesh28 at gmail.com (Devesh Sharma)
Date: Fri, 8 Sep 2006 17:31:01 +0530
Subject: [openib-general] mthca_modify_qp : acquiring Send/Receive Q locks
 while modifying qp
Message-ID: <309a667c0609080501h23b2b54em716d94a025676c69@mail.gmail.com>

Hello all,
In the mthca_modify_qp function, both the send and receive queues are
locked in order to read the current qp state. Why is locking both WQs
required? Is there any dependency on other qp operations?

        if (attr_mask & IB_QP_CUR_STATE) {
                cur_state = attr->cur_qp_state;
        } else {
                spin_lock_irq(&qp->sq.lock);
                spin_lock(&qp->rq.lock);
                cur_state = qp->state;
                spin_unlock(&qp->rq.lock);
                spin_unlock_irq(&qp->sq.lock);
        }
Devesh
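
The snippet above nests the two work-queue locks in a fixed order (sq first,
then rq) and releases them in reverse. The sketch below reproduces only that
pattern in user space, with pthread mutexes standing in for the kernel
spinlocks. All struct and function names are hypothetical, and the sketch
does not claim to explain why mthca needs both locks.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the driver's work queue and qp structures. */
struct wq_sketch {
	pthread_mutex_t lock;
};

struct qp_sketch {
	struct wq_sketch sq;
	struct wq_sketch rq;
	int state;
};

/* Returns 0 on success, nonzero if either mutex could not be set up. */
int qp_sketch_init(struct qp_sketch *qp, int state)
{
	qp->state = state;
	if (pthread_mutex_init(&qp->sq.lock, NULL))
		return -1;
	if (pthread_mutex_init(&qp->rq.lock, NULL)) {
		pthread_mutex_destroy(&qp->sq.lock);
		return -1;
	}
	return 0;
}

/* Read the state with both locks held: sq first, then rq, and release in
 * reverse order.  Every path that takes both locks must use this order. */
int qp_sketch_read_state(struct qp_sketch *qp)
{
	int state;

	pthread_mutex_lock(&qp->sq.lock);
	pthread_mutex_lock(&qp->rq.lock);
	state = qp->state;
	pthread_mutex_unlock(&qp->rq.lock);
	pthread_mutex_unlock(&qp->sq.lock);
	return state;
}

/* Change the state with both locks held, so a thread that holds either
 * lock never sees the state change underneath it. */
void qp_sketch_set_state(struct qp_sketch *qp, int state)
{
	pthread_mutex_lock(&qp->sq.lock);
	pthread_mutex_lock(&qp->rq.lock);
	qp->state = state;
	pthread_mutex_unlock(&qp->rq.lock);
	pthread_mutex_unlock(&qp->sq.lock);
}
```

Keeping one acquisition order everywhere is what prevents two threads from
deadlocking by each holding one lock while waiting for the other.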


From thomas.bub at thomson.net  Fri Sep  8 06:57:39 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Fri, 8 Sep 2006 15:57:39 +0200
Subject: [openib-general] ibv_poll_cq after ibv_post_send does not work
Message-ID: <B79FAF8BB536314E859EA1963CFFD22201FBD38F@wdtssmail01.eu.thmulti.com>

Dotan Barak wrote:
If you are using RC QP:
the reason for not getting any completion in the CQ is that

Did you post any RR (Receive Request) at the listener side?


Dotan,
with the cmpost.c example I now get a cm connection even with another
machine.
However I don't get the cq event, on the sender side, when the
IBV_WR_SEND is done. Is this correct? Is this what you are saying below?
If it is correct this is different from gen1 drivers where I got a
VAPI_SUCCESS cq event. Is there a way to get this back?

On the receiver side I get a cq event for the receive request.

Thanks
Thomas


From swise at opengridcomputing.com  Fri Sep  8 07:14:31 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 08 Sep 2006 09:14:31 -0500
Subject: [openib-general] Multicast: help needed
In-Reply-To: <507df10d0609072123y7348a115q558bcdb83d3347d6@mail.gmail.com>
References: <507df10d0609072123y7348a115q558bcdb83d3347d6@mail.gmail.com>
Message-ID: <1157724871.31760.18.camel@stevo-desktop>

There's a simple test case at: 

gen2/trunk/src/userspace/librdmacm/examples/mckey.c

  
On Fri, 2006-09-08 at 09:53 +0530, Mahesh Barve wrote:
> Hi,
>  
>  I am trying to perform multicast over Infiniband. Can someone let me
> know where I can get some sample code for it? 
>  
> Awaiting your reply,
> -Mahesh Barve
>  
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


From HNGUYEN at de.ibm.com  Fri Sep  8 07:28:41 2006
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Fri, 8 Sep 2006 16:28:41 +0200
Subject: [openib-general] OFED 1.1 status
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7892@mtlexch01.mtl.com>
Message-ID: <OF0A584E8F.0BC8F351-ONC12571E3.004DD9F6-C12571E3.004F305F@de.ibm.com>

Hello Tziporet!
First, sorry for the late response regarding the ehca build test in OFED 1.1
rc3.

1) The userspace lib dir for libehca contains only a few c-files, but no
header files. In the svn dir branches/1.1/src/userspace/libehca/src/ I saw
all the files needed. Please correct this for rc4! Will you pick up the new
version of libehca from that dir?

2) When I used the install.sh script to install the software packages or
compile them on ppc64 with kernel 2.6.18-rc5/6, I got the following error
messages:

  gcc -m64 -Wp,-MD,/var/tmp/OFEDRPM/BUILD/openib-1.1
/drivers/infiniband/core/.ib_addr.mod.o.d  -nos
M/BUILD/openib-1.1/include  -I/var/tmp/OFEDRPM/BUILD/openib-1.1
/drivers/infiniband/include  -Iinclu
oft-float -pipe -mminimal-toc -mtraceback=none  -mcall-aixdesc
-mtune=power4 -mno-altivec -funit-at
lude -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include
-I/var/tmp/OFEDRPM/BUILD/openi
g   -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(ib_addr.mod)"
-D"KBUILD_MODNAME=KBUILD_STR(
o /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core/ib_addr.mod.c
In file included from include/asm/system.h:9,
                 from include/linux/spinlock.h:56,
                 from include/linux/capability.h:45,
                 from include/linux/sched.h:44,
                 from include/linux/module.h:9,
                 from /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core/ib_addr.mod.c:1:
include/asm/hw_irq.h: In function `local_irq_disable':
include/asm/hw_irq.h:51: warning: implicit declaration of function `__mtmsrd'
In file included from include/asm/current.h:15,
                 from include/linux/capability.h:46,
                 from include/linux/sched.h:44,
                 from include/linux/module.h:9,
                 from /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core/ib_addr.mod.c:1:
include/asm/paca.h: At top level:
include/asm/paca.h:84: error: `SLB_CACHE_ENTRIES' undeclared here (not in a function)
In file included from include/linux/sched.h:49,
                 from include/linux/module.h:9,
                 from /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core/ib_addr.mod.c:1:
include/linux/jiffies.h:18:5: warning: "CONFIG_HZ" is not defined
include/linux/jiffies.h:20:7: warning: "CONFIG_HZ" is not defined

If I use the kernel Makefile in /usr/src/linux-2.6.18-rc5 to compile, e.g.

  make -C /usr/src/linux-2.6.18-rc5 SUBDIRS=/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core

then it works fine. We found out that the top-level kernel Makefile makes
the following settings:

LINUXINCLUDE    := -Iinclude \
                   $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) \
                   -include include/linux/autoconf.h
CPPFLAGS        := -D__KERNEL__ $(LINUXINCLUDE)

These pull in autoconf.h with all the configured kernel options, such as
CONFIG_PPC64. Those config defines are lost if one uses
/usr/src/linux-2.6.18-rc5/scripts/Makefile.build directly, as the OFED
install.sh does. Does anyone else see this problem on other architectures?
Is there any reason not to use the top-level kernel Makefile?

Thanks!
Nam Nguyen

openib-general-bounces at openib.org wrote on 07.09.2006 22:01:30:

> Hi,
> OFED 1.1 RC4 will be published on Monday 11-Sep.
> We currently work on several showstoppers:
> 1. 223: mthca.so not properly linked to libibverbs – Vlad & Jack
> 2. 221: SRP on V40Z and Sun T4 gets Kernel BUG at spinlock:118  - Roland
> 3. 219: OFED 1.1rc3 contains prerelease unstable libibverbs code – Vlad & Jack
>
> Thus final release date will be delayed to end of next week
>
>
> Tziporet Koren
> Software Director
> Mellanox Technologies
> mailto: tziporet at mellanox.co.il
> Tel +972-4-9097200, ext 380

From rolandd at cisco.com  Fri Sep  8 14:55:41 2006
From: rolandd at cisco.com (Roland Dreier)
Date: Fri, 8 Sep 2006 14:55:41 -0700
Subject: [openib-general] [PATCH 1/2] RDMA: iWARP connection manager
In-Reply-To: <2006981455.F7Cau4RN2pBSAVMu@cisco.com>
Message-ID: <2006981455.AsEvtu6ZdAKrdkcn@cisco.com>

From: Tom Tucker <tom at opengridcomputing.com>

Add an iWARP Connection Manager (CM), which abstracts connection
management for iWARP devices (RNICs).  It is a logical instance of the
xx_cm where xx is the transport type (ib or iw).  The symbols exported
are used by the transport independent rdma_cm module, and are
available also for transport dependent ULPs.

Signed-off-by: Tom Tucker <tom at opengridcomputing.com>
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd at cisco.com>
---
 drivers/infiniband/core/iwcm.c | 1019 ++++++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/iwcm.h |   62 ++
 include/rdma/iw_cm.h           |  258 ++++++++++
 3 files changed, 1339 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
new file mode 100644
index 0000000..c3fb304
--- /dev/null
+++ b/drivers/infiniband/core/iwcm.c
@@ -0,0 +1,1019 @@
+/*
+ * Copyright (c) 2004, 2005 Intel Corporation.  All rights reserved.
+ * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
+ * Copyright (c) 2004, 2005 Voltaire Corporation.  All rights reserved.
+ * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
+ * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved.
+ * Copyright (c) 2005 Network Appliance, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/idr.h>
+#include <linux/interrupt.h>
+#include <linux/pci.h>
+#include <linux/rbtree.h>
+#include <linux/spinlock.h>
+#include <linux/workqueue.h>
+#include <linux/completion.h>
+
+#include <rdma/iw_cm.h>
+#include <rdma/ib_addr.h>
+
+#include "iwcm.h"
+
+MODULE_AUTHOR("Tom Tucker");
+MODULE_DESCRIPTION("iWARP CM");
+MODULE_LICENSE("Dual BSD/GPL");
+
+static struct workqueue_struct *iwcm_wq;
+struct iwcm_work {
+	struct work_struct work;
+	struct iwcm_id_private *cm_id;
+	struct list_head list;
+	struct iw_cm_event event;
+	struct list_head free_list;
+};
+
+/*
+ * The following services provide a mechanism for pre-allocating iwcm_work
+ * elements.  The design pre-allocates them  based on the cm_id type:
+ *	LISTENING IDS: 	Get enough elements preallocated to handle the
+ *			listen backlog.
+ *	ACTIVE IDS:	4: CONNECT_REPLY, ESTABLISHED, DISCONNECT, CLOSE
+ *	PASSIVE IDS:	3: ESTABLISHED, DISCONNECT, CLOSE
+ *
+ * Allocating them in connect and listen avoids having to deal
+ * with allocation failures on the event upcall from the provider (which
+ * is called in the interrupt context).
+ *
+ * One exception is when creating the cm_id for incoming connection requests.
+ * There are two cases:
+ * 1) in the event upcall, cm_event_handler(), for a listening cm_id.  If
+ *    the backlog is exceeded, then no more connection request events will
+ *    be processed.  cm_event_handler() returns -ENOMEM in this case.  It's up
+ *    to the provider to reject the connection request.
+ * 2) in the connection request workqueue handler, cm_conn_req_handler().
+ *    If work elements cannot be allocated for the new connect request cm_id,
+ *    then IWCM will call the provider reject method.  This is ok since
+ *    cm_conn_req_handler() runs in the workqueue thread context.
+ */
+
+static struct iwcm_work *get_work(struct iwcm_id_private *cm_id_priv)
+{
+	struct iwcm_work *work;
+
+	if (list_empty(&cm_id_priv->work_free_list))
+		return NULL;
+	work = list_entry(cm_id_priv->work_free_list.next, struct iwcm_work,
+			  free_list);
+	list_del_init(&work->free_list);
+	return work;
+}
+
+static void put_work(struct iwcm_work *work)
+{
+	list_add(&work->free_list, &work->cm_id->work_free_list);
+}
+
+static void dealloc_work_entries(struct iwcm_id_private *cm_id_priv)
+{
+	struct list_head *e, *tmp;
+
+	list_for_each_safe(e, tmp, &cm_id_priv->work_free_list)
+		kfree(list_entry(e, struct iwcm_work, free_list));
+}
+
+static int alloc_work_entries(struct iwcm_id_private *cm_id_priv, int count)
+{
+	struct iwcm_work *work;
+
+	BUG_ON(!list_empty(&cm_id_priv->work_free_list));
+	while (count--) {
+		work = kmalloc(sizeof(struct iwcm_work), GFP_KERNEL);
+		if (!work) {
+			dealloc_work_entries(cm_id_priv);
+			return -ENOMEM;
+		}
+		work->cm_id = cm_id_priv;
+		INIT_LIST_HEAD(&work->list);
+		put_work(work);
+	}
+	return 0;
+}
+
+/*
+ * Save private data from incoming connection requests in the
+ * cm_id_priv so the low level driver doesn't have to.  Adjust
+ * the event ptr to point to the local copy.
+ */
+static int copy_private_data(struct iwcm_id_private *cm_id_priv,
+		       struct iw_cm_event *event)
+{
+	void *p;
+
+	p = kmalloc(event->private_data_len, GFP_ATOMIC);
+	if (!p)
+		return -ENOMEM;
+	memcpy(p, event->private_data, event->private_data_len);
+	event->private_data = p;
+	return 0;
+}
+
+/*
+ * Release a reference on cm_id. If the last reference is being removed
+ * and iw_destroy_cm_id is waiting, wake up the waiting thread.
+ */
+static int iwcm_deref_id(struct iwcm_id_private *cm_id_priv)
+{
+	int ret = 0;
+
+	BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
+	if (atomic_dec_and_test(&cm_id_priv->refcount)) {
+		BUG_ON(!list_empty(&cm_id_priv->work_list));
+		if (waitqueue_active(&cm_id_priv->destroy_comp.wait)) {
+			BUG_ON(cm_id_priv->state != IW_CM_STATE_DESTROYING);
+			BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY,
+					&cm_id_priv->flags));
+			ret = 1;
+		}
+		complete(&cm_id_priv->destroy_comp);
+	}
+
+	return ret;
+}
+
+static void add_ref(struct iw_cm_id *cm_id)
+{
+	struct iwcm_id_private *cm_id_priv;
+	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
+	atomic_inc(&cm_id_priv->refcount);
+}
+
+static void rem_ref(struct iw_cm_id *cm_id)
+{
+	struct iwcm_id_private *cm_id_priv;
+	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
+	iwcm_deref_id(cm_id_priv);
+}
+
+static int cm_event_handler(struct iw_cm_id *cm_id, struct iw_cm_event *event);
+
+struct iw_cm_id *iw_create_cm_id(struct ib_device *device,
+				 iw_cm_handler cm_handler,
+				 void *context)
+{
+	struct iwcm_id_private *cm_id_priv;
+
+	cm_id_priv = kzalloc(sizeof(*cm_id_priv), GFP_KERNEL);
+	if (!cm_id_priv)
+		return ERR_PTR(-ENOMEM);
+
+	cm_id_priv->state = IW_CM_STATE_IDLE;
+	cm_id_priv->id.device = device;
+	cm_id_priv->id.cm_handler = cm_handler;
+	cm_id_priv->id.context = context;
+	cm_id_priv->id.event_handler = cm_event_handler;
+	cm_id_priv->id.add_ref = add_ref;
+	cm_id_priv->id.rem_ref = rem_ref;
+	spin_lock_init(&cm_id_priv->lock);
+	atomic_set(&cm_id_priv->refcount, 1);
+	init_waitqueue_head(&cm_id_priv->connect_wait);
+	init_completion(&cm_id_priv->destroy_comp);
+	INIT_LIST_HEAD(&cm_id_priv->work_list);
+	INIT_LIST_HEAD(&cm_id_priv->work_free_list);
+
+	return &cm_id_priv->id;
+}
+EXPORT_SYMBOL(iw_create_cm_id);
+
+
+static int iwcm_modify_qp_err(struct ib_qp *qp)
+{
+	struct ib_qp_attr qp_attr;
+
+	if (!qp)
+		return -EINVAL;
+
+	qp_attr.qp_state = IB_QPS_ERR;
+	return ib_modify_qp(qp, &qp_attr, IB_QP_STATE);
+}
+
+/*
+ * This is really the RDMAC CLOSING state. It is most similar to the
+ * IB SQD QP state.
+ */
+static int iwcm_modify_qp_sqd(struct ib_qp *qp)
+{
+	struct ib_qp_attr qp_attr;
+
+	BUG_ON(qp == NULL);
+	qp_attr.qp_state = IB_QPS_SQD;
+	return ib_modify_qp(qp, &qp_attr, IB_QP_STATE);
+}
+
+/*
+ * CM_ID <-- CLOSING
+ *
+ * Block if a passive or active connection is currently being processed. Then
+ * process the event as follows:
+ * - If we are ESTABLISHED, move to CLOSING and modify the QP state
+ *   based on the abrupt flag
+ * - If the connection is already in the CLOSING or IDLE state, the peer is
+ *   disconnecting concurrently with us and we've already seen the
+ *   DISCONNECT event -- ignore the request and return 0
+ * - Disconnect on a listening endpoint returns -EINVAL
+ */
+int iw_cm_disconnect(struct iw_cm_id *cm_id, int abrupt)
+{
+	struct iwcm_id_private *cm_id_priv;
+	unsigned long flags;
+	int ret = 0;
+	struct ib_qp *qp = NULL;
+
+	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
+	/* Wait if we're currently in a connect or accept downcall */
+	wait_event(cm_id_priv->connect_wait,
+		   !test_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags));
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	switch (cm_id_priv->state) {
+	case IW_CM_STATE_ESTABLISHED:
+		cm_id_priv->state = IW_CM_STATE_CLOSING;
+
+		/* QP could be <nul> for user-mode client */
+		if (cm_id_priv->qp)
+			qp = cm_id_priv->qp;
+		else
+			ret = -EINVAL;
+		break;
+	case IW_CM_STATE_LISTEN:
+		ret = -EINVAL;
+		break;
+	case IW_CM_STATE_CLOSING:
+		/* remote peer closed first */
+	case IW_CM_STATE_IDLE:
+		/* accept or connect returned !0 */
+		break;
+	case IW_CM_STATE_CONN_RECV:
+		/*
+		 * App called disconnect before/without calling accept after
+		 * connect_request event delivered.
+		 */
+		break;
+	case IW_CM_STATE_CONN_SENT:
+		/* Can only get here if wait above fails */
+	default:
+		BUG();
+	}
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+
+	if (qp) {
+		if (abrupt)
+			ret = iwcm_modify_qp_err(qp);
+		else
+			ret = iwcm_modify_qp_sqd(qp);
+
+		/*
+		 * If both sides are disconnecting the QP could
+		 * already be in ERR or SQD states
+		 */
+		ret = 0;
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(iw_cm_disconnect);
+
+/*
+ * CM_ID <-- DESTROYING
+ *
+ * Clean up all resources associated with the connection and release
+ * the initial reference taken by iw_create_cm_id.
+ */
+static void destroy_cm_id(struct iw_cm_id *cm_id)
+{
+	struct iwcm_id_private *cm_id_priv;
+	unsigned long flags;
+	int ret;
+
+	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
+	/*
+	 * Wait if we're currently in a connect or accept downcall. A
+	 * listening endpoint should never block here.
+	 */
+	wait_event(cm_id_priv->connect_wait,
+		   !test_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags));
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	switch (cm_id_priv->state) {
+	case IW_CM_STATE_LISTEN:
+		cm_id_priv->state = IW_CM_STATE_DESTROYING;
+		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+		/* destroy the listening endpoint */
+		ret = cm_id->device->iwcm->destroy_listen(cm_id);
+		spin_lock_irqsave(&cm_id_priv->lock, flags);
+		break;
+	case IW_CM_STATE_ESTABLISHED:
+		cm_id_priv->state = IW_CM_STATE_DESTROYING;
+		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+		/* Abrupt close of the connection */
+		(void)iwcm_modify_qp_err(cm_id_priv->qp);
+		spin_lock_irqsave(&cm_id_priv->lock, flags);
+		break;
+	case IW_CM_STATE_IDLE:
+	case IW_CM_STATE_CLOSING:
+		cm_id_priv->state = IW_CM_STATE_DESTROYING;
+		break;
+	case IW_CM_STATE_CONN_RECV:
+		/*
+		 * App called destroy before/without calling accept after
+		 * receiving connection request event notification.
+		 */
+		cm_id_priv->state = IW_CM_STATE_DESTROYING;
+		break;
+	case IW_CM_STATE_CONN_SENT:
+	case IW_CM_STATE_DESTROYING:
+	default:
+		BUG();
+		break;
+	}
+	if (cm_id_priv->qp) {
+		cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp);
+		cm_id_priv->qp = NULL;
+	}
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+
+	(void)iwcm_deref_id(cm_id_priv);
+}
+
+/*
+ * This function is only called by the application thread and cannot
+ * be called by the event thread. The function will wait for all
+ * references to be released on the cm_id and then kfree the cm_id
+ * object.
+ */
+void iw_destroy_cm_id(struct iw_cm_id *cm_id)
+{
+	struct iwcm_id_private *cm_id_priv;
+
+	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
+	BUG_ON(test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags));
+
+	destroy_cm_id(cm_id);
+
+	wait_for_completion(&cm_id_priv->destroy_comp);
+
+	dealloc_work_entries(cm_id_priv);
+
+	kfree(cm_id_priv);
+}
+EXPORT_SYMBOL(iw_destroy_cm_id);
+
+/*
+ * CM_ID <-- LISTEN
+ *
+ * Start listening for connect requests. Generates one CONNECT_REQUEST
+ * event for each inbound connect request.
+ */
+int iw_cm_listen(struct iw_cm_id *cm_id, int backlog)
+{
+	struct iwcm_id_private *cm_id_priv;
+	unsigned long flags;
+	int ret = 0;
+
+	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
+
+	ret = alloc_work_entries(cm_id_priv, backlog);
+	if (ret)
+		return ret;
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	switch (cm_id_priv->state) {
+	case IW_CM_STATE_IDLE:
+		cm_id_priv->state = IW_CM_STATE_LISTEN;
+		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+		ret = cm_id->device->iwcm->create_listen(cm_id, backlog);
+		if (ret)
+			cm_id_priv->state = IW_CM_STATE_IDLE;
+		spin_lock_irqsave(&cm_id_priv->lock, flags);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL(iw_cm_listen);
+
+/*
+ * CM_ID <-- IDLE
+ *
+ * Rejects an inbound connection request. No events are generated.
+ */
+int iw_cm_reject(struct iw_cm_id *cm_id,
+		 const void *private_data,
+		 u8 private_data_len)
+{
+	struct iwcm_id_private *cm_id_priv;
+	unsigned long flags;
+	int ret;
+
+	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
+	set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	if (cm_id_priv->state != IW_CM_STATE_CONN_RECV) {
+		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+		clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
+		wake_up_all(&cm_id_priv->connect_wait);
+		return -EINVAL;
+	}
+	cm_id_priv->state = IW_CM_STATE_IDLE;
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+
+	ret = cm_id->device->iwcm->reject(cm_id, private_data,
+					  private_data_len);
+
+	clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
+	wake_up_all(&cm_id_priv->connect_wait);
+
+	return ret;
+}
+EXPORT_SYMBOL(iw_cm_reject);
+
+/*
+ * CM_ID <-- ESTABLISHED
+ *
+ * Accepts an inbound connection request and generates an ESTABLISHED
+ * event. Callers of iw_cm_disconnect and iw_destroy_cm_id will block
+ * until the ESTABLISHED event is received from the provider.
+ */
+int iw_cm_accept(struct iw_cm_id *cm_id,
+		 struct iw_cm_conn_param *iw_param)
+{
+	struct iwcm_id_private *cm_id_priv;
+	struct ib_qp *qp;
+	unsigned long flags;
+	int ret;
+
+	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
+	set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	if (cm_id_priv->state != IW_CM_STATE_CONN_RECV) {
+		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+		clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
+		wake_up_all(&cm_id_priv->connect_wait);
+		return -EINVAL;
+	}
+	/* Get the ib_qp given the QPN */
+	qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn);
+	if (!qp) {
+		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+		return -EINVAL;
+	}
+	cm_id->device->iwcm->add_ref(qp);
+	cm_id_priv->qp = qp;
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+
+	ret = cm_id->device->iwcm->accept(cm_id, iw_param);
+	if (ret) {
+		/* An error on accept precludes provider events */
+		BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_RECV);
+		cm_id_priv->state = IW_CM_STATE_IDLE;
+		spin_lock_irqsave(&cm_id_priv->lock, flags);
+		if (cm_id_priv->qp) {
+			cm_id->device->iwcm->rem_ref(qp);
+			cm_id_priv->qp = NULL;
+		}
+		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+		clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
+		wake_up_all(&cm_id_priv->connect_wait);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(iw_cm_accept);
+
+/*
+ * Active Side: CM_ID <-- CONN_SENT
+ *
+ * If successful, results in the generation of a CONNECT_REPLY
+ * event. iw_cm_disconnect and iw_destroy_cm_id will block until the
+ * CONNECT_REPLY event is received from the provider.
+ */
+int iw_cm_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param)
+{
+	struct iwcm_id_private *cm_id_priv;
+	int ret = 0;
+	unsigned long flags;
+	struct ib_qp *qp;
+
+	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
+
+	ret = alloc_work_entries(cm_id_priv, 4);
+	if (ret)
+		return ret;
+
+	set_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+
+	if (cm_id_priv->state != IW_CM_STATE_IDLE) {
+		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+		clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
+		wake_up_all(&cm_id_priv->connect_wait);
+		return -EINVAL;
+	}
+
+	/* Get the ib_qp given the QPN */
+	qp = cm_id->device->iwcm->get_qp(cm_id->device, iw_param->qpn);
+	if (!qp) {
+		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+		return -EINVAL;
+	}
+	cm_id->device->iwcm->add_ref(qp);
+	cm_id_priv->qp = qp;
+	cm_id_priv->state = IW_CM_STATE_CONN_SENT;
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+
+	ret = cm_id->device->iwcm->connect(cm_id, iw_param);
+	if (ret) {
+		spin_lock_irqsave(&cm_id_priv->lock, flags);
+		if (cm_id_priv->qp) {
+			cm_id->device->iwcm->rem_ref(qp);
+			cm_id_priv->qp = NULL;
+		}
+		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+		BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_SENT);
+		cm_id_priv->state = IW_CM_STATE_IDLE;
+		clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
+		wake_up_all(&cm_id_priv->connect_wait);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(iw_cm_connect);
+
+/*
+ * Passive Side: new CM_ID <-- CONN_RECV
+ *
+ * Handles an inbound connect request. The function creates a new
+ * iw_cm_id to represent the new connection and inherits the client
+ * callback function and other attributes from the listening parent.
+ *
+ * The work item contains a pointer to the listen_cm_id and the event. The
+ * listen_cm_id contains the client cm_handler, context and
+ * device. These are copied when the device is cloned. The event
+ * contains the new four tuple.
+ *
+ * An error on the child should not affect the parent, so this
+ * function does not return a value.
+ */
+static void cm_conn_req_handler(struct iwcm_id_private *listen_id_priv,
+				struct iw_cm_event *iw_event)
+{
+	unsigned long flags;
+	struct iw_cm_id *cm_id;
+	struct iwcm_id_private *cm_id_priv;
+	int ret;
+
+	/*
+	 * The provider should never generate a connection request
+	 * event with a bad status.
+	 */
+	BUG_ON(iw_event->status);
+
+	/*
+	 * We could be destroying the listening id. If so, ignore this
+	 * upcall.
+	 */
+	spin_lock_irqsave(&listen_id_priv->lock, flags);
+	if (listen_id_priv->state != IW_CM_STATE_LISTEN) {
+		spin_unlock_irqrestore(&listen_id_priv->lock, flags);
+		return;
+	}
+	spin_unlock_irqrestore(&listen_id_priv->lock, flags);
+
+	cm_id = iw_create_cm_id(listen_id_priv->id.device,
+				listen_id_priv->id.cm_handler,
+				listen_id_priv->id.context);
+	/* If the cm_id could not be created, ignore the request */
+	if (IS_ERR(cm_id))
+		return;
+
+	cm_id->provider_data = iw_event->provider_data;
+	cm_id->local_addr = iw_event->local_addr;
+	cm_id->remote_addr = iw_event->remote_addr;
+
+	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
+	cm_id_priv->state = IW_CM_STATE_CONN_RECV;
+
+	ret = alloc_work_entries(cm_id_priv, 3);
+	if (ret) {
+		iw_cm_reject(cm_id, NULL, 0);
+		iw_destroy_cm_id(cm_id);
+		return;
+	}
+
+	/* Call the client CM handler */
+	ret = cm_id->cm_handler(cm_id, iw_event);
+	if (ret) {
+		set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
+		destroy_cm_id(cm_id);
+		if (atomic_read(&cm_id_priv->refcount)==0)
+			kfree(cm_id);
+	}
+
+	if (iw_event->private_data_len)
+		kfree(iw_event->private_data);
+}
+
+/*
+ * Passive Side: CM_ID <-- ESTABLISHED
+ *
+ * The provider generated an ESTABLISHED event which means that
+ * the MPA negotiation has completed successfully and we are now in MPA
+ * FPDU mode.
+ *
+ * This event can only be received in the CONN_RECV state. If the
+ * remote peer closed, the ESTABLISHED event would be received followed
+ * by the CLOSE event. If the app closes, it will block until we wake
+ * it up after processing this event.
+ */
+static int cm_conn_est_handler(struct iwcm_id_private *cm_id_priv,
+			       struct iw_cm_event *iw_event)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+
+	/*
+	 * We clear the CONNECT_WAIT bit here to allow the callback
+	 * function to call iw_cm_disconnect. Calling iw_destroy_cm_id
+	 * from a callback handler is not allowed.
+	 */
+	clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
+	BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_RECV);
+	cm_id_priv->state = IW_CM_STATE_ESTABLISHED;
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+	ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event);
+	wake_up_all(&cm_id_priv->connect_wait);
+
+	return ret;
+}
+
+/*
+ * Active Side: CM_ID <-- ESTABLISHED
+ *
+ * The app has called connect and is waiting for the established event to
+ * post its requests to the server. This event will wake up anyone
+ * blocked in iw_cm_disconnect or iw_destroy_cm_id.
+ */
+static int cm_conn_rep_handler(struct iwcm_id_private *cm_id_priv,
+			       struct iw_cm_event *iw_event)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	/*
+	 * Clear the connect wait bit so a callback function calling
+	 * iw_cm_disconnect will not wait and deadlock this thread
+	 */
+	clear_bit(IWCM_F_CONNECT_WAIT, &cm_id_priv->flags);
+	BUG_ON(cm_id_priv->state != IW_CM_STATE_CONN_SENT);
+	if (iw_event->status == IW_CM_EVENT_STATUS_ACCEPTED) {
+		cm_id_priv->id.local_addr = iw_event->local_addr;
+		cm_id_priv->id.remote_addr = iw_event->remote_addr;
+		cm_id_priv->state = IW_CM_STATE_ESTABLISHED;
+	} else {
+		/* REJECTED or RESET */
+		cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp);
+		cm_id_priv->qp = NULL;
+		cm_id_priv->state = IW_CM_STATE_IDLE;
+	}
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+	ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event);
+
+	if (iw_event->private_data_len)
+		kfree(iw_event->private_data);
+
+	/* Wake up waiters on connect complete */
+	wake_up_all(&cm_id_priv->connect_wait);
+
+	return ret;
+}
+
+/*
+ * CM_ID <-- CLOSING
+ *
+ * If in the ESTABLISHED state, move to CLOSING.
+ */
+static void cm_disconnect_handler(struct iwcm_id_private *cm_id_priv,
+				  struct iw_cm_event *iw_event)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	if (cm_id_priv->state == IW_CM_STATE_ESTABLISHED)
+		cm_id_priv->state = IW_CM_STATE_CLOSING;
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+}
+
+/*
+ * CM_ID <-- IDLE
+ *
+ * If in the ESTABLISHED or CLOSING states, the QP will have been
+ * moved by the provider to the ERR state. Disassociate the CM_ID from
+ * the QP, move to IDLE, and remove the 'connected' reference.
+ *
+ * If in some other state, the cm_id was destroyed asynchronously.
+ * This is the last reference that will result in waking up
+ * the app thread blocked in iw_destroy_cm_id.
+ */
+static int cm_close_handler(struct iwcm_id_private *cm_id_priv,
+				  struct iw_cm_event *iw_event)
+{
+	unsigned long flags;
+	int ret = 0;
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+
+	if (cm_id_priv->qp) {
+		cm_id_priv->id.device->iwcm->rem_ref(cm_id_priv->qp);
+		cm_id_priv->qp = NULL;
+	}
+	switch (cm_id_priv->state) {
+	case IW_CM_STATE_ESTABLISHED:
+	case IW_CM_STATE_CLOSING:
+		cm_id_priv->state = IW_CM_STATE_IDLE;
+		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+		ret = cm_id_priv->id.cm_handler(&cm_id_priv->id, iw_event);
+		spin_lock_irqsave(&cm_id_priv->lock, flags);
+		break;
+	case IW_CM_STATE_DESTROYING:
+		break;
+	default:
+		BUG();
+	}
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+
+	return ret;
+}
+
+static int process_event(struct iwcm_id_private *cm_id_priv,
+			 struct iw_cm_event *iw_event)
+{
+	int ret = 0;
+
+	switch (iw_event->event) {
+	case IW_CM_EVENT_CONNECT_REQUEST:
+		cm_conn_req_handler(cm_id_priv, iw_event);
+		break;
+	case IW_CM_EVENT_CONNECT_REPLY:
+		ret = cm_conn_rep_handler(cm_id_priv, iw_event);
+		break;
+	case IW_CM_EVENT_ESTABLISHED:
+		ret = cm_conn_est_handler(cm_id_priv, iw_event);
+		break;
+	case IW_CM_EVENT_DISCONNECT:
+		cm_disconnect_handler(cm_id_priv, iw_event);
+		break;
+	case IW_CM_EVENT_CLOSE:
+		ret = cm_close_handler(cm_id_priv, iw_event);
+		break;
+	default:
+		BUG();
+	}
+
+	return ret;
+}
+
+/*
+ * Process events on the work_list for the cm_id. If the callback
+ * function requests that the cm_id be deleted, a flag is set in the
+ * cm_id flags to indicate that when the last reference is
+ * removed, the cm_id is to be destroyed. This is necessary to
+ * distinguish between an object that will be destroyed by the app
+ * thread asleep on the destroy_comp list vs. an object destroyed
+ * here synchronously when the last reference is removed.
+ */
+static void cm_work_handler(void *arg)
+{
+	struct iwcm_work *work = arg, lwork;
+	struct iwcm_id_private *cm_id_priv = work->cm_id;
+	unsigned long flags;
+	int empty;
+	int ret = 0;
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	empty = list_empty(&cm_id_priv->work_list);
+	while (!empty) {
+		work = list_entry(cm_id_priv->work_list.next,
+				  struct iwcm_work, list);
+		list_del_init(&work->list);
+		empty = list_empty(&cm_id_priv->work_list);
+		lwork = *work;
+		put_work(work);
+		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+
+		ret = process_event(cm_id_priv, &lwork.event);
+		if (ret) {
+			set_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags);
+			destroy_cm_id(&cm_id_priv->id);
+		}
+		BUG_ON(atomic_read(&cm_id_priv->refcount)==0);
+		if (iwcm_deref_id(cm_id_priv))
+			return;
+
+		if (atomic_read(&cm_id_priv->refcount)==0 &&
+		    test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) {
+			dealloc_work_entries(cm_id_priv);
+			kfree(cm_id_priv);
+			return;
+		}
+		spin_lock_irqsave(&cm_id_priv->lock, flags);
+	}
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+}
+
+/*
+ * This function is called in interrupt context. Schedule events on
+ * the iwcm_wq thread to allow callback functions to downcall into
+ * the CM and/or block.  Events are queued to a per-CM_ID
+ * work_list. If this is the first event on the work_list, the work
+ * element is also queued on the iwcm_wq thread.
+ *
+ * Each event holds a reference on the cm_id. Until the last posted
+ * event has been delivered and processed, the cm_id cannot be
+ * deleted.
+ *
+ * Returns:
+ * 	      0	- the event was handled.
+ *	-ENOMEM	- the event was not handled due to lack of resources.
+ */
+static int cm_event_handler(struct iw_cm_id *cm_id,
+			     struct iw_cm_event *iw_event)
+{
+	struct iwcm_work *work;
+	struct iwcm_id_private *cm_id_priv;
+	unsigned long flags;
+	int ret = 0;
+
+	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	work = get_work(cm_id_priv);
+	if (!work) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	INIT_WORK(&work->work, cm_work_handler, work);
+	work->cm_id = cm_id_priv;
+	work->event = *iw_event;
+
+	if ((work->event.event == IW_CM_EVENT_CONNECT_REQUEST ||
+	     work->event.event == IW_CM_EVENT_CONNECT_REPLY) &&
+	    work->event.private_data_len) {
+		ret = copy_private_data(cm_id_priv, &work->event);
+		if (ret) {
+			put_work(work);
+			goto out;
+		}
+	}
+
+	atomic_inc(&cm_id_priv->refcount);
+	if (list_empty(&cm_id_priv->work_list)) {
+		list_add_tail(&work->list, &cm_id_priv->work_list);
+		queue_work(iwcm_wq, &work->work);
+	} else
+		list_add_tail(&work->list, &cm_id_priv->work_list);
+out:
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+	return ret;
+}
+
+static int iwcm_init_qp_init_attr(struct iwcm_id_private *cm_id_priv,
+				  struct ib_qp_attr *qp_attr,
+				  int *qp_attr_mask)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	switch (cm_id_priv->state) {
+	case IW_CM_STATE_IDLE:
+	case IW_CM_STATE_CONN_SENT:
+	case IW_CM_STATE_CONN_RECV:
+	case IW_CM_STATE_ESTABLISHED:
+		*qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS;
+		qp_attr->qp_access_flags = IB_ACCESS_LOCAL_WRITE |
+					   IB_ACCESS_REMOTE_WRITE|
+					   IB_ACCESS_REMOTE_READ;
+		ret = 0;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+	return ret;
+}
+
+static int iwcm_init_qp_rts_attr(struct iwcm_id_private *cm_id_priv,
+				  struct ib_qp_attr *qp_attr,
+				  int *qp_attr_mask)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&cm_id_priv->lock, flags);
+	switch (cm_id_priv->state) {
+	case IW_CM_STATE_IDLE:
+	case IW_CM_STATE_CONN_SENT:
+	case IW_CM_STATE_CONN_RECV:
+	case IW_CM_STATE_ESTABLISHED:
+		*qp_attr_mask = 0;
+		ret = 0;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+	return ret;
+}
+
+int iw_cm_init_qp_attr(struct iw_cm_id *cm_id,
+		       struct ib_qp_attr *qp_attr,
+		       int *qp_attr_mask)
+{
+	struct iwcm_id_private *cm_id_priv;
+	int ret;
+
+	cm_id_priv = container_of(cm_id, struct iwcm_id_private, id);
+	switch (qp_attr->qp_state) {
+	case IB_QPS_INIT:
+	case IB_QPS_RTR:
+		ret = iwcm_init_qp_init_attr(cm_id_priv,
+					     qp_attr, qp_attr_mask);
+		break;
+	case IB_QPS_RTS:
+		ret = iwcm_init_qp_rts_attr(cm_id_priv,
+					    qp_attr, qp_attr_mask);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+EXPORT_SYMBOL(iw_cm_init_qp_attr);
+
+static int __init iw_cm_init(void)
+{
+	iwcm_wq = create_singlethread_workqueue("iw_cm_wq");
+	if (!iwcm_wq)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void __exit iw_cm_cleanup(void)
+{
+	destroy_workqueue(iwcm_wq);
+}
+
+module_init(iw_cm_init);
+module_exit(iw_cm_cleanup);
diff --git a/drivers/infiniband/core/iwcm.h b/drivers/infiniband/core/iwcm.h
new file mode 100644
index 0000000..3f6cc82
--- /dev/null
+++ b/drivers/infiniband/core/iwcm.h
@@ -0,0 +1,62 @@
+/*
+ * Copyright (c) 2005 Network Appliance, Inc. All rights reserved.
+ * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef IWCM_H
+#define IWCM_H
+
+enum iw_cm_state {
+	IW_CM_STATE_IDLE,             /* unbound, inactive */
+	IW_CM_STATE_LISTEN,           /* listen waiting for connect */
+	IW_CM_STATE_CONN_RECV,        /* inbound waiting for user accept */
+	IW_CM_STATE_CONN_SENT,        /* outbound waiting for peer accept */
+	IW_CM_STATE_ESTABLISHED,      /* established */
+	IW_CM_STATE_CLOSING,	      /* disconnect */
+	IW_CM_STATE_DESTROYING        /* object being deleted */
+};
+
+struct iwcm_id_private {
+	struct iw_cm_id	id;
+	enum iw_cm_state state;
+	unsigned long flags;
+	struct ib_qp *qp;
+	struct completion destroy_comp;
+	wait_queue_head_t connect_wait;
+	struct list_head work_list;
+	spinlock_t lock;
+	atomic_t refcount;
+	struct list_head work_free_list;
+};
+
+#define IWCM_F_CALLBACK_DESTROY   1
+#define IWCM_F_CONNECT_WAIT       2
+
+#endif /* IWCM_H */
diff --git a/include/rdma/iw_cm.h b/include/rdma/iw_cm.h
new file mode 100644
index 0000000..aeefa9b
--- /dev/null
+++ b/include/rdma/iw_cm.h
@@ -0,0 +1,258 @@
+/*
+ * Copyright (c) 2005 Network Appliance, Inc. All rights reserved.
+ * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef IW_CM_H
+#define IW_CM_H
+
+#include <linux/in.h>
+#include <rdma/ib_cm.h>
+
+struct iw_cm_id;
+
+enum iw_cm_event_type {
+	IW_CM_EVENT_CONNECT_REQUEST = 1, /* connect request received */
+	IW_CM_EVENT_CONNECT_REPLY,	 /* reply from active connect request */
+	IW_CM_EVENT_ESTABLISHED,	 /* passive side accept successful */
+	IW_CM_EVENT_DISCONNECT,		 /* orderly shutdown */
+	IW_CM_EVENT_CLOSE		 /* close complete */
+};
+
+enum iw_cm_event_status {
+	IW_CM_EVENT_STATUS_OK = 0,	 /* request successful */
+	IW_CM_EVENT_STATUS_ACCEPTED = 0, /* connect request accepted */
+	IW_CM_EVENT_STATUS_REJECTED,	 /* connect request rejected */
+	IW_CM_EVENT_STATUS_TIMEOUT,	 /* the operation timed out */
+	IW_CM_EVENT_STATUS_RESET,	 /* reset from remote peer */
+	IW_CM_EVENT_STATUS_EINVAL,	 /* asynchronous failure for bad parm */
+};
+
+struct iw_cm_event {
+	enum iw_cm_event_type event;
+	enum iw_cm_event_status status;
+	struct sockaddr_in local_addr;
+	struct sockaddr_in remote_addr;
+	void *private_data;
+	u8 private_data_len;
+	void* provider_data;
+};
+
+/**
+ * iw_cm_handler - Function to be called by the IW CM when delivering events
+ * to the client.
+ *
+ * @cm_id: The IW CM identifier associated with the event.
+ * @event: Pointer to the event structure.
+ */
+typedef int (*iw_cm_handler)(struct iw_cm_id *cm_id,
+			     struct iw_cm_event *event);
+
+/**
+ * iw_event_handler - Function called by the provider when delivering provider
+ * events to the IW CM.  Returns either 0 indicating the event was processed
+ * or -errno if the event could not be processed.
+ *
+ * @cm_id: The IW CM identifier associated with the event.
+ * @event: Pointer to the event structure.
+ */
+typedef int (*iw_event_handler)(struct iw_cm_id *cm_id,
+				 struct iw_cm_event *event);
+
+struct iw_cm_id {
+	iw_cm_handler		cm_handler;      /* client callback function */
+	void		        *context;	 /* client cb context */
+	struct ib_device	*device;
+	struct sockaddr_in      local_addr;
+	struct sockaddr_in	remote_addr;
+	void			*provider_data;	 /* provider private data */
+	iw_event_handler        event_handler;   /* cb for provider
+						    events */
+	/* Used by provider to add and remove refs on IW cm_id */
+	void (*add_ref)(struct iw_cm_id *);
+	void (*rem_ref)(struct iw_cm_id *);
+};
+
+struct iw_cm_conn_param {
+	const void *private_data;
+	u16 private_data_len;
+	u32 ord;
+	u32 ird;
+	u32 qpn;
+};
+
+struct iw_cm_verbs {
+	void		(*add_ref)(struct ib_qp *qp);
+
+	void		(*rem_ref)(struct ib_qp *qp);
+
+	struct ib_qp *	(*get_qp)(struct ib_device *device,
+				  int qpn);
+
+	int		(*connect)(struct iw_cm_id *cm_id,
+				   struct iw_cm_conn_param *conn_param);
+
+	int		(*accept)(struct iw_cm_id *cm_id,
+				  struct iw_cm_conn_param *conn_param);
+
+	int		(*reject)(struct iw_cm_id *cm_id,
+				  const void *pdata, u8 pdata_len);
+
+	int		(*create_listen)(struct iw_cm_id *cm_id,
+					 int backlog);
+
+	int		(*destroy_listen)(struct iw_cm_id *cm_id);
+};
+
+/**
+ * iw_create_cm_id - Create an IW CM identifier.
+ *
+ * @device: The IB device on which to create the IW CM identifier.
+ * @cm_handler: User callback invoked to report events associated with the
+ *   returned IW CM identifier.
+ * @context: User specified context associated with the id.
+ */
+struct iw_cm_id *iw_create_cm_id(struct ib_device *device,
+				 iw_cm_handler cm_handler, void *context);
+
+/**
+ * iw_destroy_cm_id - Destroy an IW CM identifier.
+ *
+ * @cm_id: The previously created IW CM identifier to destroy.
+ *
+ * The client can assume that no events will be delivered for the CM ID after
+ * this function returns.
+ */
+void iw_destroy_cm_id(struct iw_cm_id *cm_id);
+
+/**
+ * iw_cm_unbind_qp - Unbind the specified IW CM identifier and QP
+ *
+ * @cm_id: The IW CM identifier to unbind from the QP.
+ * @qp: The QP
+ *
+ * This is called by the provider when destroying the QP to ensure
+ * that any references held by the IWCM are released. It may also
+ * be called by the IWCM when destroying a CM_ID to ensure that any
+ * references held by the provider are released.
+ */
+void iw_cm_unbind_qp(struct iw_cm_id *cm_id, struct ib_qp *qp);
+
+/**
+ * iw_cm_get_qp - Return the ib_qp associated with a QPN
+ *
+ * @device: The IB device
+ * @qpn: The queue pair number
+ */
+struct ib_qp *iw_cm_get_qp(struct ib_device *device, int qpn);
+
+/**
+ * iw_cm_listen - Listen for incoming connection requests on the
+ * specified IW CM id.
+ *
+ * @cm_id: The IW CM identifier.
+ * @backlog: The maximum number of outstanding un-accepted inbound listen
+ *   requests to queue.
+ *
+ * The source address and port number are specified in the IW CM identifier
+ * structure.
+ */
+int iw_cm_listen(struct iw_cm_id *cm_id, int backlog);
+
+/**
+ * iw_cm_accept - Called to accept an incoming connect request.
+ *
+ * @cm_id: The IW CM identifier associated with the connection request.
+ * @iw_param: Pointer to a structure containing connection establishment
+ *   parameters.
+ *
+ * The specified cm_id will have been provided in the event data for a
+ * CONNECT_REQUEST event. Subsequent events related to this connection will be
+ * delivered to the specified IW CM identifier and may occur prior to
+ * the return of this function. If this function returns a non-zero value, the
+ * client can assume that no events will be delivered to the specified IW CM
+ * identifier.
+ */
+int iw_cm_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param);
+
+/**
+ * iw_cm_reject - Reject an incoming connection request.
+ *
+ * @cm_id: Connection identifier associated with the request.
+ * @private_data: Pointer to data to deliver to the remote peer as part of the
+ *   reject message.
+ * @private_data_len: The number of bytes in the private_data parameter.
+ *
+ * The client can assume that no events will be delivered to the specified IW
+ * CM identifier following the return of this function. The private_data
+ * buffer is available for reuse when this function returns.
+ */
+int iw_cm_reject(struct iw_cm_id *cm_id, const void *private_data,
+		 u8 private_data_len);
+
+/**
+ * iw_cm_connect - Called to request a connection to a remote peer.
+ *
+ * @cm_id: The IW CM identifier for the connection.
+ * @iw_param: Pointer to a structure containing connection establishment
+ *   parameters.
+ *
+ * Events may be delivered to the specified IW CM identifier prior to the
+ * return of this function. If this function returns a non-zero value, the
+ * client can assume that no events will be delivered to the specified IW CM
+ * identifier.
+ */
+int iw_cm_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *iw_param);
+
+/**
+ * iw_cm_disconnect - Close the specified connection.
+ *
+ * @cm_id: The IW CM identifier to close.
+ * @abrupt: If 0, the connection will be closed gracefully, otherwise, the
+ *   connection will be reset.
+ *
+ * The IW CM identifier is still active until the IW_CM_EVENT_CLOSE event is
+ * delivered.
+ */
+int iw_cm_disconnect(struct iw_cm_id *cm_id, int abrupt);
+
+/**
+ * iw_cm_init_qp_attr - Called to initialize the attributes of the QP
+ * associated with an IW CM identifier.
+ *
+ * @cm_id: The IW CM identifier associated with the QP
+ * @qp_attr: Pointer to the QP attributes structure.
+ * @qp_attr_mask: Pointer to a bit vector specifying which QP attributes are
+ *   valid.
+ */
+int iw_cm_init_qp_attr(struct iw_cm_id *cm_id, struct ib_qp_attr *qp_attr,
+		       int *qp_attr_mask);
+
+#endif /* IW_CM_H */
-- 
1.4.1
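
Illustration only, not part of the patch: the event handlers in iwcm.c
above amount to a small state machine. The sketch below models those
transitions in stand-alone userspace C. The names (cm_state, cm_event,
next_state) are hypothetical, the CONNECT_REQUEST path is left out
because its handler is not shown here, and -1 stands in for the places
where the kernel code would hit BUG() or BUG_ON().

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for enum iw_cm_state. */
enum cm_state {
	ST_IDLE, ST_LISTEN, ST_CONN_RECV, ST_CONN_SENT,
	ST_ESTABLISHED, ST_CLOSING, ST_DESTROYING
};

/* Hypothetical stand-ins for the events handled in process_event(). */
enum cm_event {
	EV_CONNECT_REPLY, EV_ESTABLISHED, EV_DISCONNECT, EV_CLOSE
};

/*
 * Return the state after handling ev in state s, or -1 where the
 * kernel handlers would BUG(). 'accepted' is only consulted for
 * EV_CONNECT_REPLY.
 */
static int next_state(enum cm_state s, enum cm_event ev, int accepted)
{
	switch (ev) {
	case EV_CONNECT_REPLY:	/* active side */
		if (s != ST_CONN_SENT)
			return -1;
		return accepted ? ST_ESTABLISHED : ST_IDLE;
	case EV_ESTABLISHED:	/* passive side */
		if (s != ST_CONN_RECV)
			return -1;
		return ST_ESTABLISHED;
	case EV_DISCONNECT:	/* only ESTABLISHED moves to CLOSING */
		return (s == ST_ESTABLISHED) ? ST_CLOSING : (int)s;
	case EV_CLOSE:
		if (s == ST_ESTABLISHED || s == ST_CLOSING)
			return ST_IDLE;
		if (s == ST_DESTROYING)
			return ST_DESTROYING;
		return -1;
	}
	return -1;
}
```

A rejected or reset connect reply returns the id to IDLE, and a
DISCONNECT in any state other than ESTABLISHED leaves the state alone.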


From rolandd at cisco.com  Fri Sep  8 14:55:41 2006
From: rolandd at cisco.com (Roland Dreier)
Date: Fri, 8 Sep 2006 14:55:41 -0700
Subject: [openib-general] [PATCH 2/2] RDMA: iWARP changes to IB core
In-Reply-To: <2006981455.AsEvtu6ZdAKrdkcn@cisco.com>
Message-ID: <2006981455.5zPhTm8jRQnxTde2@cisco.com>

From: Tom Tucker <tom at opengridcomputing.com>

Modifications to the existing rdma header files, core files, drivers,
and ulp files to support iWARP, including:
 - Hook iWARP CM into the build system and use it in rdma_cm.
 - Convert enum ib_node_type to enum rdma_node_type, which includes
   the possibility of RDMA_NODE_RNIC, and update everything for this.

Signed-off-by: Tom Tucker <tom at opengridcomputing.com>
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd at cisco.com>
---
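Illustration only, not part of the patch: most of the churn below
replaces switches on the node type with switches on the transport
derived from it. A minimal stand-alone sketch of that mapping follows.
The enumerator names are taken from the diff, but the numeric values
and the helper node_get_transport_sketch() are assumptions made for
this example, not the definition the patch adds.

```c
#include <assert.h>

/* Userspace model; the numeric values are assumptions. */
enum rdma_node_type {
	RDMA_NODE_IB_CA = 1,
	RDMA_NODE_IB_SWITCH,
	RDMA_NODE_RNIC
};

enum rdma_transport_type {
	RDMA_TRANSPORT_IB,
	RDMA_TRANSPORT_IWARP
};

/*
 * Hypothetical helper mirroring how callers in this patch use
 * rdma_node_get_transport(): RNICs are iWARP, IB nodes are IB.
 */
static enum rdma_transport_type
node_get_transport_sketch(enum rdma_node_type node_type)
{
	switch (node_type) {
	case RDMA_NODE_IB_CA:
	case RDMA_NODE_IB_SWITCH:
		return RDMA_TRANSPORT_IB;
	case RDMA_NODE_RNIC:
		return RDMA_TRANSPORT_IWARP;
	}
	assert(0 && "unknown node type");
	return RDMA_TRANSPORT_IB;
}

/* Same test as start_port() in cache.c: switches start at port 0. */
static int start_port_sketch(enum rdma_node_type node_type)
{
	return (node_type == RDMA_NODE_IB_SWITCH) ? 0 : 1;
}
```

Callers such as cm_add_one() can then bail out early for any device
whose transport is not RDMA_TRANSPORT_IB.
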
 drivers/infiniband/core/Makefile             |    4 
 drivers/infiniband/core/addr.c               |   18 +
 drivers/infiniband/core/cache.c              |    5 
 drivers/infiniband/core/cm.c                 |    3 
 drivers/infiniband/core/cma.c                |  355 +++++++++++++++++++++++---
 drivers/infiniband/core/device.c             |    4 
 drivers/infiniband/core/mad.c                |    7 -
 drivers/infiniband/core/sa_query.c           |    5 
 drivers/infiniband/core/smi.c                |   16 +
 drivers/infiniband/core/sysfs.c              |   11 -
 drivers/infiniband/core/ucm.c                |    3 
 drivers/infiniband/core/user_mad.c           |    5 
 drivers/infiniband/core/verbs.c              |   17 +
 drivers/infiniband/hw/ehca/ehca_main.c       |    2 
 drivers/infiniband/hw/ipath/ipath_verbs.c    |    2 
 drivers/infiniband/hw/mthca/mthca_provider.c |    2 
 drivers/infiniband/ulp/ipoib/ipoib_main.c    |    8 +
 drivers/infiniband/ulp/srp/ib_srp.c          |    2 
 include/rdma/ib_addr.h                       |   17 +
 include/rdma/ib_verbs.h                      |   25 ++
 20 files changed, 430 insertions(+), 81 deletions(-)

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 68e73ec..163d991 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -1,7 +1,7 @@
 infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS)	:= ib_addr.o rdma_cm.o
 
 obj-$(CONFIG_INFINIBAND) +=		ib_core.o ib_mad.o ib_sa.o \
-					ib_cm.o $(infiniband-y)
+					ib_cm.o iw_cm.o $(infiniband-y)
 obj-$(CONFIG_INFINIBAND_USER_MAD) +=	ib_umad.o
 obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=	ib_uverbs.o ib_ucm.o
 
@@ -14,6 +14,8 @@ ib_sa-y :=			sa_query.o
 
 ib_cm-y :=			cm.o
 
+iw_cm-y :=			iwcm.o
+
 rdma_cm-y :=			cma.o
 
 ib_addr-y :=			addr.o
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index d8e54e0..9cbf09e 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -61,12 +61,15 @@ static LIST_HEAD(req_list);
 static DECLARE_WORK(work, process_req, NULL);
 static struct workqueue_struct *addr_wq;
 
-static int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
-		     unsigned char *dst_dev_addr)
+int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
+		     const unsigned char *dst_dev_addr)
 {
 	switch (dev->type) {
 	case ARPHRD_INFINIBAND:
-		dev_addr->dev_type = IB_NODE_CA;
+		dev_addr->dev_type = RDMA_NODE_IB_CA;
+		break;
+	case ARPHRD_ETHER:
+		dev_addr->dev_type = RDMA_NODE_RNIC;
 		break;
 	default:
 		return -EADDRNOTAVAIL;
@@ -78,6 +81,7 @@ static int copy_addr(struct rdma_dev_add
 		memcpy(dev_addr->dst_dev_addr, dst_dev_addr, MAX_ADDR_LEN);
 	return 0;
 }
+EXPORT_SYMBOL(rdma_copy_addr);
 
 int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr)
 {
@@ -89,7 +93,7 @@ int rdma_translate_ip(struct sockaddr *a
 	if (!dev)
 		return -EADDRNOTAVAIL;
 
-	ret = copy_addr(dev_addr, dev, NULL);
+	ret = rdma_copy_addr(dev_addr, dev, NULL);
 	dev_put(dev);
 	return ret;
 }
@@ -161,7 +165,7 @@ static int addr_resolve_remote(struct so
 
 	/* If the device does ARP internally, return 'done' */
 	if (rt->idev->dev->flags & IFF_NOARP) {
-		copy_addr(addr, rt->idev->dev, NULL);
+		rdma_copy_addr(addr, rt->idev->dev, NULL);
 		goto put;
 	}
 
@@ -181,7 +185,7 @@ static int addr_resolve_remote(struct so
 		src_in->sin_addr.s_addr = rt->rt_src;
 	}
 
-	ret = copy_addr(addr, neigh->dev, neigh->ha);
+	ret = rdma_copy_addr(addr, neigh->dev, neigh->ha);
 release:
 	neigh_release(neigh);
 put:
@@ -245,7 +249,7 @@ static int addr_resolve_local(struct soc
 	if (ZERONET(src_ip)) {
 		src_in->sin_family = dst_in->sin_family;
 		src_in->sin_addr.s_addr = dst_ip;
-		ret = copy_addr(addr, dev, dev->dev_addr);
+		ret = rdma_copy_addr(addr, dev, dev->dev_addr);
 	} else if (LOOPBACK(src_ip)) {
 		ret = rdma_translate_ip((struct sockaddr *)dst_in, addr);
 		if (!ret)
diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index 75313ad..20e9f64 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -62,12 +62,13 @@ struct ib_update_work {
 
 static inline int start_port(struct ib_device *device)
 {
-	return device->node_type == IB_NODE_SWITCH ? 0 : 1;
+	return (device->node_type == RDMA_NODE_IB_SWITCH) ? 0 : 1;
 }
 
 static inline int end_port(struct ib_device *device)
 {
-	return device->node_type == IB_NODE_SWITCH ? 0 : device->phys_port_cnt;
+	return (device->node_type == RDMA_NODE_IB_SWITCH) ?
+		0 : device->phys_port_cnt;
 }
 
 int ib_get_cached_gid(struct ib_device *device,
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 1c145fe..e130d2e 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -3280,6 +3280,9 @@ static void cm_add_one(struct ib_device 
 	int ret;
 	u8 i;
 
+	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
+		return;
+
 	cm_dev = kmalloc(sizeof(*cm_dev) + sizeof(*port) *
 			 device->phys_port_cnt, GFP_KERNEL);
 	if (!cm_dev)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f7be5e7..c54c55a 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -35,6 +35,7 @@ #include <linux/in6.h>
 #include <linux/mutex.h>
 #include <linux/random.h>
 #include <linux/idr.h>
+#include <linux/inetdevice.h>
 
 #include <net/tcp.h>
 
@@ -43,6 +44,7 @@ #include <rdma/rdma_cm_ib.h>
 #include <rdma/ib_cache.h>
 #include <rdma/ib_cm.h>
 #include <rdma/ib_sa.h>
+#include <rdma/iw_cm.h>
 
 MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("Generic RDMA CM Agent");
@@ -124,6 +126,7 @@ struct rdma_id_private {
 	int			query_id;
 	union {
 		struct ib_cm_id	*ib;
+		struct iw_cm_id	*iw;
 	} cm_id;
 
 	u32			seq_num;
@@ -259,14 +262,23 @@ static void cma_detach_from_dev(struct r
 	id_priv->cma_dev = NULL;
 }
 
-static int cma_acquire_ib_dev(struct rdma_id_private *id_priv)
+static int cma_acquire_dev(struct rdma_id_private *id_priv)
 {
+	enum rdma_node_type dev_type = id_priv->id.route.addr.dev_addr.dev_type;
 	struct cma_device *cma_dev;
 	union ib_gid gid;
 	int ret = -ENODEV;
 
-	ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid),
-
+	switch (rdma_node_get_transport(dev_type)) {
+	case RDMA_TRANSPORT_IB:
+		ib_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid);
+		break;
+	case RDMA_TRANSPORT_IWARP:
+		iw_addr_get_sgid(&id_priv->id.route.addr.dev_addr, &gid);
+		break;
+	default:
+		return -ENODEV;
+	}
 	mutex_lock(&lock);
 	list_for_each_entry(cma_dev, &dev_list, list) {
 		ret = ib_find_cached_gid(cma_dev->device, &gid,
@@ -280,16 +292,6 @@ static int cma_acquire_ib_dev(struct rdm
 	return ret;
 }
 
-static int cma_acquire_dev(struct rdma_id_private *id_priv)
-{
-	switch (id_priv->id.route.addr.dev_addr.dev_type) {
-	case IB_NODE_CA:
-		return cma_acquire_ib_dev(id_priv);
-	default:
-		return -ENODEV;
-	}
-}
-
 static void cma_deref_id(struct rdma_id_private *id_priv)
 {
 	if (atomic_dec_and_test(&id_priv->refcount))
@@ -347,6 +349,16 @@ static int cma_init_ib_qp(struct rdma_id
 					  IB_QP_PKEY_INDEX | IB_QP_PORT);
 }
 
+static int cma_init_iw_qp(struct rdma_id_private *id_priv, struct ib_qp *qp)
+{
+	struct ib_qp_attr qp_attr;
+
+	qp_attr.qp_state = IB_QPS_INIT;
+	qp_attr.qp_access_flags = IB_ACCESS_LOCAL_WRITE;
+
+	return ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_ACCESS_FLAGS);
+}
+
 int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd,
 		   struct ib_qp_init_attr *qp_init_attr)
 {
@@ -362,10 +374,13 @@ int rdma_create_qp(struct rdma_cm_id *id
 	if (IS_ERR(qp))
 		return PTR_ERR(qp);
 
-	switch (id->device->node_type) {
-	case IB_NODE_CA:
+	switch (rdma_node_get_transport(id->device->node_type)) {
+	case RDMA_TRANSPORT_IB:
 		ret = cma_init_ib_qp(id_priv, qp);
 		break;
+	case RDMA_TRANSPORT_IWARP:
+		ret = cma_init_iw_qp(id_priv, qp);
+		break;
 	default:
 		ret = -ENOSYS;
 		break;
@@ -451,13 +466,17 @@ int rdma_init_qp_attr(struct rdma_cm_id 
 	int ret;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
-	switch (id_priv->id.device->node_type) {
-	case IB_NODE_CA:
+	switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
+	case RDMA_TRANSPORT_IB:
 		ret = ib_cm_init_qp_attr(id_priv->cm_id.ib, qp_attr,
 					 qp_attr_mask);
 		if (qp_attr->qp_state == IB_QPS_RTR)
 			qp_attr->rq_psn = id_priv->seq_num;
 		break;
+	case RDMA_TRANSPORT_IWARP:
+		ret = iw_cm_init_qp_attr(id_priv->cm_id.iw, qp_attr,
+					qp_attr_mask);
+		break;
 	default:
 		ret = -ENOSYS;
 		break;
@@ -590,8 +609,8 @@ static int cma_notify_user(struct rdma_i
 
 static void cma_cancel_route(struct rdma_id_private *id_priv)
 {
-	switch (id_priv->id.device->node_type) {
-	case IB_NODE_CA:
+	switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
+	case RDMA_TRANSPORT_IB:
 		if (id_priv->query)
 			ib_sa_cancel_query(id_priv->query_id, id_priv->query);
 		break;
@@ -611,11 +630,15 @@ static void cma_destroy_listen(struct rd
 	cma_exch(id_priv, CMA_DESTROYING);
 
 	if (id_priv->cma_dev) {
-		switch (id_priv->id.device->node_type) {
-		case IB_NODE_CA:
+		switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
+		case RDMA_TRANSPORT_IB:
 			if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib))
 				ib_destroy_cm_id(id_priv->cm_id.ib);
 			break;
+		case RDMA_TRANSPORT_IWARP:
+			if (id_priv->cm_id.iw && !IS_ERR(id_priv->cm_id.iw))
+				iw_destroy_cm_id(id_priv->cm_id.iw);
+			break;
 		default:
 			break;
 		}
@@ -690,11 +713,15 @@ void rdma_destroy_id(struct rdma_cm_id *
 	cma_cancel_operation(id_priv, state);
 
 	if (id_priv->cma_dev) {
-		switch (id->device->node_type) {
-		case IB_NODE_CA:
+		switch (rdma_node_get_transport(id->device->node_type)) {
+		case RDMA_TRANSPORT_IB:
 			if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib))
 				ib_destroy_cm_id(id_priv->cm_id.ib);
 			break;
+		case RDMA_TRANSPORT_IWARP:
+			if (id_priv->cm_id.iw && !IS_ERR(id_priv->cm_id.iw))
+				iw_destroy_cm_id(id_priv->cm_id.iw);
+			break;
 		default:
 			break;
 		}
@@ -869,7 +896,7 @@ static struct rdma_id_private *cma_new_i
 	ib_addr_set_sgid(&rt->addr.dev_addr, &rt->path_rec[0].sgid);
 	ib_addr_set_dgid(&rt->addr.dev_addr, &rt->path_rec[0].dgid);
 	ib_addr_set_pkey(&rt->addr.dev_addr, be16_to_cpu(rt->path_rec[0].pkey));
-	rt->addr.dev_addr.dev_type = IB_NODE_CA;
+	rt->addr.dev_addr.dev_type = RDMA_NODE_IB_CA;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
 	id_priv->state = CMA_CONNECT;
@@ -898,7 +925,7 @@ static int cma_req_handler(struct ib_cm_
 	}
 
 	atomic_inc(&conn_id->dev_remove);
-	ret = cma_acquire_ib_dev(conn_id);
+	ret = cma_acquire_dev(conn_id);
 	if (ret) {
 		ret = -ENODEV;
 		cma_release_remove(conn_id);
@@ -982,6 +1009,128 @@ static void cma_set_compare_data(enum rd
 	}
 }
 
+static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event)
+{
+	struct rdma_id_private *id_priv = iw_id->context;
+	enum rdma_cm_event_type event = 0;
+	struct sockaddr_in *sin;
+	int ret = 0;
+
+	atomic_inc(&id_priv->dev_remove);
+
+	switch (iw_event->event) {
+	case IW_CM_EVENT_CLOSE:
+		event = RDMA_CM_EVENT_DISCONNECTED;
+		break;
+	case IW_CM_EVENT_CONNECT_REPLY:
+		sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr;
+		*sin = iw_event->local_addr;
+		sin = (struct sockaddr_in *) &id_priv->id.route.addr.dst_addr;
+		*sin = iw_event->remote_addr;
+		if (iw_event->status)
+			event = RDMA_CM_EVENT_REJECTED;
+		else
+			event = RDMA_CM_EVENT_ESTABLISHED;
+		break;
+	case IW_CM_EVENT_ESTABLISHED:
+		event = RDMA_CM_EVENT_ESTABLISHED;
+		break;
+	default:
+		BUG_ON(1);
+	}
+
+	ret = cma_notify_user(id_priv, event, iw_event->status,
+			      iw_event->private_data,
+			      iw_event->private_data_len);
+	if (ret) {
+		/* Destroy the CM ID by returning a non-zero value. */
+		id_priv->cm_id.iw = NULL;
+		cma_exch(id_priv, CMA_DESTROYING);
+		cma_release_remove(id_priv);
+		rdma_destroy_id(&id_priv->id);
+		return ret;
+	}
+
+	cma_release_remove(id_priv);
+	return ret;
+}
+
+static int iw_conn_req_handler(struct iw_cm_id *cm_id,
+			       struct iw_cm_event *iw_event)
+{
+	struct rdma_cm_id *new_cm_id;
+	struct rdma_id_private *listen_id, *conn_id;
+	struct sockaddr_in *sin;
+	struct net_device *dev = NULL;
+	int ret;
+
+	listen_id = cm_id->context;
+	atomic_inc(&listen_id->dev_remove);
+	if (!cma_comp(listen_id, CMA_LISTEN)) {
+		ret = -ECONNABORTED;
+		goto out;
+	}
+
+	/* Create a new RDMA id for the new IW CM ID */
+	new_cm_id = rdma_create_id(listen_id->id.event_handler,
+				   listen_id->id.context,
+				   RDMA_PS_TCP);
+	if (!new_cm_id) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	conn_id = container_of(new_cm_id, struct rdma_id_private, id);
+	atomic_inc(&conn_id->dev_remove);
+	conn_id->state = CMA_CONNECT;
+
+	dev = ip_dev_find(iw_event->local_addr.sin_addr.s_addr);
+	if (!dev) {
+		ret = -EADDRNOTAVAIL;
+		cma_release_remove(conn_id);
+		rdma_destroy_id(new_cm_id);
+		goto out;
+	}
+	ret = rdma_copy_addr(&conn_id->id.route.addr.dev_addr, dev, NULL);
+	if (ret) {
+		cma_release_remove(conn_id);
+		rdma_destroy_id(new_cm_id);
+		goto out;
+	}
+
+	ret = cma_acquire_dev(conn_id);
+	if (ret) {
+		cma_release_remove(conn_id);
+		rdma_destroy_id(new_cm_id);
+		goto out;
+	}
+
+	conn_id->cm_id.iw = cm_id;
+	cm_id->context = conn_id;
+	cm_id->cm_handler = cma_iw_handler;
+
+	sin = (struct sockaddr_in *) &new_cm_id->route.addr.src_addr;
+	*sin = iw_event->local_addr;
+	sin = (struct sockaddr_in *) &new_cm_id->route.addr.dst_addr;
+	*sin = iw_event->remote_addr;
+
+	ret = cma_notify_user(conn_id, RDMA_CM_EVENT_CONNECT_REQUEST, 0,
+			      iw_event->private_data,
+			      iw_event->private_data_len);
+	if (ret) {
+		/* User wants to destroy the CM ID */
+		conn_id->cm_id.iw = NULL;
+		cma_exch(conn_id, CMA_DESTROYING);
+		cma_release_remove(conn_id);
+		rdma_destroy_id(&conn_id->id);
+	}
+
+out:
+	if (dev)
+		dev_put(dev);
+	cma_release_remove(listen_id);
+	return ret;
+}
+
 static int cma_ib_listen(struct rdma_id_private *id_priv)
 {
 	struct ib_cm_compare_data compare_data;
@@ -1011,6 +1160,30 @@ static int cma_ib_listen(struct rdma_id_
 	return ret;
 }
 
+static int cma_iw_listen(struct rdma_id_private *id_priv, int backlog)
+{
+	int ret;
+	struct sockaddr_in *sin;
+
+	id_priv->cm_id.iw = iw_create_cm_id(id_priv->id.device,
+					    iw_conn_req_handler,
+					    id_priv);
+	if (IS_ERR(id_priv->cm_id.iw))
+		return PTR_ERR(id_priv->cm_id.iw);
+
+	sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr;
+	id_priv->cm_id.iw->local_addr = *sin;
+
+	ret = iw_cm_listen(id_priv->cm_id.iw, backlog);
+
+	if (ret) {
+		iw_destroy_cm_id(id_priv->cm_id.iw);
+		id_priv->cm_id.iw = NULL;
+	}
+
+	return ret;
+}
+
 static int cma_listen_handler(struct rdma_cm_id *id,
 			      struct rdma_cm_event *event)
 {
@@ -1087,12 +1260,17 @@ int rdma_listen(struct rdma_cm_id *id, i
 
 	id_priv->backlog = backlog;
 	if (id->device) {
-		switch (id->device->node_type) {
-		case IB_NODE_CA:
+		switch (rdma_node_get_transport(id->device->node_type)) {
+		case RDMA_TRANSPORT_IB:
 			ret = cma_ib_listen(id_priv);
 			if (ret)
 				goto err;
 			break;
+		case RDMA_TRANSPORT_IWARP:
+			ret = cma_iw_listen(id_priv, backlog);
+			if (ret)
+				goto err;
+			break;
 		default:
 			ret = -ENOSYS;
 			goto err;
@@ -1231,6 +1409,23 @@ err:
 }
 EXPORT_SYMBOL(rdma_set_ib_paths);
 
+static int cma_resolve_iw_route(struct rdma_id_private *id_priv, int timeout_ms)
+{
+	struct cma_work *work;
+
+	work = kzalloc(sizeof *work, GFP_KERNEL);
+	if (!work)
+		return -ENOMEM;
+
+	work->id = id_priv;
+	INIT_WORK(&work->work, cma_work_handler, work);
+	work->old_state = CMA_ROUTE_QUERY;
+	work->new_state = CMA_ROUTE_RESOLVED;
+	work->event.event = RDMA_CM_EVENT_ROUTE_RESOLVED;
+	queue_work(cma_wq, &work->work);
+	return 0;
+}
+
 int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms)
 {
 	struct rdma_id_private *id_priv;
@@ -1241,10 +1436,13 @@ int rdma_resolve_route(struct rdma_cm_id
 		return -EINVAL;
 
 	atomic_inc(&id_priv->refcount);
-	switch (id->device->node_type) {
-	case IB_NODE_CA:
+	switch (rdma_node_get_transport(id->device->node_type)) {
+	case RDMA_TRANSPORT_IB:
 		ret = cma_resolve_ib_route(id_priv, timeout_ms);
 		break;
+	case RDMA_TRANSPORT_IWARP:
+		ret = cma_resolve_iw_route(id_priv, timeout_ms);
+		break;
 	default:
 		ret = -ENOSYS;
 		break;
@@ -1649,6 +1847,47 @@ out:
 	return ret;
 }
 
+static int cma_connect_iw(struct rdma_id_private *id_priv,
+			  struct rdma_conn_param *conn_param)
+{
+	struct iw_cm_id *cm_id;
+	struct sockaddr_in* sin;
+	int ret;
+	struct iw_cm_conn_param iw_param;
+
+	cm_id = iw_create_cm_id(id_priv->id.device, cma_iw_handler, id_priv);
+	if (IS_ERR(cm_id)) {
+		ret = PTR_ERR(cm_id);
+		goto out;
+	}
+
+	id_priv->cm_id.iw = cm_id;
+
+	sin = (struct sockaddr_in*) &id_priv->id.route.addr.src_addr;
+	cm_id->local_addr = *sin;
+
+	sin = (struct sockaddr_in*) &id_priv->id.route.addr.dst_addr;
+	cm_id->remote_addr = *sin;
+
+	ret = cma_modify_qp_rtr(&id_priv->id);
+	if (ret) {
+		iw_destroy_cm_id(cm_id);
+		return ret;
+	}
+
+	iw_param.ord = conn_param->initiator_depth;
+	iw_param.ird = conn_param->responder_resources;
+	iw_param.private_data = conn_param->private_data;
+	iw_param.private_data_len = conn_param->private_data_len;
+	if (id_priv->id.qp)
+		iw_param.qpn = id_priv->qp_num;
+	else
+		iw_param.qpn = conn_param->qp_num;
+	ret = iw_cm_connect(cm_id, &iw_param);
+out:
+	return ret;
+}
+
 int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
 {
 	struct rdma_id_private *id_priv;
@@ -1664,10 +1903,13 @@ int rdma_connect(struct rdma_cm_id *id, 
 		id_priv->srq = conn_param->srq;
 	}
 
-	switch (id->device->node_type) {
-	case IB_NODE_CA:
+	switch (rdma_node_get_transport(id->device->node_type)) {
+	case RDMA_TRANSPORT_IB:
 		ret = cma_connect_ib(id_priv, conn_param);
 		break;
+	case RDMA_TRANSPORT_IWARP:
+		ret = cma_connect_iw(id_priv, conn_param);
+		break;
 	default:
 		ret = -ENOSYS;
 		break;
@@ -1708,6 +1950,28 @@ static int cma_accept_ib(struct rdma_id_
 	return ib_send_cm_rep(id_priv->cm_id.ib, &rep);
 }
 
+static int cma_accept_iw(struct rdma_id_private *id_priv,
+		  struct rdma_conn_param *conn_param)
+{
+	struct iw_cm_conn_param iw_param;
+	int ret;
+
+	ret = cma_modify_qp_rtr(&id_priv->id);
+	if (ret)
+		return ret;
+
+	iw_param.ord = conn_param->initiator_depth;
+	iw_param.ird = conn_param->responder_resources;
+	iw_param.private_data = conn_param->private_data;
+	iw_param.private_data_len = conn_param->private_data_len;
+	if (id_priv->id.qp) {
+		iw_param.qpn = id_priv->qp_num;
+	} else
+		iw_param.qpn = conn_param->qp_num;
+
+	return iw_cm_accept(id_priv->cm_id.iw, &iw_param);
+}
+
 int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
 {
 	struct rdma_id_private *id_priv;
@@ -1723,13 +1987,16 @@ int rdma_accept(struct rdma_cm_id *id, s
 		id_priv->srq = conn_param->srq;
 	}
 
-	switch (id->device->node_type) {
-	case IB_NODE_CA:
+	switch (rdma_node_get_transport(id->device->node_type)) {
+	case RDMA_TRANSPORT_IB:
 		if (conn_param)
 			ret = cma_accept_ib(id_priv, conn_param);
 		else
 			ret = cma_rep_recv(id_priv);
 		break;
+	case RDMA_TRANSPORT_IWARP:
+		ret = cma_accept_iw(id_priv, conn_param);
+		break;
 	default:
 		ret = -ENOSYS;
 		break;
@@ -1756,12 +2023,16 @@ int rdma_reject(struct rdma_cm_id *id, c
 	if (!cma_comp(id_priv, CMA_CONNECT))
 		return -EINVAL;
 
-	switch (id->device->node_type) {
-	case IB_NODE_CA:
+	switch (rdma_node_get_transport(id->device->node_type)) {
+	case RDMA_TRANSPORT_IB:
 		ret = ib_send_cm_rej(id_priv->cm_id.ib,
 				     IB_CM_REJ_CONSUMER_DEFINED, NULL, 0,
 				     private_data, private_data_len);
 		break;
+	case RDMA_TRANSPORT_IWARP:
+		ret = iw_cm_reject(id_priv->cm_id.iw,
+				   private_data, private_data_len);
+		break;
 	default:
 		ret = -ENOSYS;
 		break;
@@ -1780,16 +2051,18 @@ int rdma_disconnect(struct rdma_cm_id *i
 	    !cma_comp(id_priv, CMA_DISCONNECT))
 		return -EINVAL;
 
-	ret = cma_modify_qp_err(id);
-	if (ret)
-		goto out;
-
-	switch (id->device->node_type) {
-	case IB_NODE_CA:
+	switch (rdma_node_get_transport(id->device->node_type)) {
+	case RDMA_TRANSPORT_IB:
+		ret = cma_modify_qp_err(id);
+		if (ret)
+			goto out;
 		/* Initiate or respond to a disconnect. */
 		if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0))
 			ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0);
 		break;
+	case RDMA_TRANSPORT_IWARP:
+		ret = iw_cm_disconnect(id_priv->cm_id.iw, 0);
+		break;
 	default:
 		break;
 	}
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index b2f3cb9..d978fbe 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -505,7 +505,7 @@ int ib_query_port(struct ib_device *devi
 		  u8 port_num,
 		  struct ib_port_attr *port_attr)
 {
-	if (device->node_type == IB_NODE_SWITCH) {
+	if (device->node_type == RDMA_NODE_IB_SWITCH) {
 		if (port_num)
 			return -EINVAL;
 	} else if (port_num < 1 || port_num > device->phys_port_cnt)
@@ -580,7 +580,7 @@ int ib_modify_port(struct ib_device *dev
 		   u8 port_num, int port_modify_mask,
 		   struct ib_port_modify *port_modify)
 {
-	if (device->node_type == IB_NODE_SWITCH) {
+	if (device->node_type == RDMA_NODE_IB_SWITCH) {
 		if (port_num)
 			return -EINVAL;
 	} else if (port_num < 1 || port_num > device->phys_port_cnt)
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 32d3028..082f03c 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2876,7 +2876,10 @@ static void ib_mad_init_device(struct ib
 {
 	int start, end, i;
 
-	if (device->node_type == IB_NODE_SWITCH) {
+	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
+		return;
+
+	if (device->node_type == RDMA_NODE_IB_SWITCH) {
 		start = 0;
 		end   = 0;
 	} else {
@@ -2923,7 +2926,7 @@ static void ib_mad_remove_device(struct 
 {
 	int i, num_ports, cur_port;
 
-	if (device->node_type == IB_NODE_SWITCH) {
+	if (device->node_type == RDMA_NODE_IB_SWITCH) {
 		num_ports = 1;
 		cur_port = 0;
 	} else {
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index df762ba..ca8760a 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -919,7 +919,10 @@ static void ib_sa_add_one(struct ib_devi
 	struct ib_sa_device *sa_dev;
 	int s, e, i;
 
-	if (device->node_type == IB_NODE_SWITCH)
+	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
+		return;
+
+	if (device->node_type == RDMA_NODE_IB_SWITCH)
 		s = e = 0;
 	else {
 		s = 1;
diff --git a/drivers/infiniband/core/smi.c b/drivers/infiniband/core/smi.c
index 35852e7..54b81e1 100644
--- a/drivers/infiniband/core/smi.c
+++ b/drivers/infiniband/core/smi.c
@@ -64,7 +64,7 @@ int smi_handle_dr_smp_send(struct ib_smp
 
 		/* C14-9:2 */
 		if (hop_ptr && hop_ptr < hop_cnt) {
-			if (node_type != IB_NODE_SWITCH)
+			if (node_type != RDMA_NODE_IB_SWITCH)
 				return 0;
 
 			/* smp->return_path set when received */
@@ -77,7 +77,7 @@ int smi_handle_dr_smp_send(struct ib_smp
 		if (hop_ptr == hop_cnt) {
 			/* smp->return_path set when received */
 			smp->hop_ptr++;
-			return (node_type == IB_NODE_SWITCH ||
+			return (node_type == RDMA_NODE_IB_SWITCH ||
 				smp->dr_dlid == IB_LID_PERMISSIVE);
 		}
 
@@ -95,7 +95,7 @@ int smi_handle_dr_smp_send(struct ib_smp
 
 		/* C14-13:2 */
 		if (2 <= hop_ptr && hop_ptr <= hop_cnt) {
-			if (node_type != IB_NODE_SWITCH)
+			if (node_type != RDMA_NODE_IB_SWITCH)
 				return 0;
 
 			smp->hop_ptr--;
@@ -107,7 +107,7 @@ int smi_handle_dr_smp_send(struct ib_smp
 		if (hop_ptr == 1) {
 			smp->hop_ptr--;
 			/* C14-13:3 -- SMPs destined for SM shouldn't be here */
-			return (node_type == IB_NODE_SWITCH ||
+			return (node_type == RDMA_NODE_IB_SWITCH ||
 				smp->dr_slid == IB_LID_PERMISSIVE);
 		}
 
@@ -142,7 +142,7 @@ int smi_handle_dr_smp_recv(struct ib_smp
 
 		/* C14-9:2 -- intermediate hop */
 		if (hop_ptr && hop_ptr < hop_cnt) {
-			if (node_type != IB_NODE_SWITCH)
+			if (node_type != RDMA_NODE_IB_SWITCH)
 				return 0;
 
 			smp->return_path[hop_ptr] = port_num;
@@ -156,7 +156,7 @@ int smi_handle_dr_smp_recv(struct ib_smp
 				smp->return_path[hop_ptr] = port_num;
 			/* smp->hop_ptr updated when sending */
 
-			return (node_type == IB_NODE_SWITCH ||
+			return (node_type == RDMA_NODE_IB_SWITCH ||
 				smp->dr_dlid == IB_LID_PERMISSIVE);
 		}
 
@@ -175,7 +175,7 @@ int smi_handle_dr_smp_recv(struct ib_smp
 
 		/* C14-13:2 */
 		if (2 <= hop_ptr && hop_ptr <= hop_cnt) {
-			if (node_type != IB_NODE_SWITCH)
+			if (node_type != RDMA_NODE_IB_SWITCH)
 				return 0;
 
 			/* smp->hop_ptr updated when sending */
@@ -190,7 +190,7 @@ int smi_handle_dr_smp_recv(struct ib_smp
 				return 1;
 			}
 			/* smp->hop_ptr updated when sending */
-			return (node_type == IB_NODE_SWITCH);
+			return (node_type == RDMA_NODE_IB_SWITCH);
 		}
 
 		/* C14-13:4 -- hop_ptr = 0 -> give to SM */
diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index fb66605..709323c 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -589,10 +589,11 @@ static ssize_t show_node_type(struct cla
 		return -ENODEV;
 
 	switch (dev->node_type) {
-	case IB_NODE_CA:     return sprintf(buf, "%d: CA\n", dev->node_type);
-	case IB_NODE_SWITCH: return sprintf(buf, "%d: switch\n", dev->node_type);
-	case IB_NODE_ROUTER: return sprintf(buf, "%d: router\n", dev->node_type);
-	default:             return sprintf(buf, "%d: <unknown>\n", dev->node_type);
+	case RDMA_NODE_IB_CA:	  return sprintf(buf, "%d: CA\n", dev->node_type);
+	case RDMA_NODE_RNIC:	  return sprintf(buf, "%d: RNIC\n", dev->node_type);
+	case RDMA_NODE_IB_SWITCH: return sprintf(buf, "%d: switch\n", dev->node_type);
+	case RDMA_NODE_IB_ROUTER: return sprintf(buf, "%d: router\n", dev->node_type);
+	default:		  return sprintf(buf, "%d: <unknown>\n", dev->node_type);
 	}
 }
 
@@ -708,7 +709,7 @@ int ib_device_register_sysfs(struct ib_d
 	if (ret)
 		goto err_put;
 
-	if (device->node_type == IB_NODE_SWITCH) {
+	if (device->node_type == RDMA_NODE_IB_SWITCH) {
 		ret = add_port(device, 0);
 		if (ret)
 			goto err_put;
diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index e74c964..ad4f4d5 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -1247,7 +1247,8 @@ static void ib_ucm_add_one(struct ib_dev
 {
 	struct ib_ucm_device *ucm_dev;
 
-	if (!device->alloc_ucontext)
+	if (!device->alloc_ucontext ||
+	    rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
 		return;
 
 	ucm_dev = kzalloc(sizeof *ucm_dev, GFP_KERNEL);
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 8a455ae..807fbd6 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -1032,7 +1032,10 @@ static void ib_umad_add_one(struct ib_de
 	struct ib_umad_device *umad_dev;
 	int s, e, i;
 
-	if (device->node_type == IB_NODE_SWITCH)
+	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
+		return;
+
+	if (device->node_type == RDMA_NODE_IB_SWITCH)
 		s = e = 0;
 	else {
 		s = 1;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 06f98e9..8b5dd36 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -79,6 +79,23 @@ enum ib_rate mult_to_ib_rate(int mult)
 }
 EXPORT_SYMBOL(mult_to_ib_rate);
 
+enum rdma_transport_type
+rdma_node_get_transport(enum rdma_node_type node_type)
+{
+	switch (node_type) {
+	case RDMA_NODE_IB_CA:
+	case RDMA_NODE_IB_SWITCH:
+	case RDMA_NODE_IB_ROUTER:
+		return RDMA_TRANSPORT_IB;
+	case RDMA_NODE_RNIC:
+		return RDMA_TRANSPORT_IWARP;
+	default:
+		BUG();
+		return 0;
+	}
+}
+EXPORT_SYMBOL(rdma_node_get_transport);
+
 /* Protection domains */
 
 struct ib_pd *ib_alloc_pd(struct ib_device *device)
diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index a2a76c3..159b0be 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -268,7 +268,7 @@ int ehca_register_device(struct ehca_shc
 		(1ull << IB_USER_VERBS_CMD_ATTACH_MCAST)	|
 		(1ull << IB_USER_VERBS_CMD_DETACH_MCAST);
 
-	shca->ib_device.node_type           = IB_NODE_CA;
+	shca->ib_device.node_type           = RDMA_NODE_IB_CA;
 	shca->ib_device.phys_port_cnt       = shca->num_ports;
 	shca->ib_device.dma_device          = &shca->ibmebus_dev->ofdev.dev;
 	shca->ib_device.query_device        = ehca_query_device;
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index fbda773..b8381c5 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1538,7 +1538,7 @@ int ipath_register_ib_device(struct ipat
 		(1ull << IB_USER_VERBS_CMD_QUERY_SRQ)		|
 		(1ull << IB_USER_VERBS_CMD_DESTROY_SRQ)		|
 		(1ull << IB_USER_VERBS_CMD_POST_SRQ_RECV);
-	dev->node_type = IB_NODE_CA;
+	dev->node_type = RDMA_NODE_IB_CA;
 	dev->phys_port_cnt = 1;
 	dev->dma_device = &dd->pcidev->dev;
 	dev->class_dev.dev = dev->dma_device;
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 265b1d1..981fe2e 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -1288,7 +1288,7 @@ int mthca_register_device(struct mthca_d
 		(1ull << IB_USER_VERBS_CMD_DESTROY_QP)		|
 		(1ull << IB_USER_VERBS_CMD_ATTACH_MCAST)	|
 		(1ull << IB_USER_VERBS_CMD_DETACH_MCAST);
-	dev->ib_dev.node_type            = IB_NODE_CA;
+	dev->ib_dev.node_type            = RDMA_NODE_IB_CA;
 	dev->ib_dev.phys_port_cnt        = dev->limits.num_ports;
 	dev->ib_dev.dma_device           = &dev->pdev->dev;
 	dev->ib_dev.class_dev.dev        = &dev->pdev->dev;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 36d7698..e9a7659 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1111,13 +1111,16 @@ static void ipoib_add_one(struct ib_devi
 	struct ipoib_dev_priv *priv;
 	int s, e, p;
 
+	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
+		return;
+
 	dev_list = kmalloc(sizeof *dev_list, GFP_KERNEL);
 	if (!dev_list)
 		return;
 
 	INIT_LIST_HEAD(dev_list);
 
-	if (device->node_type == IB_NODE_SWITCH) {
+	if (device->node_type == RDMA_NODE_IB_SWITCH) {
 		s = 0;
 		e = 0;
 	} else {
@@ -1141,6 +1144,9 @@ static void ipoib_remove_one(struct ib_d
 	struct ipoib_dev_priv *priv, *tmp;
 	struct list_head *dev_list;
 
+	if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
+		return;
+
 	dev_list = ib_get_client_data(device, &ipoib_client);
 
 	list_for_each_entry_safe(priv, tmp, dev_list, list) {
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 4f1775d..297c9ff 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1913,7 +1913,7 @@ static void srp_add_one(struct ib_device
 	if (IS_ERR(srp_dev->fmr_pool))
 		srp_dev->fmr_pool = NULL;
 
-	if (device->node_type == IB_NODE_SWITCH) {
+	if (device->node_type == RDMA_NODE_IB_SWITCH) {
 		s = 0;
 		e = 0;
 	} else {
diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h
index 0ff6739..81b6230 100644
--- a/include/rdma/ib_addr.h
+++ b/include/rdma/ib_addr.h
@@ -40,7 +40,7 @@ struct rdma_dev_addr {
 	unsigned char src_dev_addr[MAX_ADDR_LEN];
 	unsigned char dst_dev_addr[MAX_ADDR_LEN];
 	unsigned char broadcast[MAX_ADDR_LEN];
-	enum ib_node_type dev_type;
+	enum rdma_node_type dev_type;
 };
 
 /**
@@ -72,6 +72,9 @@ int rdma_resolve_ip(struct sockaddr *src
 
 void rdma_addr_cancel(struct rdma_dev_addr *addr);
 
+int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
+	      const unsigned char *dst_dev_addr);
+
 static inline int ip_addr_size(struct sockaddr *addr)
 {
 	return addr->sa_family == AF_INET6 ?
@@ -113,4 +116,16 @@ static inline void ib_addr_set_dgid(stru
 	memcpy(dev_addr->dst_dev_addr + 4, gid, sizeof *gid);
 }
 
+static inline void iw_addr_get_sgid(struct rdma_dev_addr *dev_addr,
+				    union ib_gid *gid)
+{
+	memcpy(gid, dev_addr->src_dev_addr, sizeof *gid);
+}
+
+static inline void iw_addr_get_dgid(struct rdma_dev_addr *dev_addr,
+				    union ib_gid *gid)
+{
+	memcpy(gid, dev_addr->dst_dev_addr, sizeof *gid);
+}
+
 #endif /* IB_ADDR_H */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 61eed39..8eacc35 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -56,12 +56,22 @@ union ib_gid {
 	} global;
 };
 
-enum ib_node_type {
-	IB_NODE_CA 	= 1,
-	IB_NODE_SWITCH,
-	IB_NODE_ROUTER
+enum rdma_node_type {
+	/* IB values map to NodeInfo:NodeType. */
+	RDMA_NODE_IB_CA 	= 1,
+	RDMA_NODE_IB_SWITCH,
+	RDMA_NODE_IB_ROUTER,
+	RDMA_NODE_RNIC
 };
 
+enum rdma_transport_type {
+	RDMA_TRANSPORT_IB,
+	RDMA_TRANSPORT_IWARP
+};
+
+enum rdma_transport_type
+rdma_node_get_transport(enum rdma_node_type node_type) __attribute_const__;
+
 enum ib_device_cap_flags {
 	IB_DEVICE_RESIZE_MAX_WR		= 1,
 	IB_DEVICE_BAD_PKEY_CNTR		= (1<<1),
@@ -78,6 +88,9 @@ enum ib_device_cap_flags {
 	IB_DEVICE_RC_RNR_NAK_GEN	= (1<<12),
 	IB_DEVICE_SRQ_RESIZE		= (1<<13),
 	IB_DEVICE_N_NOTIFY_CQ		= (1<<14),
+	IB_DEVICE_ZERO_STAG		= (1<<15),
+	IB_DEVICE_SEND_W_INV		= (1<<16),
+	IB_DEVICE_MEM_WINDOW		= (1<<17)
 };
 
 enum ib_atomic_cap {
@@ -835,6 +848,8 @@ struct ib_cache {
 	u8                     *lmc_cache;
 };
 
+struct iw_cm_verbs;
+
 struct ib_device {
 	struct device                *dma_device;
 
@@ -851,6 +866,8 @@ struct ib_device {
 
 	u32                           flags;
 
+	struct iw_cm_verbs	     *iwcm;
+
 	int		           (*query_device)(struct ib_device *device,
 						   struct ib_device_attr *device_attr);
 	int		           (*query_port)(struct ib_device *device,
-- 
1.4.1


From rolandd at cisco.com  Fri Sep  8 14:55:40 2006
From: rolandd at cisco.com (Roland Dreier)
Date: Fri, 8 Sep 2006 14:55:40 -0700
Subject: [openib-general] [PATCH 0/2] RDMA: merge iWARP support
Message-ID: <2006981455.F7Cau4RN2pBSAVMu@cisco.com>

Here is a series of patches that adds iWARP (RDMA over IP) support to
the InfiniBand support already in the kernel.  Since the iWARP RDMA
model is quite close to the InfiniBand model, the changes are not that
large.  The biggest difference is in how connections are established,
since iWARP connections are TCP connections, while IB uses a different
(native IB) mechanism for establishing a connection.

The first patch in the series adds an iWARP connection manager, which
handles establishing and tearing down connections for iWARP devices.
The second patch is all the small changes required to hook in the
connection manager and make the rest of the IB stuff also work with
iWARP devices.  The third patch (compressed due to its size) adds the
first driver for an iWARP device, the Ammasso 1100 1 Gb/sec RNIC.
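
To illustrate the kind of hook-up the second patch does, here is a
minimal userspace sketch (not the kernel code itself).  It reuses the
enum names the patch adds to ib_verbs.h; cm_for_node() is a
hypothetical helper standing in for the per-transport switch
statements in cma.c, and assert() stands in for the kernel's BUG().

```c
#include <assert.h>

/* Userspace copies of the enums the patch adds to ib_verbs.h. */
enum rdma_node_type {
	RDMA_NODE_IB_CA = 1,
	RDMA_NODE_IB_SWITCH,
	RDMA_NODE_IB_ROUTER,
	RDMA_NODE_RNIC
};

enum rdma_transport_type {
	RDMA_TRANSPORT_IB,
	RDMA_TRANSPORT_IWARP
};

/* Map a node type to the transport it runs over. */
enum rdma_transport_type rdma_node_get_transport(enum rdma_node_type node_type)
{
	switch (node_type) {
	case RDMA_NODE_IB_CA:
	case RDMA_NODE_IB_SWITCH:
	case RDMA_NODE_IB_ROUTER:
		return RDMA_TRANSPORT_IB;
	case RDMA_NODE_RNIC:
		return RDMA_TRANSPORT_IWARP;
	}
	assert(!"unknown node type");	/* the kernel version calls BUG() */
	return RDMA_TRANSPORT_IB;
}

/*
 * Hypothetical helper: name the connection manager a caller would
 * dispatch to, the way the switch statements in cma.c pick between
 * the IB CM and the iWARP CM.
 */
const char *cm_for_node(enum rdma_node_type node_type)
{
	switch (rdma_node_get_transport(node_type)) {
	case RDMA_TRANSPORT_IB:
		return "ib_cm";
	case RDMA_TRANSPORT_IWARP:
		return "iw_cm";
	}
	return "unsupported";
}
```

Callers switch on the transport rather than on the node type, so IB
switches and routers keep taking the IB path and only RNICs take the
new iWARP path.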

My current plan is to merge this stuff for 2.6.19.  Please let me know
if you see anything (major or minor) that needs to be fixed up.

Thanks,
  Roland


From rdreier at cisco.com  Fri Sep  8 14:58:08 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 08 Sep 2006 14:58:08 -0700
Subject: [openib-general] [PATCH 3/2] RDMA: Ammasso 1100 RNIC driver
In-Reply-To: <2006981455.5zPhTm8jRQnxTde2@cisco.com> (Roland Dreier's
	message of "Fri, 8 Sep 2006 14:55:41 -0700")
References: <2006981455.5zPhTm8jRQnxTde2@cisco.com>
Message-ID: <ada1wqm81hr.fsf@cisco.com>

Here's the compressed patch adding the amso1100 driver.  You can also
find this in my git tree at

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git

in the for-2.6.19 branch.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-RDMA-amso1100-Add-driver-for-Ammasso-1100-RNIC.txt.bz2
Type: application/x-bzip
Size: 42201 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060908/5e5f2da3/attachment.bin>

From Terry.Yoder at qlogic.com  Fri Sep  8 16:48:14 2006
From: Terry.Yoder at qlogic.com (Terry Yoder)
Date: Fri, 8 Sep 2006 16:48:14 -0700
Subject: [openib-general] svn iwarp and OFED
Message-ID: <CD3B36223A7B9F40A6FE6BAAC4E2DA6137AA66@AVEXCH1.qlogic.org>

Is the svn iwarp branch in sync with OFED 1.1 rc3?

Terry	



From shahanse at cisco.com  Fri Sep  8 17:28:45 2006
From: shahanse at cisco.com (Shawn Hansen (shahanse))
Date: Fri, 8 Sep 2006 17:28:45 -0700
Subject: [openib-general] Goodbye and Transition
Message-ID: <BAF543018E1B014FA92FE52EAFCEE32402CE3A69@xmb-sjc-236.amer.cisco.com>

All,

FYI: I've decided to relocate my family to Seattle, and will be leaving
Cisco.  I plan to join Microsoft's Server and Tools division at the end
of this month.

I would like to recommend Jamie Riotto, Senior Director of Engineering,
as my EWG replacement.  Jamie is responsible for all engineering for
Cisco's Server Networking and Virtualization Business Unit, including
Cisco's host driver and RDMA development efforts.

Please stay in touch, and I wish the team the best.

Regards,

--Shawn 
----------------------------
Shawn Hansen
Director, Product Management
Cisco Systems


From swise at opengridcomputing.com  Sat Sep  9 05:44:54 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Sat, 9 Sep 2006 07:44:54 -0500
Subject: [openib-general] svn iwarp and OFED
References: <CD3B36223A7B9F40A6FE6BAAC4E2DA6137AA66@AVEXCH1.qlogic.org>
Message-ID: <003a01c6d40d$c0a96c70$020010ac@haggard>

No.

It is at trunk revision 7626.  Merged 6/2/2006 under revision 7631.

Steve.

----- Original Message ----- 
From: "Terry Yoder" <Terry.Yoder at qlogic.com>
To: <openib-general at openib.org>
Sent: Friday, September 08, 2006 6:48 PM
Subject: [openib-general] svn iwarp and OFED


Is the svn iwarp branch in sync with OFED 1.1 rc3?

Terry




From kliteyn at dev.mellanox.co.il  Sat Sep  9 23:35:54 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Sun, 10 Sep 2006 09:35:54 +0300
Subject: [openib-general] [PATCH] osm: OSM bug fix with --run-once option
Message-ID: <1157870154.29270.42.camel@kliteynik.yok.mtl.com>

Hi Hal

This patch fixes a bug that occurred when OSM was run with the
--run-once option (-o) while the SM port was down.
In that case, OSM would be stuck in cond_wait forever (or until
the port became active) and could not be terminated other than
by SIGKILL.
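
The idea of the fix can be sketched in plain pthreads as follows.
This is only an illustration, not the OpenSM code: struct
subnet_state, wait_for_subnet_up() and run_once_wait() are
hypothetical names, and the exit flag is assumed to be set by a
signal handler elsewhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the "subnet up" event. */
struct subnet_state {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	bool            subnet_up;
};

/* Wait at most timeout_ms; return 0 if the subnet is up, else ETIMEDOUT. */
int wait_for_subnet_up(struct subnet_state *s, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec  += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&s->lock);
	while (!s->subnet_up && rc == 0)
		rc = pthread_cond_timedwait(&s->cond, &s->lock, &deadline);
	if (s->subnet_up)
		rc = 0;
	pthread_mutex_unlock(&s->lock);
	return rc;
}

/*
 * Run-once loop: wake up every sweep_ms to re-check the exit flag
 * instead of blocking forever.  Returns true if the subnet came up,
 * false if an exit was requested first.
 */
bool run_once_wait(struct subnet_state *s,
		   volatile sig_atomic_t *exit_flag, long sweep_ms)
{
	while (!*exit_flag) {
		if (wait_for_subnet_up(s, sweep_ms) == 0)
			return true;
	}
	return false;
}
```

Because the wait is bounded, a SIGINT or SIGTERM handler that sets the
flag is noticed within one sweep interval even if the port never
comes up.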

Yevgeny

Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Index: opensm/main.c
===================================================================
--- opensm/main.c       (revision 9354)
+++ opensm/main.c       (working copy)
@@ -908,9 +908,13 @@ main(

   if( run_once_flag == TRUE )
   {
-    status = osm_opensm_wait_for_subnet_up(
-      &osm, EVENT_NO_TIMEOUT, TRUE );
-    osm_exit_flag = 1;
+     while (!osm_exit_flag)
+     {
+        status = osm_opensm_wait_for_subnet_up(
+                  &osm, osm.subn.opt.sweep_interval * 1000000, TRUE );
+        if (!status)
+           osm_exit_flag = 1;
+     }
   }
   else
   {


From dotanb at dev.mellanox.co.il  Sat Sep  9 23:43:56 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Sun, 10 Sep 2006 09:43:56 +0300
Subject: [openib-general] HCAs with and without memory
In-Reply-To: <a94efc20609080319w2fa92499lee9cfb3758bdaa13@mail.gmail.com>
References: <a94efc20609080319w2fa92499lee9cfb3758bdaa13@mail.gmail.com>
Message-ID: <4503B42C.60405@dev.mellanox.co.il>

Hi john.

john t wrote:
> Hi OpenIB group,
>  
> What is the difference between HCAs with memory and without memory. 
> How is the on-board memory used by HCAs? Is it that data is first 
> copied into this memory and then into physical memory?
>  
> Regards,
> John T.

If you are asking about Mellanox HCAs, I can answer:

The difference is in the technology that the HCAs use:
HCAs without attached memory use the MemFree technology.

The main difference between the two kinds of HCA is where the context
of the various resources is located: in host memory or in the attached
memory.

The data itself is not stored in the attached memory at any point
during data movement.

Dotan


From dotanb at dev.mellanox.co.il  Sat Sep  9 23:55:31 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Sun, 10 Sep 2006 09:55:31 +0300
Subject: [openib-general] ibv_poll_cq after ibv_post_send does not work
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD22201FBD38F@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD22201FBD38F@wdtssmail01.eu.thmulti.com>
Message-ID: <4503B6E3.3070506@dev.mellanox.co.il>

Bub Thomas wrote:
> Dotan Barak wrote:
> If you are using RC QP:
> the reason for not getting any completion in the CQ is that
>
> Did you post any RR (Receive Request) at the listener side?
>
>
> Dotan,
> with the cmpost.c example I now get a cm connection even with another
> machine.
> However I don't get the cq event, on the sender side, when the
> IBV_WR_SEND is done. Is this correct? Is this what you are saying below?
> If it is correct this is different from gen1 drivers where I got a
> VAPI_SUCCESS cq event. Is there a way to get this back?
>
> On the receiver side I get an cq event for the receive request.
>
> Thanks
> Thomas
>
>
>   

What do you mean when you say that you don't get the CQ event?

I assume that you are talking about completions:

    On the receiver side there is always a completion.

    On the sender side there may be a completion, depending on the
QP / WR configuration. If you want a completion, you need either to
set sq_sig_all at QP creation (so that completions are generated for
all of the sends posted to this QP) or to set IBV_SEND_SIGNALED in
the send_flags of the WR that you are posting.
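
The rule above can be written down as a small self-contained model.
This is a sketch only and does not call libibverbs: struct
model_qp, struct model_send_wr and MODEL_SEND_SIGNALED are
hypothetical stand-ins for sq_sig_all in struct ibv_qp_init_attr and
IBV_SEND_SIGNALED in the send_flags of struct ibv_send_wr, and it only
covers sends that complete successfully.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit; the real flag is IBV_SEND_SIGNALED. */
enum { MODEL_SEND_SIGNALED = 1 << 1 };

/* Models the QP-wide setting chosen at creation time (sq_sig_all). */
struct model_qp {
	bool sq_sig_all;
};

/* Models the per-work-request flags (send_flags). */
struct model_send_wr {
	unsigned int send_flags;
};

/*
 * A successful send produces a completion on the sender's CQ if the
 * QP signals every send, or if this particular WR asked for it.
 */
bool send_generates_completion(const struct model_qp *qp,
			       const struct model_send_wr *wr)
{
	assert(qp && wr);
	return qp->sq_sig_all ||
	       (wr->send_flags & MODEL_SEND_SIGNALED) != 0;
}
```

So a sender that creates its QP with sq_sig_all = 0 and posts a WR
with send_flags = 0 will poll an empty CQ, which matches what was
reported.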

Dotan


From greg.lindahl at qlogic.com  Sun Sep 10 00:24:17 2006
From: greg.lindahl at qlogic.com (Greg Lindahl)
Date: Sun, 10 Sep 2006 00:24:17 -0700
Subject: [openib-general] HCAs with and without memory
In-Reply-To: <a94efc20609080319w2fa92499lee9cfb3758bdaa13@mail.gmail.com>
References: <a94efc20609080319w2fa92499lee9cfb3758bdaa13@mail.gmail.com>
Message-ID: <20060910072417.GC1252@greglaptop.hsd1.ca.comcast.net>

On Fri, Sep 08, 2006 at 03:49:57PM +0530, john t wrote:

> What is the difference between HCAs with memory and without memory.

And to answer for QLogic InfiniPath HCAs, we don't sell HCAs with
memory. We don't need it. There's actually a small amount of memory
within the single chip that makes up our HCA, and that's all that's
necessary.

-- greg


From mst at mellanox.co.il  Sun Sep 10 00:58:18 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 10 Sep 2006 10:58:18 +0300
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060901102639.55709.qmail@web36915.mail.mud.yahoo.com>
References: <20060830045927.GB25478@mellanox.co.il>
	<20060901102639.55709.qmail@web36915.mail.mud.yahoo.com>
Message-ID: <20060910075818.GV6928@mellanox.co.il>

Quoting r. zhu shi song <zhushisongzhu at yahoo.com>:
> Subject: Re: why sdp connections cost so much memory
> 
> OFED-1.1-rc3 has passed my tests. I have to adjust
> Post buffer size to 0x4 and use your patch for me. 
> Can you make it fixed not to do these myself manually?
> 
>  zhu

I plan to add the following patch to OFED. Could you please verify
that it fixes the issue for you, without tweaking the ring size?

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

diff --git a/drivers/infiniband/ulp/sdp/sdp_bcopy.c b/drivers/infiniband/ulp/sdp/sdp_bcopy.c
index b30d2a0..4540fa4 100644
--- a/drivers/infiniband/ulp/sdp/sdp_bcopy.c
+++ b/drivers/infiniband/ulp/sdp/sdp_bcopy.c
@@ -37,6 +37,10 @@ #include <rdma/ib_verbs.h>
 #include <rdma/rdma_cm.h>
 #include "sdp.h"
 
+static int rcvbuf_scale = 0x1;
+module_param_named(rcvbuf_scale, rcvbuf_scale, int, 0644);
+MODULE_PARM_DESC(rcvbuf_scale, "Receive buffer size scale factor.");
+
 /* Like tcp_fin */
 static void sdp_fin(struct sock *sk)
 {
@@ -237,7 +241,7 @@ void sdp_post_recvs(struct sdp_sock *ssk
 	while ((likely(ssk->rx_head - ssk->rx_tail < SDP_RX_SIZE) &&
 		(ssk->rx_head - ssk->rx_tail - SDP_MIN_BUFS) *
 		SDP_MAX_SEND_SKB_FRAGS * PAGE_SIZE + rmem <
-		ssk->isk.sk.sk_rcvbuf * 0x10) ||
+		ssk->isk.sk.sk_rcvbuf * rcvbuf_scale) ||
 	       unlikely(ssk->rx_head - ssk->rx_tail < SDP_MIN_BUFS))
 		sdp_post_recv(ssk);
 }
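
As a rough illustration of what the scale factor changes, the loop condition in sdp_post_recvs can be modelled outside the kernel. The sketch below is Python, not driver code. The four constants are placeholders (the real values are defined in sdp.h and are not shown in this patch), and recv_buffers_posted is a hypothetical helper name.

```python
# Placeholder constants: the real values come from sdp.h and the kernel
# configuration, so treat these numbers as assumptions for illustration.
SDP_RX_SIZE = 64             # assumed receive ring size
SDP_MIN_BUFS = 2             # assumed minimum number of posted buffers
SDP_MAX_SEND_SKB_FRAGS = 16  # assumed pages per receive buffer
PAGE_SIZE = 4096             # assumed page size in bytes


def recv_buffers_posted(sk_rcvbuf, rcvbuf_scale, rmem=0, posted=0):
    """Count the receive buffers outstanding once the posting loop stops.

    'posted' stands in for ssk->rx_head - ssk->rx_tail.
    """
    while ((posted < SDP_RX_SIZE and
            (posted - SDP_MIN_BUFS) * SDP_MAX_SEND_SKB_FRAGS * PAGE_SIZE + rmem
            < sk_rcvbuf * rcvbuf_scale)
           or posted < SDP_MIN_BUFS):
        posted += 1  # stands in for sdp_post_recv(ssk)
    return posted
```

With these placeholder values and a 128 KiB sk_rcvbuf, the model stops at 4 buffers for rcvbuf_scale=1 and at 34 for the old hard-coded 0x10. The real numbers depend on the actual constants, but the ratio shows why the default matters for per-connection memory.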

-- 
MST


From erezz at voltaire.com  Sun Sep 10 02:08:48 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Sun, 10 Sep 2006 12:08:48 +0300
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
In-Reply-To: <aday7sva13d.fsf@cisco.com>
References: <200609071902.57379.toralf.foerster@gmx.de>
	<15ddcffd0609071252o477eeabfl31366719d0d3d9f0@mail.gmail.com>
	<aday7sva13d.fsf@cisco.com>
Message-ID: <4503D620.4000602@voltaire.com>

Roland Dreier wrote:
>     Or> you need to have CONFIG_INFINIBAND_ADDR_TRANS=m defined also i
>     Or> think you are missing CONFIG_INET=m
>
> Seems like a bug in the iSER Kconfig -- it shouldn't be possible to
> select iSER without everything it needs to compile.
>
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>
>   
I wasn't able to reproduce this behavior. This is what I did:

   1. I deleted my .config file.
   2. make menuconfig
   3. Selected Device drivers/InfiniBand support = y (if I don't do
      that, I cannot select iSER).
   4. Selected ISCSI RDMA protocol = y

I've attached the .config file. Note that 
CONFIG_INFINIBAND_ADDR_TRANS=y. This is because in 
drivers/infiniband/Kconfig it says:

config INFINIBAND_ADDR_TRANS
        bool
        depends on INFINIBAND && INET
        default y

Therefore, I don't understand how this behavior happened. Can you 
reproduce it and send the list of steps?
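
For reference, the way I read the Kconfig entry quoted above can be written as a tiny Python model. It is a simplification: it only handles y/n values (tristate m is ignored), and both function names are hypothetical.

```python
def promptless_bool(depends_on, default, config):
    """Value of a bool symbol without a prompt.

    It takes its default when every dependency is 'y', and is 'n' otherwise.
    Simplified model: tristate 'm' values are not handled.
    """
    if all(config.get(dep, "n") == "y" for dep in depends_on):
        return default
    return "n"


def infiniband_addr_trans(config):
    # Mirrors: bool / depends on INFINIBAND && INET / default y
    return promptless_bool(["INFINIBAND", "INET"], "y", config)
```

Under this model INFINIBAND_ADDR_TRANS can only end up as n when INFINIBAND or INET is not y, so a failing configuration presumably has one of the two disabled.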

Erez
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: .config
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060910/3e4f62c5/attachment.ksh>

From eli at dev.mellanox.co.il  Sun Sep 10 02:30:28 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Sun, 10 Sep 2006 12:30:28 +0300
Subject: [openib-general] PXE + infiniband?
In-Reply-To: <003e01c6d24d$f19caed0$8000a8c0@blorp>
References: <2376B63A5AF8564F8A2A2D76BC6DB033D0C7A8@CINMLVEM11.e2k.ad.ge.com>
	<1157610538.30038.35.camel@localhost>
	<003e01c6d24d$f19caed0$8000a8c0@blorp>
Message-ID: <1157880628.5386.13.camel@localhost>

On Thu, 2006-09-07 at 08:19 +0100, Paul Baxter wrote:
> > There is an implementation of PXE for Mellanox's HCAs that can be found
> > here: http://sourceforge.net/forum/forum.php?forum_id=494529
> 
> Thanks for the tip
> 
> I, too, am interested in this.
> 
> Do you have a more direct link as I wandered around etherboot's project site 
> and couldn't find anything IB-specific.
> 
> Paul Baxter 
Hi,

Please use the following link to download the package:
http://kent.dl.sourceforge.net/sourceforge/etherboot/etherboot-5.4.2.tar.bz2

Unpack the package and cd to the src dir. Use an x86 arch machine to
build the binaries. The infiniband drivers are located at
src/drivers/net/mlx_ipoib/ where you can find a readme file in the doc
directory. To build:

cd src
make bin/MT23108.zrom  # for MT23108
make bin/MT25208.zrom
make bin/MT25218.zrom

This covers all Mellanox HCAs. Please let me know if you need more
assistance.


From mst at mellanox.co.il  Sun Sep 10 02:37:47 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 10 Sep 2006 12:37:47 +0300
Subject: [openib-general] [PATCH] ehca for OFED 1.1-rc4
In-Reply-To: <OF609606CA.F2DB04D1-ONC12571E2.00719DBA-C12571E2.0076F34D@de.ibm.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA7892@mtlexch01.mtl.com>
	<OF609606CA.F2DB04D1-ONC12571E2.00719DBA-C12571E2.0076F34D@de.ibm.com>
Message-ID: <20060910093747.GA11625@mellanox.co.il>

Quoting r. Hoang-Nam Nguyen <HNGUYEN at de.ibm.com>:
Subject: [PATCH] ehca for OFED 1.1-rc4


> Hello Tziporet!
> Below is a patch of ehca against the ofed git tree branch ehca-branch in order to upgrade it to the same code level of Roland's git tree branch for-2.6.19, which has been posted for a while. The main code changes are:
> - Replace the "huge" EDEB macro by a simpler wrapper based on dev_err/dbg
> - Remove superfluous variables initialization and arguments checking
> - Replace struct ehca_module by static member variables in appropriate files, where they are accessed
> - Rename module name to ib_ehca.ko
> Thanks!
> Nam Nguyen

Unfortunately, the patch doesn't apply against either ofed_1_1 or ehca_branch.
Please check that it does, before re-posting.

One additional request: your message included a copy
of the patch in both plain text and html format. Please
post plain text only.

Thanks,

-- 
MST


From vlad at dev.mellanox.co.il  Sun Sep 10 04:00:18 2006
From: vlad at dev.mellanox.co.il (vlad at dev.mellanox.co.il)
Date: Sun, 10 Sep 2006 14:00:18 +0300 (IDT)
Subject: [openib-general] [openfabrics-ewg]  OFED 1.1 status
In-Reply-To: <OF0A584E8F.0BC8F351-ONC12571E3.004DD9F6-C12571E3.004F305F@de.ibm.com>
References: <OF0A584E8F.0BC8F351-ONC12571E3.004DD9F6-C12571E3.004F305F@de.ibm.com>
Message-ID: <14781.194.90.237.34.1157886018.squirrel@dev.mellanox.co.il>

Hello Nam Nguyen,
See my comments regarding OFED-1.1-rc3 below.
Please also check the libehca compilation issue:
http://openib.org/bugzilla/show_bug.cgi?id=228

Regards,
Vladimir

> Hello Tziporet!
> First sorry for this late response regarding ehca build test in OFED 1.1
> rc3.
>
> 1) The userspace lib dir for libehca contains only a few c-files, but no
> header files.
> On svn dir branches/1.1/src/userspace/libehca/src/ I saw all files needed.
> Please correct
> this for rc4!
> Will you pick new version of libehca from that dir?
>

There was a missing EXTRA_DIST parameter in the libehca/Makefile.am.
I will fix it in the trunk and branches/1.1.

> 2) When I used the install.sh script to install the software packages or
> compile
> them on ppc64, kernel 2.6.18-rc5/6 I got the following error messages:
>
>   gcc -m64 -Wp,-MD,/var/tmp/OFEDRPM/BUILD/openib-1.1
> /drivers/infiniband/core/.ib_addr.mod.o.d  -nos
> M/BUILD/openib-1.1/include  -I/var/tmp/OFEDRPM/BUILD/openib-1.1
> /drivers/infiniband/include  -Iinclu
> oft-float -pipe -mminimal-toc -mtraceback=none  -mcall-aixdesc
> -mtune=power4 -mno-altivec -funit-at
> lude -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include
> -I/var/tmp/OFEDRPM/BUILD/openi
> g   -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(ib_addr.mod)"
> -D"KBUILD_MODNAME=KBUILD_STR(
> o /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/core/ib_addr.mod.c
> In file included from include/asm/system.h:9,
>                  from include/linux/spinlock.h:56,
>                  from include/linux/capability.h:45,
>                  from include/linux/sched.h:44,
>                  from include/linux/module.h:9,
>                  from /var/tmp/OFEDRPM/BUILD/openib-1.1
> /drivers/infiniband/core/ib_addr.mod.c:1:
> include/asm/hw_irq.h: In function `local_irq_disable':
> include/asm/hw_irq.h:51: warning: implicit declaration of function
> `__mtmsrd'
> In file included from include/asm/current.h:15,
>                  from include/linux/capability.h:46,
>                  from include/linux/sched.h:44,
>                  from include/linux/module.h:9,
>                  from /var/tmp/OFEDRPM/BUILD/openib-1.1
> /drivers/infiniband/core/ib_addr.mod.c:1:
> include/asm/paca.h: At top level:
> include/asm/paca.h:84: error: `SLB_CACHE_ENTRIES' undeclared here (not in
> a
> function)
> In file included from include/linux/sched.h:49,
>                  from include/linux/module.h:9,
>                  from /var/tmp/OFEDRPM/BUILD/openib-1.1
> /drivers/infiniband/core/ib_addr.mod.c:1:
> include/linux/jiffies.h:18:5: warning: "CONFIG_HZ" is not defined
> include/linux/jiffies.h:20:7: warning: "CONFIG_HZ" is not defined
>
> If I use the kernel Makefile in /usr/src/linux-2.6.18-rc5 to compile e.g.
> make -C /usr/src/linux-2.6.18-rc5
> SUBDIRS=/var/tmp/OFEDRPM/BUILD/openib-1.1
> /drivers/infiniband/core
> then it works fine. We found out that the top-level kernel Makefile does
> the following settings
>
> LINUXINCLUDE    := -Iinclude \
>                    $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) \
>                    -include include/linux/autoconf.h
> CPPFLAGS        := -D__KERNEL__ $(LINUXINCLUDE)
>
> that include autoconf.h with all configured kernel configs like
> CONFIG_PPC64 etc. And obviously those
> config defines are lost if one uses
> /usr/src/linux-2.6.18-rc5/scripts/Makefile.build as OFED install.sh
> does. I'm wondering if anyone else also sees this problem on other
> architectures?
> Is there any reasons not to use the top-level kernel Makefile?
>
We are using the top-level kernel Makefile.
This was an issue in OFED-1.1-rc3 with 2.6.18 kernels.
It is fixed in OFED-1.1-rc4.

> Thanks!
> Nam Nguyen
>
> openib-general-bounces at openib.org wrote on 07.09.2006 22:01:30:
>
>> Hi,
>> OFED 1.1 RC4 will be published on Monday 11-Sep.
>> We currently work on several showstoppers:
>> 1. 223: mthca.so not properly linked to libibverbs - Vlad & Jack
>> 2. 221: SRP on V40Z and Sun T4 gets Kernel BUG at spinlock:118  - Roland
>> 3. 219: OFED 1.1rc3 contains prerelease unstable libibverbs code - Vlad & Jack
>>
>> Thus final release date will be delayed to end of next week
>>
>>
>> Tziporet Koren
>> Software Director
>> Mellanox Technologies
>> mailto: tziporet at mellanox.co.il
>> Tel +972-4-9097200, ext 380
> _______________________________________________
> openfabrics-ewg mailing list
> openfabrics-ewg at openib.org
> http://openib.org/mailman/listinfo/openfabrics-ewg
>


From mst at mellanox.co.il  Sun Sep 10 04:11:45 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 10 Sep 2006 14:11:45 +0300
Subject: [openib-general] [PATCH] cma_connect_ib leaks memory in failure
	cases.
In-Reply-To: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com>
References: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com>
Message-ID: <20060910111145.GA12111@mellanox.co.il>

Quoting r. Krishna Kumar <krkumar2 at in.ibm.com>:
> Subject: [PATCH] cma_connect_ib leaks memory in failure cases.
> 
> cma_connect_ib leaks an struct ib_cm_id* in failure cases.
> 
> Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>

This one looks like it might be good for 2.6.18. Sean?

-- 
MST


From toralf.foerster at gmx.de  Sun Sep 10 04:43:00 2006
From: toralf.foerster at gmx.de (Toralf Förster)
Date: Sun, 10 Sep 2006 13:43:00 +0200
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
In-Reply-To: <4503D620.4000602@voltaire.com>
References: <200609071902.57379.toralf.foerster@gmx.de>
	<aday7sva13d.fsf@cisco.com> <4503D620.4000602@voltaire.com>
Message-ID: <200609101343.02740.toralf.foerster@gmx.de>

I copied the config file to .config, then ran "make oldconfig && make" against the current sources, 2.6.18-rc6-git3.
BTW, I attach another .config where a similar problem occurred.
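
In case it helps with reproducing, here is a minimal Python sketch for checking which symbols a .config sets. The helper name parse_dotconfig is made up, and the three sample lines are copied from the attached .config.

```python
import re


def parse_dotconfig(text):
    """Map symbol names (without the CONFIG_ prefix) to their values.

    Symbols listed as '# CONFIG_FOO is not set' are recorded as 'n'.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# CONFIG_(\w+) is not set", line)
        if unset:
            values[unset.group(1)] = "n"
            continue
        assigned = re.fullmatch(r"CONFIG_(\w+)=(.*)", line)
        if assigned:
            values[assigned.group(1)] = assigned.group(2)
    return values


SAMPLE = """\
# CONFIG_NET is not set
CONFIG_SCSI=y
CONFIG_SCSI_ISCSI_ATTRS=y
"""

config = parse_dotconfig(SAMPLE)
# Symbols that are absent from the file are treated like unset ones.
missing = [name for name in ("NET", "INET") if config.get(name, "n") == "n"]
```

For the sample lines, "missing" lists both NET and INET, i.e. this configuration has no networking at all while the SCSI iSCSI transport attributes are enabled.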

On Sunday 10 September 2006 11:08, Erez Zilber wrote:
> Roland Dreier wrote:
> >     Or> you need to have CONFIG_INFINIBAND_ADDR_TRANS=m defined also i
> >     Or> think you are missing CONFIG_INET=m
> >
> > Seems like a bug in the iSER Kconfig -- it shouldn't be possible to
> > select iSER without everything it needs to compile.
> >
> I wasn't able to reproduce this behavior. This is what I did:
> 
>    1. I deleted my .config file.
>    2. make menuconfig
>    3. Selected Device drivers/InfiniBand support = y (if I don't do
>       that, I cannot select iSER).
>    4. Selected ISCSI RDMA protocol = y
> 
> I've attached the .config file. Note that 
> CONFIG_INFINIBAND_ADDR_TRANS=y. This is because in 
> drivers/infiniband/Kconfig it says:
> 
> config INFINIBAND_ADDR_TRANS
>         bool
>         depends on INFINIBAND && INET
>         default y
> 
> Therefore, I don't understand how did this behavior happen. Can you 
> reproduce it and send the list of steps?
> 
> Erez
> 

-- 
Sincerely
Toralf Förster
-------------- next part --------------
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.18-rc6-git3
# Sun Sep 10 13:26:49 2006
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
# CONFIG_EXPERIMENTAL is not set
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_SYSCTL=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_RELAY=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_UID16=y
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_RT_MUTEXES=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
# CONFIG_MODULES is not set

#
# Block layer
#
CONFIG_LBD=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_LSF=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
CONFIG_MPENTIUMM=y
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_HPET_TIMER=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
# CONFIG_X86_UP_APIC is not set
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_NONFATAL is not set
CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
CONFIG_X86_REBOOTFIXUPS=y
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y

#
# Firmware Drivers
#
# CONFIG_EDD is not set
# CONFIG_EFI_VARS is not set
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
CONFIG_PAGE_OFFSET=0xC0000000
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_RESOURCES_64BIT is not set
CONFIG_MATH_EMULATION=y
# CONFIG_MTRR is not set
CONFIG_EFI=y
CONFIG_BOOT_IOREMAP=y
CONFIG_REGPARM=y
CONFIG_SECCOMP=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_PHYSICAL_START=0x100000
# CONFIG_COMPAT_VDSO is not set

#
# Power management options (ACPI, APM)
#
CONFIG_PM=y
CONFIG_PM_LEGACY=y
# CONFIG_PM_DEBUG is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_SLEEP_PROC_FS=y
# CONFIG_ACPI_SLEEP_PROC_SLEEP is not set
# CONFIG_ACPI_AC is not set
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=y
CONFIG_ACPI_FAN=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_THERMAL=y
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
CONFIG_ACPI_DEBUG=y
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y

#
# APM (Advanced Power Management) BIOS Support
#
# CONFIG_APM is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# Bus options (PCI, PCMCIA, EISA, MCA, ISA)
#
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
CONFIG_PCI_GOMMCONFIG=y
# CONFIG_PCI_GODIRECT is not set
# CONFIG_PCI_GOANY is not set
CONFIG_PCI_MMCONFIG=y
# CONFIG_PCIEPORTBUS is not set
CONFIG_PCI_DEBUG=y
CONFIG_ISA_DMA_API=y
# CONFIG_ISA is not set
# CONFIG_MCA is not set
CONFIG_SCx200=y
CONFIG_SCx200HR_TIMER=y

#
# PCCARD (PCMCIA/CardBus) support
#
CONFIG_PCCARD=y
CONFIG_PCMCIA_DEBUG=y
CONFIG_PCMCIA=y
CONFIG_PCMCIA_IOCTL=y
# CONFIG_CARDBUS is not set

#
# PC-card bridges
#
# CONFIG_YENTA is not set
CONFIG_PD6729=y
# CONFIG_I82092 is not set
CONFIG_PCCARD_NONSTATIC=y

#
# PCI Hotplug Support
#

#
# Executable file formats
#
# CONFIG_BINFMT_ELF is not set
CONFIG_BINFMT_AOUT=y
# CONFIG_BINFMT_MISC is not set

#
# Networking
#
# CONFIG_NET is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_DEBUG_DRIVER=y
# CONFIG_SYS_HYPERVISOR is not set

#
# Connector - unified userspace <-> kernelspace linker
#

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
CONFIG_PARPORT=y
CONFIG_PARPORT_PC=y
CONFIG_PARPORT_SERIAL=y
# CONFIG_PARPORT_PC_PCMCIA is not set
CONFIG_PARPORT_NOT_PC=y
# CONFIG_PARPORT_GSC is not set
CONFIG_PARPORT_AX88796=y
CONFIG_PARPORT_1284=y

#
# Plug and Play support
#
# CONFIG_PNP is not set

#
# Block devices
#
CONFIG_BLK_DEV_FD=y
CONFIG_PARIDE=y
CONFIG_PARIDE_PARPORT=y

#
# Parallel IDE high-level drivers
#
# CONFIG_PARIDE_PD is not set
# CONFIG_PARIDE_PCD is not set
# CONFIG_PARIDE_PF is not set
# CONFIG_PARIDE_PT is not set
CONFIG_PARIDE_PG=y

#
# Parallel IDE protocol modules
#
CONFIG_PARIDE_ATEN=y
CONFIG_PARIDE_BPCK=y
CONFIG_PARIDE_BPCK6=y
CONFIG_PARIDE_COMM=y
CONFIG_PARIDE_DSTR=y
# CONFIG_PARIDE_FIT2 is not set
# CONFIG_PARIDE_FIT3 is not set
# CONFIG_PARIDE_EPAT is not set
CONFIG_PARIDE_EPIA=y
CONFIG_PARIDE_FRIQ=y
CONFIG_PARIDE_FRPW=y
# CONFIG_PARIDE_KBIC is not set
CONFIG_PARIDE_KTTI=y
CONFIG_PARIDE_ON20=y
# CONFIG_PARIDE_ON26 is not set
CONFIG_BLK_CPQ_DA=y
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
# CONFIG_BLK_DEV_LOOP is not set
CONFIG_BLK_DEV_SX8=y
# CONFIG_BLK_DEV_RAM is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_CDROM_PKTCDVD is not set

#
# ATA/ATAPI/MFM/RLL support
#
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y

#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_IDE_SATA is not set
# CONFIG_BLK_DEV_HD_IDE is not set
CONFIG_BLK_DEV_IDEDISK=y
# CONFIG_IDEDISK_MULTI_MODE is not set
# CONFIG_BLK_DEV_IDECS is not set
# CONFIG_BLK_DEV_IDECD is not set
CONFIG_BLK_DEV_IDEFLOPPY=y
# CONFIG_BLK_DEV_IDESCSI is not set
CONFIG_IDE_TASK_IOCTL=y

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
# CONFIG_BLK_DEV_CMD640 is not set
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_OFFBOARD=y
CONFIG_BLK_DEV_GENERIC=y
CONFIG_BLK_DEV_RZ1000=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_IDEDMA_FORCED is not set
CONFIG_IDEDMA_PCI_AUTO=y
# CONFIG_IDEDMA_ONLYDISK is not set
CONFIG_BLK_DEV_AEC62XX=y
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
CONFIG_BLK_DEV_ATIIXP=y
CONFIG_BLK_DEV_CMD64X=y
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_CS5535 is not set
# CONFIG_BLK_DEV_HPT34X is not set
CONFIG_BLK_DEV_HPT366=y
CONFIG_BLK_DEV_SC1200=y
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_IT821X is not set
# CONFIG_BLK_DEV_NS87415 is not set
CONFIG_BLK_DEV_PDC202XX_OLD=y
# CONFIG_PDC202XX_BURST is not set
CONFIG_BLK_DEV_PDC202XX_NEW=y
# CONFIG_BLK_DEV_SVWKS is not set
CONFIG_BLK_DEV_SIIMAGE=y
# CONFIG_BLK_DEV_SIS5513 is not set
CONFIG_BLK_DEV_SLC90E66=y
CONFIG_BLK_DEV_TRM290=y
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_IDE_ARM is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_IVB is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_BLK_DEV_HD is not set

#
# SCSI device support
#
CONFIG_RAID_ATTRS=y
CONFIG_SCSI=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=y
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=y
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=y
CONFIG_CHR_DEV_SCH=y

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
CONFIG_SCSI_MULTI_LUN=y
# CONFIG_SCSI_CONSTANTS is not set
CONFIG_SCSI_LOGGING=y

#
# SCSI Transport Attributes
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
CONFIG_SCSI_ISCSI_ATTRS=y
CONFIG_SCSI_SAS_ATTRS=y

#
# SCSI low-level drivers
#
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
CONFIG_SCSI_3W_9XXX=y
# CONFIG_SCSI_ACARD is not set
CONFIG_SCSI_AACRAID=y
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_CMDS_PER_DEVICE=32
CONFIG_AIC7XXX_RESET_DELAY_MS=5000
# CONFIG_AIC7XXX_DEBUG_ENABLE is not set
CONFIG_AIC7XXX_DEBUG_MASK=0
# CONFIG_AIC7XXX_REG_PRETTY_PRINT is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
CONFIG_SCSI_AIC79XX=y
CONFIG_AIC79XX_CMDS_PER_DEVICE=32
CONFIG_AIC79XX_RESET_DELAY_MS=5000
# CONFIG_AIC79XX_ENABLE_RD_STRM is not set
CONFIG_AIC79XX_DEBUG_ENABLE=y
CONFIG_AIC79XX_DEBUG_MASK=0
# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
# CONFIG_SCSI_DPT_I2O is not set
CONFIG_SCSI_ADVANSYS=y
# CONFIG_MEGARAID_NEWGEN is not set
CONFIG_MEGARAID_LEGACY=y
CONFIG_MEGARAID_SAS=y
# CONFIG_SCSI_SATA is not set
CONFIG_SCSI_HPTIOP=y
CONFIG_SCSI_BUSLOGIC=y
# CONFIG_SCSI_OMIT_FLASHPOINT is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
CONFIG_SCSI_FUTURE_DOMAIN=y
CONFIG_SCSI_GDTH=y
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
CONFIG_SCSI_PPA=y
CONFIG_SCSI_IMM=y
# CONFIG_SCSI_IZIP_EPP16 is not set
CONFIG_SCSI_IZIP_SLOW_CTR=y
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
CONFIG_SCSI_QLOGIC_1280=y
CONFIG_SCSI_QLA_FC=y
# CONFIG_SCSI_LPFC is not set
CONFIG_SCSI_DC390T=y
CONFIG_SCSI_NSP32=y
# CONFIG_SCSI_DEBUG is not set

#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set

#
# Fusion MPT device support
#
# CONFIG_FUSION is not set
# CONFIG_FUSION_SPI is not set
# CONFIG_FUSION_FC is not set
# CONFIG_FUSION_SAS is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_IEEE1394 is not set

#
# I2O device support
#
CONFIG_I2O=y
CONFIG_I2O_LCT_NOTIFY_ON_CHANGES=y
CONFIG_I2O_EXT_ADAPTEC=y
CONFIG_I2O_CONFIG=y
CONFIG_I2O_CONFIG_OLD_IOCTL=y
CONFIG_I2O_BUS=y
CONFIG_I2O_BLOCK=y
CONFIG_I2O_SCSI=y
CONFIG_I2O_PROC=y

#
# ISDN subsystem
#

#
# Telephony Support
#
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_TSDEV is not set
CONFIG_INPUT_EVDEV=y
CONFIG_INPUT_EVBUG=y

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
CONFIG_KEYBOARD_LKKBD=y
CONFIG_KEYBOARD_XTKBD=y
CONFIG_KEYBOARD_NEWTON=y
# CONFIG_INPUT_MOUSE is not set
CONFIG_INPUT_JOYSTICK=y
CONFIG_JOYSTICK_ANALOG=y
# CONFIG_JOYSTICK_A3D is not set
# CONFIG_JOYSTICK_ADI is not set
# CONFIG_JOYSTICK_COBRA is not set
CONFIG_JOYSTICK_GF2K=y
# CONFIG_JOYSTICK_GRIP is not set
CONFIG_JOYSTICK_GRIP_MP=y
CONFIG_JOYSTICK_GUILLEMOT=y
CONFIG_JOYSTICK_INTERACT=y
# CONFIG_JOYSTICK_SIDEWINDER is not set
# CONFIG_JOYSTICK_TMDC is not set
# CONFIG_JOYSTICK_IFORCE is not set
# CONFIG_JOYSTICK_WARRIOR is not set
# CONFIG_JOYSTICK_MAGELLAN is not set
# CONFIG_JOYSTICK_SPACEORB is not set
CONFIG_JOYSTICK_SPACEBALL=y
# CONFIG_JOYSTICK_STINGER is not set
CONFIG_JOYSTICK_TWIDJOY=y
CONFIG_JOYSTICK_DB9=y
CONFIG_JOYSTICK_GAMECON=y
# CONFIG_JOYSTICK_TURBOGRAFX is not set
CONFIG_JOYSTICK_JOYDUMP=y
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
# CONFIG_INPUT_PCSPKR is not set
# CONFIG_INPUT_WISTRON_BTNS is not set
# CONFIG_INPUT_UINPUT is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
CONFIG_SERIO_PARKBD=y
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
CONFIG_GAMEPORT=y
CONFIG_GAMEPORT_NS558=y
# CONFIG_GAMEPORT_L4 is not set
# CONFIG_GAMEPORT_EMU10K1 is not set
CONFIG_GAMEPORT_FM801=y

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_COMPUTONE is not set
# CONFIG_ROCKETPORT is not set
CONFIG_CYCLADES=y
# CONFIG_DIGIEPCA is not set
# CONFIG_MOXA_INTELLIO is not set
CONFIG_MOXA_SMARTIO=y
CONFIG_ISI=y
CONFIG_SYNCLINK=y
CONFIG_SYNCLINKMP=y
CONFIG_SYNCLINK_GT=y
CONFIG_N_HDLC=y
# CONFIG_RISCOM8 is not set
# CONFIG_SPECIALIX is not set
CONFIG_SX=y
# CONFIG_RIO is not set
CONFIG_STALDRV=y
# CONFIG_STALLION is not set
# CONFIG_ISTALLION is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_PCI=y
# CONFIG_SERIAL_8250_CS is not set
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
CONFIG_PRINTER=y
# CONFIG_LP_CONSOLE is not set
# CONFIG_PPDEV is not set
# CONFIG_TIPAR is not set

#
# IPMI
#
CONFIG_IPMI_HANDLER=y
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=y
CONFIG_IPMI_SI=y
CONFIG_IPMI_WATCHDOG=y
# CONFIG_IPMI_POWEROFF is not set

#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_INTEL is not set
CONFIG_HW_RANDOM_AMD=y
CONFIG_HW_RANDOM_GEODE=y
CONFIG_HW_RANDOM_VIA=y
CONFIG_NVRAM=y
# CONFIG_RTC is not set
CONFIG_GEN_RTC=y
# CONFIG_GEN_RTC_X is not set
# CONFIG_DTLK is not set
CONFIG_R3964=y
# CONFIG_APPLICOM is not set

#
# Ftape, the floppy tape device driver
#
# CONFIG_FTAPE is not set
# CONFIG_AGP is not set
# CONFIG_DRM is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
# CONFIG_CARDMAN_4000 is not set
# CONFIG_CARDMAN_4040 is not set
CONFIG_MWAVE=y
# CONFIG_SCx200_GPIO is not set
# CONFIG_PC8736x_GPIO is not set
# CONFIG_NSC_GPIO is not set
CONFIG_CS5535_GPIO=y
# CONFIG_RAW_DRIVER is not set
CONFIG_HPET=y
# CONFIG_HPET_RTC_IRQ is not set
# CONFIG_HPET_MMAP is not set
# CONFIG_HANGCHECK_TIMER is not set

#
# TPM devices
#

#
# I2C support
#
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=y

#
# I2C Algorithms
#
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_ALGOPCF=y
# CONFIG_I2C_ALGOPCA is not set

#
# I2C Hardware Bus support
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_I801 is not set
CONFIG_I2C_I810=y
CONFIG_I2C_PIIX4=y
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_PARPORT is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
CONFIG_I2C_PROSAVAGE=y
# CONFIG_SCx200_ACB is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIAPRO is not set
# CONFIG_I2C_VOODOO3 is not set
# CONFIG_I2C_PCA_ISA is not set

#
# Miscellaneous I2C Chip support
#
# CONFIG_I2C_DEBUG_CORE is not set
CONFIG_I2C_DEBUG_ALGO=y
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set

#
# SPI support
#
# CONFIG_SPI is not set
# CONFIG_SPI_MASTER is not set

#
# Dallas's 1-wire bus
#

#
# Hardware Monitoring support
#
# CONFIG_HWMON is not set
# CONFIG_HWMON_VID is not set

#
# Misc devices
#

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set
CONFIG_VIDEO_V4L2=y

#
# Digital Video Broadcasting Devices
#

#
# Graphics support
#
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_MACMODES is not set
# CONFIG_FB_BACKLIGHT is not set
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
CONFIG_FB_CYBER2000=y
CONFIG_FB_ARC=y
CONFIG_FB_ASILIANT=y
CONFIG_FB_IMSTT=y
CONFIG_FB_VGA16=y
# CONFIG_FB_VESA is not set
# CONFIG_FB_IMAC is not set
# CONFIG_FB_HGA is not set
CONFIG_FB_S1D13XXX=y
# CONFIG_FB_NVIDIA is not set
CONFIG_FB_RIVA=y
CONFIG_FB_RIVA_I2C=y
CONFIG_FB_RIVA_DEBUG=y
CONFIG_FB_MATROX=y
# CONFIG_FB_MATROX_MILLENIUM is not set
CONFIG_FB_MATROX_MYSTIQUE=y
CONFIG_FB_MATROX_G=y
CONFIG_FB_MATROX_I2C=y
# CONFIG_FB_MATROX_MAVEN is not set
CONFIG_FB_MATROX_MULTIHEAD=y
CONFIG_FB_RADEON=y
# CONFIG_FB_RADEON_I2C is not set
CONFIG_FB_RADEON_DEBUG=y
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_SIS is not set
CONFIG_FB_NEOMAGIC=y
CONFIG_FB_KYRO=y
CONFIG_FB_3DFX=y
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_CYBLA is not set
CONFIG_FB_TRIDENT=y
CONFIG_FB_VIRTUAL=y

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
CONFIG_VIDEO_SELECT=y
CONFIG_DUMMY_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE is not set

#
# Logo configuration
#
# CONFIG_LOGO is not set
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set

#
# Sound
#
CONFIG_SOUND=y

#
# Advanced Linux Sound Architecture
#
CONFIG_SND=y
CONFIG_SND_TIMER=y
CONFIG_SND_PCM=y
CONFIG_SND_HWDEP=y
CONFIG_SND_RAWMIDI=y
CONFIG_SND_SEQUENCER=y
CONFIG_SND_SEQ_DUMMY=y
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=y
CONFIG_SND_PCM_OSS=y
# CONFIG_SND_PCM_OSS_PLUGINS is not set
# CONFIG_SND_SEQUENCER_OSS is not set
# CONFIG_SND_DYNAMIC_MINORS is not set
CONFIG_SND_SUPPORT_OLD_API=y
CONFIG_SND_VERBOSE_PROCFS=y
# CONFIG_SND_VERBOSE_PRINTK is not set
# CONFIG_SND_DEBUG is not set

#
# Generic devices
#
CONFIG_SND_MPU401_UART=y
CONFIG_SND_OPL3_LIB=y
CONFIG_SND_VX_LIB=y
CONFIG_SND_AC97_CODEC=y
CONFIG_SND_AC97_BUS=y
# CONFIG_SND_DUMMY is not set
CONFIG_SND_VIRMIDI=y
# CONFIG_SND_MTPAV is not set
# CONFIG_SND_SERIAL_U16550 is not set
CONFIG_SND_MPU401=y

#
# PCI devices
#
# CONFIG_SND_AD1889 is not set
CONFIG_SND_ALS300=y
CONFIG_SND_ALS4000=y
CONFIG_SND_ALI5451=y
# CONFIG_SND_ATIIXP is not set
CONFIG_SND_ATIIXP_MODEM=y
CONFIG_SND_AU8810=y
CONFIG_SND_AU8820=y
CONFIG_SND_AU8830=y
CONFIG_SND_BT87X=y
# CONFIG_SND_BT87X_OVERCLOCK is not set
# CONFIG_SND_CA0106 is not set
CONFIG_SND_CMIPCI=y
# CONFIG_SND_CS4281 is not set
# CONFIG_SND_CS46XX is not set
CONFIG_SND_CS5535AUDIO=y
# CONFIG_SND_DARLA20 is not set
# CONFIG_SND_GINA20 is not set
# CONFIG_SND_LAYLA20 is not set
CONFIG_SND_DARLA24=y
# CONFIG_SND_GINA24 is not set
CONFIG_SND_LAYLA24=y
# CONFIG_SND_MONA is not set
# CONFIG_SND_MIA is not set
# CONFIG_SND_ECHO3G is not set
CONFIG_SND_INDIGO=y
# CONFIG_SND_INDIGOIO is not set
CONFIG_SND_INDIGODJ=y
# CONFIG_SND_EMU10K1 is not set
CONFIG_SND_EMU10K1X=y
# CONFIG_SND_ENS1370 is not set
# CONFIG_SND_ENS1371 is not set
CONFIG_SND_ES1938=y
# CONFIG_SND_ES1968 is not set
# CONFIG_SND_FM801 is not set
# CONFIG_SND_HDA_INTEL is not set
CONFIG_SND_HDSP=y
CONFIG_SND_HDSPM=y
# CONFIG_SND_ICE1712 is not set
CONFIG_SND_ICE1724=y
CONFIG_SND_INTEL8X0=y
CONFIG_SND_INTEL8X0M=y
CONFIG_SND_KORG1212=y
CONFIG_SND_MAESTRO3=y
# CONFIG_SND_MIXART is not set
CONFIG_SND_NM256=y
# CONFIG_SND_PCXHR is not set
CONFIG_SND_RIPTIDE=y
CONFIG_SND_RME32=y
CONFIG_SND_RME96=y
# CONFIG_SND_RME9652 is not set
CONFIG_SND_SONICVIBES=y
CONFIG_SND_TRIDENT=y
# CONFIG_SND_VIA82XX is not set
# CONFIG_SND_VIA82XX_MODEM is not set
# CONFIG_SND_VX222 is not set
CONFIG_SND_YMFPCI=y

#
# PCMCIA devices
#
CONFIG_SND_VXPOCKET=y
# CONFIG_SND_PDAUDIOCF is not set

#
# Open Sound System
#
# CONFIG_SOUND_PRIME is not set

#
# USB support
#
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
# CONFIG_USB is not set

#
# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support'
#

#
# USB Gadget Support
#
CONFIG_USB_GADGET=y
CONFIG_USB_GADGET_DEBUG_FILES=y
CONFIG_USB_GADGET_SELECTED=y
CONFIG_USB_GADGET_NET2280=y
CONFIG_USB_NET2280=y
# CONFIG_USB_GADGET_PXA2XX is not set
# CONFIG_USB_GADGET_GOKU is not set
# CONFIG_USB_GADGET_LH7A40X is not set
# CONFIG_USB_GADGET_OMAP is not set
# CONFIG_USB_GADGET_AT91 is not set
# CONFIG_USB_GADGET_DUMMY_HCD is not set
CONFIG_USB_GADGET_DUALSPEED=y
# CONFIG_USB_ZERO is not set
# CONFIG_USB_ETH is not set
# CONFIG_USB_GADGETFS is not set
CONFIG_USB_FILE_STORAGE=y
CONFIG_USB_FILE_STORAGE_TEST=y
# CONFIG_USB_G_SERIAL is not set

#
# MMC/SD Card support
#
CONFIG_MMC=y
CONFIG_MMC_DEBUG=y
# CONFIG_MMC_BLOCK is not set
CONFIG_MMC_WBSD=y

#
# LED devices
#
# CONFIG_NEW_LEDS is not set

#
# LED drivers
#

#
# LED Triggers
#

#
# InfiniBand support
#
CONFIG_INFINIBAND=y
CONFIG_INFINIBAND_USER_MAD=y
CONFIG_INFINIBAND_USER_ACCESS=y
CONFIG_INFINIBAND_MTHCA=y
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_SRP=y
CONFIG_INFINIBAND_ISER=y

#
# EDAC - error detection and reporting (RAS) (EXPERIMENTAL)
#

#
# Real Time Clock
#

#
# DMA Engine support
#
CONFIG_DMA_ENGINE=y

#
# DMA Clients
#

#
# DMA Devices
#
CONFIG_INTEL_IOATDMA=y

#
# File systems
#
# CONFIG_EXT2_FS is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_JBD=y
CONFIG_JBD_DEBUG=y
CONFIG_REISERFS_FS=y
# CONFIG_REISERFS_CHECK is not set
CONFIG_REISERFS_PROC_INFO=y
# CONFIG_REISERFS_FS_XATTR is not set
CONFIG_JFS_FS=y
CONFIG_JFS_POSIX_ACL=y
# CONFIG_JFS_SECURITY is not set
# CONFIG_JFS_DEBUG is not set
# CONFIG_JFS_STATISTICS is not set
CONFIG_FS_POSIX_ACL=y
# CONFIG_XFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_INOTIFY is not set
# CONFIG_QUOTA is not set
CONFIG_DNOTIFY=y
CONFIG_AUTOFS_FS=y
CONFIG_AUTOFS4_FS=y
CONFIG_FUSE_FS=y

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_ZISOFS_FS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_RAMFS=y

#
# Miscellaneous filesystems
#
# CONFIG_HFSPLUS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
CONFIG_HPFS_FS=y
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
CONFIG_UFS_FS=y
# CONFIG_UFS_DEBUG is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
CONFIG_ACORN_PARTITION=y
# CONFIG_ACORN_PARTITION_CUMANA is not set
# CONFIG_ACORN_PARTITION_EESOX is not set
# CONFIG_ACORN_PARTITION_ICS is not set
# CONFIG_ACORN_PARTITION_ADFS is not set
# CONFIG_ACORN_PARTITION_POWERTEC is not set
CONFIG_ACORN_PARTITION_RISCIX=y
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
# CONFIG_MINIX_SUBPARTITION is not set
CONFIG_SOLARIS_X86_PARTITION=y
# CONFIG_UNIXWARE_DISKLABEL is not set
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
CONFIG_EFI_PARTITION=y

#
# Native Language Support
#
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
# CONFIG_NLS_CODEPAGE_437 is not set
CONFIG_NLS_CODEPAGE_737=y
CONFIG_NLS_CODEPAGE_775=y
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
CONFIG_NLS_CODEPAGE_855=y
CONFIG_NLS_CODEPAGE_857=y
CONFIG_NLS_CODEPAGE_860=y
CONFIG_NLS_CODEPAGE_861=y
CONFIG_NLS_CODEPAGE_862=y
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
CONFIG_NLS_CODEPAGE_865=y
# CONFIG_NLS_CODEPAGE_866 is not set
CONFIG_NLS_CODEPAGE_869=y
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
CONFIG_NLS_CODEPAGE_932=y
# CONFIG_NLS_CODEPAGE_949 is not set
CONFIG_NLS_CODEPAGE_874=y
CONFIG_NLS_ISO8859_8=y
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
# CONFIG_NLS_ASCII is not set
# CONFIG_NLS_ISO8859_1 is not set
CONFIG_NLS_ISO8859_2=y
# CONFIG_NLS_ISO8859_3 is not set
CONFIG_NLS_ISO8859_4=y
CONFIG_NLS_ISO8859_5=y
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
CONFIG_NLS_ISO8859_13=y
# CONFIG_NLS_ISO8859_14 is not set
CONFIG_NLS_ISO8859_15=y
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_UTF8 is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_MAGIC_SYSRQ=y
CONFIG_UNUSED_SYMBOLS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_LOG_BUF_SHIFT=15
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_SCHEDSTATS=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SLAB_LEAK=y
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
CONFIG_DEBUG_SPINLOCK=y
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_RWSEMS is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_DEBUG_KOBJECT=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_VM=y
CONFIG_FRAME_POINTER=y
# CONFIG_UNWIND_INFO is not set
# CONFIG_FORCED_INLINING is not set
CONFIG_RCU_TORTURE_TEST=y
CONFIG_EARLY_PRINTK=y
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_RODATA is not set
CONFIG_4KSTACKS=y
CONFIG_DOUBLEFAULT=y

#
# Security options
#
CONFIG_KEYS=y
# CONFIG_KEYS_DEBUG_PROC_KEYS is not set
# CONFIG_SECURITY is not set

#
# Cryptographic options
#
CONFIG_CRYPTO=y
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_NULL is not set
CONFIG_CRYPTO_MD4=y
CONFIG_CRYPTO_MD5=y
CONFIG_CRYPTO_SHA1=y
# CONFIG_CRYPTO_SHA256 is not set
CONFIG_CRYPTO_SHA512=y
# CONFIG_CRYPTO_WP512 is not set
# CONFIG_CRYPTO_TGR192 is not set
# CONFIG_CRYPTO_DES is not set
# CONFIG_CRYPTO_BLOWFISH is not set
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_SERPENT=y
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_AES_586=y
CONFIG_CRYPTO_CAST5=y
# CONFIG_CRYPTO_CAST6 is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_ARC4 is not set
CONFIG_CRYPTO_KHAZAD=y
# CONFIG_CRYPTO_ANUBIS is not set
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_MICHAEL_MIC=y
# CONFIG_CRYPTO_CRC32C is not set

#
# Hardware crypto devices
#
# CONFIG_CRYPTO_DEV_PADLOCK is not set

#
# Library routines
#
CONFIG_CRC_CCITT=y
# CONFIG_CRC16 is not set
CONFIG_CRC32=y
# CONFIG_LIBCRC32C is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_PLIST=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_KTIME_SCALAR=y

From halr at voltaire.com  Sun Sep 10 04:48:11 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 10 Sep 2006 07:48:11 -0400
Subject: [openib-general] [PATCH] osm: OSM bug fix with --run-once option
In-Reply-To: <1157870154.29270.42.camel@kliteynik.yok.mtl.com>
References: <1157870154.29270.42.camel@kliteynik.yok.mtl.com>
Message-ID: <1157888830.27427.49152.camel@hal.voltaire.com>

Hi Yevgeny,

On Sun, 2006-09-10 at 02:35, Yevgeny Kliteynik wrote:
> Hi Hal
> 
> This patch fixes a bug that occurred when OSM was 
> running with the --run-once option (-o) and the SM port was down.
> In that case, OSM would be stuck in cond_wait forever (or until
> the port became active) and could not be terminated 
> other than by SIGKILL.
> 
> Yevgeny
> 
> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Should this go to 1.1 as well as trunk? How critical is it for 1.1?

-- Hal


From erezz at voltaire.com  Sun Sep 10 05:14:38 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Sun, 10 Sep 2006 15:14:38 +0300
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
In-Reply-To: <200609101343.02740.toralf.foerster@gmx.de>
References: <200609071902.57379.toralf.foerster@gmx.de>
	<aday7sva13d.fsf@cisco.com> <4503D620.4000602@voltaire.com>
	<200609101343.02740.toralf.foerster@gmx.de>
Message-ID: <450401AE.2030606@voltaire.com>

Toralf Förster wrote:
> I copied the config file to .config, made then a "make oldconfig && make" against current sources 2.6.18-rc6-git3.
> BTW, I attach another .config where the similar problem occured
>
Where did you get this config file from? I don't think that this config 
file was generated by a tool like 'make menuconfig'. As I explained 
before, if you select CONFIG_INFINIBAND=y using the menuconfig tool, it 
also sets CONFIG_INFINIBAND_ADDR_TRANS=y. Therefore, I can only guess 
that this config file was generated manually (or at least modified 
manually).

If you can explain how I can generate the config file that you used, 
maybe I can reproduce it. Otherwise, I suggest that you delete your .config 
file and run 'make menuconfig'. Then select InfiniBand & iSER and it 
should work fine. Let me know if it works for you.

Erez


From toralf.foerster at gmx.de  Sun Sep 10 07:45:19 2006
From: toralf.foerster at gmx.de (Toralf =?iso-8859-1?q?F=F6rster?=)
Date: Sun, 10 Sep 2006 16:45:19 +0200
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
In-Reply-To: <450401AE.2030606@voltaire.com>
References: <200609071902.57379.toralf.foerster@gmx.de>
	<200609101343.02740.toralf.foerster@gmx.de>
	<450401AE.2030606@voltaire.com>
Message-ID: <200609101645.22695.toralf.foerster@gmx.de>

You're right, sorry.

I forgot to say that:
- first I created a random .config file using "make randconfig"
- after that I removed all options not fitting my system
- then I prepended some options commonly used by me on top of the .config file
- and finally ran "make oldconfig" to (hopefully) get a clean config

Doesn't "make oldconfig" produce a clean .config file?

On Sunday 10 September 2006 14:14, Erez Zilber wrote:
> Toralf Förster wrote:
> > I copied the config file to .config, made then a "make oldconfig && make" against current sources 2.6.18-rc6-git3.
> > BTW, I attach another .config where the similar problem occured
> >
> Where did you get this config file from? I don't think that this config 
> file was generated by a tool like 'make menuconfig'. As I explained 
> before, if you select CONFIG_INFINIBAND=y using the menuconfig tool, it 
> also sets CONFIG_INFINIBAND_ADDR_TRANS=y. Therefore, I can only guess 
> that this config file was generated manually (or at least modified 
> manually).
> 
> If you can explain how I can generate the config file that you used, 
> maybe I can reproduce it. Otherwise, I suggest that you delete your .config 
> file and run 'make menuconfig'. Then select InfiniBand & iSER and it 
> should work fine. Let me know if it works for you.
> 
> Erez
> 
> 

-- 
Sincerely
Toralf Förster

From eitan at mellanox.co.il  Sun Sep 10 11:46:22 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Sun, 10 Sep 2006 21:46:22 +0300
Subject: [openib-general] [PATCH] osm: OSM bug fix with --run-once option
In-Reply-To: <1157888830.27427.49152.camel@hal.voltaire.com>
References: <1157870154.29270.42.camel@kliteynik.yok.mtl.com>
	<1157888830.27427.49152.camel@hal.voltaire.com>
Message-ID: <45045D7E.6090908@mellanox.co.il>

Hal Rosenstock wrote:

>Hi Yevgeny,
>
>On Sun, 2006-09-10 at 02:35, Yevgeny Kliteynik wrote:
>>Hi Hal
>>
>>This patch fixes a bug that occurred when OSM was 
>>running with the --run-once option (-o) and the SM port was down.
>>In that case, OSM would be stuck in cond_wait forever (or until
>>the port became active) and could not be terminated 
>>other than by SIGKILL.
>>
>>Yevgeny
>>
>>Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
>
>Should this go to 1.1 as well as trunk? How critical is it for 1.1?
>
IMO this should be applied only to the trunk, as --run-once is not a 
user mode but rather a testing mode. So it is not critical for the branch.
EZ

>-- Hal


From zhushisongzhu at yahoo.com  Sun Sep 10 21:37:07 2006
From: zhushisongzhu at yahoo.com (zhu shi song)
Date: Sun, 10 Sep 2006 21:37:07 -0700 (PDT)
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060910075818.GV6928@mellanox.co.il>
Message-ID: <20060911043707.54946.qmail@web36909.mail.mud.yahoo.com>

I ran the test 10 times, each time opening 5000
concurrent connections to access the Internet through SDP.
So far I haven't found any problems.  I hope you can
also reduce the memory used by one SDP connection.
  Another question: when I call ib_query_gid in the
sdp_init function, it returns -EINVAL.  How do I use
ib_query_gid correctly?
  If I want to make the SDP module independent of the ipoib
module, is it difficult?
  Thanks,
  zhu

--- "Michael S. Tsirkin" <mst at mellanox.co.il> wrote:

> Quoting r. zhu shi song <zhushisongzhu at yahoo.com>:
> > Subject: Re: why sdp connections cost so much memory
> > 
> > OFED-1.1-rc3 has passed my tests. I have to adjust
> > Post buffer size to 0x4 and use your patch for me.
> > Can you make it fixed not to do these myself manually?
> > 
> >  zhu
> 
> I plan to add the following patch to OFED. Could you please verify
> that it fixes the issue for you, without tweaking the ring size?
> 
> Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
> 
> diff --git a/drivers/infiniband/ulp/sdp/sdp_bcopy.c b/drivers/infiniband/ulp/sdp/sdp_bcopy.c
> index b30d2a0..4540fa4 100644
> --- a/drivers/infiniband/ulp/sdp/sdp_bcopy.c
> +++ b/drivers/infiniband/ulp/sdp/sdp_bcopy.c
> @@ -37,6 +37,10 @@ #include <rdma/ib_verbs.h>
>  #include <rdma/rdma_cm.h>
>  #include "sdp.h"
>  
> +static int rcvbuf_scale = 0x1;
> +module_param_named(rcvbuf_scale, rcvbuf_scale, int, 0644);
> +MODULE_PARM_DESC(rcvbuf_scale, "Receive buffer size scale factor.");
> +
>  /* Like tcp_fin */
>  static void sdp_fin(struct sock *sk)
>  {
> @@ -237,7 +241,7 @@ void sdp_post_recvs(struct sdp_sock *ssk
>  	while ((likely(ssk->rx_head - ssk->rx_tail < SDP_RX_SIZE) &&
>  		(ssk->rx_head - ssk->rx_tail - SDP_MIN_BUFS) *
>  		SDP_MAX_SEND_SKB_FRAGS * PAGE_SIZE + rmem <
> -		ssk->isk.sk.sk_rcvbuf * 0x10) ||
> +		ssk->isk.sk.sk_rcvbuf * rcvbuf_scale) ||
>  	       unlikely(ssk->rx_head - ssk->rx_tail < SDP_MIN_BUFS))
>  		sdp_post_recv(ssk);
>  }
> 
> -- 
> MST
> 




From mst at mellanox.co.il  Sun Sep 10 21:57:56 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Sep 2006 07:57:56 +0300
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060911043707.54946.qmail@web36909.mail.mud.yahoo.com>
References: <20060911043707.54946.qmail@web36909.mail.mud.yahoo.com>
Message-ID: <20060911045756.GA8709@mellanox.co.il>

Quoting r. zhu shi song <zhushisongzhu at yahoo.com>:
> Subject: Re: why sdp connections cost so much memory
> 
> I ran the test 10 times, each time opening 5000
> concurrent connections to access the Internet through SDP.
> So far I haven't found any problems.  I hope you can
> also reduce the memory used by one SDP connection.

You mean - when only a single socket is open?

>   Another question: when I call ib_query_gid in the
> sdp_init function, it returns -EINVAL.  How do I use
> ib_query_gid correctly?

Looks like you are passing in an invalid gid.

>   If I want to make the SDP module independent of the ipoib
> module, is it difficult?

The SDP protocol uses ARP over IPoIB for its address resolution.
So you'd need to find some other way to perform address resolution.

-- 
MST


From sweitzen at cisco.com  Sun Sep 10 22:27:10 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Sun, 10 Sep 2006 22:27:10 -0700
Subject: [openib-general] [openfabrics-ewg] OFED 1.1 status
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EA982@xmb-sjc-216.amer.cisco.com>

Please make sure 1. and 3. are fixed before you release rc4, thanks.
 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

________________________________

	From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Tziporet Koren
	Sent: Thursday, September 07, 2006 1:02 PM
	To: EWG
	Cc: openib
	Subject: [openfabrics-ewg] OFED 1.1 status
	
	
	Hi,

	OFED 1.1 RC4 will be published on Monday 11-Sep.

	We are currently working on several showstoppers:

	1.	223: mthca.so not properly linked to libibverbs - Vlad & Jack
	2.	221: SRP on V40Z and Sun T4 gets Kernel BUG at spinlock:118 - Roland
	3.	219: OFED 1.1rc3 contains prerelease unstable libibverbs code - Vlad & Jack

	 
	Thus the final release date will be delayed to the end of next week.

	 
	Tziporet Koren

	Software Director

	Mellanox Technologies

	mailto: tziporet at mellanox.co.il
	Tel +972-4-9097200, ext 380

	 

From sweitzen at cisco.com  Sun Sep 10 22:47:24 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Sun, 10 Sep 2006 22:47:24 -0700
Subject: [openib-general] is there a plan for getting SDP into kernel.org?
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EA989@xmb-sjc-216.amer.cisco.com>

I would like to see netstat support, zcopy support, and ideally AIO
support get added first...
 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

From tziporet at mellanox.co.il  Sun Sep 10 23:02:39 2006
From: tziporet at mellanox.co.il (Tziporet Koren)
Date: Mon, 11 Sep 2006 09:02:39 +0300
Subject: [openib-general] [openfabrics-ewg] OFED 1.1 status
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E301FA78BA@mtlexch01.mtl.com>

Both are already fixed.

 
Tziporet

 
-----Original Message-----
From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] 
Sent: Monday, September 11, 2006 8:27 AM
To: Tziporet Koren; EWG
Cc: openib
Subject: RE: [openfabrics-ewg] OFED 1.1 status

 
Please make sure 1. and 3. are fixed before you release rc4, thanks.

 
Scott Weitzenkamp

SQA and Release Manager

Server Virtualization Business Unit

Cisco Systems

 
________________________________


	From: openfabrics-ewg-bounces at openib.org [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Tziporet Koren
	Sent: Thursday, September 07, 2006 1:02 PM
	To: EWG
	Cc: openib
	Subject: [openfabrics-ewg] OFED 1.1 status

	Hi,

	OFED 1.1 RC4 will be published on Monday 11-Sep.

	We are currently working on several showstoppers:

	1. 223: mthca.so not properly linked to libibverbs - Vlad & Jack

	2. 221: SRP on V40Z and Sun T4 gets Kernel BUG at spinlock:118 - Roland

	3. 219: OFED 1.1rc3 contains prerelease unstable libibverbs code - Vlad & Jack

	 
	Thus the final release date will be delayed to the end of next week.

	 
	Tziporet Koren

	Software Director

	Mellanox Technologies

	mailto: tziporet at mellanox.co.il
	Tel +972-4-9097200, ext 380

	 

From mst at mellanox.co.il  Sun Sep 10 23:18:57 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Sep 2006 09:18:57 +0300
Subject: [openib-general] is there a plan for getting SDP into
	kernel.org?
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EA989@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EA989@xmb-sjc-216.amer.cisco.com>
Message-ID: <20060911061857.GA8948@mellanox.co.il>

Quoting r. Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com>:
> Subject: is there a plan for getting SDP into kernel.org?
> 
> I would like to see netstat support, zcopy support, and ideally AIO support get added first...
>  
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems

IMO, only netstat support makes some sense at this point.

-- 
MST


From erezz at voltaire.com  Sun Sep 10 23:33:15 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 11 Sep 2006 09:33:15 +0300
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
In-Reply-To: <200609101645.22695.toralf.foerster@gmx.de>
References: <200609071902.57379.toralf.foerster@gmx.de>
	<200609101343.02740.toralf.foerster@gmx.de>
	<450401AE.2030606@voltaire.com>
	<200609101645.22695.toralf.foerster@gmx.de>
Message-ID: <4505032B.3050706@voltaire.com>

Toralf Förster wrote:
> You're right, sorry.
>
> I forgot to say that:
> - first I created a random .config file using "make randconfig"
> - after that I removed all options not fitting my system
> - then I prepended some options commonly used by me on top of the .config file
> - and finally ran "make oldconfig" to (hopefully) get a clean config
>
> Doesn't "make oldconfig" produce a clean .config file?
>   
>
Here's what the kernel's README file has to say about it:
"make oldconfig": Default all questions based on the contents of your 
existing ./.config file and asking about new config symbols.

I guess that 'make randconfig' selected CONFIG_INFINIBAND=y, but didn't 
select CONFIG_INFINIBAND_ADDR_TRANS=y. Then 'make oldconfig' only asked you 
about new symbols and kept the rest as it was. I guess that running 
'make randconfig' may create scenarios like this, but I don't think that 
there's a bug in iSER's Kconfig file. If you still want to use your .config 
file, reselect InfiniBand in 'make menuconfig'. It will set 
CONFIG_INFINIBAND_ADDR_TRANS=y.
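
A quick check for this combination can be sketched as below. This is only an
illustration, not part of any kernel tooling: the function names are made up
for the example, SAMPLE is the InfiniBand section of the .config posted in
this thread, and the check looks at nothing but the two symbols discussed
here (it does not evaluate Kconfig dependencies).

```python
import re


def parse_config(text):
    """Map each CONFIG_ symbol to its value; '# CONFIG_X is not set' becomes 'n'."""
    symbols = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            symbols[m.group(1)] = m.group(2)
            continue
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            symbols[m.group(1)] = "n"
    return symbols


def iser_missing_addr_trans(symbols):
    """True if iSER is enabled while CONFIG_INFINIBAND_ADDR_TRANS is not 'y'."""
    iser = symbols.get("CONFIG_INFINIBAND_ISER", "n")
    addr = symbols.get("CONFIG_INFINIBAND_ADDR_TRANS", "n")
    return iser in ("y", "m") and addr != "y"


# InfiniBand section of the .config from this thread (no ADDR_TRANS line).
SAMPLE = """\
CONFIG_INFINIBAND=y
CONFIG_INFINIBAND_USER_MAD=y
CONFIG_INFINIBAND_USER_ACCESS=y
CONFIG_INFINIBAND_MTHCA=y
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_SRP=y
CONFIG_INFINIBAND_ISER=y
"""

if __name__ == "__main__":
    print(iser_missing_addr_trans(parse_config(SAMPLE)))
```

To check a real file, pass the contents of .config to parse_config()
instead of SAMPLE.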

I hope this helps.

Erez


From bugzilla-daemon at openib.org  Mon Sep 11 00:15:00 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Mon, 11 Sep 2006 00:15:00 -0700 (PDT)
Subject: [openib-general] [Bug 229] New: heavy CPU load can starve ib_mad
	thread on latest processors
Message-ID: <20060911071500.4CB962283D4@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=229

           Summary: heavy CPU load can starve ib_mad thread on latest
                    processors
           Product: OpenFabrics Linux
           Version: 1.1rc3
          Platform: All
        OS/Version: RHEL 4
            Status: NEW
          Severity: normal
          Priority: P3
         Component: IB Core
        AssignedTo: bugzilla at openib.org
        ReportedBy: sweitzen at cisco.com


RHEL4 U3 x86_64

We have a proprietary test tool that places a very heavy CPU load on the system. 
When this test is run on an IB host on Intel Woodcrest, AMD Opteron (Rev F, I
believe; not sure about Rev G), and PowerPC JS21 systems, the IB port goes from
ACTIVE to INIT state.  The workaround is to renice the ib_mad thread to the highest
priority. We recommend changing the OpenIB code to do this when the ib_mad thread
is created.

This does not seem to happen on older Intel or AMD processors; not sure about
PowerPC.
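
The renice workaround described above could be scripted roughly as follows.
This is only a sketch under assumptions: it assumes the MAD kernel threads
have names beginning with "ib_mad" (check with ps on your system), it reads
the names from /proc/<pid>/stat, and the helper names comm_from_stat,
find_threads and renice are made up for this example. It has to run as root
to set a negative nice value.

```python
import os


def comm_from_stat(stat_line):
    """Extract the command name from a /proc/<pid>/stat line.

    The name sits between the first '(' and the last ')', so names that
    themselves contain parentheses are handled.
    """
    start = stat_line.index("(") + 1
    end = stat_line.rindex(")")
    return stat_line[start:end]


def find_threads(prefix, proc_root="/proc"):
    """Return the sorted PIDs whose command name starts with prefix."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                name = comm_from_stat(f.read())
        except (OSError, ValueError):
            continue  # task exited meanwhile, or the file was unreadable
        if name.startswith(prefix):
            pids.append(int(entry))
    return sorted(pids)


def renice(pids, nice=-20):
    """Set the nice value of each PID; -20 is the highest priority."""
    for pid in pids:
        os.setpriority(os.PRIO_PROCESS, pid, nice)


if __name__ == "__main__":
    # Assumption: the thread name prefix is "ib_mad".
    renice(find_threads("ib_mad"))
```

The setting does not survive a module reload, so the script would have to be
rerun after the IB stack is restarted.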


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From zhushisongzhu at yahoo.com  Mon Sep 11 00:43:05 2006
From: zhushisongzhu at yahoo.com (zhu shi song)
Date: Mon, 11 Sep 2006 00:43:05 -0700 (PDT)
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060911045756.GA8709@mellanox.co.il>
Message-ID: <20060911074305.52197.qmail@web36905.mail.mud.yahoo.com>


>> You mean - when only a single socket is open?
Each connection costs 2 MB of RAM, so I made the
following changes:
#define SDP_TX_SIZE 0x4
#define SDP_RX_SIZE 0x4

> The SDP protocol uses ARP over IPoIB for its address
> resolution.
> So you'd need to find some other way to perform
> address resolution.
I'll try to pre-resolve the address, so I can remove ARP
from ipoib.

zhu



From mst at mellanox.co.il  Mon Sep 11 00:50:38 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Sep 2006 10:50:38 +0300
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060911074305.52197.qmail@web36905.mail.mud.yahoo.com>
References: <20060911074305.52197.qmail@web36905.mail.mud.yahoo.com>
Message-ID: <20060911075038.GC10024@mellanox.co.il>

Quoting r. zhu shi song <zhushisongzhu at yahoo.com>:
> Subject: Re: why sdp connections cost so much memory
> 
> 
> 
> >> You mean - when only a single socket is open?
> Each connection costs 2 MB of RAM, so I made the
> following changes:
> #define SDP_TX_SIZE 0x4
> #define SDP_RX_SIZE 0x4

You should not need this change with the scale patch I posted - after applying
this, and setting the scale parameter to 0x1, each connection should use around
128K for RX. Please confirm.
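
To illustrate how the scale factor limits the number of posted receive
buffers, here is a small user-space model of the loop condition from the
sdp_post_recvs() patch. It is a sketch only: the numeric values passed in
below (receive buffer size, ring size, minimum buffers, fragments per buffer,
page size) are placeholders picked for the example, not the values from
sdp.h, and posted_recv_buffers is a made-up name.

```python
def posted_recv_buffers(rcvbuf, scale, rx_size, min_bufs, frags, page_size, rmem=0):
    """Count the buffers the loop posts, starting from an empty ring.

    'posted' models ssk->rx_head - ssk->rx_tail; every pass through the
    loop stands for one sdp_post_recv() call.
    """
    posted = 0
    while ((posted < rx_size and
            (posted - min_bufs) * frags * page_size + rmem < rcvbuf * scale)
           or posted < min_bufs):
        posted += 1
    return posted


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    params = dict(rcvbuf=128 * 1024, rx_size=64, min_bufs=2,
                  frags=1, page_size=4096)
    for scale in (0x1, 0x10):
        n = posted_recv_buffers(scale=scale, **params)
        print(hex(scale), n, n * params["frags"] * params["page_size"])
```

With a small scale the byte limit stops the loop; with a large scale the ring
size (SDP_RX_SIZE) becomes the limit, which is why shrinking the ring also
reduced memory use.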

> > The SDP protocol uses ARP over IPoIB for its address
> > resolution.
> > So you'd need to find some other way to perform
> > address resolution.
> >
> I'll try to pre-resolve the address, so I can remove ARP
> from ipoib.

But you'll still need the ipoib module loaded.

-- 
MST


From bugzilla-daemon at openib.org  Mon Sep 11 00:58:40 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Mon, 11 Sep 2006 00:58:40 -0700 (PDT)
Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad thread
	on latest processors
Message-ID: <20060911075840.8800A2283D4@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=229


------- Comment #1 from amitk at mellanox.co.il  2006-09-11 00:58 -------
Which SM are you running?


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From HNGUYEN at de.ibm.com  Mon Sep 11 01:30:57 2006
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Mon, 11 Sep 2006 10:30:57 +0200
Subject: [openib-general] [PATCH] ehca for OFED 1.1-rc4
In-Reply-To: <20060910093747.GA11625@mellanox.co.il>
Message-ID: <OF5E56DFB2.C0CCE0F2-ONC12571E6.002E3D28-C12571E6.002E6F7F@de.ibm.com>

I guess my email client must have wrapped the lines, so the patch no
longer applies. Sorry for that!
I need a little time to fix that problem. For now I'm sending you the patch
file as an attachment, which I could apply without errors.
Thanks,
Nam Nguyen

(See attached file: ofed_svnehca_0015.patch)

openib-general-bounces at openib.org wrote on 10.09.2006 11:37:47:

> Quoting r. Hoang-Nam Nguyen <HNGUYEN at de.ibm.com>:
> Subject: [PATCH] ehca for OFED 1.1-rc4
>
> > Hello Tziporet!
> > Below is a patch of ehca against the ofed git tree branch ehca-branch
> > in order to upgrade it to the same code level of Roland's git tree
> > branch for-2.6.19, which has been posted for a while. The main code
> > changes are:
> > - Replace the "huge" EDEB macro by a simpler wrapper based on dev_err/dbg
> > - Remove superfluous variables initialization and arguments checking
> > - Replace struct ehca_module by static member variables in appropriate
> >   files, where they are accessed
> > - Rename module name to ib_ehca.ko
> > Thanks!
> > Nam Nguyen
>
> Unfortunately, the patch doesn't apply against either ofed_1_1 or ehca_branch.
> Please check that it does, before re-posting.
> One additional request: your message included a copy
> of patch in both plain text and html format. Please
> post plain text only.
>
> Thanks,
>
> --
> MST
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ofed_svnehca_0015.patch
Type: application/octet-stream
Size: 291382 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060911/ddc23f06/attachment.obj>

From zhushisongzhu at yahoo.com  Mon Sep 11 01:36:54 2006
From: zhushisongzhu at yahoo.com (zhu shi song)
Date: Mon, 11 Sep 2006 01:36:54 -0700 (PDT)
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060911075038.GC10024@mellanox.co.il>
Message-ID: <20060911083655.6871.qmail@web36911.mail.mud.yahoo.com>


--- "Michael S. Tsirkin" <mst at mellanox.co.il> wrote:

> Quoting r. zhu shi song <zhushisongzhu at yahoo.com>:
> > Subject: Re: why sdp connections cost so much
> memory
> > 
> > >> You mean - when only a single socket is open?
> > Every one connection will cost 2M RAM. So I make
> the
> > following changes:
> > #define SDP_TX_SIZE 0x4
> > #define SDP_RX_SIZE 0x4
> 
> You should not need this change with the scale patch
> I posted - after applying
> this, and setting the scale parameter to 0x1, each
> connection should use around
> 128K for RX. Please confirm.
Can each connection use 64K of memory?
 
> > > The SDP protocol uses ARP over IPoIB for its
> address
> > > resolution.
> > > So you'd need to find some other way to perform
> > > address resolution.
> > >
> > I'll try pre-resolute the address, So I can remove
> ARP
> > from ipoib
> 
> But you'll still need the ipoib module loaded.
> 
Is it difficult to avoid loading the ipoib module?

zhu




From toralf.foerster at gmx.de  Mon Sep 11 01:40:32 2006
From: toralf.foerster at gmx.de (Toralf =?iso-8859-1?q?F=F6rster?=)
Date: Mon, 11 Sep 2006 10:40:32 +0200
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
In-Reply-To: <4505032B.3050706@voltaire.com>
References: <200609071902.57379.toralf.foerster@gmx.de>
	<200609101645.22695.toralf.foerster@gmx.de>
	<4505032B.3050706@voltaire.com>
Message-ID: <200609111040.36277.toralf.foerster@gmx.de>

Ah, thanks for clarifying this.

Unfortunately this means that there is a small chance that "make oldconfig" will not
work correctly in all cases; e.g. upgrading a kernel to a newer version could result in
such a failed compile step :-(

On Monday 11 September 2006 08:33, Erez Zilber wrote:
> Toralf Förster wrote:
> > You're right, sorry,
> >
> > I forgot to say that:
> > - first I created a random .config file using "make randconfig"
> > - after that I removed all options not fitting my system
> > - then I prepended some options commonly used by me on top of the .config file
> > - and finally ran "make oldconfig" to (hopefully) get a clean config
> >
> > Doesn't "make oldconfig" make a clean .config file ?
> >   
> >
> Here's what the kernel's README file has to say about it:
> "make oldconfig": Default all questions based on the contents of your 
> existing ./.config file and asking about new config symbols.
> 
> I guess that 'make randconfig' selected CONFIG_INFINIBAND=y, but didn't 
> select CONFIG_INFINIBAND_ADDR_TRANS=y. Then, 'make oldconfig' asked you 
> about new symbols. I guess that running 'make randconfig' may create 
> scenarios like this, but I don't think that there's a bug in iSER's 
> Kconfig file. If you still want to use your .config file, reselect 
> InfiniBand in 'make menuconfig'. It will set CONFIG_INFINIBAND_ADDR_TRANS=y.
> 
> I hope this helps.
> 
> Erez
> 
> 

-- 
MfG/Sincerely
Toralf Förster

From mst at mellanox.co.il  Mon Sep 11 02:07:54 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Sep 2006 12:07:54 +0300
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060911083655.6871.qmail@web36911.mail.mud.yahoo.com>
References: <20060911075038.GC10024@mellanox.co.il>
	<20060911083655.6871.qmail@web36911.mail.mud.yahoo.com>
Message-ID: <20060911090754.GA10480@mellanox.co.il>

Quoting r. zhu shi song <zhushisongzhu at yahoo.com>:
> Subject: Re: why sdp connections cost so much memory
> 
> 
> 
> --- "Michael S. Tsirkin" <mst at mellanox.co.il> wrote:
> 
> > Quoting r. zhu shi song <zhushisongzhu at yahoo.com>:
> > > Subject: Re: why sdp connections cost so much
> > memory
> > > 
> > > >> You mean - when only a single socket is open?
> > > Every one connection will cost 2M RAM. So I make
> > the
> > > following changes:
> > > #define SDP_TX_SIZE 0x4
> > > #define SDP_RX_SIZE 0x4
> > 
> > You should not need this change with the scale patch
> > I posted - after applying
> > this, and setting the scale parameter to 0x1, each
> > connection should use around
> > 128K for RX. Please confirm.

Could you please confirm that setting scale factor to 1 works for you,
without changing SDP_TX_SIZE/SDP_RX_SIZE?

> can each connection use 64K  memory?

SDP_MAX_SEND_SKB_FRAGS controls the number of pages per descriptor.
You need at least 4 of these.
I have it at 8 at the moment, try scaling it down.
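
As a rough back-of-the-envelope check, buffer memory per ring scales as
descriptors x pages per descriptor x page size. The sketch below is only
an illustration: ring_bytes() is a hypothetical helper, and the ring depth
of 4 descriptors and the 4K page size are assumptions, not values taken
from the SDP source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: estimate the buffer memory held by one ring.
 * descriptors    - number of posted descriptors (assumed, not from the SDP code)
 * pages_per_desc - pages per descriptor, the role SDP_MAX_SEND_SKB_FRAGS plays
 * page_size      - system page size in bytes (assumed 4096 in the examples)
 */
static size_t ring_bytes(size_t descriptors, size_t pages_per_desc,
			 size_t page_size)
{
	assert(page_size > 0);
	return descriptors * pages_per_desc * page_size;
}
```

Under these assumptions, ring_bytes(4, 8, 4096) is 128K and
ring_bytes(4, 4, 4096) is 64K, which is why lowering the pages per
descriptor from 8 to 4 would halve the figure.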

-- 
MST


From erezz at voltaire.com  Mon Sep 11 02:17:36 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 11 Sep 2006 12:17:36 +0300 (IDT)
Subject: [openib-general] [PATCH 0/5] IB/iser: iSER code changes for 2.6.19
Message-ID: <Pine.LNX.4.44.0609111216090.10648-100000@hydrus>

Here is a series of patches that fix some bugs that were found in iSER 
during testing (some were found while testing iSER on architectures 
like ia64). All of them are related to memory registration.

Thanks
Erez


From erezz at voltaire.com  Mon Sep 11 02:19:17 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 11 Sep 2006 12:19:17 +0300 (IDT)
Subject: [openib-general] [PATCH 1/5] IB/iser: fix a check of SG alignment
	for RDMA
In-Reply-To: <Pine.LNX.4.44.0609111216090.10648-100000@hydrus>
Message-ID: <Pine.LNX.4.44.0609111218080.10648-100000@hydrus>

dma mapping may include a "compaction" of the sg associated with scsi command.
Hence, the size of the maximal prefix of the SG which is aligned for rdma must be
compared against the length of the dma mapped sg (mem->dma_nents) and not against
the size of it before it was mapped (mem->size).
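
To illustrate why the comparison has to use the mapped entry count, here
is a simplified userspace model. It is not the iSER code: struct dma_ent,
ALIGN_UNIT and both functions are hypothetical stand-ins, and the
alignment rule is reduced to "every entry but the first starts on a
boundary, every entry but the last ends on one".

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_UNIT 4096u /* assumed alignment unit for this sketch */

struct dma_ent {         /* hypothetical stand-in for a dma-mapped sg entry */
	uint64_t addr;
	uint32_t len;
};

/* Length of the longest prefix of entries that can form one region. */
static int aligned_prefix_len(const struct dma_ent *e, int nents)
{
	int i;

	assert(nents >= 0);
	for (i = 0; i < nents; i++) {
		int start_ok = (i == 0) || (e[i].addr % ALIGN_UNIT == 0);
		int end_ok = (i == nents - 1) ||
			     ((e[i].addr + e[i].len) % ALIGN_UNIT == 0);

		if (!start_ok)
			return i;	/* entry i cannot join the prefix */
		if (!end_ok)
			return i + 1;	/* entry i must be the last one */
	}
	return nents;
}

/* Compare against the number of entries after dma mapping. */
static int is_rdma_aligned(const struct dma_ent *e, int dma_nents)
{
	return aligned_prefix_len(e, dma_nents) == dma_nents;
}
```

If an sg of 4 elements is compacted into 2 dma entries, the prefix length
can be at most 2, so comparing it against the pre-mapping size of 4 would
report a violation even when both mapped entries are properly aligned.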

Signed-off-by: Erez Zilber <erezz at voltaire.com>

---

 drivers/infiniband/ulp/iser/iser_memory.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

5301a4bb4f73250a93bc0c103839ae527f6b4110
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 31950a5..53af956 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -378,7 +378,7 @@ int iser_reg_rdma_mem(struct iscsi_iser_
 	regd_buf = &iser_ctask->rdma_regd[cmd_dir];
 
 	aligned_len = iser_data_buf_aligned_len(mem);
-	if (aligned_len != mem->size) {
+	if (aligned_len != mem->dma_nents) {
 		iser_err("rdma alignment violation %d/%d aligned\n",
 			 aligned_len, mem->size);
 		iser_data_buf_dump(mem);
-- 
1.2.6


From erezz at voltaire.com  Mon Sep 11 02:20:54 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 11 Sep 2006 12:20:54 +0300 (IDT)
Subject: [openib-general] [PATCH 2/5] IB/iser: Limit the max size of a scsi
	command
In-Reply-To: <Pine.LNX.4.44.0609111216090.10648-100000@hydrus>
Message-ID: <Pine.LNX.4.44.0609111219580.10648-100000@hydrus>

Currently, the data length of a command coming down from scsi-ml
is limited only by the size of its sg list (sg_tablesize). The
max data length may be different for different page size values.
By setting max_sectors, we limit the data length to
max_sectors*512 bytes.
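
The arithmetic can be sketched as follows. Both helpers are hypothetical,
and the sg-only bound assumes, purely for illustration, that each sg
entry may cover up to one full system page.

```c
#include <assert.h>

#define SECTOR_SIZE 512u /* bytes per sector, as in the description above */

/* Upper bound on the data length imposed by max_sectors. */
static unsigned long max_bytes_from_sectors(unsigned int max_sectors)
{
	return (unsigned long)max_sectors * SECTOR_SIZE;
}

/* Upper bound imposed by the sg list alone (assumption: one page per entry). */
static unsigned long max_bytes_from_sg(unsigned int sg_tablesize,
				       unsigned long page_size)
{
	assert(page_size > 0);
	return sg_tablesize * page_size;
}
```

With max_sectors = 1024 the cap is 0x80000 bytes (512KB) regardless of
page size, whereas a 128-entry sg list (an assumed value) would allow
512KB with 4K pages but 2MB with 16K pages.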

Signed-off-by: Erez Zilber <erezz at voltaire.com>

---

 drivers/infiniband/ulp/iser/iscsi_iser.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

522817c2dbb865c98465f3d17978dbdc8c4ff100
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 101e407..2a14fe2 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -545,6 +545,7 @@ static struct scsi_host_template iscsi_i
 	.queuecommand           = iscsi_queuecommand,
 	.can_queue		= ISCSI_XMIT_CMDS_MAX - 1,
 	.sg_tablesize           = ISCSI_ISER_SG_TABLESIZE,
+	.max_sectors		= 1024,
 	.cmd_per_lun            = ISCSI_MAX_CMD_PER_LUN,
 	.eh_abort_handler       = iscsi_eh_abort,
 	.eh_host_reset_handler	= iscsi_eh_host_reset,
-- 
1.2.6


From erezz at voltaire.com  Mon Sep 11 02:22:30 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 11 Sep 2006 12:22:30 +0300 (IDT)
Subject: [openib-general] [PATCH 3/5] IB/iser: make FMR "page size" be 4K
 and not PAGE_SIZE
In-Reply-To: <Pine.LNX.4.44.0609111216090.10648-100000@hydrus>
Message-ID: <Pine.LNX.4.44.0609111221240.10648-100000@hydrus>

As iser is able to use at most one rdma operation for the
execution of a scsi command, and registration of the sg
associated with scsi command has its restrictions, the code
checks if an sg is "aligned for rdma".

Alignment for rdma is measured in "fmr page" units whose
possible resolutions differ between HCAs and can be
smaller than, equal to, or bigger than the system page size.

When the system page size is bigger than 4KB (e.g. the default
with ia64 kernels), there is a bigger chance that an sg would be
aligned for rdma if the fmr page size is 4KB.
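
A small illustration of that point, using a hypothetical is_aligned()
helper rather than anything from the driver: an address can sit on a 4KB
boundary without sitting on a 16KB one.

```c
#include <assert.h>
#include <stdint.h>

/* True if addr is a multiple of 2^shift bytes. */
static int is_aligned(uint64_t addr, unsigned int shift)
{
	assert(shift < 64);
	return (addr & ((UINT64_C(1) << shift) - 1)) == 0;
}
```

For example, 0x3000 passes is_aligned(addr, 12) but fails
is_aligned(addr, 14), so an sg element starting there is acceptable with
4KB fmr pages and rejected with 16KB ones.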

Change the code to create FMR whose pages are of size 4KB
and to take that into account when processing the sg.
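
The following userspace sketch shows what "pages of size 4KB" means for
one contiguous dma range. The macro names mirror the ones in the patch,
but fill_pages_4k() is a hypothetical helper written for illustration,
not the driver's iser_sg_to_page_vec().

```c
#include <assert.h>
#include <stdint.h>

#define SHIFT_4K 12
#define SIZE_4K  (UINT64_C(1) << SHIFT_4K)
#define MASK_4K  (~(SIZE_4K - 1))

/* Fill pages[] with the 4K-aligned addresses covering [addr, addr + len).
 * Returns the number of pages written; *offset receives the offset of
 * addr inside its first 4K page. max_pages bounds the output array. */
static int fill_pages_4k(uint64_t addr, uint64_t len, uint64_t *pages,
			 int max_pages, uint64_t *offset)
{
	uint64_t page, end = addr + len;
	int n = 0;

	assert(pages && offset && max_pages > 0);
	*offset = addr & ~MASK_4K;
	for (page = addr & MASK_4K; page < end && n < max_pages; page += SIZE_4K)
		pages[n++] = page;
	return n;
}
```

A range of 0x2000 bytes starting at 0x10200 needs three 4K pages
(0x10000, 0x11000, 0x12000) with an offset of 0x200, which is the "N+1
pages" case for buffers that do not start and end on a page boundary.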

Signed-off-by: Erez Zilber <erezz at voltaire.com>

---

 drivers/infiniband/ulp/iser/iscsi_iser.h  |    6 +++++-
 drivers/infiniband/ulp/iser/iser_memory.c |   31 +++++++++++++++++++----------
 drivers/infiniband/ulp/iser/iser_verbs.c  |    4 ++--
 3 files changed, 27 insertions(+), 14 deletions(-)

1f90243f796772fcaea6ad059876a0aad6a06d52
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 7c3d0c9..2c8bc67 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -82,8 +82,12 @@
 		       __func__ , ## arg);		\
 	} while (0)
 
+#define SHIFT_4K	12
+#define SIZE_4K	(1UL << SHIFT_4K)
+#define MASK_4K	(~(SIZE_4K-1))
+
 					/* support upto 512KB in one RDMA */
-#define ISCSI_ISER_SG_TABLESIZE         (0x80000 >> PAGE_SHIFT)
+#define ISCSI_ISER_SG_TABLESIZE         (0x80000 >> SHIFT_4K)
 #define ISCSI_ISER_MAX_LUN		256
 #define ISCSI_ISER_MAX_CMD_LEN		16
 
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 53af956..bcef0d3 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -42,6 +42,7 @@
 #include "iscsi_iser.h"
 
 #define ISER_KMALLOC_THRESHOLD 0x20000 /* 128K - kmalloc limit */
+
 /**
  * Decrements the reference count for the
  * registered buffer & releases it
@@ -239,7 +240,7 @@ static int iser_sg_to_page_vec(struct is
 	int i;
 
 	/* compute the offset of first element */
-	page_vec->offset = (u64) sg[0].offset;
+	page_vec->offset = (u64) sg[0].offset & ~MASK_4K;
 
 	for (i = 0; i < data->dma_nents; i++) {
 		total_sz += sg_dma_len(&sg[i]);
@@ -247,21 +248,30 @@ static int iser_sg_to_page_vec(struct is
 		first_addr = sg_dma_address(&sg[i]);
 		last_addr  = first_addr + sg_dma_len(&sg[i]);
 
-		start_aligned = !(first_addr & ~PAGE_MASK);
-		end_aligned   = !(last_addr  & ~PAGE_MASK);
+		start_aligned = !(first_addr & ~MASK_4K);
+		end_aligned   = !(last_addr  & ~MASK_4K);
 
 		/* continue to collect page fragments till aligned or SG ends */
 		while (!end_aligned && (i + 1 < data->dma_nents)) {
 			i++;
 			total_sz += sg_dma_len(&sg[i]);
 			last_addr = sg_dma_address(&sg[i]) + sg_dma_len(&sg[i]);
-			end_aligned = !(last_addr  & ~PAGE_MASK);
+			end_aligned = !(last_addr  & ~MASK_4K);
 		}
 
-		first_addr = first_addr & PAGE_MASK;
-
-		for (page = first_addr; page < last_addr; page += PAGE_SIZE)
-			page_vec->pages[cur_page++] = page;
+		/* handle the 1st page in the 1st DMA element */
+		if (cur_page == 0) {
+			page = first_addr & MASK_4K;
+			page_vec->pages[cur_page] = page;
+			cur_page++;
+			page += SIZE_4K;
+		} else
+			page = first_addr;
+
+		for (; page < last_addr; page += SIZE_4K) {
+			page_vec->pages[cur_page] = page;
+			cur_page++;
+		}
 
 	}
 	page_vec->data_size = total_sz;
@@ -269,8 +279,7 @@ static int iser_sg_to_page_vec(struct is
 	return cur_page;
 }
 
-#define MASK_4K			((1UL << 12) - 1) /* 0xFFF */
-#define IS_4K_ALIGNED(addr)	((((unsigned long)addr) & MASK_4K) == 0)
+#define IS_4K_ALIGNED(addr)	((((unsigned long)addr) & ~MASK_4K) == 0)
 
 /**
  * iser_data_buf_aligned_len - Tries to determine the maximal correctly aligned
@@ -352,7 +361,7 @@ static void iser_page_vec_build(struct i
 
 	page_vec->length = page_vec_len;
 
-	if (page_vec_len * PAGE_SIZE < page_vec->data_size) {
+	if (page_vec_len * SIZE_4K < page_vec->data_size) {
 		iser_err("page_vec too short to hold this SG\n");
 		iser_data_buf_dump(data);
 		iser_dump_page_vec(page_vec);
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 72febf1..9b27a7c 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -150,7 +150,7 @@ static int iser_create_ib_conn_res(struc
 	}
 	ib_conn->page_vec->pages = (u64 *) (ib_conn->page_vec + 1);
 
-	params.page_shift        = PAGE_SHIFT;
+	params.page_shift        = SHIFT_4K;
 	/* when the first/last SG element are not start/end *
 	 * page aligned, the map whould be of N+1 pages     */
 	params.max_pages_per_fmr = ISCSI_ISER_SG_TABLESIZE + 1;
@@ -604,7 +604,7 @@ int iser_reg_page_vec(struct iser_conn  
 
 	mem_reg->lkey  = mem->fmr->lkey;
 	mem_reg->rkey  = mem->fmr->rkey;
-	mem_reg->len   = page_vec->length * PAGE_SIZE;
+	mem_reg->len   = page_vec->length * SIZE_4K;
 	mem_reg->va    = io_addr;
 	mem_reg->mem_h = (void *)mem;
 
-- 
1.2.6


From erezz at voltaire.com  Mon Sep 11 02:24:00 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 11 Sep 2006 12:24:00 +0300 (IDT)
Subject: [openib-general] [PATCH 4/5] IB/iser: fix some debug prints
In-Reply-To: <Pine.LNX.4.44.0609111216090.10648-100000@hydrus>
Message-ID: <Pine.LNX.4.44.0609111223000.10648-100000@hydrus>

fix and add some debug prints related to iser
handling of memory for rdma.

Signed-off-by: Erez Zilber <erezz at voltaire.com>

---

 drivers/infiniband/ulp/iser/iser_memory.c |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

00703cf2800ce3ac864b149ce75435b00480d9d2
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index bcef0d3..8fea0bc 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -329,9 +329,9 @@ static void iser_data_buf_dump(struct is
 	struct scatterlist *sg = (struct scatterlist *)data->buf;
 	int i;
 
-	for (i = 0; i < data->size; i++)
+	for (i = 0; i < data->dma_nents; i++)
 		iser_err("sg[%d] dma_addr:0x%lX page:0x%p "
-			 "off:%d sz:%d dma_len:%d\n",
+			 "off:0x%x sz:0x%x dma_len:0x%x\n",
 			 i, (unsigned long)sg_dma_address(&sg[i]),
 			 sg[i].page, sg[i].offset,
 			 sg[i].length,sg_dma_len(&sg[i]));
@@ -383,6 +383,7 @@ int iser_reg_rdma_mem(struct iscsi_iser_
 	struct iser_regd_buf *regd_buf;
 	int aligned_len;
 	int err;
+	int i;
 
 	regd_buf = &iser_ctask->rdma_regd[cmd_dir];
 
@@ -400,8 +401,18 @@ int iser_reg_rdma_mem(struct iscsi_iser_
 
 	iser_page_vec_build(mem, ib_conn->page_vec);
 	err = iser_reg_page_vec(ib_conn, ib_conn->page_vec, &regd_buf->reg);
-	if (err)
+	if (err) {
+		iser_data_buf_dump(mem);
+		iser_err("mem->dma_nents = %d (dlength = 0x%x)\n", mem->dma_nents,
+			 ntoh24(iser_ctask->desc.iscsi_header.dlength));
+		iser_err("page_vec: data_size = 0x%x, length = %d, offset = 0x%x\n",
+			 ib_conn->page_vec->data_size, ib_conn->page_vec->length,
+			 ib_conn->page_vec->offset);
+		for (i=0 ; i<ib_conn->page_vec->length ; i++) {
+			iser_err("page_vec[%d] = 0x%lx\n", i, ib_conn->page_vec->pages[i]);
+		}
 		return err;
+	}
 
 	/* take a reference on this regd buf such that it will not be released *
 	 * (eg in send dto completion) before we get the scsi response         */
-- 
1.2.6


From erezz at voltaire.com  Mon Sep 11 02:26:33 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 11 Sep 2006 12:26:33 +0300 (IDT)
Subject: [openib-general] [PATCH 5/5] IB/iser: Do not use FMR for a single
	dma entry sg
In-Reply-To: <Pine.LNX.4.44.0609111216090.10648-100000@hydrus>
Message-ID: <Pine.LNX.4.44.0609111225020.10648-100000@hydrus>

Fast Memory Registration (fmr) is used to register for rdma an sg whose
elements are not linearly sequential after dma mapping.

The IB verbs layer provides an "all dma memory MR (memory region)" which
can be used for RDMA-ing a dma linearly sequential buffer.

Change the code to use the dma mr instead of doing fmr when dma mapping
produces a single dma entry sg.
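
The decision can be modelled in userspace as below. The structs and
try_dma_mr() are hypothetical stand-ins loosely based on the fields the
patch touches; they are a sketch of the idea, not the driver code.

```c
#include <assert.h>
#include <stdint.h>

struct mem_reg {        /* hypothetical, loosely modelled on iser_mem_reg */
	uint32_t lkey, rkey;
	uint64_t va, len;
	int is_fmr;     /* 1: must be unregistered later, 0: dma mr */
};

struct dma_ent {        /* hypothetical dma-mapped sg entry */
	uint64_t addr;
	uint32_t len;
};

/* Returns 1 if the buffer was described with the all-memory dma mr,
 * 0 if the caller still has to go through FMR registration. */
static int try_dma_mr(const struct dma_ent *e, int dma_nents,
		      uint32_t mr_lkey, uint32_t mr_rkey, struct mem_reg *reg)
{
	assert(e && reg && dma_nents > 0);
	if (dma_nents != 1)
		return 0;
	reg->lkey = mr_lkey;
	reg->rkey = mr_rkey;
	reg->va = e[0].addr;
	reg->len = e[0].len;
	reg->is_fmr = 0;	/* nothing to unregister on release */
	return 1;
}
```

Tracking this with an explicit is_fmr flag instead of testing rkey != 0
matters because the dma mr path now hands out a real, non-zero rkey.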

Signed-off-by: Erez Zilber <erezz at voltaire.com>

---

 drivers/infiniband/ulp/iser/iscsi_iser.h  |    1 +
 drivers/infiniband/ulp/iser/iser_memory.c |   48 +++++++++++++++++++++--------
 drivers/infiniband/ulp/iser/iser_verbs.c  |    6 ++--
 3 files changed, 39 insertions(+), 16 deletions(-)

c403e930977afb2838588523d10819ce586951a2
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 2c8bc67..2cf9ae0 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -175,6 +175,7 @@ struct iser_mem_reg {
 	u64  va;
 	u64  len;
 	void *mem_h;
+	int  is_fmr;
 };
 
 struct iser_regd_buf {
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 8fea0bc..e0d4347 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -56,7 +56,7 @@ int iser_regd_buff_release(struct iser_r
 	if ((atomic_read(&regd_buf->ref_count) == 0) ||
 	    atomic_dec_and_test(&regd_buf->ref_count)) {
 		/* if we used the dma mr, unreg is just NOP */
-		if (regd_buf->reg.rkey != 0)
+		if (regd_buf->reg.is_fmr)
 			iser_unreg_mem(&regd_buf->reg);
 
 		if (regd_buf->dma_addr) {
@@ -91,9 +91,9 @@ void iser_reg_single(struct iser_device 
 	BUG_ON(dma_mapping_error(dma_addr));
 
 	regd_buf->reg.lkey = device->mr->lkey;
-	regd_buf->reg.rkey = 0; /* indicate there's no need to unreg */
 	regd_buf->reg.len  = regd_buf->data_size;
 	regd_buf->reg.va   = dma_addr;
+	regd_buf->reg.is_fmr = 0;
 
 	regd_buf->dma_addr  = dma_addr;
 	regd_buf->direction = direction;
@@ -379,11 +379,13 @@ int iser_reg_rdma_mem(struct iscsi_iser_
 		      enum   iser_data_dir        cmd_dir)
 {
 	struct iser_conn     *ib_conn = iser_ctask->iser_conn->ib_conn;
+	struct iser_device   *device = ib_conn->device;
 	struct iser_data_buf *mem = &iser_ctask->data[cmd_dir];
 	struct iser_regd_buf *regd_buf;
 	int aligned_len;
 	int err;
 	int i;
+	struct scatterlist *sg;
 
 	regd_buf = &iser_ctask->rdma_regd[cmd_dir];
 
@@ -399,19 +401,37 @@ int iser_reg_rdma_mem(struct iscsi_iser_
 		mem = &iser_ctask->data_copy[cmd_dir];
 	}
 
-	iser_page_vec_build(mem, ib_conn->page_vec);
-	err = iser_reg_page_vec(ib_conn, ib_conn->page_vec, &regd_buf->reg);
-	if (err) {
-		iser_data_buf_dump(mem);
-		iser_err("mem->dma_nents = %d (dlength = 0x%x)\n", mem->dma_nents,
-			 ntoh24(iser_ctask->desc.iscsi_header.dlength));
-		iser_err("page_vec: data_size = 0x%x, length = %d, offset = 0x%x\n",
-			 ib_conn->page_vec->data_size, ib_conn->page_vec->length,
-			 ib_conn->page_vec->offset);
-		for (i=0 ; i<ib_conn->page_vec->length ; i++) {
-			iser_err("page_vec[%d] = 0x%lx\n", i, ib_conn->page_vec->pages[i]);
+	/* if there is a single dma entry, FMR is not needed */
+	if (mem->dma_nents == 1) {
+		sg = (struct scatterlist *)mem->buf;
+
+		regd_buf->reg.lkey = device->mr->lkey;
+		regd_buf->reg.rkey = device->mr->rkey;
+		regd_buf->reg.len  = sg_dma_len(&sg[0]);
+		regd_buf->reg.va   = sg_dma_address(&sg[0]);
+		regd_buf->reg.is_fmr = 0;
+
+		iser_dbg("PHYSICAL Mem.register: lkey: 0x%08X rkey: 0x%08X  "
+			 "va: 0x%08lX sz: %ld]\n",
+			 (unsigned int)regd_buf->reg.lkey,
+			 (unsigned int)regd_buf->reg.rkey,
+			 (unsigned long)regd_buf->reg.va,
+			 (unsigned long)regd_buf->reg.len);
+	} else { /* use FMR for multiple dma entries */
+		iser_page_vec_build(mem, ib_conn->page_vec);
+		err = iser_reg_page_vec(ib_conn, ib_conn->page_vec, &regd_buf->reg);
+		if (err) {
+			iser_data_buf_dump(mem);
+			iser_err("mem->dma_nents = %d (dlength = 0x%x)\n", mem->dma_nents,
+				 ntoh24(iser_ctask->desc.iscsi_header.dlength));
+			iser_err("page_vec: data_size = 0x%x, length = %d, offset = 0x%x\n",
+				 ib_conn->page_vec->data_size, ib_conn->page_vec->length,
+				 ib_conn->page_vec->offset);
+			for (i=0 ; i<ib_conn->page_vec->length ; i++) {
+				iser_err("page_vec[%d] = 0x%lx\n", i, ib_conn->page_vec->pages[i]);
+			}
+			return err;
 		}
-		return err;
 	}
 
 	/* take a reference on this regd buf such that it will not be released *
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 9b27a7c..ecdca7f 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -88,8 +88,9 @@ static int iser_create_device_ib_res(str
 		     iser_cq_tasklet_fn,
 		     (unsigned long)device);
 
-	device->mr = ib_get_dma_mr(device->pd,
-				   IB_ACCESS_LOCAL_WRITE);
+	device->mr = ib_get_dma_mr(device->pd, IB_ACCESS_LOCAL_WRITE |
+				   IB_ACCESS_REMOTE_WRITE |
+				   IB_ACCESS_REMOTE_READ);
 	if (IS_ERR(device->mr))
 		goto dma_mr_err;
 
@@ -606,6 +607,7 @@ int iser_reg_page_vec(struct iser_conn  
 	mem_reg->rkey  = mem->fmr->rkey;
 	mem_reg->len   = page_vec->length * SIZE_4K;
 	mem_reg->va    = io_addr;
+	mem_reg->is_fmr = 1;
 	mem_reg->mem_h = (void *)mem;
 
 	mem_reg->va   += page_vec->offset;
-- 
1.2.6


From mst at mellanox.co.il  Mon Sep 11 02:51:19 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Sep 2006 12:51:19 +0300
Subject: [openib-general] [PATCH] ehca for OFED 1.1-rc4
In-Reply-To: <OF5E56DFB2.C0CCE0F2-ONC12571E6.002E3D28-C12571E6.002E6F7F@de.ibm.com>
References: <20060910093747.GA11625@mellanox.co.il>
	<OF5E56DFB2.C0CCE0F2-ONC12571E6.002E3D28-C12571E6.002E6F7F@de.ibm.com>
Message-ID: <20060911095119.GA11825@mellanox.co.il>

Quoting r. Hoang-Nam Nguyen <HNGUYEN at de.ibm.com>:
> Subject: Re: [PATCH] ehca for OFED 1.1-rc4
> 
> I guess my email client must have wrapped the lines so that the patch is
> not applicable any more. Sorry for that!

You also want to fix the mail format - you currently send each mail
in both HTML and plain text - make it plaintext only.

> Need little time to fix that problem. For now I'm sending you the patch
> file as attachment that I could apply without errors.
> Thanks,
> Nam Nguyen
> 
> (See attached file: ofed_svnehca_0015.patch)

OK, applied and will be pushed out.

-- 
MST


From zhushisongzhu at yahoo.com  Mon Sep 11 02:53:11 2006
From: zhushisongzhu at yahoo.com (zhu shi song)
Date: Mon, 11 Sep 2006 02:53:11 -0700 (PDT)
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060911075038.GC10024@mellanox.co.il>
Message-ID: <20060911095311.71272.qmail@web36909.mail.mud.yahoo.com>

> You should not need this change with the scale patch
> I posted - after applying
> this, and setting the scale parameter to 0x1, each
> connection should use around
> 128K for RX. Please confirm.
I have tested it again and you are right: with just
the scale parameter set to 0x1, each connection costs
about 128K of memory.

  zhu
 



From halr at voltaire.com  Mon Sep 11 02:49:37 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 11 Sep 2006 05:49:37 -0400
Subject: [openib-general] [PATCH] osm: OSM bug fix with --run-once option
In-Reply-To: <1157870154.29270.42.camel@kliteynik.yok.mtl.com>
References: <1157870154.29270.42.camel@kliteynik.yok.mtl.com>
Message-ID: <1157968170.27427.97391.camel@hal.voltaire.com>

Hi Yevgeny,

On Sun, 2006-09-10 at 02:35, Yevgeny Kliteynik wrote:
> Hi Hal
> 
> This patch fixes the bug that was occurring when OSM was 
> running with --run-once option (-o) and the SM port was down.
> In that case, OSM would be stuck in cond_wait forever (or until
> the port will become active), and could not be terminated, 
> other than by SIGKILL.
> 
> Yevgeny
> 
> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Thanks. Applied to trunk only.

-- Hal


From zhushisongzhu at yahoo.com  Mon Sep 11 02:59:41 2006
From: zhushisongzhu at yahoo.com (zhu shi song)
Date: Mon, 11 Sep 2006 02:59:41 -0700 (PDT)
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060911075038.GC10024@mellanox.co.il>
Message-ID: <20060911095941.87733.qmail@web36903.mail.mud.yahoo.com>

> You should not need this change with the scale patch
> I posted - after applying
> this, and setting the scale parameter to 0x1, each
> connection should use around
> 128K for RX. Please confirm.
With just the scale parameter set to 0x1, the memory
reduction is OK. But one bug occurred: sometimes my
kernel crashed. So I think the PRE POST buf size
should be changed as well.
  zhu



From mst at mellanox.co.il  Mon Sep 11 04:05:24 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Sep 2006 14:05:24 +0300
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060911095941.87733.qmail@web36903.mail.mud.yahoo.com>
References: <20060911095941.87733.qmail@web36903.mail.mud.yahoo.com>
Message-ID: <20060911110524.GB11825@mellanox.co.il>

Quoting r. zhu shi song <zhushisongzhu at yahoo.com>:
> Subject: Re: why sdp connections cost so much memory
> 
> > You should not need this change with the scale patch
> > I posted - after applying
> > this, and setting the scale parameter to 0x1, each
> > connection should use around
> > 128K for RX. Please confirm.
> With just the scale parameter set to 0x1, the memory
> reduction is OK. But one bug occurred: sometimes my
> kernel crashed.

Shouldn't happen. Backtrace?

> So I think the PRE POST buf size
> should be changed as well.
>   zhu

Hmm. I don't really see how this would help.
Is it true that changing just the RX size fixes the crashes for you?
If yes I'd like to investigate.

-- 
MST


From johnt1johnt2 at gmail.com  Mon Sep 11 05:18:40 2006
From: johnt1johnt2 at gmail.com (john t)
Date: Mon, 11 Sep 2006 17:48:40 +0530
Subject: [openib-general] kernel mode
Message-ID: <a94efc20609110518s74f4d159s32f7c10c74279819@mail.gmail.com>

Hi,

A general question.

If I write a kernel program (linux kernel module) to send and receive data
using IB, will it perform better than its user mode counterpart? Unlike user
mode, in kernel mode I think it is possible to allocate physically
contiguous memory using "kmalloc or alloc_pages", which means HCAs need not
do any address translation (i.e. no need for a page table lookup, as I guess in
this case the virtual address and physical address will differ only by a fixed
offset) for copying data into main memory. Besides, I think traditional DMAs
give better performance with contiguous memory and use a special GFP_DMA
zone. Moreover, polling a CQ may be more efficient in the kernel. Is this
correct?

Regards,
John T.

From halr at voltaire.com  Mon Sep 11 06:23:37 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 11 Sep 2006 09:23:37 -0400
Subject: [openib-general] [PATCH][TRIVIAL] OpenSM: Change QoS syntax for CA
	ports
Message-ID: <1157981006.27427.104217.camel@hal.voltaire.com>

OpenSM: Change QoS syntax for CA ports

Change names from hca_ to ca_ to make it clearer that these are for both
HCAs and TCAs.
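
For readers unfamiliar with the "qos_<type>_" naming, the sketch below
shows how a prefixed key such as qos_ca_sl2vl splits into a parameter set
and an option name. qos_option_name() is a hypothetical helper written
for this note; it is not the parser used by OpenSM.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: if key starts with "<prefix>_", return a pointer to
 * the option name that follows; otherwise return NULL. */
static const char *qos_option_name(const char *prefix, const char *key)
{
	size_t n;

	assert(prefix && key);
	n = strlen(prefix);
	if (strncmp(key, prefix, n) != 0 || key[n] != '_')
		return NULL;
	return key + n + 1;
}
```

With the new naming, the key qos_ca_sl2vl matches the prefix "qos_ca" and
yields "sl2vl", while it no longer matches the old "qos_hca" prefix, so
existing config files using qos_hca_ keys would need to be updated.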

Signed-off-by: Hal Rosenstock <halr at voltaire.com>

Index: doc/qos-config.txt
===================================================================
--- doc/qos-config.txt	(revision 9347)
+++ doc/qos-config.txt	(working copy)
@@ -28,11 +28,11 @@ values may be stored in OpenSM config fi
 
 In addition to the above, we may define separate QoS configuration
 parameters sets for various target types. As targets, we currently support
-HCA, routers, switch external ports, and switch's enhanced port 0. The
+CAs, routers, switch external ports, and switch's enhanced port 0. The
 names of such specialized parameters are prefixed by "qos_<type>_"
 string. Here is a full list of the currently supported sets:
 
-  qos_hca_ - QoS configuration parameters set for HCAs.
+  qos_ca_  - QoS configuration parameters set for CAs.
   qos_rtr_ - parameters set for routers.
   qos_sw0_ - parameters set for switches' port 0.
   qos_swe_ - parameters set for switches' external ports.
@@ -40,5 +40,5 @@ string. Here is a full list of the curre
 Examples:
 
   qos_sw0_max_vls=2
-  qos_hca_sl2vl=0,1,2,3,5,5,5,12,12,0,
+  qos_ca_sl2vl=0,1,2,3,5,5,5,12,12,0,
   qos_swe_high_limit=0
Index: man/opensm.8
===================================================================
--- man/opensm.8	(revision 9347)
+++ man/opensm.8	(working copy)
@@ -1,4 +1,4 @@
-.TH OPENSM 8 "Setpember 6, 2006" "OpenIB" "OpenIB Management"
+.TH OPENSM 8 "September 11, 2006" "OpenIB" "OpenIB Management"
 
 .SH NAME
 opensm \- InfiniBand subnet manager and administration (SM/SA) 
@@ -365,18 +365,18 @@ values may be stored in OpenSM config fi
 
 In addition to the above, we may define separate QoS configuration
 parameters sets for various target types. As targets, we currently support
-HCA, routers, switch external ports, and switch's enhanced port 0. The
+CAs, routers, switch external ports, and switch's enhanced port 0. The
 names of such specialized parameters are prefixed by "qos_<type>_"
 string. Here is a full list of the currently supported sets:
 
- qos_hca_ - QoS configuration parameters set for HCAs.
+ qos_ca_  - QoS configuration parameters set for CAs.
  qos_rtr_ - parameters set for routers.
  qos_sw0_ - parameters set for switches' port 0.
  qos_swe_ - parameters set for switches' external ports.
 
 Examples:
  qos_sw0_max_vls=2
- qos_hca_sl2vl=0,1,2,3,5,5,5,12,12,0,
+ qos_ca_sl2vl=0,1,2,3,5,5,5,12,12,0,
  qos_swe_high_limit=0
 
 .SH ROUTING
Index: include/opensm/osm_subnet.h
===================================================================
--- include/opensm/osm_subnet.h	(revision 9351)
+++ include/opensm/osm_subnet.h	(working copy)
@@ -282,7 +282,7 @@ typedef struct _osm_subn_opt
   boolean_t                exit_on_fatal;
   boolean_t                honor_guid2lid_file;
   osm_qos_options_t        qos_options;
-  osm_qos_options_t        qos_hca_options;
+  osm_qos_options_t        qos_ca_options;
   osm_qos_options_t        qos_sw0_options;
   osm_qos_options_t        qos_swe_options;
   osm_qos_options_t        qos_rtr_options;
@@ -457,8 +457,8 @@ typedef struct _osm_subn_opt
 *	qos_options
 *		Default set of QoS options
 *
-*	qos_hca_options
-*		QoS options for HCA ports
+*	qos_ca_options
+*		QoS options for CA ports
 *
 *	qos_sw0_options
 *		QoS options for switches' port 0
Index: opensm/osm_subnet.c
===================================================================
--- opensm/osm_subnet.c	(revision 9351)
+++ opensm/osm_subnet.c	(working copy)
@@ -495,7 +495,7 @@ osm_subn_set_default_opt(
   p_opt->updn_guid_file = NULL;
   p_opt->exit_on_fatal = TRUE;
   subn_set_default_qos_options(&p_opt->qos_options);
-  subn_set_default_qos_options(&p_opt->qos_hca_options);
+  subn_set_default_qos_options(&p_opt->qos_ca_options);
   subn_set_default_qos_options(&p_opt->qos_sw0_options);
   subn_set_default_qos_options(&p_opt->qos_swe_options);
   subn_set_default_qos_options(&p_opt->qos_rtr_options);
@@ -737,8 +737,8 @@ osm_subn_rescan_conf_file(
       subn_parse_qos_options("qos",
         p_key, p_val, &p_opts->qos_options);
 
-      subn_parse_qos_options("qos_hca",
-        p_key, p_val, &p_opts->qos_hca_options);
+      subn_parse_qos_options("qos_ca",
+        p_key, p_val, &p_opts->qos_ca_options);
 
       subn_parse_qos_options("qos_sw0",
         p_key, p_val, &p_opts->qos_sw0_options);
@@ -967,8 +967,8 @@ osm_subn_parse_conf_file(
       subn_parse_qos_options("qos",
         p_key, p_val, &p_opts->qos_options);
 
-      subn_parse_qos_options("qos_hca",
-        p_key, p_val, &p_opts->qos_hca_options);
+      subn_parse_qos_options("qos_ca",
+        p_key, p_val, &p_opts->qos_ca_options);
 
       subn_parse_qos_options("qos_sw0",
         p_key, p_val, &p_opts->qos_sw0_options);
@@ -1211,7 +1211,7 @@ osm_subn_write_conf_file(
     "QoS default options", "qos", &p_opts->qos_options);
   fprintf(opts_file, "\n");
   subn_dump_qos_options(opts_file,
-    "QoS HCA options", "qos_hca", &p_opts->qos_hca_options);
+    "QoS CA options", "qos_ca", &p_opts->qos_ca_options);
   fprintf(opts_file, "\n");
   subn_dump_qos_options(opts_file,
     "QoS Switch Port 0 options", "qos_sw0", &p_opts->qos_sw0_options);
Index: opensm/osm_qos.c
===================================================================
--- opensm/osm_qos.c	(revision 9347)
+++ opensm/osm_qos.c	(working copy)
@@ -318,7 +318,7 @@ static ib_api_status_t qos_physp_setup(o
 
 osm_signal_t osm_qos_setup(osm_opensm_t * p_osm)
 {
-	struct qos_config hca_config, sw0_config, swe_config, rtr_config;
+	struct qos_config ca_config, sw0_config, swe_config, rtr_config;
 	struct qos_config *cfg;
 	osm_switch_t *p_sw;
 	ib_switch_info_t *p_si;
@@ -336,7 +336,7 @@ osm_signal_t osm_qos_setup(osm_opensm_t 
 
 	OSM_LOG_ENTER(&p_osm->log, osm_qos_setup);
 
-	qos_build_config(&hca_config, &p_osm->subn.opt.qos_hca_options,
+	qos_build_config(&ca_config, &p_osm->subn.opt.qos_ca_options,
 			 &p_osm->subn.opt.qos_options);
 	qos_build_config(&sw0_config, &p_osm->subn.opt.qos_sw0_options,
 			 &p_osm->subn.opt.qos_options);
@@ -376,7 +376,7 @@ osm_signal_t osm_qos_setup(osm_opensm_t 
 		else if (node_type == IB_NODE_TYPE_ROUTER)
 			cfg = &rtr_config;
 		else
-			cfg = &hca_config;
+			cfg = &ca_config;
 
 		p_physp = osm_port_get_default_phys_ptr(p_port);
 		if (!osm_physp_is_valid(p_physp))


From jim.ryan at intel.com  Mon Sep 11 07:22:28 2006
From: jim.ryan at intel.com (Ryan, Jim)
Date: Mon, 11 Sep 2006 07:22:28 -0700
Subject: [openib-general] [openfabrics-ewg] Goodbye and Transition
Message-ID: <E8A7BA95545A91438970202FFE79C023E353E8@orsmsx413.amr.corp.intel.com>

Shawn, thanks for the note and best of luck at Microsoft. I suggest we
take Shawn's recommendation and ask Jamie to continue Shawn's leadership
of the EWG.

Jim

-----Original Message-----
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Shawn Hansen
(shahanse)
Sent: Friday, September 08, 2006 5:29 PM
To: OpenFabricsEWG; openib-general at openib.org
Cc: Jamie Riotto (jriotto)
Subject: [openfabrics-ewg] Goodbye and Transition

All,

FYI: I've decided to relocate my family to Seattle, and will be leaving
Cisco.  I plan to join Microsoft's Server and Tools division at the end
of this month.

I would like to recommend Jamie Riotto, Senior Director of Engineering,
as my EWG replacement.  Jamie is responsible for all engineering for
Cisco's Server Networking and Virtualization Business Unit, including
Cisco's host driver and RDMA development efforts.

Please stay in touch, and I wish the team the best.

Regards,

--Shawn 
----------------------------
Shawn Hansen
Director, Product Management
Cisco Systems

_______________________________________________
openfabrics-ewg mailing list
openfabrics-ewg at openib.org
http://openib.org/mailman/listinfo/openfabrics-ewg


From rdreier at cisco.com  Mon Sep 11 07:37:00 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Sep 2006 07:37:00 -0700
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
In-Reply-To: <4505032B.3050706@voltaire.com> (Erez Zilber's message of
	"Mon, 11 Sep 2006 09:33:15 +0300")
References: <200609071902.57379.toralf.foerster@gmx.de>
	<200609101343.02740.toralf.foerster@gmx.de>
	<450401AE.2030606@voltaire.com>
	<200609101645.22695.toralf.foerster@gmx.de>
	<4505032B.3050706@voltaire.com>
Message-ID: <ada1wqi79mb.fsf@cisco.com>

There is definitely a bug in the drivers/infiniband/ulp/iser/Kconfig
file.  ISER only depends on INFINIBAND && SCSI.  However it is easily
possible to enable INFINIBAND and SCSI without enabling INET (in fact
they can be enabled without NET as in the original config in this thread).

iser does select SCSI_ISCSI_ATTRS, but without selecting NET that it
depends on, so this alone will result in a broken config.  However
nothing will enable INET (which I think you said iser depends on).  So
something like the below is required, I think.  Although it would
probably be better to make iser depend on INET (as ISCSI_TCP does)
rather than selecting NET and INET.

Toralf, can you confirm that applying this patch and doing make
oldconfig and make with your original config works OK?

Thanks,
  Roland

diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
index fead87d..a122bb4 100644
--- a/drivers/infiniband/ulp/iser/Kconfig
+++ b/drivers/infiniband/ulp/iser/Kconfig
@@ -1,6 +1,8 @@
 config INFINIBAND_ISER
 	tristate "ISCSI RDMA Protocol"
 	depends on INFINIBAND && SCSI
+	select NET
+	select INET
 	select SCSI_ISCSI_ATTRS
 	---help---
 	  Support for the ISCSI RDMA Protocol over InfiniBand.  This


From mst at mellanox.co.il  Mon Sep 11 07:44:38 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Sep 2006 17:44:38 +0300
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
In-Reply-To: <ada1wqi79mb.fsf@cisco.com>
References: <200609071902.57379.toralf.foerster@gmx.de>
	<200609101343.02740.toralf.foerster@gmx.de>
	<450401AE.2030606@voltaire.com>
	<200609101645.22695.toralf.foerster@gmx.de>
	<4505032B.3050706@voltaire.com> <ada1wqi79mb.fsf@cisco.com>
Message-ID: <20060911144438.GA13919@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: Fwd: linux- 2.6.18-rc6-git1 issue 46: drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to `rdma_create_id'
> 
> There is definitely a bug in the drivers/infiniband/ulp/iser/Kconfig
> file.  ISER only depends on INFINIBAND && SCSI.  However it is easily
> possible to enable INFINIBAND and SCSI without enabling INET (in fact
> they can be enabled without NET as in the original config in this thread).
> 
> iser does select SCSI_ISCSI_ATTRS, but without selecting NET that it
> depends on, so this alone will result in a broken config.  However
> nothing will enable INET (which I think you said iser depends on).  So
> something like the below is required, I think.  Although it would
> probably be better to make iser depend on INET (as ISCSI_TCP does)
> rather than selecting NET and INET.

Maybe just make iser depend on CMA since that is what it really needs?

-- 
MST


From rdreier at cisco.com  Mon Sep 11 07:52:38 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Sep 2006 07:52:38 -0700
Subject: [openib-general] [openfabrics-ewg] is there a plan for getting
 SDP into kernel.org?
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EA989@xmb-sjc-216.amer.cisco.com>
	(Scott Weitzenkamp's message of "Sun, 10 Sep 2006 22:47:24 -0700")
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EA989@xmb-sjc-216.amer.cisco.com>
Message-ID: <adawt8a5ubt.fsf@cisco.com>

    Scott> I would like to see netstat support, zcopy support, and
    Scott> ideally AIO support get added first...
 
Better to merge first and then add features I think.

 - R.


From rdreier at cisco.com  Mon Sep 11 07:54:33 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Sep 2006 07:54:33 -0700
Subject: [openib-general] [PATCH 5/5] IB/iser: Do not use FMR for a
 single dma entry sg
In-Reply-To: <Pine.LNX.4.44.0609111225020.10648-100000@hydrus> (Erez
	Zilber's message of "Mon, 11 Sep 2006 12:26:33 +0300 (IDT)")
References: <Pine.LNX.4.44.0609111225020.10648-100000@hydrus>
Message-ID: <adasliy5u8m.fsf@cisco.com>

Thanks, applied 1-5 with this minor fix for a compile warning:

--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -427,9 +427,9 @@ int iser_reg_rdma_mem(struct iscsi_iser_
 			iser_err("page_vec: data_size = 0x%x, length = %d, offset = 0x%x\n",
 				 ib_conn->page_vec->data_size, ib_conn->page_vec->length,
 				 ib_conn->page_vec->offset);
-			for (i=0 ; i<ib_conn->page_vec->length ; i++) {
-				iser_err("page_vec[%d] = 0x%lx\n", i, ib_conn->page_vec->pages[i]);
-			}
+			for (i=0 ; i<ib_conn->page_vec->length ; i++)
+				iser_err("page_vec[%d] = 0x%llx\n", i,
+					 (unsigned long long) ib_conn->page_vec->pages[i]);
 			return err;
 		}
 	}
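
The warning comes from passing a 64-bit page address to `%lx`, which expects an `unsigned long` and is only 32 bits wide on some targets. A minimal user-space sketch of the same cast-and-`%llx` idiom follows; it assumes `uint64_t` in place of the kernel's 64-bit type and `snprintf` in place of `iser_err`, and the helper name `format_page_entry` is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats one page-vector entry the way the patched
 * loop does. The explicit cast makes the argument type match %llx on both
 * 32-bit and 64-bit targets. */
static int format_page_entry(char *buf, size_t len, int i, uint64_t page)
{
	return snprintf(buf, len, "page_vec[%d] = 0x%llx", i,
			(unsigned long long) page);
}
```

Called with index 3 and the value 0x123456789abc, the helper writes `page_vec[3] = 0x123456789abc` into the buffer regardless of the width of `unsigned long`.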


From erezz at voltaire.com  Mon Sep 11 08:19:05 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 11 Sep 2006 18:19:05 +0300
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
In-Reply-To: <ada1wqi79mb.fsf@cisco.com>
References: <200609071902.57379.toralf.foerster@gmx.de>
	<200609101343.02740.toralf.foerster@gmx.de>
	<450401AE.2030606@voltaire.com>
	<200609101645.22695.toralf.foerster@gmx.de>
	<4505032B.3050706@voltaire.com> <ada1wqi79mb.fsf@cisco.com>
Message-ID: <45057E69.6040503@voltaire.com>

Roland Dreier wrote:
> There is definitely a bug in the drivers/infiniband/ulp/iser/Kconfig
> file.  ISER only depends on INFINIBAND && SCSI.  However it is easily
> possible to enable INFINIBAND and SCSI without enabling INET (in fact
> they can be enabled without NET as in the original config in this thread).
>
> iser does select SCSI_ISCSI_ATTRS, but without selecting NET that it
> depends on, so this alone will result in a broken config.  However
> nothing will enable INET (which I think you said iser depends on).  So
> something like the below is required, I think.  Although it would
> probably be better to make iser depend on INET (as ISCSI_TCP does)
> rather than selecting NET and INET.
>
>   
Let me make sure that I understand:
If INET is disabled and we enable INFINIBAND, INFINIBAND_ADDR_TRANS will 
not be enabled (because INET is disabled). This results in the scenario
that Toralf is in. If this is correct, I agree with your patch.

Thanks
Erez


From erezz at voltaire.com  Mon Sep 11 08:19:57 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 11 Sep 2006 18:19:57 +0300
Subject: [openib-general] [PATCH 5/5] IB/iser: Do not use FMR for a
 single dma entry sg
In-Reply-To: <adasliy5u8m.fsf@cisco.com>
References: <Pine.LNX.4.44.0609111225020.10648-100000@hydrus>
	<adasliy5u8m.fsf@cisco.com>
Message-ID: <45057E9D.7030502@voltaire.com>

Roland Dreier wrote:
> Thanks, applied 1-5 with this minor fix for a compile warning:
>
> --- a/drivers/infiniband/ulp/iser/iser_memory.c
> +++ b/drivers/infiniband/ulp/iser/iser_memory.c
> @@ -427,9 +427,9 @@ int iser_reg_rdma_mem(struct iscsi_iser_
>  			iser_err("page_vec: data_size = 0x%x, length = %d, offset = 0x%x\n",
>  				 ib_conn->page_vec->data_size, ib_conn->page_vec->length,
>  				 ib_conn->page_vec->offset);
> -			for (i=0 ; i<ib_conn->page_vec->length ; i++) {
> -				iser_err("page_vec[%d] = 0x%lx\n", i, ib_conn->page_vec->pages[i]);
> -			}
> +			for (i=0 ; i<ib_conn->page_vec->length ; i++)
> +				iser_err("page_vec[%d] = 0x%llx\n", i,
> +					 (unsigned long long) ib_conn->page_vec->pages[i]);
>  			return err;
>  		}
>  	}
>   
OK, thanks.


From rdreier at cisco.com  Mon Sep 11 08:24:18 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Sep 2006 08:24:18 -0700
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
In-Reply-To: <45057E69.6040503@voltaire.com> (Erez Zilber's message of
	"Mon, 11 Sep 2006 18:19:05 +0300")
References: <200609071902.57379.toralf.foerster@gmx.de>
	<200609101343.02740.toralf.foerster@gmx.de>
	<450401AE.2030606@voltaire.com>
	<200609101645.22695.toralf.foerster@gmx.de>
	<4505032B.3050706@voltaire.com> <ada1wqi79mb.fsf@cisco.com>
	<45057E69.6040503@voltaire.com>
Message-ID: <adak64a5sv1.fsf@cisco.com>

    Erez> Let me make sure that I understand: If INET is disabled and
    Erez> we enable INFINIBAND, INFINIBAND_ADDR_TRANS will not be
    Erez> enabled (because INET is disabled). This results in the
    Erez> scenario that Toralf is in. If this is correct, I agree with
    Erez> your patch.

Yes, that's right.

 - R.


From Sujal at Mellanox.com  Mon Sep 11 09:28:04 2006
From: Sujal at Mellanox.com (Sujal Das)
Date: Mon, 11 Sep 2006 09:28:04 -0700
Subject: [openib-general] [openfabrics-ewg] Goodbye and Transition
Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F31DE2D@mtiexch01.mti.com>

Sounds like a good idea. Not sure if the EWG community knows Jamie (I do
not, for example) - it might be a good idea if Jamie introduces himself,
and specifically highlights his roles and contributions to OFA in the
past and what his vision is for OFED and its adoption by OSVs, ISVs, HPC
and enterprise customers.

-Sujal

-----Original Message-----
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Ryan, Jim
Sent: Monday, September 11, 2006 7:22 AM
To: Shawn Hansen (shahanse); OpenFabricsEWG; openib-general at openib.org
Cc: Jamie Riotto (jriotto)
Subject: Re: [openfabrics-ewg] Goodbye and Transition

Shawn, thanks for the note and best of luck at Microsoft. I suggest we
take Shawn's recommendation and ask Jamie to continue Shawn's leadership
of the EWG.

Jim

-----Original Message-----
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Shawn Hansen
(shahanse)
Sent: Friday, September 08, 2006 5:29 PM
To: OpenFabricsEWG; openib-general at openib.org
Cc: Jamie Riotto (jriotto)
Subject: [openfabrics-ewg] Goodbye and Transition

All,

FYI: I've decided to relocate my family to Seattle, and will be leaving
Cisco.  I plan to join Microsoft's Server and Tools division at the end
of this month.

I would like to recommend Jamie Riotto, Senior Director of Engineering,
as my EWG replacement.  Jamie is responsible for all engineering for
Cisco's Server Networking and Virtualization Business Unit, including
Cisco's host driver and RDMA development efforts.

Please stay in touch, and I wish the team the best.

Regards,

--Shawn 
----------------------------
Shawn Hansen
Director, Product Management
Cisco Systems

_______________________________________________
openfabrics-ewg mailing list
openfabrics-ewg at openib.org
http://openib.org/mailman/listinfo/openfabrics-ewg


From stan.smith at intel.com  Mon Sep 11 09:27:28 2006
From: stan.smith at intel.com (Smith, Stan)
Date: Mon, 11 Sep 2006 09:27:28 -0700
Subject: [openib-general] PXE + infiniband?
Message-ID: <E8A7BA95545A91438970202FFE79C0230F16F4@orsmsx413.amr.corp.intel.com>

Eli Cohen wrote:
> On Thu, 2006-09-07 at 08:19 +0100, Paul Baxter wrote:
>>> There is an implementation of PXE for Mellanox's HCAs that can be
>>> found here: http://sourceforge.net/forum/forum.php?forum_id=494529
>> 
>> Thanks for the tip
>> 
>> I, too, am interested in this.
>> 
>> Do you have a more direct link as I wandered around etherboot's
>> project site and couldn't find anything IB-specific.
>> 
>> Paul Baxter
> Hi,
> 
> Please use the following link
> http://kent.dl.sourceforge.net/sourceforge/etherboot/etherboot-5.4.2.tar.bz2
> to download the package. Unpack the package and cd to the src dir.
> Use an x86 arch machine to build the binaries. The infiniband drivers
> are located at src/drivers/net/mlx_ipoib/ where you can find a readme
> file in the doc directory. To build:
> 
> cd src
> make bin/MT23108.zrom  // for MT23108
> make bin/MT25208.zrom
> make bin/MT25218.zrom
> 
> This covers all Mellanox HCAs. Please let me know if you need more
> assistance.
> 

A less involved solution is to use ROM-o-matic (http://rom-o-matic.net/).
The Etherboot 5.4.2 image for MT23108 works nicely.

> 
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general 


From jim.ryan at intel.com  Mon Sep 11 09:34:34 2006
From: jim.ryan at intel.com (Ryan, Jim)
Date: Mon, 11 Sep 2006 09:34:34 -0700
Subject: [openib-general] [openfabrics-ewg] Goodbye and Transition
Message-ID: <E8A7BA95545A91438970202FFE79C023E35567@orsmsx413.amr.corp.intel.com>

Sujal, yes, thanks, makes sense. I got a "no longer there" response from
my earlier email, so Shawn won't be around to do a handoff.

Jim

-----Original Message-----
From: Sujal Das [mailto:Sujal at Mellanox.com] 
Sent: Monday, September 11, 2006 9:28 AM
To: Ryan, Jim; Shawn Hansen (shahanse); OpenFabricsEWG;
openib-general at openib.org
Cc: Jamie Riotto (jriotto)
Subject: RE: [openfabrics-ewg] Goodbye and Transition

Sounds like a good idea. Not sure if the EWG community knows Jamie (I do
not, for example) - it might be a good idea if Jamie introduces himself,
and specifically highlights his roles and contributions to OFA in the
past and what his vision is for OFED and its adoption by OSVs, ISVs, HPC
and enterprise customers.

-Sujal

-----Original Message-----
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Ryan, Jim
Sent: Monday, September 11, 2006 7:22 AM
To: Shawn Hansen (shahanse); OpenFabricsEWG; openib-general at openib.org
Cc: Jamie Riotto (jriotto)
Subject: Re: [openfabrics-ewg] Goodbye and Transition

Shawn, thanks for the note and best of luck at Microsoft. I suggest we
take Shawn's recommendation and ask Jamie to continue Shawn's leadership
of the EWG.

Jim

-----Original Message-----
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Shawn Hansen
(shahanse)
Sent: Friday, September 08, 2006 5:29 PM
To: OpenFabricsEWG; openib-general at openib.org
Cc: Jamie Riotto (jriotto)
Subject: [openfabrics-ewg] Goodbye and Transition

All,

FYI: I've decided to relocate my family to Seattle, and will be leaving
Cisco.  I plan to join Microsoft's Server and Tools division at the end
of this month.

I would like to recommend Jamie Riotto, Senior Director of Engineering,
as my EWG replacement.  Jamie is responsible for all engineering for
Cisco's Server Networking and Virtualization Business Unit, including
Cisco's host driver and RDMA development efforts.

Please stay in touch, and I wish the team the best.

Regards,

--Shawn 
----------------------------
Shawn Hansen
Director, Product Management
Cisco Systems

_______________________________________________
openfabrics-ewg mailing list
openfabrics-ewg at openib.org
http://openib.org/mailman/listinfo/openfabrics-ewg


From sweitzen at cisco.com  Mon Sep 11 09:38:35 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 11 Sep 2006 09:38:35 -0700
Subject: [openib-general] [openfabrics-ewg] is there a plan for getting
 SDP into kernel.org?
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EAAE4@xmb-sjc-216.amer.cisco.com>

>     Scott> I would like to see netstat support, zcopy support, and
>     Scott> ideally AIO support get added first...
>  
> Better to merge first and then add features I think.
> 
>  - R.
> 

How about just adding netstat before the merge, so we have some
visibility into what SDP connections are in use?

Scott


From mshefty at ichips.intel.com  Mon Sep 11 10:11:06 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 11 Sep 2006 10:11:06 -0700
Subject: [openib-general] [PATCH v3] ib_sa: require SA registration
In-Reply-To: <adairkco0rc.fsf@cisco.com>
References: <000501c6c57b$2594dd00$8698070a@amr.corp.intel.com>
	<44F38370.7050809@ichips.intel.com> <adairkco0rc.fsf@cisco.com>
Message-ID: <450598AA.1070003@ichips.intel.com>

Roland Dreier wrote:
> I haven't really read the later patches but I am planning on merging
> at least the registration stuff for 2.6.19.

I'd like to commit the SA related patches soon.  There have been several e-mails 
recently about using IB multicast and the IB CM directly.

- Sean


From mshefty at ichips.intel.com  Mon Sep 11 10:18:08 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 11 Sep 2006 10:18:08 -0700
Subject: [openib-general] Wrong byte order in lid of struct
 ibv_port_attr reported by ibv_query port!?
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD22201FBD38E@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD22201FBD38E@wdtssmail01.eu.thmulti.com>
Message-ID: <45059A50.1030405@ichips.intel.com>

Bub Thomas wrote:
> with the help of your modified cmpost.c example I found out that the 
> byte order in the lid your query_for_path in cmpost.c is getting into 
> the ib_sa_path_rec is the opposite to the one reported by  ibv_query_port.

The path record defines all fields in network-byte order.  The verb calls use 
host-byte order.  Typically, the path record information will come directly from 
the SA, which defines the fields in network-byte order, which is why it isn't 
converted to host-order.

- Sean
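
To make the distinction concrete, here is a minimal sketch of comparing a LID taken from a path record with one in host byte order, such as the value reported by ibv_query_port. The struct and the function name are hypothetical stand-ins for illustration, not the real ib_sa_path_rec layout.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical stand-in for the path record's LID field, which is kept
 * in network byte order exactly as it arrives from the SA. */
struct path_rec_sketch {
	uint16_t dlid_be;
};

/* Returns nonzero when the path record's destination LID equals a LID
 * given in host byte order. The conversion happens on the path-record
 * side, so the host-order value is used unchanged. */
static int lid_matches(const struct path_rec_sketch *rec, uint16_t host_lid)
{
	return ntohs(rec->dlid_be) == host_lid;
}
```

Comparing the two values without the ntohs() call only appears to work on big-endian hosts, which is one way the mismatch described above can go unnoticed.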


From mst at mellanox.co.il  Mon Sep 11 10:28:24 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Sep 2006 20:28:24 +0300
Subject: [openib-general] [PATCH v3] ib_sa: require SA registration
In-Reply-To: <adairkco0rc.fsf@cisco.com>
References: <adairkco0rc.fsf@cisco.com>
Message-ID: <20060911172824.GB15556@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH v3] ib_sa: require SA registration
> 
>     Sean> Roland, Not sure if you've had a chance to review the SA
>     Sean> patches, but any comments on any of the SA related patches?
>     Sean> (SA registration, generic RMPP query support, or userspace
>     Sean> SA)
> 
> I haven't really read the later patches but I am planning on merging
> at least the registration stuff for 2.6.19.

Yes, the registration stuff is clearly safe.

-- 
MST


From rdreier at cisco.com  Mon Sep 11 10:30:58 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Sep 2006 10:30:58 -0700
Subject: [openib-general] [openfabrics-ewg] is there a plan for getting
 SDP into kernel.org?
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EAAE4@xmb-sjc-216.amer.cisco.com>
	(Scott Weitzenkamp's message of "Mon, 11 Sep 2006 09:38:35 -0700")
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EAAE4@xmb-sjc-216.amer.cisco.com>
Message-ID: <ada7j0a5mzx.fsf@cisco.com>

    Scott> How about just adding netstat before the merge, so we have
    Scott> some visibility into what SDP connections are in use?

That's fine.  Merging upstream is somewhat long-term anyway, since
Michael has not even posted a first candidate for review -- I expect
SDP will require several go-arounds to get merged.

 - R.


From sweitzen at cisco.com  Mon Sep 11 10:34:25 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 11 Sep 2006 10:34:25 -0700
Subject: [openib-general] [openfabrics-ewg] is there a plan for getting
 SDP into kernel.org?
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EAB61@xmb-sjc-216.amer.cisco.com>

>     Scott> How about just adding netstat before the merge, so we have
>     Scott> some visibility into what SDP connections are in use?
> 
> That's fine.  Merging upstream is somewhat long-term anyway, since
> Michael has not even posted a first candidate for review -- I expect
> SDP will require several go-arounds to get merged.
> 
>  - R.

Michael, when do you expect to post a first candidate for review?

Scott


From mshefty at ichips.intel.com  Mon Sep 11 10:38:47 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 11 Sep 2006 10:38:47 -0700
Subject: [openib-general] RDMA CMA and C++
In-Reply-To: <45003711.3040108@dev.mellanox.co.il>
References: <1157640982.20399.5.camel@trinity.ogc.int>
	<45003711.3040108@dev.mellanox.co.il>
Message-ID: <45059F27.8050805@ichips.intel.com>

Dotan Barak wrote:
>>The user-mode cm header files don't have the C++ stuff to identify all
>>the declarations as C. The verbs.h file has it and works fine if you
>>wanted to copy it, but all you really need is ...
>>
> Sean, please add those definitions to the libibcm header as well.

I've updated the libibcm and librdmacm header files.  Thanks.

- Sean
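
For readers unfamiliar with the idiom, the usual form of such a guard is sketched below. The include-guard name and the declaration are hypothetical and are not copied from libibcm, librdmacm, or verbs.h.

```c
/* Hypothetical header fragment illustrating the linkage guard. */
#ifndef CM_SKETCH_H
#define CM_SKETCH_H

#ifdef __cplusplus
extern "C" {		/* C++ compilers: give these declarations C linkage */
#endif

int cm_sketch_version(void);

#ifdef __cplusplus
}
#endif

#endif /* CM_SKETCH_H */

/* Trivial definition so the fragment links on its own. */
int cm_sketch_version(void)
{
	return 1;
}
```

A C compiler never defines `__cplusplus`, so it sees only the plain declarations; a C++ compiler sees them wrapped in `extern "C"` and does not mangle the names.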


From mshefty at ichips.intel.com  Mon Sep 11 10:44:46 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 11 Sep 2006 10:44:46 -0700
Subject: [openib-general] [PATCH] cma_connect_ib leaks memory in failure
 cases.
In-Reply-To: <20060910111145.GA12111@mellanox.co.il>
References: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com>
	<20060910111145.GA12111@mellanox.co.il>
Message-ID: <4505A08E.5000705@ichips.intel.com>

Michael S. Tsirkin wrote:
>>cma_connect_ib leaks a struct ib_cm_id* in failure cases.
>>
>>Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
> 
> 
> This one looks like it might be good for 2.6.18. Sean?

The ib_cm_id will be cleaned up if the rdma_cm_id is destroyed, as long as a 
second call is not made to rdma_connect after the first call fails.  So we're 
probably safe deferring this until 2.6.19, unless someone has code which calls 
rdma_connect twice.

- Sean
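
The condition described above can be modelled in user space as follows. This is only a sketch under stated assumptions: integer handles stand in for the ib_cm_id, every name is hypothetical, and the `cleanup_on_error` flag merely illustrates one possible shape of a fix, not the submitted patch.

```c
/* Hypothetical model: 'cm_id' is a handle owned by the id, 0 means none. */
struct id_sketch {
	int cm_id;
};

static int next_handle;		/* source of fresh handles */
static int live_cm_ids;		/* handles created but not yet released */

/* Models a connect attempt that always fails after creating a cm_id.
 * With cleanup_on_error == 0 the handle is left attached on failure;
 * with cleanup_on_error == 1 it is released before returning. */
static int connect_sketch(struct id_sketch *id, int cleanup_on_error)
{
	id->cm_id = ++next_handle;	/* overwrites any handle left behind earlier */
	live_cm_ids++;

	if (cleanup_on_error) {
		id->cm_id = 0;
		live_cm_ids--;
	}
	return -1;			/* simulated failure */
}

/* Models destroying the id: releases whatever handle is still attached. */
static void destroy_sketch(struct id_sketch *id)
{
	if (id->cm_id) {
		id->cm_id = 0;
		live_cm_ids--;
	}
}
```

In this model, one failed connect followed by destroy leaves `live_cm_ids` at zero, while two failed connects without cleanup leave one handle unreachable after destroy.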


From toralf.foerster at gmx.de  Mon Sep 11 10:45:59 2006
From: toralf.foerster at gmx.de (Toralf =?iso-8859-1?q?F=F6rster?=)
Date: Mon, 11 Sep 2006 19:45:59 +0200
Subject: [openib-general] Fwd: linux- 2.6.18-rc6-git1 issue 46:
 drivers/infiniband/ulp/iser/iser_verbs.c:514: undefined reference to
 `rdma_create_id'
In-Reply-To: <ada1wqi79mb.fsf@cisco.com>
References: <200609071902.57379.toralf.foerster@gmx.de>
	<4505032B.3050706@voltaire.com> <ada1wqi79mb.fsf@cisco.com>
Message-ID: <200609111946.03315.toralf.foerster@gmx.de>

Yep,

that patch fixes the bug :-)
Thanks

Am Monday 11 September 2006 16:37 schrieb Roland Dreier:
> There is definitely a bug in the drivers/infiniband/ulp/iser/Kconfig
> file.  ISER only depends on INFINIBAND && SCSI.  However it is easily
> possible to enable INFINIBAND and SCSI without enabling INET (in fact
> they can be enabled without NET as in the original config in this thread).
> 
> iser does select SCSI_ISCSI_ATTRS, but without selecting NET that it
> depends on, so this alone will result in a broken config.  However
> nothing will enable INET (which I think you said iser depends on).  So
> something like the below is required, I think.  Although it would
> probably be better to make iser depend on INET (as ISCSI_TCP does)
> rather than selecting NET and INET.
> 
> Toralf, can you confirm that applying this patch and doing make
> oldconfig and make with your original config works OK?
> 
> Thanks,
>   Roland
> 
> diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
> index fead87d..a122bb4 100644
> --- a/drivers/infiniband/ulp/iser/Kconfig
> +++ b/drivers/infiniband/ulp/iser/Kconfig
> @@ -1,6 +1,8 @@
>  config INFINIBAND_ISER
>  	tristate "ISCSI RDMA Protocol"
>  	depends on INFINIBAND && SCSI
> +	select NET
> +	select INET
>  	select SCSI_ISCSI_ATTRS
>  	---help---
>  	  Support for the ISCSI RDMA Protocol over InfiniBand.  This
> 
> 

-- 
MfG/Sincerely
Toralf Förster

From mshefty at ichips.intel.com  Mon Sep 11 10:50:31 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 11 Sep 2006 10:50:31 -0700
Subject: [openib-general] [PATCH] cma_connect_ib leaks memory in failure
 cases.
In-Reply-To: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com>
References: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com>
Message-ID: <4505A1E7.1060007@ichips.intel.com>

Krishna Kumar wrote:
> cma_connect_ib leaks a struct ib_cm_id* in failure cases.

Thanks - committed.

- Sean


From stephanieh at owenmedia.com  Mon Sep 11 10:52:19 2006
From: stephanieh at owenmedia.com (Stephanie Howard)
Date: Mon, 11 Sep 2006 10:52:19 -0700
Subject: [openib-general] InfiniBand DevCon Conference
Message-ID: <EB77B7A4312E1043AD0453828C9659915A1342@omi-server.omi.owenmedia.com>

Hello,

Attached is the final reminder for the InfiniBand DevCon conference.  If you
have any questions, please let me know.

Thank you,

Stephanie

Stephanie Howard
Owen Media
206.322.1167 ext. 102
StephanieH at owenmedia.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060911/86e90593/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: InfiniBand DevCon Blast Final Blast - OFA.doc
Type: application/msword
Size: 32256 bytes
Desc: InfiniBand DevCon Blast Final Blast - OFA.doc
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060911/86e90593/attachment.doc>

From mst at mellanox.co.il  Mon Sep 11 10:53:12 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Sep 2006 20:53:12 +0300
Subject: [openib-general] [PATCH] cma_connect_ib leaks memory in failure
	cases.
In-Reply-To: <4505A08E.5000705@ichips.intel.com>
References: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com>
	<20060910111145.GA12111@mellanox.co.il>
	<4505A08E.5000705@ichips.intel.com>
Message-ID: <20060911175312.GC15556@mellanox.co.il>

Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH] cma_connect_ib leaks memory in failure cases.
> 
> Michael S. Tsirkin wrote:
> >>cma_connect_ib leaks a struct ib_cm_id* in failure cases.
> >>
> >>Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
> > 
> > 
> > This one looks like it might be good for 2.6.18. Sean?
> 
> The ib_cm_id will be cleaned up if the rdma_cm_id is destroyed, as long as a 
> second call is not made to rdma_connect after the first call fails.  So we're 
> probably safe deferring this until 2.6.19, unless someone has code which calls 
> rdma_connect twice.

SDP can do this I think.

-- 
MST


From john.blackwood at ccur.com  Mon Sep 11 10:53:47 2006
From: john.blackwood at ccur.com (John Blackwood)
Date: Mon, 11 Sep 2006 13:53:47 -0400
Subject: [openib-general] ib_madeye kfree() problem on module unload
Message-ID: <4505A2AB.3000603@ccur.com>


When using OFED-1.1-rc3 on a x86_64 system running a 2.6.17.3 debug 
kernel in a RHEL4 U2 environment, I see the following console warning
messages when I unload the ib_madeye kernel module:

	modprobe ib_madeye
	modprobe -r ib_madeye
------------------------------------
	console messages
------------------------------------
slab error in cache_free_debugcheck(): cache `size-32': double free, or
memory outside object was overwritten

Call Trace:
   <ffffffff80279cfe>{__slab_error+36}
   <ffffffff8027a694>{cache_free_debugcheck+365}
   <ffffffff8027bbad>{kfree+136}
   <ffffffff88101d39>{:ib_madeye:madeye_remove_one+123}
   <ffffffff88055465>{:ib_core:ib_unregister_client+75}
   <ffffffff88101d54>{:ib_madeye:ib_madeye_cleanup+16}
   <ffffffff802472e7>{sys_delete_module+446}
   <ffffffff80209c9a>{tracesys+113}
   <ffffffff80209cfa>{tracesys+209}

ffff81007834bd48: redzone 1:0x170fc2a5, redzone 2:0xffff8100400929c8


From mshefty at ichips.intel.com  Mon Sep 11 11:35:51 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 11 Sep 2006 11:35:51 -0700
Subject: [openib-general] [PATCH] Modify callers of cma_get_net_info for
 better error handling.
In-Reply-To: <20060908051301.5221.63041.sendpatchset@K50wks273895wss.in.ibm.com>
References: <20060908051301.5221.63041.sendpatchset@K50wks273895wss.in.ibm.com>
Message-ID: <4505AC87.4070309@ichips.intel.com>

Krishna Kumar wrote:
> Re-organize code relating to cma_get_net_info() and rdma_create_id() to
> optimize error case handling (no need to alloc memory/etc as part of
> rdma_create_id() if input parameters are wrong).

Thanks!  Committed with a minor adjustment to rename the 'out' label to 'err'.

- Sean


From mshefty at ichips.intel.com  Mon Sep 11 11:52:24 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 11 Sep 2006 11:52:24 -0700
Subject: [openib-general] [PATCH] Optimize cma_process_remove()
In-Reply-To: <20060908051439.5229.71544.sendpatchset@K50wks273895wss.in.ibm.com>
References: <20060908051439.5229.71544.sendpatchset@K50wks273895wss.in.ibm.com>
Message-ID: <4505B068.8050501@ichips.intel.com>

Krishna Kumar wrote:
>  static void cma_process_remove(struct cma_device *cma_dev)
>  {
>  	struct list_head remove_list;
> -	struct rdma_id_private *id_priv;
> +	struct rdma_id_private *id_priv, *tmp;
>  	int ret;
>  
>  	INIT_LIST_HEAD(&remove_list);
> @@ -2344,22 +2344,20 @@ static void cma_process_remove(struct cm
>  
>  		if (cma_internal_listen(id_priv)) {
>  			cma_destroy_listen(id_priv);
> -			continue;
> +		} else {
> +			list_del(&id_priv->list);
> +			list_add_tail(&id_priv->list, &remove_list);
>  		}
> +	}
> +	mutex_unlock(&lock);
>  
> -		list_del(&id_priv->list);
> -		list_add_tail(&id_priv->list, &remove_list);
> +	list_for_each_entry_safe(id_priv, tmp, &remove_list, list) {
>  		atomic_inc(&id_priv->refcount);
> -		mutex_unlock(&lock);
> -

I don't think that this will work.  The issue is that we need to walk a list of 
IDs associated with a particular device to notify the user that the device is 
being removed.  While we're doing that, the user could try to destroy the ID, 
which removes the ID from the device list.

The original code takes a reference on the ID before removing it from 
cma_dev's list to ensure that the ID will be valid while we process it.  The 
remove list ensures that the user is only notified once of a device removal. 
(We don't know where the thread calling rdma_destroy_id() is at.)

We can eliminate the remove_list by calling list_del_init().
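
A minimal user-space sketch of why list_del_init() helps here, under the
assumption that the point is to make a second unlink of the same entry
harmless. The list helpers below are simplified re-implementations written
for this example, and fake_id and process_remove are hypothetical names, not
the CMA code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified circular doubly linked list, modelled on the kernel's. */
struct list_node { struct list_node *next, *prev; };

static void list_init(struct list_node *n) { n->next = n; n->prev = n; }

static int list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink the node and leave it pointing at itself, so that unlinking
 * it again later is a no-op instead of corrupting the list. */
static void list_del_init_node(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical stand-in for struct rdma_id_private. */
struct fake_id { struct list_node list; int refcount; int notified; };

#define node_to_id(ptr) \
    ((struct fake_id *)((char *)(ptr) - offsetof(struct fake_id, list)))

/* Device removal: pop each ID, hold a reference, notify it once.
 * Real code would hold a lock while touching the list and drop it
 * around the notification. Returns the number of IDs notified. */
static int process_remove(struct list_node *dev_ids)
{
    int count = 0;

    while (!list_is_empty(dev_ids)) {
        struct fake_id *id = node_to_id(dev_ids->next);

        id->refcount++;                 /* keep the ID valid */
        list_del_init_node(&id->list);  /* never visited twice */
        id->notified++;
        count++;
        id->refcount--;                 /* drop our reference */
    }
    return count;
}
```

An ID that the user has already unlinked is simply not found by the walk, and
an ID unlinked by the walk can still be unlinked again by a later destroy.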

- Sean


From jriotto at cisco.com  Mon Sep 11 12:26:10 2006
From: jriotto at cisco.com (Jamie Riotto (jriotto))
Date: Mon, 11 Sep 2006 12:26:10 -0700
Subject: [openib-general] [openfabrics-ewg] Goodbye and Transition
Message-ID: <944AD9DA9232E346ADF590C41BFFEC410294E3FF@xmb-sjc-232.amer.cisco.com>

Hi everyone. Just wanted to respond and say I'm on the alias, and will
prepare a small statement in line with what has been asked below. I 
should be able to get this out in a day or two. Looking forward to 
working with you all. Cheers - jamie

Jamie Riotto
Sr. Director Engineering
Server Virtualization Business Unit (SVBU)
Cisco Communications
408-853-7813
jriotto at cisco.com
 

-----Original Message-----
From: Ryan, Jim [mailto:jim.ryan at intel.com] 
Sent: Monday, September 11, 2006 9:35 AM
To: Sujal Das; OpenFabricsEWG; openib-general at openib.org
Cc: Jamie Riotto (jriotto)
Subject: RE: [openfabrics-ewg] Goodbye and Transition

Sujal, yes, thanks, makes sense. I got a "no longer there" response from
my earlier email, so Shawn won't be around to do a handoff

Jim

-----Original Message-----
From: Sujal Das [mailto:Sujal at Mellanox.com] 
Sent: Monday, September 11, 2006 9:28 AM
To: Ryan, Jim; Shawn Hansen (shahanse); OpenFabricsEWG;
openib-general at openib.org
Cc: Jamie Riotto (jriotto)
Subject: RE: [openfabrics-ewg] Goodbye and Transition

Sounds like a good idea. Not sure if the EWG community knows Jamie (I do
not, for example) - it might be a good idea if Jamie introduces himself,
and specifically highlights his roles and contributions to OFA in the
past and what his vision is for OFED and its adoption by OSVs, ISVs, HPC
and enterprise customers.

-Sujal

-----Original Message-----
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Ryan, Jim
Sent: Monday, September 11, 2006 7:22 AM
To: Shawn Hansen (shahanse); OpenFabricsEWG; openib-general at openib.org
Cc: Jamie Riotto (jriotto)
Subject: Re: [openfabrics-ewg] Goodbye and Transition

Shawn, thanks for the note and best of luck at Microsoft. I suggest we
take Shawn's recommendation and ask Jamie to continue Shawn's leadership
of the EWG.

Jim

-----Original Message-----
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Shawn Hansen
(shahanse)
Sent: Friday, September 08, 2006 5:29 PM
To: OpenFabricsEWG; openib-general at openib.org
Cc: Jamie Riotto (jriotto)
Subject: [openfabrics-ewg] Goodbye and Transition

All,

FYI: I've decided to relocate my family to Seattle, and will be leaving
Cisco.  I plan to join Microsoft's Server and Tools division at the end
of this month.

I would like to recommend Jamie Riotto, Senior Director of Engineering,
as my EWG replacement.  Jamie is responsible for all engineering for
Cisco's Server Networking and Virtualization Business Unit, including
Cisco's host driver and RDMA development efforts.

Please stay in touch, and I wish the team the best.

Regards,

--Shawn 
----------------------------
Shawn Hansen
Director, Product Management
Cisco Systems

_______________________________________________
openfabrics-ewg mailing list
openfabrics-ewg at openib.org
http://openib.org/mailman/listinfo/openfabrics-ewg


From mst at mellanox.co.il  Mon Sep 11 12:29:09 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 11 Sep 2006 22:29:09 +0300
Subject: [openib-general] CMA issue: bind selects the same port after close
Message-ID: <20060911192909.GA16667@mellanox.co.il>

We have encountered an issue in CMA: if
I bind to port 0, destroy the id, then bind to port 0 again
I often get back the same port from both binds.

TCP behaves differently - it seems to assign  new port numbers
each time.
This is an issue for some socket programs that assume that
the same port number won't be reused, so that a remote side that
connects to the same port after I have closed my socket will get
a connection refused message.
I also see applications looking for a port number that matches
some rule by repeating the create/bind/close cycle.
With CMA they always get back the same port number it seems.
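
For reference, the TCP side of the create/bind/close cycle described above
can be sketched as follows. This is a minimal example using the standard
sockets API only. bind_ephemeral_port is a hypothetical helper name, and the
sketch does not claim anything about which port numbers the kernel returns.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to port 0 on loopback, report the port the kernel
 * picked, then close the socket. Returns -1 on any failure. */
static int bind_ephemeral_port(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int port = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  /* 0 asks the kernel to choose a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);

    close(fd);
    return port;
}
```

Calling bind_ephemeral_port() twice in a row and comparing the results is the
kind of test that shows whether the same port comes back after close.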

Is this something that can be fixed in CMA?

Thanks,

-- 
MST


From rdreier at cisco.com  Mon Sep 11 14:06:56 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Sep 2006 14:06:56 -0700
Subject: [openib-general] [PATCH v3] ib_sa: require SA registration
In-Reply-To: <000501c6c57b$2594dd00$8698070a@amr.corp.intel.com> (Sean
	Hefty's message of "Mon, 21 Aug 2006 16:40:12 -0700")
References: <000501c6c57b$2594dd00$8698070a@amr.corp.intel.com>
Message-ID: <ada3bay3yfj.fsf@cisco.com>

OK, I added the following to my for-2.6.19 branch.  The differences
from your patch are:

 - CMA can have a static variable (good to avoid clashes with a global
   'sa_client' variable name too)
 - IPoIB does not use multicast module upstream, fix ipoib_multicast.c too.
 - Simplify sa_query.c changes a little.  I don't like the
   "deref_client" name for a function, since it sounds too much like
   dereferencing a pointer rather than dropping a reference.  And I
   also didn't like ib_sa_client_get() having a magic side effect of
   setting query->client.  So I just open-coded more stuff.

How does this look?

 - R.


From sean.hefty at intel.com  Mon Sep 11 14:21:14 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 11 Sep 2006 14:21:14 -0700
Subject: [openib-general] [PATCH v3] ib_sa: require SA registration
In-Reply-To: <ada3bay3yfj.fsf@cisco.com>
Message-ID: <000001c6d5e8$3662a040$a4d0180a@amr.corp.intel.com>

> - CMA can have a static variable (good to avoid clashes with a global
>   'sa_client' variable name too)

Sounds good - that's a goof on my part.

> - IPoIB does not use multicast module upstream, fix ipoib_multicast.c too.

Okay - As an FYI, I will probably submit the multicast module upstream for
2.6.20, along with some sort of support for userspace access.

> - Simplify sa_query.c changes a little.  I don't like the
>   "deref_client" name for a function, since it sounds too much like
>   dereferencing a pointer rather than dropping a reference.  And I
>   also didn't like ib_sa_client_get() having a magic side effect of
>   setting query->client.  So I just open-coded more stuff.

Those changes sound fine to me.

- Sean


From mshefty at ichips.intel.com  Mon Sep 11 15:07:56 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 11 Sep 2006 15:07:56 -0700
Subject: [openib-general] CMA issue: bind selects the same port after
 close
In-Reply-To: <20060911192909.GA16667@mellanox.co.il>
References: <20060911192909.GA16667@mellanox.co.il>
Message-ID: <4505DE3C.7090205@ichips.intel.com>

Michael S. Tsirkin wrote:
> We have encountered an issue in CMA: if
> I bind to port 0, destroy the id, then bind to port 0 again
> I often get back the same port from both binds.
> 
> TCP behaves differently - it seems to assign  new port numbers
> each time.
> This is an issue for some socket programs that assume that
> the same port number won't be reused, so that a remote side that
> connects to the same port after I have closed my socket will get
> a connection refused message.
> I also see applications looking for a port number that matches
> some rule by repeating the create/bind/close cycle.
> With CMA they always get back the same port number it seems.
> 
> Is this something that can be fixed in CMA?

I think we can fix this without a huge impact.  Is there anything that states 
the way bind is supposed to behave wrt this?  Is there some delay between 
releasing a port and it being re-used that needs to be taken into account?

The basic problem in the CMA is in cma_alloc_port().  If the port number (passed 
in as snum) is 0, the first available port starting at 
sysctl_local_port_range[0] is used.  We could instead start our search by 
adding an increasing counter or a random value to the lower-end of the port 
range.  Then expand the code to handle searching below our starting value if we 
failed to find one above it.
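
A rough user-space sketch of the search described above, assuming the
"increasing counter" variant: start at a moving offset inside the range and
wrap around to the low end if nothing is free above the starting value. The
names port_in_use, alloc_port_from, alloc_port, release_port and next_start
are hypothetical; this is not the cma_alloc_port() code.

```c
#include <assert.h>
#include <stdbool.h>

static bool port_in_use[65536];  /* toy allocation table */
static int next_start;           /* offset into the range for the next search */

/* Search [low, high] beginning at 'start' and wrapping around to 'low'.
 * Returns the allocated port, or -1 if every port in the range is taken. */
static int alloc_port_from(int low, int high, int start)
{
    int range = high - low + 1;
    int i;

    assert(low <= start && start <= high);
    for (i = 0; i < range; i++) {
        int port = low + (start - low + i) % range;

        if (!port_in_use[port]) {
            port_in_use[port] = true;
            return port;
        }
    }
    return -1;
}

/* Allocate a port, advancing the starting point after each success so a
 * just-released port is not handed out again immediately. */
static int alloc_port(int low, int high)
{
    int range = high - low + 1;
    int port = alloc_port_from(low, high, low + next_start % range);

    if (port >= 0)
        next_start = (port - low + 1) % range;
    return port;
}

static void release_port(int port)
{
    port_in_use[port] = false;
}
```

Replacing the counter with a random starting offset would give the other
variant mentioned above without changing the wrap-around search.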

Are the port numbers assigned by TCP sequential or more random?

- Sean


From mst at mellanox.co.il  Mon Sep 11 15:16:33 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 12 Sep 2006 01:16:33 +0300
Subject: [openib-general] CMA issue: bind selects the same port after
 close
In-Reply-To: <4505DE3C.7090205@ichips.intel.com>
References: <4505DE3C.7090205@ichips.intel.com>
Message-ID: <20060911221633.GB17098@mellanox.co.il>

Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [openib-general] CMA issue: bind selects the same port after close
> 
> Michael S. Tsirkin wrote:
> > We have encountered an issue in CMA: if
> > I bind to port 0, destroy the id, then bind to port 0 again
> > I often get back the same port from both binds.
> > 
> > TCP behaves differently - it seems to assign  new port numbers
> > each time.
> > This is an issue for some socket programs that assume that
> > the same port number won't be reused, so that a remote side that
> > connects to the same port after I have closed my socket will get
> > a connection refused message.
> > I also see applications looking for a port number that matches
> > some rule by repeating the create/bind/close cycle.
> > With CMA they always get back the same port number it seems.
> > 
> > Is this something that can be fixed in CMA?
> 
> I think we can fix this without a huge impact.  Is there anything that states 
> the way bind is supposed to behave wrt this?

I don't think so. But since that's how it works on linux and other systems,
apps assume this.

> Is there some delay between 
> releasing a port and it being re-used that needs to be taken into account?

TCP keeps port busy while in timewait state, unless REUSEADDR is given.
I have not yet seen any app rely on this, so it might not be important
to emulate this.

> The basic problem in the CMA is in cma_alloc_port().  If the port number (passed 
> in as snum) is 0, the first available port starting at 
> sysctl_local_port_range[0] is used.  We could instead start our search by 
> adding an increasing counter or a random value to the lower-end of the port 
> range.  Then expand the code to handle searching below our starting value if we 
> failed to find one above it.

Sounds good.

> Are the port numbers assigned by TCP sequential or more random?

TCP ports seem to be sequential.

-- 
MST


From mshefty at ichips.intel.com  Mon Sep 11 15:20:32 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Mon, 11 Sep 2006 15:20:32 -0700
Subject: [openib-general] [PATCH] IB/cma: add rdma_establish
In-Reply-To: <20060907214524.GA14791@mellanox.co.il>
References: <20060907214524.GA14791@mellanox.co.il>
Message-ID: <4505E130.8010301@ichips.intel.com>

Michael S. Tsirkin wrote:
> Sean, did we decide what to do for upstream yet?
> I would say we need something like the below for 2.6.19 too
> (probably just need to update node type check).
> And, I like it that this approach leaves all matters of policy
> to users (such as whether move QP to RTS after asynchronous event
> or after completion event).

I will go with a patch similar to this one.  It seems the most flexible.

> As a side note, reasons for frequent loss of RTU must be investigated.

A lost RTU shouldn't be any more likely than a lost REQ or REP.  Is the RTU 
never showing up?  I will look into the ib_cm and see if there's an issue that 
would cause an RTU not to be retried.

- Sean


From mst at mellanox.co.il  Mon Sep 11 15:29:56 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 12 Sep 2006 01:29:56 +0300
Subject: [openib-general] [PATCH] IB/cma: add rdma_establish
In-Reply-To: <4505E130.8010301@ichips.intel.com>
References: <20060907214524.GA14791@mellanox.co.il>
	<4505E130.8010301@ichips.intel.com>
Message-ID: <20060911222956.GD17098@mellanox.co.il>

Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> > As a side note, reasons for frequent loss of RTU must be investigated.
> 
> A lost RTU shouldn't be any more likely than a lost REQ or REP.  Is the RTU 
> never showing up?

It seems so. I know for sure that I do accept after REP, but the remote side
never gets ESTABLISHED.

> I will look into the ib_cm and see if there's an issue that 
> would cause an RTU not to be retried.

-- 
MST


From mst at mellanox.co.il  Mon Sep 11 15:52:56 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 12 Sep 2006 01:52:56 +0300
Subject: [openib-general] RFC: mthca: implement timewait by tracking QPNs
 (was Fwd: Re: [PATCH ] RFC IB/cm do not track remote QPN in timewait state)
In-Reply-To: <20060829130908.GA24322@mellanox.co.il>
References: <20060829130908.GA24322@mellanox.co.il>
Message-ID: <20060911225256.GE17098@mellanox.co.il>

Roland, all, we plan to implement the timewait handling in mthca
in time for 2.6.19:

For all connected QPs:
- upon QP destroy or move from RTS to reset/error,
  start timer for the duration of packet lifetime
- until packet expires, do not reuse this QPN

This must be done to prevent stale packets from corrupting the new connection
(see 9.7.1).

Could you pls let me know if this approach looks sane to you?

This approach has a number of advantages over attempting to implement
same in CM on top of verbs by not destroying the QP:

- Reduce resource usage by freeing the QP (only track QPN+timer)
- Applies to all verbs users even if they bypass CM
- Solves problem for userspace CM where we can't rely on CM
  to enforce timewait
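
The "only track QPN+timer" idea can be sketched in user space roughly as
below. This is a simplified model under stated assumptions: a small fixed
table instead of real timers, and time passed in as a plain number. The names
tw_entry, tw_table, tw_add and tw_can_reuse are hypothetical and do not come
from mthca.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TW_SLOTS 64          /* assumed table size, for illustration */
#define QPN_MASK 0xffffffu   /* QPNs are 24 bits wide */

struct tw_entry {
    uint32_t qpn;      /* QPN that must not be reused yet */
    uint64_t expires;  /* time at which the QPN becomes reusable */
    bool used;
};

static struct tw_entry tw_table[TW_SLOTS];

/* Called when a connected QP is destroyed or leaves RTS: remember its QPN
 * until now + lifetime. Returns false if the table is full. */
static bool tw_add(uint32_t qpn, uint64_t now, uint64_t lifetime)
{
    int i;

    for (i = 0; i < TW_SLOTS; i++) {
        if (!tw_table[i].used) {
            tw_table[i].qpn = qpn & QPN_MASK;
            tw_table[i].expires = now + lifetime;
            tw_table[i].used = true;
            return true;
        }
    }
    return false;
}

/* Called by the QPN allocator: true if 'qpn' may be handed out at time
 * 'now'. Entries whose lifetime has passed are dropped along the way. */
static bool tw_can_reuse(uint32_t qpn, uint64_t now)
{
    int i;

    for (i = 0; i < TW_SLOTS; i++) {
        if (!tw_table[i].used)
            continue;
        if (now >= tw_table[i].expires) {
            tw_table[i].used = false;
            continue;
        }
        if (tw_table[i].qpn == (qpn & QPN_MASK))
            return false;
    }
    return true;
}
```

The QP itself is freed immediately in this model; only the number and its
expiry time are kept.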

More detail can be found in thread I'm replying to.

Please comment.

-- 
MST


From rdreier at cisco.com  Mon Sep 11 16:25:13 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Sep 2006 16:25:13 -0700
Subject: [openib-general] RFC: mthca: implement timewait by tracking QPNs
In-Reply-To: <20060911225256.GE17098@mellanox.co.il> (Michael S.
	Tsirkin's message of "Tue, 12 Sep 2006 01:52:56 +0300")
References: <20060829130908.GA24322@mellanox.co.il>
	<20060911225256.GE17098@mellanox.co.il>
Message-ID: <adamz962dgm.fsf@cisco.com>

My gut reaction is that it seems pretty ugly.  I guess we'll also need
similar patches for ipath and ehca too -- which makes me want to have
this in common code somehow.

Also timewait is really only part of the CM spec -- do we want to
limit the rate of RC QP creation in general for potential non-CM users
that know what they're doing?

I'm not sure the following is a real concern (since a hostile user can
currently just create a ton of QPs and hold onto them forever), but
this also allows someone to create a bunch of QPs with a super-long
timeout and prevent any other QPs from being created for a few hours
(until the timewait expires).

Finally one implementation comment: I think you'll want a list in
addition to QPN + timer, to allow the ib_mthca module to be unloaded
without having to wait an hour for all timers to expire.  This allows
timewait to be bypassed by unloading + reloading but that's no
different than rebooting really.

Another good prophylactic measure would probably be to randomize initial
PSNs for RC connections.  SRP currently does this.
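
As a rough illustration of the idea, a random initial PSN could be produced
like this in user space. PSNs are 24 bits wide, hence the mask.
random_initial_psn is a hypothetical name, rand() merely stands in for
whatever generator the real code uses, and this is not the SRP code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Return a pseudo-random 24-bit starting PSN. rand() only guarantees
 * 15 random bits, so two calls are combined before masking. */
static uint32_t random_initial_psn(void)
{
    uint32_t r = ((uint32_t)rand() << 12) ^ (uint32_t)rand();

    return r & 0xffffffu;
}
```

With a random starting point, a stale packet from an old connection is less
likely to fall inside the window the new connection expects.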

 - R.


From mst at mellanox.co.il  Mon Sep 11 16:37:40 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 12 Sep 2006 02:37:40 +0300
Subject: [openib-general] 4 patches in mst-for-2.6.19
Message-ID: <20060911233740.GA19021@mellanox.co.il>

I have put the following patches in my mst-for-2.6.19 tree:

$git log --pretty=short origin..mst-for-2.6.19
commit ddfe6867088167b64962399934d21cf3e37c338b
Author: Jack Morgenstein <jackm at mellanox.co.il>

    [PATCH] IB/mthca: recover from device errors

commit 4403ad431b139b03a291263be4686363fd04138b
Author: Michael S. Tsirkin <mst at mellanox.co.il>

    [PATCH] IB/cm: do not track remote QPN in timewait state

commit 12f4b3b6fabcccf96ca0fa9911e86c1a6d9fc7a4
Author: Ishai Rabinovitz <ishai at mellanox.co.il>

    [PATCH] IB/srp: don't schedule reconnect from srp, scsi does it for us

commit a6f9624098dada22825d116d104c92bfd34465b2
Author: Ishai Rabinovitz <ishai at mellanox.co.il>

    [PATCH] IB/srp: destroy and re-create QP and CQ on reconnect

You can get them here
git://www.mellanox.co.il/~git/infiniband mst-for-2.6.19

This is against Roland's for-2.6.19 001c6b9030233a14fa27795ab3e6a6f45f16a317

These patches have been posted on the list previously, but let me know
and I'll repost them if needed.

Please comment.

-- 
MST


From mst at mellanox.co.il  Mon Sep 11 16:54:46 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 12 Sep 2006 02:54:46 +0300
Subject: [openib-general] RFC: mthca: implement timewait by tracking QPNs
In-Reply-To: <adamz962dgm.fsf@cisco.com>
References: <adamz962dgm.fsf@cisco.com>
Message-ID: <20060911235446.GB19021@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: RFC: mthca: implement timewait by tracking QPNs
> 
> My gut reaction is that it seems pretty ugly.

Hmm. All of it or just some bits?


> I guess we'll also need
> similar patches for ipath and ehca too -- which makes me want to have
> this in common code somehow.

Could be a library function in core so that ipath etc can reuse it.
But note how there's no dependency between drivers here - no
reason to block change in mthca until ipath/ehca implement this functionality,
too.

> Also timewait is really only part of the CM spec

Not entirely correct. Please look at 9.7.1 - search for "stale packets":

    In addition to duplicate packets and invalid packets, there is a third
    condition, called a Stale Packet ("TIME WAIT packet"). If a connection to a
    responder is torn down and a new connection is established while packets are in
    flight, a packet from the old (stale) connection may arrive at the responder.
    
    The responder, in turn, may interpret this stale incoming packet
    as a valid packet, when in fact it is a remnant of a previous connection.
    
    There are no transport layer mechanisms to guard against this condition;
    it is the responsibility of connection management to avoid re-using QPs
    until there is no possibility that a stale packet could arrive at the responder.
    This is done by placing the requester and responder QPs in a "Time Wait"
    state long enough to ensure that any stale packets left in the fabric have
    expired before re-using those QPs.

So the spec suggests that timewait be implemented in CM, but timewait
is needed to solve a problem that affects the transport layer and that
is described in Chapter 9.

> -- do we want to
> limit the rate of RC QP creation in general for potential non-CM users
> that know what they're doing?

I don't see how this limits the rate of QP creation. Could you explain?

Second, I can see no way for a verbs user to check that there are no stale
packets (AKA TimeWait packets). Is there? So the user only *thinks* he knows
what he's doing, while getting silent data corruption. Correct?

> I'm not sure the following is a real concern (since a hostile user can
> currently just create a ton of QPs and hold onto them forever), but
> this also allows someone to create a bunch of QPs with a super-long
> timeout and prevent any other QPs from being created for a few hours
> (until the timewait expires).

Another reason why this might not be an issue is that the QPN space
is reasonably big - 2^24. I guess when we start looking at limiting
#of QPs per user, we'll need to limit the max legal packet lifetime too.
Might be a good idea anyway.

> Finally one implementation comment: I think you'll want a list in
> addition to QPN + timer, to allow the ib_mthca module to be unloaded
> without having to wait an hour for all timers to expire.  This allows
> timewait to be bypassed by unloading + reloading but that's no
> different than rebooting really.

Sure, that's obvious.

> Another good prophylactic measure would probably be to randomize initial
> PSNs for RC connections.  SRP currently does this.

I agree this also helps.

-- 
MST


From rdreier at cisco.com  Mon Sep 11 18:09:17 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 11 Sep 2006 18:09:17 -0700
Subject: [openib-general] 4 patches in mst-for-2.6.19
In-Reply-To: <20060911233740.GA19021@mellanox.co.il> (Michael S.
	Tsirkin's message of "Tue, 12 Sep 2006 02:37:40 +0300")
References: <20060911233740.GA19021@mellanox.co.il>
Message-ID: <ada7j093n7m.fsf@cisco.com>

OK, I applied

    [PATCH] IB/cm: do not track remote QPN in timewait state

since Sean has acked that already.

I'll review the rest in the next day or two.

 - R.


From rjwalsh at pathscale.com  Mon Sep 11 20:08:32 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Mon, 11 Sep 2006 20:08:32 -0700
Subject: [openib-general] RFC: mthca: implement timewait by tracking QPNs
In-Reply-To: <20060911235446.GB19021@mellanox.co.il>
References: <adamz962dgm.fsf@cisco.com> <20060911235446.GB19021@mellanox.co.il>
Message-ID: <450624B0.3010709@pathscale.com>

> Could be a library function in core so that ipath etc can reuse it.
> But note how there's no dependency between drivers here - no
> reason to block change in mthca until ipath/ehca implement this functionality,
> too.

True.  But FWIW, we (QLogic) could probably spin something like this 
pretty quickly anyway.

Regards,
  Robert.


From zhushisongzhu at yahoo.com  Mon Sep 11 20:46:47 2006
From: zhushisongzhu at yahoo.com (zhu shi song)
Date: Mon, 11 Sep 2006 20:46:47 -0700 (PDT)
Subject: [openib-general] why sdp connections cost so much memory
In-Reply-To: <20060911110524.GB11825@mellanox.co.il>
Message-ID: <20060912034647.9016.qmail@web36909.mail.mud.yahoo.com>


--- "Michael S. Tsirkin" <mst at mellanox.co.il> wrote:

> Quoting r. zhu shi song <zhushisongzhu at yahoo.com>:
> > Subject: Re: why sdp connections cost so much
> memory
> > 
> > > You should not need this change with the scale
> patch
> > > I posted - after applying
> > > this, and setting the scale parameter to 0x1,
> each
> > > connection should use around
> > > 128K for RX. Please confirm.
> > Just setting the scale parameter to 0x1, memory
> > reduction is OK.  But there occurred one bug,
> > sometimes my kernel crashed.
> 
> Shouldn't happen. Backtrace?
> 
> > So I think PRE POST buf
> > size should be changed either.
> >   zhu
> 
> Hmm. I don't really see how this would help.
> Is it true that changing just the RX size fixes the
> crashes for you?
> If yes I'd like to investigate.
> 
> -- 
> MST
> 

(1) With RX_SIZE=0x4 and TX_SIZE=0x4, I ran my testbench
30 times and there was no kernel crash. I found SDP worked
faster and more stably after I changed the RX and TX sizes.
(2) With RX_SIZE=0x40 and TX_SIZE=0x40, I could only run my
testbench a few times before the kernel crashed.
The results are very different for the two cases.

zhu


From krkumar2 at in.ibm.com  Mon Sep 11 21:27:50 2006
From: krkumar2 at in.ibm.com (Krishna Kumar2)
Date: Tue, 12 Sep 2006 09:57:50 +0530
Subject: [openib-general] CMA issue: bind selects the same port after
 close
In-Reply-To: <20060911221633.GB17098@mellanox.co.il>
Message-ID: <OFE49D8CF3.9457DE5E-ON652571E7.0017D094-652571E7.0017E8AC@in.ibm.com>

Hi Michael,

> > The basic problem in the CMA is in cma_alloc_port().  If the port number
> > (passed in as snum) is 0, the first available port starting at
> > sysctl_local_port_range[0] is used.  We could instead start our search by
> > adding an increasing counter or a random value to the lower-end of the port
> > range.  Then expand the code to handle searching below our starting value
> > if we failed to find one above it.
> 
> Sounds good.
> 
> > Are the port numbers assigned by TCP sequential or more random?
> 
> TCP ports seem to be sequential.

Are you getting sequential port numbers? inet_csk_get_port() actually uses
a random number to pick the *starting* value between
sysctl_local_port_range[0] and sysctl_local_port_range[1]. Once it has this
starting number, it goes sequentially all the way to the high limit
(sysctl*[1]) and then loops back from the low limit (sysctl*[0]) until all
the numbers in between have been looked at.

I think we can easily use the same logic. It follows Sean's second option:
"adding a random value to the lower-end of the port range".

Thanks,

- KK


From krkumar2 at in.ibm.com  Mon Sep 11 21:31:50 2006
From: krkumar2 at in.ibm.com (Krishna Kumar2)
Date: Tue, 12 Sep 2006 10:01:50 +0530
Subject: [openib-general] [PATCH] Optimize cma_process_remove()
In-Reply-To: <4505B068.8050501@ichips.intel.com>
Message-ID: <OF3C82975D.FF08E852-ON652571E7.00171C95-652571E7.0018469A@in.ibm.com>

Hi Sean,

> I don't think that this will work.  The issue is that we need to walk a list
> of IDs associated with a particular device to notify the user that the device
> is being removed.  While we're doing that, the user could try to destroy the
> ID, which removes the ID from the device list.
> 
> The original code takes a reference on the ID before removing it from
> cma_dev's list to ensure that the ID will be valid while we process it.  The
> remove list ensures that the user is only notified once of a device removal.
> (We don't know where the thread calling rdma_destroy_id() is at.)

Yes, you are right - I missed the parallel rdma_destroy_id's. How about
something like this then (it is cleaner than dropping/re-getting locks):

        mutex_lock(&lock);
        /* Drain the device's ID list while holding the lock. */
        while (!list_empty(&cma_dev->id_list)) {
                id_priv = list_entry(cma_dev->id_list.next,
                                     struct rdma_id_private, list);

                if (cma_internal_listen(id_priv)) {
                        cma_destroy_listen(id_priv);
                } else {
                        /* Take the reference before the lock is dropped. */
                        atomic_inc(&id_priv->refcount);
                        list_del(&id_priv->list);
                        list_add_tail(&id_priv->list, &remove_list);
                }
        }
        mutex_unlock(&lock);

        /* Notify the collected IDs without holding the lock. */
        list_for_each_entry_safe(id_priv, tmp, &remove_list, list) {
                ret = cma_remove_id_dev(id_priv);
                cma_deref_id(id_priv);
                if (ret)
                        rdma_destroy_id(&id_priv->id);
        }

thanks,

- KK


From bugzilla-daemon at openib.org  Mon Sep 11 21:54:43 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Mon, 11 Sep 2006 21:54:43 -0700 (PDT)
Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad thread
	on latest processors
Message-ID: <20060912045443.27D7C2283D4@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=229


------- Comment #2 from sweitzen at cisco.com  2006-09-11 21:54 -------
Cisco embedded SM on a switch, thus no SM on a host, only IB drivers.


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From sweitzen at cisco.com  Mon Sep 11 22:55:48 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 11 Sep 2006 22:55:48 -0700
Subject: [openib-general] [openfabrics-ewg] OFED 1.1 status
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EAF3C@xmb-sjc-216.amer.cisco.com>

When will rc4 be available?  I'd also like to suggest we not rush the
final build, end of this week seems too soon.
 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

________________________________

	From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Tziporet Koren
	Sent: Thursday, September 07, 2006 1:02 PM
	To: EWG
	Cc: openib
	Subject: [openfabrics-ewg] OFED 1.1 status
	
	
	Hi,

	OFED 1.1 RC4 will be published on Monday 11-Sep.

	We currently work on several showstoppers:

	1.	223: mthca.so not properly linked to libibverbs - Vlad & Jack
	2.	221: SRP on V40Z and Sun T4 gets Kernel BUG at spinlock:118 - Roland
	3.	219: OFED 1.1rc3 contains prerelease unstable libibverbs code - Vlad & Jack

	 
	Thus final release date will be delayed to end of next week

	 
	Tziporet Koren

	Software Director

	Mellanox Technologies

	mailto: tziporet at mellanox.co.il
	Tel +972-4-9097200, ext 380

	 

From mst at mellanox.co.il  Mon Sep 11 23:01:52 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 12 Sep 2006 09:01:52 +0300
Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad
 thread on latest processors
In-Reply-To: <20060912045443.27D7C2283D4@openib.ca.sandia.gov>
References: <20060912045443.27D7C2283D4@openib.ca.sandia.gov>
Message-ID: <20060912060152.GA14719@mellanox.co.il>

Quoting r. bugzilla-daemon at openib.org <bugzilla-daemon at openib.org>:
> Subject: [Bug 229] heavy CPU load can starve ib_mad thread on latest processors
> 
> http://openib.org/bugzilla/show_bug.cgi?id=229
> 
> 
> 
> 
> 
> ------- Comment #2 from sweitzen at cisco.com  2006-09-11 21:54 -------
> Cisco embedded SM on a switch, thus no SM on a host, only IB drivers.

Looks like we'll add the workaround for ofed.
What renice level are you using?

-- 
MST


From sweitzen at cisco.com  Mon Sep 11 23:02:33 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Mon, 11 Sep 2006 23:02:33 -0700
Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad
 thread on latest processors
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EAF3E@xmb-sjc-216.amer.cisco.com>

I only tested with renice -20.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] 
> Sent: Monday, September 11, 2006 11:02 PM
> To: Scott Weitzenkamp (sweitzen)
> Cc: openib-general at openib.org
> Subject: Re: [Bug 229] heavy CPU load can starve ib_mad 
> thread on latest processors
> 
> Quoting r. bugzilla-daemon at openib.org <bugzilla-daemon at openib.org>:
> > Subject: [Bug 229] heavy CPU load can starve ib_mad thread 
> on latest processors
> > 
> > http://openib.org/bugzilla/show_bug.cgi?id=229
> > 
> > 
> > 
> > 
> > 
> > ------- Comment #2 from sweitzen at cisco.com  2006-09-11 21:54 -------
> > Cisco embedded SM on a switch, thus no SM on a host, only 
> IB drivers.
> 
> Looks like we'll add the workaround for ofed.
> What renice level are you using?
> 
> -- 
> MST
> 


From mst at mellanox.co.il  Mon Sep 11 23:09:14 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 12 Sep 2006 09:09:14 +0300
Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad
 thread on latest processors
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EAF3E@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EAF3E@xmb-sjc-216.amer.cisco.com>
Message-ID: <20060912060914.GC14719@mellanox.co.il>

Hmm, OK.
I'd like to figure out whether this could be something other than a scheduler
issue.
Could you test on kernel 2.6.18 or 2.6.17 please?
If this is a scheduler issue, there's a chance scheduler is more fair there.

Quoting r. Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com>:
> Subject: RE: [Bug 229] heavy CPU load can starve ib_mad thread on latest processors
> 
> I only tested with renice -20.
> 
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>  
> 
> > -----Original Message-----
> > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] 
> > Sent: Monday, September 11, 2006 11:02 PM
> > To: Scott Weitzenkamp (sweitzen)
> > Cc: openib-general at openib.org
> > Subject: Re: [Bug 229] heavy CPU load can starve ib_mad 
> > thread on latest processors
> > 
> > Quoting r. bugzilla-daemon at openib.org <bugzilla-daemon at openib.org>:
> > > Subject: [Bug 229] heavy CPU load can starve ib_mad thread 
> > on latest processors
> > > 
> > > http://openib.org/bugzilla/show_bug.cgi?id=229
> > > 
> > > 
> > > 
> > > 
> > > 
> > > ------- Comment #2 from sweitzen at cisco.com  2006-09-11 21:54 -------
> > > Cisco embedded SM on a switch, thus no SM on a host, only 
> > IB drivers.
> > 
> > Looks like we'll add the workaround for ofed.
> > What renice level are you using?
> > 
> > -- 
> > MST
> > 
> 

-- 
MST


From bugzilla-daemon at openib.org  Mon Sep 11 23:14:17 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Mon, 11 Sep 2006 23:14:17 -0700 (PDT)
Subject: [openib-general] [Bug 229] heavy CPU load can starve ib_mad thread
	on latest processors
Message-ID: <20060912061417.B9B2F2283D8@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=229


------- Comment #3 from sweitzen at cisco.com  2006-09-11 23:14 -------
Put email in bugzilla:

Hmm, OK.
I'd like to figure out whether this could be something other than a scheduler
issue.
Could you test on kernel 2.6.18 or 2.6.17 please?
If this is a scheduler issue, there's a chance scheduler is more fair there.

Quoting r. Scott Weitzenkamp (sweitzen) <sweitzen at cisco.com>:
> Subject: RE: [Bug 229] heavy CPU load can starve ib_mad thread on latest processors
> 
> I only tested with renice -20.
> 
> Scott Weitzenkamp
> SQA and Release Manager
> Server Virtualization Business Unit
> Cisco Systems
>  
> 
> > -----Original Message-----
> > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] 
> > Sent: Monday, September 11, 2006 11:02 PM
> > To: Scott Weitzenkamp (sweitzen)
> > Cc: openib-general at openib.org
> > Subject: Re: [Bug 229] heavy CPU load can starve ib_mad 
> > thread on latest processors
> > 
> > Quoting r. bugzilla-daemon at openib.org <bugzilla-daemon at openib.org>:
> > > Subject: [Bug 229] heavy CPU load can starve ib_mad thread 
> > on latest processors
> > > 
> > > http://openib.org/bugzilla/show_bug.cgi?id=229
> > > 
> > > 
> > > 
> > > 
> > > 
> > > ------- Comment #2 from sweitzen at cisco.com  2006-09-11 21:54 -------
> > > Cisco embedded SM on a switch, thus no SM on a host, only 
> > IB drivers.
> > 
> > Looks like we'll add the workaround for ofed.
> > What renice level are you using?
> > 
> > -- 
> > MST
> > 
> 


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From k_mahesh85 at yahoo.co.in  Mon Sep 11 23:28:50 2006
From: k_mahesh85 at yahoo.co.in (keshetti mahesh)
Date: Tue, 12 Sep 2006 07:28:50 +0100 (BST)
Subject: [openib-general] reason behind locking the WQs while checking the
 state in modify_qp?
Message-ID: <20060912062851.56875.qmail@web8315.mail.in.yahoo.com>

Hello all,

I recently read through the discussion of how you decided to split the QP
lock into separate WQ locks, and of the resulting locking mechanism:

http://openib.org/pipermail/openib-general/2005-February/004491.html

That patch mentions that the only place we will be taking the lock is in
modify_qp, while checking the state of the QP, but it gives no description
of why this is required.

My question is: why is it required to lock the WQs there? Does the QP state
depend in any way on the posting of WRs?

-Mahesh



From ogerlitz at voltaire.com  Tue Sep 12 00:58:49 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 12 Sep 2006 10:58:49 +0300 (IDT)
Subject: [openib-general] [PATCH] RDMA/cma: document error flow of
	rdma_accept
Message-ID: <Pine.LNX.4.64.0609121053140.13564@zuben>

Document that, on error, rdma_accept sends a reject message and moves the QP to the error state.

Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>

diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
index 402c63d..f932c16 100644
--- a/include/rdma/rdma_cm.h
+++ b/include/rdma/rdma_cm.h
@@ -237,6 +237,10 @@ int rdma_listen(struct rdma_cm_id *id, i
  * Typically, this routine is only called by the listener to accept a connection
  * request.  It must also be called on the active side of a connection if the
  * user is performing their own QP transitions.
+ *
+ * In the case of error, a reject message is sent to the remote side and the
+ * state of the qp associated with the id is modified to error, such that any
+ * previously posted receive buffers would be flushed.
  */
 int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param);


From ogerlitz at voltaire.com  Tue Sep 12 01:33:22 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 12 Sep 2006 11:33:22 +0300
Subject: [openib-general] [PATCH] IB/cma: add rdma_establish
In-Reply-To: <4505E130.8010301@ichips.intel.com>
References: <20060907214524.GA14791@mellanox.co.il>
	<4505E130.8010301@ichips.intel.com>
Message-ID: <450670D2.4040805@voltaire.com>

Sean Hefty wrote:
> Michael S. Tsirkin wrote:
>> Sean, did we decide what to do for upstream yet?
>> I would say we need something like the below for 2.6.19 too
>> (probably just need to update node type check).
>> And, I like it that this approach leaves all matters of policy
>> to users (such as whether move QP to RTS after asynchronous event
>> or after completion event).

> I will go with a patch similar to this one.  It seems the most flexible.

Just to make sure: are you saying that you would merge this patch 
instead of the one that had the CM track local QP numbers and install a 
callback for the consumer QP to catch the async event, etc.?

Also, I'd like to make sure I follow what would happen:

T1) the consumer gets an RX completion on a QP associated with a 
non-established CMA ID

[also, at some point in time, the async handler is called with a 
COMM_EST async event for this QP]

T2) the consumer calls rdma_establish()

T3) the consumer cma callback is called with ESTABLISHED event and is 
now able to post sends to the QP

Indeed, the **patch** itself is somewhat simpler, but the consumer 
must get the ESTABLISHED event before posting sends to the QP, so they need 
to either queue RXes or modify the QP to RTS before sending the REP.

As I said before, this is fine with our iser target, as we queue the sole 
possible RX (login request) till getting the established.
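
To make the T1-T3 ordering concrete, here is a minimal user-space sketch of a consumer that queues receives until the ESTABLISHED event arrives. It is an illustration under stated assumptions only: `struct conn`, `on_rx_completion`, `on_established`, and `MAX_PENDING` are hypothetical names, no RDMA CM call is actually made, and the point where the proposed rdma_establish() would be invoked is marked only by a comment and a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 4

/* Hypothetical consumer-side connection state; not part of the RDMA CM API. */
struct conn {
	bool   established;           /* set once the ESTABLISHED event arrives */
	bool   establish_requested;   /* marks that T2 has been triggered */
	int    pending[MAX_PENDING];  /* ids of receives seen before ESTABLISHED */
	size_t npending;
	int    processed;             /* count of receives handled so far */
};

/* T1: receive completion. Returns false if the pending queue is full. */
static bool on_rx_completion(struct conn *c, int rx_id)
{
	assert(c != NULL);
	if (c->established) {
		c->processed++;       /* normal path: handle immediately */
		return true;
	}
	if (c->npending == MAX_PENDING)
		return false;
	c->pending[c->npending++] = rx_id;
	/* T2: a real consumer would call the proposed rdma_establish() here. */
	c->establish_requested = true;
	return true;
}

/* T3: the CMA callback delivered ESTABLISHED; sends are allowed from now on. */
static void on_established(struct conn *c)
{
	size_t i;

	assert(c != NULL);
	c->established = true;
	for (i = 0; i < c->npending; i++)
		c->processed++;       /* replay each queued receive in order */
	c->npending = 0;
}
```

In this sketch nothing is processed, and so no send could be posted in response, until `on_established` has run, which is the constraint described above.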

Is rdma_establish() --> cm_establish() callable from a non-interruptible 
(hard-irq) context? Our target does a context jump once the CQ handler is 
called, so it does the actual processing at thread level, but there may be 
other consumers attempting to call rdma_establish() from the hard-irq CQ 
callback context.

Also, does the patch ensure that only one ESTABLISHED event is delivered 
for the id, no matter whether rdma_establish() and an RTU reception happen 
in parallel?

>> As a side note, reasons for frequent loss of RTU must be investigated.

> A lost RTU shouldn't be any more likely than a lost REQ or REP.  Is the RTU 
> never showing up?  I will look into the ib_cm and see if there's an issue that 
> would cause an RTU not to be retried.

Indeed, my initial suspicion was that heavy CPU load on the server node 
prevents the mad/cm threads from being scheduled in, but as REQ messages do 
appear, I also thought we should check whether a retried REP causes a 
resend of the RTU.


From ogerlitz at voltaire.com  Tue Sep 12 01:43:18 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 12 Sep 2006 11:43:18 +0300
Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding
 device during destruction
In-Reply-To: <44FBC374.8040709@voltaire.com>
References: <000001c6cfd2$737e0bc0$51d8180a@amr.corp.intel.com>
	<44FBC374.8040709@voltaire.com>
Message-ID: <45067326.5070305@voltaire.com>

Or Gerlitz wrote:
> diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
> index 402c63d..b9e22c8 100644
> --- a/include/rdma/rdma_cm.h
> +++ b/include/rdma/rdma_cm.h
> @@ -117,6 +117,14 @@ struct rdma_cm_id {
>   struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
>   				  void *context, enum rdma_port_space ps);
> 
> +/**
> + * rdma_destroy_id - Destroys an RDMA identifier.
> + *
> + * @id: RDMA identifier.
> + *
> + * Note: calling this function has the effect of canceling in-flight
> + * asynchronous operations associated with the id.
> + */
>   void rdma_destroy_id(struct rdma_cm_id *id);
> 
>   /**

Hi Sean,

Can you queue this for 2.6.19 ?

Or.


From ogerlitz at voltaire.com  Tue Sep 12 01:46:52 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 12 Sep 2006 11:46:52 +0300
Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding
 device during destruction
In-Reply-To: <45067326.5070305@voltaire.com>
References: <000001c6cfd2$737e0bc0$51d8180a@amr.corp.intel.com>
	<44FBC374.8040709@voltaire.com> <45067326.5070305@voltaire.com>
Message-ID: <450673FC.3000309@voltaire.com>

Or Gerlitz wrote:
> Or Gerlitz wrote:
>> diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
>> index 402c63d..b9e22c8 100644
>> --- a/include/rdma/rdma_cm.h
>> +++ b/include/rdma/rdma_cm.h
>> @@ -117,6 +117,14 @@ struct rdma_cm_id {
>>   struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
>>                     void *context, enum rdma_port_space ps);
>>
>> +/**
>> + * rdma_destroy_id - Destroys an RDMA identifier.
>> + *
>> + * @id: RDMA identifier.
>> + *
>> + * Note: calling this function has the effect of canceling in-flight
>> + * asynchronous operations associated with the id.
>> + */
>>   void rdma_destroy_id(struct rdma_cm_id *id);
>>
>>   /**
> 
> Hi Sean,
> 
> Can you queue this for 2.6.19 ?
> 
> Or.
> 


From krkumar2 at in.ibm.com  Tue Sep 12 02:33:04 2006
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Tue, 12 Sep 2006 15:03:04 +0530
Subject: [openib-general] [RFC] [PATCH] Re: CMA issue : bind selects the
 same port after close
Message-ID: <20060912093304.6648.62748.sendpatchset@K50wks273895wss.in.ibm.com>

> The basic problem in the CMA is in cma_alloc_port().  If the port number
> (passed in as snum) is 0, the first available port starting at
> sysctl_local_port_range[0] is used.  We could instead start our search by
> adding an increasing counter or a random value to the lower-end of the port
> range.  Then expand the code to handle searching below our starting value 
> if we failed to find one above it.

Implement the above method: start the search for a port number at a
random offset from the lower end of the port range and, on failure,
search again from the lower end of the port range.

(only compile tested)

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>

diff -ruNp org/core/cma.c new/core/cma.c
--- org/core/cma.c	2006-09-12 11:25:18.000000000 +0530
+++ new/core/cma.c	2006-09-12 14:28:26.000000000 +0530
@@ -1652,12 +1652,21 @@ static int cma_alloc_port(struct idr *ps
 {
 	struct rdma_bind_list *bind_list;
 	int port, start, ret;
+	int out_of_range;
 
 	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
 	if (!bind_list)
 		return -ENOMEM;
 
-	start = snum ? snum : sysctl_local_port_range[0];
+	if (snum) {
+		start = snum;
+	} else {
+		int low = sysctl_local_port_range[0];
+		int high = sysctl_local_port_range[1];
+
+		get_random_bytes(&start, sizeof start);
+		start = start % (high - low) + low;
+	}
 
 	do {
 		ret = idr_get_new_above(ps, bind_list, start, &port);
@@ -1666,8 +1675,21 @@ static int cma_alloc_port(struct idr *ps
 	if (ret)
 		goto err;
 
-	if ((snum && port != snum) ||
-	    (!snum && port > sysctl_local_port_range[1])) {
+	out_of_range = 0;
+	if (!snum && port > sysctl_local_port_range[1]) {
+		/*
+		 * Couldn't find one from random() off of start, try from
+		 * low.
+		 */
+		idr_remove(ps, port);
+		start = sysctl_local_port_range[0];
+		do {
+			ret = idr_get_new_above(ps, bind_list, start, &port);
+		} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
+		if (port > sysctl_local_port_range[1])
+			out_of_range = 1;
+	}
+	if ((snum && port != snum) || out_of_range) {
 		idr_remove(ps, port);
 		ret = -EADDRNOTAVAIL;
 		goto err;
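
For reference, the search order described above (random starting point, then wrap to the low end) can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the kernel code: `pick_port`, `in_use`, and `PORT_SPACE` are hypothetical names, a boolean array stands in for the idr, and the random value is passed in by the caller. The sketch reduces the random value with unsigned arithmetic so that the starting port cannot come out negative, and it treats the upper bound of the range as inclusive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the idr: in_use[p] is true when port p is taken. */
#define PORT_SPACE 65536

/*
 * Pick a free port in [low, high], starting at a random offset and wrapping
 * around to low if nothing is free at or above the starting point.
 * Returns the port, or -1 if the whole range is in use.
 */
static int pick_port(const bool *in_use, int low, int high, unsigned int rnd)
{
	int span, start, i;

	assert(0 <= low && low <= high && high < PORT_SPACE);

	span = high - low + 1;                         /* inclusive range */
	start = low + (int)(rnd % (unsigned int)span); /* unsigned: never negative */

	for (i = 0; i < span; i++) {
		int port = low + (start - low + i) % span; /* wraps past high */

		if (!in_use[port])
			return port;
	}
	return -1;
}
```

A single wrapped loop covers both the search from the starting point upward and the fallback search from the low end, so each port in the range is examined at most once.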


From mst at mellanox.co.il  Tue Sep 12 02:14:44 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 12 Sep 2006 12:14:44 +0300
Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding
 device during destruction
In-Reply-To: <45067326.5070305@voltaire.com>
References: <000001c6cfd2$737e0bc0$51d8180a@amr.corp.intel.com>
	<44FBC374.8040709@voltaire.com> <45067326.5070305@voltaire.com>
Message-ID: <20060912091443.GA15301@mellanox.co.il>

Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: Re: [PATCH] for-2.6.19 cma: protect against adding device during destruction
> 
> Or Gerlitz wrote:
> > diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
> > index 402c63d..b9e22c8 100644
> > --- a/include/rdma/rdma_cm.h
> > +++ b/include/rdma/rdma_cm.h
> > @@ -117,6 +117,14 @@ struct rdma_cm_id {
> >   struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
> >   				  void *context, enum rdma_port_space ps);
> > 
> > +/**
> > + * rdma_destroy_id - Destroys an RDMA identifier.
> > + *
> > + * @id: RDMA identifier.
> > + *
> > + * Note: calling this function has the effect of canceling in-flight
> > + * asynchronous operations associated with the id.
> > + */
> >   void rdma_destroy_id(struct rdma_cm_id *id);
> > 
> >   /**
> 
> Hi Sean,
> 
> Can you queue this for 2.6.19 ?

You might want to repost, with proper Signed-off-by line, subject and patch description.

Hint: git-applymbox seems to like mail in the following format:

Subject: [PATCH] IB/xx: .....

Short description - goes into message

Signed-off-by: xxxx

---

Long discussion - including requests for inclusion,
etc. Will be ignored by git.

diff .....
diff --git xxxxx
index yyyy
--- zzzzzzzzzzzzz
+++ zzzzzzzzzzzzz
@@ prqs
  Patch itself


Arbitrary discussion - will be ignored by git.

-- 
MST


From halr at voltaire.com  Tue Sep 12 02:35:15 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 12 Sep 2006 05:35:15 -0400
Subject: [openib-general] [PATCH][TRIVIAL] OpenSM: Eliminate unused
 max_port_profile parameter
Message-ID: <1158053698.27427.144058.camel@hal.voltaire.com>

OpenSM: Eliminate unused max_port_profile parameter in OpenSM subnet
options structure

Signed-off-by: Hal Rosenstock <halr at voltaire.com>

Index: include/opensm/osm_subnet.h
===================================================================
--- include/opensm/osm_subnet.h	(revision 9424)
+++ include/opensm/osm_subnet.h	(working copy)
@@ -269,7 +269,6 @@ typedef struct _osm_subn_opt
   boolean_t                console;
   cl_map_t                 port_prof_ignore_guids;
   boolean_t                port_profile_switch_nodes;
-  uint32_t                 max_port_profile;
   osm_pfn_ui_extension_t   pfn_ui_pre_lid_assign;
   void *                   ui_pre_lid_assign_ctx;
   osm_pfn_ui_mcast_extension_t pfn_ui_mcast_fdb_assign;
@@ -405,10 +404,6 @@ typedef struct _osm_subn_opt
 *		If TRUE will count the number of switch nodes routed through
 *		the link. If FALSE - only CA/RT nodes are counted.
 *
-*	max_port_profile
-*		Prevent routing through a port subscribed with more than this
-*		number of routes.
-*
 *	pfn_ui_pre_lid_assign
 *		A UI function to be invoked prior to lid assigment. It should
 *		return 1 if any change was made to any lid or 0 otherwise.
Index: include/opensm/osm_switch.h
===================================================================
--- include/opensm/osm_switch.h	(revision 9347)
+++ include/opensm/osm_switch.h	(working copy)
@@ -1108,7 +1108,6 @@ osm_switch_recommend_path(
 	IN OUT uint16_t *p_num_used_sys,
 	IN OUT uint64_t *remote_node_guids,
 	IN OUT uint16_t *p_num_used_nodes,
-	IN const uint32_t max_routes_subscribed,
 	IN boolean_t      ui_ucast_fdb_assign_func_defined
  );
 /*
@@ -1139,12 +1138,6 @@ osm_switch_recommend_path(
 *  p_num_used_nodes
 *     [in out] The number of remote nodes used for routing to the port.
 *
-*  max_routes_subscribed
-*     [in] The maximum allowed number of target lids routed through 
-*     a specific port of the switch. If the port already assigned 
-*     (in the lfdb) this number of target lids - it will not be used
-*     even if it has the smallest hops count to the target lid.
-*
 *  ui_ucast_fdb_assign_func_defined
 *     [in] If TRUE - this means that there is a ui ucast_fdb_assign table
 *     function defined (in pfn_ui_ucast_fdb_assign in subnet opts). This
Index: opensm/osm_subnet.c
===================================================================
--- opensm/osm_subnet.c	(revision 9423)
+++ opensm/osm_subnet.c	(working copy)
@@ -483,7 +483,6 @@ osm_subn_set_default_opt(
   p_opt->no_qos = FALSE;
   p_opt->accum_log_file = TRUE;
   p_opt->port_profile_switch_nodes = FALSE;
-  p_opt->max_port_profile = 0xffffffff;
   p_opt->pfn_ui_pre_lid_assign = NULL;
   p_opt->ui_pre_lid_assign_ctx = NULL;
   p_opt->pfn_ui_mcast_fdb_assign = NULL;
Index: opensm/osm_switch.c
===================================================================
--- opensm/osm_switch.c	(revision 9427)
+++ opensm/osm_switch.c	(working copy)
@@ -233,7 +233,6 @@ osm_switch_recommend_path(
   IN OUT uint16_t *p_num_used_sys,
   IN OUT uint64_t *remote_node_guids,
   IN OUT uint16_t *p_num_used_nodes,
-  IN const uint32_t max_routes_subscribed,
   IN boolean_t      ui_ucast_fdb_assign_func_defined
   )
 {
@@ -425,8 +424,7 @@ osm_switch_recommend_path(
         /*
           the count is min but also lower then the max subscribed
         */
-        if( (check_count < least_paths) &&
-            (check_count <= max_routes_subscribed))
+        if( check_count < least_paths )
         {
           port_found = TRUE;
           best_port = port_num;
Index: opensm/osm_ucast_mgr.c
===================================================================
--- opensm/osm_ucast_mgr.c	(revision 9347)
+++ opensm/osm_ucast_mgr.c	(working copy)
@@ -281,7 +281,7 @@ __osm_ucast_mgr_dump_ucast_routes(
       best_port = osm_switch_recommend_path(
         p_sw, lid_ho, TRUE,
         NULL, NULL, NULL, NULL, /* No LMC Optimization */
-        0xffffffff, ui_ucast_fdb_assign_func_defined );
+        ui_ucast_fdb_assign_func_defined );
       sprintf( line, "No %u hop path possible via port %u!",
                best_hops, best_port );
       strcat( p_mgr->p_report_buf, line );
@@ -752,12 +752,10 @@ __osm_ucast_mgr_process_port(
       port = osm_switch_recommend_path( p_sw, lid_ho, ignore_existing,
                                         remote_sys_guids, &num_used_sys,
                                         remote_node_guids, &num_used_nodes,
-                                        p_mgr->p_subn->opt.max_port_profile,
                                         ui_ucast_fdb_assign_func_defined );
     else
       port = osm_switch_recommend_path( p_sw, lid_ho, ignore_existing,
                                         NULL, NULL, NULL, NULL,
-                                        p_mgr->p_subn->opt.max_port_profile,
                                         ui_ucast_fdb_assign_func_defined );
 
     /*


From michael.arndt at informatik.tu-chemnitz.de  Tue Sep 12 04:20:58 2006
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Tue, 12 Sep 2006 13:20:58 +0200
Subject: [openib-general] OpenSM Multiple HCA cards on the same host
Message-ID: <002901c6d65d$858004e0$21606d86@one7>

Hi,

The osm/docs mention that the next release will support multiple HCA cards 
on the same host. Does anybody know when this release is coming, or whether 
there is any other implementation that works with multiple HCA cards? 
Maybe a pre-release version is available?

thanks Michael Arndt 


From tziporet at dev.mellanox.co.il  Tue Sep 12 04:32:46 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Tue, 12 Sep 2006 14:32:46 +0300
Subject: [openib-general] [openfabrics-ewg] OFED 1.1 status
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EAF3C@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EAF3C@xmb-sjc-216.amer.cisco.com>
Message-ID: <45069ADE.3000503@dev.mellanox.co.il>

Scott Weitzenkamp (sweitzen) wrote:
> When will rc4 be available?  I'd also like to suggest we not rush the 
> final build, end of this week seems too soon.
>  
> Scott Weitzenkamp

RC4 will be out today or tomorrow.
The final build is planned for the middle to end of next week.

Tziporet


From tziporet at dev.mellanox.co.il  Tue Sep 12 04:45:29 2006
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Tue, 12 Sep 2006 14:45:29 +0300
Subject: [openib-general] On vacation on Sep-13 till 12-Oct
Message-ID: <45069DD9.5070506@dev.mellanox.co.il>

Hi,
I am going on a month's vacation starting today.
I will not read emails during the vacation. :-)

During my absence the release coordination will be done by Aviram Gutman 
and Vlad Sokolovsky.
Michael Tsirkin will be responsible for approving new patches for OFED 1.1.
Jack will be responsible for all release documents.

"Shana Tova"

Tziporet


From halr at voltaire.com  Tue Sep 12 04:36:03 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 12 Sep 2006 07:36:03 -0400
Subject: [openib-general] OpenSM Multiple HCA cards on the same host
In-Reply-To: <002901c6d65d$858004e0$21606d86@one7>
References: <002901c6d65d$858004e0$21606d86@one7>
Message-ID: <1158060928.27427.147885.camel@hal.voltaire.com>

Hi Michael,

On Tue, 2006-09-12 at 07:20, Michael Arndt wrote:
> Hi,
> 
> in the osm/docs

Which doc ?

BTW, what version of OpenSM are you using ?

>  is mentioned that at the next release multiple HCA cards on 
> the same host will be supported.

If I understand your question correctly, OpenIB OpenSM supports multiple
HCA cards. The HCA port is chosen by specifying the port GUID. If one is
not specified, the first available port (which is LinkUp) is chosen.
Both Mellanox and QLogic HCAs can support running an SM.

>  does anybody know when this release comes 
> or if there is any other implementation which works for multiple HCA cards. 
> Maybe a pre-version is available?

Let me know if your question was different from what I answered above.

-- Hal

> thanks Michael Arndt 
> 
> 


From dotanb at dev.mellanox.co.il  Tue Sep 12 05:15:00 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Tue, 12 Sep 2006 15:15:00 +0300
Subject: [openib-general] an example to use of multicast messages over the
 verbs exists in the openib svn
Message-ID: <4506A4C4.9070907@dev.mellanox.co.il>

Hi all.

In 
https://openib.org/svn/trunk/contrib/mellanox/ibtp/gen2/userspace/useraccess/multicast_test 
there is
an example of using multicast messages over the verbs.

This test (for now) doesn't send any join message to the SA; it only 
attaches the QP to (and detaches it from) the multicast group.

Dotan


From thomas.bub at thomson.net  Tue Sep 12 05:23:56 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Tue, 12 Sep 2006 14:23:56 +0200
Subject: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64 SLES
	10 machine
Message-ID: <B79FAF8BB536314E859EA1963CFFD22201FBD392@wdtssmail01.eu.thmulti.com>

Just migrated from SLES 9 x86_64 to SLES 10 x86_64 in order to get
32-Bit support.
Stumbled over some installation problems. 
First I tried "All packages" then "Basic install". Both failed to build
at different places. 
Only a "customized" installation worked.
Find the details below.

Thomas Bub

An All packages fails at:

gcc
-Wp,-MD,/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/.i
scsi_iser.o.d  -nostdinc -isystem
/usr/lib64/gcc/x86_64-suse-linux/4.1.0/include -D__KERNEL__
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/include
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include
-Iinclude  -Iinclude2 -I/usr/src/linux-2.6.16.21-0.8/include
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser  -Wall
-Wundef -Wstrict-prototypes -Wno-trigraphs
-Werror-implicit-function-declaration -fno-strict-aliasing -fno-common
-ffreestanding -Os -fomit-frame-pointer -mtune=generic -m64
-mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks
-Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time
-mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wdeclaration-after-statement
-Wno-pointer-sign -I/var/tmp/OFEDRPM/BUILD/openib-1.1/include
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/ipoib
-I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/debug  -DMODULE
-D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(iscsi_iser)"
-D"KBUILD_MODNAME=KBUILD_STR(ib_iser)" -c -o
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/.tmp_iscsi
_iser.o
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser
.c
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser
.c: In function 'iscsi_iser_set_param':
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser
.c:478: error: implicit declaration of function 'iscsi_set_param'
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser
.c: At top level:
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser
.c:612: warning: initialization from incompatible pointer type
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser
.c:613: error: 'iscsi_conn_get_param' undeclared here (not in a
function)
/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser
.c:614: error: 'iscsi_session_get_param' undeclared here (not in a
function)

A Basic install fails at:

gcc -DHAVE_CONFIG_H -I. -I. -I. -I../libibverbs/include -Wall
-D_GNU_SOURCE -g -O2 -MT src_ipathverbs_la-ipathverbs.lo -MD -MP -MF
.deps/src_ipathverbs_la-ipathverbs.Tpo -c src/ipathverbs.c  -fPIC -DPIC
-o .libs/src_ipathverbs_la-ipa
thverbs.o
In file included from src/ipathverbs.c:45:
src/ipathverbs.h: In function 'to_ictx':
src/ipathverbs.h:72: warning: implicit declaration of function
'offsetof'
src/ipathverbs.h:72: error: expected expression before 'struct'ib_mthca


My customized installation that works:

ib_verbs
kernel-ib
kernel-ib-devel
libibcm
libibcm-devel
libibverbs
libibverbs-devel
libibverbs-utils
libmthca
libmthca-devel


From mst at mellanox.co.il  Tue Sep 12 05:54:37 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 12 Sep 2006 15:54:37 +0300
Subject: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64
	SLES 10 machine
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD22201FBD392@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD22201FBD392@wdtssmail01.eu.thmulti.com>
Message-ID: <20060912125437.GB22369@mellanox.co.il>

Quoting r. Bub Thomas <thomas.bub at thomson.net>:
> Subject: Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine
> 
> Just migrated from SLES 9 x86_64 to SLES 10 x86_64 in order to get
> 32-Bit support.
> Stumbled over some installation problems. 
> First I tried "All packages" then "Basic install". Both failed to build
> at different places. 
> Only a "customized" installation worked.
> Find the details below.
> 
> Thomas Bub
> 
> An All packages fails at:
> 
> gcc
> -Wp,-MD,/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/.i
> scsi_iser.o.d  -nostdinc -isystem
> /usr/lib64/gcc/x86_64-suse-linux/4.1.0/include -D__KERNEL__
> -I/var/tmp/OFEDRPM/BUILD/openib-1.1/include
> -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include
> -Iinclude  -Iinclude2 -I/usr/src/linux-2.6.16.21-0.8/include
> -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser  -Wall
> -Wundef -Wstrict-prototypes -Wno-trigraphs
> -Werror-implicit-function-declaration -fno-strict-aliasing -fno-common
> -ffreestanding -Os -fomit-frame-pointer -mtune=generic -m64
> -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks
> -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time
> -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Wdeclaration-after-statement
> -Wno-pointer-sign -I/var/tmp/OFEDRPM/BUILD/openib-1.1/include
> -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/include
> -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/ipoib
> -I/var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/debug  -DMODULE
> -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(iscsi_iser)"
> -D"KBUILD_MODNAME=KBUILD_STR(ib_iser)" -c -o
> /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/.tmp_iscsi
> _iser.o
> /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser
> .c
> /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser
> .c: In function 'iscsi_iser_set_param':
> /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser
> .c:478: error: implicit declaration of function 'iscsi_set_param'
> /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser
> .c: At top level:
> /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser
> .c:612: warning: initialization from incompatible pointer type
> /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser
> .c:613: error: 'iscsi_conn_get_param' undeclared here (not in a
> function)
> /var/tmp/OFEDRPM/BUILD/openib-1.1/drivers/infiniband/ulp/iser/iscsi_iser
> .c:614: error: 'iscsi_session_get_param' undeclared here (not in a
> function)

Or - could you check this please? AFAIK iser should work on this kernel.

> A Basic install fails at:
> 
> gcc -DHAVE_CONFIG_H -I. -I. -I. -I../libibverbs/include -Wall
> -D_GNU_SOURCE -g -O2 -MT src_ipathverbs_la-ipathverbs.lo -MD -MP -MF
> .deps/src_ipathverbs_la-ipathverbs.Tpo -c src/ipathverbs.c  -fPIC -DPIC
> -o .libs/src_ipathverbs_la-ipa
> thverbs.o
> In file included from src/ipathverbs.c:45:
> src/ipathverbs.h: In function 'to_ictx':
> src/ipathverbs.h:72: warning: implicit declaration of function
> 'offsetof'
> src/ipathverbs.h:72: error: expected expression before 'struct'ib_mthca

Looks like ipathverbs.h uses offsetof without including stddef.h
Please post fix for trunk and OFED branch.
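
For illustration, a minimal sketch of the failure mode and the fix. The
struct and function names below are hypothetical stand-ins, not the real
ipathverbs definitions. The only point is that offsetof is a macro from
<stddef.h>: without that include gcc treats it as an undeclared function
and then trips over the 'struct' argument, which matches the two
diagnostics quoted above.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof; leaving this out reproduces the errors above */

/* Hypothetical types, for illustration only. */
struct demo_context {
	int fd;
};

struct demo_ictx {
	int                 private_state;
	struct demo_context base;          /* embedded generic context */
};

/* Recover the enclosing structure from a pointer to its embedded member. */
static inline struct demo_ictx *to_demo_ictx(struct demo_context *base)
{
	assert(base != NULL);
	return (struct demo_ictx *)((char *)base -
				    offsetof(struct demo_ictx, base));
}
```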

> My customized installation that works:
> 
> ib_verbs
> kernel-ib
> kernel-ib-devel
> libibcm
> libibcm-devel
> libibverbs
> libibverbs-devel
> libibverbs-utils
> libmthca
> libmthca-devel

-- 
MST


From Richard.Frank at oracle.com  Tue Sep 12 06:24:03 2006
From: Richard.Frank at oracle.com (Richard Frank)
Date: Tue, 12 Sep 2006 09:24:03 -0400
Subject: [openib-general] IPOIB failover ?
Message-ID: <1158067443.11227.207.camel@localhost.localdomain>

Does IPOIB in this stack support transparent fail over between ports and
across redundant HCAs using a "virtual IP" ?


From thomas.bub at thomson.net  Tue Sep 12 06:20:04 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Tue, 12 Sep 2006 15:20:04 +0200
Subject: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64
 SLES 10 machine
Message-ID: <B79FAF8BB536314E859EA1963CFFD222029AC43F@wdtssmail01.eu.thmulti.com>

Michael,
I don't understand what you mean about the iser trouble.
I'm only a "consumer" and not actively developing in the openIB world.
I'm having enough trouble with my own application connecting a PowerPC
gen1 from an x86_64 PC gen2 using verbs and cm. ;-)
Thus I haven't installed SVN and can't work on this.
I wanted to let the people know that there are some issues.
Thanks
Thomas




From mst at mellanox.co.il  Tue Sep 12 06:47:25 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 12 Sep 2006 16:47:25 +0300
Subject: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64
 SLES 10 machine
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD222029AC43F@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD222029AC43F@wdtssmail01.eu.thmulti.com>
Message-ID: <20060912134725.GC22369@mellanox.co.il>

Quoting r. Bub Thomas <thomas.bub at thomson.net>:
> Subject: RE: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine
> 
> Michael,
> I don't understand what you mean on the iser trouble.

Or Gerlitz from Voltaire is the iser maintainer. I Cc him.

-- 
MST


From minich at ornl.gov  Tue Sep 12 06:56:23 2006
From: minich at ornl.gov (Makia Minich)
Date: Tue, 12 Sep 2006 09:56:23 -0400
Subject: [openib-general] RDMA question
Message-ID: <C12C34C7.3749%minich@ornl.gov>

I'm looking for some information on whether or not you can set a service
level for RDMA packets (as a way to start working on a QoS design).

So, does anyone:
 * know if this already works?
 * have an example of setting it?
or
 * know if this could possibly work?

Thanks for your help.
-- 
Makia Minich <minich at ornl.gov>
National Center for Computation Science
Oak Ridge National Laboratory


From halr at voltaire.com  Tue Sep 12 07:45:50 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 12 Sep 2006 10:45:50 -0400
Subject: [openib-general] RDMA question
In-Reply-To: <C12C34C7.3749%minich@ornl.gov>
References: <C12C34C7.3749%minich@ornl.gov>
Message-ID: <1158072263.27427.153907.camel@hal.voltaire.com>

Hi Makia,

On Tue, 2006-09-12 at 09:56, Makia Minich wrote:
> I'm looking for some information on whether or not you can set a service
> level for RDMA packets

What API or ULP are you planning on using ? Sounds like you are planning
on using verbs directly. Is this userspace or kernel ?

>  (as a way to start working on a QoS design).

What do you mean by "QoS design" here ?

> So, does anyone:
>  * know if this already works?
>  * have an example of setting it?
> or
>  * know if this could possibly work?

OpenSM (on the trunk or OFED 1.1) supports configuring QoS in a coarse
manner. 

It looks to me like SL is supported in the AH attribute which can be set
for an RC QP so you should be able to do this from user verbs. Not sure
if this has been tried or not.

It can be used with certain ULPs. I've done it with IPoIB. It is
possible with others as well.

-- Hal

> Thanks for your help.


From thomas.bub at thomson.net  Tue Sep 12 07:55:25 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Tue, 12 Sep 2006 16:55:25 +0200
Subject: [openib-general] cmpost establisehd connections are very fragile!?
Message-ID: <B79FAF8BB536314E859EA1963CFFD22201FBD394@wdtssmail01.eu.thmulti.com>

Sean,
got my libibverbs/libibcm code working on SLES9 x86_64 after following
all the tricks in cmpost.c.
What I don't understand is why setting local_cm_response_timeout to 254
instead of 20 can block IBV_WR_SEND from client to server while the
opposite direction, from server to client, works!?
Do you have a more detailed description of the libibcm parameters?
There are a lot more that I don't understand. ;-)

After having a running gen2 example I moved to my final distribution
which is SLES 10 x86_64. I have to do this since I have to use a 32 Bit
executable for 32-Bit and 64 Bit machines and this is supported in OFED
from SLES10 onwards.

Coming back to the fragile connection I encountered the same issue where
the client can't do an IBV_WR_SEND to the server. This time both your
cmpost example and my code failed. I tried to reduce the
local_cm_response_timeout to 10 but this did not help at all.

All above is done for 64-Bit executables.

Interestingly enough, the 32-bit executables of cmpost and my own build on
an x86 SLES9 machine did not have the IBV_WR_SEND trouble.

Thanks in advance for enlightening me. ;-)
Thomas Bub


From vlad at dev.mellanox.co.il  Tue Sep 12 08:14:20 2006
From: vlad at dev.mellanox.co.il (vlad at dev.mellanox.co.il)
Date: Tue, 12 Sep 2006 18:14:20 +0300 (IDT)
Subject: [openib-general] OFED-1.1-rc4 is ready
Message-ID: <30291.194.90.237.34.1158074060.squirrel@dev.mellanox.co.il>

Hi,

OFED-1.1-rc4 is available on
https://openib.org/svn/gen2/branches/1.1/ofed/releases/
File: OFED-1.1-rc4.tgz
Please report any issues in bugzilla http://openib.org/bugzilla/

Schedule reminder:
==================
Next milestone:
Final release is planned for Sep-20.

Owners - please update release notes for the final release no later than
Sep-18.


Tziporet & Vlad
------------------------------------------------------------------------

Release details:
================
Build_id:
OFED-1.1-rc4

openib-1.1 (REV=9435)
# User space
https://openib.org/svn/gen2/branches/1.1/src/userspace
Git: git://www.mellanox.co.il/~git/infiniband
ref: refs/heads/ofed_1_1
commit 796b6cb83392fd840549e3b6e559dfce022a2c49

# MPI
mpi_osu-0.9.7-mlx2.2.0.tgz
openmpi-1.1.1-1.src.rpm
mpitests-2.0-0.src.rpm

OS support:
===========
Novell:
     - SLES 9.0 SP3
     - SLES10
Redhat:
     - Redhat EL4 up3
     - Redhat EL4 up4
kernel.org:
     - Kernel 2.6.17

Note: Redhat EL4 up2, Fedora C4 and SuSE Pro 10 were dropped from the
list. We keep the backport patches for these OSes and make sure OFED
compiles and loads properly, but will not do a full QA cycle.

Note: Kernel components were updated to 2.6.18-rc6

Systems:
========
     * x86_64
     * x86
     * ia64
     * ppc64

Bug fixes from OFED-1.1-rc3:
============================
 1. SDP: Data corruption fix
 2. libibverbs was reverted to 1.0 version (bug 219)
 3. libsdp: TCP_RR fix
 4. Compilation on kernel 2.6.18-rcX is failing
 5. OSU MPI: fix failure in Intel tests
 6. SRP: Kernel oops in case of port down
 7. ib_uverbs fails to load on ia64 (bug 222)
 8. IPoIB: Spinlock corruption in stress tests
 9. Added srp_daemon service, enable from /etc/infiniband/openib.conf
10. mthca.so not properly linked to libibverbs (bug 223)
11. ipath compilation problem on SLES10 (bug 226)
12. problem with MPI: Get_processor_name on MVAPICH (bug 226)
13. Add an option to renice the ib_mad thread to highest priority. Enable
    from /etc/infiniband/openib.conf (workaround for bug 229)
14. Update for ehca driver
15. Update for ipath driver
16. Madeye installation using OPENIB_PARAMS. To build madeye run:
    export OPENIB_PARAMS="--with-madeye-mod" (or put it into ofed.conf file
    for unattended installation) and run install.sh
17. OFED sources: Added kernel include files under
    <prefix>/src/openib/include.
    Can be used by kernel modules, and already include the backport
    patches for each kernel.
18. ibutils - updated with new flags (-P, -pc and -pm)
19. SDP: When the RTU packet is lost, the accept call blocks even if the
    client is connected.


Limitations and known issues:
=============================
1. SDP: For Mellanox Sinai HCAs one must use latest FW version (1.1.000).
2. SDP: Scalability issue when hundreds of connections are opened
3. ipath driver is not supported on SLES9 SP3
4. ehca driver supports only PPC machines and is compiled on kernel 2.6.18
5. OFED installation fails on PPC64 with SLES9.


From mshefty at ichips.intel.com  Tue Sep 12 08:22:09 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 12 Sep 2006 08:22:09 -0700
Subject: [openib-general] [PATCH] IB/cma: add rdma_establish
In-Reply-To: <450670D2.4040805@voltaire.com>
References: <20060907214524.GA14791@mellanox.co.il>
	<4505E130.8010301@ichips.intel.com> <450670D2.4040805@voltaire.com>
Message-ID: <4506D0A1.7060405@ichips.intel.com>

Or Gerlitz wrote:
> Just to make sure, you come to say that you would merge this patch 
> instead the one that had the CM track local qp numbers and install a 
> callback for the consumer QP to catch the async event etc?

correct

> Indeed the **patch** for itself is somehow simpler, but the consumer 
> must get established event before posting sends to the qp so they need 
> to either queue RX-es or modify the QP to RTS before sending the REP.

The first patch only allows the option of waiting for the established event.

> Is rdma_established() --> cm_establish() callable from non interruptible 
> context?

Yes

> Also does the patch ensures only one ESTABLISHED event would be called 
> for the id, no matter if rdma_establish() and an RTU reception happen in 
> parallel?

Yes

- Sean


From mshefty at ichips.intel.com  Tue Sep 12 08:34:22 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 12 Sep 2006 08:34:22 -0700
Subject: [openib-general] [PATCH] Optimize cma_process_remove()
In-Reply-To: <OF3C82975D.FF08E852-ON652571E7.00171C95-652571E7.0018469A@in.ibm.com>
References: <OF3C82975D.FF08E852-ON652571E7.00171C95-652571E7.0018469A@in.ibm.com>
Message-ID: <4506D37E.9040802@ichips.intel.com>

Krishna Kumar2 wrote:
>         mutex_lock(&lock);
>         while (!list_empty(&cma_dev->id_list)) {
>                 id_priv = list_entry(cma_dev->id_list.next,
>                                      struct rdma_id_private, list);
> 
>                 if (cma_internal_listen(id_priv)) {
>                         cma_destroy_listen(id_priv);
>                 } else {
>                         atomic_inc(&id_priv->refcount);
>                         list_del(&id_priv->list);
>                         list_add_tail(&id_priv->list, &remove_list);
>                 }
>         }
>         mutex_unlock(&lock);
> 
>         list_for_each_entry_safe(id_priv, tmp, &remove_list, list) {
>                 ret = cma_remove_id_dev(id_priv);
>                 cma_deref_id(id_priv);
>                 if (ret)
>                         rdma_destroy_id(&id_priv->id);
>         }

I believe that this has the same issue.  If a user tries to destroy an 
rdma_cm_id, it will remove itself from the "device list".  (This is why the ID's 
are moved to a new list, so that the removal still works.)  In the code above, 
destroy thread(s) will remove ID(s) from the remove_list while we're trying to 
walk it.

- Sean


From mshefty at ichips.intel.com  Tue Sep 12 08:41:18 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 12 Sep 2006 08:41:18 -0700
Subject: [openib-general] cmpost establisehd connections are very
	fragile!?
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD22201FBD394@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD22201FBD394@wdtssmail01.eu.thmulti.com>
Message-ID: <4506D51E.8050107@ichips.intel.com>

Bub Thomas wrote:
> What I don’t understand why the local_cm_response_timeout set to 254 
> instead of 20 can block IBV_WR_SEND from client to server while the 
> opposite direction from server to client works!?

local_cm_response_timeout is a 5-bit value.  It's 4.096 x 2 ^ 
local_cm_response_timeout micro-seconds if that helps any.
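
To make the arithmetic concrete, here is a small sketch of that formula.
The function name is made up for illustration and is not part of libibcm;
it only evaluates 4.096 us * 2^n for a 5-bit exponent n, using integer
nanoseconds to avoid rounding.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Timeout encoded by a 5-bit exponent: 4.096 us * 2^exponent.
 * 4.096 us is 4096 ns, so the result in nanoseconds is 4096 << exponent.
 * Hypothetical helper, not a libibcm function.
 */
static uint64_t cm_timeout_ns(unsigned int exponent)
{
	assert(exponent <= 31);          /* a 5-bit field holds 0..31 */
	return (uint64_t)4096 << exponent;
}
```

With this, cm_timeout_ns(20) is 4294967296 ns, roughly 4.29 seconds, and
the largest encodable value, cm_timeout_ns(31), is 2^43 ns, roughly 8796
seconds. A value of 254 does not fit in 5 bits at all; how the library
treats such an out-of-range value is not stated in this thread.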

> You don’t have a more detailed description to the libibcm parameters? 
> There are a lot more that I don’t understand. ;-)

You will need to refer to the IB spec, sections 12.6 and 12.7 for descriptions.

- Sean


From mshefty at ichips.intel.com  Tue Sep 12 08:44:37 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 12 Sep 2006 08:44:37 -0700
Subject: [openib-general] [PATCH] RDMA/cma: document error flow of
 rdma_accept
In-Reply-To: <Pine.LNX.4.64.0609121053140.13564@zuben>
References: <Pine.LNX.4.64.0609121053140.13564@zuben>
Message-ID: <4506D5E5.2010602@ichips.intel.com>

Or Gerlitz wrote:
> + * In the case of error, a reject message is sent to the remote side and the
> + * state of the qp associated with the id is modified to error, such that any
> + * previously posted receive buffers would be flushed.

Hmm... this makes me question whether this is what it should be doing.  Is there 
any reason not to reject the connection if accept fails?

- Sean


From sean.hefty at intel.com  Tue Sep 12 09:03:33 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 12 Sep 2006 09:03:33 -0700
Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding
 device during destruction
In-Reply-To: <450673FC.3000309@voltaire.com>
Message-ID: <000101c6d684$ff9f87b0$d8248686@amr.corp.intel.com>

>> Can you queue this for 2.6.19 ?

Roland, can you pull this patch in for 2.6.19?  It's SVN check-in 9273.
---

Clarify that rdma_destroy_id cancels outstanding asynchronous operations on the
associated id.

Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Signed-off-by: Sean Hefty <sean.hefty at intel.com>

Index: rdma_cm.h
===================================================================
--- rdma_cm.h	(revision 9272)
+++ rdma_cm.h	(revision 9273)
@@ -126,6 +126,14 @@
 struct rdma_cm_id *rdma_create_id(rdma_cm_event_handler event_handler,
 				  void *context, enum rdma_port_space ps);
 
+/**
+  * rdma_destroy_id - Destroys an RDMA identifier.
+  *
+  * @id: RDMA identifier.
+  *
+  * Note: calling this function has the effect of canceling in-flight
+  * asynchronous operations associated with the id.
+  */
 void rdma_destroy_id(struct rdma_cm_id *id);
 
 /**


From mshefty at ichips.intel.com  Tue Sep 12 09:09:22 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 12 Sep 2006 09:09:22 -0700
Subject: [openib-general] [PATCH] cma_connect_ib leaks memory in failure
	cases.
In-Reply-To: <20060911175312.GC15556@mellanox.co.il>
References: <20060908051313.5223.80022.sendpatchset@K50wks273895wss.in.ibm.com>
	<20060910111145.GA12111@mellanox.co.il>
	<4505A08E.5000705@ichips.intel.com>
	<20060911175312.GC15556@mellanox.co.il>
Message-ID: <4506DBB2.5020400@ichips.intel.com>

Michael S. Tsirkin wrote:
>>The ib_cm_id will be cleaned up if the rdma_cm_id is destroyed, as long as a 
>>second call is not made to rdma_connect after the first call fails.  So we're 
>>probably safe deferring this until 2.6.19, unless someone has code which calls 
>>rdma_connect twice.
> 
> SDP can do this I think.

To clarify, SDP would need to do something like:

ret = rdma_connect(id_7471 ...)
if (ret)
	rdma_connect(id_7471 ...)

The same ID would need to be used twice.

- Sean


From rdreier at cisco.com  Tue Sep 12 09:13:55 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 12 Sep 2006 09:13:55 -0700
Subject: [openib-general] [PATCH] for-2.6.19 cma: protect against adding
 device during destruction
In-Reply-To: <000101c6d684$ff9f87b0$d8248686@amr.corp.intel.com> (Sean
	Hefty's message of "Tue, 12 Sep 2006 09:03:33 -0700")
References: <000101c6d684$ff9f87b0$d8248686@amr.corp.intel.com>
Message-ID: <adau03d12rg.fsf@cisco.com>

Thanks, applied.


From sean.hefty at intel.com  Tue Sep 12 09:19:36 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 12 Sep 2006 09:19:36 -0700
Subject: [openib-general] an example to use of multicast messages over
 the verbs exists in the openib svn
In-Reply-To: <4506A4C4.9070907@dev.mellanox.co.il>
Message-ID: <000201c6d687$3d808640$d8248686@amr.corp.intel.com>

>This test (for now) don't send any join message to the SA, it only
>attach (and detach) the QP to the multicast group.

I posted a simple multicast test program that uses the proposed libibsa
interface in:

http://openib.org/pipermail/openib-general/2006-August/025433.html

(See the program at the bottom of the message.)  Combined with the kernel
support, this will result in sending join messages to the SA.

- Sean


From rdreier at cisco.com  Tue Sep 12 09:28:44 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 12 Sep 2006 09:28:44 -0700
Subject: [openib-general] RFC: mthca: implement timewait by tracking QPNs
In-Reply-To: <20060911235446.GB19021@mellanox.co.il> (Michael S.
	Tsirkin's message of "Tue, 12 Sep 2006 02:54:46 +0300")
References: <adamz962dgm.fsf@cisco.com> <20060911235446.GB19021@mellanox.co.il>
Message-ID: <adalkop122r.fsf@cisco.com>

    Roland> My gut reaction is that it seems pretty ugly.

    Michael> Hmm. All of it or just some bits?

Well, the idea of pushing timewait handling down into the low-level
drivers seems strange to me.  I don't think any other stack or any
other OS does anything like this.

    Michael> Could be a library function in core so that ipath etc can
    Michael> reuse it.  But note how there's no dependency between
    Michael> drivers here - no reason to block change in mthca until
    Michael> ipath/ehca implement this functionality, too.

I guess the only thing would be that we should implement this for
mthca to maximize the amount ipath/ehca can reuse when they implement this.

    Michael> Not entirely correct. Please look at 9.7.1 - search for
    Michael> "stale packets":

OK, this is somewhat convincing...

    Michael> I don't see how this limits the rate of QP
    Michael> creation. Could you explain?

Once all QPs are tied up in timewait state, then new QPs can only be
created as old QPs leave timewait.  Probably there are enough QPs and
timewait is short enough that this won't be a problem in practice, but
it's the same idea in theory as a busy server running out of fds
because of sockets in timewait state.

 - R.
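
The rate limit described above can be put as a back-of-envelope bound. This
is only a sketch under simple assumptions: every destroyed QP number is
unusable for a fixed timewait period, and QPs are destroyed almost as soon
as they are created. The function name and the example numbers are made up
and are not mthca or spec values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on sustained QP create/destroy cycles per second when each
 * destroyed QP number stays in timewait for timewait_sec before reuse.
 * Hypothetical helper, for illustration only.
 */
static uint64_t max_qp_churn_per_sec(uint64_t total_qpns, uint64_t timewait_sec)
{
	assert(timewait_sec > 0);
	return total_qpns / timewait_sec;
}
```

For example, 65536 QP numbers and a 64 second timewait would allow at most
1024 create/destroy cycles per second before creation has to wait for old
QPs to leave timewait.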


From caitlinb at broadcom.com  Tue Sep 12 09:31:13 2006
From: caitlinb at broadcom.com (Caitlin Bestler)
Date: Tue, 12 Sep 2006 09:31:13 -0700
Subject: [openib-general] RDMA question
In-Reply-To: <C12C34C7.3749%minich@ornl.gov>
References: <C12C34C7.3749%minich@ornl.gov>
Message-ID: <469958e00609120931n56b58444r86b0473b4bb79651@mail.gmail.com>

On 9/12/06, Makia Minich <minich at ornl.gov> wrote:
> I'm looking for some information on whether or not you can set a service
> level for RDMA packets (as a way to start working on a QoS design).
>

Transport independent QoS is not truly feasible. You'll have to
apply QoS to the underlying transport (IB or IP) using IB or IP
tools and concepts.

You *can* identify one or more transport neutral Classes
of Service, and then have your application layer select
that class of service. But translating the class of service
to actual network controls will always be transport specific.

So the short answer is that you don't set a service level
for RDMA packets, you set a service level for IB or IP
packets that happen to carry RDMA.


From sean.hefty at intel.com  Tue Sep 12 09:41:12 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 12 Sep 2006 09:41:12 -0700
Subject: [openib-general] RFC: mthca: implement timewait by tracking QPNs
In-Reply-To: <adalkop122r.fsf@cisco.com>
Message-ID: <000301c6d68a$42028490$d8248686@amr.corp.intel.com>

>Well, the idea of pushing timewait handling down into the low-level
>drivers seems strange to me.  I don't think any other stack or any
>other OS does anything like this.

I think the Windows IB stack may do something similar.

The difficulty with doing this at a higher level is that the QP must be destroyed
in order for the CQs / PD to be destroyed.  The IB CM already tracks timewait,
so could notify drivers when a QP can be re-used, but that requires connections
to go through the CM.  For stale connection handling, that may be a good thing
anyway, but there's nothing enforcing this today.

- Sean


From jlentini at netapp.com  Tue Sep 12 11:15:42 2006
From: jlentini at netapp.com (James Lentini)
Date: Tue, 12 Sep 2006 14:15:42 -0400 (EDT)
Subject: [openib-general] [PATCH DAPLTEST] - compile failure on
	FC5/X86_64
In-Reply-To: <1156859264.31129.12.camel@stevo-desktop>
References: <1156859264.31129.12.camel@stevo-desktop>
Message-ID: <Pine.LNX.4.64.0609121415270.23580@jlentini-linux.nane.netapp.com>


On Tue, 29 Aug 2006, Steve Wise wrote:

> Dunno if this is the correct fix for all platforms/distros, but it 
> worked for me on FC5/X86_64...  CLK_TCK wasn't getting defined for 
> this distro...

Committed in revision 9442.


From mst at mellanox.co.il  Tue Sep 12 11:24:42 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 12 Sep 2006 21:24:42 +0300
Subject: [openib-general] [PATCH] cma_connect_ib leaks memory in failure
	cases.
In-Reply-To: <4506DBB2.5020400@ichips.intel.com>
References: <4506DBB2.5020400@ichips.intel.com>
Message-ID: <20060912182442.GA23428@mellanox.co.il>

Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH] cma_connect_ib leaks memory in failure cases.
> 
> Michael S. Tsirkin wrote:
> >>The ib_cm_id will be cleaned up if the rdma_cm_id is destroyed, as long as a 
> >>second call is not made to rdma_connect after the first call fails.  So we're 
> >>probably safe deferring this until 2.6.19, unless someone has code which calls 
> >>rdma_connect twice.
> > 
> > SDP can do this I think.
> 
> To clarify, SDP would need to do something like:
> 
> ret = rdma_connect(id_7471 ...)
> if (ret)
> 	rdma_connect(id_7471 ...)
> 
> The same ID would need to be used twice.
> 
> - Sean
> 

Sure - if connect on socket fails, application can retry.

-- 
MST


From rdreier at cisco.com  Tue Sep 12 13:26:21 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 12 Sep 2006 13:26:21 -0700
Subject: [openib-general] user-mode data strucures
In-Reply-To: <1158092339.9296.20.camel@trinity.ogc.int> (Tom Tucker's
	message of "Tue, 12 Sep 2006 15:18:59 -0500")
References: <1158092339.9296.20.camel@trinity.ogc.int>
Message-ID: <adawt88zv9u.fsf@cisco.com>

    Tom> In working with the Intel compilers recently, however, I've
    Tom> found that this compiler attempts to align data structures on
    Tom> boundaries that are native to the data types. So uint64_t's
    Tom> are aligned on a 64b boundary.  This is an issue for
    Tom> ibv_recv_wr and ibv_send_wr because they are immediately
    Tom> preceded by a *next ptr which is 32b on 32b architectures.

Ugh.

How about swapping wr_id and next in the ibv_recv_wr and ibv_send_wr
structures?  I hate adding __attribute__((packed)) because it ruins
things on ia64 et al.

 - R.


From tom at opengridcomputing.com  Tue Sep 12 13:18:59 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Tue, 12 Sep 2006 15:18:59 -0500
Subject: [openib-general] user-mode data strucures
Message-ID: <1158092339.9296.20.camel@trinity.ogc.int>

Roland:

The user-mode data structures do not include specific alignment
instructions to compilers. This all works great provided that the
libraries and the applications are built using the same compiler.

In working with the Intel compilers recently, however, I've found that
this compiler attempts to align data structures on boundaries that are
native to the data types. So uint64_t's are aligned on a 64b boundary.
This is an issue for ibv_recv_wr and ibv_send_wr because they are
immediately preceded by a *next ptr which is 32b on 32b architectures.

To make a long story short, I've added __attribute__((packed)) to my
locally installed header files to make this work, but we should probably
either pad these data structures internally or explicitly pack them like
I'm doing now. 

What do people think?

Tom
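
A small sketch of the layout question, using a hypothetical stand-in struct
rather than the real ibv_send_wr or ibv_recv_wr definitions. On a 32-bit
target the pointer is 4 bytes, so the offset of the following uint64_t
depends on whether the compiler aligns it to 4 or to 8 bytes; the packed
variant (GCC/Clang attribute syntax) never inserts padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in, NOT the real libibverbs layout. */
struct demo_wr {
	struct demo_wr *next;
	uint64_t        wr_id;
};

/* Same members, packed: wr_id always directly follows the pointer. */
struct demo_wr_packed {
	struct demo_wr_packed *next;
	uint64_t               wr_id;
} __attribute__((packed));

/* Print where each compiler/ABI combination actually places wr_id. */
static void print_layout(void)
{
	assert(offsetof(struct demo_wr_packed, wr_id) == sizeof(void *));
	printf("natural: wr_id at %zu, size %zu\n",
	       offsetof(struct demo_wr, wr_id), sizeof(struct demo_wr));
	printf("packed:  wr_id at %zu, size %zu\n",
	       offsetof(struct demo_wr_packed, wr_id),
	       sizeof(struct demo_wr_packed));
}
```

Building this with each compiler for the same 32-bit target and comparing
the "natural" line is a quick way to see whether a library and an
application agree on the layout.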


From tom at opengridcomputing.com  Tue Sep 12 14:45:15 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Tue, 12 Sep 2006 16:45:15 -0500
Subject: [openib-general] user-mode data strucures
In-Reply-To: <adawt88zv9u.fsf@cisco.com>
References: <1158092339.9296.20.camel@trinity.ogc.int>
	<adawt88zv9u.fsf@cisco.com>
Message-ID: <1158097515.9296.25.camel@trinity.ogc.int>

On Tue, 2006-09-12 at 13:26 -0700, Roland Dreier wrote:
>     Tom> In working with the Intel compilers recently, however, I've
>     Tom> found that this compiler attempts to align data structures on
>     Tom> boundaries that are native to the data types. So uint64_t's
>     Tom> are aligned on a 64b boundary.  This is an issue for
>     Tom> ibv_recv_wr and ibv_send_wr because they are immediately
>     Tom> preceded by a *next ptr which is 32b on 32b architectures.
> 
> Ugh.
> 
> How about swapping wr_id and next in the ibv_recv_wr and ibv_send_wr
> structures?  I hate adding __attribute__((packed)) because it ruins
> things on ia64 et al.
> 

I think that just moves the alignment issue to the first word of the sge
since its first element is a uint64_t. I think the only thing that
works across the board without packing is to #if __BITS_IN_WORD==32 add
a pad word after *next. erf...ugly code. 
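
A sketch of the pad-word idea on a hypothetical stand-in struct, not the
real verbs structures. It assumes a pointer-width test on UINTPTR_MAX in
place of the __BITS_IN_WORD check mentioned above, and it uses a C11
static_assert so that a compiler with a different layout fails the build
instead of silently disagreeing.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in, NOT the real libibverbs layout. */
struct demo_wr_padded {
	struct demo_wr_padded *next;
#if UINTPTR_MAX == 0xffffffffu
	uint32_t               pad;    /* only when pointers are 32 bits wide */
#endif
	uint64_t               wr_id;
};

/* With the explicit pad, wr_id lands at offset 8 on 32-bit and 64-bit. */
static_assert(offsetof(struct demo_wr_padded, wr_id) == 8,
	      "wr_id must sit at offset 8 regardless of alignment rules");
```

The explicit pad makes the offset independent of how a given compiler
aligns uint64_t, at the cost of the conditional in the header.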


>  - R.


From Brian.Cain at ge.com  Tue Sep 12 14:50:24 2006
From: Brian.Cain at ge.com (Cain, Brian (GE Healthcare))
Date: Tue, 12 Sep 2006 17:50:24 -0400
Subject: [openib-general] [PATCH] leak in *_pingpong.c?
Message-ID: <2376B63A5AF8564F8A2A2D76BC6DB033E84C59@CINMLVEM11.e2k.ad.ge.com>

Be gentle, it's my first patch submission.  :)

The following is untested, but it looks like it's probably pretty
trivial.


Index: examples/rc_pingpong.c
===================================================================
--- examples/rc_pingpong.c	(revision 9442)
+++ examples/rc_pingpong.c	(working copy)
@@ -143,6 +143,7 @@
 
 	asprintf(&service, "%d", port);
 	n = getaddrinfo(servername, service, &hints, &res);
+	free(service);
 
 	if (n < 0) {
 		fprintf(stderr, "%s for %s:%d\n", gai_strerror(n), servername, port);
@@ -209,6 +210,7 @@
 
 	asprintf(&service, "%d", port);
 	n = getaddrinfo(NULL, service, &hints, &res);
+	free(service);
 
 	if (n < 0) {
 		fprintf(stderr, "%s for port %d\n", gai_strerror(n), port);
Index: examples/srq_pingpong.c
===================================================================
--- examples/srq_pingpong.c	(revision 9442)
+++ examples/srq_pingpong.c	(working copy)
@@ -154,6 +154,7 @@
 
 	asprintf(&service, "%d", port);
 	n = getaddrinfo(servername, service, &hints, &res);
+	free(service);
 
 	if (n < 0) {
 		fprintf(stderr, "%s for %s:%d\n", gai_strerror(n),
servername, port);
@@ -233,6 +234,7 @@
 
 	asprintf(&service, "%d", port);
 	n = getaddrinfo(NULL, service, &hints, &res);
+	free(service);
 
 	if (n < 0) {
 		fprintf(stderr, "%s for port %d\n", gai_strerror(n),
port);
Index: examples/uc_pingpong.c
===================================================================
--- examples/uc_pingpong.c	(revision 9442)
+++ examples/uc_pingpong.c	(working copy)
@@ -131,6 +131,7 @@
 
 	asprintf(&service, "%d", port);
 	n = getaddrinfo(servername, service, &hints, &res);
+	free(service);
 
 	if (n < 0) {
 		fprintf(stderr, "%s for %s:%d\n", gai_strerror(n),
servername, port);
@@ -197,6 +198,7 @@
 
 	asprintf(&service, "%d", port);
 	n = getaddrinfo(NULL, service, &hints, &res);
+	free(service);
 
 	if (n < 0) {
 		fprintf(stderr, "%s for port %d\n", gai_strerror(n),
port);
Index: examples/ud_pingpong.c
===================================================================
--- examples/ud_pingpong.c	(revision 9442)
+++ examples/ud_pingpong.c	(working copy)
@@ -132,6 +132,7 @@
 
 	asprintf(&service, "%d", port);
 	n = getaddrinfo(servername, service, &hints, &res);
+	free(service);
 
 	if (n < 0) {
 		fprintf(stderr, "%s for %s:%d\n", gai_strerror(n),
servername, port);
@@ -198,6 +199,7 @@
 
 	asprintf(&service, "%d", port);
 	n = getaddrinfo(NULL, service, &hints, &res);
+	free(service);
 
 	if (n < 0) {
 		fprintf(stderr, "%s for port %d\n", gai_strerror(n),
port);

--
-Brian 


From somenath at veritas.com  Tue Sep 12 15:08:22 2006
From: somenath at veritas.com (somenath)
Date: Tue, 12 Sep 2006 15:08:22 -0700
Subject: [openib-general] HCAs with and without memory
In-Reply-To: <4503B42C.60405@dev.mellanox.co.il>
References: <a94efc20609080319w2fa92499lee9cfb3758bdaa13@mail.gmail.com>
	<4503B42C.60405@dev.mellanox.co.il>
Message-ID: <45072FD6.3090802@veritas.com>

Is there any performance difference observed between memFree and
non-memFree HCAs?

thanks, som.

Dotan Barak wrote:

>Hi john.
>
>john t wrote:
>  
>
>>Hi OpenIB group,
>> 
>>What is the difference between HCAs with memory and without memory. 
>>How is the on-board memory used by HCAs? Is it that data is first 
>>copied into this memory and then into physical memory?
>> 
>>Regards,
>>John T.
>>    
>>
>
>If you are asking about Mellanox HCAs i can answer you:
>
>The difference is the technology which those HCAs are using:
>The HCAs without the attached memory are using the memfree technology.
>
>The main difference between the 2 HCAs is where the context of the 
>various resources is located: in the host memory or in the attached memory.
>
>The data itself (during data movement) is not stored in the attached memory
>at any point.
>
>Dotan
>
>
>_______________________________________________
>openib-general mailing list
>openib-general at openib.org
>http://openib.org/mailman/listinfo/openib-general
>
>To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>
>  
>


From rdreier at cisco.com  Tue Sep 12 15:21:28 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 12 Sep 2006 15:21:28 -0700
Subject: [openib-general] user-mode data strucures
In-Reply-To: <1158097515.9296.25.camel@trinity.ogc.int> (Tom Tucker's
	message of "Tue, 12 Sep 2006 16:45:15 -0500")
References: <1158092339.9296.20.camel@trinity.ogc.int>
	<adawt88zv9u.fsf@cisco.com> <1158097515.9296.25.camel@trinity.ogc.int>
Message-ID: <adasliwzpxz.fsf@cisco.com>

    Tom> I think that just moves the alignment issue to the first word
    Tom> of the sge since its first element is a uint64_t. I think
    Tom> the only thing that works across the board without packing is
    Tom> to #if __BITS_IN_WORD==32 add a pad word after
    Tom> *next. erf...ugly code.

Actually I think we're OK -- the sg_list member is a pointer, which will
be 32 bits too.  It looks to me like ibv_send_wr has an even number of
32-bit quantities before the union, so everything should pack naturally.
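
A rough way to check this on any host is to model the 32-bit layout with
fixed-width fields.  The struct below is only a stand-in: pointers are
written as uint32_t, the field list is my assumption of what the reordered
ibv_send_wr looks like (wr_id first), and the union is cut down to one
member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for the reordered work request as a 32-bit build would see it.
 * Field names and order are assumptions made for this sketch.
 */
struct send_wr_32 {
	uint64_t wr_id;		/* first member: offset 0 */
	uint32_t next;		/* pointer on a 32-bit target */
	uint32_t sg_list;	/* pointer on a 32-bit target */
	uint32_t num_sge;
	uint32_t opcode;
	uint32_t send_flags;
	uint32_t imm_data;
	union {
		struct {
			uint64_t remote_addr;
			uint32_t rkey;
		} rdma;
	} wr;
};

/* Sum of the sizes of everything ahead of the union: 8 + 6 * 4 bytes. */
static size_t bytes_before_union(void)
{
	return sizeof(uint64_t) + 6 * sizeof(uint32_t);
}

/* Where the compiler actually placed the union. */
static size_t union_offset(void)
{
	return offsetof(struct send_wr_32, wr);
}

/* Non-zero if no padding was inserted ahead of the union. */
static int union_packs_naturally(void)
{
	/* six 32-bit words after wr_id keep the running offset 64-bit aligned */
	assert(bytes_before_union() % 8 == 0);
	return union_offset() == bytes_before_union();
}
```

Since the union lands on a multiple of 8 with no padding, a compiler that
aligns uint64_t to 4 bytes and one that aligns it to 8 bytes should agree
on this layout.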


From tom at opengridcomputing.com  Tue Sep 12 15:27:58 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Tue, 12 Sep 2006 17:27:58 -0500
Subject: [openib-general] user-mode data strucures
In-Reply-To: <adasliwzpxz.fsf@cisco.com>
References: <1158092339.9296.20.camel@trinity.ogc.int>
	<adawt88zv9u.fsf@cisco.com> <1158097515.9296.25.camel@trinity.ogc.int>
	<adasliwzpxz.fsf@cisco.com>
Message-ID: <1158100078.9296.29.camel@trinity.ogc.int>

On Tue, 2006-09-12 at 15:21 -0700, Roland Dreier wrote:
>     Tom> I think that just moves the alignment issue to the first word
>     Tom> of the sge since its first element is a uint64_t. I think
>     Tom> the only thing that works across the board without packing is
>     Tom> to #if __BITS_IN_WORD==32 add a pad word after
>     Tom> *next. erf...ugly code.
> 
> Actually I think we're OK -- the sg_list member is a pointer, which will
> be 32 bits too.  It looks to me like ibv_send_wr has an even number of
> 32-bit quantities before the union, so everything should pack naturally.

Oops, you're right, I'm dumb... We're done then.


From rdreier at cisco.com  Tue Sep 12 15:47:29 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 12 Sep 2006 15:47:29 -0700
Subject: [openib-general] user-mode data strucures
In-Reply-To: <1158100078.9296.29.camel@trinity.ogc.int> (Tom Tucker's
	message of "Tue, 12 Sep 2006 17:27:58 -0500")
References: <1158092339.9296.20.camel@trinity.ogc.int>
	<adawt88zv9u.fsf@cisco.com> <1158097515.9296.25.camel@trinity.ogc.int>
	<adasliwzpxz.fsf@cisco.com> <1158100078.9296.29.camel@trinity.ogc.int>
Message-ID: <adaodtkzoqm.fsf@cisco.com>

OK, I checked the corresponding change into the libibverbs 1.1 devel
tree.  I'm not sure how to fix this in libibverbs 1.0 without
affecting the ABI though...

 - R.


From tom at opengridcomputing.com  Tue Sep 12 16:10:36 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Tue, 12 Sep 2006 18:10:36 -0500
Subject: [openib-general] user-mode data strucures
In-Reply-To: <adaodtkzoqm.fsf@cisco.com>
References: <1158092339.9296.20.camel@trinity.ogc.int>
	<adawt88zv9u.fsf@cisco.com> <1158097515.9296.25.camel@trinity.ogc.int>
	<adasliwzpxz.fsf@cisco.com> <1158100078.9296.29.camel@trinity.ogc.int>
	<adaodtkzoqm.fsf@cisco.com>
Message-ID: <1158102636.9296.34.camel@trinity.ogc.int>


I'm OK with a work-around in the near term. BTW, how do we correlate
libibverbs 1.x with cat /sys/class/infiniband/verbs/abi_version?

On Tue, 2006-09-12 at 15:47 -0700, Roland Dreier wrote:
> OK, I checked the corresponding change into the libibverbs 1.1 devel
> tree.  I'm not sure how to fix this in libibverbs 1.0 without
> affecting the ABI though...
> 
>  - R.


From mshefty at ichips.intel.com  Tue Sep 12 16:17:11 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 12 Sep 2006 16:17:11 -0700
Subject: [openib-general] [PATCH] IB/cma: add rdma_establish
In-Reply-To: <20060911222956.GD17098@mellanox.co.il>
References: <20060907214524.GA14791@mellanox.co.il>
	<4505E130.8010301@ichips.intel.com>
	<20060911222956.GD17098@mellanox.co.il>
Message-ID: <45073FF7.7020506@ichips.intel.com>

Michael S. Tsirkin wrote:
>>>As a side note, reasons for frequent loss of RTU must be investigated.
>>
>>A lost RTU shouldn't be any more likely than a lost REQ or REP.  Is the RTU 
>>never showing up?
> 
> 
> Seems like that. I know for sure I do accept after REP but remote side never
> gets ESTABLISHED.

I looked at the code, then ran some tests.  The REP is retried until an RTU is 
received, or its number of retries is exhausted.  By modifying the IB CM, I was 
able to force RTU drops.  Using madeye, I could see that the REP would be 
retried, resulting in the RTU being resent.  After 4 drops, I had the code 
receive the RTU, which allowed the test to proceed.

A couple things to look at in OFED would be the setting of max cm retries and 
the cm timeout.
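
To make the retry behavior concrete, here is a toy model of it.  This is
not IB CM code; the function and parameter names are made up, and timers
are reduced to a simple loop.

```c
#include <assert.h>

/*
 * Toy model: the passive side sends a REP and resends it each time its
 * timer expires without an RTU.  The active side answers every REP with
 * an RTU, and the first "rtu_drops" RTUs are lost.
 * Returns the number of REPs sent before an RTU arrived, or -1 if the
 * retries ran out first.
 */
static int reps_until_established(int max_retries, int rtu_drops)
{
	int sent;

	assert(max_retries >= 0 && rtu_drops >= 0);

	/* one initial REP plus up to max_retries resends */
	for (sent = 1; sent <= max_retries + 1; sent++) {
		/* the RTU answering REP number "sent" is lost while drops remain */
		if (sent > rtu_drops)
			return sent;
	}
	return -1;
}
```

In this model, four dropped RTUs cost five REPs in total, and the
connection only fails when the retry limit is smaller than the number of
consecutive drops.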

- Sean


From ralphc at pathscale.com  Tue Sep 12 17:40:10 2006
From: ralphc at pathscale.com (Ralph Campbell)
Date: Tue, 12 Sep 2006 17:40:10 -0700
Subject: [openib-general] How to support IOMMUs for ipath driver
Message-ID: <1158108010.8759.192.camel@brick.pathscale.com>

Problem:

The IB kernel to IB device driver interface uses dma_map_single()
and dma_map_sg() to allocate device bus addresses for HW DMA.
These bus addresses are passed to the IB device driver via ib_post_send()
and ib_post_recv().

The ib_ipath driver needs kernel virtual addresses in order to be able
to copy data to/from the posted work requests since it does not
use HW DMA. It currently relies on the mapping being one-to-one
and cannot reasonably reverse the mapping when an IOMMU is present.

History:

I first proposed modifying the dma_* routines to allow a device
driver to interpose on the function calls.  This was not well
received by the Linux kernel maintainers since it would have too
much impact on the current code.

I also proposed adding a flag to the ib_device structure
and modifying the kernel IB code to check the flag and pass
either the dma_*() mapped address or a kernel virtual address.
This works OK for kmalloc() buffers where dma_map_single() is
being called but doesn't work well for SRP which has lists
of physical pages and calls dma_map_sg().
It also means that the kernel IB layer needs to explicitly handle
two different kinds of addresses.

Current Proposal:

My current proposal is to provide wrapper routines for the
dma_*() routines which only the IB kernel code would use.
These ib_dma_*() variants would allow a device driver to interpose
on the call and run appropriate code to convert the kernel virtual
or physical page addresses to something the device driver can handle.
For ib_mthca and ib_ehca, these would result in the corresponding
dma_*() routine being called. For ib_ipath, a different implementation
would be needed.

My expectation is that this would add little overhead, be easy to
explain and document, and would be straightforward to convert existing
code to the new convention (see sample patch below).

I would like to get some consensus that this is an acceptable
approach before I spend a bunch of time developing it further.

Index: ib_verbs.h
===================================================================
--- ib_verbs.h	(revision 9441)
+++ ib_verbs.h	(working copy)
@@ -43,6 +43,7 @@
 
 #include <linux/types.h>
 #include <linux/device.h>
+#include <linux/dma-mapping.h>
 
 #include <asm/atomic.h>
 #include <asm/scatterlist.h>
@@ -984,6 +985,19 @@ struct ib_device {
 						  struct ib_grh *in_grh,
 						  struct ib_mad *in_mad,
 						  struct ib_mad *out_mad);
+	int                        (*mapping_error)(dma_addr_t dma_addr);
+	dma_addr_t                 (*map_single)(struct device *hwdev,
+						 void *ptr, size_t size,
+						 int direction);
+	void                       (*unmap_single)(struct device *dev,
+						   dma_addr_t addr,
+						   size_t size, int direction);
+	int                        (*map_sg)(struct device *hwdev,
+					     struct scatterlist *sg,
+					     int nents, int direction);
+	void                       (*unmap_sg)(struct device *hwdev,
+					       struct scatterlist *sg,
+					       int nents, int direction);
 
 	struct module               *owner;
 	struct class_device          class_dev;
@@ -1392,6 +1406,64 @@ static inline int ib_req_ncomp_notif(str
 struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags);
 
 /**
+ * ib_dma_mapping_error -
+ */
+static inline int ib_dma_mapping_error(struct ib_device *dev,
+				       dma_addr_t dma_addr)
+{
+	return dev->mapping_error ?
+		dev->mapping_error(dma_addr) : dma_mapping_error(dma_addr);
+}
+
+/**
+ * ib_dma_map_single -
+ */
+static inline dma_addr_t ib_dma_map_single(struct ib_device *dev,
+					   void *cpu_addr, size_t size,
+					   enum dma_data_direction direction)
+{
+	return dev->map_single ?
+		dev->map_single(dev, cpu_addr, size, direction) :
+		dma_map_single(dev->dma_device, cpu_addr, size, direction);
+}
+
+/**
+ * ib_dma_unmap_single -
+ */
+static inline void ib_dma_unmap_single(struct ib_device *dev,
+				       dma_addr_t addr, size_t size,
+				       enum dma_data_direction direction)
+{
+	dev->unmap_single ?
+		dev->unmap_single(dev, addr, size, direction) :
+		dma_unmap_single(dev->dma_device, addr, size, direction);
+}
+
+/**
+ * ib_dma_map_sg -
+ */
+static inline int ib_dma_map_sg(struct ib_device *dev,
+				struct scatterlist *sg, int nents,
+				enum dma_data_direction direction)
+{
+	return dev->map_sg ?
+		dev->map_sg(dev, sg, nents, direction) :
+		dma_map_sg(dev->dma_device, sg, nents, direction);
+}
+
+/**
+ * ib_dma_unmap_sg -
+ */
+static inline void ib_dma_unmap_sg(struct ib_device *dev,
+				   struct scatterlist *sg, int nents,
+				   enum dma_data_direction direction)
+{
+	dev->unmap_sg ?
+		dev->unmap_sg(dev, sg, nents, direction) :
+		dma_unmap_sg(dev->dma_device, sg, nents, direction);
+}
+
+/**
  * ib_reg_phys_mr - Prepares a virtually addressed memory region for use
  *   by an HCA.
  * @pd: The protection domain associated assigned to the registered region.


From tom at opengridcomputing.com  Tue Sep 12 18:10:17 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Tue, 12 Sep 2006 20:10:17 -0500
Subject: [openib-general] CMA issue: bind selects the same port after
 close
In-Reply-To: <OFE49D8CF3.9457DE5E-ON652571E7.0017D094-652571E7.0017E8AC@in.ibm.com>
Message-ID: <C12CC4A9.8A2D%tom@opengridcomputing.com>


There is a whole array of Linux port management services that perform
exactly the logic that you are trying to emulate. Wouldn't our efforts be
more productively spent figuring out how to use the existing services you are
currently trying to emulate? What do you do, for example, when the port
allocation policy in the kernel changes? Change your emulation?

I completely understand that the existing port management services are not
exported, but functionally, they support multiple port spaces, show up in
netstat, etc... Can someone please explain to me the reluctance to use these
services in favor of replicating them?

Sorry if this reads as a rant...but I feel we're on the wrong track...
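
For reference, the search described in the quoted text below (pick a
starting point, walk up to the high limit, then wrap around to the low
limit) amounts to something like the following.  This is a userspace
sketch with made-up names, not the kernel's code, and the caller is
assumed to supply the random starting port.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper.  in_use[] is indexed by port number.  The search
 * begins at "start" (chosen by the caller from [low, high]), walks up to
 * "high", then wraps around to "low" and continues up to start - 1.
 * Returns a free port, or -1 if none is left.
 */
static int pick_port(const bool *in_use, int low, int high, int start)
{
	int range = high - low + 1;
	int i;

	assert(low <= start && start <= high);

	for (i = 0; i < range; i++) {
		int port = low + (start - low + i) % range;

		if (!in_use[port])
			return port;
	}
	return -1;
}
```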

On 9/11/06 11:27 PM, "Krishna Kumar2" <krkumar2 at in.ibm.com> wrote:

> Hi Michael,
> 
>>> The basic problem in the CMA is in cma_alloc_port().  If the port number
>>> (passed in as snum) is 0, the first available port starting at
>>> sysctl_local_port_range[0] is used.  We could instead start our search by
>>> adding an increasing counter or a random value to the lower-end of the port
>>> range.  Then expand the code to handle searching below our starting value
>>> if we failed to find one above it.
>> 
>> Sounds good.
>> 
>>> Are the port numbers assigned by TCP sequential or more random?
>> 
>> TCP ports seem to be sequential.
> 
> Are you getting sequential port numbers ? inet_csk_get_port() is actually
> using a random number to get the *starting* value between
> sysctl_local_port_range[0] and sysctl_local_port_range[1]. Once it gets this
> starting number, it goes sequentially all the way to the high limit
> (sysctl*[1]) and then loops back from the low (sysctl*[0]) limit until all
> the numbers in the middle are looked at.
> 
> I think we can easily use the same logic. Sean's second option seems to be
> followed here: "adding a random value to the lower-end of the port range"
> 
> Thanks,
> 
> - KK
> 


From rjwalsh at pathscale.com  Tue Sep 12 20:01:54 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 12 Sep 2006 20:01:54 -0700
Subject: [openib-general] ibv_driver_init renamed?
Message-ID: <450774A2.8080402@pathscale.com>

Somewhere between OFED-1.1-RC3 and -RC4, the ibv_driver_init function 
was renamed to openib_driver_init.  We at QLogic were not aware this change 
was being made and so now our user verbs support does not work at all in 
RC4.  Why did something like this happen between two release candidates?

Regards,
  Robert.


From jgunthorpe at obsidianresearch.com  Tue Sep 12 20:10:54 2006
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Tue, 12 Sep 2006 21:10:54 -0600
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <1158108010.8759.192.camel@brick.pathscale.com>
References: <1158108010.8759.192.camel@brick.pathscale.com>
Message-ID: <20060913031054.GA4464@obsidianresearch.com>

On Tue, Sep 12, 2006 at 05:40:10PM -0700, Ralph Campbell wrote:

> The ib_ipath driver needs kernel virtual addresses in order to be able
> to copy data to/from the posted work requests since it does not
> use HW DMA. It currently relies on the mapping being one-to-one
> and cannot reasonably reverse the mapping when an IOMMU is present.

I'm sure this must have been answered, but given a PCI
domain:bus:device:function tuple and a DMA address, shouldn't any
effects of an IOMMU be easily duplicated in software to result in a
cpu-bus physical address? I.e., on AMD64 it is just a matter of following
the GART tables in software - assuming the address in question hits
the GART region (which for ipath, I expect, it never would)

Jason


From rdreier at cisco.com  Tue Sep 12 20:15:53 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 12 Sep 2006 20:15:53 -0700
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <1158108010.8759.192.camel@brick.pathscale.com> (Ralph
	Campbell's message of "Tue, 12 Sep 2006 17:40:10 -0700")
References: <1158108010.8759.192.camel@brick.pathscale.com>
Message-ID: <adaejugzcba.fsf@cisco.com>

 > My current proposal is to provide wrapper routines for the
 > dma_*() routines which only the IB kernel code would use.
 > These ib_dma_*() variants would allow a device driver to interpose
 > on the call and run appropriate code to convert the kernel virtual
 > or physical page addresses to something the device driver can handle.
 > For ib_mthca and ib_ehca, these would result in the corresponding
 > dma_*() routine being called. For ib_ipath, a different implementation
 > would be needed.

Seems like the least-bad way forward.

A few comments on the proposed implementation:

 > @@ -984,6 +985,19 @@ struct ib_device {
 >  						  struct ib_grh *in_grh,
 >  						  struct ib_mad *in_mad,
 >  						  struct ib_mad *out_mad);
 > +	int                        (*mapping_error)(dma_addr_t dma_addr);
 > +	dma_addr_t                 (*map_single)(struct device *hwdev,
 > +						 void *ptr, size_t size,
 > +						 int direction);
 > +	void                       (*unmap_single)(struct device *dev,
 > +						   dma_addr_t addr,
 > +						   size_t size, int direction);
 > +	int                        (*map_sg)(struct device *hwdev,
 > +					     struct scatterlist *sg,
 > +					     int nents, int direction);
 > +	void                       (*unmap_sg)(struct device *hwdev,
 > +					       struct scatterlist *sg,
 > +					       int nents, int direction);

First of all I would put all this into a "struct ib_dma_ops" or
something like that, so struct ib_device can have just a member like

	struct ib_dma_ops	*dma_ops;

That keeps the definition of struct ib_device from getting too much
more gigantic, and also makes it easy for the core to export a
standard dma_ops pointer that devices that use the default
implementation can use.

Why not make the DMA operations take a struct ib_device * instead of a
struct device *?  I think that would actually clean up the consumer
code, and it would make it easier for ipath -- otherwise you have to
find your way back from the struct device *.

Also, I think you will need a few more methods.  <asm-x86_64/dma-mapping.h>
has a definition of DMA operations that might be useful to refer to.
But for example SRP uses at least dma_sync_single_for_cpu() and
dma_sync_single_for_device().  Actually that might be the only extra
method needed for now.
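
To illustrate the shape I have in mind, here is a userspace mock-up.
Everything in it is a stand-in: dma_addr_t is just a typedef, the
"virt" operations model a driver that copies with the CPU, and
maps_outstanding exists only so the sketch can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;	/* userspace stand-in for the kernel type */

struct ib_device;

/* Operations take the ib_device itself, so a driver needs no reverse lookup. */
struct ib_dma_ops {
	dma_addr_t (*map_single)(struct ib_device *dev, void *ptr, size_t size);
	void (*unmap_single)(struct ib_device *dev, dma_addr_t addr, size_t size);
};

struct ib_device {
	const struct ib_dma_ops *dma_ops;	/* one pointer instead of many */
	int maps_outstanding;			/* bookkeeping for this model only */
};

/* Model of a driver that copies with the CPU: hand back the virtual address. */
static dma_addr_t virt_map_single(struct ib_device *dev, void *ptr, size_t size)
{
	(void)size;
	dev->maps_outstanding++;
	return (dma_addr_t)(uintptr_t)ptr;
}

static void virt_unmap_single(struct ib_device *dev, dma_addr_t addr, size_t size)
{
	(void)addr;
	(void)size;
	dev->maps_outstanding--;
}

/* A table like this could be shared by every device wanting the same behavior. */
static const struct ib_dma_ops virt_dma_ops = {
	.map_single   = virt_map_single,
	.unmap_single = virt_unmap_single,
};

/* Wrappers that consumers would call instead of the dma_*() routines. */
static dma_addr_t ib_dma_map_single(struct ib_device *dev, void *ptr, size_t size)
{
	assert(dev && dev->dma_ops);
	return dev->dma_ops->map_single(dev, ptr, size);
}

static void ib_dma_unmap_single(struct ib_device *dev, dma_addr_t addr, size_t size)
{
	assert(dev && dev->dma_ops);
	dev->dma_ops->unmap_single(dev, addr, size);
}
```

With this shape the wrappers need no NULL checks on individual methods;
a device either points at a default table or supplies its own.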

 - R.


From rdreier at cisco.com  Tue Sep 12 20:19:22 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 12 Sep 2006 20:19:22 -0700
Subject: [openib-general] ibv_driver_init renamed?
In-Reply-To: <450774A2.8080402@pathscale.com> (Robert Walsh's message of
	"Tue, 12 Sep 2006 20:01:54 -0700")
References: <450774A2.8080402@pathscale.com>
Message-ID: <ada64fszc5h.fsf@cisco.com>

I just replied to the other copy of this email:

Because OFED 1.1-rc2 and -rc3 inadvertently contained libibverbs code
taken from the unstable unreleased libibverbs 1.1 tree.  -rc4 reverted
back to the stable libibverbs 1.0 code.

http://openib.org/bugzilla/show_bug.cgi?id=219 has more details.


From rdreier at cisco.com  Tue Sep 12 20:21:37 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 12 Sep 2006 20:21:37 -0700
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <20060913031054.GA4464@obsidianresearch.com> (Jason
	Gunthorpe's message of "Tue, 12 Sep 2006 21:10:54 -0600")
References: <1158108010.8759.192.camel@brick.pathscale.com>
	<20060913031054.GA4464@obsidianresearch.com>
Message-ID: <ada1wqgzc1q.fsf@cisco.com>

    Jason> I'm sure this must have been answered, but given a PCI
    Jason> domain:bus:device:function tuple and a DMA address,
    Jason> shouldn't any effects of an IOMMU be easily duplicated in
    Jason> software to result in a cpu-bus physical address? I.e., on
    Jason> AMD64 it is just a matter of following the GART tables in
    Jason> software - assuming the address in question hits the GART
    Jason> region (which for ipath, I expect, it never would)

Yes, you could do this.  However there's no exported interface for
reversing a DMA mapping.  And it seems like a lot of unneeded
complexity to add -- why not just avoid the DMA mapping in the first place?

 - R.


From mst at mellanox.co.il  Tue Sep 12 20:57:20 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 06:57:20 +0300
Subject: [openib-general] CMA issue: bind selects the same port after
 close
In-Reply-To: <C12CC4A9.8A2D%tom@opengridcomputing.com>
References: <C12CC4A9.8A2D%tom@opengridcomputing.com>
Message-ID: <20060913035720.GA20225@mellanox.co.il>

Quoting r. Tom Tucker <tom at opengridcomputing.com>:
> Subject: Re: [openib-general] CMA issue: bind selects the same port after close
> 
> 
> There is a whole array of Linux port management services that perform
> exactly the logic that you are trying to emulate. Wouldn't our efforts be
> more productively spent figuring out how to use the existing services you are
> currently trying to emulate? What do you do, for example, when the port
> allocation policy in the kernel changes? Change your emulation?
> 
> I completely understand that the existing port management services are not
> exported, but functionally, they support multiple port spaces, show up in
> netstat, etc... Can someone please explain to me the reluctance to use these
> services in favor of replicating them?
> 
> Sorry if this reads as a rant...but I feel we're on the wrong track...

Hmm.

inet_csk_get_port actually *is* exported, and while it might be hard for CMA to
use it (needs struct sock*), maybe it is easy for SDP.

So, possibly we should just leave the CMA port allocation as is,
and enhance SDP to use inet_csk_get_port.

-- 
MST


From mst at mellanox.co.il  Tue Sep 12 21:14:38 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 07:14:38 +0300
Subject: [openib-general] [Bug 232] SLES10 PPC64: uverbs_mem.c fails to
 link due to missing HPAGE_SHIFT
In-Reply-To: <20060913040938.734472283D4@openib.ca.sandia.gov>
References: <20060913040938.734472283D4@openib.ca.sandia.gov>
Message-ID: <20060913041438.GC20225@mellanox.co.il>

Probably not exported. Look at ia64 work around.

Quoting r. bugzilla-daemon at openib.org <bugzilla-daemon at openib.org>:
Subject: [Bug 232] SLES10 PPC64: uverbs_mem.c fails to link due to missing HPAGE_SHIFT

http://openib.org/bugzilla/show_bug.cgi?id=232


------- Comment #1 from bos at pathscale.com  2006-09-12 21:09 -------
I see HPAGE_SHIFT in /proc/kallsyms.  This gets weirder.


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

-- 
MST


From sean.hefty at intel.com  Tue Sep 12 21:39:29 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 12 Sep 2006 21:39:29 -0700
Subject: [openib-general] CMA issue: bind selects the same port after
 close
In-Reply-To: <20060913035720.GA20225@mellanox.co.il>
Message-ID: <000001c6d6ee$9a47e000$5bd9180a@amr.corp.intel.com>

>> I completely understand that the existing port management services are not
>> exported, but functionally, they support multiple port spaces, show up in
>> netstat, etc... Can someone please explain to me the reluctance to use these
>> services in favor of replicating them?

My reluctance to use the existing port spaces is that we're not guaranteed to
run TCP or IP.  I'm happy to map the address spaces, but that's not the same as
using those addresses when you're not using that protocol.

>inet_csk_get_port actually *is* exported, and while it might be hard for CMA to
>use it (needs struct sock*), maybe it is easy for SDP.

I did look at this, but the use of struct sock made it extremely difficult for
the CMA to use the existing calls.

>So, possibly we should just leave the CMA port allocation as is,
>and enhance SDP to use inet_csk_get_port.

That sounds reasonable.

- Sean


From bos at pathscale.com  Tue Sep 12 21:51:18 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Tue, 12 Sep 2006 21:51:18 -0700
Subject: [openib-general] [Bug 232] SLES10 PPC64: uverbs_mem.c fails to
 link due to missing HPAGE_SHIFT
In-Reply-To: <20060913041438.GC20225@mellanox.co.il>
References: <20060913040938.734472283D4@openib.ca.sandia.gov>
	<20060913041438.GC20225@mellanox.co.il>
Message-ID: <1158123078.30173.13.camel@sardonyx>

On Wed, 2006-09-13 at 07:14 +0300, Michael S. Tsirkin wrote:
> Probably not exported. Look at ia64 work around.

That's right; it's not exported.  I don't see any sign of a possible
workaround for powerpc, though; none of the necessary stuff is exported.

I'm inclined to think that this patch and the hpage backport patch
should probably be dropped.

	<b


From mst at mellanox.co.il  Tue Sep 12 22:23:41 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 08:23:41 +0300
Subject: [openib-general] [PATCH] IB/cma: add rdma_establish
In-Reply-To: <45073FF7.7020506@ichips.intel.com>
References: <20060907214524.GA14791@mellanox.co.il>
	<4505E130.8010301@ichips.intel.com>
	<20060911222956.GD17098@mellanox.co.il>
	<45073FF7.7020506@ichips.intel.com>
Message-ID: <20060913052341.GD20225@mellanox.co.il>

Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH] IB/cma: add rdma_establish
> 
> Michael S. Tsirkin wrote:
> >>>As a side note, reasons for frequent loss of RTU must be investigated.
> >>
> >>A lost RTU shouldn't be any more likely than a lost REQ or REP.  Is the RTU 
> >>never showing up?
> > 
> > 
> > Seems like that. I know for sure I do accept after REP but remote side never
> > gets ESTABLISHED.
> 
> I looked at the code, then ran some tests.  The REP is retried until an RTU is 
> received, or its number of retries is exhausted.  By modifying the IB CM, I was 
> able to force RTU drops.  Using madeye, I could see that the REP would be 
> retried, resulting in the RTU being resent.  After 4 drops, I had the code 
> receive the RTU, which allowed the test to proceed.
> 
> A couple things to look at in OFED would be the setting of max cm retries and 
> the cm timeout.
> 
> - Sean

OFED uses CMA from upstream kernel. If default parameters there
are inappropriate, maybe we should fix them?

BTW, how about the idea of exporting max cm retries in transport-independent
header?

-- 
MST


From mst at mellanox.co.il  Tue Sep 12 22:24:40 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 08:24:40 +0300
Subject: [openib-general] CMA issue: bind selects the same port after
	close
In-Reply-To: <000001c6d6ee$9a47e000$5bd9180a@amr.corp.intel.com>
References: <20060913035720.GA20225@mellanox.co.il>
	<000001c6d6ee$9a47e000$5bd9180a@amr.corp.intel.com>
Message-ID: <20060913052440.GE20225@mellanox.co.il>

Quoting r. Sean Hefty <sean.hefty at intel.com>:
> >So, possibly we should just leave the CMA port allocation as is,
> >and enhance SDP to use inet_csk_get_port.
> 
> That sounds reasonable.

OK, so this needs looking into.

-- 
MST


From krkumar2 at in.ibm.com  Tue Sep 12 22:37:53 2006
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Wed, 13 Sep 2006 11:07:53 +0530
Subject: [openib-general] [PATCH] Optimize cma_process_remove()
Message-ID: <20060913053753.5539.76298.sendpatchset@localhost.localdomain>

Hi Sean,

> I believe that this has the same issue.  If a user tries to destroy an
> rdma_cm_id, it will remove itself from the "device list".  (This is why the
> ID's are moved to a new list, so that the removal still works.)  In the code
> above, destroy thread(s) will remove ID(s) from the remove_list while we're
> trying to walk it.

Thanks for the explanation. So a list_del_init() would be the best
thing to do. Another option is to add a remove_list to rdma_id_private
by which this entry could be added to a local remove_list and traversed
without holding a lock, but it doesn't make sense to add that for one case.
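
To show why list_del_init() is enough, here is a small userspace sketch.
The helpers are re-implemented for illustration under the kernel's names;
they are not the kernel's list.h.

```c
#include <assert.h>

/* Userspace re-implementation of the few list helpers needed here. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink "entry" and leave it pointing at itself, i.e. as an empty list. */
static void list_del_init(struct list_head *entry)
{
	assert(entry->next && entry->prev);
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}
```

Because the entry is left linked to itself, a second removal of the same
entry (for example from a destroy path) only touches the entry and no
other list.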

Does the following patch look OK ?

Thanks,

- KK

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>

diff -ruNp org/core/cma.c new/core/cma.c
--- org/core/cma.c	2006-09-13 10:56:56.000000000 +0530
+++ new/core/cma.c	2006-09-13 10:57:20.000000000 +0530
@@ -2332,12 +2332,9 @@ static int cma_remove_id_dev(struct rdma
 
 static void cma_process_remove(struct cma_device *cma_dev)
 {
-	struct list_head remove_list;
 	struct rdma_id_private *id_priv;
 	int ret;
 
-	INIT_LIST_HEAD(&remove_list);
-
 	mutex_lock(&lock);
 	while (!list_empty(&cma_dev->id_list)) {
 		id_priv = list_entry(cma_dev->id_list.next,
@@ -2348,8 +2345,7 @@ static void cma_process_remove(struct cm
 			continue;
 		}
 
-		list_del(&id_priv->list);
-		list_add_tail(&id_priv->list, &remove_list);
+		list_del_init(&id_priv->list);
 		atomic_inc(&id_priv->refcount);
 		mutex_unlock(&lock);
 

From mst at mellanox.co.il  Tue Sep 12 22:55:29 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 08:55:29 +0300
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <1158108010.8759.192.camel@brick.pathscale.com>
References: <1158108010.8759.192.camel@brick.pathscale.com>
Message-ID: <20060913055529.GF20225@mellanox.co.il>


Quoting r. Ralph Campbell <ralphc at pathscale.com>:
> Subject: How to support IOMMUs for ipath driver
> 
> Problem:
> 
> The IB kernel to IB device driver interface uses dma_map_single()
> and dma_map_sg() to allocate device bus addresses for HW DMA.
> These bus addresses are passed to the IB device driver via ib_post_send()
> and ib_post_recv().
> 
> The ib_ipath driver needs kernel virtual addresses in order to be able
> to copy data to/from the posted work requests since it does not
> use HW DMA. It currently relies on the mapping being one-to-one
> and cannot reasonably reverse the mapping when an IOMMU is present.

Actually, Mellanox HCAs support DMA, but they *also* have the ability to copy
data to the posted work requests - this is the INLINE flag which we only
implemented in userspace - but not in kernel, since we could not get at the
kernel virtual address. It is actually useful for reducing latency for small
messages.

I wonder whether ehca can also benefit from this capability. Anyone?

Unfortunately, the API you propose is not flexible enough in this respect
as it still does not seem to allow this optimization in kernel.
If we are changing the API, I would like the new API to be flexible
enough to enable this optimization.

I am not exactly sure what the best way to do that would be.
How about making it possible for ULPs to pass in kernel virtual address
in post send?

-- 
MST


From eitan at mellanox.co.il  Tue Sep 12 22:58:56 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Wed, 13 Sep 2006 08:58:56 +0300
Subject: [openib-general] [PATCH][TRIVIAL] OpenSM: Eliminate unused
 max_port_profile parameter
In-Reply-To: <1158053698.27427.144058.camel@hal.voltaire.com>
References: <1158053698.27427.144058.camel@hal.voltaire.com>
Message-ID: <45079E20.303@mellanox.co.il>

Hi Hal,

Thanks for cleaning this up.

Eitan

Hal Rosenstock wrote:

>OpenSM: Eliminate unused max_port_profile parameter in OpenSM subnet
>options structure
>
>Signed-off-by: Hal Rosenstock <halr at voltaire.com>
>
>Index: include/opensm/osm_subnet.h
>===================================================================
>--- include/opensm/osm_subnet.h	(revision 9424)
>+++ include/opensm/osm_subnet.h	(working copy)
>@@ -269,7 +269,6 @@ typedef struct _osm_subn_opt
>   boolean_t                console;
>   cl_map_t                 port_prof_ignore_guids;
>   boolean_t                port_profile_switch_nodes;
>-  uint32_t                 max_port_profile;
>   osm_pfn_ui_extension_t   pfn_ui_pre_lid_assign;
>   void *                   ui_pre_lid_assign_ctx;
>   osm_pfn_ui_mcast_extension_t pfn_ui_mcast_fdb_assign;
>@@ -405,10 +404,6 @@ typedef struct _osm_subn_opt
> *		If TRUE will count the number of switch nodes routed through
> *		the link. If FALSE - only CA/RT nodes are counted.
> *
>-*	max_port_profile
>-*		Prevent routing through a port subscribed with more than this
>-*		number of routes.
>-*
> *	pfn_ui_pre_lid_assign
> *		A UI function to be invoked prior to lid assigment. It should
> *		return 1 if any change was made to any lid or 0 otherwise.
>Index: include/opensm/osm_switch.h
>===================================================================
>--- include/opensm/osm_switch.h	(revision 9347)
>+++ include/opensm/osm_switch.h	(working copy)
>@@ -1108,7 +1108,6 @@ osm_switch_recommend_path(
> 	IN OUT uint16_t *p_num_used_sys,
> 	IN OUT uint64_t *remote_node_guids,
> 	IN OUT uint16_t *p_num_used_nodes,
>-	IN const uint32_t max_routes_subscribed,
> 	IN boolean_t      ui_ucast_fdb_assign_func_defined
>  );
> /*
>@@ -1139,12 +1138,6 @@ osm_switch_recommend_path(
> *  p_num_used_nodes
> *     [in out] The number of remote nodes used for routing to the port.
> *
>-*  max_routes_subscribed
>-*     [in] The maximum allowed number of target lids routed through 
>-*     a specific port of the switch. If the port already assigned 
>-*     (in the lfdb) this number of target lids - it will not be used
>-*     even if it has the smallest hops count to the target lid.
>-*
> *  ui_ucast_fdb_assign_func_defined
> *     [in] If TRUE - this means that there is a ui ucast_fdb_assign table
> *     function defined (in pfn_ui_ucast_fdb_assign in subnet opts). This
>Index: opensm/osm_subnet.c
>===================================================================
>--- opensm/osm_subnet.c	(revision 9423)
>+++ opensm/osm_subnet.c	(working copy)
>@@ -483,7 +483,6 @@ osm_subn_set_default_opt(
>   p_opt->no_qos = FALSE;
>   p_opt->accum_log_file = TRUE;
>   p_opt->port_profile_switch_nodes = FALSE;
>-  p_opt->max_port_profile = 0xffffffff;
>   p_opt->pfn_ui_pre_lid_assign = NULL;
>   p_opt->ui_pre_lid_assign_ctx = NULL;
>   p_opt->pfn_ui_mcast_fdb_assign = NULL;
>Index: opensm/osm_switch.c
>===================================================================
>--- opensm/osm_switch.c	(revision 9427)
>+++ opensm/osm_switch.c	(working copy)
>@@ -233,7 +233,6 @@ osm_switch_recommend_path(
>   IN OUT uint16_t *p_num_used_sys,
>   IN OUT uint64_t *remote_node_guids,
>   IN OUT uint16_t *p_num_used_nodes,
>-  IN const uint32_t max_routes_subscribed,
>   IN boolean_t      ui_ucast_fdb_assign_func_defined
>   )
> {
>@@ -425,8 +424,7 @@ osm_switch_recommend_path(
>         /*
>           the count is min but also lower then the max subscribed
>         */
>-        if( (check_count < least_paths) &&
>-            (check_count <= max_routes_subscribed))
>+        if( check_count < least_paths )
>         {
>           port_found = TRUE;
>           best_port = port_num;
>Index: opensm/osm_ucast_mgr.c
>===================================================================
>--- opensm/osm_ucast_mgr.c	(revision 9347)
>+++ opensm/osm_ucast_mgr.c	(working copy)
>@@ -281,7 +281,7 @@ __osm_ucast_mgr_dump_ucast_routes(
>       best_port = osm_switch_recommend_path(
>         p_sw, lid_ho, TRUE,
>         NULL, NULL, NULL, NULL, /* No LMC Optimization */
>-        0xffffffff, ui_ucast_fdb_assign_func_defined );
>+        ui_ucast_fdb_assign_func_defined );
>       sprintf( line, "No %u hop path possible via port %u!",
>                best_hops, best_port );
>       strcat( p_mgr->p_report_buf, line );
>@@ -752,12 +752,10 @@ __osm_ucast_mgr_process_port(
>       port = osm_switch_recommend_path( p_sw, lid_ho, ignore_existing,
>                                         remote_sys_guids, &num_used_sys,
>                                         remote_node_guids, &num_used_nodes,
>-                                        p_mgr->p_subn->opt.max_port_profile,
>                                         ui_ucast_fdb_assign_func_defined );
>     else
>       port = osm_switch_recommend_path( p_sw, lid_ho, ignore_existing,
>                                         NULL, NULL, NULL, NULL,
>-                                        p_mgr->p_subn->opt.max_port_profile,
>                                         ui_ucast_fdb_assign_func_defined );
> 
>     /*
>
>
>
>
>_______________________________________________
>openib-general mailing list
>openib-general at openib.org
>http://openib.org/mailman/listinfo/openib-general
>
>To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>  
>


From mst at mellanox.co.il  Tue Sep 12 23:25:18 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 09:25:18 +0300
Subject: [openib-general] OFED-1.1-rc4 is ready
In-Reply-To: <1158125915.30173.27.camel@sardonyx>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EB437@xmb-sjc-216.amer.cisco.com>
	<1158125915.30173.27.camel@sardonyx>
Message-ID: <20060913062518.GL20225@mellanox.co.il>

Quoting r. Bryan O'Sullivan <bos at pathscale.com>:
> > the ibv_driver_init function was changed to openib_driver_init. 
>
> By the way, I find it unsettling that the current libibverbs internal
> ABI allows silent breakage like this that cannot be detected except at
> runtime, and then only when the right hardware is present.
> 
> Mind you, I don't have any better suggestions in mind (at least not at
> 10:30pm).
> 
> But I worry about the possibility this leaves open for botched field
> upgrades breaking userspace in you-don't-find-out-until-it's-too-late
> ways when libibverbs 1.1 starts being used.

libipathverbs can simply export both ibv_driver_init and
openib_driver_init like libmthca does; that's what we'll do for OFED.

Or maybe Doug here can come up with some symbol versioning trick.
Doug?
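
As a rough illustration of the "export both names" approach (this is not
the actual libmthca or libipathverbs source): the driver keeps one real
entry point and adds a thin forwarding function under the other name, so
whichever name the library looks up resolves to the same code. The
struct fake_device type and the simplified signatures are assumptions
made so the sketch compiles by itself, and which of the two names holds
the real body is arbitrary here.

```c
#include <stddef.h>

/* Hypothetical stand-in for the device object a driver hands back. */
struct fake_device {
	const char *name;
};

static struct fake_device the_device = { "example0" };

/* Entry point under one name; the real work lives here. */
struct fake_device *openib_driver_init(const char *sysfs_path)
{
	/* A real driver would probe sysfs_path; here any non-NULL path matches. */
	return sysfs_path ? &the_device : NULL;
}

/* The same entry point under the other name; it only forwards. */
struct fake_device *ibv_driver_init(const char *sysfs_path)
{
	return openib_driver_init(sysfs_path);
}
```

Both symbols end up in the shared object, so a lookup of either name
succeeds. This does not address the concern about silent ABI breakage;
it only keeps both lookups working.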

-- 
MST


From erezz at voltaire.com  Wed Sep 13 01:20:37 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Wed, 13 Sep 2006 11:20:37 +0300
Subject: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64
 SLES 10 machine
In-Reply-To: <20060912134725.GC22369@mellanox.co.il>
References: <B79FAF8BB536314E859EA1963CFFD222029AC43F@wdtssmail01.eu.thmulti.com>
	<20060912134725.GC22369@mellanox.co.il>
Message-ID: <4507BF55.8010507@voltaire.com>


Michael S. Tsirkin wrote:
> Quoting r. Bub Thomas <thomas.bub at thomson.net>:
>   
>> Subject: RE: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine
>>
>> Michael,
>> I don't understand what you mean on the iser trouble.
>>     
>
> Or Gerlitz from Voltaire is the iser maintainer. I Cc him.
>
>   
It seems that the iSER version in OFED-1.1-rc3 is not compatible with 
open-iscsi in SLES 10. I will take a look at it.

Michael - I'm taking over responsibility for iSER from Or Gerlitz. Can you
cc me on this kind of e-mail in the future?

Erez


From ogerlitz at voltaire.com  Wed Sep 13 01:26:47 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 13 Sep 2006 11:26:47 +0300
Subject: [openib-general] [PATCH] RDMA/cma: document error flow of
 rdma_accept
In-Reply-To: <4506D5E5.2010602@ichips.intel.com>
References: <Pine.LNX.4.64.0609121053140.13564@zuben>
	<4506D5E5.2010602@ichips.intel.com>
Message-ID: <4507C0C7.3020600@voltaire.com>

Sean Hefty wrote:
> Or Gerlitz wrote:
>> + * In the case of error, a reject message is sent to the remote side 
>> and the
>> + * state of the qp associated with the id is modified to error, such 
>> that any
>> + * previously posted receive buffers would be flushed.
> 
> Hmm... this makes me question whether this is what it should be doing.  
> Is there any reason not to reject the connection if accept fails?

I think this (sending a REJ, modifying the QP to ERROR) is exactly what it
should be doing. Why would someone count on or expect a REJ not being
sent in this case? Even if, for some reason which I don't see now, we
change something here (e.g. let the ULP send the REJ), let's have
this patch, which documents what we have now, merged for 2.6.19.

Or.


From ogerlitz at voltaire.com  Wed Sep 13 02:00:50 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 13 Sep 2006 12:00:50 +0300
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <1158108010.8759.192.camel@brick.pathscale.com>
References: <1158108010.8759.192.camel@brick.pathscale.com>
Message-ID: <4507C8C2.6050206@voltaire.com>

Ralph Campbell wrote:
> Problem:
> 
> The IB kernel to IB device driver interface uses dma_map_single()
> and dma_map_sg() to allocate device bus addresses for HW DMA.
> These bus addresses are passed to the IB device driver via ib_post_send()
> and ib_post_recv().
> 
> The ib_ipath driver needs kernel virtual addresses in order to be able
> to copy data to/from the posted work requests since it does not
> use HW DMA. It currently relies on the mapping being one-to-one
> and cannot reasonably reverse the mapping when an IOMMU is present.

Note that the DMA API can return a DMA address for a page which is
currently **not** mapped into the kernel virtual address space (that
is, page_address(p) is NULL), so you must add kmap and kunmap to your
fast RX/TX code path.

Scenarios I can think of where this happens are Direct I/O and some
sorts of pre-fetching done by a file system. Some pages present in a
kernel SG list which needs to be sent/received/RDMA-ed over IB need not
be mapped into the kernel virtual address space.

As for RDMA, please note that the problem has two sides: first, the
remote device, which either does the RDMA or is the one the local device
does RDMA from/to; and second, the local device.

Since you need to be able to interoperate between devices that support
DMA mappings and ones which do not, how do you suggest managing the
addresses for the following schemes (1 stands for a device supporting
DMA addresses and 0 for a device which does not)?

<1,1>
<1,0>
<0,1>
<0,0>

Please assume for the purpose of discussion that each side knows the 
polarity of the remote side.

After writing the section on RDMA, I think I might have gone in the wrong
direction, since ipath emulates RDMA in SW. Can you shed some light on this?

> I also tried proposing adding a flag to the ib_device structure
> and modifying the kernel IB code to check the flag and pass
> either the dma_*() mapped address or a kernel virtual address.
> This works OK for kmalloc() buffers where dma_map_single() is
> being called but doesn't work well for SRP which has lists
> of physical pages and calls dma_map_sg().
> It also means that the kernel IB layer needs to explicitly handle
> two different kinds of addresses.

Just a note: it's not just SRP. It's any ULP which needs to move over IB
data held in a bunch of pages (e.g. packed in a kernel SG list), namely
iSER, NFSoRDMA, Lustre, an IB-native implementation of send_page(), etc.

Or.


From ogerlitz at voltaire.com  Wed Sep 13 02:13:16 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 13 Sep 2006 12:13:16 +0300
Subject: [openib-general] [PATCH] IB/cma: add rdma_establish
In-Reply-To: <4506D0A1.7060405@ichips.intel.com>
References: <20060907214524.GA14791@mellanox.co.il>
	<4505E130.8010301@ichips.intel.com> <450670D2.4040805@voltaire.com>
	<4506D0A1.7060405@ichips.intel.com>
Message-ID: <4507CBAC.40008@voltaire.com>

Sean Hefty wrote:
> Or Gerlitz wrote:
>> Just to make sure, you come to say that you would merge this patch 
>> instead the one that had the CM track local qp numbers and install a 
>> callback for the consumer QP to catch the async event etc?
> 
> correct
> 
>> Indeed the **patch** for itself is somehow simpler, but the consumer 
>> must get established event before posting sends to the qp so they need 
>> to either queue RX-es or modify the QP to RTS before sending the REP.
> 
> The first patch only allows the option of waiting for the established 
> event.
> 
>> Is rdma_established() --> cm_establish() callable from non 
>> interruptible context?
> 
> Yes
> 
>> Also does the patch ensures only one ESTABLISHED event would be called 
>> for the id, no matter if rdma_establish() and an RTU reception happen 
>> in parallel?
> 
> Yes

OK, thanks for all the clarifications.

Or.


From mst at mellanox.co.il  Wed Sep 13 02:31:29 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 12:31:29 +0300
Subject: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64
	SLES 10 machine
In-Reply-To: <4507BF55.8010507@voltaire.com>
References: <B79FAF8BB536314E859EA1963CFFD222029AC43F@wdtssmail01.eu.thmulti.com>
	<20060912134725.GC22369@mellanox.co.il> <4507BF55.8010507@voltaire.com>
Message-ID: <20060913093129.GH22222@mellanox.co.il>

Quoting r. Erez Zilber <erezz at voltaire.com>:
> Subject: Re: Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine
> 
> 
> Michael S. Tsirkin wrote:
> > Quoting r. Bub Thomas <thomas.bub at thomson.net>:
> >   
> >> Subject: RE: [openib-general] Trouble installing OFED-1.1-rc3 on a x86_64 SLES 10 machine
> >>
> >> Michael,
> >> I don't understand what you mean on the iser trouble.
> >>     
> >
> > Or Gerlitz from Voltaire is the iser maintainer. I Cc him.
> >
> >   
> It seems that the iSER version in OFED-1.1-rc3 is not compatible with 
> open-iscsi in SLES 10. I will take a look in it.

Please do - note we need a patch today to make it into the (hopefully last) RC.

> Michael - I'm taking responsibility on iSER from Or Gerlitz. Can you cc 
> me on this kind of e-mails in the future?
> 
> Erez

Sure.

-- 
MST


From halr at voltaire.com  Wed Sep 13 03:21:12 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Sep 2006 06:21:12 -0400
Subject: [openib-general] [PATCH] libibmad: Add sa_rpc_call API
Message-ID: <1158142806.27427.193351.camel@hal.voltaire.com>

libibmad: Add sa_rpc_call API

Signed-off-by: Hal Rosenstock <halr at voltaire.com>

Index: libibmad/include/infiniband/mad.h
===================================================================
--- libibmad/include/infiniband/mad.h	(revision 9425)
+++ libibmad/include/infiniband/mad.h	(working copy)
@@ -748,6 +748,8 @@ safe_smp_set(void *rcvbuf, ib_portid_t *
 /* sa.c */
 uint8_t * sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa,
 		  uint timeout);
+uint8_t * sa_rpc_call(void *ibmad_port, void *rcvbuf, ib_portid_t *portid,
+		      ib_sa_call_t *sa, uint timeout);
 int	ib_path_query(ib_gid_t srcgid, ib_gid_t destgid, ib_portid_t *sm_id,
 		      void *buf);	/* returns lid */
 
Index: libibmad/src/libibmad.map
===================================================================
--- libibmad/src/libibmad.map	(revision 9425)
+++ libibmad/src/libibmad.map	(working copy)
@@ -1,4 +1,4 @@
-IBMAD_1.1 {
+IBMAD_1.2 {
 	global:
 		_mad_dump;
 		_mad_dump_field;
@@ -79,6 +79,7 @@ IBMAD_1.1 {
 		madrpc_unlock;
 		ib_path_query;
 		sa_call;
+		sa_rpc_call;
 		mad_alloc;
 		mad_free;
 		mad_receive;
Index: libibmad/src/sa.c
===================================================================
--- libibmad/src/sa.c	(revision 9425)
+++ libibmad/src/sa.c	(working copy)
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2004,2005 Voltaire Inc.  All rights reserved.
+ * Copyright (c) 2004-2006 Voltaire Inc.  All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -50,7 +50,8 @@
 #define DEBUG 	if (ibdebug)	IBWARN
 
 uint8_t *
-sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa, uint timeout)
+sa_rpc_call(void *ibmad_port, void *rcvbuf, ib_portid_t *portid,
+	    ib_sa_call_t *sa, uint timeout)
 {
 	ib_rpc_t rpc = {0};
 	uint8_t *p;
@@ -77,7 +78,7 @@ sa_call(void *rcvbuf, ib_portid_t *porti
 	if (!portid->qkey)
 		portid->qkey = IB_DEFAULT_QP1_QKEY;
 
-	p = madrpc_rmpp(&rpc, portid, 0/*&sa->rmpp*/, rcvbuf);	/* TODO: RMPP */
+	p = mad_rpc_rmpp(ibmad_port, &rpc, portid, 0/*&sa->rmpp*/, rcvbuf);	/* TODO: RMPP */
 
 	sa->recsz = rpc.recsz;
 
Index: libibmad/src/rpc.c
===================================================================
--- libibmad/src/rpc.c	(revision 9425)
+++ libibmad/src/rpc.c	(working copy)
@@ -386,3 +386,14 @@ mad_rpc_close_port(void *port_id)
 	umad_close_port(p->port_id);
 	free(p);
 }
+
+uint8_t *
+sa_call(void *rcvbuf, ib_portid_t *portid, ib_sa_call_t *sa, uint timeout)
+{
+	struct ibmad_port port;
+
+	port.port_id = mad_portid;
+	port.class_agents[IB_SA_CLASS] = mad_class_agent(IB_SA_CLASS);
+	return sa_rpc_call(&port, rcvbuf, portid, sa, timeout);
+}
+
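
The shape of this change, a context-taking function plus a legacy wrapper
that builds a context from process-wide defaults, can be sketched in
isolation. Everything below (struct port_ctx, ctx_call, legacy_call,
default_port_id) is hypothetical and is not libibmad API; it only mirrors
the way sa_call now forwards to sa_rpc_call.

```c
/* Hypothetical context: the real struct ibmad_port holds more state. */
struct port_ctx {
	int port_id;
};

/* Process-wide default, standing in for the library's global port state. */
static int default_port_id = 7;

/* New-style call: the caller says explicitly which port to use. */
int ctx_call(const struct port_ctx *ctx, int request)
{
	/* Combine port and request just so the result is observable. */
	return ctx->port_id * 1000 + request;
}

/* Old-style call: same signature as before, now a thin wrapper. */
int legacy_call(int request)
{
	struct port_ctx ctx = { default_port_id };

	return ctx_call(&ctx, request);
}
```

Existing callers keep using the old signature unchanged, while new
callers can hold several contexts at once and pick one per call.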


From ogerlitz at voltaire.com  Wed Sep 13 04:27:51 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 13 Sep 2006 14:27:51 +0300
Subject: [openib-general] IPOIB failover ?
In-Reply-To: <1158067443.11227.207.camel@localhost.localdomain>
References: <1158067443.11227.207.camel@localhost.localdomain>
Message-ID: <4507EB37.5080702@voltaire.com>

Richard Frank wrote:
> Does IPOIB in this stack support transparent fail over between ports and
> across redundant HCAs using a "virtual IP" ?

I am working on a patch to the Linux bonding driver which will allow it
to also enslave IPoIB devices in active-backup mode. I will send an
RFC to netdev for review next week. Does this meet your needs?

By virtual IP, do you mean an ***alias address*** assigned at one point
in time to one IPoIB device and at another point in time (e.g. during
fail-over) to a second IPoIB device? Does this approach have any
advantage over the bonding approach?

Or.


From mst at mellanox.co.il  Wed Sep 13 05:01:54 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 15:01:54 +0300
Subject: [openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add
	rdma_establish
In-Reply-To: <45073FF7.7020506@ichips.intel.com>
References: <45073FF7.7020506@ichips.intel.com>
Message-ID: <20060913120154.GA23890@mellanox.co.il>

Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH] IB/cma: add rdma_establish
> 
> Michael S. Tsirkin wrote:
> >>>As a side note, reasons for frequent loss of RTU must be investigated.
> >>
> >>A lost RTU shouldn't be any more likely than a lost REQ or REP.  Is the RTU 
> >>never showing up?
> > 
> > 
> > Seems like that. I know for sure I do accept after REP but remote side never
> > gets ESTABLISHED.
> 
> I looked at the code, then ran some tests.  The REP is retried until an RTU is 
> received, or its number of retries is exhausted.  By modifying the IB CM, I was 
> able to force RTU drops.  Using madeye, I could see that the REP would be 
> retried, resulting in the RTU being resent.  After 4 drops, I had the code 
> receive the RTU, which allowed the test to proceed.
> 
> A couple things to look at in OFED would be the setting of max cm retries and 
> the cm timeout.

What I think we need for 2.6.18 is the following. Please comment.


IB/cma: increase the retry count in CMA from 3 to maximum 15.
3 seems low - we see connections failing under stress - and in any case looks
like an arbitrary number. 15 is the max value allowed by spec.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d6f99d5..5d625a8 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -49,7 +49,7 @@ MODULE_DESCRIPTION("Generic RDMA CM Agen
 MODULE_LICENSE("Dual BSD/GPL");
 
 #define CMA_CM_RESPONSE_TIMEOUT 20
-#define CMA_MAX_CM_RETRIES 3
+#define CMA_MAX_CM_RETRIES 15
 
 static void cma_add_one(struct ib_device *device);
 static void cma_remove_one(struct ib_device *device);
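
A ceiling of 15 is what one would expect if the retry count travels in a
4-bit field of the CM REQ message. The sketch below assumes exactly that
and shows a count being clamped and packed into the top four bits of a
byte. The helper names and the byte layout are illustrative only and are
not copied from the kernel's CM message definitions.

```c
#include <stdint.h>

#define CM_RETRIES_BITS 4
#define CM_RETRIES_MAX  ((1u << CM_RETRIES_BITS) - 1)	/* 15 */

/* Clamp a requested retry count to what a 4-bit field can carry. */
uint8_t clamp_cm_retries(unsigned int requested)
{
	return requested > CM_RETRIES_MAX ? CM_RETRIES_MAX : (uint8_t)requested;
}

/* Store the count in the top four bits of a byte, keeping the low bits. */
uint8_t pack_cm_retries(uint8_t byte, unsigned int requested)
{
	return (uint8_t)((clamp_cm_retries(requested) << 4) | (byte & 0x0f));
}

/* Read the count back out of the top four bits. */
unsigned int unpack_cm_retries(uint8_t byte)
{
	return byte >> 4;
}
```

With such a field, asking for more than 15 retries cannot be expressed
on the wire, so 15 is the largest value the define can usefully take.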

-- 
MST


From Richard.Frank at oracle.com  Wed Sep 13 05:12:19 2006
From: Richard.Frank at oracle.com (Richard Frank)
Date: Wed, 13 Sep 2006 08:12:19 -0400
Subject: [openib-general] IPOIB failover ?
In-Reply-To: <4507EB37.5080702@voltaire.com>
References: <1158067443.11227.207.camel@localhost.localdomain>
	<4507EB37.5080702@voltaire.com>
Message-ID: <1158149539.13254.45.camel@localhost.localdomain>

Supporting IPOIB fail over with the Bonding driver will work - we
currently use this for GE, etc. 


On Wed, 2006-09-13 at 14:27 +0300, Or Gerlitz wrote:
> Richard Frank wrote:
> > Does IPOIB in this stack support transparent fail over between ports and
> > across redundant HCAs using a "virtual IP" ?
> 
> I am working on a patch to the linux bonding driver which will allow it 
> to enslave also IPoIB devices for the active-backup mode. I will send an 
> RFC to netdev for review next week. Does this meets your needs?
> 
> Does by virtual IP you mean an ***alias address*** assigned at one point 
> of time to one ipoib device and in another point of time (eg during 
> fail-over) to a second ipoib device?  does this approach have any 
> advantage on the bonding approach?
> 
> Or.


From johnt1johnt2 at gmail.com  Wed Sep 13 06:05:58 2006
From: johnt1johnt2 at gmail.com (john t)
Date: Wed, 13 Sep 2006 18:35:58 +0530
Subject: [openib-general] ibis
Message-ID: <a94efc20609130605q48ea2e6fy62a5cd5ff080684@mail.gmail.com>

Hi,

In OFED there are commands like ibis, ibdmsh and ibmssh. All of these provide
a shell prompt and allow some operations like "new_IBFabric",
"delete_IBFabric", etc. What are these operations, and how are they used?

Besides, there are many commands which ask for a topology file. How do I
generate a topology file? Is it the same as the one produced by "ibnetdiscover"
(which is not working in my case)?

Regards
John T.

From thomas.bub at thomson.net  Wed Sep 13 06:39:55 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Wed, 13 Sep 2006 15:39:55 +0200
Subject: [openib-general] OFED can't compile against sa.h under SLES10 x86_64
Message-ID: <B79FAF8BB536314E859EA1963CFFD222029AC445@wdtssmail01.eu.thmulti.com>

Michael,
this is another little issue using OFED under SLES10.
In sa.h there is the definition of a struct ibv_sa_path_record that gets
re-defined against ib_sa_path_record in the same header file.
While the gcc 3.3.3 compile on SLES 9 is OK with this, the gcc 4.1
compile on SLES 10 does not like it.
Thomas Bub



From Brian.Cain at ge.com  Wed Sep 13 06:50:53 2006
From: Brian.Cain at ge.com (Cain, Brian (GE Healthcare))
Date: Wed, 13 Sep 2006 09:50:53 -0400
Subject: [openib-general] IPOIB failover ?
In-Reply-To: <1158149539.13254.45.camel@localhost.localdomain>
Message-ID: <2376B63A5AF8564F8A2A2D76BC6DB033E84F8F@CINMLVEM11.e2k.ad.ge.com>

> -----Original Message-----
> From: openib-general-bounces at openib.org 
> [mailto:openib-general-bounces at openib.org] On Behalf Of Richard Frank
> Sent: Wednesday, September 13, 2006 7:12 AM
> To: Or Gerlitz
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] IPOIB failover ?
> 
> Supporting IPOIB fail over with the Bonding driver will work - we
> currently use this for GE, etc. 

You can also get failover with IPoIB if you're willing to use SCTP as
the transport.

-Brian


From halr at voltaire.com  Wed Sep 13 06:47:42 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Sep 2006 09:47:42 -0400
Subject: [openib-general] ibis
In-Reply-To: <a94efc20609130605q48ea2e6fy62a5cd5ff080684@mail.gmail.com>
References: <a94efc20609130605q48ea2e6fy62a5cd5ff080684@mail.gmail.com>
Message-ID: <1158155255.13748.171.camel@hal.voltaire.com>

Hi,

On Wed, 2006-09-13 at 09:05, john t wrote: 
> Hi,
>  
> In OFED there are commands like ibis, ibdmsh and ibmssh 

I'm adding Eitan who is the maintainer for these tools.

> all of these provide a shell prompt and allow some operations like
> "new_IBFabric", "delete_IBFabric"  etc. What are these operations and
> how to use them.
>  
> Besides there are many commands which ask for a topology file. How do
> I generate a topology file. Is it same as produced by "ibnetdiscover"

I don't think so.

> (which is not working in my case)?

What are the symptoms ? Do you have ib_umad module loaded ? Does it have
proper permissions ?

-- Hal

> Regards
> John T.


From thomas.bub at thomson.net  Wed Sep 13 07:11:29 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Wed, 13 Sep 2006 16:11:29 +0200
Subject: [openib-general] How to connect gen2 CM to gen1 IBGD CM?
Message-ID: <B79FAF8BB536314E859EA1963CFFD22201FBD395@wdtssmail01.eu.thmulti.com>

Sean,
with your patience, the cmpost.c example and the OFED 1.1-rc4 on all
machines I finally got a gen2 connection under SLES10 even with a 32-Bit
executable on a x86_64 machine. Cool!

Now the last part of my journey is still outstanding.
It's a gen2 client connecting to a gen1 IBGD server.
I have to do this since my gen1 server is running a 2.4 Montavista RT
Linux on a PowerPC that I can't upgrade to gen2. :-(
BTW.: Our application is a high speed film image transfer in the film
postproduction industry leveraging the benefits of the high speed IB
RDMA transport. 

While I have gen1 to gen1 and gen2 to gen2 running, the only thing that
is missing is gen2 connecting to gen1.

I just tried this with my test executables, but nothing reached the
gen1 server. The gen1 userspace application does not even receive
the IB_CM_REQ.

So since your cmpost example did help me a lot on gen2 the question is:
Do you have a cmpost for gen1 IBGD I can use to connect from gen2 to
gen1?
Or is there any other trick to play here?

Thanks in advance for your assistance
Thomas

............................................................
Thomas Bub
Grass Valley Germany GmbH
Brunnenweg 9
64331 Weiterstadt, Germany
Tel: +49 6150 104 147
Fax: +49 6150 104 656
Email: Thomas.Bub at thomson.net
www.GrassValley.com  <http://www.grassvalley.com> 
............................................................



From eitan at mellanox.co.il  Wed Sep 13 07:22:57 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Wed, 13 Sep 2006 17:22:57 +0300
Subject: [openib-general] ibis
In-Reply-To: <a94efc20609130605q48ea2e6fy62a5cd5ff080684@mail.gmail.com>
References: <a94efc20609130605q48ea2e6fy62a5cd5ff080684@mail.gmail.com>
Message-ID: <45081441.80703@mellanox.co.il>

Hi John,

You should read the man pages of these commands.
If you use OFED 1.0 you should have those as part of the main doc directory.
But they are all present in the source tree too. Please see
https://openib.org/svn/gen2/utils/src/linux-user :
ibdm/doc/ibdmtr.1
ibdm/doc/ibdmchk.1
ibdm/doc/ibdmsh.1
ibdm/doc/ibdm-topo-file.1
ibdm/doc/ibdm-ibnl-file.1
ibdm/doc/ibtopodiff.1
ibis/doc/ibis.1
ibmgtsim/doc/IBMgtSim.1
ibmgtsim/doc/RunSimTest.1
ibmgtsim/doc/ibmsquit.1
ibmgtsim/doc/mkSimNodeDir.1
ibmgtsim/doc/ibmssh.1

Regarding the topology file, you should read the man page:
ibdm/doc/ibdm-topo-file.1

It is not the one generated by ibnetdiscover.


Eitan


john t wrote:

> Hi,
>
> In OFED there are commands like ibis, ibdmsh and ibmssh all of these 
> provide
> a shell prompt and allow some operations like "new_IBFabric",
> "delete_IBFabric"  etc. What are these operations and how to use them.
>
> Besides there are many commands which ask for a topology file. How do I
> generate a topology file. Is it same as produced by "ibnetdiscover" 
> (which
> is not working in my case)?
>
> Regards
> John T.


From mst at mellanox.co.il  Wed Sep 13 07:51:22 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 17:51:22 +0300
Subject: [openib-general] OFED can't compile against sa.h under SLES10
	x86_64
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD222029AC445@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD222029AC445@wdtssmail01.eu.thmulti.com>
Message-ID: <20060913145122.GB24608@mellanox.co.il>

Quoting r. Bub Thomas <thomas.bub at thomson.net>:
> Subject: OFED can't compile against sa.h under SLES10 x86_64
> 
> Michael,
> 
> this is another little issue using OFED under SLES10.
> 
> In sa.h there is the definition of a struct ibv_sa_path_record that gets re-defined against ib_sa_path_record in the same header file.
> 
> While the gcc 3.3.3 compile of SLES 9 is OK with this the gcc 4.1 comiple of SLEs 10 does not like this.
> 
> Thomas Bub
> 


I don't see that. What files are affected? What kind of error do you see?

-- 
MST


From rdreier at cisco.com  Wed Sep 13 08:09:29 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Sep 2006 08:09:29 -0700
Subject: [openib-general] user-mode data strucures
References: <1158092339.9296.20.camel@trinity.ogc.int>
	<adawt88zv9u.fsf@cisco.com> <1158097515.9296.25.camel@trinity.ogc.int>
	<adasliwzpxz.fsf@cisco.com> <1158100078.9296.29.camel@trinity.ogc.int>
	<adaodtkzoqm.fsf@cisco.com> <1158102636.9296.34.camel@trinity.ogc.int>
Message-ID: <adavenryf9y.fsf@cisco.com>

    Tom> I'm OK with a work-around in the near term. BTW, how do we
    Tom> correlate libibverbs 1.x with cat
    Tom> /sys/class/infiniband/verbs/abi_version?

abi_version is the ABI exported by the kernel -- all up-to-date
versions of libibverbs (1.0.x and 1.1.x) should be able to cope with
all kernel versions.

So there's not really a connection.

 - R.


From mst at mellanox.co.il  Wed Sep 13 08:57:26 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 18:57:26 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limit MTU to
	1K
Message-ID: <20060913155726.GA24954@mellanox.co.il>

Tavor systems get better performance with 1K MTU. Since there does
not seem to be any way to find out whether the remote system uses Tavor,
add an option to limit the MTU globally.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

Sean, can you ack the following for 2.6.18 please?

Index: linux-2.6.18-rc2-devel/drivers/infiniband/core/cma.c
===================================================================
--- linux-2.6.18-rc2-devel.orig/drivers/infiniband/core/cma.c	2006-09-11 16:01:37.000000000 +0300
+++ linux-2.6.18-rc2-devel/drivers/infiniband/core/cma.c	2006-09-13 18:51:45.000000000 +0300
@@ -48,6 +48,10 @@ MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("Generic RDMA CM Agent");
 MODULE_LICENSE("Dual BSD/GPL");
 
+static int tavor_quirk = 0;
+module_param_named(tavor_quirk, tavor_quirk, int, 0644);
+MODULE_PARM_DESC(tavor_quirk, "Tavor performance quirk: limit MTU to 1K if > 0");
+
 #define CMA_CM_RESPONSE_TIMEOUT 20
 #define CMA_MAX_CM_RETRIES 3
 
@@ -1123,6 +1127,11 @@ static int cma_query_ib_route(struct rdm
 	path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr));
 	path_rec.numb_path = 1;
 
+	if (tavor_quirk) {
+		path_rec.mtu_selector = IB_SA_LTE;
+		path_rec.mtu = IB_MTU_1024;
+	}
+
 	id_priv->query_id = ib_sa_path_rec_get(id_priv->id.device,
 				id_priv->id.port_num, &path_rec,
 				IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |

-- 
MST


From halr at voltaire.com  Wed Sep 13 09:06:21 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Sep 2006 12:06:21 -0400
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limit MTU
 to 1K
In-Reply-To: <20060913155726.GA24954@mellanox.co.il>
References: <20060913155726.GA24954@mellanox.co.il>
Message-ID: <1158163574.13748.5521.camel@hal.voltaire.com>

On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote:
> Tavor systems get better performance with 1K MTU. Since there does
> not seem to be any way to find out whether the remote system uses Tavor,
> add an option to limit the MTU globally.

Can't Tavor be determined locally ?

And couldn't the remote end negotiate the MTU down (if Tavor) as well ?

> Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
> 
> ---
> 
> Sean, can you ack the following for 2.6.18 please?
> 
> Index: linux-2.6.18-rc2-devel/drivers/infiniband/core/cma.c
> ===================================================================
> --- linux-2.6.18-rc2-devel.orig/drivers/infiniband/core/cma.c	2006-09-11 16:01:37.000000000 +0300
> +++ linux-2.6.18-rc2-devel/drivers/infiniband/core/cma.c	2006-09-13 18:51:45.000000000 +0300
> @@ -48,6 +48,10 @@ MODULE_AUTHOR("Sean Hefty");
>  MODULE_DESCRIPTION("Generic RDMA CM Agent");
>  MODULE_LICENSE("Dual BSD/GPL");
>  
> +static int tavor_quirk = 0;
> +module_param_named(tavor_quirk, tavor_quirk, int, 0644);
> +MODULE_PARM_DESC(tavor_quirk, "Tavor performance quirk: limit MTU to 1K if > 0");
> +
>  #define CMA_CM_RESPONSE_TIMEOUT 20
>  #define CMA_MAX_CM_RETRIES 3
>  
> @@ -1123,6 +1127,11 @@ static int cma_query_ib_route(struct rdm
>  	path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr));
>  	path_rec.numb_path = 1;
>  
> +	if (tavor_quirk) {
> +		path_rec.mtu_selector = IB_SA_LTE;
> +		path_rec.mtu = IB_MTU_1024;
> +	}
> +
>  	id_priv->query_id = ib_sa_path_rec_get(id_priv->id.device,
>  				id_priv->id.port_num, &path_rec,
>  				IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |

Aren't more component mask bits needed here for MTU selector and MTU ?
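
To spell out the concern: an SA query only honors the fields whose bits are
set in the component mask, so filling in mtu_selector and mtu without the
matching bits would presumably have no effect. A self-contained sketch of
the idea follows; the EX_ names and bit positions are invented for
illustration and are not the kernel's actual definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only; NOT the kernel's actual values. */
#define EX_PATH_REC_DGID          (UINT64_C(1) << 0)
#define EX_PATH_REC_SGID          (UINT64_C(1) << 1)
#define EX_PATH_REC_MTU_SELECTOR  (UINT64_C(1) << 2)
#define EX_PATH_REC_MTU           (UINT64_C(1) << 3)

/* Build the component mask for a path query; when limit_mtu is set,
 * also mark the MTU selector and MTU fields as valid. */
static uint64_t ex_path_comp_mask(int limit_mtu)
{
	uint64_t mask = EX_PATH_REC_DGID | EX_PATH_REC_SGID;

	if (limit_mtu)
		mask |= EX_PATH_REC_MTU_SELECTOR | EX_PATH_REC_MTU;
	return mask;
}

/* Returns nonzero if the responder should look at the given field. */
static int ex_field_valid(uint64_t mask, uint64_t bit)
{
	assert(bit != 0 && (bit & (bit - 1)) == 0); /* exactly one bit */
	return (mask & bit) != 0;
}
```

With limit_mtu == 0 the MTU fields are ignored no matter what values are
stored in the record, which is the situation the question is about.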

-- Hal


From mst at mellanox.co.il  Wed Sep 13 09:22:45 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 19:22:45 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <1158163574.13748.5521.camel@hal.voltaire.com>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
Message-ID: <20060913162245.GA25666@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> 
> On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote:
> > Tavor systems get better performance with 1K MTU. Since there does
> > not seem to be any way to find out whether the remote system uses Tavor,
> > add an option to limit the MTU globally.
> 
> Can't Tavor be determined locally ?

It can, but we need this for remote tavor as well, anyway.

> And couldn't the remote end negotiate the MTU down (if Tavor) as well ?

The way to do this would be for the SA to select a 1K MTU if it detects Tavor on one side
and if this does not conflict with the MTU selector.

However
1. Even opensm does not implement this optimization yet
2. We need to work with existing SMs too

-- 
MST


From mshefty at ichips.intel.com  Wed Sep 13 09:30:26 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 13 Sep 2006 09:30:26 -0700
Subject: [openib-general] [PATCH] RDMA/cma: document error flow of
 rdma_accept
In-Reply-To: <Pine.LNX.4.64.0609121053140.13564@zuben>
References: <Pine.LNX.4.64.0609121053140.13564@zuben>
Message-ID: <45083222.9000005@ichips.intel.com>

Committed to svn 9461.  Roland, can you also pull into 2.6.19?

Signed-off-by: Sean Hefty <sean.hefty at intel.com>

Or Gerlitz wrote:
> Document the reject sending and modifying qp to error done in rdma_accept
> 
> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
> 
> diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
> index 402c63d..f932c16 100644
> --- a/include/rdma/rdma_cm.h
> +++ b/include/rdma/rdma_cm.h
> @@ -237,6 +237,10 @@ int rdma_listen(struct rdma_cm_id *id, i
>   * Typically, this routine is only called by the listener to accept a connection
>   * request.  It must also be called on the active side of a connection if the
>   * user is performing their own QP transitions.
> + *
> + * In the case of error, a reject message is sent to the remote side and the
> + * state of the qp associated with the id is modified to error, such that any
> + * previously posted receive buffers would be flushed.
>   */
>  int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param);


From halr at voltaire.com  Wed Sep 13 09:26:30 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Sep 2006 12:26:30 -0400
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <20060913162245.GA25666@mellanox.co.il>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
Message-ID: <1158164787.13748.6289.camel@hal.voltaire.com>

On Wed, 2006-09-13 at 12:22, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > 
> > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote:
> > > Tavor systems get better performance with 1K MTU. Since there does
> > > not seem to be any way to find out whether the remote system uses Tavor,
> > > add an option to limit the MTU globally.
> > 
> > Can't Tavor be determined locally ?
> 
> It can, but we need this for remote tavor as well, anyway.
> 
> > And couldn't the remote end negotiate the MTU down (if Tavor) as well ?
> 
> The way to do this would be for the SA to select a 1K MTU if it detects Tavor on one side
> and if this does not conflict with the MTU selector.

But it only needs the MTU on each local side (once for the REQ and on
the remote side for the REP). It would mean that if the local side were
capable of larger MTU and the remote side were Tavor, that the REQ would
be REJ with MTU too large and need to be retried at a smaller MTU.

> However
> 1. Even opensm does not implement this optimization yet

What optimization ? I don't understand what you are saying OpenSM
doesn't support.

> 2. We need to work with existing SMs too

Not sure what the SA issue is here.

-- Hal


From ftillier at silverstorm.com  Wed Sep 13 09:39:00 2006
From: ftillier at silverstorm.com (Fabian Tillier)
Date: Wed, 13 Sep 2006 09:39:00 -0700
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <20060913162245.GA25666@mellanox.co.il>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
Message-ID: <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com>

On 9/13/06, Michael S. Tsirkin <mst at mellanox.co.il> wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> >
> > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote:
> > > Tavor systems get better performance with 1K MTU. Since there does
> > > not seem to be any way to find out whether the remote system uses Tavor,
> > > add an option to limit the MTU globally.
> >
> > Can't Tavor be determined locally ?
>
> It can, but we need this for remote tavor as well, anyway.
>
> > And couldn't the remote end negotiate the MTU down (if Tavor) as well ?
>
> The way to do this would be for the SA to select a 1K MTU if it detects Tavor on one side
> and if this does not conflict with the MTU selector.

You can't do this because the SA doesn't have a way to tell if a path
query is going to be used for RC or UD, and IPoIB needs paths with 2K
MTU.

Would be nice if the CM REP would allow the MTU to be negotiated down.
 There is plenty of space in the REP if we were to use up some of the
reserved fields.

- Fab


From halr at voltaire.com  Wed Sep 13 09:35:58 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Sep 2006 12:35:58 -0400
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com>
Message-ID: <1158165350.13748.6667.camel@hal.voltaire.com>

Hi Fab,

On Wed, 2006-09-13 at 12:39, Fabian Tillier wrote:
> On 9/13/06, Michael S. Tsirkin <mst at mellanox.co.il> wrote:
> > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > >
> > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote:
> > > > Tavor systems get better performance with 1K MTU. Since there does
> > > > not seem to be any way to find out whether the remote system uses Tavor,
> > > > add an option to limit the MTU globally.
> > >
> > > Can't Tavor be determined locally ?
> >
> > It can, but we need this for remote tavor as well, anyway.
> >
> > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ?
> >
> > The way to do this would be for the SA to select a 1K MTU if it detects Tavor on one side
> > and if this does not conflict with the MTU selector.
> 
> You can't do this because the SA doesn't have a way to tell if a path
> query is going to be used for RC or UD, and IPoIB needs paths with 2K
> MTU.

Are you referring to IPoIB-CM ?

The patch appears to be for the SA PR request prior to the CM REQ. I
don't think it affects IPoIB SA PR requests.

-- Hal


> Would be nice if the CM REP would allow the MTU to be negotiated down.
>  There is plenty of space in the REP if we were to use up some of the
> reserved fields.
> 
> - Fab


From mshefty at ichips.intel.com  Wed Sep 13 10:13:33 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 13 Sep 2006 10:13:33 -0700
Subject: [openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add
	rdma_establish
In-Reply-To: <20060913120154.GA23890@mellanox.co.il>
References: <45073FF7.7020506@ichips.intel.com>
	<20060913120154.GA23890@mellanox.co.il>
Message-ID: <45083C3D.1000209@ichips.intel.com>

Michael S. Tsirkin wrote:
> What I think we need for 2.6.18 is the following. Pls comment.
> 
> 
> IB/cma: increase the retry count in CMA from 3 to maximum 15.
> 3 seems low - we see connections failing under stress - and in any case looks
> like an arbitrary number. 15 is the max value allowed by spec.
> 
> Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

Dropping 3 packets in a row seems likely only under stress testing, so I'm not 
sure that this is worthy of a change to 2.6.18 at this point (we're at rc7). 
This seems fine for 19 though.

Acked-by: Sean Hefty <sean.hefty at intel.com>


From mshefty at ichips.intel.com  Wed Sep 13 10:19:54 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 13 Sep 2006 10:19:54 -0700
Subject: [openib-general] [PATCH] Optimize cma_process_remove()
In-Reply-To: <20060913053753.5539.76298.sendpatchset@localhost.localdomain>
References: <20060913053753.5539.76298.sendpatchset@localhost.localdomain>
Message-ID: <45083DBA.2070809@ichips.intel.com>

Krishna Kumar wrote:
> Thanks for the explanation. So a list_del_init() would be the best
> thing to do. Another option is to add a remove_list to rdma_id_private
> by which this entry could be added to a local remove_list and traversed
> without holding a lock, but it doesn't make sense to add that for one case.
> 
> Does the following patch look OK ?

Thanks - I committed this to svn 9462.

- Sean


From bos at pathscale.com  Wed Sep 13 10:25:18 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Wed, 13 Sep 2006 10:25:18 -0700
Subject: [openib-general] OFED can't compile against sa.h under SLES10
 x86_64
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD222029AC445@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD222029AC445@wdtssmail01.eu.thmulti.com>
Message-ID: <1158168318.4503.12.camel@sardonyx>

On Wed, 2006-09-13 at 15:39 +0200, Bub Thomas wrote:

> While the gcc 3.3.3 compile of SLES 9 is OK with this, the gcc 4.1
> compile of SLES 10 does not like this.

I haven't seen this happen, and I do a lot of x86_64 SLES10 builds.

	<b


From ftillier at silverstorm.com  Wed Sep 13 10:23:33 2006
From: ftillier at silverstorm.com (Fabian Tillier)
Date: Wed, 13 Sep 2006 10:23:33 -0700
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <1158165350.13748.6667.camel@hal.voltaire.com>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com>
	<1158165350.13748.6667.camel@hal.voltaire.com>
Message-ID: <79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com>

Hi Hal,

On 13 Sep 2006 12:35:58 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> Hi Fab,
>
> On Wed, 2006-09-13 at 12:39, Fabian Tillier wrote:
> > On 9/13/06, Michael S. Tsirkin <mst at mellanox.co.il> wrote:
> > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > > >
> > > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote:
> > > > > Tavor systems get better performance with 1K MTU. Since there does
> > > > > not seem to be any way to find out whether the remote system uses Tavor,
> > > > > add an option to limit the MTU globally.
> > > >
> > > > Can't Tavor be determined locally ?
> > >
> > > It can, but we need this for remote tavor as well, anyway.
> > >
> > > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ?
> > >
> > > The way to do this would be for the SA to select a 1K MTU if it detects Tavor on one side
> > > and if this does not conflict with the MTU selector.
> >
> > You can't do this because the SA doesn't have a way to tell if a path
> > query is going to be used for RC or UD, and IPoIB needs paths with 2K
> > MTU.
>
> Are you referring to IPoIB-CM ?
>
> The patch appears to be for the SA PR request prior to the CM REQ. I
> don't think it affects IPoIB SA PR requests.

I interpreted Michael's comment as suggesting the SA return paths with
a 1K MTU when it detects that either endpoint is Tavor.  The SA has
access to this information based on the vendor ID/device ID in the
node record.

If I understood Michael's comment properly, this will have the side
effect that IPoIB won't work since IPoIB requires 2K MTUs.  As far as
I know, there is no way to specify whether a path is needed for UD vs.
RC in the path query.

I like your suggestion to reject with a smaller MTU.  Seems like the
proper way to handle this, as well as allowing for the retry logic to
be put in the CMA itself so clients don't have to deal with it.

- Fab


From halr at voltaire.com  Wed Sep 13 10:21:35 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Sep 2006 13:21:35 -0400
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com>
	<1158165350.13748.6667.camel@hal.voltaire.com>
	<79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com>
Message-ID: <1158168091.13748.8242.camel@hal.voltaire.com>

Hi Fab,

On Wed, 2006-09-13 at 13:23, Fabian Tillier wrote:
> Hi Hal,
> 
> On 13 Sep 2006 12:35:58 -0400, Hal Rosenstock <halr at voltaire.com> wrote:
> > Hi Fab,
> >
> > On Wed, 2006-09-13 at 12:39, Fabian Tillier wrote:
> > > On 9/13/06, Michael S. Tsirkin <mst at mellanox.co.il> wrote:
> > > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > > > >
> > > > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote:
> > > > > > Tavor systems get better performance with 1K MTU. Since there does
> > > > > > not seem to be any way to find out whether the remote system uses Tavor,
> > > > > > add an option to limit the MTU globally.
> > > > >
> > > > > Can't Tavor be determined locally ?
> > > >
> > > > It can, but we need this for remote tavor as well, anyway.
> > > >
> > > > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ?
> > > >
> > > > The way to do this would be for the SA to select a 1K MTU if it detects Tavor on one side
> > > > and if this does not conflict with the MTU selector.
> > >
> > > You can't do this because the SA doesn't have a way to tell if a path
> > > query is going to be used for RC or UD, and IPoIB needs paths with 2K
> > > MTU.
> >
> > Are you referring to IPoIB-CM ?
> >
> > The patch appears to be for the SA PR request prior to the CM REQ. I
> > don't think it affects IPoIB SA PR requests.
> 
> I interpreted Michael's comment as suggesting the SA return paths with
> a 1K MTU when it detects that either endpoint is Tavor.  The SA has
> access to this information based on the vendor ID/device ID in the
> node record.

That's the part I missed.

> If I understood Michael's comment properly, this will have the side
> effect that IPoIB won't work since IPoIB requires 2K MTUs.  As far as
> I know, there is no way to specify whether a path is needed for UD vs.
> RC in the path query.

I don't know how either. I don't think it can be done (at least
currently per the standard).

> I like your suggestion to reject with a smaller MTU.  Seems like the
> proper way to handle this, as well as allowing for the retry logic to
> be put in the CMA itself so clients don't have to deal with it.

But a penalty is paid at connection setup: more round trips, and hence
more setup latency, until the right MTU is reached. So, like most
engineering "solutions", it is a tradeoff with pros and cons.

-- Hal


> - Fab


From mshefty at ichips.intel.com  Wed Sep 13 10:28:17 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 13 Sep 2006 10:28:17 -0700
Subject: [openib-general] How to connect gen2 CM to gen1 IBGD CM?
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD22201FBD395@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD22201FBD395@wdtssmail01.eu.thmulti.com>
Message-ID: <45083FB1.4070808@ichips.intel.com>

Bub Thomas wrote:
> Do you have a cmpost for gen1 IBGD I can use to connect from gen2 to gen1?

No - the gen1 code is really the old Topspin code.  Topspin is now part of 
Cisco, so they may have something.

> Or is there any other trick to play here?

I don't think so.  I'm pretty sure that this has been tried before and has 
worked.  Can you try connecting from the gen1 system to the gen2 system and see 
if the REQ shows up?

On the gen2 system, when you send the REQ, what happens?  Does the REQ just 
timeout, or does it receive a REJ message back from the gen1 system?

- Sean


From ralphc at pathscale.com  Wed Sep 13 10:33:09 2006
From: ralphc at pathscale.com (Ralph Campbell)
Date: Wed, 13 Sep 2006 10:33:09 -0700
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <20060913031054.GA4464@obsidianresearch.com>
References: <1158108010.8759.192.camel@brick.pathscale.com>
	<20060913031054.GA4464@obsidianresearch.com>
Message-ID: <1158168789.8759.199.camel@brick.pathscale.com>

On Tue, 2006-09-12 at 21:10 -0600, Jason Gunthorpe wrote:
> On Tue, Sep 12, 2006 at 05:40:10PM -0700, Ralph Campbell wrote:
> 
> > The ib_ipath driver needs kernel virtual addresses in order to be able
> > to copy data to/from the posted work requests since it does not
> > use HW DMA. It currently relies on the mapping being one-to-one
> > and cannot reasonably reverse the mapping when an IOMMU is present.
> 
> I'm sure this must have been answered, but given a PCI
> domain:bus:device:function tuple and a DMA address, shouldn't any
> effects of an IOMMU be easily duplicated in software to result in a
> cpu-bus physical address? Ie on AMD64 it is just a matter of following
> the GART tables in software - assuming the address in question hits
> the GART region (which for ipath, I expect, it never would)
> 
> Jason

The problem is that this reverse mapping code would either need to
be added to every device driver for every possible IOMMU or
it would need to be added to the general dma interface as a new
architecture dependent interface. Neither of these is acceptable
to the kernel community.


From ralphc at pathscale.com  Wed Sep 13 10:35:02 2006
From: ralphc at pathscale.com (Ralph Campbell)
Date: Wed, 13 Sep 2006 10:35:02 -0700
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <adaejugzcba.fsf@cisco.com>
References: <1158108010.8759.192.camel@brick.pathscale.com>
	<adaejugzcba.fsf@cisco.com>
Message-ID: <1158168902.8759.201.camel@brick.pathscale.com>

On Tue, 2006-09-12 at 20:15 -0700, Roland Dreier wrote:
>  > My current proposal is to provide wrapper routines for the
>  > dma_*() routines which only the IB kernel code would use.
>  > These ib_dma_*() variants would allow a device driver to interpose
>  > on the call and do appropriate code to convert the kernel virtual
>  > or physical page addresses to something the device driver can handle.
>  > For ib_mthca and ib_ehca, these would result in the corresponding
>  > dma_*() routine being called. For ib_ipath, a different implementation
>  > would be needed.
> 
> Seems like the least-bad way forward.
> 
> A few comments on the proposed implementation:
> 
>  > @@ -984,6 +985,19 @@ struct ib_device {
>  >  						  struct ib_grh *in_grh,
>  >  						  struct ib_mad *in_mad,
>  >  						  struct ib_mad *out_mad);
>  > +	int                        (*mapping_error)(dma_addr_t dma_addr);
>  > +	dma_addr_t                 (*map_single)(struct device *hwdev,
>  > +						 void *ptr, size_t size,
>  > +						 int direction);
>  > +	void                       (*unmap_single)(struct device *dev,
>  > +						   dma_addr_t addr,
>  > +						   size_t size, int direction);
>  > +	int                        (*map_sg)(struct device *hwdev,
>  > +					     struct scatterlist *sg,
>  > +					     int nents, int direction);
>  > +	void                       (*unmap_sg)(struct device *hwdev,
>  > +					       struct scatterlist *sg,
>  > +					       int nents, int direction);
> 
> First of all I would put all this into a "struct ib_dma_ops" or
> something like that, so struct ib_device can have just a member like
> 
> 	struct ib_dma_ops	*dma_ops;
> 
> That keeps the definition of struct ib_device from getting too much
> more gigantic, and also makes it easy for the core to export a
> standard dma_ops pointer that devices that use the default
> implementation can use.
> 
> Why not make the DMA operations take a struct ib_device * instead of a
> struct device *?  I think that would actually clean up the consumer
> code, and it would make it easier for ipath -- otherwise you have to
> find your way back from the struct device *.
> 
> Also, I think you will need a few more methods.  <asm-x86_64/dma-mapping.h>
> has a definition of DMA operations that might be useful to refer to.
> But for example SRP uses at least dma_sync_single_for_cpu() and
> dma_sync_single_for_device().  Actually that might be the only extra
> method needed for now.
> 
>  - R.

These are all good suggestions and I will incorporate them.
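
A rough, userspace-only sketch of the shape suggested above: an ops
structure hung off the device, operations that take the IB device itself,
and a wrapper that falls back to a default when a driver does not override
anything. Every ex_ name, the fake offset, and the reduced argument list
are invented for illustration; this is not the eventual kernel interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ex_dma_addr_t;	/* stand-in for the kernel's dma_addr_t */

/* Pretend translation applied by the default (hardware DMA) path. */
#define EX_FAKE_IOMMU_OFFSET UINT64_C(0x100000)

struct ex_ib_device;

/* Hypothetical per-device DMA operations, taking the IB device. */
struct ex_ib_dma_ops {
	ex_dma_addr_t (*map_single)(struct ex_ib_device *dev,
				    void *ptr, size_t size);
};

struct ex_ib_device {
	const struct ex_ib_dma_ops *dma_ops;	/* NULL: use the default */
};

/* Default path: stands in for calling the real dma_map_single(). */
static ex_dma_addr_t ex_default_map_single(struct ex_ib_device *dev,
					   void *ptr, size_t size)
{
	(void)dev;
	(void)size;
	return (ex_dma_addr_t)(uintptr_t)ptr + EX_FAKE_IOMMU_OFFSET;
}

/* Wrapper that consumers would call instead of dma_map_single(). */
static ex_dma_addr_t ex_ib_dma_map_single(struct ex_ib_device *dev,
					  void *ptr, size_t size)
{
	assert(dev != NULL);
	if (dev->dma_ops && dev->dma_ops->map_single)
		return dev->dma_ops->map_single(dev, ptr, size);
	return ex_default_map_single(dev, ptr, size);
}

/* ipath-style override: hand back the virtual address unchanged. */
static ex_dma_addr_t ex_virt_map_single(struct ex_ib_device *dev,
					void *ptr, size_t size)
{
	(void)dev;
	(void)size;
	return (ex_dma_addr_t)(uintptr_t)ptr;
}

static const struct ex_ib_dma_ops ex_virt_dma_ops = {
	.map_single = ex_virt_map_single,
};
```

A device that leaves dma_ops NULL gets the default mapping, while a device
that points dma_ops at ex_virt_dma_ops sees the address it can copy from
directly. The unmap, scatterlist and sync operations would follow the same
pattern.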


From mshefty at ichips.intel.com  Wed Sep 13 11:18:13 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 13 Sep 2006 11:18:13 -0700
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <1158164787.13748.6289.camel@hal.voltaire.com>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<1158164787.13748.6289.camel@hal.voltaire.com>
Message-ID: <45084B65.4000007@ichips.intel.com>

Hal Rosenstock wrote:
> But it only needs the MTU on each local side (once for the REQ and on
> the remote side for the REP). It would mean that if the local side were
> capable of larger MTU and the remote side were Tavor, that the REQ would
> be REJ with MTU too large and need to be retried at a smaller MTU.

I agree with this approach.  The user should determine the proper MTU based on 
local information, and either set it to 1k if sending a REQ, or REJ the REQ if 
the MTU is too large.  I'm not sure that this policy should be in the CMA, 
versus the consumer, but I can go with the CMA.

I do think that the MTU could be negotiated down as part of the private data in 
the REP, but this would need to be done outside of the CMA.

- Sean


From ralphc at pathscale.com  Wed Sep 13 11:30:57 2006
From: ralphc at pathscale.com (Ralph Campbell)
Date: Wed, 13 Sep 2006 11:30:57 -0700
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <4507C8C2.6050206@voltaire.com>
References: <1158108010.8759.192.camel@brick.pathscale.com>
	<4507C8C2.6050206@voltaire.com>
Message-ID: <1158172258.8759.230.camel@brick.pathscale.com>

On Wed, 2006-09-13 at 12:00 +0300, Or Gerlitz wrote:
> Ralph Campbell wrote:
> > Problem:
> > 
> > The IB kernel to IB device driver interface uses dma_map_single()
> > and dma_map_sg() to allocate device bus addresses for HW DMA.
> > These bus addresses are passed to the IB device driver via ib_post_send()
> > and ib_post_recv().
> > 
> > The ib_ipath driver needs kernel virtual addresses in order to be able
> > to copy data to/from the posted work requests since it does not
> > use HW DMA. It currently relies on the mapping being one-to-one
> > and cannot reasonably reverse the mapping when an IOMMU is present.
> 
> Oops, please note that one can get through the DMA api a DMA address for 
> a page which is currently **not** mapped into the kernel virtual address 
> space (that is page_address(p) is NULL), so you must add kmap and kunmap 
> into your fast RX/TX code path.

Yes, these are called "high pages".

> Examples of scenarios where this happens that I can think of are Direct I/O
> and some sort of pre-fetching done by the file system. Some pages present in
> a kernel SG list which need to be sent/received/RDMA-ed over IB need not be
> mapped into the kernel virtual address space.

Well, the other parts of the kernel might not need a kernel virtual
address but the ib_ipath driver still does.

> As for RDMA, please note that the problem has two faces: first, the remote
> device, which either does the RDMA or is the one the local device does RDMA
> from/to, and second, the local device.
> 
> Since you need to be able to interoperate between devices that support DMA
> mappings and ones which do not, how do you suggest managing the
> addresses for the following schemes (1 stands for a device supporting DMA
> addresses and 0 for a device which does not)?
> 
> <1,1>
> <1,0>
> <0,1>
> <0,0>
> 
> Please assume for the purpose of discussion that each side knows the 
> polarity of the remote side?
> 
> After writing the section on RDMA I think I might have gone in the wrong
> direction, since ipath emulates RDMA in SW. Can you shed some light on this?

I don't understand what you are talking about. There is an IB
wire protocol for RDMA, SEND, etc. That doesn't change depending
on the HCA.
The InfiniPath HCA has a ring buffer of receive buffers and all
incoming IB packets are DMA'ed into one of these buffers.
The ib_ipath software driver examines the packet and
copies it to the appropriate address. For a packet received with
a RC_RDMA_WRITE_FIRST, the RKEY and IB address are used to convert
that into a kernel virtual address and the data is copied.
The same happens for RC_SEND_FIRST but the KV address comes from
the LKEY and address in the work request posted by ib_post_recv().

Sending data is similar, the driver constructs a packet with the
appropriate opcode and writes it to the chip which puts it on
the wire.

> > I also tried proposing adding a flag to the ib_device structure
> > and modifying the kernel IB code to check the flag and pass
> > either the dma_*() mapped address or a kernel virtual address.
> > This works OK for kmalloc() buffers where dma_map_single() is
> > being called but doesn't work well for SRP which has lists
> > of physical pages and calls dma_map_sg().
> > It also means that the kernel IB layer needs to explicitly handle
> > two different kinds of addresses.
> 
> Just a note, it's not just SRP there... it's any ULP which needs to move
> over IB data present in a bunch of pages (e.g. packed in a kernel SG list),
> namely iSER, NFSoRDMA, Lustre, an IB native implementation of send_page(), etc.

Sure. In each such case, the code would need to be modified to
use the ib_dma_*() routines instead of dma_*() for addresses used
with the LKEY/RKEY returned from ib_get_dma_mr().


From mst at mellanox.co.il  Wed Sep 13 12:03:28 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 22:03:28 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com>
Message-ID: <20060913190328.GB26959@mellanox.co.il>

Quoting r. Fabian Tillier <ftillier at silverstorm.com>:
> Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> 
> On 9/13/06, Michael S. Tsirkin <mst at mellanox.co.il> wrote:
> > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > >
> > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote:
> > > > Tavor systems get better performance with 1K MTU. Since there does
> > > > not seem to be any way to find out whether the remote system uses Tavor,
> > > > add an option to limit the MTU globally.
> > >
> > > Can't Tavor be determined locally ?
> >
> > It can, but we need this for remote tavor as well, anyway.
> >
> > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ?
> >
> > The way to do this would be for the SA to select a 1K MTU if it detects Tavor on one side
> > and if this does not conflict with the MTU selector.
> 
> You can't do this because the SA doesn't have a way to tell if a path
> query is going to be used for RC or UD, and IPoIB needs paths with 2K
> MTU.

I think we can do that without breaking IPoIB.
IPoIB needs mtu >= 1K. IPoIB sets mtu selector to >= 2K.
I am talking about users that do not set mtu selector.


-- 
MST


From mst at mellanox.co.il  Wed Sep 13 12:05:39 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 22:05:39 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <1158168091.13748.8242.camel@hal.voltaire.com>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com>
	<1158165350.13748.6667.camel@hal.voltaire.com>
	<79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com>
	<1158168091.13748.8242.camel@hal.voltaire.com>
Message-ID: <20060913190539.GC26959@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > If I understood Michael's comment properly, this will have the side
> > effect that IPoIB won't work since IPoIB requires 2K MTUs.  As far as
> > I know, there is no way to specify whether a path is needed for UD vs.
> > RC in the path query.
> 
> I don't know how either. I don't think it can be done (at least
> currently per the standard).

We don't really need to know whether the path is for an RC or a UD QP.
IPoIB needs MTU >= 2K, so it should set the MTU selector to >= 2K.
In this case the SM will return a path with MTU >= 2K.
The CMA will not set the MTU selector, and then the SM will choose the MTU for best performance.

-- 
MST


From mst at mellanox.co.il  Wed Sep 13 12:13:43 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 22:13:43 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <1158164787.13748.6289.camel@hal.voltaire.com>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<1158164787.13748.6289.camel@hal.voltaire.com>
Message-ID: <20060913191343.GD26959@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> 
> On Wed, 2006-09-13 at 12:22, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > > 
> > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote:
> > > > Tavor systems get better performance with 1K MTU. Since there does
> > > > not seem to be any way to find out whether the remote system uses Tavor,
> > > > add an option to limit the MTU globally.
> > > 
> > > Can't Tavor be determined locally ?
> > 
> > It can, but we need this for remote tavor as well, anyway.
> > 
> > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ?
> > 
> > > The way to do this would be for SA to select 1K MTU if it detects Tavor on one side
> > and if this does not conflict with MTU selector.
> 
> But it only needs the MTU on each local side (once for the REQ and on
> the remote side for the REP). It would mean that if the local side were
> capable of larger MTU and the remote side were Tavor, that the REQ would
> be REJ with MTU too large and need to be retried at a smaller MTU.

This has 3 implications that make it impractical:
. the connection rate will suffer greatly
. this will need to be done in each ULP, and it's a lot of code
. protocols such as SDP explicitly say what to do on REJ
  and do not seem to speak about retries

> > However
> > 1. Even opensm does not implement this optimization yet
> 
> What optimization ? I don't understand what you are saying OpenSM
> doesn't support.
> 
> > 2. We need to work with existing SMs too
> 
> Not sure what the SA issue is here.

If the path MTU selector in the path query allows a 1K MTU (e.g. "best MTU")
and one of the sides is Tavor, the SA selects 1K as the best MTU
rather than the largest possible one.

If path MTU selector requires 2K MTU, return path with 2K MTU.
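
The policy described above could be sketched roughly as follows. This is
only an illustration, not OpenSM or kernel code: sa_pick_mtu, the
sketch_selector names and the either_end_is_tavor flag are all hypothetical,
and the selector is treated the way this thread describes it ("best" versus
"at least N").

```c
#include <stdbool.h>

#define SKETCH_MTU_1K 1024   /* hypothetical constant: MTU in bytes */

/* Hypothetical selector values, modelled on how the thread uses them. */
enum sketch_selector {
    SEL_BEST,      /* requester leaves the choice to the SA */
    SEL_AT_LEAST   /* requester needs at least `wanted` bytes */
};

/*
 * Return the MTU (in bytes) the SA would hand back, or 0 if no path
 * satisfies the query. path_max is the largest MTU the path supports.
 */
int sa_pick_mtu(enum sketch_selector sel, int wanted, int path_max,
                bool either_end_is_tavor)
{
    /* Smallest MTU the requester is willing to accept. */
    int floor = (sel == SEL_AT_LEAST) ? wanted : 0;

    if (path_max < floor)
        return 0;                      /* the query cannot be satisfied */

    /* Prefer 1K for Tavor, but only when the selector permits it. */
    if (either_end_is_tavor && floor <= SKETCH_MTU_1K &&
        path_max >= SKETCH_MTU_1K)
        return SKETCH_MTU_1K;

    return path_max;                   /* otherwise the largest possible */
}
```

With this shape, a query that requires 2K still gets 2K even with Tavor on
one side, while a "best MTU" query with Tavor on one side gets 1K.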

-- 
MST


From mst at mellanox.co.il  Wed Sep 13 12:18:41 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 22:18:41 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <45084B65.4000007@ichips.intel.com>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<1158164787.13748.6289.camel@hal.voltaire.com>
	<45084B65.4000007@ichips.intel.com>
Message-ID: <20060913191841.GE26959@mellanox.co.il>

Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> 
> Hal Rosenstock wrote:
> > But it only needs the MTU on each local side (once for the REQ and on
> > the remote side for the REP). It would mean that if the local side were
> > capable of larger MTU and the remote side were Tavor, that the REQ would
> > be REJ with MTU too large and need to be retried at a smaller MTU.
> 
> I agree with this approach.  The user should determine the proper MTU based on 
> local information, and either set it to 1k if sending a REQ, or REJ the REQ if 
> the MTU is too large.  I'm not sure that this policy should be in the CMA, 
> versus the consumer, but I can go with the CMA.
> 
> I do think that the MTU could be negotiated down as part of the private data in 
> the REP, but this would need to be done outside of the CMA.
> 
> - Sean

Putting knowledge about hw quirks in all protocols is really horrible.

MTU should be decided by the SA as part of the path information.
If ULPs have specific limitations wrt MTU, they should use the MTU selector
in the path record query.

-- 
MST


From mshefty at ichips.intel.com  Wed Sep 13 12:22:30 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 13 Sep 2006 12:22:30 -0700
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <20060913190328.GB26959@mellanox.co.il>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com>
	<20060913190328.GB26959@mellanox.co.il>
Message-ID: <45085A76.8080800@ichips.intel.com>

Michael S. Tsirkin wrote:
> I think we can do that without breaking IPoIB.
> IPoIB needs mtu >= 1K. IPoIB sets mtu selector to >= 2K.
> I am talking about users that do not set mtu selector.

The ipoib spec requires support for a 2k MTU, but allows support for smaller 
MTUs.  I agree that if the ipoib implementation requires an MTU of 2k, then it 
should be setting this as part of its query request.

- Sean


From mst at mellanox.co.il  Wed Sep 13 12:30:16 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 22:30:16 +0300
Subject: [openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add
	rdma_establish
In-Reply-To: <45083C3D.1000209@ichips.intel.com>
References: <45073FF7.7020506@ichips.intel.com>
	<20060913120154.GA23890@mellanox.co.il>
	<45083C3D.1000209@ichips.intel.com>
Message-ID: <20060913193016.GF26959@mellanox.co.il>

Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH for-2.6.18] Re: [PATCH] IB/cma: add rdma_establish
> 
> Michael S. Tsirkin wrote:
> > What I think we need for 2.6.18 is the following. Pls comment.
> > 
> > 
> > IB/cma: increase the retry count in CMA from 3 to maximum 15.
> > 3 seems low - we see connections failing under stress - and in any case looks
> > like an arbitrary number. 15 is the max value allowed by spec.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
> 
> Dropping 3 packets in a row seems likely only under stress testing, so I'm not 
> sure that this is worthy of a change to 2.6.18 at this point (we're at rc7). 

I don't really understand. The fix is a one-liner.
The problem is observed in practice, under stress.
Who *wants* systems that fall apart under stress?

It seems that with a retry count of 3, the chance of losing
one out of 3 packets would be close to 100% if the loss rate is about 10%.
Cranking it up to 15, you need a loss rate above 50% to get close to a 100%
chance of losing the connection request.
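
One way to put numbers on the retry-count argument is a small model. This is
a sketch under stated assumptions only: every transmission is lost
independently with the same probability, and the request fails only when
every attempt is lost. all_attempts_lost is a hypothetical helper, not CMA
code, and whether the first transmission counts as one of the "retries" is
left to the caller.

```c
/*
 * Probability that all `attempts` transmissions are lost, assuming each
 * one is lost independently with probability `loss_rate` (0.0 to 1.0).
 */
double all_attempts_lost(double loss_rate, int attempts)
{
    double prob = 1.0;

    /* Independent losses multiply: loss_rate raised to `attempts`. */
    for (int i = 0; i < attempts; i++)
        prob *= loss_rate;

    return prob;
}
```

For example, all_attempts_lost(0.1, 3) models a 10% loss rate with three
attempts, and all_attempts_lost(0.5, 15) a 50% loss rate with fifteen.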

Losing a DREP is also bad - as it leaves stale connections around
munching up resources.

So why aren't we fixing this?

-- 
MST


From halr at voltaire.com  Wed Sep 13 12:25:37 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Sep 2006 15:25:37 -0400
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <20060913190539.GC26959@mellanox.co.il>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com>
	<1158165350.13748.6667.camel@hal.voltaire.com>
	<79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com>
	<1158168091.13748.8242.camel@hal.voltaire.com>
	<20060913190539.GC26959@mellanox.co.il>
Message-ID: <1158175522.13748.12872.camel@hal.voltaire.com>

On Wed, 2006-09-13 at 15:05, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > If I understood Michael's comment properly, this will have the side
> > > effect that IPoIB won't work since IPoIB requires 2K MTUs.  As far as
> > > I know, there is no way to specify whether a path is needed for UD vs.
> > > RC in the path query.
> > 
> > I don't know how either. I don't think it can be done (at least
> > currently per the standard).
> 
> We don't really need to know whether path is for RC or UD QP.
> IPoIB needs MTU >= 2K so it should set mtu selector to >= 2K.

That's the default and not the minimum MTU (for IPoIB).

> In this case SM will return path with MTU >= 2K.
> CMA will not set mtu selector and then SM will choose MTU for best performance.


From sean.hefty at intel.com  Wed Sep 13 12:32:18 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 13 Sep 2006 12:32:18 -0700
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <20060913191841.GE26959@mellanox.co.il>
Message-ID: <000701c6d76b$53761d40$ff0da8c0@amr.corp.intel.com>

>Putting knowledge about hw quirks in all protocols is really horrible.

Agreed.

>MTU should be decided by SA as part of path information.
>If ULPs have specific limitations wrt MTU they should use mtu selector
>in path record query.

Thinking about this more, the proper place for this does seem to be in the
selection of the path record (where you put it), rather than during connection
establishment.

Although, I don't like the idea of the CMA changing every path to use an MTU of
1k.

- Sean


From mshefty at ichips.intel.com  Wed Sep 13 12:40:38 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 13 Sep 2006 12:40:38 -0700
Subject: [openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add
	rdma_establish
In-Reply-To: <20060913193016.GF26959@mellanox.co.il>
References: <45073FF7.7020506@ichips.intel.com>
	<20060913120154.GA23890@mellanox.co.il>
	<45083C3D.1000209@ichips.intel.com>
	<20060913193016.GF26959@mellanox.co.il>
Message-ID: <45085EB6.80107@ichips.intel.com>

Michael S. Tsirkin wrote:
> I don't really understand. The fix is a one-liner.
> The problem is observed in practice, under stress.
> Who *wants* systems that fall apart under stress?

My view is: is this worth delaying the release of the kernel?  And I don't see 
that it is at this point in the 2.6.18 release cycle.  This does not fix a 
system crash.  It only allows a connection to be made if the system is under 
heavy stress.

> It seems that with retry of 3, chances of losing
> one out of 3 packets would be close to 100% if loss rate is about 10%.
> Cranking it up to 15, you need loss rate above 50% to get close to 100%
> chance of losing connection request.

I'm not quite following the math here.

> Losing a DREP is also bad - as it leaves stale connections around
> munching up resources.

Yes - but retrying the DREQ doesn't end up fixing the issue.  The side that 
sends the DREP often ends up entering and exiting timewait before the DREQ can 
be retried.  This results in the DREQ being lost.  Eventually the DREQ will time 
out, and the connection will be torn down.  Increasing the number of times that 
the DREQ is retried ends up increasing how long the connection stays around.
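
As a rough illustration of that last point, assuming a fixed timeout per
attempt and that the DREP never arrives (dreq_worst_case_wait_ms is a
hypothetical helper, not a CM function):

```c
/*
 * Worst-case time in milliseconds before the DREQ sender gives up:
 * one initial send plus `retries` resends, each waiting `timeout_ms`.
 */
long dreq_worst_case_wait_ms(int retries, long timeout_ms)
{
    return (long)(retries + 1) * timeout_ms;
}
```

Under these assumptions the wait grows linearly with the retry count, so
going from 3 to 15 retries makes the stale connection linger four times as
long.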

- Sean


From halr at voltaire.com  Wed Sep 13 13:10:15 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Sep 2006 16:10:15 -0400
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <20060913190328.GB26959@mellanox.co.il>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com>
	<20060913190328.GB26959@mellanox.co.il>
Message-ID: <1158178200.13748.14583.camel@hal.voltaire.com>

On Wed, 2006-09-13 at 15:03, Michael S. Tsirkin wrote:
> Quoting r. Fabian Tillier <ftillier at silverstorm.com>:
> > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > 
> > On 9/13/06, Michael S. Tsirkin <mst at mellanox.co.il> wrote:
> > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > > >
> > > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote:
> > > > > Tavor systems get better performance with 1K MTU. Since there does
> > > > > not seem to be any way to find out whether the remote system uses Tavor,
> > > > > add an option to limit the MTU globally.
> > > >
> > > > Can't Tavor be determined locally ?
> > >
> > > It can, but we need this for remote tavor as well, anyway.
> > >
> > > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ?
> > >
> > > > The way to do this would be for SA to select 1K MTU if it detects Tavor on one side
> > > and if this does not conflict with MTU selector.
> > 
> > You can't do this because the SA doesn't have a way to tell if a path
> > query is going to be used for RC or UD, and IPoIB needs paths with 2K
> > MTU.
> 
> I think we can do that without breaking IPoIB.
> IPoIB needs mtu >= 1K.

Huh ?

>  IPoIB sets mtu selector to >= 2K.

I don't think that's a requirement for IPoIB.

> I am talking about users that do not set mtu selector.

Understood.

-- Hal


From rdreier at cisco.com  Wed Sep 13 13:45:34 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Sep 2006 13:45:34 -0700
Subject: [openib-general] How to connect gen2 CM to gen1 IBGD CM?
In-Reply-To: <45083FB1.4070808@ichips.intel.com> (Sean Hefty's message
	of "Wed, 13 Sep 2006 10:28:17 -0700")
References: <B79FAF8BB536314E859EA1963CFFD22201FBD395@wdtssmail01.eu.thmulti.com>
	<45083FB1.4070808@ichips.intel.com>
Message-ID: <adamz93xzpt.fsf@cisco.com>

    >> Do you have a cmpost for gen1 IBGD I can use to connect from
    >> gen2 to gen1?

    Sean> No - the gen1 code is really the old Topspin code.  Topspin
    Sean> is now part of Cisco, so they may have something.

No, no one has bothered to port any of that stuff to the old obsolete stack.

    >> Or is there any other trick to play here?

    Sean> I don't think so.  I'm pretty sure that this has been tried
    Sean> before and has worked.  Can you try connecting from the gen1
    Sean> system to the gen2 system and see if the REQ shows up?

Yes, for example Mellanox SRP target code is based on gen1, and the
current Linux ("gen2") SRP initiator can connect to it fine.

 - R.


From mst at mellanox.co.il  Wed Sep 13 13:54:31 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 23:54:31 +0300
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
 (was Re: [PATCH for-2.6.18] IB/cma: option to limitMTUto 1K)
In-Reply-To: <1158178200.13748.14583.camel@hal.voltaire.com>
References: <1158178200.13748.14583.camel@hal.voltaire.com>
Message-ID: <20060913205430.GA27766@mellanox.co.il>

> >  IPoIB sets mtu selector to >= 2K.
> 
> I don't think that's a requirement for IPoIB.

Whatever MTU IPoIB needs, it should set the selector appropriately.

> > I am talking about users that do not set mtu selector.
> 
> Understood.

Roland, would it make sense for this to go upstream? In my opinion,
it's important to have this in sooner rather than later since this is
a question of interoperability with SM. If we have IPoIB implementations
that don't set MTU selector appropriately, we'll need workarounds
in SM.

----

IPoIB in linux needs 2K MTU. Therefore it must set mtu selector
in path record query accordingly.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index cf71d2a..e92c3f8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -441,9 +441,11 @@ static struct ipoib_path *path_rec_creat
 	INIT_LIST_HEAD(&path->neigh_list);
 
 	memcpy(path->pathrec.dgid.raw, gid, sizeof (union ib_gid));
-	path->pathrec.sgid      = priv->local_gid;
-	path->pathrec.pkey      = cpu_to_be16(priv->pkey);
-	path->pathrec.numb_path = 1;
+	path->pathrec.sgid           = priv->local_gid;
+	path->pathrec.pkey           = cpu_to_be16(priv->pkey);
+	path->pathrec.numb_path      = 1;
+	path->pathrec.mtu            = IB_MTU_2048;
+	path->pathrec.mtu_selector   = IB_SA_GTE;
 
 	return path;
 }

-- 
MST


From mst at mellanox.co.il  Wed Sep 13 13:56:08 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 13 Sep 2006 23:56:08 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <000701c6d76b$53761d40$ff0da8c0@amr.corp.intel.com>
References: <000701c6d76b$53761d40$ff0da8c0@amr.corp.intel.com>
Message-ID: <20060913205608.GB27766@mellanox.co.il>

Quoting r. Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> 
> >Putting knowledge about hw quirks in all protocols is really horrible.
> 
> Agreed.
> 
> >MTU should be decided by SA as part of path information.
> >If ULPs have specific limitations wrt MTU they should use mtu selector
> >in path record query.
> 
> Thinking about this more, the proper place for this does seem to be in the
> selection of the path record (where you put it), rather than during connection
> establishment.
> 
> Although, I don't like the idea of the CMA changing every path to use an MTU of
> 1k.

Well, that's why it's off by default.
So, Ack?

-- 
MST


From mst at mellanox.co.il  Wed Sep 13 14:09:40 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 00:09:40 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <1158175522.13748.12872.camel@hal.voltaire.com>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com>
	<1158165350.13748.6667.camel@hal.voltaire.com>
	<79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com>
	<1158168091.13748.8242.camel@hal.voltaire.com>
	<20060913190539.GC26959@mellanox.co.il>
	<1158175522.13748.12872.camel@hal.voltaire.com>
Message-ID: <20060913210940.GC27766@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> 
> On Wed, 2006-09-13 at 15:05, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > If I understood Michael's comment properly, this will have the side
> > > > effect that IPoIB won't work since IPoIB requires 2K MTUs.  As far as
> > > > I know, there is no way to specify whether a path is needed for UD vs.
> > > > RC in the path query.
> > > 
> > > I don't know how either. I don't think it can be done (at least
> > > currently per the standard).
> > 
> > We don't really need to know whether path is for RC or UD QP.
> > IPoIB needs MTU >= 2K so it should set mtu selector to >= 2K.
> 
> That's the default and not the minimum MTU (for IPoIB).

How isn't it? By default, IPoIB reports a 2K MTU to Linux.
So it will get 2K packets, and since IB switches
cannot fragment packets, they will simply get dropped.

I conclude that IPoIB by default requires minimum mtu of 2K.
Right?

And it's not a problem since all HCAs support 2K.

-- 
MST


From halr at voltaire.com  Wed Sep 13 14:03:57 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Sep 2006 17:03:57 -0400
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <20060913191343.GD26959@mellanox.co.il>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<1158164787.13748.6289.camel@hal.voltaire.com>
	<20060913191343.GD26959@mellanox.co.il>
Message-ID: <1158181429.13748.16691.camel@hal.voltaire.com>

On Wed, 2006-09-13 at 15:13, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > 
> > On Wed, 2006-09-13 at 12:22, Michael S. Tsirkin wrote:
> > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > > > 
> > > > On Wed, 2006-09-13 at 11:57, Michael S. Tsirkin wrote:
> > > > > Tavor systems get better performance with 1K MTU. Since there does
> > > > > not seem to be any way to find out whether the remote system uses Tavor,
> > > > > add an option to limit the MTU globally.
> > > > 
> > > > Can't Tavor be determined locally ?
> > > 
> > > It can, but we need this for remote tavor as well, anyway.
> > > 
> > > > And couldn't the remote end negotiate the MTU down (if Tavor) as well ?
> > > 
> > > > The way to do this would be for SA to select 1K MTU if it detects Tavor on one side
> > > and if this does not conflict with MTU selector.
> > 
> > But it only needs the MTU on each local side (once for the REQ and on
> > the remote side for the REP). It would mean that if the local side were
> > capable of larger MTU and the remote side were Tavor, that the REQ would
> > be REJ with MTU too large and need to be retried at a smaller MTU.
> 
> This has 3 implications that make it impractical:
> . connection rate will suffer greatly
> . this will need to be done in each ulp, and it's a lot of code
> . protocols such as sdp explicitly say what to do on rej
>   and do not seem to speak about retries

OK.

> > > However
> > > 1. Even opensm does not implement this optimization yet
> > 
> > What optimization ? I don't understand what you are saying OpenSM
> > doesn't support.
> > 
> > > 2. We need to work with existing SMs too
> > 
> > Not sure what the SA issue is here.
> 
> If path MTU selector in path query allows MTU 1K (e.g. "best MTU")
> and one of the sides is Tavor, select the best MTU that is 1K
> and not the largest possible.

How would it be identified if the SA supports this ?

> If path MTU selector requires 2K MTU, return path with 2K MTU.

Also, I'm not sure that this is the required difference in the SA
requests :-(

-- Hal


From mst at mellanox.co.il  Wed Sep 13 14:13:29 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 00:13:29 +0300
Subject: [openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add
	rdma_establish
In-Reply-To: <45085EB6.80107@ichips.intel.com>
References: <45073FF7.7020506@ichips.intel.com>
	<20060913120154.GA23890@mellanox.co.il>
	<45083C3D.1000209@ichips.intel.com>
	<20060913193016.GF26959@mellanox.co.il>
	<45085EB6.80107@ichips.intel.com>
Message-ID: <20060913211328.GD27766@mellanox.co.il>

Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [PATCH for-2.6.18] Re: [PATCH] IB/cma: add rdma_establish
> 
> Michael S. Tsirkin wrote:
> > I don't really understand. The fix is a one-liner.
> > The problem is observed in practice, under stress.
> > Who *wants* systems that fall apart under stress?
> 
> My view is: is this worth delaying the release of the kernel?

A one-line, very low-risk patch won't delay the release of the kernel.

> And I don't see 
> that it is at this point in the 2.6.18 release cycle.  This does not fix a 
> system crash.  It only allows a connection to be made if the system is under 
> heavy stress.

Well, applications happen to need connections to do stuff.
If you can't connect, what good is it that it does not crash?
No?

-- 
MST


From mst at mellanox.co.il  Wed Sep 13 14:17:54 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 00:17:54 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <1158181429.13748.16691.camel@hal.voltaire.com>
References: <1158181429.13748.16691.camel@hal.voltaire.com>
Message-ID: <20060913211754.GE27766@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > If path MTU selector in path query allows MTU 1K (e.g. "best MTU")
> > and one of the sides is Tavor, select the best MTU that is 1K
> > and not the largest possible.
> 
> How would it be identified if the SA supports this ?

You mean, if SA ignores mtu selector?
Then we are no worse off than we were before we set it - we get 2K MTU for
Tavor and it works a bit slower.

> > If path MTU selector requires 2K MTU, return path with 2K MTU.
> 
> Also, I'm not sure that this is the required difference in the SA
> requests :-(

What do you mean?
It's not required, but it's legal and it will give us better performance.

-- 
MST


From mshefty at ichips.intel.com  Wed Sep 13 14:22:35 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 13 Sep 2006 14:22:35 -0700
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <20060913205608.GB27766@mellanox.co.il>
References: <000701c6d76b$53761d40$ff0da8c0@amr.corp.intel.com>
	<20060913205608.GB27766@mellanox.co.il>
Message-ID: <4508769B.8070000@ichips.intel.com>

Michael S. Tsirkin wrote:
>>Although, I don't like the idea of the CMA changing every path to use an MTU of
>>1k.
> 
> Well, that's why it's off by default.
> So, Ack?

I'd like to find a way to support a 1k MTU to Tavor HCAs without making the MTU 
1k to other HCAs, in case we're dealing with a heterogeneous environment.

Is this really the responsibility of the querying node or the SA?

- Sean


From mshefty at ichips.intel.com  Wed Sep 13 14:24:34 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 13 Sep 2006 14:24:34 -0700
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <20060913210940.GC27766@mellanox.co.il>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com>
	<1158165350.13748.6667.camel@hal.voltaire.com>
	<79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com>
	<1158168091.13748.8242.camel@hal.voltaire.com>
	<20060913190539.GC26959@mellanox.co.il>
	<1158175522.13748.12872.camel@hal.voltaire.com>
	<20060913210940.GC27766@mellanox.co.il>
Message-ID: <45087712.3050504@ichips.intel.com>

Michael S. Tsirkin wrote:
>>That's the default and not the minimum MTU (for IPoIB).
> 
> How isn't it? By default, IPoIB reports 2K MTU to linux.
> So it will get 2K packets, and since IB switches
> can not fragment packets, they will simply get dropped.

I think this is simply the difference between the spec and the implementation. 
Given that the implementation requires a 2k MTU, IMO it should request paths 
with a 2k MTU.

- Sean


From halr at voltaire.com  Wed Sep 13 14:20:15 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Sep 2006 17:20:15 -0400
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <20060913211754.GE27766@mellanox.co.il>
References: <1158181429.13748.16691.camel@hal.voltaire.com>
	<20060913211754.GE27766@mellanox.co.il>
Message-ID: <1158182401.13748.17301.camel@hal.voltaire.com>

On Wed, 2006-09-13 at 17:17, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > If path MTU selector in path query allows MTU 1K (e.g. "best MTU")
> > > and one of the sides is Tavor, select the best MTU that is 1K
> > > and not the largest possible.
> > 
> > How would it be identified if the SA supports this ?
> 
> You mean, if SA ignores mtu selector?

No; I meant detect that one end of the PR request is a Tavor. Wasn't
that part of it ?

If the SA doesn't support the MTU selector and ignores it, it is not
compliant and should be fixed.

> Then we are not worse off than we were before we set it - we get 2K MTU for
> tavor and it works a bit slower.
> 
> > > If path MTU selector requires 2K MTU, return path with 2K MTU.
> > 
> > Also, I'm not sure that this is the required difference in the SA
> > requests :-(
> 
> What do you mean?
> It's not required, but it's legal and it will give us better performance.

I mean that there is no requirement that the IPoIB SA PR request look the
way you describe, which is what you are using to differentiate it from the
PR requests for a connection setup.

-- Hal


From halr at voltaire.com  Wed Sep 13 14:21:28 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Sep 2006 17:21:28 -0400
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <20060913210940.GC27766@mellanox.co.il>
References: <1158163574.13748.5521.camel@hal.voltaire.com>
	<20060913162245.GA25666@mellanox.co.il>
	<79ae2f320609130939u22de493ay6edf70778884aba2@mail.gmail.com>
	<1158165350.13748.6667.camel@hal.voltaire.com>
	<79ae2f320609131023r12d5dc95o72c0e382dac5f921@mail.gmail.com>
	<1158168091.13748.8242.camel@hal.voltaire.com>
	<20060913190539.GC26959@mellanox.co.il>
	<1158175522.13748.12872.camel@hal.voltaire.com>
	<20060913210940.GC27766@mellanox.co.il>
Message-ID: <1158182416.13748.17303.camel@hal.voltaire.com>

On Wed, 2006-09-13 at 17:09, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > 
> > On Wed, 2006-09-13 at 15:05, Michael S. Tsirkin wrote:
> > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > > If I understood Michael's comment properly, this will have the side
> > > > > effect that IPoIB won't work since IPoIB requires 2K MTUs.  As far as
> > > > > I know, there is no way to specify whether a path is needed for UD vs.
> > > > > RC in the path query.
> > > > 
> > > > I don't know how either. I don't think it can be done (at least
> > > > currently per the standard).
> > > 
> > > We don't really need to know whether path is for RC or UD QP.
> > > IPoIB needs MTU >= 2K so it should set mtu selector to >= 2K.
> > 
> > That's the default and not the minimum MTU (for IPoIB).
> 
> How isn't it?

Look at RFC 4391 as to the requirement.

> By default, IPoIB reports 2K MTU to linux.
> So it will get 2K packets, and since IB switches
> can not fragment packets, they will simply get dropped.

With ifconfig, the MTU can be changed. Fragmentation is at the IP layer
in the end station stack, not the IB switches.
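
To illustrate what fragmentation in the end station's IP stack means for a
smaller interface MTU, here is a minimal sketch of the IPv4 arithmetic. It
assumes a 20-byte IPv4 header without options; ipv4_min_fragments is a
hypothetical helper, not a kernel function, and the MTU passed in is
whatever the interface reports.

```c
#define IPV4_HEADER_BYTES 20   /* IPv4 header without options */

/*
 * Minimum number of IPv4 packets needed to carry `payload_bytes` of IP
 * payload over an interface with MTU `link_mtu`. Returns 0 if the MTU is
 * too small to fragment into.
 */
int ipv4_min_fragments(int payload_bytes, int link_mtu)
{
    /* The final fragment may use all the room left after the header. */
    int last_max = link_mtu - IPV4_HEADER_BYTES;
    /* Every other fragment must carry a multiple of 8 payload bytes. */
    int per_fragment = (last_max / 8) * 8;

    if (per_fragment <= 0)
        return 0;
    if (payload_bytes <= last_max)
        return 1;                      /* fits without fragmentation */

    /* One final fragment plus enough full-size ones for the rest. */
    int remaining = payload_bytes - last_max;
    return 1 + (remaining + per_fragment - 1) / per_fragment;
}
```

The sender's stack does this splitting before anything reaches the fabric,
so the switches only ever see packets that already fit the path.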

> I conclude that IPoIB by default requires minimum mtu of 2K.
> Right?

Not minimum.

> And it's not a problem since all HCAs support 2K.

or more but it could be less per the RFC.

-- Hal


From robert.j.woodruff at intel.com  Wed Sep 13 14:43:56 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Wed, 13 Sep 2006 14:43:56 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691C9E76A1@orsmsx418.amr.corp.intel.com>

Robert Walsh wrote,
> 
> [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12
> 4730: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 | iters=10000 | duplex=0 | cma=0 |
> 4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400 VAddr 0x00002a95dd3480
> 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500 VAddr 0x00002a95c85480
> 4730:main: Completion with error at client:
> 4730:main: Failed status 9: wr_id 3
> 4730:main: scnt=7584, ccnt=6584
> [woody at rkl-13 bin]$  

Robert Walsh wrote,
>Hi Woody,
>When RC4 is available, there should be a patch in there that will fix
>this.  Can you let us know if you continue to see problems?

>Regards,
> Robert.

I installed RC4 and now get this, 


[woody at rkl-13 bin]$ ./ib_rdma_bw 
9035: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0 |
libibverbs: Warning: no userspace device-specific driver found for uverbs0
        driver search path: /usr/local/ofed/lib64/infiniband
9035:main: No IB devices found

I tried getting the latest OFED 1.1 ipathverbs from svn today, which I
thought would have a fix for this. I think I got it built OK, although
the Mellanox build environment is less than intuitive, but it still
seems to fail. Guess we will try again with RC5 tomorrow.

woody


From mst at mellanox.co.il  Wed Sep 13 14:43:23 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 00:43:23 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <4508769B.8070000@ichips.intel.com>
References: <4508769B.8070000@ichips.intel.com>
Message-ID: <20060913214323.GF27766@mellanox.co.il>

Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> 
> Michael S. Tsirkin wrote:
> >>Although, I don't like the idea of the CMA changing every path to use an MTU of
> >>1k.
> > 
> > Well, that's why it's off by default.
> > So, Ack?
> 
> I'd like to find a way to support a 1k MTU to tavor HCAs without making the MTU 
> 1k to other HCAs, in case we're dealing with a heterogeneous environment.

IMO the clean way is to do it in the SA.

> 
> Is this really the responsibility of the querying node or the SA?

IMO it's really the SA's job.
But a simple option as a workaround for SAs that don't support
it properly would also be nice.

-- 
MST


From mst at mellanox.co.il  Wed Sep 13 14:45:20 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 00:45:20 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <1158182401.13748.17301.camel@hal.voltaire.com>
References: <1158182401.13748.17301.camel@hal.voltaire.com>
Message-ID: <20060913214520.GG27766@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> 
> On Wed, 2006-09-13 at 17:17, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > If path MTU selector in path query allows MTU 1K (e.g. "best MTU")
> > > > and one of the sides is Tavor, select the best MTU that is 1K
> > > > and not the largest possible.
> > > 
> > > How would it be identified if the SA supports this ?
> > 
> > You mean, if SA ignores mtu selector?
> 
> No; I meant detect that one end of the PR request is a Tavor. Wasn't
> that part of it ?

SA can easily figure out it's talking to tavor by looking at vendor part id.

> If SA doesn't support MTU selector and ignoring MTU selector, it is not
> compliant and should be fixed.
> 
> > Then we are not worse off than we were before we set it - we get 2K MTU for
> > tavor and it works a bit slower.
> > 
> > > > If path MTU selector requires 2K MTU, return path with 2K MTU.
> > > 
> > > Also, I'm not sure that this is the required difference in the SA
> > > requests :-(
> > 
> > What do you mean?
> > Its not required, but its legal and it will give us better performance.
> 
> I mean that there is no requirement on what the IPoIB SA PR request
> looks like what you are using to differentiate from the PR requests for
> a connection setup.

Correct. But if IPoIB requires 2K MTU it must use the MTU selector;
if it does not, it's OK to give it a smaller MTU.

-- 
MST


From mst at mellanox.co.il  Wed Sep 13 14:49:55 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 00:49:55 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <1158182416.13748.17303.camel@hal.voltaire.com>
References: <1158182416.13748.17303.camel@hal.voltaire.com>
Message-ID: <20060913214955.GH27766@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> 
> On Wed, 2006-09-13 at 17:09, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > > 
> > > On Wed, 2006-09-13 at 15:05, Michael S. Tsirkin wrote:
> > > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > > > If I understood Michael's comment properly, this will have the side
> > > > > > effect that IPoIB won't work since IPoIB requires 2K MTUs.  As far as
> > > > > > I know, there is no way to specify whether a path is needed for UD vs.
> > > > > > RC in the path query.
> > > > > 
> > > > > I don't know how either. I don't think it can be done (at least
> > > > > currently per the standard).
> > > > 
> > > > We don't really need to know whether path is for RC or UD QP.
> > > > IPoIB needs MTU >= 2K so it should set mtu selector to >= 2K.
> > > 
> > > That's the default and not the minimum MTU (for IPoIB).
> > 
> > How isn't it?
> 
> Look at RFC 4391 as to the requirement.

I'm talking about our implementation not the spec.

> > By default, IPoIB reports 2K MTU to linux.
> > So it will get 2K packets, and since IB swiches
> > can not fragment packets, they will simply get dropped.
> 
> With ifconfig, the MTU can be changed. Fragmentation is at the IP layer
> in the end station stack, not the IB switches.

AFAIK linux won't fragment packets that do not exceed MTU and MSS.

> > I conclude that IPoIB by default requires minimum mtu of 2K.
> > Right?
> 
> Not minimum.
> 
> > And it's not a problem since all HCAs support 2K.
> 
> or more but it could be less per the RFC.

Again, if the IPoIB implementation does not need 2K MTU, there's
no problem giving it 1K in the path. If it wants 2K MTU, it must
set the selector accordingly.

-- 
MST


From dledford at redhat.com  Wed Sep 13 14:52:33 2006
From: dledford at redhat.com (Doug Ledford)
Date: Wed, 13 Sep 2006 17:52:33 -0400
Subject: [openib-general] OFED-1.1-rc4 is ready
In-Reply-To: <20060913062518.GL20225@mellanox.co.il>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EB437@xmb-sjc-216.amer.cisco.com>
	<1158125915.30173.27.camel@sardonyx>
	<20060913062518.GL20225@mellanox.co.il>
Message-ID: <1158184353.2661.22.camel@fc6.xsintricity.com>

On Wed, 2006-09-13 at 09:25 +0300, Michael S. Tsirkin wrote:
> Quoting r. Bryan O'Sullivan <bos at pathscale.com>:
> > > the ibv_driver_init function was changed to openib_driver_init. 
> >
> > By the way, I find it unsettling that the current libibverbs internal
> > ABI allows silent breakage like this that cannot be detected except at
> > runtime, and then only when the right hardware is present.
> > 
> > Mind you, I don't have any better suggestions in mind (at least not at
> > 10:30pm).
> > 
> > But I worry about the possibility this leaves open for botched field
> > upgrades breaking userspace in you-don't-find-out-until-it's-too-late
> > ways when libibverbs 1.1 starts being used.
> 
> libipathverbs can simply export both ibv_driver_init and
> openib_driver_init like libmthca does, that's what we'll do for OFED.
> 
> Or maybe Doug here can come up with some symbol versioning trick.
> Doug?

I don't think you can do symbol versioning here.  For symbol versioning
to work, you need a compile-time map from the source being built to the
version you are linking against.  The drivers, like mthca, are compiled
after libibverbs, so libibverbs is built blind to the drivers, if you
will.  Yet it is the drivers that provide the symbol, and therefore the
symbol version as far as the linker is concerned, so libibverbs can
never get automatic symbol versioning for it.


-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From trimmer at silverstorm.com  Wed Sep 13 14:54:26 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Wed, 13 Sep 2006 17:54:26 -0400
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <4508769B.8070000@ichips.intel.com>
Message-ID: <D80D83302DEE6249A221093BF2BB69AE89BFC5@mail.silverstorm.com>

> From: Sean Hefty
> Sent: Wednesday, September 13, 2006 5:23 PM
> To: Michael S. Tsirkin
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to
> limitMTU to 1K
> 
> Michael S. Tsirkin wrote:
> >>Although, I don't like the idea of the CMA changing every path to use an MTU of
> >>1k.
> >
> > Well, that's why it's off by default.
> > So, Ack?
> 
> I'd like to find a way to support a 1k MTU to tavor HCAs without making the MTU
> 1k to other HCAs, in case we're dealing with a heterogeneous environment.
> 
> Is this really the responsibility of the querying node or the SA?
> 
> - Sean
> 

The real issue here is how to handle "optimization" tricks for selected
models of HCAs.  While Tavor supports a 2K MTU and works with it, it has
been found to offer better MPI bandwidth when running 1K MTU.  For many
other ULPs no difference in performance is observable (because many
other ULPs don't stress the HCA the way MPI bandwidth benchmarks do).

Another dimension to this problem is that it's not clear what the best
optimization will be in heterogeneous environments, such as a Tavor HCA
talking to a Sinai, Arbel or other type of TCA based device using a
non-MPI protocol (such as a storage target).  In those environments a 2K
MTU may perform the same (or depending on the storage target, perhaps
even better).

At this point I would suggest this is a subtle performance issue
specific to MPI, and MPI libraries can appropriately provide options to
tune the maximum MTU for MPI to use or request (which is only one of
dozens of MPI tunables needed to fine tune MPI).  MPI environments will
tend to be more homogeneous, which also simplifies the solution.

Pushing these types of ULP and source/destination specific issues into
the core stack or SM will get very complex very quickly.  Given that the
issue on the table (Tavor performance) is specific to an older HCA
model, it may not even be that critical, since the highest performance
customers have long since moved toward PCIe and DDR fabrics, neither of
which is supported by Tavor.

Todd Rimmer


From rdreier at cisco.com  Wed Sep 13 14:58:34 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Sep 2006 14:58:34 -0700
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <20060913205430.GA27766@mellanox.co.il> (Michael S.
	Tsirkin's message of "Wed, 13 Sep 2006 23:54:31 +0300")
References: <1158178200.13748.14583.camel@hal.voltaire.com>
	<20060913205430.GA27766@mellanox.co.il>
Message-ID: <adafyevxwc5.fsf@cisco.com>

    Michael> IPoIB in linux needs 2K MTU. Therefore it must set mtu
    Michael> selector in path record query accordingly.

Umm -- why does it need a 2K MTU?  As far as I know it should work
fine with any MTU, assuming the SA sets the MTU of the broadcast
multicast group correctly.

 - R.


From mst at mellanox.co.il  Wed Sep 13 15:01:03 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 01:01:03 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <D80D83302DEE6249A221093BF2BB69AE89BFC5@mail.silverstorm.com>
References: <D80D83302DEE6249A221093BF2BB69AE89BFC5@mail.silverstorm.com>
Message-ID: <20060913220103.GA28790@mellanox.co.il>

Quoting r. Rimmer, Todd <trimmer at silverstorm.com>:
> Subject: RE: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> 
> > From: Sean Hefty
> > Sent: Wednesday, September 13, 2006 5:23 PM
> > To: Michael S. Tsirkin
> > Cc: openib-general at openib.org
> > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to
> > limitMTU to 1K
> > 
> > Michael S. Tsirkin wrote:
> > >>Although, I don't like the idea of the CMA changing every path to use an MTU of
> > >>1k.
> > >
> > > Well, that's why it's off by default.
> > > So, Ack?
> > 
> > I'd like to find a way to support a 1k MTU to tavor HCAs without making the MTU
> > 1k to other HCAs, in case we're dealing with a heterogeneous environment.
> > 
> > Is this really the responsibility of the querying node or the SA?
> > 
> > - Sean
> > 
> 
> The real issue here is how to handle "optimization" tricks for selected
> models of HCAs.  While Tavor supports a 2K MTU and works with it, it has
> been found to offer better MPI bandwidth when running 1K MTU.  For many
> other ULPs no difference in performance is observable (because many
> other ULPs don't stress the HCA the way MPI bandwidth benchmarks do).
> 
> Another dimension to this problem is that its not clear what the best
> optimization will be in heterogeneous environments.  Such as a Tavor HCA
> talking to a Sinai, Arbel or other type of TCA based device using a
> non-MPI protocol (such as a storage target).  In those environments a 2K
> MTU may perform the same (or depending on the storage target, perhaps
> even better).

If Tavor is involved at either end, 1K MTU is better than 2K MTU.

> At this point I would suggest this is a subtle performance issue
> specific to MPI 

This is not specific to MPI. All ULPs experience this issue.

> and MPI libraries can appropriately provide options to
> tune the maximum MTU MPI to use or request (which is only one of dozens
> of MPI tunables needed to fine tune MPI).  MPI environments will tend to
> be more homogeneous which also simplifies the solution.
> 
> Pushing these types of ULP and source/destination specific issues into
> the core stack or SM will get very complex very quick.

It's actually relatively simple.

> Given the issue
> on the table (Tavor performance) is specific to an older HCA model, it
> may not even be that critical since the highest performance customers
> have long since moved toward PCIe and DDR fabrics, neither of which are
> supported by Tavor.

All the more reason to put the simple logic in one place
and not expect all applications to optimize for this hardware.

-- 
MST


From halr at voltaire.com  Wed Sep 13 14:57:27 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Sep 2006 17:57:27 -0400
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <20060913214955.GH27766@mellanox.co.il>
References: <1158182416.13748.17303.camel@hal.voltaire.com>
	<20060913214955.GH27766@mellanox.co.il>
Message-ID: <1158184605.13748.18709.camel@hal.voltaire.com>

On Wed, 2006-09-13 at 17:49, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > 
> > On Wed, 2006-09-13 at 17:09, Michael S. Tsirkin wrote:
> > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > > > 
> > > > On Wed, 2006-09-13 at 15:05, Michael S. Tsirkin wrote:
> > > > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > > > > If I understood Michael's comment properly, this will have the side
> > > > > > > effect that IPoIB won't work since IPoIB requires 2K MTUs.  As far as
> > > > > > > I know, there is no way to specify whether a path is needed for UD vs.
> > > > > > > RC in the path query.
> > > > > > 
> > > > > > I don't know how either. I don't think it can be done (at least
> > > > > > currently per the standard).
> > > > > 
> > > > > We don't really need to know whether path is for RC or UD QP.
> > > > > IPoIB needs MTU >= 2K so it should set mtu selector to >= 2K.
> > > > 
> > > > That's the default and not the minimum MTU (for IPoIB).
> > > 
> > > How isn't it?
> > 
> > Look at RFC 4391 as to the requirement.
> 
> I'm talking about our implementation not the spec.

Don't we risk interop issues by relying on things not required in the
spec ?

> > > By default, IPoIB reports 2K MTU to linux.
> > > So it will get 2K packets, and since IB swiches
> > > can not fragment packets, they will simply get dropped.
> > 
> > With ifconfig, the MTU can be changed. Fragmentation is at the IP layer
> > in the end station stack, not the IB switches.
> 
> AFAIK linux won't fragment packets that do not exceed MTU and MSS.
> 
> > > I conclude that IPoIB by default requires minimum mtu of 2K.
> > > Right?
> > 
> > Not minimum.
> > 
> > > And it's not a problem since all HCAs support 2K.
> > 
> > or more but it could be less per the RFC.
> 
> Again, if IPoIB implementation does not need 2K mtu there's
> no problem to give it 1K in path. If it wants 2K MTU it must
> set selector accordingly.


From mst at mellanox.co.il  Wed Sep 13 15:08:59 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 01:08:59 +0300
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <adafyevxwc5.fsf@cisco.com>
References: <adafyevxwc5.fsf@cisco.com>
Message-ID: <20060913220859.GB28790@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> 
>     Michael> IPoIB in linux needs 2K MTU. Therefore it must set mtu
>     Michael> selector in path record query accordingly.
> 
> Umm -- why does it need a 2K MTU?  As far as I know it should work
> fine with any MTU, assuming the SA sets the MTU of the broadcast
> multicast group correctly.

Hmm, you are right, it is just that existing implementations all
set that to 2K.

But there is a silent assumption that MTU of any path is >= broadcast
multicast group MTU, and this is what I want to fix.

Like this then? We could look at dev->mtu instead, but that's
a couple of extra lines and I'm not sure it's worth the complexity.
What do you think?

--

IPoIB in linux needs MTU on any path to be >= broadcast mtu.
Therefore it must set mtu selector in path record query accordingly.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index cf71d2a..3bc052f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -441,9 +441,11 @@ static struct ipoib_path *path_rec_creat
 	INIT_LIST_HEAD(&path->neigh_list);
 
 	memcpy(path->pathrec.dgid.raw, gid, sizeof (union ib_gid));
-	path->pathrec.sgid      = priv->local_gid;
-	path->pathrec.pkey      = cpu_to_be16(priv->pkey);
-	path->pathrec.numb_path = 1;
+	path->pathrec.sgid           = priv->local_gid;
+	path->pathrec.pkey           = cpu_to_be16(priv->pkey);
+	path->pathrec.numb_path      = 1;
+	path->pathrec.mtu            = priv->broadcast->mcmember.mtu;
+	path->pathrec.mtu_selector   = IB_SA_GTE;
 
 	return path;
 }

-- 
MST


From mst at mellanox.co.il  Wed Sep 13 15:11:40 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 01:11:40 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <1158184605.13748.18709.camel@hal.voltaire.com>
References: <1158184605.13748.18709.camel@hal.voltaire.com>
Message-ID: <20060913221140.GC28790@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> 
> On Wed, 2006-09-13 at 17:49, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > > 
> > > On Wed, 2006-09-13 at 17:09, Michael S. Tsirkin wrote:
> > > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > > Subject: Re: [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
> > > > > 
> > > > > On Wed, 2006-09-13 at 15:05, Michael S. Tsirkin wrote:
> > > > > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > > > > > If I understood Michael's comment properly, this will have the side
> > > > > > > > effect that IPoIB won't work since IPoIB requires 2K MTUs.  As far as
> > > > > > > > I know, there is no way to specify whether a path is needed for UD vs.
> > > > > > > > RC in the path query.
> > > > > > > 
> > > > > > > I don't know how either. I don't think it can be done (at least
> > > > > > > currently per the standard).
> > > > > > 
> > > > > > We don't really need to know whether path is for RC or UD QP.
> > > > > > IPoIB needs MTU >= 2K so it should set mtu selector to >= 2K.
> > > > > 
> > > > > That's the default and not the minimum MTU (for IPoIB).
> > > > 
> > > > How isn't it?
> > > 
> > > Look at RFC 4391 as to the requirement.
> > 
> > I'm talking about our implementation not the spec.
> 
> Don't we risk interop issues by relying on things not required in the
> spec ?

You confuse me. IPoIB currently assumes that
broadcast group MTU <= path MTU
for any path, but does not set the MTU selector in the SA query,
so the SA could conceivably give it any MTU.

This assumption is not in the spec, and I think it should be fixed ASAP
by setting the path selector.


-- 
MST


From mshefty at ichips.intel.com  Wed Sep 13 15:10:38 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 13 Sep 2006 15:10:38 -0700
Subject: [openib-general] [PATCH] IB/cma: add rdma_establish
In-Reply-To: <20060907214524.GA14791@mellanox.co.il>
References: <20060907214524.GA14791@mellanox.co.il>
Message-ID: <450881DE.9070806@ichips.intel.com>

Michael S. Tsirkin wrote:
> IB/cma: add rdma_establish
> 
> Make it possible for ULPs to handle RTU loss by calling
> rdma_establish.

I've committed this patch to svn 9470.  It still requires exporting the 
rdma_establish call to userspace.

- Sean


From jgunthorpe at obsidianresearch.com  Wed Sep 13 15:19:40 2006
From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe)
Date: Wed, 13 Sep 2006 16:19:40 -0600
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <20060913220859.GB28790@mellanox.co.il>
References: <adafyevxwc5.fsf@cisco.com> <20060913220859.GB28790@mellanox.co.il>
Message-ID: <20060913221940.GC31285@obsidianresearch.com>

On Thu, Sep 14, 2006 at 01:08:59AM +0300, Michael S. Tsirkin wrote:

> > Umm -- why does it need a 2K MTU?  As far as I know it should work
> > fine with any MTU, assuming the SA sets the MTU of the broadcast
> > multicast group correctly.
> 
> Hmm, you are right, it is just that existing implementations all
> set that to 2K.

IPv6 has a required minimum MTU of 1280 bytes. If IPv6 is to be used
over IB then the MTU must be 2k.
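
The arithmetic behind this can be checked with a small sketch: an IB MTU
of 1024 leaves less than 1280 bytes for IP once the IPoIB encapsulation
header is subtracted, so the next size up (2048) is needed. This assumes
the 4-octet encapsulation header described in RFC 4391; the enum and
helper names below are made up for illustration and are not taken from
the IPoIB driver.

```c
#include <stdbool.h>

/* IB path MTU encodings (1 = 256 bytes ... 5 = 4096 bytes). */
enum ib_mtu_code { MTU_256 = 1, MTU_512, MTU_1024, MTU_2048, MTU_4096 };

#define IPOIB_ENCAP_LEN 4    /* assumed IPoIB encapsulation header size */
#define IPV6_MIN_MTU    1280 /* minimum link MTU required by IPv6 */

/* Convert an MTU code to bytes; returns -1 for an unknown code. */
static int mtu_code_to_bytes(int code)
{
	if (code < MTU_256 || code > MTU_4096)
		return -1;
	return 128 << code;
}

/* IP MTU that IPoIB can offer on top of a given IB MTU. */
static int ipoib_ip_mtu(int code)
{
	int bytes = mtu_code_to_bytes(code);

	return bytes < 0 ? -1 : bytes - IPOIB_ENCAP_LEN;
}

/* True if the resulting IP MTU meets the IPv6 minimum. */
static bool ib_mtu_ok_for_ipv6(int code)
{
	return ipoib_ip_mtu(code) >= IPV6_MIN_MTU;
}
```

With these definitions, ib_mtu_ok_for_ipv6(MTU_1024) is false and
ib_mtu_ok_for_ipv6(MTU_2048) is true.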

Jason


From trimmer at silverstorm.com  Wed Sep 13 15:28:14 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Wed, 13 Sep 2006 18:28:14 -0400
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
 to 1K
In-Reply-To: <20060913220103.GA28790@mellanox.co.il>
Message-ID: <D80D83302DEE6249A221093BF2BB69AE89BFD7@mail.silverstorm.com>


> From: Michael S. Tsirkin [mailto:mst at mellanox.co.il]
> Sent: Wednesday, September 13, 2006 6:01 PM
> To: Rimmer, Todd
> Cc: Sean Hefty; openib-general at openib.org
> Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to
> limitMTU to 1K
> 
> Quoting r. Rimmer, Todd <trimmer at silverstorm.com>:

> >
> > Pushing these types of ULP and source/destination specific issues into
> > the core stack or SM will get very complex very quick.
> 
> It's actually relatively simple.

So here is how it gets complex.  The best MTU needs to be selected for
various combos such as:
Tavor w/IPoIB
Tavor to Storage target with SRP
Tavor to eHCA with SDP
Tavor to PathScale with MPI
Tavor to DDR Arbel with SRP
etc etc

For many of the above combos, 1K MTU may not be what runs best.
Hence if we try to support this in the SA, it needs to know about all
these subtle combinations.

The IB spec avoids such complex combos by having each node report its
MTU capabilities (as well as others like outstanding RDMA reads, etc).

> 
> > Given the issue
> > on the table (Tavor performance) is specific to an older HCA model, it
> > may not even be that critical since the highest performance customers
> > have long since moved toward PCIe and DDR fabrics, neither of which are
> > supported by Tavor.
> 
> All the more reason to put the simple logic in one place
> and not expect all applications to optimize for this hardware.

All the reason to invest in more important requirements, such as SDP
Z-Copy.  Especially since most of the performance critical applications
(Open MPI, Scali MPI, MVAPICH MPI, etc) have already implemented this
optimization.

Todd Rimmer


From halr at voltaire.com  Wed Sep 13 15:37:24 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Sep 2006 18:37:24 -0400
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <20060913220859.GB28790@mellanox.co.il>
References: <adafyevxwc5.fsf@cisco.com> <20060913220859.GB28790@mellanox.co.il>
Message-ID: <1158187004.13748.20243.camel@hal.voltaire.com>

On Wed, 2006-09-13 at 18:08, Michael S. Tsirkin wrote:
> Quoting r. Roland Dreier <rdreier at cisco.com>:
> > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > 
> >     Michael> IPoIB in linux needs 2K MTU. Therefore it must set mtu
> >     Michael> selector in path record query accordingly.
> > 
> > Umm -- why does it need a 2K MTU?  As far as I know it should work
> > fine with any MTU, assuming the SA sets the MTU of the broadcast
> > multicast group correctly.
> 
> Hmm, you are right, it is just that existing implementations all
> set that to 2K.

By default yes. It can be configured.

> But there is a silent assumption that MTU of any path is >= broadcast
> multicast group MTU, and this is what I want to fix.

The spec says:
"The value (for IB MTU) assigned to the broadcast-GID must not be
greater than any physical link MTU spanned by the IPoIB subnet".
so if the broadcast group is improperly set up and does not follow this,
there will be other issues. It doesn't need to be included in the PR request.

-- Hal

> Like this then? We could look at dev->mtu instead, but that's
> a couple of extra lines and I'm not sure it's worth the complexity.
> What do you think?
> 
> --
> 
> IPoIB in linux needs MTU on any path to be >= broadcast mtu.
> Therefore it must set mtu selector in path record query accordingly.
> 
> Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index cf71d2a..3bc052f 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -441,9 +441,11 @@ static struct ipoib_path *path_rec_creat
>  	INIT_LIST_HEAD(&path->neigh_list);
>  
>  	memcpy(path->pathrec.dgid.raw, gid, sizeof (union ib_gid));
> -	path->pathrec.sgid      = priv->local_gid;
> -	path->pathrec.pkey      = cpu_to_be16(priv->pkey);
> -	path->pathrec.numb_path = 1;
> +	path->pathrec.sgid           = priv->local_gid;
> +	path->pathrec.pkey           = cpu_to_be16(priv->pkey);
> +	path->pathrec.numb_path      = 1;
> +	path->pathrec.mtu            = priv->broadcast->mcmember.mtu;
> +	path->pathrec.mtu_selector   = IB_SA_GTE;
>  
>  	return path;
>  }


From rjwalsh at pathscale.com  Wed Sep 13 15:49:25 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Wed, 13 Sep 2006 15:49:25 -0700
Subject: [openib-general] [openfabrics-ewg] OFED-1.1-rc4 is ready
In-Reply-To: <1158184353.2661.22.camel@fc6.xsintricity.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA3023EB437@xmb-sjc-216.amer.cisco.com>
	<1158125915.30173.27.camel@sardonyx>
	<20060913062518.GL20225@mellanox.co.il>
	<1158184353.2661.22.camel@fc6.xsintricity.com>
Message-ID: <45088AF5.4020806@pathscale.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> I don't think you can do symbol versioning here.

Right - probably being a little more verbose about finding the file but
not the symbol would be a good idea, though.  It took me a bit of gdb
work to track the problem down: not really a big deal, but a clearer
error might have helped.

Another idea (I haven't thought the implications through yet - just
throwing it out there) is to dlsym() a "version" symbol that the library
is expected to provide and check that it matches what you expect it to:
sort of like the way the user verbs stuff checks that the kernel uverbs
module matches it.  If dlsym() fails to find a symbol, then you're
running an older one anyway.
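
A minimal sketch of that dlsym() idea, assuming the driver library
exports an int holding its ABI version. The symbol name passed in by the
caller, the enum and the function name are all hypothetical; nothing
like this exists in libibverbs as far as this thread shows.

```c
#include <dlfcn.h>
#include <stddef.h>

enum abi_check { ABI_OK = 0, ABI_MISSING = -1, ABI_MISMATCH = -2 };

/*
 * Look up an integer "version" symbol in an already opened driver
 * library and compare it with the version the caller expects.
 */
static enum abi_check check_driver_abi(void *handle, const char *symbol,
				       int expected)
{
	const int *version;

	dlerror();			/* clear any stale error state */
	version = dlsym(handle, symbol);
	if (version == NULL)
		return ABI_MISSING;	/* older driver: no version symbol */

	return *version == expected ? ABI_OK : ABI_MISMATCH;
}
```

A loader could call this right after dlopen() on the driver and print a
clear message for ABI_MISSING or ABI_MISMATCH instead of failing later
on a missing init function. On older glibc this needs -ldl at link time.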

None of this helps at compile time, where it would be preferable to spot
the problem.  Another not-really-sat-down-and-thought-about-it idea is
to have something like this in infiniband/driver.h:

  #define VERBS_LIBRARY_VERSION 2

  #ifdef VERBS_DRIVER_VERSION
  #if VERBS_DRIVER_VERSION != VERBS_LIBRARY_VERSION
  #error verbs library version doesn't match driver version.
  #endif
  #endif

The VERBS_LIBRARY_VERSION would be bumped on an API change.

VERBS_DRIVER_VERSION would be defined in the driver library (in mthca.h,
ipathverbs.h and ehca_uinit.c) and would be updated to match.

Just brainstorming.  Anyone else got any thoughts or suggestions?

Regards,
 Robert.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRQiK9fzvnpzTd9fxAQLIEAf9G56jSWtGyG3kFwnbV7WpMNVmuA04xvm4
FA/y3BwjsNAckfBGm+13BVUuvqs9idm7UmC82jaXxIvm+cwoDNfBXSUj/4VqJW/y
ZHESz0ulcyNXEhEANoIFb2NjmL1Fadl8cWEPW9rDPxyw7eSke/Wd1a8qwkKA+1dq
L9L5+Cp72IV+5cKm4EPqV+R+MeO5UjNkd06/g4XVKVuEMYhnTvBhpu9ePt+mZ1zP
otwwC/eI5ngvMAk2thBQfi0zEaFkqiLkiEUGP/PofmaJZuN4lcp1R/2FSiP7K2fj
3KY6HLGl+6wDpjJ0PpnIhSp3h3vFkeRFtJHKhNhOr+vM8qiGrQTlHg==
=LVqb
-----END PGP SIGNATURE-----


From rdreier at cisco.com  Wed Sep 13 16:07:19 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 13 Sep 2006 16:07:19 -0700
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <20060913220859.GB28790@mellanox.co.il> (Michael S.
	Tsirkin's message of "Thu, 14 Sep 2006 01:08:59 +0300")
References: <adafyevxwc5.fsf@cisco.com> <20060913220859.GB28790@mellanox.co.il>
Message-ID: <adabqpjxt5k.fsf@cisco.com>

 > +	path->pathrec.mtu            = priv->broadcast->mcmember.mtu;
 > +	path->pathrec.mtu_selector   = IB_SA_GTE;

Does this do anything without setting the component mask of the actual request??

 - R.


From halr at voltaire.com  Wed Sep 13 16:37:58 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 13 Sep 2006 19:37:58 -0400
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <adabqpjxt5k.fsf@cisco.com>
References: <adafyevxwc5.fsf@cisco.com>
	<20060913220859.GB28790@mellanox.co.il> <adabqpjxt5k.fsf@cisco.com>
Message-ID: <1158190635.13748.22540.camel@hal.voltaire.com>

On Wed, 2006-09-13 at 19:07, Roland Dreier wrote:
>  > +	path->pathrec.mtu            = priv->broadcast->mcmember.mtu;
>  > +	path->pathrec.mtu_selector   = IB_SA_GTE;
> 
> Does this do anything without setting the component mask of the actual request??

As you imply (if you are asking for verification), SA would ignore these
fields without the corresponding CM bits set.
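
To illustrate the point, here is a toy model of how an SA might treat
the MTU fields depending on the component mask. Everything in it (the
mask bits, struct and function names) is hypothetical and simplified;
it is not the kernel's ib_sa API, and the selector values simply follow
the GTE/LTE/EQ/BEST naming used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical component-mask bits; not the kernel's definitions. */
#define TOY_COMP_MTU_SELECTOR (1u << 0)
#define TOY_COMP_MTU          (1u << 1)

enum toy_selector {
	TOY_SEL_GTE  = 0,
	TOY_SEL_LTE  = 1,
	TOY_SEL_EQ   = 2,
	TOY_SEL_BEST = 3
};

struct toy_path_query {
	uint32_t comp_mask;	/* which fields below the SA should honor */
	int      mtu_selector;	/* one of enum toy_selector */
	int      mtu_bytes;	/* requested MTU in bytes */
};

/* Would a toy SA accept a candidate path with the given MTU? */
static bool toy_sa_path_matches(const struct toy_path_query *q,
				int path_mtu_bytes)
{
	uint32_t need = TOY_COMP_MTU_SELECTOR | TOY_COMP_MTU;

	/* Without both mask bits the MTU fields are ignored entirely. */
	if ((q->comp_mask & need) != need)
		return true;

	switch (q->mtu_selector) {
	case TOY_SEL_GTE:
		return path_mtu_bytes >= q->mtu_bytes;
	case TOY_SEL_LTE:
		return path_mtu_bytes <= q->mtu_bytes;
	case TOY_SEL_EQ:
		return path_mtu_bytes == q->mtu_bytes;
	default:
		return true;	/* "best": any path is a candidate */
	}
}
```

In this model a query that fills in mtu and mtu_selector but leaves
comp_mask at 0 still matches a 1K path, which is the situation the
patch as posted would be in.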

-- Hal

>  - R.


From mshefty at ichips.intel.com  Wed Sep 13 17:02:14 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 13 Sep 2006 17:02:14 -0700
Subject: [openib-general] [PATCH v3] ib_sa: require SA registration
In-Reply-To: <ada3bay3yfj.fsf@cisco.com>
References: <000501c6c57b$2594dd00$8698070a@amr.corp.intel.com>
	<ada3bay3yfj.fsf@cisco.com>
Message-ID: <45089C06.4050908@ichips.intel.com>

Roland Dreier wrote:
> OK, I added the following to my for-2.6.19 branch.  The differences
> from your patch are:
> 
>  - CMA can have a static variable (good to avoid clashes with a global
>    'sa_client' variable name too)
>  - IPoIB does not use multicast module upstream, fix ipoib_multicast.c too.
>  - Simplify sa_query.c changes a little.  I don't like the
>    "deref_client" name for a function, since it sounds too much like
>    dereferencing a pointer rather than dropping a reference.  And I
>    also didn't like ib_sa_client_get() having a magic side effect of
>    setting query->client.  So I just open-coded more stuff.
> 
> How does this look?

I took the changes in your for-2.6.19 branch, modified the original patch to 
match, and committed that to svn.

- Sean


From yc_zhou at ncic.ac.cn  Wed Sep 13 17:26:18 2006
From: yc_zhou at ncic.ac.cn (Yingchao Zhou)
Date: Thu, 14 Sep 2006 08:26:18 +0800
Subject: [openib-general] Problem related to integration of OS/NIC
Message-ID: <20060914002634.42842FB046@ncic.ac.cn>

     The current kernel sets PAGE_COPY without the write bit. This can cause intermittently inconsistent data for user-level network drivers such as InfiniBand, Quadrics and Myrinet. The problem has also been mentioned by Costin Iancu in the paper "HUNTing the Overlap" (PACT'05).
    An example of the phenomenon is the following sequence:
	register a memory space BUFF for receiving messages,
	receive a message,
	call mprotect(...PROT_NONE...) and mprotect(...PROT_READ|PROT_WRITE) one after the other,
	write into BUFF,
	then receive again.
    The data seen after the second receive may not be the data sent by the peer machine but the data written locally in the 4th step.

     The reason is that:
     1) The user-level network driver locks the physical pages when the memory space is registered;
     2) The two calls to mprotect change the ptes in the space to PAGE_COPY, so a write to any page in the space causes a page fault;
     3) The page fault handler goes to do_wp_page, and there, if the page is locked, a new page is allocated and filled into the pte, which is COW (Copy-On-Write). So the physical page seen by the host is no longer the one seen by the NIC.

     Adding PAGE_RW to PAGE_COPY will resolve this problem.
     In my opinion, the reason for the absence of RW is to save memory by mapping all read-only pages to ZERO_PAGE. But are there really programs that do many reads from a memory space without ever initializing it?
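
The host-side half of the sequence above can be replayed with a minimal sketch like the following. It is an assumption-laden illustration: the function name is hypothetical, the NIC registration step is only marked by a comment, and without a device that has pinned the page the program cannot show the stale-data effect itself.

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Replays the mprotect/write steps on one anonymous page.
 * Returns 0 if every step succeeded, -1 otherwise. */
int demo_mprotect_then_write(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    unsigned char *buff = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buff == MAP_FAILED)
        return -1;

    /* A real program would register buff with the NIC here, which pins
     * the physical page. That step is omitted in this sketch. */

    memset(buff, 0xAA, len);            /* stands in for the first receive */

    /* The two mprotect calls from the report, one after the other. */
    if (mprotect(buff, len, PROT_NONE) != 0 ||
        mprotect(buff, len, PROT_READ | PROT_WRITE) != 0) {
        munmap(buff, len);
        return -1;
    }

    /* Step 4: this write is the one that, per the report, can fault and
     * move the process onto a new physical page when the old one is pinned. */
    buff[0] = 0x55;

    int ok = (buff[0] == 0x55 && buff[1] == 0xAA);
    munmap(buff, len);
    return ok ? 0 : -1;
}
```

From the process's point of view the buffer contents always look consistent; the reported problem only appears when a device keeps writing to the originally pinned page.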

___________________________________________________
_      Yingchao Zhou                              _
_      ICT, CAS                                   _
_      (86)010-62601009                           _
___________________________________________________


From tom at opengridcomputing.com  Wed Sep 13 20:33:23 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Wed, 13 Sep 2006 22:33:23 -0500
Subject: [openib-general] CMA issue: bind selects the same port after
 close
In-Reply-To: <000001c6d6ee$9a47e000$5bd9180a@amr.corp.intel.com>
Message-ID: <C12E37B3.8C72%tom@opengridcomputing.com>


On 9/12/06 11:39 PM, "Sean Hefty" <sean.hefty at intel.com> wrote:

>>> I completely understand that the existing port management services are not
>>> exported, but functionally, they support multiple port spaces, show up in
>>> netstat, etc... Can someone please explain to me the reluctance to use these
>>> services in favor of replicating them?
> 
> My reluctance to use the existing port spaces is that we're not guaranteed to
> run TCP or IP.  I'm happy to map the address spaces, but that's not the same
> as
> using those addresses when you're not using that protocol.
> 
>> inet_csk_get_port actually *is* exported, and while it might be hard for CMA
>> to
>> use it (needs struct sock*), maybe it is easy for SDP.
> 

Yes, I agree. This is the crux of the issue. The sock structure is coupled
with inet_csk_get_port, and it is not trivial in size. This service,
however, is itself built on lower level port allocation services that are
not coupled with struct sock, but are also not exported. So what I think
needs to be done is to look at these lower level services and decide a) how
to effectively export them, and b) rationalize their export.
 

> I did look at this, but the use of struct sock made it extremely difficult for
> the CMA to use the existing calls.
> 
>> So, possibly we should just leave the CMA port allocation as is,
>> and enhance SDP to use inet_csk_get_port.
> 
> That sounds reasonable.
> 

Short term, perhaps, but long-term, I think we end up with this same kind of
logic being replicated in ULP all over the place.

> - Sean
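
To make the idea of a port allocation service that is not coupled with struct sock concrete, here is a small sketch of a per-port-space bitmap allocator. All names are hypothetical and nothing here corresponds to an existing kernel interface. The rover in `next` is one possible way to avoid handing out a port again right after it was released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PORT_COUNT 65536u

struct demo_port_space {
    uint8_t  used[DEMO_PORT_COUNT / 8];  /* one bit per port */
    uint16_t low, high;                  /* inclusive range for "any port" binds */
    uint16_t next;                       /* rover: where the next search starts */
};

void demo_port_space_init(struct demo_port_space *ps, uint16_t low, uint16_t high)
{
    assert(ps != NULL && low != 0 && low <= high);
    memset(ps->used, 0, sizeof ps->used);
    ps->low = low;
    ps->high = high;
    ps->next = low;
}

static int demo_port_in_use(const struct demo_port_space *ps, uint16_t port)
{
    return (ps->used[port / 8] >> (port % 8)) & 1;
}

/* Binds 'port', or any free port in [low, high] when port == 0.
 * Returns the bound port, or 0 if the request cannot be satisfied. */
uint16_t demo_port_get(struct demo_port_space *ps, uint16_t port)
{
    if (port != 0) {
        if (demo_port_in_use(ps, port))
            return 0;
        ps->used[port / 8] |= (uint8_t)(1u << (port % 8));
        return port;
    }

    uint32_t span = (uint32_t)ps->high - ps->low + 1;
    for (uint32_t i = 0; i < span; i++) {
        uint16_t cand = ps->next;
        /* Advance the rover first so a released port is not reused at once. */
        ps->next = (cand == ps->high) ? ps->low : (uint16_t)(cand + 1);
        if (!demo_port_in_use(ps, cand)) {
            ps->used[cand / 8] |= (uint8_t)(1u << (cand % 8));
            return cand;
        }
    }
    return 0;
}

void demo_port_put(struct demo_port_space *ps, uint16_t port)
{
    ps->used[port / 8] &= (uint8_t)~(1u << (port % 8));
}
```

A ULP would keep one `struct demo_port_space` per port space and call `demo_port_get()` at bind time and `demo_port_put()` at close, without needing a socket object.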


From mst at mellanox.co.il  Wed Sep 13 21:46:22 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 07:46:22 +0300
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <1158187004.13748.20243.camel@hal.voltaire.com>
References: <1158187004.13748.20243.camel@hal.voltaire.com>
Message-ID: <20060914044622.GA24586@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> 
> On Wed, 2006-09-13 at 18:08, Michael S. Tsirkin wrote:
> > Quoting r. Roland Dreier <rdreier at cisco.com>:
> > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > > 
> > >     Michael> IPoIB in linux needs 2K MTU. Therefore it must set mtu
> > >     Michael> selector in path record query accordingly.
> > > 
> > > Umm -- why does it need a 2K MTU?  As far as I know it should work
> > > fine with any MTU, assuming the SA sets the MTU of the broadcast
> > > multicast group correctly.
> > 
> > Hmm, you are right, it is just that existing implementations all
> > set that to 2K.
> 
> By default yes. It can be configured.
> 
> > But there is a silent assumption that MTU of any path is >= broadcast
> > multicast group MTU, and this is what I want to fix.
> 
> The spec says:
> "The value (for IB MTU) assigned to the broadcast-GID must not be
> greater than any physical link MTU spanned by the IPoIB subnet".
> so if the broadcast group is improperly setup not to follow this, there
> will be other issues.

Correct. IPoIB uses broadcast group MTU to get the value reported to
Linux. If some link has a lower MTU IPoIB can not use it.

> It doesn't need to be included in the PR request.

I disagree here. If you do not set selector, SA is free to return
a path with lower MTU even though physical link allows higher MTU.
Does it say otherwise somewhere?


-- 
MST
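
As a rough sketch of the argument above: with no selector in the query, any path MTU satisfies the request, while a "greater than or equal" selector rules out the smaller paths. The enum and function below are hypothetical and use MTU values in bytes for readability; they are not the kernel's `ib_sa_selector` definitions.

```c
#include <assert.h>

/* Hypothetical selector values, for illustration only. */
enum demo_selector { DEMO_SEL_GTE, DEMO_SEL_LTE, DEMO_SEL_EQ, DEMO_SEL_NONE };

/* Returns 1 if a path with 'path_mtu' bytes satisfies a query that asked
 * for 'requested' bytes under selector 'sel', 0 otherwise. */
int demo_path_mtu_acceptable(enum demo_selector sel,
                             unsigned requested, unsigned path_mtu)
{
    switch (sel) {
    case DEMO_SEL_GTE:  return path_mtu >= requested;
    case DEMO_SEL_LTE:  return path_mtu <= requested;
    case DEMO_SEL_EQ:   return path_mtu == requested;
    case DEMO_SEL_NONE: return 1;   /* no constraint: any MTU may come back */
    }
    assert(!"unknown selector");
    return 0;
}
```

In this model a 1024-byte path is acceptable for a query without a selector, but not for a query that asks for at least 2048 bytes.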


From mst at mellanox.co.il  Wed Sep 13 22:03:35 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 08:03:35 +0300
Subject: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU
	to 1K
In-Reply-To: <D80D83302DEE6249A221093BF2BB69AE89BFD7@mail.silverstorm.com>
References: <20060913220103.GA28790@mellanox.co.il>
	<D80D83302DEE6249A221093BF2BB69AE89BFD7@mail.silverstorm.com>
Message-ID: <20060914050334.GD24586@mellanox.co.il>

Quoting r. Rimmer, Todd <trimmer at silverstorm.com>:
> > All the more reason to put the simple logic in one place
> > and not expect all applications to optimize for this hardware.
> 
> All the reason to invest in more important requirements

This is completely orthogonal - Tavor gets better speed with 1K MTU
no matter what you do.

-- 
MST


From mst at mellanox.co.il  Wed Sep 13 22:35:09 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 08:35:09 +0300
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <adabqpjxt5k.fsf@cisco.com>
References: <adabqpjxt5k.fsf@cisco.com>
Message-ID: <20060914053509.GA24868@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> 
>  > +	path->pathrec.mtu            = priv->broadcast->mcmember.mtu;
>  > +	path->pathrec.mtu_selector   = IB_SA_GTE;
> 
> Does this do anything without setting the component mask of the actual request??
> 

Ugh. Correct of course. The SA I was testing against seems to have a bug.
The following is yet untested - any more obvious gotchas you can see?


diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index cf71d2a..c8e8dd3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -441,9 +441,11 @@ static struct ipoib_path *path_rec_creat
 	INIT_LIST_HEAD(&path->neigh_list);
 
 	memcpy(path->pathrec.dgid.raw, gid, sizeof (union ib_gid));
-	path->pathrec.sgid      = priv->local_gid;
-	path->pathrec.pkey      = cpu_to_be16(priv->pkey);
-	path->pathrec.numb_path = 1;
+	path->pathrec.sgid           = priv->local_gid;
+	path->pathrec.pkey           = cpu_to_be16(priv->pkey);
+	path->pathrec.numb_path      = 1;
+	path->pathrec.mtu            = priv->broadcast->mcmember.mtu;
+	path->pathrec.mtu_selector   = IB_SA_GTE;
 
 	return path;
 }
@@ -464,7 +466,8 @@ static int path_rec_start(struct net_dev
 				   IB_SA_PATH_REC_DGID		|
 				   IB_SA_PATH_REC_SGID		|
 				   IB_SA_PATH_REC_NUMB_PATH	|
-				   IB_SA_PATH_REC_PKEY,
+				   IB_SA_PATH_REC_PKEY          |
+				   IB_SA_PATH_REC_MTU_SELECTOR,
 				   1000, GFP_ATOMIC,
 				   path_rec_completion,
 				   path, &path->query);


-- 
MST


From mst at mellanox.co.il  Wed Sep 13 22:41:05 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 08:41:05 +0300
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691C9E76A1@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691C9E76A1@orsmsx418.amr.corp.intel.com>
Message-ID: <20060914054104.GC24868@mellanox.co.il>

Quoting r. Woodruff, Robert J <robert.j.woodruff at intel.com>:
> I tried getting the latest ofed 1.1 ipathverbs from svn today that I
> thought would have
> a fix for this, and I think I got it built ok, although the mellanox
> build environment is less than intuitive, but it still seems to fail.
> Guess we will try again with RC5 tomorrow. 

It's actually OFED build environment now :)
So you really should report improvement suggestions on list.

-- 
MST


From thomas.bub at thomson.net  Wed Sep 13 23:03:10 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Thu, 14 Sep 2006 08:03:10 +0200
Subject: [openib-general] OFED can't compile against sa.h under SLES10
 x86_64
Message-ID: <B79FAF8BB536314E859EA1963CFFD222029AC448@wdtssmail01.eu.thmulti.com>

Michael and Bryan,
find the libibcm example cmpost.c that fails to compile enclosed.
My compiler output looks like:

cc -ggdb  -Wall -O0 -I. -I./usr/include -I./oibfix
-I/usr/local/ofed/include -I/usr/src/linux/drivers/infiniband/include
-D__x86_64__   -c -o cmpost.o cmpost.c
cmpost.c: In function 'query_for_path':
cmpost.c:658: error: invalid use of undefined type 'struct
ibv_sa_path_rec'
cmpost.c:658: error: dereferencing pointer to incomplete type
cmpost.c: In function 'run_client':
cmpost.c:679: warning: assignment from incompatible pointer type
make: *** [cmpost.o] Error 1


It's OK under SLES9 but fails under SLES10.
Thanks
Thomas


-------------- next part --------------
A non-text attachment was scrubbed...
Name: cmpost.c
Type: application/octet-stream
Size: 16040 bytes
Desc: cmpost.c
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060914/c5accb22/attachment.obj>

From ogerlitz at voltaire.com  Wed Sep 13 23:36:00 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 14 Sep 2006 09:36:00 +0300
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <1158172258.8759.230.camel@brick.pathscale.com>
References: <1158108010.8759.192.camel@brick.pathscale.com>
	<4507C8C2.6050206@voltaire.com>
	<1158172258.8759.230.camel@brick.pathscale.com>
Message-ID: <4508F850.5050804@voltaire.com>

Ralph Campbell wrote:
> On Wed, 2006-09-13 at 12:00 +0300, Or Gerlitz wrote:
>> Ralph Campbell wrote:

> Well, the other parts of the kernel might not need a kernel virtual
> address but the ib_ipath driver still does.

So you agree there is a need to kmap/kunmap pages which the user wants
to use with IB and which are not mapped into the kernel virt address space?

> I don't understand what you are talking about. There is an IB
> wire protocol for RDMA, SEND, etc. That doesn't change depending
> on the HCA.
> The InfiniPath HCA has a ring buffer of receive buffers and all
> incoming IB packets are DMA'ed into one of these buffers.
> The ib_ipath software driver examines the packet and
> copies it to the appropriate address. For a packet received with
> a RC_RDMA_WRITE_FIRST, the RKEY and IB address are used to convert
> that into a kernel virtual address and the data is copied.
> The same happens for RC_SEND_FIRST but the KV address comes from
> the LKEY and address in the work request posted by ib_post_recv().

OK, this makes sense.

Let's see if I follow: you say that the InfiniPath HCA is RX DMA-able, but
it does RX DMA into the ipath driver's private RX buffers and then the
driver copies from these buffers to the user buffer. My guess is that
you do that to support both recv and rdma read on this QP, since if you
only needed to support recv you could have the HCA DMA directly into the
user-posted RX buffer.

> Sending data is similar, the driver constructs a packet with the
> appropriate opcode and writes it to the chip which puts it on
> the wire.

OK.
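
A rough model of the receive path described in this exchange, where the driver picks the copy destination from the packet opcode: an RDMA write goes to the address resolved from the RKEY, a send goes to the posted receive buffer. The types and names below are hypothetical and greatly simplified; they are not taken from the ib_ipath source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_opcode { DEMO_RDMA_WRITE_FIRST, DEMO_SEND_FIRST };

/* A packet as it sits in the driver's private receive ring. */
struct demo_packet {
    enum demo_opcode opcode;
    const void      *payload;
    size_t           len;
};

/* Destinations already resolved to kernel virtual addresses. */
struct demo_qp {
    void  *rdma_target;  size_t rdma_len;   /* from RKEY + IB address   */
    void  *recv_buf;     size_t recv_len;   /* from the posted recv WR  */
};

/* Copies the payload to the destination chosen by the opcode.
 * Returns the number of bytes copied, or -1 on error. */
long demo_deliver(const struct demo_packet *pkt, struct demo_qp *qp)
{
    void *dst;
    size_t room;

    assert(pkt != NULL && qp != NULL);
    switch (pkt->opcode) {
    case DEMO_RDMA_WRITE_FIRST:
        dst = qp->rdma_target;
        room = qp->rdma_len;
        break;
    case DEMO_SEND_FIRST:
        dst = qp->recv_buf;
        room = qp->recv_len;
        break;
    default:
        return -1;
    }
    if (dst == NULL || pkt->len > room)
        return -1;
    memcpy(dst, pkt->payload, pkt->len);    /* the CPU copy, no device DMA */
    return (long)pkt->len;
}
```

Because the copy is done by the CPU through kernel virtual addresses, this model has no use for a DMA address of the destination, which is why the driver needs the pages to be mapped.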


From ogerlitz at voltaire.com  Thu Sep 14 00:12:36 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 14 Sep 2006 10:12:36 +0300 (IDT)
Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change CMA
	config name
Message-ID: <Pine.LNX.4.64.0609141005480.7597@zuben>

change INFINIBAND_ADDR_TRANS to INFINIBAND_RDMA_CM and add help text
clarifying what the thing does. Adding the help text also has the side
effect of the cma config being visible when one does make menuconfig

Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 69a53d4..7feea77 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -29,11 +29,16 @@ config INFINIBAND_USER_ACCESS
 	  libibverbs, libibcm and a hardware driver library from
 	  <http://www.openib.org>.

-config INFINIBAND_ADDR_TRANS
+config INFINIBAND_RDMA_CM
 	bool
 	depends on INFINIBAND && INET
 	default y
-
+	---help---
+	  RDMA transport independent communication management support.
+	  This includes handling of IP to RDMA address resolution (eg IB ARP),
+	  IB route resolution (eg IB SA Path query) and interaction with the
+	  transport communication manager (eg the IB and iWARP CM).
+
 source "drivers/infiniband/hw/mthca/Kconfig"
 source "drivers/infiniband/hw/ipath/Kconfig"

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 68e73ec..531b3c4 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -1,4 +1,4 @@
-infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS)	:= ib_addr.o rdma_cm.o
+infiniband-$(CONFIG_INFINIBAND_RDMA_CM)	:= ib_addr.o rdma_cm.o

 obj-$(CONFIG_INFINIBAND) +=		ib_core.o ib_mad.o ib_sa.o \
 					ib_cm.o $(infiniband-y)


From erezz at voltaire.com  Thu Sep 14 01:28:48 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Thu, 14 Sep 2006 11:28:48 +0300
Subject: [openib-general] fix iSER description and selections in Kconfig
In-Reply-To: <ada1wqi79mb.fsf@cisco.com>
References: <200609071902.57379.toralf.foerster@gmx.de>
	<200609101343.02740.toralf.foerster@gmx.de>
	<450401AE.2030606@voltaire.com>
	<200609101645.22695.toralf.foerster@gmx.de>
	<4505032B.3050706@voltaire.com> <ada1wqi79mb.fsf@cisco.com>
Message-ID: <450912C0.8070807@voltaire.com>

Roland Dreier wrote:
> There is definitely a bug in the drivers/infiniband/ulp/iser/Kconfig
> file.  ISER only depends on INFINIBAND && SCSI.  However it is easily
> possible to enable INFINIBAND and SCSI without enabling INET (in fact
> they can be enabled without NET as in the original config in this thread).
>
> iser does select SCSI_ISCSI_ATTRS, but without selecting NET that it
> depends on, so this alone will result in a broken config.  However
> nothing will enable INET (which I think you said iser depends on).  So
> something like the below is required, I think.  Although it would
> probably be better to make iser depend on INET (as ISCSI_TCP does)
> rather than selecting NET and INET.
>
> Toralf, can you confirm that applying this patch and doing make
> oldconfig and make with your original config works OK?
>
> Thanks,
>   Roland
>
> diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
> index fead87d..a122bb4 100644
> --- a/drivers/infiniband/ulp/iser/Kconfig
> +++ b/drivers/infiniband/ulp/iser/Kconfig
> @@ -1,6 +1,8 @@
>  config INFINIBAND_ISER
>  	tristate "ISCSI RDMA Protocol"
>  	depends on INFINIBAND && SCSI
> +	select NET
> +	select INET
>  	select SCSI_ISCSI_ATTRS
>  	---help---
>  	  Support for the ISCSI RDMA Protocol over InfiniBand.  This
>   
Roland,

I think that the patch below covers all cases. It depends on the patch 
that Or sent this morning for the config entry of the CMA.

Fix the description of iSER in Kconfig. It is not accurate.
Also, iSER uses the CMA and INET. It selects SCSI_ISCSI_ATTRS,
which depends on NET. Selecting NET, INET & INFINIBAND_RDMA_CM
ensures that the config won't break.

Signed-off-by: Erez Zilber <erezz at voltaire.com>

---

 drivers/infiniband/ulp/iser/Kconfig |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

3dc4e3bf0716d502a6fd7e62806c4932e8978e6b
diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
index fead87d..c251855 100644
--- a/drivers/infiniband/ulp/iser/Kconfig
+++ b/drivers/infiniband/ulp/iser/Kconfig
@@ -1,11 +1,14 @@
 config INFINIBAND_ISER
-       tristate "ISCSI RDMA Protocol"
+       tristate "iSCSI Extensions for RDMA (iSER)"
        depends on INFINIBAND && SCSI
+       select NET
+       select INET
+       select INFINIBAND_RDMA_CM
        select SCSI_ISCSI_ATTRS
        ---help---
-         Support for the ISCSI RDMA Protocol over InfiniBand.  This
-         allows you to access storage devices that speak ISER/ISCSI
+         Support for the iSCSI Extensions for RDMA (iSER) Protocol over InfiniBand. This
+         allows you to access storage devices that speak iSCSI over iSER
          over InfiniBand.

          The ISER protocol is defined by IETF.
-         See <http://www.ietf.org/>.
+         See <http://www.ietf.org/internet-drafts/draft-ietf-ips-iser-05.txt>.
--
1.2.6


From erezz at voltaire.com  Thu Sep 14 01:42:22 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Thu, 14 Sep 2006 11:42:22 +0300
Subject: [openib-general] 2 SLES 10 backport directories
Message-ID: <450915EE.1090705@voltaire.com>

Michael,

I saw that there are 2 SLES 10 backport directories in the svn:

https://openib.org/svn/gen2/branches/backport/sles10/ - this one 
contains patches that we added for SLES 10

https://openib.org/svn/gen2/branches/backport/2.6.16_sles10/ - this one 
was added later by you.

Can we unite them?

Here's my motivation: I want to be able to install SLES 10, replace its 
infiniband dir with infiniband from openib's svn, apply all SLES 10 
patches (from a single directory) and then it should work.

This should help us in future OFED releases.

Thanks
-- 

Erez Zilber | 972-9-971-7689
Software Engineer, Storage Team
Voltaire – The Grid Backbone
www.voltaire.com


From erezz at voltaire.com  Thu Sep 14 02:03:00 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Thu, 14 Sep 2006 12:03:00 +0300
Subject: [openib-general] [PATCH] IB/iser: fix iSER description and
 selections in Kconfig
In-Reply-To: <450912C0.8070807@voltaire.com>
References: <200609071902.57379.toralf.foerster@gmx.de>
	<200609101343.02740.toralf.foerster@gmx.de>
	<450401AE.2030606@voltaire.com>
	<200609101645.22695.toralf.foerster@gmx.de>
	<4505032B.3050706@voltaire.com> <ada1wqi79mb.fsf@cisco.com>
	<450912C0.8070807@voltaire.com>
Message-ID: <45091AC4.3090005@voltaire.com>

Erez Zilber wrote:
> Roland Dreier wrote:
>   
>> There is definitely a bug in the drivers/infiniband/ulp/iser/Kconfig
>> file.  ISER only depends on INFINIBAND && SCSI.  However it is easily
>> possible to enable INFINIBAND and SCSI without enabling INET (in fact
>> they can be enabled without NET as in the original config in this thread).
>>
>> iser does select SCSI_ISCSI_ATTRS, but without selecting NET that it
>> depends on, so this alone will result in a broken config.  However
>> nothing will enable INET (which I think you said iser depends on).  So
>> something like the below is required, I think.  Although it would
>> probably be better to make iser depend on INET (as ISCSI_TCP does)
>> rather than selecting NET and INET.
>>
>> Toralf, can you confirm that applying this patch and doing make
>> oldconfig and make with your original config works OK?
>>
>> Thanks,
>>   Roland
>>
>> diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
>> index fead87d..a122bb4 100644
>> --- a/drivers/infiniband/ulp/iser/Kconfig
>> +++ b/drivers/infiniband/ulp/iser/Kconfig
>> @@ -1,6 +1,8 @@
>>  config INFINIBAND_ISER
>>  	tristate "ISCSI RDMA Protocol"
>>  	depends on INFINIBAND && SCSI
>> +	select NET
>> +	select INET
>>  	select SCSI_ISCSI_ATTRS
>>  	---help---
>>  	  Support for the ISCSI RDMA Protocol over InfiniBand.  This
>>   
>>     
> Roland,
>
> I think that the patch below covers all cases. It depends on the patch 
> that Or sent this morning for the config entry of the CMA.
>
>
>   
Please ignore the previous message. I didn't format the subject 
correctly. Here it is again:

Fix the description of iSER in Kconfig. It is not accurate.
Also, iSER uses the CMA and INET. It selects SCSI_ISCSI_ATTRS,
which depends on NET. Selecting NET, INET & INFINIBAND_RDMA_CM
ensures that the config won't break.

Signed-off-by: Erez Zilber <erezz at voltaire.com>

---

 drivers/infiniband/ulp/iser/Kconfig |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

3dc4e3bf0716d502a6fd7e62806c4932e8978e6b
diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
index fead87d..c251855 100644
--- a/drivers/infiniband/ulp/iser/Kconfig
+++ b/drivers/infiniband/ulp/iser/Kconfig
@@ -1,11 +1,14 @@
 config INFINIBAND_ISER
-       tristate "ISCSI RDMA Protocol"
+       tristate "iSCSI Extensions for RDMA (iSER)"
        depends on INFINIBAND && SCSI
+       select NET
+       select INET
+       select INFINIBAND_RDMA_CM
        select SCSI_ISCSI_ATTRS
        ---help---
-         Support for the ISCSI RDMA Protocol over InfiniBand.  This
-         allows you to access storage devices that speak ISER/ISCSI
+         Support for the iSCSI Extensions for RDMA (iSER) Protocol over InfiniBand. This
+         allows you to access storage devices that speak iSCSI over iSER
          over InfiniBand.

          The ISER protocol is defined by IETF.
-         See <http://www.ietf.org/>.
+         See <http://www.ietf.org/internet-drafts/draft-ietf-ips-iser-05.txt>.
--
1.2.6


From ogerlitz at voltaire.com  Thu Sep 14 03:51:20 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 14 Sep 2006 13:51:20 +0300
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <1158108010.8759.192.camel@brick.pathscale.com>
References: <1158108010.8759.192.camel@brick.pathscale.com>
Message-ID: <45093428.5010009@voltaire.com>

Ralph Campbell wrote:
> +static inline dma_addr_t ib_dma_map_sg(struct ib_device *dev,
> +				       struct scatterlist *sg, int nents,
> +				       enum dma_data_direction direction)
> +{
> +	return dev->map_sg ?
> +		dev->map_sg(dev, sg, nents, direction) :
> +		dma_map_sg(dev->dma_device, sg, nents, direction);
> +}

As SG DMA mapping happens in place and you don't want to change struct
scatterlist for every arch, I think you would need to keep some mapping
(hash) from each struct scatterlist to its ipath buddy...

Also you would need to implement the sg_dma_address() and sg_dma_len()
macros used by ULP code when pages are passed to the IB verbs layer,
e.g. to get an SG FMR-ed or to send/recv from/into a page, and have them
query the ipath scatterlist buddy.

Or.
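
The quoted wrapper follows a common pattern: call the device's own hook if it provides one, otherwise fall back to the generic routine. A userspace sketch of just that dispatch, with hypothetical names and an integer standing in for the scatterlist, might look like this:

```c
#include <assert.h>
#include <stddef.h>

struct demo_device {
    /* Optional per-device hook; NULL means "use the generic path". */
    int (*map_sg)(struct demo_device *dev, int nents);
    int generic_calls;      /* counts calls that reached the generic path */
};

/* Stand-in for the generic DMA mapping routine. */
static int demo_generic_map_sg(struct demo_device *dev, int nents)
{
    dev->generic_calls++;
    return nents;
}

/* Example hook for a software-only device that needs no DMA mapping. */
int demo_sw_map_sg(struct demo_device *dev, int nents)
{
    (void)dev;
    return nents;
}

/* The wrapper ULPs would call instead of the generic routine. */
int demo_dma_map_sg(struct demo_device *dev, int nents)
{
    assert(dev != NULL);
    return dev->map_sg ? dev->map_sg(dev, nents)
                       : demo_generic_map_sg(dev, nents);
}
```

The sketch leaves out the harder part raised above: where a device-specific hook would store its per-entry results if it cannot extend struct scatterlist.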


From halr at voltaire.com  Thu Sep 14 03:54:03 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Sep 2006 06:54:03 -0400
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <20060914044622.GA24586@mellanox.co.il>
References: <1158187004.13748.20243.camel@hal.voltaire.com>
	<20060914044622.GA24586@mellanox.co.il>
Message-ID: <1158231231.13748.47916.camel@hal.voltaire.com>

On Thu, 2006-09-14 at 00:46, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > 
> > On Wed, 2006-09-13 at 18:08, Michael S. Tsirkin wrote:
> > > Quoting r. Roland Dreier <rdreier at cisco.com>:
> > > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > > > 
> > > >     Michael> IPoIB in linux needs 2K MTU. Therefore it must set mtu
> > > >     Michael> selector in path record query accordingly.
> > > > 
> > > > Umm -- why does it need a 2K MTU?  As far as I know it should work
> > > > fine with any MTU, assuming the SA sets the MTU of the broadcast
> > > > multicast group correctly.
> > > 
> > > Hmm, you are right, it is just that existing implementations all
> > > set that to 2K.
> > 
> > By default yes. It can be configured.
> > 
> > > But there is a silent assumption that MTU of any path is >= broadcast
> > > multicast group MTU, and this is what I want to fix.
> > 
> > The spec says:
> > "The value (for IB MTU) assigned to the broadcast-GID must not be
> > greater than any physical link MTU spanned by the IPoIB subnet".
> > so if the broadcast group is improperly setup not to follow this, there
> > will be other issues.
> 
> Correct. IPoIB uses broadcast group MTU to get the value reported to
> Linux. If some link has a lower MTU IPoIB can not use it.
> 
> > It doesn't need to be included in the PR request.
> 
> I disagree here. If you do not set selector, SA is free to return
> a path with lower MTU even though physical link allows higher MTU.
> Does it say otherwise somewhere?

No but isn't this relying on using PRs in a certain way by IPoIB
implementations (and any other UD application) v. connected apps ?

-- Hal


From jackm at dev.mellanox.co.il  Thu Sep 14 04:12:27 2006
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Thu, 14 Sep 2006 14:12:27 +0300
Subject: [openib-general] OFED can't compile against sa.h under SLES10 x86_64
Message-ID: <200609141412.27577.jackm@dev.mellanox.co.il>

I was unable to reproduce the problem you describe, under SLES10 x86_64.
Here, your cmpost.c file compiled and linked without any problems.
I used a slightly different gcc command line (given below).

I took the cmpost.c file you provided, placed it under 
/usr/local/ofed/src/openib-1.1/src/userspace/libibcm/examples 
(under an OFED 1.1-rc5 prerelease candidate installation).
I then did the following:

 cd libibcm/examples

 gcc -ggdb  -Wall -O0 -I/usr/local/ofed/include  -D__x86_64__  
      /usr/local/ofed/lib64/libibcommon.so /usr/local/ofed/lib64/librdmacm.so 
      /usr/local/ofed/lib64/libibcm.so  -o cmpost cmpost.c

(the above gcc command is broken up into several lines for easy reading)

The compilation was successful. I did not experience any compilation or linkage problems.
I was able to run the resulting "cmpost" executable file.

gcc version: gcc (GCC) 4.1.0 (SUSE Linux)
Linux distribution: (from file /etc/SuSE-release):
    SUSE Linux Enterprise Server 10 (x86_64)
    VERSION = 10

Kernel version (uname -a):
    Linux 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux

I then retried everything using OFED 1.1 RC4, and also succeeded in compiling and running cmpost.c.

The following is the list of OFED packages that I installed for the above experiment:
ib_ipoib
ib_mthca
ib_verbs
kernel-ib
kernel-ib-devel
libibcm
libibcm-devel
libibcommon
libibcommon-devel
libibmad
libibmad-devel
libibumad
libibumad-devel
libibverbs
libibverbs-devel
libibverbs-utils
libmthca
libmthca-devel
librdmacm
librdmacm-devel
librdmacm-utils
ofed-scripts

- Jack


From mst at mellanox.co.il  Thu Sep 14 04:14:03 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 14:14:03 +0300
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <1158231231.13748.47916.camel@hal.voltaire.com>
References: <1158231231.13748.47916.camel@hal.voltaire.com>
Message-ID: <20060914111403.GA25691@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> 
> On Thu, 2006-09-14 at 00:46, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > > 
> > > On Wed, 2006-09-13 at 18:08, Michael S. Tsirkin wrote:
> > > > Quoting r. Roland Dreier <rdreier at cisco.com>:
> > > > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > > > > 
> > > > >     Michael> IPoIB in linux needs 2K MTU. Therefore it must set mtu
> > > > >     Michael> selector in path record query accordingly.
> > > > > 
> > > > > Umm -- why does it need a 2K MTU?  As far as I know it should work
> > > > > fine with any MTU, assuming the SA sets the MTU of the broadcast
> > > > > multicast group correctly.
> > > > 
> > > > Hmm, you are right, it is just that existing implementations all
> > > > set that to 2K.
> > > 
> > > By default yes. It can be configured.
> > > 
> > > > But there is a silent assumption that MTU of any path is >= broadcast
> > > > multicast group MTU, and this is what I want to fix.
> > > 
> > > The spec says:
> > > "The value (for IB MTU) assigned to the broadcast-GID must not be
> > > greater than any physical link MTU spanned by the IPoIB subnet".
> > > so if the broadcast group is improperly setup not to follow this, there
> > > will be other issues.
> > 
> > Correct. IPoIB uses broadcast group MTU to get the value reported to
> > Linux. If some link has a lower MTU IPoIB can not use it.
> > 
> > > It doesn't need to be included in the PR request.
> > 
> > I disagree here. If you do not set selector, SA is free to return
> > a path with lower MTU even though physical link allows higher MTU.
> > Does it say otherwise somewhere?
> 
> No but isn't this relying on using PRs in a certain way by IPoIB
> implementations (and any other UD application) v. connected apps ?

Not really.

Tavor is faster with 1K MTU than with 2K MTU - it does not matter connected or
not. So, for me, it makes sense for SM to choose 1K if Tavor is involved,
unless application requested otherwise.

If an application (again, no matter connected or UD) needs a specific MTU it
should use mtu selector in path query. If it does not, SM is free to choose any
MTU supported by link, for best performance. If one end is Tavor, this happens to
be 1K and not the maximum MTU.

So what we have here is IPoIB bug - it requires that path mtu >= bcast group
mtu, but does not pass this information in query. This only happens to work
if SM always selects max link MTU for each path query.

Makes sense?

-- 
MST
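
The scenario described above can be put into a toy model: an SM that prefers 1K when a Tavor is on the path and the query did not constrain the MTU, and an IPoIB check that needs path MTU >= broadcast group MTU. Both functions are hypothetical simplifications, with MTUs in bytes, and do not describe any real SM implementation.

```c
#include <assert.h>

/* Toy SM policy: pick 1024 for performance when a Tavor is involved and the
 * application did not constrain the MTU, otherwise the largest link MTU. */
unsigned demo_sm_choose_mtu(unsigned link_max_mtu, int tavor_on_path,
                            int mtu_requested)
{
    assert(link_max_mtu > 0);
    if (!mtu_requested && tavor_on_path && link_max_mtu > 1024)
        return 1024;
    return link_max_mtu;
}

/* IPoIB's implicit requirement: the path must carry full-size packets of
 * the broadcast group MTU. Returns 1 if the path is usable. */
int demo_ipoib_path_usable(unsigned path_mtu, unsigned bcast_mtu)
{
    return path_mtu >= bcast_mtu;
}
```

With a 2048-byte broadcast group MTU and a Tavor on the path, the unconstrained query yields a 1024-byte path that fails the check, while a query that constrains the MTU yields a usable one.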


From halr at voltaire.com  Thu Sep 14 04:35:10 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Sep 2006 07:35:10 -0400
Subject: [openib-general] [PATCH] IB/ipoib: use appropriate path selector
In-Reply-To: <20060914111403.GA25691@mellanox.co.il>
References: <1158231231.13748.47916.camel@hal.voltaire.com>
	<20060914111403.GA25691@mellanox.co.il>
Message-ID: <1158233667.13748.49356.camel@hal.voltaire.com>

On Thu, 2006-09-14 at 07:14, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > 
> > On Thu, 2006-09-14 at 00:46, Michael S. Tsirkin wrote:
> > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > > > 
> > > > On Wed, 2006-09-13 at 18:08, Michael S. Tsirkin wrote:
> > > > > Quoting r. Roland Dreier <rdreier at cisco.com>:
> > > > > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > > > > > 
> > > > > >     Michael> IPoIB in linux needs 2K MTU. Therefore it must set mtu
> > > > > >     Michael> selector in path record query accordingly.
> > > > > > 
> > > > > > Umm -- why does it need a 2K MTU?  As far as I know it should work
> > > > > > fine with any MTU, assuming the SA sets the MTU of the broadcast
> > > > > > multicast group correctly.
> > > > > 
> > > > > Hmm, you are right, it is just that existing implementations all
> > > > > set that to 2K.
> > > > 
> > > > By default yes. It can be configured.
> > > > 
> > > > > But there is a silent assumption that MTU of any path is >= broadcast
> > > > > multicast group MTU, and this is what I want to fix.
> > > > 
> > > > The spec says:
> > > > "The value (for IB MTU) assigned to the broadcast-GID must not be
> > > > greater than any physical link MTU spanned by the IPoIB subnet".
> > > > so if the broadcast group is improperly setup not to follow this, there
> > > > will be other issues.
> > > 
> > > Correct. IPoIB uses broadcast group MTU to get the value reported to
> > > Linux. If some link has a lower MTU IPoIB can not use it.
> > > 
> > > > It doesn't need to be included in the PR request.
> > > 
> > > I disagree here. If you do not set selector, SA is free to return
> > > a path with lower MTU even though physical link allows higher MTU.
> > > Does it say otherwise somewhere?
> > 
> > No but isn't this relying on using PRs in a certain way by IPoIB
> > implementations (and any other UD application) v. connected apps ?
> 
> Not really.
> 
> Tavor is faster with 1K MTU than with 2K MTU - it does not matter connected or
> not. So, for me, it makes sense for SM to choose 1K if Tavor is involved,
> unless application requested otherwise.
> 
> If an application (again, no matter connected or UD) needs a specific MTU it
> should use mtu selector in path query. If it does not, SM is free to choose any
> MTU supported by link, for best performance. If one end is Tavor, this happens to
> be 1K and not the maximum MTU.
> 
> So what we have here is an IPoIB bug - it requires that path mtu >= bcast group
> mtu, but does not pass this information in the query. This only happens to work
> if the SM always selects the max link MTU for each path query.

> Makes sense?

Understood. As I said in a previous email, if it happens that the path
MTU < broadcast group MTU, I think there would be join issues for some
nodes out there.

-- Hal
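
The selector argument in this thread can be illustrated with a small,
self-contained C sketch. It is not the IPoIB or OpenSM code. The MTU values
follow the IB encoding (1 = 256 bytes up to 5 = 4096 bytes); the selector
names, struct path_query and both functions are hypothetical and exist only
for this illustration, and the selector enum is a simplification rather than
the wire encoding.

```c
#include <stdbool.h>

/* IB MTU encoding: 1 = 256, 2 = 512, 3 = 1024, 4 = 2048, 5 = 4096 bytes. */
enum mtu_enc { MTU_256 = 1, MTU_512, MTU_1024, MTU_2048, MTU_4096 };

/* Hypothetical, simplified selector (not the on-the-wire values). */
enum mtu_sel { SEL_UNSET, SEL_AT_LEAST, SEL_EXACT };

/* Hypothetical view of the MTU part of a path record query. */
struct path_query {
	enum mtu_sel sel;
	enum mtu_enc mtu;
};

/* Would a path with MTU path_mtu be a valid answer to query q? */
bool path_mtu_acceptable(const struct path_query *q, enum mtu_enc path_mtu)
{
	switch (q->sel) {
	case SEL_AT_LEAST:
		return path_mtu >= q->mtu;
	case SEL_EXACT:
		return path_mtu == q->mtu;
	case SEL_UNSET:
	default:
		/* No constraint given: any MTU the link supports is valid. */
		return true;
	}
}

/* The implicit IPoIB requirement: path MTU >= broadcast group MTU. */
bool ipoib_path_usable(enum mtu_enc bcast_mtu, enum mtu_enc path_mtu)
{
	return path_mtu >= bcast_mtu;
}
```

With the selector unset, a 1K path is a valid answer even when the broadcast
group uses 2K, yet such a path is not usable by IPoIB. Passing the
requirement in the query removes that mismatch.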


From thomas.bub at thomson.net  Thu Sep 14 05:07:38 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Thu, 14 Sep 2006 14:07:38 +0200
Subject: [openib-general] OFED can't compile against sa.h under SLES10
 x86_64
Message-ID: <B79FAF8BB536314E859EA1963CFFD222029AC44D@wdtssmail01.eu.thmulti.com>

Jack et al.,
I have to apologize: my -I. include path pointed to the OFED-1.0.1 includes,
where ibv_sa_path_record was not defined yet.
With the correct include path it works.
Thanks to all for the prompt support.
Thomas (humbly backing away) ;-)


> -----Original Message-----
> From: openib-general-bounces at openib.org [mailto:openib-general-
> bounces at openib.org] On Behalf Of Jack Morgenstein
> Sent: Thursday, September 14, 2006 1:12 PM
> To: Bub Thomas
> Cc: openib-general at openib.org
> Subject: [openib-general] OFED can't compile against sa.h under SLES10
> x86_64
> 
> I was unable to reproduce the problem you describe, under SLES10 x86_64.
> Here, your cmpost.c file compiled and linked without any problems.
> I used a slightly different gcc command line (given below).
> 
> I took the cmpost.c file you provided, placed it under
> /usr/local/ofed/src/openib-1.1/src/userspace/libibcm/examples
> (under an OFED 1.1-rc5 prerelease candidate installation).
> I then did the following:
> 
>  cd libibcm/examples
> 
>  gcc -ggdb  -Wall -O0 -I/usr/local/ofed/include  -D__x86_64__
>       /usr/local/ofed/lib64/libibcommon.so
>       /usr/local/ofed/lib64/librdmacm.so
>       /usr/local/ofed/lib64/libibcm.so  -o cmpost cmpost.c
> 
> (the above gcc command is broken up into several lines for easy reading)
> 
> The compilation was successful. I did not experience any compilation or
> linkage problems.
> I was able to run the resulting "cmpost" executable file.
> 
> gcc version: gcc (GCC) 4.1.0 (SUSE Linux)
> Linux distribution: (from file /etc/SuSE-release):
>     SUSE Linux Enterprise Server 10 (x86_64)
>     VERSION = 10
> 
> Kernel version (uname -a):
>     Linux 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64
>     x86_64 x86_64 GNU/Linux
> 
> I then retried everything using OFED 1.1 RC4, and also succeeded in
> compiling and running cmpost.c.
> 
> The following is the list of OFED packages that I installed for the above
> experiment:
> ib_ipoib
> ib_mthca
> ib_verbs
> kernel-ib
> kernel-ib-devel
> libibcm
> libibcm-devel
> libibcommon
> libibcommon-devel
> libibmad
> libibmad-devel
> libibumad
> libibumad-devel
> libibverbs
> libibverbs-devel
> libibverbs-utils
> libmthca
> libmthca-devel
> librdmacm
> librdmacm-devel
> librdmacm-utils
> ofed-scripts
> 
> - Jack


From halr at voltaire.com  Thu Sep 14 06:12:28 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Sep 2006 09:12:28 -0400
Subject: [openib-general] [PATCH] OpenSM: Change default temp directory
 location for non Windows platforms
Message-ID: <1158239541.13748.52729.camel@hal.voltaire.com>

OpenSM: Change default temp directory location for non Windows platforms

This patch is intended for both trunk and 1.1.

Signed-off-by: Hal Rosenstock <halr at voltaire.com>

Index: ../osm/include/opensm/osm_base.h
===================================================================
--- ../osm/include/opensm/osm_base.h	(revision 9347)
+++ ../osm/include/opensm/osm_base.h	(working copy)
@@ -176,16 +176,15 @@ BEGIN_C_DECLS
 *	OSM_DEFAULT_TMP_DIR
 *
 * DESCRIPTION
-*	Specifies the default temporary directory for the log file, subnet.lst
-*  and the other log files (with the exception of osm.log for Linux being 
-*  in /var/log).
+*	Specifies the default temporary directory for the log file,
+*  subnet.lst, and other log files.
 *
 * SYNOPSIS
 */
 #ifdef __WIN__
 #define OSM_DEFAULT_TMP_DIR GetOsmTempPath()
 #else
-#define OSM_DEFAULT_TMP_DIR "/tmp/"
+#define OSM_DEFAULT_TMP_DIR "/var/log/"
 #endif
 /***********/
 

From eitan at mellanox.co.il  Thu Sep 14 06:39:16 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Thu, 14 Sep 2006 16:39:16 +0300
Subject: [openib-general] [PATCH] OpenSM: Change default temp directory
 location for non Windows platforms
In-Reply-To: <1158239541.13748.52729.camel@hal.voltaire.com>
References: <1158239541.13748.52729.camel@hal.voltaire.com>
Message-ID: <45095B84.6050900@mellanox.co.il>

Hi Hal,
Looks simple enough to get into the OFED 1.1
I assume you are going to commit it into the branch?

EZ

Hal Rosenstock wrote:

>OpenSM: Change default temp directory location for non Windows platforms
>
>This patch is intended for both trunk and 1.1.
>
>Signed-off-by: Hal Rosenstock <halr at voltaire.com>
>
>Index: ../osm/include/opensm/osm_base.h
>===================================================================
>--- ../osm/include/opensm/osm_base.h	(revision 9347)
>+++ ../osm/include/opensm/osm_base.h	(working copy)
>@@ -176,16 +176,15 @@ BEGIN_C_DECLS
> *	OSM_DEFAULT_TMP_DIR
> *
> * DESCRIPTION
>-*	Specifies the default temporary directory for the log file, subnet.lst
>-*  and the other log files (with the exception of osm.log for Linux being 
>-*  in /var/log).
>+*	Specifies the default temporary directory for the log file,
>+*  subnet.lst, and other log files.
> *
> * SYNOPSIS
> */
> #ifdef __WIN__
> #define OSM_DEFAULT_TMP_DIR GetOsmTempPath()
> #else
>-#define OSM_DEFAULT_TMP_DIR "/tmp/"
>+#define OSM_DEFAULT_TMP_DIR "/var/log/"
> #endif
> /***********/
> 


From mst at mellanox.co.il  Thu Sep 14 07:06:51 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 17:06:51 +0300
Subject: [openib-general] [PATCH] OpenSM: Change default temp directory
 location for non Windows platforms
In-Reply-To: <1158239541.13748.52729.camel@hal.voltaire.com>
References: <1158239541.13748.52729.camel@hal.voltaire.com>
Message-ID: <20060914140651.GE25691@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: [PATCH] OpenSM: Change default temp directory location for non Windows platforms
> 
> OpenSM: Change default temp directory location for non Windows platforms
> 
> This patch is intended for both trunk and 1.1.

Could you please delay the commit till tomorrow so that we can get RC5 out?
This still can get in before final.

-- 
MST


From halr at voltaire.com  Thu Sep 14 07:05:58 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Sep 2006 10:05:58 -0400
Subject: [openib-general] [PATCH] OpenSM: Change default temp directory
 location for non Windows platforms
In-Reply-To: <45095B84.6050900@mellanox.co.il>
References: <1158239541.13748.52729.camel@hal.voltaire.com>
	<45095B84.6050900@mellanox.co.il>
Message-ID: <1158242732.13748.54281.camel@hal.voltaire.com>

Hi Eitan,

On Thu, 2006-09-14 at 09:39, Eitan Zahavi wrote:
> Hi Hal,
> Looks simple enough to get into the OFED 1.1
> I assume you are going to commit it into the branch?

Done (r9484).

-- Hal

> 
> EZ
> 
> Hal Rosenstock wrote:
> 
> >OpenSM: Change default temp directory location for non Windows platforms
> >
> >This patch is intended for both trunk and 1.1.
> >
> >Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> >
> >Index: ../osm/include/opensm/osm_base.h
> >===================================================================
> >--- ../osm/include/opensm/osm_base.h	(revision 9347)
> >+++ ../osm/include/opensm/osm_base.h	(working copy)
> >@@ -176,16 +176,15 @@ BEGIN_C_DECLS
> > *	OSM_DEFAULT_TMP_DIR
> > *
> > * DESCRIPTION
> >-*	Specifies the default temporary directory for the log file, subnet.lst
> >-*  and the other log files (with the exception of osm.log for Linux being 
> >-*  in /var/log).
> >+*	Specifies the default temporary directory for the log file,
> >+*  subnet.lst, and other log files.
> > *
> > * SYNOPSIS
> > */
> > #ifdef __WIN__
> > #define OSM_DEFAULT_TMP_DIR GetOsmTempPath()
> > #else
> >-#define OSM_DEFAULT_TMP_DIR "/tmp/"
> >+#define OSM_DEFAULT_TMP_DIR "/var/log/"
> > #endif
> > /***********/
> > 


From mst at mellanox.co.il  Thu Sep 14 07:19:01 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 17:19:01 +0300
Subject: [openib-general] Fwd: IPoIB Multicast
Message-ID: <20060914141901.GG25691@mellanox.co.il>


Subject: IPoIB Multicast
Date: Thu, 14 Sep 2006 17:08:55 +0300
From: "Eitan Zahavi" <eitan at mellanox.co.il>
> 
> Quoting the <draft-ietf-ipoib-ip-over-infiniband-09.txt>              
>     
>     A node joining an IP multicast group must first construct a MGID
>     according to the rule described in section 4 above. Once the correct
>     MGID is calculated, the node must call the SA of the outbound link
>     to attempt a "FullMember" join of the IB multicast group
>     corresponding to the MGID. If the IB multicast group doesn't already
>     exist, one must be created first with the IPoIB link MTU.  The MGID
>     MUST use the same P_Key, Q_Key, SL, MTU and HopLimit as those used
>     in the broadcast-GID. For the rest of attributes too, the values
>     used in the broadcast-GID SHOULD be used.

Hmm, IPoIB does not seem to copy anything except the pkey.
Looks like a compliance issue.

Specifically, I'm not sure what "other attributes" are, but I think
this should include the static rate. Right?

-- 
MST
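
The rule quoted from the draft can be sketched in a few lines of C. This is
a minimal illustration only, not the ipoib driver code: struct mcast_params,
its field names and inherit_from_broadcast() are hypothetical. Whether the
static rate belongs to the "rest of attributes" is the open question above,
so the sketch makes copying it an explicit choice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the parameters of an IB multicast group. */
struct mcast_params {
	uint16_t pkey;
	uint32_t qkey;
	uint8_t  sl;
	uint8_t  mtu;
	uint8_t  hop_limit;
	uint8_t  rate;
};

/*
 * Fill in the parameters for a new multicast group from the broadcast
 * group.  The first five fields are the ones the draft says MUST match;
 * the rate is copied only if the caller asks for it (assumption: it is
 * one of the SHOULD attributes).
 */
void inherit_from_broadcast(struct mcast_params *grp,
			    const struct mcast_params *bcast,
			    bool copy_rate)
{
	grp->pkey      = bcast->pkey;
	grp->qkey      = bcast->qkey;
	grp->sl        = bcast->sl;
	grp->mtu       = bcast->mtu;
	grp->hop_limit = bcast->hop_limit;

	if (copy_rate)
		grp->rate = bcast->rate;
}
```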


From thomas.bub at thomson.net  Thu Sep 14 07:28:20 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Thu, 14 Sep 2006 16:28:20 +0200
Subject: [openib-general] Different byte order between gen1 CM and gen2 CM
 ->RE: How to connect gen2 CM to gen1 IBGD CM?
Message-ID: <B79FAF8BB536314E859EA1963CFFD222029AC450@wdtssmail01.eu.thmulti.com>

Sean,
I should have checked this earlier, after you told me last time that the
gen2 CM takes the LID in network order instead of host order as in
gen1.
This time it was the service_id I stumbled over.
After putting my service_id into network order I could at least get a
REQ_RECEIVED.
The rest must be fine tuning from here onwards.
Do you know of any other Verbs or CM parameters that have a different
byte order between gen1 and gen2?
Thanks
Thomas

P.S.: Maybe someone should put a big "Warning" sign somewhere so that
others don't stumble into that pit again. ;-)

_____________________________________________
From: Bub Thomas 
Sent: Wednesday, September 13, 2006 4:11 PM
To: 'Sean Hefty'; 'Thomas.Bub at gmx.net'
Cc: openib-general at openib.org
Subject: How to connect gen2 CM to gen1 IBGD CM?

Sean,
With your patience, the cmpost.c example and OFED 1.1-rc4 on all
machines, I finally got a gen2 connection under SLES10, even with a 32-bit
executable on an x86_64 machine. Cool!

Now only the last part of my journey is outstanding:
a gen2 client connecting to a gen1 IBGD server.
I have to do this since my gen1 server is running a 2.4 MontaVista RT
Linux on a PowerPC that I can't upgrade to gen2. :-(
BTW.: Our application is a high speed film image transfer in the film
postproduction industry leveraging the benefits of the high speed IB
RDMA transport. 

While I have gen1 to gen1 and gen2 to gen2 running the only thing that
is missing is the gen2 connecting to gen1.

Just tried this with my test-executables but I did not get anything to
the gen1 server. The gen1 userspace application does not even receive
the IB_CM_REQ.

So since your cmpost example did help me a lot on gen2 the question is:
Do you have a cmpost for gen1 IBGD I can use to connect from gen2 to
gen1?
Or is there any other trick to play here?

Thanks in advance for your assistance
Thomas

............................................................
Thomas Bub
Grass Valley Germany GmbH
Brunnenweg 9
64331 Weiterstadt, Germany
Tel: +49 6150 104 147
Fax: +49 6150 104 656
Email: Thomas.Bub at thomson.net
www.GrassValley.com  <http://www.grassvalley.com> 
............................................................



From mshefty at ichips.intel.com  Thu Sep 14 08:25:11 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 14 Sep 2006 08:25:11 -0700
Subject: [openib-general] Different byte order between gen1 CM and gen2
 CM ->RE: How to connect gen2 CM to gen1 IBGD CM?
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD222029AC450@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD222029AC450@wdtssmail01.eu.thmulti.com>
Message-ID: <45097457.5020007@ichips.intel.com>

Bub Thomas wrote:
> Do you know of any other Verbs or CM parameters that have a different
> byte order between gen1 and gen2?

I'm not really familiar with the gen1 code.

> P.S.: Maybe someone should put a big “Warning” sign somewhere so that
> others don’t stumble into that pit again. ;-)

The kernel APIs are fairly clear about byte ordering, but that
documentation didn't carry up to userspace everywhere.  I will update the
userspace documentation, but it may take me a few weeks to get to this.

- Sean
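
As a concrete illustration of the pitfall discussed here, the following
self-contained C sketch converts a service ID and a LID from host order to
network (big-endian) order before storing them. It assumes, as reported in
this thread, that the gen2 CM expects both in network order. The struct and
function names are hypothetical and are not part of libibcm or the kernel
CM; check the headers of the API actually in use for the expected order of
each field.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical request parameters.  Both fields hold values in network
 * (big-endian) byte order.
 */
struct cm_req_params {
	uint64_t service_id;
	uint16_t dlid;
};

/* Return a 64-bit value whose in-memory bytes are big-endian. */
uint64_t to_be64(uint64_t host)
{
	unsigned char bytes[8];
	uint64_t out;
	int i;

	for (i = 0; i < 8; i++)
		bytes[i] = (unsigned char)(host >> (56 - 8 * i));
	memcpy(&out, bytes, sizeof out);
	return out;
}

/* Return a 16-bit value whose in-memory bytes are big-endian. */
uint16_t to_be16(uint16_t host)
{
	unsigned char bytes[2];
	uint16_t out;

	bytes[0] = (unsigned char)(host >> 8);
	bytes[1] = (unsigned char)(host & 0xff);
	memcpy(&out, bytes, sizeof out);
	return out;
}

/* Convert once, at the point where host-order values enter the request. */
void fill_req_params(struct cm_req_params *p,
		     uint64_t host_service_id, uint16_t host_lid)
{
	p->service_id = to_be64(host_service_id);
	p->dlid       = to_be16(host_lid);
}
```

Doing the conversion in one place keeps the rest of the application in host
order, which makes a gen1-to-gen2 port easier to audit.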


From rdreier at cisco.com  Thu Sep 14 08:30:29 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Sep 2006 08:30:29 -0700
Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change
 CMA config name
In-Reply-To: <Pine.LNX.4.64.0609141005480.7597@zuben> (Or Gerlitz's
	message of "Thu, 14 Sep 2006 10:12:36 +0300 (IDT)")
References: <Pine.LNX.4.64.0609141005480.7597@zuben>
Message-ID: <aday7smwjmy.fsf@cisco.com>

    Or> change INFINIBAND_ADDR_TRANS to INFINIBAND_RDMA_CM and add
    Or> help text clarifying what the thing does. Adding the help text
    Or> also has the side effect of the cma config being visible when
    Or> one does make menuconfig

Why do we want to make this config option visible?  Isn't it better
for it to just take the right value automatically?

 - R.


From rdreier at cisco.com  Thu Sep 14 08:31:45 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Sep 2006 08:31:45 -0700
Subject: [openib-general] [PATCH] IB/iser: fix iSER description and
 selections in Kconfig
In-Reply-To: <45091AC4.3090005@voltaire.com> (Erez Zilber's message of
	"Thu, 14 Sep 2006 12:03:00 +0300")
References: <200609071902.57379.toralf.foerster@gmx.de>
	<200609101343.02740.toralf.foerster@gmx.de>
	<450401AE.2030606@voltaire.com>
	<200609101645.22695.toralf.foerster@gmx.de>
	<4505032B.3050706@voltaire.com> <ada1wqi79mb.fsf@cisco.com>
	<450912C0.8070807@voltaire.com> <45091AC4.3090005@voltaire.com>
Message-ID: <adau03awjku.fsf@cisco.com>

Wouldn't it be better just to depend on INET the way ISCSI_TCP does?
'select' is more fragile and harder to maintain than 'depends' since
you always have to make sure you select the full dependency tree of
every option you really need.

 - R.


From mshefty at ichips.intel.com  Thu Sep 14 08:35:44 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 14 Sep 2006 08:35:44 -0700
Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change
 CMA config name
In-Reply-To: <Pine.LNX.4.64.0609141005480.7597@zuben>
References: <Pine.LNX.4.64.0609141005480.7597@zuben>
Message-ID: <450976D0.8020803@ichips.intel.com>

Or Gerlitz wrote:
> change INFINIBAND_ADDR_TRANS to INFINIBAND_RDMA_CM and add help text
> clarifying what the thing does. Adding the help text also has the side
> effect of the cma config being visible when one does make menuconfig
> 
> Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>

Acked-by: Sean Hefty <sean.hefty at intel.com>

Were you wanting this for 2.6.19?


From eli at dev.mellanox.co.il  Thu Sep 14 08:47:54 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Thu, 14 Sep 2006 18:47:54 +0300
Subject: [openib-general] ipoib send only failure
Message-ID: <1158248874.18456.9.camel@localhost>

Hi,
when running a test I encountered the following scenario:
the test sends to a multicast address;
ipoib issues a send-only join, which fails.
Subsequent joins to this group are not attempted, because the query
field of the mcast object still holds the old pointer.
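
The failure mode can be reproduced with a toy model in plain C. This is
only a sketch under the assumption stated in the report, namely that the
send path skips the join whenever the query pointer is non-NULL. All names
(struct mcast_group, maybe_start_join, join_failed) are hypothetical and
are not the ipoib driver code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for an in-flight SA query. */
struct sa_query {
	int id;
};

/* Toy model of a multicast group entry. */
struct mcast_group {
	struct sa_query *query;	/* non-NULL is read as "join in progress" */
	bool busy;
	int join_attempts;
};

static struct sa_query dummy_query;

/* Send path: start a join only if none appears to be in progress. */
bool maybe_start_join(struct mcast_group *m)
{
	if (m->query)
		return false;

	m->query = &dummy_query;
	m->busy = true;
	m->join_attempts++;
	return true;
}

/*
 * Completion handler for a failed join.  clear_query == false models
 * the reported behaviour (stale pointer left behind); true models
 * resetting the pointer so that a later send can retry.
 */
void join_failed(struct mcast_group *m, bool clear_query)
{
	m->busy = false;
	if (clear_query)
		m->query = NULL;
}
```

With clear_query set to false, a second call to maybe_start_join() returns
false and the group is never joined; with true, the retry goes ahead.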


From eli at dev.mellanox.co.il  Thu Sep 14 08:47:58 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Thu, 14 Sep 2006 18:47:58 +0300
Subject: [openib-general]  [PATCH] ipoib sendonly join
Message-ID: <1158248878.18456.11.camel@localhost>

When a sendonly join fails, mcast->query must be set to NULL so
that successive joins will be attempted for the group.

Signed-off-by: Eli Cohen <eli at mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

Index: openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- openib-1.1.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2006-09-12 14:28:33.000000000 +0300
+++ openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2006-09-14 17:17:12.000000000 +0300
@@ -326,6 +326,7 @@
 
 		/* Clear the busy flag so we try again */
 		clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
+		mcast->query = NULL;
 	}
 
 	complete(&mcast->done);


From sean.hefty at intel.com  Thu Sep 14 09:33:16 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 14 Sep 2006 09:33:16 -0700
Subject: [openib-general] [PATCH v2] ib_sa: add generic RMPP query interface
In-Reply-To: <000601c6c580$8343eb30$8698070a@amr.corp.intel.com>
Message-ID: <000101c6d81b$7b2f6ca0$97d8180a@amr.corp.intel.com>

Patch updated to svn tip, which includes SA registration.

The following patch adds a generic interface to send MADs to the SA.
The primary motivation of adding these calls is to expand the SA query
interface to include RMPP responses for users wanting more than a
single attribute returned from a query (e.g. multipath record queries),
but it also simplifies a userspace interface.

The implementations of the existing SA query routines were layered on top
of the generic query interface.

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
Index: include/rdma/ib_sa.h
===================================================================
--- include/rdma/ib_sa.h	(revision 9490)
+++ include/rdma/ib_sa.h	(working copy)
@@ -82,6 +82,32 @@ enum {
 	IB_SA_ATTR_INFORM_INFO_REC   = 0xf3
 };
 
+/* Length of SA attributes on the wire */
+enum {
+	IB_SA_ATTR_CLASS_PORTINFO_LEN	= 72,
+	IB_SA_ATTR_NOTICE_LEN		= 80,
+	IB_SA_ATTR_INFORM_INFO_LEN	= 36,
+	IB_SA_ATTR_NODE_REC_LEN		= 108,
+	IB_SA_ATTR_PORT_INFO_REC_LEN	= 58,
+	IB_SA_ATTR_SL2VL_REC_LEN	= 16,
+	IB_SA_ATTR_SWITCH_REC_LEN	= 21,
+	IB_SA_ATTR_LINEAR_FDB_REC_LEN	= 72,
+	IB_SA_ATTR_RANDOM_FDB_REC_LEN	= 72,
+	IB_SA_ATTR_MCAST_FDB_REC_LEN	= 72,
+	IB_SA_ATTR_SM_INFO_REC_LEN	= 25,
+	IB_SA_ATTR_LINK_REC_LEN		= 6,
+	IB_SA_ATTR_GUID_INFO_REC_LEN	= 72,
+	IB_SA_ATTR_SERVICE_REC_LEN	= 176,
+	IB_SA_ATTR_PARTITION_REC_LEN	= 72,
+	IB_SA_ATTR_PATH_REC_LEN		= 64,
+	IB_SA_ATTR_VL_ARB_REC_LEN	= 72,
+	IB_SA_ATTR_MC_MEMBER_REC_LEN	= 52,
+	IB_SA_ATTR_TRACE_REC_LEN	= 46,
+	IB_SA_ATTR_MULTI_PATH_REC_LEN	= 56,
+	IB_SA_ATTR_SERVICE_ASSOC_REC_LEN= 80,
+	IB_SA_ATTR_INFORM_INFO_REC_LEN	= 60
+};
+
 enum ib_sa_selector {
 	IB_SA_GTE  = 0,
 	IB_SA_LTE  = 1,
@@ -270,10 +296,83 @@ void ib_sa_register_client(struct ib_sa_
  */
 void ib_sa_unregister_client(struct ib_sa_client *client);
 
+struct ib_sa_iter;
+
+/**
+ * ib_sa_iter_create - Create an iterator that may be used to walk through
+ *   a list of returned SA records.
+ * @mad_recv_wc: A received response from the SA.
+ *
+ * This call allocates an iterator that is used to walk through a list of 
+ * SA records.  Users must free the iterator by calling ib_sa_iter_free.
+ */
+struct ib_sa_iter *ib_sa_iter_create(struct ib_mad_recv_wc *mad_recv_wc);
+
+/**
+ * ib_sa_iter_free - Release an iterator.
+ * @iter: The iterator to free.
+ */
+void ib_sa_iter_free(struct ib_sa_iter *iter);
+
+/**
+ * ib_sa_iter_next - Move an iterator to reference the next attribute and
+ *   return the attribute.
+ * @iter: The iterator to move.
+ *
+ * The referenced attribute will be in wire format.  The function returns NULL
+ * if there are no more attributes to return.
+ */
+void *ib_sa_iter_next(struct ib_sa_iter *iter);
+
+/**
+ * ib_sa_attr_size - Return the length of an SA attribute on the wire.
+ * @attr_id: Attribute identifier.
+ */
+int ib_sa_attr_size(__be16 attr_id);
+
 struct ib_sa_query;
 
 void ib_sa_cancel_query(int id, struct ib_sa_query *query);
 
+/**
+ * ib_sa_send_mad - Send a MAD to the SA.
+ * @client:SA client
+ * @device:device to send query on
+ * @port_num: port number to send query on
+ * @method:MAD method to use in the send.
+ * @attr:Reference to attribute in wire format to send in MAD.
+ * @attr_id:Attribute type identifier.
+ * @comp_mask:component mask to send in MAD
+ * @timeout_ms:time to wait for response, if one is expected
+ * @retries:number of times to retry request
+ * @gfp_mask:GFP mask to use for internal allocations
+ * @callback:function called when query completes, times out or is
+ * canceled
+ * @context:opaque user context passed to callback
+ * @sa_query:query context, used to cancel query
+ *
+ * Send a message to the SA.  If a response is expected (timeout_ms is
+ * non-zero), the callback function will be called when the query completes.
+ * Status is 0 for a successful response, -EINTR if the query
+ * is canceled, -ETIMEDOUT if the query timed out, or -EIO if an error
+ * occurred sending the query.  Mad_recv_wc will reference any returned
+ * response from the SA.  It is the responsibility of the caller to free
+ * mad_recv_wc by calling ib_free_recv_mad() if it is non-NULL.
+ *
+ * If the return value of ib_sa_send_mad() is negative, it is an
+ * error code.  Otherwise it is a query ID that can be used to cancel
+ * the query.
+ */
+int ib_sa_send_mad(struct ib_sa_client *client,
+		   struct ib_device *device, u8 port_num,
+		   int method, void *attr, __be16 attr_id,
+		   ib_sa_comp_mask comp_mask,
+		   int timeout_ms, int retries, gfp_t gfp_mask,
+		   void (*callback)(int status,
+				    struct ib_mad_recv_wc *mad_recv_wc,
+				    void *context),
+		   void *context, struct ib_sa_query **query);
+
 int ib_sa_path_rec_get(struct ib_sa_client *client,
 		       struct ib_device *device, u8 port_num,
 		       struct ib_sa_path_rec *rec,
Index: core/sa_query.c
===================================================================
--- core/sa_query.c	(revision 9490)
+++ core/sa_query.c	(working copy)
@@ -73,31 +73,42 @@ struct ib_sa_device {
 };
 
 struct ib_sa_query {
-	void (*callback)(struct ib_sa_query *, int, struct ib_sa_mad *);
-	void (*release)(struct ib_sa_query *);
+	void (*callback)(int, struct ib_mad_recv_wc *, void *);
 	struct ib_sa_client    *client;
 	struct ib_sa_port      *port;
 	struct ib_mad_send_buf *mad_buf;
 	struct ib_sa_sm_ah     *sm_ah;
+	void		       *context;
 	int			id;
 };
 
 struct ib_sa_service_query {
 	void (*callback)(int, struct ib_sa_service_rec *, void *);
 	void *context;
-	struct ib_sa_query sa_query;
+	struct ib_sa_query *sa_query;
 };
 
 struct ib_sa_path_query {
 	void (*callback)(int, struct ib_sa_path_rec *, void *);
 	void *context;
-	struct ib_sa_query sa_query;
+	struct ib_sa_query *sa_query;
 };
 
 struct ib_sa_mcmember_query {
 	void (*callback)(int, struct ib_sa_mcmember_rec *, void *);
 	void *context;
-	struct ib_sa_query sa_query;
+	struct ib_sa_query *sa_query;
+};
+
+struct ib_sa_iter {
+	struct ib_mad_recv_wc *recv_wc;
+	struct ib_mad_recv_buf *recv_buf;
+	int attr_size;
+	int attr_offset;
+	int data_offset;
+	int data_left;
+	void *attr;
+	u8 attr_data[0];
 };
 
 static void ib_sa_add_one(struct ib_device *device);
@@ -532,9 +543,17 @@ EXPORT_SYMBOL(ib_init_ah_from_mcmember);
 int ib_sa_pack_attr(void *dst, void *src, int attr_id)
 {
 	switch (attr_id) {
+	case IB_SA_ATTR_SERVICE_REC:
+		ib_pack(service_rec_table, ARRAY_SIZE(service_rec_table),
+			src, dst);
+		break;
 	case IB_SA_ATTR_PATH_REC:
 		ib_pack(path_rec_table, ARRAY_SIZE(path_rec_table), src, dst);
 		break;
+	case IB_SA_ATTR_MC_MEMBER_REC:
+		ib_pack(mcmember_rec_table, ARRAY_SIZE(mcmember_rec_table),
+			src, dst);
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -545,9 +564,17 @@ EXPORT_SYMBOL(ib_sa_pack_attr);
 int ib_sa_unpack_attr(void *dst, void *src, int attr_id)
 {
 	switch (attr_id) {
+	case IB_SA_ATTR_SERVICE_REC:
+		ib_unpack(service_rec_table, ARRAY_SIZE(service_rec_table),
+			  src, dst);
+		break;
 	case IB_SA_ATTR_PATH_REC:
 		ib_unpack(path_rec_table, ARRAY_SIZE(path_rec_table), src, dst);
 		break;
+	case IB_SA_ATTR_MC_MEMBER_REC:
+		ib_unpack(mcmember_rec_table, ARRAY_SIZE(mcmember_rec_table),
+			  src, dst);
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -555,15 +582,100 @@ int ib_sa_unpack_attr(void *dst, void *s
 }
 EXPORT_SYMBOL(ib_sa_unpack_attr);
 
-static void init_mad(struct ib_sa_mad *mad, struct ib_mad_agent *agent)
+/* Return size of SA attributes on the wire. */
+int ib_sa_attr_size(__be16 attr_id)
 {
-	unsigned long flags;
+	int size;
 
-	memset(mad, 0, sizeof *mad);
+	switch (be16_to_cpu(attr_id)) {
+	case IB_SA_ATTR_CLASS_PORTINFO:
+		size = IB_SA_ATTR_CLASS_PORTINFO_LEN;
+		break;
+	case IB_SA_ATTR_NOTICE:
+		size = IB_SA_ATTR_NOTICE_LEN;
+		break;
+	case IB_SA_ATTR_INFORM_INFO:
+		size = IB_SA_ATTR_INFORM_INFO_LEN;
+		break;
+	case IB_SA_ATTR_NODE_REC:
+		size = IB_SA_ATTR_NODE_REC_LEN;
+		break;
+	case IB_SA_ATTR_PORT_INFO_REC:
+		size = IB_SA_ATTR_PORT_INFO_REC_LEN;
+		break;
+	case IB_SA_ATTR_SL2VL_REC:
+		size = IB_SA_ATTR_SL2VL_REC_LEN;
+		break;
+	case IB_SA_ATTR_SWITCH_REC:
+		size = IB_SA_ATTR_SWITCH_REC_LEN;
+		break;
+	case IB_SA_ATTR_LINEAR_FDB_REC:
+		size = IB_SA_ATTR_LINEAR_FDB_REC_LEN;
+		break;
+	case IB_SA_ATTR_RANDOM_FDB_REC:
+		size = IB_SA_ATTR_RANDOM_FDB_REC_LEN;
+		break;
+	case IB_SA_ATTR_MCAST_FDB_REC:
+		size = IB_SA_ATTR_MCAST_FDB_REC_LEN;
+		break;
+	case IB_SA_ATTR_SM_INFO_REC:
+		size = IB_SA_ATTR_SM_INFO_REC_LEN;
+		break;
+	case IB_SA_ATTR_LINK_REC:
+		size = IB_SA_ATTR_LINK_REC_LEN;
+		break;
+	case IB_SA_ATTR_GUID_INFO_REC:
+		size = IB_SA_ATTR_GUID_INFO_REC_LEN;
+		break;
+	case IB_SA_ATTR_SERVICE_REC:
+		size = IB_SA_ATTR_SERVICE_REC_LEN;
+		break;
+	case IB_SA_ATTR_PARTITION_REC:
+		size = IB_SA_ATTR_PARTITION_REC_LEN;
+		break;
+	case IB_SA_ATTR_PATH_REC:
+		size = IB_SA_ATTR_PATH_REC_LEN;
+		break;
+	case IB_SA_ATTR_VL_ARB_REC:
+		size = IB_SA_ATTR_VL_ARB_REC_LEN;
+		break;
+	case IB_SA_ATTR_MC_MEMBER_REC:
+		size = IB_SA_ATTR_MC_MEMBER_REC_LEN;
+		break;
+	case IB_SA_ATTR_TRACE_REC:
+		size = IB_SA_ATTR_TRACE_REC_LEN;
+		break;
+	case IB_SA_ATTR_MULTI_PATH_REC:
+		size = IB_SA_ATTR_MULTI_PATH_REC_LEN;
+		break;
+	case IB_SA_ATTR_SERVICE_ASSOC_REC:
+		size = IB_SA_ATTR_SERVICE_ASSOC_REC_LEN;
+		break;
+	case IB_SA_ATTR_INFORM_INFO_REC:
+		size = IB_SA_ATTR_INFORM_INFO_REC_LEN;
+		break;
+	default:
+		size = 0;
+		break;
+	}
+	return size;
+}
+EXPORT_SYMBOL(ib_sa_attr_size);
+
+static void init_mad(struct ib_sa_mad *mad, struct ib_mad_agent *agent,
+		     int method, void *attr, __be16 attr_id,
+		     ib_sa_comp_mask comp_mask)
+{
+	unsigned long flags;
 
 	mad->mad_hdr.base_version  = IB_MGMT_BASE_VERSION;
 	mad->mad_hdr.mgmt_class    = IB_MGMT_CLASS_SUBN_ADM;
 	mad->mad_hdr.class_version = IB_SA_CLASS_VERSION;
+	mad->mad_hdr.method	   = method;
+	mad->mad_hdr.attr_id	   = attr_id;
+	mad->sa_hdr.comp_mask	   = comp_mask;
+
+	memcpy(mad->data, attr, ib_sa_attr_size(attr_id));
 
 	spin_lock_irqsave(&tid_lock, flags);
 	mad->mad_hdr.tid           =
@@ -617,31 +729,162 @@ retry:
 	return ret ? ret : id;
 }
 
-static void ib_sa_path_rec_callback(struct ib_sa_query *sa_query,
-				    int status,
-				    struct ib_sa_mad *mad)
+struct ib_sa_iter *ib_sa_iter_create(struct ib_mad_recv_wc *mad_recv_wc)
+{
+	struct ib_sa_iter *iter;
+	struct ib_sa_mad *mad = (struct ib_sa_mad *) mad_recv_wc->recv_buf.mad;
+	int attr_size, attr_offset;
+
+	attr_offset = be16_to_cpu(mad->sa_hdr.attr_offset) * 8;
+	attr_size = ib_sa_attr_size(mad->mad_hdr.attr_id);
+	if (!attr_size || attr_offset < attr_size)
+		return ERR_PTR(-EINVAL);
+
+	iter = kzalloc(sizeof *iter + attr_size, GFP_KERNEL);
+	if (!iter)
+		return ERR_PTR(-ENOMEM);
+
+	iter->data_left = mad_recv_wc->mad_len - IB_MGMT_SA_HDR;
+	iter->recv_wc = mad_recv_wc;
+	iter->recv_buf = &mad_recv_wc->recv_buf;
+	iter->attr_offset = attr_offset;
+	iter->attr_size = attr_size;
+	return iter;
+}
+EXPORT_SYMBOL(ib_sa_iter_create);
+
+void ib_sa_iter_free(struct ib_sa_iter *iter)
+{
+	kfree(iter);
+}
+EXPORT_SYMBOL(ib_sa_iter_free);
+
+void *ib_sa_iter_next(struct ib_sa_iter *iter)
+{
+	struct ib_sa_mad *mad;
+	int left, offset = 0;
+
+	while (iter->data_left >= iter->attr_offset) {
+		while (iter->data_offset < IB_MGMT_SA_DATA) {
+			mad = (struct ib_sa_mad *) iter->recv_buf->mad;
+
+			left = IB_MGMT_SA_DATA - iter->data_offset;
+			if (left < iter->attr_size) {
+				/* copy first piece of the attribute */
+				iter->attr = &iter->attr_data;
+				memcpy(iter->attr,
+				       &mad->data[iter->data_offset], left);
+				offset = left;
+				break;
+			} else if (offset) {
+				/* copy the second piece of the attribute */
+				memcpy(iter->attr + offset, &mad->data[0],
+				       iter->attr_size - offset);
+				iter->data_offset = iter->attr_size - offset;
+				offset = 0;
+			} else {
+				iter->attr = &mad->data[iter->data_offset];
+				iter->data_offset += iter->attr_size;
+			}
+
+			iter->data_left -= iter->attr_offset;
+			goto out;
+		}
+		iter->data_offset = 0;
+		iter->recv_buf = list_entry(iter->recv_buf->list.next,
+					    struct ib_mad_recv_buf, list);
+	}
+	iter->attr = NULL;
+out:
+	return iter->attr;
+}
+EXPORT_SYMBOL(ib_sa_iter_next);
+
+int ib_sa_send_mad(struct ib_sa_client *client,
+		   struct ib_device *device, u8 port_num,
+		   int method, void *attr, __be16 attr_id,
+		   ib_sa_comp_mask comp_mask,
+		   int timeout_ms, int retries, gfp_t gfp_mask,
+		   void (*callback)(int status,
+				    struct ib_mad_recv_wc *mad_recv_wc,
+				    void *context),
+		   void *context, struct ib_sa_query **query)
 {
-	struct ib_sa_path_query *query =
-		container_of(sa_query, struct ib_sa_path_query, sa_query);
+	struct ib_sa_query  *sa_query;
+	struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
+	struct ib_sa_port   *port;
+	struct ib_mad_agent *agent;
+	int ret;
+
+	if (!sa_dev)
+		return -ENODEV;
+
+	port  = &sa_dev->port[port_num - sa_dev->start_port];
+	agent = port->agent;
+
+	sa_query = kmalloc(sizeof *sa_query, gfp_mask);
+	if (!sa_query)
+		return -ENOMEM;
+
+	sa_query->mad_buf = ib_create_send_mad(agent, 1, 0,
+					       method == IB_SA_METHOD_GET_MULTI,
+					       IB_MGMT_SA_HDR, IB_MGMT_SA_DATA,
+					       gfp_mask);
+	if (!sa_query->mad_buf) {
+		ret = -ENOMEM;
+		goto err1;
+	}
 
-	if (mad) {
-		struct ib_sa_path_rec rec;
+	sa_query->port	   = port;
+	sa_query->callback = callback;
+	sa_query->context  = context;
 
-		ib_unpack(path_rec_table, ARRAY_SIZE(path_rec_table),
-			  mad->data, &rec);
-		query->callback(status, &rec, query->context);
-	} else
-		query->callback(status, NULL, query->context);
+	init_mad(sa_query->mad_buf->mad, agent, method, attr, attr_id,
+		 comp_mask);
+
+	ib_sa_client_get(client);
+	sa_query->client = client;
+	ret = send_mad(sa_query, timeout_ms, retries, gfp_mask);
+	if (ret < 0)
+		goto err2;
+
+	*query = sa_query;
+	return ret;
+
+err2:
+	ib_sa_client_put(sa_query->client);
+	ib_free_send_mad(sa_query->mad_buf);
+err1:
+	kfree(sa_query);
+	return ret;
 }
+EXPORT_SYMBOL(ib_sa_send_mad);
 
-static void ib_sa_path_rec_release(struct ib_sa_query *sa_query)
+static void ib_sa_path_rec_callback(int status,
+				    struct ib_mad_recv_wc *mad_recv_wc,
+				    void *context)
 {
-	kfree(container_of(sa_query, struct ib_sa_path_query, sa_query));
+	struct ib_sa_path_query *query = context;
+
+	if (query->callback) {
+		if (mad_recv_wc) {
+			struct ib_sa_mad *mad;
+			struct ib_sa_path_rec rec;
+
+			mad = (struct ib_sa_mad *) mad_recv_wc->recv_buf.mad;
+			ib_unpack(path_rec_table, ARRAY_SIZE(path_rec_table),
+				  mad->data, &rec);
+			query->callback(status, &rec, query->context);
+		} else
+			query->callback(status, NULL, query->context);
+	}
+	if (mad_recv_wc)
+		ib_free_recv_mad(mad_recv_wc);
+	kfree(query);
 }
 
 /**
  * ib_sa_path_rec_get - Start a Path get query
- * @client:SA client
  * @device:device to send query on
  * @port_num: port number to send query on
  * @rec:Path Record to send in query
@@ -677,91 +920,54 @@ int ib_sa_path_rec_get(struct ib_sa_clie
 		       struct ib_sa_query **sa_query)
 {
 	struct ib_sa_path_query *query;
-	struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
-	struct ib_sa_port   *port;
-	struct ib_mad_agent *agent;
-	struct ib_sa_mad *mad;
+	u8 path[IB_SA_ATTR_PATH_REC_LEN];
 	int ret;
 
-	if (!sa_dev)
-		return -ENODEV;
-
-	port  = &sa_dev->port[port_num - sa_dev->start_port];
-	agent = port->agent;
-
 	query = kmalloc(sizeof *query, gfp_mask);
 	if (!query)
 		return -ENOMEM;
 
-	query->sa_query.mad_buf = ib_create_send_mad(agent, 1, 0,
-						     0, IB_MGMT_SA_HDR,
-						     IB_MGMT_SA_DATA, gfp_mask);
-	if (!query->sa_query.mad_buf) {
-		ret = -ENOMEM;
-		goto err1;
-	}
-
-	ib_sa_client_get(client);
-	query->sa_query.client = client;
 	query->callback        = callback;
 	query->context         = context;
 
-	mad = query->sa_query.mad_buf->mad;
-	init_mad(mad, agent);
-
-	query->sa_query.callback = callback ? ib_sa_path_rec_callback : NULL;
-	query->sa_query.release  = ib_sa_path_rec_release;
-	query->sa_query.port     = port;
-	mad->mad_hdr.method	 = IB_MGMT_METHOD_GET;
-	mad->mad_hdr.attr_id	 = cpu_to_be16(IB_SA_ATTR_PATH_REC);
-	mad->sa_hdr.comp_mask	 = comp_mask;
-
-	ib_pack(path_rec_table, ARRAY_SIZE(path_rec_table), rec, mad->data);
-
-	*sa_query = &query->sa_query;
-
-	ret = send_mad(&query->sa_query, timeout_ms, retries, gfp_mask);
+	ib_pack(path_rec_table, ARRAY_SIZE(path_rec_table), rec, path);
+	ret = ib_sa_send_mad(client, device, port_num, IB_MGMT_METHOD_GET, path,
+			     cpu_to_be16(IB_SA_ATTR_PATH_REC), comp_mask,
+			     timeout_ms, retries, gfp_mask,
+			     ib_sa_path_rec_callback, query, &query->sa_query);
 	if (ret < 0)
-		goto err2;
+		kfree(query);
 
 	return ret;
-
-err2:
-	*sa_query = NULL;
-	ib_sa_client_put(query->sa_query.client);
-	ib_free_send_mad(query->sa_query.mad_buf);
-
-err1:
-	kfree(query);
-	return ret;
 }
 EXPORT_SYMBOL(ib_sa_path_rec_get);
 
-static void ib_sa_service_rec_callback(struct ib_sa_query *sa_query,
-				    int status,
-				    struct ib_sa_mad *mad)
+static void ib_sa_service_rec_callback(int status,
+				       struct ib_mad_recv_wc *mad_recv_wc,
+				       void *context)
 {
-	struct ib_sa_service_query *query =
-		container_of(sa_query, struct ib_sa_service_query, sa_query);
+	struct ib_sa_service_query *query = context;
 
-	if (mad) {
-		struct ib_sa_service_rec rec;
-
-		ib_unpack(service_rec_table, ARRAY_SIZE(service_rec_table),
-			  mad->data, &rec);
-		query->callback(status, &rec, query->context);
-	} else
-		query->callback(status, NULL, query->context);
-}
-
-static void ib_sa_service_rec_release(struct ib_sa_query *sa_query)
-{
-	kfree(container_of(sa_query, struct ib_sa_service_query, sa_query));
+	if (query->callback) {
+		if (mad_recv_wc) {
+			struct ib_sa_mad *mad;
+			struct ib_sa_service_rec rec;
+
+			mad = (struct ib_sa_mad *) mad_recv_wc->recv_buf.mad;
+			ib_unpack(service_rec_table,
+				  ARRAY_SIZE(service_rec_table),
+				  mad->data, &rec);
+			query->callback(status, &rec, query->context);
+		} else
+			query->callback(status, NULL, query->context);
+	}
+	if (mad_recv_wc)
+		ib_free_recv_mad(mad_recv_wc);
+	kfree(query);
 }
 
 /**
  * ib_sa_service_rec_query - Start Service Record operation
- * @client:SA client
  * @device:device to send request on
  * @port_num: port number to send request on
  * @method:SA method - should be get, set, or delete
@@ -799,98 +1005,56 @@ int ib_sa_service_rec_query(struct ib_sa
 			    struct ib_sa_query **sa_query)
 {
 	struct ib_sa_service_query *query;
-	struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
-	struct ib_sa_port   *port;
-	struct ib_mad_agent *agent;
-	struct ib_sa_mad *mad;
+	u8 service[IB_SA_ATTR_SERVICE_REC_LEN];
 	int ret;
 
-	if (!sa_dev)
-		return -ENODEV;
-
-	port  = &sa_dev->port[port_num - sa_dev->start_port];
-	agent = port->agent;
-
-	if (method != IB_MGMT_METHOD_GET &&
-	    method != IB_MGMT_METHOD_SET &&
-	    method != IB_SA_METHOD_DELETE)
-		return -EINVAL;
-
 	query = kmalloc(sizeof *query, gfp_mask);
 	if (!query)
 		return -ENOMEM;
 
-	query->sa_query.mad_buf = ib_create_send_mad(agent, 1, 0,
-						     0, IB_MGMT_SA_HDR,
-						     IB_MGMT_SA_DATA, gfp_mask);
-	if (!query->sa_query.mad_buf) {
-		ret = -ENOMEM;
-		goto err1;
-	}
-
-	ib_sa_client_get(client);
-	query->sa_query.client = client;
 	query->callback        = callback;
 	query->context         = context;
 
-	mad = query->sa_query.mad_buf->mad;
-	init_mad(mad, agent);
-
-	query->sa_query.callback = callback ? ib_sa_service_rec_callback : NULL;
-	query->sa_query.release  = ib_sa_service_rec_release;
-	query->sa_query.port     = port;
-	mad->mad_hdr.method	 = method;
-	mad->mad_hdr.attr_id	 = cpu_to_be16(IB_SA_ATTR_SERVICE_REC);
-	mad->sa_hdr.comp_mask	 = comp_mask;
-
-	ib_pack(service_rec_table, ARRAY_SIZE(service_rec_table),
-		rec, mad->data);
-
-	*sa_query = &query->sa_query;
-
-	ret = send_mad(&query->sa_query, timeout_ms, retries, gfp_mask);
+	ib_pack(service_rec_table, ARRAY_SIZE(service_rec_table), rec, service);
+	ret = ib_sa_send_mad(client, device, port_num, method, service,
+			     cpu_to_be16(IB_SA_ATTR_SERVICE_REC), comp_mask,
+			     timeout_ms, retries, gfp_mask,
+			     ib_sa_service_rec_callback, query,
+			     &query->sa_query);
 	if (ret < 0)
-		goto err2;
-
-	return ret;
+		kfree(query);
 
-err2:
-	*sa_query = NULL;
-	ib_sa_client_put(query->sa_query.client);
-	ib_free_send_mad(query->sa_query.mad_buf);
-
-err1:
-	kfree(query);
 	return ret;
 }
 EXPORT_SYMBOL(ib_sa_service_rec_query);
 
-static void ib_sa_mcmember_rec_callback(struct ib_sa_query *sa_query,
-					int status,
-					struct ib_sa_mad *mad)
+static void ib_sa_mcmember_rec_callback(int status,
+				        struct ib_mad_recv_wc *mad_recv_wc,
+				        void *context)
 {
-	struct ib_sa_mcmember_query *query =
-		container_of(sa_query, struct ib_sa_mcmember_query, sa_query);
-
-	if (mad) {
-		struct ib_sa_mcmember_rec rec;
-
-		ib_unpack(mcmember_rec_table, ARRAY_SIZE(mcmember_rec_table),
-			  mad->data, &rec);
-		query->callback(status, &rec, query->context);
-	} else
-		query->callback(status, NULL, query->context);
-}
+	struct ib_sa_mcmember_query *query = context;
 
-static void ib_sa_mcmember_rec_release(struct ib_sa_query *sa_query)
-{
-	kfree(container_of(sa_query, struct ib_sa_mcmember_query, sa_query));
+	if (query->callback) {
+		if (mad_recv_wc) {
+			struct ib_sa_mad *mad;
+			struct ib_sa_mcmember_rec rec;
+
+			mad = (struct ib_sa_mad *) mad_recv_wc->recv_buf.mad;
+			ib_unpack(mcmember_rec_table,
+				  ARRAY_SIZE(mcmember_rec_table),
+				  mad->data, &rec);
+			query->callback(status, &rec, query->context);
+		} else
+			query->callback(status, NULL, query->context);
+	}
+	if (mad_recv_wc)
+		ib_free_recv_mad(mad_recv_wc);
+	kfree(query);
 }
 
 int ib_sa_mcmember_rec_query(struct ib_sa_client *client,
 			     struct ib_device *device, u8 port_num,
-			     u8 method,
-			     struct ib_sa_mcmember_rec *rec,
+			     u8 method, struct ib_sa_mcmember_rec *rec,
 			     ib_sa_comp_mask comp_mask,
 			     int timeout_ms, int retries, gfp_t gfp_mask,
 			     void (*callback)(int status,
@@ -900,64 +1064,27 @@ int ib_sa_mcmember_rec_query(struct ib_s
 			     struct ib_sa_query **sa_query)
 {
 	struct ib_sa_mcmember_query *query;
-	struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
-	struct ib_sa_port   *port;
-	struct ib_mad_agent *agent;
-	struct ib_sa_mad *mad;
+	u8 mcmember[IB_SA_ATTR_MC_MEMBER_REC_LEN];
 	int ret;
 
-	if (!sa_dev)
-		return -ENODEV;
-
-	port  = &sa_dev->port[port_num - sa_dev->start_port];
-	agent = port->agent;
-
 	query = kmalloc(sizeof *query, gfp_mask);
 	if (!query)
 		return -ENOMEM;
 
-	query->sa_query.mad_buf = ib_create_send_mad(agent, 1, 0,
-						     0, IB_MGMT_SA_HDR,
-						     IB_MGMT_SA_DATA, gfp_mask);
-	if (!query->sa_query.mad_buf) {
-		ret = -ENOMEM;
-		goto err1;
-	}
-
-	ib_sa_client_get(client);
-	query->sa_query.client = client;
 	query->callback        = callback;
 	query->context         = context;
 
-	mad = query->sa_query.mad_buf->mad;
-	init_mad(mad, agent);
-
-	query->sa_query.callback = callback ? ib_sa_mcmember_rec_callback : NULL;
-	query->sa_query.release  = ib_sa_mcmember_rec_release;
-	query->sa_query.port     = port;
-	mad->mad_hdr.method	 = method;
-	mad->mad_hdr.attr_id	 = cpu_to_be16(IB_SA_ATTR_MC_MEMBER_REC);
-	mad->sa_hdr.comp_mask	 = comp_mask;
-
 	ib_pack(mcmember_rec_table, ARRAY_SIZE(mcmember_rec_table),
-		rec, mad->data);
-
-	*sa_query = &query->sa_query;
-
-	ret = send_mad(&query->sa_query, timeout_ms, retries, gfp_mask);
+		rec, mcmember);
+	ret = ib_sa_send_mad(client, device, port_num, method, mcmember,
+			     cpu_to_be16(IB_SA_ATTR_MC_MEMBER_REC), comp_mask,
+			     timeout_ms, retries, gfp_mask,
+			     ib_sa_mcmember_rec_callback, query,
+			     &query->sa_query);
 	if (ret < 0)
-		goto err2;
+		kfree(query);
 
 	return ret;
-
-err2:
-	*sa_query = NULL;
-	ib_sa_client_put(query->sa_query.client);
-	ib_free_send_mad(query->sa_query.mad_buf);
-
-err1:
-	kfree(query);
-	return ret;
 }
 EXPORT_SYMBOL(ib_sa_mcmember_rec_query);
 
@@ -973,13 +1100,13 @@ static void send_handler(struct ib_mad_a
 			/* No callback -- already got recv */
 			break;
 		case IB_WC_RESP_TIMEOUT_ERR:
-			query->callback(query, -ETIMEDOUT, NULL);
+			query->callback(-ETIMEDOUT, NULL, query->context);
 			break;
 		case IB_WC_WR_FLUSH_ERR:
-			query->callback(query, -EINTR, NULL);
+			query->callback(-EINTR, NULL, query->context);
 			break;
 		default:
-			query->callback(query, -EIO, NULL);
+			query->callback(-EIO, NULL, query->context);
 			break;
 		}
 
@@ -990,7 +1117,7 @@ static void send_handler(struct ib_mad_a
         ib_free_send_mad(mad_send_wc->send_buf);
 	kref_put(&query->sm_ah->ref, free_sm_ah);
 	ib_sa_client_put(query->client);
-	query->release(query);
+	kfree(query);
 }
 
 static void recv_handler(struct ib_mad_agent *mad_agent,
@@ -1002,17 +1129,11 @@ static void recv_handler(struct ib_mad_a
 	mad_buf = (void *) (unsigned long) mad_recv_wc->wc->wr_id;
 	query = mad_buf->context[0];
 
-	if (query->callback) {
-		if (mad_recv_wc->wc->status == IB_WC_SUCCESS)
-			query->callback(query,
-					mad_recv_wc->recv_buf.mad->mad_hdr.status ?
-					-EINVAL : 0,
-					(struct ib_sa_mad *) mad_recv_wc->recv_buf.mad);
-		else
-			query->callback(query, -EIO, NULL);
-	}
-
-	ib_free_recv_mad(mad_recv_wc);
+	if (query->callback)
+		query->callback(mad_recv_wc->recv_buf.mad->mad_hdr.status ?
+				-EINVAL : 0, mad_recv_wc, query->context);
+	else
+		ib_free_recv_mad(mad_recv_wc);
 }
 
 static void ib_sa_add_one(struct ib_device *device)
@@ -1046,8 +1167,9 @@ static void ib_sa_add_one(struct ib_devi
 
 		sa_dev->port[i].agent =
 			ib_register_mad_agent(device, i + s, IB_QPT_GSI,
-					      NULL, 0, send_handler,
-					      recv_handler, sa_dev);
+					      NULL, IB_MGMT_RMPP_VERSION,
+					      send_handler, recv_handler,
+					      sa_dev);
 		if (IS_ERR(sa_dev->port[i].agent))
 			goto err;
 
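The iterator added above walks fixed-size attributes that may straddle
RMPP segment boundaries. As a rough illustration only, here is a userspace
sketch of that idea. It is not the kernel code: the mock_* names, the
200-byte segment size, and the 256-byte bounce buffer are assumptions made
for this example, and the copy loop is a simplified restatement of the
logic rather than a transcription of ib_sa_iter_next().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_SEG_DATA 200	/* assumed SA data bytes per segment */

/* Hypothetical iterator state, loosely modelled on struct ib_sa_iter. */
struct mock_iter {
	unsigned char (*segs)[MOCK_SEG_DATA];	/* received segments */
	size_t nsegs;
	size_t data_left;	/* payload bytes not yet consumed */
	size_t pos;		/* byte offset into the concatenated data */
	size_t attr_offset;	/* stride between attributes */
	size_t attr_size;	/* bytes copied per attribute, <= sizeof attr */
	unsigned char attr[256];	/* bounce buffer for the current attribute */
};

/* Return the next attribute, or NULL when the payload is exhausted. */
static const void *mock_iter_next(struct mock_iter *it)
{
	size_t copied = 0;

	if (it->data_left < it->attr_offset)
		return NULL;

	/* Copy piecewise so an attribute split across segments is rejoined. */
	while (copied < it->attr_size) {
		size_t seg = (it->pos + copied) / MOCK_SEG_DATA;
		size_t off = (it->pos + copied) % MOCK_SEG_DATA;
		size_t chunk = MOCK_SEG_DATA - off;

		if (seg >= it->nsegs)
			return NULL;
		if (chunk > it->attr_size - copied)
			chunk = it->attr_size - copied;
		memcpy(it->attr + copied, &it->segs[seg][off], chunk);
		copied += chunk;
	}

	it->pos += it->attr_offset;
	it->data_left -= it->attr_offset;
	return it->attr;
}
```

Unlike the patch, this sketch always copies into the bounce buffer, which
keeps the loop short at the cost of an extra memcpy for attributes that
sit entirely within one segment.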

From vlad at dev.mellanox.co.il  Thu Sep 14 09:39:16 2006
From: vlad at dev.mellanox.co.il (vlad at dev.mellanox.co.il)
Date: Thu, 14 Sep 2006 19:39:16 +0300 (IDT)
Subject: [openib-general] OFED-1.1-RC5 is ready
In-Reply-To: <4507D4E2.90406@dev.mellanox.co.il>
References: <4507D4E2.90406@dev.mellanox.co.il>
Message-ID: <22607.194.90.237.34.1158251956.squirrel@dev.mellanox.co.il>

Hi,

OFED-1.1-rc5 is available on
https://openib.org/svn/gen2/branches/1.1/ofed/releases/
File: OFED-1.1-rc5.tgz
Please report any issues in bugzilla http://openib.org/bugzilla/


Release details:
================
Build_id:

OFED-1.1-rc5

openib-1.1 (REV=9485)
# User space
https://openib.org/svn/gen2/branches/1.1/src/userspace
Git: git://www.mellanox.co.il/~git/infiniband ref: refs/heads/ofed_1_1
commit 18c1cb87c4b16f1a1577807077bbdcba3f446f09

# MPI
mpi_osu-0.9.7-mlx2.2.0.tgz
openmpi-1.1.1-1.src.rpm
mpitests-2.0-0.src.rpm

OS support:
===========
Novell:
     - SLES 9.0 SP3
     - SLES10
Redhat:
     - Redhat EL4 up3
     - Redhat EL4 up4
kernel.org:
     - Kernel 2.6.17


Bug fixes from OFED-1.1-rc4:
==========================
1. ISER compilation fixed on SLES10
2. Fixed build on SLES9 PPC64
3. Updated libehca
4. OpenSM fixes
5. Added tavor_quirk option to rdma_cm module (disabled by default): Tavor
performance quirk: limit MTU to 1K if > 0 (int)

Known issues:
=============
libipathverbs compilation fails on SLES10 (Bug:204)


OFED-1.1-rc6 (hopefully the last one) is planned for release on Monday or
Tuesday.


Regards,
Vladimir


> Hi,
>
> The plan is to issue OFED RC5 on Thursday 9/14 and final release next
> week. I am aware of the  following issues:
>
>
> 1) Compilation on SLES9 on PPC     - Jack Morgenstein
> 2) Huge pages on PPC                      - Eli Cohen
> 3) libipathverbs:                                 - Qlogic
>             a) libipathverbs ABI issue
>             b) libipathverbs build on SLES10
> 4) SDP performance on Tavor           - Michael Tsirkin
> 5) iSER issue on SLES10                   - Voltaire
>
>
> In order to meet tomorrow's RC5 release all owners please send your
> patches by end of today.
>
>
> Regards,
>
>     Aviram


From gallen at arlut.utexas.edu  Thu Sep 14 09:50:04 2006
From: gallen at arlut.utexas.edu (Greg Allen)
Date: Thu, 14 Sep 2006 11:50:04 -0500
Subject: [openib-general] IB for FC5/x86_64
Message-ID: <p06230919c12f36f27767@[10.8.18.118]>

Is there a set of RPMs or SRPMs for FC5/x86_64? Even better, a yum 
server with them? I've tried generating them from the svn tree, but I 
keep getting hung up.

I love the way it works in RHEL4, but the SATA controller in my new 
box is currently unsupported in RHEL4.

Thanks,
-Greg
-- 
  Gregory E. Allen, MSEE Engineering Scientist
  Applied Research Laboratories: The University of Texas at Austin
  Please help find my missing daughter: http://FindSabrina.org/


From mst at mellanox.co.il  Thu Sep 14 10:15:30 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 14 Sep 2006 20:15:30 +0300
Subject: [openib-general] IB for FC5/x86_64
In-Reply-To: <p06230919c12f36f27767@[10.8.18.118]>
References: <p06230919c12f36f27767@[10.8.18.118]>
Message-ID: <20060914171530.GB27318@mellanox.co.il>

Quoting r. Greg Allen <gallen at arlut.utexas.edu>:
> Subject: IB for FC5/x86_64
> 
> Is there a set of RPMs or SRPMs for FC5/x86_64? Even better, a yum 
> server with them? I've tried generating them from the svn tree, but I 
> keep getting hung up.
> 
> I love the way it works in RHEL4, but the SATA controller in my new 
> box is currently unsupported in RHEL4.
> 
> Thanks,
> -Greg

Try OFED 1.1 RC.
https://openib.org/svn/gen2/branches/1.1/ofed/releases/


-- 
MST


From ralphc at pathscale.com  Thu Sep 14 10:55:47 2006
From: ralphc at pathscale.com (Ralph Campbell)
Date: Thu, 14 Sep 2006 10:55:47 -0700
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <4508F850.5050804@voltaire.com>
References: <1158108010.8759.192.camel@brick.pathscale.com>
	<4507C8C2.6050206@voltaire.com>
	<1158172258.8759.230.camel@brick.pathscale.com>
	<4508F850.5050804@voltaire.com>
Message-ID: <1158256547.8759.260.camel@brick.pathscale.com>

On Thu, 2006-09-14 at 09:36 +0300, Or Gerlitz wrote:
> Ralph Campbell wrote:
> > On Wed, 2006-09-13 at 12:00 +0300, Or Gerlitz wrote:
> >> Ralph Campbell wrote:
> 
> > Well, the other parts of the kernel might not need a kernel virtual
> > address but the ib_ipath driver still does.
> 
> So you agree there is a need to kmap/kunmap pages which the user wants 
> to use with IB and which are not mapped into the kernel virtual address space?

Yes, I agree for systems which have high memory pages.

> > I don't understand what you are talking about. There is an IB
> > wire protocol for RDMA, SEND, etc. That doesn't change depending
> > on the HCA.
> > The InfiniPath HCA has a ring buffer of receive buffers and all
> > incoming IB packets are DMA'ed into one of these buffers.
> > The ib_ipath software driver examines the packet and
> > copies it to the appropriate address. For a packet received with
> > a RC_RDMA_WRITE_FIRST, the RKEY and IB address are used to convert
> > that into a kernel virtual address and the data is copied.
> > The same happens for RC_SEND_FIRST but the KV address comes from
> > the LKEY and address in the work request posted by ib_post_recv().
> 
> OK, this makes sense.
> 
> Let's see if I follow: you say that the InfiniPath HCA is RX DMA-able, but 
> it does RX DMA into the ipath driver's private RX buffers and then the 
> driver copies from these buffers to the user buffer. My guess is that 
> you do that to support both recv and rdma read on this QP, since if you 
> only needed to support recv you could have the HCA DMA directly into the 
> user-posted RX buffer.

You mostly understand. The hardware doesn't have separate receive
queues for each QP. All packets go into a single receive queue (or at
most 4 currently), and the driver figures out which QP,
RDMA memory region, etc. to copy them to.
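
To make the receive path described above concrete, here is a minimal
userspace sketch of a driver demultiplexing packets from one shared
receive queue to per-QP buffers. Everything in it is hypothetical: the
mock_* types, the linear QP lookup, and the error convention are
assumptions for illustration and are not taken from the ib_ipath driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet as it sits in the shared receive ring. */
struct mock_pkt {
	uint32_t dest_qpn;		/* destination QP number from the header */
	size_t len;			/* payload length in bytes */
	const unsigned char *payload;
};

/* Hypothetical QP with one posted receive buffer. */
struct mock_qp {
	uint32_t qpn;
	unsigned char *recv_buf;	/* buffer supplied by the consumer */
	size_t recv_len;		/* size of recv_buf */
	size_t received;		/* bytes copied so far */
};

static struct mock_qp *mock_lookup_qp(struct mock_qp *qps, size_t nqps,
				      uint32_t qpn)
{
	size_t i;

	for (i = 0; i < nqps; i++)
		if (qps[i].qpn == qpn)
			return &qps[i];
	return NULL;
}

/*
 * Software demux: find the QP named in the packet and copy the payload
 * into its buffer. Returns 0 on success, -1 if the QP is unknown or the
 * payload does not fit.
 */
static int mock_rx_demux(struct mock_qp *qps, size_t nqps,
			 const struct mock_pkt *pkt)
{
	struct mock_qp *qp = mock_lookup_qp(qps, nqps, pkt->dest_qpn);

	if (!qp || pkt->len > qp->recv_len - qp->received)
		return -1;

	memcpy(qp->recv_buf + qp->received, pkt->payload, pkt->len);
	qp->received += pkt->len;
	return 0;
}
```

The memcpy is the step that needs a kernel virtual address for the
destination, which is why a bus or IOMMU address alone is not enough for
a software-copy driver of this kind.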


From ralphc at pathscale.com  Thu Sep 14 12:43:39 2006
From: ralphc at pathscale.com (Ralph Campbell)
Date: Thu, 14 Sep 2006 12:43:39 -0700
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <45093428.5010009@voltaire.com>
References: <1158108010.8759.192.camel@brick.pathscale.com>
	<45093428.5010009@voltaire.com>
Message-ID: <1158263019.8759.324.camel@brick.pathscale.com>

On Thu, 2006-09-14 at 13:51 +0300, Or Gerlitz wrote:
> Ralph Campbell wrote:
> > +static inline dma_addr_t ib_dma_map_sg(struct ib_device *dev,
> > +				       struct scatterlist *sg, int nents,
> > +				       enum dma_data_direction direction)
> > +{
> > +	return dev->map_sg ?
> > +		dev->map_sg(dev, sg, nents, direction) :
> > +		dma_map_sg(dev->dma_device, sg, nents, direction);
> > +}
> 
> As SG DMA mapping happens in place and you don't want to change struct 
> scatterlist for every arch, I think you would need to keep some mapping 
> (hash) from each struct scatterlist to its ipath buddy...
> 
> Also you would need to implement the sg_dma_address() and sg_dma_len() 
> macros used by ULP code when pages are passed to the IB verbs layer 
> (e.g. to get an SG FMR-ed or to send/recv from/into a page), and have 
> them query the ipath scatterlist buddy.
> 
> Or.

Here is my thinking so far:

The driver is passed an LKEY/RKEY plus an address.
For ib_get_dma_mr(), the address is currently from
dma_map_single(), dma_map_page(), or dma_map_sg().
With the ib_dma_*() routines, I can intercept these calls
and return something instead of a bus or IOMMU address.
I would like to return a kernel virtual address since that
is the simplest and is what I ultimately need. This is
trivial for dma_map_single() and trivial for low memory
pages for dma_map_page().

I think I can safely just return an error for architectures
with high memory pages since the driver really only works
on 64-bit systems (for a variety of reasons which I won't
go into) and those systems don't have high memory.

If I did have to support high memory pages, I think the
DMA address would have to be the address of some kmalloc'ed
structure containing the page pointer and offset (or an
index into a table of such data structures).
I wouldn't want to make ib_dma_map_single() have to use that,
but then I would need a way to distinguish addresses returned from
ib_dma_map_single() from those returned by dma_map_page()/dma_map_sg().

ib_dma_map_sg() is a bit more complex.
The struct scatterlist is defined in the architecture specific headers.
The sg_dma_address() and sg_dma_len() macros are the exported
interface for accessing the DMA address and length.
I would like to minimize the impact to architecture specific code.
Given these constraints, I think the best thing to do is
add ib_sg_dma_address() and ib_sg_dma_len() functions which should
be used instead of sg_dma_address() and sg_dma_len().
Struct scatterlist has to contain at least page, offset, and length
since the SCSI code relies on those.
ib_sg_dma_address() would return the page_address() of sg->page
but wouldn't be able to rely on other fields which might be in
the struct scatterlist.
Again, if high memory page support is needed, ib_sg_dma_address() would
have to do something trickier, as for ib_dma_map_page().
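
A minimal userspace sketch of the accessor idea above, assuming the same
"optional per-device override, else generic path" shape as the
ib_dma_map_sg() wrapper quoted earlier. The mock_* names and the
sg_dma_address/sg_dma_len hook fields are hypothetical and exist only for
this illustration; high memory pages are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct scatterlist. */
struct mock_sg {
	void *page_addr;	/* stands in for page_address(sg->page) */
	unsigned int offset;
	unsigned int length;
	uint64_t dma_address;	/* what the generic DMA mapping filled in */
	unsigned int dma_length;
};

struct mock_ib_device;

/* Hypothetical device with optional accessor overrides (NULL = generic). */
struct mock_ib_device {
	uint64_t (*sg_dma_address)(struct mock_ib_device *dev,
				   struct mock_sg *sg);
	unsigned int (*sg_dma_len)(struct mock_ib_device *dev,
				   struct mock_sg *sg);
};

static inline uint64_t mock_ib_sg_dma_address(struct mock_ib_device *dev,
					      struct mock_sg *sg)
{
	return dev->sg_dma_address ? dev->sg_dma_address(dev, sg)
				   : sg->dma_address;
}

static inline unsigned int mock_ib_sg_dma_len(struct mock_ib_device *dev,
					      struct mock_sg *sg)
{
	return dev->sg_dma_len ? dev->sg_dma_len(dev, sg) : sg->dma_length;
}

/* Software-copy driver: hand back a kernel virtual address instead. */
static uint64_t sw_sg_dma_address(struct mock_ib_device *dev,
				  struct mock_sg *sg)
{
	(void)dev;
	return (uint64_t)(uintptr_t)((char *)sg->page_addr + sg->offset);
}

/* Uses only page, offset, and length, the fields named in the text. */
static unsigned int sw_sg_dma_len(struct mock_ib_device *dev,
				  struct mock_sg *sg)
{
	(void)dev;
	return sg->length;
}
```

ULP code would then call the mock_ib_sg_dma_*() wrappers everywhere it
currently uses sg_dma_address() and sg_dma_len(), so no
architecture-specific scatterlist definition has to change.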


From robert.j.woodruff at intel.com  Thu Sep 14 12:52:08 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Thu, 14 Sep 2006 12:52:08 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691CA1BE51@orsmsx418.amr.corp.intel.com>

Robert Walsh wrote, 
> 
> [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12
> 4730: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 |
> iters=10000 | duplex=0 | cma=0 |
> 4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey 0x2302400
> VAddr 0x00002a95dd3480
> 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey 0x2402500
> VAddr 0x00002a95c85480
> 4730:main: Completion with error at client:
> 4730:main: Failed status 9: wr_id 3
> 4730:main: scnt=7584, ccnt=6584
> [woody at rkl-13 bin]$  

Robert Walsh wrote, 
>Hi Woody,
>When RC4 is available, there should be a patch in there that will fix
>this.  Can you let us know if you continue to see problems?

>Regards,
> Robert.

I installed RC5 and now it just hangs, 

[woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12
4702: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 |
iters=10000 | duplex=0 | cma=0 |
4702: Local address:  LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200
VAddr 0x00002a95dc8480
4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200
VAddr 0x00002a95c7c480
hangs here and have to cntrl-c the test.


Intel MPI also fails with, 
# Barrier
[1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with
error. status=0x8. cookie=0x514ee0
rank 1 in job 4  rkl-13_32779   caused collective abort of all ranks
  exit status of rank 1: killed by signal 9 

woody


From rjwalsh at pathscale.com  Thu Sep 14 13:24:30 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Thu, 14 Sep 2006 13:24:30 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691CA1BE51@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CA1BE51@orsmsx418.amr.corp.intel.com>
Message-ID: <4509BA7E.3060906@pathscale.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> I installed RC5 and now it just hangs, 
> 
> [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12
> 4702: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 |
> iters=10000 | duplex=0 | cma=0 |
> 4702: Local address:  LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200
> VAddr 0x00002a95dc8480
> 4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200
> VAddr 0x00002a95c7c480
> hangs here and have to cntrl-c the test.
> 
> 
> Intel MPI also fails with, 
> # Barrier
> [1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with
> error. status=0x8. cookie=0x514ee0
> rank 1 in job 4  rkl-13_32779   caused collective abort of all ranks
>   exit status of rank 1: killed by signal 9 

OK - thanks for the report - I'll look into it.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRQm6fvzvnpzTd9fxAQKmiggAhKyznnhzO3ndlYYJx58cSX8XK/R5WNz0
CVhrKxVtjhq+cYaP6HAC9HmwuhMm18vlHGmw8fvoiwrhYP1h7dxaVgiAt9dX2rRz
svPd4rZnfIu+L9oZYmy7XBkfawwQR30IZPSUbfQDU1ag2r44HsnyZ6VpKucuHLfL
jUFxryC2lmwAU6GhuTKJ8k7XEEQBL3UoczPfL/PTwpFVYvM8CjMgLjwhIfqH++Hv
khciAfsl8HgK5Hd6jj1WCOzMyZmL7GBGrpTsia/hgUGOHkpmEC9wy3dSDZeIqCbI
4cs961Y2TIuciNraaLPbF4mhFFgaLJe4nzxSeTLfcbfxXraSqKbn9Q==
=pWln
-----END PGP SIGNATURE-----


From rdreier at cisco.com  Thu Sep 14 13:38:56 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Sep 2006 13:38:56 -0700
Subject: [openib-general] [PATCH] RDMA/cma: document error flow of
 rdma_accept
In-Reply-To: <45083222.9000005@ichips.intel.com> (Sean Hefty's message
	of "Wed, 13 Sep 2006 09:30:26 -0700")
References: <Pine.LNX.4.64.0609121053140.13564@zuben>
	<45083222.9000005@ichips.intel.com>
Message-ID: <ada4pvaw5cv.fsf@cisco.com>

 > Committed to svn 9461.  Roland, can you also pull into 2.6.19?

Done.

<whine>
I merge > 100 patches every kernel release.  If I have to spend an
extra 5 minutes creating a patch or pulling it out of svn, then I end
up burning an extra day of stupid work.  If 20+ people who contribute
patches sent me clean patches, then everyone will be happier because
I'll be able to merge things quicker and focus on productive work.
</whine>

Thanks,
  Roland


From sean.hefty at intel.com  Thu Sep 14 13:43:42 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 14 Sep 2006 13:43:42 -0700
Subject: [openib-general] [PATCH] RDMA/cma: document error flow of
 rdma_accept
In-Reply-To: <ada4pvaw5cv.fsf@cisco.com>
Message-ID: <000201c6d83e$776c8350$97d8180a@amr.corp.intel.com>

>I merge > 100 patches every kernel release.  If I have to spend an
>extra 5 minutes creating a patch or pulling it out of svn, then I end
>up burning an extra day of stupid work.  If 20+ people who contribute
>patches sent me clean patches, then everyone will be happier because
>I'll be able to merge things quicker and focus on productive work.

Sorry about that.  I was assuming that you could use Or's original patch
directly.

- Sean


From rdreier at cisco.com  Thu Sep 14 13:45:47 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Sep 2006 13:45:47 -0700
Subject: [openib-general] [PATCH] RDMA/cma: document error flow of
 rdma_accept
In-Reply-To: <000201c6d83e$776c8350$97d8180a@amr.corp.intel.com> (Sean
	Hefty's message of "Thu, 14 Sep 2006 13:43:42 -0700")
References: <000201c6d83e$776c8350$97d8180a@amr.corp.intel.com>
Message-ID: <adazmd2uqh0.fsf@cisco.com>

    Sean> Sorry about that.  I was assuming that you could use Or's
    Sean> original patch directly.

Then I have to track down the original email which isn't always easy
either.  Anyway don't take it personally, I just created that <whine>
block for standard use now -- you'll probably see it again in other
threads ;)

 - R.


From rdreier at cisco.com  Thu Sep 14 13:52:49 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Sep 2006 13:52:49 -0700
Subject: [openib-general] [PATCH] ipoib sendonly join
In-Reply-To: <1158248878.18456.11.camel@localhost> (Eli cohen's message
	of "Thu, 14 Sep 2006 18:47:58 +0300")
References: <1158248878.18456.11.camel@localhost>
Message-ID: <adavenquq5a.fsf@cisco.com>

Thanks, applied


From rdreier at cisco.com  Thu Sep 14 13:59:48 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Sep 2006 13:59:48 -0700
Subject: [openib-general] [GIT PULL] please pull infiniband.git
Message-ID: <adar6yeuptn.fsf@cisco.com>

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This contains a few last-minute fixes -- a couple of one-liners, and a
panic fix that turns out to be pure deletions:

Eli Cohen:
      IPoIB: Retry failed send-only multicast group joins

Ishai Rabinovitz:
      IB/srp: Don't schedule reconnect from srp

Michael S. Tsirkin:
      RDMA/cma: Increase the IB CM retry count in CMA

 drivers/infiniband/core/cma.c                  |    2 +-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    1 +
 drivers/infiniband/ulp/srp/ib_srp.c            |   14 --------------
 3 files changed, 2 insertions(+), 15 deletions(-)


diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d6f99d5..5d625a8 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -49,7 +49,7 @@ MODULE_DESCRIPTION("Generic RDMA CM Agen
 MODULE_LICENSE("Dual BSD/GPL");
 
 #define CMA_CM_RESPONSE_TIMEOUT 20
-#define CMA_MAX_CM_RETRIES 3
+#define CMA_MAX_CM_RETRIES 15
 
 static void cma_add_one(struct ib_device *device);
 static void cma_remove_one(struct ib_device *device);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index b5e6a7b..ec356ce 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -326,6 +326,7 @@ ipoib_mcast_sendonly_join_complete(int s
 
 		/* Clear the busy flag so we try again */
 		clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
+		mcast->query = NULL;
 	}
 
 	complete(&mcast->done);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 8257d5a..fd8344c 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -799,13 +799,6 @@ static void srp_process_rsp(struct srp_t
 	spin_unlock_irqrestore(target->scsi_host->host_lock, flags);
 }
 
-static void srp_reconnect_work(void *target_ptr)
-{
-	struct srp_target_port *target = target_ptr;
-
-	srp_reconnect_target(target);
-}
-
 static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
 {
 	struct srp_iu *iu;
@@ -858,7 +851,6 @@ static void srp_completion(struct ib_cq 
 {
 	struct srp_target_port *target = target_ptr;
 	struct ib_wc wc;
-	unsigned long flags;
 
 	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 	while (ib_poll_cq(cq, 1, &wc) > 0) {
@@ -866,10 +858,6 @@ static void srp_completion(struct ib_cq 
 			printk(KERN_ERR PFX "failed %s status %d\n",
 			       wc.wr_id & SRP_OP_RECV ? "receive" : "send",
 			       wc.status);
-			spin_lock_irqsave(target->scsi_host->host_lock, flags);
-			if (target->state == SRP_TARGET_LIVE)
-				schedule_work(&target->work);
-			spin_unlock_irqrestore(target->scsi_host->host_lock, flags);
 			break;
 		}
 
@@ -1705,8 +1693,6 @@ static ssize_t srp_create_target(struct 
 	target->scsi_host  = target_host;
 	target->srp_host   = host;
 
-	INIT_WORK(&target->work, srp_reconnect_work, target);
-
 	INIT_LIST_HEAD(&target->free_reqs);
 	INIT_LIST_HEAD(&target->req_queue);
 	for (i = 0; i < SRP_SQ_SIZE; ++i) {


From rdreier at cisco.com  Thu Sep 14 14:00:47 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Sep 2006 14:00:47 -0700
Subject: [openib-general] [PATCH for-2.6.18] Re: [PATCH] IB/cma: add
	rdma_establish
In-Reply-To: <20060913120154.GA23890@mellanox.co.il> (Michael S.
	Tsirkin's message of "Wed, 13 Sep 2006 15:01:54 +0300")
References: <45073FF7.7020506@ichips.intel.com>
	<20060913120154.GA23890@mellanox.co.il>
Message-ID: <adak646ups0.fsf@cisco.com>

OK, I put this in 2.6.18 since I had a few other fixes that I thought
should go into 2.6.18 too.  It was a close call between merging this
now or putting it into 2.6.19 and waiting for 2.6.18.1, but I don't
think it matters much either way.

 - R.


From rdreier at cisco.com  Thu Sep 14 14:11:32 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Sep 2006 14:11:32 -0700
Subject: [openib-general] Fwd: IPoIB Multicast
In-Reply-To: <20060914141901.GG25691@mellanox.co.il> (Michael S.
	Tsirkin's message of "Thu, 14 Sep 2006 17:19:01 +0300")
References: <20060914141901.GG25691@mellanox.co.il>
Message-ID: <adafyeuupa3.fsf@cisco.com>

 > Hmm, IPoIB does not seem to copy anything except the pkey.
 > Looks like a compliance issue.

Well, the only MUST attributes we seem to be missing are HopLimit and MTU
(P_Key, Q_Key and SL are all copied from the broadcast group when
creating a new multicast group).

 > Specifically, I'm not sure what "other attributes" are, but I think
 > this should include the static rate. Right?

I guess we should definitely do HopLimit and MTU, since those are
MUSTs.  The only other attribute we look at is Rate, so I guess we
should set that also.

 - R.


From sean.hefty at intel.com  Thu Sep 14 16:17:33 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 14 Sep 2006 16:17:33 -0700
Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to
	unmatched DREQ
Message-ID: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com>

Currently a DREP is only sent in response to a DREQ if a connection
has been found matching the DREQ, and it is in the proper state.  Once
a DREP is sent, the local connection moves into timewait.  Duplicate
DREQs received while in this state result in re-sending the DREP.

However, it's likely that the local connection will enter and exit
timewait before the remote side times out a lost DREP and resends a DREQ.
There are a couple of possible solutions to this.  One is to increase how
long a connection remains in timewait, by multiplying its wait time by
max_cm_retries.  This can greatly increase the time spent in timewait
before a QP can be re-used, even when no CM messages are lost.

An alternative is to send a DREP in response to a DREQ, even if a local
connection is not found, which is what this patch does.
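
The core of the second approach can be sketched in isolation: the reply is built only from fields carried in the DREQ, so no local connection state is needed. The structs below are simplified stand-ins invented for this sketch, not the real cm_dreq_msg/cm_drep_msg layouts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the CM message layouts; not the real structs. */
struct dreq_sketch {
	uint64_t tid;			/* MAD transaction ID */
	uint32_t local_comm_id;		/* sender's ID for the connection */
	uint32_t remote_comm_id;	/* sender's view of our ID */
};

struct drep_sketch {
	uint64_t tid;
	uint32_t local_comm_id;
	uint32_t remote_comm_id;
};

/* Build a DREP purely from the DREQ contents, with no connection lookup. */
static void build_drep_from_dreq(const struct dreq_sketch *dreq,
				 struct drep_sketch *drep)
{
	assert(dreq && drep);
	memset(drep, 0, sizeof(*drep));
	drep->tid = dreq->tid;				/* reply reuses the TID */
	drep->remote_comm_id = dreq->local_comm_id;	/* their ID becomes "remote" */
	drep->local_comm_id = dreq->remote_comm_id;	/* our ID becomes "local" */
}
```

Because the comm IDs are simply swapped, the remote side can match the DREP to its pending DREQ even though the responder has already left timewait.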

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
Index: cm.c
===================================================================
--- cm.c	(revision 9490)
+++ cm.c	(working copy)
@@ -1900,6 +1900,32 @@ out:	spin_unlock_irqrestore(&cm_id_priv-
 }
 EXPORT_SYMBOL(ib_send_cm_drep);
 
+static int cm_issue_drep(struct cm_port *port,
+			 struct ib_mad_recv_wc *mad_recv_wc)
+{
+	struct ib_mad_send_buf *msg = NULL;
+	struct cm_dreq_msg *dreq_msg;
+	struct cm_drep_msg *drep_msg;
+	int ret;
+
+	ret = cm_alloc_response_msg(port, mad_recv_wc, &msg);
+	if (ret)
+		return ret;
+
+	dreq_msg = (struct cm_dreq_msg *) mad_recv_wc->recv_buf.mad;
+	drep_msg = (struct cm_drep_msg *) msg->mad;
+
+	cm_format_mad_hdr(&drep_msg->hdr, CM_DREP_ATTR_ID, dreq_msg->hdr.tid);
+	drep_msg->remote_comm_id = dreq_msg->local_comm_id;
+	drep_msg->local_comm_id = dreq_msg->remote_comm_id;
+
+	ret = ib_post_send_mad(msg, NULL);
+	if (ret)
+		cm_free_msg(msg);
+
+	return ret;
+}
+
 static int cm_dreq_handler(struct cm_work *work)
 {
 	struct cm_id_private *cm_id_priv;
@@ -1911,8 +1937,10 @@ static int cm_dreq_handler(struct cm_wor
 	dreq_msg = (struct cm_dreq_msg *)work->mad_recv_wc->recv_buf.mad;
 	cm_id_priv = cm_acquire_id(dreq_msg->remote_comm_id,
 				   dreq_msg->local_comm_id);
-	if (!cm_id_priv)
+	if (!cm_id_priv) {
+		cm_issue_drep(work->port, work->mad_recv_wc);
 		return -EINVAL;
+	}
 
 	work->cm_event.private_data = &dreq_msg->private_data;
 

From halr at voltaire.com  Thu Sep 14 17:27:17 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Sep 2006 20:27:17 -0400
Subject: [openib-general] Fwd: IPoIB Multicast
In-Reply-To: <adafyeuupa3.fsf@cisco.com>
References: <20060914141901.GG25691@mellanox.co.il> <adafyeuupa3.fsf@cisco.com>
Message-ID: <1158280030.25157.19154.camel@hal.voltaire.com>

On Thu, 2006-09-14 at 17:11, Roland Dreier wrote:
>  > Hmm, IPoIB does not seem to copy anything except the pkey.
>  > Looks like a compliance issue.
> 
> Well, the only MUST attributes we seem to be missing are HopLimit and MTU
> (P_Key, Q_Key and SL are all copied from the broadcast group when
> creating a new multicast group).

Are HopLimit and MTU needed for join ? I thought it was fine to wildcard
those for a join. If they are specified though, they do need to match
the group.

>  > Specifically, I'm not sure what "other attributes" are, but I think
>  > this should include the static rate. Right?
> 
> I guess we should definitely do HopLimit and MTU, since those are
> MUSTs.

Are you referring to create or join here ?

>   The only other attribute we look at is Rate, so I guess we
> should set that also.

The rate should be set properly (with the rate selector set to exactly)
in the response regardless of whether it was set in the request (e.g.
wildcarded or not).

-- Hal

>  - R.
> 


From rdreier at cisco.com  Thu Sep 14 17:35:54 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Sep 2006 17:35:54 -0700
Subject: [openib-general] Fwd: IPoIB Multicast
In-Reply-To: <1158280030.25157.19154.camel@hal.voltaire.com> (Hal
	Rosenstock's message of "14 Sep 2006 20:27:17 -0400")
References: <20060914141901.GG25691@mellanox.co.il> <adafyeuupa3.fsf@cisco.com>
	<1158280030.25157.19154.camel@hal.voltaire.com>
Message-ID: <ada7j06ufth.fsf@cisco.com>

    Hal> Are you referring to create or join here ?

The whole thing is about new groups that IPoIB creates.  Currently we
don't specify HopLimit, MTU or Rate, and the IPoIB RFC says we should.
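
As a sketch of what specifying those attributes could look like, the helper below copies them from the broadcast group into the record used to create a new group. Everything here is hypothetical: the struct, the mask bits and the function name are invented for illustration and do not match the real SA MCMemberRecord definitions, and the MTU and rate selector fields are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical component-mask bits; the real SA MCMemberRecord masks differ. */
#define SK_MASK_QKEY		(1u << 0)
#define SK_MASK_PKEY		(1u << 1)
#define SK_MASK_SL		(1u << 2)
#define SK_MASK_MTU		(1u << 3)
#define SK_MASK_RATE		(1u << 4)
#define SK_MASK_HOP_LIMIT	(1u << 5)

struct mcast_rec_sketch {
	uint32_t qkey;
	uint16_t pkey;
	uint8_t  sl;
	uint8_t  mtu;		/* IB-encoded MTU */
	uint8_t  rate;		/* IB-encoded static rate */
	uint8_t  hop_limit;
	uint32_t comp_mask;	/* which fields the create request specifies */
};

static void inherit_from_broadcast(const struct mcast_rec_sketch *bcast,
				   struct mcast_rec_sketch *rec)
{
	assert(bcast && rec);

	/* Attributes the thread says are already copied today. */
	rec->qkey = bcast->qkey;
	rec->pkey = bcast->pkey;
	rec->sl   = bcast->sl;
	rec->comp_mask |= SK_MASK_QKEY | SK_MASK_PKEY | SK_MASK_SL;

	/* The additions under discussion. */
	rec->mtu       = bcast->mtu;
	rec->rate      = bcast->rate;
	rec->hop_limit = bcast->hop_limit;
	rec->comp_mask |= SK_MASK_MTU | SK_MASK_RATE | SK_MASK_HOP_LIMIT;
}
```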


From rjwalsh at pathscale.com  Thu Sep 14 17:39:29 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Thu, 14 Sep 2006 17:39:29 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691CA1BE51@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CA1BE51@orsmsx418.amr.corp.intel.com>
Message-ID: <4509F641.5030302@pathscale.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> I installed RC5 and now it just hangs, 

Wow - we can't even get RC5 to build here.  What distro are you running?

I've tried this on RC4 + a fixed libipathverbs package and it runs OK
(although it does take a while, which might explain the hang you were
seeing.)

But mostly I'm curious how you get RC5 to build at all.

We really really really shouldn't be attempting to turn RC's around as
fast as RC4 to RC5 went: we basically had about enough time to throw a
patch together without being able to do much testing.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRQn2QPzvnpzTd9fxAQJFogf/fJidIu6UVaSTbGMyia66kgYrtrL5lvtr
FcmyBI01SbjOUnd9rfejt0y1IeN+1O88wBBJBnQPSi3aRUmCufuGYRWM9T2ZXmw8
PxCLyN44AvyF/B6SUfwr8ygXcAQ2nJPvxfdpnEyFlTxBf5gatDg00YiSRu88NtxR
5DrDsK/8OSpy6j0lRVoB7hJh2cs74NhtXawvvzlmGBI4ZhoTmifNPSmPnXwMHJ7+
a4A+dK1cSqjLFUXDh6WPIM5OHS6bKbQeKQ3J4H+I99uK+5n3fb/9CP+Z/aZ3/JEG
Qg9dfgsF4onKNBDsXPoGHjI1iU+FOghLFZCTvYXirkqXPgVsTAVK5A==
=hwu5
-----END PGP SIGNATURE-----


From halr at voltaire.com  Thu Sep 14 17:37:47 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Sep 2006 20:37:47 -0400
Subject: [openib-general] Fwd: IPoIB Multicast
In-Reply-To: <ada7j06ufth.fsf@cisco.com>
References: <20060914141901.GG25691@mellanox.co.il> <adafyeuupa3.fsf@cisco.com>
	<1158280030.25157.19154.camel@hal.voltaire.com>
	<ada7j06ufth.fsf@cisco.com>
Message-ID: <1158280653.25157.19569.camel@hal.voltaire.com>

On Thu, 2006-09-14 at 20:35, Roland Dreier wrote:
>     Hal> Are you referring to create or join here ?
> 
> The whole thing is about new groups that IPoIB creates.  Currently we
> don't specify HopLimit, MTU or Rate, and the IPoIB RFC says we should.

That indeed is true for create. However, send only members can never
create a group (only full members can do this). Am I confusing this with
a different patch which went by ?

-- Hal


From rdreier at cisco.com  Thu Sep 14 17:52:16 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Sep 2006 17:52:16 -0700
Subject: [openib-general] Fwd: IPoIB Multicast
In-Reply-To: <1158280653.25157.19569.camel@hal.voltaire.com> (Hal
	Rosenstock's message of "14 Sep 2006 20:37:47 -0400")
References: <20060914141901.GG25691@mellanox.co.il> <adafyeuupa3.fsf@cisco.com>
	<1158280030.25157.19154.camel@hal.voltaire.com>
	<ada7j06ufth.fsf@cisco.com>
	<1158280653.25157.19569.camel@hal.voltaire.com>
Message-ID: <ada3bauuf27.fsf@cisco.com>

    Hal> That indeed is true for create. However, send only members
    Hal> can never create a group (only full members can do this). Am
    Hal> I confusing this with a different patch which went by ?

Yes, I think so.  Look back to the beginning of this thread for the
initial report of the problem.

 - R.


From halr at voltaire.com  Thu Sep 14 17:52:31 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 14 Sep 2006 20:52:31 -0400
Subject: [openib-general] Fwd: IPoIB Multicast
In-Reply-To: <ada3bauuf27.fsf@cisco.com>
References: <20060914141901.GG25691@mellanox.co.il> <adafyeuupa3.fsf@cisco.com>
	<1158280030.25157.19154.camel@hal.voltaire.com>
	<ada7j06ufth.fsf@cisco.com>
	<1158280653.25157.19569.camel@hal.voltaire.com>
	<ada3bauuf27.fsf@cisco.com>
Message-ID: <1158281542.25157.20107.camel@hal.voltaire.com>

On Thu, 2006-09-14 at 20:52, Roland Dreier wrote:
>     Hal> That indeed is true for create. However, send only members
>     Hal> can never create a group (only full members can do this). Am
>     Hal> I confusing this with a different patch which went by ?
> 
> Yes, I think so.  Look back to the beginning of this thread for the
> initial report of the problem.

I see now. It does refer to full members. I know this used to be right
(with perhaps the exception of rate which was what the email referred
to). Is my memory wrong ?

-- Hal


From rjwalsh at pathscale.com  Thu Sep 14 18:22:21 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Thu, 14 Sep 2006 18:22:21 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691CA1BE51@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CA1BE51@orsmsx418.amr.corp.intel.com>
Message-ID: <450A004D.9090804@pathscale.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Woodruff, Robert J wrote:
> Robert Walsh wrote, 
>> [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12
>> 4730: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 |
>> iters=10000 | duplex=0 | cma=0 |
>> 4730: Local address:  LID 0x03, QPN 0x001d, PSN 0x9e070c RKey
> 0x2302400
>> VAddr 0x00002a95dd3480
>> 4730: Remote address: LID 0x04, QPN 0x001e, PSN 0x2bd6ba, RKey
> 0x2402500
>> VAddr 0x00002a95c85480
>> 4730:main: Completion with error at client:
>> 4730:main: Failed status 9: wr_id 3
>> 4730:main: scnt=7584, ccnt=6584
>> [woody at rkl-13 bin]$  
> 
>> Hi Woody,
> Robert Walsh wrote, 
>> When RC4 is available, there should be a patch in there that will fix
>> this.  Can you let us know if you continue to see problems?
> 
>> Regards,
>> Robert.
> 
> I installed RC5 and now it just hangs, 
> 
> [woody at rkl-13 bin]$ ./ib_rdma_bw -n 10000 -t 1000 -s 2000000 rkl-12
> 4702: | port=18515 | ib_port=1 | size=2000000 | tx_depth=1000 |
> iters=10000 | duplex=0 | cma=0 |
> 4702: Local address:  LID 0x03, QPN 0x000d, PSN 0xf1b711 RKey 0x1101200
> VAddr 0x00002a95dc8480
> 4702: Remote address: LID 0x04, QPN 0x000d, PSN 0xe62247, RKey 0x1101200
> VAddr 0x00002a95c7c480
> hangs here and have to cntrl-c the test.
> 
> 
> Intel MPI also fails with, 
> # Barrier
> [1][rdma_iba.c:260] Intel MPI fatal error: DTO operation completed with
> error. status=0x8. cookie=0x514ee0
> rank 1 in job 4  rkl-13_32779   caused collective abort of all ranks
>   exit status of rank 1: killed by signal 9 

Hi Woody,

So, we built everything using RC5 plus the libipathverbs from subversion
and we were successfully able to run ib_rdma_bw (with your arguments
above) and Intel MPI (a simple MPI hello world program).  I'm going to
continue testing with the Intel MPI testsuite and some ISV
applications.

I'll keep you informed.

Regards,
 Robert.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRQoATfzvnpzTd9fxAQLUKQf9E1ps9XbbXplMm6+5O/XDdlWF0BQws1SC
L/aGygh34fZSkpGmCrfze3HhsaOqasu9gUOsJQ89jX6pKNkv4tJAxSJCr+n+bdG3
21Bqr9gcM0MbzrDvOcUDHqvnmC0THlCf0XhikjKg/FJR1e48BIiAOFUzfi0VvI36
G1ZtD8xZXydOfWq7Z4xvyf9Y3qNPIeSKR2JZGJQoGHjxY4+vcteK0UVHfic1Bgpy
9uql47af6tncN+CazYcwf8xnHegiDr34iEEre5wUz//Qy62j8JNPnxhit0W9lXij
zFszTkOHQeibxbFWi9ZRyigTmHanxxRUuznW54NL8NIF30jhnmcksQ==
=06gu
-----END PGP SIGNATURE-----


From rdreier at cisco.com  Thu Sep 14 19:19:00 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 14 Sep 2006 19:19:00 -0700
Subject: [openib-general] Fwd: IPoIB Multicast
In-Reply-To: <1158281542.25157.20107.camel@hal.voltaire.com> (Hal
	Rosenstock's message of "14 Sep 2006 20:52:31 -0400")
References: <20060914141901.GG25691@mellanox.co.il> <adafyeuupa3.fsf@cisco.com>
	<1158280030.25157.19154.camel@hal.voltaire.com>
	<ada7j06ufth.fsf@cisco.com>
	<1158280653.25157.19569.camel@hal.voltaire.com>
	<ada3bauuf27.fsf@cisco.com>
	<1158281542.25157.20107.camel@hal.voltaire.com>
Message-ID: <aday7slub1n.fsf@cisco.com>

    Hal> I see now. It does refer to full members. I know this used to
    Hal> be right (with perhaps the exception of rate which was what
    Hal> the email referred to). Is my memory wrong ?

I think that IPoIB has always used the attributes required by the IBA
spec to create a multicast group, but not all the attributes required
by the IPoIB spec.

 - R.


From sweitzen at cisco.com  Thu Sep 14 21:39:59 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Thu, 14 Sep 2006 21:39:59 -0700
Subject: [openib-general] [openfabrics-ewg] [PATCH] OFED 1.1-rc3 is ready
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA302446271@xmb-sjc-216.amer.cisco.com>


> > I installed RC5 and now it just hangs, 
> 
> Wow - we can't even get RC5 to build here.  What distro are 
> you running?
> 
> I've tried this on RC4 + a fixed libipathverbs package and it runs OK
> (although it does take a while, which might explain the hang you were
> seeing.)
> 
> But mostly I'm curious how you get RC5 to build at all.
> 
> We really really really shouldn't be attempting to turn RC's around as
> fast as RC4 to RC5 went: we basically had about enough time to throw a
> patch together without being able to do much testing.

I think many of us are in agreement.  Before RC6, I propose we only check
in critical work on the release branch, and get some time in to
thoroughly test RC5.  Non-critical fixes can wait until after 1.1.  I
personally would like a week to test RC5.  I feel like we have forgotten
what the E in OFED stands for; if we have to slip the release schedule
to make this code really stable, I'm in favor of it.

Scott


From mst at mellanox.co.il  Thu Sep 14 22:03:50 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 15 Sep 2006 08:03:50 +0300
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <450A004D.9090804@pathscale.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CA1BE51@orsmsx418.amr.corp.intel.com>
	<450A004D.9090804@pathscale.com>
Message-ID: <20060915050349.GD24221@mellanox.co.il>

Well, it looks like the libipathverbs that went into 1.1 branch was botched.
How come?
Please note that Mellanox for one is unable to test libipathverbs at all.
libipathverbs maintainers, please, try to fix by Sunday.
And please, test the changes before you commit them.




-- 
MST


From mst at mellanox.co.il  Thu Sep 14 22:09:53 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 15 Sep 2006 08:09:53 +0300
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <4509F641.5030302@pathscale.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CA1BE51@orsmsx418.amr.corp.intel.com>
	<4509F641.5030302@pathscale.com>
Message-ID: <20060915050953.GE24221@mellanox.co.il>

Quoting r. Robert Walsh <rjwalsh at pathscale.com>:
> Subject: Re: [openib-general] [PATCH] OFED 1.1-rc3 is ready
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> > I installed RC5 and now it just hangs, 
> 
> Wow - we can't even get RC5 to build here.  What distro are you running?
> 
> I've tried this on RC4 + a fixed libipathverbs package and it runs OK
> (although it does take a while, which might explain the hang you were
> seeing.)
> 
> But mostly I'm curious how you get RC5 to build at all.
> 
> We really really really shouldn't be attempting to turn RC's around as
> fast as RC4 to RC5 went: we basically had about enough time to throw a
> patch together without being able to do much testing.

Changes are expected to be tested before you commit.
This is really maintainer's responsibility, please take it seriously.

-- 
MST


From rjwalsh at pathscale.com  Thu Sep 14 23:04:55 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Thu, 14 Sep 2006 23:04:55 -0700
Subject: [openib-general] [PATCH] OFED 1.1-rc3 is ready
In-Reply-To: <20060915050953.GE24221@mellanox.co.il>
References: <BAE9DCEF64577A439B3A37F36F9B691CA1BE51@orsmsx418.amr.corp.intel.com>
	<4509F641.5030302@pathscale.com>
	<20060915050953.GE24221@mellanox.co.il>
Message-ID: <450A4287.1070309@pathscale.com>

> Changes are expected to be tested before you commit.
> This is really maintainer's responsibility, please take it seriously.

I have to take exception here.  It's only possible for us to make a 
serious attempt at doing something like this if OFED takes a more 
serious approach to the idea of what a "release candidate" is.  Throwing 
out an RC in one day was not a good idea; nor was changing an API in the 
middle of the process.  We're lucky that's all that broke.

Regards,
  Robert.


From thomas.bub at thomson.net  Thu Sep 14 23:24:19 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Fri, 15 Sep 2006 08:24:19 +0200
Subject: [openib-general] Any chance to get 32-Bit libraries on SLES9 x86_64?
Message-ID: <B79FAF8BB536314E859EA1963CFFD222029AC451@wdtssmail01.eu.thmulti.com>

Is there any chance/trick to get 32-bit libraries built and usable on
SLES9 x86_64?
When I install OFED-1.1-rc4 I get:
 
WARNING: sysfsutils 32-bit version is required to build 32-bit
libibverbs package.
WARNING: Skiping build of 32-bit libraries.

I googled around and didn't find any 32-bit sysfsutils for SLES9.
I know that it is working under SLES10, but our customer base is on
SLES9 and very conservative when it comes to using the latest and
greatest OS/distribution.

Thomas

............................................................
Thomas Bub
Grass Valley Germany GmbH
Brunnenweg 9
64331 Weiterstadt, Germany
Tel: +49 6150 104 147
Fax: +49 6150 104 656
Email: Thomas.Bub at thomson.net
www.GrassValley.com  <http://www.grassvalley.com> 
............................................................



From bugzilla-daemon at openib.org  Fri Sep 15 00:24:51 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Fri, 15 Sep 2006 00:24:51 -0700 (PDT)
Subject: [openib-general] [Bug 222] ib_uverbs fails to load on ia64,
	OFED 1.1 - rc3
Message-ID: <20060915072451.2C0652283D4@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=222


sweitzen at cisco.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED


------- Comment #2 from sweitzen at cisco.com  2006-09-15 00:24 -------
Now loads OK on RHEL4 U3 ia64 using OFED 1.1 rc4.


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From thomas.bub at thomson.net  Fri Sep 15 02:37:54 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Fri, 15 Sep 2006 11:37:54 +0200
Subject: [openib-general] What can be the reason for VAPI_WR_FLUSH_ERR when
 sending from gen2 to gen1
Message-ID: <B79FAF8BB536314E859EA1963CFFD22201FBD397@wdtssmail01.eu.thmulti.com>

This seems to be the very last little bug in my journey migrating from
gen1 client and server to gen2 client and gen1 server.
While I have overcome all the CM issues I had so far (thanks to Sean Hefty),
I'm now in the situation that I have a gen2 client connected to a gen1
server via CM.
Unfortunately the first IBV_WR_SEND causes a:
(syndrome=0xf9=VAPI_WR_FLUSH_ERR , opcode=6=VAPI_CQE_RQ_SEND_DATA)
error in the receive completion queue of the server.

Doing the CM connection and the first send in the opposite direction,
from gen1 client to gen2 server, works fine.
Needless to say, connection and send between gen1 <-> gen1 and
gen2 <-> gen2 are OK as well.

I copied Erez Cohen from Mellanox as well.

Maybe someone can explain to me in more detail what the error is about
and how to avoid it.

Thanks
Thomas


From eitan at mellanox.co.il  Fri Sep 15 04:45:35 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 15 Sep 2006 14:45:35 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108
	devices
Message-ID: <86y7sle4kg.fsf@mtl066.yok.mtl.com>

Hi Hal

The following patch solves an issue with OpenSM preferring largest MTU 
for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor)
devices instead of using a 1K MTU which is best for this device.

Since this is a device specific quirk I have added a configuration option
named enable_quirks which is FALSE by default to enable this functionality.

To summarize the functionality change:
1. Added enable_quirks option 
2. If enable_quirks is FALSE do nothing
3. If a specific MTU is requested (either =2K or >1K) do nothing
4. If either source port or destination port is a Tavor device  
	MTU is limited to 1K (can be further reduced by path traversal) 
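
The four points above can be sketched as a standalone decision function. This follows the prose summary only and is not lifted from the patch; the struct and function names are hypothetical, and the patch's own checks may treat boundary cases differently. The selector values are the ones listed in the patch comments, and the MTU values are the standard IB encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IB MTU encodings: 1=256, 2=512, 3=1024, 4=2048, 5=4096 bytes. */
enum { MTU_256 = 1, MTU_512, MTU_1024, MTU_2048, MTU_4096 };

/* PathRecord MTU selector values, as listed in the patch comments. */
enum { SEL_GREATER = 0, SEL_LESS = 1, SEL_EXACT = 2, SEL_LARGEST = 3 };

/* Hypothetical view of the MTU part of a path query. */
struct mtu_request_sketch {
	bool    specified;	/* did the query set both MTU and selector? */
	uint8_t selector;
	uint8_t mtu;
};

static uint8_t pick_path_mtu(bool enable_quirks, bool tavor_endpoint,
			     const struct mtu_request_sketch *req,
			     uint8_t port_mtu)
{
	assert(req);

	/* Points 1, 2 and 4: only act when enabled and a Tavor is involved. */
	if (!enable_quirks || !tavor_endpoint)
		return port_mtu;

	/* Point 3: the request needs more than 1K, so leave the MTU alone. */
	if (req->specified) {
		if (req->selector == SEL_EXACT && req->mtu > MTU_1024)
			return port_mtu;
		if (req->selector == SEL_GREATER && req->mtu >= MTU_1024)
			return port_mtu;
	}

	/* Point 4: cap at 1K; path traversal may reduce it further. */
	return port_mtu > MTU_1024 ? MTU_1024 : port_mtu;
}
```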

Target is both trunk and OFED 1.1

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: include/opensm/osm_subnet.h
===================================================================
--- include/opensm/osm_subnet.h	(revision 9493)
+++ include/opensm/osm_subnet.h	(working copy)
@@ -286,6 +286,7 @@ typedef struct _osm_subn_opt
   osm_qos_options_t        qos_sw0_options;
   osm_qos_options_t        qos_swe_options;
   osm_qos_options_t        qos_rtr_options;
+  boolean_t                enable_quirks;
 } osm_subn_opt_t;
 /*
 * FIELDS
@@ -469,6 +470,10 @@ typedef struct _osm_subn_opt
 *	qos_rtr_options
 *		QoS options for router ports
 *
+*  enable_quirks
+*     Enable high risk new features and not fully qualified 
+*     hardware specific work arounds
+*
 * SEE ALSO
 *	Subnet object
 *********/
Index: include/opensm/osm_base.h
===================================================================
--- include/opensm/osm_base.h	(revision 9493)
+++ include/opensm/osm_base.h	(working copy)
@@ -778,6 +778,34 @@ typedef enum _osm_mcast_req_type
 #define MAX_UPDN_GUID_FILE_LINE_LENGTH 120
 /**********/
 
+/****s* OpenSM: Base/VendorOUIs
+* NAME
+*	VendorOUIs
+*
+* DESCRIPTION
+*	Known device vendor ID and GUID OUIs
+*
+* SYNOPSIS
+*/
+#define OSM_VENDOR_ID_INTEL         0x00D0B7
+#define OSM_VENDOR_ID_MELLANOX      0x0002C9
+#define OSM_VENDOR_ID_REDSWITCH     0x000617
+#define OSM_VENDOR_ID_SILVERSTORM   0x00066A
+#define OSM_VENDOR_ID_TOPSPIN       0x0005AD
+#define OSM_VENDOR_ID_FUJITSU       0x00E000
+#define OSM_VENDOR_ID_FUJITSU2      0x000B5D
+#define OSM_VENDOR_ID_VOLTAIRE      0x0008F1
+#define OSM_VENDOR_ID_YOTTAYOTTA    0x000453
+#define OSM_VENDOR_ID_PATHSCALE     0x001175
+#define OSM_VENDOR_ID_IBM           0x000255
+#define OSM_VENDOR_ID_DIVERGENET    0x00084E
+#define OSM_VENDOR_ID_FLEXTRONICS   0x000B8C
+#define OSM_VENDOR_ID_AGILENT       0x0030D3
+#define OSM_VENDOR_ID_OBSIDIAN      0x001777
+#define OSM_VENDOR_ID_BAYMICRO      0x000BC1
+#define OSM_VENDOR_ID_LSILOGIC      0x00A0B8
+/**********/
+
 END_C_DECLS
 
 #endif	/* _OSM_BASE_H_ */
Index: opensm/osm_sa_multipath_record.c
===================================================================
--- opensm/osm_sa_multipath_record.c	(revision 9493)
+++ opensm/osm_sa_multipath_record.c	(working copy)
@@ -150,6 +150,75 @@ osm_mpr_rcv_init(
 
 /**********************************************************************
  **********************************************************************/
+static inline boolean_t
+__osm_sa_multipath_rec_is_tavor_port(
+	IN const osm_port_t*     const p_port)
+{
+	osm_node_t const* p_node;
+	ib_net32_t vend_id;
+
+	p_node = osm_port_get_parent_node( p_port );
+	vend_id = ib_node_info_get_vendor_id( &p_node->node_info );
+	
+	return( (p_node->node_info.device_id == CL_HTON16(23108)) &&
+			  ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || 
+				(vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || 
+				(vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || 
+				(vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE))));
+}
+
+/**********************************************************************
+ **********************************************************************/
+boolean_t
+ __osm_sa_multipath_rec_apply_tavor_mtu_limit(
+  IN const ib_multipath_rec_t*  const p_mpr,
+  IN const osm_port_t*          const p_src_port,
+  IN const osm_port_t*          const p_dest_port,
+  IN const ib_net64_t           comp_mask)
+{
+	uint8_t   required_mtu;
+	
+	/* only if one of the ports is a Tavor device */
+	if (! __osm_sa_multipath_rec_is_tavor_port(p_src_port) && 
+		 ! __osm_sa_multipath_rec_is_tavor_port(p_dest_port) )
+		return( FALSE );
+	
+	/*
+	  we can apply the patch if either:
+	  1. No MTU required
+	  2. Required MTU < 
+	  3. Required MTU = 1K or 512 or 256
+	  4. Required MTU > 256 or 512
+	*/
+	required_mtu = ib_multipath_rec_mtu( p_mpr );
+	if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) &&
+		  ( comp_mask & IB_PR_COMPMASK_MTU ) )
+	{
+		switch( ib_multipath_rec_mtu_sel( p_mpr ) )
+		{
+		case 0:    /* must be greater than */
+		case 2:    /* exact match */
+			if( IB_MTU_LEN_1024 < required_mtu )
+				return(FALSE);
+			break;
+
+		case 1:    /* must be less than */
+		case 3:    /* largest available */
+			/* can't be disqualified by this one */
+			break;
+			
+		default:
+			/* if we're here, there's a bug in ib_path_rec_mtu_sel() */
+			CL_ASSERT( FALSE );
+			break;
+		}
+	}
+
+	return(TRUE);
+}
+
+/**********************************************************************
+ **********************************************************************/
 static ib_api_status_t
 __osm_mpr_rcv_get_path_parms(
   IN osm_mpr_rcv_t*		const p_rcv,
@@ -195,6 +264,23 @@ __osm_mpr_rcv_get_path_parms(
   mtu = ib_port_info_get_mtu_cap( p_pi );
   rate = ib_port_info_compute_rate( p_pi );
 
+  /* 
+	  Mellanox Tavor device performance is better using 1K MTU.
+	  If required MTU and MTU selector are such that 1K is OK 
+	  and one of the ends of the path is Tavor we override the
+	  port MTU with 1K.
+  */
+  if ( p_rcv->p_subn->opt.enable_quirks &&
+		 __osm_sa_multipath_rec_apply_tavor_mtu_limit(
+			 p_mpr, p_src_port, p_dest_port, comp_mask) )
+	  if (mtu > IB_MTU_LEN_1024) 
+	  {
+		  mtu = IB_MTU_LEN_1024;
+  		  osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
+					  "__osm_mpr_rcv_get_path_parms: "
+					  "Optimized Path MTU to 1K for Mellanox Tavor device\n");
+	  }
+
   if ( comp_mask & IB_MPR_COMPMASK_RAWTRAFFIC &&
        cl_ntoh32( p_mpr->hop_flow_raw ) & ( 1<<31 ) )
     required_pkey = osm_physp_find_common_pkey( p_physp, p_dest_physp );
Index: opensm/osm_subnet.c
===================================================================
--- opensm/osm_subnet.c	(revision 9493)
+++ opensm/osm_subnet.c	(working copy)
@@ -494,6 +494,7 @@ osm_subn_set_default_opt(
   p_opt->ucast_dump_file = NULL;
   p_opt->updn_guid_file = NULL;
   p_opt->exit_on_fatal = TRUE;
+  p_opt->enable_quirks = FALSE;
   subn_set_default_qos_options(&p_opt->qos_options);
   subn_set_default_qos_options(&p_opt->qos_ca_options);
   subn_set_default_qos_options(&p_opt->qos_sw0_options);
@@ -979,6 +980,10 @@ osm_subn_parse_conf_file(
       subn_parse_qos_options("qos_rtr",
         p_key, p_val, &p_opts->qos_rtr_options);
 
+      __osm_subn_opts_unpack_boolean(
+        "enable_quirks",
+        p_key, p_val, &p_opts->enable_quirks);
+
     }
   }
   fclose(opts_file);
@@ -1179,11 +1184,15 @@ osm_subn_write_conf_file(
     "force_log_flush %s\n\n"
     "# Log file to be used\n"
     "log_file %s\n\n"
+	 "# Limit the the size of the log file. If overrun log is restarted\n"
     "log_max_size %lu\n\n"
+	 "# If TRUE will accumulate the log over multiple OpenSM sessions\n"
     "accum_log_file %s\n\n"
     "# The directory to hold the file OpenSM dumps\n"
     "dump_files_dir %s\n\n"
-    "# If TRUE if OpenSM should disable multicast support\n"
+	 "# If TRUE enables new high risk options and hardware specific quirks\n"
+	 "enable_quirks %s\n\n"
+    "# If TRUE OpenSM should disable multicast support\n"  
     "no_multicast_option %s\n\n"
     "# No multicast routing is performed if TRUE\n"
     "disable_multicast %s\n\n"
@@ -1195,6 +1204,7 @@ osm_subn_write_conf_file(
     p_opts->log_max_size,
     p_opts->accum_log_file ? "TRUE" : "FALSE",
     p_opts->dump_files_dir,
+    p_opts->enable_quirks ? "TRUE" : "FALSE",
     p_opts->no_multicast_option ? "TRUE" : "FALSE",
     p_opts->disable_multicast ? "TRUE" : "FALSE",
     p_opts->exit_on_fatal ? "TRUE" : "FALSE"
Index: opensm/osm_helper.c
===================================================================
--- opensm/osm_helper.c	(revision 9493)
+++ opensm/osm_helper.c	(working copy)
@@ -2289,24 +2289,6 @@ osm_get_node_type_str_fixed_width(
   return( __osm_node_type_str_fixed_width[node_type] );
 }
 
-#define OSM_VENDOR_ID_INTEL         0x00D0B7
-#define OSM_VENDOR_ID_MELLANOX      0x0002C9
-#define OSM_VENDOR_ID_REDSWITCH     0x000617
-#define OSM_VENDOR_ID_SILVERSTORM     0x00066A
-#define OSM_VENDOR_ID_TOPSPIN    0x0005AD
-#define OSM_VENDOR_ID_FUJITSU    0x00E000
-#define OSM_VENDOR_ID_FUJITSU2      0x000B5D
-#define OSM_VENDOR_ID_VOLTAIRE   0x0008F1
-#define OSM_VENDOR_ID_YOTTAYOTTA    0x000453
-#define OSM_VENDOR_ID_PATHSCALE     0x001175
-#define OSM_VENDOR_ID_IBM           0x000255
-#define OSM_VENDOR_ID_DIVERGENET    0x00084E
-#define OSM_VENDOR_ID_FLEXTRONICS   0x000B8C
-#define OSM_VENDOR_ID_AGILENT       0x0030D3
-#define OSM_VENDOR_ID_OBSIDIAN      0x001777
-#define OSM_VENDOR_ID_BAYMICRO      0x000BC1
-#define OSM_VENDOR_ID_LSILOGIC      0x00A0B8
-
 /**********************************************************************
  **********************************************************************/
 const char*
Index: opensm/osm_sa_path_record.c
===================================================================
--- opensm/osm_sa_path_record.c	(revision 9493)
+++ opensm/osm_sa_path_record.c	(working copy)
@@ -57,6 +57,7 @@
 #include <complib/cl_passivelock.h>
 #include <complib/cl_debug.h>
 #include <complib/cl_qlist.h>
+#include <opensm/osm_base.h>
 #include <opensm/osm_sa_path_record.h>
 #include <opensm/osm_port.h>
 #include <opensm/osm_node.h>
@@ -150,6 +151,75 @@ osm_pr_rcv_init(
 
 /**********************************************************************
  **********************************************************************/
+static inline boolean_t
+__osm_sa_path_rec_is_tavor_port(
+	IN const osm_port_t*     const p_port)
+{
+	osm_node_t const* p_node;
+	ib_net32_t vend_id;
+
+	p_node = osm_port_get_parent_node( p_port );
+	vend_id = ib_node_info_get_vendor_id( &p_node->node_info );
+	
+	return( (p_node->node_info.device_id == CL_HTON16(23108)) &&
+			  ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || 
+				(vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || 
+				(vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || 
+				(vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE))));
+}
+
+/**********************************************************************
+ **********************************************************************/
+static boolean_t
+ __osm_sa_path_rec_apply_tavor_mtu_limit(
+  IN const ib_path_rec_t*  const p_pr,
+  IN const osm_port_t*     const p_src_port,
+  IN const osm_port_t*     const p_dest_port,
+  IN const ib_net64_t      comp_mask)
+{
+	uint8_t   required_mtu;
+	
+	/* only if one of the ports is a Tavor device */
+	if (! __osm_sa_path_rec_is_tavor_port(p_src_port) && 
+		 ! __osm_sa_path_rec_is_tavor_port(p_dest_port) )
+		return( FALSE );
+	
+	/*
+	  we can apply the patch if either:
+	  1. No MTU required
+	  2. Required MTU < 
+	  3. Required MTU = 1K or 512 or 256
+	  4. Required MTU > 256 or 512
+	*/
+	required_mtu = ib_path_rec_mtu( p_pr );
+	if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) &&
+		  ( comp_mask & IB_PR_COMPMASK_MTU ) )
+	{
+		switch( ib_path_rec_mtu_sel( p_pr ) )
+		{
+		case 0:    /* must be greater than */
+		case 2:    /* exact match */
+			if( IB_MTU_LEN_1024 < required_mtu )
+				return(FALSE);
+			break;
+
+		case 1:    /* must be less than */
+		case 3:    /* largest available */
+			/* can't be disqualified by this one */
+			break;
+			
+		default:
+			/* if we're here, there's a bug in ib_path_rec_mtu_sel() */
+			CL_ASSERT( FALSE );
+			break;
+		}
+	}
+
+	return(TRUE);
+}
+
+/**********************************************************************
+ **********************************************************************/
 static ib_api_status_t
 __osm_pr_rcv_get_path_parms(
   IN osm_pr_rcv_t*         const p_rcv,
@@ -191,6 +261,23 @@ __osm_pr_rcv_get_path_parms(
   mtu = ib_port_info_get_mtu_cap( p_pi );
   rate = ib_port_info_compute_rate( p_pi );
 
+  /* 
+	  Mellanox Tavor device performance is better using 1K MTU.
+	  If required MTU and MTU selector are such that 1K is OK 
+	  and one of the ends of the path is Tavor we override the
+	  port MTU with 1K.
+  */
+  if (  p_rcv->p_subn->opt.enable_quirks &&
+		  __osm_sa_path_rec_apply_tavor_mtu_limit(
+			  p_pr, p_src_port, p_dest_port, comp_mask) )
+	  if (mtu > IB_MTU_LEN_1024) 
+	  {
+		  mtu = IB_MTU_LEN_1024;
+		  osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
+					  "__osm_pr_rcv_get_path_parms: "
+					  "Optimized Path MTU to 1K for Mellanox Tavor device\n");
+	  }
+
   /*
     Walk the subnet object from source to destination,
     tracking the most restrictive rate and mtu values along the way...
@@ -444,10 +531,10 @@ __osm_pr_rcv_get_path_parms(
   */
 
   /* we silently ignore cases where only the MTU selector is defined */
+  required_mtu = ib_path_rec_mtu( p_pr );
   if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) &&
        ( comp_mask & IB_PR_COMPMASK_MTU ) )
   {
-    required_mtu = ib_path_rec_mtu( p_pr );
     switch( ib_path_rec_mtu_sel( p_pr ) )
     {
     case 0:    /* must be greater than */


From mst at mellanox.co.il  Fri Sep 15 08:29:17 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Fri, 15 Sep 2006 18:29:17 +0300
Subject: [openib-general] Any chance to get 32-Bit libraries on SLES9
	x86_64?
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD222029AC451@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD222029AC451@wdtssmail01.eu.thmulti.com>
Message-ID: <20060915152917.GC25880@mellanox.co.il>

Quoting r. Bub Thomas <thomas.bub at thomson.net>:
> Subject: Any chance to get 32-Bit libraries on SLES9 x86_64?
> 
> Is there any chance/trick to get 32-Bit Libraries build and usable on SLES9 x86_64?
> 
> When I installed OFED-1.1-rc4 I get:
> 
>  
> 
> WARNING: sysfsutils 32-bit version is required to build 32-bit libibverbs package.
> 
> WARNING: Skiping build of 32-bit libraries.
> 
> I googled around and didn't find any sysfsutils 32-bit for SLES9.
> 
> I know that it is working under SLES10, but our customer base is on SLES9 and very conservative when it comes to using the latest and greatest OS/distribution.
> 
> Thomas
> 

Well, you need a 32-bit libsysfs, otherwise nothing will work.

-- 
MST


From sashak at voltaire.com  Fri Sep 15 15:17:09 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sat, 16 Sep 2006 01:17:09 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 devices
In-Reply-To: <86y7sle4kg.fsf@mtl066.yok.mtl.com>
References: <86y7sle4kg.fsf@mtl066.yok.mtl.com>
Message-ID: <20060915221709.GB5891@sashak.voltaire.com>

Hi Eitan,

Some comments about the patch.

On 14:45 Fri 15 Sep     , Eitan Zahavi wrote:
> Hi Hal
> 
> The following patch solves an issue with OpenSM preferring largest MTU 
> for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor)
> devices instead of using a 1K MTU which is best for this device.
> 
> Since this is a device specific quirk I have added a configuration option
> named enable_quirks which is FALSE by default to enable this functionality.
> 
> To summarize the functionality change:
> 1. Added enable_quirks option 
> 2. If enable_quirks is FALSE do nothing

I see those quirks are SA specific. Then should this option be called
'enable_sa_quirks' instead?

> 3. If a specific MTU is requested (either =2K or >1K) do nothing
> 4. If either source port or destination port is a Tavor device  
> 	MTU is limited to 1K (can be further reduced by path traversal) 
> 
> Target is both trunk and OFED 1.1
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>
> 
> Index: include/opensm/osm_subnet.h
> ===================================================================
> --- include/opensm/osm_subnet.h	(revision 9493)
> +++ include/opensm/osm_subnet.h	(working copy)
> @@ -286,6 +286,7 @@ typedef struct _osm_subn_opt
>    osm_qos_options_t        qos_sw0_options;
>    osm_qos_options_t        qos_swe_options;
>    osm_qos_options_t        qos_rtr_options;
> +  boolean_t                enable_quirks;
>  } osm_subn_opt_t;
>  /*
>  * FIELDS
> @@ -469,6 +470,10 @@ typedef struct _osm_subn_opt
>  *	qos_rtr_options
>  *		QoS options for router ports
>  *
> +*  enable_quirks
> +*     Enable high risk new features and not fully qualified 
> +*     hardware specific work arounds
> +*
>  * SEE ALSO
>  *	Subnet object
>  *********/
> Index: include/opensm/osm_base.h
> ===================================================================
> --- include/opensm/osm_base.h	(revision 9493)
> +++ include/opensm/osm_base.h	(working copy)
> @@ -778,6 +778,34 @@ typedef enum _osm_mcast_req_type
>  #define MAX_UPDN_GUID_FILE_LINE_LENGTH 120
>  /**********/
>  
> +/****s* OpenSM: Base/VendorOUIs
> +* NAME
> +*	VendorOUIs
> +*
> +* DESCRIPTION
> +*	Known device vendor ID and GUID OUIs
> +*
> +* SYNOPSIS
> +*/
> +#define OSM_VENDOR_ID_INTEL         0x00D0B7
> +#define OSM_VENDOR_ID_MELLANOX      0x0002C9
> +#define OSM_VENDOR_ID_REDSWITCH     0x000617
> +#define OSM_VENDOR_ID_SILVERSTORM   0x00066A
> +#define OSM_VENDOR_ID_TOPSPIN       0x0005AD
> +#define OSM_VENDOR_ID_FUJITSU       0x00E000
> +#define OSM_VENDOR_ID_FUJITSU2      0x000B5D
> +#define OSM_VENDOR_ID_VOLTAIRE      0x0008F1
> +#define OSM_VENDOR_ID_YOTTAYOTTA    0x000453
> +#define OSM_VENDOR_ID_PATHSCALE     0x001175
> +#define OSM_VENDOR_ID_IBM           0x000255
> +#define OSM_VENDOR_ID_DIVERGENET    0x00084E
> +#define OSM_VENDOR_ID_FLEXTRONICS   0x000B8C
> +#define OSM_VENDOR_ID_AGILENT       0x0030D3
> +#define OSM_VENDOR_ID_OBSIDIAN      0x001777
> +#define OSM_VENDOR_ID_BAYMICRO      0x000BC1
> +#define OSM_VENDOR_ID_LSILOGIC      0x00A0B8
> +/**********/
> +
>  END_C_DECLS
>  
>  #endif	/* _OSM_BASE_H_ */
> Index: opensm/osm_sa_multipath_record.c
> ===================================================================
> --- opensm/osm_sa_multipath_record.c	(revision 9493)
> +++ opensm/osm_sa_multipath_record.c	(working copy)
> @@ -150,6 +150,75 @@ osm_mpr_rcv_init(
>  
>  /**********************************************************************
>   **********************************************************************/
> +static inline boolean_t
> +__osm_sa_multipath_rec_is_tavor_port(
> +	IN const osm_port_t*     const p_port)
> +{
> +	osm_node_t const* p_node;
> +	ib_net32_t vend_id;
> +
> +	p_node = osm_port_get_parent_node( p_port );
> +	vend_id = ib_node_info_get_vendor_id( &p_node->node_info );
> +	
> +	return( (p_node->node_info.device_id == CL_HTON16(23108)) &&
> +			  ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || 
> +				(vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || 
> +				(vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || 
> +				(vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE))));
> +}
> +
> +/**********************************************************************
> + **********************************************************************/
> +boolean_t
> + __osm_sa_multipath_rec_apply_tavor_mtu_limit(
> +  IN const ib_multipath_rec_t*  const p_mpr,
> +  IN const osm_port_t*          const p_src_port,
> +  IN const osm_port_t*          const p_dest_port,
> +  IN const ib_net64_t           comp_mask)
> +{
> +	uint8_t   required_mtu;
> +	
> +	/* only if one of the ports is a Tavor device */
> +	if (! __osm_sa_multipath_rec_is_tavor_port(p_src_port) && 
> +		 ! __osm_sa_multipath_rec_is_tavor_port(p_dest_port) )
> +		return( FALSE );
> +	
> +	/*
> +	  we can apply the patch if either:
> +	  1. No MTU required
> +	  2. Required MTU < 
> +	  3. Required MTU = 1K or 512 or 256
> +	  4. Required MTU > 256 or 512
> +	*/
> +	required_mtu = ib_multipath_rec_mtu( p_mpr );
> +	if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) &&
> +		  ( comp_mask & IB_PR_COMPMASK_MTU ) )

Shouldn't this be IB_MPR_COMPMASK_* instead of IB_PR_COMPMASK_*?
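
To illustrate why the constant family matters: a component mask is a bit
field whose bit positions are defined per record type, so testing a
MultiPathRecord mask with PathRecord constants looks at unrelated bits.
A minimal sketch, assuming made-up bit positions and ignoring network
byte order (the EX_* names and values are hypothetical, not the real
ib_types.h definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only; the real
 * IB_PR_COMPMASK_* and IB_MPR_COMPMASK_* values live in ib_types.h. */
#define EX_PR_COMPMASK_MTUSELEC   (UINT64_C(1) << 16)
#define EX_PR_COMPMASK_MTU        (UINT64_C(1) << 17)
#define EX_MPR_COMPMASK_MTUSELEC  (UINT64_C(1) << 10)
#define EX_MPR_COMPMASK_MTU       (UINT64_C(1) << 11)

/* Returns 1 when both the MTU selector and the MTU components are set. */
static int ex_has_mtu_constraint(uint64_t comp_mask,
				 uint64_t mtusel_bit, uint64_t mtu_bit)
{
	assert(mtusel_bit != 0 && mtu_bit != 0);
	return (comp_mask & mtusel_bit) && (comp_mask & mtu_bit);
}
```

With these example values, a mask built from the EX_MPR_* bits is
reported as "no MTU constraint" when it is tested with the EX_PR_* bits,
so the selector check would be skipped silently.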

> +	{
> +		switch( ib_multipath_rec_mtu_sel( p_mpr ) )
> +		{
> +		case 0:    /* must be greater than */
> +		case 2:    /* exact match */
> +			if( IB_MTU_LEN_1024 < required_mtu )
> +				return(FALSE);
> +			break;
> +
> +		case 1:    /* must be less than */
> +		case 3:    /* largest available */
> +			/* can't be disqualified by this one */
> +			break;
> +			
> +		default:
> +			/* if we're here, there's a bug in ib_path_rec_mtu_sel() */
> +			CL_ASSERT( FALSE );
> +			break;
> +		}
> +	}
> +
> +	return(TRUE);
> +}
> +
> +/**********************************************************************
> + **********************************************************************/
>  static ib_api_status_t
>  __osm_mpr_rcv_get_path_parms(
>    IN osm_mpr_rcv_t*		const p_rcv,
> @@ -195,6 +264,23 @@ __osm_mpr_rcv_get_path_parms(
>    mtu = ib_port_info_get_mtu_cap( p_pi );
>    rate = ib_port_info_compute_rate( p_pi );
>  
> +  /* 
> +	  Mellanox Tavor device performance is better using 1K MTU.
> +	  If required MTU and MTU selector are such that 1K is OK 
> +	  and one of the ends of the path is Tavor we override the
> +	  port MTU with 1K.
> +  */
> +  if ( p_rcv->p_subn->opt.enable_quirks &&
> +		 __osm_sa_multipath_rec_apply_tavor_mtu_limit(
> +			 p_mpr, p_src_port, p_dest_port, comp_mask) )
> +	  if (mtu > IB_MTU_LEN_1024) 
> +	  {
> +		  mtu = IB_MTU_LEN_1024;
> +  		  osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
> +					  "__osm_mpr_rcv_get_path_parms: "
> +					  "Optimized Path MTU to 1K for Mellanox Tavor device\n");
> +	  }
> +

This part is purely hardcoded, isn't it? Could it at least be isolated in
a single call such as 'osm_*_do_quirks()' or something like that?
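
A rough sketch of what such a single entry point might look like. It
mirrors the selector test from the patch as posted, but works on plain
values instead of the OpenSM objects. The ex_* names and the context
struct are hypothetical and only serve the illustration:

```c
#include <assert.h>
#include <stdint.h>

/* IBA MTU encodings: 1=256, 2=512, 3=1024, 4=2048, 5=4096 bytes. */
enum { EX_MTU_256 = 1, EX_MTU_512, EX_MTU_1024, EX_MTU_2048, EX_MTU_4096 };

/* Hypothetical, already-decoded view of the request and the path ends. */
struct ex_quirk_ctx {
	int     enable_quirks;  /* value of the enable_quirks option */
	int     src_is_tavor;   /* source port is an MT23108 */
	int     dest_is_tavor;  /* destination port is an MT23108 */
	int     has_mtu_req;    /* MTU selector and MTU both in comp_mask */
	uint8_t mtu_sel;        /* 0: >, 1: <, 2: ==, 3: largest available */
	uint8_t required_mtu;   /* requested MTU (encoded) */
};

/* Returns the MTU to start the path walk with, after applying quirks. */
static uint8_t ex_sa_do_quirks(const struct ex_quirk_ctx *c, uint8_t mtu)
{
	if (!c->enable_quirks || (!c->src_is_tavor && !c->dest_is_tavor))
		return mtu;

	if (c->has_mtu_req) {
		switch (c->mtu_sel) {
		case 0: /* must be greater than */
		case 2: /* exact match */
			/* same test as in the posted patch */
			if (c->required_mtu > EX_MTU_1024)
				return mtu;
			break;
		case 1: /* must be less than */
		case 3: /* largest available */
			break;
		default:
			assert(0); /* the selector is a 2-bit field */
			break;
		}
	}
	return mtu > EX_MTU_1024 ? EX_MTU_1024 : mtu;
}
```

The PR and MPR code would then only fill in the context and call this
one function, leaving the device-specific decisions in one place.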

>    if ( comp_mask & IB_MPR_COMPMASK_RAWTRAFFIC &&
>         cl_ntoh32( p_mpr->hop_flow_raw ) & ( 1<<31 ) )
>      required_pkey = osm_physp_find_common_pkey( p_physp, p_dest_physp );
> Index: opensm/osm_subnet.c
> ===================================================================
> --- opensm/osm_subnet.c	(revision 9493)
> +++ opensm/osm_subnet.c	(working copy)
> @@ -494,6 +494,7 @@ osm_subn_set_default_opt(
>    p_opt->ucast_dump_file = NULL;
>    p_opt->updn_guid_file = NULL;
>    p_opt->exit_on_fatal = TRUE;
> +  p_opt->enable_quirks = FALSE;
>    subn_set_default_qos_options(&p_opt->qos_options);
>    subn_set_default_qos_options(&p_opt->qos_ca_options);
>    subn_set_default_qos_options(&p_opt->qos_sw0_options);
> @@ -979,6 +980,10 @@ osm_subn_parse_conf_file(
>        subn_parse_qos_options("qos_rtr",
>          p_key, p_val, &p_opts->qos_rtr_options);
>  
> +      __osm_subn_opts_unpack_boolean(
> +        "enable_quirks",
> +        p_key, p_val, &p_opts->enable_quirks);
> +
>      }
>    }
>    fclose(opts_file);
> @@ -1179,11 +1184,15 @@ osm_subn_write_conf_file(
>      "force_log_flush %s\n\n"
>      "# Log file to be used\n"
>      "log_file %s\n\n"
> +	 "# Limit the the size of the log file. If overrun log is restarted\n"
>      "log_max_size %lu\n\n"
> +	 "# If TRUE will accumulate the log over multiple OpenSM sessions\n"
>      "accum_log_file %s\n\n"
>      "# The directory to hold the file OpenSM dumps\n"
>      "dump_files_dir %s\n\n"
> -    "# If TRUE if OpenSM should disable multicast support\n"
> +	 "# If TRUE enables new high risk options and hardware specific quirks\n"
> +	 "enable_quirks %s\n\n"
> +    "# If TRUE OpenSM should disable multicast support\n"  
>      "no_multicast_option %s\n\n"
>      "# No multicast routing is performed if TRUE\n"
>      "disable_multicast %s\n\n"
> @@ -1195,6 +1204,7 @@ osm_subn_write_conf_file(
>      p_opts->log_max_size,
>      p_opts->accum_log_file ? "TRUE" : "FALSE",
>      p_opts->dump_files_dir,
> +    p_opts->enable_quirks ? "TRUE" : "FALSE",
>      p_opts->no_multicast_option ? "TRUE" : "FALSE",
>      p_opts->disable_multicast ? "TRUE" : "FALSE",
>      p_opts->exit_on_fatal ? "TRUE" : "FALSE"
> Index: opensm/osm_helper.c
> ===================================================================
> --- opensm/osm_helper.c	(revision 9493)
> +++ opensm/osm_helper.c	(working copy)
> @@ -2289,24 +2289,6 @@ osm_get_node_type_str_fixed_width(
>    return( __osm_node_type_str_fixed_width[node_type] );
>  }
>  
> -#define OSM_VENDOR_ID_INTEL         0x00D0B7
> -#define OSM_VENDOR_ID_MELLANOX      0x0002C9
> -#define OSM_VENDOR_ID_REDSWITCH     0x000617
> -#define OSM_VENDOR_ID_SILVERSTORM     0x00066A
> -#define OSM_VENDOR_ID_TOPSPIN    0x0005AD
> -#define OSM_VENDOR_ID_FUJITSU    0x00E000
> -#define OSM_VENDOR_ID_FUJITSU2      0x000B5D
> -#define OSM_VENDOR_ID_VOLTAIRE   0x0008F1
> -#define OSM_VENDOR_ID_YOTTAYOTTA    0x000453
> -#define OSM_VENDOR_ID_PATHSCALE     0x001175
> -#define OSM_VENDOR_ID_IBM           0x000255
> -#define OSM_VENDOR_ID_DIVERGENET    0x00084E
> -#define OSM_VENDOR_ID_FLEXTRONICS   0x000B8C
> -#define OSM_VENDOR_ID_AGILENT       0x0030D3
> -#define OSM_VENDOR_ID_OBSIDIAN      0x001777
> -#define OSM_VENDOR_ID_BAYMICRO      0x000BC1
> -#define OSM_VENDOR_ID_LSILOGIC      0x00A0B8
> -
>  /**********************************************************************
>   **********************************************************************/
>  const char*
> Index: opensm/osm_sa_path_record.c
> ===================================================================
> --- opensm/osm_sa_path_record.c	(revision 9493)
> +++ opensm/osm_sa_path_record.c	(working copy)
> @@ -57,6 +57,7 @@
>  #include <complib/cl_passivelock.h>
>  #include <complib/cl_debug.h>
>  #include <complib/cl_qlist.h>
> +#include <opensm/osm_base.h>
>  #include <opensm/osm_sa_path_record.h>
>  #include <opensm/osm_port.h>
>  #include <opensm/osm_node.h>
> @@ -150,6 +151,75 @@ osm_pr_rcv_init(
>  
>  /**********************************************************************
>   **********************************************************************/
> +static inline boolean_t
> +__osm_sa_path_rec_is_tavor_port(
> +	IN const osm_port_t*     const p_port)
> +{
> +	osm_node_t const* p_node;
> +	ib_net32_t vend_id;
> +
> +	p_node = osm_port_get_parent_node( p_port );
> +	vend_id = ib_node_info_get_vendor_id( &p_node->node_info );
> +	
> +	return( (p_node->node_info.device_id == CL_HTON16(23108)) &&
> +			  ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || 
> +				(vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || 
> +				(vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || 
> +				(vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE))));
> +}
> +
> +/**********************************************************************
> + **********************************************************************/
> +static boolean_t
> + __osm_sa_path_rec_apply_tavor_mtu_limit(
> +  IN const ib_path_rec_t*  const p_pr,
> +  IN const osm_port_t*     const p_src_port,
> +  IN const osm_port_t*     const p_dest_port,
> +  IN const ib_net64_t      comp_mask)
> +{
> +	uint8_t   required_mtu;
> +	
> +	/* only if one of the ports is a Tavor device */
> +	if (! __osm_sa_path_rec_is_tavor_port(p_src_port) && 
> +		 ! __osm_sa_path_rec_is_tavor_port(p_dest_port) )
> +		return( FALSE );
> +	
> +	/*
> +	  we can apply the patch if either:
> +	  1. No MTU required
> +	  2. Required MTU < 
> +	  3. Required MTU = 1K or 512 or 256
> +	  4. Required MTU > 256 or 512
> +	*/
> +	required_mtu = ib_path_rec_mtu( p_pr );
> +	if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) &&
> +		  ( comp_mask & IB_PR_COMPMASK_MTU ) )
> +	{
> +		switch( ib_path_rec_mtu_sel( p_pr ) )
> +		{
> +		case 0:    /* must be greater than */
> +		case 2:    /* exact match */
> +			if( IB_MTU_LEN_1024 < required_mtu )
> +				return(FALSE);
> +			break;
> +
> +		case 1:    /* must be less than */
> +		case 3:    /* largest available */
> +			/* can't be disqualified by this one */
> +			break;
> +			
> +		default:
> +			/* if we're here, there's a bug in ib_path_rec_mtu_sel() */
> +			CL_ASSERT( FALSE );
> +			break;
> +		}
> +	}
> +
> +	return(TRUE);
> +}
> +
> +/**********************************************************************
> + **********************************************************************/
>  static ib_api_status_t
>  __osm_pr_rcv_get_path_parms(
>    IN osm_pr_rcv_t*         const p_rcv,
> @@ -191,6 +261,23 @@ __osm_pr_rcv_get_path_parms(
>    mtu = ib_port_info_get_mtu_cap( p_pi );
>    rate = ib_port_info_compute_rate( p_pi );
>  
> +  /* 
> +	  Mellanox Tavor device performance is better using 1K MTU.
> +	  If required MTU and MTU selector are such that 1K is OK 
> +	  and one of the ends of the path is Tavor we override the
> +	  port MTU with 1K.
> +  */
> +  if (  p_rcv->p_subn->opt.enable_quirks &&
> +		  __osm_sa_path_rec_apply_tavor_mtu_limit(
> +			  p_pr, p_src_port, p_dest_port, comp_mask) )
> +	  if (mtu > IB_MTU_LEN_1024) 
> +	  {
> +		  mtu = IB_MTU_LEN_1024;
> +		  osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
> +					  "__osm_pr_rcv_get_path_parms: "
> +					  "Optimized Path MTU to 1K for Mellanox Tavor device\n");
> +	  }
> +

The same applies here (about the hardcoding).

Also, I see that the Tavor-specific functions are pretty similar for the
PR and MPR cases. Why not share them in something like osm_sa_quirks.c?
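
For instance, the device check could exist once and be used by both
callers. A minimal sketch, assuming the vendor and device IDs have
already been converted to host byte order (the patch compares them in
network order). The ex_* names are hypothetical; the OUI and device ID
values are the ones listed in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Vendor OUIs and device ID as listed in the patch (host order here). */
#define EX_VENDOR_ID_MELLANOX     0x0002C9
#define EX_VENDOR_ID_TOPSPIN      0x0005AD
#define EX_VENDOR_ID_SILVERSTORM  0x00066A
#define EX_VENDOR_ID_VOLTAIRE     0x0008F1
#define EX_DEVICE_ID_TAVOR        23108

/* Hypothetical shared helper: one definition for the PR and MPR paths. */
static int ex_sa_is_tavor(uint32_t vendor_id, uint16_t device_id)
{
	static const uint32_t vendors[] = {
		EX_VENDOR_ID_MELLANOX, EX_VENDOR_ID_TOPSPIN,
		EX_VENDOR_ID_SILVERSTORM, EX_VENDOR_ID_VOLTAIRE,
	};
	size_t i;

	assert(vendor_id <= 0xFFFFFF); /* OUIs are 24 bits wide */

	if (device_id != EX_DEVICE_ID_TAVOR)
		return 0;
	for (i = 0; i < sizeof(vendors) / sizeof(vendors[0]); i++)
		if (vendors[i] == vendor_id)
			return 1;
	return 0;
}
```

Keeping the vendor list in a table also makes it a one-line change to
add another OEM of the same device.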

Sasha

>    /*
>      Walk the subnet object from source to destination,
>      tracking the most restrictive rate and mtu values along the way...
> @@ -444,10 +531,10 @@ __osm_pr_rcv_get_path_parms(
>    */
>  
>    /* we silently ignore cases where only the MTU selector is defined */
> +  required_mtu = ib_path_rec_mtu( p_pr );
>    if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) &&
>         ( comp_mask & IB_PR_COMPMASK_MTU ) )
>    {
> -    required_mtu = ib_path_rec_mtu( p_pr );
>      switch( ib_path_rec_mtu_sel( p_pr ) )
>      {
>      case 0:    /* must be greater than */
> 
> 


From halr at voltaire.com  Fri Sep 15 16:06:35 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Sep 2006 19:06:35 -0400
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 devices
In-Reply-To: <20060915221709.GB5891@sashak.voltaire.com>
References: <86y7sle4kg.fsf@mtl066.yok.mtl.com>
	<20060915221709.GB5891@sashak.voltaire.com>
Message-ID: <1158361564.25157.67561.camel@hal.voltaire.com>

Hi Sasha,

On Fri, 2006-09-15 at 18:17, Sasha Khapyorsky wrote:
> Hi Eitan,
> 
> Some comments about the patch.
> 
> On 14:45 Fri 15 Sep     , Eitan Zahavi wrote:
> > Hi Hal
> > 
> > The following patch solves an issue with OpenSM preferring largest MTU 
> > for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor)
> > devices instead of using a 1K MTU which is best for this device.
> > 
> > Since this is a device specific quirk I have added a configuration option
> > named enable_quirks which is FALSE by default to enable this functionality.
> > 
> > To summarize the functionality change:
> > 1. Added enable_quirks option 
> > 2. If enable_quirks is FALSE do nothing
> 
> I see those quirks are SA specific. Then should this option be called
> 'enable_sa_quirks' instead?

Not sure what the right "granularity" is for this. Would all quirks be
enabled at once, or would this end up being pick and choose?
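
To make the pick-and-choose alternative concrete: instead of a single
boolean, the option could be a bit mask with one flag per quirk. This is
only a sketch of the idea; none of the EX_* names below exist in OpenSM,
and the second flag is a pure placeholder:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-quirk flags; none of these names exist in OpenSM. */
#define EX_QUIRK_TAVOR_1K_MTU  (UINT32_C(1) << 0)
#define EX_QUIRK_RESERVED_1    (UINT32_C(1) << 1)  /* placeholder */
#define EX_QUIRK_ALL           UINT32_MAX

/* Returns 1 when the given quirk is enabled in the option word. */
static int ex_quirk_enabled(uint32_t quirks_opt, uint32_t quirk)
{
	assert(quirk != 0);
	return (quirks_opt & quirk) == quirk;
}
```

A value of 0 then corresponds to enable_quirks FALSE, and EX_QUIRK_ALL
to enabling everything at once.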

> > 3. If a specific MTU is requested (either =2K or >1K) do nothing
> > 4. If either source port or destination port is a Tavor device  
> > 	MTU is limited to 1K (can be further reduced by path traversal) 
> > 
> > Target is both trunk and OFED 1.1
> > 
> > Thanks
> > 
> > Eitan
> > 
> > Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>
> > 
> > Index: include/opensm/osm_subnet.h
> > ===================================================================
> > --- include/opensm/osm_subnet.h	(revision 9493)
> > +++ include/opensm/osm_subnet.h	(working copy)
> > @@ -286,6 +286,7 @@ typedef struct _osm_subn_opt
> >    osm_qos_options_t        qos_sw0_options;
> >    osm_qos_options_t        qos_swe_options;
> >    osm_qos_options_t        qos_rtr_options;
> > +  boolean_t                enable_quirks;
> >  } osm_subn_opt_t;
> >  /*
> >  * FIELDS
> > @@ -469,6 +470,10 @@ typedef struct _osm_subn_opt
> >  *	qos_rtr_options
> >  *		QoS options for router ports
> >  *
> > +*  enable_quirks
> > +*     Enable high risk new features and not fully qualified 
> > +*     hardware specific work arounds
> > +*
> >  * SEE ALSO
> >  *	Subnet object
> >  *********/
> > Index: include/opensm/osm_base.h
> > ===================================================================
> > --- include/opensm/osm_base.h	(revision 9493)
> > +++ include/opensm/osm_base.h	(working copy)
> > @@ -778,6 +778,34 @@ typedef enum _osm_mcast_req_type
> >  #define MAX_UPDN_GUID_FILE_LINE_LENGTH 120
> >  /**********/
> >  
> > +/****s* OpenSM: Base/VendorOUIs
> > +* NAME
> > +*	VendorOUIs
> > +*
> > +* DESCRIPTION
> > +*	Known device vendor ID and GUID OUIs
> > +*
> > +* SYNOPSIS
> > +*/
> > +#define OSM_VENDOR_ID_INTEL         0x00D0B7
> > +#define OSM_VENDOR_ID_MELLANOX      0x0002C9
> > +#define OSM_VENDOR_ID_REDSWITCH     0x000617
> > +#define OSM_VENDOR_ID_SILVERSTORM   0x00066A
> > +#define OSM_VENDOR_ID_TOPSPIN       0x0005AD
> > +#define OSM_VENDOR_ID_FUJITSU       0x00E000
> > +#define OSM_VENDOR_ID_FUJITSU2      0x000B5D
> > +#define OSM_VENDOR_ID_VOLTAIRE      0x0008F1
> > +#define OSM_VENDOR_ID_YOTTAYOTTA    0x000453
> > +#define OSM_VENDOR_ID_PATHSCALE     0x001175
> > +#define OSM_VENDOR_ID_IBM           0x000255
> > +#define OSM_VENDOR_ID_DIVERGENET    0x00084E
> > +#define OSM_VENDOR_ID_FLEXTRONICS   0x000B8C
> > +#define OSM_VENDOR_ID_AGILENT       0x0030D3
> > +#define OSM_VENDOR_ID_OBSIDIAN      0x001777
> > +#define OSM_VENDOR_ID_BAYMICRO      0x000BC1
> > +#define OSM_VENDOR_ID_LSILOGIC      0x00A0B8
> > +/**********/
> > +
> >  END_C_DECLS
> >  
> >  #endif	/* _OSM_BASE_H_ */
> > Index: opensm/osm_sa_multipath_record.c
> > ===================================================================
> > --- opensm/osm_sa_multipath_record.c	(revision 9493)
> > +++ opensm/osm_sa_multipath_record.c	(working copy)
> > @@ -150,6 +150,75 @@ osm_mpr_rcv_init(
> >  
> >  /**********************************************************************
> >   **********************************************************************/
> > +static inline boolean_t
> > +__osm_sa_multipath_rec_is_tavor_port(
> > +	IN const osm_port_t*     const p_port)
> > +{
> > +	osm_node_t const* p_node;
> > +	ib_net32_t vend_id;
> > +
> > +	p_node = osm_port_get_parent_node( p_port );
> > +	vend_id = ib_node_info_get_vendor_id( &p_node->node_info );
> > +	
> > +	return( (p_node->node_info.device_id == CL_HTON16(23108)) &&
> > +			  ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || 
> > +				(vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || 
> > +				(vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || 
> > +				(vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE))));
> > +}
> > +
> > +/**********************************************************************
> > + **********************************************************************/
> > +boolean_t
> > + __osm_sa_multipath_rec_apply_tavor_mtu_limit(
> > +  IN const ib_multipath_rec_t*  const p_mpr,
> > +  IN const osm_port_t*          const p_src_port,
> > +  IN const osm_port_t*          const p_dest_port,
> > +  IN const ib_net64_t           comp_mask)
> > +{
> > +	uint8_t   required_mtu;
> > +	
> > +	/* only if one of the ports is a Tavor device */
> > +	if (! __osm_sa_multipath_rec_is_tavor_port(p_src_port) && 
> > +		 ! __osm_sa_multipath_rec_is_tavor_port(p_dest_port) )
> > +		return( FALSE );
> > +	
> > +	/*
> > +	  we can apply the patch if either:
> > +	  1. No MTU required
> > +	  2. Required MTU < 
> > +	  3. Required MTU = 1K or 512 or 256
> > +	  4. Required MTU > 256 or 512
> > +	*/
> > +	required_mtu = ib_multipath_rec_mtu( p_mpr );
> > +	if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) &&
> > +		  ( comp_mask & IB_PR_COMPMASK_MTU ) )
> 
> Shouldn't this be IB_MPR_COMPMASK_* instead of IB_PR_COMPMASK_*?

Good catch.

> > +	{
> > +		switch( ib_multipath_rec_mtu_sel( p_mpr ) )
> > +		{
> > +		case 0:    /* must be greater than */
> > +		case 2:    /* exact match */
> > +			if( IB_MTU_LEN_1024 < required_mtu )
> > +				return(FALSE);
> > +			break;
> > +
> > +		case 1:    /* must be less than */
> > +		case 3:    /* largest available */
> > +			/* can't be disqualified by this one */
> > +			break;
> > +			
> > +		default:
> > +			/* if we're here, there's a bug in ib_path_rec_mtu_sel() */
> > +			CL_ASSERT( FALSE );
> > +			break;
> > +		}
> > +	}
> > +
> > +	return(TRUE);
> > +}
> > +
> > +/**********************************************************************
> > + **********************************************************************/
> >  static ib_api_status_t
> >  __osm_mpr_rcv_get_path_parms(
> >    IN osm_mpr_rcv_t*		const p_rcv,
> > @@ -195,6 +264,23 @@ __osm_mpr_rcv_get_path_parms(
> >    mtu = ib_port_info_get_mtu_cap( p_pi );
> >    rate = ib_port_info_compute_rate( p_pi );
> >  
> > +  /* 
> > +	  Mellanox Tavor device performance is better using 1K MTU.
> > +	  If required MTU and MTU selector are such that 1K is OK 
> > +	  and one of the ends of the path is Tavor we override the
> > +	  port MTU with 1K.
> > +  */
> > +  if ( p_rcv->p_subn->opt.enable_quirks &&
> > +		 __osm_sa_multipath_rec_apply_tavor_mtu_limit(
> > +			 p_mpr, p_src_port, p_dest_port, comp_mask) )
> > +	  if (mtu > IB_MTU_LEN_1024) 
> > +	  {
> > +		  mtu = IB_MTU_LEN_1024;
> > +  		  osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
> > +					  "__osm_mpr_rcv_get_path_parms: "
> > +					  "Optimized Path MTU to 1K for Mellanox Tavor device\n");
> > +	  }
> > +
> 
> > This part is purely hardcoded, isn't it? Could this at least be isolated in
> > a single call such as 'osm_*_do_quirks()' or the like?

Perhaps. This can be worked on the trunk.

> >    if ( comp_mask & IB_MPR_COMPMASK_RAWTRAFFIC &&
> >         cl_ntoh32( p_mpr->hop_flow_raw ) & ( 1<<31 ) )
> >      required_pkey = osm_physp_find_common_pkey( p_physp, p_dest_physp );
> > Index: opensm/osm_subnet.c
> > ===================================================================
> > --- opensm/osm_subnet.c	(revision 9493)
> > +++ opensm/osm_subnet.c	(working copy)
> > @@ -494,6 +494,7 @@ osm_subn_set_default_opt(
> >    p_opt->ucast_dump_file = NULL;
> >    p_opt->updn_guid_file = NULL;
> >    p_opt->exit_on_fatal = TRUE;
> > +  p_opt->enable_quirks = FALSE;
> >    subn_set_default_qos_options(&p_opt->qos_options);
> >    subn_set_default_qos_options(&p_opt->qos_ca_options);
> >    subn_set_default_qos_options(&p_opt->qos_sw0_options);
> > @@ -979,6 +980,10 @@ osm_subn_parse_conf_file(
> >        subn_parse_qos_options("qos_rtr",
> >          p_key, p_val, &p_opts->qos_rtr_options);
> >  
> > +      __osm_subn_opts_unpack_boolean(
> > +        "enable_quirks",
> > +        p_key, p_val, &p_opts->enable_quirks);
> > +
> >      }
> >    }
> >    fclose(opts_file);
> > @@ -1179,11 +1184,15 @@ osm_subn_write_conf_file(
> >      "force_log_flush %s\n\n"
> >      "# Log file to be used\n"
> >      "log_file %s\n\n"
> > +	 "# Limit the the size of the log file. If overrun log is restarted\n"
> >      "log_max_size %lu\n\n"
> > +	 "# If TRUE will accumulate the log over multiple OpenSM sessions\n"
> >      "accum_log_file %s\n\n"
> >      "# The directory to hold the file OpenSM dumps\n"
> >      "dump_files_dir %s\n\n"
> > -    "# If TRUE if OpenSM should disable multicast support\n"
> > +	 "# If TRUE enables new high risk options and hardware specific quirks\n"
> > +	 "enable_quirks %s\n\n"
> > +    "# If TRUE OpenSM should disable multicast support\n"  
> >      "no_multicast_option %s\n\n"
> >      "# No multicast routing is performed if TRUE\n"
> >      "disable_multicast %s\n\n"
> > @@ -1195,6 +1204,7 @@ osm_subn_write_conf_file(
> >      p_opts->log_max_size,
> >      p_opts->accum_log_file ? "TRUE" : "FALSE",
> >      p_opts->dump_files_dir,
> > +    p_opts->enable_quirks ? "TRUE" : "FALSE",
> >      p_opts->no_multicast_option ? "TRUE" : "FALSE",
> >      p_opts->disable_multicast ? "TRUE" : "FALSE",
> >      p_opts->exit_on_fatal ? "TRUE" : "FALSE"
> > Index: opensm/osm_helper.c
> > ===================================================================
> > --- opensm/osm_helper.c	(revision 9493)
> > +++ opensm/osm_helper.c	(working copy)
> > @@ -2289,24 +2289,6 @@ osm_get_node_type_str_fixed_width(
> >    return( __osm_node_type_str_fixed_width[node_type] );
> >  }
> >  
> > -#define OSM_VENDOR_ID_INTEL         0x00D0B7
> > -#define OSM_VENDOR_ID_MELLANOX      0x0002C9
> > -#define OSM_VENDOR_ID_REDSWITCH     0x000617
> > -#define OSM_VENDOR_ID_SILVERSTORM     0x00066A
> > -#define OSM_VENDOR_ID_TOPSPIN    0x0005AD
> > -#define OSM_VENDOR_ID_FUJITSU    0x00E000
> > -#define OSM_VENDOR_ID_FUJITSU2      0x000B5D
> > -#define OSM_VENDOR_ID_VOLTAIRE   0x0008F1
> > -#define OSM_VENDOR_ID_YOTTAYOTTA    0x000453
> > -#define OSM_VENDOR_ID_PATHSCALE     0x001175
> > -#define OSM_VENDOR_ID_IBM           0x000255
> > -#define OSM_VENDOR_ID_DIVERGENET    0x00084E
> > -#define OSM_VENDOR_ID_FLEXTRONICS   0x000B8C
> > -#define OSM_VENDOR_ID_AGILENT       0x0030D3
> > -#define OSM_VENDOR_ID_OBSIDIAN      0x001777
> > -#define OSM_VENDOR_ID_BAYMICRO      0x000BC1
> > -#define OSM_VENDOR_ID_LSILOGIC      0x00A0B8
> > -
> >  /**********************************************************************
> >   **********************************************************************/
> >  const char*
> > Index: opensm/osm_sa_path_record.c
> > ===================================================================
> > --- opensm/osm_sa_path_record.c	(revision 9493)
> > +++ opensm/osm_sa_path_record.c	(working copy)
> > @@ -57,6 +57,7 @@
> >  #include <complib/cl_passivelock.h>
> >  #include <complib/cl_debug.h>
> >  #include <complib/cl_qlist.h>
> > +#include <opensm/osm_base.h>
> >  #include <opensm/osm_sa_path_record.h>
> >  #include <opensm/osm_port.h>
> >  #include <opensm/osm_node.h>
> > @@ -150,6 +151,75 @@ osm_pr_rcv_init(
> >  
> >  /**********************************************************************
> >   **********************************************************************/
> > +static inline boolean_t
> > +__osm_sa_path_rec_is_tavor_port(
> > +	IN const osm_port_t*     const p_port)
> > +{
> > +	osm_node_t const* p_node;
> > +	ib_net32_t vend_id;
> > +
> > +	p_node = osm_port_get_parent_node( p_port );
> > +	vend_id = ib_node_info_get_vendor_id( &p_node->node_info );
> > +	
> > +	return( (p_node->node_info.device_id == CL_HTON16(23108)) &&
> > +			  ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || 
> > +				(vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || 
> > +				(vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || 
> > +				(vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE))));
> > +}
> > +
> > +/**********************************************************************
> > + **********************************************************************/
> > +static boolean_t
> > + __osm_sa_path_rec_apply_tavor_mtu_limit(
> > +  IN const ib_path_rec_t*  const p_pr,
> > +  IN const osm_port_t*     const p_src_port,
> > +  IN const osm_port_t*     const p_dest_port,
> > +  IN const ib_net64_t      comp_mask)
> > +{
> > +	uint8_t   required_mtu;
> > +	
> > +	/* only if one of the ports is a Tavor device */
> > +	if (! __osm_sa_path_rec_is_tavor_port(p_src_port) && 
> > +		 ! __osm_sa_path_rec_is_tavor_port(p_dest_port) )
> > +		return( FALSE );
> > +	
> > +	/*
> > +	  we can apply the patch if either:
> > +	  1. No MTU required
> > +	  2. Required MTU < 
> > +	  3. Required MTU = 1K or 512 or 256
> > +	  4. Required MTU > 256 or 512
> > +	*/
> > +	required_mtu = ib_path_rec_mtu( p_pr );
> > +	if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) &&
> > +		  ( comp_mask & IB_PR_COMPMASK_MTU ) )
> > +	{
> > +		switch( ib_path_rec_mtu_sel( p_pr ) )
> > +		{
> > +		case 0:    /* must be greater than */
> > +		case 2:    /* exact match */
> > +			if( IB_MTU_LEN_1024 < required_mtu )
> > +				return(FALSE);
> > +			break;
> > +
> > +		case 1:    /* must be less than */
> > +		case 3:    /* largest available */
> > +			/* can't be disqualified by this one */
> > +			break;
> > +			
> > +		default:
> > +			/* if we're here, there's a bug in ib_path_rec_mtu_sel() */
> > +			CL_ASSERT( FALSE );
> > +			break;
> > +		}
> > +	}
> > +
> > +	return(TRUE);
> > +}
> > +
> > +/**********************************************************************
> > + **********************************************************************/
> >  static ib_api_status_t
> >  __osm_pr_rcv_get_path_parms(
> >    IN osm_pr_rcv_t*         const p_rcv,
> > @@ -191,6 +261,23 @@ __osm_pr_rcv_get_path_parms(
> >    mtu = ib_port_info_get_mtu_cap( p_pi );
> >    rate = ib_port_info_compute_rate( p_pi );
> >  
> > +  /* 
> > +	  Mellanox Tavor device performance is better using 1K MTU.
> > +	  If required MTU and MTU selector are such that 1K is OK 
> > +	  and one of the ends of the path is Tavor we override the
> > +	  port MTU with 1K.
> > +  */
> > +  if (  p_rcv->p_subn->opt.enable_quirks &&
> > +		  __osm_sa_path_rec_apply_tavor_mtu_limit(
> > +			  p_pr, p_src_port, p_dest_port, comp_mask) )
> > +	  if (mtu > IB_MTU_LEN_1024) 
> > +	  {
> > +		  mtu = IB_MTU_LEN_1024;
> > +		  osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
> > +					  "__osm_pr_rcv_get_path_parms: "
> > +					  "Optimized Path MTU to 1K for Mellanox Tavor device\n");
> > +	  }
> > +
> 
> The same is here (about hardcodes).
> 
> > Also I see that the Tavor-specific functions are pretty similar for the PR and
> > MPR cases. Why not share them in something like osm_sa_quirks.c?

I think we can work on this on the trunk and see if there is an OFED 1.1
opening.

-- Hal

> Sasha
> 
> >    /*
> >      Walk the subnet object from source to destination,
> >      tracking the most restrictive rate and mtu values along the way...
> > @@ -444,10 +531,10 @@ __osm_pr_rcv_get_path_parms(
> >    */
> >  
> >    /* we silently ignore cases where only the MTU selector is defined */
> > +  required_mtu = ib_path_rec_mtu( p_pr );
> >    if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) &&
> >         ( comp_mask & IB_PR_COMPMASK_MTU ) )
> >    {
> > -    required_mtu = ib_path_rec_mtu( p_pr );
> >      switch( ib_path_rec_mtu_sel( p_pr ) )
> >      {
> >      case 0:    /* must be greater than */
> > 
> > 


From halr at voltaire.com  Fri Sep 15 16:26:00 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Sep 2006 19:26:00 -0400
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
	MT23108 devices
In-Reply-To: <86y7sle4kg.fsf@mtl066.yok.mtl.com>
References: <86y7sle4kg.fsf@mtl066.yok.mtl.com>
Message-ID: <1158362707.25157.68156.camel@hal.voltaire.com>

Hi Eitan,

On Fri, 2006-09-15 at 07:45, Eitan Zahavi wrote:
> Hi Hal
> 
> The following patch solves an issue with OpenSM preferring largest MTU 
> for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor)
> devices instead of using a 1K MTU which is best for this device.
> 
> Since this is a device specific quirk I have added a configuration option
> named enable_quirks which is FALSE by default to enable this functionality.
> 
> To summarize the functionality change:
> 1. Added enable_quirks option 
> 2. If enable_quirks is FALSE do nothing
> 3. If a specific MTU is requested (either =2K or >1K) do nothing
> 4. If either source port or destination port is a Tavor device  
> 	MTU is limited to 1K (can be further reduced by path traversal) 
> 
> Target is both trunk and OFED 1.1

Thanks. Applied to both trunk and 1.1 with the MPR compmask change that
Sasha saw and some other cosmetic changes. Please retest to be sure.

-- Hal

> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>
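
The four-point summary quoted above can be sketched as a small standalone
helper. This is only an illustration and not the code that was applied: the
sketch_* and SK_* names are hypothetical, the MTU codes are assumed to follow
the usual IB encoding (256 -> 1 up to 4096 -> 5), and the selector values are
taken from the comments in the quoted patch. The comparison against 1K mirrors
the switch statement in the patch as posted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the IB MTU codes (assumed 256 -> 1 ... 4096 -> 5). */
enum sketch_mtu {
    SK_MTU_256 = 1, SK_MTU_512, SK_MTU_1024, SK_MTU_2048, SK_MTU_4096
};

/* MTU selector values, as commented in the quoted patch. */
enum sketch_mtu_sel {
    SK_SEL_GT = 0,      /* must be greater than */
    SK_SEL_LT = 1,      /* must be less than */
    SK_SEL_EXACT = 2,   /* exact match */
    SK_SEL_LARGEST = 3  /* largest available */
};

/*
 * Returns true when the 1K limit may be applied.
 * mtu_requested stands for "both the MTU and MTU selector
 * component mask bits are set in the query".
 */
static bool sketch_tavor_limit_applies(bool enable_quirks,
                                       bool src_is_tavor,
                                       bool dest_is_tavor,
                                       bool mtu_requested,
                                       uint8_t mtu_sel,
                                       uint8_t required_mtu)
{
    /* Point 2: the option is off, so do nothing. */
    if (!enable_quirks)
        return false;

    /* Point 4: at least one end of the path must be a Tavor port. */
    if (!src_is_tavor && !dest_is_tavor)
        return false;

    /* Point 3: a request above 1K (exact or greater-than) wins. */
    if (mtu_requested &&
        (mtu_sel == SK_SEL_GT || mtu_sel == SK_SEL_EXACT) &&
        required_mtu > SK_MTU_1024)
        return false;

    return true;
}

/* Cap the port MTU at 1K when the limit applies; never raise it. */
static uint8_t sketch_path_mtu(uint8_t port_mtu, bool limit_applies)
{
    if (limit_applies && port_mtu > SK_MTU_1024)
        return SK_MTU_1024;
    return port_mtu;
}
```

Because the helper takes plain values rather than a PathRecord or
MultiPathRecord pointer, the same check could in principle serve both SA
handlers; the path traversal that follows can still lower the MTU further.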


From Sudhakar.Dindukurti at Sun.COM  Fri Sep 15 17:00:08 2006
From: Sudhakar.Dindukurti at Sun.COM (Sudhakar Dindukurti)
Date: Fri, 15 Sep 2006 17:00:08 -0700
Subject: [openib-general] Some questions on OF interfaces
Message-ID: <450B3E88.3080407@Sun.COM>

Hello,

       I am new to OpenFabrics and am trying to understand the
       OF interfaces. I would appreciate it if someone could
       provide answers to the following questions.

       1) How and when should the IB_SEND_INLINE feature be used?

       2) What are the possible values for
            struct ib_device_attr  -> page_size_cap ?

Thanks in advance,
Sudhakar


From sashak at voltaire.com  Fri Sep 15 17:04:32 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Sat, 16 Sep 2006 03:04:32 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 devices
In-Reply-To: <1158361564.25157.67561.camel@hal.voltaire.com>
References: <86y7sle4kg.fsf@mtl066.yok.mtl.com>
	<20060915221709.GB5891@sashak.voltaire.com>
	<1158361564.25157.67561.camel@hal.voltaire.com>
Message-ID: <20060916000432.GB8912@sashak.voltaire.com>

On 19:06 Fri 15 Sep     , Hal Rosenstock wrote:
> Hi Sasha,
> 
> On Fri, 2006-09-15 at 18:17, Sasha Khapyorsky wrote:
> > Hi Eitan,
> > 
> > Some comments about the patch.
> > 
> > On 14:45 Fri 15 Sep     , Eitan Zahavi wrote:
> > > Hi Hal
> > > 
> > > The following patch solves an issue with OpenSM preferring largest MTU 
> > > for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor)
> > > devices instead of using a 1K MTU which is best for this device.
> > > 
> > > Since this is a device specific quirk I have added a configuration option
> > > named enable_quirks which is FALSE by default to enable this functionality.
> > > 
> > > To summarize the functionality change:
> > > 1. Added enable_quirks option 
> > > 2. If enable_quirks is FALSE do nothing
> > 
> > I see those quirks are SA specific. Then should this option be called
> > 'enable_sa_quirks' instead?
> 
> Not sure what the right "granularity" is for this. Would all quirks be
> enabled at once or would this end up being a pick and choose ?

Of course this matters how we define this (so I was asking). Right now I
see that this is used for SA.

Sasha

> 
> > > 3. If a specific MTU is requested (either =2K or >1K) do nothing
> > > 4. If either source port or destination port is a Tavor device  
> > > 	MTU is limited to 1K (can be further reduced by path traversal) 
> > > 
> > > Target is both trunk and OFED 1.1
> > > 
> > > Thanks
> > > 
> > > Eitan
> > > 
> > > Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>
> > > 
> > > Index: include/opensm/osm_subnet.h
> > > ===================================================================
> > > --- include/opensm/osm_subnet.h	(revision 9493)
> > > +++ include/opensm/osm_subnet.h	(working copy)
> > > @@ -286,6 +286,7 @@ typedef struct _osm_subn_opt
> > >    osm_qos_options_t        qos_sw0_options;
> > >    osm_qos_options_t        qos_swe_options;
> > >    osm_qos_options_t        qos_rtr_options;
> > > +  boolean_t                enable_quirks;
> > >  } osm_subn_opt_t;
> > >  /*
> > >  * FIELDS
> > > @@ -469,6 +470,10 @@ typedef struct _osm_subn_opt
> > >  *	qos_rtr_options
> > >  *		QoS options for router ports
> > >  *
> > > +*  enable_quirks
> > > +*     Enable high risk new features and not fully qualified 
> > > +*     hardware specific work arounds
> > > +*
> > >  * SEE ALSO
> > >  *	Subnet object
> > >  *********/
> > > Index: include/opensm/osm_base.h
> > > ===================================================================
> > > --- include/opensm/osm_base.h	(revision 9493)
> > > +++ include/opensm/osm_base.h	(working copy)
> > > @@ -778,6 +778,34 @@ typedef enum _osm_mcast_req_type
> > >  #define MAX_UPDN_GUID_FILE_LINE_LENGTH 120
> > >  /**********/
> > >  
> > > +/****s* OpenSM: Base/VendorOUIs
> > > +* NAME
> > > +*	VendorOUIs
> > > +*
> > > +* DESCRIPTION
> > > +*	Known device vendor ID and GUID OUIs
> > > +*
> > > +* SYNOPSIS
> > > +*/
> > > +#define OSM_VENDOR_ID_INTEL         0x00D0B7
> > > +#define OSM_VENDOR_ID_MELLANOX      0x0002C9
> > > +#define OSM_VENDOR_ID_REDSWITCH     0x000617
> > > +#define OSM_VENDOR_ID_SILVERSTORM   0x00066A
> > > +#define OSM_VENDOR_ID_TOPSPIN       0x0005AD
> > > +#define OSM_VENDOR_ID_FUJITSU       0x00E000
> > > +#define OSM_VENDOR_ID_FUJITSU2      0x000B5D
> > > +#define OSM_VENDOR_ID_VOLTAIRE      0x0008F1
> > > +#define OSM_VENDOR_ID_YOTTAYOTTA    0x000453
> > > +#define OSM_VENDOR_ID_PATHSCALE     0x001175
> > > +#define OSM_VENDOR_ID_IBM           0x000255
> > > +#define OSM_VENDOR_ID_DIVERGENET    0x00084E
> > > +#define OSM_VENDOR_ID_FLEXTRONICS   0x000B8C
> > > +#define OSM_VENDOR_ID_AGILENT       0x0030D3
> > > +#define OSM_VENDOR_ID_OBSIDIAN      0x001777
> > > +#define OSM_VENDOR_ID_BAYMICRO      0x000BC1
> > > +#define OSM_VENDOR_ID_LSILOGIC      0x00A0B8
> > > +/**********/
> > > +
> > >  END_C_DECLS
> > >  
> > >  #endif	/* _OSM_BASE_H_ */
> > > Index: opensm/osm_sa_multipath_record.c
> > > ===================================================================
> > > --- opensm/osm_sa_multipath_record.c	(revision 9493)
> > > +++ opensm/osm_sa_multipath_record.c	(working copy)
> > > @@ -150,6 +150,75 @@ osm_mpr_rcv_init(
> > >  
> > >  /**********************************************************************
> > >   **********************************************************************/
> > > +static inline boolean_t
> > > +__osm_sa_multipath_rec_is_tavor_port(
> > > +	IN const osm_port_t*     const p_port)
> > > +{
> > > +	osm_node_t const* p_node;
> > > +	ib_net32_t vend_id;
> > > +
> > > +	p_node = osm_port_get_parent_node( p_port );
> > > +	vend_id = ib_node_info_get_vendor_id( &p_node->node_info );
> > > +	
> > > +	return( (p_node->node_info.device_id == CL_HTON16(23108)) &&
> > > +			  ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || 
> > > +				(vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || 
> > > +				(vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || 
> > > +				(vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE))));
> > > +}
> > > +
> > > +/**********************************************************************
> > > + **********************************************************************/
> > > +boolean_t
> > > + __osm_sa_multipath_rec_apply_tavor_mtu_limit(
> > > +  IN const ib_multipath_rec_t*  const p_mpr,
> > > +  IN const osm_port_t*          const p_src_port,
> > > +  IN const osm_port_t*          const p_dest_port,
> > > +  IN const ib_net64_t           comp_mask)
> > > +{
> > > +	uint8_t   required_mtu;
> > > +	
> > > +	/* only if one of the ports is a Tavor device */
> > > +	if (! __osm_sa_multipath_rec_is_tavor_port(p_src_port) && 
> > > +		 ! __osm_sa_multipath_rec_is_tavor_port(p_dest_port) )
> > > +		return( FALSE );
> > > +	
> > > +	/*
> > > +	  we can apply the patch if either:
> > > +	  1. No MTU required
> > > +	  2. Required MTU < 
> > > +	  3. Required MTU = 1K or 512 or 256
> > > +	  4. Required MTU > 256 or 512
> > > +	*/
> > > +	required_mtu = ib_multipath_rec_mtu( p_mpr );
> > > +	if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) &&
> > > +		  ( comp_mask & IB_PR_COMPMASK_MTU ) )
> > 
> > Should here be IB_MPR_COMPMASK_* instead of IB_PR_COMPMASK_*?
> 
> Good catch.
> 
> > > +	{
> > > +		switch( ib_multipath_rec_mtu_sel( p_mpr ) )
> > > +		{
> > > +		case 0:    /* must be greater than */
> > > +		case 2:    /* exact match */
> > > +			if( IB_MTU_LEN_1024 < required_mtu )
> > > +				return(FALSE);
> > > +			break;
> > > +
> > > +		case 1:    /* must be less than */
> > > +		case 3:    /* largest available */
> > > +			/* can't be disqualified by this one */
> > > +			break;
> > > +			
> > > +		default:
> > > +			/* if we're here, there's a bug in ib_path_rec_mtu_sel() */
> > > +			CL_ASSERT( FALSE );
> > > +			break;
> > > +		}
> > > +	}
> > > +
> > > +	return(TRUE);
> > > +}
> > > +
> > > +/**********************************************************************
> > > + **********************************************************************/
> > >  static ib_api_status_t
> > >  __osm_mpr_rcv_get_path_parms(
> > >    IN osm_mpr_rcv_t*		const p_rcv,
> > > @@ -195,6 +264,23 @@ __osm_mpr_rcv_get_path_parms(
> > >    mtu = ib_port_info_get_mtu_cap( p_pi );
> > >    rate = ib_port_info_compute_rate( p_pi );
> > >  
> > > +  /* 
> > > +	  Mellanox Tavor device performance is better using 1K MTU.
> > > +	  If required MTU and MTU selector are such that 1K is OK 
> > > +	  and one of the ends of the path is Tavor we override the
> > > +	  port MTU with 1K.
> > > +  */
> > > +  if ( p_rcv->p_subn->opt.enable_quirks &&
> > > +		 __osm_sa_multipath_rec_apply_tavor_mtu_limit(
> > > +			 p_mpr, p_src_port, p_dest_port, comp_mask) )
> > > +	  if (mtu > IB_MTU_LEN_1024) 
> > > +	  {
> > > +		  mtu = IB_MTU_LEN_1024;
> > > +  		  osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
> > > +					  "__osm_mpr_rcv_get_path_parms: "
> > > +					  "Optimized Path MTU to 1K for Mellanox Tavor device\n");
> > > +	  }
> > > +
> > 
> > > This part is purely hardcoded, isn't it? Could this at least be isolated in
> > > a single call such as 'osm_*_do_quirks()' or the like?
> 
> Perhaps. This can be worked on the trunk.
> 
> > >    if ( comp_mask & IB_MPR_COMPMASK_RAWTRAFFIC &&
> > >         cl_ntoh32( p_mpr->hop_flow_raw ) & ( 1<<31 ) )
> > >      required_pkey = osm_physp_find_common_pkey( p_physp, p_dest_physp );
> > > Index: opensm/osm_subnet.c
> > > ===================================================================
> > > --- opensm/osm_subnet.c	(revision 9493)
> > > +++ opensm/osm_subnet.c	(working copy)
> > > @@ -494,6 +494,7 @@ osm_subn_set_default_opt(
> > >    p_opt->ucast_dump_file = NULL;
> > >    p_opt->updn_guid_file = NULL;
> > >    p_opt->exit_on_fatal = TRUE;
> > > +  p_opt->enable_quirks = FALSE;
> > >    subn_set_default_qos_options(&p_opt->qos_options);
> > >    subn_set_default_qos_options(&p_opt->qos_ca_options);
> > >    subn_set_default_qos_options(&p_opt->qos_sw0_options);
> > > @@ -979,6 +980,10 @@ osm_subn_parse_conf_file(
> > >        subn_parse_qos_options("qos_rtr",
> > >          p_key, p_val, &p_opts->qos_rtr_options);
> > >  
> > > +      __osm_subn_opts_unpack_boolean(
> > > +        "enable_quirks",
> > > +        p_key, p_val, &p_opts->enable_quirks);
> > > +
> > >      }
> > >    }
> > >    fclose(opts_file);
> > > @@ -1179,11 +1184,15 @@ osm_subn_write_conf_file(
> > >      "force_log_flush %s\n\n"
> > >      "# Log file to be used\n"
> > >      "log_file %s\n\n"
> > > +	 "# Limit the the size of the log file. If overrun log is restarted\n"
> > >      "log_max_size %lu\n\n"
> > > +	 "# If TRUE will accumulate the log over multiple OpenSM sessions\n"
> > >      "accum_log_file %s\n\n"
> > >      "# The directory to hold the file OpenSM dumps\n"
> > >      "dump_files_dir %s\n\n"
> > > -    "# If TRUE if OpenSM should disable multicast support\n"
> > > +	 "# If TRUE enables new high risk options and hardware specific quirks\n"
> > > +	 "enable_quirks %s\n\n"
> > > +    "# If TRUE OpenSM should disable multicast support\n"  
> > >      "no_multicast_option %s\n\n"
> > >      "# No multicast routing is performed if TRUE\n"
> > >      "disable_multicast %s\n\n"
> > > @@ -1195,6 +1204,7 @@ osm_subn_write_conf_file(
> > >      p_opts->log_max_size,
> > >      p_opts->accum_log_file ? "TRUE" : "FALSE",
> > >      p_opts->dump_files_dir,
> > > +    p_opts->enable_quirks ? "TRUE" : "FALSE",
> > >      p_opts->no_multicast_option ? "TRUE" : "FALSE",
> > >      p_opts->disable_multicast ? "TRUE" : "FALSE",
> > >      p_opts->exit_on_fatal ? "TRUE" : "FALSE"
> > > Index: opensm/osm_helper.c
> > > ===================================================================
> > > --- opensm/osm_helper.c	(revision 9493)
> > > +++ opensm/osm_helper.c	(working copy)
> > > @@ -2289,24 +2289,6 @@ osm_get_node_type_str_fixed_width(
> > >    return( __osm_node_type_str_fixed_width[node_type] );
> > >  }
> > >  
> > > -#define OSM_VENDOR_ID_INTEL         0x00D0B7
> > > -#define OSM_VENDOR_ID_MELLANOX      0x0002C9
> > > -#define OSM_VENDOR_ID_REDSWITCH     0x000617
> > > -#define OSM_VENDOR_ID_SILVERSTORM     0x00066A
> > > -#define OSM_VENDOR_ID_TOPSPIN    0x0005AD
> > > -#define OSM_VENDOR_ID_FUJITSU    0x00E000
> > > -#define OSM_VENDOR_ID_FUJITSU2      0x000B5D
> > > -#define OSM_VENDOR_ID_VOLTAIRE   0x0008F1
> > > -#define OSM_VENDOR_ID_YOTTAYOTTA    0x000453
> > > -#define OSM_VENDOR_ID_PATHSCALE     0x001175
> > > -#define OSM_VENDOR_ID_IBM           0x000255
> > > -#define OSM_VENDOR_ID_DIVERGENET    0x00084E
> > > -#define OSM_VENDOR_ID_FLEXTRONICS   0x000B8C
> > > -#define OSM_VENDOR_ID_AGILENT       0x0030D3
> > > -#define OSM_VENDOR_ID_OBSIDIAN      0x001777
> > > -#define OSM_VENDOR_ID_BAYMICRO      0x000BC1
> > > -#define OSM_VENDOR_ID_LSILOGIC      0x00A0B8
> > > -
> > >  /**********************************************************************
> > >   **********************************************************************/
> > >  const char*
> > > Index: opensm/osm_sa_path_record.c
> > > ===================================================================
> > > --- opensm/osm_sa_path_record.c	(revision 9493)
> > > +++ opensm/osm_sa_path_record.c	(working copy)
> > > @@ -57,6 +57,7 @@
> > >  #include <complib/cl_passivelock.h>
> > >  #include <complib/cl_debug.h>
> > >  #include <complib/cl_qlist.h>
> > > +#include <opensm/osm_base.h>
> > >  #include <opensm/osm_sa_path_record.h>
> > >  #include <opensm/osm_port.h>
> > >  #include <opensm/osm_node.h>
> > > @@ -150,6 +151,75 @@ osm_pr_rcv_init(
> > >  
> > >  /**********************************************************************
> > >   **********************************************************************/
> > > +static inline boolean_t
> > > +__osm_sa_path_rec_is_tavor_port(
> > > +	IN const osm_port_t*     const p_port)
> > > +{
> > > +	osm_node_t const* p_node;
> > > +	ib_net32_t vend_id;
> > > +
> > > +	p_node = osm_port_get_parent_node( p_port );
> > > +	vend_id = ib_node_info_get_vendor_id( &p_node->node_info );
> > > +	
> > > +	return( (p_node->node_info.device_id == CL_HTON16(23108)) &&
> > > +			  ((vend_id == CL_HTON32(OSM_VENDOR_ID_MELLANOX)) || 
> > > +				(vend_id == CL_HTON32(OSM_VENDOR_ID_TOPSPIN)) || 
> > > +				(vend_id == CL_HTON32(OSM_VENDOR_ID_SILVERSTORM)) || 
> > > +				(vend_id == CL_HTON32(OSM_VENDOR_ID_VOLTAIRE))));
> > > +}
> > > +
> > > +/**********************************************************************
> > > + **********************************************************************/
> > > +static boolean_t
> > > + __osm_sa_path_rec_apply_tavor_mtu_limit(
> > > +  IN const ib_path_rec_t*  const p_pr,
> > > +  IN const osm_port_t*     const p_src_port,
> > > +  IN const osm_port_t*     const p_dest_port,
> > > +  IN const ib_net64_t      comp_mask)
> > > +{
> > > +	uint8_t   required_mtu;
> > > +	
> > > +	/* only if one of the ports is a Tavor device */
> > > +	if (! __osm_sa_path_rec_is_tavor_port(p_src_port) && 
> > > +		 ! __osm_sa_path_rec_is_tavor_port(p_dest_port) )
> > > +		return( FALSE );
> > > +	
> > > +	/*
> > > +	  we can apply the patch if either:
> > > +	  1. No MTU required
> > > +	  2. Required MTU < 
> > > +	  3. Required MTU = 1K or 512 or 256
> > > +	  4. Required MTU > 256 or 512
> > > +	*/
> > > +	required_mtu = ib_path_rec_mtu( p_pr );
> > > +	if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) &&
> > > +		  ( comp_mask & IB_PR_COMPMASK_MTU ) )
> > > +	{
> > > +		switch( ib_path_rec_mtu_sel( p_pr ) )
> > > +		{
> > > +		case 0:    /* must be greater than */
> > > +		case 2:    /* exact match */
> > > +			if( IB_MTU_LEN_1024 < required_mtu )
> > > +				return(FALSE);
> > > +			break;
> > > +
> > > +		case 1:    /* must be less than */
> > > +		case 3:    /* largest available */
> > > +			/* can't be disqualified by this one */
> > > +			break;
> > > +			
> > > +		default:
> > > +			/* if we're here, there's a bug in ib_path_rec_mtu_sel() */
> > > +			CL_ASSERT( FALSE );
> > > +			break;
> > > +		}
> > > +	}
> > > +
> > > +	return(TRUE);
> > > +}
> > > +
> > > +/**********************************************************************
> > > + **********************************************************************/
> > >  static ib_api_status_t
> > >  __osm_pr_rcv_get_path_parms(
> > >    IN osm_pr_rcv_t*         const p_rcv,
> > > @@ -191,6 +261,23 @@ __osm_pr_rcv_get_path_parms(
> > >    mtu = ib_port_info_get_mtu_cap( p_pi );
> > >    rate = ib_port_info_compute_rate( p_pi );
> > >  
> > > +  /* 
> > > +	  Mellanox Tavor device performance is better using 1K MTU.
> > > +	  If required MTU and MTU selector are such that 1K is OK 
> > > +	  and one of the ends of the path is Tavor we override the
> > > +	  port MTU with 1K.
> > > +  */
> > > +  if (  p_rcv->p_subn->opt.enable_quirks &&
> > > +		  __osm_sa_path_rec_apply_tavor_mtu_limit(
> > > +			  p_pr, p_src_port, p_dest_port, comp_mask) )
> > > +	  if (mtu > IB_MTU_LEN_1024) 
> > > +	  {
> > > +		  mtu = IB_MTU_LEN_1024;
> > > +		  osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
> > > +					  "__osm_pr_rcv_get_path_parms: "
> > > +					  "Optimized Path MTU to 1K for Mellanox Tavor device\n");
> > > +	  }
> > > +
> > 
> > The same is here (about hardcodes).
> > 
> > > Also I see that the Tavor-specific functions are pretty similar for the PR and
> > > MPR cases. Why not share them in something like osm_sa_quirks.c?
> 
> I think we can work on this on the trunk and see if there is an OFED 1.1
> opening.
> 
> -- Hal
> 
> > Sasha
> > 
> > >    /*
> > >      Walk the subnet object from source to destination,
> > >      tracking the most restrictive rate and mtu values along the way...
> > > @@ -444,10 +531,10 @@ __osm_pr_rcv_get_path_parms(
> > >    */
> > >  
> > >    /* we silently ignore cases where only the MTU selector is defined */
> > > +  required_mtu = ib_path_rec_mtu( p_pr );
> > >    if ( ( comp_mask & IB_PR_COMPMASK_MTUSELEC ) &&
> > >         ( comp_mask & IB_PR_COMPMASK_MTU ) )
> > >    {
> > > -    required_mtu = ib_path_rec_mtu( p_pr );
> > >      switch( ib_path_rec_mtu_sel( p_pr ) )
> > >      {
> > >      case 0:    /* must be greater than */
> > > 
> > > 
> 


From halr at voltaire.com  Fri Sep 15 17:12:21 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Sep 2006 20:12:21 -0400
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 devices
In-Reply-To: <20060916000432.GB8912@sashak.voltaire.com>
References: <86y7sle4kg.fsf@mtl066.yok.mtl.com>
	<20060915221709.GB5891@sashak.voltaire.com>
	<1158361564.25157.67561.camel@hal.voltaire.com>
	<20060916000432.GB8912@sashak.voltaire.com>
Message-ID: <1158365510.25157.69612.camel@hal.voltaire.com>

On Fri, 2006-09-15 at 20:04, Sasha Khapyorsky wrote:
[snip...]
> > > > To summarize the functionality change:
> > > > 1. Added enable_quirks option 
> > > > 2. If enable_quirks is FALSE do nothing
> > > 
> > > I see those quirks are SA specific. Then should this option be called
> > > 'enable_sa_quirks' instead?
> > 
> > Not sure what the right "granularity" is for this. Would all quirks be
> > enabled at once or would this end up being a pick and choose ?
> 
> Of course this matters how we define this (so I was asking). Right now I
> see that this is used for SA.

Would there be other SA quirks ? Would there be SM quirks ?
Are they more hardware related and then tavor_quirks would make more
sense ? 

Would all the quirks be enabled together or is it a mix and match ?

It's unclear to me which is the best way to go on these aspects.

-- Hal
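
For the "mix and match" alternative raised above, one possible shape is a
bitmask with one bit per quirk instead of a single boolean. This is purely a
sketch to make the granularity question concrete: none of the SKETCH_* names
or the helper below exist in OpenSM, and the thread does not settle on this
design.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-quirk flags; these names are illustrative only. */
#define SKETCH_QUIRK_SA_TAVOR_MTU  (1u << 0)   /* 1K MTU for Tavor paths */
#define SKETCH_QUIRK_NONE          0u
#define SKETCH_QUIRK_ALL           0xFFFFFFFFu /* "enable everything" */

/* True when every bit of the given quirk is set in the configured mask. */
static bool sketch_quirk_enabled(uint32_t quirks_mask, uint32_t quirk)
{
    return (quirks_mask & quirk) == quirk;
}
```

With such a mask, a single all-or-nothing option is just the special case of
SKETCH_QUIRK_ALL versus SKETCH_QUIRK_NONE, so either answer to the question
could be expressed.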


From bgreen at nas.nasa.gov  Sat Sep 16 00:29:15 2006
From: bgreen at nas.nasa.gov (Bryan Green)
Date: Sat, 16 Sep 2006 00:29:15 -0700
Subject: [openib-general] patch trouble
Message-ID: <200609160729.k8G7TFFl020478@ece06.nas.nasa.gov>

Hello,
Many of the patches in subversion fail to have an effect when I apply them to a kernel,
because they create headers in 'drivers/infiniband/include' which depend on being included
before the like-named headers in the toplevel 'include'.  Is there a step I am missing to
make the headers in 'drivers/infiniband/include' get chosen for inclusion first?

Here is an example of such a patch that creates a header file that never gets included:

https://openib.org/svn/gen2/branches/backport/2.6.12/gfp_6138_to_2_6_13.patch

Index: linux-2.6.9/drivers/infiniband/include/linux/types.h
===================================================================
--- /dev/null   1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.9/drivers/infiniband/include/linux/types.h      2006-04-02 14:40:14.000000000 +0200
@@ -0,0 +1,10 @@
+#ifndef LINUX_TYPES_BACKPORT_H
+#define LINUX_TYPES_BACKPORT_H
+
+#include_next <linux/types.h>
+
+#ifdef __KERNEL__
+typedef unsigned int gfp_t;
+#endif
+
+#endif


From eitan at mellanox.co.il  Sat Sep 16 05:19:45 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Sat, 16 Sep 2006 15:19:45 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 devices
In-Reply-To: <1158365510.25157.69612.camel@hal.voltaire.com>
References: <86y7sle4kg.fsf@mtl066.yok.mtl.com>
	<20060915221709.GB5891@sashak.voltaire.com>
	<1158361564.25157.67561.camel@hal.voltaire.com>
	<20060916000432.GB8912@sashak.voltaire.com>
	<1158365510.25157.69612.camel@hal.voltaire.com>
Message-ID: <450BEBE1.1060506@mellanox.co.il>

Hi Sasha, Hal,

I'm back online.

Hal Rosenstock wrote:

>On Fri, 2006-09-15 at 20:04, Sasha Khapyorsky wrote:
>[snip...]
>  
>
>>>>>To summarize the functionality change:
>>>>>1. Added enable_quirks option 
>>>>>2. If enable_quirks is FALSE do nothing
>>>>>          
>>>>>
>>>>I see those quirks are SA specific. Then should this option be called
>>>>'enable_sa_quirks' instead?
>>>>        
>>>>
>>>Not sure what the right "granularity" is for this. Would all quirks be
>>>enabled at once or would this end up being a pick and choose ?
>>>      
>>>
>>Of course this matters how we define this (so I was asking). Right now I
>>see that this is used for SA.
>>    
>>
>
>Would there be other SA quirks ? Would there be SM quirks ?
>Are they more hardware related and then tavor_quirks would make more
>sense ? 
>  
>
Who knows?

>Would all the quirks be enabled together or is it a mix and match ?
>  
>
I think that as we currently have only one quirk we can avoid this until
later, when we will have a few.

>It's unclear to me which is the best way to go on these aspects.
>
>-- Hal
>
>
>_______________________________________________
>openib-general mailing list
>openib-general at openib.org
>http://openib.org/mailman/listinfo/openib-general
>
>To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>  
>


From eitan at mellanox.co.il  Sat Sep 16 05:20:18 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Sat, 16 Sep 2006 15:20:18 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 devices
In-Reply-To: <1158362707.25157.68156.camel@hal.voltaire.com>
References: <86y7sle4kg.fsf@mtl066.yok.mtl.com>
	<1158362707.25157.68156.camel@hal.voltaire.com>
Message-ID: <450BEC02.6010805@mellanox.co.il>

Hi Hal,

Many thanks!

Eitan

Hal Rosenstock wrote:

>Hi Eitan,
>
>On Fri, 2006-09-15 at 07:45, Eitan Zahavi wrote:
>  
>
>>Hi Hal
>>
>>The following patch solves an issue with OpenSM preferring largest MTU 
>>for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor)
>>devices instead of using a 1K MTU which is best for this device.
>>
>>Since this is a device specific quirk I have added a configuration option
>>named enable_quirks which is FALSE by default to enable this functionality.
>>
>>To summarize the functionality change:
>>1. Added enable_quirks option 
>>2. If enable_quirks is FALSE do nothing
>>3. If a specific MTU is requested (either =2K or >1K) do nothing
>>4. If either source port or destination port is a Tavor device  
>>	MTU is limited to 1K (can be further reduced by path traversal) 
>>
>>Target is both trunk and OFED 1.1
>>    
>>
>
>Thanks. Applied to both trunk and 1.1 with the MPR compmask change that
>Sasha saw and some other cosmetic changes. Please retest to be sure.
>
>-- Hal
>
>  
>
>>Thanks
>>
>>Eitan
>>
>>Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>
>>    
>>
>
>
>


From mst at mellanox.co.il  Sat Sep 16 10:56:28 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sat, 16 Sep 2006 20:56:28 +0300
Subject: [openib-general] patch trouble
In-Reply-To: <200609160729.k8G7TFFl020478@ece06.nas.nasa.gov>
References: <200609160729.k8G7TFFl020478@ece06.nas.nasa.gov>
Message-ID: <20060916175628.GB22267@mellanox.co.il>

Quoting r. Bryan Green <bgreen at nas.nasa.gov>:
> Subject: patch trouble
> 
> Hello,
> Many of the patches in subversion fail to have an effect when I apply them to a kernel,
> because they create headers in 'drivers/infiniband/include' which depend on being included
> before the like-named headers in the toplevel 'include'.  Is there a step I am missing to
> make the headers in 'drivers/infiniband/include' get chosen for inclusion first?

Note that backport patches are intended to be applied in an out-of-kernel
fashion - they are not changing the kernel at all.

So you build as an out-of-tree driver, and add something like this to the make
command line:

                LINUXINCLUDE='-I$(CWD)/include \
                -I$(CWD)/drivers/infiniband/include \
                -Iinclude \
                $$(if $$(KBUILD_SRC),-Iinclude2 -I$$(srctree)/include) \
                -include include/linux/autoconf.h \
                -include $(CWD)/include/linux/autoconf.h \
                ' \

You can find an example here
https://openib.org/svn/gen2/trunk/ofed/openib/scripts/Makefile

BTW, Mellanox is not actively supporting backport patches on the svn trunk.
If you want code that works on something other than 2.6.17,
I suggest you pull backports for the ofed branch (forked from
2.6.18-rc6) from ofed_1_1 tree by pulling
git://www.mellanox.co.il/~git/infiniband ofed_1_1
and looking in ofed_scripts directory.

-- 
MST


From mst at mellanox.co.il  Sat Sep 16 21:46:26 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 17 Sep 2006 07:46:26 +0300
Subject: [openib-general] 2 SLES 10 backport directories
In-Reply-To: <450915EE.1090705@voltaire.com>
References: <450915EE.1090705@voltaire.com>
Message-ID: <20060917044626.GA26054@mellanox.co.il>

Quoting r. Erez Zilber <erezz at voltaire.com>:
> Subject: 2 SLES 10 backport directories
> 
> Michael,
> 
> I saw that there are 2 SLES 10 backport directories in the svn:
> 
> https://openib.org/svn/gen2/branches/backport/sles10/ - this one 
> contains patches that we added for SLES 10
> 
> https://openib.org/svn/gen2/branches/backport/2.6.16_sles10/ - this one 
> was added later by you.
> 
> Can we unite them?
> 
> Here's my motivation: I want to be able to install SLES 10, replace its 
> infiniband dir with infiniband from openib's svn, apply all SLES 10 
> patches (from a single directory) and then it should work.
> 
> This should help us in future OFED releases.

I'd like that too, but there's a difficulty here.

The rest of the backport patches make it possible to build
IB support out of kernel, without patching the kernel code itself.
This is an explicit requirement of some users, so we made an effort
to preserve this ability, and so far it works with the rest of the IB stack -
assuming that user has built infiniband support as a module or disabled it -
but that's what most people currently have, anyway.

Unfortunately sles10 patches for iser that you mention violate this rule - they
patch the iscsi support that is already there as part of the kernel.
So unless this can be fixed somehow, we need the iscsi stuff separate, so that
1. we know to apply it in kernel source directory, not where we unpacked IB code
2. it can be applied conditionally when the user has enabled iser, so that
   others still have the ability not to touch their kernel


-- 
MST


From erezz at voltaire.com  Sun Sep 17 02:05:15 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Sun, 17 Sep 2006 12:05:15 +0300
Subject: [openib-general] [PATCH] IB/iser: fix iSER description and
 selections in Kconfig
In-Reply-To: <adau03awjku.fsf@cisco.com>
References: <200609071902.57379.toralf.foerster@gmx.de>
	<200609101343.02740.toralf.foerster@gmx.de>
	<450401AE.2030606@voltaire.com>
	<200609101645.22695.toralf.foerster@gmx.de>
	<4505032B.3050706@voltaire.com> <ada1wqi79mb.fsf@cisco.com>
	<450912C0.8070807@voltaire.com> <45091AC4.3090005@voltaire.com>
	<adau03awjku.fsf@cisco.com>
Message-ID: <450D0FCB.1000401@voltaire.com>

Roland Dreier wrote:
> Wouldn't it better just to depend on INET the way ISCSI_TCP does?
> 'select' is more fragile and harder to maintain than 'depends' since
> you always have to make sure you select the full dependency tree of
> every option you really need.
>
>  - R.
>   
There are 3 additional required config entries: NET, INET &
INFINIBAND_RDMA_CM. Do you suggest to 'depend' on them or 'depend' on
some of them and 'select' the rest?

Also, since I'm not familiar enough with 'make randconfig', here's a
question:
if iSER 'depends' on INET, is it possible that 'make randconfig' will
enable iSER without enabling INET?

Thanks
Erez


From ogerlitz at voltaire.com  Sun Sep 17 02:57:30 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Sun, 17 Sep 2006 12:57:30 +0300
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <1158263019.8759.324.camel@brick.pathscale.com>
References: <1158108010.8759.192.camel@brick.pathscale.com>
	<45093428.5010009@voltaire.com>
	<1158263019.8759.324.camel@brick.pathscale.com>
Message-ID: <450D1C0A.90906@voltaire.com>

Ralph Campbell wrote:
> Here is my thinking so far:
> 
> The driver is passed an LKEY/RKEY plus an address.
> For ib_get_dma_mr(), the address is currently from
> dma_map_single(), dma_map_page(), or dma_map_sg().
> With the ib_dma_*() routines, I can intercept these calls
> and return something instead of a bus or IOMMU address.
> I would like to return a kernel virtual address since that
> is the simplest and is what I ultimately need. This is
> trivial for dma_map_single() and trivial for low memory
> pages for dma_map_page().
> 
> I think I can safely just return error for architectures
> with high memory pages since the driver really only works
> on 64-bit systems (for a variety of reasons which I won't
> go into) and those systems don't have high memory.

Again (and please go and check me), pages you need to DMA (ie move over 
IB) need **not** be mapped into the kernel virtual address space and 
this happens **not** only under ia32 high-memory scheme, please see my 
other email for two examples (direct I/O etc)


> ib_sg_dma_address would return the page_address() of sg->page
> but wouldn't be able to rely on other fields which might be in
> the struct scatterlist.

your design seems to rely on three fields: page, offset and length, so

ib_sg_map_sg(scat) is kmap-ping whatever pages which are not mapped now 
into kvirt

ib_dma_unmap_sg(scat) is kunmap-ping those pages you were mapping before 
(you might need an aux data structure to keep which need kunmap)

ib_sg_dma_address(scat) is page_address(scat->page) + scat->offset

ib_sg_dma_len(scat) is scat->length
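
A minimal user-space sketch of the last two definitions above, under stated
assumptions: struct sketch_sg and the sketch_* functions are hypothetical
stand-ins, not the kernel's struct scatterlist or any proposed ib_* routine,
and the page_kvaddr field stands in for what page_address(scat->page) would
return once the page has been kmap-ped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three scatterlist fields the design
 * relies on (page, offset, length). Not the kernel structure. */
struct sketch_sg {
    unsigned char *page_kvaddr; /* models page_address(sg->page); NULL if unmapped */
    unsigned int offset;        /* byte offset of the data within the page */
    unsigned int length;        /* number of bytes described by this entry */
};

/* Models "page_address(scat->page) + scat->offset".
 * Returns 0 when the page has no kernel virtual mapping. */
static uint64_t sketch_sg_dma_address(const struct sketch_sg *sg)
{
    if (sg->page_kvaddr == NULL)
        return 0;
    return (uint64_t)(uintptr_t)(sg->page_kvaddr + sg->offset);
}

/* Models "scat->length". */
static unsigned int sketch_sg_dma_len(const struct sketch_sg *sg)
{
    return sg->length;
}
```

The NULL check is where the kmap/kunmap bookkeeping mentioned above would
matter: an entry whose page was never mapped has no usable address.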

Or.


From ogerlitz at voltaire.com  Sun Sep 17 04:52:09 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Sun, 17 Sep 2006 14:52:09 +0300
Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change
 CMA config name
In-Reply-To: <aday7smwjmy.fsf@cisco.com>
References: <Pine.LNX.4.64.0609141005480.7597@zuben>
	<aday7smwjmy.fsf@cisco.com>
Message-ID: <450D36E9.1000502@voltaire.com>

Roland Dreier wrote:
>     Or> change INFINIBAND_ADDR_TRANS to INFINIBAND_RDMA_CM and add
>     Or> help text clarifying what the thing does. Adding the help text
>     Or> also has the side effect of the cma config being visible when
>     Or> one does make menuconfig
> 
> Why do we want to make this config option visible?  Isn't it better
> for it to just take the right value automatically?

I want it to be visible so if some other config **depends** on it the
user can **see** this config and select it.

Also, given the importance of the rdma cm within the IB stack (along with
the ib verbs, it is the second access point for ULP coders), seeing its
config and documenting it is important.

Or.


From moshek at voltaire.com  Sun Sep 17 06:15:33 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Sun, 17 Sep 2006 16:15:33 +0300
Subject: [openib-general] Mstflint - not working on ppc64 and when driver is
 not loaded on AMD
Message-ID: <D4F8F0B3820E754C887699BEF26A8940EB859E@taurus.voltaire.com>


Michael,
 
The attached patch was received from Frank (IBM) .
 
Frank changed the mmap in the mopen function and now it is working o.k.
on my IBM JS21 ppc64 (sles9 sp3 sles10) and IBM  HS21 (EM64T) sles9 sp3;
all of these computers use PCI-Ex HCA cards.

I tested this fix on an AMD computer (PCI-X) and found that it did not fix
the problem initially reported by Or Gerlitz in the attached message.

Also, I suspect that it doesn't work on MAC ppc64 G5 with PCI-X. (I
have to repeat this test.)

I suspect that this is a PCI-X vs. PCI-Ex issue.

 
Moshe
____________________________________________________________
Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
 
Voltaire - The Grid Backbone
 
 www.voltaire.com

  
-----Original Message-----
From: Moshe Kazir 
Sent: Thursday, September 07, 2006 12:32 PM
To: 'Michael S. Tsirkin'
Cc: Or Gerlitz; Roland Dreier; openib-general at openib.org; Yiftah Shahar;
Tseng-hui Lin
Subject: RE: [openib-general] getting LOC_QP_OP_ERR with IPoIB -
mstflint question


Let's assume that the HCA has wrong FW and/or there is some other reason
that causes a driver load failure.

We have to check what's going on in this case, and mstflint is one of
our tools.

Moshe.



  
-----Original Message-----
From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] 
Sent: Wednesday, September 06, 2006 4:25 PM
To: Moshe Kazir
Cc: Or Gerlitz; Roland Dreier; openib-general at openib.org; Yiftah Shahar;
Tseng-hui Lin
Subject: Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB -
mstflint question


Quoting r. Moshe Kazir <moshek at voltaire.com>:
> Is it time to create a workaround that opens /proc/bus/pci/ ....
> and always works ?

But why isn't the driver loaded?

-- 
MST
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mstflint.patch
Type: application/octet-stream
Size: 8720 bytes
Desc: mstflint.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060917/151822cf/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mstflint.from.frank.tar.gz
Type: application/x-gzip
Size: 46672 bytes
Desc: mstflint.from.frank.tar.gz
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060917/151822cf/attachment.bin>
-------------- next part --------------
An embedded message was scrubbed...
From: "Michael S. Tsirkin" <mst at mellanox.co.il>
Subject: Re: [openib-general] getting LOC_QP_OP_ERR with IPoIB -	mstflint question
Date: Tue, 5 Sep 2006 16:36:50 +0300
Size: 5018
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060917/151822cf/attachment.mht>

From mst at mellanox.co.il  Sun Sep 17 06:34:49 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 17 Sep 2006 16:34:49 +0300
Subject: [openib-general] Mstflint - not working on ppc64 and when
 driver is not loaded on AMD
In-Reply-To: <D4F8F0B3820E754C887699BEF26A8940EB859E@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A8940EB859E@taurus.voltaire.com>
Message-ID: <20060917133449.GA28318@mellanox.co.il>

Quoting r. Moshe Kazir <moshek at voltaire.com>:
> Subject: Mstflint - not working on ppc64 and when driver is not loaded on AMD
> 
> 
> Michael,
>  
> The attached patch was received from Frank (IBM) .

Wow, that's one big patch, I can't see what it's doing at all.
Can just the relevant fix be isolated?

> Frank changed the mmap in the mopen function and now it is working o.k.
> on my IBM JS21 ppc64 (sles9 sp3 sles10) and IBM  HS21 (EM64T) sles9 sp3;
> all of these computers use PCI-Ex HCA cards

> I tested this fix on AMD computer (PCI-X)  and found that it did not fix
> the problem initially reported by Or Gerlitz in the attached message. 

That is, if it is even relevant?

> Also, I suspect that it doesn't work on MAC ppc64 G5 with PCI-X . (I
> have to repeat this test) .
> 
> I suspect that this is a PCI-X vs. PCI-Ex issue .
> 

Hmm.
From what I can understand of the patch, it attempts to use sysfs resource0,
which is only implemented on kernels newer than 2.6.12 or 2.6.13, so
that's probably your issue.

Can you try passing the following to mstflint (my version):
-d /sys/bus/pci/devices/0000\:08\:00.0/resource0 q
where 0000\:08\:00.0 is the appropriate device?

Does this work with driver not loaded? On which OS-es?

-- 
MST


From kliteyn at dev.mellanox.co.il  Sun Sep 17 07:20:32 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Sun, 17 Sep 2006 17:20:32 +0300
Subject: [openib-general] [PATCH] osm: bug in __osmv_send_sa_req
Message-ID: <1158502832.8516.9.camel@kliteynik.yok.mtl.com>

Hi Hal 

This patch fixes a bug in __osmv_send_sa_req in libvendor.
After sending a MAD, the status of the response was ignored.

Yevgeny 

Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il> 

Index: libvendor/osm_vendor_ibumad_sa.c
===================================================================
--- libvendor/osm_vendor_ibumad_sa.c    (revision 9500)
+++ libvendor/osm_vendor_ibumad_sa.c    (working copy)
@@ -606,6 +606,7 @@ __osmv_send_sa_req(
              "Waiting for async event\n" );
     cl_event_wait_on( &p_bind->sync_event, EVENT_NO_TIMEOUT, FALSE );
     cl_event_reset(&p_bind->sync_event);
+    status = p_madw->status;
   }

  Exit:


From moshek at voltaire.com  Sun Sep 17 07:41:15 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Sun, 17 Sep 2006 17:41:15 +0300
Subject: [openib-general] OpenSm on sles10 ppc64
Message-ID: <D4F8F0B3820E754C887699BEF26A8940EB85A1@taurus.voltaire.com>


/etc/init.d/opensm start produces an error on my JS21 ppc64 SLES10  OFED
1.0 .

Should ppc64 SLES10  OFED 1.0 work ?

Anyone tried it ?

Moshe


  
-----Original Message-----
From: openfabrics-ewg-bounces at openib.org
[mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of
vlad at dev.mellanox.co.il
Sent: Thursday, September 14, 2006 7:39 PM
To: openfabrics-ewg at openib.org
Cc: openib-general at openib.org
Subject: [openfabrics-ewg] OFED-1.1-RC5 is ready


Hi,

OFED-1.1-rc5 is available on
https://openib.org/svn/gen2/branches/1.1/ofed/releases/
File: OFED-1.1-rc5.tgz
Please report any issues in bugzilla http://openib.org/bugzilla/


Release details:
================
Build_id:

OFED-1.1-rc5

openib-1.1 (REV=9485)
# User space https://openib.org/svn/gen2/branches/1.1/src/userspace
Git: git://www.mellanox.co.il/~git/infiniband ref: refs/heads/ofed_1_1
commit 18c1cb87c4b16f1a1577807077bbdcba3f446f09

# MPI
mpi_osu-0.9.7-mlx2.2.0.tgz
openmpi-1.1.1-1.src.rpm
mpitests-2.0-0.src.rpm

OS support:
===========
Novell:
     - SLES 9.0 SP3
     - SLES10
Redhat:
     - Redhat EL4 up3
     - Redhat EL4 up4
kernel.org:
     - Kernel 2.6.17


Bug fixes from OFED-1.1-rc4:
==========================
1. ISER compilation fixed on SLES10
2. Fixed build on SLES9 PPC64
3. Updated libehca
4. OpenSM fixes
5. Added tavor_quirk option to rdma_cm module (disabled by default):
Tavor performance quirk: limit MTU to 1K if > 0 (int)

Known issues:
=============
libipathverbs compilation fails on SLES10 (Bug:204)


OFED-1.1-rc6 (hopefully the last one) planned to be released on Monday
or Tuesday.


Regards,
Vladimir


> Hi,
>
> The plan is to issue OFED RC5 on Thursday 9/14 and final release next 
> week. I am aware of the  following issues:
>
>
> 1) Compilation on SLES9 on PPC     - Jack Morgenstein
> 2) Huge pages on PPC                      - Eli Cohen
> 3) libipathverbs:                                 - Qlogic
>             a) libipathverbs ABI issue
>             b) libipathverbs build on SLES10
> 4) SDP performance on Tavor           - Michael Tsirkin
> 5) iSER issue on SLES10                   - Voltaire
>
>
> In order to meet tomorrow's RC5 release all owners please send your 
> patches by end of today.
>
>
> Regards,
>
>     Aviram
>
> _______________________________________________
> openfabrics-ewg mailing list
> openfabrics-ewg at openib.org 
> http://openib.org/mailman/listinfo/openfabrics-ewg
>


_______________________________________________
openfabrics-ewg mailing list
openfabrics-ewg at openib.org
http://openib.org/mailman/listinfo/openfabrics-ewg


From eitan at mellanox.co.il  Sun Sep 17 07:52:49 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 17:52:49 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108
 - not for MTU Sel=3
Message-ID: <864pv6mtoe.fsf@mtl066.yok.mtl.com>

Hi Hal

We have reviewed the patch for the above and figured out there is an
issue with it:
Currently when MTU_SEL=3 the quirk applies.
We think this is wrong behavior as MTU_SEL=3 means "max possible MTU" by 
the IBTA spec. So if an application/ULP would like to get the max MTU possible 
the correct answer is 2K for Tavor by the spec.
So this patch fixes the quirk: when MTU_SEL=3 it does not apply the MTU
limit quirk for Tavor devices.
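
The intended rule can be sketched as a small predicate. This is only an
illustration under assumptions, not the OpenSM function touched by the patch:
the sketch_* names are hypothetical, the handling of selectors 0 and 2 is
inferred from the earlier summary ("either =2K or >1K do nothing"), and the
MTU values use the IBTA encoding (3 = 1K, 4 = 2K).

```c
#include <stdbool.h>

/* IBTA MTU encoding; the names are hypothetical. */
enum sketch_mtu {
    SKETCH_MTU_256  = 1,
    SKETCH_MTU_512  = 2,
    SKETCH_MTU_1024 = 3,
    SKETCH_MTU_2048 = 4,
    SKETCH_MTU_4096 = 5
};

/* Returns true when limiting a Tavor path to 1K would not contradict
 * what the requester asked for through the MTU selector. */
static bool sketch_tavor_quirk_applies(unsigned int mtu_sel,
                                       unsigned int required_mtu)
{
    switch (mtu_sel) {
    case 0: /* must be greater than required_mtu */
        /* assumption: a request for ">1K" (or more) must not be limited */
        return required_mtu < SKETCH_MTU_1024;
    case 1: /* must be less than: a 1K limit cannot disqualify it */
        return true;
    case 2: /* exactly required_mtu */
        /* assumption: an explicit "=2K" (or larger) is honored as asked */
        return required_mtu <= SKETCH_MTU_1024;
    case 3: /* largest available: the ULP asked for the maximum */
        return false;
    default: /* invalid selector: leave the path untouched */
        return false;
    }
}
```

With this rule, sketch_tavor_quirk_applies(3, SKETCH_MTU_2048) is false, so a
requester asking for the largest available MTU still gets 2K on Tavor.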

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: 1.1/src/userspace/management/osm/opensm/osm_sa_multipath_record.c
===================================================================
--- 1.1/src/userspace/management/osm/opensm/osm_sa_multipath_record.c	(revision 9500)
+++ 1.1/src/userspace/management/osm/opensm/osm_sa_multipath_record.c	(working copy)
@@ -203,9 +203,13 @@ boolean_t
       break;
 
     case 1:    /* must be less than */
-    case 3:    /* largest available */
 	       /* can't be disqualified by this one */
       break;
+    case 3:    /* largest available */
+               /* the ULP intentionally requested */
+               /* the largest MTU possible */
+      return(FALSE);
+      break;
 			
     default:
       /* if we're here, there's a bug in ib_multipath_rec_mtu_sel() */
Index: 1.1/src/userspace/management/osm/opensm/osm_sa_path_record.c
===================================================================
--- 1.1/src/userspace/management/osm/opensm/osm_sa_path_record.c	(revision 9500)
+++ 1.1/src/userspace/management/osm/opensm/osm_sa_path_record.c	(working copy)
@@ -204,9 +204,13 @@ static boolean_t
       break;
 
     case 1:    /* must be less than */
-    case 3:    /* largest available */
                /* can't be disqualified by this one */
       break;
+    case 3:    /* largest available */
+               /* the ULP intentionally requested */
+               /* the largest MTU possible */
+      return(FALSE);
+      break;
 			
     default:
       /* if we're here, there's a bug in ib_path_rec_mtu_sel() */


From halr at voltaire.com  Sun Sep 17 07:49:39 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 17 Sep 2006 10:49:39 -0400
Subject: [openib-general] [openfabrics-ewg] OpenSm on sles10 ppc64
In-Reply-To: <D4F8F0B3820E754C887699BEF26A8940EB85A1@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A8940EB85A1@taurus.voltaire.com>
Message-ID: <1158504477.25157.143740.camel@hal.voltaire.com>

Hi Moshe,

On Sun, 2006-09-17 at 10:41, Moshe Kazir wrote:
> /etc/init.d/opensm start produce an error on my JS21 ppc64 SLES10  OFED
> 1.0 .

What error ?

> Should ppc64 SLES10  OFED 1.0 work ?

I don't think so.

> Anyone tried it ?

OFED 1.0 OpenSM release notes say:
* PPC support:
  No PPC QA was performed.

There was an issue with PPC64 that Sasha fixed post OFED 1.0. It's in
OFED 1.1 and could easily be retrofitted to OFED 1.0 if needed. Contact
Sasha or me if you are interested in doing this.

-- Hal



From jackm at dev.mellanox.co.il  Sun Sep 17 08:01:37 2006
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Sun, 17 Sep 2006 18:01:37 +0300
Subject: [openib-general] What can be the reason for VAPI_WR_FLUSH_ERR
 when sending from gen2 to gen1
In-Reply-To: <B79FAF8BB536314E859EA1963CFFD22201FBD397@wdtssmail01.eu.thmulti.com>
References: <B79FAF8BB536314E859EA1963CFFD22201FBD397@wdtssmail01.eu.thmulti.com>
Message-ID: <200609171801.37678.jackm@dev.mellanox.co.il>

On Friday 15 September 2006 12:37, Bub Thomas wrote:
> I'm now in the situation that I have a gen2 client connected to a gen1
> server via CM.
> Unfortunately the first IBV_WR_SEND causes a:
> (syndrome=0xf9=VAPI_WR_FLUSH_ERR , opcode=6=VAPI_CQE_RQ_SEND_DATA)
> error in the receive completion queue of the server.
> 
It's not at all clear what the error could be.  The Gen1 and Gen2 stacks
are implemented with totally different code.

Some suggestions (together with dotan at mellanox.co.il):
1. Connect a CATC/analyzer to the wire and capture the detailed traffic.
   Examine the CM messages exchanged to see that they are correct.

2. It sounds like the server QP is already in an error state when the first
   send is performed. Query the QP on the server side before performing the
   first server send to verify that it is in the RTS state.

3. Examine /var/log/messages on the server side to see if there were any
   CQ overruns (which would cause the associated QP to enter an error state).

PLEASE NOTE:  The opcode field is NOT valid in a completion-with-error. The only
	valid fields upon error completion are the status and work-request-id
	fields (all other completion fields are undefined).  Therefore, you
	cannot depend on the opcode value!  You need to save work request
        information keyed to the transaction ID to know what really happened.
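
A minimal sketch of that bookkeeping, assuming the application assigns small
integer work-request IDs that can double as table indices. All sketch_* names
are hypothetical and nothing here calls VAPI or libibverbs; it only shows
recording the operation at post time and recovering it from the
work-request-id when a completion (possibly with error) arrives.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_WR 64

/* Operation types the application posts (hypothetical). */
enum sketch_op {
    SKETCH_OP_NONE = 0,
    SKETCH_OP_SEND,
    SKETCH_OP_RECV,
    SKETCH_OP_RDMA_WRITE
};

/* One slot per outstanding work request, indexed by wr_id. */
struct sketch_wr_table {
    enum sketch_op op[SKETCH_MAX_WR];
};

/* Call when posting a work request. Returns 0, or -1 if wr_id is out of range. */
static int sketch_wr_record(struct sketch_wr_table *t, uint64_t wr_id,
                            enum sketch_op op)
{
    if (wr_id >= SKETCH_MAX_WR)
        return -1;
    t->op[wr_id] = op;
    return 0;
}

/* Call when a completion arrives. Uses only the work-request-id, which
 * stays valid even in a completion-with-error, and frees the slot. */
static enum sketch_op sketch_wr_complete(struct sketch_wr_table *t,
                                         uint64_t wr_id)
{
    enum sketch_op op;

    if (wr_id >= SKETCH_MAX_WR)
        return SKETCH_OP_NONE;
    op = t->op[wr_id];
    t->op[wr_id] = SKETCH_OP_NONE;
    return op;
}
```

On an error completion the application would then report the operation
returned by sketch_wr_complete() instead of the opcode field of the
completion entry.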

Another question:  is the send you are talking about on the client side?
Is it a regular send, or an rdma operation?

- Jack


From eitan at mellanox.co.il  Sun Sep 17 08:57:50 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 18:57:50 +0300
Subject: [openib-general] [PATCH 1/13] osm: port to WinIB stack :
 include/opensm/osm_base.h
Message-ID: <863baqmqo1.fsf@mtl066.yok.mtl.com>

Hi Hal

osm_base.h uses cache dir for osm-partitions.conf.

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: include/opensm/osm_base.h
===================================================================
--- include/opensm/osm_base.h	(revision 9502)
+++ include/opensm/osm_base.h	(working copy)
@@ -231,7 +231,7 @@ BEGIN_C_DECLS
 * SYNOPSIS
 */
 #ifdef __WIN__
-#define OSM_DEFAULT_PARTITION_CONFIG_FILE strcat(GetOsmPath(), "osm-partitions.conf")
+#define OSM_DEFAULT_PARTITION_CONFIG_FILE strcat(GetOsmCachePath(), "osm-partitions.conf")
 #else
 #define OSM_DEFAULT_PARTITION_CONFIG_FILE "/etc/osm-partitions.conf"
 #endif
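
The macro above builds the file name with strcat() on whatever buffer the
path getter returns; whether that buffer is writable and has spare room is
not visible in this patch. As a hedged alternative, a bounded join into a
caller-supplied buffer could look like the sketch below. sketch_join_path()
is a hypothetical helper, not part of OpenSM.

```c
#include <stddef.h>
#include <stdio.h>

/* Joins dir and file into out without writing past out_len bytes.
 * dir is expected to end with a path separator already.
 * Returns 0 on success, -1 if the result would not fit. */
static int sketch_join_path(char *out, size_t out_len,
                            const char *dir, const char *file)
{
    int n = snprintf(out, out_len, "%s%s", dir, file);

    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return 0;
}
```

Unlike strcat() on a shared buffer, repeated calls do not keep appending to
the directory string, and truncation is reported to the caller.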


From eitan at mellanox.co.il  Sun Sep 17 08:58:33 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 18:58:33 +0300
Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack :
 include/opensm/osm_pkey.h
Message-ID: <86zmcylc2e.fsf@mtl066.yok.mtl.com>

Hi Hal

Partition table block numbers are always 16 bits.
This resolves the need to later cast back and forth.
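
To illustrate why 16 bits are enough, here is a small sketch that splits a
flat P_Key table index into a block number and an index within the block. It
assumes 32 P_Key entries per block and a table of at most 65536 entries, so
the block number never exceeds 2047; sketch_pkey_locate() is a hypothetical
helper, not the osm_pkey_tbl_get_block_and_idx() routine in the patch.

```c
#include <stdint.h>

/* Assumed number of P_Key entries carried in one table block. */
#define SKETCH_PKEYS_PER_BLOCK 32u

/* Splits a flat table index into (block, index within block). */
static void sketch_pkey_locate(uint16_t flat_index,
                               uint16_t *block, uint8_t *index_in_block)
{
    *block = (uint16_t)(flat_index / SKETCH_PKEYS_PER_BLOCK);
    *index_in_block = (uint8_t)(flat_index % SKETCH_PKEYS_PER_BLOCK);
}
```

Under these assumptions both results fit their types without any cast at the
call site, which is the point of the change.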

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: include/opensm/osm_pkey.h
===================================================================
--- include/opensm/osm_pkey.h	(revision 9502)
+++ include/opensm/osm_pkey.h	(working copy)
@@ -143,7 +143,7 @@ typedef struct _osm_pkey_tbl
 typedef struct _osm_pending_pkey {
   cl_list_item_t	list_item;
   uint16_t		pkey;
-  uint32_t		block;
+  uint16_t		block;
   uint8_t		index;
   boolean_t		is_new;
 } osm_pending_pkey_t;
@@ -396,7 +396,7 @@ ib_api_status_t
 osm_pkey_tbl_get_block_and_idx(
   IN  osm_pkey_tbl_t *p_pkey_tbl, 
   IN  uint16_t       *p_pkey,
-  OUT uint32_t       *block_idx,
+  OUT uint16_t       *block_idx,
   OUT uint8_t        *pkey_index);
 /*
 *  p_pkey_tbl
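
To illustrate why 16 bits are enough for the block index: assuming 32
P_Key entries per table block (the block size of the InfiniBand
P_KeyTable attribute) and a flat entry position that itself fits in 16
bits, the largest possible block number is 2047. The sketch below is not
the OpenSM implementation; DEMO_PKEYS_PER_BLOCK and
demo_pkey_pos_to_block_and_idx() are names invented for this note.

```c
#include <stdint.h>

/* Assumed block size: 32 P_Key entries per P_KeyTable block. */
#define DEMO_PKEYS_PER_BLOCK 32

/* Split a flat position in the P_Key table into a block number and an
 * index inside that block. With a 16-bit position the block number is
 * at most 65535 / 32 = 2047, so uint16_t is wide enough. */
static void
demo_pkey_pos_to_block_and_idx(
  uint16_t  pos,
  uint16_t *p_block,
  uint8_t  *p_index )
{
  *p_block = (uint16_t)( pos / DEMO_PKEYS_PER_BLOCK );
  *p_index = (uint8_t)( pos % DEMO_PKEYS_PER_BLOCK );
}
```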


From eitan at mellanox.co.il  Sun Sep 17 08:58:51 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 18:58:51 +0300
Subject: [openib-general] [PATCH 3/13] osm: port to WinIB stack :
	include/iba/ib_types.h
Message-ID: <86y7silc1w.fsf@mtl066.yok.mtl.com>

Hi Hal

Most changes just add OSM_API to function declarations.
Some are minor indentation fixes.

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: include/iba/ib_types.h
===================================================================
--- include/iba/ib_types.h	(revision 9502)
+++ include/iba/ib_types.h	(working copy)
@@ -52,6 +52,19 @@
 
 BEGIN_C_DECLS
 
+#if defined( WIN32 ) || defined( _WIN64 )
+    #if defined( EXPORT_AL_SYMBOLS )
+         #define OSM_EXPORT	__declspec(dllexport)
+    #else
+         #define OSM_EXPORT	__declspec(dllimport)
+    #endif
+    #define OSM_API __stdcall
+#else
+    #define OSM_EXPORT	extern
+    #define OSM_API
+    #define __ptr64
+#endif
+
 /****h* IBA Base/Constants
 * NAME
 *	Constants
@@ -573,7 +586,7 @@ BEGIN_C_DECLS
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_class_is_vendor_specific_low(
 	IN		const	uint8_t class_code )
 {
@@ -605,7 +618,7 @@ ib_class_is_vendor_specific_low(
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_class_is_vendor_specific_high(
 	IN		const	uint8_t class_code )
 {
@@ -637,7 +650,7 @@ ib_class_is_vendor_specific_high(
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_class_is_vendor_specific(
 	IN		const	uint8_t class_code )
 {
@@ -668,7 +681,7 @@ ib_class_is_vendor_specific(
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_class_is_rmpp(
         IN              const   uint8_t class_code )
 {
@@ -1297,6 +1310,7 @@ ib_class_is_rmpp(
 *	IB_MAD_ATTR_SLVL_RECORD
 *
 * DESCRIPTION
+*	VSLtoL Map Table attribute (15.2.5)
 *	SLtoVL Mapping Table Record attribute (15.2.5)
 *
 * SOURCE
@@ -1680,7 +1694,7 @@ ib_class_is_rmpp(
 *	IB_PATH_REC_BASE_MASK
 *
 * DESCRIPTION
-*	Mask for the base value field for path record MTU, rate,
+*	Mask for the base value field for path record MTU, rate
 *	and packet lifetime.
 *
 * SOURCE
@@ -1768,7 +1782,7 @@ typedef ib_net64_t		ib_gid_prefix_t;
 */
 #define IB_LINK_NO_CHANGE 0
 #define IB_LINK_DOWN      1
-#define IB_LINK_INIT	  2
+#define IB_LINK_INIT	   2
 #define IB_LINK_ARMED     3
 #define IB_LINK_ACTIVE    4
 #define IB_LINK_ACT_DEFER 5
@@ -1792,7 +1806,7 @@ static const char* const __ib_node_type_
 *
 * SYNOPSIS
 */
-static inline const char*
+static inline const char*	OSM_API
 ib_get_node_type_str(
 	IN uint32_t node_type )
 {
@@ -1834,7 +1848,7 @@ static const char* const __ib_port_state
 *
 * SYNOPSIS
 */
-static inline const char*
+static inline const char*	OSM_API
 ib_get_port_state_str(
 	IN				uint8_t						port_state )
 {
@@ -1865,7 +1879,7 @@ ib_get_port_state_str(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_get_port_state_from_str(
 	IN				char*						p_port_state_str )
 {
@@ -1920,7 +1934,7 @@ ib_get_port_state_from_str(
 *
 * SYNOPSIS
 */
-static inline ib_net16_t
+static inline ib_net16_t	OSM_API
 ib_pkey_get_base(
 	IN		const	ib_net16_t					pkey )
 {
@@ -1947,7 +1961,7 @@ ib_pkey_get_base(
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_pkey_is_full_member(
 	IN		const	ib_net16_t					pkey )
 {
@@ -1979,7 +1993,7 @@ ib_pkey_is_full_member(
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_pkey_is_invalid(
 	IN		const	ib_net16_t					pkey )
 {
@@ -2044,7 +2058,7 @@ typedef union _ib_gid
 * SEE ALSO
 *********/
 
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_gid_is_multicast(
 	IN		const	ib_gid_t*					p_gid )
 {
@@ -2060,7 +2074,7 @@ ib_gid_is_multicast(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_gid_set_default(
 	IN				ib_gid_t* const				p_gid,
 	IN		const	ib_net64_t					interface_id )
@@ -2093,7 +2107,7 @@ ib_gid_set_default(
 *
 * SYNOPSIS
 */
-static inline ib_net64_t
+static inline ib_net64_t	OSM_API
 ib_gid_get_subnet_prefix(
 	IN		const	ib_gid_t* const				p_gid )
 {
@@ -2122,7 +2136,7 @@ ib_gid_get_subnet_prefix(
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_gid_is_link_local(
 	IN		const	ib_gid_t* const				p_gid )
 {
@@ -2152,7 +2166,7 @@ ib_gid_is_link_local(
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_gid_is_site_local(
 	IN		const	ib_gid_t* const				p_gid )
 {
@@ -2182,7 +2196,7 @@ ib_gid_is_site_local(
 *
 * SYNOPSIS
 */
-static inline ib_net64_t
+static inline ib_net64_t	OSM_API
 ib_gid_get_guid(
 	IN		const	ib_gid_t* const				p_gid )
 {
@@ -2539,7 +2553,7 @@ typedef struct _ib_path_rec
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_path_rec_init_local(
 	IN	ib_path_rec_t* const	p_rec,
 	IN	ib_gid_t* const		p_dgid,
@@ -2649,7 +2663,7 @@ ib_path_rec_init_local(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_path_rec_num_path(
 	IN		const	ib_path_rec_t* const		p_rec )
 {
@@ -2674,11 +2688,11 @@ ib_path_rec_num_path(
 *	ib_path_rec_sl
 *
 * DESCRIPTION
-*	Get service level.
+*	Get path service level.
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_path_rec_sl(
 	IN		const	ib_path_rec_t* const		p_rec )
 {
@@ -2707,7 +2721,7 @@ ib_path_rec_sl(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_path_rec_mtu(
 	IN		const	ib_path_rec_t* const		p_rec )
 {
@@ -2742,7 +2756,7 @@ ib_path_rec_mtu(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_path_rec_mtu_sel(
 	IN		const	ib_path_rec_t* const		p_rec )
 {
@@ -2775,7 +2789,7 @@ ib_path_rec_mtu_sel(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_path_rec_rate(
 	IN		const	ib_path_rec_t* const		p_rec )
 {
@@ -2814,7 +2828,7 @@ ib_path_rec_rate(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_path_rec_rate_sel(
 	IN		const	ib_path_rec_t* const		p_rec )
 {
@@ -2847,7 +2861,7 @@ ib_path_rec_rate_sel(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_path_rec_pkt_life(
 	IN		const	ib_path_rec_t* const		p_rec )
 {
@@ -2876,7 +2890,7 @@ ib_path_rec_pkt_life(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_path_rec_pkt_life_sel(
 	IN		const	ib_path_rec_t* const		p_rec )
 {
@@ -2909,7 +2923,7 @@ ib_path_rec_pkt_life_sel(
 *
 * SYNOPSIS
 */
-static inline uint32_t
+static inline uint32_t	OSM_API
 ib_path_rec_flow_lbl(
 	IN		const	ib_path_rec_t* const		p_rec )
 {
@@ -2938,7 +2952,7 @@ ib_path_rec_flow_lbl(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_path_rec_hop_limit(
 	IN		const	ib_path_rec_t* const		p_rec )
 {
@@ -3141,7 +3155,7 @@ typedef struct _ib_sm_info
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_sminfo_get_priority(
 	IN		const	ib_sm_info_t* const			p_smi )
 {
@@ -3169,7 +3183,7 @@ ib_sminfo_get_priority(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_sminfo_get_state(
 	IN		const	ib_sm_info_t* const			p_smi )
 {
@@ -3287,7 +3301,7 @@ typedef struct _ib_rmpp_mad
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_mad_init_new(
 	IN				ib_mad_t* const				p_mad,
 	IN		const	uint8_t						mgmt_class,
@@ -3350,7 +3364,7 @@ ib_mad_init_new(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_mad_init_response(
 	IN		const	ib_mad_t* const				p_req_mad,
 	IN				ib_mad_t* const				p_mad,
@@ -3395,7 +3409,7 @@ ib_mad_init_response(
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_mad_is_response(
 	IN		const	ib_mad_t* const				p_mad )
 {
@@ -3452,7 +3466,7 @@ ib_mad_is_response(
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_rmpp_is_flag_set(
 	IN		const	ib_rmpp_mad_t* const		p_rmpp_mad,
 	IN		const	uint8_t						flag )
@@ -3477,7 +3491,7 @@ ib_rmpp_is_flag_set(
 *	ib_mad_t, ib_rmpp_mad_t
 *********/
 
-static inline void
+static inline void	OSM_API
 ib_rmpp_set_resp_time(
 	IN				ib_rmpp_mad_t* const		p_rmpp_mad,
 	IN		const	uint8_t						resp_time )
@@ -3487,7 +3501,7 @@ ib_rmpp_set_resp_time(
 }
 
 
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_rmpp_get_resp_time(
 	IN		const	ib_rmpp_mad_t* const		p_rmpp_mad )
 {
@@ -3624,7 +3638,7 @@ typedef struct _ib_smp
 *
 * SYNOPSIS
 */
-static inline ib_net16_t
+static inline ib_net16_t	OSM_API
 ib_smp_get_status(
 	IN		const	ib_smp_t* const				p_smp )
 {
@@ -3653,7 +3667,7 @@ ib_smp_get_status(
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_smp_is_response(
 	IN		const	ib_smp_t* const				p_smp )
 {
@@ -3681,7 +3695,7 @@ ib_smp_is_response(
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_smp_is_d(
 	IN		const	ib_smp_t* const				p_smp )
 {
@@ -3714,7 +3728,7 @@ ib_smp_is_d(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_smp_init_new(
 	IN				ib_smp_t* const				p_smp,
 	IN		const	uint8_t						method,
@@ -3800,7 +3814,7 @@ ib_smp_init_new(
 *
 * SYNOPSIS
 */
-static inline void*
+static inline void*	OSM_API
 ib_smp_get_payload_ptr(
 	IN		const	ib_smp_t* const				p_smp )
 {
@@ -3894,14 +3908,14 @@ typedef struct _ib_sa_mad
 /**********/
 #define IB_SA_MAD_HDR_SIZE (sizeof(ib_sa_mad_t) - IB_SA_DATA_SIZE)
 
-static inline uint32_t
+static inline uint32_t	OSM_API
 ib_get_attr_size(
 	IN	const	ib_net16_t				attr_offset )
 {
 	return( ((uint32_t)cl_ntoh16( attr_offset )) << 3 );
 }
 
-static inline ib_net16_t
+static inline ib_net16_t	OSM_API
 ib_get_attr_offset(
 	IN	const	uint32_t				attr_size )
 {
@@ -3917,7 +3931,7 @@ ib_get_attr_offset(
 *
 * SYNOPSIS
 */
-static inline void*
+static inline void*	OSM_API
 ib_sa_mad_get_payload_ptr(
 	IN	const	ib_sa_mad_t* const		p_sa_mad )
 {
@@ -3954,7 +3968,7 @@ ib_sa_mad_get_payload_ptr(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_node_info_get_local_port_num(
 	IN		const	ib_node_info_t* const		p_ni )
 {
@@ -3985,7 +3999,7 @@ ib_node_info_get_local_port_num(
 *
 * SYNOPSIS
 */
-static inline ib_net32_t
+static inline ib_net32_t	OSM_API
 ib_node_info_get_vendor_id(
 	IN		const	ib_node_info_t* const		p_ni )
 {
@@ -4134,7 +4148,7 @@ typedef struct _ib_port_info
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_port_state(
 	IN		const	ib_port_info_t* const		p_pi )
 {
@@ -4162,7 +4176,7 @@ ib_port_info_get_port_state(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_port_state(
 	IN				ib_port_info_t* const		p_pi,
 	IN		const	uint8_t						port_state )
@@ -4194,7 +4208,7 @@ ib_port_info_set_port_state(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_vl_cap(
 	IN const ib_port_info_t* const p_pi)
 {
@@ -4222,7 +4236,7 @@ ib_port_info_get_vl_cap(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_init_type(
 	IN const ib_port_info_t* const p_pi)
 {
@@ -4250,7 +4264,7 @@ ib_port_info_get_init_type(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_op_vls(
 	IN const ib_port_info_t* const p_pi)
 {
@@ -4278,7 +4292,7 @@ ib_port_info_get_op_vls(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_op_vls(
 	IN				ib_port_info_t* const		p_pi,
 	IN		const	uint8_t						op_vls )
@@ -4310,7 +4324,7 @@ ib_port_info_set_op_vls(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_state_no_change(
 	IN				ib_port_info_t* const		p_pi )
 {
@@ -4339,7 +4353,7 @@ ib_port_info_set_state_no_change(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_link_speed_sup(
 	IN		const	ib_port_info_t* const		p_pi )
 {
@@ -4370,7 +4384,7 @@ ib_port_info_get_link_speed_sup(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_link_speed_sup(
 	IN				uint8_t const				speed,
 	IN				ib_port_info_t*				p_pi )
@@ -4405,7 +4419,7 @@ ib_port_info_set_link_speed_sup(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_port_phys_state(
 	IN		const	ib_port_info_t* const		p_pi )
 {
@@ -4436,7 +4450,7 @@ ib_port_info_get_port_phys_state(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_port_phys_state(
 	IN				uint8_t const				phys_state,
 	IN				ib_port_info_t*				p_pi )
@@ -4471,7 +4485,7 @@ ib_port_info_set_port_phys_state(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_link_down_def_state(
 	IN		const	ib_port_info_t* const		p_pi )
 {
@@ -4499,7 +4513,7 @@ ib_port_info_get_link_down_def_state(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_link_down_def_state(
 	IN				ib_port_info_t* const		p_pi,
 	IN		const	uint8_t						link_dwn_state )
@@ -4531,7 +4545,7 @@ ib_port_info_set_link_down_def_state(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_link_speed_active(
 	IN		const	ib_port_info_t* const		p_pi )
 {
@@ -4583,7 +4597,7 @@ ib_port_info_get_link_speed_active(
 * SYNOPSIS
 */
 
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_compute_rate(
 	IN		const	ib_port_info_t* const		p_pi )
 {
@@ -4680,7 +4694,7 @@ ib_port_info_compute_rate(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_path_get_ipd(
 	IN				uint8_t						local_link_width_supported,
 	IN				uint8_t						path_rec_rate )
@@ -4751,7 +4765,7 @@ ib_path_get_ipd(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_mtu_cap(
 	IN		const	ib_port_info_t* const		p_pi )
 {
@@ -4778,7 +4792,7 @@ ib_port_info_get_mtu_cap(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_neighbor_mtu(
 	IN const ib_port_info_t* const p_pi )
 {
@@ -4805,7 +4819,7 @@ ib_port_info_get_neighbor_mtu(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_neighbor_mtu(
 	IN				ib_port_info_t* const		p_pi,
 	IN		const	uint8_t						mtu )
@@ -4839,7 +4853,7 @@ ib_port_info_set_neighbor_mtu(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_master_smsl(
 	IN const ib_port_info_t* const p_pi )
 {
@@ -4866,7 +4880,7 @@ ib_port_info_get_master_smsl(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_master_smsl(
 	IN				ib_port_info_t* const		p_pi,
 	IN		const	uint8_t						smsl )
@@ -4898,7 +4912,7 @@ ib_port_info_set_master_smsl(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_timeout(
 	IN				ib_port_info_t* const		p_pi,
 	IN		const	uint8_t						timeout )
@@ -4933,7 +4947,7 @@ ib_port_info_set_timeout(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_client_rereg(
 	IN		ib_port_info_t* const   p_pi,
 	IN		const   uint8_t         client_rereg )
@@ -4968,7 +4982,7 @@ ib_port_info_set_client_rereg(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_timeout(
   IN				ib_port_info_t const*   p_pi )
 {
@@ -4996,7 +5010,7 @@ ib_port_info_get_timeout(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_client_rereg(
   IN				ib_port_info_t const* p_pi )
 {
@@ -5025,7 +5039,7 @@ ib_port_info_get_client_rereg(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_hoq_lifetime(
   IN		ib_port_info_t* const		p_pi,
   IN		const	uint8_t					hoq_life )
@@ -5059,7 +5073,7 @@ ib_port_info_set_hoq_lifetime(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_hoq_lifetime(
   IN		const ib_port_info_t* const		p_pi )
 {
@@ -5089,7 +5103,7 @@ ib_port_info_get_hoq_lifetime(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_vl_stall_count(
   IN		ib_port_info_t* const		p_pi,
   IN		const	uint8_t					vl_stall_count )
@@ -5123,7 +5137,7 @@ ib_port_info_set_vl_stall_count(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_vl_stall_count(
   IN		const ib_port_info_t* const		p_pi )
 {
@@ -5152,7 +5166,7 @@ ib_port_info_get_vl_stall_count(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_lmc(
 	IN		const	ib_port_info_t* const		p_pi )
 {
@@ -5180,7 +5194,7 @@ ib_port_info_get_lmc(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_lmc(
 	IN				ib_port_info_t* const		p_pi,
 	IN		const	uint8_t						lmc )
@@ -5213,7 +5227,7 @@ ib_port_info_set_lmc(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_link_speed_enabled(
 	IN		const	ib_port_info_t* const		p_pi )
 {
@@ -5240,7 +5254,7 @@ ib_port_info_get_link_speed_enabled(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_link_speed_enabled(
 	IN				ib_port_info_t* const		p_pi,
 	IN		const	uint8_t						link_speed_enabled )
@@ -5272,7 +5286,7 @@ ib_port_info_set_link_speed_enabled(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_mpb(
 	IN		const	ib_port_info_t* const		p_pi )
 {
@@ -5301,7 +5315,7 @@ ib_port_info_get_mpb(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_mpb(
 	IN				ib_port_info_t*				p_pi,
 	IN				uint8_t						mpb )
@@ -5332,7 +5346,7 @@ ib_port_info_set_mpb(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_local_phy_err_thd(
 	IN		const	ib_port_info_t* const		p_pi )
 {
@@ -5359,7 +5373,7 @@ ib_port_info_get_local_phy_err_thd(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_port_info_get_overrun_err_thd(
 	IN		const	ib_port_info_t* const		p_pi )
 {
@@ -5387,7 +5401,7 @@ ib_port_info_get_overrun_err_thd(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_port_info_set_phy_and_overrun_err_thd(
   IN		ib_port_info_t* const		p_pi,
   IN		uint8_t				phy_threshold,
@@ -5540,7 +5554,7 @@ typedef struct _ib_switch_info_record
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_switch_info_get_state_change(
 	IN		const	ib_switch_info_t* const		p_si )
 {
@@ -5568,7 +5582,7 @@ ib_switch_info_get_state_change(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_switch_info_clear_state_change(
 	IN				ib_switch_info_t* const		p_si )
 {
@@ -5599,7 +5613,7 @@ ib_switch_info_clear_state_change(
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_switch_info_is_enhanced_port0(
 	IN		const	ib_switch_info_t* const		p_si )
 {
@@ -5714,7 +5728,7 @@ typedef struct _ib_multipath_rec_t
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_multipath_rec_num_path(
         IN              const   ib_multipath_rec_t* const            p_rec )
 {
@@ -5743,7 +5757,7 @@ ib_multipath_rec_num_path(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_multipath_rec_sl(
         IN              const   ib_multipath_rec_t* const            p_rec )
 {
@@ -5772,7 +5786,7 @@ ib_multipath_rec_sl(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_multipath_rec_mtu(
         IN              const   ib_multipath_rec_t* const            p_rec )
 {
@@ -5807,7 +5821,7 @@ ib_multipath_rec_mtu(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_multipath_rec_mtu_sel(
         IN              const   ib_multipath_rec_t* const            p_rec )
 {
@@ -5840,7 +5854,7 @@ ib_multipath_rec_mtu_sel(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_multipath_rec_rate(
         IN              const   ib_multipath_rec_t* const            p_rec )
 {
@@ -5873,7 +5887,7 @@ ib_multipath_rec_rate(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_multipath_rec_rate_sel(
         IN              const   ib_multipath_rec_t* const            p_rec )
 {
@@ -5906,7 +5920,7 @@ ib_multipath_rec_rate_sel(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_multipath_rec_pkt_life(
         IN              const   ib_multipath_rec_t* const            p_rec )
 {
@@ -5935,7 +5949,7 @@ ib_multipath_rec_pkt_life(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_multipath_rec_pkt_life_sel(
         IN              const   ib_multipath_rec_t* const            p_rec )
 {
@@ -6052,7 +6066,7 @@ typedef struct _ib_slvl_table_record
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_slvl_table_set(
   IN		ib_slvl_table_t*        p_slvl_tbl,
   IN		uint8_t                 sl_index,
@@ -6102,7 +6116,7 @@ ib_slvl_table_set(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_slvl_table_get(
   IN		const ib_slvl_table_t*        p_slvl_tbl,
   IN		uint8_t                 sl_index )
@@ -6223,7 +6237,7 @@ typedef struct _ib_grh
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_grh_get_ver_class_flow(
 	IN		const	ib_net32_t					ver_class_flow,
 		OUT			uint8_t* const				p_ver,
@@ -6275,7 +6289,7 @@ ib_grh_get_ver_class_flow(
 *
 * SYNOPSIS
 */
-static inline ib_net32_t
+static inline ib_net32_t	OSM_API
 ib_grh_set_ver_class_flow(
 	IN		const	uint8_t						ver,
 	IN		const	uint8_t						tclass,
@@ -6391,7 +6405,7 @@ typedef struct _ib_member_rec
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_member_get_sl_flow_hop(
 	IN const ib_net32_t sl_flow_hop,
 	OUT uint8_t* const p_sl,
@@ -6442,7 +6456,7 @@ ib_member_get_sl_flow_hop(
 *
 * SYNOPSIS
 */
-static inline ib_net32_t
+static inline ib_net32_t	OSM_API
 ib_member_set_sl_flow_hop(
 	IN const uint8_t sl,
 	IN const uint32_t flow_label,
@@ -6483,7 +6497,7 @@ ib_member_set_sl_flow_hop(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_member_get_scope_state(
 	IN	const	uint8_t			scope_state,
 	OUT	uint8_t* const			p_scope,
@@ -6527,7 +6541,7 @@ ib_member_get_scope_state(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_member_set_scope_state(
 	IN	const	uint8_t			scope,
 	IN	const	uint8_t			state )
@@ -6566,7 +6580,7 @@ ib_member_set_scope_state(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_member_set_join_state(
 	IN OUT		ib_member_rec_t		*p_mc_rec,
 	IN		const	uint8_t		state )
@@ -6730,7 +6744,7 @@ typedef struct _ib_mad_notice_attr    //
 *
 * SYNOPSIS
 */
-static inline boolean_t
+static inline boolean_t	OSM_API
 ib_notice_is_generic(
   IN		   const	ib_mad_notice_attr_t *p_ntc )
 {
@@ -6757,7 +6771,7 @@ ib_notice_is_generic(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_notice_get_type(
   IN		   const	ib_mad_notice_attr_t *p_ntc )
 {
@@ -6784,7 +6798,7 @@ ib_notice_get_type(
 *
 * SYNOPSIS
 */
-static inline ib_net32_t
+static inline ib_net32_t	OSM_API
 ib_notice_get_prod_type(
   IN		   const	ib_mad_notice_attr_t *p_ntc )
 {
@@ -6815,7 +6829,7 @@ ib_notice_get_prod_type(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_notice_set_prod_type(
   IN ib_mad_notice_attr_t *p_ntc,
   IN ib_net32_t prod_type_val )
@@ -6848,7 +6862,7 @@ ib_notice_set_prod_type(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_notice_set_prod_type_ho(
   IN ib_mad_notice_attr_t *p_ntc,
   IN uint32_t prod_type_val_ho )
@@ -6882,7 +6896,7 @@ ib_notice_set_prod_type_ho(
 *
 * SYNOPSIS
 */
-static inline ib_net32_t
+static inline ib_net32_t	OSM_API
 ib_notice_get_vend_id(
   IN		   const	ib_mad_notice_attr_t *p_ntc )
 {
@@ -6913,7 +6927,7 @@ ib_notice_get_vend_id(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_notice_set_vend_id(
   IN ib_mad_notice_attr_t *p_ntc,
   IN ib_net32_t vend_id )
@@ -6946,7 +6960,7 @@ ib_notice_set_vend_id(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_notice_set_vend_id_ho(
   IN ib_mad_notice_attr_t *p_ntc,
   IN uint32_t vend_id_ho )
@@ -6974,12 +6988,12 @@ ib_notice_set_vend_id_ho(
 #include <complib/cl_packon.h>
 typedef struct _ib_inform_info
 {
-  ib_gid_t				gid;
+  ib_gid_t				   gid;
   ib_net16_t				lid_range_begin;
   ib_net16_t				lid_range_end;
   ib_net16_t				reserved1;
-  uint8_t				is_generic;
-  uint8_t				subscribe;
+  uint8_t					is_generic;
+  uint8_t					subscribe;
   ib_net16_t				trap_type;
   union _inform_g_or_v
   {
@@ -7015,7 +7029,7 @@ typedef struct _ib_inform_info
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_inform_info_get_qpn_resp_time(
   IN		   const	ib_net32_t			qpn_resp_time_val,
   OUT			ib_net32_t* const			p_qpn,
@@ -7056,7 +7070,7 @@ ib_inform_info_get_qpn_resp_time(
 *
 * SYNOPSIS
 */
-static inline void
+static inline void	OSM_API
 ib_inform_info_set_qpn(
   IN	ib_inform_info_t 	*p_ii,
   IN	ib_net32_t const	qpn)
@@ -7087,7 +7101,7 @@ ib_inform_info_set_qpn(
 *
 * SYNOPSIS
 */
-static inline ib_net32_t
+static inline ib_net32_t	OSM_API
 ib_inform_info_get_node_type(
   IN		   const	ib_inform_info_t  *p_inf)
 {
@@ -7120,7 +7134,7 @@ ib_inform_info_get_node_type(
 *
 * SYNOPSIS
 */
-static inline ib_net32_t
+static inline ib_net32_t	OSM_API
 ib_inform_info_get_vend_id(
   IN	const	ib_inform_info_t  *p_inf)
 {
@@ -7271,7 +7285,7 @@ typedef struct _ib_iou_info
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_iou_info_diag_dev_id(
 	IN		const	ib_iou_info_t* const		p_iou_info )
 {
@@ -7300,7 +7314,7 @@ ib_iou_info_diag_dev_id(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ib_iou_info_option_rom(
 	IN		const	ib_iou_info_t*	const	p_iou_info )
 {
@@ -7329,7 +7343,7 @@ ib_iou_info_option_rom(
 *
 * SYNOPSIS
 */
-static inline uint8_t
+static inline uint8_t	OSM_API
 ioc_at_slot(
 	IN		const	ib_iou_info_t*	const	p_iou_info,
 	IN				uint8_t					slot )
@@ -7476,7 +7490,7 @@ typedef struct _ib_ioc_profile
 *********/
 
 
-static inline uint32_t
+static inline uint32_t	OSM_API
 ib_ioc_profile_get_vend_id(
 	IN		const	ib_ioc_profile_t* const		p_ioc_profile )
 {
@@ -7484,7 +7498,7 @@ ib_ioc_profile_get_vend_id(
 }
 
 
-static inline void
+static inline void	OSM_API
 ib_ioc_profile_set_vend_id(
 	IN				ib_ioc_profile_t* const		p_ioc_profile,
 	IN		const	uint32_t					vend_id )
@@ -7552,7 +7566,7 @@ typedef struct _ib_svc_entries
 *********/
 
 
-static inline void
+static inline void	OSM_API
 ib_dm_get_slot_lo_hi(
 	IN		const	ib_net32_t			slot_lo_hi,
 		OUT			uint8_t		*const	p_slot,
@@ -7580,7 +7594,7 @@ typedef struct _ib_ioc_info
 {
 	ib_net64_t				module_guid;
 	ib_net64_t				iou_guid;
-	ib_ioc_profile_t			ioc_profile;
+	ib_ioc_profile_t		ioc_profile;
 	ib_net64_t				access_key;
 	uint16_t				initiators_conf;
 	uint8_t					resv[38];
@@ -7621,8 +7635,8 @@ typedef struct _ib_ioc_info
 #define IB_SIDR_REQ_PDATA_SIZE_VER1			216
 #define IB_SIDR_REP_PDATA_SIZE_VER1			140
 
-#define IB_ARI_SIZE					72	// redefine
-#define IB_APR_INFO_SIZE				72
+#define IB_ARI_SIZE							72		// redefine
+#define IB_APR_INFO_SIZE					72
 
 
 /****d* Access Layer/ib_rej_status_t
@@ -7748,17 +7762,22 @@ typedef uint16_t					ib_sidr_status_t;
  *	The following definitions are shared between the Access Layer and VPD
  */
 
-typedef struct _ib_ca			*ib_ca_handle_t;
-typedef struct _ib_pd			*ib_pd_handle_t;
-typedef struct _ib_rdd			*ib_rdd_handle_t;
-typedef struct _ib_mr			*ib_mr_handle_t;
-typedef struct _ib_mw			*ib_mw_handle_t;
-typedef struct _ib_qp			*ib_qp_handle_t;
-typedef struct _ib_eec			*ib_eec_handle_t;
-typedef struct _ib_cq			*ib_cq_handle_t;
-typedef struct _ib_av			*ib_av_handle_t;
-typedef struct _ib_mcast		*ib_mcast_handle_t;
 
+typedef struct _ib_ca* __ptr64			ib_ca_handle_t;
+typedef struct _ib_pd* __ptr64			ib_pd_handle_t;
+typedef struct _ib_rdd* __ptr64			ib_rdd_handle_t;
+typedef struct _ib_mr* __ptr64			ib_mr_handle_t;
+typedef struct _ib_mw* __ptr64			ib_mw_handle_t;
+typedef struct _ib_qp* __ptr64			ib_qp_handle_t;
+typedef struct _ib_eec* __ptr64       ib_eec_handle_t;
+typedef struct _ib_cq* __ptr64			ib_cq_handle_t;
+typedef struct _ib_av* __ptr64			ib_av_handle_t;
+typedef struct _ib_mcast* __ptr64		ib_mcast_handle_t;
+
+/* Currently for windows branch we use the extended version of ib special verbs struct 
+	in order to be compliant with Infinicon ib_types , later we'll change it to support 
+	OpenSM ib_types.h */
+#ifndef WIN32
 
 /****d* Access Layer/ib_api_status_t
 * NAME
@@ -7832,7 +7851,7 @@ typedef enum _ib_api_status_t
 }	ib_api_status_t;
 /*****/
 
-extern const char* ib_error_str[];
+OSM_EXPORT const char* ib_error_str[];
 
 /****f* IBA Base: Types/ib_get_err_str
 * NAME
@@ -7843,7 +7862,7 @@ extern const char* ib_error_str[];
 *
 * SYNOPSIS
 */
-static inline const char*
+static inline const char*	OSM_API
 ib_get_err_str(
 	IN				ib_api_status_t				status )
 {
@@ -8020,7 +8039,7 @@ typedef enum _ib_async_event_t
 *
 *****/
 
-extern const char* ib_async_event_str[];
+OSM_EXPORT const char* ib_async_event_str[];
 
 /****f* IBA Base: Types/ib_get_async_event_str
 * NAME
@@ -8031,7 +8050,7 @@ extern const char* ib_async_event_str[];
 *
 * SYNOPSIS
 */
-static inline const char*
+static inline const char*	OSM_API
 ib_get_async_event_str(
 	IN				ib_async_event_t			event )
 {
@@ -8311,6 +8330,7 @@ typedef struct _ib_ca_attr
 	uint32_t				vend_id;
 	uint16_t				dev_id;
 	uint16_t				revision;
+	uint64_t				fw_ver;
 
 	/*
 	 * Total size of the ca attributes in bytes
@@ -8353,6 +8373,8 @@ typedef struct _ib_ca_attr
 	uint32_t				max_mcast_grps;
 	uint32_t				max_mcast_qps;
 	uint32_t				max_qps_per_mcast_grp;
+	uint32_t				max_fmr;
+	uint32_t				max_map_per_fmr;
 
 	/*
 	 * local_ack_delay:
@@ -8400,6 +8422,9 @@ typedef struct _ib_ca_attr
 *	revision
 *		Revision ID of this adapter
 *
+*	Fw_ver
+*		Device Firmware version.
+*
 *	size
 *		Total size in bytes for the HCA attributes.  This size includes total
 *		size required for all the variable members of the structure.  If a
@@ -9633,7 +9658,7 @@ typedef enum _ib_wc_status_t
 *		The completed work request was canceled by the user.
 *****/
 
-extern const char* ib_wc_status_str[];
+OSM_EXPORT const char* ib_wc_status_str[];
 
 /****f* IBA Base: Types/ib_get_wc_status_str
 * NAME
@@ -9644,7 +9669,7 @@ extern const char* ib_wc_status_str[];
 *
 * SYNOPSIS
 */
-static inline const char*
+static inline const char*	OSM_API
 ib_get_wc_status_str(
 	IN				ib_wc_status_t				wc_status )
 {
@@ -10300,4 +10325,9 @@ typedef struct _ib_ci_op
 
 END_C_DECLS
 
-#endif // __IB_TYPES_H__
+#endif /* ndef WIN32 */
+#if defined( __WIN__ )
+    #include <iba/ib_types_extended.h>
+#endif
+
+#endif /* __IB_TYPES_H__ */
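
For readers unfamiliar with the pattern used throughout this patch: the
calling-convention macro goes between the return type and the function
name, and expands to nothing on non-Windows builds. A minimal sketch of
the same arrangement follows. DEMO_API and demo_low_nibble() are
hypothetical names for illustration only and do not appear in
ib_types.h.

```c
#include <stdint.h>

/* Same shape as the OSM_API definition in the patch: __stdcall on
 * Windows, empty elsewhere. */
#if defined( WIN32 ) || defined( _WIN64 )
    #define DEMO_API __stdcall
#else
    #define DEMO_API
#endif

/* The macro sits after the return type, as in the patched
 * declarations. */
static inline uint8_t DEMO_API
demo_low_nibble(
	const uint8_t value )
{
	return( (uint8_t)( value & 0x0F ) );
}
```

Call sites need no change, since the macro only affects how the function
is declared.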


From eitan at mellanox.co.il  Sun Sep 17 08:59:02 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 18:59:02 +0300
Subject: [openib-general] [PATCH 4/13] osm: port to WinIB stack :
	osmtest/osmtest.c
Message-ID: <86wt82lc1l.fsf@mtl066.yok.mtl.com>

Hi Hal

An explicit cast is required for the Windows compiler to handle this comparison.
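
For reference, the rewritten bound in the patch below has the same value
as the original product, since (1 << lmc) * (1 << lmc) equals
1 << (2*lmc); only the type of the right-hand side changes. A small
sketch that checks this over the LMC range 0..7 (LMC is a 3-bit field;
the function name is invented for this note):

```c
#include <stdint.h>

/* Returns 1 if the old and new forms of the bound agree for every
 * LMC value from 0 to 7, 0 otherwise. */
static int
demo_lmc_bounds_agree( void )
{
  unsigned lmc;

  for( lmc = 0; lmc <= 7; lmc++ )
  {
    int      old_bound = ( 1 << lmc ) * ( 1 << lmc );
    uint32_t new_bound = (uint32_t)( 1 << ( 2 * lmc ) );

    if( (uint32_t)old_bound != new_bound )
      return 0;
  }
  return 1;
}
```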

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: osmtest/osmtest.c
===================================================================
--- osmtest/osmtest.c	(revision 9502)
+++ osmtest/osmtest.c	(working copy)
@@ -3281,7 +3281,7 @@ osmtest_validate_path_data( IN osmtest_t
   else
   {
     /* Also, this doesn't detect fewer than the correct number of paths being returned */
-    if ( p_path->count >= ( 1 << lmc ) * ( 1 << lmc ) )
+    if ( p_path->count >= (uint32_t)( 1 << (2*lmc)) )
     {
       osm_log( &p_osmt->log, OSM_LOG_ERROR,
                "osmtest_validate_path_data: ERR 0052: "


From eitan at mellanox.co.il  Sun Sep 17 08:59:13 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 18:59:13 +0300
Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack :
	opensm/osm_subnet.c
Message-ID: <86venmlc1a.fsf@mtl066.yok.mtl.com>

Hi Hal

Need stdio.h (for snprintf); the duplicate stdlib.h include is dropped.
Also map snprintf to _snprintf in windows case

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: opensm/osm_subnet.c
===================================================================
--- opensm/osm_subnet.c	(revision 9502)
+++ opensm/osm_subnet.c	(working copy)
@@ -53,6 +53,7 @@
 
 #include <stdlib.h>
 #include <string.h>
+#include <stdio.h>
 #include <complib/cl_debug.h>
 #include <opensm/osm_subnet.h>
 #include <opensm/osm_opensm.h>
@@ -65,7 +66,6 @@
 #include <opensm/osm_node.h>
 #include <opensm/osm_multicast.h>
 #include <opensm/osm_inform.h>
-#include <stdlib.h>
 
 /**********************************************************************
  **********************************************************************/
@@ -659,6 +659,9 @@ __osm_subn_opts_unpack_charp(
   }
 }
 
+#ifdef WIN32
+#define snprintf _snprintf
+#endif
 /**********************************************************************
  **********************************************************************/
 static void
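
A caveat on the snprintf mapping above: Microsoft's _snprintf does not
write a terminating NUL when the output is truncated, unlike C99 snprintf.
A minimal sketch of a wrapper that forces termination follows. The names
osm_snprintf and OSM_VSNPRINTF are hypothetical and are not part of OpenSM
or of this patch.

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumption: WIN32 selects the Microsoft CRT name, as in the patch. */
#ifdef WIN32
#define OSM_VSNPRINTF _vsnprintf
#else
#define OSM_VSNPRINTF vsnprintf
#endif

/*
 * Hypothetical helper: format into buf and always NUL-terminate.
 * The return value on truncation differs between the two C libraries,
 * so callers should only rely on the buffer contents.
 */
static int osm_snprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (size == 0)
		return -1;
	va_start(ap, fmt);
	n = OSM_VSNPRINTF(buf, size, fmt, ap);
	va_end(ap);
	buf[size - 1] = '\0';	/* _vsnprintf may leave this unset */
	return n;
}
```

With a 4-byte buffer and the input "abcdef", the buffer ends up holding
"abc" plus the terminator on either platform.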


From eitan at mellanox.co.il  Sun Sep 17 08:59:22 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 18:59:22 +0300
Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack :
 opensm/osm_prtn_config.c
Message-ID: <86u036lc11.fsf@mtl066.yok.mtl.com>

Hi Hal

1. Avoid varargs macros not supported by win
2. Some explicit casting required

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: opensm/osm_prtn_config.c
===================================================================
--- opensm/osm_prtn_config.c	(revision 9502)
+++ opensm/osm_prtn_config.c	(working copy)
@@ -66,17 +66,6 @@
 #define STRTO_IB_NET64(str, end, base) strtoull(str, end, base)
 #endif
 
-#define  PARSERR(log, lnum, fmt, arg...) { \
-	osm_log(log, OSM_LOG_ERROR, \
-		"PARSE ERROR: line %d: " fmt , (lnum), ##arg ); \
-	fprintf(stderr, \
-		"\nPARSE ERROR: line %d: " fmt "\n", (lnum), ##arg ); \
-}
-
-#define  PARSEWARN(log, lnum, fmt, arg...) \
-	osm_log(log, OSM_LOG_VERBOSE, \
-		"PARSE WARN: line %d: " fmt , (lnum), ##arg )
-
 /*
  */
 struct part_conf {
@@ -112,7 +101,7 @@ static int partition_create(unsigned lin
 
 	if (id) {
 		char *end;
-		pkey = strtoul(id, &end, 0);
+		pkey = (uint16_t)strtoul(id, &end, 0);
 		if (end == id || *end)
 			return -1;
 	} else
@@ -131,11 +120,11 @@ static int partition_create(unsigned lin
 		  conf->sl = OSM_DEFAULT_SL;
 		}
 	}
-	conf->p_prtn->sl = conf->sl;
+	conf->p_prtn->sl = (uint8_t)conf->sl;
 
 	if (conf->is_ipoib)
 		osm_prtn_add_mcgroup(conf->p_log, conf->p_subn, conf->p_prtn,
-			     conf->is_ipoib, conf->rate, conf->mtu);
+			     conf->is_ipoib, (uint8_t)conf->rate, (uint8_t)conf->mtu);
 
 	return 0;
 }
@@ -148,29 +137,33 @@ static int partition_add_flag(unsigned l
 		conf->is_ipoib = 1;
 	} else if (!strncmp(flag, "mtu", len)) {
 		if (!val || (conf->mtu = strtoul(val, NULL, 0)) == 0)
-			PARSEWARN(conf->p_log, lineno,
-				"flag \'mtu\' requires valid value"
-				" - skipped.\n");
+			osm_log(conf->p_log, OSM_LOG_VERBOSE,
+					  "PARSE WARN: line %d: "
+					  "flag \'mtu\' requires valid value"
+					  " - skipped.\n", lineno);
 	} else if (!strncmp(flag, "rate", len)) {
 		if (!val || (conf->rate = strtoul(val, NULL, 0)) == 0)
-			PARSEWARN(conf->p_log, lineno,
-				"flag \'rate\' requires valid value"
-				" - skipped.\n");
+			osm_log(conf->p_log, OSM_LOG_VERBOSE,
+					  "PARSE WARN: line %d: "
+					  "flag \'rate\' requires valid value"
+					  " - skipped.\n", lineno);
 	} else if (!strncmp(flag, "sl", len)) {
 		unsigned sl;
 		char *end;
 
 		if (!val || !*val || (sl = strtoul(val, &end, 0)) > 15 ||
 		    (*end && !isspace(*end)))
-			PARSEWARN(conf->p_log, lineno,
-				"flag \'sl\' requires valid value"
-				" - skipped.\n");
+			osm_log(conf->p_log, OSM_LOG_VERBOSE,
+					  "PARSE WARN: line %d: "
+					  "flag \'sl\' requires valid value"
+					  " - skipped.\n", lineno);
 		else
 			conf->sl = sl;
 	} else {
-		PARSEWARN(conf->p_log, lineno,
-			"unrecognized partition flag \'%s\'"
-			" - ignored.\n", flag);
+			osm_log(conf->p_log, OSM_LOG_VERBOSE,
+					  "PARSE WARN: line %d: "
+					  "unrecognized partition flag \'%s\'"
+					  " - ignored.\n", lineno, flag);
 	}
 	return 0;
 }
@@ -189,9 +182,10 @@ static int partition_add_port(unsigned l
 		if (!strncmp(flag, "full", strlen(flag)))
 			full = TRUE;
 		else if (strncmp(flag, "limited", strlen(flag))) {
-			PARSEWARN(conf->p_log, lineno,
-				"unrecognized port flag \'%s\'." 
-				" Assume \'limited\'\n", flag);
+			osm_log(conf->p_log, OSM_LOG_VERBOSE,
+					  "PARSE WARN: line %d: "
+					  "unrecognized port flag \'%s\'." 
+					  " Assume \'limited\'\n", lineno, flag);
 		}
 	}
 
@@ -305,8 +299,9 @@ static int parse_part_conf(struct part_c
 
 	q = strchr(p, ':');
 	if (!q) {
-		PARSERR(conf->p_log, lineno,
-			"no partition definition found\n");
+		osm_log(conf->p_log, OSM_LOG_ERROR, 
+				  "PARSE ERROR: line %d: "
+				  "no partition definition found\n", lineno);
 		return -1;
 	}
 
@@ -330,8 +325,9 @@ static int parse_part_conf(struct part_c
 			*q++ = '\0';
 		ret = parse_name_token(p, &flag, &flval);
 		if (!flag) {
-			PARSERR(conf->p_log, lineno,
-				"bad partition flags\n");
+			osm_log(conf->p_log, OSM_LOG_ERROR, 
+					  "PARSE ERROR: line %d: "
+					  "bad partition flags\n",lineno);
 			return -1;
 		}
 		p += ret;
@@ -341,8 +337,9 @@ static int parse_part_conf(struct part_c
 
 	if (p != str || (partition_create(lineno, conf,
 					name, id, flag, flval) < 0)) {
-		PARSERR(conf->p_log, lineno,
-			"bad partition definition\n");
+		osm_log(conf->p_log, OSM_LOG_ERROR, 
+				  "PARSE ERROR: line %d: "	
+				  "bad partition definition\n", lineno);
 		return -1;
 	}
 
@@ -354,8 +351,9 @@ static int parse_part_conf(struct part_c
 			*q++ = '\0';
 		ret = parse_name_token(p, &name, &flag);
 		if (partition_add_port(lineno, conf, name, flag) < 0) {
-			PARSERR(conf->p_log, lineno,
-				"bad PortGUID\n");
+			osm_log(conf->p_log, OSM_LOG_ERROR, 
+					  "PARSE ERROR: line %d: "
+					  "bad PortGUID\n", lineno);
 			return -1;
 		}
 		p += ret;
@@ -404,8 +402,9 @@ int osm_prtn_config_parse_file(osm_log_t
 
 			if (!conf &&
 				!(conf = new_part_conf(p_log, p_subn))) {
-				PARSERR(p_log, lineno,
-					"internal: cannot create config.\n");
+				osm_log(p_log, OSM_LOG_ERROR,
+						  "PARSE ERROR: line %d: "
+						  "internal: cannot create config.\n", lineno);
 				break;
 			}
 

From eitan at mellanox.co.il  Sun Sep 17 08:59:32 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 18:59:32 +0300
Subject: [openib-general] [PATCH 7/13] osm: port to WinIB stack :
	opensm/osm_ucast_file.c
Message-ID: <86sliqlc0r.fsf@mtl066.yok.mtl.com>

Hi Hal

1. Avoid varargs macros not supported by win
2. Some explicit casting required
3. Use strtoull and not strtoll

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: opensm/osm_ucast_file.c
===================================================================
--- opensm/osm_ucast_file.c	(revision 9502)
+++ opensm/osm_ucast_file.c	(working copy)
@@ -52,18 +52,11 @@
 
 #include <iba/ib_types.h>
 #include <complib/cl_qmap.h>
+#include <complib/cl_debug.h>
 #include <opensm/osm_opensm.h>
 #include <opensm/osm_switch.h>
 #include <opensm/osm_log.h>
 
-#define PARSEERR(log, file_name, lineno, fmt, arg...) \
-		osm_log(log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u: " fmt , \
-			file_name, lineno, ##arg )
-
-#define PARSEWARN(log, file_name, lineno, fmt, arg...) \
-		osm_log(log, OSM_LOG_VERBOSE, "PARSE WARN: %s:%u: " fmt , \
-			file_name, lineno, ##arg )
-
 static uint16_t remap_lid(osm_opensm_t *p_osm, uint16_t lid, ib_net64_t guid)
 {
 	osm_port_t *p_port;
@@ -72,10 +65,11 @@ static uint16_t remap_lid(osm_opensm_t *
 
 	p_port = (osm_port_t *)cl_qmap_get(&p_osm->subn.port_guid_tbl, guid);
 	if (!p_port ||
-	    p_port == (osm_port_t *)cl_qmap_end(&p_osm->subn.port_guid_tbl)) {
+	    p_port == (osm_port_t *)cl_qmap_end(&p_osm->subn.port_guid_tbl)) 
+	{
 		osm_log(&p_osm->log, OSM_LOG_VERBOSE,
-			"remap_lid: cannot find port guid 0x%016" PRIx64
-			" , will use the same lid\n", cl_ntoh64(guid));
+				  "remap_lid: cannot find port guid 0x%016" PRIx64
+				  " , will use the same lid\n", cl_ntoh64(guid));
 		return lid;
 	}
 
@@ -182,19 +176,21 @@ static int do_ucast_file_load(void *cont
 				"skipping parsing. Using default routing algorithm\n");
 
 		}
+
 		else if (!strncmp(p, "Unicast lids", 12)) {
 			q = strstr(p, " guid 0x");
 			if (!q) {
-				PARSEERR(&p_osm->log, file_name, lineno,
-					 "cannot parse switch definition\n");
+				osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u:" 
+						  " cannot parse switch definition\n", 
+						  file_name, lineno);
 				return -1;
 			}
 			p = q + 6;
-			sw_guid = strtoll(p, &q, 16);
+			sw_guid = strtoull(p, &q, 16);
 			if (q && !isspace(*q)) {
-				PARSEERR(&p_osm->log, file_name, lineno,
-					 "cannot parse switch guid: \'%s\'\n",
-					 p);
+				osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u: "
+						  "cannot parse switch guid: \'%s\'\n",
+						  file_name, lineno, p);
 				return -1;
 			}
 			sw_guid = cl_hton64(sw_guid);
@@ -212,40 +208,39 @@ static int do_ucast_file_load(void *cont
 			}
 		}
 		else if (p_sw && !strncmp(p, "0x", 2)) {
-			lid = strtoul(p, &q, 16);
+			lid = (uint16_t)strtoul(p, &q, 16);
 			if (q && !isspace(*q)) {
-				PARSEERR(&p_osm->log, file_name, lineno,
-					 "cannot parse lid: \'%s\'\n", p);
+				osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u: "
+						  "cannot parse lid: \'%s\'\n", file_name, lineno, p);
 				return -1;
 			}
 			p = q;
 			while (isspace(*p))
 				p++;
-			port_num = strtoul(p, &q, 10);
+			port_num = (uint8_t)strtoul(p, &q, 10);
 			if (q && !isspace(*q)) {
-				PARSEERR(&p_osm->log, file_name, lineno,
-					 "cannot parse port: \'%s\'\n", p);
+				osm_log(&p_osm->log, OSM_LOG_ERROR, "PARSE ERROR: %s:%u: "
+						  "cannot parse port: \'%s\'\n", file_name, lineno, p);
 				return -1;
 			}
 			p = q;
 			/* additionally try to exract guid */
 			q = strstr(p, " portguid 0x");
 			if (!q) {
-				PARSEWARN(&p_osm->log, file_name, lineno,
-					  "cannot find port guid "
-					  "(maybe broken dump): \'%s\'\n", p);
+				osm_log(&p_osm->log, OSM_LOG_VERBOSE, "PARSE WARNING: %s:%u: "
+						  "cannot find port guid "
+						  "(maybe broken dump): \'%s\'\n", file_name, lineno, p);
 				port_guid = 0;
 			}
 			else
 			{
 				p = q + 10;
-				port_guid = strtoll(p, &q, 16);
+				port_guid = strtoull(p, &q, 16);
 				if (!q && !isspace(*q) && *q != ':') {
-					PARSEWARN(&p_osm->log, file_name,
-						  lineno,
-						  "cannot parse port guid "
-						  "(maybe broken dump): "
-						  "\'%s\'\n", p);
+					osm_log(&p_osm->log, OSM_LOG_VERBOSE, "PARSE WARNING: %s:%u: "
+							  "cannot parse port guid "
+							  "(maybe broken dump): "
+							  "\'%s\'\n", file_name, lineno, p);
 					port_guid = 0;
 				}
 			}


From eitan at mellanox.co.il  Sun Sep 17 08:59:40 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 18:59:40 +0300
Subject: [openib-general] [PATCH 8/13] osm: port to WinIB stack :
	opensm/osm_opensm.c
Message-ID: <86r6yalc0j.fsf@mtl066.yok.mtl.com>

Hi Hal

Explicit NULL in empty array initializer

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: opensm/osm_opensm.c
===================================================================
--- opensm/osm_opensm.c	(revision 9502)
+++ opensm/osm_opensm.c	(working copy)
@@ -80,7 +80,7 @@ const static struct routing_engine_modul
 	{"null", NULL},
 	{"updn", osm_ucast_updn_setup },
 	{"file", osm_ucast_file_setup },
-	{}
+	{NULL, NULL}
 };
 
 static int setup_routing_engine(osm_opensm_t *p_osm, const char *name)
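
For context on the change above: an empty initializer `{}` is a GNU
extension in C of this era, so the table terminator has to be spelled out.
A minimal sketch of a sentinel-terminated lookup table follows. The type
and function names (engine_entry, find_engine, demo_setup) are hypothetical
and only mirror the shape of the routing engine table.

```c
#include <stddef.h>
#include <string.h>

typedef int (*setup_fn)(void *ctx);	/* hypothetical callback signature */

struct engine_entry {
	const char *name;
	setup_fn setup;
};

static int demo_setup(void *ctx)
{
	(void)ctx;
	return 0;
}

static const struct engine_entry engines[] = {
	{"null", NULL},
	{"demo", demo_setup},
	{NULL, NULL}	/* explicit sentinel instead of `{}` */
};

/* Return the entry whose name matches, or NULL if the name is unknown. */
static const struct engine_entry *find_engine(const char *name)
{
	const struct engine_entry *e;

	for (e = engines; e->name; e++)
		if (!strcmp(e->name, name))
			return e;
	return NULL;
}
```

The loop stops on the entry whose name is NULL, so the sentinel must stay
the last element of the array.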


From eitan at mellanox.co.il  Sun Sep 17 08:59:48 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 18:59:48 +0300
Subject: [openib-general] [PATCH 9/13] osm: port to WinIB stack :
	opensm/osm_prtn.c
Message-ID: <86psdulc0b.fsf@mtl066.yok.mtl.com>

Hi Hal

Required cl_debug.h for PRIx64
Also map snprintf to _snprintf and stat to _stat

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: opensm/osm_prtn.c
===================================================================
--- opensm/osm_prtn.c	(revision 9502)
+++ opensm/osm_prtn.c	(working copy)
@@ -53,7 +53,7 @@
 #include <string.h>
 #include <stdio.h>
 #include <sys/stat.h>
-
+#include <complib/cl_debug.h>
 #include <iba/ib_types.h>
 #include <opensm/osm_opensm.h>
 #include <opensm/osm_partition.h>
@@ -61,6 +61,10 @@
 #include <opensm/osm_sa.h>
 #include <opensm/osm_multicast.h>
 
+#ifdef WIN32
+#define snprintf _snprintf
+#define stat _stat
+#endif
 
 extern int osm_prtn_config_parse_file(osm_log_t * const p_log,
 				      osm_subn_t * const p_subn,


From eitan at mellanox.co.il  Sun Sep 17 09:00:01 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 19:00:01 +0300
Subject: [openib-general] [PATCH 10/13] osm: port to WinIB stack :
	opensm/osm_pkey.c
Message-ID: <86odtelbzy.fsf@mtl066.yok.mtl.com>

Hi Hal

Some explicit casting required and also pkey blocks are only uint16_t .

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: opensm/osm_pkey.c
===================================================================
--- opensm/osm_pkey.c	(revision 9502)
+++ opensm/osm_pkey.c	(working copy)
@@ -116,7 +116,7 @@ void osm_pkey_tbl_init_new_blocks(
   IN const osm_pkey_tbl_t *p_pkey_tbl)
 {
   ib_pkey_table_t *p_block;
-  int16_t b, num_blocks = cl_ptr_vector_get_size(&p_pkey_tbl->new_blocks);
+  size_t b, num_blocks = cl_ptr_vector_get_size(&p_pkey_tbl->new_blocks);
 
   for (b = 0; b < num_blocks; b++)
     if ((p_block = cl_ptr_vector_get(&p_pkey_tbl->new_blocks, b)))
@@ -279,17 +279,17 @@ ib_api_status_t
 osm_pkey_tbl_get_block_and_idx(
   IN osm_pkey_tbl_t *p_pkey_tbl,
   IN uint16_t	  *p_pkey,
-  OUT uint32_t	  *p_block_idx,
+  OUT uint16_t	  *p_block_idx,
   OUT uint8_t	  *p_pkey_idx)
 {
-  uint32_t	  num_of_blocks;
-  uint32_t	  block_index;
+  uint16_t	  num_of_blocks;
+  uint16_t	  block_index;
   ib_pkey_table_t *block;
 
   CL_ASSERT( p_block_idx != NULL );
   CL_ASSERT( p_pkey_idx != NULL );
  
-  num_of_blocks = cl_ptr_vector_get_size( &p_pkey_tbl->blocks);
+  num_of_blocks = (uint16_t)cl_ptr_vector_get_size( &p_pkey_tbl->blocks);
   for (block_index = 0; block_index < num_of_blocks; block_index++)
   {
     block = osm_pkey_tbl_block_get(p_pkey_tbl, block_index);


From eitan at mellanox.co.il  Sun Sep 17 09:00:12 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 19:00:12 +0300
Subject: [openib-general] [PATCH 11/13] osm: port to WinIB stack :
	opensm/osm_log.c
Message-ID: <86mz8ylbzn.fsf@mtl066.yok.mtl.com>

Hi Hal

1. function mappings for stat, fstat and fileno
2. Currently no implementation of log file truncation on Windows

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: opensm/osm_log.c
===================================================================
--- opensm/osm_log.c	(revision 9502)
+++ opensm/osm_log.c	(working copy)
@@ -60,6 +60,8 @@
 #include <sys/stat.h>
 #include <errno.h>
 
+static int log_exit_count = 0;
+
 #ifndef WIN32
 #include <sys/time.h>
 #include <unistd.h>
@@ -79,9 +81,6 @@ static char *month_str[] = {
   "Nov",
   "Dec"
 };
-#endif /* ndef WIN32 */
-
-static int log_exit_count = 0;
 
 static void truncate_log_file(osm_log_t* const p_log)
 {
@@ -95,6 +94,19 @@ static void truncate_log_file(osm_log_t*
 	p_log->count = 0;
 }
 
+#else /* Windows */
+
+#define fstat _fstat
+#define stat _stat
+#define fileno _fileno
+static void truncate_log_file(osm_log_t* const p_log)
+{
+	fprintf(stderr, "truncate_log_file: cannot truncate on windows system (yet)\n");
+}
+
+#endif /* ndef WIN32 */
+
+
 void
 osm_log(
   IN osm_log_t* const p_log,


From eitan at mellanox.co.il  Sun Sep 17 09:00:33 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 19:00:33 +0300
Subject: [openib-general] [PATCH 12/13] osm: port to WinIB stack :
	opensm/osm_qos.c
Message-ID: <86lkoilbz2.fsf@mtl066.yok.mtl.com>

Hi Hal

Port num is uint8_t (avoid casting by using correct size field).
Added some explicit casts

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: opensm/osm_qos.c
===================================================================
--- opensm/osm_qos.c	(revision 9502)
+++ opensm/osm_qos.c	(working copy)
@@ -70,7 +70,7 @@ static void qos_build_config(struct qos_
  */
 static ib_api_status_t vlarb_update_table_block(osm_req_t * p_req,
 						osm_physp_t * p,
-						unsigned port_num,
+						uint8_t port_num,
 						const ib_vl_arb_table_t *table_block,
 						unsigned block_length,
 						unsigned block_num)
@@ -80,7 +80,7 @@ static ib_api_status_t vlarb_update_tabl
 	uint32_t attr_mod;
 	ib_port_info_t *p_pi;
 	unsigned vl_mask;
-	int i;
+	unsigned int i;
 
 	if (!(p_pi = osm_physp_get_port_info_ptr(p)))
 		return IB_ERROR;
@@ -110,7 +110,7 @@ static ib_api_status_t vlarb_update_tabl
 }
 
 static ib_api_status_t vlarb_update(osm_req_t * p_req,
-				    osm_physp_t * p, unsigned port_num,
+				    osm_physp_t * p, uint8_t port_num,
 				    const struct qos_config *qcfg)
 {
 	ib_api_status_t status = IB_SUCCESS;
@@ -198,11 +198,11 @@ static ib_api_status_t sl2vl_update_tabl
 }
 
 static ib_api_status_t sl2vl_update(osm_req_t * p_req, osm_port_t * p_port,
-				    osm_physp_t * p, unsigned port_num,
+				    osm_physp_t * p, uint8_t port_num,
 				    const struct qos_config *qcfg)
 {
 	ib_api_status_t status;
-	unsigned i, num_ports;
+	uint8_t i, num_ports;
 	ib_port_info_t *p_pi = osm_physp_get_port_info_ptr(p);
 	osm_physp_t *p_physp;
 
@@ -273,7 +273,7 @@ static ib_api_status_t vl_high_limit_upd
 
 static ib_api_status_t qos_physp_setup(osm_log_t * p_log, osm_req_t * p_req,
 				       osm_port_t * p_port, osm_physp_t * p,
-				       unsigned port_num,
+				       uint8_t port_num,
 				       const struct qos_config *qcfg)
 {
 	ib_api_status_t status;
@@ -329,7 +329,7 @@ osm_signal_t osm_qos_setup(osm_opensm_t 
 	osm_physp_t *p_physp;
 	uint8_t node_type;
 	ib_api_status_t status;
-	uint32_t i;
+	uint8_t i;
 
 	if (p_osm->subn.opt.no_qos)
 		return OSM_SIGNAL_DONE;
@@ -411,7 +411,7 @@ static int parse_vlarb_entry(char *str, 
 	p += parse_one_unsigned(p, ':', &val);
 	e->vl = val % 15;
 	p += parse_one_unsigned(p, ',', &val);
-	e->weight = val;
+	e->weight = (uint8_t)val;
 	return p - str;
 }
 
@@ -434,7 +434,7 @@ static void qos_build_config(struct qos_
 	memset(cfg, 0, sizeof(*cfg));
 
 	cfg->max_vls = opt->max_vls > 0 ? opt->max_vls : dflt->max_vls;
-	cfg->vl_high_limit = opt->high_limit;
+	cfg->vl_high_limit = (uint8_t)opt->high_limit;
 
 	p = opt->vlarb_high ? opt->vlarb_high : dflt->vlarb_high;
 	for (i = 0; i < 2 * IB_NUM_VL_ARB_ELEMENTS_IN_BLOCK; i++) {


From eitan at mellanox.co.il  Sun Sep 17 09:00:49 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 19:00:49 +0300
Subject: [openib-general] [PATCH 13/13] osm: port to WinIB stack :
	opensm/osm_pkey_mgr.c
Message-ID: <86k642lbym.fsf@mtl066.yok.mtl.com>

Hi Hal

Avoid using array initialization statements which do not compile on win.

Thanks

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: opensm/osm_pkey_mgr.c
===================================================================
--- opensm/osm_pkey_mgr.c	(revision 9502)
+++ opensm/osm_pkey_mgr.c	(working copy)
@@ -67,7 +67,7 @@
   a different place for switch external ports (SwitchInfo) and the
   rest of the ports (NodeInfo).
 */
-static int 
+static uint16_t
 pkey_mgr_get_physp_max_blocks(
   IN const osm_subn_t *p_subn,
   IN const osm_physp_t *p_physp )
@@ -132,8 +132,8 @@ pkey_mgr_process_physical_port(
     CL_ASSERT( ib_pkey_get_base( *p_orig_pkey ) == ib_pkey_get_base( pkey ) );
     p_pending->is_new = FALSE;
     if (osm_pkey_tbl_get_block_and_idx(
-	  p_pkey_tbl, p_orig_pkey,
-	  &p_pending->block, &p_pending->index ) != IB_SUCCESS)
+			  p_pkey_tbl, p_orig_pkey,
+			  &p_pending->block, &p_pending->index ) != IB_SUCCESS)
     {
       osm_log( p_log, OSM_LOG_ERROR,
 	       "pkey_mgr_process_physical_port: ERR 0503: "
@@ -276,7 +276,8 @@ static boolean_t pkey_mgr_update_port(
   boolean_t ret_val = FALSE;
   osm_pending_pkey_t *p_pending;
   boolean_t found;
-  ib_pkey_table_t empty_block = {.pkey_entry = {0}, };
+  ib_pkey_table_t empty_block;
+  memset(&empty_block, 0, sizeof(ib_pkey_table_t));
 
   p_physp = osm_port_get_default_phys_ptr( p_port );
   if ( !osm_physp_is_valid( p_physp ) )
@@ -403,7 +404,8 @@ pkey_mgr_update_peer_port(
   uint16_t peer_max_blocks;
   ib_api_status_t status = IB_SUCCESS;
   boolean_t ret_val = FALSE;
-  ib_pkey_table_t empty_block = {.pkey_entry = {0}, };
+  ib_pkey_table_t empty_block;
+  memset(&empty_block, 0, sizeof(ib_pkey_table_t));
 
   p_physp = osm_port_get_default_phys_ptr( p_port );
   if (!osm_physp_is_valid( p_physp ))
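
A note on the replacement above: designated initializers are C99, and the
memset call has to come after the last declaration because C89 does not
allow declarations after statements. A minimal sketch of the same
zero-then-compare idiom follows. The struct below is a stand-in for
ib_pkey_table_t, and the entry count of 32 is an assumption made for the
sketch.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_ENTRIES 32	/* assumed block size, for illustration only */

typedef struct demo_pkey_table {
	uint16_t pkey_entry[DEMO_ENTRIES];
} demo_pkey_table_t;

/* Return 1 if every entry in the block is zero, 0 otherwise. */
static int block_is_empty(const demo_pkey_table_t *blk)
{
	demo_pkey_table_t empty_block;

	/* Portable replacement for `= {.pkey_entry = {0}, }`. */
	memset(&empty_block, 0, sizeof(empty_block));
	return memcmp(blk, &empty_block, sizeof(empty_block)) == 0;
}
```

Comparing with memcmp is safe here because the struct holds a single array
and therefore has no padding between members.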


From eitan at mellanox.co.il  Sun Sep 17 09:22:33 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 17 Sep 2006 19:22:33 +0300
Subject: [openib-general] [PATCH 0/13] osm: port to WinIB stack
Message-ID: <863baq5upi.fsf@mtl066.yok.mtl.com>

Hi Hal

The following series of 13 patches are required for porting of 
the trunk OpenSM code (based on 9502) to WinIB.

I have intentionally split the patch by file to ease the review.

Most changes are:
1. casting from some int into exact uintXX_t
2. Avoiding macros with varargs (the Windows compiler does not support them)
3. Mapping snprintf to _snprintf, stat to _stat, etc
4. Missing include for cl_debug required for PRIx64 def
5. The osm_log changes for truncating are not supported yet 
6. ib_types: add a macro for OSM_API required for windows to
	declare the API as __cdecl

These patches are for the trunk only.

Thanks

Eitan
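
On item 2 above: the series replaces the GNU-style variadic macros with
open-coded osm_log calls. An alternative that keeps a single call site is
a variadic function built on va_list, which is plain C89. The sketch below
is only an illustration under that assumption. format_parse_warn is a
hypothetical name, it writes into a caller buffer instead of calling
osm_log, and on the older Windows CRT the two formatting calls would need
the same underscore mapping the patches use.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "PARSE WARN: line N: <message>" into buf.
 * Returns the number of characters written, or a negative value on error.
 */
static int format_parse_warn(char *buf, size_t size, unsigned lineno,
			     const char *fmt, ...)
{
	va_list ap;
	int n, m;

	n = snprintf(buf, size, "PARSE WARN: line %u: ", lineno);
	if (n < 0 || (size_t)n >= size)
		return -1;	/* prefix alone did not fit */
	va_start(ap, fmt);
	m = vsnprintf(buf + n, size - n, fmt, ap);
	va_end(ap);
	return m < 0 ? m : n + m;
}
```

A call such as format_parse_warn(buf, sizeof buf, lineno, "bad %s", flag)
then reads much like the old PARSEWARN macro did.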


From mst at mellanox.co.il  Sun Sep 17 10:30:28 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 17 Sep 2006 20:30:28 +0300
Subject: [openib-general] [PATCH 4/13] osm: port to WinIB stack :
	osmtest/osmtest.c
In-Reply-To: <86wt82lc1l.fsf@mtl066.yok.mtl.com>
References: <86wt82lc1l.fsf@mtl066.yok.mtl.com>
Message-ID: <20060917173028.GA32526@mellanox.co.il>

Quoting r. Eitan Zahavi <eitan at mellanox.co.il>:
> Subject: [PATCH 4/13] osm: port to WinIB stack : osmtest/osmtest.c
> 
> Hi Hal
> 
> Explicit cast required for the win compiler to handle this...
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>
> 
> Index: osmtest/osmtest.c
> ===================================================================
> --- osmtest/osmtest.c	(revision 9502)
> +++ osmtest/osmtest.c	(working copy)
> @@ -3281,7 +3281,7 @@ osmtest_validate_path_data( IN osmtest_t
>    else
>    {
>      /* Also, this doesn't detect fewer than the correct number of paths being returned */
> -    if ( p_path->count >= ( 1 << lmc ) * ( 1 << lmc ) )
> +    if ( p_path->count >= (uint32_t)( 1 << (2*lmc)) )
>      {
>        osm_log( &p_osmt->log, OSM_LOG_ERROR,
>                 "osmtest_validate_path_data: ERR 0052: "
> 

Integer casts are ugly, and can mask real errors.
All you want is for the math result to be unsigned, so
 1u << (2*lmc)
would be cleaner with the same effect, I think.

-- 
MST
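
The suggestion above can be sketched as follows. max_paths is a
hypothetical helper name; the arithmetic relies on the IB limit of LMC to
the range 0..7.

```c
#include <stdint.h>

/* Number of paths between two ports that each own 2^lmc LIDs. */
static uint32_t max_paths(uint8_t lmc)
{
	/*
	 * 1u makes the whole shift unsigned, so the result needs no cast.
	 * With lmc <= 7 the shift count is at most 14 and cannot overflow.
	 * (1 << lmc) * (1 << lmc) equals 1 << (2 * lmc).
	 */
	return 1u << (2 * lmc);
}
```

The comparison in the patch would then be written against an unsigned
value without any explicit cast.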


From mst at mellanox.co.il  Sun Sep 17 10:34:08 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 17 Sep 2006 20:34:08 +0300
Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack :
 opensm/osm_prtn_config.c
In-Reply-To: <86u036lc11.fsf@mtl066.yok.mtl.com>
References: <86u036lc11.fsf@mtl066.yok.mtl.com>
Message-ID: <20060917173408.GB32526@mellanox.co.il>

Quoting r. Eitan Zahavi <eitan at mellanox.co.il>:
> @@ -112,7 +101,7 @@ static int partition_create(unsigned lin
>  
>  	if (id) {
>  		char *end;
> -		pkey = strtoul(id, &end, 0);
> +		pkey = (uint16_t)strtoul(id, &end, 0);
>  		if (end == id || *end)
>  			return -1;
>  	} else

would it make sense to range-check the value before casting it?

-- 
MST
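
A minimal sketch of what such a range check could look like for the pkey
case. parse_u16 is a hypothetical helper, not a function in OpenSM.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a 16-bit value; 0 on success, -1 on error. */
static int parse_u16(const char *str, uint16_t *out)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(str, &end, 0);
	if (end == str || *end != '\0')
		return -1;	/* no digits, or trailing junk */
	if (errno == ERANGE || val > 0xFFFFul)
		return -1;	/* does not fit in uint16_t */
	*out = (uint16_t)val;	/* cast is now known to be lossless */
	return 0;
}
```

The check happens on the unsigned long before the cast, so an out-of-range
value is reported instead of being silently truncated.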


From mst at mellanox.co.il  Sun Sep 17 10:35:18 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 17 Sep 2006 20:35:18 +0300
Subject: [openib-general] [PATCH 7/13] osm: port to WinIB stack :
 opensm/osm_ucast_file.c
In-Reply-To: <86sliqlc0r.fsf@mtl066.yok.mtl.com>
References: <86sliqlc0r.fsf@mtl066.yok.mtl.com>
Message-ID: <20060917173518.GC32526@mellanox.co.il>

Quoting r. Eitan Zahavi <eitan at mellanox.co.il>:
>  				p++;
> -			port_num = strtoul(p, &q, 10);
> +			port_num = (uint8_t)strtoul(p, &q, 10);
>  			if (q && !isspace(*q)) {

Would it make sense to range-check the value before casting it away?
-- 
MST


From eitan at mellanox.co.il  Sun Sep 17 11:50:21 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Sun, 17 Sep 2006 21:50:21 +0300
Subject: [openib-general] [PATCH 4/13] osm: port to WinIB stack :
 osmtest/osmtest.c
In-Reply-To: <20060917173028.GA32526@mellanox.co.il>
References: <86wt82lc1l.fsf@mtl066.yok.mtl.com>
	<20060917173028.GA32526@mellanox.co.il>
Message-ID: <450D98ED.3000707@mellanox.co.il>

Hi Michael,

In general I agree we could make the code a little safer by 
range-checking casts.
But in many of the cases (not the ones with user input - 
strtoul/strtoull) it is not required as the values are limited by the IB 
arch.

Anyway, the patch I am sending is for WinIB migration. Just doing the 
explicit cast does not make things any worse.
We could take the task of cleaning these integer casts (like I did in 
osm_pkey.c/h) but this is another patch.

EZ

Michael S. Tsirkin wrote:

>Quoting r. Eitan Zahavi <eitan at mellanox.co.il>:
>  
>
>>Subject: [PATCH 4/13] osm: port to WinIB stack : osmtest/osmtest.c
>>
>>Hi Hal
>>
>>Explicit cast required for the win compiler to handle this...
>>
>>Thanks
>>
>>Eitan
>>
>>Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>
>>
>>Index: osmtest/osmtest.c
>>===================================================================
>>--- osmtest/osmtest.c	(revision 9502)
>>+++ osmtest/osmtest.c	(working copy)
>>@@ -3281,7 +3281,7 @@ osmtest_validate_path_data( IN osmtest_t
>>   else
>>   {
>>     /* Also, this doesn't detect fewer than the correct number of paths being returned */
>>-    if ( p_path->count >= ( 1 << lmc ) * ( 1 << lmc ) )
>>+    if ( p_path->count >= (uint32_t)( 1 << (2*lmc)) )
>>     {
>>       osm_log( &p_osmt->log, OSM_LOG_ERROR,
>>                "osmtest_validate_path_data: ERR 0052: "
>>
>>    
>>
>
>Integer casts are ugly, and can mask real errors.
>All you want is for the math result to be unsigned, so
> 1u << (2*lmc)
>would be cleaner with the same effect, I think.
>
>  
>


From mst at mellanox.co.il  Sun Sep 17 11:55:43 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 17 Sep 2006 21:55:43 +0300
Subject: [openib-general] [PATCH 4/13] osm: port to WinIB stack :
 osmtest/osmtest.c
In-Reply-To: <450D98ED.3000707@mellanox.co.il>
References: <450D98ED.3000707@mellanox.co.il>
Message-ID: <20060917185543.GD32526@mellanox.co.il>

Quoting r. Eitan Zahavi <eitan at mellanox.co.il>:
> Subject: Re: [openib-general] [PATCH 4/13] osm: port to WinIB stack : osmtest/osmtest.c
> 
> Hi Michael,
> 
> In general I agree we could make the code a little safer by 
> range-checking casts.
> But in many of the cases (not the ones with user input - 
> strtoul/strtoull) it is not required as the values are limited by the IB 
> arch.
> 
> Anyway, the patch I am sending is for WinIB migration. Just doing the 
> explicit cast does not make things any worse.
> We could take the task of cleaning these integer casts (like I did in 
> osm_pkey.c/h) but this is another patch.
> 
> EZ

I agree with that. My point was that VC++ was catching some potential errors
here, so we need to be careful not to throw that away.

-- 
MST


From bgreen at nas.nasa.gov  Sun Sep 17 12:59:42 2006
From: bgreen at nas.nasa.gov (Bryan Green)
Date: Sun, 17 Sep 2006 12:59:42 -0700
Subject: [openib-general] patch trouble
In-Reply-To: Your message of "Sat, 16 Sep 2006 20:56:28 +0300."
	<20060916175628.GB22267@mellanox.co.il>
Message-ID: <200609171959.k8HJxgUT005451@ece06.nas.nasa.gov>

"Michael S. Tsirkin" writes:
> Quoting r. Bryan Green <bgreen at nas.nasa.gov>:
> > Subject: patch trouble
> > 
> > Hello,
> > Many of the patches in subversion fail to have an effect when I apply them to a kernel,
> > because they create headers in 'drivers/infiniband/include' which depend on being included
> > before the like-named headers in the toplevel 'include'.  Is there a step I am missing to
> > make the headers in 'drivers/infiniband/include' get chosen for inclusion first?
> 
> Note that backport patches are intended to be applied in an out-of-kernel
> fashion - they are not changing the kernel at all.
> 
> So you build as an out-of-tree driver, and add something like this to the make
> command line:
> 
>                 LINUXINCLUDE='-I$(CWD)/include \
>                 -I$(CWD)/drivers/infiniband/include \
>                 -Iinclude \
>                 $$(if $$(KBUILD_SRC),-Iinclude2 -I$$(srctree)/include) \
>                 -include include/linux/autoconf.h \
>                 -include $(CWD)/include/linux/autoconf.h \
>                 ' \
> 
> You can find an example here
> https://openib.org/svn/gen2/trunk/ofed/openib/scripts/Makefile
> 
> BTW, Mellanox is not actively supporting backport patches on the svn trunk.
> If you want code that works on something other than 2.6.17,
> I suggest you pull backports for the ofed branch (forked from
> 2.6.18-rc6) from ofed_1_1 tree by pulling
> git://www.mellanox.co.il/~git/infiniband ofed_1_1
> and looking in ofed_scripts directory.
> 

Thanks.  I am looking at the git repository, and I see a number of patches in
'kernel_patches/fixes' which are apparently applied before the kernel patches under
'kernel_patches/backport'.  I also see the discrepancies between the patches in git and
svn.  I am currently putting together a gentoo overlay (a series of gentoo installation
scripts) for openib.  Since there are no source tar files available for download, I am
downloading the code from subversion - I have already done this for the 1.0 subversion
branch, and mvapich2 from the 1.1 branch.  My interest in the 2.6.12 kernel comes from a
need to evaluate the lustre filesystem (production version), which has support for the
2.6.12 vanilla kernel.

Is there a great discrepancy between the git repository and the svn repository?
If I am downloading the kernel modules from subversion, should I still use the patchset
from the git repository?   What about putting a source tar file for openib up for
download?  There is currently only a source tarball for libibverbs, while ofed is too
RPM-centric.

Thanks,

-bryan


From mst at mellanox.co.il  Sun Sep 17 13:31:53 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 17 Sep 2006 23:31:53 +0300
Subject: [openib-general] patch trouble
In-Reply-To: <200609171959.k8HJxgUT005451@ece06.nas.nasa.gov>
References: <200609171959.k8HJxgUT005451@ece06.nas.nasa.gov>
Message-ID: <20060917203153.GG32526@mellanox.co.il>

Quoting r. Bryan Green <bgreen at nas.nasa.gov>:
> Is there a great discrepancy between the git repository and the svn
> repository?  If I am downloading the kernel modules from subversion, should I
> still use the patchset from the git repository? 

*Please* do not use svn trunk code for production.
You want either kernel.org code or the OFED git repository for everything.
kernel code in subversion is being deprecated.

> What about putting a source
> tar file for openib up for download?

Putting anything up for download on the openib site is very hard -
we mostly stick binary files in svn.

> There is currently only a source tarball
> for libibverbs, while ofed is too RPM-centric.

Not really. Please try the following:

Get the ofed tarball here
https://openib.org/svn/gen2/branches/1.1/ofed/releases/
and unpack it.
Take this file: SOURCES/openib-1.1.tgz
That's all of subversion + git all nicely packed up.

You can run configure and make there and it mostly works as expected.

There's also install.sh script that wraps these two and also
adds some convenient softlinks and such goodies.

Let me know how it goes. BTW, which distro are you using?

-- 
MST


From mst at mellanox.co.il  Sun Sep 17 13:35:58 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 17 Sep 2006 23:35:58 +0300
Subject: [openib-general] patch trouble
In-Reply-To: <200609171959.k8HJxgUT005451@ece06.nas.nasa.gov>
References: <200609171959.k8HJxgUT005451@ece06.nas.nasa.gov>
Message-ID: <20060917203558.GH32526@mellanox.co.il>

Quoting r. Bryan Green <bgreen at nas.nasa.gov>:
> Subject: Re: patch trouble
> 
> "Michael S. Tsirkin" writes:
> > Quoting r. Bryan Green <bgreen at nas.nasa.gov>:
> > > Subject: patch trouble
> > > 
> > > Hello,
> > > Many of the patches in subversion fail to have an effect when I apply them to a kernel,
> > > because they create headers in 'drivers/infiniband/include' which depend on being included
> > > before the like-named headers in the toplevel 'include'.  Is there a step I am missing to
> > > make the headers in 'drivers/infiniband/include' get chosen for inclusion first?
> > 
> > Note that backport patches are intended to be applied in an out-of-kernel
> > fashion - they are not changing the kernel at all.
> > 
> > So you build as an out-of-tree driver, and add something like this to the make
> > command line:
> > 
> >                 LINUXINCLUDE='-I$(CWD)/include \
> >                 -I$(CWD)/drivers/infiniband/include \
> >                 -Iinclude \
> >                 $$(if $$(KBUILD_SRC),-Iinclude2 -I$$(srctree)/include) \
> >                 -include include/linux/autoconf.h \
> >                 -include $(CWD)/include/linux/autoconf.h \
> >                 ' \
> > 
> > You can find an example here
> > https://openib.org/svn/gen2/trunk/ofed/openib/scripts/Makefile
> > 
> > BTW, Mellanox is not actively supporting backport patches on the svn trunk.
> > If you want code that works on something other than 2.6.17,
> > I suggest you pull backports for the ofed branch (forked from
> > 2.6.18-rc6) from ofed_1_1 tree by pulling
> > git://www.mellanox.co.il/~git/infiniband ofed_1_1
> > and looking in ofed_scripts directory.
> > 
> 
> Thanks.  I am looking at the git repository, and I see a number of patches in
> 'kernel_patches/fixes' which are apparently applied before the kernel patches under
> 'kernel_patches/backport'.

Right. These are things that will be going into 2.6.19 but
that we decided should be in OFED.

> I also see the discrepancies between the patches in git and
> svn.

kernel code in svn trunk is deprecated - kernel code needs to sync
with linus and doing that from svn adds too much overhead.
In particular we stopped updating the backport patches for svn.

-- 
MST


From moshek at voltaire.com  Sun Sep 17 23:09:07 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Mon, 18 Sep 2006 09:09:07 +0300
Subject: [openib-general] Any chance to get 32-Bit libraries on SLES9
 x86_64?
Message-ID: <D4F8F0B3820E754C887699BEF26A8940EB85A5@taurus.voltaire.com>

I had the other problem (trying to find the 64-bit rpm)
 
In SLES9, sysfsutils is part of the udev rpm.

Therefore I think you could try udev...rpm for the 32-bit version of
sysfsutils and udev-64bit...rpm for the 64-bit version.

After installation, the 32-bit libraries are located under /usr/lib
and the 64-bit libraries under /usr/lib64.
 
Moshe

____________________________________________________________

Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)

 
Voltaire - The Grid Backbone

 
 www.voltaire.com <http://www.voltaire.com/> 

  
	-----Original Message-----
	From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Bub Thomas
	Sent: Friday, September 15, 2006 9:24 AM
	To: openib-general at openib.org; Bub Thomas
	Subject: [openib-general] Any chance to get 32-Bit libraries on
SLES9 x86_64?
	
	
	Is there any chance/trick to get 32-bit libraries built and
	usable on SLES9 x86_64?

	When I installed OFED-1.1-rc4 I got:

	WARNING: sysfsutils 32-bit version is required to build 32-bit
	libibverbs package.

	WARNING: Skiping build of 32-bit libraries.

	I googled around and didn't find any sysfsutils 32-bit for
	SLES9.

	I know that it is working under SLES10, but our customer base
	is on SLES9 and very conservative when it comes to using the latest
	and greatest OS/distribution.

	Thomas

	
	............................................................
	Thomas Bub
	Grass Valley Germany GmbH
	Brunnenweg 9
	64331 Weiterstadt, Germany
	Tel: +49 6150 104 147
	Fax: +49 6150 104 656
	Email: Thomas.Bub at thomson.net <mailto:Thomas.Bub at thomson.net> 
	www.GrassValley.com <http://www.grassvalley.com> 
	............................................................

	

From moshek at voltaire.com  Mon Sep 18 00:13:40 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Mon, 18 Sep 2006 10:13:40 +0300
Subject: [openib-general] [openfabrics-ewg] OpenSm on sles10 ppc64 OFED
	1.0 - bug ?
Message-ID: <D4F8F0B3820E754C887699BEF26A8940EB85A8@taurus.voltaire.com>


See attached file.

Moshe

____________________________________________________________
Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
 
Voltaire - The Grid Backbone
 
 www.voltaire.com

  
-----Original Message-----
From: Hal Rosenstock [mailto:halr at voltaire.com] 
Sent: Sunday, September 17, 2006 5:50 PM
To: Moshe Kazir
Cc: openib-general at openib.org; OpenFabricsEWG; Sasha Khapyorsky
Subject: Re: [openfabrics-ewg] OpenSm on sles10 ppc64


Hi Moshe,

On Sun, 2006-09-17 at 10:41, Moshe Kazir wrote:
> /etc/init.d/opensm start produces an error on my JS21 ppc64 SLES10
> OFED 1.0.

What error ?

> Should ppc64 SLES10  OFED 1.0 work ?

I don't think so.

> Anyone tried it ?

OFED 1.0 OpenSM release notes say:
* PPC support:
  No PPC QA was performed.

There was an issue with PPC64 that Sasha fixed post OFED 1.0. It's in
OFED 1.1 and could easily be retrofitted to OFED 1.0 if needed. Contact
Sasha or me if you are interested in doing this.

-- Hal

> 
> Moshe
> 
> ____________________________________________________________
> Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
>  
> Voltaire - The Grid Backbone
>  
> www.voltaire.com
> 
>   
> 
> 
> -----Original Message-----
> From: openfabrics-ewg-bounces at openib.org
> [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of 
> vlad at dev.mellanox.co.il
> Sent: Thursday, September 14, 2006 7:39 PM
> To: openfabrics-ewg at openib.org
> Cc: openib-general at openib.org
> Subject: [openfabrics-ewg] OFED-1.1-RC5 is ready
> 
> 
> Hi,
> 
> OFED-1.1-rc5 is available on 
> https://openib.org/svn/gen2/branches/1.1/ofed/releases/
> File: OFED-1.1-rc5.tgz
> Please report any issues in bugzilla http://openib.org/bugzilla/
> 
> 
> Release details:
> ================
> Build_id:
> 
> OFED-1.1-rc5
> 
> openib-1.1 (REV=9485)
> # User space https://openib.org/svn/gen2/branches/1.1/src/userspace
> Git: git://www.mellanox.co.il/~git/infiniband ref: refs/heads/ofed_1_1
> commit 18c1cb87c4b16f1a1577807077bbdcba3f446f09
> 
> # MPI
> mpi_osu-0.9.7-mlx2.2.0.tgz
> openmpi-1.1.1-1.src.rpm
> mpitests-2.0-0.src.rpm
> 
> OS support:
> ===========
> Novell:
>      - SLES 9.0 SP3
>      - SLES10
> Redhat:
>      - Redhat EL4 up3
>      - Redhat EL4 up4
> kernel.org:
>      - Kernel 2.6.17
> 
> 
> Bug fixes from OFED-1.1-rc4:
> ==========================
> 1. ISER compilation fixed on SLES10
> 2. Fixed build on SLES9 PPC64
> 3. Updated libehca
> 4. OpenSM fixes
> 5. Added tavor_quirk option to rdma_cm module (disabled by default): 
> Tavor performance quirk: limit MTU to 1K if > 0 (int)
> 
> Known issues:
> =============
> libipathverbs compilation fails on SLES10 (Bug:204)
> 
> 
> OFED-1.1-rc6 (hopefully the last one) planned to be released on Monday
> or Tuesday.
> 
> 
> Regards,
> Vladimir
> 
> 
> > Hi,
> >
> > The plan is to issue OFED RC5 on Thursday 9/14 and final release next
> > week. I am aware of the following issues:
> >
> >
> > 1) Compilation on SLES9 on PPC      - Jack Morgenstein
> > 2) Huge pages on PPC                - Eli Cohen
> > 3) libipathverbs:                   - Qlogic
> >      a) libipathverbs ABI issue
> >      b) libipathverbs build on SLES10
> > 4) SDP performance on Tavor         - Michael Tsirkin
> > 5) iSER issue on SLES10             - Voltaire
> >
> >
> > In order to meet tomorrow's RC5 release all owners please send your
> > patches by end of today.
> >
> >
> > Regards,
> >
> >     Aviram
> >
> > _______________________________________________
> > openfabrics-ewg mailing list
> > openfabrics-ewg at openib.org
> > http://openib.org/mailman/listinfo/openfabrics-ewg
> >
> 
> 
> 
> 

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: opensm.error.txt
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060918/39481003/attachment.txt>

From krkumar2 at in.ibm.com  Mon Sep 18 00:35:45 2006
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Mon, 18 Sep 2006 13:05:45 +0530
Subject: [openib-general] [PATCH] Fix freed mem deref race in
 cma_process_remove/cma_req_handler
Message-ID: <20060918073545.26067.41763.sendpatchset@localhost.localdomain>

The race is as follows :

A process : cma_process_remove() calls cma_remove_id_dev(),
	    which sets id state to CMA_DEVICE_REMOVAL and
	    calls wait_event(dev_remove).

B process : cma_req_handler() had incremented dev_remove,
	    and calls cma_acquire_ib_dev() and on failure
	    calls cma_release_remove(), which does a
	    wake_up of cma_process_remove(). Then
	    cma_req_handler() calls rdma_destroy_id();

A Process : cma_remove_id_dev() gets woken and checks the
	    state of id, and since it is still (wrongly)
	    CMA_DEVICE_REMOVAL, it calls notify_user(id)
	    and if that fails, the caller - cma_process_remove()
	    calls rdma_destroy_id(id). Two processes can
	    call rdma_destroy_id(), resulting in one
	    de-referencing kfreed id_priv.

Fix is for process B to set CMA_DESTROYING in cma_req_handler()
so that process A will return instead of doing a rdma_destroy_id().

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>

diff -ruNp org/core/cma.c new/core/cma.c
--- org/core/cma.c	2006-09-14 15:41:01.000000000 +0530
+++ new/core/cma.c	2006-09-18 11:52:52.000000000 +0530
@@ -1023,6 +1023,7 @@ static int cma_req_handler(struct ib_cm_
 	mutex_unlock(&lock);
 	if (ret) {
 		ret = -ENODEV;
+		cma_exch(conn_id, CMA_DESTROYING);
 		cma_release_remove(conn_id);
 		rdma_destroy_id(&conn_id->id);
 		goto out;
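
For anyone following the description above without the cma.c source at
hand, here is a small userspace sketch of the idea, not the kernel code.
It assumes the state is swapped atomically (the real cma_exch() works on
id_priv under a lock; C11 atomics stand in for that here), and all sk_*
names are hypothetical. The remover (process A) only goes on to
notify/destroy if the state is still the one it installed; once the
request handler (process B) has moved the id to "destroying", A backs off.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CMA states named in the patch text. */
enum sk_state { SK_CONNECT, SK_DEVICE_REMOVAL, SK_DESTROYING };

struct sk_id {
	atomic_int state;
};

/* Install a new state and return the previous one (models cma_exch). */
static int sk_exch(struct sk_id *id, int new_state)
{
	assert(id != NULL);
	return atomic_exchange(&id->state, new_state);
}

/*
 * What process A checks after being woken: it may continue with the
 * teardown only if nobody else has claimed the id in the meantime.
 */
static int sk_remover_may_destroy(struct sk_id *id)
{
	assert(id != NULL);
	return atomic_load(&id->state) == SK_DEVICE_REMOVAL;
}
```

With this ordering only one of the two paths ends up destroying the id,
which is the property the one-line fix is after.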


From halr at voltaire.com  Mon Sep 18 01:53:28 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 04:53:28 -0400
Subject: [openib-general] [PATCH] osm: bug in __osmv_send_sa_req
In-Reply-To: <1158502832.8516.9.camel@kliteynik.yok.mtl.com>
References: <1158502832.8516.9.camel@kliteynik.yok.mtl.com>
Message-ID: <1158569544.25157.180348.camel@hal.voltaire.com>

Hi Yevgeny,

On Sun, 2006-09-17 at 10:20, Yevgeny Kliteynik wrote:
> Hi Hal 
> 
> This patch fixes a bug in __osmv_send_sa_req in libvendor.
> After sending a MAD, the status of the response was ignored.
> 
> Yevgeny 
> 
> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il> 

Thanks. Applied to trunk and 1.1.

-- Hal


From ogerlitz at voltaire.com  Mon Sep 18 01:56:31 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 18 Sep 2006 11:56:31 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 devices
In-Reply-To: <86y7sle4kg.fsf@mtl066.yok.mtl.com>
References: <86y7sle4kg.fsf@mtl066.yok.mtl.com>
Message-ID: <450E5F3F.2090203@voltaire.com>

Eitan Zahavi wrote:
> The following patch solves an issue with OpenSM preferring largest MTU 
> for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor)
> devices instead of using a 1K MTU which is best for this device.

Eitan,

Doesn't the 2K MTU issue with Tavor come into play only under RC QP?
Moreover, doing Tavor/UD/2K MTU is very common, e.g. IPoIB.

So does your patch rely on some complementary quirk on the host side
for UD-based ULPs to add an MTU selector that prevents the SM-side
quirk from taking effect?

Or.


From halr at voltaire.com  Mon Sep 18 02:16:17 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 05:16:17 -0400
Subject: [openib-general] [openfabrics-ewg] OpenSm on sles10 ppc64 OFED
	1.0 - bug ?
In-Reply-To: <D4F8F0B3820E754C887699BEF26A8940EB85A8@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A8940EB85A8@taurus.voltaire.com>
Message-ID: <1158570905.25157.180934.camel@hal.voltaire.com>

On Mon, 2006-09-18 at 03:13, Moshe Kazir wrote:
> See attached file.

That was the problem that Sasha found and fixed.

-- Hal



From mst at mellanox.co.il  Mon Sep 18 02:35:06 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Sep 2006 12:35:06 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
	MT23108 devices
In-Reply-To: <450E5F3F.2090203@voltaire.com>
References: <86y7sle4kg.fsf@mtl066.yok.mtl.com> <450E5F3F.2090203@voltaire.com>
Message-ID: <20060918093506.GC29055@mellanox.co.il>

Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: Re: [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices
> 
> Eitan Zahavi wrote:
> > The following patch solves an issue with OpenSM preferring largest MTU 
> > for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor)
> > devices instead of using a 1K MTU which is best for this device.
> 
> Eitan,
> 
> Doesn't the 2K MTU issue with Tavor come into play only under RC QP?

I don't think so, no. Tavor supports 2K MTU, but it has better performance with
1K MTU than 2K MTU. QP type should not matter.

> Moreover, doing Tavor/UD/2K MTU is very common, e.g. IPoIB.

Correct. And it works with existing SMs.
But ULPs that have specific MTU requirements must set MTU selector
accordingly, otherwise SM is free to select any MTU.

> So does your patch rely on some complementary quirk on the host side
> for UD-based ULPs to add an MTU selector that prevents the SM-side
> quirk from taking effect?

It's more a bugfix than a quirk.

IPoIB currently has specific MTU requirements but does not set MTU
selector at all, relying on specific SM behaviour. I consider it
a bug in IPoIB and am testing a patch fixing this.

-- 
MST


From ogerlitz at voltaire.com  Mon Sep 18 02:45:22 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 18 Sep 2006 12:45:22 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
	MT23108 devices
In-Reply-To: <20060918093506.GC29055@mellanox.co.il>
References: <86y7sle4kg.fsf@mtl066.yok.mtl.com>
	<450E5F3F.2090203@voltaire.com> <20060918093506.GC29055@mellanox.co.il>
Message-ID: <450E6AB2.70505@voltaire.com>

Michael S. Tsirkin wrote:
> Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:

>> Eitan Zahavi wrote:
>>> The following patch solves an issue with OpenSM preferring largest MTU 
>>> for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor)
>>> devices instead of using a 1K MTU which is best for this device.

>> Doesn't the 2K MTU issue with Tavor come into play only under RC QP?

> I don't think so, no. Tavor supports 2K MTU, but it has better performance with
> 1K MTU than 2K MTU. QP type should not matter.

Can you double check that please? As far as I know there is something
like a 40-50% BW drop with Tavor/RC/2048 vs Tavor/RC/1024, but the BW with
Tavor/UD/2048 is **no less** than Tavor/UD/1024.

So it's very common for IPoIB net device implementations to expose a 2044
or 1500 byte MTU to the OS, e.g. to cope with Ethernet and reduce IP
fragmentation/reassembly of UDP/TCP traffic.

Or.


From halr at voltaire.com  Mon Sep 18 02:53:56 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 05:53:56 -0400
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 - not for MTU Sel=3
In-Reply-To: <864pv6mtoe.fsf@mtl066.yok.mtl.com>
References: <864pv6mtoe.fsf@mtl066.yok.mtl.com>
Message-ID: <1158573172.25157.182192.camel@hal.voltaire.com>

Hi Eitan,

On Sun, 2006-09-17 at 10:52, Eitan Zahavi wrote:
> Hi Hal
> 
> We have reviewed the patch for the above and figured out there is an
> issue with it:
> Currently when MTU_SEL=3 the quirk applies.
> We think this is wrong behavior as MTU_SEL=3 means "max possible MTU" by 
> the IBTA spec. So if an application/ULP would like to get the max MTU possible 
> the correct answer is 2K for Tavor by the spec.
> So this patch fixes the quirk: when MTU_SEL=3 it does not apply the MTU
> limit quirk for Tavor devices.

Good catch. So compliance over performance is preferred for this case.

Thanks. Applied to both trunk and 1.1

-- Hal
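
To make the selector semantics in this thread concrete, here is a rough
sketch in C. It is not the OpenSM implementation. The MTU encodings
(1=256 ... 5=4096) and selector values (0=greater than, 1=less than,
2=exactly, 3=largest available) follow the IBTA PathRecord definition;
the function name, its parameters, and the choice to apply the Tavor
preference only to wildcarded requests are assumptions made for
illustration. The thread itself only pins down two cases: a wildcarded
selector may get 1K on Tavor, and selector 3 must get the largest MTU.

```c
#include <assert.h>
#include <stdint.h>

/* IB MTU encodings */
enum { SK_MTU_256 = 1, SK_MTU_512, SK_MTU_1024, SK_MTU_2048, SK_MTU_4096 };

/* PathRecord MTU selector values */
enum { SK_SEL_GREATER = 0, SK_SEL_LESS = 1, SK_SEL_EXACT = 2, SK_SEL_LARGEST = 3 };

/*
 * Hypothetical helper.
 *   path_mtu      largest MTU encoding the path supports
 *   sel_valid     nonzero if the requester specified an MTU selector
 *   sel, req_mtu  selector and MTU from the request (ignored if !sel_valid)
 *   tavor_on_path nonzero if either end of the path is an MT23108
 * Returns the chosen MTU encoding, or 0 if the request cannot be met.
 */
static uint8_t sk_choose_mtu(uint8_t path_mtu, int sel_valid, uint8_t sel,
			     uint8_t req_mtu, int tavor_on_path)
{
	assert(path_mtu >= SK_MTU_256 && path_mtu <= SK_MTU_4096);

	if (!sel_valid) {
		/* Wildcard: the SM is free to choose, so prefer 1K on Tavor. */
		if (tavor_on_path && path_mtu > SK_MTU_1024)
			return SK_MTU_1024;
		return path_mtu;
	}

	switch (sel) {
	case SK_SEL_LARGEST:
		/* "Largest available" means just that: no quirk here. */
		return path_mtu;
	case SK_SEL_EXACT:
		return req_mtu <= path_mtu ? req_mtu : 0;
	case SK_SEL_GREATER:
		return path_mtu > req_mtu ? path_mtu : 0;
	case SK_SEL_LESS:
		if (req_mtu <= SK_MTU_256)
			return 0;
		return path_mtu < req_mtu ? path_mtu : (uint8_t)(req_mtu - 1);
	default:
		return 0;
	}
}
```

A ULP that needs the full 2K on Tavor would pass selector 3 (or an exact
MTU) rather than leaving the selector wildcarded.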


From mst at mellanox.co.il  Mon Sep 18 02:54:23 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Sep 2006 12:54:23 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
	MT23108 devices
In-Reply-To: <450E6AB2.70505@voltaire.com>
References: <450E6AB2.70505@voltaire.com>
Message-ID: <20060918095423.GG29055@mellanox.co.il>

Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: Re: [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices
> 
> Michael S. Tsirkin wrote:
> > Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> 
> >> Eitan Zahavi wrote:
> >>> The following patch solves an issue with OpenSM preferring largest MTU 
> >>> for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor)
> >>> devices instead of using a 1K MTU which is best for this device.
> 
> >> Doesn't the 2K MTU issue with Tavor come into play only under RC QP?
> 
> > I don't think so, no. Tavor supports 2K MTU, but it has better performance with
> > 1K MTU than 2K MTU. QP type should not matter.
> 
> Can you double check that please? As far as I know there is something
> like a 40-50% BW drop with Tavor/RC/2048 vs Tavor/RC/1024, but the BW with
> Tavor/UD/2048 is **no less** than Tavor/UD/1024.

The property of Tavor to work better with 1K MTU is not transport-specific.
But, BW depends on the ULP. I guess UD top BW is simply lower (smaller messages)
so you do not see the drop there.

This just means ULP should use MTU selector to give SM hints about the MTU it
wants. If it wants the highest MTU available it should set the selector to 3,
not wildcard it.

> So it's very common for IPoIB net device implementations to expose a 2044
> or 1500 byte MTU to the OS, e.g. to cope with Ethernet and reduce IP
> fragmentation/reassembly of UDP/TCP traffic.

I expect IPoIB to get better performance with higher MTU - TCP
fragmentation likely has bigger effect than hardware speed quirks.
But this is just another reason to set the mtu selector in IPoIB
appropriately.


-- 
MST


From halr at voltaire.com  Mon Sep 18 03:44:22 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 06:44:22 -0400
Subject: [openib-general] [PATCH 1/13] osm: port to WinIB stack :
 include/opensm/osm_base.h
In-Reply-To: <861wqamqnl.fsf@mtl066.yok.mtl.com>
References: <861wqamqnl.fsf@mtl066.yok.mtl.com>
Message-ID: <1158576183.18842.636.camel@hal.voltaire.com>

On Sun, 2006-09-17 at 11:58, Eitan Zahavi wrote:
> Hi Hal
> 
> osm_base.h uses cache dir for osm-partitions.conf.
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Thanks. Applied to trunk only.

-- Hal


From halr at voltaire.com  Mon Sep 18 03:53:51 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 06:53:51 -0400
Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack :
 include/opensm/osm_pkey.h
In-Reply-To: <86zmcylc2e.fsf@mtl066.yok.mtl.com>
References: <86zmcylc2e.fsf@mtl066.yok.mtl.com>
Message-ID: <1158576813.18842.935.camel@hal.voltaire.com>

Hi Eitan,

On Sun, 2006-09-17 at 11:58, Eitan Zahavi wrote:
> Hi Hal
> 
> Partition table block indices are always 16 bits.
> This resolves the need to later cast back and forth.
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>
> 
> Index: include/opensm/osm_pkey.h
> ===================================================================
> --- include/opensm/osm_pkey.h	(revision 9502)
> +++ include/opensm/osm_pkey.h	(working copy)
> @@ -143,7 +143,7 @@ typedef struct _osm_pkey_tbl
>  typedef struct _osm_pending_pkey {
>    cl_list_item_t	list_item;
>    uint16_t		pkey;
> -  uint32_t		block;
> +  uint16_t		block;
>    uint8_t		index;
>    boolean_t		is_new;
>  } osm_pending_pkey_t;
> @@ -396,7 +396,7 @@ ib_api_status_t
>  osm_pkey_tbl_get_block_and_idx(
>    IN  osm_pkey_tbl_t *p_pkey_tbl, 
>    IN  uint16_t       *p_pkey,
> -  OUT uint32_t       *block_idx,
> +  OUT uint16_t       *block_idx,
>    OUT uint8_t        *pkey_index);
>  /*
>  *  p_pkey_tbl

Doesn't this require at least a similar change to
opensm/osm_pkey.c:osm_pkey_tbl_get_block_and_idx ? Anything else ?

-- Hal


From mirko.benz at xiranet.com  Mon Sep 18 03:59:26 2006
From: mirko.benz at xiranet.com (Mirko Benz)
Date: Mon, 18 Sep 2006 12:59:26 +0200
Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5)
Message-ID: <450E7C0E.3020001@xiranet.com>

Hello,

We are testing OFED-1.1-rc5 under Scientific Linux x86-64 (RHEL 4 clone).
Some IB diagnostics tools e.g. ibhosts and ibswitches (located under 
.../ofed/bin/)
do not work with a normal user account -- no output given. It works as 
root though.

Regards,
Mirko


From halr at voltaire.com  Mon Sep 18 04:10:18 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 07:10:18 -0400
Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5)
In-Reply-To: <450E7C0E.3020001@xiranet.com>
References: <450E7C0E.3020001@xiranet.com>
Message-ID: <1158577816.18842.1501.camel@hal.voltaire.com>

Hi Mirko,

On Mon, 2006-09-18 at 06:59, Mirko Benz wrote:
> Hello,
> 
> We are testing OFED-1.1-rc5 under Scientific Linux x86-64 (RHEL 4 clone).
> Some IB diagnostics tools e.g. ibhosts and ibswitches (located under 
> .../ofed/bin/)
> do not work with a normal user account -- no output given. It works as 
> root though.

It depends on how you have udev access for umad setup. With the default
setup for IB, root is required as these diagnostics send SMPs which
require umad access which is limited to root.

-- Hal
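
A quick way to see whether a given account can reach the umad device at
all is an access(2) check before doing anything else. The sketch below
is not part of the diagnostics; the helper name is made up, and the
device path you would pass (for example /dev/infiniband/umad0) is an
assumption that depends on the local udev rules.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Return 0 if the calling user may read and write 'path'. Otherwise
 * print the reason on stderr and return the negated errno value.
 */
static int sk_check_rw_access(const char *path)
{
	int err;

	assert(path != NULL);
	if (access(path, R_OK | W_OK) == 0)
		return 0;

	err = errno;
	fprintf(stderr, "cannot access %s: %s\n", path, strerror(err));
	return -err;
}
```

Run as a normal user against the umad node, this would typically report
a permission error on a default setup, and succeed as root.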

> Regards,
> Mirko
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 


From mirko.benz at xiranet.com  Mon Sep 18 04:20:57 2006
From: mirko.benz at xiranet.com (Mirko Benz)
Date: Mon, 18 Sep 2006 13:20:57 +0200
Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5)
In-Reply-To: <1158577816.18842.1501.camel@hal.voltaire.com>
References: <450E7C0E.3020001@xiranet.com>
	<1158577816.18842.1501.camel@hal.voltaire.com>
Message-ID: <450E8119.4060405@xiranet.com>

Hi Hal,

This was a default/build all OFED install. Either we should place these
tools under ../ofed/sbin or make it work for everybody. At least an
error message that umad access failed would be required.

Regards,
Mirko

Hal Rosenstock wrote:
> Hi Mirko,
>
> On Mon, 2006-09-18 at 06:59, Mirko Benz wrote:
>   
>> Hello,
>>
>> We are testing OFED-1.1-rc5 under Scientific Linux x86-64 (RHEL 4 clone).
>> Some IB diagnostics tools e.g. ibhosts and ibswitches (located under 
>> .../ofed/bin/)
>> do not work with a normal user account -- no output given. It works as 
>> root though.
>>     
>
> It depends on how you have udev access for umad setup. With the default
> setup for IB, root is required as these diagnostics send SMPs which
> require umad access which is limited to root.
>
> -- Hal
>
>   
>> Regards,
>> Mirko
>>


From mst at mellanox.co.il  Mon Sep 18 04:40:18 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Sep 2006 14:40:18 +0300
Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5)
In-Reply-To: <450E8119.4060405@xiranet.com>
References: <450E7C0E.3020001@xiranet.com>
	<1158577816.18842.1501.camel@hal.voltaire.com>
	<450E8119.4060405@xiranet.com>
Message-ID: <20060918114018.GJ29055@mellanox.co.il>

Quoting r. Mirko Benz <mirko.benz at xiranet.com>:
> Subject: Re: IB diagnostics problems (OFED-1.1-rc5)
> 
> Hi Hal,
> 
> This was a default/build all OFED install. Either we should place these
> tools under ../ofed/sbin or make it work for everybody. At least an
> error message that umad access failed would be required.

I don't think opening umad for regular user by default is a good idea.
And isn't sbin for static binaries?

With regards to diagnostics - I think proper exit status is reported, so I
expect if you set up shell accordingly you'll get the diagnostic printout.
Printing stuff on stderr/stdout might interfere with activating these from
scripts, so I'm not sure it's a good idea. Hal?

-- 
MST


From erezz at voltaire.com  Mon Sep 18 04:57:05 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Mon, 18 Sep 2006 14:57:05 +0300
Subject: [openib-general] Negotiation of Rsponder resource & Initiator depth
Message-ID: <450E8991.5080603@voltaire.com>

Sean,

In the IB spec it says in 12.7.29:

The recipient of the REQ message shall choose a local Initiator Depth that
does not exceed the Responder Resources offered in the REQ. If the recipient
of the REQ message is unwilling or unable to do so, it shall send a
REJ message to discontinue the connection establishment.

 From reading the CMA code, I see that it does not negotiate these 
values (responder resources & initiator depth). It expects the ULP to 
negotiate it. Why? Shouldn't it be done by the CMA?
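
For reference, the rule quoted from 12.7.29 boils down to a clamp on the
passive side. The sketch below only illustrates that rule; it is not CMA
code, and the function and parameter names are hypothetical. Whether a
result of 0 should lead to a REJ is a policy decision left to the caller
here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the local initiator depth for the recipient of a REQ: as large
 * as the local side would like, but never more than the responder
 * resources the REQ sender offered.
 */
static uint8_t sk_choose_initiator_depth(uint8_t offered_responder_resources,
					 uint8_t local_wanted_depth)
{
	uint8_t depth = local_wanted_depth;

	if (depth > offered_responder_resources)
		depth = offered_responder_resources;

	assert(depth <= offered_responder_resources);
	return depth;
}
```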

Thanks
-- 

____________________________________________________________

Erez Zilber | 972-9-971-7689

Software Engineer, Storage Team

Voltaire - The Grid Backbone

www.voltaire.com


From glebn at voltaire.com  Mon Sep 18 05:06:37 2006
From: glebn at voltaire.com (glebn at voltaire.com)
Date: Mon, 18 Sep 2006 15:06:37 +0300
Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5)
In-Reply-To: <20060918114018.GJ29055@mellanox.co.il>
References: <450E7C0E.3020001@xiranet.com>
	<1158577816.18842.1501.camel@hal.voltaire.com>
	<450E8119.4060405@xiranet.com> <20060918114018.GJ29055@mellanox.co.il>
Message-ID: <20060918120637.GB29931@minantech.com>

On Mon, Sep 18, 2006 at 02:40:18PM +0300, Michael S. Tsirkin wrote:
> Quoting r. Mirko Benz <mirko.benz at xiranet.com>:
> > Subject: Re: IB diagnostics problems (OFED-1.1-rc5)
> > 
> > Hi Hal,
> > 
> > This was a default/build all OFED install. Either we should place these 
> > tools under ../ofed/sbin or make it work for every body. At least a 
> > error message that umad access failed would be required.
> 
> I don't think opening umad for regular user by default is a good idea.
> And isn't sbin for static binaries?
> 
It isn't. sbin is for System BINaries.

--
			Gleb.


From halr at voltaire.com  Mon Sep 18 05:39:33 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 08:39:33 -0400
Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5)
In-Reply-To: <450E8119.4060405@xiranet.com>
References: <450E7C0E.3020001@xiranet.com>
	<1158577816.18842.1501.camel@hal.voltaire.com>
	<450E8119.4060405@xiranet.com>
Message-ID: <1158583167.18842.4632.camel@hal.voltaire.com>

Hi again Mirko,

On Mon, 2006-09-18 at 07:20, Mirko Benz wrote:
> Hi Hal,
> 
> This was a default/build all OFED install. Either we should place these 
> tools under ../ofed/sbin or make it work for every body.

The issue with making it work for everyone is that there's a chicken and
egg problem in that when the tools are built and installed, one doesn't
know how udev will be configured for umad. I agree that since the
default is to run as root, these should be in sbin rather than bin. Can
you file a bugzilla report for this (or do you want me to do it on your
behalf) ? Is this critical for OFED 1.1 ?

>  At least a error message that umad access failed would be required.
Those tools are scripts: the errors are returned by the lower-level
programs they invoke, but the scripts themselves do not report them.

Would you please file a bug for this as well (or let me know whether I
should do this) ? 

Thanks.

-- Hal

> Regards,
> Mirko
> 
> Hal Rosenstock schrieb:
> > Hi Mirko,
> >
> > On Mon, 2006-09-18 at 06:59, Mirko Benz wrote:
> >   
> >> Hello,
> >>
> >> We are testing OFED-1.1-rc5 under Scientific Linux x86-64 (RHEL 4 clone).
> >> Some IB diagnostics tools e.g. ibhosts and ibswitches (located under 
> >> .../ofed/bin/)
> >> do not work with a normal user account -- no output given. It works as 
> >> root though.
> >>     
> >
> > It depends on how you have udev access for umad setup. With the default
> > setup for IB, root is required as these diagnostics send SMPs which
> > require umad access which is limited to root.
> >
> > -- Hal
> >
> >   
> >> Regards,
> >> Mirko
> >>
> >> _______________________________________________
> >> openib-general mailing list
> >> openib-general at openib.org
> >> http://openib.org/mailman/listinfo/openib-general
> >>
> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >>
> >>     
> 


From halr at voltaire.com  Mon Sep 18 05:42:06 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 08:42:06 -0400
Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5)
In-Reply-To: <20060918114018.GJ29055@mellanox.co.il>
References: <450E7C0E.3020001@xiranet.com>
	<1158577816.18842.1501.camel@hal.voltaire.com>
	<450E8119.4060405@xiranet.com> <20060918114018.GJ29055@mellanox.co.il>
Message-ID: <1158583325.18842.4712.camel@hal.voltaire.com>

On Mon, 2006-09-18 at 07:40, Michael S. Tsirkin wrote:
> Quoting r. Mirko Benz <mirko.benz at xiranet.com>:
> > Subject: Re: IB diagnostics problems (OFED-1.1-rc5)
> > 
> > Hi Hal,
> > 
> > This was a default/build all OFED install. Either we should place these 
> > tools under ../ofed/sbin or make it work for every body. At least a 
> > error message that umad access failed would be required.
> 
> I don't think opening umad for regular user by default is a good idea.
> And isn't sbin for static binaries?
> 
> With regards to diagnostics - I think proper exit status is reported,
> so I expect if you set up shell accordingly you'll get the diagnostic printout.

I don't think so for this case.

> Printing stuff on stderr/stdout might interfere with activating these from
> scripts, so I'm not sure it's a good idea. Hal?

The ones Mirko cited currently are scripts rather than binaries.

-- Hal


From michael.arndt at informatik.tu-chemnitz.de  Mon Sep 18 05:47:04 2006
From: michael.arndt at informatik.tu-chemnitz.de (Michael Arndt)
Date: Mon, 18 Sep 2006 14:47:04 +0200
Subject: [openib-general] What does --process_mad-- exactly?
Message-ID: <001a01c6db20$8b306980$21606d86@one7>

Hi,

The function ib_mad_recv_done_handler, which is called when a dr_smp packet 
is received, calls "port_priv->device->process_mad". I analyzed that 
function (recursively) for mthca, but I never found the point where a set or 
get method is applied.

Does anybody know what exactly process_mad does?

Thanks, Michael 


From mirko.benz at xiranet.com  Mon Sep 18 05:56:13 2006
From: mirko.benz at xiranet.com (Mirko Benz)
Date: Mon, 18 Sep 2006 14:56:13 +0200
Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5)
In-Reply-To: <1158583167.18842.4632.camel@hal.voltaire.com>
References: <450E7C0E.3020001@xiranet.com>
	<1158577816.18842.1501.camel@hal.voltaire.com>
	<450E8119.4060405@xiranet.com>
	<1158583167.18842.4632.camel@hal.voltaire.com>
Message-ID: <450E976D.3070802@xiranet.com>

Hi Hal,

Please prepare the bugzilla entry.

It is not critical -- I just think it is not convenient for an end user.

Regards,
Mirko

Hal Rosenstock schrieb:
> Hi again Mirko,
>
> On Mon, 2006-09-18 at 07:20, Mirko Benz wrote:
>   
>> Hi Hal,
>>
>> This was a default/build all OFED install. Either we should place these 
>> tools under ../ofed/sbin or make it work for every body.
>>     
>
> The issue with making it work for everyone is that there's a chicken and
> egg problem in that when the tools are built and installed, one doesn't
> know how udev will be configured for umad. I agree that since the
> default is to run as root, these should be in sbin rather than bin. Can
> you file a bugzilla report for this (or do you want me to do it on your
> behalf) ? Is this critical for OFED 1.1 ?
>
>   
>>  At least a error message that umad access failed would be required.
>>     
> Those are scripts and the errors are being returned from the lower level
> programs invoked but not by the scripts.
>
> Would you please file a bug for this as well (or let me know whether I
> should do this) ? 
>
> Thanks.
>
> -- Hal
>
>   
>> Regards,
>> Mirko
>>
>> Hal Rosenstock schrieb:
>>     
>>> Hi Mirko,
>>>
>>> On Mon, 2006-09-18 at 06:59, Mirko Benz wrote:
>>>   
>>>       
>>>> Hello,
>>>>
>>>> We are testing OFED-1.1-rc5 under Scientific Linux x86-64 (RHEL 4 clone).
>>>> Some IB diagnostics tools e.g. ibhosts and ibswitches (located under 
>>>> .../ofed/bin/)
>>>> do not work with a normal user account -- no output given. It works as 
>>>> root though.
>>>>     
>>>>         
>>> It depends on how you have udev access for umad setup. With the default
>>> setup for IB, root is required as these diagnostics send SMPs which
>>> require umad access which is limited to root.
>>>
>>> -- Hal
>>>
>>>   
>>>       
>>>> Regards,
>>>> Mirko


From halr at voltaire.com  Mon Sep 18 06:05:13 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 09:05:13 -0400
Subject: [openib-general] What does --process_mad-- exactly?
In-Reply-To: <001a01c6db20$8b306980$21606d86@one7>
References: <001a01c6db20$8b306980$21606d86@one7>
Message-ID: <1158584696.18842.5432.camel@hal.voltaire.com>

Hi Michael,

On Mon, 2006-09-18 at 08:47, Michael Arndt wrote:
> Hi,
> 
> the function ib_mad_recv_done_handler, which is called if a dr_smp packet 
> was received, calls "port_priv->device->process_mad". I analyze that 
> function (recursive) for mthca, but I never found the point where a set or 
> get method is applied.
> 
> Does anybody knows, what exactly process_mad does.

process_mad hands the incoming MAD to the driver (mthca, ipath, eHCA) if
the driver has defined this routine. In the case of mthca, it is used to hand
incoming SMA and PMA packets down to the firmware (see
hw/mthca/mthca_mad.c:mthca_process_mad).
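
The pattern described above is an optional per-driver callback. The toy model below only illustrates that pattern; it is not the kernel code, and `ToyDevice`, `firmware_hook` and the fallback string are made-up names for the illustration.

```python
class ToyDevice:
    """Toy stand-in for an IB device with an optional driver hook."""

    def __init__(self, name, process_mad=None):
        self.name = name
        self.process_mad = process_mad  # None if the driver defines no hook


def firmware_hook(mad):
    # Stands in for a driver that passes SMA/PMA packets to firmware.
    return "handled by firmware: " + mad


def recv_done_handler(device, mad):
    """Hand the MAD to the driver if it registered a process_mad routine."""
    if device.process_mad is not None:
        return device.process_mad(mad)
    # Assumption: without a hook the generic MAD layer keeps the packet.
    return "no driver hook; left to the generic MAD layer"
```

This is why no set or get handling shows up when reading the mthca driver itself: in this model the work happens behind the hook.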

-- Hal

> Thanks, Michael 
> 


From johnt1johnt2 at gmail.com  Mon Sep 18 06:43:59 2006
From: johnt1johnt2 at gmail.com (john t)
Date: Mon, 18 Sep 2006 19:13:59 +0530
Subject: [openib-general] Reuse pd amd mr
Message-ID: <a94efc20609180643x2e30918dr234667ba4ebd52e8@mail.gmail.com>

Hi

I have two HCA cards, each having one port. I want to use the same memory
buffer to store packets arriving on the two ports. Can I do this, meaning
can I use the same pd (protection domain) and mr (memory registration) for the
two QPs (one QP on each port), though the context (i.e. ib device) for each
QP is different?

Regards,
John T

From dotanb at dev.mellanox.co.il  Mon Sep 18 06:58:20 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Mon, 18 Sep 2006 16:58:20 +0300
Subject: [openib-general] Reuse pd amd mr
In-Reply-To: <a94efc20609180643x2e30918dr234667ba4ebd52e8@mail.gmail.com>
References: <a94efc20609180643x2e30918dr234667ba4ebd52e8@mail.gmail.com>
Message-ID: <450EA5FC.7040003@dev.mellanox.co.il>

Hi john.

john t wrote:
> Hi
>  
> I have two HCA cards each having one port. I want to use same memory 
> buffer to store packets arriving on the two ports. Can I do this, 
> meaning can I use same pd (protection domain) and mr (memory 
> registration) for the two QPs (one QP on each port), though the 
> context ( i.e. ib device) for each QP is different?

If the context is different, how can you create 2 QPs using the same PD?
The context is a driver abstraction and the HCA is not aware of it ...

Anyway, if you have 2 QPs and 1 MR in the same PD, the QPs can 
listen/send packets on any port and write to the same MR
(at different addresses, of course; the order of packet arrival in 
those QPs is "random" ...)

Dotan


From sashak at voltaire.com  Mon Sep 18 06:46:04 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Mon, 18 Sep 2006 16:46:04 +0300
Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5)
In-Reply-To: <20060918114018.GJ29055@mellanox.co.il>
References: <450E7C0E.3020001@xiranet.com>
	<1158577816.18842.1501.camel@hal.voltaire.com>
	<450E8119.4060405@xiranet.com> <20060918114018.GJ29055@mellanox.co.il>
Message-ID: <1158587165.9877.3.camel@localhost>

On Mon, 2006-09-18 at 14:40 +0300, Michael S. Tsirkin wrote:
> Quoting r. Mirko Benz <mirko.benz at xiranet.com>:
> > Subject: Re: IB diagnostics problems (OFED-1.1-rc5)
> > 
> > Hi Hal,
> > 
> > This was a default/build all OFED install. Either we should place these 
> > tools under ../ofed/sbin or make it work for every body. At least a 
> > error message that umad access failed would be required.
> 
> I don't think opening umad for regular user by default is a good idea.

Yes, but access can be limited to a predefined group (something like ib,
umad, ibumad...), and users permitted to use umad would then be
members of this supplementary group.
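
As a sketch of how such a group-based scheme would play out, the function below applies the classic Unix owner/group/other permission bits to a device node. The function name and the example gid are hypothetical; no such group exists in OFED by default.

```python
import stat


def may_open_rw(mode, owner_uid, owner_gid, uid, gids):
    """Return True if a user with `uid` and group set `gids` gets
    read/write access under classic Unix permission bits."""
    if uid == 0:
        return True  # root always passes
    if uid == owner_uid:
        need = stat.S_IRUSR | stat.S_IWUSR
    elif owner_gid in gids:
        need = stat.S_IRGRP | stat.S_IWGRP
    else:
        need = stat.S_IROTH | stat.S_IWOTH
    return (mode & need) == need
```

With a node owned by root, group 500 (standing in for an "ibumad" group) and mode 0660, a user whose supplementary groups include 500 gets access and everyone else does not.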

Sasha

> And isn't sbin for static binaries?
> 
> With regards to diagnostics - I think proper exit status is reported, so I
> expect if you set up shell accordingly you'll get the diagnostic printout.
> Printing stuff on stderr/stdout might interfere with activating these from
> scripts, so I'm not sure it's a good idea. Hal?
> 


From trimmer at silverstorm.com  Mon Sep 18 07:09:00 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Mon, 18 Sep 2006 10:09:00 -0400
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 devices
In-Reply-To: <450E6AB2.70505@voltaire.com>
Message-ID: <D80D83302DEE6249A221093BF2BB69AE8EEF6F@mail.silverstorm.com>

> From: Or Gerlitz
> Sent: Monday, September 18, 2006 5:45 AM
> To: Michael S. Tsirkin
> Cc: OPENIB
> Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU
for
> MT23108 devices
> 
> Michael S. Tsirkin wrote:
> > Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> 
> >> Eitan Zahavi wrote:
> >>> The following patch solves an issue with OpenSM preferring largest
MTU
> >>> for PathRecord/MultiPathRecord for paths going to or from MT23108
> (Tavor)
> >>> devices instead of using a 1K MTU which is best for this device.
> 
> >> Isn't the 2K MTU issue with Tavor comes into play only under RC QP?
> 
> > I don't think so, no. Tavor supports 2K MTU, but it has better
> performance with
> > 1K MTU than 2K MTU. QP type should not matter.
> 
> Can you double check that please, as far as i know there is something
> like BW 40-50% drop with Tavor/RC/2048 vs Tavor/RC/1024 but the BW
with
> Tavor/UD/2048 is **no less** then Tavor/UD/1024.
> 
> So its very common for IPoIB net devices impl. to expose 2044 or 1500
> bytes MTU to the OS eg to cope with Ethernet and reduce IP
> fragmentation/reassembly of UDP/TCP traffic.
> 

Putting this in the SM alone and making it a fabric wide setting is
inappropriate.  The performance difference depends on application
message size.  Application message size can vary per ULP and/or per
application itself.  For example one MPI application may send mostly
large messages while another may send mostly small messages.  The same
could be true of applications for other ULPs such as uDAPL and SDP, etc.

The root issue is the Tavor HCA has 1 too few credits to truly double
buffer at 2K MTU.  However at message sizes > 1K but < 2K the 2K MTU
performs better.

Here are some MPI bandwidth results:
Tavor w/ 2K MTU:
512             140.394173
1024            310.553002
1500            407.003858
1800            435.538752
2048            392.831026
4096            417.592991

Tavor w/ 1K MTU:
512             140.261964
1024            300.789425
1500            379.746835
1800            416.726957
2048            425.227096
4096            501.442289

Note that message sizes shown on left do not include MPI headers.  Hence
actual IB message size is approx 50 bytes larger.

So we see at IB message sizes < 1024 (MPI 512 message), performance is
the same.
At IB message sizes between 1024 and 2048 (MPI 1024-1800 messages), performance
is best with 2K MTU.
At IB message sizes > 2048 (MPI 2048-4096 messages above), performance
is best with 1K MTU.
At larger IB message sizes (MPI 4096 message), performance starts to
take off and ultimately at 128K message size (not shown) the 50%
difference between 1K and 2K MTU reaches its peak.
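
The comparison can be reproduced from the two tables with a few lines of Python. The figures are copied from the tables above; the 1% cut-off for calling two results "the same" is an arbitrary assumption made only for this sketch.

```python
# MPI message size -> bandwidth, copied from the two tables above.
MTU_2K = {512: 140.394173, 1024: 310.553002, 1500: 407.003858,
          1800: 435.538752, 2048: 392.831026, 4096: 417.592991}
MTU_1K = {512: 140.261964, 1024: 300.789425, 1500: 379.746835,
          1800: 416.726957, 2048: 425.227096, 4096: 501.442289}


def better_mtu(size, tolerance=0.01):
    """Return "2K", "1K", or "same" for one message size.

    tolerance is the relative difference below which the two results
    are treated as equal (1% here, an assumption).
    """
    two_k, one_k = MTU_2K[size], MTU_1K[size]
    if abs(two_k - one_k) / one_k < tolerance:
        return "same"
    return "2K" if two_k > one_k else "1K"


for size in sorted(MTU_2K):
    # Relative gain of 2K MTU over 1K MTU, in percent.
    gain = (MTU_2K[size] - MTU_1K[size]) / MTU_1K[size] * 100.0
    print("%5d  %-4s  2K vs 1K: %+.1f%%" % (size, better_mtu(size), gain))
```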

Todd Rimmer


From rdreier at cisco.com  Mon Sep 18 07:10:13 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 18 Sep 2006 07:10:13 -0700
Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change
 CMA config name
References: <Pine.LNX.4.64.0609141005480.7597@zuben>
	<aday7smwjmy.fsf@cisco.com> <450D36E9.1000502@voltaire.com>
Message-ID: <aday7shp8oq.fsf@cisco.com>

    Or> I want it to be visible so if some other config **depends** on
    Or> it the use can **see** this config and select it.

    Or> Also as of the importance of the rdma cm within the IB stack
    Or> being along with the ib verbs the second access point to ULP
    Or> coders, seeing its config and documenting it is important.

I don't buy this.  The only thing making this config option visible
does is make it more likely (far more likely) that someone will
disable it.  Right now the RDMA CM is built as long as INFINIBAND and
INET are enabled.  No one is going to turn off INET on any normal
system so effectively the RDMA CM is always built whenever INFINIBAND
is enabled.

As far as making a config symbol to depend on, I think INET makes as
much sense or more: something using IP addressing naturally depends on
having IP networking.

 - R.


From rdreier at cisco.com  Mon Sep 18 07:15:13 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 18 Sep 2006 07:15:13 -0700
Subject: [openib-general] [PATCH] IB/iser: fix iSER description and
 selections in Kconfig
References: <200609071902.57379.toralf.foerster@gmx.de>
	<200609101343.02740.toralf.foerster@gmx.de>
	<450401AE.2030606@voltaire.com>
	<200609101645.22695.toralf.foerster@gmx.de>
	<4505032B.3050706@voltaire.com> <ada1wqi79mb.fsf@cisco.com>
	<450912C0.8070807@voltaire.com> <45091AC4.3090005@voltaire.com>
	<adau03awjku.fsf@cisco.com> <450D0FCB.1000401@voltaire.com>
Message-ID: <adaslipp8ge.fsf@cisco.com>

    Erez> There are 3 additional required config entries: NET, INET &
    Erez> INFINIBAND_RDMA_CM. Do you suggest to 'depend' on them or
    Erez> 'depend' on some of them and 'select' the rest?

INET depends on NET, and INFINIBAND_RDMA_CM doesn't exist.  So
depending on INET is sufficient.  That's the reason 'depend' is better
than 'select' -- you don't have to worry about recreating the full
dependency tree of things you depend on.
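
The point about 'depend' can be illustrated with a small transitive-closure sketch. The edge list below is deliberately incomplete: it contains only the symbols named in this thread, and the real Kconfig files list more dependencies for INFINIBAND and SCSI. The function name is hypothetical.

```python
# Direct "depends on" edges, limited to the symbols discussed here.
DEPENDS = {
    "INFINIBAND_ISER": ["INFINIBAND", "SCSI", "INET"],
    "INET": ["NET"],
    "INFINIBAND": [],
    "SCSI": [],
    "NET": [],
}


def required_symbols(symbol, depends=DEPENDS):
    """Return every symbol that must be enabled for `symbol`, transitively."""
    seen = set()
    stack = list(depends.get(symbol, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(depends.get(dep, []))
    return seen
```

Because the closure is followed automatically, listing INET for iSER is enough to pull in NET; nothing has to be repeated in the iSER entry.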

    Erez> Also, since I'm not familiar enough with 'make randconfig',
    Erez> here's a question: if iSER 'depends' on INET, is it possible
    Erez> that 'make randconfig' will enable iSER without enabling
    Erez> INET?

No, of course not.  The whole point of make randconfig is to make a
random but valid configuration.

Anyway, rather than waste more time going back and forth on this, I
added the following to my for-2.6.19 tree as the obvious fix:

Author: Roland Dreier <rolandd at cisco.com>
Date:   Sun Sep 17 22:58:27 2006 -0700

    IB/iser: INFINIBAND_ISER depends on INET
    
    iSER won't build without CONFIG_INET enabled, so make Kconfig reflect that.
    
    Signed-off-by: Roland Dreier <rolandd at cisco.com>

diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
index fead87d..365a1b5 100644
--- a/drivers/infiniband/ulp/iser/Kconfig
+++ b/drivers/infiniband/ulp/iser/Kconfig
@@ -1,6 +1,6 @@
 config INFINIBAND_ISER
 	tristate "ISCSI RDMA Protocol"
-	depends on INFINIBAND && SCSI
+	depends on INFINIBAND && SCSI && INET
 	select SCSI_ISCSI_ATTRS
 	---help---
 	  Support for the ISCSI RDMA Protocol over InfiniBand.  This


From thlin at us.ibm.com  Mon Sep 18 07:24:50 2006
From: thlin at us.ibm.com (Tseng-Hui (Frank) Lin)
Date: Mon, 18 Sep 2006 09:24:50 -0500
Subject: [openib-general] Mstflint - not working on ppc64 and when
 driver is not loaded on AMD
In-Reply-To: <20060917133449.GA28318@mellanox.co.il>
References: <D4F8F0B3820E754C887699BEF26A8940EB859E@taurus.voltaire.com>
	<20060917133449.GA28318@mellanox.co.il>
Message-ID: <1158589490.21249.19.camel@flin.austin.ibm.com>

Michael:

    You are right. The idea was to use sysfs resource0 whenever it is
available and fall back to config space when it is not. This would make
both new and old kernels happy. Doing so restructured mopen() and made the
patch look big.

    I dug into the ppc64 case a little. The device driver does an I/O remap,
and ioremap is needed on IBM pSeries machines. I suspect that's why
resource0 (and other mmap) only works when the device driver is loaded.
I have not figured out a way to do ioremap from user space. In addition
to open and mmap, maybe I should try to read a few bytes and fall back
to config space if the read fails. I suspect x86_64 suffers from the
same problem. I am getting an AMD blade to find out what exactly the problem
is.
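
The "try one access method, probe it, fall back to the next" idea can be sketched as below. This is not the mstflint code: `open_device_access` and `probe_len` are hypothetical names, the candidate paths are supplied by the caller, and the probe uses a plain read() for brevity where a real tool would map the file and read through the mapping.

```python
import os


def open_device_access(candidates, probe_len=4):
    """Return (path, fd) for the first candidate that opens and reads.

    candidates: paths in order of preference, e.g. a sysfs resource0
    file first and a config-space file second.
    probe_len: number of bytes to read as a sanity check (assumption).
    """
    for path in candidates:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # not present or not permitted; try the next method
        try:
            if len(os.read(fd, probe_len)) == probe_len:
                os.lseek(fd, 0, os.SEEK_SET)  # rewind after the probe
                return path, fd
        except OSError:
            pass  # opened but unreadable; fall through to the next one
        os.close(fd)
    raise OSError("no usable access method among: %r" % (candidates,))
```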

    You mentioned "your version" of mstflint. Is that a different one
from the one in OFED-1.0? If it is, would you mind sending me a copy of
your version so that I can play with it as well? Thanks.

On Sun, 2006-09-17 at 16:34 +0300, Michael S. Tsirkin wrote:
> Quoting r. Moshe Kazir <moshek at voltaire.com>:
> > Subject: Mstflint - not working on ppc64 and when driver is not loaded on AMD
> > 
> > 
> > Michael,
> >  
> > The attached patch was received from Frank (IBM) .
> 
> Wow, that's one big patch, I can't see what it's doing at all.
> Can just the relevant fix be isolated?
> 
> > Frank change the mmap in the mopen function and now it is working o.k. 
> > on my IBM JS21 ppc64 (sles9 sp3 sles10) and IBM  HS21 (EM64T) sles9 sp3
> > all the computer uses PCI-Ex HCA cards
> 
> > I tested this fix on AMD computer (PCI-X)  and found that it did not fix
> > the problem initially reported by Or Gerlitz in the attached message. 
> 
> That is, if it is even relevant?
> 
> > Also, I suspect that it doesn't work on MAC ppc64 G5 with PCI-X . (I
> > have to repeated this test) .
> > 
> > I'm suspect that this this is a PCI-X to PCI-EX issue .
> > 
> 
> Hmm.
> What I can understand of the patch, it attempts using sysfs resource0
> which is only implemented on kernels > 2.6.12 or 2.6.13, so
> that's probably your issue.
> 
> Can you try passing the following to mstflint (my version):
> -d /sys/bus/pci/devices/0000\:08\:00.0/resource0 q
> where 0000\:08\:00.0 is the appropriate device?
> 
> Does this work with driver not loaded? On which OS-es?
> 


From johnt1johnt2 at gmail.com  Mon Sep 18 07:36:33 2006
From: johnt1johnt2 at gmail.com (john t)
Date: Mon, 18 Sep 2006 20:06:33 +0530
Subject: [openib-general] Reuse pd amd mr
In-Reply-To: <450EA5FC.7040003@dev.mellanox.co.il>
References: <a94efc20609180643x2e30918dr234667ba4ebd52e8@mail.gmail.com>
	<450EA5FC.7040003@dev.mellanox.co.il>
Message-ID: <a94efc20609180736u774bfba1k856d01cb6f3d5032@mail.gmail.com>

Hi Dotan,

This may be a very basic question. When you said "QPs can listen/send the
packets on any port and write to the same MR", does it mean QPs can
listen/send packets on any port of the same HCA, or also on ports of a
different HCA?

Regards,
John T


On 9/18/06, Dotan Barak <dotanb at dev.mellanox.co.il> wrote:
>
> Hi john.
>
> john t wrote:
> > Hi
> >
> > I have two HCA cards each having one port. I want to use same memory
> > buffer to store packets arriving on the two ports. Can I do this,
> > meaning can I use same pd (protection domain) and mr (memory
> > registration) for the two QPs (one QP on each port), though the
> > context ( i.e. ib device) for each QP is different?
>
> if the context is different how can you create 2 QPs using the same PD?
> The context is a driver abstraction and the HCA is not aware of it ...
>
> anyway, if you have 2 QPs and 1 MR which are in the same PD, the QPs can
> listen/send the packets on any port and write to the same MR
> (in different address of course, the order of the packet arrival in
> those QPs is "random" ...)
>
> Dotan
>

From rdreier at cisco.com  Mon Sep 18 07:49:28 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 18 Sep 2006 07:49:28 -0700
Subject: [openib-general] Reuse pd amd mr
In-Reply-To: <a94efc20609180643x2e30918dr234667ba4ebd52e8@mail.gmail.com>
	(john t.'s message of "Mon, 18 Sep 2006 19:13:59 +0530")
References: <a94efc20609180643x2e30918dr234667ba4ebd52e8@mail.gmail.com>
Message-ID: <adaodtdp6vb.fsf@cisco.com>

    john> Hi I have two HCA cards each having one port. I want to use
    john> same memory buffer to store packets arriving on the two
    john> ports. Can I do this, meaning can I use same pd (protection
    john> domain) and mr (memory registration) for the two QPs (one QP
    john> on each port), though the context (i.e. ib device) for each
    john> QP is different?

No, a PD belongs to a specific device.  However nothing prevents you
from creating one PD for each device, and two MRs (one for each
device, each using one of those two PDs) that cover the same memory.
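
The arrangement can be pictured with a toy object model. The classes below are illustrative only and are not the verbs API; with libibverbs the corresponding calls would be ibv_alloc_pd(), ibv_reg_mr() and ibv_create_qp() on each device's own context.

```python
class Hca:
    def __init__(self, name):
        self.name = name


class ProtectionDomain:
    """A PD is tied to exactly one device."""
    def __init__(self, device):
        self.device = device


class MemoryRegion:
    """A registration of `buf` under one PD (and therefore one device)."""
    def __init__(self, pd, buf):
        self.pd = pd
        self.buf = buf


class QueuePair:
    def __init__(self, device, pd):
        if pd.device is not device:
            raise ValueError("PD belongs to %s, not %s"
                             % (pd.device.name, device.name))
        self.device = device
        self.pd = pd


buf = bytearray(4096)                      # one shared receive buffer
hca1, hca2 = Hca("hca1"), Hca("hca2")
pd1, pd2 = ProtectionDomain(hca1), ProtectionDomain(hca2)
mr1, mr2 = MemoryRegion(pd1, buf), MemoryRegion(pd2, buf)   # same memory
qp1, qp2 = QueuePair(hca1, pd1), QueuePair(hca2, pd2)
```

Both registrations point at the same buffer, while each QP only ever sees the PD of its own device.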

 - R.


From dotanb at dev.mellanox.co.il  Mon Sep 18 07:50:09 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Mon, 18 Sep 2006 17:50:09 +0300
Subject: [openib-general] Reuse pd amd mr
In-Reply-To: <a94efc20609180736u774bfba1k856d01cb6f3d5032@mail.gmail.com>
References: <a94efc20609180643x2e30918dr234667ba4ebd52e8@mail.gmail.com>
	<450EA5FC.7040003@dev.mellanox.co.il>
	<a94efc20609180736u774bfba1k856d01cb6f3d5032@mail.gmail.com>
Message-ID: <450EB221.3000708@dev.mellanox.co.il>

john t wrote:
> Hi Dotan,
>  
> This may be a very basic question. When u said "QPs can listen/send 
> the packets on any port and write to the same MR", does it mean QPs 
> can listen/send packets to any port on the same HCA or to also ports 
> on different HCA ?
You are right, I wasn't clear enough (sorry)...

What I meant was that you can work with more than one QP in an HCA;
each QP can send/recv messages on a different port (every QP works 
with only one port).

A QP is a resource in the HCA, hence a QP in HCA1 cannot listen/send 
packets on a port of HCA2.
If you wish, your SW can handle all of the HCAs: you can open a QP for 
every port in every HCA (total QPs: #HCAs * #Ports).

I hope I was clearer this time ...
Dotan


>
>  
> On 9/18/06, *Dotan Barak* <dotanb at dev.mellanox.co.il 
> <mailto:dotanb at dev.mellanox.co.il>> wrote:
>
>     Hi john.
>
>     john t wrote:
>     > Hi
>     >
>     > I have two HCA cards each having one port. I want to use same
>     memory
>     > buffer to store packets arriving on the two ports. Can I do this,
>     > meaning can I use same pd (protection domain) and mr (memory
>     > registration) for the two QPs (one QP on each port), though the
>     > context ( i.e. ib device) for each QP is different?
>
>     if the context is different how can you create 2 QPs using the
>     same PD?
>     The context is a driver abstraction and the HCA is not aware of it ...
>
>     anyway, if you have 2 QPs and 1 MR which are in the same PD, the
>     QPs can
>     listen/send the packets on any port and write to the same MR
>     (in different address of course, the order of the packet arrival in
>     those QPs is "random" ...)
>
>     Dotan
>
>


From eitan at mellanox.co.il  Mon Sep 18 08:20:07 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Mon, 18 Sep 2006 18:20:07 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 devices
In-Reply-To: <D80D83302DEE6249A221093BF2BB69AE8EEF6F@mail.silverstorm.com>
References: <D80D83302DEE6249A221093BF2BB69AE8EEF6F@mail.silverstorm.com>
Message-ID: <450EB927.2020903@mellanox.co.il>

Hi Todd,

It seems your knowledge of the MTU that is best for the application (MPI) 
you are running is good enough that you will be able to include the MTU 
in the PathRecord request, so the patch described here will not affect 
your MPI at all.
The patch only applies if your request does not set the MTU and MTU 
SEL bits in the comp_mask.

EZ

Rimmer, Todd wrote:

>>From: Or Gerlitz
>>Sent: Monday, September 18, 2006 5:45 AM
>>To: Michael S. Tsirkin
>>Cc: OPENIB
>>Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
>>MT23108 devices
>>
>>Michael S. Tsirkin wrote:
>>>Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
>>>>Eitan Zahavi wrote:
>>>>>The following patch solves an issue with OpenSM preferring largest MTU
>>>>>for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor)
>>>>>devices instead of using a 1K MTU which is best for this device.
>>>>
>>>>Isn't the 2K MTU issue with Tavor comes into play only under RC QP?
>>>
>>>I don't think so, no. Tavor supports 2K MTU, but it has better performance with
>>>1K MTU than 2K MTU. QP type should not matter.
>>
>>Can you double check that please, as far as i know there is something
>>like BW 40-50% drop with Tavor/RC/2048 vs Tavor/RC/1024 but the BW with
>>Tavor/UD/2048 is **no less** then Tavor/UD/1024.
>>
>>So its very common for IPoIB net devices impl. to expose 2044 or 1500
>>bytes MTU to the OS eg to cope with Ethernet and reduce IP
>>fragmentation/reassembly of UDP/TCP traffic.
>
>Putting this in the SM alone and making it a fabric wide setting is
>inappropriate.  The performance difference depends on application
>message size.  Application message size can vary per ULP and/or per
>application itself.  For example one MPI application may send mostly
>large messages while another may send mostly small messages.  The same
>could be true of applications for other ULPs such as uDAPL and SDP, etc.
>
>The root issue is the Tavor HCA has 1 too few credits to truly double
>buffer at 2K MTU.  However at message sizes > 1K but < 2K the 2K MTU
>performs better.
>
>Here are some MPI bandwidth results:
>Tavor w/ 2K MTU:
>512             140.394173
>1024            310.553002
>1500            407.003858
>1800            435.538752
>2048            392.831026
>4096            417.592991
>
>Tavor w/ 1K MTU:
>512             140.261964
>1024            300.789425
>1500            379.746835
>1800            416.726957
>2048            425.227096
>4096            501.442289
>
>Note that message sizes shown on left do not include MPI headers.  Hence
>actual IB message size is approx 50 bytes larger.
>
>So we see at IB message sizes < 1024 (MPI 512 message), performance is
>the same.
>At IB message sizes > 1024 < 2048 (MPI 1024-1800 messages), performance
>is best with 2K MTU.
>At IB message sizes > 2048 (MPI 2048-4096 messages above), performance
>is best with 1K MTU.
>At larger IB message sizes (MPI 4096 message), performance starts to
>take off and ultimately at 128K message size (not shown) the 50%
>difference between 1K and 2K MTU reaches its peak.
>
>Todd Rimmer


From trimmer at silverstorm.com  Mon Sep 18 08:52:18 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Mon, 18 Sep 2006 11:52:18 -0400
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 devices
In-Reply-To: <450EB927.2020903@mellanox.co.il>
Message-ID: <D80D83302DEE6249A221093BF2BB69AE8EEFC2@mail.silverstorm.com>

> From: Eitan Zahavi [mailto:eitan at mellanox.co.il]
> Sent: Monday, September 18, 2006 11:20 AM
> To: Rimmer, Todd
> Cc: Or Gerlitz; Michael S. Tsirkin; OPENIB
> Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
> MT23108 devices
> 
> Hi Todd,
> 
> Seems like your knowledge about the specific MTU best for the
> application (MPI) you are running is good enough that you will be able
> to include the MTU in the PathRecord request, and thus the patch
> described here will not affect your MPI at all.
> The patch only applies if your request does not provide any MTU & MTU
> SEL comp_mask.

Eitan,

The question is not about "our MPI"; rather, it's to ensure that the Open
Fabrics and OFED included MPIs and ULPs are capable of being tuned for
optimal performance.  When a fabric runs more than one application, it's
necessary to be able to tune this at the MPI, SDP, etc. level, not at the
SM level.

This patch turns on a non-standard behaviour in the SM for the entire
fabric such that some applications will have better performance while
others will suffer.  In order to be complete, this patch would need to
include ULP level tunability in all the relevant ULPs (MPI, SDP, uDAPL,
etc) to select the "MAX MTU" to use or to request.

This then raises the question: if proper tuning requires all the ULPs to
have a configurable MAX MTU, why should the SA need to implement the
quirk at all?

Todd Rimmer

> >
> >Putting this in the SM alone and making it a fabric wide setting is
> >inappropriate.  The performance difference depends on application
> >message size.  Application message size can vary per ULP and/or per
> >application itself.  For example one MPI application may send mostly
> >large messages while another may send mostly small messages.  The same
> >could be true of applications for other ULPs such as uDAPL and SDP, etc.
> >
> >The root issue is the Tavor HCA has 1 too few credits to truly double
> >buffer at 2K MTU.  However at message sizes > 1K but < 2K the 2K MTU
> >performs better.
> >
> >Here are some MPI bandwidth results:
> >Tavor w/ 2K MTU:
> >512             140.394173
> >1024            310.553002
> >1500            407.003858
> >1800            435.538752
> >2048            392.831026
> >4096            417.592991
> >
> >Tavor w/ 1K MTU:
> >512             140.261964
> >1024            300.789425
> >1500            379.746835
> >1800            416.726957
> >2048            425.227096
> >4096            501.442289
> >
> >Note that message sizes shown on left do not include MPI headers.  Hence
> >actual IB message size is approx 50 bytes larger.
> >
> >So we see at IB message sizes < 1024 (MPI 512 message), performance is
> >the same.
> >At IB message sizes > 1024 < 2048 (MPI 1024-1800 messages), performance
> >is best with 2K MTU.
> >At IB message sizes > 2048 (MPI 2048-4096 messages above), performance
> >is best with 1K MTU.
> >At larger IB message sizes (MPI 4096 message), performance starts to
> >take off and ultimately at 128K message size (not shown) the 50%
> >difference between 1K and 2K MTU reaches its peak.
> >
> >Todd Rimmer
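
The two bandwidth tables quoted above can be compared directly. The
following is a minimal sketch in Python that only restates the posted
figures; the function and variable names are illustrative, and the
bandwidth units are not given in the thread, so only relative differences
are computed.

```python
# Bandwidth figures copied verbatim from the two tables in this thread.
# Keys are MPI message sizes in bytes; values are the reported bandwidth
# (units are not stated in the original posting).
BW_2K_MTU = {512: 140.394173, 1024: 310.553002, 1500: 407.003858,
             1800: 435.538752, 2048: 392.831026, 4096: 417.592991}
BW_1K_MTU = {512: 140.261964, 1024: 300.789425, 1500: 379.746835,
             1800: 416.726957, 2048: 425.227096, 4096: 501.442289}


def gain_of_2k_percent(size):
    """Percent by which the 2K MTU figure exceeds the 1K MTU figure."""
    return 100.0 * (BW_2K_MTU[size] - BW_1K_MTU[size]) / BW_1K_MTU[size]


def compare():
    """Map each message size to the relative gain (or loss) of 2K MTU."""
    return {size: gain_of_2k_percent(size) for size in sorted(BW_2K_MTU)}


if __name__ == "__main__":
    for size, gain in compare().items():
        ahead = "2K" if gain > 0 else "1K"
        print(f"{size:5d}  {gain:+6.1f}%  ({ahead} MTU ahead)")
```

On these figures the 2K MTU is ahead by roughly 3 to 7 percent for 1024 to
1800 byte MPI messages, and behind by roughly 8 and 17 percent at 2048 and
4096 bytes, which matches the summary given with the tables.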


From changquing.tang at hp.com  Mon Sep 18 09:30:57 2006
From: changquing.tang at hp.com (Tang, Changqing)
Date: Mon, 18 Sep 2006 11:30:57 -0500
Subject: [openib-general] Reuse pd amd mr
In-Reply-To: <adaodtdp6vb.fsf@cisco.com>
Message-ID: <E55D8401EF235C4FBBC23C3573E4C472C1F8A8@cceexc18.americas.cpqcorp.net>

>
>No, a PD belongs to a specific device.  However nothing 
>prevents you from creating one PD for each device, and two MRs 
>(one for each device, each using one of those two PDs) that 
>cover the same memory.

Roland:
	I did exactly what you said with two cards on a node. However,
if I use the two physical channels for message striping, 99% of the
tests pass, but under some conditions I get IBV_WC_RETRY_EXC_ERR, or the
code just hangs with no send completion (ibv_poll_cq returns 0). Do
you think this is a firmware issue or a driver issue?

	Thanks.

--CQ


>
> - R.
>


From rdreier at cisco.com  Mon Sep 18 09:46:00 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 18 Sep 2006 09:46:00 -0700
Subject: [openib-general] Reuse pd amd mr
In-Reply-To: <E55D8401EF235C4FBBC23C3573E4C472C1F8A8@cceexc18.americas.cpqcorp.net>
	(Changqing Tang's message of "Mon, 18 Sep 2006 11:30:57 -0500")
References: <E55D8401EF235C4FBBC23C3573E4C472C1F8A8@cceexc18.americas.cpqcorp.net>
Message-ID: <ada8xkhp1h3.fsf@cisco.com>

    Changqing> Roland: I did exactly what you said with two cards on a
    Changqing> node, however, if I use the two physical channels for
    Changqing> message striping, 99% of the tests passed, but for some
    Changqing> conditions I got IBV_WC_RETRY_EXC_ERR, or the code just
    Changqing> hangs there with no send completion (ibv_poll_cq
    Changqing> returns 0). Do you think this is a firmware issue or
    Changqing> a driver issue?

'retries exceeded' means that the transport retry count was
exceeded, so most likely your timeout is set too low.

Without seeing your code, I couldn't begin to say why you don't see a
send completion.  If you are absolutely positive that you post a send
and you never see a completion for that send, then I guess it is a
firmware or hardware problem.

 - R.


From changquing.tang at hp.com  Mon Sep 18 09:58:56 2006
From: changquing.tang at hp.com (Tang, Changqing)
Date: Mon, 18 Sep 2006 11:58:56 -0500
Subject: [openib-general] Reuse pd amd mr
In-Reply-To: <ada8xkhp1h3.fsf@cisco.com>
Message-ID: <E55D8401EF235C4FBBC23C3573E4C472C1F8EB@cceexc18.americas.cpqcorp.net>

 
>
>'retries exceeded' means that the transport retry count was 
>exceeded, so most likely your timeout is set too low.

Is there a commonly recommended value for this timeout? I use 18, which
represents about 1 second.

>
>Without seeing your code, I couldn't begin to say why you 
>don't see a send completion.  If you are absolutely positive 
>that you post a send and you never see a completion for that 
>send, then I guess it is a firmware or hardware problem.

It is very hard to reproduce this error with standalone code. I use
HP-MPI and need 8 ranks on at least 4 nodes with 2 cards on each node,
and just one of our hundred test codes can catch this error; it occurs
in the MPI_Scatterv operation.

--CQ


>
> - R.
>


From rdreier at cisco.com  Mon Sep 18 10:02:42 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 18 Sep 2006 10:02:42 -0700
Subject: [openib-general] Reuse pd amd mr
In-Reply-To: <E55D8401EF235C4FBBC23C3573E4C472C1F8EB@cceexc18.americas.cpqcorp.net>
	(Changqing Tang's message of "Mon, 18 Sep 2006 11:58:56 -0500")
References: <E55D8401EF235C4FBBC23C3573E4C472C1F8EB@cceexc18.americas.cpqcorp.net>
Message-ID: <ada4pv5p0p9.fsf@cisco.com>

    Changqing> Is there a common recommended value for this timeout ?
    Changqing> I use 18, which represents 1 second.

18 should be OK I guess, unless you have congestion in your fabric, in
which case you have other problems anyway.
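
For reference, the InfiniBand local ACK timeout field is an exponent: the
timeout is 4.096 microseconds times 2 to the power of the field value, so
18 works out to slightly more than one second. A minimal sketch of that
arithmetic in Python follows; the function name is illustrative and is not
part of libibverbs.

```python
def local_ack_timeout_seconds(timeout_field):
    """Convert a QP local ACK timeout field (0..31) to seconds.

    The field is a 5-bit exponent: timeout = 4.096 us * 2**field.
    A field value of 0 means the timeout is disabled (wait forever).
    """
    if not 0 <= timeout_field <= 31:
        raise ValueError("timeout field must be in the range 0..31")
    if timeout_field == 0:
        return float("inf")
    return 4.096e-6 * (2 ** timeout_field)


if __name__ == "__main__":
    for field in (14, 18, 20):
        print(field, local_ack_timeout_seconds(field))
```

This is only the nominal per-attempt value; the time until a
retry-exceeded completion is reported also depends on the configured retry
count.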

    Changqing> It is very hard to reproduce this error with standalone
    Changqing> code. I use HP-Mpi and need 8 ranks, at least 4 nodes
    Changqing> with 2 cards on each node, and just one of our hundred
    Changqing> test code can catch this error, and it is on
    Changqing> MPI_Scatterv Operation.

Unless you can narrow down a way to reproduce this, I don't think it's
going to be possible for anyone to help debug it.

 - R.


From changquing.tang at hp.com  Mon Sep 18 10:14:35 2006
From: changquing.tang at hp.com (Tang, Changqing)
Date: Mon, 18 Sep 2006 12:14:35 -0500
Subject: [openib-general] Reuse pd amd mr
In-Reply-To: <ada4pv5p0p9.fsf@cisco.com>
Message-ID: <E55D8401EF235C4FBBC23C3573E4C472C1F918@cceexc18.americas.cpqcorp.net>

 
>
>    Changqing> Is there a common recommended value for this timeout ?
>    Changqing> I use 18, which represents 1 second.
>
>18 should be OK I guess, unless you have congestion in your 
>fabric, in which case you have other problems anyway.
>
>    Changqing> It is very hard to reproduce this error with standalone
>    Changqing> code. I use HP-Mpi and need 8 ranks, at least 4 nodes
>    Changqing> with 2 cards on each node, and just one of our hundred
>    Changqing> test code can catch this error, and it is on
>    Changqing> MPI_Scatterv Operation.
>
>Unless you can narrow down a way to reproduce this, I don't 
>think it's going to be possible for anyone to help debug it.

OK, I forgot to mention: if I use RDMA on both channels, it is hard to
reproduce the hang. If I create an SRQ on one of the channels, then the
other channel hangs even on the first RDMA operation. I will write
standalone code for you driver guys to debug.


--CQ


>
> - R.
>


From mst at mellanox.co.il  Mon Sep 18 10:22:05 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Sep 2006 20:22:05 +0300
Subject: [openib-general] Mstflint - not working on ppc64 and whendriver
 is not loaded on AMD
In-Reply-To: <1158589490.21249.19.camel@flin.austin.ibm.com>
References: <1158589490.21249.19.camel@flin.austin.ibm.com>
Message-ID: <20060918172205.GA1371@mellanox.co.il>

Quoting r. Tseng-Hui (Frank) Lin <thlin at us.ibm.com>:
>     You mentioned "your version" of mstflint. Is that a different one
> from the one in OFED-1.0? If it is, would you mind sending me a copy of
> your version so that I can play with it as well? Thanks.

Just the one in svn trunk/OFED 1.1 RC.

-- 
MST


From ralphc at pathscale.com  Mon Sep 18 10:27:59 2006
From: ralphc at pathscale.com (Ralph Campbell)
Date: Mon, 18 Sep 2006 10:27:59 -0700
Subject: [openib-general] How to support IOMMUs for ipath driver
In-Reply-To: <450D1C0A.90906@voltaire.com>
References: <1158108010.8759.192.camel@brick.pathscale.com>
	<45093428.5010009@voltaire.com>
	<1158263019.8759.324.camel@brick.pathscale.com>
	<450D1C0A.90906@voltaire.com>
Message-ID: <1158600479.2592.9.camel@brick.pathscale.com>

On Sun, 2006-09-17 at 12:57 +0300, Or Gerlitz wrote:
> Ralph Campbell wrote:
> > Here is my thinking so far:
> > 
> > The driver is passed an LKEY/RKEY plus an address.
> > For ib_get_dma_mr(), the address is currently from
> > dma_map_single(), dma_map_page(), or dma_map_sg().
> > With the ib_dma_*() routines, I can intercept these calls
> > and return something instead of a bus or IOMMU address.
> > I would like to return a kernel virtual address since that
> > is the simplest and is what I ultimately need. This is
> > trivial for dma_map_single() and trivial for low memory
> > pages for dma_map_page().
> > 
> > I think I can safely just return error for architectures
> > with high memory pages since the driver really only works
> > on 64-bit systems (for a variety of reasons which I won't
> > go into) and those systems don't have high memory.
> 
> Again (and please go and check me), pages you need to DMA (ie move over 
> IB) need **not** be mapped into the kernel virtual address space and 
> this happens **not** only under ia32 high-memory scheme, please see my 
> other email for two examples (direct I/O etc)
> 
> 
> > ib_sg_dma_address would return the page_address() of sg->page
> > but wouldn't be able to rely on other fields which might be in
> > the struct scatterlist.
> 
> your design seems to rely on three fields: page, offset and length, so
> 
> ib_sg_map_sg(scat) is kmap-ping whatever pages which are not mapped now 
> into kvirt
> 
> ib_dma_unmap_sg(scat) is kunmap-ping those pages you were mapping before 
> (you might need an aux data structure to keep which need kunmap)
> 
> ib_sg_dma_address(scat) is page_address(scat->page) + scat->offset
> 
> ib_sg_dma_len(scat) is scat->length
> 
> Or.

Correct.


From mst at mellanox.co.il  Mon Sep 18 11:06:11 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Sep 2006 21:06:11 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 devices
In-Reply-To: <D80D83302DEE6249A221093BF2BB69AE8EEFC2@mail.silverstorm.com>
References: <D80D83302DEE6249A221093BF2BB69AE8EEFC2@mail.silverstorm.com>
Message-ID: <20060918180611.GB1371@mellanox.co.il>

Quoting r. Rimmer, Todd <trimmer at silverstorm.com>:
> Subject: RE: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices
> 
> > From: Eitan Zahavi [mailto:eitan at mellanox.co.il]
> > Sent: Monday, September 18, 2006 11:20 AM
> > To: Rimmer, Todd
> > Cc: Or Gerlitz; Michael S. Tsirkin; OPENIB
> > Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU
> for
> > MT23108 devices
> > 
> > Hi Todd,
> > 
> > Seems like your knowledge about the specific MTU best for the
> > application (MPI) you are running is good
> > enough such that you will be able to include the MTU in the PathRecord
> > request and thus the patch describe in here will not affect your MPI
> at
> > all.
> > The patch only applies if your request does not  provide any MTU & MTU
> > SEL comp_mask
> 
> Eitan,
> 
> The question is not about "our MPI", rather its to ensure the Open
> Fabrics and OFED included MPIs and ULPs are capable of being tuned for
> optimal performance.  When a fabric runs more than 1 application, its
> necessary to be able to tune this at the MPI, SDP, etc level, not at the
> SM level.

We did not remove this ability at all. So it's there.

> This patch turns on a non-standard behaviour in the SM for the entire
> fabric such that some applications will have better performance while
> others will suffer.

I disagree. The behaviour is perfectly standards compliant.

> In order to be complete, this patch would need to
> include ULP level tunability in all the relevant ULPs (MPI, SDP, uDAPL,
> etc) to select the "MAX MTU" to use or to request.

This tunability is already there - that's what MTU selector in path queries
does.

> This then begs the question, if proper tuning requires all the ULPs to
> have a configurable MAX MTU, why should the SA need to implement the
> quirk at all?
> 
> Todd Rimmer

If ULP wants MAX MTU, it must set MTU selector to 3 in path query.

If the MTU selector is disabled in the query, the SM will guess which MTU is
best to select. The SM used a specific heuristic to perform that guess.  All
we did is provide an option to use a different heuristic.

This is useful because the SM has data on the whole fabric, as opposed to
ULPs, which often only have data on the endnode.

-- 
MST


From bgreen at nas.nasa.gov  Mon Sep 18 11:07:32 2006
From: bgreen at nas.nasa.gov (Bryan Green)
Date: Mon, 18 Sep 2006 11:07:32 -0700
Subject: [openib-general] patch trouble
In-Reply-To: Your message of "Sun, 17 Sep 2006 23:31:53 +0300."
	<20060917203153.GG32526@mellanox.co.il>
Message-ID: <200609181807.k8II7Wo5013573@ece06.nas.nasa.gov>

"Michael S. Tsirkin" writes:
> Quoting r. Bryan Green <bgreen at nas.nasa.gov>:
> > Is there a great discrepancy between the git repository and the svn
> > repository?  If I am downloading the kernel modules from subversion, should I
> > still use the patchset from the git repository? 
> 
> *Please* do not use svn trunk code for production.
> You want either kernel.org code or the OFED git repository for everything.
> kernel code in subversion is being deprecated.

Okay, thanks.  That's good to know.  What about the userspace code in the
1.0/1.1 svn branch?  Is that userspace code equivalent to what's in the
OFED distribution?  I can forego using kernel code from subversion, but
it is convenient for the userspace stuff (as explained below).

> 
> > There is currently only a source tarball
> > for libibverbs, while ofed is too RPM-centric.
> 
> Not really. Please try the following:
> 
> Get the ofed tarball here
> https://openib.org/svn/gen2/branches/1.1/ofed/releases/
> and unpack it.
> Take this file: SOURCES/openib-1.1.tgz
> That's all of subversion + git all nicely packed up.
 
The problem with that tar file is that it contains far more than the
'.tgz' file.  It also contains some large source rpms.  The whole thing is
47 Megs, of which only about 12 Megs is of interest to me.

> Let me know how it goes. BTW, which distro are you using?

I am using Gentoo (www.gentoo.org).  I am actually writing the ebuild
(http://en.wikipedia.org/wiki/Ebuild) scripts for adding openib to the
gentoo science overlay (http://svn.cryos.net/projects/gentoo-sci-overlay)

The key feature of Gentoo package management is that packages are
downloaded, built, and installed from source, all in an automated fashion.

Downloading the entire 47 Meg file to build openib is prohibitive,
especially if all one wants to do is build, say, libibverbs, libmthca, and
the performance tools.  That is why I am downloading the userspace code
from svn.  If there were a single downloadable openib.tgz file, I could
build the kernel modules as well as the userspace tools from that.
In the meantime, I'd like to continue getting userspace code from svn.

As for the kernel modules, I will stick with what's in the kernel for now,
though I look forward to SDP being added to the mainline, as I'm getting
some rather nice performance from it.

-bryan


From trimmer at silverstorm.com  Mon Sep 18 11:20:46 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Mon, 18 Sep 2006 14:20:46 -0400
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 devices
In-Reply-To: <20060918180611.GB1371@mellanox.co.il>
Message-ID: <D80D83302DEE6249A221093BF2BB69AE8EF020@mail.silverstorm.com>

> From: Michael S. Tsirkin [mailto:mst at mellanox.co.il]
> Sent: Monday, September 18, 2006 2:06 PM
> To: Rimmer, Todd
> Cc: Eitan Zahavi; Or Gerlitz; OPENIB
> Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
> MT23108 devices
> 
> Quoting r. Rimmer, Todd <trimmer at silverstorm.com>:
> > Subject: RE: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
> > MT23108 devices
> >
> > > From: Eitan Zahavi [mailto:eitan at mellanox.co.il]
> > > Sent: Monday, September 18, 2006 11:20 AM
> > > To: Rimmer, Todd
> > > Cc: Or Gerlitz; Michael S. Tsirkin; OPENIB
> > > Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU
> > > for MT23108 devices
> > >
> > > Hi Todd,
> > >
> > > Seems like your knowledge about the specific MTU best for the
> > > application (MPI) you are running is good
> > > enough such that you will be able to include the MTU in the PathRecord
> > > request and thus the patch describe in here will not affect your MPI
> > > at all.
> > > The patch only applies if your request does not  provide any MTU & MTU
> > > SEL comp_mask
> >
> > Eitan,
> >
> > The question is not about "our MPI", rather its to ensure the Open
> > Fabrics and OFED included MPIs and ULPs are capable of being tuned for
> > optimal performance.  When a fabric runs more than 1 application, its
> > necessary to be able to tune this at the MPI, SDP, etc level, not at the
> > SM level.
> 
> We did not remove this ability at all. So it's there.
> 
> > In order to be complete, this patch would need to
> > include ULP level tunability in all the relevant ULPs (MPI, SDP, uDAPL,
> > etc) to select the "MAX MTU" to use or to request.
> 
> This tunability is already there - that's what MTU selector in path queries
> does.
> 
> > This then begs the question, if proper tuning requires all the ULPs to
> > have a configurable MAX MTU, why should the SA need to implement the
> > quirk at all?
> >
> If ULP wants MAX MTU, it must set MTU selector to 3 in path query.
> 
> If MTU selector is disabled in the query, SM will guess which MTU is best to
> select. SM used a specific heuristic to perform that guess.  All we did is,
> provide an option to use a different heuristic.
> 
> This is useful because, SM has data on the whole fabric as opposed to ULPs
> which often only have data on the endnode.

The patch you submitted only modified OpenSM.  So please show me the
patch where MVAPICH, Open MPI, SDP, SRP and other ULPs allow this to be
tuned by the user or application.  Lacking that patch, all the "if a ULP
wants" statements above are moot.  The goal is for OFED to provide a
high performance standard solution.  If end users must modify the ULPs'
source code to achieve that goal, OFED misses the mark.

Todd Rimmer


From rdreier at cisco.com  Mon Sep 18 11:21:23 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 18 Sep 2006 11:21:23 -0700
Subject: [openib-general] patch trouble
In-Reply-To: <200609181807.k8II7Wo5013573@ece06.nas.nasa.gov> (Bryan
	Green's message of "Mon, 18 Sep 2006 11:07:32 -0700")
References: <200609181807.k8II7Wo5013573@ece06.nas.nasa.gov>
Message-ID: <adavenlniho.fsf@cisco.com>

    Bryan> I am using Gentoo (www.gentoo.org).  I am actually writing
    Bryan> the ebuild (http://en.wikipedia.org/wiki/Ebuild) scripts
    Bryan> for adding openib to the gentoo science overlay
    Bryan> (http://svn.cryos.net/projects/gentoo-sci-overlay)

Cool, glad to hear it.

    Bryan> Downloading the entire 47 Meg files to build openib is
    Bryan> prohibitive.  Especially if all one wants to do is build,
    Bryan> say, libibverbs, libmthca, and the performance tools.  That
    Bryan> is why I am downloading the userspace code from svn.  If
    Bryan> there was a single downloadable openib.tgz file, I could
    Bryan> build the kernel modules as well as the userspace tools
    Bryan> from that.  In the meantime, I'd like to continue getting
    Bryan> userspace code from svn.

For libibverbs and libmthca at least, I am careful to keep the
releases on http://openib.org/downloads/ up to date.  For example, you
can find

    http://openib.org/downloads/libibverbs-1.0.3.tar.gz
    http://openib.org/downloads/libmthca-1.0.2.tar.gz

there, which are the latest stable releases as of now.

None of the other package maintainers seems to have gotten serious
about publishing releases of their packages -- and I agree with you
that just relying on OFED leaves a serious gap.

Anyway, for whatever reason, I seem to be the only openib person who
really cares about distro inclusion of stuff, but I'm happy to do
things that make your job as a packager easier, at least for my
userspace packages (libibverbs and libmthca).  Just let me know.  I've
already gotten those packages into mainline Debian, Ubuntu and Fedora
Extras repositories, and I'd be happy to see them in Gentoo as well.

 - R.


From mst at mellanox.co.il  Mon Sep 18 11:22:41 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Sep 2006 21:22:41 +0300
Subject: [openib-general] patch trouble
In-Reply-To: <200609181807.k8II7Wo5013573@ece06.nas.nasa.gov>
References: <200609181807.k8II7Wo5013573@ece06.nas.nasa.gov>
Message-ID: <20060918182241.GE1371@mellanox.co.il>

Quoting r. Bryan Green <bgreen at nas.nasa.gov>:
> Subject: Re: patch trouble
> 
> "Michael S. Tsirkin" writes:
> > Quoting r. Bryan Green <bgreen at nas.nasa.gov>:
> > > Is there a great discrepancy between the git repository and the svn
> > > repository?  If I am downloading the kernel modules from subversion, should I
> > > still use the patchset from the git repository? 
> > 
> > *Please* do not use svn trunk code for production.
> > You want either kernel.org code or the OFED git repository for everything.
> > kernel code in subversion is being deprecated.
> 
> Okay, thanks.  Thats good to know.  What about the userspace code in the
> 1.0/1.1 svn branch?  Is that userspace code equivalent to what's in the
> OFED distribution?  I can forego using kernel code from subversion, but
> it is convenient for the userspace stuff (as explained below).

Yes, that's what OFED uses for userspace.

> > 
> > > There is currently only a source tarball
> > > for libibverbs, while ofed is too RPM-centric.
> > 
> > Not really. Please try the following:
> > 
> > Get the ofed tarball here
> > https://openib.org/svn/gen2/branches/1.1/ofed/releases/
> > and unpack it.
> > Take this file: SOURCES/openib-1.1.tgz
> > That's all of subversion + git all nicely packed up.
>  
> The problem with that tar file is that it contains far more than the
> '.tgz' file.  It also contains some large source rpms.  The whole thing is
> 47 Megs, of which only about 12 Megs is of interest to me.

So, I guess you want to just remove the rest of the stuff?

> > Let me know how it goes. BTW, which distro are you using?
> 
> I am using Gentoo (www.gentoo.org).  I am actually writing the ebuild
> (http://en.wikipedia.org/wiki/Ebuild) scripts for adding openib to the
> gentoo science overlay (http://svn.cryos.net/projects/gentoo-sci-overlay)
> 
> The key feature of Gentoo package management is that packages are
> downloaded, built, and installed from source, all in an automated fashion.
> 
> Downloading the entire 47 Meg files to build openib is prohibitive.
> Especially if all one wants to do is build, say, libibverbs, libmthca, and
> the performance tools.  That is why I am downloading the userspace code
> from svn.  If there was a single downloadable openib.tgz file, I could
> build the kernel modules as well as the userspace tools from that.

We can try looking into that. So what do you want it to include?
We currently only target RPM based distros.
Are you willing to maintain the gentoo support?
Maybe after each release candidate you can prepare the tarball
for gentoo and upload it?

> In the meantime, I'd like to continue getting userspace code from svn.

That's fine too, I think.

> As for the kernel modules, I will stick with whats in the kernel for now,
> though I look forward to SDP being added to the main line, as I'm getting
> some rather nice performance from it.

Hmm. Which kernel do you run?
If you have 2.6.18, it's easy to add just SDP as an out of kernel module.

-- 
MST


From rdreier at cisco.com  Mon Sep 18 11:39:37 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 18 Sep 2006 11:39:37 -0700
Subject: [openib-general] patch trouble
In-Reply-To: <20060918182241.GE1371@mellanox.co.il> (Michael S.
	Tsirkin's message of "Mon, 18 Sep 2006 21:22:41 +0300")
References: <200609181807.k8II7Wo5013573@ece06.nas.nasa.gov>
	<20060918182241.GE1371@mellanox.co.il>
Message-ID: <adar6y9nhna.fsf@cisco.com>

    Michael> Hmm. Which kernel do you run?  If you have 2.6.18, it's
    Michael> easy to add just SDP as an out of kernel module.

Lots of things would be easy if you have kernel 2.6.18, because then
you also have a time machine...

 - R.


From bgreen at nas.nasa.gov  Mon Sep 18 11:41:41 2006
From: bgreen at nas.nasa.gov (Bryan Green)
Date: Mon, 18 Sep 2006 11:41:41 -0700
Subject: [openib-general] patch trouble
In-Reply-To: Your message of "Mon, 18 Sep 2006 21:22:41 +0300."
	<20060918182241.GE1371@mellanox.co.il>
Message-ID: <200609181841.k8IIffnc013148@ece06.nas.nasa.gov>

"Michael S. Tsirkin" writes:
> Quoting r. Bryan Green <bgreen at nas.nasa.gov>:
> > Subject: Re: patch trouble
> >  
> > The problem with that tar file is that it contains far more than the
> > '.tgz' file.  It also contains some large source rpms.  The whole thing
> > is 47 Megs, of which only about 12 Megs is of interest to me.
> 
> So, I guess you want to just remove the rest of the stuff?

Well, ideally, it would be possible to download a tarball that
contained the contents of
https://openfabrics.org/svn/gen2/branches/1.1/src/userspace/, minus the
mvapich subdirs, which could perhaps be a separate tgz as they are very
large on their own.

> 
> > If there was a single downloadable openib.tgz file, I could
> > build the kernel modules as well as the userspace tools from that.
> 
> We can try looking into that. So what do you want it to include?
> We currently only target RPM based distros.
> Are you willing to maintain the gentoo support?
> Maybe after each release candidate you can prepare the tarball
> for gentoo and upload it?
 
I'm willing to maintain the gentoo support for the gentoo science
overlay.  That isn't a problem.  I have already constructed gentoo-ified
versions of the OFED scripts found in '1.0/ofed/openib/scripts'.
I could potentially upload tarballs as you suggest.  For now, I'm going to
stick with subversion until my ebuilds are committed to the science
overlay and there is potential interest in them from others.  We will have
to see.  If there are others in the gentoo community who would benefit
from it, I'd be happy to produce and upload the tarfiles as my schedule
permits, if openfabrics.org would host them.


> > As for the kernel modules, I will stick with whats in the kernel for now,
> > though I look forward to SDP being added to the main line, as I'm getting
> > some rather nice performance from it.
> 
> Hmm. Which kernel do you run?
> If you have 2.6.18, it's easy to add just SDP as an out of kernel module.

Usually our cluster has a very up-to-date kernel.  Right now I'm being
forced to downgrade it temporarily to 2.6.12 in order to evaluate Lustre.
In the future, I plan to do as you suggest, and add SDP as an out of
kernel module.

-bryan


From bgreen at nas.nasa.gov  Mon Sep 18 11:52:40 2006
From: bgreen at nas.nasa.gov (Bryan Green)
Date: Mon, 18 Sep 2006 11:52:40 -0700
Subject: [openib-general] patch trouble
In-Reply-To: Your message of "Mon, 18 Sep 2006 21:22:41 +0300."
	<20060918182241.GE1371@mellanox.co.il>
Message-ID: <200609181852.k8IIqeMH013506@ece06.nas.nasa.gov>

"Michael S. Tsirkin" writes:
> Quoting r. Bryan Green <bgreen at nas.nasa.gov>:
> > Subject: Re: patch trouble
> > 
> >  
> > The problem with that tar file is that it contains far more than the
> > '.tgz' file.  It also contains some large source rpms.  The whole thing is
> > 47 Megs, of which only about 12 Megs is of interest to me.
> 
> So, I guess you want to just remove the rest of the stuff?

I forgot to mention in the previous email - a separate tar file with kernel
module code and patches would be nice too.  Just to add to my previous
wish list.  ;)

-bryan


From bgreen at nas.nasa.gov  Mon Sep 18 11:56:58 2006
From: bgreen at nas.nasa.gov (Bryan Green)
Date: Mon, 18 Sep 2006 11:56:58 -0700
Subject: [openib-general] patch trouble
In-Reply-To: Your message of "Mon, 18 Sep 2006 11:21:23 PDT."
	<adavenlniho.fsf@cisco.com>
Message-ID: <200609181856.k8IIuwC9013626@ece06.nas.nasa.gov>

Roland Dreier writes:
> 
> For libibverbs and libmthca at least, I am careful to keep the
> releases on http://openib.org/downloads/ up to date.  For example, you
> can find
> 
>     http://openib.org/downloads/libibverbs-1.0.3.tar.gz
>     http://openib.org/downloads/libmthca-1.0.2.tar.gz
> 
> there, which are the latest stable releases as of now.

Yes, my first experimental ebuild was based on your libibverbs package. :)
But then I wanted more (namely libsdp and mvapich2) and moved on to svn.
Thanks for the packages.  I was wondering why those were the only ones.

> 
> Anyway, for whatever reason, I seem to be the only openib person who
> really cares about distro inclusion of stuff, but I'm happy to do
> things that make your job as a packager easier, at least for my
> userspace packages (libibverbs and libmthca).  Just let me know.  I've
> already gotten those packages into mainline Debian, Ubuntu and Fedora
> Extras repositories, and I'd be happy to see them in Gentoo as well.

Cool, thanks for the support.  :)

-bryan


From kliteyn at dev.mellanox.co.il  Mon Sep 18 11:52:09 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 18 Sep 2006 21:52:09 +0300
Subject: [openib-general] [PATCH] osm: bug in OpenSM on broken fabrics
Message-ID: <450EEAD9.1000503@dev.mellanox.co.il>

Hi Hal

This patch fixes a bug in opensm that was discovered on
a 'broken' fabric when opensm was executed with --stay_on_fatal.
Replacing assert with a real check.
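
As a rough illustration of the difference (a userspace sketch with
made-up types and names, not the OpenSM code): an assertion either
aborts the process or, if it is compiled out, lets the invalid port
through, whereas an explicit check can log the problem and leave the
handler cleanly.

```c
#include <stdio.h>

/* Hypothetical stand-in for a physical port object. */
struct physp {
	int valid;
	int port_num;
};

/* Returns 0 on success, -1 if the port is unusable and processing stops. */
static int process_port(const struct physp *p)
{
	if (!p || !p->valid) {
		fprintf(stderr, "invalid physical port found, aborting discovery\n");
		return -1;
	}
	/* ... normal processing of the port would go here ... */
	return 0;
}
```

A caller passes each discovered port to process_port() and skips the
rest of the handler when it returns -1.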

Yevgeny

Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Index: opensm/osm_node_info_rcv.c
===================================================================
--- opensm/osm_node_info_rcv.c  (revision 9527)
+++ opensm/osm_node_info_rcv.c  (working copy)
@@ -543,7 +543,13 @@ __osm_ni_rcv_process_ca_port(
      p_physp = osm_node_get_physp_ptr( p_node, port_num );

      CL_ASSERT( p_physp );
-    CL_ASSERT( osm_physp_is_valid( p_physp ) );
+    if (!osm_physp_is_valid( p_physp ))
+    {
+        osm_log( p_rcv->p_log, OSM_LOG_ERROR,
+                 "__osm_ni_rcv_process_ca_port: ERR 0D19: "
+                 "Invalid physical port found. Aborting discovery.\n");
+        goto Exit;
+    }

      /*
        Update the DR Path to the port,


From mst at mellanox.co.il  Mon Sep 18 12:05:05 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Sep 2006 22:05:05 +0300
Subject: [openib-general] patch trouble
In-Reply-To: <200609181856.k8IIuwC9013626@ece06.nas.nasa.gov>
References: <200609181856.k8IIuwC9013626@ece06.nas.nasa.gov>
Message-ID: <20060918190505.GG1371@mellanox.co.il>

Quoting r. Bryan Green <bgreen at nas.nasa.gov>:
> But then I wanted more (namely libsdp and mvapich2) and moved on to the svn.
> Thanks for the packages.  I was wondering why those were the only ones.

libsdp (as well as SDP itself) is still in beta, that's why we didn't
publish any releases yet.

-- 
MST


From mst at mellanox.co.il  Mon Sep 18 12:13:53 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Sep 2006 22:13:53 +0300
Subject: [openib-general] patch trouble
In-Reply-To: <200609181852.k8IIqeMH013506@ece06.nas.nasa.gov>
References: <20060918182241.GE1371@mellanox.co.il>
	<200609181852.k8IIqeMH013506@ece06.nas.nasa.gov>
Message-ID: <20060918191353.GH1371@mellanox.co.il>

Quoting r. Bryan Green <bgreen at nas.nasa.gov>:
> Subject: Re: patch trouble
> 
> "Michael S. Tsirkin" writes:
> > Quoting r. Bryan Green <bgreen at nas.nasa.gov>:
> > > Subject: Re: patch trouble
> > > 
> > >  
> > > The problem with that tar file is that it contains far more than the
> > > '.tgz' file.  It also contains some large source rpms.  The whole thing is
> > > 47 Megs, of which only about 12 Megs is of interest to me.
> > 
> > So, I guess you want to just remove the rest of the stuff?
> 
> I forgot to mention in the previous email - a separate tar file with kernel
> module code and patches would be nice too.  Just to add to my previous
> wish list.  ;)

OK, so you'll be the first user for kernel/user split that I
wanted to do right after 1.1.

-- 
MST


From mst at mellanox.co.il  Mon Sep 18 12:17:08 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 18 Sep 2006 22:17:08 +0300
Subject: [openib-general] [PATCH] IB/sa: fix ib_sa_selector names
Message-ID: <20060918191708.GI1371@mellanox.co.il>

IB/sa: Fix ib_sa_selector names

Relevant SA queries are actually "greater than"
not "greater than or equal" as the name implies.
See IB spec 1.2 Vol 1, 15.2.5.16 PATHRECORD/Table 205 PathRecord.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

Index: linux-2.6.18-rc2-devel/include/rdma/ib_sa.h
===================================================================
--- linux-2.6.18-rc2-devel.orig/include/rdma/ib_sa.h	2006-09-17 11:54:38.000000000 +0300
+++ linux-2.6.18-rc2-devel/include/rdma/ib_sa.h	2006-09-17 11:54:51.000000000 +0300
@@ -79,8 +79,8 @@ enum {
 };
 
 enum ib_sa_selector {
-	IB_SA_GTE  = 0,
-	IB_SA_LTE  = 1,
+	IB_SA_GT   = 0,
+	IB_SA_LT   = 1,
 	IB_SA_EQ   = 2,
 	/*
 	 * The meaning of "best" depends on the attribute: for

-- 
MST


From eitan at mellanox.co.il  Mon Sep 18 13:11:35 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Mon, 18 Sep 2006 23:11:35 +0300
Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack :
 include/opensm/osm_pkey.h
In-Reply-To: <1158576813.18842.935.camel@hal.voltaire.com>
References: <86zmcylc2e.fsf@mtl066.yok.mtl.com>
	<1158576813.18842.935.camel@hal.voltaire.com>
Message-ID: <450EFD77.7000605@mellanox.co.il>

Hal Rosenstock wrote:

>Hi Eitan,
>
>On Sun, 2006-09-17 at 11:58, Eitan Zahavi wrote:
>  
>
>>Hi Hal
>>
>>Partition tables blocks are always 16 bits. 
>>This resolves the need to later cast back and forth.
>>
>>Thanks
>>
>>Eitan
>>
>>Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>
>>
>>Index: include/opensm/osm_pkey.h
>>===================================================================
>>--- include/opensm/osm_pkey.h	(revision 9502)
>>+++ include/opensm/osm_pkey.h	(working copy)
>>@@ -143,7 +143,7 @@ typedef struct _osm_pkey_tbl
>> typedef struct _osm_pending_pkey {
>>   cl_list_item_t	list_item;
>>   uint16_t		pkey;
>>-  uint32_t		block;
>>+  uint16_t		block;
>>   uint8_t		index;
>>   boolean_t		is_new;
>> } osm_pending_pkey_t;
>>@@ -396,7 +396,7 @@ ib_api_status_t
>> osm_pkey_tbl_get_block_and_idx(
>>   IN  osm_pkey_tbl_t *p_pkey_tbl, 
>>   IN  uint16_t       *p_pkey,
>>-  OUT uint32_t       *block_idx,
>>+  OUT uint16_t       *block_idx,
>>   OUT uint8_t        *pkey_index);
>> /*
>> *  p_pkey_tbl
>>    
>>
>
>Doesn't this require at least a similar change to
>opensm/osm_pkey.c:osm_pkey_tbl_get_block_and_idx ? Anything else ?
>  
>
Sure this affects the osm_pkey.c.
It is included in the mail named:  [PATCH 10/13] osm: port to WinIB 
stack : opensm/osm_pkey.c

>-- Hal
>
>
>_______________________________________________
>openib-general mailing list
>openib-general at openib.org
>http://openib.org/mailman/listinfo/openib-general
>
>To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>  
>


From eitan at mellanox.co.il  Mon Sep 18 13:17:20 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Mon, 18 Sep 2006 23:17:20 +0300
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 devices
In-Reply-To: <D80D83302DEE6249A221093BF2BB69AE8EF020@mail.silverstorm.com>
References: <D80D83302DEE6249A221093BF2BB69AE8EF020@mail.silverstorm.com>
Message-ID: <450EFED0.7050909@mellanox.co.il>

Rimmer, Todd wrote:

>>From: Michael S. Tsirkin [mailto:mst at mellanox.co.il]
>>Sent: Monday, September 18, 2006 2:06 PM
>>To: Rimmer, Todd
>>Cc: Eitan Zahavi; Or Gerlitz; OPENIB
>>Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
>>MT23108 devices
>>
>>Quoting r. Rimmer, Todd <trimmer at silverstorm.com>:
>>>Subject: RE: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
>>>MT23108 devices
>>>
>>>>From: Eitan Zahavi [mailto:eitan at mellanox.co.il]
>>>>Sent: Monday, September 18, 2006 11:20 AM
>>>>To: Rimmer, Todd
>>>>Cc: Or Gerlitz; Michael S. Tsirkin; OPENIB
>>>>Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
>>>>MT23108 devices
>>>>
>>>>Hi Todd,
>>>>
>>>>Seems like your knowledge about the specific MTU best for the
>>>>application (MPI) you are running is good
>>>>enough such that you will be able to include the MTU in the PathRecord
>>>>request and thus the patch described here will not affect your MPI at
>>>>all.
>>>>The patch only applies if your request does not  provide any MTU & MTU
>>>>SEL comp_mask
>>>>
>>>Eitan,
>>>
>>>The question is not about "our MPI", rather it's to ensure the Open
>>>Fabrics and OFED included MPIs and ULPs are capable of being tuned for
>>>optimal performance.  When a fabric runs more than 1 application, it's
>>>necessary to be able to tune this at the MPI, SDP, etc level, not at the
>>>SM level.
>>>
>>We did not remove this ability at all. So it's there.
>>
>>>In order to be complete, this patch would need to
>>>include ULP level tunability in all the relevant ULPs (MPI, SDP, uDAPL,
>>>etc) to select the "MAX MTU" to use or to request.
>>>
>>This tunability is already there - that's what MTU selector in path
>>queries does.
>>
>>>This then begs the question, if proper tuning requires all the ULPs to
>>>have a configurable MAX MTU, why should the SA need to implement the
>>>quirk at all?
>>>
>>If ULP wants MAX MTU, it must set MTU selector to 3 in path query.
>>
>>If MTU selector is disabled in the query, SM will guess which MTU is best
>>to select. SM used a specific heuristic to perform that guess.  All we did
>>is, provide an option to use a different heuristic.
>>
>>This is useful because, SM has data on the whole fabric as opposed to ULPs
>>which often only have data on the endnode.
>>
>
>The patch you submitted only modified Open SM.  So please show me the
>patch where MVAPICH, Open MPI, SDP, SRP and other ULPs allow this to be
>tuned by the user or application?  Lacking that patch, all the "if a ULP
>wants" statements above are moot.  The goal is for OFED to provide a
>high performance standard solution.  If end users must modify the ULPs
>source code to achieve that goal, OFED misses the mark.
>  
>
To the best of our knowledge, the change which automatically selects 1K MTU
for the above ULPs improves their performance.
Do you have any measurements on OFED 1.1 that show otherwise? Under what 
cases?
If this is the case then you basically do not have to do anything and 
all just works as it used to.
But if we are correct then a user can create an OpenSM cache file, 
modify the enable_quirks to TRUE and restart the SM.
I am sure you can imagine this patch was not just another way for us 
to spend our time...

>Todd Rimmer
>
>  
>


From trimmer at silverstorm.com  Mon Sep 18 13:30:53 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Mon, 18 Sep 2006 16:30:53 -0400
Subject: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
 MT23108 devices
In-Reply-To: <450EFED0.7050909@mellanox.co.il>
Message-ID: <D80D83302DEE6249A221093BF2BB69AE8EF09C@mail.silverstorm.com>

> From: Eitan Zahavi [mailto:eitan at mellanox.co.il]
> Sent: Monday, September 18, 2006 4:17 PM
> To: Rimmer, Todd
> Cc: Michael S. Tsirkin; Or Gerlitz; OPENIB
> Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for
> MT23108 devices
> 

> >The patch you submitted only modified Open SM.  So please show me the
> >patch where MVAPICH, Open MPI, SDP, SRP and other ULPs allow this to be
> >tuned by the user or application?  Lacking that patch, all the "if a ULP
> >wants" statements above are moot.  The goal is for OFED to provide a
> >high performance standard solution.  If end users must modify the ULPs
> >source code to achieve that goal, OFED misses the mark.
> >
> >
> To our best knowledge the change which automatically selects 1K MTU for
> the above ULPs improves their performance.
> Do you have any measurement on OFED 1.1 that shows otherwise? Under what
> cases?
> If this is the case then you basically do not have to do anything and
> all just works as it used to.
> But if we are correct then a user can create an OpenSM cache file,
> modify the enable_quirks to TRUE and restart the SM.
> I am sure you could imagine this patch was not just another way for us
> to spend our time...
> 

In my previous posting I included MPI performance numbers which showed
that performance for message sizes between 1K and 2K was reduced when
using 2K MTU.  This also applies to SDP when an existing application is
using message sizes in this range (which is quite common).

Having control only in the SM is not sufficient.  Having to bounce the
SM is even worse.

Todd Rimmer


From halr at voltaire.com  Mon Sep 18 13:46:58 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 16:46:58 -0400
Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack :
 include/opensm/osm_pkey.h
In-Reply-To: <450EFD77.7000605@mellanox.co.il>
References: <86zmcylc2e.fsf@mtl066.yok.mtl.com>
	<1158576813.18842.935.camel@hal.voltaire.com>
	<450EFD77.7000605@mellanox.co.il>
Message-ID: <1158612363.18842.19619.camel@hal.voltaire.com>

On Mon, 2006-09-18 at 16:11, Eitan Zahavi wrote:
> Hal Rosenstock wrote:
> 
> >Hi Eitan,
> >
> >On Sun, 2006-09-17 at 11:58, Eitan Zahavi wrote:
> >  
> >
> >>Hi Hal
> >>
> >>Partition tables blocks are always 16 bits. 
> >>This resolves the need to later cast back and forth.
> >>
> >>Thanks
> >>
> >>Eitan
> >>
> >>Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>
> >>
> >>Index: include/opensm/osm_pkey.h
> >>===================================================================
> >>--- include/opensm/osm_pkey.h	(revision 9502)
> >>+++ include/opensm/osm_pkey.h	(working copy)
> >>@@ -143,7 +143,7 @@ typedef struct _osm_pkey_tbl
> >> typedef struct _osm_pending_pkey {
> >>   cl_list_item_t	list_item;
> >>   uint16_t		pkey;
> >>-  uint32_t		block;
> >>+  uint16_t		block;
> >>   uint8_t		index;
> >>   boolean_t		is_new;
> >> } osm_pending_pkey_t;
> >>@@ -396,7 +396,7 @@ ib_api_status_t
> >> osm_pkey_tbl_get_block_and_idx(
> >>   IN  osm_pkey_tbl_t *p_pkey_tbl, 
> >>   IN  uint16_t       *p_pkey,
> >>-  OUT uint32_t       *block_idx,
> >>+  OUT uint16_t       *block_idx,
> >>   OUT uint8_t        *pkey_index);
> >> /*
> >> *  p_pkey_tbl
> >>    
> >>
> >
> >Doesn't this require at least a similar change to
> >opensm/osm_pkey.c:osm_pkey_tbl_get_block_and_idx ? Anything else ?
> >  
> >
> Sure this affects the osm_pkey.c.
> It is included in the mail named:  [PATCH 10/13] osm: port to WinIB 
> stack : opensm/osm_pkey.c

Each patch should be able to stand on its own (so 2 and 10 should have
been one patch). No need to resubmit for this.

-- Hal

> >-- Hal
> >
> >
> >  
> >
> 


From rdreier at cisco.com  Mon Sep 18 14:24:05 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 18 Sep 2006 14:24:05 -0700
Subject: [openib-general] [PATCH] IB/sa: fix ib_sa_selector names
In-Reply-To: <20060918191708.GI1371@mellanox.co.il> (Michael S.
	Tsirkin's message of "Mon, 18 Sep 2006 22:17:08 +0300")
References: <20060918191708.GI1371@mellanox.co.il>
Message-ID: <adamz8woolm.fsf@cisco.com>

Thanks, queued for 2.6.19.

 > Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
 > 
 > Index: linux-2.6.18-rc2-devel/include/rdma/ib_sa.h
 > ===================================================================

One trivial request: can you make sure your patches have a "---" line
between the patch description and the actual patch?  That way git
tools can just apply the patch automagically for me.

 - R.


From rdreier at cisco.com  Mon Sep 18 14:26:34 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 18 Sep 2006 14:26:34 -0700
Subject: [openib-general] [PATCH] IB/sa: fix ib_sa_selector names
In-Reply-To: <20060918191708.GI1371@mellanox.co.il> (Michael S.
	Tsirkin's message of "Mon, 18 Sep 2006 22:17:08 +0300")
References: <20060918191708.GI1371@mellanox.co.il>
Message-ID: <adairjkoohh.fsf@cisco.com>

BTW, I think this means your original IPoIB patch that did:

 > +	path->pathrec.mtu            = priv->broadcast->mcmember.mtu;
 > +	path->pathrec.mtu_selector   = IB_SA_GTE;

now needs to do something like

+	path->pathrec.mtu            = max(IB_MTU_256, priv->broadcast->mcmember.mtu - 1);
+	path->pathrec.mtu_selector   = IB_SA_GT;

right?

The strict inequality semantics defined by the spec are somewhat more
awkward to actually use :(
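
A rough standalone sketch of that conversion, assuming the usual IB MTU
encoding (1 = 256 bytes up to 5 = 4096 bytes); the type and helper
names below are made up for illustration and are not the kernel's:

```c
#include <assert.h>

/* Assumed IB MTU encoding: 1 = 256 bytes ... 5 = 4096 bytes. */
enum mtu_code { MTU_256 = 1, MTU_512, MTU_1024, MTU_2048, MTU_4096 };

/* Selector values as in the ib_sa_selector patch. */
enum selector { SEL_GT = 0, SEL_LT = 1, SEL_EQ = 2 };

struct mtu_query {
	enum selector sel;
	int mtu;
};

/* Does a candidate path MTU satisfy the query? */
static int mtu_matches(struct mtu_query q, int candidate)
{
	switch (q.sel) {
	case SEL_GT: return candidate > q.mtu;
	case SEL_LT: return candidate < q.mtu;
	case SEL_EQ: return candidate == q.mtu;
	}
	return 0;
}

/*
 * Hypothetical helper: express "at least wanted" with the strict
 * "greater than" selector by asking for "> wanted - 1".  Clamping to
 * MTU_256 keeps the value a valid code, at the price of turning
 * "at least 256" into "more than 256".
 */
static struct mtu_query at_least(int wanted)
{
	struct mtu_query q;

	assert(wanted >= MTU_256 && wanted <= MTU_4096);
	q.sel = SEL_GT;
	q.mtu = wanted - 1 > MTU_256 ? wanted - 1 : MTU_256;
	return q;
}
```

With this, at_least(MTU_2048) accepts 2048 and 4096 but rejects 1024,
while the clamped at_least(MTU_256) no longer accepts 256 itself, which
is the awkward corner.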

 - R.


From rdreier at cisco.com  Mon Sep 18 15:34:10 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 18 Sep 2006 15:34:10 -0700
Subject: [openib-general] Bug in OpenSM multicast group creation?
Message-ID: <ada1wq8olct.fsf@cisco.com>

Around line 1340 of osm_sa_mcmember_record.c, there is the code:

  /* the mcmember_record should have mtu_sel, rate_sel and pkt_lifetime_sel = 2 */
  (*pp_mgrp)->mcmember_rec.mtu |= 2<<6; /* exactly */
  (*pp_mgrp)->mcmember_rec.rate |= 2<<6; /* exactly */
  (*pp_mgrp)->mcmember_rec.pkt_life |= 2<<6; /* exactly */

  /* Initialize the mgrp */
  (*pp_mgrp)->mcmember_rec = mcm_rec;
  (*pp_mgrp)->mcmember_rec.mlid = mlid;

I don't know exactly what this is trying to do, but it looks very
fishy to me: as far as I can see, the second block of code overwrites
the effects of the first three lines.  So either those "/* exactly */"
lines aren't needed, or they need to be moved after the mgrp is
initialized.

 - R.


From halr at voltaire.com  Mon Sep 18 16:30:33 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 19:30:33 -0400
Subject: [openib-general] Bug in OpenSM multicast group creation?
In-Reply-To: <ada1wq8olct.fsf@cisco.com>
References: <ada1wq8olct.fsf@cisco.com>
Message-ID: <1158622202.18842.23874.camel@hal.voltaire.com>

On Mon, 2006-09-18 at 18:34, Roland Dreier wrote:
> Around line 1340 of osm_sa_mcmember_record.c, there is the code:
> 
>   /* the mcmember_record should have mtu_sel, rate_sel and pkt_lifetime_sel = 2 */
>   (*pp_mgrp)->mcmember_rec.mtu |= 2<<6; /* exactly */
>   (*pp_mgrp)->mcmember_rec.rate |= 2<<6; /* exactly */
>   (*pp_mgrp)->mcmember_rec.pkt_life |= 2<<6; /* exactly */
> 
>   /* Initialize the mgrp */
>   (*pp_mgrp)->mcmember_rec = mcm_rec;
>   (*pp_mgrp)->mcmember_rec.mlid = mlid;
> 
> I don't know exactly what this is trying to do,

The response is required to have the selectors set to exactly regardless
of what they were in the request.

> but it looks very fishy to me:

Now that you point it out, me too :-(

>  as far as I can see, the second block of code overwrites
> the effects of the first three lines.  So either those "/* exactly */"
> lines aren't needed, or they need to be moved after the mgrp is
> initialized.

It appears to me that they should be moved after those 2 lines of mgrp
initialization.
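
For illustration, a small standalone sketch of that ordering, assuming
the selector sits in the top two bits of each byte and the value in the
low six (the struct and function names are invented for the example,
not OpenSM's):

```c
#include <stdint.h>

#define SEL_EXACTLY 2u    /* selector value meaning "exactly" */
#define VALUE_MASK  0x3fu /* low six bits carry the MTU/rate/lifetime value */

/* Simplified stand-in for the selector-carrying record fields. */
struct mc_rec {
	uint8_t mtu;
	uint8_t rate;
	uint8_t pkt_life;
};

/* Clear whatever selector the request carried, then force "exactly". */
static uint8_t force_exactly(uint8_t field)
{
	return (uint8_t)((field & VALUE_MASK) | (SEL_EXACTLY << 6));
}

/*
 * Copy the request first, then fix up the selectors.  Doing it the
 * other way round would let the copy undo the fix-up.
 */
static void init_group_rec(struct mc_rec *grp, const struct mc_rec *req)
{
	*grp = *req;
	grp->mtu = force_exactly(grp->mtu);
	grp->rate = force_exactly(grp->rate);
	grp->pkt_life = force_exactly(grp->pkt_life);
}
```

Masking before OR-ing matters when the request carried a different
selector: a plain |= on a field whose selector was 1 or 3 would leave
it at 3 rather than 2.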

-- Hal

>  - R.


From rdreier at cisco.com  Mon Sep 18 17:07:26 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 18 Sep 2006 17:07:26 -0700
Subject: [openib-general] Fwd: IPoIB Multicast
In-Reply-To: <adafyeuupa3.fsf@cisco.com> (Roland Dreier's message of
	"Thu, 14 Sep 2006 14:11:32 -0700")
References: <20060914141901.GG25691@mellanox.co.il> <adafyeuupa3.fsf@cisco.com>
Message-ID: <adawt80n2gx.fsf@cisco.com>

Here's a patch that tries to fix this.  I only tried it with the Cisco
embedded SM, so someone should probably check that this doesn't break
under OpenSM.

Look OK?

 - R.


IPoIB: Create MCGs with all attributes required by RFC

RFC 4391 ("Transmission of IP over InfiniBand (IPoIB)") says:

  If the IB multicast group does not already exist, one must be
  created first with the IPoIB link MTU.  The MGID MUST use the same
  P_Key, Q_Key, SL, MTU, and HopLimit as those used in the
  broadcast-GID.  The rest of attributes SHOULD follow the values used
  in the broadcast-GID as well.

However, the current IPoIB driver is only setting the attributes
required by the InfiniBand spec to create a multicast group, so in
particular the MTU and HopLimit are not being set.  Add these
attributes when creating MCGs, and also set the Rate attribute, since
IPoIB pays attention to that attribute as well.

Signed-off-by: Roland Dreier <rolandd at cisco.com>

---

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index fb3e487..3faa182 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -472,15 +472,25 @@ static void ipoib_mcast_join(struct net_
 
 	if (create) {
 		comp_mask |=
-			IB_SA_MCMEMBER_REC_QKEY		|
-			IB_SA_MCMEMBER_REC_SL		|
-			IB_SA_MCMEMBER_REC_FLOW_LABEL	|
-			IB_SA_MCMEMBER_REC_TRAFFIC_CLASS;
+			IB_SA_MCMEMBER_REC_QKEY			|
+			IB_SA_MCMEMBER_REC_MTU_SELECTOR		|
+			IB_SA_MCMEMBER_REC_MTU			|
+			IB_SA_MCMEMBER_REC_TRAFFIC_CLASS	|
+			IB_SA_MCMEMBER_REC_RATE_SELECTOR	|
+			IB_SA_MCMEMBER_REC_RATE			|
+			IB_SA_MCMEMBER_REC_SL			|
+			IB_SA_MCMEMBER_REC_FLOW_LABEL		|
+			IB_SA_MCMEMBER_REC_HOP_LIMIT;
 
 		rec.qkey	  = priv->broadcast->mcmember.qkey;
+		rec.mtu_selector  = IB_SA_EQ;
+		rec.mtu		  = priv->broadcast->mcmember.mtu;
+		rec.traffic_class = priv->broadcast->mcmember.traffic_class;
+		rec.rate_selector = IB_SA_EQ;
+		rec.rate	  = priv->broadcast->mcmember.rate;
 		rec.sl		  = priv->broadcast->mcmember.sl;
 		rec.flow_label	  = priv->broadcast->mcmember.flow_label;
-		rec.traffic_class = priv->broadcast->mcmember.traffic_class;
+		rec.hop_limit	  = priv->broadcast->mcmember.hop_limit;
 	}
 
 	init_completion(&mcast->done);


From halr at voltaire.com  Mon Sep 18 17:30:37 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 18 Sep 2006 20:30:37 -0400
Subject: [openib-general] [PATCH] OpenSM/osm_sa_mcmember_record.c: In
 osm_mcmr_rcv_create_new_mgrp, fix exactly selectors in response
Message-ID: <1158625818.18842.25479.camel@hal.voltaire.com>

OpenSM/osm_sa_mcmember_record.c: In osm_mcmr_rcv_create_new_mgrp, set
exactly selectors after rather than before mgrp is initialized

Pointed out by: Roland Dreier <rdreier at cisco.com>

Signed-off-by: Hal Rosenstock <halr at voltaire.com>

Index: opensm/osm_sa_mcmember_record.c
===================================================================
--- opensm/osm_sa_mcmember_record.c	(revision 9347)
+++ opensm/osm_sa_mcmember_record.c	(working copy)
@@ -1337,15 +1337,18 @@ osm_mcmr_rcv_create_new_mgrp(
     goto Exit;
   }
 
-  /* the mcmember_record should have mtu_sel, rate_sel and pkt_lifetime_sel = 2 */
-  (*pp_mgrp)->mcmember_rec.mtu |= 2<<6; /* exactly */
-  (*pp_mgrp)->mcmember_rec.rate |= 2<<6; /* exactly */
-  (*pp_mgrp)->mcmember_rec.pkt_life |= 2<<6; /* exactly */
-
   /* Initialize the mgrp */
   (*pp_mgrp)->mcmember_rec = mcm_rec;
   (*pp_mgrp)->mcmember_rec.mlid = mlid;
 
+  /* the mcmember_record should have mtu_sel, rate_sel, and pkt_lifetime_sel = 2 */
+  (*pp_mgrp)->mcmember_rec.mtu &= 0x3f;
+  (*pp_mgrp)->mcmember_rec.mtu |= 2<<6; /* exactly */
+  (*pp_mgrp)->mcmember_rec.rate &= 0x3f;
+  (*pp_mgrp)->mcmember_rec.rate |= 2<<6; /* exactly */
+  (*pp_mgrp)->mcmember_rec.pkt_life &= 0x3f;
+  (*pp_mgrp)->mcmember_rec.pkt_life |= 2<<6; /* exactly */
+
   /* Insert the new group in the data base */
   
   /* since we might have an old group by that mlid


From mst at mellanox.co.il  Mon Sep 18 20:03:41 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 19 Sep 2006 06:03:41 +0300
Subject: [openib-general] [PATCH] IB/sa: fix ib_sa_selector names
In-Reply-To: <adairjkoohh.fsf@cisco.com>
References: <adairjkoohh.fsf@cisco.com>
Message-ID: <20060919030341.GA30563@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/sa: fix ib_sa_selector names
> 
> BTW, I think this means your original IPoIB patch that did:
> 
>  > +	path->pathrec.mtu            = priv->broadcast->mcmember.mtu;
>  > +	path->pathrec.mtu_selector   = IB_SA_GTE;
> 
> now needs to do something like
> 
> +	path->pathrec.mtu            = max(IB_MTU_256, priv->broadcast->mcmember.mtu - 1);
> +	path->pathrec.mtu_selector   = IB_SA_GT;
> 
> right?
> 
> The strict inequality semantics defined by the spec are somewhat more
> awkward to actually use :(

But they also happen to work :).
I'm testing that patch and will post it RSN.

-- 
MST


From mst at mellanox.co.il  Mon Sep 18 20:06:41 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 19 Sep 2006 06:06:41 +0300
Subject: [openib-general] [PATCH] IB/sa: fix ib_sa_selector names
In-Reply-To: <adamz8woolm.fsf@cisco.com>
References: <adamz8woolm.fsf@cisco.com>
Message-ID: <20060919030641.GB30563@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/sa: fix ib_sa_selector names
> 
> Thanks, queued for 2.6.19.
> 
>  > Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
>  > 
>  > Index: linux-2.6.18-rc2-devel/include/rdma/ib_sa.h
>  > ===================================================================
> 
> One trivial request: can you make sure your patches have a "---" line
> between the patch description and the actual patch?  That way git
> tools can just apply the patch automagically for me.

Sure. Are you using git-apply-mbox BTW?

-- 
MST


From mst at mellanox.co.il  Mon Sep 18 20:15:34 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 19 Sep 2006 06:15:34 +0300
Subject: [openib-general] Fwd: IPoIB Multicast
In-Reply-To: <adawt80n2gx.fsf@cisco.com>
References: <adawt80n2gx.fsf@cisco.com>
Message-ID: <20060919031534.GC30563@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: Fwd: IPoIB Multicast
> 
> Here's a patch that tries to fix this.  I only tried it with the Cisco
> embedded SM, so someone should probably check that this doesn't break
> under OpenSM.
> 
> Look OK?
> 
>  - R.


We've been testing the following which looks exactly equivalent.
I'll look at the regression results in the morning and will let you know.

Please note this fixes an actual issue for us: on a mixed
1x/4x or SDR/DDR network, if a group is created with the wrong
parameters, some nodes are unable to join.

-----------------------------------------------------------

IB/ipoib: make multicast group creation spec compliant

IPoIB spec says:
The MGID MUST use the same P_Key, Q_Key, SL, MTU and HopLimit as
those used in the broadcast-GID. For the rest of attributes too,
the values used in the broadcast-GID SHOULD be used.

IPoIB currently violates this rule, which breaks multicast
on heterogeneous networks.

Signed-off-by: Eli Cohen <eli at mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

Index: openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- openib-1.1.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2006-09-17 12:23:25.000000000 +0300
+++ openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2006-09-17 16:31:08.000000000 +0300
@@ -472,15 +472,25 @@ static void ipoib_mcast_join(struct net_
 
 	if (create) {
 		comp_mask |=
-			IB_SA_MCMEMBER_REC_QKEY		|
-			IB_SA_MCMEMBER_REC_SL		|
-			IB_SA_MCMEMBER_REC_FLOW_LABEL	|
-			IB_SA_MCMEMBER_REC_TRAFFIC_CLASS;
+			IB_SA_MCMEMBER_REC_QKEY          |
+			IB_SA_MCMEMBER_REC_SL		 |
+			IB_SA_MCMEMBER_REC_FLOW_LABEL	 |
+			IB_SA_MCMEMBER_REC_TRAFFIC_CLASS |
+			IB_SA_MCMEMBER_REC_RATE_SELECTOR |
+			IB_SA_MCMEMBER_REC_RATE          |
+			IB_SA_MCMEMBER_REC_HOP_LIMIT     |
+			IB_SA_MCMEMBER_REC_MTU_SELECTOR  |
+			IB_SA_MCMEMBER_REC_MTU;
 
 		rec.qkey	  = priv->broadcast->mcmember.qkey;
 		rec.sl		  = priv->broadcast->mcmember.sl;
 		rec.flow_label	  = priv->broadcast->mcmember.flow_label;
 		rec.traffic_class = priv->broadcast->mcmember.traffic_class;
+		rec.rate_selector = IB_SA_EQ;
+		rec.rate          = priv->broadcast->mcmember.rate;
+		rec.hop_limit     = priv->broadcast->mcmember.hop_limit;
+		rec.mtu_selector  = IB_SA_EQ;
+		rec.mtu           = priv->broadcast->mcmember.mtu;
 	}
 
 	init_completion(&mcast->done);

-- 
MST


From halr at voltaire.com  Mon Sep 18 22:07:25 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Sep 2006 01:07:25 -0400
Subject: [openib-general] [PATCH] osm: bug in OpenSM on broken fabrics
In-Reply-To: <450EEAD9.1000503@dev.mellanox.co.il>
References: <450EEAD9.1000503@dev.mellanox.co.il>
Message-ID: <1158642400.18842.33716.camel@hal.voltaire.com>

Hi Yevgeny,

On Mon, 2006-09-18 at 14:52, Yevgeny Kliteynik wrote:
> Hi Hal
> 
> This patch fixes a bug in opensm that was discovered on
> a 'broken' fabric when opensm was executed with --stay_on_fatal.
> Replacing assert with a real check.
> 
> Yevgeny
> 
> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Is this intended for trunk only or also 1.1 ?

-- Hal


From eli at dev.mellanox.co.il  Mon Sep 18 23:30:53 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Tue, 19 Sep 2006 09:30:53 +0300
Subject: [openib-general] ipoib multicast problem
Message-ID: <1158647453.5392.66.camel@localhost>

Hi,
I have seen the following problem with ipoib:

1. An application joins a multicast group as a full member. As a
result, all such groups are listed in dev->mclist.
2. The InfiniBand link goes down momentarily, opensm is restarted, etc.
3. All multicast memberships are flushed.
4. The net device does not rejoin until, at some later time, something
causes ipoib_set_mcast_list() to be called.
 

From eli at dev.mellanox.co.il  Mon Sep 18 23:31:14 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Tue, 19 Sep 2006 09:31:14 +0300
Subject: [openib-general]  [PATCH] ipoib mcast restart
Message-ID: <1158647474.5392.68.camel@localhost>

Make sure that after ipoib_ib_dev_flush is executed,
ipoib_mcast_restart_task is also executed, to rejoin all the
mcast groups maintained by the kernel for the device.

Signed-off-by: Eli Cohen <eli at mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

Index: openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===================================================================
--- openib-1.1.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2006-09-14 17:20:06.000000000 +0300
+++ openib-1.1/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2006-09-17 15:51:52.000000000 +0300
@@ -619,8 +619,10 @@
 	 * The device could have been brought down between the start and when
 	 * we get here, don't bring it back up if it's not configured up
 	 */
-	if (test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
+	if (test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) {
 		ipoib_ib_dev_up(dev);
+		ipoib_mcast_restart_task(dev);
+	}
 
 	mutex_lock(&priv->vlan_mutex);
 

From krkumar2 at in.ibm.com  Tue Sep 19 00:02:10 2006
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Tue, 19 Sep 2006 12:32:10 +0530
Subject: [openib-general] [PATCH] id_priv_list->list is not initialized
	sometimes
Message-ID: <20060919070210.5476.68607.sendpatchset@localhost.localdomain>

rdma_listen could be called from a context where id_priv->list
is not initialized. Then at a later stage, a cma_cancel_listen
does a list_del() which could oops since this element is not
on any list. 

Eg, in rdma_listen(), if id->device is !NULL, it calls
cma_ib_listen() which doesn't add this id to any list. A
cma_cancel_listen() will do a list_del.
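
To illustrate why the initialization matters, here is a stripped-down
userspace sketch of a doubly linked list in the style of the kernel's
list.h (re-implemented for the example, not the kernel code itself):
once an element points at itself, deleting it is harmless even if it
was never added to any list.

```c
struct list_head {
	struct list_head *next, *prev;
};

/* An initialized, empty head points at itself in both directions. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Unlink by patching the neighbours.  On a self-pointing entry this only
 * rewrites the entry itself; on an uninitialized one it would dereference
 * garbage pointers.
 */
static void list_del(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}
```

Calling list_del() on an element that went through init_list_head() but
was never passed to list_add() leaves every other list untouched.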

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
--------

diff -ruNp org/core/cma.c new/core/cma.c
--- org/core/cma.c	2006-09-14 15:31:27.000000000 +0530
+++ new/core/cma.c	2006-09-14 16:07:35.000000000 +0530
@@ -339,6 +339,7 @@ struct rdma_cm_id* rdma_create_id(rdma_c
 	atomic_set(&id_priv->dev_remove, 0);
 	INIT_LIST_HEAD(&id_priv->listen_list);
 	INIT_LIST_HEAD(&id_priv->mc_list);
+	INIT_LIST_HEAD(&id_priv->list);
 	get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
 
 	return &id_priv->id;


From krkumar2 at in.ibm.com  Tue Sep 19 00:02:06 2006
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Tue, 19 Sep 2006 12:32:06 +0530
Subject: [openib-general] [PATCH] ucma : Encapsulate duplicate code to
	common routine
Message-ID: <20060919070206.5476.64107.sendpatchset@localhost.localdomain>

Encapsulate duplicate code to common routine - avoid checking same
errors in multiple places.

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
--------

diff -ruNp org/core/ucma.c new/core/ucma.c
--- org/core/ucma.c	2006-09-18 17:38:12.000000000 +0530
+++ new/core/ucma.c	2006-09-18 17:39:34.000000000 +0530
@@ -87,20 +87,30 @@ struct ucma_event {
 static DEFINE_MUTEX(ctx_mutex);
 static DEFINE_IDR(ctx_idr);
 
-static struct ucma_context* ucma_get_ctx(struct ucma_file *file, int id)
+/* _ucma_find_context : internal find routine. Assumes ctx_mutex is held */
+static inline struct ucma_context* _ucma_find_context(int id)
 {
 	struct ucma_context *ctx;
 
-	mutex_lock(&ctx_mutex);
+	BUG_ON(!mutex_is_locked(&ctx_mutex));
+
 	ctx = idr_find(&ctx_idr, id);
 	if (!ctx)
 		ctx = ERR_PTR(-ENOENT);
 	else if (ctx->file != file)
 		ctx = ERR_PTR(-EINVAL);
-	else
+	return ctx;
+}
+
+static struct ucma_context* ucma_get_ctx(struct ucma_file *file, int id)
+{
+	struct ucma_context *ctx;
+
+	mutex_lock(&ctx_mutex);
+	ctx = _ucma_find_context(id);
+	if (!IS_ERR(ctx))
 		atomic_inc(&ctx->ref);
 	mutex_unlock(&ctx_mutex);
-
 	return ctx;
 }
 
@@ -354,12 +364,8 @@ static ssize_t ucma_destroy_id(struct uc
 		return -EFAULT;
 
 	mutex_lock(&ctx_mutex);
-	ctx = idr_find(&ctx_idr, cmd.id);
-	if (!ctx)
-		ctx = ERR_PTR(-ENOENT);
-	else if (ctx->file != file)
-		ctx = ERR_PTR(-EINVAL);
-	else
+	ctx = _ucma_find_context(cmd.id);
+	if (!IS_ERR(ctx))
 		idr_remove(&ctx_idr, ctx->id);
 	mutex_unlock(&ctx_mutex);
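
The refactoring pattern in this patch (one lookup helper that assumes the
lock is held, with thin callers that take the lock) can be sketched in
userspace as follows. All names here are hypothetical and the table is a
plain array rather than an IDR. Note that the helper as posted takes only
`id` while still comparing against `file`; the sketch passes the owner
explicitly, since the ownership check needs it.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_CTX 8

struct ctx {
	int id;
	const void *owner;	/* stands in for the file that created it */
	int refs;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ctx *table[MAX_CTX];

/*
 * Lookup helper: the caller must hold table_lock.
 * Returns 0 and stores the context in *out, or a negative errno.
 */
static int find_ctx_locked(const void *owner, int id, struct ctx **out)
{
	struct ctx *c;

	if (id < 0 || id >= MAX_CTX || !table[id])
		return -ENOENT;
	c = table[id];
	if (c->owner != owner)
		return -EINVAL;
	*out = c;
	return 0;
}

/* Look up a context and take a reference on it. */
static int get_ctx(const void *owner, int id, struct ctx **out)
{
	int ret;

	pthread_mutex_lock(&table_lock);
	ret = find_ctx_locked(owner, id, out);
	if (!ret)
		(*out)->refs++;
	pthread_mutex_unlock(&table_lock);
	return ret;
}

/* Look up a context and remove it from the table. */
static int remove_ctx(const void *owner, int id, struct ctx **out)
{
	int ret;

	pthread_mutex_lock(&table_lock);
	ret = find_ctx_locked(owner, id, out);
	if (!ret)
		table[id] = NULL;
	pthread_mutex_unlock(&table_lock);
	return ret;
}
```

Both callers share the same error checks, and each adds only the one
action that differs (take a reference, or remove the entry).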
 

From krkumar2 at in.ibm.com  Tue Sep 19 00:02:03 2006
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Tue, 19 Sep 2006 12:32:03 +0530
Subject: [openib-general] [PATCH] fix cma_leave_mc_groups
Message-ID: <20060919070203.5476.17650.sendpatchset@localhost.localdomain>

- mthca_multicast_detach - as an example, frees up a bit
  for re-use later so if it is not called during destroy_id,
  it *appears* that those bits (index) are leaked.

- cma_leave_mc_groups can race with other routines updating
  or reading the mclist, so use lock. Eg while doing a
  rdma_destroy_id(), other processes could be looking at
  this id and de-referencing mclist.

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
--------

diff -ruNp org/core/cma.c new/core/cma.c
--- org/core/cma.c	2006-09-18 16:00:41.000000000 +0530
+++ new/core/cma.c	2006-09-18 16:12:58.000000000 +0530
@@ -761,14 +761,24 @@ static void cma_release_port(struct rdma
 static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
 {
 	struct cma_multicast *mc;
+	unsigned long flags;
 
+	spin_lock_irqsave(&id_priv->lock, flags);
 	while (!list_empty(&id_priv->mc_list)) {
 		mc = container_of(id_priv->mc_list.next,
 				  struct cma_multicast, list);
 		list_del(&mc->list);
+		spin_unlock_irqrestore(&id_priv->lock, flags);
+		if (id_priv->id.qp) {
+			ib_detach_mcast(id_priv->id.qp,
+					&mc->multicast.ib->rec.mgid,
+					mc->multicast.ib->rec.mlid);
+		}
 		ib_free_multicast(mc->multicast.ib);
 		kfree(mc);
+		spin_lock_irqsave(&id_priv->lock, flags);
 	}
+	spin_unlock_irqrestore(&id_priv->lock, flags);
 }
 
 void rdma_destroy_id(struct rdma_cm_id *id)
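
The locking shape used in this patch (unlink one entry under the lock,
drop the lock for the teardown work, then re-take it before looking at
the list again) can be sketched in userspace like this. The types and
names are hypothetical, a pthread mutex stands in for the spinlock, and
detach_group() is a placeholder for work such as ib_detach_mcast() that
should not run with the lock held.

```c
#include <pthread.h>
#include <stdlib.h>

struct mc_entry {
	struct mc_entry *next;
	int group;
};

struct owner {
	pthread_mutex_t lock;
	struct mc_entry *mc_list;
};

/* Placeholder for teardown work that must not run under the lock. */
static void detach_group(struct mc_entry *mc)
{
	(void)mc;
}

/* Drain the list; returns the number of entries released. */
static int leave_all_groups(struct owner *o)
{
	struct mc_entry *mc;
	int released = 0;

	pthread_mutex_lock(&o->lock);
	while ((mc = o->mc_list) != NULL) {
		o->mc_list = mc->next;		/* unlink while holding the lock */
		pthread_mutex_unlock(&o->lock);

		detach_group(mc);		/* entry is private to us now */
		free(mc);
		released++;

		pthread_mutex_lock(&o->lock);	/* re-take before re-checking */
	}
	pthread_mutex_unlock(&o->lock);
	return released;
}
```

Because each entry is unlinked before the lock is dropped, other threads
walking the list under the lock never see an entry that is being freed.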


From krkumar2 at in.ibm.com  Tue Sep 19 00:02:14 2006
From: krkumar2 at in.ibm.com (Krishna Kumar)
Date: Tue, 19 Sep 2006 12:32:14 +0530
Subject: [openib-general] [PATCH] Typo in ib_set_client_data()
Message-ID: <20060919070214.5476.99212.sendpatchset@localhost.localdomain>

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
--------

diff -ruNp org/core/device.c new/core/device.c
--- org/core/device.c	2006-09-14 15:38:14.000000000 +0530
+++ new/core/device.c	2006-09-14 15:38:29.000000000 +0530
@@ -385,7 +385,7 @@ void *ib_get_client_data(struct ib_devic
 EXPORT_SYMBOL(ib_get_client_data);
 
 /**
- * ib_set_client_data - Get IB client context
+ * ib_set_client_data - Set IB client context
  * @device:Device to set context for
  * @client:Client to set context for
  * @data:Context to set


From mst at mellanox.co.il  Tue Sep 19 00:21:29 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 19 Sep 2006 10:21:29 +0300
Subject: [openib-general] [PATCH] Fix freed mem deref race in
 cma_process_remove/cma_req_handler
In-Reply-To: <20060918073545.26067.41763.sendpatchset@localhost.localdomain>
References: <20060918073545.26067.41763.sendpatchset@localhost.localdomain>
Message-ID: <20060919072129.GD31498@mellanox.co.il>

Quoting r. Krishna Kumar <krkumar2 at in.ibm.com>:
> Subject: [PATCH] Fix freed mem deref race in cma_process_remove/cma_req_handler
> 
> The race is as follows :
> 
> A process : cma_process_remove() calls cma_remove_id_dev(),
> 	    which sets id state to CMA_DEVICE_REMOVAL and
> 	    calls wait_event(dev_remove).
> 
> B process : cma_req_handler() had incremented dev_remove,
> 	    and calls cma_acquire_ib_dev() and on failure
> 	    calls cma_release_remove(), which does a
> 	    wake_up of cma_process_remove(). Then
> 	    cma_req_handler() calls rdma_destroy_id();
> 
> A Process : cma_remove_id_dev() gets woken and checks the
> 	    state of id, and since it is still (wrongly)
> 	    CMA_DEVICE_REMOVAL, it calls notify_user(id)
> 	    and if that fails, the caller - cma_process_remove()
> 	    calls rdma_destroy_id(id). Two processes can
> 	    call rdma_destroy_id(), resulting in one
> 	    de-referencing kfreed id_priv.
> 
> Fix is for process B to set CMA_DESTROYING in cma_req_handler()
> so that process A will return instead of doing a rdma_destroy_id().
> 
> Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>

Did you actually see these crashes?
If yes, this looks serious enough even for 2.6.18. Sean?
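
For readers following the race described above, here is a minimal
userspace sketch of the idea behind the fix: the error path marks the id
as being destroyed, and the remover only proceeds if the id is still in
the removal state. The names are hypothetical, and the sketch uses a C11
compare-and-exchange for the remover's check, which is a choice made for
this example and not necessarily how the patch does it. Object lifetime
is not modelled; the sketch only counts destroy calls.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum id_state { ID_CONNECT, ID_DEVICE_REMOVAL, ID_DESTROYING };

struct id_obj {
	_Atomic int state;
	int destroy_calls;	/* counts teardowns, for illustration only */
};

static void destroy_id(struct id_obj *id)
{
	id->destroy_calls++;
}

/* "Process B": the error path claims teardown before it destroys. */
static void handler_error_path(struct id_obj *id)
{
	atomic_store(&id->state, ID_DESTROYING);
	destroy_id(id);
}

/*
 * "Process A": after being woken, proceed only if the id is still in
 * the removal state. Returns false if someone else owns the teardown.
 */
static bool remover_after_wakeup(struct id_obj *id)
{
	int expected = ID_DEVICE_REMOVAL;

	if (!atomic_compare_exchange_strong(&id->state, &expected,
					    ID_DESTROYING))
		return false;
	destroy_id(id);
	return true;
}
```

Whichever path changes the state first owns the teardown, so
destroy_id() runs once per object.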

-- 
MST


From mst at mellanox.co.il  Tue Sep 19 00:25:09 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 19 Sep 2006 10:25:09 +0300
Subject: [openib-general] Fwd: [PATCH] id_priv_list->list is not initialized
	sometimes
Message-ID: <20060919072509.GE31498@mellanox.co.il>


----- Forwarded message from Krishna Kumar <krkumar2 at in.ibm.com> -----

From: "Krishna Kumar" <krkumar2 at in.ibm.com>
Date: Tue, 19 Sep 2006 12:32:10 +0530
Subject: [PATCH] id_priv_list->list is not initialized
 sometimes

rdma_listen could be called from a context where id_priv->list
is not initialized. Then at a later stage, a cma_cancel_listen
does a list_del() which could oops since this element is not
on any list. 

Eg, in rdma_listen(), if id->device is !NULL, it calls
cma_ib_listen() which doesn't add this id to any list. A
cma_cancel_listen() will do a list_del.

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
--------

diff -ruNp org/core/cma.c new/core/cma.c
--- org/core/cma.c	2006-09-14 15:31:27.000000000 +0530
+++ new/core/cma.c	2006-09-14 16:07:35.000000000 +0530
@@ -339,6 +339,7 @@ struct rdma_cm_id* rdma_create_id(rdma_c
 	atomic_set(&id_priv->dev_remove, 0);
 	INIT_LIST_HEAD(&id_priv->listen_list);
 	INIT_LIST_HEAD(&id_priv->mc_list);
+	INIT_LIST_HEAD(&id_priv->list);
 	get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
 
 	return &id_priv->id;

----- End forwarded message -----

Did you actually see these crashes?
If yes, this might need to be fixed even for 2.6.18. Sean?

-- 
MST


From krkumar2 at in.ibm.com  Tue Sep 19 00:42:15 2006
From: krkumar2 at in.ibm.com (Krishna Kumar2)
Date: Tue, 19 Sep 2006 13:12:15 +0530
Subject: [openib-general] Fwd: [PATCH] id_priv_list->list is not
 initialized sometimes
In-Reply-To: <20060919072509.GE31498@mellanox.co.il>
Message-ID: <OF484D857E.052E5A98-ON652571EE.002A347E-652571EE.0029B162@in.ibm.com>

Hi Michael,

> Did you actually see these crashes?
> If yes, this might need to be fixed even for 2.6.18. Sean?

No, I have not seen this crash; this is based on reading the code.

thanks,

- KK

openib-general-bounces at openib.org wrote on 09/19/2006 12:55:09 PM:

> 
> ----- Forwarded message from Krishna Kumar <krkumar2 at in.ibm.com> -----
> 
> From: "Krishna Kumar" <krkumar2 at in.ibm.com>
> Date: Tue, 19 Sep 2006 12:32:10 +0530
> Subject: [PATCH] id_priv_list->list is not initialized
>  sometimes
> 
> rdma_listen could be called from a context where id_priv->list
> is not initialized. Then at a later stage, a cma_cancel_listen
> does a list_del() which could oops since this element is not
> on any list. 
> 
> Eg, in rdma_listen(), if id->device is !NULL, it calls
> cma_ib_listen() which doesn't add this id to any list. A
> cma_cancel_listen() will do a list_del.
> 
> Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
> --------
> 
> diff -ruNp org/core/cma.c new/core/cma.c
> --- org/core/cma.c   2006-09-14 15:31:27.000000000 +0530
> +++ new/core/cma.c   2006-09-14 16:07:35.000000000 +0530
> @@ -339,6 +339,7 @@ struct rdma_cm_id* rdma_create_id(rdma_c
>     atomic_set(&id_priv->dev_remove, 0);
>     INIT_LIST_HEAD(&id_priv->listen_list);
>     INIT_LIST_HEAD(&id_priv->mc_list);
> +   INIT_LIST_HEAD(&id_priv->list);
>     get_random_bytes(&id_priv->seq_num, sizeof id_priv->seq_num);
> 
>     return &id_priv->id;
> 
> ----- End forwarded message -----
> 

> 
> -- 
> MST
> 


From mst at mellanox.co.il  Tue Sep 19 01:13:24 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 19 Sep 2006 11:13:24 +0300
Subject: [openib-general] [PATCH] mthca: fix lid used for sending traps
Message-ID: <20060919081324.GF31498@mellanox.co.il>

From: "Jack Morgenstein" <jackm at dev.mellanox.co.il>

SM lid was incorrectly set to port lid.  This is a regression from 2.6.17 -
after an event, no traps are sent to the SM LID - they go to the
loopback interface instead, and are typically dropped there.
The SM LID should be set to the sm_lid of the port info response.

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

Roland, this fixes a serious regression from 2.6.17.
The bug was introduced by commit 12bbb2b7be7f5564952ebe0196623e97464b8ac5:
	IB/mthca: Add client reregister event generation
I'm taking the fix into OFED 1.1 and I think it should go into 2.6.18 or
2.6.18.1.

Index: ofed_1_1/drivers/infiniband/hw/mthca/mthca_mad.c
===================================================================
--- ofed_1_1.orig/drivers/infiniband/hw/mthca/mthca_mad.c	2006-08-16 10:16:19.000000000 +0300
+++ ofed_1_1/drivers/infiniband/hw/mthca/mthca_mad.c	2006-09-19 10:33:31.280328000 +0300
@@ -119,7 +119,7 @@ static void smp_snoop(struct ib_device *
 
 			mthca_update_rate(to_mdev(ibdev), port_num);
 			update_sm_ah(to_mdev(ibdev), port_num,
-				     be16_to_cpu(pinfo->lid),
+				     be16_to_cpu(pinfo->sm_lid),
 				     pinfo->neighbormtu_mastersmsl & 0xf);
 
 			event.device           = ibdev;
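
The one-line change above picks a different field of the same structure.
A tiny userspace sketch of the distinction follows; the struct is a
hypothetical two-field stand-in for the PortInfo attribute, and ntohs()
plays the role of be16_to_cpu().

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Two PortInfo fields, kept in network (big-endian) byte order. */
struct port_info_lids {
	uint16_t lid;		/* this port's own LID */
	uint16_t sm_lid;	/* LID of the subnet manager */
};

/* Destination for traps: the SM's LID, in host byte order. */
static uint16_t trap_dest_lid(const struct port_info_lids *pinfo)
{
	return ntohs(pinfo->sm_lid);
}
```

Using `lid` here instead would address the traps to the local port,
which matches the loopback behaviour described in the changelog.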


-- 
MST


From mst at mellanox.co.il  Tue Sep 19 02:08:49 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 19 Sep 2006 12:08:49 +0300
Subject: [openib-general] Fwd: IPoIB Multicast
In-Reply-To: <20060919031534.GC30563@mellanox.co.il>
References: <adawt80n2gx.fsf@cisco.com> <20060919031534.GC30563@mellanox.co.il>
Message-ID: <20060919090849.GC32603@mellanox.co.il>

Quoting r. Michael S. Tsirkin <mst at mellanox.co.il>:
> Subject: Re: Fwd: IPoIB Multicast
> 
> Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: Fwd: IPoIB Multicast
> 
> Here's a patch that tries to fix this.  I only tried it with the Cisco
> embedded SM, so someone should probably check that this doesn't break
> under OpenSM.
> 
> Look OK?
> 
>  - R.
> 
> 
> We've been testing the following which looks exactly equivalent.
> I'll look at the regression results in the morning and will let you know.

Works OK here. Please commit. Please note this does fix a real issue
for us, which is quite severe for clusters where ipoib is the only
interconnect. I wonder whether this is 2.6.18 material.

-- 
MST


From maheshbarve at gmail.com  Tue Sep 19 02:29:17 2006
From: maheshbarve at gmail.com (Mahesh Barve)
Date: Tue, 19 Sep 2006 14:59:17 +0530
Subject: [openib-general] Posting requests on multiple QPs simultaneously
Message-ID: <507df10d0609190229g6855bd33g1f5973d4d489c6f6@mail.gmail.com>

Hi,
  Infiniband allows the creation of 16M QPs.
 Suppose a programmer wants to post separate requests on each of the QPs
simultaneously,
 what would be the most efficient way of doing it?
regards,
-mahesh

From dotanb at dev.mellanox.co.il  Tue Sep 19 04:28:36 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Tue, 19 Sep 2006 14:28:36 +0300
Subject: [openib-general] Posting requests on multiple QPs simultaneously
In-Reply-To: <507df10d0609190229g6855bd33g1f5973d4d489c6f6@mail.gmail.com>
References: <507df10d0609190229g6855bd33g1f5973d4d489c6f6@mail.gmail.com>
Message-ID: <450FD464.9010809@dev.mellanox.co.il>

Mahesh Barve wrote:
> Hi,
>   Infiniband allows the creation of 16M QPs. 
>  Suppose a programmer wants to post separate requests on each of the 
> QPs simultaneously,
>  what would be the most efficient way of doing it?
> regards,
> -mahesh
>  
What is your question: should you use threads? Should you post one by 
one or post a list?

In one post operation, you cannot post WRs to more than one QP.

Dotan


From eli at dev.mellanox.co.il  Tue Sep 19 04:44:51 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Tue, 19 Sep 2006 14:44:51 +0300
Subject: [openib-general] ipoib multicast problems on RHEL4.0 u4
Message-ID: <1158666291.24776.32.camel@localhost>

Hi,

While testing ipoib multicast on RHEL4.0 u4, I noticed that setsockopt()
succeeds in adding a multicast group to an interface, but the multicast
group is not actually added to the net_device. This means that an
application cannot join a multicast group as a full member. When I
examined the differences between the kernel sources for u3 and u4, I
noticed that essential code was removed:

diff -ru net/ipv4/arp.c ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c
--- net/ipv4/arp.c  2006-09-18 15:35:03.000000000 +0300
+++ ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c  2006-09-19 10:08:06.000000000 +0300
@@ -213,9 +213,6 @@
    case ARPHRD_IEEE802_TR:
        ip_tr_mc_map(addr, haddr);
        return 0;
-   case ARPHRD_INFINIBAND:
-       ip_ib_mc_map(addr, haddr);
-       return 0;
    default:
        if (dir) {
            memcpy(haddr, dev->broadcast, dev->addr_len);
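
For context, the removed case maps an IPv4 multicast address to a 20-byte
IPoIB hardware address. The sketch below is written from the mapping
described in RFC 4391 as I understand it, not copied from the kernel's
ip_ib_mc_map(); the function name is hypothetical, and the default P_Key
(0xffff) and link-local scope are assumptions of this example.

```c
#include <stdint.h>
#include <string.h>

#define IPOIB_HW_ADDR_LEN 20

/*
 * addr is the IPv4 group address in host byte order,
 * e.g. 0xE0000001 for 224.0.0.1.
 */
static void ipv4_mcast_to_ipoib_hw(uint32_t addr,
				   unsigned char hw[IPOIB_HW_ADDR_LEN])
{
	memset(hw, 0, IPOIB_HW_ADDR_LEN);

	/* Bytes 0-3: reserved byte plus the multicast QPN 0xffffff. */
	hw[1] = 0xff;
	hw[2] = 0xff;
	hw[3] = 0xff;

	/* Bytes 4-19: the multicast GID. */
	hw[4] = 0xff;		/* multicast GID prefix */
	hw[5] = 0x12;		/* flags and link-local scope (assumed) */
	hw[6] = 0x40;		/* IPv4 signature 0x401b */
	hw[7] = 0x1b;
	hw[8] = 0xff;		/* P_Key, default assumed */
	hw[9] = 0xff;

	/* The low 28 bits of the group address fill the end of the GID. */
	hw[16] = (addr >> 24) & 0x0f;
	hw[17] = (addr >> 16) & 0xff;
	hw[18] = (addr >> 8) & 0xff;
	hw[19] = addr & 0xff;
}
```

Without this case, arp_mc_map() falls through to the default branch, so
the group's hardware address is never derived from the IP address.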


Can anyone suggest a workaround to this issue?

Thanks
Eli


From erezz at voltaire.com  Tue Sep 19 04:45:28 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Tue, 19 Sep 2006 14:45:28 +0300
Subject: [openib-general] [PATCH] IB/iser: fix iSER description and
 selections in Kconfig
In-Reply-To: <adaslipp8ge.fsf@cisco.com>
References: <200609071902.57379.toralf.foerster@gmx.de>
	<200609101343.02740.toralf.foerster@gmx.de>
	<450401AE.2030606@voltaire.com>
	<200609101645.22695.toralf.foerster@gmx.de>
	<4505032B.3050706@voltaire.com> <ada1wqi79mb.fsf@cisco.com>
	<450912C0.8070807@voltaire.com> <45091AC4.3090005@voltaire.com>
	<adau03awjku.fsf@cisco.com> <450D0FCB.1000401@voltaire.com>
	<adaslipp8ge.fsf@cisco.com>
Message-ID: <450FD858.3000507@voltaire.com>


Roland Dreier wrote:
>     Erez> There are 3 additional required config entries: NET, INET &
>     Erez> INFINIBAND_RDMA_CM. Do you suggest to 'depend' on them or
>     Erez> 'depend' on some of them and 'select' the rest?
>
> INET depends on NET, and INFINIBAND_RDMA_CM doesn't exist.  So
> depending on INET is sufficient.  That's the reason 'depend' is better
> than 'select' -- you don't have to worry about recreating the full
> dependency tree of things you depend on.
>
>     Erez> Also, since I'm not familiar enough with 'make randconfig',
>     Erez> here's a question: if iSER 'depends' on INET, is it possible
>     Erez> that 'make randconfig' will enable iSER without enabling
>     Erez> INET?
>
> No, of course not.  The whole point of make randconfig is to make a
> random but valid configuration.
>
> Anyway, rather than waste more time going back and forth on this, I
> added the following to my for-2.6.19 tree as the obvious fix:
>
> Author: Roland Dreier <rolandd at cisco.com>
> Date:   Sun Sep 17 22:58:27 2006 -0700
>
>     IB/iser: INFINIBAND_ISER depends on INET
>     
>     iSER won't build without CONFIG_INET enabled, so make Kconfig reflect that.
>     
>     Signed-off-by: Roland Dreier <rolandd at cisco.com>
>
> diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
> index fead87d..365a1b5 100644
> --- a/drivers/infiniband/ulp/iser/Kconfig
> +++ b/drivers/infiniband/ulp/iser/Kconfig
> @@ -1,6 +1,6 @@
>  config INFINIBAND_ISER
>  	tristate "ISCSI RDMA Protocol"
> -	depends on INFINIBAND && SCSI
> +	depends on INFINIBAND && SCSI && INET
>  	select SCSI_ISCSI_ATTRS
>  	---help---
>  	  Support for the ISCSI RDMA Protocol over InfiniBand.  This
>   
I don't agree with that. It is possible that INFINIBAND_ADDR_TRANS won't 
be selected according to your patch. How about this solution: iSER 
should depend on INFINIBAND && SCSI && INFINIBAND_ADDR_TRANS (which 
depends on INET, so the INET dependency is ok).

Erez


From kliteyn at dev.mellanox.co.il  Tue Sep 19 05:23:17 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Tue, 19 Sep 2006 15:23:17 +0300
Subject: [openib-general] [PATCH] osm: bug in OpenSM on broken fabrics
Message-ID: <450FE135.1060007@dev.mellanox.co.il>

Hi Hal.

Please apply this patch both to trunk and 1.1.

Thanks.

--
Yevgeny

 > Hi Yevgeny,
 >
 > On Mon, 2006-09-18 at 14:52, Yevgeny Kliteynik wrote:
 > > Hi Hal
 > >
 > > This patch fixes a bug in opensm that was discovered on
 > > a 'broken' fabric when opensm was executed with --stay_on_fatal.
 > > Replacing assert with a real check.
 > >
 > > Yevgeny
 > >
 > > Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
 >
 > Is this intended for trunk only or also 1.1 ?
 >
 > -- Hal


From mst at mellanox.co.il  Tue Sep 19 05:45:46 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 19 Sep 2006 15:45:46 +0300
Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu selector
 for path queries
Message-ID: <20060919124546.GF32603@mellanox.co.il>

Roland, the patch is still under test (I'll leave it to run
for a night), but I'd like to get comments on the following:


IB/ipoib: use appropriate mtu selector for path queries

IPoIB must set the mtu selector in the path record query according to
dev->mtu: if we wildcard it, the SM can select a path with a lower MTU.
This breaks IPoIB on networks where the SM Tavor quirk is activated.

We can always require this, since IPoIB spec includes the following statement:
    The value (for IB MTU) assigned to the broadcast-GID must not
    be greater than any physical link MTU spanned by the IPoIB
    subnet.

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

---

Note the following uses IB_SA_GT so it should be applied on top of SA
enum rename.

Index: ofed_1_1/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- ofed_1_1.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ ofed_1_1/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -182,6 +182,8 @@ static int ipoib_change_mtu(struct net_d
 
 	dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);
 
+	queue_work(ipoib_workqueue, &priv->flush_task);
+
 	return 0;
 }
 
@@ -452,15 +454,39 @@ static int path_rec_start(struct net_dev
 			  struct ipoib_path *path)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	ib_sa_comp_mask comp_mask = IB_SA_PATH_REC_MTU_SELECTOR | IB_SA_PATH_REC_MTU;
+
+	path->pathrec.mtu_selector = IB_SA_GT;
 
-	ipoib_dbg(priv, "Start path record lookup for " IPOIB_GID_FMT "\n",
-		  IPOIB_GID_ARG(path->pathrec.dgid));
+	switch (roundup_pow_of_two(dev->mtu + IPOIB_ENCAP_LEN)) {
+	case 512:
+		path->pathrec.mtu = IB_MTU_256;
+		break;
+	case 1024:
+		path->pathrec.mtu = IB_MTU_512;
+		break;
+	case 2048:
+		path->pathrec.mtu = IB_MTU_1024;
+		break;
+	case 4096:
+		path->pathrec.mtu = IB_MTU_2048;
+		break;
+	default:
+		/* Wildcard everything */
+		comp_mask = 0;
+		path->pathrec.mtu = 0;
+		path->pathrec.mtu_selector = 0;
+	}
+
+	ipoib_dbg(priv, "Start path record lookup for " IPOIB_GID_FMT " MTU > %d\n",
+		  IPOIB_GID_ARG(path->pathrec.dgid),
+		  comp_mask ? ib_mtu_enum_to_int(path->pathrec.mtu) : 0);
 
 	init_completion(&path->done);
 
 	path->query_id =
 		ib_sa_path_rec_get(priv->ca, priv->port,
-				   &path->pathrec,
+				   &path->pathrec, comp_mask    |
 				   IB_SA_PATH_REC_DGID		|
 				   IB_SA_PATH_REC_SGID		|
 				   IB_SA_PATH_REC_NUMB_PATH	|

-- 
MST


From ogerlitz at voltaire.com  Tue Sep 19 05:49:06 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Tue, 19 Sep 2006 15:49:06 +0300
Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change
 CMA config name
In-Reply-To: <aday7shp8oq.fsf@cisco.com>
References: <Pine.LNX.4.64.0609141005480.7597@zuben>
	<aday7smwjmy.fsf@cisco.com> <450D36E9.1000502@voltaire.com>
	<aday7shp8oq.fsf@cisco.com>
Message-ID: <450FE742.7040005@voltaire.com>

Roland Dreier wrote:
>     Or> I want it to be visible so if some other config **depends** on
>     Or> it the use can **see** this config and select it.
> 
>     Or> Also as of the importance of the rdma cm within the IB stack
>     Or> being along with the ib verbs the second access point to ULP
>     Or> coders, seeing its config and documenting it is important.
> 
> I don't buy this.  The only thing making this config option visible
> does is make it more likely (far more likely) that someone will
> disable it.  Right now the RDMA CM is built as long as INFINIBAND and
> INET are enabled.  No one is going to turn off INET on any normal
> system so effectively the RDMA CM is always built whenever INFINIBAND
> is enabled.

I am fine with having the CMA config selected whenever someone selects 
INFINIBAND, so adding the help text and making it visible are not a must 
for my taste. However, are you fine with changing the **name** of the 
config directive to CONFIG_INFINIBAND_RDMA_CM so it's better understood?

> As far as making a config symbol to depend on, I think INET makes as
> much sense or more: something using IP addressing naturally depends on
> having IP networking.

As Erez wrote to you on the other thread, we must depend on the CMA;
otherwise a user running make randconfig would be able to produce a config
file where INFINIBAND is selected but the CMA (RDMA_ADDR_TRANS) config is
not selected, so linkage will fail.

Or.


From halr at voltaire.com  Tue Sep 19 06:17:05 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Sep 2006 09:17:05 -0400
Subject: [openib-general] [PATCH] osm: bug in OpenSM on broken fabrics
In-Reply-To: <450EEAD9.1000503@dev.mellanox.co.il>
References: <450EEAD9.1000503@dev.mellanox.co.il>
Message-ID: <1158671825.4509.3990.camel@hal.voltaire.com>

On Mon, 2006-09-18 at 14:52, Yevgeny Kliteynik wrote:
> Hi Hal
> 
> This patch fixes a bug in opensm that was discovered on
> a 'broken' fabric when opensm was executed with --stay_on_fatal.
> Replacing assert with a real check.
> 
> Yevgeny
> 
> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Thanks. Applied with some cosmetic changes (to both trunk and 1.1).

Note that this patch was rejected (not sure why) and was manually
applied.

-- Hal


From halr at voltaire.com  Tue Sep 19 06:25:59 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Sep 2006 09:25:59 -0400
Subject: [openib-general] [PATCH][TRIVIAL]OpenSM/osm_node_info_rcv.c:
 Eliminate superfluous call level
Message-ID: <1158672358.4509.4309.camel@hal.voltaire.com>

OpenSM/osm_node_info_rcv.c: Eliminate superfluous call level

Signed-off-by: Hal Rosenstock <halr at voltaire.com>
Index: opensm/osm_node_info_rcv.c
===================================================================
--- opensm/osm_node_info_rcv.c	(revision 9536)
+++ opensm/osm_node_info_rcv.c	(working copy)
@@ -437,7 +437,7 @@ __osm_ni_rcv_process_new_ca(
  The plock must be held before calling this function.
 **********************************************************************/
 static void
-__osm_ni_rcv_process_ca_port(
+__osm_ni_rcv_process_existing_ca(
   IN const osm_ni_rcv_t* const p_rcv,
   IN osm_node_t* const p_node,
   IN const osm_madw_t* const p_madw )
@@ -455,7 +455,7 @@ __osm_ni_rcv_process_ca_port(
   osm_bind_handle_t h_bind;
   cl_status_t cl_status;
 
-  OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_ca_port );
+  OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_existing_ca );
 
   p_smp = osm_madw_get_smp_ptr( p_madw );
   p_ni = (ib_node_info_t*)ib_smp_get_payload_ptr( p_smp );
@@ -473,7 +473,7 @@ __osm_ni_rcv_process_ca_port(
   if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl ) )
   {
     osm_log( p_rcv->p_log, OSM_LOG_VERBOSE,
-             "__osm_ni_rcv_process_ca_port: "
+             "__osm_ni_rcv_process_existing_ca: "
              "Creating new port object with GUID = 0x%" PRIx64 "\n",
              cl_ntoh64( p_ni->port_guid ) );
 
@@ -483,7 +483,7 @@ __osm_ni_rcv_process_ca_port(
     if( p_port == NULL )
     {
       osm_log( p_rcv->p_log, OSM_LOG_ERROR,
-               "__osm_ni_rcv_process_ca_port: ERR 0D04: "
+               "__osm_ni_rcv_process_existing_ca: ERR 0D04: "
                "Unable to create new port object\n" );
       goto Exit;
     }
@@ -500,7 +500,7 @@ __osm_ni_rcv_process_ca_port(
         Somehow, this port GUID already exists in the table.
       */
       osm_log( p_rcv->p_log, OSM_LOG_ERROR,
-               "__osm_ni_rcv_process_ca_port: ERR 0D12: "
+               "__osm_ni_rcv_process_existing_ca: ERR 0D12: "
                "Port 0x%" PRIx64 " already in the database!\n",
                cl_ntoh64( p_ni->port_guid ) );
 
@@ -521,7 +521,7 @@ __osm_ni_rcv_process_ca_port(
       if( cl_status != CL_SUCCESS )
       {
         osm_log( p_rcv->p_log, OSM_LOG_ERROR,
-                 "__osm_ni_rcv_process_ca_port: ERR 0D08: "
+                 "__osm_ni_rcv_process_existing_ca: ERR 0D08: "
                  "Error %s adding to list\n",
                  CL_STATUS_MSG( cl_status ) );
         osm_port_delete( &p_port );
@@ -530,7 +530,7 @@ __osm_ni_rcv_process_ca_port(
       else
       {
         osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
-                 "__osm_ni_rcv_process_ca_port: "
+                 "__osm_ni_rcv_process_existing_ca: "
                  "Adding port GUID:0x%016" PRIx64 " to new_ports_list\n",
                  cl_ntoh64(osm_node_get_node_guid( p_port->p_node )) );
       }
@@ -547,7 +547,7 @@ __osm_ni_rcv_process_ca_port(
     if ( !osm_physp_is_valid( p_physp ) )
     {
         osm_log( p_rcv->p_log, OSM_LOG_ERROR,
-                 "__osm_ni_rcv_process_ca_port: ERR 0D19: "
+                 "__osm_ni_rcv_process_existing_ca: ERR 0D19: "
                  "Invalid physical port. Aborting discovery\n");
         goto Exit;
     }
@@ -579,7 +579,7 @@ __osm_ni_rcv_process_ca_port(
   if( status != IB_SUCCESS )
   {
     osm_log( p_rcv->p_log, OSM_LOG_ERROR,
-             "__osm_ni_rcv_process_ca_port: ERR 0D13: "
+             "__osm_ni_rcv_process_existing_ca: ERR 0D13: "
              "Failure initiating PortInfo request (%s)\n",
              ib_get_err_str(status));
   }
@@ -592,22 +592,6 @@ __osm_ni_rcv_process_ca_port(
  The plock must be held before calling this function.
 **********************************************************************/
 static void
-__osm_ni_rcv_process_existing_ca(
-  IN const osm_ni_rcv_t* const p_rcv,
-  IN osm_node_t* const p_node,
-  IN const osm_madw_t* const p_madw )
-{
-  OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_existing_ca );
-
-  __osm_ni_rcv_process_ca_port( p_rcv, p_node, p_madw );
-
-  OSM_LOG_EXIT( p_rcv->p_log );
-}
-
-/**********************************************************************
- The plock must be held before calling this function.
-**********************************************************************/
-static void
 __osm_ni_rcv_process_new_router(
   IN const osm_ni_rcv_t* const p_rcv,
   IN osm_node_t* const p_node,


From aviram at dev.mellanox.co.il  Tue Sep 19 08:07:54 2006
From: aviram at dev.mellanox.co.il (Aviram Gutman)
Date: Tue, 19 Sep 2006 18:07:54 +0300
Subject: [openib-general] [openfabrics-ewg] OFED 1.1
In-Reply-To: <450ECD3E.8020703@dev.mellanox.co.il>
References: <450ECD3E.8020703@dev.mellanox.co.il>
Message-ID: <451007CA.7050809@dev.mellanox.co.il>

Aviram Gutman wrote:
> We want to have RC6 on Wed and final release next week on Tues or Wed 
> Sep-27.
> Is that acceptable by all EWG members?
>
> Regards,
>     Aviram

We currently see two issues:

1) IPoIB multicast is not working on RHEL4 U4
2) iSER on SLES10 requires root privilege

I hope that Voltaire can fix issue #2. It seems that issue #1 is not 
solvable (unless we require the user to replace the kernel).
Are these issues showstoppers? Or can we issue RC6 with these issues 
outstanding?


Regards,

    Aviram


From rdreier at cisco.com  Tue Sep 19 09:27:45 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 09:27:45 -0700
Subject: [openib-general] Fwd: IPoIB Multicast
In-Reply-To: <20060919090849.GC32603@mellanox.co.il> (Michael S.
	Tsirkin's message of "Tue, 19 Sep 2006 12:08:49 +0300")
References: <adawt80n2gx.fsf@cisco.com> <20060919031534.GC30563@mellanox.co.il>
	<20060919090849.GC32603@mellanox.co.il>
Message-ID: <adahcz3n7ni.fsf@cisco.com>

    Michael> Works OK here. Please commit. Please note this does fix a
    Michael> real issue for us, which is quite severe for clusters
    Michael> where ipoib is the only interconnect. I wonder whether
    Michael> this is 2.6.18 material.

I don't understand why this is a big problem.  What breaks if we let
OpenSM pick the MTU and Rate for a new multicast group?  It's already
picking them for the broadcast group.

Anyway I put this in my for-2.6.19 branch for now.

 - R.


From rdreier at cisco.com  Tue Sep 19 09:30:59 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 09:30:59 -0700
Subject: [openib-general] ipoib multicast problem
In-Reply-To: <1158647453.5392.66.camel@localhost> (Eli cohen's message
	of "Tue, 19 Sep 2006 09:30:53 +0300")
References: <1158647453.5392.66.camel@localhost>
Message-ID: <adad59rn7i4.fsf@cisco.com>

    Eli> 1. An application registers to a multicast group as a full
    Eli>    member. As a result all the groups are listed in dev->mclist.
    Eli> 2. The infiniband link falls momentarily, opensm restarted
    Eli>    etc.
    Eli> 3. All multicast memberships are flushed.
    Eli> 4. The net device will not join again until at a later time
    Eli>    something will cause ipoib_set_mcast_list() to be called.
 
I don't understand.  How could ipoib rejoin the broadcast group and
then not rejoin the rest of the full member groups it has?

 - R.


From rdreier at cisco.com  Tue Sep 19 09:31:53 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 09:31:53 -0700
Subject: [openib-general] [PATCH] ipoib mcast restart
In-Reply-To: <1158647474.5392.68.camel@localhost> (Eli cohen's message
	of "Tue, 19 Sep 2006 09:31:14 +0300")
References: <1158647474.5392.68.camel@localhost>
Message-ID: <ada8xkfn7gm.fsf@cisco.com>

    Eli> Make sure that after ipoib_ib_dev_flush is executed,
    Eli> ipoib_mcast_restart_task is executed also to join all the
    Eli> mcast groups maintained by the kernel for the device.

Why is the ipoib_mcast_start_thread() at the end of ipoib_ib_dev_up()
not sufficient to rejoin all the mcgs?

 - R.


From rdreier at cisco.com  Tue Sep 19 09:40:54 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 09:40:54 -0700
Subject: [openib-general] [PATCH] IB/iser: fix iSER description and
 selections in Kconfig
In-Reply-To: <450FD858.3000507@voltaire.com> (Erez Zilber's message of
	"Tue, 19 Sep 2006 14:45:28 +0300")
References: <200609071902.57379.toralf.foerster@gmx.de>
	<200609101343.02740.toralf.foerster@gmx.de>
	<450401AE.2030606@voltaire.com>
	<200609101645.22695.toralf.foerster@gmx.de>
	<4505032B.3050706@voltaire.com> <ada1wqi79mb.fsf@cisco.com>
	<450912C0.8070807@voltaire.com> <45091AC4.3090005@voltaire.com>
	<adau03awjku.fsf@cisco.com> <450D0FCB.1000401@voltaire.com>
	<adaslipp8ge.fsf@cisco.com> <450FD858.3000507@voltaire.com>
Message-ID: <ada4pv3n71l.fsf@cisco.com>

    Erez> I don't agree with that. It is possible that
    Erez> INFINIBAND_ADDR_TRANS won't be selected according to your
    Erez> patch. How about this solution: iSER should depend on
    Erez> INFINIBAND && SCSI && INFINIBAND_ADDR_TRANS (which depends
    Erez> on INET, so the INET dependency is ok).

How is that possible?  If INFINIBAND and INET are selected, then
INFINIBAND_ADDR_TRANS is selected too (at least as far as I can see).
How do you enable INET without INFINIBAND_ADDR_TRANS?

I don't like making things depend on INFINIBAND_ADDR_TRANS, since it's
really just an internal symbol to prevent building ib_addr when it
won't build.

 - R.


From rdreier at cisco.com  Tue Sep 19 09:42:18 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 09:42:18 -0700
Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change
 CMA config name
In-Reply-To: <450FE742.7040005@voltaire.com> (Or Gerlitz's message of
	"Tue, 19 Sep 2006 15:49:06 +0300")
References: <Pine.LNX.4.64.0609141005480.7597@zuben>
	<aday7smwjmy.fsf@cisco.com> <450D36E9.1000502@voltaire.com>
	<aday7shp8oq.fsf@cisco.com> <450FE742.7040005@voltaire.com>
Message-ID: <adazmcvlset.fsf@cisco.com>

    Or> I am fine with having the CMA config selected whenever someone
    Or> selects INFINIBAND so adding the help text and making it
    Or> visible are not a must per my taste. However, are you fine
    Or> with changing the **name** of the config directive to
    Or> CONFIG_INFINIBAND_RDMA_CM so its better understood?

No, since really what it is controlling is the ib_addr module.

    Or> As Erez wrote you on the other thread, we must depend on the
    Or> CMA else a user running make randconfig would be able to
    Or> produce a config file where INFINIBAND is selected but the CMA
    Or> (RDMA_ADDR_TRANS) config is not selected so linkage will fail.

How?  make randconfig won't produce invalid configurations.

 - R.


From rdreier at cisco.com  Tue Sep 19 09:44:57 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 09:44:57 -0700
Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu
 selector for path queries
In-Reply-To: <20060919124546.GF32603@mellanox.co.il> (Michael S.
	Tsirkin's message of "Tue, 19 Sep 2006 15:45:46 +0300")
References: <20060919124546.GF32603@mellanox.co.il>
Message-ID: <adavenjlsae.fsf@cisco.com>

Seems OK from a strict spec-compliance point of view, but I don't
understand this:

 > This breaks IPoIB on networks with the SM Tavor quirk activated.

Even if opensm returns a path record with a lower MTU, the underlying
links still have a 2K mtu really, so nothing breaks.  IPoIB is just
doing something naughty by ignoring the MTU in the path record.  So
what breaks really?

(not to mention the fact that the "Tavor quirk" hasn't been accepted
into OpenSM yet anyway)

 - R.


From halr at voltaire.com  Tue Sep 19 10:16:56 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Sep 2006 13:16:56 -0400
Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack :
 include/opensm/osm_pkey.h
In-Reply-To: <86zmcylc2e.fsf@mtl066.yok.mtl.com>
References: <86zmcylc2e.fsf@mtl066.yok.mtl.com>
Message-ID: <1158686214.4509.12804.camel@hal.voltaire.com>

On Sun, 2006-09-17 at 11:58, Eitan Zahavi wrote:
> Hi Hal
> 
> Partition table blocks are always 16 bits. 
> This resolves the need to later cast back and forth.
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Thanks. Applied to trunk only in conjunction with patch 10/13 on
osm_pkey.c.

-- Hal


From halr at voltaire.com  Tue Sep 19 10:17:05 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Sep 2006 13:17:05 -0400
Subject: [openib-general] [PATCH 10/13] osm: port to WinIB stack :
	opensm/osm_pkey.c
In-Reply-To: <86odtelbzy.fsf@mtl066.yok.mtl.com>
References: <86odtelbzy.fsf@mtl066.yok.mtl.com>
Message-ID: <1158686216.4509.12806.camel@hal.voltaire.com>

On Sun, 2006-09-17 at 12:00, Eitan Zahavi wrote:
> Hi Hal
> 
> Some explicit casting required and also pkey blocks are only uint16_t .
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Thanks. Applied to trunk only in conjunction with patch 2/13 on
osm_pkey.h.

-- Hal


From halr at voltaire.com  Tue Sep 19 10:28:38 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Sep 2006 13:28:38 -0400
Subject: [openib-general] [PATCH 2/13] osm: port to WinIB stack :
	opensm/osm_subnet.c
In-Reply-To: <86venmlc1a.fsf@mtl066.yok.mtl.com>
References: <86venmlc1a.fsf@mtl066.yok.mtl.com>
Message-ID: <1158686918.4509.13210.camel@hal.voltaire.com>

Hi Eitan,

On Sun, 2006-09-17 at 11:59, Eitan Zahavi wrote:
> Hi Hal

I think this patch is really 5/13 rather than 2/13.

> No need for stdio.h but do need stdlib.h ...

It appears to be the other way around (stdio.h needed but stdlib.h
isn't), right ?

> Also map snprintf to _snprintf in windows case
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>
> 
> Index: opensm/osm_subnet.c
> ===================================================================
> --- opensm/osm_subnet.c	(revision 9502)
> +++ opensm/osm_subnet.c	(working copy)
> @@ -53,6 +53,7 @@
>  
>  #include <stdlib.h>

Should this include of stdlib.h also be removed ?

-- Hal

>  #include <string.h>
> +#include <stdio.h>
>  #include <complib/cl_debug.h>
>  #include <opensm/osm_subnet.h>
>  #include <opensm/osm_opensm.h>
> @@ -65,7 +66,6 @@
>  #include <opensm/osm_node.h>
>  #include <opensm/osm_multicast.h>
>  #include <opensm/osm_inform.h>
> -#include <stdlib.h>
>  
>  /**********************************************************************
>   **********************************************************************/
> @@ -659,6 +659,9 @@ __osm_subn_opts_unpack_charp(
>    }
>  }
>  
> +#ifdef WIN32
> +#define snprintf _snprintf
> +#endif
>  /**********************************************************************
>   **********************************************************************/
>  static void
> 
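
A minimal sketch of the kind of mapping the patch above adds, assuming
only that MSVC's _vsnprintf() does not NUL-terminate the buffer on
truncation. portable_snprintf() and RAW_VSNPRINTF are hypothetical
names for illustration, not part of OpenSM:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Pick the platform's formatter; WIN32 is the macro used in the patch. */
#ifdef WIN32
#define RAW_VSNPRINTF _vsnprintf
#else
#define RAW_VSNPRINTF vsnprintf
#endif

/*
 * Hypothetical wrapper: formats into buf, always NUL-terminates, and
 * returns the length of the (possibly truncated) string actually stored.
 */
static int portable_snprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;

	assert(buf != NULL && size > 0);

	va_start(ap, fmt);
	(void)RAW_VSNPRINTF(buf, size, fmt, ap);
	va_end(ap);

	/* _vsnprintf may leave the buffer unterminated when it truncates. */
	buf[size - 1] = '\0';
	return (int)strlen(buf);
}
```

A bare "#define snprintf _snprintf" compiles, but callers that rely on
the result being terminated may want a wrapper along these lines.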


From dledford at redhat.com  Tue Sep 19 10:43:43 2006
From: dledford at redhat.com (Doug Ledford)
Date: Tue, 19 Sep 2006 13:43:43 -0400
Subject: [openib-general] ipoib multicast problems on RHEL4.0 u4
In-Reply-To: <1158666291.24776.32.camel@localhost>
References: <1158666291.24776.32.camel@localhost>
Message-ID: <1158687823.17671.119.camel@fc6.xsintricity.com>

On Tue, 2006-09-19 at 14:44 +0300, Eli cohen wrote:
> Hi,
> 
> while testing ipoib multicast on RHEL4.0 u4, I noticed that setsockopt()
> succeeds in adding a multicast group to an interface but actually the
> multicast group is not added to the net_device. This means that an
> application cannot join a multicast group as a full member. When I
> examined the differences between the kernel sources for u3 and u4 I
> noticed that essential code was removed:
> 
> diff -ru net/ipv4/arp.c ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c
> --- net/ipv4/arp.c  2006-09-18 15:35:03.000000000 +0300
> +++ ../linux-2.6.9-42.ELsmp/net/ipv4/arp.c  2006-09-19 10:08:06.000000000 +0300
> @@ -213,9 +213,6 @@
>     case ARPHRD_IEEE802_TR:
>         ip_tr_mc_map(addr, haddr);
>         return 0;
> -   case ARPHRD_INFINIBAND:
> -       ip_ib_mc_map(addr, haddr);
> -       return 0;
>     default:
>         if (dir) {
>             memcpy(haddr, dev->broadcast, dev->addr_len);
> 
> 
> Can anyone suggest a workaround to this issue?

Short of spinning a kernel, it's going to be hard to work around.
Thanks for finding this, I'll track down how this got left out of the U4
kernel when it was in the U3 kernel :-/
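
For context, the removed case is what maps an IPv4 multicast address
onto the 20-byte IPoIB hardware address. A minimal user-space sketch of
that mapping, assuming the RFC 4391 layout (IPv4 signature 0x401b, low
28 bits of the group address) with link-local scope and the default
P_Key 0xffff hard-coded. ib_mc_map_sketch() is a hypothetical name and
is not the kernel's ip_ib_mc_map():

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IPOIB_HW_ADDR_LEN 20	/* 4 bytes flags/QPN + 16 bytes MGID */

/* addr is the IPv4 group address in host byte order (an assumption). */
static void ib_mc_map_sketch(uint32_t addr, unsigned char *buf)
{
	assert(buf != NULL);
	memset(buf, 0, IPOIB_HW_ADDR_LEN);

	buf[1] = 0xff;			/* multicast QPN 0xffffff */
	buf[2] = 0xff;
	buf[3] = 0xff;

	buf[4] = 0xff;			/* MGID multicast prefix */
	buf[5] = 0x12;			/* flags = 1, scope = 2 (link-local) */
	buf[6] = 0x40;			/* IPv4 signature 0x401b */
	buf[7] = 0x1b;
	buf[8] = 0xff;			/* P_Key, assumed default 0xffff */
	buf[9] = 0xff;

	/* Low 28 bits of the group address fill the end of the MGID. */
	buf[16] = (addr >> 24) & 0x0f;
	buf[17] = (addr >> 16) & 0xff;
	buf[18] = (addr >> 8) & 0xff;
	buf[19] = addr & 0xff;
}
```

Without the ARPHRD_INFINIBAND case, the lookup falls through to the
default branch shown in the diff, so no such group address is produced.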

-- 
Doug Ledford <dledford at redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

From halr at voltaire.com  Tue Sep 19 10:44:49 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Sep 2006 13:44:49 -0400
Subject: [openib-general] [PATCH 9/13] osm: port to WinIB stack :
	opensm/osm_prtn.c
In-Reply-To: <86psdulc0b.fsf@mtl066.yok.mtl.com>
References: <86psdulc0b.fsf@mtl066.yok.mtl.com>
Message-ID: <1158687889.4509.13818.camel@hal.voltaire.com>

On Sun, 2006-09-17 at 11:59, Eitan Zahavi wrote:
> Hi Hal
> 
> Required cl_debug.h for PRIx64
> Also map snprintf to _snprintf and stat to _stat
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Thanks. Applied to trunk only.

-- Hal


From halr at voltaire.com  Tue Sep 19 11:02:00 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Sep 2006 14:02:00 -0400
Subject: [openib-general] [PATCH 11/13] osm: port to WinIB stack :
	opensm/osm_log.c
In-Reply-To: <86mz8ylbzn.fsf@mtl066.yok.mtl.com>
References: <86mz8ylbzn.fsf@mtl066.yok.mtl.com>
Message-ID: <1158688911.4509.14416.camel@hal.voltaire.com>

On Sun, 2006-09-17 at 12:00, Eitan Zahavi wrote:
> Hi Hal
> 
> 1. function mappings for stat, fstat and fileno
> 2. Currently no implementation for log file truncation 
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Thanks. Applied to trunk only.

-- Hal


From eli at dev.mellanox.co.il  Tue Sep 19 11:04:44 2006
From: eli at dev.mellanox.co.il (eli at dev.mellanox.co.il)
Date: Tue, 19 Sep 2006 21:04:44 +0300 (IDT)
Subject: [openib-general] ipoib multicast problem
In-Reply-To: <adad59rn7i4.fsf@cisco.com>
References: <1158647453.5392.66.camel@localhost>
 <adad59rn7i4.fsf@cisco.com>
Message-ID: <61036.212.235.62.73.1158689084.squirrel@dev.mellanox.co.il>

> >
> I don't understand.  How could ipoib rejoin the broadcast group and
> then not rejoin the rest of the full member groups it has?
>
>
That is because the broadcast group is not part of the multicast groups
maintained by the kernel but rather is part of ipoib and is joined from a
different function. The other full members are maintained by the kernel
for the net device and come from dev->mclist.


From eeb at bartonsoftware.com  Tue Sep 19 11:14:28 2006
From: eeb at bartonsoftware.com (Eric Barton)
Date: Tue, 19 Sep 2006 19:14:28 +0100
Subject: [openib-general] Completion callback /teardown race
Message-ID: <200609191814.k8JIESjd007174@robert.bartonsoftware.com>


Hi,

I create 1 CQ just for receive completions on each of my QPs.  When I tear down
the QP, I rdma_disconnect(), change the QP state to IB_QPS_ERR and then wait
for all currently posted receives to complete.

This has worked just fine for me, but I've had a bug report from a site using
this software (possibly with HCAs I've not tested with) that another completion
callback can happen after all the posted receives have completed.

I supplied a debug/workaround patch that checks the CQ in this situation.  It
confirms that all posted receives have completed and that the CQ is in fact
empty.

Is this a bug, or an unavoidable race between arming the callback and polling
the CQ?

All the CQ callback does is wake a thread to poll the queue.  This effectively
keeps polling completions out of the CQ until it is empty. Then it calls
ib_req_notify_cq(cq, IB_CQ_NEXT_COMP) and ib_poll_cq() 1 more time.  

If this last call to ib_poll_cq() finds something, it repeats the whole process
- but can I be guaranteed another CQ callback in this case or is it
indeterminate?

-- 

                Cheers,
                        Eric


From halr at voltaire.com  Tue Sep 19 11:44:11 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Sep 2006 14:44:11 -0400
Subject: [openib-general] [PATCH 8/13] osm: port to WinIB stack :
	opensm/osm_opensm.c
In-Reply-To: <86r6yalc0j.fsf@mtl066.yok.mtl.com>
References: <86r6yalc0j.fsf@mtl066.yok.mtl.com>
Message-ID: <1158691450.4509.15987.camel@hal.voltaire.com>

On Sun, 2006-09-17 at 11:59, Eitan Zahavi wrote:
> Hi Hal
> 
> Explicit NULL in empty array initializer
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Thanks. Applied to trunk only.

-- Hal


From rdreier at cisco.com  Tue Sep 19 11:56:45 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 11:56:45 -0700
Subject: [openib-general] ipoib multicast problem
In-Reply-To: <61036.212.235.62.73.1158689084.squirrel@dev.mellanox.co.il>
	(eli@dev.mellanox.co.il's message of
	"Tue, 19 Sep 2006 21:04:44 +0300 (IDT)")
References: <1158647453.5392.66.camel@localhost> <adad59rn7i4.fsf@cisco.com>
	<61036.212.235.62.73.1158689084.squirrel@dev.mellanox.co.il>
Message-ID: <adamz8vlm6q.fsf@cisco.com>

    eli> That is because the broadcast group is not part of the
    eli> multicast groups maintained by the kernel but rather is part
    eli> of ipoib and is joined from a different function. The other
    eli> full members are maintained by the kernel for the net device
    eli> and come from dev->mclist.

Oh I see, when we flush the multicast groups we actually delete all of
them instead of just removing the attached flag.  OK I guess your fix
makes sense then.

 - R.


From eli at dev.mellanox.co.il  Tue Sep 19 12:08:30 2006
From: eli at dev.mellanox.co.il (eli at dev.mellanox.co.il)
Date: Tue, 19 Sep 2006 22:08:30 +0300 (IDT)
Subject: [openib-general] [PATCH] ipoib mcast restart
In-Reply-To: <ada8xkfn7gm.fsf@cisco.com>
References: <1158647474.5392.68.camel@localhost>
 <ada8xkfn7gm.fsf@cisco.com>
Message-ID: <61651.212.235.62.73.1158692910.squirrel@dev.mellanox.co.il>

> Why is the ipoib_mcast_start_thread() at the end of ipoib_ib_dev_up()
> not sufficient to rejoin all the mcgs?
>
Because after a port event all the mcast groups on the device are flushed
and all that remains is from the dev->mclist and we must renew the joins
from there.
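
A toy model of the sequence described above, as a sketch only: the
struct and the toy_* functions are hypothetical stand-ins, not the ipoib
driver's data structures. It assumes that a flush deletes every group
ipoib tracks, that bringing the device up rejoins only the broadcast
group, and that a restart pass rebuilds the rest from dev->mclist:

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_GROUPS 8

struct toy_ipoib_dev {
	unsigned int mclist[TOY_MAX_GROUPS];	/* stands in for dev->mclist */
	int mclist_len;
	unsigned int joined[TOY_MAX_GROUPS];	/* groups ipoib tracks itself */
	int joined_len;
	int broadcast_joined;
};

/* Port event: every tracked group is deleted, not just marked detached. */
static void toy_flush(struct toy_ipoib_dev *dev)
{
	dev->joined_len = 0;
	dev->broadcast_joined = 0;
}

/* Device up: the broadcast group is joined by ipoib itself; the tracked
 * list is empty after a flush, so there is nothing else to retry. */
static void toy_dev_up(struct toy_ipoib_dev *dev)
{
	dev->broadcast_joined = 1;
}

/* Restart pass: rebuild the tracked groups from the kernel's list. */
static void toy_mcast_restart(struct toy_ipoib_dev *dev)
{
	assert(dev->mclist_len >= 0 && dev->mclist_len <= TOY_MAX_GROUPS);
	memcpy(dev->joined, dev->mclist,
	       (size_t)dev->mclist_len * sizeof(dev->mclist[0]));
	dev->joined_len = dev->mclist_len;
}
```

In this model, toy_flush() followed by toy_dev_up() leaves joined_len at
zero until toy_mcast_restart() runs, which is the gap the patch closes.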


From rdreier at cisco.com  Tue Sep 19 12:10:39 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 12:10:39 -0700
Subject: [openib-general] [PATCH] mthca: fix lid used for sending traps
In-Reply-To: <20060919081324.GF31498@mellanox.co.il> (Michael S.
	Tsirkin's message of "Tue, 19 Sep 2006 11:13:24 +0300")
References: <20060919081324.GF31498@mellanox.co.il>
Message-ID: <adairjjlljk.fsf@cisco.com>

 > I'm taking the fix into OFED 1.1 and I think it should go into 2.6.18 or
 > 2.6.18.1.

Makes sense -- I'll try to get this into 2.6.18, since it's a
one-liner and fixes a regression from 2.6.17.

 - R.


From rdreier at cisco.com  Tue Sep 19 12:13:36 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 12:13:36 -0700
Subject: [openib-general] [GIT PULL] please pull infiniband.git (one-liner
 fix for 2.6.18)
Message-ID: <adaeju7llen.fsf@cisco.com>

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This contains another one-liner that fixes a regression from 2.6.17:

Jack Morgenstein:
      IB/mthca: Fix lid used for sending traps

 drivers/infiniband/hw/mthca/mthca_mad.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/infiniband/hw/mthca/mthca_mad.c b/drivers/infiniband/hw/mthca/mthca_mad.c
index d9bc030..45e106f 100644
--- a/drivers/infiniband/hw/mthca/mthca_mad.c
+++ b/drivers/infiniband/hw/mthca/mthca_mad.c
@@ -119,7 +119,7 @@ static void smp_snoop(struct ib_device *
 
 			mthca_update_rate(to_mdev(ibdev), port_num);
 			update_sm_ah(to_mdev(ibdev), port_num,
-				     be16_to_cpu(pinfo->lid),
+				     be16_to_cpu(pinfo->sm_lid),
 				     pinfo->neighbormtu_mastersmsl & 0xf);
 
 			event.device           = ibdev;


From trimmer at silverstorm.com  Tue Sep 19 12:24:27 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Tue, 19 Sep 2006 15:24:27 -0400
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <200609191814.k8JIESjd007174@robert.bartonsoftware.com>
Message-ID: <D80D83302DEE6249A221093BF2BB69AE8EF2DD@mail.silverstorm.com>

> From: Eric Barton
> Sent: Tuesday, September 19, 2006 2:14 PM
> To: openib-general at openib.org
> Subject: [openib-general] Completion callback /teardown race
> 
> 
> 
> All the CQ callback does is wake a thread to poll the queue.  This
> effectively keeps polling completions out of the CQ until it is empty.
> Then it calls ib_req_notify_cq(cq, IB_CQ_NEXT_COMP) and ib_poll_cq()
> 1 more time.
> 
> If this last call to ib_poll_cq() finds something, it repeats the
> whole process - but can I be guaranteed another CQ callback in this
> case or is it indeterminate?
> 
The recommended algorithm would be:

poll_cq until empty
ib_req_notify_cq
poll_cq until empty

Once ib_req_notify_cq is called, it's possible for an additional callback
to race with the poll_cq's which follow.

There are some differences in HCA behaviour with regard to
ib_req_notify_cq.  Mellanox HCAs will provide a callback/interrupt if
the CQ is not empty at this point (in which case the poll_cq's after the
notify are optional).

However the behaviour defined in the IBTA spec indicates that
ib_req_notify_cq will cause a callback/interrupt only on the next CQE
which arrives, hence to be portable the poll_cq loop after
ib_req_notify_cq is necessary to cover any CQEs which arrived between
the prior poll and the ib_req_notify_cq.

Within a given callback invocation, there is no reason to call notify
more than once.
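
The three steps above can be sketched against a toy completion queue.
This is an illustration under stated assumptions, not the ib_verbs API:
struct mock_cq and the mock_* helpers are hypothetical, and the 'late'
parameter simply models CQEs that land between the first drain and the
notify request.

```c
#include <assert.h>
#include <stddef.h>

/* Toy completion queue; a stand-in for illustration only. */
struct mock_cq {
	int pending;		/* CQEs waiting to be polled */
	int armed;		/* 1 after a notify request */
	int notify_calls;	/* how many times notification was requested */
};

static int mock_poll_until_empty(struct mock_cq *cq)
{
	int n = 0;

	while (cq->pending > 0) {
		cq->pending--;
		n++;
	}
	return n;
}

static void mock_req_notify(struct mock_cq *cq)
{
	cq->armed = 1;
	cq->notify_calls++;
}

/*
 * One callback invocation: drain, arm once, drain again.  The second
 * drain picks up the 'late' CQEs that a "next completion only" HCA
 * would never signal.  Returns the number of CQEs handled.
 */
static int mock_callback(struct mock_cq *cq, int late)
{
	int handled;

	assert(cq != NULL);
	handled = mock_poll_until_empty(cq);
	cq->pending += late;		/* arrivals in the unarmed window */
	mock_req_notify(cq);
	handled += mock_poll_until_empty(cq);
	return handled;
}
```

With three CQEs queued and two late arrivals, one invocation handles all
five while requesting notification exactly once.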

Todd Rimmer


From rdreier at cisco.com  Tue Sep 19 12:24:37 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 12:24:37 -0700
Subject: [openib-general] [PATCH] ipoib mcast restart
In-Reply-To: <1158647474.5392.68.camel@localhost> (Eli cohen's message
	of "Tue, 19 Sep 2006 09:31:14 +0300")
References: <1158647474.5392.68.camel@localhost>
Message-ID: <adaac4vlkwa.fsf@cisco.com>

OK, I applied this to for-2.6.19, although the patch was line-wrapped,
didn't have a usable subject, etc....  So...

<whine>
I merge > 100 patches every kernel release.  If I have to spend an
extra 5 minutes for each one fixing a patch or pulling it out of svn,
then I end up burning an extra 9 hours of stupid work.  If the 20+ people
who contribute patches sent me clean patches, then everyone would be
happier because I'd be able to merge things quicker and focus on
productive work.
</whine>


From rdreier at cisco.com  Tue Sep 19 13:28:08 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 13:28:08 -0700
Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu
 selector for path queries
In-Reply-To: <20060919124546.GF32603@mellanox.co.il> (Michael S.
	Tsirkin's message of "Tue, 19 Sep 2006 15:45:46 +0300")
References: <20060919124546.GF32603@mellanox.co.il>
Message-ID: <ada64fjlhyf.fsf@cisco.com>

I didn't really read the new patch before... anyway:

Why have you changed from the approach of just using the broadcast
group's MTU?  As far as I can see, the issue being addressed here is
purely theoretical anyway, but with the approach of taking the current
device MTU, you now have to flush all the paths if the configured MTU
changes, and you have to have a big switch in path_rec_start().

 - R.


From halr at voltaire.com  Tue Sep 19 13:59:32 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Sep 2006 16:59:32 -0400
Subject: [openib-general] IB diagnostics problems (OFED-1.1-rc5)
In-Reply-To: <450E976D.3070802@xiranet.com>
References: <450E7C0E.3020001@xiranet.com>
	<1158577816.18842.1501.camel@hal.voltaire.com>
	<450E8119.4060405@xiranet.com>
	<1158583167.18842.4632.camel@hal.voltaire.com>
	<450E976D.3070802@xiranet.com>
Message-ID: <1158699571.4509.21167.camel@hal.voltaire.com>

Hi Mirko,

On Mon, 2006-09-18 at 08:56, Mirko Benz wrote:
> Hi Hal,
> 
> Please prepare the bugzilla entry.

I entered the following:
http://openib.org/bugzilla/show_bug.cgi?id=238
http://openib.org/bugzilla/show_bug.cgi?id=239

Feel free to annotate it.

-- Hal

> It is not critical -- I just think it is not convenient for an end user.
> 
> Regards,
> Mirko
> 
> Hal Rosenstock schrieb:
> > Hi again Mirko,
> >
> > On Mon, 2006-09-18 at 07:20, Mirko Benz wrote:
> >   
> >> Hi Hal,
> >>
> >> This was a default/build all OFED install. Either we should place these 
> >> tools under ../ofed/sbin or make it work for everybody.
> >>     
> >
> > The issue with making it work for everyone is that there's a chicken and
> > egg problem in that when the tools are built and installed, one doesn't
> > know how udev will be configured for umad. I agree that since the
> > default is to run as root, these should be in sbin rather than bin. Can
> > you file a bugzilla report for this (or do you want me to do it on your
> > behalf) ? Is this critical for OFED 1.1 ?
> >
> >   
> >>  At least an error message that umad access failed would be required.
> >>     
> > Those are scripts and the errors are being returned from the lower level
> > programs invoked but not by the scripts.
> >
> > Would you please file a bug for this as well (or let me know whether I
> > should do this) ? 
> >
> > Thanks.
> >
> > -- Hal
> >
> >   
> >> Regards,
> >> Mirko
> >>
> >> Hal Rosenstock schrieb:
> >>     
> >>> Hi Mirko,
> >>>
> >>> On Mon, 2006-09-18 at 06:59, Mirko Benz wrote:
> >>>   
> >>>       
> >>>> Hello,
> >>>>
> >>>> We are testing OFED-1.1-rc5 under Scientific Linux x86-64 (RHEL 4 clone).
> >>>> Some IB diagnostics tools e.g. ibhosts and ibswitches (located under 
> >>>> .../ofed/bin/)
> >>>> do not work with a normal user account -- no output given. It works as 
> >>>> root though.
> >>>>     
> >>>>         
> >>> It depends on how you have udev access for umad setup. With the default
> >>> setup for IB, root is required as these diagnostics send SMPs which
> >>> require umad access which is limited to root.
> >>>
> >>> -- Hal
> >>>
> >>>   
> >>>       
> >>>> Regards,
> >>>> Mirko
> >>>>
> 
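
A minimal sketch of the kind of check that would produce the missing
error message discussed in the quoted text, assuming a POSIX system.
check_umad_access() is a hypothetical helper, not part of the
diagnostics, and the device path a caller passes (for example
/dev/infiniband/umad0) is an assumption about the local udev setup:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns 0 if the calling user can read and write the device node,
 * otherwise prints a hint to stderr and returns -1.
 */
static int check_umad_access(const char *path)
{
	assert(path != NULL);

	if (access(path, R_OK | W_OK) == 0)
		return 0;

	fprintf(stderr,
		"cannot access %s: %s (umad access may require root or a udev rule)\n",
		path, strerror(errno));
	return -1;
}
```

A tool could call this before sending SMPs so that a normal user sees
why nothing is printed.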


From rdreier at cisco.com  Tue Sep 19 14:17:03 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 14:17:03 -0700
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <200609191814.k8JIESjd007174@robert.bartonsoftware.com> (
	Eric Barton's message of "Tue, 19 Sep 2006 19:14:28 +0100")
References: <200609191814.k8JIESjd007174@robert.bartonsoftware.com>
Message-ID: <ada1wq7lfow.fsf@cisco.com>

    Eric> If this last call to ib_poll_cq() finds something, it
    Eric> repeats the whole process - but can I be guaranteed another
    Eric> CQ callback in this case or is it indeterminate?

In general there is an unavoidable race, since you don't know whether
the new completion you find in the CQ was generated before or after
you requested notification.  So with the completion semantics as
defined in the IBA spec, you have the choice of two poisons:

 1) Don't poll after you request notification.  Then you run the risk
    of a completion being added after your last poll but before you
    requested notification.  If another completion never occurs, then
    you're stuck forever.

 2) Poll after you request notification.  Then you run the risk of
    having a completion added after your request for notification but
    before your final poll.  This means another completion event will
    be pending, but you will likely drain the CQ before you take the event.

However, Mellanox HCAs implement stronger semantics: they generate an
event if the CQ is not empty at the time notification is requested,
which closes the race between draining the CQ and requesting
notification.  This means *for Mellanox HCAs only* it is safe to do:

  completion_handler():
    poll CQ until empty
    request notification on CQ

with no additional poll after the request for notification.

I'll have more to say on this in the context of IPoIB and NAPI
shortly, since I've been thinking about this issue myself.

The ipath driver implements only the weaker semantics guaranteed by
the IBA spec -- ie an event is generated if a completion is added
after the request for notification.  And I don't know what ehca and
amso1100 implement to be honest.

(The Mellanox semantics are conforming though, since it's not
well-defined exactly when a completion is added to a CQ if no one looks...)
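
The two choices above, and the stronger Mellanox behaviour, can be
modelled with a toy CQ. This is a sketch under stated assumptions:
struct sim_cq, the sim_* helpers and the two enum values are
hypothetical names, and the single completion injected in each function
stands for the arrival that causes the race.

```c
#include <assert.h>
#include <stddef.h>

enum arm_semantics {
	ARM_NEXT_ONLY,		/* event only for a CQE added after arming */
	ARM_FIRE_IF_NONEMPTY	/* event also if the CQ is non-empty when armed */
};

struct sim_cq {
	enum arm_semantics sem;
	int pending;	/* CQEs not yet polled */
	int armed;	/* notification outstanding */
	int events;	/* completion events generated */
};

static void sim_add(struct sim_cq *cq)
{
	cq->pending++;
	if (cq->armed) {
		cq->armed = 0;
		cq->events++;
	}
}

static void sim_arm(struct sim_cq *cq)
{
	if (cq->sem == ARM_FIRE_IF_NONEMPTY && cq->pending > 0)
		cq->events++;
	else
		cq->armed = 1;
}

static void sim_drain(struct sim_cq *cq)
{
	cq->pending = 0;
}

/* Choice 1: no poll after arming; a CQE lands before the arm. */
static int option1_no_repoll(struct sim_cq *cq)
{
	assert(cq != NULL);
	sim_drain(cq);
	sim_add(cq);	/* after the last poll, before the arm */
	sim_arm(cq);
	return cq->events;
}

/* Choice 2: poll after arming; a CQE lands before the final poll. */
static int option2_repoll(struct sim_cq *cq)
{
	assert(cq != NULL);
	sim_drain(cq);
	sim_arm(cq);
	sim_add(cq);	/* after the arm, before the final poll */
	sim_drain(cq);
	return cq->events;
}
```

Under ARM_NEXT_ONLY, option1_no_repoll() ends with one CQE pending and
no event, which is the stuck case. option2_repoll() ends with an empty
CQ and one event still to be delivered, which matches a callback that
finds nothing to poll.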

 - R.


From bevans at ocf.co.uk  Tue Sep 19 14:36:43 2006
From: bevans at ocf.co.uk (Barry Evans)
Date: Tue, 19 Sep 2006 22:36:43 +0100
Subject: [openib-general] Fluent and OFED
Message-ID: <43185D89536AD545B065B7ECEA630065AF28@mailserver.ocf.co.uk>

Hello,

 
Has anyone had any luck getting Fluent 6.2 to cooperate with OFED? I
think I've got all the libraries pointing to the right place, but I'm
ending up with the dreaded: "[1] Abort: [0] Abort: mpirun: executable
version 1 does not match our version 3." from mvapich. Ugh.

 
Cheers,

Barry


From trimmer at silverstorm.com  Tue Sep 19 14:47:19 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Tue, 19 Sep 2006 17:47:19 -0400
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <ada1wq7lfow.fsf@cisco.com>
Message-ID: <D80D83302DEE6249A221093BF2BB69AE8EF34E@mail.silverstorm.com>

> From: Roland Dreier
> Sent: Tuesday, September 19, 2006 5:17 PM
> To: Eric Barton
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] Completion callback /teardown race
> 
> 
> I'll have more to say on this in the context of IPoIB and NAPI
> shortly, since I've been thinking about this issue myself.
> 
> The ipath driver implements only the weaker semantics guaranteed by
> the IBA spec -- ie an event is generated if a completion is added
> after the request for notification.  And I don't know what ehca and
> amso1100 implement to be honest.
> 
> (The Mellanox semantics are conforming though, since it's not
> well-defined exactly when a completion is added to a CQ if no one
> looks...)

An approach we implemented a few years ago in our proprietary stack was
a new verb (in addition to poll_cq and notify_req): poll_and_notify (we
called it iba_poll_and_rearm).

This verb always did a poll_cq, but if the CQ was drained it then did a
rearm of the CQ.  The return value from the call indicated what the next
step for the caller should be:
- SUCCESS - call poll_and_notify again (CQE returned)
- COMPLETED - nothing to do after this CQE (CQE returned, rearmed, no
need to poll anymore)
- POLL_NEEDED - loop on poll (CQE returned, rearmed, need to poll_cq until
empty)
- NOT_DONE - nothing more to do, no CQE (no CQE returned, rearmed, CQ
still empty, no need to poll anymore)
- error (invalid call, etc)

The callback would loop on poll_and_notify as long as SUCCESS was returned,
after which, if POLL_NEEDED had been returned, it would loop on poll_cq.

This approach provided 2 advantages:
1. for performance an extra 1-2 calls into the HCA driver per callback
were avoided.  The win here was saving some spin locks (in high CQE rate
drivers like IPoIB this was noticeable).
2. on HCAs such as Mellanox, POLL_NEEDED was never returned and the
caller never did unnecessary polls, however the caller and API were also
able to handle HCAs which did not have the Mellanox semantics.
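
A rough sketch of how a caller might use such a verb, under loud
assumptions: toy_poll_and_rearm(), struct toy_cq and the PAR_* codes are
hypothetical reconstructions from the description above, not the
proprietary iba_poll_and_rearm, and the strong_arm flag is an invented
way to model an HCA with the Mellanox semantics.

```c
#include <assert.h>
#include <stddef.h>

enum par_status { PAR_SUCCESS, PAR_COMPLETED, PAR_POLL_NEEDED, PAR_NOT_DONE };

struct toy_cq {
	int pending;	/* CQEs waiting */
	int armed;	/* notification requested */
	int strong_arm;	/* 1: arming a non-empty CQ still raises an event */
};

/* Poll one CQE; if that leaves the CQ empty, rearm and say what is next. */
static enum par_status toy_poll_and_rearm(struct toy_cq *cq, int *got_cqe)
{
	assert(cq != NULL && got_cqe != NULL);

	*got_cqe = 0;
	if (cq->pending > 0) {
		cq->pending--;
		*got_cqe = 1;
	}
	if (cq->pending > 0)
		return PAR_SUCCESS;	/* CQE returned, more to poll */

	cq->armed = 1;			/* CQ drained: rearm */
	if (!*got_cqe)
		return PAR_NOT_DONE;	/* nothing returned, nothing to do */
	return cq->strong_arm ? PAR_COMPLETED : PAR_POLL_NEEDED;
}

/* Callback body: loop while SUCCESS, then poll again only if told to. */
static int toy_callback(struct toy_cq *cq)
{
	enum par_status st;
	int handled = 0;
	int got;

	do {
		st = toy_poll_and_rearm(cq, &got);
		handled += got;
	} while (st == PAR_SUCCESS);

	if (st == PAR_POLL_NEEDED) {
		while (cq->pending > 0) {
			cq->pending--;
			handled++;
		}
	}
	return handled;
}
```

The caller never branches on the HCA type; the return code carries that
information, which is the portability point made above.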

Todd Rimmer


From rdreier at cisco.com  Tue Sep 19 15:04:42 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 19 Sep 2006 15:04:42 -0700
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <D80D83302DEE6249A221093BF2BB69AE8EF34E@mail.silverstorm.com>
	(Todd Rimmer's message of "Tue, 19 Sep 2006 17:47:19 -0400")
References: <D80D83302DEE6249A221093BF2BB69AE8EF34E@mail.silverstorm.com>
Message-ID: <adawt7zjyx1.fsf@cisco.com>

    Todd> An approach we implemented a few years ago in our
    Todd> proprietary stack was a new verb (in addition to poll_cq and
    Todd> notify_req): poll_and_notify (we called it
    Todd> iba_poll_and_rearm).

Makes sense but it doesn't actually help for NAPI for ipoib (to be
fair I haven't described the issue there yet).

 - R.


From ftillier at silverstorm.com  Tue Sep 19 15:27:39 2006
From: ftillier at silverstorm.com (Fabian Tillier)
Date: Tue, 19 Sep 2006 15:27:39 -0700
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <D80D83302DEE6249A221093BF2BB69AE8EF2DD@mail.silverstorm.com>
References: <200609191814.k8JIESjd007174@robert.bartonsoftware.com>
	<D80D83302DEE6249A221093BF2BB69AE8EF2DD@mail.silverstorm.com>
Message-ID: <79ae2f320609191527j6fbeeafbu74f7ba468cce1f73@mail.gmail.com>

On 9/19/06, Rimmer, Todd <trimmer at silverstorm.com> wrote:
> > From: Eric Barton
> > Sent: Tuesday, September 19, 2006 2:14 PM
> > To: openib-general at openib.org
> > Subject: [openib-general] Completion callback /teardown race
> >
> > All the CQ callback does is wake a thread to poll the queue.  This
> > effectively
> > keeps polling completions out of the CQ until it is empty. Then it
> > calls
> > ib_req_notify_cq(cq, IB_CQ_NEXT_COMP) and ib_poll_cq() 1 more time.
> >
> > If this last call to ib_poll_cq() finds something, it repeats the
> > whole process
> > - but can I be guaranteed another CQ callback in this case or is it
> > indeterminate?
> >
> The recommended algorithm would be:
>
> poll_cq until empty
> ib_req_notify_cq
> poll_cq until empty

Note that if you are going to poll after ib_req_notify_cq, you can
simplify the above algorithm and just do:

ib_req_notify_cq
poll_cq until empty

However, such an algorithm will result in extra CQ events on Mellanox
HCAs.  On HCAs where the new CQ event is only generated for new CQEs
it works just as well as the opposite, which works only on Mellanox
HCAs:

poll_cq until empty
ib_req_notify_cq

> There are some differences in HCA behaviour with regard to
> ib_req_notify_cq.  Mellanox HCAs will provide a callback/interrupt if
> the CQ is not empty at this point (in which case the poll_cq's after the
> notify are optional).
>
> However the behaviour defined in the IBTA spec indicates that
> ib_req_notify_cq will cause a callback/interrupt only on the next CQE
> which arrives, hence to be portable the poll_cq loop after
> ib_req_notify_cq is necessary to cover any CQEs which arrived between
> the prior poll and the ib_req_notify_cq.

I remember a while ago a mention that the behavior of the Mellanox
HCAs could be controlled in the firmware, so that they would follow
the IBTA spec defined behavior.

I don't know what the impact on performance would be if such a change
were made.  Perhaps someone from Mellanox can confirm/deny the HCAs'
ability to implement the IBA spec behavior, and quantify the effects.

- Fab


From rjwalsh at pathscale.com  Tue Sep 19 17:09:03 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 19 Sep 2006 17:09:03 -0700
Subject: [openib-general] gen2_basic patches
Message-ID: <4510869F.60309@pathscale.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

We've got some patches to gen2_basic to fix some problems with the test
suite.  Some are trivial (fix typos, etc.) and some are more serious
(handle max_qp counts correctly, etc.)  I'm going to be sending them out
piecemeal as we review them internally, and I'll make sure to send them
out in sequence (i.e. in the order they should be applied), so don't be
surprised to hear nothing for a day or two, then see some more patches ;-)

Regards,
 Robert.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iQEVAwUBRRCGnvzvnpzTd9fxAQKcegf/UtzQJiZFPRkcd4ZvBTHbNUdVK2NcQNkw
pAu/Mh2xRDboQ28btoJJbrERZ9VUpIlnyc8rQ2wRmDbkCQL/7vpDZkLK5XRYXZfg
DrwiXimRd8NHLfKVR/wbrR6QtuTDbIUpMWSpCFxkOoAYmKSRusjEoLK/Yf3gXggt
NsxoomFKSEPV3W2tgEn8Aanq0ZzfTPmBhFNbHPOrpyfb/tWFVc+IAQF/QFSai1Tm
PSjagRxTHY1eHCBHC7w1WZc7OOrSOBeKev5tzzcFO2PpzQ/3fAztcKRfDJ0UakIi
xvMOO+C0qM1EUowIRW+ymCoeFF5SXR6p2fuFeZ+vF6S6Sf9X1o7PLg==
=YULT
-----END PGP SIGNATURE-----


From rjwalsh at pathscale.com  Tue Sep 19 17:12:26 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 19 Sep 2006 17:12:26 -0700
Subject: [openib-general] gen2_basic patch 1/10: fix some minor typos
Message-ID: <4510876A.6070602@pathscale.com>

An embedded and charset-unspecified text was scrubbed...
Name: 01_typos.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060919/920c5224/attachment.ksh>

From rjwalsh at pathscale.com  Tue Sep 19 17:12:49 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 19 Sep 2006 17:12:49 -0700
Subject: [openib-general] gen2_basic patch 2/10: fix up some compiler
	warnings
Message-ID: <45108781.8060602@pathscale.com>

An embedded and charset-unspecified text was scrubbed...
Name: 02_warnings.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060919/7704fccd/attachment.ksh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 02_warnings.patch.sig
Type: application/octet-stream
Size: 280 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060919/7704fccd/attachment.obj>

From rjwalsh at pathscale.com  Tue Sep 19 17:26:21 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 19 Sep 2006 17:26:21 -0700
Subject: [openib-general] gen2_basic patch 3/10: fix is_global settings for
	AH attributes
Message-ID: <45108AAD.9040001@pathscale.com>

An embedded and charset-unspecified text was scrubbed...
Name: 03_global.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060919/dad133f3/attachment.ksh>

From rjwalsh at pathscale.com  Tue Sep 19 17:27:43 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 19 Sep 2006 17:27:43 -0700
Subject: [openib-general] gen2_basic patch 4/10: make sure the DLID is valid
Message-ID: <45108AFF.6070703@pathscale.com>

An embedded and charset-unspecified text was scrubbed...
Name: 04_valid_lids.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060919/49a4edf6/attachment.ksh>

From rjwalsh at pathscale.com  Tue Sep 19 17:28:31 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 19 Sep 2006 17:28:31 -0700
Subject: [openib-general] gen2_basic patch 5/10: select a valid port number
Message-ID: <45108B2F.8080207@pathscale.com>

An embedded and charset-unspecified text was scrubbed...
Name: 05_valid_port.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060919/cee8fdb3/attachment.ksh>

From rjwalsh at pathscale.com  Tue Sep 19 17:29:27 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 19 Sep 2006 17:29:27 -0700
Subject: [openib-general] gen2_basic patch 6/10: handle case where max_sge >
	100
Message-ID: <45108B67.1000606@pathscale.com>

An embedded and charset-unspecified text was scrubbed...
Name: 06_max_sge.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060919/8b89d14f/attachment.ksh>

From halr at voltaire.com  Tue Sep 19 18:05:53 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Sep 2006 21:05:53 -0400
Subject: [openib-general] gen2_basic patch 5/10: select a valid port
 number
In-Reply-To: <45108B2F.8080207@pathscale.com>
References: <45108B2F.8080207@pathscale.com>
Message-ID: <1158714353.4509.30709.camel@hal.voltaire.com>

On Tue, 2006-09-19 at 20:28, Robert Walsh wrote:
> gen2_basic - select a valid port number
> 
> Port numbers start at 1, not 0.

True for CA and routers but not switches.

> Signed-off by: Robert Walsh <robert.walsh at qlogic.com>
> 
> diff -rNu a/gen2_basic/test_poll_post.c b/gen2_basic/test_poll_post.c
> --- a/gen2_basic/test_poll_post.c	2006-09-13 19:09:47.410808000 -0700
> +++ b/gen2_basic/test_poll_post.c	2006-08-14 14:17:03.705821000 -0700
> @@ -283,7 +283,7 @@
>  			.dlid          = VL_range(rand_gen, 1, 0xffff),
>  			.sl            = VL_range(rand_gen, 0, 15),
>  			.src_path_bits = VL_range(rand_gen, 0, 0x8f),
> -			.port_num      = VL_random(rand_gen, device_attr.phys_port_cnt),
> +			.port_num      = VL_range(rand_gen, 1, device_attr.phys_port_cnt),
>  			.static_rate   = get_static_rate(1, rand_gen),
>  			.grh	       = {
>  				.traffic_class = VL_range(rand_gen, 1, 0xff),
> 


From rjwalsh at pathscale.com  Tue Sep 19 18:16:13 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 19 Sep 2006 18:16:13 -0700
Subject: [openib-general] gen2_basic patch 5/10: select a valid port
 number
In-Reply-To: <1158714353.4509.30709.camel@hal.voltaire.com>
References: <45108B2F.8080207@pathscale.com>
	<1158714353.4509.30709.camel@hal.voltaire.com>
Message-ID: <4510965D.4040103@pathscale.com>

Hal Rosenstock wrote:
> On Tue, 2006-09-19 at 20:28, Robert Walsh wrote:
>> gen2_basic - select a valid port number
>>
>> Port numbers start at 1, not 0.
> 
> True for CA and routers but not switches.

Yeah.  Does anyone run gen2_basic on switches, though?  I assumed it was
HCA-centric.

Regards,
 Robert.


From mst at mellanox.co.il  Tue Sep 19 21:26:18 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 20 Sep 2006 07:26:18 +0300
Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu
 selector for path queries
In-Reply-To: <ada64fjlhyf.fsf@cisco.com>
References: <ada64fjlhyf.fsf@cisco.com>
Message-ID: <20060920042618.GA1710@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries
> 
> I didn't really read the new patch before... anyway:
> 
> Why have you changed from the approach of just using the broadcast
> group's MTU?  As far as I can see, the issue being addressed here is
> purely theoretical anyway, but with the approach of taking the current
> device MTU, you now have to flush all the paths if the configured MTU
> changes, and you have to have a big switch in path_rec_start().
> 
>  - R.
> 

I'm not sure priv->broadcast is always initialized when we start
a path record query. Is there a reason why it is?

-- 
MST


From halr at voltaire.com  Tue Sep 19 21:39:48 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Sep 2006 00:39:48 -0400
Subject: [openib-general] gen2_basic patch 5/10: select a valid port
 number
In-Reply-To: <4510965D.4040103@pathscale.com>
References: <45108B2F.8080207@pathscale.com>
	<1158714353.4509.30709.camel@hal.voltaire.com>
	<4510965D.4040103@pathscale.com>
Message-ID: <1158727188.4509.39096.camel@hal.voltaire.com>

On Tue, 2006-09-19 at 21:16, Robert Walsh wrote:
> Hal Rosenstock wrote:
> > On Tue, 2006-09-19 at 20:28, Robert Walsh wrote:
> >> gen2_basic - select a valid port number
> >>
> >> Port numbers start at 1, not 0.
> > 
> > True for CA and routers but not switches.
> 
> Yeah.  Does anyone run gen2_basic on switches, though?  I assumed it was
> HCA-centric.

Yes, that appears to be the scope but I'm not 100% sure.

-- Hal

> Regards,
>  Robert.


From mst at mellanox.co.il  Tue Sep 19 21:58:06 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 20 Sep 2006 07:58:06 +0300
Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu
 selector for path queries
In-Reply-To: <adavenjlsae.fsf@cisco.com>
References: <20060919124546.GF32603@mellanox.co.il> <adavenjlsae.fsf@cisco.com>
Message-ID: <20060920045806.GE1710@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries
> 
> Seems OK from an anal spec compliance point of view, but I don't
> understand this:
> 
>  > This breaks IPoIB on networks with SM Tavor quirk activates.
> 
> Even if opensm returns a path record with a lower MTU, the underlying
> links still have a 2K mtu really, so nothing breaks.  IPoIB is just
> doing something naughty by ignoring the MTU in the path record.  So
> what breaks really?

Maybe "breaks" was too strong a word. Let's change that to
"This makes IPoIB behave in a naughty way on networks with SM Tavor quirk
active" :)

> (not to mention the fact that the "Tavor quirk" hasn't been accepted
> into OpenSM yet anyway)

AFAIK it has been accepted.

-- 
MST


From mst at mellanox.co.il  Tue Sep 19 22:01:11 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 20 Sep 2006 08:01:11 +0300
Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu
 selector for path queries
In-Reply-To: <20060920042618.GA1710@mellanox.co.il>
References: <ada64fjlhyf.fsf@cisco.com> <20060920042618.GA1710@mellanox.co.il>
Message-ID: <20060920050111.GF1710@mellanox.co.il>

Quoting r. Michael S. Tsirkin <mst at mellanox.co.il>:
> Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries
> 
> Quoting r. Roland Dreier <rdreier at cisco.com>:
> > Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries
> > 
> > I didn't really read the new patch before... anyway:
> > 
> > Why have you changed from the approach of just using the broadcast
> > group's MTU?  As far as I can see, the issue being addressed here is
> > purely theoretical anyway, but with the approach of taking the current
> > device MTU, you now have to flush all the paths if the configured MTU
> > changes, and you have to have a big switch in path_rec_start().
> > 
> >  - R.
> > 
> 
> I'm not sure priv->broadcast is always initialized when we start
> a path record query. Is there a reason why it is?

It also seemed kind of nice to be able to control the path MTU
from dev->mtu - and I don't think path flush on mtu change is an issue
from the performance POV.

What do you think?

-- 
MST
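
For illustration, a small sketch of what translating a byte count such as dev->mtu into an IB MTU code could look like without a large switch statement. The numeric encoding (256 -> 1 through 4096 -> 5) matches the IB MTU enumeration; the names mtu_code, mtu_code_to_bytes() and bytes_to_mtu_code() are hypothetical, and rounding up to the smallest MTU that fits is an assumption, not necessarily what the patch under discussion does.

```c
#include <assert.h>

/* Same numeric encoding as the IB MTU enumeration (256 -> 1 ... 4096 -> 5). */
enum mtu_code { MTU_256 = 1, MTU_512, MTU_1024, MTU_2048, MTU_4096 };

/* Size in bytes for a given MTU code: 128 << 1 == 256, 128 << 5 == 4096. */
static int mtu_code_to_bytes(enum mtu_code code)
{
    assert(code >= MTU_256 && code <= MTU_4096);
    return 128 << code;
}

/*
 * Smallest MTU code whose size is at least 'bytes', or -1 if even a
 * 4096-byte MTU is too small.  Rounding up is an assumption of this sketch.
 */
static int bytes_to_mtu_code(int bytes)
{
    int code;

    for (code = MTU_256; code <= MTU_4096; code++)
        if (mtu_code_to_bytes((enum mtu_code)code) >= bytes)
            return code;
    return -1;
}
```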


From mst at mellanox.co.il  Tue Sep 19 22:05:30 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 20 Sep 2006 08:05:30 +0300
Subject: [openib-general] gen2_basic patch 5/10: select a valid port
	number
In-Reply-To: <1158727188.4509.39096.camel@hal.voltaire.com>
References: <45108B2F.8080207@pathscale.com>
	<1158714353.4509.30709.camel@hal.voltaire.com>
	<4510965D.4040103@pathscale.com>
	<1158727188.4509.39096.camel@hal.voltaire.com>
Message-ID: <20060920050530.GG1710@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: Re: gen2_basic patch 5/10: select a valid port number
> 
> On Tue, 2006-09-19 at 21:16, Robert Walsh wrote:
> > Hal Rosenstock wrote:
> > > On Tue, 2006-09-19 at 20:28, Robert Walsh wrote:
> > >> gen2_basic - select a valid port number
> > >>
> > >> Port numbers start at 1, not 0.
> > > 
> > > True for CA and routers but not switches.
> > 
> > Yeah.  Does anyone run gen2_basic on switches, though?  I assumed it was
> > HCA-centric.
> 
> Yes, that appears to be the scope but I'm not 100% sure.

It's easy to get Linux running on a switch, so why not? You just
need to write a low-level driver that can send/receive MADs.
We did run a gen1 port on a switch at some point, and someone might want to
do it again.

-- 
MST


From mst at mellanox.co.il  Tue Sep 19 22:14:20 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 20 Sep 2006 08:14:20 +0300
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <79ae2f320609191527j6fbeeafbu74f7ba468cce1f73@mail.gmail.com>
References: <200609191814.k8JIESjd007174@robert.bartonsoftware.com>
	<D80D83302DEE6249A221093BF2BB69AE8EF2DD@mail.silverstorm.com>
	<79ae2f320609191527j6fbeeafbu74f7ba468cce1f73@mail.gmail.com>
Message-ID: <20060920051420.GH1710@mellanox.co.il>

Quoting r. Fabian Tillier <ftillier at silverstorm.com>:
> > There are some differences in HCA behaviour with regard to
> > ib_req_notify_cq.  Mellanox HCAs will provide a callback/interrupt if
> > the CQ is not empty at this point (in which case the poll_cq's after the
> > notify are optional).
> >
> > However the behaviour defined in the IBTA spec indicates that
> > ib_req_notify_cq will cause a callback/interrupt only on the next CQE
> > which arrives, hence to be portable the poll_cq loop after
> > ib_req_notify_cq is necessary to cover any CQEs which arrived between
> > the prior poll and the ib_req_notify_cq.
> 
> I remember a while ago a mention that the behavior of the Mellanox
> HCAs could be controlled in the firmware, so that they would follow
> the IBTA spec defined behavior.

There's a mistake here. Mellanox HCAs will generate an event upon
ib_req_notify_cq only if new completions have arrived after the previous event
was reported.

AFAIK this is IBTA spec compliant.

-- 
MST


From rjwalsh at pathscale.com  Tue Sep 19 22:42:23 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Tue, 19 Sep 2006 22:42:23 -0700
Subject: [openib-general] gen2_basic patch 5/10: select a valid port
	number
In-Reply-To: <20060920050530.GG1710@mellanox.co.il>
References: <45108B2F.8080207@pathscale.com>
	<1158714353.4509.30709.camel@hal.voltaire.com>
	<4510965D.4040103@pathscale.com>
	<1158727188.4509.39096.camel@hal.voltaire.com>
	<20060920050530.GG1710@mellanox.co.il>
Message-ID: <4510D4BF.30907@pathscale.com>

> It's easy to get Linux running on a switch, so why not? You just
> need to write a low-level driver that can send/receive MADs.
> We did run a gen1 port on a switch at some point, and someone might want to
> do it again.

OK - that's a fine project idea, but I'm not about to start coding it up 
any time soon :-)

In any case, if we're going to insist that this test run on a 
hypothetical switch gen2 distribution, then the "choose a random port" 
code needs to check if it's running on a CA or router versus a switch 
and choose the port range appropriately.

Regards,
  Robert.
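
For illustration, a minimal sketch of a "choose a random port" helper that takes the node type into account. The node_type enum and the pick_port() helper are hypothetical and are not part of gen2_basic; the sketch also assumes that the valid range on a switch is 0..phys_port_cnt, which would need to be checked against what the switch actually exposes.

```c
#include <assert.h>

/* Hypothetical node types; not taken from the verbs headers. */
enum node_type { NODE_CA, NODE_ROUTER, NODE_SWITCH };

/* Port numbering starts at 1 on CAs and routers, and at 0 on switches. */
static int first_valid_port(enum node_type type)
{
    return type == NODE_SWITCH ? 0 : 1;
}

/*
 * Map a raw random value onto [first_valid_port, phys_port_cnt].
 * The upper bound for switches is an assumption of this sketch.
 */
static int pick_port(enum node_type type, int phys_port_cnt, unsigned int rnd)
{
    int lo = first_valid_port(type);

    assert(phys_port_cnt >= lo);
    return lo + (int)(rnd % (unsigned int)(phys_port_cnt - lo + 1));
}
```

With a two-port CA this only ever yields 1 or 2, while the same call for a switch may also yield 0.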


From mst at mellanox.co.il  Tue Sep 19 23:02:04 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 20 Sep 2006 09:02:04 +0300
Subject: [openib-general] [PATCH] mthca: fix lid used for sending traps
In-Reply-To: <adairjjlljk.fsf@cisco.com>
References: <20060919081324.GF31498@mellanox.co.il> <adairjjlljk.fsf@cisco.com>
Message-ID: <20060920060204.GA2870@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] mthca: fix lid used for sending traps
> 
>  > I'm taking the fix into OFED 1.1 and I think it should go into 2.6.18 or
>  > 2.6.18.1.
> 
> Makes sense -- I'll try to get this into 2.6.18, since it's a
> one-liner and fixes a regression from 2.6.17.

Arrr!
http://lkml.org/lkml/2006/9/20/2

Missed 2.6.18 by a small margin. Gar! Acked for 2.6.18.1?

-- 
MST


From kliteyn at dev.mellanox.co.il  Tue Sep 19 23:36:52 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Wed, 20 Sep 2006 09:36:52 +0300
Subject: [openib-general] [PATCH] osm: fixing bugs in osmtest
Message-ID: <4510E184.8070900@dev.mellanox.co.il>

Hi Hal

I'm doing a major review of osmtest.
This patch fixes a few bugs in osmtest where failures
were ignored. More precisely, osmtest was expecting an error,
but got IB_SUCCESS and ignored the fact that it should have
gotten an error.
There are also a few changes to improve the code and the osmtest
log readability.
More patches are expected.

This patch is for trunk only.

I tested applying this patch before sending it. If you get the
patch rejected again - let me know.

Thanks.

Yevgeny

Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Index: osmtest/include/osmtest.h
===================================================================
--- osmtest/include/osmtest.h   (revision 9552)
+++ osmtest/include/osmtest.h   (working copy)
@@ -506,4 +506,13 @@ ib_api_status_t
  osmtest_get_local_port_lmc( IN osmtest_t * const p_osmt,
                              IN ib_net16_t  lid,
                              OUT uint8_t *  const p_lmc );
+
+
+/*
+ * A few auxiliary macros for logging
+ */
+
+#define EXPECTING_ERRORS_START "[[ ===== Expecting Errors - START ===== "
+#define EXPECTING_ERRORS_END   "   ===== Expecting Errors  -  END ===== ]]"
+
  #endif /* _OSMTEST_H_ */
Index: osmtest/osmtest.c
===================================================================
--- osmtest/osmtest.c   (revision 9552)
+++ osmtest/osmtest.c   (working copy)
@@ -552,6 +552,7 @@ osmtest_init( IN osmtest_t * const p_osm
      osm_log( &p_osmt->log, OSM_LOG_ERROR,
               "osmtest_init: ERR 0001: "
               "Unable to allocate vendor object" );
+    status = IB_ERROR;
      goto Exit;
    }

@@ -1817,6 +1818,11 @@ osmtest_wrong_sm_key_ignored( IN osmtest
      osm_log( &p_osmt->log, OSM_LOG_ERROR,
               "osmtest_wrong_sm_key_ignored: ERR 0011: "
               "Did not get a timeout but got (%s)\n", ib_get_err_str( status ) );
+    if ( status == IB_SUCCESS )
+    {
+      /* assign some error value to status, since IB_SUCCESS is a bad rc */
+      status = IB_ERROR;
+    }
      goto Exit;
    }
    else
@@ -5448,14 +5454,23 @@ osmtest_validate_against_db( IN osmtest_

    memset( &context, 0, sizeof( context ) );
    memset( &request, 0, sizeof( request ) );
+
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" );
    status = osmtest_get_multipath_rec( p_osmt, &request, &context );
+  if( status != IB_SUCCESS )
+  {
+     osm_log( &p_osmt->log, OSM_LOG_ERROR,
+              "osmtest_get_multipath_rec: "
+              "Got error %s\n", ib_get_err_str(status) );
+  }
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" );
+
    if( status == IB_SUCCESS )
-    goto Exit;
-  else
    {
-    osm_log( &p_osmt->log, OSM_LOG_ERROR,
-             "osmtest_get_multipath_rec: "
-             "IS EXPECTED ERROR ^^^^\n");
+    status = IB_ERROR;
+    goto Exit;
    }

    memset( &context, 0, sizeof( context ) );
@@ -5463,14 +5478,23 @@ osmtest_validate_against_db( IN osmtest_
    request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT;
    request.sgid_count = 1;
    ib_gid_set_default( &request.gids[0], portguid );
+
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" );
    status = osmtest_get_multipath_rec( p_osmt, &request, &context );
-  if( status == IB_SUCCESS )
-    goto Exit;
-  else
+  if( status != IB_SUCCESS )
    {
-    osm_log( &p_osmt->log, OSM_LOG_ERROR,
-             "osmtest_get_multipath_rec: "
-             "IS EXPECTED ERROR ^^^^\n");
+     osm_log( &p_osmt->log, OSM_LOG_ERROR,
+              "osmtest_get_multipath_rec: "
+              "Got error %s\n", ib_get_err_str(status) );
+  }
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" );
+
+  if( status == IB_SUCCESS )
+  {
+    status = IB_ERROR;
+    goto Exit;
    }

    memset( &context, 0, sizeof( context ) );
@@ -5482,14 +5506,23 @@ osmtest_validate_against_db( IN osmtest_
    /* Set IPoIB broadcast MGID */
    request.gids[1].unicast.prefix = CL_HTON64(0xff12401bffff0000ULL);
    request.gids[1].unicast.interface_id = CL_HTON64(0x00000000ffffffffULL);
+
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" );
    status = osmtest_get_multipath_rec( p_osmt, &request, &context );
+  if( status != IB_SUCCESS )
+  {
+     osm_log( &p_osmt->log, OSM_LOG_ERROR,
+              "osmtest_get_multipath_rec: "
+              "Got error %s\n", ib_get_err_str(status) );
+  }
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" );
+
    if( status == IB_SUCCESS )
-    goto Exit;
-  else
    {
-    osm_log( &p_osmt->log, OSM_LOG_ERROR,
-             "osmtest_get_multipath_rec: "
-             "IS EXPECTED ERROR ^^^^\n");
+    status = IB_ERROR;
+    goto Exit;
    }

    memset( &context, 0, sizeof( context ) );
@@ -5500,14 +5533,23 @@ osmtest_validate_against_db( IN osmtest_
    request.gids[0].unicast.prefix = CL_HTON64(0xff12401bffff0000ULL);
    request.gids[0].unicast.interface_id = CL_HTON64(0x00000000ffffffffULL);
    ib_gid_set_default( &request.gids[1], portguid );
+
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" );
    status = osmtest_get_multipath_rec( p_osmt, &request, &context );
+  if( status != IB_SUCCESS )
+  {
+     osm_log( &p_osmt->log, OSM_LOG_ERROR,
+              "osmtest_get_multipath_rec: "
+              "Got error %s\n", ib_get_err_str(status) );
+  }
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" );
+
    if( status == IB_SUCCESS )
-    goto Exit;
-  else
    {
-    osm_log( &p_osmt->log, OSM_LOG_ERROR,
-             "osmtest_get_multipath_rec_gid_ipoib_bcast: "
-             "IS EXPECTED ERROR ^^^^\n");
+    status = IB_ERROR;
+    goto Exit;
    }

    memset( &context, 0, sizeof( context ) );
@@ -5569,14 +5611,23 @@ osmtest_validate_against_db( IN osmtest_
      goto Exit;

    memset( &context, 0, sizeof( context ) );
+
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_pkeytbl_rec_by_lid: " EXPECTING_ERRORS_START "\n" );
    status = osmtest_get_pkeytbl_rec_by_lid( p_osmt, test_lid, 0, &context );
-  if ( status == IB_SUCCESS )
-    goto Exit;
-  else
+  if( status != IB_SUCCESS )
    {
-    osm_log( &p_osmt->log, OSM_LOG_ERROR,
-             "osmtest_get_pkeytbl_rec_by_lid: "
-             "IS EXPECTED ERROR ^^^^\n");
+     osm_log( &p_osmt->log, OSM_LOG_ERROR,
+              "osmtest_get_pkeytbl_rec_by_lid: "
+              "Got error %s\n", ib_get_err_str(status) );
+  }
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_pkeytbl_rec_by_lid: " EXPECTING_ERRORS_END "\n" );
+
+  if( status == IB_SUCCESS )
+  {
+    status = IB_ERROR;
+    goto Exit;
    }

    memset( &context, 0, sizeof( context ) );
@@ -5679,26 +5730,43 @@ osmtest_validate_against_db( IN osmtest_
          goto Exit;

        memset( &context, 0, sizeof( context ) );
+      osm_log( &p_osmt->log, OSM_LOG_ERROR,
+               "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_START "\n" );
        status = osmtest_get_path_rec_by_lid_pair( p_osmt, 0xffff,
                                                   0xffff, &context );
-      if (status == IB_SUCCESS )
-        goto Exit;
-      else
+      if( status != IB_SUCCESS )
        {
-        osm_log ( &p_osmt->log, OSM_LOG_ERROR,
+         osm_log( &p_osmt->log, OSM_LOG_ERROR,
                    "osmtest_get_path_rec_by_lid_pair: "
-                  "IS EXPECTED ERROR ^^^^\n" );
+                  "Got error %s\n", ib_get_err_str(status) );
+      }
+      osm_log( &p_osmt->log, OSM_LOG_ERROR,
+               "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_END "\n" );
+
+      if( status == IB_SUCCESS )
+      {
+        status = IB_ERROR;
+        goto Exit;
        }

+      osm_log( &p_osmt->log, OSM_LOG_ERROR,
+               "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_START "\n" );
+
        status = osmtest_get_path_rec_by_lid_pair( p_osmt, test_lid,
                                                   0xffff, &context );
-      if (status == IB_SUCCESS )
-        goto Exit;
-      else
+      if( status != IB_SUCCESS )
        {
-        osm_log ( &p_osmt->log, OSM_LOG_ERROR,
+         osm_log( &p_osmt->log, OSM_LOG_ERROR,
                    "osmtest_get_path_rec_by_lid_pair: "
-                  "IS EXPECTED ERROR ^^^^\n" );
+                  "Got error %s\n", ib_get_err_str(status) );
+      }
+      osm_log( &p_osmt->log, OSM_LOG_ERROR,
+               "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_END "\n" );
+
+      if( status == IB_SUCCESS )
+      {
+        status = IB_ERROR;
+        goto Exit;
        }
      }
    }
@@ -7141,6 +7209,9 @@ osmtest_run( IN osmtest_t * const p_osmt

    if( p_osmt->opt.flow == 1 )
    {
+    /*
+     * Creating an inventory file with all nodes, ports and paths
+     */
      status = osmtest_create_inventory_file( p_osmt );
      if( status != IB_SUCCESS )
      {
@@ -7155,6 +7226,9 @@ osmtest_run( IN osmtest_t * const p_osmt
    {
      if( p_osmt->opt.flow == 5 )
      {
+      /*
+       * Stress SA - flood it with queries
+       */
        switch ( p_osmt->opt.stress )
        {
          case 0:
@@ -7215,8 +7289,11 @@ osmtest_run( IN osmtest_t * const p_osmt
        /*
         * Run normal validition tests.
         */
-       if (!p_osmt->opt.flow || p_osmt->opt.flow == 2)
+       if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 2)
         {
+         /*
+          * Only validate the given inventory file
+          */
           status = osmtest_create_db( p_osmt );
           if( status != IB_SUCCESS )
           {
@@ -7238,7 +7315,7 @@ osmtest_run( IN osmtest_t * const p_osmt
           }
         }

-       if (!p_osmt->opt.flow)
+       if (p_osmt->opt.flow == 0)
         {
           status = osmtest_wrong_sm_key_ignored( p_osmt );
           if( status != IB_SUCCESS )
@@ -7251,8 +7328,11 @@ osmtest_run( IN osmtest_t * const p_osmt
           }
         }

-       if (!p_osmt->opt.flow || p_osmt->opt.flow == 3)
+       if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 3)
         {
+         /*
+          * run service registration, deregistration, and lease test
+          */
           status = osmt_run_service_records_flow( p_osmt );
           if( status != IB_SUCCESS )
           {
@@ -7264,8 +7344,11 @@ osmtest_run( IN osmtest_t * const p_osmt
           }
         }

-       if (!p_osmt->opt.flow || p_osmt->opt.flow == 4)
+       if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 4)
         {
+          /*
+           * Run event forwarding test
+           */
  #ifdef OSM_VENDOR_INTF_MTL
            status = osmt_run_inform_info_flow( p_osmt );

@@ -7286,12 +7369,13 @@ osmtest_run( IN osmtest_t * const p_osmt
  #endif
          }

-        /*
-         * since it generates a huge file, we run it only
-         * if explicitly required to
-         */
          if (p_osmt->opt.flow == 7)
          {
+          /*
+           * QoS info: dump VLArb and SLtoVL tables.
+           * Since it generates a huge file, we run it only
+           * if explicitly required to
+           */
            status = osmtest_create_db( p_osmt );
            if( status != IB_SUCCESS )
            {
@@ -7315,6 +7399,9 @@ osmtest_run( IN osmtest_t * const p_osmt

          if (p_osmt->opt.flow == 8)
          {
+          /*
+           * Run trap 64/65 flow
+           */
  #ifdef OSM_VENDOR_INTF_MTL
            status = osmt_run_trap64_65_flow( p_osmt  );
            if( status != IB_SUCCESS )
@@ -7334,8 +7421,11 @@ osmtest_run( IN osmtest_t * const p_osmt
  #endif
          }

-        if (!p_osmt->opt.flow || p_osmt->opt.flow == 6)
+        if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 6)
          {
+          /*
+           * Multicast flow
+           */
            status = osmt_run_mcast_flow( p_osmt );
            if( status != IB_SUCCESS )
            {


From dotanb at dev.mellanox.co.il  Wed Sep 20 00:40:40 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Wed, 20 Sep 2006 10:40:40 +0300
Subject: [openib-general] gen2_basic patches
In-Reply-To: <4510869F.60309@pathscale.com>
References: <4510869F.60309@pathscale.com>
Message-ID: <4510F078.4030401@dev.mellanox.co.il>

Robert Walsh wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi all,
>
> We've got some patches to gen2_basic to fix some problems with the test
> suite.  Some are trivial (fix typos, etc.) and some are more serious
> (handle max_qp counts correctly, etc.)  I'm going to be sending them out
> piecemeal as we review them internally, and I'll make sure to send them
> out in sequence (i.e. in the order they should be applied), so don't be
> surprised to hear nothing for a day or two, then see some more patches ;-)
>
> Regards,
>  Robert.
>   
Thank you (in advance) for all of the patches that you will send us.

I will take the patches (and maybe modify them a little bit) and check 
the final fixed version into the openib svn.

Thanks again.
Dotan


From ogerlitz at voltaire.com  Wed Sep 20 01:11:09 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 20 Sep 2006 11:11:09 +0300
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <200609191814.k8JIESjd007174@robert.bartonsoftware.com>
References: <200609191814.k8JIESjd007174@robert.bartonsoftware.com>
Message-ID: <4510F79D.8010203@voltaire.com>

Eric Barton wrote:
> I create 1 CQ just for receive completions on each of my QPs.  When I tear down
> the QP, I rdma_disconnect(), change the QP state to IB_QPS_ERR and then wait
> for all currently posted receives to complete.

I understand your driver is a CMA consumer whose QP state transitions 
are carried out by the CMA. So you need ***not*** modify the QP state to 
error, as the CMA does it for you in rdma_disconnect() before sending 
the DREQ or DREP.

Please note that you need to call rdma_disconnect() on both sides: the 
side that initiates the disconnection and also the side that receives the 
DREQ, i.e. the side that suddenly gets an RDMA_CM_EVENT_DISCONNECTED event 
(note that the disconnection initiator also gets this event, and if you 
call rdma_disconnect() again there, it's not going to break anything, I think).

Is it possible that the manual QP modify to error in your code actually 
covered the latter case, where you did not call rdma_disconnect()?

Or.
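
For illustration, a minimal sketch of the "disconnect on both sides" handling described above. It is a mock and not the real RDMA CM API: struct mock_conn, mock_rdma_disconnect() and mock_cm_event_handler() are hypothetical stand-ins, and treating a second disconnect call as harmless is an assumption taken from the "I think" above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes; only loosely modelled on RDMA_CM_EVENT_*. */
enum mock_cm_event { MOCK_EVENT_ESTABLISHED, MOCK_EVENT_DISCONNECTED };

/* Hypothetical connection state; not the real struct rdma_cm_id. */
struct mock_conn {
    bool connected;
    int disconnect_calls;
};

/* Stand-in for rdma_disconnect(); assumed to be harmless if called twice. */
static void mock_rdma_disconnect(struct mock_conn *conn)
{
    conn->disconnect_calls++;
    conn->connected = false;
}

/*
 * Event handler run on either side.  On a disconnect event it always calls
 * the disconnect routine, so the side that merely received the DREQ also
 * gets its connection torn down without touching the QP state by hand.
 */
static void mock_cm_event_handler(struct mock_conn *conn,
                                  enum mock_cm_event ev)
{
    assert(conn != NULL);
    switch (ev) {
    case MOCK_EVENT_ESTABLISHED:
        conn->connected = true;
        break;
    case MOCK_EVENT_DISCONNECTED:
        mock_rdma_disconnect(conn);
        break;
    }
}
```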


From dotanb at dev.mellanox.co.il  Wed Sep 20 02:15:59 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Wed, 20 Sep 2006 12:15:59 +0300
Subject: [openib-general] gen2_basic patches
In-Reply-To: <4510F078.4030401@dev.mellanox.co.il>
References: <4510869F.60309@pathscale.com>
	<4510F078.4030401@dev.mellanox.co.il>
Message-ID: <451106CF.10007@dev.mellanox.co.il>

Dotan Barak wrote:
> Robert Walsh wrote:
>   
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi all,
>>
>> We've got some patches to gen2_basic to fix some problems with the test
>> suite.  Some are trivial (fix typos, etc.) and some are more serious
>> (handle max_qp counts correctly, etc.)  I'm going to be sending them out
>> piecemeal as we review them internally, and I'll make sure to send them
>> out in sequence (i.e. in the order they should be applied), so don't be
>> surprised to hear nothing for a day or two, then see some more patches ;-)
>>
>> Regards,
>>  Robert.
>>   
>>     
> Thank you (in advance) for all of the patches that you will send us.
>
> I will take the patches (and maybe modify them a little bit) and check 
> the final fixed version into the openib svn.
>
> Thanks again.
> Dotan
>   
I applied all of the fixes that you sent me (1..6).
I would be happy if the next patches were based on the latest test 
version that I have just committed.

Thanks
Dotan


From ogerlitz at voltaire.com  Wed Sep 20 02:23:22 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 20 Sep 2006 12:23:22 +0300
Subject: [openib-general] [PATCH] IB/Kconfig: add help text and change
 CMA config name
In-Reply-To: <adazmcvlset.fsf@cisco.com>
References: <Pine.LNX.4.64.0609141005480.7597@zuben>
	<aday7smwjmy.fsf@cisco.com> <450D36E9.1000502@voltaire.com>
	<aday7shp8oq.fsf@cisco.com> <450FE742.7040005@voltaire.com>
	<adazmcvlset.fsf@cisco.com>
Message-ID: <4511088A.6000108@voltaire.com>

Roland Dreier wrote:
>     Or> I am fine with having the CMA config selected whenever someone
>     Or> selects INFINIBAND so adding the help text and making it
>     Or> visible are not a must per my taste. However, are you fine
>     Or> with changing the **name** of the config directive to
>     Or> CONFIG_INFINIBAND_RDMA_CM so its better understood?
> 
> No, since really what it is controlling is the ib_addr module.

Just for the record, it controls the build of both the ib_addr and 
rdma_cm modules; RDMA address resolution is one part of the overall 
RDMA communication management handled by the rdma_cm module.

Anyway, if you prefer to leave the config name as is, let it be.

>     Or> As Erez wrote you on the other thread, we must depend on the
>     Or> CMA else a user running make rndconfig would be able to
>     Or> produce a config file where INFINIBAND is selected but the CMA
>     Or> (RDMA_ADDR_TRANS) config is not selected so linkage will fail.
> 
> How?  make randconfig won't produce invalid configurations.

I think I got it (at last): if INFINIBAND is selected, it causes the 
selection of INFINIBAND_ADDR_TRANS as long as INET is selected. So if 
something (e.g. iSER) depends on INFINIBAND and INET, make randconfig 
would do the job of selecting both of them when it selects iSER, correct?

Or.


From halr at voltaire.com  Wed Sep 20 03:11:19 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Sep 2006 06:11:19 -0400
Subject: [openib-general] gen2_basic patch 5/10: select a valid port
	number
In-Reply-To: <20060920050530.GG1710@mellanox.co.il>
References: <45108B2F.8080207@pathscale.com>
	<1158714353.4509.30709.camel@hal.voltaire.com>
	<4510965D.4040103@pathscale.com>
	<1158727188.4509.39096.camel@hal.voltaire.com>
	<20060920050530.GG1710@mellanox.co.il>
Message-ID: <1158747078.4509.52336.camel@hal.voltaire.com>

On Wed, 2006-09-20 at 01:05, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > Subject: Re: gen2_basic patch 5/10: select a valid port number
> > 
> > On Tue, 2006-09-19 at 21:16, Robert Walsh wrote:
> > > Hal Rosenstock wrote:
> > > > On Tue, 2006-09-19 at 20:28, Robert Walsh wrote:
> > > >> gen2_basic - select a valid port number
> > > >>
> > > >> Port numbers start at 1, not 0.
> > > > 
> > > > True for CA and routers but not switches.
> > > 
> > > Yeah.  Does anyone run gen2_basic on switches, though?  I assumed it was
> > > HCA-centric.
> > 
> > Yes, that appears to be the scope but I'm not 100% sure.
> 
> Its easy to get linux running on a switch, so why not? You just
> need to write a low level driver that cn send/receve MADs.
> We did run a gen1 port on a switch at some point, and someone might want to
> do it again.

And the only limitation would be what switch port 0 (extended, base)
supports relative to these tests.
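
The port numbering point in this thread can be sketched as a small helper.
This is only an illustration under stated assumptions: enum node_kind,
struct port_range and port_range_for() are hypothetical names, not part of
gen2_basic or the verbs API, and the sketch assumes that a test on a switch
would address port 0 only.

```c
#include <assert.h>

/* Hypothetical node kinds for illustration; not the kernel's enum. */
enum node_kind { NODE_CA, NODE_ROUTER, NODE_SWITCH };

/* Inclusive range of port numbers a test would iterate over. */
struct port_range {
	int first;
	int last;
};

/*
 * CAs and routers number their ports from 1 up to the physical port
 * count. A switch exposes port 0; this sketch assumes that port 0 is the
 * only switch port a verbs-level test would touch.
 */
static struct port_range port_range_for(enum node_kind kind, int phys_port_cnt)
{
	assert(phys_port_cnt >= 0);

	struct port_range r;
	if (kind == NODE_SWITCH) {
		r.first = 0;
		r.last = 0;
	} else {
		r.first = 1;
		r.last = phys_port_cnt;
	}
	return r;
}
```

A test loop would then run from r.first to r.last inclusive instead of
hard-coding either 0 or 1 as the starting port.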

-- Hal


From eeb at bartonsoftware.com  Wed Sep 20 03:25:00 2006
From: eeb at bartonsoftware.com (Eric Barton)
Date: Wed, 20 Sep 2006 11:25:00 +0100
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <4510F79D.8010203@voltaire.com>
Message-ID: <051b01c6dc9f$07900e70$0281a8c0@ebpc>

Or,

> Eric Barton wrote:
> > I create 1 CQ just for receive completions on each of my QPs.  When I
> > tear down the QP, I rdma_disconnect(), change the QP state to
> > IB_QPS_ERR and then wait for all currently posted receives to complete.
> 
> I understand your driver is a CMA consumer whose QP state transitions 
> are carried out by the CMA. So you need ***not*** modify the QP state to 
> error, as the CMA does it for you in rdma_disconnect() before sending 
> the DREQ or DREP.

Yes - understood.  It's not actually harmful at this point I think.  Please
correct me if I'm wrong.

> Please note that you need to call rdma_disconnect() in both sides, the 
> one that initiates the disconnection but also on the side that gets the 
> DREQ, that is suddenly gets RDMA_CM_EVENT_DISCONNECTED event...

I ensure I always call rdma_disconnect() once, no matter whether I am the
initiator or not.
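
One way such a "call it exactly once" guarantee could be built is sketched
below in user-space C11. It is an illustration, not the code of the driver
discussed here: struct conn, conn_init(), conn_disconnect_once() and
mock_rdma_disconnect() are hypothetical names, the mock merely counts calls
in place of the real rdma_disconnect(), and kernel code would use the
kernel's own atomic primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection state; not a CMA structure. */
struct conn {
	atomic_bool disconnect_started; /* set by the first caller only */
	int disconnect_calls;           /* how often the stand-in below ran */
};

static void conn_init(struct conn *c)
{
	assert(c != NULL);
	atomic_init(&c->disconnect_started, false);
	c->disconnect_calls = 0;
}

/* Stand-in for rdma_disconnect(); it only counts invocations. */
static void mock_rdma_disconnect(struct conn *c)
{
	c->disconnect_calls++;
}

/*
 * Meant to be called from both the local teardown path and the handler
 * for RDMA_CM_EVENT_DISCONNECTED. Only the first caller issues the
 * disconnect; returns true if this call was the one that did.
 */
static bool conn_disconnect_once(struct conn *c)
{
	assert(c != NULL);
	if (atomic_exchange(&c->disconnect_started, true))
		return false; /* somebody else already started the disconnect */
	mock_rdma_disconnect(c);
	return true;
}
```

With this guard it does not matter whether the local side or the peer
starts the teardown; whichever path runs first performs the disconnect.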

                Cheers,
                        Eric


From ogerlitz at voltaire.com  Wed Sep 20 05:29:31 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Wed, 20 Sep 2006 15:29:31 +0300
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <051b01c6dc9f$07900e70$0281a8c0@ebpc>
References: <051b01c6dc9f$07900e70$0281a8c0@ebpc>
Message-ID: <4511342B.8090601@voltaire.com>

>> I understand your driver is a CMA consumer whose QP state transitions 
>> are carried out by the CMA. So you need ***not*** modify the QP state to 
>> error, as the CMA does it for you in rdma_disconnect() before sending 
>> the DREQ or DREP.

> Yes - understood.  It's not actually harmful at this point I think.  Please
> correct me if I'm wrong.

It's not harmful, but it's not needed, so it can confuse people 
reading or debugging this code...

>> Please note that you need to call rdma_disconnect() in both sides, the 
>> one that initiates the disconnection but also on the side that gets the 
>> DREQ, that is suddenly gets RDMA_CM_EVENT_DISCONNECTED event...

> I ensure I always call rdma_disconnect() once, no matter whether I am the
> initiator or not.

cool.

Or.


From halr at voltaire.com  Wed Sep 20 05:38:18 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Sep 2006 08:38:18 -0400
Subject: [openib-general] [PATCH] osm: fixing bugs in osmtest
In-Reply-To: <4510E184.8070900@dev.mellanox.co.il>
References: <4510E184.8070900@dev.mellanox.co.il>
Message-ID: <1158755895.4509.58062.camel@hal.voltaire.com>

Hi Yevgeny,

On Wed, 2006-09-20 at 02:36, Yevgeny Kliteynik wrote:
> Hi Hal
> 
> I'm doing a major review of the osmtest.

Good. This has been long overdue.

> This patch is fixing a few bugs in osmtest where failures
> were ignored. More precisely, osmtest was expecting error,
> but got IB_SUCCESS and ignored the fact that it should have
> gotten an error.
> There are also a few changes to improve the code and osmtest
> log readability.

Looks good at the code inspection level.

> More patches expected.

Thanks for the heads up.

> This patch is for trunk only.
> 
> I tested applying this patch before sending it. If you get the
> patch rejected again - let me know.

It took the header file part but rejected all code blocks for osmtest.c
:-(

-- Hal


From erezz at voltaire.com  Wed Sep 20 05:45:13 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Wed, 20 Sep 2006 15:45:13 +0300
Subject: [openib-general] 2 SLES 10 backport directories
In-Reply-To: <20060917044626.GA26054@mellanox.co.il>
References: <450915EE.1090705@voltaire.com>
	<20060917044626.GA26054@mellanox.co.il>
Message-ID: <451137D9.3060607@voltaire.com>

Michael S. Tsirkin wrote:
> Quoting r. Erez Zilber <erezz at voltaire.com>:
>   
>> Subject: 2 SLES 10 backport directories
>>
>> Michael,
>>
>> I saw that there are 2 SLES 10 backport directories in the svn:
>>
>> https://openib.org/svn/gen2/branches/backport/sles10/ - this one 
>> contains patches that we added for SLES 10
>>
>> https://openib.org/svn/gen2/branches/backport/2.6.16_sles10/ - this one 
>> was added later by you.
>>
>> Can we unite them?
>>
>> Here's my motivation: I want to be able to install SLES 10, replace its 
>> infiniband dir with infiniband from openib's svn, apply all SLES 10 
>> patches (from a single directory) and then it should work.
>>
>> This should help us in future OFED releases.
>>     
>
> I'd like that too, but there's a difficulty here.
>
> The rest of the backport patches make it possible to build
> IB support out of kernel, without patching the kernel code itself.
> This is an explicit requirement of some users, so we made an effort
> to preserve this ability, and so far it works with the rest of the IB stack -
> assuming that user has built infiniband support as a module or disabled it -
> but that's what most people currenty have, anyway.
>
> Unfortunately sles10 patches for iser that you mention violate this rule - they
> patch the iscsi support that is already there as part of the kernel.
> So unless this can be fixed somehow, we need the iscsi stuff separate, so that
> 1. we know to apply it in kernel source directory, not where we unpacked IB code
> 2. it can be applied conditionally when the user has enabled iser, so that
>    others still have the ability not to touch their kernel
>   
I think that we can throw away 
https://openib.org/svn/gen2/branches/backport/sles10/. These patches 
apply to SLES 10 beta 8. They are no longer needed. As for 
https://openib.org/svn/gen2/branches/backport/2.6.16_sles10/, it 
contains 2 iSER patches. Both affect only iSER code (nothing in 
open-iscsi or any other kernel code). Therefore, I think that it's ok.

What do you think?

Erez


From rjwalsh at pathscale.com  Wed Sep 20 09:19:01 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Wed, 20 Sep 2006 09:19:01 -0700
Subject: [openib-general] gen2_basic patches
In-Reply-To: <451106CF.10007@dev.mellanox.co.il>
References: <4510869F.60309@pathscale.com>
	<4510F078.4030401@dev.mellanox.co.il>
	<451106CF.10007@dev.mellanox.co.il>
Message-ID: <451169F5.9040804@pathscale.com>

Dotan Barak wrote:
> Dotan Barak wrote:
>> Robert Walsh wrote:
>>  
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Hi all,
>>>
>>> We've got some patches to gen2_basic to fix some problems with the test
>>> suite.  Some are trivial (fix typos, etc.) and some are more serious
>>> (handle max_qp counts correctly, etc.)  I'm going to be sending them out
>>> piecemeal as we review them internally, and I'll make sure to send them
>>> out in sequence (i.e. in the order they should be applied), so don't be
>>> surprised to hear nothing for a day or two, then see some more 
>>> patches ;-)
>>>
>>> Regards,
>>>  Robert.
>>>       
>> Thank you (in advanced) for all of the patches that you will send us.
>>
>> I will take the patches (and maybe modify them a little bit) and check 
>> it to the openib svn the final fixed version.
>>
>> Thanks again.
>> Dotan
>>   
> I applied all of the fixed that you sent me (1..6).
> i will be happy if the next patches will be based on the latest test 
> version that i just have committed.

Thanks, Dotan.  I'll make sure the next bunch are against the latest stuff.


From ftillier at silverstorm.com  Wed Sep 20 09:30:59 2006
From: ftillier at silverstorm.com (Fabian Tillier)
Date: Wed, 20 Sep 2006 09:30:59 -0700
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <20060920051420.GH1710@mellanox.co.il>
References: <200609191814.k8JIESjd007174@robert.bartonsoftware.com>
	<D80D83302DEE6249A221093BF2BB69AE8EF2DD@mail.silverstorm.com>
	<79ae2f320609191527j6fbeeafbu74f7ba468cce1f73@mail.gmail.com>
	<20060920051420.GH1710@mellanox.co.il>
Message-ID: <79ae2f320609200930xd15aaf8safc62dfee6064cc4@mail.gmail.com>

Hi Michael,

On 9/19/06, Michael S. Tsirkin <mst at mellanox.co.il> wrote:
> Quoting r. Fabian Tillier <ftillier at silverstorm.com>:
> > > There are some differences in HCA behaviour with regard to
> > > ib_req_notify_cq.  Mellanox HCAs will provide a callback/interrupt if
> > > the CQ is not empty at this point (in which case the poll_cq's after the
> > > notify are optional).
> > >
> > > However the behaviour defined in the IBTA spec indicates that
> > > ib_req_notify_cq will cause a callback/interrupt only on the next CQE
> > > which arrives, hence to be portable the poll_cq loop after
> > > ib_req_notify_cq is necessary to cover any CQEs which arrived between
> > > the prior poll and the ib_req_notify_cq.
> >
> > I remember a while ago a mention that the behavior of the Mellanox
> > HCAs could be controlled in the firmware, so that they would follow
> > the IBTA spec defined behavior.
>
> There's a mistake here. Mellanox HCAs will generate an event upon
> ib_req_notify_cq only if new completions has arrived after the previous event
> has been reported.

Thanks for correcting me - I expected my memory to be a bit rusty.  In
this case, is there any benefit in polling before calling
ib_req_notify_cq?

> AFAIK this is IBTA spec compliant.

Yes, I believe it is too.  Do you know if there is any impact on
performance in doing the following for completion processing:

ib_req_notify_cq
poll_cq until empty

Thanks,

- Fab


From dotanb at dev.mellanox.co.il  Wed Sep 20 09:52:58 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Wed, 20 Sep 2006 19:52:58 +0300
Subject: [openib-general] gen2_basic patch 5/10: select a valid port
 number
In-Reply-To: <1158747078.4509.52336.camel@hal.voltaire.com>
References: <45108B2F.8080207@pathscale.com>
	<1158714353.4509.30709.camel@hal.voltaire.com>
	<4510965D.4040103@pathscale.com>
	<1158727188.4509.39096.camel@hal.voltaire.com>
	<20060920050530.GG1710@mellanox.co.il>
	<1158747078.4509.52336.camel@hal.voltaire.com>
Message-ID: <451171EA.7010602@dev.mellanox.co.il>

Hal Rosenstock wrote:
> On Wed, 2006-09-20 at 01:05, Michael S. Tsirkin wrote:
>   
>> Quoting r. Hal Rosenstock <halr at voltaire.com>:
>>     
>>> Subject: Re: gen2_basic patch 5/10: select a valid port number
>>>
>>> On Tue, 2006-09-19 at 21:16, Robert Walsh wrote:
>>>       
>>>> Hal Rosenstock wrote:
>>>>         
>>>>> On Tue, 2006-09-19 at 20:28, Robert Walsh wrote:
>>>>>           
>>>>>> gen2_basic - select a valid port number
>>>>>>
>>>>>> Port numbers start at 1, not 0.
>>>>>>             
>>>>> True for CA and routers but not switches.
>>>>>           
>>>> Yeah.  Does anyone run gen2_basic on switches, though?  I assumed it was
>>>> HCA-centric.
>>>>         
>>> Yes, that appears to be the scope but I'm not 100% sure.
>>>       
>> Its easy to get linux running on a switch, so why not? You just
>> need to write a low level driver that cn send/receve MADs.
>> We did run a gen1 port on a switch at some point, and someone might want to
>> do it again.
>>     
>
> And the only limitation would be what switch port 0 (extended, base)
> supports relative to these tests.
>
> -- Hal
>   
Hi.

This test was written to check the verbs layer. It was developed on an 
HCA (and is executed every day on all of our HCAs).

I don't know what the expected result of executing this test on a 
switch is, or whether some changes are needed in order to check switch 
features.
If I get this input, I will add the needed features/code to the test.

Dotan


From ralphc at pathscale.com  Wed Sep 20 10:29:38 2006
From: ralphc at pathscale.com (Ralph Campbell)
Date: Wed, 20 Sep 2006 10:29:38 -0700
Subject: [openib-general] gen2_basic patch 5/10: select a valid port
 number
In-Reply-To: <1158727188.4509.39096.camel@hal.voltaire.com>
References: <45108B2F.8080207@pathscale.com>
	<1158714353.4509.30709.camel@hal.voltaire.com>
	<4510965D.4040103@pathscale.com>
	<1158727188.4509.39096.camel@hal.voltaire.com>
Message-ID: <1158773378.3608.9.camel@brick.pathscale.com>

In either case, if we want to support testing switches and HCAs,
we should have a command line option to change the tests
as appropriate for each.

On Wed, 2006-09-20 at 00:39 -0400, Hal Rosenstock wrote:
> On Tue, 2006-09-19 at 21:16, Robert Walsh wrote:
> > Hal Rosenstock wrote:
> > > On Tue, 2006-09-19 at 20:28, Robert Walsh wrote:
> > >> gen2_basic - select a valid port number
> > >>
> > >> Port numbers start at 1, not 0.
> > > 
> > > True for CA and routers but not switches.
> > 
> > Yeah.  Does anyone run gen2_basic on switches, though?  I assumed it was
> > HCA-centric.
> 
> Yes, that appears to be the scope but I'm not 100% sure.
> 
> -- Hal
> 
> > Regards,
> >  Robert.
> 
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 


From mshefty at ichips.intel.com  Wed Sep 20 10:59:18 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 20 Sep 2006 10:59:18 -0700
Subject: [openib-general] Negotiation of Rsponder resource & Initiator
 depth
In-Reply-To: <450E8991.5080603@voltaire.com>
References: <450E8991.5080603@voltaire.com>
Message-ID: <45118176.6040106@ichips.intel.com>

Erez Zilber wrote:
> In the IB spec it says in 12.7.29:
> 
> The recipient of the REQ message shall choose a local Initiator Depth that
> does not exceed the Responder Resources offered in the REQ. If the recipient
> of the REQ message is unwilling or unable to do so, it shall send a
> REJ message to discontinue the connection establishment.
> 
>  From reading the CMA code, I see that it does not negotiate these 
> values (responder resources & initiator depth). It expects the ULP to 
> negotiate it. Why? Shouldn't it be done by the CMA?

There's a bug in the CMA interface in that it doesn't expose the requested 
connection parameters up to a listener.  I have plans to fix this in the short 
term, but the negotiation is still left to the user.  I don't think that the CMA 
knows enough about what the application is trying to do to set this for it.
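
The rule quoted from 12.7.29 could be applied by a ULP roughly as follows.
This is a sketch under assumptions, not CMA code: struct req_params and
choose_initiator_depth() are hypothetical names, and the min_needed
parameter is an illustrative way to express "unwilling or unable", which
the spec leaves to the recipient.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the two values carried in a REQ. */
struct req_params {
	uint8_t responder_resources; /* offered by the sender of the REQ */
	uint8_t initiator_depth;     /* requested by the sender of the REQ */
};

/*
 * Choose the local initiator depth on the side that received the REQ.
 *   wanted:     depth the local ULP would like to use
 *   min_needed: smallest depth the local ULP can work with (assumption)
 * Returns true and stores the chosen depth, which never exceeds the
 * responder resources offered in the REQ. Returns false when no
 * acceptable depth exists, in which case the caller would send a REJ.
 */
static bool choose_initiator_depth(const struct req_params *req,
				   uint8_t wanted, uint8_t min_needed,
				   uint8_t *depth)
{
	assert(req != NULL && depth != NULL);

	uint8_t d = wanted < req->responder_resources
			? wanted : req->responder_resources;
	if (d < min_needed)
		return false;
	*depth = d;
	return true;
}
```

Only the application knows sensible values for wanted and min_needed,
which fits the point above that the negotiation stays with the user.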

- Sean


From amit_byron at yahoo.com  Wed Sep 20 11:47:36 2006
From: amit_byron at yahoo.com (amit byron)
Date: Wed, 20 Sep 2006 11:47:36 -0700 (PDT)
Subject: [openib-general] max message size for IB_WR_SEND
Message-ID: <20060920184736.28168.qmail@web38513.mail.mud.yahoo.com>


hi,

If I call ib_post_send() with an IB_WR_SEND work request and a message
size of 512 bytes, the message is received on the peer (second) node.
The 2 nodes are connected point-to-point.

But if the message size is increased to 4096 bytes, the second node
still receives the message, but the message content is missing (empty).

Won't the InfiniBand stack break the message down into smaller chunks
and reassemble it on the peer node?

thanks,
Amit.




From rjwalsh at pathscale.com  Wed Sep 20 12:08:29 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Wed, 20 Sep 2006 12:08:29 -0700
Subject: [openib-general] gen2_basic patch boogum?
Message-ID: <451191AD.80608@pathscale.com>

Hi Dotan,

I just noticed that you didn't apply one of my patch hunks that called
get_is_global().  I know why you didn't do it (the dlid is always 0: see
line 218 of test_av.c), but should you still be setting the is_global
field in the ah_attr structure to some value?  Right now, it will just
be set to some random uninitialized stack value.
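
A common way to avoid this kind of problem is to zero the whole structure
before filling in individual fields. The sketch below uses a hypothetical
cut-down struct mock_ah_attr and helper init_ah_attr(); it is not the real
ah_attr layout and not the code in test_av.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced address handle attributes for illustration. */
struct mock_ah_attr {
	uint16_t dlid;
	uint8_t is_global;
	uint8_t port_num;
};

/*
 * Clear every field first so nothing keeps a leftover stack value, then
 * set only the fields the caller cares about. is_global ends up 0
 * unless it is set explicitly afterwards.
 */
static void init_ah_attr(struct mock_ah_attr *attr, uint8_t port_num)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->port_num = port_num;
}
```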

Regards,
 Robert.


From mst at mellanox.co.il  Wed Sep 20 13:42:20 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 20 Sep 2006 23:42:20 +0300
Subject: [openib-general] OFED 1.1 and OpenSM on SLES 10 for PPC64
In-Reply-To: <1158765336.4509.64116.camel@hal.voltaire.com>
References: <1158765336.4509.64116.camel@hal.voltaire.com>
Message-ID: <20060920204220.GA9724@mellanox.co.il>


Quoting r. Hal Rosenstock <halr at voltaire.com>:
> Subject: RE: OFED 1.1 and OpenSM on SLES 10 for PPC64
> 
> On Wed, 2006-09-20 at 11:11, Eitan Zahavi wrote:
> > I will try to get to that tomorrow
> 
> It's not an OpenSM issue. See the latest info in the bug report:
> http://openib.org/bugzilla/show_bug.cgi?id=241

I dug in a bit and I'm not sure what the root cause is,
but what is triggering the problem is that the saquery diag
utility depends on opensm, which makes a mess of dependencies,
and at some point libtool goes berserk.

Short term, can we just skip saquery utility in OFED 1.1?
Hal, can you approve this please?

Longer term, I think saquery should be fixed not to depend on opensm - opensm is
a large tool, complicated by portability requirements etc, and it is a waste to
need parts of it on endnodes just to be able to run some diagnostics.
With RMPP support in kernel, we really shouldn't need an extra dependency
just to push a query and get a response.
Comments?

-- 
MST


From mst at mellanox.co.il  Wed Sep 20 13:57:00 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 20 Sep 2006 23:57:00 +0300
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <79ae2f320609200930xd15aaf8safc62dfee6064cc4@mail.gmail.com>
References: <79ae2f320609200930xd15aaf8safc62dfee6064cc4@mail.gmail.com>
Message-ID: <20060920205700.GB9724@mellanox.co.il>

Quoting r. Fabian Tillier <ftillier at silverstorm.com>:
> > There's a mistake here. Mellanox HCAs will generate an event upon
> > ib_req_notify_cq only if new completions has arrived after the previous event
> > has been reported.
> 
> Thanks for correcting me - I expected my memory to be a bit rusty.  In
> this case, is there any benefit in polling before calling
> ib_req_notify_cq?
> 
> > AFAIK this is IBTA spec compliant.
> 
> Yes, I believe it is too.  Do you know if there is any impact on
> performance in doing the following for completion processing:
> 
> ib_req_notify_cq
> poll_cq until empty

Some additional polling has a chance to improve performance on any
hardware: it increases the chance that you do a cheap poll for completion
instead of getting a (typically expensive) notification interrupt.
And it's a win on any hardware to delay ib_req_notify_cq
as long as possible, so that a single event reports as many completions
as possible.

That's why it's common to e.g.

poll_cq until empty
ib_req_notify_cq
poll_cq until empty

this might work well for bursty traffic, where once CQ is empty it will stay
empty for a while.

There's no reason why polling twice will work best in all cases however -
it's easy to invent other heuristics:

for(i=0;i<1000;++i)
	poll_cq until empty
ib_req_notify_cq
for(i=0;i<10;++i)
	poll_cq until empty

etc.

what works best depends on the application.
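
The poll / arm / poll sequence above can be sketched in plain C against a
toy completion queue. This is a minimal illustration, not driver code:
struct mock_cq, mock_poll_cq() and mock_req_notify_cq() are hypothetical
stand-ins for the real CQ, ib_poll_cq() and ib_req_notify_cq(), and the
arrive_before_arm parameter only simulates completions that land between
the first drain and the arm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy completion queue; NOT the kernel's struct ib_cq. */
struct mock_cq {
	int pending; /* completions waiting to be polled */
	bool armed;  /* true once a notification has been requested */
};

/* Stand-in for ib_poll_cq(): reap up to max entries, return the count. */
static int mock_poll_cq(struct mock_cq *cq, int max)
{
	int n = cq->pending < max ? cq->pending : max;
	cq->pending -= n;
	return n;
}

/* Stand-in for ib_req_notify_cq(): ask for an event on the next CQE. */
static void mock_req_notify_cq(struct mock_cq *cq)
{
	cq->armed = true;
}

/* "poll_cq until empty": keep polling until a poll returns nothing. */
static int drain(struct mock_cq *cq)
{
	assert(cq != NULL);
	int total = 0, n;
	while ((n = mock_poll_cq(cq, 16)) > 0)
		total += n;
	return total;
}

/*
 * Drain, arm, drain again. The second drain picks up completions that
 * arrived after the first drain saw an empty CQ but before the arm.
 * Returns the number of completions handled in this pass.
 */
static int process_completions(struct mock_cq *cq, int arrive_before_arm)
{
	int total = drain(cq);
	cq->pending += arrive_before_arm; /* simulated late arrivals */
	mock_req_notify_cq(cq);
	total += drain(cq);
	return total;
}
```

The other heuristics mentioned above would only change how many times
drain() is called before and after the arm.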

-- 
MST


From mst at mellanox.co.il  Wed Sep 20 14:07:16 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Sep 2006 00:07:16 +0300
Subject: [openib-general] 2 SLES 10 backport directories
In-Reply-To: <451137D9.3060607@voltaire.com>
References: <450915EE.1090705@voltaire.com>
	<20060917044626.GA26054@mellanox.co.il> <451137D9.3060607@voltaire.com>
Message-ID: <20060920210716.GD9724@mellanox.co.il>

Quoting r. Erez Zilber <erezz at voltaire.com>:
> >> I saw that there are 2 SLES 10 backport directories in the svn:
> >>
> >> https://openib.org/svn/gen2/branches/backport/sles10/ - this one 
> >> contains patches that we added for SLES 10
> >>
> >> https://openib.org/svn/gen2/branches/backport/2.6.16_sles10/ - this one 
> >> was added later by you.
> >>
> >> Can we unite them?
> >>
> I think that we can throw away 
> https://openib.org/svn/gen2/branches/backport/sles10/. These patches 
> apply to SLES 10 beta 8. They are no longer needed. As for 
> https://openib.org/svn/gen2/branches/backport/2.6.16_sles10/, it 
> contains 2 iSER patches. Both affect only iSER code (nothing in 
> open-iscsi or any other kernel code). Therefore, I think that it's ok.

Go ahead and kill backport/sles10 then.
But the whole backport dir should be updated from OFED tree
or better killed once Sean switches to git.

-- 
MST


From rjwalsh at pathscale.com  Wed Sep 20 14:17:04 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Wed, 20 Sep 2006 14:17:04 -0700
Subject: [openib-general] gen2_basic patch boogum?
In-Reply-To: <451191AD.80608@pathscale.com>
References: <451191AD.80608@pathscale.com>
Message-ID: <4511AFD0.9030902@pathscale.com>

Another quick question: I noticed that in the latest changes you
pushed, including my patches, you removed the following check in test_qp.c:

@@ -1702,7 +1700,6 @@
                CHECK_VALUE("qp_type", query_init_attr.qp_type, attr.qp_type, goto cleanup);
                CHECK_VALUE_PTR("recv_cq", query_init_attr.recv_cq, attr.recv_cq, goto cleanup);
                CHECK_VALUE_PTR("send_cq", query_init_attr.send_cq, attr.send_cq, goto cleanup);
-               CHECK_VALUE("sq_sig_all", query_init_attr.sq_sig_all, attr.sq_sig_all, goto cleanup);
                CHECK_VALUE_PTR("srq", query_init_attr.srq, attr.srq, goto cleanup);
        }
        PASSED;

Any particular reason why you removed this?  I don't ever remember this
being a problem on ipath or mthca.

Regards,
 Robert.


From ftillier at silverstorm.com  Wed Sep 20 14:16:54 2006
From: ftillier at silverstorm.com (Fabian Tillier)
Date: Wed, 20 Sep 2006 14:16:54 -0700
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <20060920205700.GB9724@mellanox.co.il>
References: <79ae2f320609200930xd15aaf8safc62dfee6064cc4@mail.gmail.com>
	<20060920205700.GB9724@mellanox.co.il>
Message-ID: <79ae2f320609201416n6c61bd02p5c92701253f6c6b3@mail.gmail.com>

On 9/20/06, Michael S. Tsirkin <mst at mellanox.co.il> wrote:
> Quoting r. Fabian Tillier <ftillier at silverstorm.com>:
> > > There's a mistake here. Mellanox HCAs will generate an event upon
> > > ib_req_notify_cq only if new completions has arrived after the previous event
> > > has been reported.
> >
> > Thanks for correcting me - I expected my memory to be a bit rusty.  In
> > this case, is there any benefit in polling before calling
> > ib_req_notify_cq?
> >
> > > AFAIK this is IBTA spec compliant.
> >
> > Yes, I believe it is too.  Do you know if there is any impact on
> > performance in doing the following for completion processing:
> >
> > ib_req_notify_cq
> > poll_cq until empty
>
> Some additional polling has a chance to improve performance on any
> hardware: it increases the chance that you do a cheap poll for completion
> instead of getting a (typically expensive) notification interrupt.
> And its a win on any hardware to delay ib_req_notify_cq
> as long as possible, so that a single event reports as many completions
> as possible.

Ok, now you have me confused.  Based on what you said for Mellanox
HCAs, a new CQ event will be generated when the CQ is rearmed if any
CQEs were written since the last event was generated.  To me this
means that it doesn't matter if these CQEs were reaped or not.

That is, at t0 you have a CQE written and a CQ notification.  At t1
you have another CQE written.  At t2 you poll both CQEs and rearm.
Since the CQE from t1 was written after the last event, I would expect
(based on your description) that I would get another CQ notification,
even though I already reaped the CQE.

Did you mean that the hardware will only generate a new event if there
are any un-reaped CQEs?
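
The two readings in question can be told apart with a toy model of the
t0/t1/t2 sequence. Everything here is hypothetical (struct toy_cq and the
toy_* and rearm_fires_* helpers); it models the two interpretations only
and makes no claim about what any HCA actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy CQ state that tracks just enough to compare both readings. */
struct toy_cq {
	int unreaped;             /* CQEs written but not yet polled */
	bool written_since_event; /* any CQE written since the last event */
};

static void toy_write_cqe(struct toy_cq *cq)
{
	cq->unreaped++;
	cq->written_since_event = true;
}

static void toy_event_reported(struct toy_cq *cq)
{
	cq->written_since_event = false;
}

static void toy_poll_all(struct toy_cq *cq)
{
	cq->unreaped = 0;
}

/* Reading A: rearm fires if any CQE was written since the last event. */
static bool rearm_fires_reading_a(const struct toy_cq *cq)
{
	return cq->written_since_event;
}

/* Reading B: rearm fires only if un-reaped CQEs are still queued. */
static bool rearm_fires_reading_b(const struct toy_cq *cq)
{
	return cq->unreaped > 0;
}

/* Replay the sequence from the message up to the moment of the rearm. */
static void run_t0_t1_t2(struct toy_cq *cq)
{
	assert(cq != NULL);
	toy_write_cqe(cq);      /* t0: CQE written ...               */
	toy_event_reported(cq); /* t0: ... and CQ notification sent  */
	toy_write_cqe(cq);      /* t1: another CQE written           */
	toy_poll_all(cq);       /* t2: both CQEs polled, then rearm  */
}
```

After run_t0_t1_t2(), reading A reports an event on rearm and reading B
does not, which is exactly the difference being asked about.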

- Fab


From rjwalsh at pathscale.com  Wed Sep 20 14:20:04 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Wed, 20 Sep 2006 14:20:04 -0700
Subject: [openib-general] gen2_basic patch 7/10: choose illegal
 max_qp_init_rd_atom values correctly
Message-ID: <4511B084.8070100@pathscale.com>

An embedded and charset-unspecified text was scrubbed...
Name: 07_rd_atom.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060920/a13ee9b0/attachment.ksh>

From rjwalsh at pathscale.com  Wed Sep 20 14:28:32 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Wed, 20 Sep 2006 14:28:32 -0700
Subject: [openib-general] gen2_basic patch 8/10: handle auto path migration
	properly
Message-ID: <4511B280.7060906@pathscale.com>

An embedded and charset-unspecified text was scrubbed...
Name: 08_cleanup_mask.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060920/a8e28b8f/attachment.ksh>

From rjwalsh at pathscale.com  Wed Sep 20 14:29:13 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Wed, 20 Sep 2006 14:29:13 -0700
Subject: [openib-general] gen2_basic patch 9/10: fix static_rate check
Message-ID: <4511B2A9.8070105@pathscale.com>

An embedded and charset-unspecified text was scrubbed...
Name: 09_static_rate.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060920/b8488ad6/attachment.ksh>

From rjwalsh at pathscale.com  Wed Sep 20 14:29:49 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Wed, 20 Sep 2006 14:29:49 -0700
Subject: [openib-general] gen2_basic patch 10/10: handle other vendor
 devices for max QP count
Message-ID: <4511B2CD.2020104@pathscale.com>

An embedded and charset-unspecified text was scrubbed...
Name: 10_num_qp.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060920/ee3660bc/attachment.ksh>

From halr at voltaire.com  Wed Sep 20 14:31:59 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Sep 2006 17:31:59 -0400
Subject: [openib-general] OFED 1.1 and OpenSM on SLES 10 for PPC64
In-Reply-To: <20060920204220.GA9724@mellanox.co.il>
References: <1158765336.4509.64116.camel@hal.voltaire.com>
	<20060920204220.GA9724@mellanox.co.il>
Message-ID: <1158787917.4509.78684.camel@hal.voltaire.com>

On Wed, 2006-09-20 at 16:42, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > Subject: RE: OFED 1.1 and OpenSM on SLES 10 for PPC64
> > 
> > On Wed, 2006-09-20 at 11:11, Eitan Zahavi wrote:
> > > I will try to get to that tomorrow
> > 
> > It's not an OpenSM issue. See the latest info in the bug report:
> > http://openib.org/bugzilla/show_bug.cgi?id=241
> 
> I dug in a bit and I'm not sure what's the root cause,
> but what is triggering the problem is that the saquery diag
> utility depends on opensm,

No, it depends on the opensm library not opensm. This was all fine until
the libraries were broken into a separate RPM for OFED to attempt to
decouple them from OpenSM.

> which makes a mess of dependencies,

It requires opensm library for the SA client interface and complib for
portability. I believe this is no different than some other IB utilities
in OFED too.

>  and at some point libtool goes berserk.

Huh ?

> Short term, can we just skip saquery utility in OFED 1.1?
> Hal, can you approve this please?

I would prefer that not be the case, and that saquery be part of OFED 1.1.

> Longer term, I think saquery should be fixed not to depend on opensm - opensm is
> a large tool, complicated by portability requirements etc, and it is a waste to
> need parts of it on endnodes just to be able to run some diagnostics.
> With RMPP support in kernel, we really sholdn't need an extra depenency
> just to push a query and get a response.
> Comments?

It could depend on Sean's new user space SA client API (which perhaps
needs some more infrastructure) but we are not there yet.

-- Hal


From mst at mellanox.co.il  Wed Sep 20 14:38:40 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Sep 2006 00:38:40 +0300
Subject: [openib-general] OFED 1.1 and OpenSM on SLES 10 for PPC64
In-Reply-To: <1158787917.4509.78684.camel@hal.voltaire.com>
References: <1158787917.4509.78684.camel@hal.voltaire.com>
Message-ID: <20060920213840.GB10173@mellanox.co.il>

Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > Short term, can we just skip saquery utility in OFED 1.1?
> > Hal, can you approve this please?
> 
> I would prefer that is not the case and this is part of OFED 1.1.

So - what do you suggest?
Can you fix the OFED build today then? I don't have SLES10 ppc.
If not, do we delay the release to have this utility in?

-- 
MST


From halr at voltaire.com  Wed Sep 20 15:09:17 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Sep 2006 18:09:17 -0400
Subject: [openib-general] OFED 1.1 and OpenSM on SLES 10 for PPC64
In-Reply-To: <20060920213840.GB10173@mellanox.co.il>
References: <1158787917.4509.78684.camel@hal.voltaire.com>
	<20060920213840.GB10173@mellanox.co.il>
Message-ID: <1158790156.4509.80047.camel@hal.voltaire.com>

On Wed, 2006-09-20 at 17:38, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > Short term, can we just skip saquery utility in OFED 1.1?
> > > Hal, can you approve this please?
> > 
> > I would prefer that is not the case and this is part of OFED 1.1.
> 
> So - what do you suggest?
> Can you fix the OFED build today then? I don't have SLES10 ppc.

Me neither.

> If no - do we delay the release to have this utility in?

Doesn't sound like there is much choice. If the release has to be today,
then go without it.

-- Hal


From rdreier at cisco.com  Wed Sep 20 15:47:56 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 20 Sep 2006 15:47:56 -0700
Subject: [openib-general] [PATCH] mthca: fix lid used for sending traps
In-Reply-To: <20060920060204.GA2870@mellanox.co.il> (Michael S.
	Tsirkin's message of "Wed, 20 Sep 2006 09:02:04 +0300")
References: <20060919081324.GF31498@mellanox.co.il>
	<adairjjlljk.fsf@cisco.com> <20060920060204.GA2870@mellanox.co.il>
Message-ID: <adafyemjgtf.fsf@cisco.com>

    Michael> Missed 2.6.18 by a small margin. Gar! Acked for 2.6.18.1?

I already sent it to stable at kernel.org.

Thanks...


From rdreier at cisco.com  Wed Sep 20 15:52:31 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 20 Sep 2006 15:52:31 -0700
Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu
 selector for path queries
In-Reply-To: <20060920042618.GA1710@mellanox.co.il> (Michael S.
	Tsirkin's message of "Wed, 20 Sep 2006 07:26:18 +0300")
References: <ada64fjlhyf.fsf@cisco.com> <20060920042618.GA1710@mellanox.co.il>
Message-ID: <adabqpajgls.fsf@cisco.com>

    Michael> I'm not sure priv->broadcast is always initialized when
    Michael> we start a path record query. Is there a reason why it is?

IPoIB can't send a packet until the broadcast group is joined, since
it doesn't do netif_carrier_on() until then.  So I don't see any way
that a path record query could start before we know the real mtu.

 - R.


From rdreier at cisco.com  Wed Sep 20 15:53:31 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 20 Sep 2006 15:53:31 -0700
Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu
 selector for path queries
In-Reply-To: <20060920050111.GF1710@mellanox.co.il> (Michael S.
	Tsirkin's message of "Wed, 20 Sep 2006 08:01:11 +0300")
References: <ada64fjlhyf.fsf@cisco.com> <20060920042618.GA1710@mellanox.co.il>
	<20060920050111.GF1710@mellanox.co.il>
Message-ID: <ada7izyjgk4.fsf@cisco.com>

    Michael> It also seemed kind of nice to be able to control the
    Michael> path MTU from dev->mtu - and I don't think path flush on
    Michael> mtu change is an issue from the performance POV.

    Michael> What do you think?

It just seems weird to me...

 - R.


From sashak at voltaire.com  Wed Sep 20 18:01:08 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 21 Sep 2006 04:01:08 +0300
Subject: [openib-general] [PATCH] osm: fixing bugs in osmtest
In-Reply-To: <1158755895.4509.58062.camel@hal.voltaire.com>
References: <4510E184.8070900@dev.mellanox.co.il>
	<1158755895.4509.58062.camel@hal.voltaire.com>
Message-ID: <20060921010108.GA18938@sashak.voltaire.com>

On 08:38 Wed 20 Sep     , Hal Rosenstock wrote:
> Hi Yevgeny,
> 
> On Wed, 2006-09-20 at 02:36, Yevgeny Kliteynik wrote:
> > Hi Hal
> > 
> > I'm doing a major review of the osmtest.
> 
> Good. This has been long overdue.
> 
> > This patch is fixing a few bugs in osmtest where failures
> > were ignored. More precisely, osmtest was expecting error,
> > but got IB_SUCCESS and ignored the fact that it should have
> > gotten an error.
> > There are also a few changes to improve the code and osmtest
> > log readability.
> 
> Looks good at the code inspection level.
> 
> > More patches expected.
> 
> Thanks for the heads up.
> 
> > This patch is for trunk only.
> > 
> > I tested applying this patch before sending it. If you get the
> > patch rejected again - let me know.
> 
> It took the header file part but rejected all code blocks for osmtest.c
> :-(

It looks like modified and context lines have different numbers of
prefixed spaces.

Sasha


From sashak at voltaire.com  Wed Sep 20 18:27:47 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 21 Sep 2006 04:27:47 +0300
Subject: [openib-general] [PATCH TRIVIAL] opensm: remove
	osm_switch_get_lid() prototype
Message-ID: <20060921012747.GC18938@sashak.voltaire.com>

Hi Hal,

Some trivial cleanup.

Sasha.


opensm: remove osm_switch_get_lid() prototype

Remove prototype of non-existing osm_switch_get_lid() function.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---

 osm/include/opensm/osm_switch.h |   27 ---------------------------
 1 files changed, 0 insertions(+), 27 deletions(-)

diff --git a/osm/include/opensm/osm_switch.h b/osm/include/opensm/osm_switch.h
index 5f33e4a..8c4799f 100644
--- a/osm/include/opensm/osm_switch.h
+++ b/osm/include/opensm/osm_switch.h
@@ -542,33 +542,6 @@ osm_switch_get_port_by_lid(
 *	Switch object
 *********/
 
-/****f* OpenSM: Switch/osm_switch_get_lid
-* NAME
-*	osm_switch_get_lid
-*
-* DESCRIPTION
-*	Gets the switch's LID.
-*
-* SYNOPSIS
-*/
-ib_net16_t
-osm_switch_get_lid(
-	IN const osm_switch_t* const p_sw );
-/*
-* PARAMETERS
-*	p_sw
-*		[in] Pointer to an osm_switch_t object.
-*
-* RETURN VALUES
-*	Returns the switch's LID.  A value of zero means no LID has
-*	been assigned to the switch.
-*
-* NOTES
-*
-* SEE ALSO
-*	Switch object
-*********/
-
 /****f* OpenSM: Switch/osm_switch_get_physp_ptr
 * NAME
 *	osm_switch_get_physp_ptr


From sashak at voltaire.com  Wed Sep 20 18:50:09 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Thu, 21 Sep 2006 04:50:09 +0300
Subject: [openib-general] [PATCH TRIVIAL] opensm: LOG_ENTER name fix
Message-ID: <20060921015009.GD18938@sashak.voltaire.com>

Hi Hal,

Some trivial stuff.

Sasha


opensm: LOG_ENTER name fix

In osm_pkey_get_tables() fix the name used with LOG_ENTER().

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---

 osm/opensm/osm_port_info_rcv.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/osm/opensm/osm_port_info_rcv.c b/osm/opensm/osm_port_info_rcv.c
index 9e2fc11..af6fdc8 100644
--- a/osm/opensm/osm_port_info_rcv.c
+++ b/osm/opensm/osm_port_info_rcv.c
@@ -442,7 +442,7 @@ void osm_pkey_get_tables(
   uint32_t attr_mod_ho;
   osm_switch_t* p_switch;
 
-  OSM_LOG_ENTER( p_log, osm_physp_has_pkey );
+  OSM_LOG_ENTER( p_log, osm_pkey_get_tables );
 
   path = *osm_physp_get_dr_path_ptr( p_physp );
 

From halr at voltaire.com  Wed Sep 20 18:55:18 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Sep 2006 21:55:18 -0400
Subject: [openib-general] [PATCH TRIVIAL] opensm: remove
 osm_switch_get_lid() prototype
In-Reply-To: <20060921012747.GC18938@sashak.voltaire.com>
References: <20060921012747.GC18938@sashak.voltaire.com>
Message-ID: <1158803696.4509.88358.camel@hal.voltaire.com>

On Wed, 2006-09-20 at 21:27, Sasha Khapyorsky wrote:
> Hi Hal,
> 
> Some trivial cleanup.
> 
> Sasha.
> 
> 
> opensm: remove osm_switch_get_lid() prototype
> 
> Remove prototype of non-existing osm_switch_get_lid() function.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks. Applied to trunk only.

-- Hal


From halr at voltaire.com  Wed Sep 20 19:03:16 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 20 Sep 2006 22:03:16 -0400
Subject: [openib-general] [PATCH TRIVIAL] opensm: LOG_ENTER name fix
In-Reply-To: <20060921015009.GD18938@sashak.voltaire.com>
References: <20060921015009.GD18938@sashak.voltaire.com>
Message-ID: <1158804195.4509.88656.camel@hal.voltaire.com>

On Wed, 2006-09-20 at 21:50, Sasha Khapyorsky wrote:
> Hi Hal,
> 
> Some trivial stuff.
> 
> Sasha
> 
> 
> opensm: LOG_ENTER name fix
> 
> In osm_pkey_get_tables() fix the name used with LOG_ENTER().
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks. Applied to trunk only.

-- Hal


From mst at mellanox.co.il  Wed Sep 20 21:45:04 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Sep 2006 07:45:04 +0300
Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu
 selector for path queries
In-Reply-To: <ada7izyjgk4.fsf@cisco.com>
References: <ada7izyjgk4.fsf@cisco.com>
Message-ID: <20060921044504.GA5830@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries
> 
>     Michael> It also seemed kind of nice to be able to control the
>     Michael> path MTU from dev->mtu - and I don't think path flush on
>     Michael> mtu change is an issue from the performance POV.
> 
>     Michael> What do you think?
> 
> It just seems weird to me...

Well, I like this better, but you are the final arbiter here.
I gather you want me to rework the patch to the original approach?

-- 
MST


From mst at mellanox.co.il  Wed Sep 20 21:57:13 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Sep 2006 07:57:13 +0300
Subject: [openib-general] [PATCH] IB/ipoib: user appropriate mtu
 selector for path queries
In-Reply-To: <adabqpajgls.fsf@cisco.com>
References: <adabqpajgls.fsf@cisco.com>
Message-ID: <20060921045713.GA5983@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/ipoib: user appropriate mtu selector for path queries
> 
>     Michael> I'm not sure priv->broadcast is always initialized when
>     Michael> we start a path record query. Is there a reason why it is?
> 
> IPoIB can't send a packet until the broadcast group is joined, since
> it doesn't do netif_carrier_on() until then.  So I don't see any way
> that a path record query could start before we know the real mtu.

Good point, thanks. If we switch to that, a BUG_ON(!priv->broadcast) just in case
won't hurt though, would it?
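
For reference, BUG_ON() fires when its argument is true, so a guard meant
to catch an uninitialized priv->broadcast has to test the negated pointer.
A minimal userspace sketch of that logic, assuming a hypothetical
demo_priv structure and a bug_would_fire() stand-in that only reports
what BUG_ON() would do (neither is kernel code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's BUG_ON(): the real macro halts
 * when the condition is true; this one only reports whether it would. */
static int bug_would_fire(int condition)
{
	return condition != 0;
}

/* Hypothetical structure, loosely modelled on the ipoib private data. */
struct demo_priv {
	void *broadcast;	/* NULL until the broadcast group is joined */
};

/* Guard for "broadcast must already be set up": note the negation. */
static int path_query_guard_fires(const struct demo_priv *priv)
{
	assert(priv != NULL);
	return bug_would_fire(!priv->broadcast);
}
```

With this shape the guard stays silent once the broadcast group exists
and fires only when the pointer is still NULL.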

-- 
MST


From dotanb at dev.mellanox.co.il  Wed Sep 20 22:34:05 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 21 Sep 2006 08:34:05 +0300
Subject: [openib-general] max message size for IB_WR_SEND
In-Reply-To: <loom.20060920T204936-772@post.gmane.org>
References: <loom.20060920T204936-772@post.gmane.org>
Message-ID: <4512244D.4040404@dev.mellanox.co.il>

Hi.

amit byron wrote:
> hi,
>
> if i invoke/call ib_post_send(IB_WR_SEND) with message
> size 512 bytes, the message gets received on the
> peer (second) node. the 2 nodes are connected point-to
> -point.
>
> but if message size is increased to 4096 bytes then
> second node receives the message; but message content
> is missing (empty).
>
> won't infiniband stack break down message in smaller
> chunks and assemble on peer node?
>
> thanks,
> Amit.
>   
Which transport type are you using?
If you are using a UD QP, then the answer is no.
For any other transport type, the answer is yes (the message is broken
down into packets of the MTU size specified in the QP context).

Maybe you have a different problem in your code. Did you check the
completion status on both nodes?
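
To make the segmentation rule above concrete, here is a rough sketch, not
driver code. The demo_transport enum and both helpers are made up for
illustration, and other limits (such as the port's maximum message size)
are ignored. It assumes a UD message must fit in a single packet of at
most one path MTU, while the connected transports split a message into
MTU-sized packets:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transport classes for this sketch only. */
enum demo_transport { DEMO_UD, DEMO_CONNECTED };

/* Number of MTU-sized packets needed to carry msg_len bytes.
 * A zero-length message is still counted as one packet. */
static size_t packets_needed(size_t msg_len, size_t mtu)
{
	assert(mtu > 0);
	if (msg_len == 0)
		return 1;
	return (msg_len + mtu - 1) / mtu;	/* round up */
}

/* UD carries one packet per message; connected transports segment. */
static int message_fits(enum demo_transport t, size_t msg_len, size_t mtu)
{
	if (t == DEMO_UD)
		return packets_needed(msg_len, mtu) == 1;
	return 1;
}
```

With a 2048-byte MTU, for instance, message_fits(DEMO_UD, 512, 2048) is
true while message_fits(DEMO_UD, 4096, 2048) is false.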

Dotan


From mlleinin at hpcn.ca.sandia.gov  Wed Sep 20 22:32:56 2006
From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger)
Date: Wed, 20 Sep 2006 22:32:56 -0700
Subject: [openib-general] OpenFabrics server scheduled downtime Sat. Sept 23
Message-ID: <1158816776.6412.89.camel@localhost>

 
  The OpenFabrics server will be offline Saturday September 23 from 6am
PST to 6pm PST due to a scheduled maintenance on a power substation at
Sandia.  These outages usually last less than the scheduled 12 hours.
We will bring the OpenFabrics server back online as soon as possible
after the scheduled outage. 

  Thanks,

	- Matt


From dotanb at dev.mellanox.co.il  Wed Sep 20 22:38:30 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 21 Sep 2006 08:38:30 +0300
Subject: [openib-general] gen2_basic patch boogum?
In-Reply-To: <451191AD.80608@pathscale.com>
References: <451191AD.80608@pathscale.com>
Message-ID: <45122556.8090401@dev.mellanox.co.il>

Hi Robert.

Robert Walsh wrote:
> Hi Dotan,
>
> I just noticed that you didn't apply one of my patch hunks that called
> get_is_global().  I know why you didn't do it (the dlid is always 0: see
> line 218 of test_av.c), but should you still be setting the is_global
> field in the ah_attr structure to some value?  Right now, it will just
> be set to some random uninitialized stack value.
>
> Regards,
>  Robert.
>   
You are right, i removed this line because the dlid is constant 0.
here is the initialization of the variable:
           struct ibv_ah_attr av_attr  = {
                        .dlid          = 0,
                        .sl            = VL_range(rand_gen, 0, 15),
                        .src_path_bits = VL_range(rand_gen, 0, 0x8f),
                        .port_num      = port,
                        .static_rate   = get_static_rate(1, rand_gen),
                        .grh           = {
                                .traffic_class = VL_range(rand_gen, 1, 0xff),
                                .flow_label = VL_random(rand_gen, 0x100000),
                                .hop_limit = VL_range(rand_gen, 1, 0xff),
                        }
                };


The attributes of av_attr that are initialized in this code get the
values assigned to them.
All the remaining attributes (for example, is_global) are set to 0.
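
This is standard C99 behaviour for designated initializers: members that
are not named are initialized as if they had static storage duration,
i.e. to zero. A small standalone sketch, using a hypothetical
demo_ah_attr structure rather than the real struct ibv_ah_attr:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified structures for illustration only. */
struct demo_grh {
	uint32_t flow_label;
	uint8_t  hop_limit;
	uint8_t  traffic_class;
};

struct demo_ah_attr {
	struct demo_grh grh;
	uint16_t dlid;
	uint8_t  sl;
	uint8_t  is_global;
	uint8_t  port_num;
};

/* Only some members are named; the rest (is_global, grh.flow_label,
 * grh.traffic_class) are implicitly zero-initialized. */
static struct demo_ah_attr make_attr(uint8_t port)
{
	struct demo_ah_attr attr = {
		.dlid     = 0,
		.sl       = 3,
		.port_num = port,
		.grh      = { .hop_limit = 64 },
	};

	assert(attr.is_global == 0);	/* never named, still zero */
	return attr;
}
```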

Dotan


From eeb at bartonsoftware.com  Wed Sep 20 22:40:11 2006
From: eeb at bartonsoftware.com (Eric Barton)
Date: Thu, 21 Sep 2006 06:40:11 +0100
Subject: [openib-general] RDMA CM callback status
Message-ID: <200609210540.k8L5eBce029142@robert.bartonsoftware.com>


Sean,

I have some questions regarding my RDMA CM callback handler....

int
kiblnd_cm_callback(struct rdma_cm_id *cmid, struct rdma_cm_event *event)
{
	switch (event->event) {
	default:
                ASSERT (0);

	case RDMA_CM_EVENT_CONNECT_REQUEST:
		return kiblnd_passive_connect(...);
                
	case RDMA_CM_EVENT_ADDR_ERROR:
                ASSERT(event->status != 0);
		/* handle error */
                return event->status;

	case RDMA_CM_EVENT_ADDR_RESOLVED:
                if (event->status == 0)
			return kiblnd_resolve_route(...);
		/* handle error */
		return event->status;

	case RDMA_CM_EVENT_ROUTE_ERROR:
                ASSERT(event->status != 0);
		/* handle error */
                return event->status;

	case RDMA_CM_EVENT_ROUTE_RESOLVED:
                if (event->status == 0)
                        return kiblnd_active_connect(...);
		/* handle error */
                return event->status;
                
	case RDMA_CM_EVENT_UNREACHABLE:
                ASSERT (event->status != 0);
		/* handle error out-of-line */
                return 0;

	case RDMA_CM_EVENT_CONNECT_ERROR:
                ASSERT (event->status != 0);
		/* handle error out-of-line */
                return 0;

	case RDMA_CM_EVENT_REJECTED:
		/* handle error out-of-line */
                return 0;

	case RDMA_CM_EVENT_ESTABLISHED:
		/* handle success */
                return 0;

	case RDMA_CM_EVENT_DISCONNECTED:
		/* teardown */
                return 0;

	case RDMA_CM_EVENT_DEVICE_REMOVAL:
		/* bleat on the console */
		return 0;
	}
}

1. Should I even be looking at event->status or does the event type tell me
   everything I need to know?  I've had a report that the assertion
   (event->status != 0) is failing on RDMA_CM_EVENT_ROUTE_ERROR.

2. /* handle error out-of-line */ above means I record failure in my connection
   data structure, start teardown and drop the callback's reference on it.
   When the last reference goes, the connection data structure is queued for
   final destruction (including rdma_destroy_id(cmid)).

   Given that this might race with the callback's caller is this OK?
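
   To make the scheme in (2) concrete, here is a userspace sketch of the
   reference counting I mean. All names (demo_conn and friends) are
   hypothetical, C11 atomics stand in for the kernel's, and "queued for
   destruction" is just a flag here. It does not show what the RDMA CM
   itself does after the callback returns, which is the part I am asking
   about.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical connection object for illustration only. */
struct demo_conn {
	atomic_int refs;		/* one ref is held by the CM callback */
	int        failed;		/* recorded failure, 0 if none */
	int        queued_for_destroy;	/* set when the last ref is dropped */
};

static void demo_conn_init(struct demo_conn *c, int refs)
{
	assert(refs > 0);
	atomic_init(&c->refs, refs);
	c->failed = 0;
	c->queued_for_destroy = 0;
}

/* Drop one reference; the last put queues final destruction, which is
 * where rdma_destroy_id(cmid) would eventually be called. */
static int demo_conn_put(struct demo_conn *c)
{
	if (atomic_fetch_sub(&c->refs, 1) == 1) {
		c->queued_for_destroy = 1;
		return 1;
	}
	return 0;
}

/* "Handle error out-of-line": record the failure, drop the callback's
 * reference, and return 0 to the caller. The event type alone is
 * treated as the failure indication, so a zero status is still recorded
 * as an error. */
static int demo_on_connect_error(struct demo_conn *c, int status)
{
	c->failed = status ? status : -1;
	demo_conn_put(c);
	return 0;
}
```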

-- 

                Cheers,
                        Eric

---------------------------------------------------
|Eric Barton        Barton Software               |
|9 York Gardens     Tel:    +44 (117) 330 1575    |
|Clifton            Mobile: +44 (7909) 680 356    |
|Bristol BS8 4LL    Fax:    call first            |
|United Kingdom     E-Mail:                       |
---------------------------------------------------


From dotanb at dev.mellanox.co.il  Wed Sep 20 23:51:11 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 21 Sep 2006 09:51:11 +0300
Subject: [openib-general] gen2_basic patch 7/10: choose illegal
 max_qp_init_rd_atom values correctly
In-Reply-To: <4511B084.8070100@pathscale.com>
References: <4511B084.8070100@pathscale.com>
Message-ID: <4512365F.9090204@dev.mellanox.co.il>

Robert Walsh wrote:
> gen2_basic - choose illegal max_qp_init_rd_atom values correctly
>
> Signed-off by: Robert Walsh <robert.walsh at qlogic.com>
>
> diff -rNu a/gen2_basic/test_qp.c b/gen2_basic/test_qp.c
> --- a/gen2_basic/test_qp.c	2006-09-13 19:09:47.419791000 -0700
> +++ b/gen2_basic/test_qp.c	2006-08-14 14:16:57.911621000 -0700
> @@ -369,7 +369,7 @@
>  	if (legal)
>  		return VL_range(rand_gen, 0, attr->max_qp_init_rd_atom);
>  	else
> -		return VL_range(rand_gen, attr->max_qp_init_rd_atom, 0xFF);
> +		return VL_range(rand_gen, attr->max_qp_init_rd_atom + 1, 0xFF);
>  }
>  
>  uint8_t get_max_dest_rd_atomic(
> @@ -380,7 +380,7 @@
>  	if (legal)
>  		return VL_range(rand_gen, 0, attr->max_qp_rd_atom);
>  	else
> -		return VL_range(rand_gen, attr->max_qp_rd_atom, 0xFF);
> +		return VL_range(rand_gen, attr->max_qp_rd_atom + 1, 0xFF);
>  }
>  
>  uint8_t get_min_rnr_timer(
>   


committed.
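
For the record, the reason for the "+ 1": the upper end of the legal range
is itself a legal value, so the illegal range has to start one above it.
A minimal sketch, assuming VL_range() returns a value in an inclusive
[lo, hi] range; demo_range() and pick_rd_atom() are hypothetical
stand-ins, not the test's real helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for VL_range(): maps a raw random number into
 * the inclusive range [lo, hi]. */
static uint32_t demo_range(uint32_t raw, uint32_t lo, uint32_t hi)
{
	assert(lo <= hi);
	return lo + raw % (hi - lo + 1);
}

/* Legal values are 0..max_rd_atom; illegal ones are max_rd_atom+1..0xFF.
 * If the device maximum is already 0xFF there is no illegal 8-bit value
 * left, which this sketch simply refuses. */
static uint32_t pick_rd_atom(uint32_t raw, int legal, uint32_t max_rd_atom)
{
	if (legal)
		return demo_range(raw, 0, max_rd_atom);
	assert(max_rd_atom < 0xFF);
	return demo_range(raw, max_rd_atom + 1, 0xFF);
}
```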

thanks
Dotan


From dotanb at dev.mellanox.co.il  Wed Sep 20 23:51:23 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 21 Sep 2006 09:51:23 +0300
Subject: [openib-general] gen2_basic patch 8/10: handle auto path
 migration properly
In-Reply-To: <4511B280.7060906@pathscale.com>
References: <4511B280.7060906@pathscale.com>
Message-ID: <4512366B.5000704@dev.mellanox.co.il>

Robert Walsh wrote:
> gen2_basic - handle auto path migration properly
>
> Signed-off by: Robert Walsh <robert.walsh at qlogic.com>
>
> diff -rNu a/gen2_basic/test_qp.c b/gen2_basic/test_qp.c
> --- a/gen2_basic/test_qp.c	2006-09-13 19:15:59.829006000 -0700
> +++ b/gen2_basic/test_qp.c	2006-08-14 14:16:57.911621000 -0700
> @@ -586,6 +586,7 @@
>  }
>  
>  void cleanup_mask(
> +	IN	struct ibv_device_attr *device_attr,
>  	IN	enum ibv_qp_type qp_type,
>  	IN OUT	int* mask)
>  {
> @@ -607,6 +608,8 @@
>  		*mask &= ~IBV_QP_MAX_DEST_RD_ATOMIC;
>  		*mask &= ~IBV_QP_MAX_QP_RD_ATOMIC;
>  	}
> +	if (!(device_attr->device_cap_flags & IBV_DEVICE_AUTO_PATH_MIG))
> +		*mask &= ~IBV_QP_ALT_PATH;
>  }
>  
>  int my_query_qp(
> @@ -774,7 +777,7 @@
>  	case REQUIRED_ATTR:
>  		mask |= test_vector[idx].required_attr;
>  
> -		cleanup_mask(qp_type, &mask);
> +		cleanup_mask(device_attr, qp_type, &mask);
>  		mask &= ~IBV_QP_PATH_MIG_STATE; 
>  
>  		if (test_vector[idx].to == IBV_QPS_SQD && test_vector[idx].from == IBV_QPS_SQD && qp_type != IBV_QPT_RC)
> @@ -798,8 +801,8 @@
>  			temp_mask = test_vector[idx].optional_attr;
>  			mask = test_vector[idx].required_attr | test_vector[idx].optional_attr;
>  		}
> -		cleanup_mask(qp_type, &mask);
> -		cleanup_mask(qp_type, &temp_mask);
> +		cleanup_mask(device_attr, qp_type, &mask);
> +		cleanup_mask(device_attr, qp_type, &temp_mask);
>  
>  		for (i = 1; i <= 20; ++i) {
>  			if ((1 << i) & temp_mask) {
> @@ -820,7 +823,7 @@
>  	
>  	case NOT_ALL_REQUIRED:
>  		mask = test_vector[idx].required_attr;
> -		cleanup_mask(qp_type, &mask);
> +		cleanup_mask(device_attr, qp_type, &mask);
>  
>  		for (i = 1; i <= 20; ++i) {
>  			if ((1 << i) & mask) {
> @@ -835,7 +838,7 @@
>  		break;
>  	case NOT_ALL_OPTIONAL:
>  		mask = test_vector[idx].required_attr | test_vector[idx].optional_attr;
> -		cleanup_mask(qp_type, &mask);
> +		cleanup_mask(device_attr, qp_type, &mask);
>  
>  		if (test_vector[idx].to == IBV_QPS_SQD && test_vector[idx].from == IBV_QPS_SQD && qp_type != IBV_QPT_RC)
>  			mask &= ~IBV_QP_PORT;
> @@ -855,7 +858,7 @@
>  		break;
>  	case INVALID_ATTR:
>  		mask = test_vector[idx].required_attr | test_vector[idx].optional_attr;
> -		cleanup_mask(qp_type, &mask);
> +		cleanup_mask(device_attr, qp_type, &mask);
>  
>  		mask = get_random_mask(rand_gen, mask);
>  
> @@ -1420,7 +1422,7 @@
>  
>  					for (j = 1; j < 20; ++j) {
>  						int mask = test_vector[i].optional_attr;
> -						cleanup_mask(qp_type, &mask);
> +						cleanup_mask(&device_attr, qp_type, &mask);
>  						if ((1 << j) & mask) {
>  							get_qp_cap(rand_gen, 1, &device_attr, &attr.cap);
>  
> @@ -1540,7 +1542,7 @@
>  				mask = IBV_QP_STATE | IBV_QP_TIMEOUT | IBV_QP_RETRY_CNT | IBV_QP_RNR_RETRY | 
>  					IBV_QP_SQ_PSN | IBV_QP_MAX_QP_RD_ATOMIC | IBV_QP_PATH_MIG_STATE;
>  
> -				cleanup_mask(qp_type, &mask);
> +				cleanup_mask(&device_attr, qp_type, &mask);
>  
>  				qp_attr.path_mig_state = IBV_MIG_REARM;
>  
> @@ -1556,7 +1558,7 @@
>  
>  				mask = IBV_QP_STATE | IBV_QP_PATH_MIG_STATE;
>  
> -				cleanup_mask(qp_type, &mask);
> +				cleanup_mask(&device_attr, qp_type, &mask);
>  
>  				qp_attr.path_mig_state = IBV_MIG_REARM;
>  
> @@ -1584,7 +1586,7 @@
>  
>  				mask = IBV_QP_STATE | IBV_QP_PATH_MIG_STATE;
>  
> -				cleanup_mask(qp_type, &mask);
> +				cleanup_mask(&device_attr, qp_type, &mask);
>  
>  				qp_attr.path_mig_state = IBV_MIG_REARM;
>  
>   
committed.
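
The idea of the patch, reduced to a standalone sketch: the alternate-path
attribute is dropped from the modify mask when the device does not
advertise automatic path migration. The DEMO_* bit values below are
invented for illustration; the real flags come from the verbs header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; not the real IBV_* constants. */
enum { DEMO_DEVICE_AUTO_PATH_MIG = 1 << 0 };
enum { DEMO_QP_STATE = 1 << 0, DEMO_QP_ALT_PATH = 1 << 1 };

/* Clear attribute bits the device cannot honour. */
static void demo_cleanup_mask(unsigned int device_cap_flags, int *mask)
{
	assert(mask != NULL);
	if (!(device_cap_flags & DEMO_DEVICE_AUTO_PATH_MIG))
		*mask &= ~DEMO_QP_ALT_PATH;
}
```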

thanks
Dotan


From dotanb at dev.mellanox.co.il  Wed Sep 20 23:51:35 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 21 Sep 2006 09:51:35 +0300
Subject: [openib-general] gen2_basic patch 9/10: fix static_rate check
In-Reply-To: <4511B2A9.8070105@pathscale.com>
References: <4511B2A9.8070105@pathscale.com>
Message-ID: <45123677.4050803@dev.mellanox.co.il>

Robert Walsh wrote:
> gen2_basic - fix static_rate check
>
> Make sure we're comparing apples to apples in the static_rate check.
>
> Signed-off by: Robert Walsh <robert.walsh at qlogic.com>
>
> diff -rNu a/gen2_basic/test_qp.c b/gen2_basic/test_qp.c
> --- a/gen2_basic/test_qp.c	2006-09-13 19:17:17.835923000 -0700
> +++ b/gen2_basic/test_qp.c	2006-08-14 14:16:57.911621000 -0700
> @@ -659,7 +659,7 @@
>  /*		CHECK_VALUE("AV port_num", query_attr.ah_attr.port_num, attr->ah_attr.port_num, return -1); */
>  		CHECK_VALUE("AV sl", query_attr.ah_attr.sl, attr->ah_attr.sl, return -1);
>  		CHECK_VALUE("AV src_path_bits", query_attr.ah_attr.src_path_bits, attr->ah_attr.src_path_bits, return -1);
> -		CHECK_VALUE("AV static_rate", query_attr.ah_attr.static_rate, !!attr->ah_attr.static_rate, return -1);
> +		CHECK_VALUE("AV static_rate", !!query_attr.ah_attr.static_rate, !!attr->ah_attr.static_rate, return -1);
>  		if (query_attr.ah_attr.is_global) {
>  			int i;
>  			
>   
committed.
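
A short sketch of why both sides need the "!!": it collapses any non-zero
value to 1, so the comparison only asks whether a rate limit is set at
all. The helper names are hypothetical, and the assumption is that a
query may legitimately report a non-zero static_rate other than 1.

```c
#include <assert.h>

/* Old check: raw queried value against a normalized 0/1 request. */
static int old_check(int queried, int requested)
{
	return queried == !!requested;
}

/* New check: both sides normalized to 0/1 before comparing. */
static int new_check(int queried, int requested)
{
	return !!queried == !!requested;
}

/* A few cases showing where the two checks differ. */
static void demo_static_rate_checks(void)
{
	assert(!old_check(3, 7));	/* 3 != 1: spurious mismatch */
	assert(new_check(3, 7));	/* both "set": match */
	assert(!new_check(0, 7));	/* unset vs set: real mismatch */
}
```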

thanks
Dotan


From dotanb at dev.mellanox.co.il  Wed Sep 20 23:51:53 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 21 Sep 2006 09:51:53 +0300
Subject: [openib-general] gen2_basic patch 10/10: handle other vendor
 devices for max QP count
In-Reply-To: <4511B2CD.2020104@pathscale.com>
References: <4511B2CD.2020104@pathscale.com>
Message-ID: <45123689.1080002@dev.mellanox.co.il>

Robert Walsh wrote:
> gen2_basic - handle other vendor devices for max QP count
>
> When choosing the actual max QP number, handle non-Mellanox devices too.
> Make sure we only clean up the QPs we actually created.
>
> Signed-off by: Robert Walsh <robert.walsh at qlogic.com>
>
> diff -rNu a/gen2_basic/test_qp.c b/gen2_basic/test_qp.c
> --- a/gen2_basic/test_qp.c	2006-09-13 19:18:03.655058000 -0700
> +++ b/gen2_basic/test_qp.c	2006-08-14 14:16:57.911621000 -0700
> @@ -1289,13 +1289,12 @@
>  	CHECK_PTR("ibv_create_cq", cq, goto cleanup);
>  
>          switch (device_attr.vendor_part_id) {
> -	case 23108:
> -	case 25208:
> -		num_qp = device_attr.max_qp; 
> -		break;
>  	case 25218:
>  	case 25204:
> 		num_qp = 15872; /* Found in experiments to be the max for memfree per process */
> +		break;
> +	default:
> +		num_qp = device_attr.max_qp; 
>  		break;
>  	}
>  
> @@ -1330,7 +1329,7 @@
>  	WAIT_CLEANUP;
>  
>  	if (qp) {
> -		for (i = 0; i < device_attr.max_qp + 1; ++ i) {
> +		for (i = 0; i < num_qp + 1; ++ i) {
>  			if (qp[i]) {
>  				rc = ibv_destroy_qp(qp[i]);
>  				CHECK_VALUE("ibv_destroy_qp", rc, 0, test_result = -1);
>   
committed.

thanks
Dotan


From rjwalsh at pathscale.com  Wed Sep 20 23:59:27 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Wed, 20 Sep 2006 23:59:27 -0700
Subject: [openib-general] gen2_basic patch boogum?
In-Reply-To: <45122556.8090401@dev.mellanox.co.il>
References: <451191AD.80608@pathscale.com>
	<45122556.8090401@dev.mellanox.co.il>
Message-ID: <4512384F.3060304@pathscale.com>

> All the rest of the attributes (for example: is_global) are being set 
> with 0.

Oh, OK - I wasn't sure that you wanted it set that way or randomly like 
it used to be.  No biggie.

Regards,
  Robert.


From kliteyn at dev.mellanox.co.il  Thu Sep 21 00:30:38 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 21 Sep 2006 10:30:38 +0300
Subject: [openib-general] [PATCHv2] osm: fixing bugs in osmtest
Message-ID: <yzspsdpbrs1.fsf@kliteynik.yok.mtl.com>

Hi Hal

It appears that each mailer is messing with white spaces 
in its own very special way... 

Anyway, this time it is ok for sure.

Yevgeny

Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Index: osmtest/include/osmtest.h
===================================================================
--- osmtest/include/osmtest.h	(revision 9585)
+++ osmtest/include/osmtest.h	(working copy)
@@ -506,4 +506,13 @@ ib_api_status_t
 osmtest_get_local_port_lmc( IN osmtest_t * const p_osmt,
                             IN ib_net16_t  lid,
                             OUT uint8_t *  const p_lmc );
+
+
+/*
+ * A few auxiliary macros for logging
+ */
+
+#define EXPECTING_ERRORS_START "[[ ===== Expecting Errors - START ===== "
+#define EXPECTING_ERRORS_END   "   ===== Expecting Errors  -  END ===== ]]"
+
 #endif /* _OSMTEST_H_ */
Index: osmtest/osmtest.c
===================================================================
--- osmtest/osmtest.c	(revision 9585)
+++ osmtest/osmtest.c	(working copy)
@@ -552,6 +552,7 @@ osmtest_init( IN osmtest_t * const p_osm
     osm_log( &p_osmt->log, OSM_LOG_ERROR,
              "osmtest_init: ERR 0001: "
              "Unable to allocate vendor object" );
+    status = IB_ERROR;
     goto Exit;
   }
 
@@ -1817,6 +1818,11 @@ osmtest_wrong_sm_key_ignored( IN osmtest
     osm_log( &p_osmt->log, OSM_LOG_ERROR,
              "osmtest_wrong_sm_key_ignored: ERR 0011: "
              "Did not get a timeout but got (%s)\n", ib_get_err_str( status ) );
+    if ( status == IB_SUCCESS )
+    {
+      /* assign some error value to status, since IB_SUCCESS is a bad rc */
+      status = IB_ERROR;
+    }
     goto Exit;
   }
   else
@@ -5448,14 +5454,23 @@ osmtest_validate_against_db( IN osmtest_
 
   memset( &context, 0, sizeof( context ) );
   memset( &request, 0, sizeof( request ) );
+
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" );
   status = osmtest_get_multipath_rec( p_osmt, &request, &context );
+  if( status != IB_SUCCESS )
+  {
+     osm_log( &p_osmt->log, OSM_LOG_ERROR,
+              "osmtest_get_multipath_rec: "
+              "Got error %s\n", ib_get_err_str(status) );
+  }
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" );
+
   if( status == IB_SUCCESS )
-    goto Exit;
-  else
   {
-    osm_log( &p_osmt->log, OSM_LOG_ERROR,
-             "osmtest_get_multipath_rec: "
-             "IS EXPECTED ERROR ^^^^\n");
+    status = IB_ERROR;
+    goto Exit;
   }
 
   memset( &context, 0, sizeof( context ) );
@@ -5463,14 +5478,23 @@ osmtest_validate_against_db( IN osmtest_
   request.comp_mask = IB_MPR_COMPMASK_SGIDCOUNT;
   request.sgid_count = 1;
   ib_gid_set_default( &request.gids[0], portguid );
+
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" );
   status = osmtest_get_multipath_rec( p_osmt, &request, &context );
-  if( status == IB_SUCCESS ) 
-    goto Exit;
-  else
+  if( status != IB_SUCCESS )
   {
-    osm_log( &p_osmt->log, OSM_LOG_ERROR,
-             "osmtest_get_multipath_rec: "
-             "IS EXPECTED ERROR ^^^^\n");
+     osm_log( &p_osmt->log, OSM_LOG_ERROR,
+              "osmtest_get_multipath_rec: "
+              "Got error %s\n", ib_get_err_str(status) );
+  }
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" );
+
+  if( status == IB_SUCCESS )
+  {
+    status = IB_ERROR;
+    goto Exit;
   }
 
   memset( &context, 0, sizeof( context ) );
@@ -5482,14 +5506,23 @@ osmtest_validate_against_db( IN osmtest_
   /* Set IPoIB broadcast MGID */
   request.gids[1].unicast.prefix = CL_HTON64(0xff12401bffff0000ULL);
   request.gids[1].unicast.interface_id = CL_HTON64(0x00000000ffffffffULL);
+
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" );
   status = osmtest_get_multipath_rec( p_osmt, &request, &context );
+  if( status != IB_SUCCESS )
+  {
+     osm_log( &p_osmt->log, OSM_LOG_ERROR,
+              "osmtest_get_multipath_rec: "
+              "Got error %s\n", ib_get_err_str(status) );
+  }
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" );
+
   if( status == IB_SUCCESS )
-    goto Exit;
-  else
   {
-    osm_log( &p_osmt->log, OSM_LOG_ERROR,
-             "osmtest_get_multipath_rec: "
-             "IS EXPECTED ERROR ^^^^\n");
+    status = IB_ERROR;
+    goto Exit;
   }
 
   memset( &context, 0, sizeof( context ) );
@@ -5500,14 +5533,23 @@ osmtest_validate_against_db( IN osmtest_
   request.gids[0].unicast.prefix = CL_HTON64(0xff12401bffff0000ULL);
   request.gids[0].unicast.interface_id = CL_HTON64(0x00000000ffffffffULL);
   ib_gid_set_default( &request.gids[1], portguid );
+
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_START "\n" );
   status = osmtest_get_multipath_rec( p_osmt, &request, &context );
+  if( status != IB_SUCCESS )
+  {
+     osm_log( &p_osmt->log, OSM_LOG_ERROR,
+              "osmtest_get_multipath_rec: "
+              "Got error %s\n", ib_get_err_str(status) );
+  }
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_multipath_rec: " EXPECTING_ERRORS_END "\n" );
+
   if( status == IB_SUCCESS )
-    goto Exit;
-  else
   {
-    osm_log( &p_osmt->log, OSM_LOG_ERROR,
-             "osmtest_get_multipath_rec_gid_ipoib_bcast: "
-             "IS EXPECTED ERROR ^^^^\n");
+    status = IB_ERROR;
+    goto Exit;
   }
 
   memset( &context, 0, sizeof( context ) );
@@ -5569,14 +5611,23 @@ osmtest_validate_against_db( IN osmtest_
     goto Exit;
 
   memset( &context, 0, sizeof( context ) );
+
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_pkeytbl_rec_by_lid: " EXPECTING_ERRORS_START "\n" );
   status = osmtest_get_pkeytbl_rec_by_lid( p_osmt, test_lid, 0, &context );
-  if ( status == IB_SUCCESS )
-    goto Exit;
-  else
+  if( status != IB_SUCCESS )
   {
-    osm_log( &p_osmt->log, OSM_LOG_ERROR,
-             "osmtest_get_pkeytbl_rec_by_lid: "
-             "IS EXPECTED ERROR ^^^^\n");
+     osm_log( &p_osmt->log, OSM_LOG_ERROR,
+              "osmtest_get_pkeytbl_rec_by_lid: "
+              "Got error %s\n", ib_get_err_str(status) );
+  }
+  osm_log( &p_osmt->log, OSM_LOG_ERROR,
+           "osmtest_get_pkeytbl_rec_by_lid: " EXPECTING_ERRORS_END "\n" );
+
+  if( status == IB_SUCCESS )
+  {
+    status = IB_ERROR;
+    goto Exit;
   }
 
   memset( &context, 0, sizeof( context ) );
@@ -5679,26 +5730,43 @@ osmtest_validate_against_db( IN osmtest_
         goto Exit;
 
       memset( &context, 0, sizeof( context ) );
+      osm_log( &p_osmt->log, OSM_LOG_ERROR,
+               "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_START "\n" );
       status = osmtest_get_path_rec_by_lid_pair( p_osmt, 0xffff,
                                                  0xffff, &context );
-      if (status == IB_SUCCESS )
-        goto Exit;
-      else
+      if( status != IB_SUCCESS )
       {
-        osm_log ( &p_osmt->log, OSM_LOG_ERROR,
+         osm_log( &p_osmt->log, OSM_LOG_ERROR,
                   "osmtest_get_path_rec_by_lid_pair: "
-                  "IS EXPECTED ERROR ^^^^\n" );
+                  "Got error %s\n", ib_get_err_str(status) );
+      }
+      osm_log( &p_osmt->log, OSM_LOG_ERROR,
+               "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_END "\n" );
+
+      if( status == IB_SUCCESS )
+      {
+        status = IB_ERROR;
+        goto Exit;
       }
 
+      osm_log( &p_osmt->log, OSM_LOG_ERROR,
+               "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_START "\n" );
+
       status = osmtest_get_path_rec_by_lid_pair( p_osmt, test_lid,
                                                  0xffff, &context );
-      if (status == IB_SUCCESS )
-        goto Exit;
-      else
+      if( status != IB_SUCCESS )
       {
-        osm_log ( &p_osmt->log, OSM_LOG_ERROR,
+         osm_log( &p_osmt->log, OSM_LOG_ERROR,
                   "osmtest_get_path_rec_by_lid_pair: "
-                  "IS EXPECTED ERROR ^^^^\n" );
+                  "Got error %s\n", ib_get_err_str(status) );
+      }
+      osm_log( &p_osmt->log, OSM_LOG_ERROR,
+               "osmtest_get_path_rec_by_lid_pair: " EXPECTING_ERRORS_END "\n" );
+
+      if( status == IB_SUCCESS )
+      {
+        status = IB_ERROR;
+        goto Exit;
       }
     }
   }
@@ -7141,6 +7209,9 @@ osmtest_run( IN osmtest_t * const p_osmt
 
   if( p_osmt->opt.flow == 1 )
   {
+    /*
+     * Creating an inventory file with all nodes, ports and paths
+     */
     status = osmtest_create_inventory_file( p_osmt );
     if( status != IB_SUCCESS )
     {
@@ -7155,6 +7226,9 @@ osmtest_run( IN osmtest_t * const p_osmt
   {
     if( p_osmt->opt.flow == 5 )
     {
+      /*
+       * Stress SA - flood it with queries
+       */
       switch ( p_osmt->opt.stress )
       {
         case 0:
@@ -7215,8 +7289,11 @@ osmtest_run( IN osmtest_t * const p_osmt
       /*
        * Run normal validition tests.
        */
-       if (!p_osmt->opt.flow || p_osmt->opt.flow == 2)
+       if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 2)
        {
+         /*
+          * Only validate the given inventory file
+          */ 
          status = osmtest_create_db( p_osmt );
          if( status != IB_SUCCESS )
          {
@@ -7238,7 +7315,7 @@ osmtest_run( IN osmtest_t * const p_osmt
          }
        }
 
-       if (!p_osmt->opt.flow)
+       if (p_osmt->opt.flow == 0)
        {
          status = osmtest_wrong_sm_key_ignored( p_osmt );
          if( status != IB_SUCCESS )
@@ -7251,8 +7328,11 @@ osmtest_run( IN osmtest_t * const p_osmt
          }
        }
 
-       if (!p_osmt->opt.flow || p_osmt->opt.flow == 3)
+       if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 3)
        {
+         /*
+          * run service registration, deregistration, and lease test
+          */
          status = osmt_run_service_records_flow( p_osmt );
          if( status != IB_SUCCESS )
          {
@@ -7264,8 +7344,11 @@ osmtest_run( IN osmtest_t * const p_osmt
          }
        }
 
-       if (!p_osmt->opt.flow || p_osmt->opt.flow == 4)
+       if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 4)
        {
+          /* 
+           * Run event forwarding test
+           */
 #ifdef OSM_VENDOR_INTF_MTL
           status = osmt_run_inform_info_flow( p_osmt );
 
@@ -7286,12 +7369,13 @@ osmtest_run( IN osmtest_t * const p_osmt
 #endif
         }
 
-        /*
-         * since it generates a huge file, we run it only
-         * if explicitly required to
-         */
         if (p_osmt->opt.flow == 7)
         {
+          /* 
+           * QoS info: dump VLArb and SLtoVL tables.
+           * Since it generates a huge file, we run it only
+           * if explicitly required to
+           */
           status = osmtest_create_db( p_osmt );
           if( status != IB_SUCCESS )
           {
@@ -7315,6 +7399,9 @@ osmtest_run( IN osmtest_t * const p_osmt
 
         if (p_osmt->opt.flow == 8)
         {
+          /*
+           * Run trap 64/65 flow (this flow requires running of external tool)
+           */
 #ifdef OSM_VENDOR_INTF_MTL
           status = osmt_run_trap64_65_flow( p_osmt  );
           if( status != IB_SUCCESS )
@@ -7334,8 +7421,11 @@ osmtest_run( IN osmtest_t * const p_osmt
 #endif
         }
 
-        if (!p_osmt->opt.flow || p_osmt->opt.flow == 6)
+        if (p_osmt->opt.flow == 0 || p_osmt->opt.flow == 6)
         {
+          /*
+           * Multicast flow
+           */ 
           status = osmt_run_mcast_flow( p_osmt );
           if( status != IB_SUCCESS )
           {
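
The "expected error" handling that this patch repeats around each negative
query can be sketched as one helper. This is only an illustration under
stated assumptions: status_t, query_fn, expect_failure() and the marker
strings below are hypothetical stand-ins, not names or text from osmtest,
and unlike the patch the helper returns an inverted status instead of
jumping to Exit.

```c
#include <stdio.h>

/* Hypothetical stand-in for ib_api_status_t and two of its values. */
typedef enum { ST_SUCCESS = 0, ST_ERROR = 1 } status_t;

/* Placeholder marker text; the real EXPECTING_ERRORS_* strings may differ. */
#define EXPECTING_ERRORS_START "EXPECTING ERRORS - START"
#define EXPECTING_ERRORS_END   "EXPECTING ERRORS - END"

/* Hypothetical signature for a query that is supposed to fail. */
typedef status_t (*query_fn)(void *ctx);

/*
 * Run a query that must fail. The log markers bracket the call so that
 * errors printed in between are known to be intentional.
 * Returns ST_SUCCESS if the query failed as expected, and ST_ERROR if it
 * unexpectedly succeeded.
 */
static status_t expect_failure(const char *name, query_fn query, void *ctx)
{
    status_t status;

    fprintf(stderr, "%s: %s\n", name, EXPECTING_ERRORS_START);
    status = query(ctx);
    if (status != ST_SUCCESS)
        fprintf(stderr, "%s: Got error %d\n", name, (int)status);
    fprintf(stderr, "%s: %s\n", name, EXPECTING_ERRORS_END);

    return (status == ST_SUCCESS) ? ST_ERROR : ST_SUCCESS;
}

/* Two trivial queries used to exercise the helper. */
static status_t always_fails(void *ctx)    { (void)ctx; return ST_ERROR; }
static status_t always_succeeds(void *ctx) { (void)ctx; return ST_SUCCESS; }
```

A caller would treat an ST_ERROR result from expect_failure() as a test
failure, which corresponds to the "status = IB_ERROR; goto Exit;" branches
in the patch.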


From mst at mellanox.co.il  Thu Sep 21 01:09:08 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Sep 2006 11:09:08 +0300
Subject: [openib-general] [PATCHv2] osm: fixing bugs in osmtest
In-Reply-To: <yzspsdpbrs1.fsf@kliteynik.yok.mtl.com>
References: <yzspsdpbrs1.fsf@kliteynik.yok.mtl.com>
Message-ID: <20060921080908.GC27123@mellanox.co.il>

Quoting r. Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>:
> Subject: [PATCHv2] osm: fixing bugs in osmtest
> 
> Hi Hal
> 
> It appears that each mailer is messing with white spaces 
> in its own very special way... 

It's often not the mail agent's fault but the editor's:
lots of editors mangle whitespace if you cut and paste the patch, since
it looks like regular text input to them.

You want an editor that lets you inline the patch as is, taking
it directly from file, not through the clipboard.

E.g. I hear that the kmail composer lets you do file/insert file
to do that.

-- 
MST


From kliteyn at mellanox.co.il  Thu Sep 21 01:31:46 2006
From: kliteyn at mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 21 Sep 2006 11:31:46 +0300
Subject: [openib-general] [PATCH][TRIVIAL]OpenSM/osm_node_info_rcv.c:
 Eliminate superfluous call level
In-Reply-To: <1158672358.4509.4309.camel@hal.voltaire.com>
References: <1158672358.4509.4309.camel@hal.voltaire.com>
Message-ID: <1158827506.8655.28.camel@kliteynik.yok.mtl.com>

Hi Hal.

The patch looks OK.

Regards,
--
Yevgeny

On Tue, 2006-09-19 at 09:25 -0400, Hal Rosenstock wrote:

> OpenSM/osm_node_info_rcv.c: Eliminate superfluous call level
> 
> Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> Index: opensm/osm_node_info_rcv.c
> ===================================================================
> --- opensm/osm_node_info_rcv.c	(revision 9536)
> +++ opensm/osm_node_info_rcv.c	(working copy)
> @@ -437,7 +437,7 @@ __osm_ni_rcv_process_new_ca(
>   The plock must be held before calling this function.
>  **********************************************************************/
>  static void
> -__osm_ni_rcv_process_ca_port(
> +__osm_ni_rcv_process_existing_ca(
>    IN const osm_ni_rcv_t* const p_rcv,
>    IN osm_node_t* const p_node,
>    IN const osm_madw_t* const p_madw )
> @@ -455,7 +455,7 @@ __osm_ni_rcv_process_ca_port(
>    osm_bind_handle_t h_bind;
>    cl_status_t cl_status;
>  
> -  OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_ca_port );
> +  OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_existing_ca );
>  
>    p_smp = osm_madw_get_smp_ptr( p_madw );
>    p_ni = (ib_node_info_t*)ib_smp_get_payload_ptr( p_smp );
> @@ -473,7 +473,7 @@ __osm_ni_rcv_process_ca_port(
>    if( p_port == (osm_port_t*)cl_qmap_end( p_guid_tbl ) )
>    {
>      osm_log( p_rcv->p_log, OSM_LOG_VERBOSE,
> -             "__osm_ni_rcv_process_ca_port: "
> +             "__osm_ni_rcv_process_existing_ca: "
>               "Creating new port object with GUID = 0x%" PRIx64 "\n",
>               cl_ntoh64( p_ni->port_guid ) );
>  
> @@ -483,7 +483,7 @@ __osm_ni_rcv_process_ca_port(
>      if( p_port == NULL )
>      {
>        osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> -               "__osm_ni_rcv_process_ca_port: ERR 0D04: "
> +               "__osm_ni_rcv_process_existing_ca: ERR 0D04: "
>                 "Unable to create new port object\n" );
>        goto Exit;
>      }
> @@ -500,7 +500,7 @@ __osm_ni_rcv_process_ca_port(
>          Somehow, this port GUID already exists in the table.
>        */
>        osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> -               "__osm_ni_rcv_process_ca_port: ERR 0D12: "
> +               "__osm_ni_rcv_process_existing_ca: ERR 0D12: "
>                 "Port 0x%" PRIx64 " already in the database!\n",
>                 cl_ntoh64( p_ni->port_guid ) );
>  
> @@ -521,7 +521,7 @@ __osm_ni_rcv_process_ca_port(
>        if( cl_status != CL_SUCCESS )
>        {
>          osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> -                 "__osm_ni_rcv_process_ca_port: ERR 0D08: "
> +                 "__osm_ni_rcv_process_existing_ca: ERR 0D08: "
>                   "Error %s adding to list\n",
>                   CL_STATUS_MSG( cl_status ) );
>          osm_port_delete( &p_port );
> @@ -530,7 +530,7 @@ __osm_ni_rcv_process_ca_port(
>        else
>        {
>          osm_log( p_rcv->p_log, OSM_LOG_DEBUG,
> -                 "__osm_ni_rcv_process_ca_port: "
> +                 "__osm_ni_rcv_process_existing_ca: "
>                   "Adding port GUID:0x%016" PRIx64 " to new_ports_list\n",
>                   cl_ntoh64(osm_node_get_node_guid( p_port->p_node )) );
>        }
> @@ -547,7 +547,7 @@ __osm_ni_rcv_process_ca_port(
>      if ( !osm_physp_is_valid( p_physp ) )
>      {
>          osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> -                 "__osm_ni_rcv_process_ca_port: ERR 0D19: "
> +                 "__osm_ni_rcv_process_existing_ca: ERR 0D19: "
>                   "Invalid physical port. Aborting discovery\n");
>          goto Exit;
>      }
> @@ -579,7 +579,7 @@ __osm_ni_rcv_process_ca_port(
>    if( status != IB_SUCCESS )
>    {
>      osm_log( p_rcv->p_log, OSM_LOG_ERROR,
> -             "__osm_ni_rcv_process_ca_port: ERR 0D13: "
> +             "__osm_ni_rcv_process_existing_ca: ERR 0D13: "
>               "Failure initiating PortInfo request (%s)\n",
>               ib_get_err_str(status));
>    }
> @@ -592,22 +592,6 @@ __osm_ni_rcv_process_ca_port(
>   The plock must be held before calling this function.
>  **********************************************************************/
>  static void
> -__osm_ni_rcv_process_existing_ca(
> -  IN const osm_ni_rcv_t* const p_rcv,
> -  IN osm_node_t* const p_node,
> -  IN const osm_madw_t* const p_madw )
> -{
> -  OSM_LOG_ENTER( p_rcv->p_log, __osm_ni_rcv_process_existing_ca );
> -
> -  __osm_ni_rcv_process_ca_port( p_rcv, p_node, p_madw );
> -
> -  OSM_LOG_EXIT( p_rcv->p_log );
> -}
> -
> -/**********************************************************************
> - The plock must be held before calling this function.
> -**********************************************************************/
> -static void
>  __osm_ni_rcv_process_new_router(
>    IN const osm_ni_rcv_t* const p_rcv,
>    IN osm_node_t* const p_node,
> 
> 



From ogerlitz at voltaire.com  Thu Sep 21 01:35:10 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 21 Sep 2006 11:35:10 +0300 (IDT)
Subject: [openib-general] timer_pending kernel assertion while stopping IPoIB
Message-ID: <Pine.LNX.4.64.0609211129040.28981@zuben>

I just got the following assertion:

KERNEL: assertion (!timer_pending(&dev->watchdog_timer)) failed at net/sched/sch_generic.c (631)

which is the outcome of this line in dev_shutdown()

	BUG_TRAP(!timer_pending(&dev->watchdog_timer));

when running a script that does

modprobe ib_ipoib
echo 1 > /sys/module/ib_ipoib/parameters/mcast_debug_level
echo 1 > /sys/module/ib_ipoib/parameters/debug_level
ifconfig ib0 192.168.10.118

and then after some time a script that does

ifconfig ib0   down
ifconfig ib1   down
modprobe -r    ib_ipoib

Below is the dmesg output, which might help. The kernel is net-2.6.19 git.

Or.

ib0: bringing up interface
ib0: starting multicast thread
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: restarting multicast task
ib0: stopping multicast thread
ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
ib0: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
ib0: starting multicast thread
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: restarting multicast task
ib0: stopping multicast thread
ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -4)
ib0: starting multicast thread
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
ib0: Created ah ffff8100200517c0
ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff8100200517c0, LID 0xc000, SL 0
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
ib0: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
ib0: Created ah ffff810035efec40
ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff810035efec40, LID 0xc001, SL 0
ib0: successfully joined all multicast groups
ib0: restarting multicast task
ib0: stopping multicast thread
ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: waiting for MGID ff12:401b:ffff:0000:0000:0000:0000:0001
ib0: starting multicast thread
ib0: successfully joined all multicast groups
ib0: stopping interface
ib0: downing ib_dev
ib0: stopping multicast thread
ib0: flushing multicast list
ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:0000:0001
ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:0000:0001
ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: All sends and receives done.
KERNEL: assertion (!timer_pending(&dev->watchdog_timer)) failed at net/sched/sch_generic.c (631)
ib0: cleaning up ib_dev
ib0: stopping multicast thread
ib0: flushing multicast list
ib1: cleaning up ib_dev
ib1: stopping multicast thread
ib1: flushing multicast list


From erezz at voltaire.com  Thu Sep 21 02:52:12 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Thu, 21 Sep 2006 12:52:12 +0300
Subject: [openib-general] 2 SLES 10 backport directories
In-Reply-To: <20060920210716.GD9724@mellanox.co.il>
References: <450915EE.1090705@voltaire.com>
	<20060917044626.GA26054@mellanox.co.il> <451137D9.3060607@voltaire.com>
	<20060920210716.GD9724@mellanox.co.il>
Message-ID: <451260CC.6090805@voltaire.com>


>>>>         
>> I think that we can throw away 
>> https://openib.org/svn/gen2/branches/backport/sles10/. These patches 
>> apply to SLES 10 beta 8. They are no longer needed. As for 
>> https://openib.org/svn/gen2/branches/backport/2.6.16_sles10/, it 
>> contains 2 iSER patches. Both affect only iSER code (nothing in 
>> open-iscsi or any other kernel code). Therefore, I think that it's ok.
>>     
>
> Go ahead and kill backport/sles10 then.
> But the whole backport dir should be updated from OFED tree
> or better killed once Sean switches to git.
>   
I will delete it. However, I don't understand what you mean by 
"the whole backport dir should be updated from OFED tree".

Erez


From dotanb at dev.mellanox.co.il  Thu Sep 21 04:51:40 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 21 Sep 2006 14:51:40 +0300
Subject: [openib-general] gen2_basic patch boogum?
In-Reply-To: <4511AFD0.9030902@pathscale.com>
References: <451191AD.80608@pathscale.com>
 <4511AFD0.9030902@pathscale.com>
Message-ID: <45127CCC.80609@dev.mellanox.co.il>

Robert Walsh wrote:
> Another quick question: I noticed that in the latest changes you
> pushed, including my patches, you removed the following check in test_qp.c:
>
> @@ -1702,7 +1700,6 @@
>                 CHECK_VALUE("qp_type", query_init_attr.qp_type,
> attr.qp_type, goto cleanup);
>                 CHECK_VALUE_PTR("recv_cq", query_init_attr.recv_cq,
> attr.recv_cq, goto cleanup);
>                 CHECK_VALUE_PTR("send_cq", query_init_attr.send_cq,
> attr.send_cq, goto cleanup);
> -               CHECK_VALUE("sq_sig_all", query_init_attr.sq_sig_all,
> attr.sq_sig_all, goto cleanup);
>                 CHECK_VALUE_PTR("srq", query_init_attr.srq, attr.srq,
> goto cleanup);
>         }
>         PASSED;
>
> Any particular reason why you removed this?  I don't ever remember this
> being a problem on ipath or mthca.
>
> Regards,
>  Robert.
>   
Yes. According to the IB spec, Query QP doesn't have to return the
signaling type of the SQ, so I prefer not to check an attribute that is
not mandatory.

Dotan


From aviram at mellanox.co.il  Thu Sep 21 05:34:52 2006
From: aviram at mellanox.co.il (Aviram Gutman)
Date: Thu, 21 Sep 2006 15:34:52 +0300
Subject: [openib-general] OFED 1.1 RC6
Message-ID: <2D5DEE3C6A0E0244B0133244731D4C4BC03C@mtlexch01.mtl.com>


Hi,

OFED-1.1-rc6 is available on
https://openib.org/svn/gen2/branches/1.1/ofed/releases/OFED-1.1-rc6.tgz

Please report any issues in bugzilla http://openib.org/bugzilla/


OS support:
===========
Novell:
     - SLES 9.0 SP3
     - SLES10
Redhat:
     - Redhat EL4 up3

     - Redhat EL4 up4*
kernel.org:
     - Kernel 2.6.17

* Please note that there is no IPoIB multicast support in Redhat EL4
up4. A kernel patch will be included in the documentation, and customers
who need it must apply it.


Limitations and known issues:
=================================

1. OpenIB Diags build on SLES10 ppc - Mellanox to fix - Vlad
2. iSER build on SLES10 needs root privilege - Voltaire supplied a fix
that will be integrated next week
3. Bug #233 SDP crash on ipath - not a showstopper to the release
4. Bug #33 ping fails on ib1 with Silverstorm switch - We couldn't
reproduce the problem. Silverstorm, Please debug
5. Fix IBDM to allow multiple devices on the same machine - Eitan Zahavi


Schedule:
=========

It seems that we need another RC to fix the open issues. Would RC7 on
Monday or Tuesday next week (hopefully with only minor fixes) and a final
release by the end of next week be OK with everyone?


Thanks,
	Aviram


From halr at voltaire.com  Thu Sep 21 05:39:02 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Sep 2006 08:39:02 -0400
Subject: [openib-general] [PATCHv2] osm: fixing bugs in osmtest
In-Reply-To: <yzspsdpbrs1.fsf@kliteynik.yok.mtl.com>
References: <yzspsdpbrs1.fsf@kliteynik.yok.mtl.com>
Message-ID: <1158842325.4509.111417.camel@hal.voltaire.com>

Hi Yevgeny,

On Thu, 2006-09-21 at 03:30, Yevgeny Kliteynik wrote:
> Hi Hal
> 
> It appears that each mailer is messing with white spaces 
> in its own very special way... 
> 
> Anyway, this time it is ok for sure.
> 
> Yevgeny
> 
> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Yes, that's better!

Thanks. Applied to trunk only.

-- Hal


From eli at dev.mellanox.co.il  Thu Sep 21 06:10:25 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Thu, 21 Sep 2006 16:10:25 +0300
Subject: [openib-general] [PATCH] IB/ipoib: likely/unlikely annotations
Message-ID: <1158844225.24776.118.camel@localhost>

Use likely/unlikely in data tx flow

Signed-off-by: Eli Cohen <eli at dev.mellanox.co.il>
Acked-by: Michael S. Tsirkin <mst at mellanox.co.il>
---

Index: openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- openib-1.1-rc6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2006-09-21 15:43:49.000000000 +0300
+++ openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_main.c	2006-09-21 15:46:26.000000000 +0300
@@ -643,7 +643,7 @@
 	struct ipoib_neigh *neigh;
 	unsigned long flags;
 
-	if (!spin_trylock_irqsave(&priv->tx_lock, flags))
+	if (unlikely(!spin_trylock_irqsave(&priv->tx_lock, flags)))
 		return NETDEV_TX_LOCKED;
 
 	/*
@@ -656,7 +656,7 @@
 		return NETDEV_TX_BUSY;
 	}
 
-	if (skb->dst && skb->dst->neighbour) {
+	if (likely(skb->dst && skb->dst->neighbour)) {
 		if (unlikely(!*to_ipoib_neigh(skb->dst->neighbour))) {
 			ipoib_path_lookup(skb, dev);
 			goto out;
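
For readers unfamiliar with these annotations, here is a minimal userspace
sketch of how likely()/unlikely() are commonly defined on top of the
GCC/Clang builtin __builtin_expect. The check_len() function and its
parameters are hypothetical and only illustrate the usage; they are not
taken from the ipoib code.

```c
/*
 * Branch hints built on the GCC/Clang builtin. The "!!" normalizes any
 * non-zero value to 1 so that the comparison with the expected value works
 * for pointers and arbitrary integers as well.
 */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical length check: oversized packets are assumed to be rare, so
 * the drop branch is marked unlikely and the compiler can lay out the
 * common path as the fall-through.
 */
static int check_len(unsigned int len, unsigned int max_len)
{
    if (unlikely(len > max_len))
        return -1;  /* cold path: caller drops the packet */

    return 0;       /* hot path: caller sends the packet */
}
```

The hints affect only code layout and static branch prediction; a wrong
hint changes performance, never the result of the condition.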


From eli at dev.mellanox.co.il  Thu Sep 21 06:39:26 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Thu, 21 Sep 2006 16:39:26 +0300
Subject: [openib-general]  [PATCH] IB/ipoib: unlikely in send
Message-ID: <1158845966.24776.122.camel@localhost>

Use unlikely in send flow

Signed-off-by: Eli Cohen <eli at dev.mellanox.co.il>
---

Index: openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===================================================================
--- openib-1.1-rc6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2006-09-21 16:19:33.000000000 +0300
+++ openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2006-09-21 16:20:39.000000000 +0300
@@ -385,7 +385,7 @@
 	struct ipoib_tx_buf *tx_req;
 	dma_addr_t addr;
 
-	if (skb->len > dev->mtu + INFINIBAND_ALEN) {
+	if (unlikely(skb->len > dev->mtu + INFINIBAND_ALEN)) {
 		ipoib_warn(priv, "packet len %d (> %d) too long to send, dropping\n",
 			   skb->len, dev->mtu + INFINIBAND_ALEN);
 		++priv->stats.tx_dropped;


From eli at dev.mellanox.co.il  Thu Sep 21 07:56:32 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Thu, 21 Sep 2006 17:56:32 +0300
Subject: [openib-general] heads-up - ipoib NAPI
Message-ID: <1158850592.24776.156.camel@localhost>

Hi,

I have a draft implementation of NAPI in ipoib and got the following
results:

System descriptions
===================
Quad CPU E64T 2.4 GHz
4 GB RAM
MT25204 Sinai HCA

I used netperf for benchmarking, the BW test ran for 600 seconds with 8
clients and 8 servers.

The results I received are below:

netperf TCP_STREAM:
                BW [MByte/sec]   client side [irqs/sec]   server side [irqs/sec]
                --------------   ----------------------   ----------------------
without NAPI:        506                 86441                    66311
with NAPI:           550                  6830                    13600


netperf TCP_RR:
                rate [tran/sec]
                ---------------
without NAPI:       39600
with NAPI:          39470


Please note this is still under work and we plan to do more tests and
measure on other devices.
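
To put the numbers above in relative terms, the following small sketch
computes the percentage change for each column. The helper names
pct_change() and report() are made up for this illustration; the input
values are the ones from the tables above.

```c
#include <stdio.h>

/* Percentage change from 'before' to 'after'; negative means a reduction. */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Print one line of the comparison, e.g. for the client-side interrupt rate. */
static void report(const char *label, double before, double after)
{
    printf("%-24s %8.0f -> %8.0f  (%+.1f%%)\n",
           label, before, after, pct_change(before, after));
}
```

Calling report("BW [MByte/sec]", 506, 550), report("client irqs/sec",
86441, 6830), report("server irqs/sec", 66311, 13600) and
report("TCP_RR [tran/sec]", 39600, 39470) gives roughly +8.7% bandwidth,
about 92% fewer interrupts on the client side, about 79% fewer on the
server side, and a transaction rate lower by about 0.3%.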


From eli at dev.mellanox.co.il  Thu Sep 21 07:57:37 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Thu, 21 Sep 2006 17:57:37 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
Message-ID: <1158850657.24776.158.camel@localhost>

This patch implements NAPI for ipoib. It is a draft implementation.
I would like your opinion on whether we need a module parameter
to control if NAPI should be activated or not.
Also there is a need to implement peek_cq and call it for
ib_req_notify_cq() so as to know if there is a need to call
netif_rx_schedule_prep() again.

Signed-off-by: Eli Cohen <eli at dev.mellanox.co.il>
---

Index: openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- openib-1.1-rc6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2006-09-21 16:30:35.000000000 +0300
+++ openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_main.c	2006-09-21 16:30:42.000000000 +0300
@@ -69,6 +69,8 @@
 MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0");
 #endif
 
+static const int poll_def_weight = 64;
+
 struct ipoib_path_iter {
 	struct net_device *dev;
 	struct ipoib_path  path;
@@ -91,6 +93,9 @@
 	.remove = ipoib_remove_one
 };
 
+
+int ipoib_poll(struct net_device *dev, int *budget);
+
 int ipoib_open(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -689,6 +694,7 @@
 			goto out;
 		}
 
+
 		if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
 			spin_lock(&priv->lock);
 			__skb_queue_tail(&neigh->queue, skb);
@@ -892,6 +898,7 @@
 
 	/* Delete any child interfaces first */
 	list_for_each_entry_safe(cpriv, tcpriv, &priv->child_intfs, list) {
+		netif_poll_disable(cpriv->dev);
 		unregister_netdev(cpriv->dev);
 		ipoib_dev_cleanup(cpriv->dev);
 		free_netdev(cpriv->dev);
@@ -919,6 +926,8 @@
 	dev->hard_header 	 = ipoib_hard_header;
 	dev->set_multicast_list  = ipoib_set_mcast_list;
 	dev->neigh_setup         = ipoib_neigh_setup_dev;
+	dev->poll                = ipoib_poll;
+	dev->weight              = poll_def_weight;
 
 	dev->watchdog_timeo 	 = HZ;
 
@@ -1097,6 +1106,8 @@
 		goto register_failed;
 	}
 
+	netif_poll_enable(priv->dev);
+
 	ipoib_create_debug_files(priv->dev);
 
 	if (ipoib_add_pkey_attr(priv->dev))
@@ -1111,6 +1122,7 @@
 	return priv->dev;
 
 sysfs_failed:
+	netif_poll_disable(priv->dev);
 	ipoib_delete_debug_files(priv->dev);
 	unregister_netdev(priv->dev);
 
@@ -1168,6 +1180,7 @@
 	dev_list = ib_get_client_data(device, &ipoib_client);
 
 	list_for_each_entry_safe(priv, tmp, dev_list, list) {
+		netif_poll_disable(priv->dev);
 		ib_unregister_event_handler(&priv->event_handler);
 		flush_scheduled_work();
 
Index: openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===================================================================
--- openib-1.1-rc6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2006-09-21 16:30:38.000000000 +0300
+++ openib-1.1-rc6/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2006-09-21 17:24:59.000000000 +0300
@@ -169,7 +169,7 @@
 	return 0;
 }
 
-static void ipoib_ib_handle_wc(struct net_device *dev,
+static void ipoib_ib_handle_rwc(struct net_device *dev,
 			       struct ib_wc *wc)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -178,122 +178,186 @@
 	ipoib_dbg_data(priv, "called: id %d, op %d, status: %d\n",
 		       wr_id, wc->opcode, wc->status);
 
-	if (wr_id & IPOIB_OP_RECV) {
-		wr_id &= ~IPOIB_OP_RECV;
-
-		if (wr_id < ipoib_recvq_size) {
-			struct sk_buff *skb  = priv->rx_ring[wr_id].skb;
-			dma_addr_t      addr = priv->rx_ring[wr_id].mapping;
-
-			if (unlikely(wc->status != IB_WC_SUCCESS)) {
-				if (wc->status != IB_WC_WR_FLUSH_ERR)
-					ipoib_warn(priv, "failed recv event "
-						   "(status=%d, wrid=%d vend_err %x)\n",
-						   wc->status, wr_id, wc->vendor_err);
-				dma_unmap_single(priv->ca->dma_device, addr,
-						 IPOIB_BUF_SIZE, DMA_FROM_DEVICE);
-				dev_kfree_skb_any(skb);
-				priv->rx_ring[wr_id].skb = NULL;
-				return;
-			}
-
-			/*
-			 * If we can't allocate a new RX buffer, dump
-			 * this packet and reuse the old buffer.
-			 */
-			if (unlikely(ipoib_alloc_rx_skb(dev, wr_id))) {
-				++priv->stats.rx_dropped;
-				goto repost;
-			}
-
-			ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
-				       wc->byte_len, wc->slid);
+	wr_id &= ~IPOIB_OP_RECV;
 
+	if (wr_id < ipoib_recvq_size) {
+		struct sk_buff *skb  = priv->rx_ring[wr_id].skb;
+		dma_addr_t      addr = priv->rx_ring[wr_id].mapping;
+
+		if (unlikely(wc->status != IB_WC_SUCCESS)) {
+			if (wc->status != IB_WC_WR_FLUSH_ERR)
+				ipoib_warn(priv, "failed recv event "
+					   "(status=%d, wrid=%d vend_err %x)\n",
+					   wc->status, wr_id, wc->vendor_err);
 			dma_unmap_single(priv->ca->dma_device, addr,
 					 IPOIB_BUF_SIZE, DMA_FROM_DEVICE);
+			dev_kfree_skb_any(skb);
+			priv->rx_ring[wr_id].skb = NULL;
+			return;
+		}
 
-			skb_put(skb, wc->byte_len);
-			skb_pull(skb, IB_GRH_BYTES);
+		/*
+		 * If we can't allocate a new RX buffer, dump
+		 * this packet and reuse the old buffer.
+		 */
+		if (unlikely(ipoib_alloc_rx_skb(dev, wr_id))) {
+			++priv->stats.rx_dropped;
+			goto repost;
+		}
 
-			if (wc->slid != priv->local_lid ||
-			    wc->src_qp != priv->qp->qp_num) {
-				skb->protocol = ((struct ipoib_header *) skb->data)->proto;
-				skb->mac.raw = skb->data;
-				skb_pull(skb, IPOIB_ENCAP_LEN);
-
-				dev->last_rx = jiffies;
-				++priv->stats.rx_packets;
-				priv->stats.rx_bytes += skb->len;
-
-				skb->dev = dev;
-				/* XXX get correct PACKET_ type here */
-				skb->pkt_type = PACKET_HOST;
-				netif_rx_ni(skb);
-			} else {
-				ipoib_dbg_data(priv, "dropping loopback packet\n");
-				dev_kfree_skb_any(skb);
-			}
+		ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",
+			       wc->byte_len, wc->slid);
 
-		repost:
-			if (unlikely(ipoib_ib_post_receive(dev, wr_id)))
-				ipoib_warn(priv, "ipoib_ib_post_receive failed "
-					   "for buf %d\n", wr_id);
-		} else
-			ipoib_warn(priv, "completion event with wrid %d\n",
-				   wr_id);
+		dma_unmap_single(priv->ca->dma_device, addr,
+				 IPOIB_BUF_SIZE, DMA_FROM_DEVICE);
 
-	} else {
-		struct ipoib_tx_buf *tx_req;
-		unsigned long flags;
+		skb_put(skb, wc->byte_len);
+		skb_pull(skb, IB_GRH_BYTES);
 
-		if (wr_id >= ipoib_sendq_size) {
-			ipoib_warn(priv, "completion event with wrid %d (> %d)\n",
-				   wr_id, ipoib_sendq_size);
-			return;
+		if (wc->slid != priv->local_lid ||
+		    wc->src_qp != priv->qp->qp_num) {
+			skb->protocol = ((struct ipoib_header *) skb->data)->proto;
+			skb->mac.raw = skb->data;
+			skb_pull(skb, IPOIB_ENCAP_LEN);
+
+			dev->last_rx = jiffies;
+			++priv->stats.rx_packets;
+			priv->stats.rx_bytes += skb->len;
+
+			skb->dev = dev;
+			/* XXX get correct PACKET_ type here */
+			skb->pkt_type = PACKET_HOST;
+			netif_receive_skb(skb);
+		} else {
+			ipoib_dbg_data(priv, "dropping loopback packet\n");
+			dev_kfree_skb_any(skb);
 		}
 
-		ipoib_dbg_data(priv, "send complete, wrid %d\n", wr_id);
+	repost:
+		if (unlikely(ipoib_ib_post_receive(dev, wr_id)))
+			ipoib_warn(priv, "ipoib_ib_post_receive failed "
+				   "for buf %d\n", wr_id);
+	} else
+		ipoib_warn(priv, "completion event with wrid %d\n",
+			   wr_id);
 
-		tx_req = &priv->tx_ring[wr_id];
+}
 
-		dma_unmap_single(priv->ca->dma_device,
-				 pci_unmap_addr(tx_req, mapping),
-				 tx_req->skb->len,
-				 DMA_TO_DEVICE);
 
-		++priv->stats.tx_packets;
-		priv->stats.tx_bytes += tx_req->skb->len;
+static void ipoib_ib_handle_swc(struct net_device *dev,
+			       struct ib_wc *wc)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	unsigned int wr_id = wc->wr_id;
+	struct ipoib_tx_buf *tx_req;
+	unsigned long flags;
 
-		dev_kfree_skb_any(tx_req->skb);
+	ipoib_dbg_data(priv, "called: id %d, op %d, status: %d\n",
+		       wr_id, wc->opcode, wc->status);
 
-		spin_lock_irqsave(&priv->tx_lock, flags);
-		++priv->tx_tail;
-		if (netif_queue_stopped(dev) &&
-		    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) &&
-		    priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1)
-			netif_wake_queue(dev);
-		spin_unlock_irqrestore(&priv->tx_lock, flags);
-
-		if (wc->status != IB_WC_SUCCESS &&
-		    wc->status != IB_WC_WR_FLUSH_ERR)
-			ipoib_warn(priv, "failed send event "
-				   "(status=%d, wrid=%d vend_err %x)\n",
-				   wc->status, wr_id, wc->vendor_err);
+	if (wr_id >= ipoib_sendq_size) {
+		ipoib_warn(priv, "completion event with wrid %d (> %d)\n",
+			   wr_id, ipoib_sendq_size);
+		return;
 	}
+
+	ipoib_dbg_data(priv, "send complete, wrid %d\n", wr_id);
+
+	tx_req = &priv->tx_ring[wr_id];
+
+	dma_unmap_single(priv->ca->dma_device,
+			 pci_unmap_addr(tx_req, mapping),
+			 tx_req->skb->len,
+			 DMA_TO_DEVICE);
+
+	++priv->stats.tx_packets;
+	priv->stats.tx_bytes += tx_req->skb->len;
+
+	dev_kfree_skb_any(tx_req->skb);
+
+	spin_lock_irqsave(&priv->tx_lock, flags);
+	++priv->tx_tail;
+	if (netif_queue_stopped(dev) &&
+	    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags) &&
+	    priv->tx_head - priv->tx_tail <= ipoib_sendq_size >> 1)
+		netif_wake_queue(dev);
+	spin_unlock_irqrestore(&priv->tx_lock, flags);
+
+	if (wc->status != IB_WC_SUCCESS &&
+	    wc->status != IB_WC_WR_FLUSH_ERR)
+		ipoib_warn(priv, "failed send event "
+			   "(status=%d, wrid=%d vend_err %x)\n",
+			   wc->status, wr_id, wc->vendor_err);
 }
 
-void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr)
+static inline int is_rx_comp(struct ib_wc *wc)
+{
+	unsigned int wr_id = wc->wr_id;
+
+	if (wr_id & IPOIB_OP_RECV)
+		return 1;
+
+	return 0;
+}
+
+int ipoib_poll(struct net_device *dev, int *budget)
 {
-	struct net_device *dev = (struct net_device *) dev_ptr;
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
 	int n, i;
+	struct ib_cq *cq = priv->cq;
+	int quota = dev->quota;
+	int wc;
+	int rx = 0;
+	int tx = 0;
 
-	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 	do {
-		n = ib_poll_cq(cq, IPOIB_NUM_WC, priv->ibwc);
-		for (i = 0; i < n; ++i)
-			ipoib_ib_handle_wc(dev, priv->ibwc + i);
-	} while (n == IPOIB_NUM_WC);
+		wc = min_t(int, quota, IPOIB_NUM_WC);
+		n = ib_poll_cq(cq, wc, priv->ibwc);
+		for (i = 0; i < n; ++i) {
+			if (is_rx_comp(priv->ibwc + i)) {
+				++rx;
+				--quota;
+				ipoib_ib_handle_rwc(dev, priv->ibwc + i);
+			}
+			else {
+				++tx;
+				ipoib_ib_handle_swc(dev, priv->ibwc + i);
+			}
+
+			if (unlikely(quota <= 0))
+				goto not_done;
+		}
+	} while (n == wc);
+
+	if (rx || tx)
+		goto not_done;
+
+
+	netif_rx_complete(dev);
+	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
+	/* TODO we need peek_cq here for hw devices that
+	   would not generate interrupts for completions
+	   arriving between the end of polling and the notify request */
+
+	return 0;
+
+not_done:
+	*budget -= rx;
+	dev->quota = quota;
+	return 1;
+}
+
+void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr)
+{
+	struct net_device *dev = (struct net_device *) dev_ptr;
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+	/* tell the network layer we have packets */
+	if (netif_rx_schedule_prep(dev))
+		__netif_rx_schedule(dev);
+	else {
+		ipoib_warn(priv, "received interrupt while in polling\n");
+	}
 }
 
 static inline int post_send(struct ipoib_dev_priv *priv,


From trimmer at silverstorm.com  Thu Sep 21 08:10:09 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Thu, 21 Sep 2006 11:10:09 -0400
Subject: [openib-general] Completion callback /teardown race
In-Reply-To: <20060920051420.GH1710@mellanox.co.il>
Message-ID: <D80D83302DEE6249A221093BF2BB69AE8EF6E5@mail.silverstorm.com>

> From: Michael S. Tsirkin [mailto:mst at mellanox.co.il]
> Sent: Wednesday, September 20, 2006 1:14 AM
> To: Tillier, Fabian
> Cc: Rimmer, Todd; openib-general at openib.org
> Subject: Re: Completion callback /teardown race
> 
> Quoting r. Fabian Tillier <ftillier at silverstorm.com>:
> > > There are some differences in HCA behaviour with regard to
> > > ib_req_notify_cq.  Mellanox HCAs will provide a callback/interrupt
> > > if the CQ is not empty at this point (in which case the poll_cq's
> > > after the notify are optional).
> > >
> > > However the behaviour defined in the IBTA spec indicates that
> > > ib_req_notify_cq will cause a callback/interrupt only on the next
> > > CQE which arrives, hence to be portable the poll_cq loop after
> > > ib_req_notify_cq is necessary to cover any CQEs which arrived
> > > between the prior poll and the ib_req_notify_cq.
> >
> > I remember a while ago a mention that the behavior of the Mellanox
> > HCAs could be controlled in the firmware, so that they would follow
> > the IBTA spec defined behavior.
> 
> There's a mistake here. Mellanox HCAs will generate an event upon
> ib_req_notify_cq only if new completions have arrived after the
> previous event has been reported.
> 
> AFAIK this is IBTA spec compliant.

I agree the Mellanox HCA is spec compliant.

The difference between HCAs is how they handle the situation:

CQE arrives
HCA generates event/callback
poll CQ, remove CQE
poll CQ, detect CQ is empty
CQE arrives
ib_req_notify_cq

At this point a Mellanox HCA will generate an event (as Michael
indicates, an unprocessed CQE has arrived since the previous event).

Many other HCAs given this situation will not generate an event, instead
they generate an event when a CQE arrives after the ib_req_notify_cq.

Hence to support other HCAs, ULPs should poll the CQ after the
ib_req_notify_cq.

On any HCA model, ULPs should be prepared for a callback where the CQ is
empty.  There are situations in either approach which can introduce an
extra callback after the CQ has been emptied.
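
The sequence above can be sketched as a small user-space model. This is
only an illustration under stated assumptions: struct sim_cq and the
sim_* helpers are hypothetical stand-ins, not the verbs API, and the
model implements the "event only for CQEs that arrive after the notify
request" behaviour described for the other HCAs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a completion queue; not struct ib_cq. */
struct sim_cq {
	int pending;	/* CQEs waiting to be polled */
	bool armed;	/* set by a notify request, cleared when an event fires */
	int events;	/* events/callbacks generated so far */
};

/* A CQE arrives: an event fires only if the CQ is currently armed. */
static void sim_cqe_arrives(struct sim_cq *cq)
{
	cq->pending++;
	if (cq->armed) {
		cq->armed = false;
		cq->events++;
	}
}

/* Remove up to max CQEs and return how many were removed. */
static int sim_poll(struct sim_cq *cq, int max)
{
	int n = cq->pending < max ? cq->pending : max;

	cq->pending -= n;
	return n;
}

/* Strict model: arming never raises an event for CQEs already queued. */
static void sim_req_notify(struct sim_cq *cq)
{
	cq->armed = true;
}

/*
 * Portable pattern: arm first, then poll once more to pick up any CQE
 * that slipped in between the last poll and the notify request.
 */
static int sim_arm_and_repoll(struct sim_cq *cq, int max)
{
	sim_req_notify(cq);
	return sim_poll(cq, max);
}
```

In this model a CQE that arrives after the CQ was found empty but before
the notify request raises no event, so only the extra poll finds it.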

Todd Rimmer


From mst at mellanox.co.il  Thu Sep 21 08:09:29 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 21 Sep 2006 18:09:29 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <1158850657.24776.158.camel@localhost>
References: <1158850657.24776.158.camel@localhost>
Message-ID: <20060921150929.GC28717@mellanox.co.il>

Quoting r. Eli cohen <eli at dev.mellanox.co.il>:
> Also there is a need to implement peek_cq and call it for
> ib_req_notify_cq() so as to know if there is a need to call
> netif_rx_schedule_prep() again.

Thanks, Eli.
Implementing peek_cq is not hard, at least for mthca. I wonder what we
should do if peek_cq is not available in the low level driver.
I guess we could just disable NAPI for this case - Roland, would that
be acceptable?

-- 
MST


From rdreier at cisco.com  Thu Sep 21 08:20:28 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Sep 2006 08:20:28 -0700
Subject: [openib-general] timer_pending kernel assertion while stopping
	IPoIB
In-Reply-To: <Pine.LNX.4.64.0609211129040.28981@zuben> (Or Gerlitz's
	message of "Thu, 21 Sep 2006 11:35:10 +0300 (IDT)")
References: <Pine.LNX.4.64.0609211129040.28981@zuben>
Message-ID: <adau031i6v7.fsf@cisco.com>

    Or> the kernel is net-2.6.19 git

My first guess would be it's a bug introduced in the net-2.6.19 tree.
Can you reproduce it with plain 2.6.18 and/or my for-2.6.19 branch?

 - R.


From dotanb at dev.mellanox.co.il  Thu Sep 21 08:26:43 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 21 Sep 2006 18:26:43 +0300
Subject: [openib-general] [ipoib] [PATCH] - Removed unused include of
	vmalloc.h
Message-ID: <4512AF33.7090002@dev.mellanox.co.il>

IPoIB: Removed unused include of vmalloc.h.

Signed-off-by: Dotan Barak <dotanb at mellanox.co.il>
---
Index: last_stable/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- last_stable.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c  
2006-08-07 17:45:02.000000000 +0300
+++ last_stable/drivers/infiniband/ulp/ipoib/ipoib_main.c       
2006-08-08 09:36:45.000000000 +0300
@@ -40,7 +40,6 @@

 #include <linux/init.h>
 #include <linux/slab.h>
-#include <linux/vmalloc.h>
 #include <linux/kernel.h>

 #include <linux/if_arp.h>      /* For ARPHRD_xxx */


From dotanb at dev.mellanox.co.il  Thu Sep 21 08:45:14 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Thu, 21 Sep 2006 18:45:14 +0300
Subject: [openib-general] all of the man pages should change the package
	name to OFED
Message-ID: <4512B38A.8060002@dev.mellanox.co.il>

Hi.

When I executed "man ibv_devinfo" or "man ibstat" (for example) I
noticed that those man pages are marked as part of the OpenIB package.
I believe that the package name should be changed to OFED.

what do you think?

thanks
Dotan


From sean.hefty at intel.com  Thu Sep 21 09:54:22 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 21 Sep 2006 09:54:22 -0700
Subject: [openib-general] RDMA CM callback status
In-Reply-To: <200609210540.k8L5eBce029142@robert.bartonsoftware.com>
Message-ID: <000001c6dd9e$9736a570$46fc070a@amr.corp.intel.com>

>1. Should I even be looking at event->status or does the event type tell me
>   everything I need to know?  I've had a report that the assertion
>   (event->status != 0) is failing on RDMA_CM_EVENT_ROUTE_ERROR.

The event type is usually sufficient.  In the case of an error, the status
should provide some additional information regarding the type of error.

It sounds like (and looks like from reading the code) that you've hit a bug with
the ROUTE_ERROR event.  The failure status isn't being propagated up to the
user.

>2. /* handle error out-of-line */ above means I record failure in my connection
>   data structure, start teardown and drop the callback's reference on it.
>   When the last reference goes, the connection data structure is queued for
>   final destruction (including rdma_destroy_id(cmid)).
>
>   Given that this might race with the callback's caller is this OK?

Yes - The RDMA CM holds a reference on the cmid while in a callback, and drops
it once the callback returns.  rdma_destroy_id() will block until all references
are released on the cmid.

- Sean


From sweitzen at cisco.com  Thu Sep 21 10:03:57 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Thu, 21 Sep 2006 10:03:57 -0700
Subject: [openib-general] Cisco SQA test results for OFED 1.1 rc5
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3024B28D6@xmb-sjc-216.amer.cisco.com>

All testing was done on RHEL4 U3.  No new bugs were found, overall
things are looking very good.

We have not yet done any testing on RHEL4 U4, SLES 10, IPoIB HA, or
SRP HA.
 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ofed_sqa_results.xls
Type: application/vnd.ms-excel
Size: 106496 bytes
Desc: ofed_sqa_results.xls
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060921/bbb95869/attachment.xls>

From rdreier at cisco.com  Thu Sep 21 10:10:15 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Sep 2006 10:10:15 -0700
Subject: [openib-general] all of the man pages should change the package
	name to OFED
In-Reply-To: <4512B38A.8060002@dev.mellanox.co.il> (Dotan Barak's
	message of "Thu, 21 Sep 2006 18:45:14 +0300")
References: <4512B38A.8060002@dev.mellanox.co.il>
Message-ID: <adalkodi1s8.fsf@cisco.com>

    Dotan> Hi.  When I executed "man ibv_devinfo" or "man ibstat" (for
    Dotan> example) I noticed that those man pages are marked as part
    Dotan> of the OpenIB package.  I believe that the package name
    Dotan> should be changed to OFED.

Not for the libibverbs stuff, since many distributions (Debian,
Ubuntu, Fedora) include libibverbs without OFED.  I guess I'll just
delete the OpenIB references from the man pages.

 - R.


From mvharish at gmail.com  Thu Sep 21 10:24:20 2006
From: mvharish at gmail.com (harish)
Date: Thu, 21 Sep 2006 10:24:20 -0700
Subject: [openib-general] heads-up - ipoib NAPI
In-Reply-To: <1158850592.24776.156.camel@localhost>
References: <1158850592.24776.156.camel@localhost>
Message-ID: <a33d0a9f0609211024i412e3fa1p6946339c46603eee@mail.gmail.com>

Hi Eli,

Thanks for sharing the results with us. It is great to see the reduction in
Interrupts. Could you please specify the netperf test specifications
[message size; socket size]. Wondering what the numbers would be if we use
large socket and message sizes [128K & 64K respectively]. The reason for the
request is to make sure we are not hitting any TCP related bottleneck while
comparing NAPI vs. no NAPI cases. Please let me know what you think.

Thanks,
harish

On 9/21/06, Eli cohen <eli at dev.mellanox.co.il> wrote:
>
> Hi,
>
> I have a draft implementation of NAPI in ipoib and got the following
> results:
>
> System descriptions
> ===================
> Quad CPU E64T 2.4 Ghz
> 4 GB RAM
> MT25204 Sinai HCA
>
> I used netperf for benchmarking, the BW test ran for 600 seconds with 8
> clients and 8 servers.
>
> The results I received are below:
>
> netperf TCP_STREAM:
>                 BW [MByte/sec]   client side [irqs/sec]   server side [irqs/sec]
>                 --------------   ----------------------   ----------------------
> without NAPI:        506                 86441                    66311
> with NAPI:           550                  6830                    13600
>
>
> netperf TCP_RR:
>                 rate [tran/sec]
>                 ---------------
> without NAPI:      39600
> with NAPI:         39470
>
>
>
> Please note this is still under work and we plan to do more tests and
> measure on other devices.
>

From eli at dev.mellanox.co.il  Thu Sep 21 10:37:59 2006
From: eli at dev.mellanox.co.il (eli at dev.mellanox.co.il)
Date: Thu, 21 Sep 2006 20:37:59 +0300 (IDT)
Subject: [openib-general] heads-up - ipoib NAPI
In-Reply-To: <a33d0a9f0609211024i412e3fa1p6946339c46603eee@mail.gmail.com>
References: <1158850592.24776.156.camel@localhost>
	<a33d0a9f0609211024i412e3fa1p6946339c46603eee@mail.gmail.com>
Message-ID: <61399.85.250.167.59.1158860279.squirrel@dev.mellanox.co.il>

> Hi Eli,
>
> Thanks for sharing the results with us. It is great to see the
> reduction in Interrupts. Could you please specify the netperf test
> specifications [message size; socket size]. Wondering what the numbers
> would be if we use large socket and message sizes [128K & 64K
> respectively]. The reason for the request is to make sure we are not
> hitting any TCP related bottleneck while comparing NAPI vs. no NAPI
> cases. Please let me know what you think.

I used large socket buffer sizes. Here is the command line I used. The
result for the bandwidth is the sum of all the connections.

netperf -H 11.4.3.144 -l 600 -f M -p $port -- -s 200000,200000 -S 200000,200000


From rdreier at cisco.com  Thu Sep 21 11:07:30 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Sep 2006 11:07:30 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <1158850657.24776.158.camel@localhost> (Eli cohen's message
	of "Thu, 21 Sep 2006 17:57:37 +0300")
References: <1158850657.24776.158.camel@localhost>
Message-ID: <adahcz1hz4t.fsf@cisco.com>

Looks pretty good.  I took a stab at implementing this myself, and it
seems we came to the same conclusion: for generic HCAs that have a
race between request notify and poll CQ, there is no alternative
except "peek CQ".

However I don't think we want to use peek CQ always -- I think that
extra CQ lock/unlock may kill a lot of the performance gain you see
with NAPI (and I don't think even mthca can do a lockless CQ peek,
since we need to protect against races with resize CQ, etc).  So
probably what we need is a feature bit in the struct ib_device to say
whether the peek CQ is needed or whether req notify will generate
events for existing CQEs.

You might want to respin your patch against my for-2.6.19 branch -- I
already split up the handle WC routine into separate send and receive
functions, so the patch will become much smaller.

Also, the handling of how many completions to poll and the logic of
when to call netif_rx_complete() seems very strange to me.  First, you
totally ignore the budget parameter, so you may end up doing more work
than the networking upper layers want.  Second, you often leave the
poll routine to run one more time, even if you've drained the CQ
without using up your work quota.

My poll routine ended up looking like the following, which I think is
more correct:

+int ipoib_poll(struct net_device *dev, int *budget)
+{
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+	int max = min(*budget, dev->quota);
+	int done = 0;
+	int t;
+	int empty = 0;
+	int n, i;
+
+	while (max) {
+		t = min(IPOIB_NUM_WC, max);
+		n = ib_poll_cq(priv->cq, t, priv->ibwc);
+
+		for (i = 0; i < n; ++i) {
+			if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {
+				++done;
+				--max;
+				ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
+			} else
+				ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);
+		}
+
+		if (n != t) {
+			empty = 1;
+			break;
+		}
+	}
+
+	dev->quota -= done;
+	*budget    -= done;
+
+	if (empty) {
+		netif_rx_complete(dev);
+		ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP);
+		/* XXX rotting packet! */
+		return 0;
+	}
+
+	return 1;
+}


From rdreier at cisco.com  Thu Sep 21 11:10:27 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Sep 2006 11:10:27 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060921150929.GC28717@mellanox.co.il> (Michael S.
	Tsirkin's message of "Thu, 21 Sep 2006 18:09:29 +0300")
References: <1158850657.24776.158.camel@localhost>
	<20060921150929.GC28717@mellanox.co.il>
Message-ID: <adad59phyzw.fsf@cisco.com>

    Michael> Thanks, Eli.  Implementing peek_cq is not hard, at least
    Michael> for mthca. I wonder what we should do if peek_cq is not
    Michael> available in the low level driver.  I guess we could just
    Michael> disable NAPI for this case - Roland, would that be
    Michael> acceptable?

Actually as I mentioned in my reply to Eli, mthca doesn't actually
need the peek CQ operation, and I don't think we want IPoIB to be
doing a peek CQ for mthca devices.  But I'd rather not have to
maintain both a NAPI and non-NAPI IPoIB completion path, so I think
the thing to do would be to implement peek CQ for all devices that
don't have the "event for existing CQE" behavior of req_notify_cq.
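
A rough sketch of how such a per-device capability could gate the peek,
so that devices with the "event for existing CQE" behavior never pay for
it. Everything here is an assumption for illustration: the flag name,
struct peek_device, struct peek_cq_model and the helpers are
hypothetical, and no name was agreed on in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability flag; no real name exists for it yet. */
#define PEEK_DEV_NOTIFY_COVERS_EXISTING_CQES 0x1u

struct peek_device {
	unsigned int flags;
};

/* Toy CQ that counts how often it was peeked. */
struct peek_cq_model {
	int pending;	/* CQEs currently queued */
	int peek_calls;	/* number of peek operations performed */
};

/* Stand-in for a peek CQ operation: report queued CQEs without removing them. */
static int model_peek_cq(struct peek_cq_model *cq)
{
	cq->peek_calls++;
	return cq->pending;
}

/*
 * After re-arming the CQ, decide whether polling must be scheduled
 * again.  Devices with the capability flag raise an event on their own,
 * so the peek (and its locking cost) is skipped for them.
 */
static bool must_reschedule(const struct peek_device *dev,
			    struct peek_cq_model *cq)
{
	if (dev->flags & PEEK_DEV_NOTIFY_COVERS_EXISTING_CQES)
		return false;
	return model_peek_cq(cq) > 0;
}
```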

 - R.


From rdreier at cisco.com  Thu Sep 21 11:46:28 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Sep 2006 11:46:28 -0700
Subject: [openib-general] [ipoib] [PATCH] - Removed unused include of
	vmalloc.h
In-Reply-To: <4512AF33.7090002@dev.mellanox.co.il> (Dotan Barak's
	message of "Thu, 21 Sep 2006 18:26:43 +0300")
References: <4512AF33.7090002@dev.mellanox.co.il>
Message-ID: <ada8xkdhxbv.fsf@cisco.com>

Thanks, applied by hand to for-2.6.19 although your patch was
corrupted (line wrapped, whitespace damage at least)

<standard whine>
I merge > 100 patches every kernel release.  If I have to spend an
extra 5 minutes for each one fixing a patch or pulling it out of svn,
then I end up burning an extra 9 hours of stupid work.  If 20+ people
who contribute patches sent me clean patches, then everyone will be
happier because I'll be able to merge things quicker and focus on
productive work.
</standard whine>


From amit_byron at yahoo.com  Thu Sep 21 12:47:28 2006
From: amit_byron at yahoo.com (amit byron)
Date: Thu, 21 Sep 2006 19:47:28 +0000 (UTC)
Subject: [openib-general] =?utf-8?q?max_message_size_for_IB=5FWR=5FSEND?=
References: <loom.20060920T204936-772@post.gmane.org>
	<4512244D.4040404@dev.mellanox.co.il>
Message-ID: <loom.20060921T212754-753@post.gmane.org>

Dotan Barak <dotanb <at> dev.mellanox.co.il> writes:

> 
> Hi.
> 
> amit byron wrote:
> > hi,
> >
> > if i evoke/call ib_post_send(IB_WR_SEND) with message
> > size 512 bytes, the message gets received on the
> > peer (second) node. the 2 nodes are connected point-to
> > -point.
> >
> > but if message size is increased to 4096 bytes then
> > second node receives the message; but message content
> > is missing (empty).
> >
> > won't infiniband stack break down message in smaller
> > chunks and assemble on peer node?
> >
> > thanks,
> > Amit.
> >   
> Which transport type are you using?
> if you are using a UD QP, then the answer is no.
> for any other transport type, the answer is yes (the message is
> broken down into packets of the MTU size specified in the QP context).
> 
> maybe you have a different problem in your code. did you check the
> completion status in both of the nodes?
> 
> Dotan
> 
> 

i'm using RC connection. the issue seems to occur only when
running in xen's domain 0 (xen0). on core linux kernel, the
code works -- i'm able to do both send message and perform
rdma write with size greater than 4096.

i don't see any errors reported while sending a message with
size greater than 4096 (the same holds true for rdma write).

i'm able to send a message (greater than 4096 bytes) from code
running in core linux kernel to peer node code that is
running in xen's domain 0.

this suggests that there is some hard limit that prevents
infiniband from sending the message; but no errors are reported
from the infiniband stack.

any suggestions on how to enable tracing in hca driver?

thanks,
Amit.


From bgreen at nas.nasa.gov  Thu Sep 21 13:22:01 2006
From: bgreen at nas.nasa.gov (Bryan Green)
Date: Thu, 21 Sep 2006 13:22:01 -0700
Subject: [openib-general] ib_rdma_bw measures 1.2G vs. 1.4G
Message-ID: <200609212022.k8LKM1eM007702@ece06.nas.nasa.gov>

Hello,
I've been testing rdma bandwidth between a number
of machines using ib_rdma_bw, and I consistently
see two approximate bandwidths, 1.4 GBytes/s or
1.2 GBytes/s.
The 1.4G/s rate is what I expect from the link,
but I don't know why in some cases I get 1.2G/s.

What could cause this particular quantized
degradation in performance?  So far, these are
the datapoints I have (all systems are Mellanox DDR, 8x PCI-E):

There are 7 machines:
 ZeonA, ZeonB:  dual-Core2 zeon systems running Suse 10.1, OFED 1.0
 OptiA, OptiB, OptiC: older dual-cpu/dual-core Opteron systems
  running Gentoo with specialized 2.6.15 kernel, openib-1.0 userland.
 OptiX, OptiZ: brand new dual-cpu/dual-core Opteron systems,
  running Gentoo with 2.6.17-gentoo kernel, openib-1.1 userland.

Between ZeonA, ZeonB:
    1.4 G/s
Between OptiA, OptiB, and OptiC:
    1.2 G/s
Between OptiB and OptiC after kernel upgrades to 2.6.17-gentoo:
    1.4 G/s
Between OptiA (2.6.15) and OptiB (2.6.17-gentoo):
    1.4 G/s

Between OptiX, OptiZ:
    1.2 G/s

Between OptiX, ZeonA:
    1.2 G/s

I'm bummed to be seeing 1.2 G/s on the newer systems
with the 2.6.17 kernel.  What might be the explanation?

-bryan


From rdreier at cisco.com  Thu Sep 21 13:44:10 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Sep 2006 13:44:10 -0700
Subject: [openib-general] [PATCH] IB/ipoib: unlikely in send
In-Reply-To: <1158845966.24776.122.camel@localhost> (Eli cohen's message
	of "Thu, 21 Sep 2006 16:39:26 +0300")
References: <1158845966.24776.122.camel@localhost>
Message-ID: <ada4pv1hrvp.fsf@cisco.com>

Thanks, applied to for-2.6.19


From rdreier at cisco.com  Thu Sep 21 13:47:09 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Sep 2006 13:47:09 -0700
Subject: [openib-general] [PATCH] Typo in ib_set_client_data()
In-Reply-To: <20060919070214.5476.99212.sendpatchset@localhost.localdomain>
	(Krishna Kumar's message of "Tue, 19 Sep 2006 12:32:14 +0530")
References: <20060919070214.5476.99212.sendpatchset@localhost.localdomain>
Message-ID: <adazmctgd6a.fsf@cisco.com>

Thanks, applied by hand to for-2.6.19, although you didn't make a
patch that applies with '-p1'.

<standard whine>
I merge > 100 patches every kernel release.  If I have to spend an
extra 5 minutes for each one fixing a patch or pulling it out of svn,
then I end up burning an extra 9 hours of stupid work.  If 20+ people
who contribute patches sent me clean patches, then everyone will be
happier because I'll be able to merge things quicker and focus on
productive work.
</standard whine>


From eli at dev.mellanox.co.il  Thu Sep 21 15:33:17 2006
From: eli at dev.mellanox.co.il (eli at dev.mellanox.co.il)
Date: Fri, 22 Sep 2006 01:33:17 +0300 (IDT)
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <adahcz1hz4t.fsf@cisco.com>
References: <1158850657.24776.158.camel@localhost> <adahcz1hz4t.fsf@cisco.com>
Message-ID: <62072.85.250.167.59.1158877997.squirrel@dev.mellanox.co.il>

>
> However I don't think we want to use peek CQ always -- I think that
> extra CQ lock/unlock may kill a lot of the performance gain you see
> with NAPI (and I don't think even mthca can do a lockless CQ peek,
> since we need to protect against races with resize CQ, etc).  So
> probably what we need is a feature bit in the struct ib_device to say
> whether the peek CQ is needed or whether req notify will generate
> events for existing CQEs.
>
Sounds good to me

> You might want to respin your patch against my for-2.6.19 branch -- I
> already split up the handle WC routine into separate send and receive
> functions, so the patch will become much smaller.
>

Sure. I can send an updated patch

> Also, the handling of how many completions to poll and the logic of
> when to call netif_rx_complete() seems very strange to me.  First, you
> totally ignore the budget parameter,
Not totally. I update it whenever I decide it is a "not_done" which
happens in two cases:
a. I finish my quota
b. I handled any completions - rx or tx. I do count tx as well since I
want to  coalesce as many completions as possible.

I do not update quota and budget when I exit polling mode because I think
there is no point in doing that (this is unlike the example in
NAPI-howto.txt but I will check again if this is the right thing to do).


>
> My poll routine ended up looking like the following, which I think is
> more correct:
>
> +int ipoib_poll(struct net_device *dev, int *budget)
> +{
> +	struct ipoib_dev_priv *priv = netdev_priv(dev);
> +	int max = min(*budget, dev->quota);
> +	int done = 0;
> +	int t;
> +	int empty = 0;
> +	int n, i;
> +
> +	while (max) {
> +		t = min(IPOIB_NUM_WC, max);
> +		n = ib_poll_cq(priv->cq, t, priv->ibwc);
> +
> +		for (i = 0; i < n; ++i) {
> +			if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {
> +				++done;
> +				--max;
> +				ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
> +			} else
> +				ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);
> +		}
> +
> +		if (n != t) {
> +			empty = 1;
> +			break;
I don't think this is the right thing to do. Polling fewer completions than
you could is no reason to quit polling mode. You may receive more
completions the next time you get called.

> +		}
> +	}
> +
> +	dev->quota -= done;
> +	*budget    -= done;
> +
> +	if (empty) {
> +		netif_rx_complete(dev);
> +		ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP);
> +		/* XXX rotting packet! */
> +		return 0;
> +	}
> +
> +	return 1;
> +}
>


From rdreier at cisco.com  Thu Sep 21 16:00:32 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 21 Sep 2006 16:00:32 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <62072.85.250.167.59.1158877997.squirrel@dev.mellanox.co.il>
	(eli@dev.mellanox.co.il's message of
	"Fri, 22 Sep 2006 01:33:17 +0300 (IDT)")
References: <1158850657.24776.158.camel@localhost> <adahcz1hz4t.fsf@cisco.com>
	<62072.85.250.167.59.1158877997.squirrel@dev.mellanox.co.il>
Message-ID: <adar6y4hlkf.fsf@cisco.com>

 > > So probably what we need is a feature bit in the struct ib_device
 > > to say whether the peek CQ is needed or whether req notify will
 > > generate events for existing CQEs.

 > Sounds good to me

The biggest problem I have with this is that I don't know what to call
the feature bit.  Any suggestions?

 > > Also, the handling of how many completions to poll and the logic of
 > > when to call netif_rx_complete() seems very strange to me.  First, you
 > > totally ignore the budget parameter,

 > Not totally. I update it whenever I decide it is a "not_done" which
 > happens in two cases:
 > a. I finish my quota
 > b. I handled any completions - rx or tx. I do count tx as well since I
 > want to  coalesce as many completions as possible.

Right, but as far as I can see you don't handle the case where *budget
is less than dev->quota -- you only update *budget, you never look at
the original value passed through it.

 > > +		if (n != t) {
 > > +			empty = 1;
 > > +			break;

 > I don't think this is the right thing to do. Polling fewer completions than
 > you could is no reason to quit polling mode. You may receive more
 > completions the next time you get called.

I misread your code slightly the first time through, so I don't think
it's actually wrong now.  But I am pretty confident that my code is
"more correct": if we ask for t CQEs and only poll n < t of them, then
we know we have drained the CQ without exhausting our quota of work.
So we should switch back to event-driven mode at that point.

I don't think a correct NAPI implementation would drain the CQ and
then schedule another poll on an empty CQ.
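
To make the budget handling concrete, here is a user-space sketch of
the same loop shape.  It is a simplified model, not the driver code: the
CQ is reduced to a counter of pending receive completions, send
completions are left out, and toy_napi_poll, TOY_NUM_WC and min_int are
names made up for this illustration.

```c
#include <assert.h>

#define TOY_NUM_WC 4	/* completions requested per poll call (assumed) */

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Returns 0 when the CQ was drained (the point where the driver would
 * call netif_rx_complete() and re-arm the CQ) and 1 when the work
 * limit ran out first and another poll is needed.
 */
static int toy_napi_poll(int *cq_pending, int *budget, int *quota)
{
	int max = min_int(*budget, *quota);	/* honor both limits */
	int done = 0;
	int empty = 0;

	while (max) {
		int t = min_int(TOY_NUM_WC, max);
		int n = min_int(*cq_pending, t);  /* stands in for ib_poll_cq() */

		*cq_pending -= n;
		done += n;
		max -= n;

		if (n != t) {		/* got fewer than asked for: CQ is empty */
			empty = 1;
			break;
		}
	}

	*quota -= done;
	*budget -= done;

	return empty ? 0 : 1;
}
```

With 10 pending completions and a budget of 6, the sketch handles 6 and
returns 1; with 3 pending and a budget of 16 it handles 3 and returns 0
without a further pass over an empty CQ.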

 - R.


From sashak at voltaire.com  Thu Sep 21 18:27:05 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Fri, 22 Sep 2006 04:27:05 +0300
Subject: [openib-general] OFED-1.1-rc6 fails to build ibutils on PPC64
Message-ID: <20060922012705.GN11259@sashak.voltaire.com>

Hi,

Recently I've played with a PPC64/SLES10 machine and found that it
fails to build the ibutils package. The build log says:

  gcc -DHAVE_CONFIG_H -I. -I. -I.. -I/usr/include -I/var/tmp/OFED/usr/local/ofed/include/infiniband -I/var/tmp/OFED/usr/local/ofed/include -DOSM_VENDOR_INTF_OPENIB -DOSM_BUILD_OPENIB -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 -O2 -Wall -fno-strict-aliasing -fPIC -O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2 -MT ibis_wrap.lo -MD -MP -MF .deps/ibis_wrap.Tpo -c ibis_wrap.c -o ibis_wrap.o >/dev/null 2>&1
  /bin/sh ../libtool --tag=CC --mode=link gcc -I/usr/include -I/var/tmp/OFED/usr/local/ofed/include/infiniband -I/var/tmp/OFED/usr/local/ofed/include  -DOSM_VENDOR_INTF_OPENIB  -DOSM_BUILD_OPENIB -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 -O2 -Wall -fno-strict-aliasing -fPIC -O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2   -o libibis.la -rpath /usr/local/ofed/lib64 -version-info "1:0:0" -no-undefined -Wl,-rpath -Wl,/var/tmp/OFED/usr/local/ofed/lib64 -L/var/tmp/OFED/usr/local/ofed/lib64 -lopensm -losmvendor -losmcomp -libumad -libcommon -L/usr/lib64 -ltcl8.4 -ldl  -lm ibis_wrap.lo ibbbm.lo ibcr.lo ibis.lo ibis_gsi_mad_ctrl.lo ibpm.lo ibsac.lo ibsm.lo ibvs.lo  
  libtool: link: warning: `/var/tmp/OFED/usr/local/ofed/lib64/libibcommon.la' seems to be moved
  libtool: link: warning: `/var/tmp/OFED/usr/local/ofed/lib64/libibcommon.la' seems to be moved
  libtool: link: warning: `/var/tmp/OFED/usr/local/ofed/lib64/libibcommon.la' seems to be moved
  libtool: link: warning: `/var/tmp/OFED/usr/local/ofed/lib64/libibcommon.la' seems to be moved
  libtool: link: warning: library `/var/tmp/OFED/usr/local/ofed/lib64/libibcommon.la' was moved.
  gcc -shared  .libs/ibis_wrap.o .libs/ibbbm.o .libs/ibcr.o .libs/ibis.o .libs/ibis_gsi_mad_ctrl.o .libs/ibpm.o .libs/ibsac.o .libs/ibsm.o .libs/ibvs.o  -Wl,--rpath -Wl,/var/tmp/OFED/usr/local/ofed/lib64 -Wl,--rpath -Wl,/var/tmp/OFED/usr/local/ofed/lib64 -L/var/tmp/OFED/usr/local/ofed/lib64 /var/tmp/OFED/usr/local/ofed/lib64/libopensm.so -L/var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/management/libibcommon -L/usr/lib64 -L/var/tmp/OFEDRPM/BUILD/openib-1.1/src/userspace/management/libibumad /var/tmp/OFED/usr/local/ofed/lib64/libosmvendor.so /var/tmp/OFED/usr/local/ofed/lib64/libosmcomp.so /var/tmp/OFED/usr/local/ofed/lib64/libibumad.so /var/tmp/OFED/usr/local/ofed/lib64/libibcommon.so -ltcl8.4 -ldl -lm  -Wl,-rpath -Wl,/var/tmp/OFED/usr/local/ofed/lib64 -Wl,-soname -Wl,libibis.so.1 -o .libs/libibis.so.1.0.0
  /var/tmp/OFED/usr/local/ofed/lib64/libopensm.so: could not read symbols: File in wrong format
  collect2: ld returned 1 exit status
  make[3]: *** [libibis.la] Error 1
  make[3]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ibutils-1.0/ibis/src'
  make[2]: *** [all-recursive] Error 1
  make[2]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ibutils-1.0/ibis'
  make[1]: *** [all] Error 2
  make[1]: Leaving directory `/var/tmp/OFEDRPM/BUILD/ibutils-1.0/ibis'
  make: *** [all-recursive] Error 1
  error: Bad exit status from /var/tmp/rpm-tmp.16324 (%install)


Seems that ibis uses gcc without the -m64 flag and then tries to link
with a "pure" 64-bit library:

  $ file /var/tmp/OFED/usr/local/ofed/lib64/libopensm.so.1.1.0 
  /var/tmp/OFED/usr/local/ofed/lib64/libopensm.so.1.1.0: ELF 64-bit MSB shared object, cisco 7500, version 1 (SYSV), not stripped

, and

  $ file ibis_wrap.o
  ibis_wrap.o: ELF 32-bit MSB relocatable, PowerPC or cisco 4500, version 1 (SYSV), not stripped
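
The same 32-bit vs. 64-bit check that file(1) does above can be
sketched in a few lines of C by looking at the ELF identification
bytes.  This is just an illustration; elf_class_bits is a made-up
helper, and the constants are written out by hand (magic "\177ELF",
class byte at offset 4, 1 for 32-bit, 2 for 64-bit) so that it does not
depend on <elf.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 32 or 64 for a buffer that starts with a valid ELF
 * identification header, or 0 if it is not ELF or the class is unknown.
 */
static int elf_class_bits(const unsigned char *ident, size_t len)
{
	if (len < 5 || memcmp(ident, "\177ELF", 4) != 0)
		return 0;

	switch (ident[4]) {	/* EI_CLASS */
	case 1:			/* ELFCLASS32 */
		return 32;
	case 2:			/* ELFCLASS64 */
		return 64;
	default:
		return 0;
	}
}
```

A build script could read the first bytes of each object and library and
refuse to link when the two values differ.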

Another, less critical issue is warnings like:

  warning: user vlad does not exist - using root
  warning: group mtl does not exist - using root
  warning: user vlad does not exist - using root
  warning: group mtl does not exist - using root

in the build log (in different places). I don't think this is PPC64
related.


I can add this report as an update to [Bug 241] (PPC64/SLES10 OFED build)
if needed.

Sasha


From mvharish at gmail.com  Thu Sep 21 19:31:02 2006
From: mvharish at gmail.com (harish)
Date: Thu, 21 Sep 2006 19:31:02 -0700
Subject: [openib-general] heads-up - ipoib NAPI
In-Reply-To: <61399.85.250.167.59.1158860279.squirrel@dev.mellanox.co.il>
References: <1158850592.24776.156.camel@localhost>
	<a33d0a9f0609211024i412e3fa1p6946339c46603eee@mail.gmail.com>
	<61399.85.250.167.59.1158860279.squirrel@dev.mellanox.co.il>
Message-ID: <a33d0a9f0609211931l48d0fc71t897422c61d593b0b@mail.gmail.com>

Hi Eli,

How did the CPU utilizations compare for the NAPI vs. no NAPI case? What are
your thoughts on what bottleneck you are hitting?

Sorry to bother you ;)
thanks
harish

On 9/21/06, eli at dev.mellanox.co.il <eli at dev.mellanox.co.il> wrote:
>
> > Hi Eli,
> >
> > Thanks for sharing the results with us. It is great to see the reduction
> > in interrupts. Could you please specify the netperf test specifications
> > [message size; socket size]. Wondering what the numbers would be if we use
> > large socket and message sizes [128K & 64K respectively]. The reason for
> > the request is to make sure we are not hitting any TCP related bottleneck
> > while comparing NAPI vs. no NAPI cases. Please let me know what you think.
>
> I used large socket buffer sizes. Here is the command line I used. The
> result for the bandwidth is the sum of all the connections.
>
> netperf -H 11.4.3.144 -l 600 -f M -p $port -- -s 200000,200000 -S 200000,200000
>
>

From sweitzen at cisco.com  Thu Sep 21 22:45:44 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Thu, 21 Sep 2006 22:45:44 -0700
Subject: [openib-general] all of the man pages should change the package
 name to OFED
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA3024B2C3C@xmb-sjc-216.amer.cisco.com>

OpenFabrics maybe, but not OFED in my opinion.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

> -----Original Message-----
> From: openib-general-bounces at openib.org 
> [mailto:openib-general-bounces at openib.org] On Behalf Of Dotan Barak
> Sent: Thursday, September 21, 2006 8:45 AM
> To: Roland Dreier (rdreier); Hal Rosenstock
> Cc: openib
> Subject: [openib-general] all of the man pages should change 
> the package name to OFED
> 
> Hi.
> 
> When I executed "man ibv_devinfo" or "man ibstat" (for example) I
> noticed that those man pages are marked as part of the OpenIB package.
> I believe that the package name should be changed to OFED.
> 
> what do you think?
> 
> thanks
> Dotan
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit 
> http://openib.org/mailman/listinfo/openib-general
> 


From delaitt at cpc.wmin.ac.uk  Fri Sep 22 04:38:18 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Fri, 22 Sep 2006 12:38:18 +0100 (BST)
Subject: [openib-general] openib on SLES10 and ksym errors
Message-ID: <Pine.GSO.4.58.0609221228400.19198@seth.cpc.wmin.ac.uk>


hi,

I've compiled OFED-1.0-plus-Open-MPI-1.1 on SLES10/32 bits.

Linux n31 2.6.16.21-0.8-smp #3 SMP Thu Sep 21 17:18:27 BST 2006 i686 i686

However, when I install the rpms, I get the following ksym errors. I also
tried with ofed-1.1 and get the same. I'm using an sles10 kernel with
lustre patches. Maybe there is a mismatch between the kernel I have and
the recompiled ofed rpms? Any help would be greatly appreciated!

Thanks,

Thierry.

ksym(fd_install) = d291f2c9 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(dma_free_coherent) = d4c86700 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(contig_page_data) = d748c5fa is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(dev_base) = db8cb539 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(dev_queue_xmit) = dbcc4c81 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(skb_under_panic) = dc41edc4 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(arp_tbl) = e0239e26 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(seq_lseek) = e040abe0 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(get_sb_pseudo) = e5a89af7 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(register_netdev) = e5c7634f is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(__alloc_skb) = e89833c0 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(__free_pages) = e981956f is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(fget) = ea0f36ab is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(__pci_register_driver) = eabba033 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(alloc_netdev) = ec64cd34 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(kill_fasync) = ec855dd5 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(mem_map) = ee2ba07 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(sysfs_create_group) = f5711d54 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(fasync_helper) = f76f81a9 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(unregister_netdev) = f7ad65b5 is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586
ksym(pci_find_capability) = f906e7af is needed by kernel-ib-1.0-2.6.16.21_0.8_smp.i586

----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW

Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788

http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------

This e-mail and its attachments are intended for the above named only
and may be confidential.  If they have come to you in error you must
not copy or show them to anyone, nor should you take any action based
on them, other than to notify the error by replying to the sender.


From halr at voltaire.com  Fri Sep 22 06:36:36 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 22 Sep 2006 09:36:36 -0400
Subject: [openib-general] [PATCH 3/13] osm: port to WinIB stack :
 include/iba/ib_types.h
In-Reply-To: <86y7silc1w.fsf@mtl066.yok.mtl.com>
References: <86y7silc1w.fsf@mtl066.yok.mtl.com>
Message-ID: <1158932195.4353.20494.camel@hal.voltaire.com>

On Sun, 2006-09-17 at 11:58, Eitan Zahavi wrote:
> Hi Hal
> 
> Most are just adding OSM_API for function declarations.
> Some minor indentations.
> 
> Thanks
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Thanks. Applied (with some cosmetic changes) to trunk only.

-- Hal


From delaitt at cpc.wmin.ac.uk  Fri Sep 22 07:00:21 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Fri, 22 Sep 2006 15:00:21 +0100 (BST)
Subject: [openib-general] OFED for SLES10
Message-ID: <Pine.GSO.4.58.0609221458320.19198@seth.cpc.wmin.ac.uk>


Hi,

Could someone point me in the right direction on how to compile OFED for
SLES10?

Do I need to recompile modules for the kernel?
Do I need to patch the kernel before compiling the modules?
Do I need to delete the existing infiniband directory in the kernel and replace it with the OFED one?
Does the OFED distribution include the kernel modules?

Thanks,

Thierry.

----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW

Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788

http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------

This e-mail and its attachments are intended for the above named only
and may be confidential.  If they have come to you in error you must
not copy or show them to anyone, nor should you take any action based
on them, other than to notify the error by replying to the sender.


From thomas.bub at thomson.net  Fri Sep 22 07:41:04 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Fri, 22 Sep 2006 16:41:04 +0200
Subject: [openib-general] OFED for SLES10
Message-ID: <B79FAF8BB536314E859EA1963CFFD222029AC46D@wdtssmail01.eu.thmulti.com>

Thierry,
support for SLES10 was introduced after the OFED-1.0 release.
The best option is to download OFED-1.1 Release Candidate 6 from:

https://openib.org/svn/gen2/branches/1.1/ofed/releases/OFED-1.1-rc6.tgz

You should have the kernel sources for the SLES10 kernel installed.
Unpack the tgz and run the install script found in the OFED-1.1-rc6
directory.
It does everything needed to get the drivers built and installed.
That should be all that is needed.

Thomas Bub

............................................................
Thomas Bub
Grass Valley Germany GmbH
Brunnenweg 9
64331 Weiterstadt, Germany
Tel: +49 6150 104 147
Fax: +49 6150 104 656
Email: Thomas.Bub at thomson.net
www.GrassValley.com 
............................................................


> -----Original Message-----
> From: openib-general-bounces at openib.org [mailto:openib-general-
> bounces at openib.org] On Behalf Of Thierry Delaitre
> Sent: Friday, September 22, 2006 4:00 PM
> To: openib-general at openib.org
> Subject: [openib-general] OFED for SLES10
> 
> 
> Hi,
> 
> Could someone point me in the right direction on how to compile ofed for
> sles10 ?
> 
> Do i need to recompile modules for the kernel ?
> Do i need to patch the kernel before compiling the module ?
> Do i need to delete the existing infiniband directory in the kernel and
> replace it with ofed ?
> Does the ofed distri include the kernel modules ?
> 
> Thanks,
> 
> Thierry.
> 
> ----------------------------------------
> Dr Thierry DELAITRE
> Systems and Services Manager, CSCS
> University of Westminster
> 115 New Cavendish Street, London W1W 6UW
> 
> Tel: 020 7911 5000 ext: 3586
> Fax: 020 7911 5089
> Mobile short dial code 1788
> 
> http://www.cscs.wmin.ac.uk/~delaitt
> ----------------------------------------
> 
> This e-mail and its attachments are intended for the above named only
> and may be confidential.  If they have come to you in error you must
> not copy or show them to anyone, nor should you take any action based
> on them, other than to notify the error by replying to the sender.
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general


From hnguyen at de.ibm.com  Fri Sep 22 08:20:23 2006
From: hnguyen at de.ibm.com (Hoang-Nam Nguyen)
Date: Fri, 22 Sep 2006 17:20:23 +0200
Subject: [openib-general] [PATCH 2.6.19-rc1] ehca firmware interface based
 on Anton Blanchard's new hvcall interface
Message-ID: <200609221720.24191.hnguyen@de.ibm.com>

Hello Roland!
Below is the ehca patch that adapts the driver to Anton's new hvcall
interface, which has been committed to Paul's git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git

Besides the changes above this patch contains some coding style updates.
I created this patch against your git tree, branch for-2.6.19.
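
For readers who do not have the powerpc tree at hand, the shape of the
change can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not the driver code: every name
starting with sketch_ or mock_ is hypothetical, the status values are
made up, and the argument list is shortened to two inputs. It shows the
two ideas the patch relies on: outputs come back in one caller-supplied
array instead of one pointer per output register, and the wrapper
retries a bounded number of times while the call reports "long busy".

```c
#define SKETCH_OUTS      9   /* hypothetical stand-in for the 9-output buffer size */
#define SKETCH_SUCCESS   0   /* made-up status codes, not the real H_* values */
#define SKETCH_BUSY      1
#define SKETCH_LONG_BUSY 2

/* Signature of a call that writes its results into outs[]. */
typedef long (*sketch_hcall9_fn)(unsigned long opcode, unsigned long *outs,
                                 unsigned long arg1, unsigned long arg2);

/*
 * Retry wrapper: invoke fn up to 5 times while it reports "long busy",
 * then give up with SKETCH_BUSY. The patch computes a sleep interval
 * before retrying; this sketch retries immediately.
 */
long sketch_hcall9(sketch_hcall9_fn fn, unsigned long opcode,
                   unsigned long *outs,
                   unsigned long arg1, unsigned long arg2)
{
    int i;

    for (i = 0; i < 5; i++) {
        long ret = fn(opcode, outs, arg1, arg2);

        if (ret == SKETCH_LONG_BUSY)
            continue;
        return ret;
    }
    return SKETCH_BUSY;
}

/* Mock "firmware": long busy on the first two calls, then success. */
int mock_calls;

long mock_firmware(unsigned long opcode, unsigned long *outs,
                   unsigned long arg1, unsigned long arg2)
{
    (void)opcode;
    if (++mock_calls <= 2)
        return SKETCH_LONG_BUSY;
    outs[0] = arg1 + 1;   /* pretend resource handle */
    outs[3] = arg2;       /* pretend actual number of entries */
    return SKETCH_SUCCESS;
}
```

A caller declares one array of SKETCH_OUTS elements, passes it to
sketch_hcall9(), and reads only the slots it needs afterwards. That is
why the dummy variables and the per-output temporaries can go away.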

Thanks!
Hoang-Nam Nguyen


Signed-off-by: Hoang-Nam Nguyen <hnguyen at de.ibm.com>
---


 ehca_main.c |    9
 hcp_if.c    |  845 ++++++++++++++++++++----------------------------------------
 hcp_if.h    |    2
 hipz_hw.h   |    2
 ipz_pt_fn.h |    7
 5 files changed, 300 insertions(+), 565 deletions(-)


diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index 159b0be..0a0248f 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -5,6 +5,7 @@
  *
  *  Authors: Heiko J Schick <schickhj at de.ibm.com>
  *           Hoang-Nam Nguyen <hnguyen at de.ibm.com>
+ *           Joachim Fenkes <fenkes at de.ibm.com>
  *
  *  Copyright (c) 2005 IBM Corporation
  *
@@ -48,7 +49,7 @@ #include "hcp_if.h"
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch <raisch at de.ibm.com>");
 MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver");
-MODULE_VERSION("SVNEHCA_0015");
+MODULE_VERSION("SVNEHCA_0016");
 
 int ehca_open_aqp1     = 0;
 int ehca_debug_level   = 0;
@@ -268,7 +269,7 @@ int ehca_register_device(struct ehca_shc
   (1ull << IB_USER_VERBS_CMD_ATTACH_MCAST) |
   (1ull << IB_USER_VERBS_CMD_DETACH_MCAST);
 
- shca->ib_device.node_type           = RDMA_NODE_IB_CA;
+ shca->ib_device.node_type           = IB_NODE_CA;
  shca->ib_device.phys_port_cnt       = shca->num_ports;
  shca->ib_device.dma_device          = &shca->ibmebus_dev->ofdev.dev;
  shca->ib_device.query_device        = ehca_query_device;
@@ -446,7 +447,7 @@ static ssize_t  ehca_show_##name(struct 
   kfree(rblock);            \
   return 0;            \
  }           \
-            \
+                                                                           \
  data = rblock->name;                                               \
  kfree(rblock);                                                     \
             \
@@ -749,7 +750,7 @@ int __init ehca_module_init(void)
  int ret;
 
  printk(KERN_INFO "eHCA Infiniband Device Driver "
-                  "(Rel.: SVNEHCA_0015)\n");
+                  "(Rel.: SVNEHCA_0016)\n");
  idr_init(&ehca_qp_idr);
  idr_init(&ehca_cq_idr);
  spin_lock_init(&ehca_qp_idr_lock);
diff --git a/drivers/infiniband/hw/ehca/hcp_if.c b/drivers/infiniband/hw/ehca/hcp_if.c
index 260e82a..3fb46e6 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.c
+++ b/drivers/infiniband/hw/ehca/hcp_if.c
@@ -48,27 +48,27 @@ #include "hcp_phyp.h"
 #include "hipz_fns.h"
 #include "ipz_pt_fn.h"
 
-#define H_ALL_RES_QP_ENHANCED_OPS       EHCA_BMASK_IBM(9,11)
-#define H_ALL_RES_QP_PTE_PIN            EHCA_BMASK_IBM(12,12)
-#define H_ALL_RES_QP_SERVICE_TYPE       EHCA_BMASK_IBM(13,15)
-#define H_ALL_RES_QP_LL_RQ_CQE_POSTING  EHCA_BMASK_IBM(18,18)
-#define H_ALL_RES_QP_LL_SQ_CQE_POSTING  EHCA_BMASK_IBM(19,21)
-#define H_ALL_RES_QP_SIGNALING_TYPE     EHCA_BMASK_IBM(22,23)
-#define H_ALL_RES_QP_UD_AV_LKEY_CTRL    EHCA_BMASK_IBM(31,31)
-#define H_ALL_RES_QP_RESOURCE_TYPE      EHCA_BMASK_IBM(56,63)
-
-#define H_ALL_RES_QP_MAX_OUTST_SEND_WR  EHCA_BMASK_IBM(0,15)
-#define H_ALL_RES_QP_MAX_OUTST_RECV_WR  EHCA_BMASK_IBM(16,31)
-#define H_ALL_RES_QP_MAX_SEND_SGE       EHCA_BMASK_IBM(32,39)
-#define H_ALL_RES_QP_MAX_RECV_SGE       EHCA_BMASK_IBM(40,47)
-
-#define H_ALL_RES_QP_ACT_OUTST_SEND_WR  EHCA_BMASK_IBM(16,31)
-#define H_ALL_RES_QP_ACT_OUTST_RECV_WR  EHCA_BMASK_IBM(48,63)
-#define H_ALL_RES_QP_ACT_SEND_SGE       EHCA_BMASK_IBM(8,15)
-#define H_ALL_RES_QP_ACT_RECV_SGE       EHCA_BMASK_IBM(24,31)
-
-#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(0,31)
-#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(32,63)
+#define H_ALL_RES_QP_ENHANCED_OPS       EHCA_BMASK_IBM(9, 11)
+#define H_ALL_RES_QP_PTE_PIN            EHCA_BMASK_IBM(12, 12)
+#define H_ALL_RES_QP_SERVICE_TYPE       EHCA_BMASK_IBM(13, 15)
+#define H_ALL_RES_QP_LL_RQ_CQE_POSTING  EHCA_BMASK_IBM(18, 18)
+#define H_ALL_RES_QP_LL_SQ_CQE_POSTING  EHCA_BMASK_IBM(19, 21)
+#define H_ALL_RES_QP_SIGNALING_TYPE     EHCA_BMASK_IBM(22, 23)
+#define H_ALL_RES_QP_UD_AV_LKEY_CTRL    EHCA_BMASK_IBM(31, 31)
+#define H_ALL_RES_QP_RESOURCE_TYPE      EHCA_BMASK_IBM(56, 63)
+
+#define H_ALL_RES_QP_MAX_OUTST_SEND_WR  EHCA_BMASK_IBM(0, 15)
+#define H_ALL_RES_QP_MAX_OUTST_RECV_WR  EHCA_BMASK_IBM(16, 31)
+#define H_ALL_RES_QP_MAX_SEND_SGE       EHCA_BMASK_IBM(32, 39)
+#define H_ALL_RES_QP_MAX_RECV_SGE       EHCA_BMASK_IBM(40, 47)
+
+#define H_ALL_RES_QP_ACT_OUTST_SEND_WR  EHCA_BMASK_IBM(16, 31)
+#define H_ALL_RES_QP_ACT_OUTST_RECV_WR  EHCA_BMASK_IBM(48, 63)
+#define H_ALL_RES_QP_ACT_SEND_SGE       EHCA_BMASK_IBM(8, 15)
+#define H_ALL_RES_QP_ACT_RECV_SGE       EHCA_BMASK_IBM(24, 31)
+
+#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(0, 31)
+#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(32, 63)
 
 /* direct access qp controls */
 #define DAQP_CTRL_ENABLE    0x01
@@ -95,35 +95,25 @@ static u32 get_longbusy_msecs(int longbu
  }
 }
 
-static long ehca_hcall_7arg_7ret(unsigned long opcode,
-     unsigned long arg1,
-     unsigned long arg2,
-     unsigned long arg3,
-     unsigned long arg4,
-     unsigned long arg5,
-     unsigned long arg6,
-     unsigned long arg7,
-     unsigned long *out1,
-     unsigned long *out2,
-     unsigned long *out3,
-     unsigned long *out4,
-     unsigned long *out5,
-     unsigned long *out6,
-     unsigned long *out7)
+static long ehca_plpar_hcall_norets(unsigned long opcode,
+        unsigned long arg1,
+        unsigned long arg2,
+        unsigned long arg3,
+        unsigned long arg4,
+        unsigned long arg5,
+        unsigned long arg6,
+        unsigned long arg7)
 {
  long ret;
  int i, sleep_msecs;
 
- ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx arg5=%lx "
-       "arg6=%lx arg7=%lx", opcode, arg1, arg2, arg3, arg4, arg5,
-       arg6, arg7);
+ ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx "
+       "arg5=%lx arg6=%lx arg7=%lx",
+       opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7);
 
  for (i = 0; i < 5; i++) {
-  ret = plpar_hcall_7arg_7ret(opcode,
-         arg1, arg2, arg3, arg4,
-         arg5, arg6, arg7,
-         out1, out2, out3, out4,
-         out5, out6,out7);
+  ret = plpar_hcall_norets(opcode, arg1, arg2, arg3, arg4,
+      arg5, arg6, arg7);
 
   if (H_IS_LONG_BUSY(ret)) {
    sleep_msecs = get_longbusy_msecs(ret);
@@ -134,44 +124,30 @@ static long ehca_hcall_7arg_7ret(unsigne
   if (ret < H_SUCCESS)
    ehca_gen_err("opcode=%lx ret=%lx"
          " arg1=%lx arg2=%lx arg3=%lx arg4=%lx"
-         " arg5=%lx arg6=%lx arg7=%lx"
-         " out1=%lx out2=%lx out3=%lx out4=%lx"
-         " out5=%lx out6=%lx out7=%lx",
+         " arg5=%lx arg6=%lx arg7=%lx ",
          opcode, ret,
-         arg1, arg2, arg3, arg4,
-         arg5, arg6, arg7,
-         *out1, *out2, *out3, *out4,
-         *out5, *out6, *out7);
+         arg1, arg2, arg3, arg4, arg5,
+         arg6, arg7);
 
-  ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
-        "out4=%lx out5=%lx out6=%lx out7=%lx",
-        opcode, ret, *out1, *out2, *out3, *out4, *out5,
-        *out6, *out7);
+  ehca_gen_dbg("opcode=%lx ret=%lx", opcode, ret);
   return ret;
+
  }
 
  return H_BUSY;
 }
 
-static long ehca_hcall_9arg_9ret(unsigned long opcode,
-     unsigned long arg1,
-     unsigned long arg2,
-     unsigned long arg3,
-     unsigned long arg4,
-     unsigned long arg5,
-     unsigned long arg6,
-     unsigned long arg7,
-     unsigned long arg8,
-     unsigned long arg9,
-     unsigned long *out1,
-     unsigned long *out2,
-     unsigned long *out3,
-     unsigned long *out4,
-     unsigned long *out5,
-     unsigned long *out6,
-     unsigned long *out7,
-     unsigned long *out8,
-     unsigned long *out9)
+static long ehca_plpar_hcall9(unsigned long opcode,
+         unsigned long *outs, /* array of 9 outputs */
+         unsigned long arg1,
+         unsigned long arg2,
+         unsigned long arg3,
+         unsigned long arg4,
+         unsigned long arg5,
+         unsigned long arg6,
+         unsigned long arg7,
+         unsigned long arg8,
+         unsigned long arg9)
 {
  long ret;
  int i, sleep_msecs;
@@ -182,13 +158,9 @@ static long ehca_hcall_9arg_9ret(unsigne
        arg8, arg9);
 
  for (i = 0; i < 5; i++) {
-  ret = plpar_hcall_9arg_9ret(opcode,
-         arg1, arg2, arg3, arg4,
-         arg5, arg6, arg7, arg8,
-         arg9,
-         out1, out2, out3, out4,
-         out5, out6, out7, out8,
-         out9);
+  ret = plpar_hcall9(opcode, outs,
+       arg1, arg2, arg3, arg4, arg5,
+       arg6, arg7, arg8, arg9);
 
   if (H_IS_LONG_BUSY(ret)) {
    sleep_msecs = get_longbusy_msecs(ret);
@@ -205,37 +177,35 @@ static long ehca_hcall_9arg_9ret(unsigne
          " out5=%lx out6=%lx out7=%lx out8=%lx"
          " out9=%lx",
          opcode, ret,
-         arg1, arg2, arg3, arg4,
-         arg5, arg6, arg7, arg8,
-         arg9,
-         *out1, *out2, *out3, *out4,
-         *out5, *out6, *out7, *out8,
-         *out9);
+         arg1, arg2, arg3, arg4, arg5,
+         arg6, arg7, arg8, arg9,
+         outs[0], outs[1], outs[2], outs[3],
+         outs[4], outs[5], outs[6], outs[7],
+         outs[8]);
 
   ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
         "out4=%lx out5=%lx out6=%lx out7=%lx out8=%lx "
-        "out9=%lx", opcode, ret,*out1, *out2, *out3, *out4,
-        *out5, *out6, *out7, *out8, *out9);
+        "out9=%lx",
+        opcode, ret, outs[0], outs[1], outs[2], outs[3],
+        outs[4], outs[5], outs[6], outs[7], outs[8]);
   return ret;
 
  }
 
  return H_BUSY;
 }
-
 u64 hipz_h_alloc_resource_eq(const struct ipz_adapter_handle adapter_handle,
         struct ehca_pfeq *pfeq,
         const u32 neq_control,
         const u32 number_of_entries,
         struct ipz_eq_handle *eq_handle,
-        u32 * act_nr_of_entries,
-        u32 * act_pages,
-        u32 * eq_ist)
+        u32 *act_nr_of_entries,
+        u32 *act_pages,
+        u32 *eq_ist)
 {
  u64 ret;
- u64 dummy;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
  u64 allocate_controls;
- u64 act_nr_of_entries_out, act_pages_out, eq_ist_out;
 
  /* resource type */
  allocate_controls = 3ULL;
@@ -246,22 +216,15 @@ u64 hipz_h_alloc_resource_eq(const struc
  else /* notification event queue */
   allocate_controls = (1ULL << 63) | allocate_controls;
 
- ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-       adapter_handle.handle,  /* r4 */
-       allocate_controls,      /* r5 */
-       number_of_entries,      /* r6 */
-       0, 0, 0, 0,
-       &eq_handle->handle,     /* r4 */
-       &dummy,            /* r5 */
-       &dummy,            /* r6 */
-       &act_nr_of_entries_out, /* r7 */
-       &act_pages_out,    /* r8 */
-       &eq_ist_out,            /* r8 */
-       &dummy);
-
- *act_nr_of_entries = (u32)act_nr_of_entries_out;
- *act_pages         = (u32)act_pages_out;
- *eq_ist            = (u32)eq_ist_out;
+ ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+    adapter_handle.handle,  /* r4 */
+    allocate_controls,      /* r5 */
+    number_of_entries,      /* r6 */
+    0, 0, 0, 0, 0, 0);
+ eq_handle->handle = outs[0];
+ *act_nr_of_entries = (u32)outs[3];
+ *act_pages = (u32)outs[4];
+ *eq_ist = (u32)outs[5];
 
  if (ret == H_NOT_ENOUGH_RESOURCES)
   ehca_gen_err("Not enough resource - ret=%lx ", ret);
@@ -273,20 +236,11 @@ u64 hipz_h_reset_event(const struct ipz_
          struct ipz_eq_handle eq_handle,
          const u64 event_mask)
 {
- u64 dummy;
-
- return ehca_hcall_7arg_7ret(H_RESET_EVENTS,
-        adapter_handle.handle, /* r4 */
-        eq_handle.handle,      /* r5 */
-        event_mask,            /* r6 */
-        0, 0, 0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_RESET_EVENTS,
+           adapter_handle.handle, /* r4 */
+           eq_handle.handle,      /* r5 */
+           event_mask,       /* r6 */
+           0, 0, 0, 0);
 }
 
 u64 hipz_h_alloc_resource_cq(const struct ipz_adapter_handle adapter_handle,
@@ -294,30 +248,21 @@ u64 hipz_h_alloc_resource_cq(const struc
         struct ehca_alloc_cq_parms *param)
 {
  u64 ret;
- u64 dummy;
- u64 act_nr_of_entries_out, act_pages_out;
- u64 g_la_privileged_out, g_la_user_out;
-
- ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-       adapter_handle.handle,     /* r4  */
-       2,                       /* r5  */
-       param->eq_handle.handle,   /* r6  */
-       cq->token,               /* r7  */
-       param->nr_cqe,             /* r8  */
-       0, 0,
-       &cq->ipz_cq_handle.handle, /* r4  */
-       &dummy,               /* r5  */
-       &dummy,               /* r6  */
-       &act_nr_of_entries_out,    /* r7  */
-       &act_pages_out,       /* r8  */
-       &g_la_privileged_out,      /* r9  */
-       &g_la_user_out);           /* r10 */
-
- param->act_nr_of_entries = (u32)act_nr_of_entries_out;
- param->act_pages = (u32)act_pages_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+    adapter_handle.handle,   /* r4  */
+    2,                  /* r5  */
+    param->eq_handle.handle, /* r6  */
+    cq->token,          /* r7  */
+    param->nr_cqe,           /* r8  */
+    0, 0, 0, 0);
+ cq->ipz_cq_handle.handle = outs[0];
+ param->act_nr_of_entries = (u32)outs[3];
+ param->act_pages = (u32)outs[4];
 
  if (ret == H_SUCCESS)
-  hcp_galpas_ctor(&cq->galpas, g_la_privileged_out, g_la_user_out);
+  hcp_galpas_ctor(&cq->galpas, outs[5], outs[6]);
 
  if (ret == H_NOT_ENOUGH_RESOURCES)
   ehca_gen_err("Not enough resources. ret=%lx", ret);
@@ -330,8 +275,9 @@ u64 hipz_h_alloc_resource_qp(const struc
         struct ehca_alloc_qp_parms *parms)
 {
  u64 ret;
- u64 dummy, allocate_controls, max_r10_reg;
- u64 qp_nr_out, r6_out, r7_out, r8_out, g_la_user_out, r11_out;
+ u64 allocate_controls;
+ u64 max_r10_reg;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
  u16 max_nr_receive_wqes = qp->init_attr.cap.max_recv_wr + 1;
  u16 max_nr_send_wqes = qp->init_attr.cap.max_send_wr + 1;
  int daqp_ctrl = parms->daqp_ctrl;
@@ -360,48 +306,36 @@ u64 hipz_h_alloc_resource_qp(const struc
   | EHCA_BMASK_SET(H_ALL_RES_QP_MAX_RECV_SGE,
      parms->max_recv_sge);
 
-
- ret = ehca_hcall_9arg_9ret(H_ALLOC_RESOURCE,
-       adapter_handle.handle,       /* r4  */
-       allocate_controls,               /* r5  */
-       qp->send_cq->ipz_cq_handle.handle,
-       qp->recv_cq->ipz_cq_handle.handle,
-       parms->ipz_eq_handle.handle,
-       ((u64)qp->token << 32) | parms->pd.value,
-       max_r10_reg,                       /* r10 */
-       parms->ud_av_l_key_ctl,            /* r11 */
-       0,
-       &qp->ipz_qp_handle.handle,
-       &qp_nr_out,                       /* r5  */
-       &r6_out,                       /* r6  */
-       &r7_out,                       /* r7  */
-       &r8_out,                       /* r8  */
-       &dummy,                       /* r9  */
-       &g_la_user_out,               /* r10 */
-       &r11_out,
-       &dummy);
-
- /* extract outputs */
- qp->real_qp_num = (u32)qp_nr_out;
-
+ ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+    adapter_handle.handle,            /* r4  */
+    allocate_controls,            /* r5  */
+    qp->send_cq->ipz_cq_handle.handle,
+    qp->recv_cq->ipz_cq_handle.handle,
+    parms->ipz_eq_handle.handle,
+    ((u64)qp->token << 32) | parms->pd.value,
+    max_r10_reg,                    /* r10 */
+    parms->ud_av_l_key_ctl,            /* r11 */
+    0);
+ qp->ipz_qp_handle.handle = outs[0];
+ qp->real_qp_num = (u32)outs[1];
  parms->act_nr_send_sges =
-  (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, r6_out);
+  (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, outs[2]);
  parms->act_nr_recv_wqes =
-  (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, r6_out);
+  (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, outs[2]);
  parms->act_nr_send_sges =
-  (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, r7_out);
+  (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, outs[3]);
  parms->act_nr_recv_sges =
-  (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, r7_out);
+  (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, outs[3]);
  parms->nr_sq_pages =
-  (u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, r8_out);
+  (u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, outs[4]);
  parms->nr_rq_pages =
-  (u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, r8_out);
+  (u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, outs[4]);
 
  if (ret == H_SUCCESS)
-  hcp_galpas_ctor(&qp->galpas, g_la_user_out, g_la_user_out);
+  hcp_galpas_ctor(&qp->galpas, outs[6], outs[6]);
 
  if (ret == H_NOT_ENOUGH_RESOURCES)
-  ehca_gen_err("Not enough resources. ret=%lx",ret);
+  ehca_gen_err("Not enough resources. ret=%lx", ret);
 
  return ret;
 }
@@ -411,7 +345,6 @@ u64 hipz_h_query_port(const struct ipz_a
         struct hipz_query_port *query_port_response_block)
 {
  u64 ret;
- u64 dummy;
  u64 r_cb = virt_to_abs(query_port_response_block);
 
  if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -419,18 +352,11 @@ u64 hipz_h_query_port(const struct ipz_a
   return H_PARAMETER;
  }
 
- ret = ehca_hcall_7arg_7ret(H_QUERY_PORT,
-       adapter_handle.handle, /* r4 */
-       port_id,           /* r5 */
-       r_cb,           /* r6 */
-       0, 0, 0, 0,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
+ ret = ehca_plpar_hcall_norets(H_QUERY_PORT,
+          adapter_handle.handle, /* r4 */
+          port_id,              /* r5 */
+          r_cb,              /* r6 */
+          0, 0, 0, 0);
 
  if (ehca_debug_level)
   ehca_dmp(query_port_response_block, 64, "response_block");
@@ -441,7 +367,6 @@ u64 hipz_h_query_port(const struct ipz_a
 u64 hipz_h_query_hca(const struct ipz_adapter_handle adapter_handle,
        struct hipz_query_hca *query_hca_rblock)
 {
- u64 dummy;
  u64 r_cb = virt_to_abs(query_hca_rblock);
 
  if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -450,17 +375,10 @@ u64 hipz_h_query_hca(const struct ipz_ad
   return H_PARAMETER;
  }
 
- return ehca_hcall_7arg_7ret(H_QUERY_HCA,
-        adapter_handle.handle, /* r4 */
-        r_cb,                  /* r5 */
-        0, 0, 0, 0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_QUERY_HCA,
+           adapter_handle.handle, /* r4 */
+           r_cb,                  /* r5 */
+           0, 0, 0, 0, 0);
 }
 
 u64 hipz_h_register_rpage(const struct ipz_adapter_handle adapter_handle,
@@ -470,22 +388,13 @@ u64 hipz_h_register_rpage(const struct i
      const u64 logical_address_of_page,
      u64 count)
 {
- u64 dummy;
-
- return ehca_hcall_7arg_7ret(H_REGISTER_RPAGES,
-        adapter_handle.handle,      /* r4  */
-        queue_type | pagesize << 8, /* r5  */
-        resource_handle,         /* r6  */
-        logical_address_of_page,    /* r7  */
-        count,                 /* r8  */
-        0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_REGISTER_RPAGES,
+           adapter_handle.handle,      /* r4  */
+           queue_type | pagesize << 8, /* r5  */
+           resource_handle,            /* r6  */
+           logical_address_of_page,    /* r7  */
+           count,                    /* r8  */
+           0, 0);
 }
 
 u64 hipz_h_register_rpage_eq(const struct ipz_adapter_handle adapter_handle,
@@ -507,23 +416,14 @@ u64 hipz_h_register_rpage_eq(const struc
          logical_address_of_page, count);
 }
 
-u32 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle,
+u64 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle,
       u32 ist)
 {
- u32 ret;
- u64 dummy;
-
- ret = ehca_hcall_7arg_7ret(H_QUERY_INT_STATE,
-       adapter_handle.handle, /* r4 */
-       ist,                   /* r5 */
-       0, 0, 0, 0, 0,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
+ u64 ret;
+ ret = ehca_plpar_hcall_norets(H_QUERY_INT_STATE,
+          adapter_handle.handle, /* r4 */
+          ist,                   /* r5 */
+          0, 0, 0, 0, 0);
 
  if (ret != H_SUCCESS && ret != H_BUSY)
   ehca_gen_err("Could not query interrupt state.");
@@ -576,25 +476,20 @@ u64 hipz_h_disable_and_get_wqe(const str
           void **log_addr_next_rq_wqe2processed,
           int dis_and_get_function_code)
 {
- u64 dummy, dummy1, dummy2;
-
- if (!log_addr_next_sq_wqe2processed)
-  log_addr_next_sq_wqe2processed = (void**)&dummy1;
- if (!log_addr_next_rq_wqe2processed)
-  log_addr_next_rq_wqe2processed = (void**)&dummy2;
-
- return ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
-        adapter_handle.handle,     /* r4 */
-        dis_and_get_function_code, /* r5 */
-        qp_handle.handle,        /* r6 */
-        0, 0, 0, 0,
-        (void*)log_addr_next_sq_wqe2processed,
-        (void*)log_addr_next_rq_wqe2processed,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ u64 ret;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs,
+    adapter_handle.handle,     /* r4 */
+    dis_and_get_function_code, /* r5 */
+    qp_handle.handle,    /* r6 */
+    0, 0, 0, 0, 0, 0);
+ if (log_addr_next_sq_wqe2processed)
+  *log_addr_next_sq_wqe2processed = (void*)outs[0];
+ if (log_addr_next_rq_wqe2processed)
+  *log_addr_next_rq_wqe2processed = (void*)outs[1];
+
+ return ret;
 }
 
 u64 hipz_h_modify_qp(const struct ipz_adapter_handle adapter_handle,
@@ -605,22 +500,13 @@ u64 hipz_h_modify_qp(const struct ipz_ad
        struct h_galpa gal)
 {
  u64 ret;
- u64 dummy;
- u64 invalid_attribute_identifier, rc_attrib_mask;
-
- ret = ehca_hcall_7arg_7ret(H_MODIFY_QP,
-       adapter_handle.handle,         /* r4 */
-       qp_handle.handle,           /* r5 */
-       update_mask,                   /* r6 */
-       virt_to_abs(mqpcb),           /* r7 */
-       0, 0, 0,
-       &invalid_attribute_identifier, /* r4 */
-       &dummy,                   /* r5 */
-       &dummy,                   /* r6 */
-       &dummy,                        /* r7 */
-       &dummy,                   /* r8 */
-       &rc_attrib_mask,               /* r9 */
-       &dummy);
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+ ret = ehca_plpar_hcall9(H_MODIFY_QP, outs,
+    adapter_handle.handle, /* r4 */
+    qp_handle.handle,      /* r5 */
+    update_mask,        /* r6 */
+    virt_to_abs(mqpcb),    /* r7 */
+    0, 0, 0, 0, 0);
 
  if (ret == H_NOT_ENOUGH_RESOURCES)
   ehca_gen_err("Insufficient resources ret=%lx", ret);
@@ -634,61 +520,37 @@ u64 hipz_h_query_qp(const struct ipz_ada
       struct hcp_modify_qp_control_block *qqpcb,
       struct h_galpa gal)
 {
- u64 dummy;
-
- return ehca_hcall_7arg_7ret(H_QUERY_QP,
-        adapter_handle.handle, /* r4 */
-        qp_handle.handle,      /* r5 */
-        virt_to_abs(qqpcb),    /* r6 */
-        0, 0, 0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_QUERY_QP,
+           adapter_handle.handle, /* r4 */
+           qp_handle.handle,      /* r5 */
+           virt_to_abs(qqpcb),    /* r6 */
+           0, 0, 0, 0);
 }
 
 u64 hipz_h_destroy_qp(const struct ipz_adapter_handle adapter_handle,
         struct ehca_qp *qp)
 {
  u64 ret;
- u64 dummy;
- u64 ladr_next_sq_wqe_out, ladr_next_rq_wqe_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
 
  ret = hcp_galpas_dtor(&qp->galpas);
  if (ret) {
   ehca_gen_err("Could not destruct qp->galpas");
   return H_RESOURCE;
  }
- ret = ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
-       adapter_handle.handle,     /* r4 */
-       /* function code */
-       1,                       /* r5 */
-       qp->ipz_qp_handle.handle,  /* r6 */
-       0, 0, 0, 0,
-       &ladr_next_sq_wqe_out,     /* r4 */
-       &ladr_next_rq_wqe_out,     /* r5 */
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
+ ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs,
+    adapter_handle.handle,     /* r4 */
+    /* function code */
+    1,                    /* r5 */
+    qp->ipz_qp_handle.handle,  /* r6 */
+    0, 0, 0, 0, 0, 0);
  if (ret == H_HARDWARE)
   ehca_gen_err("HCA not operational. ret=%lx", ret);
 
- ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-       adapter_handle.handle,     /* r4 */
-       qp->ipz_qp_handle.handle,  /* r5 */
-       0, 0, 0, 0, 0,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
+ ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+          adapter_handle.handle,     /* r4 */
+          qp->ipz_qp_handle.handle,  /* r5 */
+          0, 0, 0, 0, 0);
 
  if (ret == H_RESOURCE)
   ehca_gen_err("Resource still in use. ret=%lx", ret);
@@ -701,20 +563,11 @@ u64 hipz_h_define_aqp0(const struct ipz_
          struct h_galpa gal,
          u32 port)
 {
- u64 dummy;
-
- return ehca_hcall_7arg_7ret(H_DEFINE_AQP0,
-        adapter_handle.handle, /* r4 */
-        qp_handle.handle,      /* r5 */
-        port,                  /* r6 */
-        0, 0, 0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_DEFINE_AQP0,
+           adapter_handle.handle, /* r4 */
+           qp_handle.handle,      /* r5 */
+           port,                  /* r6 */
+           0, 0, 0, 0);
 }
 
 u64 hipz_h_define_aqp1(const struct ipz_adapter_handle adapter_handle,
@@ -724,24 +577,15 @@ u64 hipz_h_define_aqp1(const struct ipz_
          u32 * bma_qp_nr)
 {
  u64 ret;
- u64 dummy;
- u64 pma_qp_nr_out, bma_qp_nr_out;
-
- ret = ehca_hcall_7arg_7ret(H_DEFINE_AQP1,
-       adapter_handle.handle, /* r4 */
-       qp_handle.handle,      /* r5 */
-       port,           /* r6 */
-       0, 0, 0, 0,
-       &pma_qp_nr_out,        /* r4 */
-       &bma_qp_nr_out,        /* r5 */
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
-
- *pma_qp_nr = (u32)pma_qp_nr_out;
- *bma_qp_nr = (u32)bma_qp_nr_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_DEFINE_AQP1, outs,
+    adapter_handle.handle, /* r4 */
+    qp_handle.handle,      /* r5 */
+    port,                /* r6 */
+    0, 0, 0, 0, 0, 0);
+ *pma_qp_nr = (u32)outs[0];
+ *bma_qp_nr = (u32)outs[1];
 
  if (ret == H_ALIAS_EXIST)
   ehca_gen_err("AQP1 already exists. ret=%lx", ret);
@@ -756,22 +600,14 @@ u64 hipz_h_attach_mcqp(const struct ipz_
          u64 subnet_prefix, u64 interface_id)
 {
  u64 ret;
- u64 dummy;
-
- ret = ehca_hcall_7arg_7ret(H_ATTACH_MCQP,
-       adapter_handle.handle,     /* r4 */
-       qp_handle.handle,          /* r5 */
-       mcg_dlid,                  /* r6 */
-       interface_id,              /* r7 */
-       subnet_prefix,             /* r8 */
-       0, 0,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
+
+ ret = ehca_plpar_hcall_norets(H_ATTACH_MCQP,
+          adapter_handle.handle,  /* r4 */
+          qp_handle.handle,       /* r5 */
+          mcg_dlid,               /* r6 */
+          interface_id,           /* r7 */
+          subnet_prefix,          /* r8 */
+          0, 0);
 
  if (ret == H_NOT_ENOUGH_RESOURCES)
   ehca_gen_err("Not enough resources. ret=%lx", ret);
@@ -785,22 +621,13 @@ u64 hipz_h_detach_mcqp(const struct ipz_
          u16 mcg_dlid,
          u64 subnet_prefix, u64 interface_id)
 {
- u64 dummy;
-
- return ehca_hcall_7arg_7ret(H_DETACH_MCQP,
-        adapter_handle.handle, /* r4 */
-        qp_handle.handle,    /* r5 */
-        mcg_dlid,            /* r6 */
-        interface_id,          /* r7 */
-        subnet_prefix,         /* r8 */
-        0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_DETACH_MCQP,
+           adapter_handle.handle, /* r4 */
+           qp_handle.handle,      /* r5 */
+           mcg_dlid,              /* r6 */
+           interface_id,          /* r7 */
+           subnet_prefix,         /* r8 */
+           0, 0);
 }
 
 u64 hipz_h_destroy_cq(const struct ipz_adapter_handle adapter_handle,
@@ -808,7 +635,6 @@ u64 hipz_h_destroy_cq(const struct ipz_a
         u8 force_flag)
 {
  u64 ret;
- u64 dummy;
 
  ret = hcp_galpas_dtor(&cq->galpas);
  if (ret) {
@@ -816,18 +642,11 @@ u64 hipz_h_destroy_cq(const struct ipz_a
   return H_RESOURCE;
  }
 
- ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-       adapter_handle.handle,     /* r4 */
-       cq->ipz_cq_handle.handle,  /* r5 */
-       force_flag != 0 ? 1L : 0L, /* r6 */
-       0, 0, 0, 0,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
+ ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+          adapter_handle.handle,     /* r4 */
+          cq->ipz_cq_handle.handle,  /* r5 */
+          force_flag != 0 ? 1L : 0L, /* r6 */
+          0, 0, 0, 0);
 
  if (ret == H_RESOURCE)
   ehca_gen_err("H_FREE_RESOURCE failed ret=%lx ", ret);
@@ -839,7 +658,6 @@ u64 hipz_h_destroy_eq(const struct ipz_a
         struct ehca_eq *eq)
 {
  u64 ret;
- u64 dummy;
 
  ret = hcp_galpas_dtor(&eq->galpas);
  if (ret) {
@@ -847,18 +665,10 @@ u64 hipz_h_destroy_eq(const struct ipz_a
   return H_RESOURCE;
  }
 
- ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-       adapter_handle.handle,     /* r4 */
-       eq->ipz_eq_handle.handle,  /* r5 */
-       0, 0, 0, 0, 0,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
-
+ ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+          adapter_handle.handle,     /* r4 */
+          eq->ipz_eq_handle.handle,  /* r5 */
+          0, 0, 0, 0, 0);
 
  if (ret == H_RESOURCE)
   ehca_gen_err("Resource in use. ret=%lx ", ret);
@@ -875,27 +685,19 @@ u64 hipz_h_alloc_resource_mr(const struc
         struct ehca_mr_hipzout_parms *outparms)
 {
  u64 ret;
- u64 dummy;
- u64 lkey_out;
- u64 rkey_out;
-
- ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-       adapter_handle.handle,            /* r4 */
-       5,                                /* r5 */
-       vaddr,                            /* r6 */
-       length,                           /* r7 */
-       (((u64)access_ctrl) << 32ULL),    /* r8 */
-       pd.value,                         /* r9 */
-       0,
-       &(outparms->handle.handle),       /* r4 */
-       &dummy,                           /* r5 */
-       &lkey_out,                        /* r6 */
-       &rkey_out,                        /* r7 */
-       &dummy,
-       &dummy,
-       &dummy);
- outparms->lkey = (u32)lkey_out;
- outparms->rkey = (u32)rkey_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+    adapter_handle.handle,            /* r4 */
+    5,                                /* r5 */
+    vaddr,                            /* r6 */
+    length,                           /* r7 */
+    (((u64)access_ctrl) << 32ULL),    /* r8 */
+    pd.value,                         /* r9 */
+    0, 0, 0);
+ outparms->handle.handle = outs[0];
+ outparms->lkey = (u32)outs[2];
+ outparms->rkey = (u32)outs[3];
 
  return ret;
 }
@@ -923,7 +725,6 @@ u64 hipz_h_register_rpage_mr(const struc
          queue_type,
          mr->ipz_mr_handle.handle,
          logical_address_of_page, count);
-
  return ret;
 }
 
@@ -932,24 +733,17 @@ u64 hipz_h_query_mr(const struct ipz_ada
       struct ehca_mr_hipzout_parms *outparms)
 {
  u64 ret;
- u64 dummy;
- u64 remote_len_out, remote_vaddr_out, acc_ctrl_pd_out, r9_out;
-
- ret = ehca_hcall_7arg_7ret(H_QUERY_MR,
-       adapter_handle.handle,     /* r4 */
-       mr->ipz_mr_handle.handle,  /* r5 */
-       0, 0, 0, 0, 0,
-       &outparms->len,            /* r4 */
-       &outparms->vaddr,          /* r5 */
-       &remote_len_out,           /* r6 */
-       &remote_vaddr_out,         /* r7 */
-       &acc_ctrl_pd_out,          /* r8 */
-       &r9_out,
-       &dummy);
-
- outparms->acl  = acc_ctrl_pd_out >> 32;
- outparms->lkey = (u32)(r9_out >> 32);
- outparms->rkey = (u32)(r9_out & (0xffffffff));
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_QUERY_MR, outs,
+    adapter_handle.handle,     /* r4 */
+    mr->ipz_mr_handle.handle,  /* r5 */
+    0, 0, 0, 0, 0, 0, 0);
+ outparms->len = outs[0];
+ outparms->vaddr = outs[1];
+ outparms->acl  = outs[4] >> 32;
+ outparms->lkey = (u32)(outs[5] >> 32);
+ outparms->rkey = (u32)(outs[5] & (0xffffffff));
 
  return ret;
 }
@@ -957,19 +751,10 @@ u64 hipz_h_query_mr(const struct ipz_ada
 u64 hipz_h_free_resource_mr(const struct ipz_adapter_handle adapter_handle,
        const struct ehca_mr *mr)
 {
- u64 dummy;
-
- return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-        adapter_handle.handle,    /* r4 */
-        mr->ipz_mr_handle.handle, /* r5 */
-        0, 0, 0, 0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+           adapter_handle.handle,    /* r4 */
+           mr->ipz_mr_handle.handle, /* r5 */
+           0, 0, 0, 0, 0);
 }
 
 u64 hipz_h_reregister_pmr(const struct ipz_adapter_handle adapter_handle,
@@ -982,28 +767,20 @@ u64 hipz_h_reregister_pmr(const struct i
      struct ehca_mr_hipzout_parms *outparms)
 {
  u64 ret;
- u64 dummy;
- u64 lkey_out, rkey_out;
-
- ret = ehca_hcall_7arg_7ret(H_REREGISTER_PMR,
-       adapter_handle.handle,    /* r4 */
-       mr->ipz_mr_handle.handle, /* r5 */
-       vaddr_in,              /* r6 */
-       length,                   /* r7 */
-       /* r8 */
-       ((((u64)access_ctrl) << 32ULL) | pd.value),
-       mr_addr_cb,               /* r9 */
-       0,
-       &dummy,                   /* r4 */
-       &outparms->vaddr,         /* r5 */
-       &lkey_out,                /* r6 */
-       &rkey_out,                /* r7 */
-       &dummy,
-       &dummy,
-       &dummy);
-
- outparms->lkey = (u32)lkey_out;
- outparms->rkey = (u32)rkey_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_REREGISTER_PMR, outs,
+    adapter_handle.handle,    /* r4 */
+    mr->ipz_mr_handle.handle, /* r5 */
+    vaddr_in,           /* r6 */
+    length,                   /* r7 */
+    /* r8 */
+    ((((u64)access_ctrl) << 32ULL) | pd.value),
+    mr_addr_cb,               /* r9 */
+    0, 0, 0);
+ outparms->vaddr = outs[1];
+ outparms->lkey = (u32)outs[2];
+ outparms->rkey = (u32)outs[3];
 
  return ret;
 }
@@ -1017,25 +794,18 @@ u64 hipz_h_register_smr(const struct ipz
    struct ehca_mr_hipzout_parms *outparms)
 {
  u64 ret;
- u64 dummy;
- u64 lkey_out, rkey_out;
-
- ret = ehca_hcall_7arg_7ret(H_REGISTER_SMR,
-       adapter_handle.handle,            /* r4 */
-       orig_mr->ipz_mr_handle.handle,    /* r5 */
-       vaddr_in,                         /* r6 */
-       (((u64)access_ctrl) << 32ULL),    /* r7 */
-       pd.value,                         /* r8 */
-       0, 0,
-       &(outparms->handle.handle),       /* r4 */
-       &dummy,                           /* r5 */
-       &lkey_out,                        /* r6 */
-       &rkey_out,                        /* r7 */
-       &dummy,
-       &dummy,
-       &dummy);
- outparms->lkey = (u32)lkey_out;
- outparms->rkey = (u32)rkey_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_REGISTER_SMR, outs,
+    adapter_handle.handle,            /* r4 */
+    orig_mr->ipz_mr_handle.handle,    /* r5 */
+    vaddr_in,                         /* r6 */
+    (((u64)access_ctrl) << 32ULL),    /* r7 */
+    pd.value,                         /* r8 */
+    0, 0, 0, 0);
+ outparms->handle.handle = outs[0];
+ outparms->lkey = (u32)outs[2];
+ outparms->rkey = (u32)outs[3];
 
  return ret;
 }
@@ -1046,23 +816,15 @@ u64 hipz_h_alloc_resource_mw(const struc
         struct ehca_mw_hipzout_parms *outparms)
 {
  u64 ret;
- u64 dummy;
- u64 rkey_out;
-
- ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-       adapter_handle.handle,      /* r4 */
-       6,                          /* r5 */
-       pd.value,                   /* r6 */
-       0, 0, 0, 0,
-       &(outparms->handle.handle), /* r4 */
-       &dummy,                     /* r5 */
-       &dummy,                     /* r6 */
-       &rkey_out,                  /* r7 */
-       &dummy,
-       &dummy,
-       &dummy);
-
- outparms->rkey = (u32)rkey_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+    adapter_handle.handle,      /* r4 */
+    6,                          /* r5 */
+    pd.value,                   /* r6 */
+    0, 0, 0, 0, 0, 0);
+ outparms->handle.handle = outs[0];
+ outparms->rkey = (u32)outs[3];
 
  return ret;
 }
@@ -1072,21 +834,13 @@ u64 hipz_h_query_mw(const struct ipz_ada
       struct ehca_mw_hipzout_parms *outparms)
 {
  u64 ret;
- u64 dummy;
- u64 pd_out, rkey_out;
-
- ret = ehca_hcall_7arg_7ret(H_QUERY_MW,
-       adapter_handle.handle,    /* r4 */
-       mw->ipz_mw_handle.handle, /* r5 */
-       0, 0, 0, 0, 0,
-       &dummy,                   /* r4 */
-       &dummy,                   /* r5 */
-       &dummy,                   /* r6 */
-       &rkey_out,                /* r7 */
-       &pd_out,                  /* r8 */
-       &dummy,
-       &dummy);
- outparms->rkey = (u32)rkey_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_QUERY_MW, outs,
+    adapter_handle.handle,    /* r4 */
+    mw->ipz_mw_handle.handle, /* r5 */
+    0, 0, 0, 0, 0, 0, 0);
+ outparms->rkey = (u32)outs[3];
 
  return ret;
 }
@@ -1094,19 +848,10 @@ u64 hipz_h_query_mw(const struct ipz_ada
 u64 hipz_h_free_resource_mw(const struct ipz_adapter_handle adapter_handle,
        const struct ehca_mw *mw)
 {
- u64 dummy;
-
- return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-        adapter_handle.handle,    /* r4 */
-        mw->ipz_mw_handle.handle, /* r5 */
-        0, 0, 0, 0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+           adapter_handle.handle,    /* r4 */
+           mw->ipz_mw_handle.handle, /* r5 */
+           0, 0, 0, 0, 0);
 }
 
 u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle,
@@ -1114,7 +859,6 @@ u64 hipz_h_error_data(const struct ipz_a
         void *rblock,
         unsigned long *byte_count)
 {
- u64 dummy;
  u64 r_cb = virt_to_abs(rblock);
 
  if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -1122,16 +866,9 @@ u64 hipz_h_error_data(const struct ipz_a
   return H_PARAMETER;
  }
 
- return ehca_hcall_7arg_7ret(H_ERROR_DATA,
-        adapter_handle.handle,
-        ressource_handle,
-        r_cb,
-        0, 0, 0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_ERROR_DATA,
+           adapter_handle.handle,
+           ressource_handle,
+           r_cb,
+           0, 0, 0, 0);
 }
diff --git a/drivers/infiniband/hw/ehca/hcp_if.h b/drivers/infiniband/hw/ehca/hcp_if.h
index 39956d8..587ebd4 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.h
+++ b/drivers/infiniband/hw/ehca/hcp_if.h
@@ -107,7 +107,7 @@ u64 hipz_h_register_rpage_eq(const struc
         const u64 logical_address_of_page,
         const u64 count);
 
-u32 hipz_h_query_int_state(const struct ipz_adapter_handle
+u64 hipz_h_query_int_state(const struct ipz_adapter_handle
       hcp_adapter_handle,
       u32 ist);
 
diff --git a/drivers/infiniband/hw/ehca/hipz_hw.h b/drivers/infiniband/hw/ehca/hipz_hw.h
index f5f4871..3fc92b0 100644
--- a/drivers/infiniband/hw/ehca/hipz_hw.h
+++ b/drivers/infiniband/hw/ehca/hipz_hw.h
@@ -184,8 +184,6 @@ struct hipz_mrmwmm {
 
 };
 
-#define MRX_HCR_LPARID_VALID EHCA_BMASK_IBM(0,0)
-
 #define MRMWMM_OFFSET(x) offsetof(struct hipz_mrmwmm,x)
 
 struct hipz_qpedmm {
diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.h b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
index 7e55a31..2f13509 100644
--- a/drivers/infiniband/hw/ehca/ipz_pt_fn.h
+++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
@@ -226,10 +226,9 @@ static inline void *ipz_eqit_eq_get_inc_
 {
  void *ret = ipz_qeit_get(queue);
  u32 qe = *(u8 *) ret;
- if ((qe >> 7) == (queue->toggle_state & 1))
-  ipz_qeit_eq_get_inc(queue); /* this is a good one */
- else
-  ret = NULL;
+ if ((qe >> 7) != (queue->toggle_state & 1))
+  return NULL;
+ ipz_qeit_eq_get_inc(queue); /* this is a good one */
  return ret;
 }
 
-------------- next part --------------
diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index 159b0be..0a0248f 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -5,6 +5,7 @@
  *
  *  Authors: Heiko J Schick <schickhj at de.ibm.com>
  *           Hoang-Nam Nguyen <hnguyen at de.ibm.com>
+ *           Joachim Fenkes <fenkes at de.ibm.com>
  *
  *  Copyright (c) 2005 IBM Corporation
  *
@@ -48,7 +49,7 @@ #include "hcp_if.h"
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch <raisch at de.ibm.com>");
 MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver");
-MODULE_VERSION("SVNEHCA_0015");
+MODULE_VERSION("SVNEHCA_0016");
 
 int ehca_open_aqp1     = 0;
 int ehca_debug_level   = 0;
@@ -268,7 +269,7 @@ int ehca_register_device(struct ehca_shc
 		(1ull << IB_USER_VERBS_CMD_ATTACH_MCAST)	|
 		(1ull << IB_USER_VERBS_CMD_DETACH_MCAST);
 
-	shca->ib_device.node_type           = RDMA_NODE_IB_CA;
+	shca->ib_device.node_type           = IB_NODE_CA;
 	shca->ib_device.phys_port_cnt       = shca->num_ports;
 	shca->ib_device.dma_device          = &shca->ibmebus_dev->ofdev.dev;
 	shca->ib_device.query_device        = ehca_query_device;
@@ -446,7 +447,7 @@ static ssize_t  ehca_show_##name(struct 
 		kfree(rblock);					   	   \
 		return 0;					   	   \
 	}								   \
-									   \
+                                                                           \
 	data = rblock->name;                                               \
 	kfree(rblock);                                                     \
 									   \
@@ -749,7 +750,7 @@ int __init ehca_module_init(void)
 	int ret;
 
 	printk(KERN_INFO "eHCA Infiniband Device Driver "
-	                 "(Rel.: SVNEHCA_0015)\n");
+	                 "(Rel.: SVNEHCA_0016)\n");
 	idr_init(&ehca_qp_idr);
 	idr_init(&ehca_cq_idr);
 	spin_lock_init(&ehca_qp_idr_lock);
diff --git a/drivers/infiniband/hw/ehca/hcp_if.c b/drivers/infiniband/hw/ehca/hcp_if.c
index 260e82a..3fb46e6 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.c
+++ b/drivers/infiniband/hw/ehca/hcp_if.c
@@ -48,27 +48,27 @@ #include "hcp_phyp.h"
 #include "hipz_fns.h"
 #include "ipz_pt_fn.h"
 
-#define H_ALL_RES_QP_ENHANCED_OPS       EHCA_BMASK_IBM(9,11)
-#define H_ALL_RES_QP_PTE_PIN            EHCA_BMASK_IBM(12,12)
-#define H_ALL_RES_QP_SERVICE_TYPE       EHCA_BMASK_IBM(13,15)
-#define H_ALL_RES_QP_LL_RQ_CQE_POSTING  EHCA_BMASK_IBM(18,18)
-#define H_ALL_RES_QP_LL_SQ_CQE_POSTING  EHCA_BMASK_IBM(19,21)
-#define H_ALL_RES_QP_SIGNALING_TYPE     EHCA_BMASK_IBM(22,23)
-#define H_ALL_RES_QP_UD_AV_LKEY_CTRL    EHCA_BMASK_IBM(31,31)
-#define H_ALL_RES_QP_RESOURCE_TYPE      EHCA_BMASK_IBM(56,63)
-
-#define H_ALL_RES_QP_MAX_OUTST_SEND_WR  EHCA_BMASK_IBM(0,15)
-#define H_ALL_RES_QP_MAX_OUTST_RECV_WR  EHCA_BMASK_IBM(16,31)
-#define H_ALL_RES_QP_MAX_SEND_SGE       EHCA_BMASK_IBM(32,39)
-#define H_ALL_RES_QP_MAX_RECV_SGE       EHCA_BMASK_IBM(40,47)
-
-#define H_ALL_RES_QP_ACT_OUTST_SEND_WR  EHCA_BMASK_IBM(16,31)
-#define H_ALL_RES_QP_ACT_OUTST_RECV_WR  EHCA_BMASK_IBM(48,63)
-#define H_ALL_RES_QP_ACT_SEND_SGE       EHCA_BMASK_IBM(8,15)
-#define H_ALL_RES_QP_ACT_RECV_SGE       EHCA_BMASK_IBM(24,31)
-
-#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(0,31)
-#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(32,63)
+#define H_ALL_RES_QP_ENHANCED_OPS       EHCA_BMASK_IBM(9, 11)
+#define H_ALL_RES_QP_PTE_PIN            EHCA_BMASK_IBM(12, 12)
+#define H_ALL_RES_QP_SERVICE_TYPE       EHCA_BMASK_IBM(13, 15)
+#define H_ALL_RES_QP_LL_RQ_CQE_POSTING  EHCA_BMASK_IBM(18, 18)
+#define H_ALL_RES_QP_LL_SQ_CQE_POSTING  EHCA_BMASK_IBM(19, 21)
+#define H_ALL_RES_QP_SIGNALING_TYPE     EHCA_BMASK_IBM(22, 23)
+#define H_ALL_RES_QP_UD_AV_LKEY_CTRL    EHCA_BMASK_IBM(31, 31)
+#define H_ALL_RES_QP_RESOURCE_TYPE      EHCA_BMASK_IBM(56, 63)
+
+#define H_ALL_RES_QP_MAX_OUTST_SEND_WR  EHCA_BMASK_IBM(0, 15)
+#define H_ALL_RES_QP_MAX_OUTST_RECV_WR  EHCA_BMASK_IBM(16, 31)
+#define H_ALL_RES_QP_MAX_SEND_SGE       EHCA_BMASK_IBM(32, 39)
+#define H_ALL_RES_QP_MAX_RECV_SGE       EHCA_BMASK_IBM(40, 47)
+
+#define H_ALL_RES_QP_ACT_OUTST_SEND_WR  EHCA_BMASK_IBM(16, 31)
+#define H_ALL_RES_QP_ACT_OUTST_RECV_WR  EHCA_BMASK_IBM(48, 63)
+#define H_ALL_RES_QP_ACT_SEND_SGE       EHCA_BMASK_IBM(8, 15)
+#define H_ALL_RES_QP_ACT_RECV_SGE       EHCA_BMASK_IBM(24, 31)
+
+#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(0, 31)
+#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(32, 63)
 
 /* direct access qp controls */
 #define DAQP_CTRL_ENABLE    0x01
@@ -95,35 +95,25 @@ static u32 get_longbusy_msecs(int longbu
 	}
 }
 
-static long ehca_hcall_7arg_7ret(unsigned long opcode,
-				 unsigned long arg1,
-				 unsigned long arg2,
-				 unsigned long arg3,
-				 unsigned long arg4,
-				 unsigned long arg5,
-				 unsigned long arg6,
-				 unsigned long arg7,
-				 unsigned long *out1,
-				 unsigned long *out2,
-				 unsigned long *out3,
-				 unsigned long *out4,
-				 unsigned long *out5,
-				 unsigned long *out6,
-				 unsigned long *out7)
+static long ehca_plpar_hcall_norets(unsigned long opcode,
+				    unsigned long arg1,
+				    unsigned long arg2,
+				    unsigned long arg3,
+				    unsigned long arg4,
+				    unsigned long arg5,
+				    unsigned long arg6,
+				    unsigned long arg7)
 {
 	long ret;
 	int i, sleep_msecs;
 
-	ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx arg5=%lx "
-		     "arg6=%lx arg7=%lx", opcode, arg1, arg2, arg3, arg4, arg5,
-		     arg6, arg7);
+	ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx "
+		     "arg5=%lx arg6=%lx arg7=%lx",
+		     opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7);
 
 	for (i = 0; i < 5; i++) {
-		ret = plpar_hcall_7arg_7ret(opcode,
-					    arg1, arg2, arg3, arg4,
-					    arg5, arg6, arg7,
-					    out1, out2, out3, out4,
-					    out5, out6,out7);
+		ret = plpar_hcall_norets(opcode, arg1, arg2, arg3, arg4,
+					 arg5, arg6, arg7);
 
 		if (H_IS_LONG_BUSY(ret)) {
 			sleep_msecs = get_longbusy_msecs(ret);
@@ -134,44 +124,30 @@ static long ehca_hcall_7arg_7ret(unsigne
 		if (ret < H_SUCCESS)
 			ehca_gen_err("opcode=%lx ret=%lx"
 				     " arg1=%lx arg2=%lx arg3=%lx arg4=%lx"
-				     " arg5=%lx arg6=%lx arg7=%lx"
-				     " out1=%lx out2=%lx out3=%lx out4=%lx"
-				     " out5=%lx out6=%lx out7=%lx",
+				     " arg5=%lx arg6=%lx arg7=%lx ",
 				     opcode, ret,
-				     arg1, arg2, arg3, arg4,
-				     arg5, arg6, arg7,
-				     *out1, *out2, *out3, *out4,
-				     *out5, *out6, *out7);
+				     arg1, arg2, arg3, arg4, arg5,
+				     arg6, arg7);
 
-		ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
-			     "out4=%lx out5=%lx out6=%lx out7=%lx",
-			     opcode, ret, *out1, *out2, *out3, *out4, *out5,
-			     *out6, *out7);
+		ehca_gen_dbg("opcode=%lx ret=%lx", opcode, ret);
 		return ret;
+
 	}
 
 	return H_BUSY;
 }
 
-static long ehca_hcall_9arg_9ret(unsigned long opcode,
-				 unsigned long arg1,
-				 unsigned long arg2,
-				 unsigned long arg3,
-				 unsigned long arg4,
-				 unsigned long arg5,
-				 unsigned long arg6,
-				 unsigned long arg7,
-				 unsigned long arg8,
-				 unsigned long arg9,
-				 unsigned long *out1,
-				 unsigned long *out2,
-				 unsigned long *out3,
-				 unsigned long *out4,
-				 unsigned long *out5,
-				 unsigned long *out6,
-				 unsigned long *out7,
-				 unsigned long *out8,
-				 unsigned long *out9)
+static long ehca_plpar_hcall9(unsigned long opcode,
+			      unsigned long *outs, /* array of 9 outputs */
+			      unsigned long arg1,
+			      unsigned long arg2,
+			      unsigned long arg3,
+			      unsigned long arg4,
+			      unsigned long arg5,
+			      unsigned long arg6,
+			      unsigned long arg7,
+			      unsigned long arg8,
+			      unsigned long arg9)
 {
 	long ret;
 	int i, sleep_msecs;
@@ -182,13 +158,9 @@ static long ehca_hcall_9arg_9ret(unsigne
 		     arg8, arg9);
 
 	for (i = 0; i < 5; i++) {
-		ret = plpar_hcall_9arg_9ret(opcode,
-					    arg1, arg2, arg3, arg4,
-					    arg5, arg6, arg7, arg8,
-					    arg9,
-					    out1, out2, out3, out4,
-					    out5, out6, out7, out8,
-					    out9);
+		ret = plpar_hcall9(opcode, outs,
+				   arg1, arg2, arg3, arg4, arg5,
+				   arg6, arg7, arg8, arg9);
 
 		if (H_IS_LONG_BUSY(ret)) {
 			sleep_msecs = get_longbusy_msecs(ret);
@@ -205,37 +177,35 @@ static long ehca_hcall_9arg_9ret(unsigne
 				     " out5=%lx out6=%lx out7=%lx out8=%lx"
 				     " out9=%lx",
 				     opcode, ret,
-				     arg1, arg2, arg3, arg4,
-				     arg5, arg6, arg7, arg8,
-				     arg9,
-				     *out1, *out2, *out3, *out4,
-				     *out5, *out6, *out7, *out8,
-				     *out9);
+				     arg1, arg2, arg3, arg4, arg5,
+				     arg6, arg7, arg8, arg9,
+				     outs[0], outs[1], outs[2], outs[3],
+				     outs[4], outs[5], outs[6], outs[7],
+				     outs[8]);
 
 		ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
 			     "out4=%lx out5=%lx out6=%lx out7=%lx out8=%lx "
-			     "out9=%lx", opcode, ret,*out1, *out2, *out3, *out4,
-			     *out5, *out6, *out7, *out8, *out9);
+			     "out9=%lx",
+			     opcode, ret, outs[0], outs[1], outs[2], outs[3],
+			     outs[4], outs[5], outs[6], outs[7], outs[8]);
 		return ret;
 
 	}
 
 	return H_BUSY;
 }
-
 u64 hipz_h_alloc_resource_eq(const struct ipz_adapter_handle adapter_handle,
 			     struct ehca_pfeq *pfeq,
 			     const u32 neq_control,
 			     const u32 number_of_entries,
 			     struct ipz_eq_handle *eq_handle,
-			     u32 * act_nr_of_entries,
-			     u32 * act_pages,
-			     u32 * eq_ist)
+			     u32 *act_nr_of_entries,
+			     u32 *act_pages,
+			     u32 *eq_ist)
 {
 	u64 ret;
-	u64 dummy;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
 	u64 allocate_controls;
-	u64 act_nr_of_entries_out, act_pages_out, eq_ist_out;
 
 	/* resource type */
 	allocate_controls = 3ULL;
@@ -246,22 +216,15 @@ u64 hipz_h_alloc_resource_eq(const struc
 	else /* notification event queue */
 		allocate_controls = (1ULL << 63) | allocate_controls;
 
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-				   adapter_handle.handle,  /* r4 */
-				   allocate_controls,      /* r5 */
-				   number_of_entries,      /* r6 */
-				   0, 0, 0, 0,
-				   &eq_handle->handle,     /* r4 */
-				   &dummy,	           /* r5 */
-				   &dummy,	           /* r6 */
-				   &act_nr_of_entries_out, /* r7 */
-				   &act_pages_out,	   /* r8 */
-				   &eq_ist_out,            /* r8 */
-				   &dummy);
-
-	*act_nr_of_entries = (u32)act_nr_of_entries_out;
-	*act_pages         = (u32)act_pages_out;
-	*eq_ist            = (u32)eq_ist_out;
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle,  /* r4 */
+				allocate_controls,      /* r5 */
+				number_of_entries,      /* r6 */
+				0, 0, 0, 0, 0, 0);
+	eq_handle->handle = outs[0];
+	*act_nr_of_entries = (u32)outs[3];
+	*act_pages = (u32)outs[4];
+	*eq_ist = (u32)outs[5];
 
 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Not enough resource - ret=%lx ", ret);
@@ -273,20 +236,11 @@ u64 hipz_h_reset_event(const struct ipz_
 		       struct ipz_eq_handle eq_handle,
 		       const u64 event_mask)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_RESET_EVENTS,
-				    adapter_handle.handle, /* r4 */
-				    eq_handle.handle,      /* r5 */
-				    event_mask,	           /* r6 */
-				    0, 0, 0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_RESET_EVENTS,
+				       adapter_handle.handle, /* r4 */
+				       eq_handle.handle,      /* r5 */
+				       event_mask,	      /* r6 */
+				       0, 0, 0, 0);
 }
 
 u64 hipz_h_alloc_resource_cq(const struct ipz_adapter_handle adapter_handle,
@@ -294,30 +248,21 @@ u64 hipz_h_alloc_resource_cq(const struc
 			     struct ehca_alloc_cq_parms *param)
 {
 	u64 ret;
-	u64 dummy;
-	u64 act_nr_of_entries_out, act_pages_out;
-	u64 g_la_privileged_out, g_la_user_out;
-
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-				   adapter_handle.handle,     /* r4  */
-				   2,	                      /* r5  */
-				   param->eq_handle.handle,   /* r6  */
-				   cq->token,	              /* r7  */
-				   param->nr_cqe,             /* r8  */
-				   0, 0,
-				   &cq->ipz_cq_handle.handle, /* r4  */
-				   &dummy,	              /* r5  */
-				   &dummy,	              /* r6  */
-				   &act_nr_of_entries_out,    /* r7  */
-				   &act_pages_out,	      /* r8  */
-				   &g_la_privileged_out,      /* r9  */
-				   &g_la_user_out);           /* r10 */
-
-	param->act_nr_of_entries = (u32)act_nr_of_entries_out;
-	param->act_pages = (u32)act_pages_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle,   /* r4  */
+				2,	                 /* r5  */
+				param->eq_handle.handle, /* r6  */
+				cq->token,	         /* r7  */
+				param->nr_cqe,           /* r8  */
+				0, 0, 0, 0);
+	cq->ipz_cq_handle.handle = outs[0];
+	param->act_nr_of_entries = (u32)outs[3];
+	param->act_pages = (u32)outs[4];
 
 	if (ret == H_SUCCESS)
-		hcp_galpas_ctor(&cq->galpas, g_la_privileged_out, g_la_user_out);
+		hcp_galpas_ctor(&cq->galpas, outs[5], outs[6]);
 
 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Not enough resources. ret=%lx", ret);
@@ -330,8 +275,9 @@ u64 hipz_h_alloc_resource_qp(const struc
 			     struct ehca_alloc_qp_parms *parms)
 {
 	u64 ret;
-	u64 dummy, allocate_controls, max_r10_reg;
-	u64 qp_nr_out, r6_out, r7_out, r8_out, g_la_user_out, r11_out;
+	u64 allocate_controls;
+	u64 max_r10_reg;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
 	u16 max_nr_receive_wqes = qp->init_attr.cap.max_recv_wr + 1;
 	u16 max_nr_send_wqes = qp->init_attr.cap.max_send_wr + 1;
 	int daqp_ctrl = parms->daqp_ctrl;
@@ -360,48 +306,36 @@ u64 hipz_h_alloc_resource_qp(const struc
 		| EHCA_BMASK_SET(H_ALL_RES_QP_MAX_RECV_SGE,
 				 parms->max_recv_sge);
 
-
-	ret = ehca_hcall_9arg_9ret(H_ALLOC_RESOURCE,
-				   adapter_handle.handle,	      /* r4  */
-				   allocate_controls,	              /* r5  */
-				   qp->send_cq->ipz_cq_handle.handle,
-				   qp->recv_cq->ipz_cq_handle.handle,
-				   parms->ipz_eq_handle.handle,
-				   ((u64)qp->token << 32) | parms->pd.value,
-				   max_r10_reg,	                      /* r10 */
-				   parms->ud_av_l_key_ctl,            /* r11 */
-				   0,
-				   &qp->ipz_qp_handle.handle,
-				   &qp_nr_out,	                      /* r5  */
-				   &r6_out,	                      /* r6  */
-				   &r7_out,	                      /* r7  */
-				   &r8_out,	                      /* r8  */
-				   &dummy,	                      /* r9  */
-				   &g_la_user_out,	              /* r10 */
-				   &r11_out,
-				   &dummy);
-
-	/* extract outputs */
-	qp->real_qp_num = (u32)qp_nr_out;
-
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle,	           /* r4  */
+				allocate_controls,	           /* r5  */
+				qp->send_cq->ipz_cq_handle.handle,
+				qp->recv_cq->ipz_cq_handle.handle,
+				parms->ipz_eq_handle.handle,
+				((u64)qp->token << 32) | parms->pd.value,
+				max_r10_reg,	                   /* r10 */
+				parms->ud_av_l_key_ctl,            /* r11 */
+				0);
+	qp->ipz_qp_handle.handle = outs[0];
+	qp->real_qp_num = (u32)outs[1];
 	parms->act_nr_send_sges =
-		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, r6_out);
+		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, outs[2]);
 	parms->act_nr_recv_wqes =
-		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, r6_out);
+		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, outs[2]);
 	parms->act_nr_send_sges =
-		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, r7_out);
+		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, outs[3]);
 	parms->act_nr_recv_sges =
-		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, r7_out);
+		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, outs[3]);
 	parms->nr_sq_pages =
-		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, r8_out);
+		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, outs[4]);
 	parms->nr_rq_pages =
-		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, r8_out);
+		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, outs[4]);
 
 	if (ret == H_SUCCESS)
-		hcp_galpas_ctor(&qp->galpas, g_la_user_out, g_la_user_out);
+		hcp_galpas_ctor(&qp->galpas, outs[6], outs[6]);
 
 	if (ret == H_NOT_ENOUGH_RESOURCES)
-		ehca_gen_err("Not enough resources. ret=%lx",ret);
+		ehca_gen_err("Not enough resources. ret=%lx", ret);
 
 	return ret;
 }
@@ -411,7 +345,6 @@ u64 hipz_h_query_port(const struct ipz_a
 		      struct hipz_query_port *query_port_response_block)
 {
 	u64 ret;
-	u64 dummy;
 	u64 r_cb = virt_to_abs(query_port_response_block);
 
 	if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -419,18 +352,11 @@ u64 hipz_h_query_port(const struct ipz_a
 		return H_PARAMETER;
 	}
 
-	ret = ehca_hcall_7arg_7ret(H_QUERY_PORT,
-				   adapter_handle.handle, /* r4 */
-				   port_id,	          /* r5 */
-				   r_cb,	          /* r6 */
-				   0, 0, 0, 0,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
+	ret = ehca_plpar_hcall_norets(H_QUERY_PORT,
+				      adapter_handle.handle, /* r4 */
+				      port_id,	             /* r5 */
+				      r_cb,	             /* r6 */
+				      0, 0, 0, 0);
 
 	if (ehca_debug_level)
 		ehca_dmp(query_port_response_block, 64, "response_block");
@@ -441,7 +367,6 @@ u64 hipz_h_query_port(const struct ipz_a
 u64 hipz_h_query_hca(const struct ipz_adapter_handle adapter_handle,
 		     struct hipz_query_hca *query_hca_rblock)
 {
-	u64 dummy;
 	u64 r_cb = virt_to_abs(query_hca_rblock);
 
 	if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -450,17 +375,10 @@ u64 hipz_h_query_hca(const struct ipz_ad
 		return H_PARAMETER;
 	}
 
-	return ehca_hcall_7arg_7ret(H_QUERY_HCA,
-				    adapter_handle.handle, /* r4 */
-				    r_cb,                  /* r5 */
-				    0, 0, 0, 0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_QUERY_HCA,
+				       adapter_handle.handle, /* r4 */
+				       r_cb,                  /* r5 */
+				       0, 0, 0, 0, 0);
 }
 
 u64 hipz_h_register_rpage(const struct ipz_adapter_handle adapter_handle,
@@ -470,22 +388,13 @@ u64 hipz_h_register_rpage(const struct i
 			  const u64 logical_address_of_page,
 			  u64 count)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_REGISTER_RPAGES,
-				    adapter_handle.handle,      /* r4  */
-				    queue_type | pagesize << 8, /* r5  */
-				    resource_handle,	        /* r6  */
-				    logical_address_of_page,    /* r7  */
-				    count,	                /* r8  */
-				    0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_REGISTER_RPAGES,
+				       adapter_handle.handle,      /* r4  */
+				       queue_type | pagesize << 8, /* r5  */
+				       resource_handle,	           /* r6  */
+				       logical_address_of_page,    /* r7  */
+				       count,	                   /* r8  */
+				       0, 0);
 }
 
 u64 hipz_h_register_rpage_eq(const struct ipz_adapter_handle adapter_handle,
@@ -507,23 +416,14 @@ u64 hipz_h_register_rpage_eq(const struc
 				     logical_address_of_page, count);
 }
 
-u32 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle,
+u64 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle,
 			   u32 ist)
 {
-	u32 ret;
-	u64 dummy;
-
-	ret = ehca_hcall_7arg_7ret(H_QUERY_INT_STATE,
-				   adapter_handle.handle, /* r4 */
-				   ist,                   /* r5 */
-				   0, 0, 0, 0, 0,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
+	u64 ret;
+	ret = ehca_plpar_hcall_norets(H_QUERY_INT_STATE,
+				      adapter_handle.handle, /* r4 */
+				      ist,                   /* r5 */
+				      0, 0, 0, 0, 0);
 
 	if (ret != H_SUCCESS && ret != H_BUSY)
 		ehca_gen_err("Could not query interrupt state.");
@@ -576,25 +476,20 @@ u64 hipz_h_disable_and_get_wqe(const str
 			       void **log_addr_next_rq_wqe2processed,
 			       int dis_and_get_function_code)
 {
-	u64 dummy, dummy1, dummy2;
-
-	if (!log_addr_next_sq_wqe2processed)
-		log_addr_next_sq_wqe2processed = (void**)&dummy1;
-	if (!log_addr_next_rq_wqe2processed)
-		log_addr_next_rq_wqe2processed = (void**)&dummy2;
-
-	return ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
-				    adapter_handle.handle,     /* r4 */
-				    dis_and_get_function_code, /* r5 */
-				    qp_handle.handle,	       /* r6 */
-				    0, 0, 0, 0,
-				    (void*)log_addr_next_sq_wqe2processed,
-				    (void*)log_addr_next_rq_wqe2processed,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	u64 ret;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs,
+				adapter_handle.handle,     /* r4 */
+				dis_and_get_function_code, /* r5 */
+				qp_handle.handle,	   /* r6 */
+				0, 0, 0, 0, 0, 0);
+	if (log_addr_next_sq_wqe2processed)
+		*log_addr_next_sq_wqe2processed = (void*)outs[0];
+	if (log_addr_next_rq_wqe2processed)
+		*log_addr_next_rq_wqe2processed = (void*)outs[1];
+
+	return ret;
 }
 
 u64 hipz_h_modify_qp(const struct ipz_adapter_handle adapter_handle,
@@ -605,22 +500,13 @@ u64 hipz_h_modify_qp(const struct ipz_ad
 		     struct h_galpa gal)
 {
 	u64 ret;
-	u64 dummy;
-	u64 invalid_attribute_identifier, rc_attrib_mask;
-
-	ret = ehca_hcall_7arg_7ret(H_MODIFY_QP,
-				   adapter_handle.handle,         /* r4 */
-				   qp_handle.handle,	          /* r5 */
-				   update_mask,	                  /* r6 */
-				   virt_to_abs(mqpcb),	          /* r7 */
-				   0, 0, 0,
-				   &invalid_attribute_identifier, /* r4 */
-				   &dummy,	                  /* r5 */
-				   &dummy,	                  /* r6 */
-				   &dummy,                        /* r7 */
-				   &dummy,	                  /* r8 */
-				   &rc_attrib_mask,               /* r9 */
-				   &dummy);
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+	ret = ehca_plpar_hcall9(H_MODIFY_QP, outs,
+				adapter_handle.handle, /* r4 */
+				qp_handle.handle,      /* r5 */
+				update_mask,	       /* r6 */
+				virt_to_abs(mqpcb),    /* r7 */
+				0, 0, 0, 0, 0);
 
 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Insufficient resources ret=%lx", ret);
@@ -634,61 +520,37 @@ u64 hipz_h_query_qp(const struct ipz_ada
 		    struct hcp_modify_qp_control_block *qqpcb,
 		    struct h_galpa gal)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_QUERY_QP,
-				    adapter_handle.handle, /* r4 */
-				    qp_handle.handle,      /* r5 */
-				    virt_to_abs(qqpcb),	   /* r6 */
-				    0, 0, 0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_QUERY_QP,
+				       adapter_handle.handle, /* r4 */
+				       qp_handle.handle,      /* r5 */
+				       virt_to_abs(qqpcb),    /* r6 */
+				       0, 0, 0, 0);
 }
 
 u64 hipz_h_destroy_qp(const struct ipz_adapter_handle adapter_handle,
 		      struct ehca_qp *qp)
 {
 	u64 ret;
-	u64 dummy;
-	u64 ladr_next_sq_wqe_out, ladr_next_rq_wqe_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
 
 	ret = hcp_galpas_dtor(&qp->galpas);
 	if (ret) {
 		ehca_gen_err("Could not destruct qp->galpas");
 		return H_RESOURCE;
 	}
-	ret = ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
-				   adapter_handle.handle,     /* r4 */
-				   /* function code */
-				   1,	                      /* r5 */
-				   qp->ipz_qp_handle.handle,  /* r6 */
-				   0, 0, 0, 0,
-				   &ladr_next_sq_wqe_out,     /* r4 */
-				   &ladr_next_rq_wqe_out,     /* r5 */
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
+	ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs,
+				adapter_handle.handle,     /* r4 */
+				/* function code */
+				1,	                   /* r5 */
+				qp->ipz_qp_handle.handle,  /* r6 */
+				0, 0, 0, 0, 0, 0);
 	if (ret == H_HARDWARE)
 		ehca_gen_err("HCA not operational. ret=%lx", ret);
 
-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				   adapter_handle.handle,     /* r4 */
-				   qp->ipz_qp_handle.handle,  /* r5 */
-				   0, 0, 0, 0, 0,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				      adapter_handle.handle,     /* r4 */
+				      qp->ipz_qp_handle.handle,  /* r5 */
+				      0, 0, 0, 0, 0);
 
 	if (ret == H_RESOURCE)
 		ehca_gen_err("Resource still in use. ret=%lx", ret);
@@ -701,20 +563,11 @@ u64 hipz_h_define_aqp0(const struct ipz_
 		       struct h_galpa gal,
 		       u32 port)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_DEFINE_AQP0,
-				    adapter_handle.handle, /* r4 */
-				    qp_handle.handle,      /* r5 */
-				    port,                  /* r6 */
-				    0, 0, 0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_DEFINE_AQP0,
+				       adapter_handle.handle, /* r4 */
+				       qp_handle.handle,      /* r5 */
+				       port,                  /* r6 */
+				       0, 0, 0, 0);
 }
 
 u64 hipz_h_define_aqp1(const struct ipz_adapter_handle adapter_handle,
@@ -724,24 +577,15 @@ u64 hipz_h_define_aqp1(const struct ipz_
 		       u32 * bma_qp_nr)
 {
 	u64 ret;
-	u64 dummy;
-	u64 pma_qp_nr_out, bma_qp_nr_out;
-
-	ret = ehca_hcall_7arg_7ret(H_DEFINE_AQP1,
-				   adapter_handle.handle, /* r4 */
-				   qp_handle.handle,      /* r5 */
-				   port,	          /* r6 */
-				   0, 0, 0, 0,
-				   &pma_qp_nr_out,        /* r4 */
-				   &bma_qp_nr_out,        /* r5 */
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
-
-	*pma_qp_nr = (u32)pma_qp_nr_out;
-	*bma_qp_nr = (u32)bma_qp_nr_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_DEFINE_AQP1, outs,
+				adapter_handle.handle, /* r4 */
+				qp_handle.handle,      /* r5 */
+				port,	               /* r6 */
+				0, 0, 0, 0, 0, 0);
+	*pma_qp_nr = (u32)outs[0];
+	*bma_qp_nr = (u32)outs[1];
 
 	if (ret == H_ALIAS_EXIST)
 		ehca_gen_err("AQP1 already exists. ret=%lx", ret);
@@ -756,22 +600,14 @@ u64 hipz_h_attach_mcqp(const struct ipz_
 		       u64 subnet_prefix, u64 interface_id)
 {
 	u64 ret;
-	u64 dummy;
-
-	ret = ehca_hcall_7arg_7ret(H_ATTACH_MCQP,
-				   adapter_handle.handle,     /* r4 */
-				   qp_handle.handle,          /* r5 */
-				   mcg_dlid,                  /* r6 */
-				   interface_id,              /* r7 */
-				   subnet_prefix,             /* r8 */
-				   0, 0,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
+
+	ret = ehca_plpar_hcall_norets(H_ATTACH_MCQP,
+				      adapter_handle.handle,  /* r4 */
+				      qp_handle.handle,       /* r5 */
+				      mcg_dlid,               /* r6 */
+				      interface_id,           /* r7 */
+				      subnet_prefix,          /* r8 */
+				      0, 0);
 
 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Not enough resources. ret=%lx", ret);
@@ -785,22 +621,13 @@ u64 hipz_h_detach_mcqp(const struct ipz_
 		       u16 mcg_dlid,
 		       u64 subnet_prefix, u64 interface_id)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_DETACH_MCQP,
-				    adapter_handle.handle, /* r4 */
-				    qp_handle.handle,	   /* r5 */
-				    mcg_dlid,	           /* r6 */
-				    interface_id,          /* r7 */
-				    subnet_prefix,         /* r8 */
-				    0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_DETACH_MCQP,
+				       adapter_handle.handle, /* r4 */
+				       qp_handle.handle,      /* r5 */
+				       mcg_dlid,              /* r6 */
+				       interface_id,          /* r7 */
+				       subnet_prefix,         /* r8 */
+				       0, 0);
 }
 
 u64 hipz_h_destroy_cq(const struct ipz_adapter_handle adapter_handle,
@@ -808,7 +635,6 @@ u64 hipz_h_destroy_cq(const struct ipz_a
 		      u8 force_flag)
 {
 	u64 ret;
-	u64 dummy;
 
 	ret = hcp_galpas_dtor(&cq->galpas);
 	if (ret) {
@@ -816,18 +642,11 @@ u64 hipz_h_destroy_cq(const struct ipz_a
 		return H_RESOURCE;
 	}
 
-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				   adapter_handle.handle,     /* r4 */
-				   cq->ipz_cq_handle.handle,  /* r5 */
-				   force_flag != 0 ? 1L : 0L, /* r6 */
-				   0, 0, 0, 0,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				      adapter_handle.handle,     /* r4 */
+				      cq->ipz_cq_handle.handle,  /* r5 */
+				      force_flag != 0 ? 1L : 0L, /* r6 */
+				      0, 0, 0, 0);
 
 	if (ret == H_RESOURCE)
 		ehca_gen_err("H_FREE_RESOURCE failed ret=%lx ", ret);
@@ -839,7 +658,6 @@ u64 hipz_h_destroy_eq(const struct ipz_a
 		      struct ehca_eq *eq)
 {
 	u64 ret;
-	u64 dummy;
 
 	ret = hcp_galpas_dtor(&eq->galpas);
 	if (ret) {
@@ -847,18 +665,10 @@ u64 hipz_h_destroy_eq(const struct ipz_a
 		return H_RESOURCE;
 	}
 
-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				   adapter_handle.handle,     /* r4 */
-				   eq->ipz_eq_handle.handle,  /* r5 */
-				   0, 0, 0, 0, 0,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
-
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				      adapter_handle.handle,     /* r4 */
+				      eq->ipz_eq_handle.handle,  /* r5 */
+				      0, 0, 0, 0, 0);
 
 	if (ret == H_RESOURCE)
 		ehca_gen_err("Resource in use. ret=%lx ", ret);
@@ -875,27 +685,19 @@ u64 hipz_h_alloc_resource_mr(const struc
 			     struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out;
-	u64 rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-				   adapter_handle.handle,            /* r4 */
-				   5,                                /* r5 */
-				   vaddr,                            /* r6 */
-				   length,                           /* r7 */
-				   (((u64)access_ctrl) << 32ULL),    /* r8 */
-				   pd.value,                         /* r9 */
-				   0,
-				   &(outparms->handle.handle),       /* r4 */
-				   &dummy,                           /* r5 */
-				   &lkey_out,                        /* r6 */
-				   &rkey_out,                        /* r7 */
-				   &dummy,
-				   &dummy,
-				   &dummy);
-	outparms->lkey = (u32)lkey_out;
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle,            /* r4 */
+				5,                                /* r5 */
+				vaddr,                            /* r6 */
+				length,                           /* r7 */
+				(((u64)access_ctrl) << 32ULL),    /* r8 */
+				pd.value,                         /* r9 */
+				0, 0, 0);
+	outparms->handle.handle = outs[0];
+	outparms->lkey = (u32)outs[2];
+	outparms->rkey = (u32)outs[3];
 
 	return ret;
 }
@@ -923,7 +725,6 @@ u64 hipz_h_register_rpage_mr(const struc
 					    queue_type,
 					    mr->ipz_mr_handle.handle,
 					    logical_address_of_page, count);
-
 	return ret;
 }
 
@@ -932,24 +733,17 @@ u64 hipz_h_query_mr(const struct ipz_ada
 		    struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 remote_len_out, remote_vaddr_out, acc_ctrl_pd_out, r9_out;
-
-	ret = ehca_hcall_7arg_7ret(H_QUERY_MR,
-				   adapter_handle.handle,     /* r4 */
-				   mr->ipz_mr_handle.handle,  /* r5 */
-				   0, 0, 0, 0, 0,
-				   &outparms->len,            /* r4 */
-				   &outparms->vaddr,          /* r5 */
-				   &remote_len_out,           /* r6 */
-				   &remote_vaddr_out,         /* r7 */
-				   &acc_ctrl_pd_out,          /* r8 */
-				   &r9_out,
-				   &dummy);
-
-	outparms->acl  = acc_ctrl_pd_out >> 32;
-	outparms->lkey = (u32)(r9_out >> 32);
-	outparms->rkey = (u32)(r9_out & (0xffffffff));
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_QUERY_MR, outs,
+				adapter_handle.handle,     /* r4 */
+				mr->ipz_mr_handle.handle,  /* r5 */
+				0, 0, 0, 0, 0, 0, 0);
+	outparms->len = outs[0];
+	outparms->vaddr = outs[1];
+	outparms->acl  = outs[4] >> 32;
+	outparms->lkey = (u32)(outs[5] >> 32);
+	outparms->rkey = (u32)(outs[5] & (0xffffffff));
 
 	return ret;
 }
@@ -957,19 +751,10 @@ u64 hipz_h_query_mr(const struct ipz_ada
 u64 hipz_h_free_resource_mr(const struct ipz_adapter_handle adapter_handle,
 			    const struct ehca_mr *mr)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				    adapter_handle.handle,    /* r4 */
-				    mr->ipz_mr_handle.handle, /* r5 */
-				    0, 0, 0, 0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				       adapter_handle.handle,    /* r4 */
+				       mr->ipz_mr_handle.handle, /* r5 */
+				       0, 0, 0, 0, 0);
 }
 
 u64 hipz_h_reregister_pmr(const struct ipz_adapter_handle adapter_handle,
@@ -982,28 +767,20 @@ u64 hipz_h_reregister_pmr(const struct i
 			  struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out, rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_REREGISTER_PMR,
-				   adapter_handle.handle,    /* r4 */
-				   mr->ipz_mr_handle.handle, /* r5 */
-				   vaddr_in,	             /* r6 */
-				   length,                   /* r7 */
-				   /* r8 */
-				   ((((u64)access_ctrl) << 32ULL) | pd.value),
-				   mr_addr_cb,               /* r9 */
-				   0,
-				   &dummy,                   /* r4 */
-				   &outparms->vaddr,         /* r5 */
-				   &lkey_out,                /* r6 */
-				   &rkey_out,                /* r7 */
-				   &dummy,
-				   &dummy,
-				   &dummy);
-
-	outparms->lkey = (u32)lkey_out;
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_REREGISTER_PMR, outs,
+				adapter_handle.handle,    /* r4 */
+				mr->ipz_mr_handle.handle, /* r5 */
+				vaddr_in,	          /* r6 */
+				length,                   /* r7 */
+				/* r8 */
+				((((u64)access_ctrl) << 32ULL) | pd.value),
+				mr_addr_cb,               /* r9 */
+				0, 0, 0);
+	outparms->vaddr = outs[1];
+	outparms->lkey = (u32)outs[2];
+	outparms->rkey = (u32)outs[3];
 
 	return ret;
 }
@@ -1017,25 +794,18 @@ u64 hipz_h_register_smr(const struct ipz
 			struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out, rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_REGISTER_SMR,
-				   adapter_handle.handle,            /* r4 */
-				   orig_mr->ipz_mr_handle.handle,    /* r5 */
-				   vaddr_in,                         /* r6 */
-				   (((u64)access_ctrl) << 32ULL),    /* r7 */
-				   pd.value,                         /* r8 */
-				   0, 0,
-				   &(outparms->handle.handle),       /* r4 */
-				   &dummy,                           /* r5 */
-				   &lkey_out,                        /* r6 */
-				   &rkey_out,                        /* r7 */
-				   &dummy,
-				   &dummy,
-				   &dummy);
-	outparms->lkey = (u32)lkey_out;
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_REGISTER_SMR, outs,
+				adapter_handle.handle,            /* r4 */
+				orig_mr->ipz_mr_handle.handle,    /* r5 */
+				vaddr_in,                         /* r6 */
+				(((u64)access_ctrl) << 32ULL),    /* r7 */
+				pd.value,                         /* r8 */
+				0, 0, 0, 0);
+	outparms->handle.handle = outs[0];
+	outparms->lkey = (u32)outs[2];
+	outparms->rkey = (u32)outs[3];
 
 	return ret;
 }
@@ -1046,23 +816,15 @@ u64 hipz_h_alloc_resource_mw(const struc
 			     struct ehca_mw_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-				   adapter_handle.handle,      /* r4 */
-				   6,                          /* r5 */
-				   pd.value,                   /* r6 */
-				   0, 0, 0, 0,
-				   &(outparms->handle.handle), /* r4 */
-				   &dummy,                     /* r5 */
-				   &dummy,                     /* r6 */
-				   &rkey_out,                  /* r7 */
-				   &dummy,
-				   &dummy,
-				   &dummy);
-
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle,      /* r4 */
+				6,                          /* r5 */
+				pd.value,                   /* r6 */
+				0, 0, 0, 0, 0, 0);
+	outparms->handle.handle = outs[0];
+	outparms->rkey = (u32)outs[3];
 
 	return ret;
 }
@@ -1072,21 +834,13 @@ u64 hipz_h_query_mw(const struct ipz_ada
 		    struct ehca_mw_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 pd_out, rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_QUERY_MW,
-				   adapter_handle.handle,    /* r4 */
-				   mw->ipz_mw_handle.handle, /* r5 */
-				   0, 0, 0, 0, 0,
-				   &dummy,                   /* r4 */
-				   &dummy,                   /* r5 */
-				   &dummy,                   /* r6 */
-				   &rkey_out,                /* r7 */
-				   &pd_out,                  /* r8 */
-				   &dummy,
-				   &dummy);
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_QUERY_MW, outs,
+				adapter_handle.handle,    /* r4 */
+				mw->ipz_mw_handle.handle, /* r5 */
+				0, 0, 0, 0, 0, 0, 0);
+	outparms->rkey = (u32)outs[3];
 
 	return ret;
 }
@@ -1094,19 +848,10 @@ u64 hipz_h_query_mw(const struct ipz_ada
 u64 hipz_h_free_resource_mw(const struct ipz_adapter_handle adapter_handle,
 			    const struct ehca_mw *mw)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				    adapter_handle.handle,    /* r4 */
-				    mw->ipz_mw_handle.handle, /* r5 */
-				    0, 0, 0, 0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				       adapter_handle.handle,    /* r4 */
+				       mw->ipz_mw_handle.handle, /* r5 */
+				       0, 0, 0, 0, 0);
 }
 
 u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle,
@@ -1114,7 +859,6 @@ u64 hipz_h_error_data(const struct ipz_a
 		      void *rblock,
 		      unsigned long *byte_count)
 {
-	u64 dummy;
 	u64 r_cb = virt_to_abs(rblock);
 
 	if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -1122,16 +866,9 @@ u64 hipz_h_error_data(const struct ipz_a
 		return H_PARAMETER;
 	}
 
-	return ehca_hcall_7arg_7ret(H_ERROR_DATA,
-				    adapter_handle.handle,
-				    ressource_handle,
-				    r_cb,
-				    0, 0, 0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_ERROR_DATA,
+				       adapter_handle.handle,
+				       ressource_handle,
+				       r_cb,
+				       0, 0, 0, 0);
 }
diff --git a/drivers/infiniband/hw/ehca/hcp_if.h b/drivers/infiniband/hw/ehca/hcp_if.h
index 39956d8..587ebd4 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.h
+++ b/drivers/infiniband/hw/ehca/hcp_if.h
@@ -107,7 +107,7 @@ u64 hipz_h_register_rpage_eq(const struc
 			     const u64 logical_address_of_page,
 			     const u64 count);
 
-u32 hipz_h_query_int_state(const struct ipz_adapter_handle
+u64 hipz_h_query_int_state(const struct ipz_adapter_handle
 			   hcp_adapter_handle,
 			   u32 ist);
 
diff --git a/drivers/infiniband/hw/ehca/hipz_hw.h b/drivers/infiniband/hw/ehca/hipz_hw.h
index f5f4871..3fc92b0 100644
--- a/drivers/infiniband/hw/ehca/hipz_hw.h
+++ b/drivers/infiniband/hw/ehca/hipz_hw.h
@@ -184,8 +184,6 @@ struct hipz_mrmwmm {
 
 };
 
-#define MRX_HCR_LPARID_VALID EHCA_BMASK_IBM(0,0)
-
 #define MRMWMM_OFFSET(x) offsetof(struct hipz_mrmwmm,x)
 
 struct hipz_qpedmm {
diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.h b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
index 7e55a31..2f13509 100644
--- a/drivers/infiniband/hw/ehca/ipz_pt_fn.h
+++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
@@ -226,10 +226,9 @@ static inline void *ipz_eqit_eq_get_inc_
 {
 	void *ret = ipz_qeit_get(queue);
 	u32 qe = *(u8 *) ret;
-	if ((qe >> 7) == (queue->toggle_state & 1))
-		ipz_qeit_eq_get_inc(queue); /* this is a good one */
-	else
-		ret = NULL;
+	if ((qe >> 7) != (queue->toggle_state & 1))
+		return NULL;
+	ipz_qeit_eq_get_inc(queue); /* this is a good one */
 	return ret;
 }
 

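The calling convention this patch moves to (one caller-provided output
buffer in place of nine separate output pointers, wrapped in a bounded
retry loop) can be sketched in plain C as below. This is only an
illustration under stated assumptions: every sketch_* and SKETCH_* name
is hypothetical, the fake hypervisor call stands in for plpar_hcall9(),
a single SKETCH_H_LONG_BUSY code stands in for the long-busy return
codes, and the sleep the driver performs between retries is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hypervisor return codes. */
#define SKETCH_H_SUCCESS    0L
#define SKETCH_H_LONG_BUSY  1L
#define SKETCH_OUT_COUNT    9   /* size of the output buffer */

/* Number of times the fake call reports "busy" before it succeeds. */
static int sketch_busy_left = 2;

/* Hypothetical fake hypervisor call: fills outs[] once it stops being busy. */
static long sketch_hcall9(unsigned long opcode, unsigned long *outs,
			  unsigned long arg1, unsigned long arg2)
{
	size_t i;

	if (sketch_busy_left > 0) {
		sketch_busy_left--;
		return SKETCH_H_LONG_BUSY;
	}
	for (i = 0; i < SKETCH_OUT_COUNT; i++)
		outs[i] = 0;
	outs[0] = opcode + arg1;	/* pretend resource handle */
	outs[3] = arg2;			/* pretend actual number of entries */
	return SKETCH_H_SUCCESS;
}

/* Bounded retry wrapper: at most 5 attempts, as in the patch's loop. */
static long sketch_hcall9_retry(unsigned long opcode, unsigned long *outs,
				unsigned long arg1, unsigned long arg2)
{
	int i;

	for (i = 0; i < 5; i++) {
		long ret = sketch_hcall9(opcode, outs, arg1, arg2);

		if (ret == SKETCH_H_LONG_BUSY)
			continue;	/* the driver would sleep here first */
		return ret;
	}
	return SKETCH_H_LONG_BUSY;
}

struct sketch_eq {
	uint64_t handle;
	uint32_t act_nr_of_entries;
};

/* Caller side: declare the buffer once, then pick results out by index. */
static long sketch_alloc_eq(struct sketch_eq *eq, unsigned long adapter,
			    unsigned long nr_entries)
{
	unsigned long outs[SKETCH_OUT_COUNT] = { 0 };
	long ret = sketch_hcall9_retry(0x100, outs, adapter, nr_entries);

	eq->handle = outs[0];
	eq->act_nr_of_entries = (uint32_t)outs[3];
	return ret;
}
```

Indexing into one buffer removes the per-call dummy variables, at the
cost that the meaning of each outs[] slot has to be documented at the
call site, for example with register comments as the patch does for the
inputs.
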
From rdreier at cisco.com  Fri Sep 22 08:27:49 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 22 Sep 2006 08:27:49 -0700
Subject: [openib-general] [PATCH 2.6.19-rc1] ehca firmware interface
 based on Anton Blanchard's new hvcall interface
In-Reply-To: <200609221720.24191.hnguyen@de.ibm.com> (Hoang-Nam Nguyen's
	message of "Fri, 22 Sep 2006 17:20:23 +0200")
References: <200609221720.24191.hnguyen@de.ibm.com>
Message-ID: <ada7izvhqfe.fsf@cisco.com>

 > - shca->ib_device.node_type           = RDMA_NODE_IB_CA;
 > + shca->ib_device.node_type           = IB_NODE_CA;

Did you test this at all?  I can't see how this would build against my
for-2.6.19 tree...

Please resend a patch that you know is working.

 - R.


From hnguyen at de.ibm.com  Fri Sep 22 13:00:12 2006
From: hnguyen at de.ibm.com (Hoang-Nam Nguyen)
Date: Fri, 22 Sep 2006 22:00:12 +0200
Subject: [openib-general] [PATCH 2.6.19-rc1] ehca firmware interface
 based on Anton Blanchard's new hvcall interface
Message-ID: <200609222200.12722.hnguyen@de.ibm.com>

> - shca->ib_device.node_type           = RDMA_NODE_IB_CA;
> + shca->ib_device.node_type           = IB_NODE_CA;
My mistake: I tested only against Paul's git tree and then used the wrong patch
script, which swapped those defines.
This time I did everything manually and also tested against your git tree with
Anton's patch applied:
http://ozlabs.org/pipermail/linuxppc-dev/2006-July/024556.html
Thanks!
Nam Nguyen
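
For readers skimming the patch, here is a minimal user-space sketch of the calling convention it moves to: outputs come back in one array instead of through separate out pointers, and the wrapper retries a bounded number of times while the call reports "long busy". This is only an illustration, not the driver code. fake_hcall9(), retry_hcall9(), alloc_eq(), struct eq_result and all SKETCH_* constants are hypothetical stand-ins for plpar_hcall9(), ehca_plpar_hcall9(), hipz_h_alloc_resource_eq() and the kernel's return codes, and the sketch skips the sleep the driver does between retries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's constants (values are made up). */
#define SKETCH_BUFSIZE   9      /* plays the role of PLPAR_HCALL9_BUFSIZE */
#define SKETCH_SUCCESS   0L
#define SKETCH_BUSY      1L
#define SKETCH_LONG_BUSY 2L

/* Number of "long busy" answers the fake hypervisor gives before succeeding. */
static int fake_busy_left = 2;

/* Fake hypervisor call: fills an output array instead of nine out pointers. */
static long fake_hcall9(unsigned long opcode, unsigned long *outs,
                        unsigned long arg1, unsigned long arg2,
                        unsigned long arg3)
{
    (void)opcode;
    (void)arg2;
    if (fake_busy_left > 0) {
        fake_busy_left--;
        return SKETCH_LONG_BUSY;
    }
    outs[0] = 0x1000 + arg1;    /* resource handle */
    outs[3] = arg3;             /* actual number of entries */
    outs[4] = 2;                /* actual number of pages */
    outs[5] = 7;                /* interrupt source token */
    return SKETCH_SUCCESS;
}

/* Bounded retry wrapper, shaped like the wrappers in the patch. */
static long retry_hcall9(unsigned long opcode, unsigned long *outs,
                         unsigned long arg1, unsigned long arg2,
                         unsigned long arg3)
{
    assert(outs != NULL);
    for (int i = 0; i < 5; i++) {
        long ret = fake_hcall9(opcode, outs, arg1, arg2, arg3);
        if (ret == SKETCH_LONG_BUSY)
            continue;           /* the driver sleeps here before retrying */
        return ret;
    }
    return SKETCH_BUSY;         /* gave up after five attempts */
}

struct eq_result {
    uint64_t handle;
    uint32_t act_nr_of_entries;
    uint32_t act_pages;
    uint32_t eq_ist;
};

/* Caller picks the outputs it needs by index; unused slots are ignored. */
static long alloc_eq(unsigned long adapter, unsigned long controls,
                     unsigned long nr_entries, struct eq_result *res)
{
    unsigned long outs[SKETCH_BUFSIZE] = { 0 };
    long ret = retry_hcall9(0x1, outs, adapter, controls, nr_entries);

    res->handle = outs[0];
    res->act_nr_of_entries = (uint32_t)outs[3];
    res->act_pages = (uint32_t)outs[4];
    res->eq_ist = (uint32_t)outs[5];
    return ret;
}
```

The point of the array form is that callers no longer need dummy variables for outputs they do not care about; they simply leave those slots unread.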


Signed-off-by: Hoang-Nam Nguyen <hnguyen at de.ibm.com>
---


 ehca_main.c |    5 
 hcp_if.c    |  845 ++++++++++++++++++++----------------------------------------
 hcp_if.h    |    2 
 hipz_hw.h   |    2 
 ipz_pt_fn.h |    7 
 5 files changed, 298 insertions(+), 563 deletions(-)


diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index 159b0be..2380994 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -5,6 +5,7 @@
  *
  *  Authors: Heiko J Schick <schickhj at de.ibm.com>
  *           Hoang-Nam Nguyen <hnguyen at de.ibm.com>
+ *           Joachim Fenkes <fenkes at de.ibm.com>
  *
  *  Copyright (c) 2005 IBM Corporation
  *
@@ -48,7 +49,7 @@ #include "hcp_if.h"
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch <raisch at de.ibm.com>");
 MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver");
-MODULE_VERSION("SVNEHCA_0015");
+MODULE_VERSION("SVNEHCA_0016");
 
 int ehca_open_aqp1     = 0;
 int ehca_debug_level   = 0;
@@ -749,7 +750,7 @@ int __init ehca_module_init(void)
  int ret;
 
  printk(KERN_INFO "eHCA Infiniband Device Driver "
-                  "(Rel.: SVNEHCA_0015)\n");
+                  "(Rel.: SVNEHCA_0016)\n");
  idr_init(&ehca_qp_idr);
  idr_init(&ehca_cq_idr);
  spin_lock_init(&ehca_qp_idr_lock);
diff --git a/drivers/infiniband/hw/ehca/hcp_if.c b/drivers/infiniband/hw/ehca/hcp_if.c
index 260e82a..3fb46e6 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.c
+++ b/drivers/infiniband/hw/ehca/hcp_if.c
@@ -48,27 +48,27 @@ #include "hcp_phyp.h"
 #include "hipz_fns.h"
 #include "ipz_pt_fn.h"
 
-#define H_ALL_RES_QP_ENHANCED_OPS       EHCA_BMASK_IBM(9,11)
-#define H_ALL_RES_QP_PTE_PIN            EHCA_BMASK_IBM(12,12)
-#define H_ALL_RES_QP_SERVICE_TYPE       EHCA_BMASK_IBM(13,15)
-#define H_ALL_RES_QP_LL_RQ_CQE_POSTING  EHCA_BMASK_IBM(18,18)
-#define H_ALL_RES_QP_LL_SQ_CQE_POSTING  EHCA_BMASK_IBM(19,21)
-#define H_ALL_RES_QP_SIGNALING_TYPE     EHCA_BMASK_IBM(22,23)
-#define H_ALL_RES_QP_UD_AV_LKEY_CTRL    EHCA_BMASK_IBM(31,31)
-#define H_ALL_RES_QP_RESOURCE_TYPE      EHCA_BMASK_IBM(56,63)
-
-#define H_ALL_RES_QP_MAX_OUTST_SEND_WR  EHCA_BMASK_IBM(0,15)
-#define H_ALL_RES_QP_MAX_OUTST_RECV_WR  EHCA_BMASK_IBM(16,31)
-#define H_ALL_RES_QP_MAX_SEND_SGE       EHCA_BMASK_IBM(32,39)
-#define H_ALL_RES_QP_MAX_RECV_SGE       EHCA_BMASK_IBM(40,47)
-
-#define H_ALL_RES_QP_ACT_OUTST_SEND_WR  EHCA_BMASK_IBM(16,31)
-#define H_ALL_RES_QP_ACT_OUTST_RECV_WR  EHCA_BMASK_IBM(48,63)
-#define H_ALL_RES_QP_ACT_SEND_SGE       EHCA_BMASK_IBM(8,15)
-#define H_ALL_RES_QP_ACT_RECV_SGE       EHCA_BMASK_IBM(24,31)
-
-#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(0,31)
-#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(32,63)
+#define H_ALL_RES_QP_ENHANCED_OPS       EHCA_BMASK_IBM(9, 11)
+#define H_ALL_RES_QP_PTE_PIN            EHCA_BMASK_IBM(12, 12)
+#define H_ALL_RES_QP_SERVICE_TYPE       EHCA_BMASK_IBM(13, 15)
+#define H_ALL_RES_QP_LL_RQ_CQE_POSTING  EHCA_BMASK_IBM(18, 18)
+#define H_ALL_RES_QP_LL_SQ_CQE_POSTING  EHCA_BMASK_IBM(19, 21)
+#define H_ALL_RES_QP_SIGNALING_TYPE     EHCA_BMASK_IBM(22, 23)
+#define H_ALL_RES_QP_UD_AV_LKEY_CTRL    EHCA_BMASK_IBM(31, 31)
+#define H_ALL_RES_QP_RESOURCE_TYPE      EHCA_BMASK_IBM(56, 63)
+
+#define H_ALL_RES_QP_MAX_OUTST_SEND_WR  EHCA_BMASK_IBM(0, 15)
+#define H_ALL_RES_QP_MAX_OUTST_RECV_WR  EHCA_BMASK_IBM(16, 31)
+#define H_ALL_RES_QP_MAX_SEND_SGE       EHCA_BMASK_IBM(32, 39)
+#define H_ALL_RES_QP_MAX_RECV_SGE       EHCA_BMASK_IBM(40, 47)
+
+#define H_ALL_RES_QP_ACT_OUTST_SEND_WR  EHCA_BMASK_IBM(16, 31)
+#define H_ALL_RES_QP_ACT_OUTST_RECV_WR  EHCA_BMASK_IBM(48, 63)
+#define H_ALL_RES_QP_ACT_SEND_SGE       EHCA_BMASK_IBM(8, 15)
+#define H_ALL_RES_QP_ACT_RECV_SGE       EHCA_BMASK_IBM(24, 31)
+
+#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(0, 31)
+#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(32, 63)
 
 /* direct access qp controls */
 #define DAQP_CTRL_ENABLE    0x01
@@ -95,35 +95,25 @@ static u32 get_longbusy_msecs(int longbu
  }
 }
 
-static long ehca_hcall_7arg_7ret(unsigned long opcode,
-     unsigned long arg1,
-     unsigned long arg2,
-     unsigned long arg3,
-     unsigned long arg4,
-     unsigned long arg5,
-     unsigned long arg6,
-     unsigned long arg7,
-     unsigned long *out1,
-     unsigned long *out2,
-     unsigned long *out3,
-     unsigned long *out4,
-     unsigned long *out5,
-     unsigned long *out6,
-     unsigned long *out7)
+static long ehca_plpar_hcall_norets(unsigned long opcode,
+        unsigned long arg1,
+        unsigned long arg2,
+        unsigned long arg3,
+        unsigned long arg4,
+        unsigned long arg5,
+        unsigned long arg6,
+        unsigned long arg7)
 {
  long ret;
  int i, sleep_msecs;
 
- ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx arg5=%lx "
-       "arg6=%lx arg7=%lx", opcode, arg1, arg2, arg3, arg4, arg5,
-       arg6, arg7);
+ ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx "
+       "arg5=%lx arg6=%lx arg7=%lx",
+       opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7);
 
  for (i = 0; i < 5; i++) {
-  ret = plpar_hcall_7arg_7ret(opcode,
-         arg1, arg2, arg3, arg4,
-         arg5, arg6, arg7,
-         out1, out2, out3, out4,
-         out5, out6,out7);
+  ret = plpar_hcall_norets(opcode, arg1, arg2, arg3, arg4,
+      arg5, arg6, arg7);
 
   if (H_IS_LONG_BUSY(ret)) {
    sleep_msecs = get_longbusy_msecs(ret);
@@ -134,44 +124,30 @@ static long ehca_hcall_7arg_7ret(unsigne
   if (ret < H_SUCCESS)
    ehca_gen_err("opcode=%lx ret=%lx"
          " arg1=%lx arg2=%lx arg3=%lx arg4=%lx"
-         " arg5=%lx arg6=%lx arg7=%lx"
-         " out1=%lx out2=%lx out3=%lx out4=%lx"
-         " out5=%lx out6=%lx out7=%lx",
+         " arg5=%lx arg6=%lx arg7=%lx ",
          opcode, ret,
-         arg1, arg2, arg3, arg4,
-         arg5, arg6, arg7,
-         *out1, *out2, *out3, *out4,
-         *out5, *out6, *out7);
+         arg1, arg2, arg3, arg4, arg5,
+         arg6, arg7);
 
-  ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
-        "out4=%lx out5=%lx out6=%lx out7=%lx",
-        opcode, ret, *out1, *out2, *out3, *out4, *out5,
-        *out6, *out7);
+  ehca_gen_dbg("opcode=%lx ret=%lx", opcode, ret);
   return ret;
+
  }
 
  return H_BUSY;
 }
 
-static long ehca_hcall_9arg_9ret(unsigned long opcode,
-     unsigned long arg1,
-     unsigned long arg2,
-     unsigned long arg3,
-     unsigned long arg4,
-     unsigned long arg5,
-     unsigned long arg6,
-     unsigned long arg7,
-     unsigned long arg8,
-     unsigned long arg9,
-     unsigned long *out1,
-     unsigned long *out2,
-     unsigned long *out3,
-     unsigned long *out4,
-     unsigned long *out5,
-     unsigned long *out6,
-     unsigned long *out7,
-     unsigned long *out8,
-     unsigned long *out9)
+static long ehca_plpar_hcall9(unsigned long opcode,
+         unsigned long *outs, /* array of 9 outputs */
+         unsigned long arg1,
+         unsigned long arg2,
+         unsigned long arg3,
+         unsigned long arg4,
+         unsigned long arg5,
+         unsigned long arg6,
+         unsigned long arg7,
+         unsigned long arg8,
+         unsigned long arg9)
 {
  long ret;
  int i, sleep_msecs;
@@ -182,13 +158,9 @@ static long ehca_hcall_9arg_9ret(unsigne
        arg8, arg9);
 
  for (i = 0; i < 5; i++) {
-  ret = plpar_hcall_9arg_9ret(opcode,
-         arg1, arg2, arg3, arg4,
-         arg5, arg6, arg7, arg8,
-         arg9,
-         out1, out2, out3, out4,
-         out5, out6, out7, out8,
-         out9);
+  ret = plpar_hcall9(opcode, outs,
+       arg1, arg2, arg3, arg4, arg5,
+       arg6, arg7, arg8, arg9);
 
   if (H_IS_LONG_BUSY(ret)) {
    sleep_msecs = get_longbusy_msecs(ret);
@@ -205,37 +177,35 @@ static long ehca_hcall_9arg_9ret(unsigne
          " out5=%lx out6=%lx out7=%lx out8=%lx"
          " out9=%lx",
          opcode, ret,
-         arg1, arg2, arg3, arg4,
-         arg5, arg6, arg7, arg8,
-         arg9,
-         *out1, *out2, *out3, *out4,
-         *out5, *out6, *out7, *out8,
-         *out9);
+         arg1, arg2, arg3, arg4, arg5,
+         arg6, arg7, arg8, arg9,
+         outs[0], outs[1], outs[2], outs[3],
+         outs[4], outs[5], outs[6], outs[7],
+         outs[8]);
 
   ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
         "out4=%lx out5=%lx out6=%lx out7=%lx out8=%lx "
-        "out9=%lx", opcode, ret,*out1, *out2, *out3, *out4,
-        *out5, *out6, *out7, *out8, *out9);
+        "out9=%lx",
+        opcode, ret, outs[0], outs[1], outs[2], outs[3],
+        outs[4], outs[5], outs[6], outs[7], outs[8]);
   return ret;
 
  }
 
  return H_BUSY;
 }
-
 u64 hipz_h_alloc_resource_eq(const struct ipz_adapter_handle adapter_handle,
         struct ehca_pfeq *pfeq,
         const u32 neq_control,
         const u32 number_of_entries,
         struct ipz_eq_handle *eq_handle,
-        u32 * act_nr_of_entries,
-        u32 * act_pages,
-        u32 * eq_ist)
+        u32 *act_nr_of_entries,
+        u32 *act_pages,
+        u32 *eq_ist)
 {
  u64 ret;
- u64 dummy;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
  u64 allocate_controls;
- u64 act_nr_of_entries_out, act_pages_out, eq_ist_out;
 
  /* resource type */
  allocate_controls = 3ULL;
@@ -246,22 +216,15 @@ u64 hipz_h_alloc_resource_eq(const struc
  else /* notification event queue */
   allocate_controls = (1ULL << 63) | allocate_controls;
 
- ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-       adapter_handle.handle,  /* r4 */
-       allocate_controls,      /* r5 */
-       number_of_entries,      /* r6 */
-       0, 0, 0, 0,
-       &eq_handle->handle,     /* r4 */
-       &dummy,            /* r5 */
-       &dummy,            /* r6 */
-       &act_nr_of_entries_out, /* r7 */
-       &act_pages_out,    /* r8 */
-       &eq_ist_out,            /* r8 */
-       &dummy);
-
- *act_nr_of_entries = (u32)act_nr_of_entries_out;
- *act_pages         = (u32)act_pages_out;
- *eq_ist            = (u32)eq_ist_out;
+ ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+    adapter_handle.handle,  /* r4 */
+    allocate_controls,      /* r5 */
+    number_of_entries,      /* r6 */
+    0, 0, 0, 0, 0, 0);
+ eq_handle->handle = outs[0];
+ *act_nr_of_entries = (u32)outs[3];
+ *act_pages = (u32)outs[4];
+ *eq_ist = (u32)outs[5];
 
  if (ret == H_NOT_ENOUGH_RESOURCES)
   ehca_gen_err("Not enough resource - ret=%lx ", ret);
@@ -273,20 +236,11 @@ u64 hipz_h_reset_event(const struct ipz_
          struct ipz_eq_handle eq_handle,
          const u64 event_mask)
 {
- u64 dummy;
-
- return ehca_hcall_7arg_7ret(H_RESET_EVENTS,
-        adapter_handle.handle, /* r4 */
-        eq_handle.handle,      /* r5 */
-        event_mask,            /* r6 */
-        0, 0, 0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_RESET_EVENTS,
+           adapter_handle.handle, /* r4 */
+           eq_handle.handle,      /* r5 */
+           event_mask,       /* r6 */
+           0, 0, 0, 0);
 }
 
 u64 hipz_h_alloc_resource_cq(const struct ipz_adapter_handle adapter_handle,
@@ -294,30 +248,21 @@ u64 hipz_h_alloc_resource_cq(const struc
         struct ehca_alloc_cq_parms *param)
 {
  u64 ret;
- u64 dummy;
- u64 act_nr_of_entries_out, act_pages_out;
- u64 g_la_privileged_out, g_la_user_out;
-
- ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-       adapter_handle.handle,     /* r4  */
-       2,                       /* r5  */
-       param->eq_handle.handle,   /* r6  */
-       cq->token,               /* r7  */
-       param->nr_cqe,             /* r8  */
-       0, 0,
-       &cq->ipz_cq_handle.handle, /* r4  */
-       &dummy,               /* r5  */
-       &dummy,               /* r6  */
-       &act_nr_of_entries_out,    /* r7  */
-       &act_pages_out,       /* r8  */
-       &g_la_privileged_out,      /* r9  */
-       &g_la_user_out);           /* r10 */
-
- param->act_nr_of_entries = (u32)act_nr_of_entries_out;
- param->act_pages = (u32)act_pages_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+    adapter_handle.handle,   /* r4  */
+    2,                  /* r5  */
+    param->eq_handle.handle, /* r6  */
+    cq->token,          /* r7  */
+    param->nr_cqe,           /* r8  */
+    0, 0, 0, 0);
+ cq->ipz_cq_handle.handle = outs[0];
+ param->act_nr_of_entries = (u32)outs[3];
+ param->act_pages = (u32)outs[4];
 
  if (ret == H_SUCCESS)
-  hcp_galpas_ctor(&cq->galpas, g_la_privileged_out, g_la_user_out);
+  hcp_galpas_ctor(&cq->galpas, outs[5], outs[6]);
 
  if (ret == H_NOT_ENOUGH_RESOURCES)
   ehca_gen_err("Not enough resources. ret=%lx", ret);
@@ -330,8 +275,9 @@ u64 hipz_h_alloc_resource_qp(const struc
         struct ehca_alloc_qp_parms *parms)
 {
  u64 ret;
- u64 dummy, allocate_controls, max_r10_reg;
- u64 qp_nr_out, r6_out, r7_out, r8_out, g_la_user_out, r11_out;
+ u64 allocate_controls;
+ u64 max_r10_reg;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
  u16 max_nr_receive_wqes = qp->init_attr.cap.max_recv_wr + 1;
  u16 max_nr_send_wqes = qp->init_attr.cap.max_send_wr + 1;
  int daqp_ctrl = parms->daqp_ctrl;
@@ -360,48 +306,36 @@ u64 hipz_h_alloc_resource_qp(const struc
   | EHCA_BMASK_SET(H_ALL_RES_QP_MAX_RECV_SGE,
      parms->max_recv_sge);
 
-
- ret = ehca_hcall_9arg_9ret(H_ALLOC_RESOURCE,
-       adapter_handle.handle,       /* r4  */
-       allocate_controls,               /* r5  */
-       qp->send_cq->ipz_cq_handle.handle,
-       qp->recv_cq->ipz_cq_handle.handle,
-       parms->ipz_eq_handle.handle,
-       ((u64)qp->token << 32) | parms->pd.value,
-       max_r10_reg,                       /* r10 */
-       parms->ud_av_l_key_ctl,            /* r11 */
-       0,
-       &qp->ipz_qp_handle.handle,
-       &qp_nr_out,                       /* r5  */
-       &r6_out,                       /* r6  */
-       &r7_out,                       /* r7  */
-       &r8_out,                       /* r8  */
-       &dummy,                       /* r9  */
-       &g_la_user_out,               /* r10 */
-       &r11_out,
-       &dummy);
-
- /* extract outputs */
- qp->real_qp_num = (u32)qp_nr_out;
-
+ ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+    adapter_handle.handle,            /* r4  */
+    allocate_controls,            /* r5  */
+    qp->send_cq->ipz_cq_handle.handle,
+    qp->recv_cq->ipz_cq_handle.handle,
+    parms->ipz_eq_handle.handle,
+    ((u64)qp->token << 32) | parms->pd.value,
+    max_r10_reg,                    /* r10 */
+    parms->ud_av_l_key_ctl,            /* r11 */
+    0);
+ qp->ipz_qp_handle.handle = outs[0];
+ qp->real_qp_num = (u32)outs[1];
  parms->act_nr_send_sges =
-  (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, r6_out);
+  (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, outs[2]);
  parms->act_nr_recv_wqes =
-  (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, r6_out);
+  (u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, outs[2]);
  parms->act_nr_send_sges =
-  (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, r7_out);
+  (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, outs[3]);
  parms->act_nr_recv_sges =
-  (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, r7_out);
+  (u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, outs[3]);
  parms->nr_sq_pages =
-  (u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, r8_out);
+  (u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, outs[4]);
  parms->nr_rq_pages =
-  (u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, r8_out);
+  (u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, outs[4]);
 
  if (ret == H_SUCCESS)
-  hcp_galpas_ctor(&qp->galpas, g_la_user_out, g_la_user_out);
+  hcp_galpas_ctor(&qp->galpas, outs[6], outs[6]);
 
  if (ret == H_NOT_ENOUGH_RESOURCES)
-  ehca_gen_err("Not enough resources. ret=%lx",ret);
+  ehca_gen_err("Not enough resources. ret=%lx", ret);
 
  return ret;
 }
@@ -411,7 +345,6 @@ u64 hipz_h_query_port(const struct ipz_a
         struct hipz_query_port *query_port_response_block)
 {
  u64 ret;
- u64 dummy;
  u64 r_cb = virt_to_abs(query_port_response_block);
 
  if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -419,18 +352,11 @@ u64 hipz_h_query_port(const struct ipz_a
   return H_PARAMETER;
  }
 
- ret = ehca_hcall_7arg_7ret(H_QUERY_PORT,
-       adapter_handle.handle, /* r4 */
-       port_id,           /* r5 */
-       r_cb,           /* r6 */
-       0, 0, 0, 0,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
+ ret = ehca_plpar_hcall_norets(H_QUERY_PORT,
+          adapter_handle.handle, /* r4 */
+          port_id,              /* r5 */
+          r_cb,              /* r6 */
+          0, 0, 0, 0);
 
  if (ehca_debug_level)
   ehca_dmp(query_port_response_block, 64, "response_block");
@@ -441,7 +367,6 @@ u64 hipz_h_query_port(const struct ipz_a
 u64 hipz_h_query_hca(const struct ipz_adapter_handle adapter_handle,
        struct hipz_query_hca *query_hca_rblock)
 {
- u64 dummy;
  u64 r_cb = virt_to_abs(query_hca_rblock);
 
  if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -450,17 +375,10 @@ u64 hipz_h_query_hca(const struct ipz_ad
   return H_PARAMETER;
  }
 
- return ehca_hcall_7arg_7ret(H_QUERY_HCA,
-        adapter_handle.handle, /* r4 */
-        r_cb,                  /* r5 */
-        0, 0, 0, 0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_QUERY_HCA,
+           adapter_handle.handle, /* r4 */
+           r_cb,                  /* r5 */
+           0, 0, 0, 0, 0);
 }
 
 u64 hipz_h_register_rpage(const struct ipz_adapter_handle adapter_handle,
@@ -470,22 +388,13 @@ u64 hipz_h_register_rpage(const struct i
      const u64 logical_address_of_page,
      u64 count)
 {
- u64 dummy;
-
- return ehca_hcall_7arg_7ret(H_REGISTER_RPAGES,
-        adapter_handle.handle,      /* r4  */
-        queue_type | pagesize << 8, /* r5  */
-        resource_handle,         /* r6  */
-        logical_address_of_page,    /* r7  */
-        count,                 /* r8  */
-        0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_REGISTER_RPAGES,
+           adapter_handle.handle,      /* r4  */
+           queue_type | pagesize << 8, /* r5  */
+           resource_handle,            /* r6  */
+           logical_address_of_page,    /* r7  */
+           count,                    /* r8  */
+           0, 0);
 }
 
 u64 hipz_h_register_rpage_eq(const struct ipz_adapter_handle adapter_handle,
@@ -507,23 +416,14 @@ u64 hipz_h_register_rpage_eq(const struc
          logical_address_of_page, count);
 }
 
-u32 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle,
+u64 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle,
       u32 ist)
 {
- u32 ret;
- u64 dummy;
-
- ret = ehca_hcall_7arg_7ret(H_QUERY_INT_STATE,
-       adapter_handle.handle, /* r4 */
-       ist,                   /* r5 */
-       0, 0, 0, 0, 0,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
+ u64 ret;
+ ret = ehca_plpar_hcall_norets(H_QUERY_INT_STATE,
+          adapter_handle.handle, /* r4 */
+          ist,                   /* r5 */
+          0, 0, 0, 0, 0);
 
  if (ret != H_SUCCESS && ret != H_BUSY)
   ehca_gen_err("Could not query interrupt state.");
@@ -576,25 +476,20 @@ u64 hipz_h_disable_and_get_wqe(const str
           void **log_addr_next_rq_wqe2processed,
           int dis_and_get_function_code)
 {
- u64 dummy, dummy1, dummy2;
-
- if (!log_addr_next_sq_wqe2processed)
-  log_addr_next_sq_wqe2processed = (void**)&dummy1;
- if (!log_addr_next_rq_wqe2processed)
-  log_addr_next_rq_wqe2processed = (void**)&dummy2;
-
- return ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
-        adapter_handle.handle,     /* r4 */
-        dis_and_get_function_code, /* r5 */
-        qp_handle.handle,        /* r6 */
-        0, 0, 0, 0,
-        (void*)log_addr_next_sq_wqe2processed,
-        (void*)log_addr_next_rq_wqe2processed,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ u64 ret;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs,
+    adapter_handle.handle,     /* r4 */
+    dis_and_get_function_code, /* r5 */
+    qp_handle.handle,    /* r6 */
+    0, 0, 0, 0, 0, 0);
+ if (log_addr_next_sq_wqe2processed)
+  *log_addr_next_sq_wqe2processed = (void*)outs[0];
+ if (log_addr_next_rq_wqe2processed)
+  *log_addr_next_rq_wqe2processed = (void*)outs[1];
+
+ return ret;
 }
 
 u64 hipz_h_modify_qp(const struct ipz_adapter_handle adapter_handle,
@@ -605,22 +500,13 @@ u64 hipz_h_modify_qp(const struct ipz_ad
        struct h_galpa gal)
 {
  u64 ret;
- u64 dummy;
- u64 invalid_attribute_identifier, rc_attrib_mask;
-
- ret = ehca_hcall_7arg_7ret(H_MODIFY_QP,
-       adapter_handle.handle,         /* r4 */
-       qp_handle.handle,           /* r5 */
-       update_mask,                   /* r6 */
-       virt_to_abs(mqpcb),           /* r7 */
-       0, 0, 0,
-       &invalid_attribute_identifier, /* r4 */
-       &dummy,                   /* r5 */
-       &dummy,                   /* r6 */
-       &dummy,                        /* r7 */
-       &dummy,                   /* r8 */
-       &rc_attrib_mask,               /* r9 */
-       &dummy);
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+ ret = ehca_plpar_hcall9(H_MODIFY_QP, outs,
+    adapter_handle.handle, /* r4 */
+    qp_handle.handle,      /* r5 */
+    update_mask,        /* r6 */
+    virt_to_abs(mqpcb),    /* r7 */
+    0, 0, 0, 0, 0);
 
  if (ret == H_NOT_ENOUGH_RESOURCES)
   ehca_gen_err("Insufficient resources ret=%lx", ret);
@@ -634,61 +520,37 @@ u64 hipz_h_query_qp(const struct ipz_ada
       struct hcp_modify_qp_control_block *qqpcb,
       struct h_galpa gal)
 {
- u64 dummy;
-
- return ehca_hcall_7arg_7ret(H_QUERY_QP,
-        adapter_handle.handle, /* r4 */
-        qp_handle.handle,      /* r5 */
-        virt_to_abs(qqpcb),    /* r6 */
-        0, 0, 0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_QUERY_QP,
+           adapter_handle.handle, /* r4 */
+           qp_handle.handle,      /* r5 */
+           virt_to_abs(qqpcb),    /* r6 */
+           0, 0, 0, 0);
 }
 
 u64 hipz_h_destroy_qp(const struct ipz_adapter_handle adapter_handle,
         struct ehca_qp *qp)
 {
  u64 ret;
- u64 dummy;
- u64 ladr_next_sq_wqe_out, ladr_next_rq_wqe_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
 
  ret = hcp_galpas_dtor(&qp->galpas);
  if (ret) {
   ehca_gen_err("Could not destruct qp->galpas");
   return H_RESOURCE;
  }
- ret = ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
-       adapter_handle.handle,     /* r4 */
-       /* function code */
-       1,                       /* r5 */
-       qp->ipz_qp_handle.handle,  /* r6 */
-       0, 0, 0, 0,
-       &ladr_next_sq_wqe_out,     /* r4 */
-       &ladr_next_rq_wqe_out,     /* r5 */
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
+ ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs,
+    adapter_handle.handle,     /* r4 */
+    /* function code */
+    1,                    /* r5 */
+    qp->ipz_qp_handle.handle,  /* r6 */
+    0, 0, 0, 0, 0, 0);
  if (ret == H_HARDWARE)
   ehca_gen_err("HCA not operational. ret=%lx", ret);
 
- ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-       adapter_handle.handle,     /* r4 */
-       qp->ipz_qp_handle.handle,  /* r5 */
-       0, 0, 0, 0, 0,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
+ ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+          adapter_handle.handle,     /* r4 */
+          qp->ipz_qp_handle.handle,  /* r5 */
+          0, 0, 0, 0, 0);
 
  if (ret == H_RESOURCE)
   ehca_gen_err("Resource still in use. ret=%lx", ret);
@@ -701,20 +563,11 @@ u64 hipz_h_define_aqp0(const struct ipz_
          struct h_galpa gal,
          u32 port)
 {
- u64 dummy;
-
- return ehca_hcall_7arg_7ret(H_DEFINE_AQP0,
-        adapter_handle.handle, /* r4 */
-        qp_handle.handle,      /* r5 */
-        port,                  /* r6 */
-        0, 0, 0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_DEFINE_AQP0,
+           adapter_handle.handle, /* r4 */
+           qp_handle.handle,      /* r5 */
+           port,                  /* r6 */
+           0, 0, 0, 0);
 }
 
 u64 hipz_h_define_aqp1(const struct ipz_adapter_handle adapter_handle,
@@ -724,24 +577,15 @@ u64 hipz_h_define_aqp1(const struct ipz_
          u32 * bma_qp_nr)
 {
  u64 ret;
- u64 dummy;
- u64 pma_qp_nr_out, bma_qp_nr_out;
-
- ret = ehca_hcall_7arg_7ret(H_DEFINE_AQP1,
-       adapter_handle.handle, /* r4 */
-       qp_handle.handle,      /* r5 */
-       port,           /* r6 */
-       0, 0, 0, 0,
-       &pma_qp_nr_out,        /* r4 */
-       &bma_qp_nr_out,        /* r5 */
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
-
- *pma_qp_nr = (u32)pma_qp_nr_out;
- *bma_qp_nr = (u32)bma_qp_nr_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_DEFINE_AQP1, outs,
+    adapter_handle.handle, /* r4 */
+    qp_handle.handle,      /* r5 */
+    port,                /* r6 */
+    0, 0, 0, 0, 0, 0);
+ *pma_qp_nr = (u32)outs[0];
+ *bma_qp_nr = (u32)outs[1];
 
  if (ret == H_ALIAS_EXIST)
   ehca_gen_err("AQP1 already exists. ret=%lx", ret);
@@ -756,22 +600,14 @@ u64 hipz_h_attach_mcqp(const struct ipz_
          u64 subnet_prefix, u64 interface_id)
 {
  u64 ret;
- u64 dummy;
-
- ret = ehca_hcall_7arg_7ret(H_ATTACH_MCQP,
-       adapter_handle.handle,     /* r4 */
-       qp_handle.handle,          /* r5 */
-       mcg_dlid,                  /* r6 */
-       interface_id,              /* r7 */
-       subnet_prefix,             /* r8 */
-       0, 0,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
+
+ ret = ehca_plpar_hcall_norets(H_ATTACH_MCQP,
+          adapter_handle.handle,  /* r4 */
+          qp_handle.handle,       /* r5 */
+          mcg_dlid,               /* r6 */
+          interface_id,           /* r7 */
+          subnet_prefix,          /* r8 */
+          0, 0);
 
  if (ret == H_NOT_ENOUGH_RESOURCES)
   ehca_gen_err("Not enough resources. ret=%lx", ret);
@@ -785,22 +621,13 @@ u64 hipz_h_detach_mcqp(const struct ipz_
          u16 mcg_dlid,
          u64 subnet_prefix, u64 interface_id)
 {
- u64 dummy;
-
- return ehca_hcall_7arg_7ret(H_DETACH_MCQP,
-        adapter_handle.handle, /* r4 */
-        qp_handle.handle,    /* r5 */
-        mcg_dlid,            /* r6 */
-        interface_id,          /* r7 */
-        subnet_prefix,         /* r8 */
-        0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_DETACH_MCQP,
+           adapter_handle.handle, /* r4 */
+           qp_handle.handle,      /* r5 */
+           mcg_dlid,              /* r6 */
+           interface_id,          /* r7 */
+           subnet_prefix,         /* r8 */
+           0, 0);
 }
 
 u64 hipz_h_destroy_cq(const struct ipz_adapter_handle adapter_handle,
@@ -808,7 +635,6 @@ u64 hipz_h_destroy_cq(const struct ipz_a
         u8 force_flag)
 {
  u64 ret;
- u64 dummy;
 
  ret = hcp_galpas_dtor(&cq->galpas);
  if (ret) {
@@ -816,18 +642,11 @@ u64 hipz_h_destroy_cq(const struct ipz_a
   return H_RESOURCE;
  }
 
- ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-       adapter_handle.handle,     /* r4 */
-       cq->ipz_cq_handle.handle,  /* r5 */
-       force_flag != 0 ? 1L : 0L, /* r6 */
-       0, 0, 0, 0,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
+ ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+          adapter_handle.handle,     /* r4 */
+          cq->ipz_cq_handle.handle,  /* r5 */
+          force_flag != 0 ? 1L : 0L, /* r6 */
+          0, 0, 0, 0);
 
  if (ret == H_RESOURCE)
   ehca_gen_err("H_FREE_RESOURCE failed ret=%lx ", ret);
@@ -839,7 +658,6 @@ u64 hipz_h_destroy_eq(const struct ipz_a
         struct ehca_eq *eq)
 {
  u64 ret;
- u64 dummy;
 
  ret = hcp_galpas_dtor(&eq->galpas);
  if (ret) {
@@ -847,18 +665,10 @@ u64 hipz_h_destroy_eq(const struct ipz_a
   return H_RESOURCE;
  }
 
- ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-       adapter_handle.handle,     /* r4 */
-       eq->ipz_eq_handle.handle,  /* r5 */
-       0, 0, 0, 0, 0,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy,
-       &dummy);
-
+ ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+          adapter_handle.handle,     /* r4 */
+          eq->ipz_eq_handle.handle,  /* r5 */
+          0, 0, 0, 0, 0);
 
  if (ret == H_RESOURCE)
   ehca_gen_err("Resource in use. ret=%lx ", ret);
@@ -875,27 +685,19 @@ u64 hipz_h_alloc_resource_mr(const struc
         struct ehca_mr_hipzout_parms *outparms)
 {
  u64 ret;
- u64 dummy;
- u64 lkey_out;
- u64 rkey_out;
-
- ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-       adapter_handle.handle,            /* r4 */
-       5,                                /* r5 */
-       vaddr,                            /* r6 */
-       length,                           /* r7 */
-       (((u64)access_ctrl) << 32ULL),    /* r8 */
-       pd.value,                         /* r9 */
-       0,
-       &(outparms->handle.handle),       /* r4 */
-       &dummy,                           /* r5 */
-       &lkey_out,                        /* r6 */
-       &rkey_out,                        /* r7 */
-       &dummy,
-       &dummy,
-       &dummy);
- outparms->lkey = (u32)lkey_out;
- outparms->rkey = (u32)rkey_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+    adapter_handle.handle,            /* r4 */
+    5,                                /* r5 */
+    vaddr,                            /* r6 */
+    length,                           /* r7 */
+    (((u64)access_ctrl) << 32ULL),    /* r8 */
+    pd.value,                         /* r9 */
+    0, 0, 0);
+ outparms->handle.handle = outs[0];
+ outparms->lkey = (u32)outs[2];
+ outparms->rkey = (u32)outs[3];
 
  return ret;
 }
@@ -923,7 +725,6 @@ u64 hipz_h_register_rpage_mr(const struc
          queue_type,
          mr->ipz_mr_handle.handle,
          logical_address_of_page, count);
-
  return ret;
 }
 
@@ -932,24 +733,17 @@ u64 hipz_h_query_mr(const struct ipz_ada
       struct ehca_mr_hipzout_parms *outparms)
 {
  u64 ret;
- u64 dummy;
- u64 remote_len_out, remote_vaddr_out, acc_ctrl_pd_out, r9_out;
-
- ret = ehca_hcall_7arg_7ret(H_QUERY_MR,
-       adapter_handle.handle,     /* r4 */
-       mr->ipz_mr_handle.handle,  /* r5 */
-       0, 0, 0, 0, 0,
-       &outparms->len,            /* r4 */
-       &outparms->vaddr,          /* r5 */
-       &remote_len_out,           /* r6 */
-       &remote_vaddr_out,         /* r7 */
-       &acc_ctrl_pd_out,          /* r8 */
-       &r9_out,
-       &dummy);
-
- outparms->acl  = acc_ctrl_pd_out >> 32;
- outparms->lkey = (u32)(r9_out >> 32);
- outparms->rkey = (u32)(r9_out & (0xffffffff));
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_QUERY_MR, outs,
+    adapter_handle.handle,     /* r4 */
+    mr->ipz_mr_handle.handle,  /* r5 */
+    0, 0, 0, 0, 0, 0, 0);
+ outparms->len = outs[0];
+ outparms->vaddr = outs[1];
+ outparms->acl  = outs[4] >> 32;
+ outparms->lkey = (u32)(outs[5] >> 32);
+ outparms->rkey = (u32)(outs[5] & (0xffffffff));
 
  return ret;
 }
@@ -957,19 +751,10 @@ u64 hipz_h_query_mr(const struct ipz_ada
 u64 hipz_h_free_resource_mr(const struct ipz_adapter_handle adapter_handle,
        const struct ehca_mr *mr)
 {
- u64 dummy;
-
- return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-        adapter_handle.handle,    /* r4 */
-        mr->ipz_mr_handle.handle, /* r5 */
-        0, 0, 0, 0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+           adapter_handle.handle,    /* r4 */
+           mr->ipz_mr_handle.handle, /* r5 */
+           0, 0, 0, 0, 0);
 }
 
 u64 hipz_h_reregister_pmr(const struct ipz_adapter_handle adapter_handle,
@@ -982,28 +767,20 @@ u64 hipz_h_reregister_pmr(const struct i
      struct ehca_mr_hipzout_parms *outparms)
 {
  u64 ret;
- u64 dummy;
- u64 lkey_out, rkey_out;
-
- ret = ehca_hcall_7arg_7ret(H_REREGISTER_PMR,
-       adapter_handle.handle,    /* r4 */
-       mr->ipz_mr_handle.handle, /* r5 */
-       vaddr_in,              /* r6 */
-       length,                   /* r7 */
-       /* r8 */
-       ((((u64)access_ctrl) << 32ULL) | pd.value),
-       mr_addr_cb,               /* r9 */
-       0,
-       &dummy,                   /* r4 */
-       &outparms->vaddr,         /* r5 */
-       &lkey_out,                /* r6 */
-       &rkey_out,                /* r7 */
-       &dummy,
-       &dummy,
-       &dummy);
-
- outparms->lkey = (u32)lkey_out;
- outparms->rkey = (u32)rkey_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_REREGISTER_PMR, outs,
+    adapter_handle.handle,    /* r4 */
+    mr->ipz_mr_handle.handle, /* r5 */
+    vaddr_in,           /* r6 */
+    length,                   /* r7 */
+    /* r8 */
+    ((((u64)access_ctrl) << 32ULL) | pd.value),
+    mr_addr_cb,               /* r9 */
+    0, 0, 0);
+ outparms->vaddr = outs[1];
+ outparms->lkey = (u32)outs[2];
+ outparms->rkey = (u32)outs[3];
 
  return ret;
 }
@@ -1017,25 +794,18 @@ u64 hipz_h_register_smr(const struct ipz
    struct ehca_mr_hipzout_parms *outparms)
 {
  u64 ret;
- u64 dummy;
- u64 lkey_out, rkey_out;
-
- ret = ehca_hcall_7arg_7ret(H_REGISTER_SMR,
-       adapter_handle.handle,            /* r4 */
-       orig_mr->ipz_mr_handle.handle,    /* r5 */
-       vaddr_in,                         /* r6 */
-       (((u64)access_ctrl) << 32ULL),    /* r7 */
-       pd.value,                         /* r8 */
-       0, 0,
-       &(outparms->handle.handle),       /* r4 */
-       &dummy,                           /* r5 */
-       &lkey_out,                        /* r6 */
-       &rkey_out,                        /* r7 */
-       &dummy,
-       &dummy,
-       &dummy);
- outparms->lkey = (u32)lkey_out;
- outparms->rkey = (u32)rkey_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_REGISTER_SMR, outs,
+    adapter_handle.handle,            /* r4 */
+    orig_mr->ipz_mr_handle.handle,    /* r5 */
+    vaddr_in,                         /* r6 */
+    (((u64)access_ctrl) << 32ULL),    /* r7 */
+    pd.value,                         /* r8 */
+    0, 0, 0, 0);
+ outparms->handle.handle = outs[0];
+ outparms->lkey = (u32)outs[2];
+ outparms->rkey = (u32)outs[3];
 
  return ret;
 }
@@ -1046,23 +816,15 @@ u64 hipz_h_alloc_resource_mw(const struc
         struct ehca_mw_hipzout_parms *outparms)
 {
  u64 ret;
- u64 dummy;
- u64 rkey_out;
-
- ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-       adapter_handle.handle,      /* r4 */
-       6,                          /* r5 */
-       pd.value,                   /* r6 */
-       0, 0, 0, 0,
-       &(outparms->handle.handle), /* r4 */
-       &dummy,                     /* r5 */
-       &dummy,                     /* r6 */
-       &rkey_out,                  /* r7 */
-       &dummy,
-       &dummy,
-       &dummy);
-
- outparms->rkey = (u32)rkey_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+    adapter_handle.handle,      /* r4 */
+    6,                          /* r5 */
+    pd.value,                   /* r6 */
+    0, 0, 0, 0, 0, 0);
+ outparms->handle.handle = outs[0];
+ outparms->rkey = (u32)outs[3];
 
  return ret;
 }
@@ -1072,21 +834,13 @@ u64 hipz_h_query_mw(const struct ipz_ada
       struct ehca_mw_hipzout_parms *outparms)
 {
  u64 ret;
- u64 dummy;
- u64 pd_out, rkey_out;
-
- ret = ehca_hcall_7arg_7ret(H_QUERY_MW,
-       adapter_handle.handle,    /* r4 */
-       mw->ipz_mw_handle.handle, /* r5 */
-       0, 0, 0, 0, 0,
-       &dummy,                   /* r4 */
-       &dummy,                   /* r5 */
-       &dummy,                   /* r6 */
-       &rkey_out,                /* r7 */
-       &pd_out,                  /* r8 */
-       &dummy,
-       &dummy);
- outparms->rkey = (u32)rkey_out;
+ u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+ ret = ehca_plpar_hcall9(H_QUERY_MW, outs,
+    adapter_handle.handle,    /* r4 */
+    mw->ipz_mw_handle.handle, /* r5 */
+    0, 0, 0, 0, 0, 0, 0);
+ outparms->rkey = (u32)outs[3];
 
  return ret;
 }
@@ -1094,19 +848,10 @@ u64 hipz_h_query_mw(const struct ipz_ada
 u64 hipz_h_free_resource_mw(const struct ipz_adapter_handle adapter_handle,
        const struct ehca_mw *mw)
 {
- u64 dummy;
-
- return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-        adapter_handle.handle,    /* r4 */
-        mw->ipz_mw_handle.handle, /* r5 */
-        0, 0, 0, 0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+           adapter_handle.handle,    /* r4 */
+           mw->ipz_mw_handle.handle, /* r5 */
+           0, 0, 0, 0, 0);
 }
 
 u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle,
@@ -1114,7 +859,6 @@ u64 hipz_h_error_data(const struct ipz_a
         void *rblock,
         unsigned long *byte_count)
 {
- u64 dummy;
  u64 r_cb = virt_to_abs(rblock);
 
  if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -1122,16 +866,9 @@ u64 hipz_h_error_data(const struct ipz_a
   return H_PARAMETER;
  }
 
- return ehca_hcall_7arg_7ret(H_ERROR_DATA,
-        adapter_handle.handle,
-        ressource_handle,
-        r_cb,
-        0, 0, 0, 0,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy,
-        &dummy);
+ return ehca_plpar_hcall_norets(H_ERROR_DATA,
+           adapter_handle.handle,
+           ressource_handle,
+           r_cb,
+           0, 0, 0, 0);
 }
diff --git a/drivers/infiniband/hw/ehca/hcp_if.h b/drivers/infiniband/hw/ehca/hcp_if.h
index 39956d8..587ebd4 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.h
+++ b/drivers/infiniband/hw/ehca/hcp_if.h
@@ -107,7 +107,7 @@ u64 hipz_h_register_rpage_eq(const struc
         const u64 logical_address_of_page,
         const u64 count);
 
-u32 hipz_h_query_int_state(const struct ipz_adapter_handle
+u64 hipz_h_query_int_state(const struct ipz_adapter_handle
       hcp_adapter_handle,
       u32 ist);
 
diff --git a/drivers/infiniband/hw/ehca/hipz_hw.h b/drivers/infiniband/hw/ehca/hipz_hw.h
index f5f4871..3fc92b0 100644
--- a/drivers/infiniband/hw/ehca/hipz_hw.h
+++ b/drivers/infiniband/hw/ehca/hipz_hw.h
@@ -184,8 +184,6 @@ struct hipz_mrmwmm {
 
 };
 
-#define MRX_HCR_LPARID_VALID EHCA_BMASK_IBM(0,0)
-
 #define MRMWMM_OFFSET(x) offsetof(struct hipz_mrmwmm,x)
 
 struct hipz_qpedmm {
diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.h b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
index 7e55a31..2f13509 100644
--- a/drivers/infiniband/hw/ehca/ipz_pt_fn.h
+++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
@@ -226,10 +226,9 @@ static inline void *ipz_eqit_eq_get_inc_
 {
  void *ret = ipz_qeit_get(queue);
  u32 qe = *(u8 *) ret;
- if ((qe >> 7) == (queue->toggle_state & 1))
-  ipz_qeit_eq_get_inc(queue); /* this is a good one */
- else
-  ret = NULL;
+ if ((qe >> 7) != (queue->toggle_state & 1))
+  return NULL;
+ ipz_qeit_eq_get_inc(queue); /* this is a good one */
  return ret;
 }
 
-------------- next part --------------
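For readers following the hcall9 conversions in this patch: they all share
one pattern. The wrapper takes a caller-supplied array of nine outputs,
retries a bounded number of times while the hypervisor reports long-busy,
and the caller then picks the fields it needs out of the array by index.
Below is a minimal user-space sketch of that pattern. Everything prefixed
sketch_/SKETCH_ is hypothetical and only stands in for the kernel names
(PLPAR_HCALL9_BUFSIZE, plpar_hcall9, H_IS_LONG_BUSY, H_BUSY); the fake
hypervisor call, its return codes and the register values are invented for
illustration and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel constants used in the patch. */
#define SKETCH_HCALL9_BUFSIZE 9
#define SKETCH_SUCCESS        0L
#define SKETCH_BUSY           1L
#define SKETCH_LONG_BUSY      9900L
#define SKETCH_MAX_TRIES      5

/* Number of calls the fake hypervisor will still answer with long-busy. */
static int sketch_busy_left;

/* Fake hypervisor call: long-busy while the counter is positive, then
 * fills the output array with invented register contents. */
static long sketch_fake_hcall9(unsigned long opcode, uint64_t *outs,
			       uint64_t arg1, uint64_t arg2)
{
	(void)opcode; (void)arg1; (void)arg2;
	assert(outs != NULL);

	if (sketch_busy_left > 0) {
		sketch_busy_left--;
		return SKETCH_LONG_BUSY;
	}
	memset(outs, 0, SKETCH_HCALL9_BUFSIZE * sizeof(outs[0]));
	outs[0] = 0x1000;                       /* length */
	outs[1] = 0x2000;                       /* virtual address */
	outs[4] = (uint64_t)0x7 << 32;          /* access control, upper word */
	outs[5] = ((uint64_t)0x11223344 << 32)  /* lkey in the upper word */
		  | 0x55667788;                 /* rkey in the lower word */
	return SKETCH_SUCCESS;
}

/* Bounded retry loop in the style of ehca_plpar_hcall9(). The real
 * wrapper sleeps before retrying; the sketch simply loops. */
static long sketch_hcall9(unsigned long opcode, uint64_t *outs,
			  uint64_t arg1, uint64_t arg2)
{
	int i;

	for (i = 0; i < SKETCH_MAX_TRIES; i++) {
		long ret = sketch_fake_hcall9(opcode, outs, arg1, arg2);

		if (ret == SKETCH_LONG_BUSY)
			continue;
		return ret;
	}
	return SKETCH_BUSY;
}

struct sketch_mr_out {
	uint64_t len;
	uint64_t vaddr;
	uint32_t acl;
	uint32_t lkey;
	uint32_t rkey;
};

/* Caller side: unpack by index, as hipz_h_query_mr() does after the patch. */
long sketch_query_mr(struct sketch_mr_out *o)
{
	uint64_t outs[SKETCH_HCALL9_BUFSIZE];
	long ret = sketch_hcall9(0, outs, 0, 0);

	if (ret != SKETCH_SUCCESS)
		return ret;     /* outs[] was never written in this sketch */

	o->len   = outs[0];
	o->vaddr = outs[1];
	o->acl   = (uint32_t)(outs[4] >> 32);
	o->lkey  = (uint32_t)(outs[5] >> 32);
	o->rkey  = (uint32_t)(outs[5] & 0xffffffff);
	return ret;
}
```

One deliberate difference from the patch: the sketch checks the return code
before reading outs[], because its fake call leaves the array untouched on
failure. The patch copies the outputs unconditionally and leaves the check
to the callers.
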
diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index 159b0be..2380994 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -5,6 +5,7 @@
  *
  *  Authors: Heiko J Schick <schickhj at de.ibm.com>
  *           Hoang-Nam Nguyen <hnguyen at de.ibm.com>
+ *           Joachim Fenkes <fenkes at de.ibm.com>
  *
  *  Copyright (c) 2005 IBM Corporation
  *
@@ -48,7 +49,7 @@ #include "hcp_if.h"
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Christoph Raisch <raisch at de.ibm.com>");
 MODULE_DESCRIPTION("IBM eServer HCA InfiniBand Device Driver");
-MODULE_VERSION("SVNEHCA_0015");
+MODULE_VERSION("SVNEHCA_0016");
 
 int ehca_open_aqp1     = 0;
 int ehca_debug_level   = 0;
@@ -749,7 +750,7 @@ int __init ehca_module_init(void)
 	int ret;
 
 	printk(KERN_INFO "eHCA Infiniband Device Driver "
-	                 "(Rel.: SVNEHCA_0015)\n");
+	                 "(Rel.: SVNEHCA_0016)\n");
 	idr_init(&ehca_qp_idr);
 	idr_init(&ehca_cq_idr);
 	spin_lock_init(&ehca_qp_idr_lock);
diff --git a/drivers/infiniband/hw/ehca/hcp_if.c b/drivers/infiniband/hw/ehca/hcp_if.c
index 260e82a..3fb46e6 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.c
+++ b/drivers/infiniband/hw/ehca/hcp_if.c
@@ -48,27 +48,27 @@ #include "hcp_phyp.h"
 #include "hipz_fns.h"
 #include "ipz_pt_fn.h"
 
-#define H_ALL_RES_QP_ENHANCED_OPS       EHCA_BMASK_IBM(9,11)
-#define H_ALL_RES_QP_PTE_PIN            EHCA_BMASK_IBM(12,12)
-#define H_ALL_RES_QP_SERVICE_TYPE       EHCA_BMASK_IBM(13,15)
-#define H_ALL_RES_QP_LL_RQ_CQE_POSTING  EHCA_BMASK_IBM(18,18)
-#define H_ALL_RES_QP_LL_SQ_CQE_POSTING  EHCA_BMASK_IBM(19,21)
-#define H_ALL_RES_QP_SIGNALING_TYPE     EHCA_BMASK_IBM(22,23)
-#define H_ALL_RES_QP_UD_AV_LKEY_CTRL    EHCA_BMASK_IBM(31,31)
-#define H_ALL_RES_QP_RESOURCE_TYPE      EHCA_BMASK_IBM(56,63)
-
-#define H_ALL_RES_QP_MAX_OUTST_SEND_WR  EHCA_BMASK_IBM(0,15)
-#define H_ALL_RES_QP_MAX_OUTST_RECV_WR  EHCA_BMASK_IBM(16,31)
-#define H_ALL_RES_QP_MAX_SEND_SGE       EHCA_BMASK_IBM(32,39)
-#define H_ALL_RES_QP_MAX_RECV_SGE       EHCA_BMASK_IBM(40,47)
-
-#define H_ALL_RES_QP_ACT_OUTST_SEND_WR  EHCA_BMASK_IBM(16,31)
-#define H_ALL_RES_QP_ACT_OUTST_RECV_WR  EHCA_BMASK_IBM(48,63)
-#define H_ALL_RES_QP_ACT_SEND_SGE       EHCA_BMASK_IBM(8,15)
-#define H_ALL_RES_QP_ACT_RECV_SGE       EHCA_BMASK_IBM(24,31)
-
-#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(0,31)
-#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(32,63)
+#define H_ALL_RES_QP_ENHANCED_OPS       EHCA_BMASK_IBM(9, 11)
+#define H_ALL_RES_QP_PTE_PIN            EHCA_BMASK_IBM(12, 12)
+#define H_ALL_RES_QP_SERVICE_TYPE       EHCA_BMASK_IBM(13, 15)
+#define H_ALL_RES_QP_LL_RQ_CQE_POSTING  EHCA_BMASK_IBM(18, 18)
+#define H_ALL_RES_QP_LL_SQ_CQE_POSTING  EHCA_BMASK_IBM(19, 21)
+#define H_ALL_RES_QP_SIGNALING_TYPE     EHCA_BMASK_IBM(22, 23)
+#define H_ALL_RES_QP_UD_AV_LKEY_CTRL    EHCA_BMASK_IBM(31, 31)
+#define H_ALL_RES_QP_RESOURCE_TYPE      EHCA_BMASK_IBM(56, 63)
+
+#define H_ALL_RES_QP_MAX_OUTST_SEND_WR  EHCA_BMASK_IBM(0, 15)
+#define H_ALL_RES_QP_MAX_OUTST_RECV_WR  EHCA_BMASK_IBM(16, 31)
+#define H_ALL_RES_QP_MAX_SEND_SGE       EHCA_BMASK_IBM(32, 39)
+#define H_ALL_RES_QP_MAX_RECV_SGE       EHCA_BMASK_IBM(40, 47)
+
+#define H_ALL_RES_QP_ACT_OUTST_SEND_WR  EHCA_BMASK_IBM(16, 31)
+#define H_ALL_RES_QP_ACT_OUTST_RECV_WR  EHCA_BMASK_IBM(48, 63)
+#define H_ALL_RES_QP_ACT_SEND_SGE       EHCA_BMASK_IBM(8, 15)
+#define H_ALL_RES_QP_ACT_RECV_SGE       EHCA_BMASK_IBM(24, 31)
+
+#define H_ALL_RES_QP_SQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(0, 31)
+#define H_ALL_RES_QP_RQUEUE_SIZE_PAGES  EHCA_BMASK_IBM(32, 63)
 
 /* direct access qp controls */
 #define DAQP_CTRL_ENABLE    0x01
@@ -95,35 +95,25 @@ static u32 get_longbusy_msecs(int longbu
 	}
 }
 
-static long ehca_hcall_7arg_7ret(unsigned long opcode,
-				 unsigned long arg1,
-				 unsigned long arg2,
-				 unsigned long arg3,
-				 unsigned long arg4,
-				 unsigned long arg5,
-				 unsigned long arg6,
-				 unsigned long arg7,
-				 unsigned long *out1,
-				 unsigned long *out2,
-				 unsigned long *out3,
-				 unsigned long *out4,
-				 unsigned long *out5,
-				 unsigned long *out6,
-				 unsigned long *out7)
+static long ehca_plpar_hcall_norets(unsigned long opcode,
+				    unsigned long arg1,
+				    unsigned long arg2,
+				    unsigned long arg3,
+				    unsigned long arg4,
+				    unsigned long arg5,
+				    unsigned long arg6,
+				    unsigned long arg7)
 {
 	long ret;
 	int i, sleep_msecs;
 
-	ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx arg5=%lx "
-		     "arg6=%lx arg7=%lx", opcode, arg1, arg2, arg3, arg4, arg5,
-		     arg6, arg7);
+	ehca_gen_dbg("opcode=%lx arg1=%lx arg2=%lx arg3=%lx arg4=%lx "
+		     "arg5=%lx arg6=%lx arg7=%lx",
+		     opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7);
 
 	for (i = 0; i < 5; i++) {
-		ret = plpar_hcall_7arg_7ret(opcode,
-					    arg1, arg2, arg3, arg4,
-					    arg5, arg6, arg7,
-					    out1, out2, out3, out4,
-					    out5, out6,out7);
+		ret = plpar_hcall_norets(opcode, arg1, arg2, arg3, arg4,
+					 arg5, arg6, arg7);
 
 		if (H_IS_LONG_BUSY(ret)) {
 			sleep_msecs = get_longbusy_msecs(ret);
@@ -134,44 +124,30 @@ static long ehca_hcall_7arg_7ret(unsigne
 		if (ret < H_SUCCESS)
 			ehca_gen_err("opcode=%lx ret=%lx"
 				     " arg1=%lx arg2=%lx arg3=%lx arg4=%lx"
-				     " arg5=%lx arg6=%lx arg7=%lx"
-				     " out1=%lx out2=%lx out3=%lx out4=%lx"
-				     " out5=%lx out6=%lx out7=%lx",
+				     " arg5=%lx arg6=%lx arg7=%lx ",
 				     opcode, ret,
-				     arg1, arg2, arg3, arg4,
-				     arg5, arg6, arg7,
-				     *out1, *out2, *out3, *out4,
-				     *out5, *out6, *out7);
+				     arg1, arg2, arg3, arg4, arg5,
+				     arg6, arg7);
 
-		ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
-			     "out4=%lx out5=%lx out6=%lx out7=%lx",
-			     opcode, ret, *out1, *out2, *out3, *out4, *out5,
-			     *out6, *out7);
+		ehca_gen_dbg("opcode=%lx ret=%lx", opcode, ret);
 		return ret;
+
 	}
 
 	return H_BUSY;
 }
 
-static long ehca_hcall_9arg_9ret(unsigned long opcode,
-				 unsigned long arg1,
-				 unsigned long arg2,
-				 unsigned long arg3,
-				 unsigned long arg4,
-				 unsigned long arg5,
-				 unsigned long arg6,
-				 unsigned long arg7,
-				 unsigned long arg8,
-				 unsigned long arg9,
-				 unsigned long *out1,
-				 unsigned long *out2,
-				 unsigned long *out3,
-				 unsigned long *out4,
-				 unsigned long *out5,
-				 unsigned long *out6,
-				 unsigned long *out7,
-				 unsigned long *out8,
-				 unsigned long *out9)
+static long ehca_plpar_hcall9(unsigned long opcode,
+			      unsigned long *outs, /* array of 9 outputs */
+			      unsigned long arg1,
+			      unsigned long arg2,
+			      unsigned long arg3,
+			      unsigned long arg4,
+			      unsigned long arg5,
+			      unsigned long arg6,
+			      unsigned long arg7,
+			      unsigned long arg8,
+			      unsigned long arg9)
 {
 	long ret;
 	int i, sleep_msecs;
@@ -182,13 +158,9 @@ static long ehca_hcall_9arg_9ret(unsigne
 		     arg8, arg9);
 
 	for (i = 0; i < 5; i++) {
-		ret = plpar_hcall_9arg_9ret(opcode,
-					    arg1, arg2, arg3, arg4,
-					    arg5, arg6, arg7, arg8,
-					    arg9,
-					    out1, out2, out3, out4,
-					    out5, out6, out7, out8,
-					    out9);
+		ret = plpar_hcall9(opcode, outs,
+				   arg1, arg2, arg3, arg4, arg5,
+				   arg6, arg7, arg8, arg9);
 
 		if (H_IS_LONG_BUSY(ret)) {
 			sleep_msecs = get_longbusy_msecs(ret);
@@ -205,37 +177,35 @@ static long ehca_hcall_9arg_9ret(unsigne
 				     " out5=%lx out6=%lx out7=%lx out8=%lx"
 				     " out9=%lx",
 				     opcode, ret,
-				     arg1, arg2, arg3, arg4,
-				     arg5, arg6, arg7, arg8,
-				     arg9,
-				     *out1, *out2, *out3, *out4,
-				     *out5, *out6, *out7, *out8,
-				     *out9);
+				     arg1, arg2, arg3, arg4, arg5,
+				     arg6, arg7, arg8, arg9,
+				     outs[0], outs[1], outs[2], outs[3],
+				     outs[4], outs[5], outs[6], outs[7],
+				     outs[8]);
 
 		ehca_gen_dbg("opcode=%lx ret=%lx out1=%lx out2=%lx out3=%lx "
 			     "out4=%lx out5=%lx out6=%lx out7=%lx out8=%lx "
-			     "out9=%lx", opcode, ret,*out1, *out2, *out3, *out4,
-			     *out5, *out6, *out7, *out8, *out9);
+			     "out9=%lx",
+			     opcode, ret, outs[0], outs[1], outs[2], outs[3],
+			     outs[4], outs[5], outs[6], outs[7], outs[8]);
 		return ret;
 
 	}
 
 	return H_BUSY;
 }
-
 u64 hipz_h_alloc_resource_eq(const struct ipz_adapter_handle adapter_handle,
 			     struct ehca_pfeq *pfeq,
 			     const u32 neq_control,
 			     const u32 number_of_entries,
 			     struct ipz_eq_handle *eq_handle,
-			     u32 * act_nr_of_entries,
-			     u32 * act_pages,
-			     u32 * eq_ist)
+			     u32 *act_nr_of_entries,
+			     u32 *act_pages,
+			     u32 *eq_ist)
 {
 	u64 ret;
-	u64 dummy;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
 	u64 allocate_controls;
-	u64 act_nr_of_entries_out, act_pages_out, eq_ist_out;
 
 	/* resource type */
 	allocate_controls = 3ULL;
@@ -246,22 +216,15 @@ u64 hipz_h_alloc_resource_eq(const struc
 	else /* notification event queue */
 		allocate_controls = (1ULL << 63) | allocate_controls;
 
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-				   adapter_handle.handle,  /* r4 */
-				   allocate_controls,      /* r5 */
-				   number_of_entries,      /* r6 */
-				   0, 0, 0, 0,
-				   &eq_handle->handle,     /* r4 */
-				   &dummy,	           /* r5 */
-				   &dummy,	           /* r6 */
-				   &act_nr_of_entries_out, /* r7 */
-				   &act_pages_out,	   /* r8 */
-				   &eq_ist_out,            /* r8 */
-				   &dummy);
-
-	*act_nr_of_entries = (u32)act_nr_of_entries_out;
-	*act_pages         = (u32)act_pages_out;
-	*eq_ist            = (u32)eq_ist_out;
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle,  /* r4 */
+				allocate_controls,      /* r5 */
+				number_of_entries,      /* r6 */
+				0, 0, 0, 0, 0, 0);
+	eq_handle->handle = outs[0];
+	*act_nr_of_entries = (u32)outs[3];
+	*act_pages = (u32)outs[4];
+	*eq_ist = (u32)outs[5];
 
 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Not enough resource - ret=%lx ", ret);
@@ -273,20 +236,11 @@ u64 hipz_h_reset_event(const struct ipz_
 		       struct ipz_eq_handle eq_handle,
 		       const u64 event_mask)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_RESET_EVENTS,
-				    adapter_handle.handle, /* r4 */
-				    eq_handle.handle,      /* r5 */
-				    event_mask,	           /* r6 */
-				    0, 0, 0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_RESET_EVENTS,
+				       adapter_handle.handle, /* r4 */
+				       eq_handle.handle,      /* r5 */
+				       event_mask,	      /* r6 */
+				       0, 0, 0, 0);
 }
 
 u64 hipz_h_alloc_resource_cq(const struct ipz_adapter_handle adapter_handle,
@@ -294,30 +248,21 @@ u64 hipz_h_alloc_resource_cq(const struc
 			     struct ehca_alloc_cq_parms *param)
 {
 	u64 ret;
-	u64 dummy;
-	u64 act_nr_of_entries_out, act_pages_out;
-	u64 g_la_privileged_out, g_la_user_out;
-
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-				   adapter_handle.handle,     /* r4  */
-				   2,	                      /* r5  */
-				   param->eq_handle.handle,   /* r6  */
-				   cq->token,	              /* r7  */
-				   param->nr_cqe,             /* r8  */
-				   0, 0,
-				   &cq->ipz_cq_handle.handle, /* r4  */
-				   &dummy,	              /* r5  */
-				   &dummy,	              /* r6  */
-				   &act_nr_of_entries_out,    /* r7  */
-				   &act_pages_out,	      /* r8  */
-				   &g_la_privileged_out,      /* r9  */
-				   &g_la_user_out);           /* r10 */
-
-	param->act_nr_of_entries = (u32)act_nr_of_entries_out;
-	param->act_pages = (u32)act_pages_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle,   /* r4  */
+				2,	                 /* r5  */
+				param->eq_handle.handle, /* r6  */
+				cq->token,	         /* r7  */
+				param->nr_cqe,           /* r8  */
+				0, 0, 0, 0);
+	cq->ipz_cq_handle.handle = outs[0];
+	param->act_nr_of_entries = (u32)outs[3];
+	param->act_pages = (u32)outs[4];
 
 	if (ret == H_SUCCESS)
-		hcp_galpas_ctor(&cq->galpas, g_la_privileged_out, g_la_user_out);
+		hcp_galpas_ctor(&cq->galpas, outs[5], outs[6]);
 
 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Not enough resources. ret=%lx", ret);
@@ -330,8 +275,9 @@ u64 hipz_h_alloc_resource_qp(const struc
 			     struct ehca_alloc_qp_parms *parms)
 {
 	u64 ret;
-	u64 dummy, allocate_controls, max_r10_reg;
-	u64 qp_nr_out, r6_out, r7_out, r8_out, g_la_user_out, r11_out;
+	u64 allocate_controls;
+	u64 max_r10_reg;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
 	u16 max_nr_receive_wqes = qp->init_attr.cap.max_recv_wr + 1;
 	u16 max_nr_send_wqes = qp->init_attr.cap.max_send_wr + 1;
 	int daqp_ctrl = parms->daqp_ctrl;
@@ -360,48 +306,36 @@ u64 hipz_h_alloc_resource_qp(const struc
 		| EHCA_BMASK_SET(H_ALL_RES_QP_MAX_RECV_SGE,
 				 parms->max_recv_sge);
 
-
-	ret = ehca_hcall_9arg_9ret(H_ALLOC_RESOURCE,
-				   adapter_handle.handle,	      /* r4  */
-				   allocate_controls,	              /* r5  */
-				   qp->send_cq->ipz_cq_handle.handle,
-				   qp->recv_cq->ipz_cq_handle.handle,
-				   parms->ipz_eq_handle.handle,
-				   ((u64)qp->token << 32) | parms->pd.value,
-				   max_r10_reg,	                      /* r10 */
-				   parms->ud_av_l_key_ctl,            /* r11 */
-				   0,
-				   &qp->ipz_qp_handle.handle,
-				   &qp_nr_out,	                      /* r5  */
-				   &r6_out,	                      /* r6  */
-				   &r7_out,	                      /* r7  */
-				   &r8_out,	                      /* r8  */
-				   &dummy,	                      /* r9  */
-				   &g_la_user_out,	              /* r10 */
-				   &r11_out,
-				   &dummy);
-
-	/* extract outputs */
-	qp->real_qp_num = (u32)qp_nr_out;
-
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle,	           /* r4  */
+				allocate_controls,	           /* r5  */
+				qp->send_cq->ipz_cq_handle.handle,
+				qp->recv_cq->ipz_cq_handle.handle,
+				parms->ipz_eq_handle.handle,
+				((u64)qp->token << 32) | parms->pd.value,
+				max_r10_reg,	                   /* r10 */
+				parms->ud_av_l_key_ctl,            /* r11 */
+				0);
+	qp->ipz_qp_handle.handle = outs[0];
+	qp->real_qp_num = (u32)outs[1];
 	parms->act_nr_send_sges =
-		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, r6_out);
+		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_SEND_WR, outs[2]);
 	parms->act_nr_recv_wqes =
-		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, r6_out);
+		(u16)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_OUTST_RECV_WR, outs[2]);
 	parms->act_nr_send_sges =
-		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, r7_out);
+		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_SEND_SGE, outs[3]);
 	parms->act_nr_recv_sges =
-		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, r7_out);
+		(u8)EHCA_BMASK_GET(H_ALL_RES_QP_ACT_RECV_SGE, outs[3]);
 	parms->nr_sq_pages =
-		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, r8_out);
+		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_SQUEUE_SIZE_PAGES, outs[4]);
 	parms->nr_rq_pages =
-		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, r8_out);
+		(u32)EHCA_BMASK_GET(H_ALL_RES_QP_RQUEUE_SIZE_PAGES, outs[4]);
 
 	if (ret == H_SUCCESS)
-		hcp_galpas_ctor(&qp->galpas, g_la_user_out, g_la_user_out);
+		hcp_galpas_ctor(&qp->galpas, outs[6], outs[6]);
 
 	if (ret == H_NOT_ENOUGH_RESOURCES)
-		ehca_gen_err("Not enough resources. ret=%lx",ret);
+		ehca_gen_err("Not enough resources. ret=%lx", ret);
 
 	return ret;
 }
@@ -411,7 +345,6 @@ u64 hipz_h_query_port(const struct ipz_a
 		      struct hipz_query_port *query_port_response_block)
 {
 	u64 ret;
-	u64 dummy;
 	u64 r_cb = virt_to_abs(query_port_response_block);
 
 	if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -419,18 +352,11 @@ u64 hipz_h_query_port(const struct ipz_a
 		return H_PARAMETER;
 	}
 
-	ret = ehca_hcall_7arg_7ret(H_QUERY_PORT,
-				   adapter_handle.handle, /* r4 */
-				   port_id,	          /* r5 */
-				   r_cb,	          /* r6 */
-				   0, 0, 0, 0,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
+	ret = ehca_plpar_hcall_norets(H_QUERY_PORT,
+				      adapter_handle.handle, /* r4 */
+				      port_id,	             /* r5 */
+				      r_cb,	             /* r6 */
+				      0, 0, 0, 0);
 
 	if (ehca_debug_level)
 		ehca_dmp(query_port_response_block, 64, "response_block");
@@ -441,7 +367,6 @@ u64 hipz_h_query_port(const struct ipz_a
 u64 hipz_h_query_hca(const struct ipz_adapter_handle adapter_handle,
 		     struct hipz_query_hca *query_hca_rblock)
 {
-	u64 dummy;
 	u64 r_cb = virt_to_abs(query_hca_rblock);
 
 	if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -450,17 +375,10 @@ u64 hipz_h_query_hca(const struct ipz_ad
 		return H_PARAMETER;
 	}
 
-	return ehca_hcall_7arg_7ret(H_QUERY_HCA,
-				    adapter_handle.handle, /* r4 */
-				    r_cb,                  /* r5 */
-				    0, 0, 0, 0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_QUERY_HCA,
+				       adapter_handle.handle, /* r4 */
+				       r_cb,                  /* r5 */
+				       0, 0, 0, 0, 0);
 }
 
 u64 hipz_h_register_rpage(const struct ipz_adapter_handle adapter_handle,
@@ -470,22 +388,13 @@ u64 hipz_h_register_rpage(const struct i
 			  const u64 logical_address_of_page,
 			  u64 count)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_REGISTER_RPAGES,
-				    adapter_handle.handle,      /* r4  */
-				    queue_type | pagesize << 8, /* r5  */
-				    resource_handle,	        /* r6  */
-				    logical_address_of_page,    /* r7  */
-				    count,	                /* r8  */
-				    0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_REGISTER_RPAGES,
+				       adapter_handle.handle,      /* r4  */
+				       queue_type | pagesize << 8, /* r5  */
+				       resource_handle,	           /* r6  */
+				       logical_address_of_page,    /* r7  */
+				       count,	                   /* r8  */
+				       0, 0);
 }
 
 u64 hipz_h_register_rpage_eq(const struct ipz_adapter_handle adapter_handle,
@@ -507,23 +416,14 @@ u64 hipz_h_register_rpage_eq(const struc
 				     logical_address_of_page, count);
 }
 
-u32 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle,
+u64 hipz_h_query_int_state(const struct ipz_adapter_handle adapter_handle,
 			   u32 ist)
 {
-	u32 ret;
-	u64 dummy;
-
-	ret = ehca_hcall_7arg_7ret(H_QUERY_INT_STATE,
-				   adapter_handle.handle, /* r4 */
-				   ist,                   /* r5 */
-				   0, 0, 0, 0, 0,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
+	u64 ret;
+	ret = ehca_plpar_hcall_norets(H_QUERY_INT_STATE,
+				      adapter_handle.handle, /* r4 */
+				      ist,                   /* r5 */
+				      0, 0, 0, 0, 0);
 
 	if (ret != H_SUCCESS && ret != H_BUSY)
 		ehca_gen_err("Could not query interrupt state.");
@@ -576,25 +476,20 @@ u64 hipz_h_disable_and_get_wqe(const str
 			       void **log_addr_next_rq_wqe2processed,
 			       int dis_and_get_function_code)
 {
-	u64 dummy, dummy1, dummy2;
-
-	if (!log_addr_next_sq_wqe2processed)
-		log_addr_next_sq_wqe2processed = (void**)&dummy1;
-	if (!log_addr_next_rq_wqe2processed)
-		log_addr_next_rq_wqe2processed = (void**)&dummy2;
-
-	return ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
-				    adapter_handle.handle,     /* r4 */
-				    dis_and_get_function_code, /* r5 */
-				    qp_handle.handle,	       /* r6 */
-				    0, 0, 0, 0,
-				    (void*)log_addr_next_sq_wqe2processed,
-				    (void*)log_addr_next_rq_wqe2processed,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	u64 ret;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs,
+				adapter_handle.handle,     /* r4 */
+				dis_and_get_function_code, /* r5 */
+				qp_handle.handle,	   /* r6 */
+				0, 0, 0, 0, 0, 0);
+	if (log_addr_next_sq_wqe2processed)
+		*log_addr_next_sq_wqe2processed = (void*)outs[0];
+	if (log_addr_next_rq_wqe2processed)
+		*log_addr_next_rq_wqe2processed = (void*)outs[1];
+
+	return ret;
 }
 
 u64 hipz_h_modify_qp(const struct ipz_adapter_handle adapter_handle,
@@ -605,22 +500,13 @@ u64 hipz_h_modify_qp(const struct ipz_ad
 		     struct h_galpa gal)
 {
 	u64 ret;
-	u64 dummy;
-	u64 invalid_attribute_identifier, rc_attrib_mask;
-
-	ret = ehca_hcall_7arg_7ret(H_MODIFY_QP,
-				   adapter_handle.handle,         /* r4 */
-				   qp_handle.handle,	          /* r5 */
-				   update_mask,	                  /* r6 */
-				   virt_to_abs(mqpcb),	          /* r7 */
-				   0, 0, 0,
-				   &invalid_attribute_identifier, /* r4 */
-				   &dummy,	                  /* r5 */
-				   &dummy,	                  /* r6 */
-				   &dummy,                        /* r7 */
-				   &dummy,	                  /* r8 */
-				   &rc_attrib_mask,               /* r9 */
-				   &dummy);
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+	ret = ehca_plpar_hcall9(H_MODIFY_QP, outs,
+				adapter_handle.handle, /* r4 */
+				qp_handle.handle,      /* r5 */
+				update_mask,	       /* r6 */
+				virt_to_abs(mqpcb),    /* r7 */
+				0, 0, 0, 0, 0);
 
 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Insufficient resources ret=%lx", ret);
@@ -634,61 +520,37 @@ u64 hipz_h_query_qp(const struct ipz_ada
 		    struct hcp_modify_qp_control_block *qqpcb,
 		    struct h_galpa gal)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_QUERY_QP,
-				    adapter_handle.handle, /* r4 */
-				    qp_handle.handle,      /* r5 */
-				    virt_to_abs(qqpcb),	   /* r6 */
-				    0, 0, 0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_QUERY_QP,
+				       adapter_handle.handle, /* r4 */
+				       qp_handle.handle,      /* r5 */
+				       virt_to_abs(qqpcb),    /* r6 */
+				       0, 0, 0, 0);
 }
 
 u64 hipz_h_destroy_qp(const struct ipz_adapter_handle adapter_handle,
 		      struct ehca_qp *qp)
 {
 	u64 ret;
-	u64 dummy;
-	u64 ladr_next_sq_wqe_out, ladr_next_rq_wqe_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
 
 	ret = hcp_galpas_dtor(&qp->galpas);
 	if (ret) {
 		ehca_gen_err("Could not destruct qp->galpas");
 		return H_RESOURCE;
 	}
-	ret = ehca_hcall_7arg_7ret(H_DISABLE_AND_GETC,
-				   adapter_handle.handle,     /* r4 */
-				   /* function code */
-				   1,	                      /* r5 */
-				   qp->ipz_qp_handle.handle,  /* r6 */
-				   0, 0, 0, 0,
-				   &ladr_next_sq_wqe_out,     /* r4 */
-				   &ladr_next_rq_wqe_out,     /* r5 */
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
+	ret = ehca_plpar_hcall9(H_DISABLE_AND_GETC, outs,
+				adapter_handle.handle,     /* r4 */
+				/* function code */
+				1,	                   /* r5 */
+				qp->ipz_qp_handle.handle,  /* r6 */
+				0, 0, 0, 0, 0, 0);
 	if (ret == H_HARDWARE)
 		ehca_gen_err("HCA not operational. ret=%lx", ret);
 
-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				   adapter_handle.handle,     /* r4 */
-				   qp->ipz_qp_handle.handle,  /* r5 */
-				   0, 0, 0, 0, 0,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				      adapter_handle.handle,     /* r4 */
+				      qp->ipz_qp_handle.handle,  /* r5 */
+				      0, 0, 0, 0, 0);
 
 	if (ret == H_RESOURCE)
 		ehca_gen_err("Resource still in use. ret=%lx", ret);
@@ -701,20 +563,11 @@ u64 hipz_h_define_aqp0(const struct ipz_
 		       struct h_galpa gal,
 		       u32 port)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_DEFINE_AQP0,
-				    adapter_handle.handle, /* r4 */
-				    qp_handle.handle,      /* r5 */
-				    port,                  /* r6 */
-				    0, 0, 0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_DEFINE_AQP0,
+				       adapter_handle.handle, /* r4 */
+				       qp_handle.handle,      /* r5 */
+				       port,                  /* r6 */
+				       0, 0, 0, 0);
 }
 
 u64 hipz_h_define_aqp1(const struct ipz_adapter_handle adapter_handle,
@@ -724,24 +577,15 @@ u64 hipz_h_define_aqp1(const struct ipz_
 		       u32 * bma_qp_nr)
 {
 	u64 ret;
-	u64 dummy;
-	u64 pma_qp_nr_out, bma_qp_nr_out;
-
-	ret = ehca_hcall_7arg_7ret(H_DEFINE_AQP1,
-				   adapter_handle.handle, /* r4 */
-				   qp_handle.handle,      /* r5 */
-				   port,	          /* r6 */
-				   0, 0, 0, 0,
-				   &pma_qp_nr_out,        /* r4 */
-				   &bma_qp_nr_out,        /* r5 */
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
-
-	*pma_qp_nr = (u32)pma_qp_nr_out;
-	*bma_qp_nr = (u32)bma_qp_nr_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_DEFINE_AQP1, outs,
+				adapter_handle.handle, /* r4 */
+				qp_handle.handle,      /* r5 */
+				port,	               /* r6 */
+				0, 0, 0, 0, 0, 0);
+	*pma_qp_nr = (u32)outs[0];
+	*bma_qp_nr = (u32)outs[1];
 
 	if (ret == H_ALIAS_EXIST)
 		ehca_gen_err("AQP1 already exists. ret=%lx", ret);
@@ -756,22 +600,14 @@ u64 hipz_h_attach_mcqp(const struct ipz_
 		       u64 subnet_prefix, u64 interface_id)
 {
 	u64 ret;
-	u64 dummy;
-
-	ret = ehca_hcall_7arg_7ret(H_ATTACH_MCQP,
-				   adapter_handle.handle,     /* r4 */
-				   qp_handle.handle,          /* r5 */
-				   mcg_dlid,                  /* r6 */
-				   interface_id,              /* r7 */
-				   subnet_prefix,             /* r8 */
-				   0, 0,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
+
+	ret = ehca_plpar_hcall_norets(H_ATTACH_MCQP,
+				      adapter_handle.handle,  /* r4 */
+				      qp_handle.handle,       /* r5 */
+				      mcg_dlid,               /* r6 */
+				      interface_id,           /* r7 */
+				      subnet_prefix,          /* r8 */
+				      0, 0);
 
 	if (ret == H_NOT_ENOUGH_RESOURCES)
 		ehca_gen_err("Not enough resources. ret=%lx", ret);
@@ -785,22 +621,13 @@ u64 hipz_h_detach_mcqp(const struct ipz_
 		       u16 mcg_dlid,
 		       u64 subnet_prefix, u64 interface_id)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_DETACH_MCQP,
-				    adapter_handle.handle, /* r4 */
-				    qp_handle.handle,	   /* r5 */
-				    mcg_dlid,	           /* r6 */
-				    interface_id,          /* r7 */
-				    subnet_prefix,         /* r8 */
-				    0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_DETACH_MCQP,
+				       adapter_handle.handle, /* r4 */
+				       qp_handle.handle,      /* r5 */
+				       mcg_dlid,              /* r6 */
+				       interface_id,          /* r7 */
+				       subnet_prefix,         /* r8 */
+				       0, 0);
 }
 
 u64 hipz_h_destroy_cq(const struct ipz_adapter_handle adapter_handle,
@@ -808,7 +635,6 @@ u64 hipz_h_destroy_cq(const struct ipz_a
 		      u8 force_flag)
 {
 	u64 ret;
-	u64 dummy;
 
 	ret = hcp_galpas_dtor(&cq->galpas);
 	if (ret) {
@@ -816,18 +642,11 @@ u64 hipz_h_destroy_cq(const struct ipz_a
 		return H_RESOURCE;
 	}
 
-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				   adapter_handle.handle,     /* r4 */
-				   cq->ipz_cq_handle.handle,  /* r5 */
-				   force_flag != 0 ? 1L : 0L, /* r6 */
-				   0, 0, 0, 0,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				      adapter_handle.handle,     /* r4 */
+				      cq->ipz_cq_handle.handle,  /* r5 */
+				      force_flag != 0 ? 1L : 0L, /* r6 */
+				      0, 0, 0, 0);
 
 	if (ret == H_RESOURCE)
 		ehca_gen_err("H_FREE_RESOURCE failed ret=%lx ", ret);
@@ -839,7 +658,6 @@ u64 hipz_h_destroy_eq(const struct ipz_a
 		      struct ehca_eq *eq)
 {
 	u64 ret;
-	u64 dummy;
 
 	ret = hcp_galpas_dtor(&eq->galpas);
 	if (ret) {
@@ -847,18 +665,10 @@ u64 hipz_h_destroy_eq(const struct ipz_a
 		return H_RESOURCE;
 	}
 
-	ret = ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				   adapter_handle.handle,     /* r4 */
-				   eq->ipz_eq_handle.handle,  /* r5 */
-				   0, 0, 0, 0, 0,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy,
-				   &dummy);
-
+	ret = ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				      adapter_handle.handle,     /* r4 */
+				      eq->ipz_eq_handle.handle,  /* r5 */
+				      0, 0, 0, 0, 0);
 
 	if (ret == H_RESOURCE)
 		ehca_gen_err("Resource in use. ret=%lx ", ret);
@@ -875,27 +685,19 @@ u64 hipz_h_alloc_resource_mr(const struc
 			     struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out;
-	u64 rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-				   adapter_handle.handle,            /* r4 */
-				   5,                                /* r5 */
-				   vaddr,                            /* r6 */
-				   length,                           /* r7 */
-				   (((u64)access_ctrl) << 32ULL),    /* r8 */
-				   pd.value,                         /* r9 */
-				   0,
-				   &(outparms->handle.handle),       /* r4 */
-				   &dummy,                           /* r5 */
-				   &lkey_out,                        /* r6 */
-				   &rkey_out,                        /* r7 */
-				   &dummy,
-				   &dummy,
-				   &dummy);
-	outparms->lkey = (u32)lkey_out;
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle,            /* r4 */
+				5,                                /* r5 */
+				vaddr,                            /* r6 */
+				length,                           /* r7 */
+				(((u64)access_ctrl) << 32ULL),    /* r8 */
+				pd.value,                         /* r9 */
+				0, 0, 0);
+	outparms->handle.handle = outs[0];
+	outparms->lkey = (u32)outs[2];
+	outparms->rkey = (u32)outs[3];
 
 	return ret;
 }
@@ -923,7 +725,6 @@ u64 hipz_h_register_rpage_mr(const struc
 					    queue_type,
 					    mr->ipz_mr_handle.handle,
 					    logical_address_of_page, count);
-
 	return ret;
 }
 
@@ -932,24 +733,17 @@ u64 hipz_h_query_mr(const struct ipz_ada
 		    struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 remote_len_out, remote_vaddr_out, acc_ctrl_pd_out, r9_out;
-
-	ret = ehca_hcall_7arg_7ret(H_QUERY_MR,
-				   adapter_handle.handle,     /* r4 */
-				   mr->ipz_mr_handle.handle,  /* r5 */
-				   0, 0, 0, 0, 0,
-				   &outparms->len,            /* r4 */
-				   &outparms->vaddr,          /* r5 */
-				   &remote_len_out,           /* r6 */
-				   &remote_vaddr_out,         /* r7 */
-				   &acc_ctrl_pd_out,          /* r8 */
-				   &r9_out,
-				   &dummy);
-
-	outparms->acl  = acc_ctrl_pd_out >> 32;
-	outparms->lkey = (u32)(r9_out >> 32);
-	outparms->rkey = (u32)(r9_out & (0xffffffff));
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_QUERY_MR, outs,
+				adapter_handle.handle,     /* r4 */
+				mr->ipz_mr_handle.handle,  /* r5 */
+				0, 0, 0, 0, 0, 0, 0);
+	outparms->len = outs[0];
+	outparms->vaddr = outs[1];
+	outparms->acl  = outs[4] >> 32;
+	outparms->lkey = (u32)(outs[5] >> 32);
+	outparms->rkey = (u32)(outs[5] & (0xffffffff));
 
 	return ret;
 }
@@ -957,19 +751,10 @@ u64 hipz_h_query_mr(const struct ipz_ada
 u64 hipz_h_free_resource_mr(const struct ipz_adapter_handle adapter_handle,
 			    const struct ehca_mr *mr)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				    adapter_handle.handle,    /* r4 */
-				    mr->ipz_mr_handle.handle, /* r5 */
-				    0, 0, 0, 0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				       adapter_handle.handle,    /* r4 */
+				       mr->ipz_mr_handle.handle, /* r5 */
+				       0, 0, 0, 0, 0);
 }
 
 u64 hipz_h_reregister_pmr(const struct ipz_adapter_handle adapter_handle,
@@ -982,28 +767,20 @@ u64 hipz_h_reregister_pmr(const struct i
 			  struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out, rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_REREGISTER_PMR,
-				   adapter_handle.handle,    /* r4 */
-				   mr->ipz_mr_handle.handle, /* r5 */
-				   vaddr_in,	             /* r6 */
-				   length,                   /* r7 */
-				   /* r8 */
-				   ((((u64)access_ctrl) << 32ULL) | pd.value),
-				   mr_addr_cb,               /* r9 */
-				   0,
-				   &dummy,                   /* r4 */
-				   &outparms->vaddr,         /* r5 */
-				   &lkey_out,                /* r6 */
-				   &rkey_out,                /* r7 */
-				   &dummy,
-				   &dummy,
-				   &dummy);
-
-	outparms->lkey = (u32)lkey_out;
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_REREGISTER_PMR, outs,
+				adapter_handle.handle,    /* r4 */
+				mr->ipz_mr_handle.handle, /* r5 */
+				vaddr_in,	          /* r6 */
+				length,                   /* r7 */
+				/* r8 */
+				((((u64)access_ctrl) << 32ULL) | pd.value),
+				mr_addr_cb,               /* r9 */
+				0, 0, 0);
+	outparms->vaddr = outs[1];
+	outparms->lkey = (u32)outs[2];
+	outparms->rkey = (u32)outs[3];
 
 	return ret;
 }
@@ -1017,25 +794,18 @@ u64 hipz_h_register_smr(const struct ipz
 			struct ehca_mr_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 lkey_out, rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_REGISTER_SMR,
-				   adapter_handle.handle,            /* r4 */
-				   orig_mr->ipz_mr_handle.handle,    /* r5 */
-				   vaddr_in,                         /* r6 */
-				   (((u64)access_ctrl) << 32ULL),    /* r7 */
-				   pd.value,                         /* r8 */
-				   0, 0,
-				   &(outparms->handle.handle),       /* r4 */
-				   &dummy,                           /* r5 */
-				   &lkey_out,                        /* r6 */
-				   &rkey_out,                        /* r7 */
-				   &dummy,
-				   &dummy,
-				   &dummy);
-	outparms->lkey = (u32)lkey_out;
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_REGISTER_SMR, outs,
+				adapter_handle.handle,            /* r4 */
+				orig_mr->ipz_mr_handle.handle,    /* r5 */
+				vaddr_in,                         /* r6 */
+				(((u64)access_ctrl) << 32ULL),    /* r7 */
+				pd.value,                         /* r8 */
+				0, 0, 0, 0);
+	outparms->handle.handle = outs[0];
+	outparms->lkey = (u32)outs[2];
+	outparms->rkey = (u32)outs[3];
 
 	return ret;
 }
@@ -1046,23 +816,15 @@ u64 hipz_h_alloc_resource_mw(const struc
 			     struct ehca_mw_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_ALLOC_RESOURCE,
-				   adapter_handle.handle,      /* r4 */
-				   6,                          /* r5 */
-				   pd.value,                   /* r6 */
-				   0, 0, 0, 0,
-				   &(outparms->handle.handle), /* r4 */
-				   &dummy,                     /* r5 */
-				   &dummy,                     /* r6 */
-				   &rkey_out,                  /* r7 */
-				   &dummy,
-				   &dummy,
-				   &dummy);
-
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs,
+				adapter_handle.handle,      /* r4 */
+				6,                          /* r5 */
+				pd.value,                   /* r6 */
+				0, 0, 0, 0, 0, 0);
+	outparms->handle.handle = outs[0];
+	outparms->rkey = (u32)outs[3];
 
 	return ret;
 }
@@ -1072,21 +834,13 @@ u64 hipz_h_query_mw(const struct ipz_ada
 		    struct ehca_mw_hipzout_parms *outparms)
 {
 	u64 ret;
-	u64 dummy;
-	u64 pd_out, rkey_out;
-
-	ret = ehca_hcall_7arg_7ret(H_QUERY_MW,
-				   adapter_handle.handle,    /* r4 */
-				   mw->ipz_mw_handle.handle, /* r5 */
-				   0, 0, 0, 0, 0,
-				   &dummy,                   /* r4 */
-				   &dummy,                   /* r5 */
-				   &dummy,                   /* r6 */
-				   &rkey_out,                /* r7 */
-				   &pd_out,                  /* r8 */
-				   &dummy,
-				   &dummy);
-	outparms->rkey = (u32)rkey_out;
+	u64 outs[PLPAR_HCALL9_BUFSIZE];
+
+	ret = ehca_plpar_hcall9(H_QUERY_MW, outs,
+				adapter_handle.handle,    /* r4 */
+				mw->ipz_mw_handle.handle, /* r5 */
+				0, 0, 0, 0, 0, 0, 0);
+	outparms->rkey = (u32)outs[3];
 
 	return ret;
 }
@@ -1094,19 +848,10 @@ u64 hipz_h_query_mw(const struct ipz_ada
 u64 hipz_h_free_resource_mw(const struct ipz_adapter_handle adapter_handle,
 			    const struct ehca_mw *mw)
 {
-	u64 dummy;
-
-	return ehca_hcall_7arg_7ret(H_FREE_RESOURCE,
-				    adapter_handle.handle,    /* r4 */
-				    mw->ipz_mw_handle.handle, /* r5 */
-				    0, 0, 0, 0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_FREE_RESOURCE,
+				       adapter_handle.handle,    /* r4 */
+				       mw->ipz_mw_handle.handle, /* r5 */
+				       0, 0, 0, 0, 0);
 }
 
 u64 hipz_h_error_data(const struct ipz_adapter_handle adapter_handle,
@@ -1114,7 +859,6 @@ u64 hipz_h_error_data(const struct ipz_a
 		      void *rblock,
 		      unsigned long *byte_count)
 {
-	u64 dummy;
 	u64 r_cb = virt_to_abs(rblock);
 
 	if (r_cb & (EHCA_PAGESIZE-1)) {
@@ -1122,16 +866,9 @@ u64 hipz_h_error_data(const struct ipz_a
 		return H_PARAMETER;
 	}
 
-	return ehca_hcall_7arg_7ret(H_ERROR_DATA,
-				    adapter_handle.handle,
-				    ressource_handle,
-				    r_cb,
-				    0, 0, 0, 0,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy,
-				    &dummy);
+	return ehca_plpar_hcall_norets(H_ERROR_DATA,
+				       adapter_handle.handle,
+				       ressource_handle,
+				       r_cb,
+				       0, 0, 0, 0);
 }
diff --git a/drivers/infiniband/hw/ehca/hcp_if.h b/drivers/infiniband/hw/ehca/hcp_if.h
index 39956d8..587ebd4 100644
--- a/drivers/infiniband/hw/ehca/hcp_if.h
+++ b/drivers/infiniband/hw/ehca/hcp_if.h
@@ -107,7 +107,7 @@ u64 hipz_h_register_rpage_eq(const struc
 			     const u64 logical_address_of_page,
 			     const u64 count);
 
-u32 hipz_h_query_int_state(const struct ipz_adapter_handle
+u64 hipz_h_query_int_state(const struct ipz_adapter_handle
 			   hcp_adapter_handle,
 			   u32 ist);
 
diff --git a/drivers/infiniband/hw/ehca/hipz_hw.h b/drivers/infiniband/hw/ehca/hipz_hw.h
index f5f4871..3fc92b0 100644
--- a/drivers/infiniband/hw/ehca/hipz_hw.h
+++ b/drivers/infiniband/hw/ehca/hipz_hw.h
@@ -184,8 +184,6 @@ struct hipz_mrmwmm {
 
 };
 
-#define MRX_HCR_LPARID_VALID EHCA_BMASK_IBM(0,0)
-
 #define MRMWMM_OFFSET(x) offsetof(struct hipz_mrmwmm,x)
 
 struct hipz_qpedmm {
diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.h b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
index 7e55a31..2f13509 100644
--- a/drivers/infiniband/hw/ehca/ipz_pt_fn.h
+++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
@@ -226,10 +226,9 @@ static inline void *ipz_eqit_eq_get_inc_
 {
 	void *ret = ipz_qeit_get(queue);
 	u32 qe = *(u8 *) ret;
-	if ((qe >> 7) == (queue->toggle_state & 1))
-		ipz_qeit_eq_get_inc(queue); /* this is a good one */
-	else
-		ret = NULL;
+	if ((qe >> 7) != (queue->toggle_state & 1))
+		return NULL;
+	ipz_qeit_eq_get_inc(queue); /* this is a good one */
 	return ret;
 }
 
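The last hunk above turns the toggle-bit test into an early return. As a
rough sketch of the idea only (demo_queue and demo_get_inc_valid are
hypothetical stand-ins, not the driver's struct ipz_queue or its helpers,
and the wrap-around handling is an assumption about how the increment
step behaves): the consumer accepts an entry only when bit 7 of its first
byte matches the low bit of the toggle state, and flips the expected value
whenever it wraps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's queue; not the real struct ipz_queue. */
struct demo_queue {
	uint8_t *entries;      /* one byte per entry; bit 7 is the valid/toggle bit */
	size_t nr_entries;
	size_t current;        /* index of the next entry to examine */
	uint32_t toggle_state; /* low bit flips each time the consumer wraps */
};

/* Return the next entry if its toggle bit matches, else NULL. */
static uint8_t *demo_get_inc_valid(struct demo_queue *q)
{
	uint8_t *entry;

	assert(q && q->entries && q->nr_entries > 0);
	entry = &q->entries[q->current];

	/* Early return, as in the patched code: a mismatch means "not ready". */
	if ((uint32_t)(*entry >> 7) != (q->toggle_state & 1))
		return NULL;

	/* Consume the entry; flip the expected toggle value on wrap-around. */
	if (++q->current == q->nr_entries) {
		q->current = 0;
		q->toggle_state ^= 1;
	}
	return entry;
}
```

With the early return, the "good" path is the straight-line fall-through,
which is the usual kernel style for this kind of check.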

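The hcp_if.c hunks in the patch above all follow one pattern: instead of
passing seven separate output pointers (most of them &dummy), the caller
hands in one outs[] array and picks the slots it needs afterwards. A
minimal sketch of that pattern, modelled on the H_QUERY_MR hunk; every
demo_* name, the buffer size of 9, and the values returned by the fake
call are assumptions for illustration, not the real hypervisor interface:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HCALL9_BUFSIZE 9 /* assumption: one slot per returned register */

struct demo_mr_out {
	uint64_t len;
	uint64_t vaddr;
	uint32_t acl;
	uint32_t lkey;
	uint32_t rkey;
};

/* Fake hypervisor call: fills outs[] the way the return registers would. */
static uint64_t demo_hcall_query_mr(uint64_t *outs)
{
	int i;

	assert(outs);
	for (i = 0; i < DEMO_HCALL9_BUFSIZE; i++)
		outs[i] = 0;
	outs[0] = 4096;                                /* length */
	outs[1] = 0x10000;                             /* virtual address */
	outs[4] = (0x7ULL << 32) | 0x55;               /* access ctrl | pd */
	outs[5] = (0x11111111ULL << 32) | 0x22222222;  /* lkey | rkey */
	return 0;                                      /* success in this sketch */
}

/* Decode the slots of interest, as the patched hipz_h_query_mr() does. */
static uint64_t demo_query_mr(struct demo_mr_out *out)
{
	uint64_t outs[DEMO_HCALL9_BUFSIZE];
	uint64_t ret = demo_hcall_query_mr(outs);

	out->len   = outs[0];
	out->vaddr = outs[1];
	out->acl   = (uint32_t)(outs[4] >> 32);        /* upper half of slot 4 */
	out->lkey  = (uint32_t)(outs[5] >> 32);        /* upper half of slot 5 */
	out->rkey  = (uint32_t)(outs[5] & 0xffffffff); /* lower half of slot 5 */
	return ret;
}
```

Unused slots are simply ignored, so the dummy variables from the old
7-argument/7-return wrapper are no longer needed.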
From ardavis at ichips.intel.com  Fri Sep 22 14:09:35 2006
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Fri, 22 Sep 2006 14:09:35 -0700
Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to
 unmatched DREQ
In-Reply-To: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com>
References: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com>
Message-ID: <4514510F.3050400@ichips.intel.com>

Sean Hefty wrote:

>Currently a DREP is only sent in response to a DREQ if a connection
>has been found matching the DREQ, and it is in the proper state.  Once
>a DREP is sent, the local connection moves into timewait.  Duplicate
>DREQs received while in this state result in re-sending the DREP.
>
>However, it's likely that the local connection will enter and exit
>timewait before the remote side times out a lost DREP and resends a DREQ.
>There are a couple possible solutions to this.  One is to increase how
>long a connection remains in timewait, by multiplying its wait time by
>max_cm_retries.  This can greatly increase the timewait state before a QP
>can be re-used when CM messages are not lost.
>
>An alternative is to send a DREP in response to a DREQ, even if a local
>connection is not found, which is what this patch does.

Would it be possible to get this fix into rc7? I am consistently seeing
this problem with Intel MPI on a 64-node cluster.
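
For readers following the thread, the behaviour described in the quoted
text can be sketched roughly as below. This is not the ib_cm code: the
demo_* types, the flat connection table, and the reply_unmatched switch
are all hypothetical, and only the decision logic (reply, resend, or
reply even without a match) is taken from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_state { DEMO_ESTABLISHED, DEMO_TIMEWAIT };

enum demo_action {
	DEMO_SEND_DREP,           /* matched, established: reply, enter timewait */
	DEMO_RESEND_DREP,         /* matched, in timewait: duplicate DREQ */
	DEMO_SEND_DREP_UNMATCHED, /* no connection found, reply anyway */
	DEMO_DROP                 /* no connection found, stay silent */
};

struct demo_conn {
	uint32_t local_comm_id;
	enum demo_state state;
};

/* Decide how to answer a DREQ; reply_unmatched selects the proposed behaviour. */
static enum demo_action demo_handle_dreq(struct demo_conn *conns, size_t n,
					 uint32_t comm_id, int reply_unmatched)
{
	size_t i;

	assert(conns || n == 0);
	for (i = 0; i < n; i++) {
		if (conns[i].local_comm_id != comm_id)
			continue;
		if (conns[i].state == DEMO_TIMEWAIT)
			return DEMO_RESEND_DREP;
		conns[i].state = DEMO_TIMEWAIT;
		return DEMO_SEND_DREP;
	}
	/* Connection already left timewait (or never existed). */
	return reply_unmatched ? DEMO_SEND_DREP_UNMATCHED : DEMO_DROP;
}
```

In this sketch, with reply_unmatched set to 0 a retried DREQ that arrives
after timewait has expired gets no answer, which is the case the remote
side keeps retrying on; with it set to 1 the remote side gets its DREP.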

-arlin


From rdreier at cisco.com  Fri Sep 22 15:37:22 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 22 Sep 2006 15:37:22 -0700
Subject: [openib-general] [GIT PULL] please pull infiniband.git
Message-ID: <adapsdnfrz1.fsf@cisco.com>

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree has:
 - Add support for iWARP (RDMA over IP)
 - Add amso1100 driver for Ammasso 1100 iWARP adapters
 - Add ehca driver for IBM GX InfiniBand adapters
 - ipath fixes
 - lots of other smaller stuff

Bryan O'Sullivan:
      IB/ipath: More changes to support InfiniPath on PowerPC 970 systems
      IB/ipath: lock resource limit counters correctly
      IB/ipath: fix for crash on module unload, if cfgports < portcnt
      IB/ipath: fix handling of kpiobufs
      IB/ipath: drop requirement that PIO buffers be mmaped write-only
      IB/ipath: merge ipath_core and ib_ipath drivers
      IB/ipath: simplify layering code
      IB/ipath: simplify debugging code after ipath_core and ib_ipath merger
      IB/ipath: remove stale references to userspace SMA
      IB/ipath: trivial cleanups
      IB/ipath: add new minor device to allow sending of diag packets
      IB/ipath: do not allow use of CQ entries with invalid counts
      IB/ipath: account for attached QPs correctly
      IB/ipath: support new QLogic product naming scheme
      IB/ipath: add serial number to hardware freeze error message
      IB/ipath: be more strict about testing the modify QP verb
      IB/ipath: validate path_mig_state properly
      IB/ipath: put a limit on the number of QPs that can be created
      IB/ipath: handle sq_sig_all field correctly
      IB/ipath: allow SMA to be disabled
      IB/ipath: fix return value from ipath_poll
      IB/ipath: control receive polarity inversion

Dotan Barak:
      IPoIB: Remove unused include of vmalloc.h

Eli Cohen:
      IPoIB: Rejoin all multicast groups after a port event
      IPoIB: Add some likely/unlikely annotations in hot path

Erez Zilber:
      IB/iser: fix a check of SG alignment for RDMA
      IB/iser: Limit the max size of a scsi command
      IB/iser: make FMR "page size" be 4K and not PAGE_SIZE
      IB/iser: fix some debug prints
      IB/iser: Do not use FMR for a single dma entry sg

Heiko J Schick:
      IB/ehca: Add driver for IBM eHCA InfiniBand adapters

Ishai Rabinovitz:
      IB/srp: Add port/device attributes

Jack Morgenstein:
      IB/mthca: Fix lid used for sending traps
      IB/mthca: Fix default static rate returned for Tavor in AV
      IB/mthca: Return port number for unconnected QPs in query_qp
      IB/mthca: Return correct number of bits for static rate in query_qp
      IB/mthca: Recover from catastrophic errors

James Lentini:
      IB/mthca: Include the header we really want
      IB/mad: Remove unused includes

Krishna Kumar:
      IB: Fix typo in kerneldoc for ib_set_client_data()

Michael S. Tsirkin:
      IB/mthca: Don't use privileged UAR for kernel access
      IB/ipoib: Fix flush/start xmit race (from code review)
      IB/sa: Require SA registration
      IB/cm: Do not track remote QPN in timewait state
      IB/sa: fix ib_sa_selector names

Or Gerlitz:
      RDMA/cma: Document rdma_destroy_id() function
      RDMA/cma: Document rdma_accept() error handling

Ralph Campbell:
      IB/uverbs: Allow resize CQ operation to return driver-specific data
      IB/uverbs: Pass userspace data to modify_srq and modify_qp methods
      IB/ipath: Performance improvements via mmap of queues

Roland Dreier:
      IB/uverbs: Use idr_read_cq() where appropriate
      IB/uverbs: Fix lockdep warning when QP is created with 2 CQs
      IB: Whitespace fixes
      IPoIB: Refactor completion handling
      IB/mthca: Simplify calls to mthca_cq_clean()
      IB/iser: INFINIBAND_ISER depends on INET
      IPoIB: Create MCGs with all attributes required by RFC

Sean Hefty:
      IB/cm: Enable atomics along with RDMA reads
      IB/cm: Use correct reject code for invalid GID
      IB/mad: Add support for dual-sided RMPP transfers.
      IB/cm: Randomize starting comm ID
      RDMA/cma: Protect against adding device during destruction

Tom Tucker:
      RDMA: iWARP Connection Manager.
      RDMA: iWARP Core Changes.
      RDMA/amso1100: Add driver for Ammasso 1100 RNIC

 MAINTAINERS                                       |   16 
 drivers/infiniband/Kconfig                        |    4 
 drivers/infiniband/Makefile                       |    4 
 drivers/infiniband/core/Makefile                  |    4 
 drivers/infiniband/core/addr.c                    |   22 
 drivers/infiniband/core/cache.c                   |    5 
 drivers/infiniband/core/cm.c                      |   66 -
 drivers/infiniband/core/cma.c                     |  403 +++-
 drivers/infiniband/core/device.c                  |    6 
 drivers/infiniband/core/iwcm.c                    | 1019 +++++++++
 drivers/infiniband/core/iwcm.h                    |   62 +
 drivers/infiniband/core/mad.c                     |   19 
 drivers/infiniband/core/mad_priv.h                |    1 
 drivers/infiniband/core/mad_rmpp.c                |   94 +
 drivers/infiniband/core/sa_query.c                |   67 +
 drivers/infiniband/core/smi.c                     |   16 
 drivers/infiniband/core/sysfs.c                   |   13 
 drivers/infiniband/core/ucm.c                     |    9 
 drivers/infiniband/core/user_mad.c                |    7 
 drivers/infiniband/core/uverbs_cmd.c              |   64 -
 drivers/infiniband/core/verbs.c                   |   21 
 drivers/infiniband/hw/amso1100/Kbuild             |    8 
 drivers/infiniband/hw/amso1100/Kconfig            |   15 
 drivers/infiniband/hw/amso1100/c2.c               | 1255 ++++++++++++
 drivers/infiniband/hw/amso1100/c2.h               |  551 +++++
 drivers/infiniband/hw/amso1100/c2_ae.c            |  321 +++
 drivers/infiniband/hw/amso1100/c2_ae.h            |  108 +
 drivers/infiniband/hw/amso1100/c2_alloc.c         |  144 +
 drivers/infiniband/hw/amso1100/c2_cm.c            |  452 ++++
 drivers/infiniband/hw/amso1100/c2_cq.c            |  433 ++++
 drivers/infiniband/hw/amso1100/c2_intr.c          |  209 ++
 drivers/infiniband/hw/amso1100/c2_mm.c            |  375 +++
 drivers/infiniband/hw/amso1100/c2_mq.c            |  174 ++
 drivers/infiniband/hw/amso1100/c2_mq.h            |  106 +
 drivers/infiniband/hw/amso1100/c2_pd.c            |   89 +
 drivers/infiniband/hw/amso1100/c2_provider.c      |  869 ++++++++
 drivers/infiniband/hw/amso1100/c2_provider.h      |  181 ++
 drivers/infiniband/hw/amso1100/c2_qp.c            |  975 +++++++++
 drivers/infiniband/hw/amso1100/c2_rnic.c          |  663 ++++++
 drivers/infiniband/hw/amso1100/c2_status.h        |  158 +
 drivers/infiniband/hw/amso1100/c2_user.h          |   82 +
 drivers/infiniband/hw/amso1100/c2_vq.c            |  260 ++
 drivers/infiniband/hw/amso1100/c2_vq.h            |   63 +
 drivers/infiniband/hw/amso1100/c2_wr.h            | 1520 ++++++++++++++
 drivers/infiniband/hw/ehca/Kconfig                |   16 
 drivers/infiniband/hw/ehca/Makefile               |   16 
 drivers/infiniband/hw/ehca/ehca_av.c              |  271 +++
 drivers/infiniband/hw/ehca/ehca_classes.h         |  346 +++
 drivers/infiniband/hw/ehca/ehca_classes_pSeries.h |  236 ++
 drivers/infiniband/hw/ehca/ehca_cq.c              |  427 ++++
 drivers/infiniband/hw/ehca/ehca_eq.c              |  185 ++
 drivers/infiniband/hw/ehca/ehca_hca.c             |  241 ++
 drivers/infiniband/hw/ehca/ehca_irq.c             |  762 +++++++
 drivers/infiniband/hw/ehca/ehca_irq.h             |   77 +
 drivers/infiniband/hw/ehca/ehca_iverbs.h          |  182 ++
 drivers/infiniband/hw/ehca/ehca_main.c            |  818 ++++++++
 drivers/infiniband/hw/ehca/ehca_mcast.c           |  131 +
 drivers/infiniband/hw/ehca/ehca_mrmw.c            | 2261 +++++++++++++++++++++
 drivers/infiniband/hw/ehca/ehca_mrmw.h            |  140 +
 drivers/infiniband/hw/ehca/ehca_pd.c              |  114 +
 drivers/infiniband/hw/ehca/ehca_qes.h             |  259 ++
 drivers/infiniband/hw/ehca/ehca_qp.c              | 1507 ++++++++++++++
 drivers/infiniband/hw/ehca/ehca_reqs.c            |  653 ++++++
 drivers/infiniband/hw/ehca/ehca_sqp.c             |  111 +
 drivers/infiniband/hw/ehca/ehca_tools.h           |  172 ++
 drivers/infiniband/hw/ehca/ehca_uverbs.c          |  392 ++++
 drivers/infiniband/hw/ehca/hcp_if.c               |  874 ++++++++
 drivers/infiniband/hw/ehca/hcp_if.h               |  261 ++
 drivers/infiniband/hw/ehca/hcp_phyp.c             |   80 +
 drivers/infiniband/hw/ehca/hcp_phyp.h             |   90 +
 drivers/infiniband/hw/ehca/hipz_fns.h             |   68 +
 drivers/infiniband/hw/ehca/hipz_fns_core.h        |  100 +
 drivers/infiniband/hw/ehca/hipz_hw.h              |  388 ++++
 drivers/infiniband/hw/ehca/ipz_pt_fn.c            |  149 +
 drivers/infiniband/hw/ehca/ipz_pt_fn.h            |  247 ++
 drivers/infiniband/hw/ipath/Kconfig               |   21 
 drivers/infiniband/hw/ipath/Makefile              |   29 
 drivers/infiniband/hw/ipath/ipath_common.h        |   19 
 drivers/infiniband/hw/ipath/ipath_cq.c            |  183 +-
 drivers/infiniband/hw/ipath/ipath_debug.h         |    2 
 drivers/infiniband/hw/ipath/ipath_diag.c          |  154 +
 drivers/infiniband/hw/ipath/ipath_driver.c        |  349 ++-
 drivers/infiniband/hw/ipath/ipath_file_ops.c      |   35 
 drivers/infiniband/hw/ipath/ipath_fs.c            |    4 
 drivers/infiniband/hw/ipath/ipath_ht400.c         | 1603 ---------------
 drivers/infiniband/hw/ipath/ipath_iba6110.c       | 1612 +++++++++++++++
 drivers/infiniband/hw/ipath/ipath_iba6120.c       | 1264 ++++++++++++
 drivers/infiniband/hw/ipath/ipath_init_chip.c     |   21 
 drivers/infiniband/hw/ipath/ipath_intr.c          |   24 
 drivers/infiniband/hw/ipath/ipath_kernel.h        |   57 -
 drivers/infiniband/hw/ipath/ipath_keys.c          |    3 
 drivers/infiniband/hw/ipath/ipath_layer.c         | 1179 -----------
 drivers/infiniband/hw/ipath/ipath_layer.h         |  115 -
 drivers/infiniband/hw/ipath/ipath_mad.c           |  339 +++
 drivers/infiniband/hw/ipath/ipath_mmap.c          |  122 +
 drivers/infiniband/hw/ipath/ipath_mr.c            |   12 
 drivers/infiniband/hw/ipath/ipath_pe800.c         | 1254 ------------
 drivers/infiniband/hw/ipath/ipath_qp.c            |  242 ++
 drivers/infiniband/hw/ipath/ipath_rc.c            |    9 
 drivers/infiniband/hw/ipath/ipath_registers.h     |    7 
 drivers/infiniband/hw/ipath/ipath_ruc.c           |  160 +
 drivers/infiniband/hw/ipath/ipath_srq.c           |  244 +-
 drivers/infiniband/hw/ipath/ipath_stats.c         |   27 
 drivers/infiniband/hw/ipath/ipath_sysfs.c         |   41 
 drivers/infiniband/hw/ipath/ipath_uc.c            |    5 
 drivers/infiniband/hw/ipath/ipath_ud.c            |  182 +-
 drivers/infiniband/hw/ipath/ipath_verbs.c         |  687 +++++-
 drivers/infiniband/hw/ipath/ipath_verbs.h         |  252 ++
 drivers/infiniband/hw/ipath/ipath_verbs_mcast.c   |    7 
 drivers/infiniband/hw/ipath/ipath_wc_ppc64.c      |   52 
 drivers/infiniband/hw/ipath/verbs_debug.h         |  108 -
 drivers/infiniband/hw/mthca/mthca_av.c            |    2 
 drivers/infiniband/hw/mthca/mthca_catas.c         |   62 +
 drivers/infiniband/hw/mthca/mthca_cmd.c           |    2 
 drivers/infiniband/hw/mthca/mthca_cq.c            |   10 
 drivers/infiniband/hw/mthca/mthca_dev.h           |   12 
 drivers/infiniband/hw/mthca/mthca_mad.c           |    2 
 drivers/infiniband/hw/mthca/mthca_main.c          |   88 +
 drivers/infiniband/hw/mthca/mthca_provider.c      |    2 
 drivers/infiniband/hw/mthca/mthca_qp.c            |   20 
 drivers/infiniband/hw/mthca/mthca_srq.c           |    2 
 drivers/infiniband/hw/mthca/mthca_uar.c           |    2 
 drivers/infiniband/ulp/ipoib/ipoib.h              |    2 
 drivers/infiniband/ulp/ipoib/ipoib_ib.c           |  194 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c         |   37 
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c    |   34 
 drivers/infiniband/ulp/iser/Kconfig               |    2 
 drivers/infiniband/ulp/iser/iscsi_iser.c          |    1 
 drivers/infiniband/ulp/iser/iscsi_iser.h          |    7 
 drivers/infiniband/ulp/iser/iser_memory.c         |   80 +
 drivers/infiniband/ulp/iser/iser_verbs.c          |   10 
 drivers/infiniband/ulp/srp/ib_srp.c               |   43 
 include/rdma/ib_addr.h                            |   17 
 include/rdma/ib_sa.h                              |   45 
 include/rdma/ib_user_verbs.h                      |    2 
 include/rdma/ib_verbs.h                           |   31 
 include/rdma/iw_cm.h                              |  258 ++
 include/rdma/rdma_cm.h                            |   12 
 138 files changed, 28416 insertions(+), 5494 deletions(-)
 create mode 100644 drivers/infiniband/core/iwcm.c
 create mode 100644 drivers/infiniband/core/iwcm.h
 create mode 100644 drivers/infiniband/hw/amso1100/Kbuild
 create mode 100644 drivers/infiniband/hw/amso1100/Kconfig
 create mode 100644 drivers/infiniband/hw/amso1100/c2.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2.h
 create mode 100644 drivers/infiniband/hw/amso1100/c2_ae.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_ae.h
 create mode 100644 drivers/infiniband/hw/amso1100/c2_alloc.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_cm.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_cq.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_intr.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_mm.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_mq.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_mq.h
 create mode 100644 drivers/infiniband/hw/amso1100/c2_pd.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_provider.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_provider.h
 create mode 100644 drivers/infiniband/hw/amso1100/c2_qp.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_rnic.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_status.h
 create mode 100644 drivers/infiniband/hw/amso1100/c2_user.h
 create mode 100644 drivers/infiniband/hw/amso1100/c2_vq.c
 create mode 100644 drivers/infiniband/hw/amso1100/c2_vq.h
 create mode 100644 drivers/infiniband/hw/amso1100/c2_wr.h
 create mode 100644 drivers/infiniband/hw/ehca/Kconfig
 create mode 100644 drivers/infiniband/hw/ehca/Makefile
 create mode 100644 drivers/infiniband/hw/ehca/ehca_av.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_classes.h
 create mode 100644 drivers/infiniband/hw/ehca/ehca_classes_pSeries.h
 create mode 100644 drivers/infiniband/hw/ehca/ehca_cq.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_eq.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_hca.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_irq.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_irq.h
 create mode 100644 drivers/infiniband/hw/ehca/ehca_iverbs.h
 create mode 100644 drivers/infiniband/hw/ehca/ehca_main.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_mcast.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_mrmw.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_mrmw.h
 create mode 100644 drivers/infiniband/hw/ehca/ehca_pd.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_qes.h
 create mode 100644 drivers/infiniband/hw/ehca/ehca_qp.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_reqs.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_sqp.c
 create mode 100644 drivers/infiniband/hw/ehca/ehca_tools.h
 create mode 100644 drivers/infiniband/hw/ehca/ehca_uverbs.c
 create mode 100644 drivers/infiniband/hw/ehca/hcp_if.c
 create mode 100644 drivers/infiniband/hw/ehca/hcp_if.h
 create mode 100644 drivers/infiniband/hw/ehca/hcp_phyp.c
 create mode 100644 drivers/infiniband/hw/ehca/hcp_phyp.h
 create mode 100644 drivers/infiniband/hw/ehca/hipz_fns.h
 create mode 100644 drivers/infiniband/hw/ehca/hipz_fns_core.h
 create mode 100644 drivers/infiniband/hw/ehca/hipz_hw.h
 create mode 100644 drivers/infiniband/hw/ehca/ipz_pt_fn.c
 create mode 100644 drivers/infiniband/hw/ehca/ipz_pt_fn.h
 delete mode 100644 drivers/infiniband/hw/ipath/ipath_ht400.c
 create mode 100644 drivers/infiniband/hw/ipath/ipath_iba6110.c
 create mode 100644 drivers/infiniband/hw/ipath/ipath_iba6120.c
 create mode 100644 drivers/infiniband/hw/ipath/ipath_mmap.c
 delete mode 100644 drivers/infiniband/hw/ipath/ipath_pe800.c
 create mode 100644 drivers/infiniband/hw/ipath/ipath_wc_ppc64.c
 delete mode 100644 drivers/infiniband/hw/ipath/verbs_debug.h
 create mode 100644 include/rdma/iw_cm.h


From rdreier at cisco.com  Fri Sep 22 18:21:47 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 22 Sep 2006 18:21:47 -0700
Subject: [openib-general] [PATCH 2.6.19-rc1] ehca firmware interface
 based on Anton Blanchard's new hvcall interface
In-Reply-To: <200609222200.12722.hnguyen@de.ibm.com> (Hoang-Nam Nguyen's
	message of "Fri, 22 Sep 2006 22:00:12 +0200")
References: <200609222200.12722.hnguyen@de.ibm.com>
Message-ID: <adahcyzfkd0.fsf@cisco.com>

Thanks, I rolled this up in the ehca patch in my tree.

Anyway both Paul and I merged with Linus today, so the hcall cleanup
and ehca are both upstream.  It would be great if you could do a quick
check to make sure that ehca works in Linus's current git tree.

Thanks,
  Roland


From delaitt at cpc.wmin.ac.uk  Sat Sep 23 03:47:28 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Sat, 23 Sep 2006 11:47:28 +0100 (BST)
Subject: [openib-general] symbols missing on ib_cm and rdma_cm / ofed-1.1rc6
	sles10
Message-ID: <Pine.GSO.4.58.0609231147040.25867@seth.cpc.wmin.ac.uk>


Hi,

I'm using ofed-1.1 rc6 on sles10. ipoib works ok, but I'm experiencing
problems using the lustre nal with rdma. In particular, I cannot load the
following 2 ib modules (ib_cm and rdma_cm). Any help would be appreciated.

The problem seems to be with ofed (1.1-rc6): those 2 ib modules cannot
load, which in turn prevents ko2iblnd from loading.

Sep 23 11:30:30 n32 kernel: ib_cm: Unknown symbol ib_init_ah_from_wc
Sep 23 11:30:30 n32 kernel: ib_cm: Unknown symbol ib_init_ah_from_path
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_cm_listen
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_destroy_cm_id
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_create_cm_id
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rep
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_cm_init_qp_attr
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_drep
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rtu
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_dreq
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_req
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_cm_establish
Sep 23 11:32:37 n32 kernel: rdma_cm: Unknown symbol ib_send_cm_rej

Cheers,

Thierry.


From HNGUYEN at de.ibm.com  Sat Sep 23 13:45:28 2006
From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen)
Date: Sat, 23 Sep 2006 22:45:28 +0200
Subject: [openib-general] [PATCH 2.6.19-rc1] ehca firmware interface
 based on Anton Blanchard's new hvcall interface
In-Reply-To: <adahcyzfkd0.fsf@cisco.com>
Message-ID: <OFFF711B0A.758C0571-ONC12571F2.0070BAD2-C12571F2.0071ED80@de.ibm.com>

Hi Roland,
> Anyway both Paul and I merged with Linus today, so the hcall cleanup
> and ehca are both upstream.  It would be great if you could do a quick
> check to make sure that ehca works in Linus's current git tree.
I compiled Linus's git tree and did some basic tests successfully
with ehca (ipoib, userspace, netpipe tcp/ib).
Thanks!
Nam Nguyen


From delaitt at cpc.wmin.ac.uk  Sun Sep 24 02:57:32 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Sun, 24 Sep 2006 10:57:32 +0100 (BST)
Subject: [openib-general] problems with lustre o2ib module & ofed
Message-ID: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>


I get the following when loading the lustre o2ib module. I'm using ofed-1.1
rc6 on sles10, and I'm sure that the ib modules are the ones recompiled for
the kernel I'm using, and lustre too. I don't understand why I get the
following, as I only have one version of the ib modules.

ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol ib_dereg_mr
ko2iblnd: Unknown symbol ib_dereg_mr
ko2iblnd: disagrees about version of symbol ib_destroy_cq
ko2iblnd: Unknown symbol ib_destroy_cq
ko2iblnd: disagrees about version of symbol ib_get_dma_mr
ko2iblnd: Unknown symbol ib_get_dma_mr
ko2iblnd: disagrees about version of symbol ib_alloc_pd
ko2iblnd: Unknown symbol ib_alloc_pd
ko2iblnd: disagrees about version of symbol ib_modify_qp
ko2iblnd: Unknown symbol ib_modify_qp
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
LustreError: 5725:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256


n32:~/lustre-1.5.95 # ls -l /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/
total 328
-rw-r--r-- 1 root root 13190 Sep 24 10:16 ib_addr.ko
-rw-r--r-- 1 root root 37875 Sep 24 10:16 ib_cm.ko
-rw-r--r-- 1 root root 57592 Sep 24 10:16 ib_core.ko
-rw-r--r-- 1 root root 42829 Sep 24 10:16 ib_mad.ko
-rw-r--r-- 1 root root 20095 Sep 24 10:16 ib_sa.ko
-rw-r--r-- 1 root root 22930 Sep 24 10:16 ib_ucm.ko
-rw-r--r-- 1 root root 21234 Sep 24 10:16 ib_umad.ko
-rw-r--r-- 1 root root 45057 Sep 24 10:16 ib_uverbs.ko
-rw-r--r-- 1 root root 29987 Sep 24 10:16 rdma_cm.ko
-rw-r--r-- 1 root root 17669 Sep 24 10:16 rdma_ucm.ko
n32:~/lustre-1.5.95 # ls -ld /usr/local/ofed
drwxr-xr-x 9 root root 328 Sep 24 10:18 /usr/local/ofed

Thierry.


From rjwalsh at pathscale.com  Sun Sep 24 10:45:01 2006
From: rjwalsh at pathscale.com (Robert Walsh)
Date: Sun, 24 Sep 2006 10:45:01 -0700
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
Message-ID: <4516C41D.1060301@pathscale.com>

Thierry Delaitre wrote:
> I get the following when loading lustre o2ib module. I'm using ofed-1.1
> rc6 on sles10 and i'm sure the ib modules are the ones recompiled for the
> kernel i'm using and lustre too. I don't understand why i get the
> following as i only have one version of the ib modules ?

This explanation gets ugly :-)

The short description is: you can't build external modules that depend 
on other external modules that you previously built.

The reason why is: the kernel devel stuff ships with a file called 
Module.symvers, which contains all the version information for all the 
symbols in the kernel and in all the modules built when the kernel was 
built.  When you build an external module, the kernel build stuff looks 
in here to get the version information for any symbol referenced that it 
can't find in the group of modules you're building.  If you've replaced 
some modules with newer ones (like what happens when you install 
OFED-1.1, for example), then the symbol versions in the new modules will 
not match what's in the Module.symvers file.

In your case, you installed a bunch of new modules (OFED-1.1) and then, 
in a second step, installed another new module (Lustre).  The OFED-1.1 
build was OK because all external symbols that it referenced (all of 
which are in vmlinux, I think) had properly matching version entries in 
Module.symvers.  The Lustre build, however, was pulling ib_* symbols 
from the new OFED-1.1 modules that had mismatching symbol versions in 
Module.symvers from the original kernel modules (I don't remember if the 
kernel build warns about mismatching symbol versions at build time.)

At insmod time, the kernel checks that the symbol versions of 
already-loaded modules match the expected versions in the to-be-loaded 
module.  In your case, they will not.
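
The mismatch described above can be checked offline by comparing two
symbol-version listings. What follows is only a minimal sketch: it assumes
Module.symvers holds one whitespace-separated record per line (CRC, symbol
name, owning module, possibly more fields), the helper names are
hypothetical, and the sample records and CRC values are invented for
illustration.

```python
def parse_symvers(text):
    """Map symbol name -> CRC string from Module.symvers-style text."""
    crcs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        crc, symbol = fields[0], fields[1]
        crcs[symbol] = crc.lower()
    return crcs


def mismatched_symbols(kernel_symvers, replacement_symvers):
    """Return symbols present in both listings whose CRCs differ."""
    old = parse_symvers(kernel_symvers)
    new = parse_symvers(replacement_symvers)
    return sorted(s for s in new if s in old and old[s] != new[s])


# Illustrative data only; these CRC values are made up.
KERNEL = (
    "0x11111111\tib_create_cq\tdrivers/infiniband/core/ib_core\n"
    "0x22222222\tib_alloc_pd\tdrivers/infiniband/core/ib_core\n"
    "0x33333333\tprintk\tvmlinux\n"
)
OFED = (
    "0xaaaaaaaa\tib_create_cq\tdrivers/infiniband/core/ib_core\n"
    "0x22222222\tib_alloc_pd\tdrivers/infiniband/core/ib_core\n"
)

# Symbols a dependent module would "disagree about" if it were built
# against the first listing but loaded against modules from the second.
print(mismatched_symbols(KERNEL, OFED))
```

Any symbol this reports is one where a module built against the old
listing would carry a CRC that the replacement modules do not export.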

One solution is: extract the kernel sources from the OFED-1.1 
distribution, patch them as the OFED build script would, add in the 
Lustre bits and build the whole thing yourself manually.

Another solution is: update the Module.symvers file.

Neither is a terribly satisfactory solution.

Regards,
  Robert.


From jackm at dev.mellanox.co.il  Sun Sep 24 23:40:47 2006
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Mon, 25 Sep 2006 09:40:47 +0300
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
Message-ID: <200609250940.47936.jackm@dev.mellanox.co.il>

Did you recompile Lustre following the installation of ofed-1.1?
I'm not familiar with the Lustre installation procedure (i.e., if it 
gets compiled on the current host).  If yes, you probably merely need
to uninstall and reinstall Lustre o2ib. 

- Jack

On Sunday 24 September 2006 12:57, Thierry Delaitre wrote:
> 
> I get the following when loading lustre o2ib module. I'm using ofed-1.1
> rc6 on sles10 and i'm sure the ib modules are the ones recompiled for the
> kernel i'm using and lustre too. I don't understand why i get the
> following as i only have one version of the ib modules ?
> 
> ko2iblnd: disagrees about version of symbol ib_create_cq
> ko2iblnd: Unknown symbol ib_create_cq
> ko2iblnd: disagrees about version of symbol ib_dereg_mr
> ko2iblnd: Unknown symbol ib_dereg_mr
> ko2iblnd: disagrees about version of symbol ib_destroy_cq
> ko2iblnd: Unknown symbol ib_destroy_cq
> ko2iblnd: disagrees about version of symbol ib_get_dma_mr
> ko2iblnd: Unknown symbol ib_get_dma_mr
> ko2iblnd: disagrees about version of symbol ib_alloc_pd
> ko2iblnd: Unknown symbol ib_alloc_pd
> ko2iblnd: disagrees about version of symbol ib_modify_qp
> ko2iblnd: Unknown symbol ib_modify_qp
> ko2iblnd: disagrees about version of symbol ib_dealloc_pd
> ko2iblnd: Unknown symbol ib_dealloc_pd
> LustreError: 5725:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND
> o2ib, module ko2iblnd, rc=256
> 
> 
> n32:~/lustre-1.5.95 # ls -l
> /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/
> total 328
> -rw-r--r-- 1 root root 13190 Sep 24 10:16 ib_addr.ko
> -rw-r--r-- 1 root root 37875 Sep 24 10:16 ib_cm.ko
> -rw-r--r-- 1 root root 57592 Sep 24 10:16 ib_core.ko
> -rw-r--r-- 1 root root 42829 Sep 24 10:16 ib_mad.ko
> -rw-r--r-- 1 root root 20095 Sep 24 10:16 ib_sa.ko
> -rw-r--r-- 1 root root 22930 Sep 24 10:16 ib_ucm.ko
> -rw-r--r-- 1 root root 21234 Sep 24 10:16 ib_umad.ko
> -rw-r--r-- 1 root root 45057 Sep 24 10:16 ib_uverbs.ko
> -rw-r--r-- 1 root root 29987 Sep 24 10:16 rdma_cm.ko
> -rw-r--r-- 1 root root 17669 Sep 24 10:16 rdma_ucm.ko
> n32:~/lustre-1.5.95 # ls -ld /usr/local/ofed
> drwxr-xr-x 9 root root 328 Sep 24 10:18 /usr/local/ofed
> 
> Thierry.
> 
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
> 
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 


From ogerlitz at voltaire.com  Sun Sep 24 23:48:04 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 25 Sep 2006 09:48:04 +0300
Subject: [openib-general] RDMA CM callback status
In-Reply-To: <000001c6dd9e$9736a570$46fc070a@amr.corp.intel.com>
References: <000001c6dd9e$9736a570$46fc070a@amr.corp.intel.com>
Message-ID: <45177BA4.2090001@voltaire.com>

Sean Hefty wrote:
>> 2. /* handle error out-of-line */ above means I record failure in my connection
>>   data structure, start teardown and drop the callback's reference on it.
>>   When the last reference goes, the connection data structure is queued for
>>   final destruction (including rdma_destroy_id(cmid)).
>>
>>   Given that this might race with the callback's caller is this OK?
> 
> Yes - The RDMA CM holds a reference on the cmid while in a callback, and drops
> it once the callback returns.  rdma_destroy_id() will block until all references
> are released on the cmid.

Eric,

Just to make sure, please be aware of the note in rdma_cm.h stating that
you are not allowed to call rdma_destroy_id() from the **context** of
the cma callback (since, as Sean explained above, in that case the cma
will block on a ref which would never reach zero).
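
To illustrate why, here is a toy model of the reference counting, not the
kernel code. The class and method names are hypothetical, and a timeout
stands in for what would be an indefinite block in the real case.

```python
import threading


class ModelCmId:
    """Toy stand-in for a connection id guarded by a reference count."""

    def __init__(self):
        self._refs = 0
        self._cond = threading.Condition()

    def dispatch(self, callback):
        """Run callback while holding a reference, then drop it."""
        with self._cond:
            self._refs += 1
        try:
            return callback(self)
        finally:
            with self._cond:
                self._refs -= 1
                self._cond.notify_all()

    def destroy(self, timeout=0.2):
        """Wait for all references to drop.

        Returns True if the count reached zero, False if the wait timed
        out (the model's substitute for blocking forever).
        """
        with self._cond:
            return self._cond.wait_for(lambda: self._refs == 0, timeout)


cm_id = ModelCmId()
# Destroy from inside the callback: the dispatcher's own reference is
# still held, so the wait cannot complete.
inside = cm_id.dispatch(lambda cid: cid.destroy())
# Destroy after the callback has returned: no references remain.
outside = cm_id.destroy()
print(inside, outside)
```

In the model, the destroy issued from inside the callback can only time
out, while the one issued after the callback returns completes at once.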

Or.


From ogerlitz at voltaire.com  Sun Sep 24 23:58:20 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 25 Sep 2006 09:58:20 +0300
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <200609250940.47936.jackm@dev.mellanox.co.il>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<200609250940.47936.jackm@dev.mellanox.co.il>
Message-ID: <45177E0C.1040101@voltaire.com>

Jack Morgenstein wrote:
> Did you recompile Lustre following the installation of ofed-1.1?
> I'm not familiar with the Lustre installation procedure (i.e., if it 
> gets compiled on the current host).  If yes, you probably merely need
> to uninstall and reinstall Lustre o2ib. 

OK, can we state clearly what the user needs to do with modules
directly dependent on ofed symbols (e.g. Lustre's o2ib, NFSoRDMA, RDS and
hopefully more to come)?

Is it recompile / uninstall / install?

Or.

> On Sunday 24 September 2006 12:57, Thierry Delaitre wrote:
>> I get the following when loading lustre o2ib module. I'm using ofed-1.1
>> rc6 on sles10 and i'm sure the ib modules are the ones recompiled for the
>> kernel i'm using and lustre too. I don't understand why i get the
>> following as i only have one version of the ib modules ?


From jackm at dev.mellanox.co.il  Sun Sep 24 23:58:11 2006
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Mon, 25 Sep 2006 09:58:11 +0300
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <4516C41D.1060301@pathscale.com>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<4516C41D.1060301@pathscale.com>
Message-ID: <200609250958.11476.jackm@dev.mellanox.co.il>

Robert,

We build "external modules that depend on other external
modules that you previously built" all the time in our
regression testing -- and this runs properly under lots of
distributions and under lots of different linux kernels.

We do not experience the problem you describe.

We build kernel modules which exercise various installed OFED 1.1
kernel modules (ib_verbs, ib_mad, etc etc).  We then load these
kernel modules during our regression testing to verify the operation
of the OFED 1.1 kernel modules.

If the explanation you provide below is correct, our kernel module
testing would not work at all. (We do not do any of the workarounds
you described below).

We have seen problems like the one described when either:
a. The dependent external modules were not rebuilt following
   OFED installation.
or
b. There were old .ko files lying around which were loaded instead of
   the installed OFED .ko files.
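
Case (b) can be checked mechanically. A minimal sketch, assuming the
modules for one kernel live under a single directory tree; the function
name is hypothetical and the demo builds a throwaway tree rather than
touching a real /lib/modules.

```python
import os
import tempfile
from collections import defaultdict


def find_duplicate_modules(root):
    """Return {file name: [paths]} for .ko files found more than once under root."""
    seen = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".ko"):
                seen[name].append(os.path.join(dirpath, name))
    return {name: sorted(paths) for name, paths in seen.items() if len(paths) > 1}


# Demo on a temporary tree: ib_core.ko exists twice, ib_mad.ko once.
with tempfile.TemporaryDirectory() as root:
    for sub in ("kernel/drivers/infiniband/core", "extra"):
        os.makedirs(os.path.join(root, sub))
        open(os.path.join(root, sub, "ib_core.ko"), "w").close()
    open(os.path.join(root, "extra", "ib_mad.ko"), "w").close()
    print(sorted(find_duplicate_modules(root)))
```

On a real host one would pass the module directory of the running kernel,
for example /lib/modules/2.6.16.21-0.8-smp from this thread.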

- Jack

On Sunday 24 September 2006 20:45, Robert Walsh wrote:
> This explanation gets ugly :-)
> 
> The short description is: you can't build external modules that depend 
> on other external modules that you previously built.
> 
> The reason why is: the kernel devel stuff ships with a file called 
> Module.symvers, which contains all the version information for all the 
> symbols in the kernel and in all the modules built when the kernel was 
> built.  When you build an external module, the kernel build stuff looks 
> in here to get the version information for any symbol referenced that it 
> can't find in the group of modules you're building.  If you've replaced 
> some modules with newer ones (like what happens when you install 
> OFED-1.1, for example), then the symbol versions in the new modules will 
> not match what's in the Module.symvers file.
> 
> In your case, you installed a bunch of new modules (OFED-1.1) and then, 
> in a second step, installed another new module (Lustre).  The OFED-1.1 
> build was OK because all external symbols that it referenced (all of 
> which are in vmlinux, I think) had properly matching version entries in 
> Module.symvers.  The Lustre build, however, was pulling ib_* symbols 
> from the new OFED-1.1 modules that had mismatching symbol versions in 
> Module.symvers from the original kernel modules (I don't remember if the 
> kernel build warns about mismatching symbol versions at build time.)
> 
> At insmod time, the kernel checks that the symbol versions of 
> already-loaded modules match the expected versions in the to-be-loaded 
> module.  In your case, they will not.
> 
> One solution is: extract the kernel sources from the OFED-1.1 
> distribution, patch them as the OFED build script would, add in the 
> Lustre bits and build the whole thing yourself manually.
> 
> Another solution is: update the Module.symvers file.
> 
> Neither is a terribly satisfactory solution.
> 
> Regards,
>   Robert.
> 


From jackm at dev.mellanox.co.il  Mon Sep 25 00:34:55 2006
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Mon, 25 Sep 2006 10:34:55 +0300
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <45177E0C.1040101@voltaire.com>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<200609250940.47936.jackm@dev.mellanox.co.il>
	<45177E0C.1040101@voltaire.com>
Message-ID: <200609251034.56218.jackm@dev.mellanox.co.il>

On Monday 25 September 2006 09:58, Or Gerlitz wrote:
> Jack Morgenstein wrote:
> > Did you recompile Lustre following the installation of ofed-1.1?
> > I'm not familiar with the Lustre installation procedure (i.e., if it 
> > gets compiled on the current host).  If yes, you probably merely need
> > to uninstall and reinstall Lustre o2ib. 
> 
> OK, can we state clearly what the user needs to do with modules 
> directly dependent on ofed symbols (eg Lustre's o2ib, NFSoRDMA, RDS and 
> hopefully more to come).
> 
> Is it recompile / uninstall / install ???

If possible:
 - recompile (make) and reinstall to kernel (make install) Lustre o2ib

Otherwise:
 - uninstall and reinstall Lustre o2ib onto the host (assuming that
	the Lustre installation compiles its modules on that host
	during the installation process and then installs them,
	rather than just copying over pre-compiled modules to
	/lib/modules/<kernel version>/kernel/drivers/infiniband)
> 
> > On Sunday 24 September 2006 12:57, Thierry Delaitre wrote:
> >> I get the following when loading lustre o2ib module. I'm using ofed-1.1
> >> rc6 on sles10 and i'm sure the ib modules are the ones recompiled for the
> >> kernel i'm using and lustre too. I don't understand why i get the
> >> following as i only have one version of the ib modules ?

This is a bit unclear.  Was Lustre installed AFTER the OFED installation?

- Jack


From mst at mellanox.co.il  Mon Sep 25 00:41:55 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 25 Sep 2006 10:41:55 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <adar6y4hlkf.fsf@cisco.com>
References: <1158850657.24776.158.camel@localhost> <adahcz1hz4t.fsf@cisco.com>
	<62072.85.250.167.59.1158877997.squirrel@dev.mellanox.co.il>
	<adar6y4hlkf.fsf@cisco.com>
Message-ID: <20060925074155.GB21836@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/ipoib: NAPI
> 
>  > > So probably what we need is a feature bit in the struct ib_device
>  > > to say whether the peek CQ is needed or whether req notify will
>  > > generate events for existing CQEs.
> 
>  > Sounds good to me
> 
> The biggest problem I have with this is that I don't know what to call
> the feature bit.  Any suggestions?

Actually, the reason it is hard to come up with
a name is that what this enables is the natural
poll/request notification order.

Maybe set a bit for the lack of the feature? REQUIRES_POLL_AFTER_ARM?
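
To make the two orderings concrete, here is a toy model only, not ipoib or
verbs code. The class, its fields and the drain() helper are hypothetical;
the flag name is just the suggestion above.

```python
from collections import deque


class ModelCQ:
    """Toy completion queue with an arm/notify mechanism."""

    def __init__(self, requires_poll_after_arm):
        self.requires_poll_after_arm = requires_poll_after_arm
        self.entries = deque()
        self.armed = False
        self.events = 0

    def add(self, entry):
        """A completion arrives; an armed CQ raises one event and disarms."""
        self.entries.append(entry)
        if self.armed:
            self.armed = False
            self.events += 1

    def poll(self):
        return self.entries.popleft() if self.entries else None

    def req_notify(self):
        """Arm the CQ; devices without the flag also fire for queued entries."""
        if self.entries and not self.requires_poll_after_arm:
            self.events += 1
        else:
            self.armed = True


def drain(cq, handled, late_arrival=None):
    """Poll until empty, arm, then re-poll only if the device needs it."""
    while True:
        entry = cq.poll()
        while entry is not None:
            handled.append(entry)
            entry = cq.poll()
        if late_arrival is not None:
            cq.add(late_arrival)  # simulate a completion racing with the arm
            late_arrival = None
        cq.req_notify()
        if not cq.requires_poll_after_arm:
            return  # an event is raised for anything still queued
        entry = cq.poll()
        if entry is None:
            return
        handled.append(entry)  # caught the racing completion; go around again


for flag in (True, False):
    cq, handled = ModelCQ(flag), []
    cq.add("a")
    drain(cq, handled, late_arrival="b")
    print(flag, handled, cq.events)
```

With the flag set, the consumer has to pick up the racing completion by
polling once more after arming; without it, the consumer can stop after
arming because the model raises an event for the entry left behind.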

-- 
MST


From delaitt at cpc.wmin.ac.uk  Mon Sep 25 00:42:59 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Mon, 25 Sep 2006 08:42:59 +0100 (BST)
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <45177E0C.1040101@voltaire.com>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<200609250940.47936.jackm@dev.mellanox.co.il>
	<45177E0C.1040101@voltaire.com>
Message-ID: <Pine.GSO.4.58.0609250836411.25974@seth.cpc.wmin.ac.uk>


On Mon, 25 Sep 2006, Or Gerlitz wrote:

> Jack Morgenstein wrote:
> > Did you recompile Lustre following the installation of ofed-1.1?
> > I'm not familiar with the Lustre installation procedure (i.e., if it
> > gets compiled on the current host).  If yes, you probably merely need
> > to uninstall and reinstall Lustre o2ib.
>
> OK, can we state clearly what the user needs to do with modules
> directly dependent on ofed symbols (eg Lustre's o2ib, NFSoRDMA, RDS and
> hopefully more to come).
>
> Is it recompile / uninstall / install ???

The issue is about the installation of Lustre 1.5.95 o2ib with OFED-1.1rc6
for SLES10.

ofed-1.1-rc6 compiles nicely, and the modules load as shown below. The ib
kernel modules all reside under
/lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/
and match the ones compiled by ofed. I have tried these steps several
times.

n32:~ # lsmod | grep ib
libcfs                103060  1 lnet
ib_ucm                 19332  0
ib_addr                10756  1 rdma_cm
ib_cm                  31968  2 ib_ucm,rdma_cm
ib_ipoib               48144  0
ib_sa                  16652  3 rdma_cm,ib_cm,ib_ipoib
ib_uverbs              38312  2 rdma_ucm,ib_ucm
ib_umad                17968  0
ib_mthca              116240  0
ib_mad                 36116  4 ib_cm,ib_sa,ib_umad,ib_mthca
ib_core                49024  9 ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad

I compiled lustre for the above kernel and ofed installation. I get the
following when doing a 'lctl network up' in lustre. I have modversions
enabled in the kernel. If I set it to 'n' then I get a null pointer
exception and the module crashes.

ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol ib_dereg_mr
ko2iblnd: Unknown symbol ib_dereg_mr
ko2iblnd: disagrees about version of symbol ib_destroy_cq
ko2iblnd: Unknown symbol ib_destroy_cq
ko2iblnd: disagrees about version of symbol ib_get_dma_mr
ko2iblnd: Unknown symbol ib_get_dma_mr
ko2iblnd: disagrees about version of symbol ib_alloc_pd
ko2iblnd: Unknown symbol ib_alloc_pd
ko2iblnd: disagrees about version of symbol ib_modify_qp
ko2iblnd: Unknown symbol ib_modify_qp
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
LustreError: 5725:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256

I have tried with ofed-1.1-rc5 and experienced the same issue.

Thierry.

> Or.
>
> > On Sunday 24 September 2006 12:57, Thierry Delaitre wrote:
> >> I get the following when loading lustre o2ib module. I'm using ofed-1.1
> >> rc6 on sles10 and i'm sure the ib modules are the ones recompiled for the
> >> kernel i'm using and lustre too. I don't understand why i get the
> >> following as i only have one version of the ib modules ?
>
>

----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW

Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788

http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------



From mst at mellanox.co.il  Mon Sep 25 01:00:51 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 25 Sep 2006 11:00:51 +0300
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <Pine.GSO.4.58.0609250836411.25974@seth.cpc.wmin.ac.uk>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<200609250940.47936.jackm@dev.mellanox.co.il>
	<45177E0C.1040101@voltaire.com>
	<Pine.GSO.4.58.0609250836411.25974@seth.cpc.wmin.ac.uk>
Message-ID: <20060925080051.GC21836@mellanox.co.il>

Quoting r. Thierry Delaitre <delaitt at cpc.wmin.ac.uk>:
> Subject: Re: problems with lustre o2ib module & ofed
> 
> 
> On Mon, 25 Sep 2006, Or Gerlitz wrote:
> 
> > Jack Morgenstein wrote:
> > > Did you recompile Lustre following the installation of ofed-1.1?
> > > I'm not familiar with the Lustre installation procedure (i.e., if it
> > > gets compiled on the current host).  If yes, you probably merely need
> > > to uninstall and reinstall Lustre o2ib.
> >
> > OK, can we state clearly what the user needs to do with modules
> > directly dependent on ofed symbols (eg Lustre's o2ib, NFSoRDMA, RDS and
> > hopefully more to come).
> >
> > Is it recompile / uninstall / install ???
> 
> The issue is about the installation of Lustre 1.5.95 o2ib with OFED-1.1rc6
> for SLES10.
> 
> ofed-1.1-rc6 compiles nicely as shown below. The ib kernel modules all
> reside under /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/
> and do match the ones compiled by ofed. I have tried these steps several
> times.
> 
> n32:~ # lsmod | grep ib
> libcfs                103060  1 lnet
> ib_ucm                 19332  0
> ib_addr                10756  1 rdma_cm
> ib_cm                  31968  2 ib_ucm,rdma_cm
> ib_ipoib               48144  0
> ib_sa                  16652  3 rdma_cm,ib_cm,ib_ipoib
> ib_uverbs              38312  2 rdma_ucm,ib_ucm
> ib_umad                17968  0
> ib_mthca              116240  0
> ib_mad                 36116  4 ib_cm,ib_sa,ib_umad,ib_mthca
> ib_core                49024  9
> ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
> 
> I compiled lustre for the above kernel and ofed installation. I get the
> following when doing a 'lctl network up' in lustre. I have modversion set
> to on in the kernel. If i set it to 'n' then i get a null pointer
> exception and the module crashes.
> 
> ko2iblnd: disagrees about version of symbol ib_create_cq
> ko2iblnd: Unknown symbol ib_create_cq

I don't know anything about lustre, but note that you must
point the build to pick up headers from
/usr/local/ofed/src/openib/include/
*before* the built-in header includes.

Replace /usr/local/ofed with the prefix you specified.

-- 
MST


From delaitt at cpc.wmin.ac.uk  Mon Sep 25 01:12:01 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Mon, 25 Sep 2006 09:12:01 +0100 (BST)
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <20060925080051.GC21836@mellanox.co.il>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<200609250940.47936.jackm@dev.mellanox.co.il>
	<45177E0C.1040101@voltaire.com>
	<Pine.GSO.4.58.0609250836411.25974@seth.cpc.wmin.ac.uk>
	<20060925080051.GC21836@mellanox.co.il>
Message-ID: <Pine.GSO.4.58.0609250909180.25974@seth.cpc.wmin.ac.uk>


On Mon, 25 Sep 2006, Michael S. Tsirkin wrote:

> Quoting r. Thierry Delaitre <delaitt at cpc.wmin.ac.uk>:
> > Subject: Re: problems with lustre o2ib module & ofed
> >
> >
> > On Mon, 25 Sep 2006, Or Gerlitz wrote:
> >
> > > Jack Morgenstein wrote:
> > > > Did you recompile Lustre following the installation of ofed-1.1?
> > > > I'm not familiar with the Lustre installation procedure (i.e., if it
> > > > gets compiled on the current host).  If yes, you probably merely need
> > > > to uninstall and reinstall Lustre o2ib.
> > >
> > > OK, can we state clearly what the user needs to do with modules
> > > directly dependent on ofed symbols (eg Lustre's o2ib, NFSoRDMA, RDS and
> > > hopefully more to come).
> > >
> > > Is it recompile / uninstall / install ???
> >
> > The issue is about the installation of Lustre 1.5.95 o2ib with OFED-1.1rc6
> > for SLES10.
> >
> > ofed-1.1-rc6 compiles nicely as shown below. The ib kernel modules all
> > reside under /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/
> > and do match the ones compiled by ofed. I have tried these steps several
> > times.
> >
> > n32:~ # lsmod | grep ib
> > libcfs                103060  1 lnet
> > ib_ucm                 19332  0
> > ib_addr                10756  1 rdma_cm
> > ib_cm                  31968  2 ib_ucm,rdma_cm
> > ib_ipoib               48144  0
> > ib_sa                  16652  3 rdma_cm,ib_cm,ib_ipoib
> > ib_uverbs              38312  2 rdma_ucm,ib_ucm
> > ib_umad                17968  0
> > ib_mthca              116240  0
> > ib_mad                 36116  4 ib_cm,ib_sa,ib_umad,ib_mthca
> > ib_core                49024  9
> > ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
> >
> > I compiled lustre for the above kernel and ofed installation. I get the
> > following when doing a 'lctl network up' in lustre. I have modversion set
> > to on in the kernel. If i set it to 'n' then i get a null pointer
> > exception and the module crashes.
> >
> > ko2iblnd: disagrees about version of symbol ib_create_cq
> > ko2iblnd: Unknown symbol ib_create_cq
>
> don't know anything about lustre, but note you must
> point build to pick up headers from
> /usr/local/ofed/src/openib/include/
> *before* the built-in header includes.

I've set the o2ib path to /usr/local/ofed/src/openib-1.1, as shown in
lustre's configure line below. Lustre's configure script looks for a
driver/infiniband directory, which only seems to exist under
/usr/local/ofed/src/openib-1.1

./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/

Thierry.

> replace /usr/local/ofed with the prefix you specified.

> --
> MST


From delaitt at cpc.wmin.ac.uk  Mon Sep 25 01:16:04 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Mon, 25 Sep 2006 09:16:04 +0100 (BST)
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <200609251034.56218.jackm@dev.mellanox.co.il>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<200609250940.47936.jackm@dev.mellanox.co.il>
	<45177E0C.1040101@voltaire.com>
	<200609251034.56218.jackm@dev.mellanox.co.il>
Message-ID: <Pine.GSO.4.58.0609250912090.25974@seth.cpc.wmin.ac.uk>


On Mon, 25 Sep 2006, Jack Morgenstein wrote:

> On Monday 25 September 2006 09:58, Or Gerlitz wrote:
> > Jack Morgenstein wrote:
> > > Did you recompile Lustre following the installation of ofed-1.1?
> > > I'm not familiar with the Lustre installation procedure (i.e., if it
> > > gets compiled on the current host).  If yes, you probably merely need
> > > to uninstall and reinstall Lustre o2ib.
> >
> > OK, can we state clearly what's the user needs to do with modules
> > directly dependent on ofed symbols (eg Lustre's o2ib, NFSoRDMA, RDS and
> > hopefully more to come).
> >
> > Is it recompile / uninstall / install ???
>
> If possible:
>  - recompile (make) and reinstall to kernel (make install) Lustre o2ib
>
> Otherwise:
>  - uninstall and reinstall onto the host Lustre o2ib (assuming that
> 	the Lustre installation compiles its modules on that host
> 	during the installation process and then installs them,
> 	rather than just copying over pre-compiled modules to
> 	/lib/modules/<kernel version>/drivers/kernel/infiniband)
> >
> > > On Sunday 24 September 2006 12:57, Thierry Delaitre wrote:
> > >> I get the following when loading lustre o2ib module. I'm using ofed-1.1
> > >> rc6 on sles10 and i'm sure the ib modules are the ones recompiled for the
> > >> kernel i'm using and lustre too. I don't understand why i get the
> > >> following as i only have one version of the ib modules ?
>
> This is a bit unclear.  Was Lustre installed AFTER the OFED installation?

1) patch kernel with lustre patches and recompile/install kernel
2) boot with new kernel
3) make + install ofed-1.1-rc6
4) depmod -a
5) compile + install lustre (lustre was installed after ofed installation)
6) depmod -a
7) modprobe lnet
8) lctl network up.

in step (5), lustre was configured as follows:
./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/

Thierry.


From mst at mellanox.co.il  Mon Sep 25 01:26:09 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 25 Sep 2006 11:26:09 +0300
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <Pine.GSO.4.58.0609250909180.25974@seth.cpc.wmin.ac.uk>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<200609250940.47936.jackm@dev.mellanox.co.il>
	<45177E0C.1040101@voltaire.com>
	<Pine.GSO.4.58.0609250836411.25974@seth.cpc.wmin.ac.uk>
	<20060925080051.GC21836@mellanox.co.il>
	<Pine.GSO.4.58.0609250909180.25974@seth.cpc.wmin.ac.uk>
Message-ID: <20060925082609.GE21836@mellanox.co.il>

Quoting r. Thierry Delaitre <delaitt at cpc.wmin.ac.uk>:
> 
> I've set the o2ib path to /usr/local/ofed/src/openib-1.1 as shown in the
> lustre's configure line below. Lustre's configure script looks for a
> driver/infiniband directory which only seems to exist under
> /usr/local/ofed/src/openib-1.1
> 
> ./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/
> 
> Thierry.
> 
> > replace /usr/local/ofed with the prefix you specified.

This looks wrong - openib-1.1 is the pristine sources.
openib/include is the exported interface and is what you should use
for dependent modules.
No idea why would lustre need drivers/infiniband.
Try creating a softlink:

mkdir /usr/local/ofed/src/openib/drivers/infiniband
ln -s /usr/local/ofed/src/openib/include /usr/local/ofed/src/openib/drivers/infiniband

-- 
MST


From delaitt at cpc.wmin.ac.uk  Mon Sep 25 01:56:30 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Mon, 25 Sep 2006 09:56:30 +0100 (BST)
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <20060925082609.GE21836@mellanox.co.il>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<200609250940.47936.jackm@dev.mellanox.co.il>
	<45177E0C.1040101@voltaire.com>
	<Pine.GSO.4.58.0609250836411.25974@seth.cpc.wmin.ac.uk>
	<20060925080051.GC21836@mellanox.co.il>
	<Pine.GSO.4.58.0609250909180.25974@seth.cpc.wmin.ac.uk>
	<20060925082609.GE21836@mellanox.co.il>
Message-ID: <Pine.GSO.4.58.0609250942430.25974@seth.cpc.wmin.ac.uk>


On Mon, 25 Sep 2006, Michael S. Tsirkin wrote:

> Quoting r. Thierry Delaitre <delaitt at cpc.wmin.ac.uk>:
> >
> > I've set the o2ib path to /usr/local/ofed/src/openib-1.1 as shown in the
> > lustre's configure line below. Lustre's configure script looks for a
> > driver/infiniband directory which only seems to exist under
> > /usr/local/ofed/src/openib-1.1
> >
> > ./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/
> >
> > Thierry.
> >
> > > replace /usr/local/ofed with the prefix you specified.
>
> This looks wrong - openib-1.1 is the pristine sources.
> openib/include is the exported interface and is what you should use
> for dependent modules.
> No idea why would lustre need drivers/infiniband.
> Try creating a softlink:
>
> mkdir /usr/local/ofed/src/openib/drivers/infiniband
> ln -s /usr/local/ofed/src/openib/include /usr/local/ofed/src/openib/drivers/infiniband

I untarred lustre 1.5.95, compiled it (./configure
--with-o2ib=/usr/local/ofed/src/openib), did a make install and depmod -a,
and still get the following:

my modprobe.conf is the following

options lnet ip2nets="o2ib0 161.74.83.[0-255]"

lctl network up
LNET configure error 100: Network is down

ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol ib_dereg_mr
ko2iblnd: Unknown symbol ib_dereg_mr
ko2iblnd: disagrees about version of symbol ib_destroy_cq
ko2iblnd: Unknown symbol ib_destroy_cq
ko2iblnd: disagrees about version of symbol ib_get_dma_mr
ko2iblnd: Unknown symbol ib_get_dma_mr
ko2iblnd: disagrees about version of symbol ib_alloc_pd
ko2iblnd: Unknown symbol ib_alloc_pd
ko2iblnd: disagrees about version of symbol ib_modify_qp
ko2iblnd: Unknown symbol ib_modify_qp
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
LustreError: 4177:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND
o2ib, module ko2iblnd, rc=256

lsmod | grep ib
libcfs                103060  1 lnet
ib_ucm                 19332  0
ib_addr                10756  1 rdma_cm
ib_cm                  31968  2 ib_ucm,rdma_cm
ib_ipoib               48400  0
ib_sa                  16652  3 rdma_cm,ib_cm,ib_ipoib
ib_uverbs              38312  2 rdma_ucm,ib_ucm
ib_umad                17968  0
ib_mthca              116240  0
ib_mad                 36116  4 ib_cm,ib_sa,ib_umad,ib_mthca
ib_core                49024  9
ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad

nm /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_alloc_pd
d5dcb698 A __crc_ib_alloc_pd
0000001c r __kcrctab_ib_alloc_pd
0000006a r __kstrtab_ib_alloc_pd
00000038 r __ksymtab_ib_alloc_pd
00000c65 T ib_alloc_pd
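
For context, the "disagrees about version of symbol" message comes from the kernel's CONFIG_MODVERSIONS check: at load time, the CRC that the importing module recorded for each symbol at build time is compared with the CRC exported by the providing module (the __crc_ib_alloc_pd value in the nm output above). The following is a simplified userspace model of that comparison, not kernel code; the struct layout, the function name and any CRC other than the one shown above are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one exported symbol/CRC pair. */
struct sym_crc {
    const char *name;
    unsigned long crc;
};

/*
 * Returns  1 if the importer's recorded CRC for `name` matches the exporter's,
 *          0 if the CRCs differ ("disagrees about version of symbol"),
 *         -1 if the exporter does not provide the symbol at all.
 */
int check_symbol_version(const struct sym_crc *exported, size_t n_exported,
                         const char *name, unsigned long wanted_crc)
{
    for (size_t i = 0; i < n_exported; i++) {
        if (strcmp(exported[i].name, name) == 0)
            return exported[i].crc == wanted_crc ? 1 : 0;
    }
    return -1;
}
```

A mismatch usually indicates that the importing module was built against a different kernel tree or symbol version file than the one the loaded ib_core came from.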

from lustre's config.log:

configure:6500: checking whether to enable OpenIB gen2 support
configure:6586: cp conftest.c build && make modules CC=gcc -f
/root/lustre-1.5.95/build/Makefile LUSTRE_LINUX
_CONFIG=/usr/src/linux/.config -o tmp_include_depends -o scripts -o include/config/MARKER -C /usr/src/linux EXTRA_CFLAGS=-Werror-implicit-function-declaration -g -I/root/lustre-1.5.95/lnet/include -I/root/lustre-1.5.95/lustre/include -I/usr/local/ofed/src/openib/include  M=/root/lustre-1.5.95/build
/root/lustre-1.5.95/build/conftest.c:42: warning: function declaration
isn't a prototype
/root/lustre-1.5.95/build/conftest.c: In function 'main':
/root/lustre-1.5.95/build/conftest.c:49: warning: unused variable 'rej_reason'
/root/lustre-1.5.95/build/conftest.c:48: warning: unused variable 'pool_fmr'
/root/lustre-1.5.95/build/conftest.c:47: warning: unused variable 'qp_attr'
/root/lustre-1.5.95/build/conftest.c:46: warning: unused variable 'device_attr'
/root/lustre-1.5.95/build/conftest.c:45: warning: unused variable 'conn_param'
WARNING: "rdma_create_id" [/root/lustre-1.5.95/build/conftest.ko] undefined!
configure:6589: $? = 0
configure:6591: test -s build/conftest.o
configure:6594: $? = 0
configure:6597: result: yes


Thierry.


From ogerlitz at voltaire.com  Mon Sep 25 03:16:40 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Mon, 25 Sep 2006 13:16:40 +0300
Subject: [openib-general] timer_pending kernel assertion while stopping
	IPoIB
In-Reply-To: <adau031i6v7.fsf@cisco.com>
References: <Pine.LNX.4.64.0609211129040.28981@zuben>
	<adau031i6v7.fsf@cisco.com>
Message-ID: <4517AC88.9080202@voltaire.com>

Roland Dreier wrote:
>     Or> the kernel is net-2.6.19 git
> 
> My first guess would be it's a bug introduced in the net-2.6.19 tree.
> Can you reproduce it with plain 2.6.18 and/or my for-2.6.19 branch?

OK, I will be able to test this with 2.6.18 later this week. As for 
testing with your for-2.6.19 branch, is it sufficient to run the following 
(assuming the tree was cloned and has now been updated with git pull)

$ git checkout -f for-2.6.19

to bring the sources to the state of that branch? For example, after 
doing so I don't see the amso1100 directory below drivers/infiniband/hw

Or.


From delaitt at cpc.wmin.ac.uk  Mon Sep 25 04:49:56 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Mon, 25 Sep 2006 12:49:56 +0100 (BST)
Subject: [openib-general] problems with lustre o2ib module & ofed
In-Reply-To: <Pine.GSO.4.58.0609250942430.25974@seth.cpc.wmin.ac.uk>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<200609250940.47936.jackm@dev.mellanox.co.il>
	<45177E0C.1040101@voltaire.com>
	<Pine.GSO.4.58.0609250836411.25974@seth.cpc.wmin.ac.uk>
	<20060925080051.GC21836@mellanox.co.il>
	<Pine.GSO.4.58.0609250909180.25974@seth.cpc.wmin.ac.uk>
	<20060925082609.GE21836@mellanox.co.il>
	<Pine.GSO.4.58.0609250942430.25974@seth.cpc.wmin.ac.uk>
Message-ID: <Pine.GSO.4.58.0609251244180.25974@seth.cpc.wmin.ac.uk>


It seems that lustre puts its modules in /lib/modules/2.6.16.21-0.8-default
despite the fact that my kernel is 2.6.16.21-0.8-smp !

uname -a
Linux n32 2.6.16.21-0.8-smp #4 SMP Sun Sep 24 08:47:30 BST 2006 i686 i686 i386 GNU/Linux

make[3]: Nothing to be done for `install-exec-am'.
/bin/sh ../../mkinstalldirs /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre
 /usr/bin/install -c -m 644 lquota.ko /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre/lquota

I therefore end up with both /lib/modules/2.6.16.21-0.8-smp and
/lib/modules/2.6.16.21-0.8-default

I'm now trying to find out why lustre thinks my kernel is
2.6.16.21-0.8-default and not 2.6.16.21-0.8-smp

Thierry.


----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW

Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788

http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------



From kliteyn at dev.mellanox.co.il  Mon Sep 25 05:35:37 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 25 Sep 2006 15:35:37 +0300
Subject: [openib-general] [PATCH] OpenSM/osm_sa_mcmember_record.c: In
 osm_mcmr_rcv_create_new_mgrp, fix exactly selectors in response
In-Reply-To: <450F7D7E.8070408@mellanox.co.il>
References: <450F7D7E.8070408@mellanox.co.il>
Message-ID: <4517CD19.20700@dev.mellanox.co.il>

Hi Hal.

The patch looks ok. A few remarks though:

It appears that the multicast group mtu/rate selectors
are actually not referenced by anyone - the SM/SA code 
implicitly assumes that they should be 'exact', and acts 
accordingly. Same goes for the response - the selectors
that are filled in are hard-coded to 'exact'.

This is the reason why the bug that this patch fixes has
never appeared, and why fixing it will not change the SM
behavior.

But of course, it is better to have this fix anyway.
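
To illustrate the masking that the patch adds, here is a small standalone sketch. It assumes the layout implied by the patch: the top two bits of the mtu, rate and pkt_life bytes hold the selector and the low six bits hold the value, with selector 2 meaning "exactly". The macro and function names are made up for this example and are not OpenSM identifiers.

```c
#include <stdint.h>

#define SEL_SHIFT   6     /* selector lives in the top two bits */
#define VALUE_MASK  0x3f  /* low six bits carry the value */
#define SEL_EXACTLY 2     /* selector value used by the patch */

/* Clear whatever selector the requester supplied, then set "exactly". */
uint8_t force_exact_selector(uint8_t field)
{
    return (uint8_t)((field & VALUE_MASK) | (SEL_EXACTLY << SEL_SHIFT));
}

/* OR alone, without masking: bits of a previous selector survive. */
uint8_t or_only_selector(uint8_t field)
{
    return (uint8_t)(field | (SEL_EXACTLY << SEL_SHIFT));
}
```

With a request byte of 0x44 (selector 1, value 4), the masked version yields 0x84, while OR alone yields 0xC4, i.e. selector 3 rather than 2.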

--
Yevgeny

> Subject:
> [openib-general] [PATCH] OpenSM/osm_sa_mcmember_record.c: In
> osm_mcmr_rcv_create_new_mgrp, fix exactly selectors in response
> From:
> "Hal Rosenstock" <halr at voltaire.com>
> Date:
> 18 Sep 2006 20:30:37 -0400
> To:
> openib-general at openib.org
> CC:
> "Roland Dreier" <rdreier at cisco.com>
> 
> 
> OpenSM/osm_sa_mcmember_record.c: In osm_mcmr_rcv_create_new_mgrp, set
> exactly selectors after rather than before mgrp is initialized
> 
> Pointed out by: Roland Dreier <rdreier at cisco.com>
> 
> Signed-off-by: Hal Rosenstock <halr at voltaire.com>
> 
> Index: opensm/osm_sa_mcmember_record.c
> ===================================================================
> --- opensm/osm_sa_mcmember_record.c	(revision 9347)
> +++ opensm/osm_sa_mcmember_record.c	(working copy)
> @@ -1337,15 +1337,18 @@ osm_mcmr_rcv_create_new_mgrp(
>      goto Exit;
>    }
>  
> -  /* the mcmember_record should have mtu_sel, rate_sel and pkt_lifetime_sel = 2 */
> -  (*pp_mgrp)->mcmember_rec.mtu |= 2<<6; /* exactly */
> -  (*pp_mgrp)->mcmember_rec.rate |= 2<<6; /* exactly */
> -  (*pp_mgrp)->mcmember_rec.pkt_life |= 2<<6; /* exactly */
> -
>    /* Initialize the mgrp */
>    (*pp_mgrp)->mcmember_rec = mcm_rec;
>    (*pp_mgrp)->mcmember_rec.mlid = mlid;
>  
> +  /* the mcmember_record should have mtu_sel, rate_sel, and pkt_lifetime_sel = 2 */
> +  (*pp_mgrp)->mcmember_rec.mtu &= 0x3f;
> +  (*pp_mgrp)->mcmember_rec.mtu |= 2<<6; /* exactly */
> +  (*pp_mgrp)->mcmember_rec.rate &= 0x3f;
> +  (*pp_mgrp)->mcmember_rec.rate |= 2<<6; /* exactly */
> +  (*pp_mgrp)->mcmember_rec.pkt_life &= 0x3f;
> +  (*pp_mgrp)->mcmember_rec.pkt_life |= 2<<6; /* exactly */
> +
>    /* Insert the new group in the data base */
>    
>    /* since we might have an old group by that mlid
> 


From eli at dev.mellanox.co.il  Mon Sep 25 05:58:30 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Mon, 25 Sep 2006 15:58:30 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <adar6y4hlkf.fsf@cisco.com>
References: <1158850657.24776.158.camel@localhost> <adahcz1hz4t.fsf@cisco.com>
	<62072.85.250.167.59.1158877997.squirrel@dev.mellanox.co.il>
	<adar6y4hlkf.fsf@cisco.com>
Message-ID: <1159189110.26523.12.camel@localhost>

I experimented with your patch and could not find any noticeable 
change in BW or interrupt rate so I guess we can use your ideas.

Still, note that NAPI_howto.txt does not read *budget to limit the 
number of polls, and the code below from kernel 2.6.17.7 takes into 
account that budget can become negative.

static void net_rx_action(struct softirq_action *h)
{
	struct softnet_data *queue = &__get_cpu_var(softnet_data);
	unsigned long start_time = jiffies;
	int budget = netdev_budget;
	void *have;

	local_irq_disable();

	while (!list_empty(&queue->poll_list)) {
		struct net_device *dev;

		if (budget <= 0 || jiffies - start_time > 1)
			goto softnet_break;
	...
}
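
As a rough illustration of the point about budget, here is a userspace model of a 2.6.17-style poll routine, where poll() receives a pointer to the remaining softirq budget and the device carries its own quota. The struct and function names are invented for this sketch and this is not the ipoib code; it only shows how a poll routine that limits itself by the device quota alone can drive *budget below zero, which the "budget <= 0" test above tolerates.

```c
/* Hypothetical stand-in for the few net_device fields that matter here. */
struct fake_dev {
    int quota;    /* per-device share for this round */
    int pending;  /* completions waiting to be processed */
};

static int min_int(int a, int b) { return a < b ? a : b; }

/* Limits work by both *budget and quota. Returns 1 if work remains. */
int poll_with_budget(struct fake_dev *dev, int *budget)
{
    int limit = min_int(*budget, dev->quota);
    int done;

    if (limit < 0)
        limit = 0;  /* nothing may be processed on an exhausted budget */
    done = min_int(limit, dev->pending);

    dev->pending -= done;
    dev->quota -= done;
    *budget -= done;
    return dev->pending > 0;
}

/* Limits work by quota only; *budget is updated but never consulted. */
int poll_quota_only(struct fake_dev *dev, int *budget)
{
    int done = min_int(dev->quota, dev->pending);

    dev->pending -= done;
    dev->quota -= done;
    *budget -= done;  /* may go negative if done exceeds the old budget */
    return dev->pending > 0;
}
```

Starting from a budget of 10 with quota 64 and 100 pending completions, the first variant stops at budget 0, while the second processes 64 completions and leaves budget at -54.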


>I took a stab at implementing this myself, and it
>
> You might want to respin your patch against my for-2.6.19 branch 

Do you think I should work on this, or do you plan to push your code?


>> The biggest problem I have with this is that I don't know what to call
>> the feature bit.  Any suggestions?

>Maybe set bit for the lack of the feature? REQUIRES_POLL_AFTER_ARM?

We can take Michael's suggestion or use NOT_REQUIRES_POLL_AFTER_ARM so
we can implement this for mthca without touching ipath or ehca as a
first step.


From dotanb at dev.mellanox.co.il  Mon Sep 25 06:06:17 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Mon, 25 Sep 2006 16:06:17 +0300
Subject: [openib-general] max message size for IB_WR_SEND
In-Reply-To: <loom.20060921T212754-753@post.gmane.org>
References: <loom.20060920T204936-772@post.gmane.org>
	<4512244D.4040404@dev.mellanox.co.il>
	<loom.20060921T212754-753@post.gmane.org>
Message-ID: <4517D449.60502@dev.mellanox.co.il>

amit byron wrote:
> Dotan Barak <dotanb <at> dev.mellanox.co.il> writes:
>
>   
>> Hi.
>>
>> amit byron wrote:
>>     
>>> hi,
>>>
>>> if i evoke/call ib_post_send(IB_WR_SEND) with message
>>> size 512 bytes, the message gets received on the
>>> peer (second) node. the 2 nodes are connected point-to
>>> -point.
>>>
>>> but if message size is increased to 4096 bytes then
>>> second node receives the message; but message content
>>> is missing (empty).
>>>
>>> won't infiniband stack break down message in smaller
>>> chunks and assemble on peer node?
>>>
>>> thanks,
>>> Amit.
>>>   
>>>       
>> Which transport type are you using?
>> if you are using a UD QP, then the answer is no.
>> for any other transport type, the answer is yes (the message is broken 
>> down into packets of the MTU size specified in the QP context).
>>
>> maybe you have a different problem in your code. did you check the 
>> completion status on both of the nodes?
>>
>> Dotan
>>
>>
>>     
>
> i'm using RC connection. the issue seems to occur only when
> running in xen's domain 0 (xen0). on core linux kernel, the
> code works -- i'm able to do both send message and perform
> rdma write with size greater than 4096.
>
> i don't see any errors reported while sending a message with
> size greater than 4096 (same holds true for rdma write).
>
> i'm able to send a message (greater than 4096 bytes) from code
> running in core linux kernel to peer node code that is
> running in xen's domain 0.
>
> this suggests that there is some hard limit that prevents
> infiniband from sending the message; but no errors are reported
> from infiniband stack.
>
> any suggestions on how to enable tracing in hca driver?
>
> thanks,
> Amit.
>   

1) You can use perfquery on the sender/receiver hosts to find out how much 
data and how many packets were sent/received.

2) Why is the number 4096 so important?
Maybe the problem happens when the message size is larger than the MTU.
Which MTU do you use in the QP?
Maybe you should try to send a message of MTU + 1 bytes 
and check the result.
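
For the MTU + 1 test suggested above, it can help to know how many packets a message should turn into on a connected (e.g. RC) QP. A minimal sketch, assuming the payload is simply split into path-MTU-sized chunks; the function name is made up for this example:

```c
#include <stddef.h>

/*
 * Number of packets for a message of `len` bytes with a path MTU of `mtu`
 * bytes. A zero-length message still takes one packet.
 * Returns 0 if mtu is 0 (invalid input).
 */
size_t packets_for_message(size_t len, size_t mtu)
{
    if (mtu == 0)
        return 0;
    if (len == 0)
        return 1;
    return (len + mtu - 1) / mtu;  /* round up */
}
```

Comparing this figure with the packet counters that perfquery reports on both hosts may show whether the data left the sender at all, keeping in mind that those counters also include any other traffic on the port.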

Dotan


From eli at dev.mellanox.co.il  Mon Sep 25 06:16:39 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Mon, 25 Sep 2006 16:16:39 +0300
Subject: [openib-general] heads-up - ipoib NAPI
In-Reply-To: <a33d0a9f0609211931l48d0fc71t897422c61d593b0b@mail.gmail.com>
References: <1158850592.24776.156.camel@localhost>
	<a33d0a9f0609211024i412e3fa1p6946339c46603eee@mail.gmail.com>
	<61399.85.250.167.59.1158860279.squirrel@dev.mellanox.co.il>
	<a33d0a9f0609211931l48d0fc71t897422c61d593b0b@mail.gmail.com>
Message-ID: <1159190199.26523.18.camel@localhost>

On Thu, 2006-09-21 at 19:31 -0700, harish wrote:

> How did the CPU utilizations compare for the NAPI vs. no NAPI case?
> What are your thoughts on what bottleneck you are hitting?
> 


The CPU utilization reported by netperf is not accurate since it is not
reported on a per cpu basis and I don't have one number to reliably
describe utilization. I think the bottleneck is that the CPU that
handles the softirq that handles the rx sk_buffs is 100% utilized and it
dictates the limit.


From kliteyn at dev.mellanox.co.il  Mon Sep 25 06:12:00 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Mon, 25 Sep 2006 16:12:00 +0300
Subject: [openib-general] [PATCH] osm: cosmetic changes in osmtest multicast
	flow
Message-ID: <yzsbqp4850f.fsf@kliteynik.yok.mtl.com>

Hi Hal

This patch is all about cosmetics - it improves
the osmtest log readability, and it also has some 
cosmetic additions in the code.
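
The patch relies on C's concatenation of adjacent string literals, so that the marker text lives in one macro instead of being repeated at every call site. A minimal sketch of the idiom follows; the marker strings here are placeholders, since the real EXPECTING_ERRORS_START and EXPECTING_ERRORS_END definitions are not part of this patch, and the helper functions are invented for the example.

```c
/* Placeholder values; the real definitions are not shown in the patch. */
#define EXPECTING_ERRORS_START "Expecting Errors - START"
#define EXPECTING_ERRORS_END   "Expecting Errors - END"

/* Adjacent literals are joined at compile time into a single string. */
const char *start_marker(void)
{
    return "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n";
}

const char *end_marker(void)
{
    return "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n";
}
```

Because the text is defined once, changing the marker format later only touches the macro definitions.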

Yevgeny

Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>


Index: osmtest/osmt_multicast.c
===================================================================
--- osmtest/osmt_multicast.c	(revision 9622)
+++ osmtest/osmt_multicast.c	(working copy)
@@ -54,6 +54,9 @@
 #include <complib/cl_map.h>
 #include "osmtest.h"
 
+/**********************************************************************
+ **********************************************************************/
+
 static
 cl_status_t
 __match_mgids(
@@ -76,6 +79,9 @@ __match_mgids(
 
 }
 
+/**********************************************************************
+ **********************************************************************/
+
 ib_api_status_t
 osmt_query_mcast( IN osmtest_t * const p_osmt ) {
   ib_api_status_t status = IB_SUCCESS;
@@ -219,6 +225,9 @@ osmt_query_mcast( IN osmtest_t * const p
   return ( status );
 }
 
+/**********************************************************************
+ **********************************************************************/
+
 /* given a multicast request send and wait for response. */
 ib_api_status_t
 osmt_send_mcast_request( IN osmtest_t * const p_osmt,
@@ -334,6 +343,9 @@ osmt_send_mcast_request( IN osmtest_t * 
 
 }
 
+/**********************************************************************
+ **********************************************************************/
+
 void
 osmt_init_mc_query_rec(IN  osmtest_t * const p_osmt,
                        IN OUT ib_member_rec_t  *p_mc_req) {
@@ -702,9 +714,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            );
 
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
+  
   /* no MGID */
   memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t));
   /* Request Join */
@@ -727,9 +738,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
+           
   if (status != IB_REMOTE_ERROR ||
       (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) !=
       IB_SA_MAD_STATUS_INSUF_COMPS) {
@@ -749,9 +759,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            );
 
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
+  
   /* no MGID */
   memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t));
   /* Request Join */
@@ -774,9 +783,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
+           
   if (status != IB_REMOTE_ERROR ||
       (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) !=
       IB_SA_MAD_STATUS_INSUF_COMPS) {
@@ -803,10 +811,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            "Checking Join with insufficient comp mask - flow label (o15.0.1.3)...\n"
            );
 
-  osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+  osm_log( &p_osmt->log, OSM_LOG_ERROR, 
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
 
   /* Request Join */
   ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER);
@@ -828,9 +834,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
   if (status != IB_REMOTE_ERROR ||
       (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) !=
       IB_SA_MAD_STATUS_INSUF_COMPS) {
@@ -854,9 +858,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            );
 
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
 
   /* Request Join */
   ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER) ;
@@ -878,9 +880,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
 
   if (status != IB_REMOTE_ERROR ||
       (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) !=
@@ -905,9 +905,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            );
 
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
+  
   /* no MGID */
   /* memset(&mc_req_rec.mgid, 0, sizeof(ib_gid_t)); */
   /* Request Join */
@@ -930,9 +929,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
 
   if (status != IB_REMOTE_ERROR ||
       (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) !=
@@ -1228,9 +1225,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            "Checking Create given MGID=0 skip service level (o15.0.1.4)...\n"
            );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
 
   osmt_init_mc_query_rec(p_osmt, &mc_req_rec);
 
@@ -1258,9 +1253,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
 
   if (status != IB_REMOTE_ERROR ||
       (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) !=
@@ -1311,9 +1304,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            );
 
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
 
   osmt_init_mc_query_rec(p_osmt, &mc_req_rec);
 
@@ -1342,9 +1333,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     &res_sa_mad );
 
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
 
   if (status != IB_REMOTE_ERROR ||
       (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) !=
@@ -1368,9 +1357,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            "Checking Create given MGID=0 skip TClass (o15.0.1.4)...\n"
            );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
 
   osmt_init_mc_query_rec(p_osmt, &mc_req_rec);
 
@@ -1400,9 +1387,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     &res_sa_mad );
 
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
 
   if (status != IB_REMOTE_ERROR ||
       (( ib_net16_t ) (res_sa_mad.status & IB_SMP_STATUS_MASK )) !=
@@ -1887,18 +1872,15 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            );
 
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
+  
   mc_req_rec.mgid.raw[0] = 0xFA;
   status = osmt_send_mcast_request( p_osmt, 1,
                                     &mc_req_rec,
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:   vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
 
   if ((status != IB_REMOTE_ERROR) ||
       (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) {
@@ -1919,9 +1901,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            );
 
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
   mc_req_rec.mgid.raw[0] = 0xFF;
   mc_req_rec.mgid.raw[3] = 0x1B;
   comp_mask = comp_mask | IB_MCR_COMPMASK_SCOPE;
@@ -1932,9 +1912,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:  vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
+  
   if ((status != IB_REMOTE_ERROR) ||
       (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) {
     osm_log( &p_osmt->log, OSM_LOG_ERROR,
@@ -1955,9 +1934,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            );
 
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
 
 
   mc_req_rec.mgid = good_mgid;
@@ -1969,9 +1946,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:   vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
+           
   if ((status != IB_REMOTE_ERROR) ||
       (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) {
     osm_log( &p_osmt->log, OSM_LOG_ERROR,
@@ -2034,9 +2010,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            );
 
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
 
   mc_req_rec.mgid = good_mgid;
 
@@ -2048,9 +2022,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     &res_sa_mad );
 
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:   vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
 
   if ((status != IB_REMOTE_ERROR) ||
       (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) {
@@ -2112,9 +2084,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            );
 
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
   mc_req_rec.mgid = good_mgid;
   mc_req_rec.mgid.raw[12] = 0xFF;
   mc_req_rec.scope_state = 0x22; /*  link-local scope */
@@ -2124,9 +2094,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
+           
   if ((status != IB_REMOTE_ERROR) ||
       (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) {
     osm_log( &p_osmt->log, OSM_LOG_ERROR,
@@ -2171,9 +2140,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            );
 
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
   /* We have created a new MCG so now we need different mgid when cresting group otherwise it will be counted as join request .*/
   mc_req_rec.mgid = good_mgid;
   mc_req_rec.mgid.raw[12] = 0xFC;
@@ -2185,9 +2152,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:   vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
+           
   if ((status != IB_REMOTE_ERROR) ||
       (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) {
     osm_log( &p_osmt->log, OSM_LOG_ERROR,
@@ -2462,9 +2428,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            "Checking BAD RATE when connecting to existing MGID (o15.0.1.13)...\n"
            );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
 
   mc_req_rec.mgid = good_mgid;
   mc_req_rec.rate =
@@ -2487,9 +2451,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
+           
   if ((status != IB_REMOTE_ERROR) ||
       (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) {
     osm_log( &p_osmt->log, OSM_LOG_ERROR,
@@ -2509,9 +2472,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            "existing MGID (o15.0.1.13)...\n"
            );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
 
   mc_req_rec.mgid = osm_ipoib_mgid;
   mc_req_rec.mtu =
@@ -2534,9 +2495,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:   vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
+           
   if ((status != IB_REMOTE_ERROR) ||
       (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) {
     osm_log( &p_osmt->log, OSM_LOG_ERROR,
@@ -2556,9 +2516,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            "to existing MGID (o15.0.1.13)...\n"
            );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
 
   mc_req_rec.mgid = osm_ipoib_mgid;
   mc_req_rec.mtu =
@@ -2581,9 +2539,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:   vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
+           
   if ((status != IB_REMOTE_ERROR) ||
       (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) {
     osm_log( &p_osmt->log, OSM_LOG_ERROR,
@@ -2697,18 +2654,16 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            "Checking Delete by trying to Join deleted group (o15.0.1.14)...\n"
            );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
+  
   mc_req_rec.scope_state = 0x22; /*  use non member - so if no group fail */
   status = osmt_send_mcast_request( p_osmt, 1, /*  join */
                                     &mc_req_rec,
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
+  
   if (status != IB_REMOTE_ERROR)
   {
     osm_log( &p_osmt->log, OSM_LOG_ERROR,
@@ -2727,9 +2682,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            "Checking BAD Delete of Mgid membership (no prev join) (o15.0.1.15)...\n"
            );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
 
   mc_req_rec.mgid = osm_ipoib_mgid;
   mc_req_rec.rate =
@@ -2742,9 +2695,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
+           
   if ((status != IB_REMOTE_ERROR) ||
       (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) {
     osm_log( &p_osmt->log, OSM_LOG_ERROR,
@@ -2821,9 +2773,7 @@ osmt_run_mcast_flow( IN osmtest_t * cons
            "Checking BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)...\n"
            );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
 
   mc_req_rec.mgid = osm_ipoib_mgid;
   mc_req_rec.rate =
@@ -2836,9 +2786,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
+           
   if ((status != IB_REMOTE_ERROR) ||
       (res_sa_mad.status != IB_SA_MAD_STATUS_REQ_INVALID)) {
     osm_log( &p_osmt->log, OSM_LOG_ERROR,
@@ -2896,9 +2845,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
 
   /* impossible requested mtu always greater than exist in MCG */
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expecting Errors: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_START "\n" );
+  
   mc_req_rec.mtu = IB_MTU_LEN_4096 | IB_PATH_SELECTOR_GREATER_THAN << 6;
   memcpy(&mc_req_rec.mgid,&tmp_mgid,sizeof(ib_gid_t));
   ib_member_set_join_state(&mc_req_rec, IB_MC_REC_STATE_FULL_MEMBER);
@@ -2914,9 +2862,8 @@ osmt_run_mcast_flow( IN osmtest_t * cons
                                     comp_mask,
                                     &res_sa_mad );
   osm_log( &p_osmt->log, OSM_LOG_ERROR,
-           "osmt_run_mcast_flow: "
-           " Expected Errors:    vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv\n"
-           );
+           "osmt_run_mcast_flow: " EXPECTING_ERRORS_END "\n" );
+           
   if (status == IB_SUCCESS)
   {
     osm_log( &p_osmt->log, OSM_LOG_ERROR,


From delaitt at cpc.wmin.ac.uk  Mon Sep 25 07:01:33 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Mon, 25 Sep 2006 15:01:33 +0100 (BST)
Subject: [openib-general] [Lustre-discuss] Re: problems with lustre o2ib
 module & ofed
In-Reply-To: <Pine.GSO.4.58.0609251244180.25974@seth.cpc.wmin.ac.uk>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<200609250940.47936.jackm@dev.mellanox.co.il>
	<45177E0C.1040101@voltaire.com>
	<Pine.GSO.4.58.0609250836411.25974@seth.cpc.wmin.ac.uk>
	<20060925080051.GC21836@mellanox.co.il>
	<Pine.GSO.4.58.0609250909180.25974@seth.cpc.wmin.ac.uk>
	<20060925082609.GE21836@mellanox.co.il>
	<Pine.GSO.4.58.0609250942430.25974@seth.cpc.wmin.ac.uk>
	<Pine.GSO.4.58.0609251244180.25974@seth.cpc.wmin.ac.uk>
Message-ID: <Pine.GSO.4.58.0609251305100.25974@seth.cpc.wmin.ac.uk>


On Mon, 25 Sep 2006, Thierry Delaitre wrote:

>
> It seems that lustre puts its modules in /lib/modules/2.6.16.21-0.8-default
> despite the fact that my kernel is 2.6.16.21-0.8-smp !
>
> uname -a
> Linux n32 2.6.16.21-0.8-smp #4 SMP Sun Sep 24 08:47:30 BST 2006 i686 i686 i386 GNU/Linux
>
> make[3]: Nothing to be done for `install-exec-am'.
> /bin/sh ../../mkinstalldirs /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre
>  /usr/bin/install -c -m 644 lquota.ko /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre/lquota
>
> I therefore end up with both /lib/modules/2.6.16.21-0.8-smp and
> /lib/modules/2.6.16.21-0.8-default
>
> I'm now trying to find out why lustre thinks my kernel is
> 2.6.16.21-0.8-default and not 2.6.16.21-0.8-smp

I've updated the UTS_RELEASE string in
/usr/src/linux-2.6.16.21-0.8/include/linux/version.h from default to smp
and deleted my /lib/modules/.
Lustre now installs in /lib/modules/2.6.16.21-0.8-smp/kernel along with
the OFED IB drivers. I recompiled the kernel, OFED and Lustre and still get
this:

ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol ib_dereg_mr
ko2iblnd: Unknown symbol ib_dereg_mr
ko2iblnd: disagrees about version of symbol ib_destroy_cq
ko2iblnd: Unknown symbol ib_destroy_cq
ko2iblnd: disagrees about version of symbol ib_get_dma_mr
ko2iblnd: Unknown symbol ib_get_dma_mr
ko2iblnd: disagrees about version of symbol ib_alloc_pd
ko2iblnd: Unknown symbol ib_alloc_pd
ko2iblnd: disagrees about version of symbol ib_modify_qp
ko2iblnd: Unknown symbol ib_modify_qp
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
LustreError: 7430:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256

nm /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_create_cq
3cfe7afa A __crc_ib_create_cq
00000060 r __kcrctab_ib_create_cq
0000015f r __kstrtab_ib_create_cq
000000c0 r __ksymtab_ib_create_cq
00000d50 T ib_create_cq

I'm a bit stuck!
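
One possible reading of the "disagrees about version of symbol" lines,
sketched as a toy model in C: with symbol versioning enabled, each exported
symbol carries a CRC (the __crc_ib_create_cq value that nm prints above), a
module records the CRC it was built against, and the loader refuses the
symbol when the two differ. The struct and function names below are
hypothetical, and nothing in this thread shows which CRC ko2iblnd actually
recorded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one versioned symbol: a name plus a CRC. */
struct sym_version {
    const char   *name;
    unsigned long crc;
};

/*
 * Returns  1 if `exported` provides the symbol with the CRC `wanted` expects,
 *          0 if the symbol exists but the CRCs differ
 *            (the "disagrees about version of symbol" case),
 *         -1 if the symbol is not exported at all.
 */
int check_symbol(const struct sym_version *exported, size_t n_exported,
                 const struct sym_version *wanted)
{
    assert(wanted != NULL);
    for (size_t i = 0; i < n_exported; i++) {
        if (strcmp(exported[i].name, wanted->name) == 0)
            return exported[i].crc == wanted->crc ? 1 : 0;
    }
    return -1;
}

/* CRC copied from the nm output on ib_core.ko in the message above. */
const struct sym_version ib_core_exports[] = {
    { "ib_create_cq", 0x3cfe7afaUL },
};
```

If this reading applies, comparing the CRC that ko2iblnd.ko recorded for
ib_create_cq with the 3cfe7afa value above would show whether Lustre was
built against different headers than the ib_core that is loaded.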

Thierry.

> Thierry.
>
> On Mon, 25 Sep 2006, Thierry Delaitre wrote:
>
> >
> > On Mon, 25 Sep 2006, Michael S. Tsirkin wrote:
> >
> > > Quoting r. Thierry Delaitre <delaitt at cpc.wmin.ac.uk>:
> > > >
> > > > I've set the o2ib path to /usr/local/ofed/src/openib-1.1 as shown in the
> > > > lustre's configure line below. Lustre's configure script looks for a
> > > > driver/infiniband directory which only seems to exist under
> > > > /usr/local/ofed/src/openib-1.1
> > > >
> > > > ./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/
> > > >
> > > > Thierry.
> > > >
> > > > > replace /usr/local/ofed with the prefix you specified.
> > >
> > > This looks wrong - openib-1.1 is the pristine sources.
> > > openib/include is the exported interface and is what you should use
> > > for dependent modules.
> > > No idea why lustre would need drivers/infiniband.
> > > Try creating a softlink:
> > >
> > > mkdir /usr/local/ofed/src/openib/drivers/infiniband
> > > ln -s /usr/local/ofed/src/openib/include /usr/local/ofed/src/openib/drivers/infiniband
> >
> > I untarred lustre 1.5.95, compiled it (./configure
> > --with-o2ib=/usr/local/ofed/src/openib), did a make install and depmod -a,
> > and still get the following:
> >
> > my modprobe.conf is the following
> >
> > options lnet ip2nets="o2ib0 161.74.83.[0-255]"
> >
> > lctl network up
> > LNET configure error 100: Network is down
> >
> > ko2iblnd: disagrees about version of symbol ib_create_cq
> > ko2iblnd: Unknown symbol ib_create_cq
> > ko2iblnd: disagrees about version of symbol ib_dereg_mr
> > ko2iblnd: Unknown symbol ib_dereg_mr
> > ko2iblnd: disagrees about version of symbol ib_destroy_cq
> > ko2iblnd: Unknown symbol ib_destroy_cq
> > ko2iblnd: disagrees about version of symbol ib_get_dma_mr
> > ko2iblnd: Unknown symbol ib_get_dma_mr
> > ko2iblnd: disagrees about version of symbol ib_alloc_pd
> > ko2iblnd: Unknown symbol ib_alloc_pd
> > ko2iblnd: disagrees about version of symbol ib_modify_qp
> > ko2iblnd: Unknown symbol ib_modify_qp
> > ko2iblnd: disagrees about version of symbol ib_dealloc_pd
> > ko2iblnd: Unknown symbol ib_dealloc_pd
> > LustreError: 4177:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
> >
> > lsmod | grep ib
> > libcfs                103060  1 lnet
> > ib_ucm                 19332  0
> > ib_addr                10756  1 rdma_cm
> > ib_cm                  31968  2 ib_ucm,rdma_cm
> > ib_ipoib               48400  0
> > ib_sa                  16652  3 rdma_cm,ib_cm,ib_ipoib
> > ib_uverbs              38312  2 rdma_ucm,ib_ucm
> > ib_umad                17968  0
> > ib_mthca              116240  0
> > ib_mad                 36116  4 ib_cm,ib_sa,ib_umad,ib_mthca
> > ib_core                49024  9 ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
> >
> > nm /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_alloc_pd
> > d5dcb698 A __crc_ib_alloc_pd
> > 0000001c r __kcrctab_ib_alloc_pd
> > 0000006a r __kstrtab_ib_alloc_pd
> > 00000038 r __ksymtab_ib_alloc_pd
> > 00000c65 T ib_alloc_pd
> >
> > from lustre's config.log:
> >
> > configure:6500: checking whether to enable OpenIB gen2 support
> > configure:6586: cp conftest.c build && make modules CC=gcc -f /root/lustre-1.5.95/build/Makefile LUSTRE_LINUX_CONFIG=/usr/src/linux/.config -o tmp_include_depends -o scripts -o include/config/MARKER -C /usr/src/linux EXTRA_CFLAGS=-Werror-implicit-function-declaration -g -I/root/lustre-1.5.95/lnet/include -I/root/lustre-1.5.95/lustre/include -I/usr/local/ofed/src/openib/include  M=/root/lustre-1.5.95/build
> > /root/lustre-1.5.95/build/conftest.c:42: warning: function declaration isn't a prototype
> > /root/lustre-1.5.95/build/conftest.c: In function 'main':
> > /root/lustre-1.5.95/build/conftest.c:49: warning: unused variable 'rej_reason'
> > /root/lustre-1.5.95/build/conftest.c:48: warning: unused variable 'pool_fmr'
> > /root/lustre-1.5.95/build/conftest.c:47: warning: unused variable 'qp_attr'
> > /root/lustre-1.5.95/build/conftest.c:46: warning: unused variable 'device_attr'
> > /root/lustre-1.5.95/build/conftest.c:45: warning: unused variable 'conn_param'
> > WARNING: "rdma_create_id" [/root/lustre-1.5.95/build/conftest.ko] undefined!
> > configure:6589: $? = 0
> > configure:6591: test -s build/conftest.o
> > configure:6594: $? = 0
> > configure:6597: result: yes
> >
> >
> > Thierry.
> >
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org
> > http://openib.org/mailman/listinfo/openib-general
> >
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >
> >
>
> ----------------------------------------
> Dr Thierry DELAITRE
> Systems and Services Manager, CSCS
> University of Westminster
> 115 New Cavendish Street, London W1W 6UW
>
> Tel: 020 7911 5000 ext: 3586
> Fax: 020 7911 5089
> Mobile short dial code 1788
>
> http://www.cscs.wmin.ac.uk/~delaitt
> ----------------------------------------
>
> This e-mail and its attachments are intended for the above named only
> and may be confidential.  If they have come to you in error you must
> not copy or show them to anyone, nor should you take any action based
> on them, other than to notify the error by replying to the sender.
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at clusterfs.com
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>
>

----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW

Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788

http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------



From dotanb at dev.mellanox.co.il  Mon Sep 25 07:04:50 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Mon, 25 Sep 2006 17:04:50 +0300
Subject: [openib-general] [cma] the private data length that arrives with
 the event RDMA_CM_EVENT_CONNECT_REQUEST is false
Message-ID: <4517E202.8080509@dev.mellanox.co.il>

Hi Sean.

I'm using the following configuration:
*************************************************************
Host Architecture : x86_64
Linux Distribution: Red Hat Enterprise Linux AS release 4 (Nahant Update 3)
Kernel Version    : 2.6.9-34.ELsmp
GCC Version       : gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2)
Memory size       : 4039892 kB
Driver Version    : gen2_linux-20060922-1700 (REV=9611)
HCA ID(s)         : mthca0
HCA model(s)      : 25208
FW version(s)     : 4.7.927
Board(s)          : MT_00A0010001
*************************************************************

I have 2 sides:
    The first side calls rdma_connect() with private data (and
private_data_len != 0).
    The second side waits for the RDMA_CM_EVENT_CONNECT_REQUEST event and
checks the private_data_len.

The problem is that the private_data_len on the second side (receiver)
is not equal to the length of the data that was sent.

Can you please check this issue?

Thanks
Dotan


From rdreier at cisco.com  Mon Sep 25 07:11:52 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Sep 2006 07:11:52 -0700
Subject: [openib-general] [cma] the private data length that arrives
 with the event RDMA_CM_EVENT_CONNECT_REQUEST is false
In-Reply-To: <4517E202.8080509@dev.mellanox.co.il> (Dotan Barak's
	message of "Mon, 25 Sep 2006 17:04:50 +0300")
References: <4517E202.8080509@dev.mellanox.co.il>
Message-ID: <aday7s8doif.fsf@cisco.com>

    Dotan> The problem is that the private_data_len in the second side
    Dotan> (receiver) is not equal to the sent data (length).

How do you expect the private data length to be passed from one side
to the other?  There is no such field in the CM protocol.

The only thing the RDMA CM can do is pass the maximum possible private
data length to the passive consumer.

 - R.
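
A minimal sketch of the point above, assuming a heavily simplified message
layout: the IB CM REQ carries its private data in a fixed-size field (92
bytes for a REQ) with no length field next to it, so the passive side can
only be told the size of that field. The RDMA CM uses part of the field for
its own header, which this sketch ignores. The struct and helper names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REQ_PRIVATE_DATA_SIZE 92  /* fixed field size; no length on the wire */

/* Hypothetical, heavily simplified REQ: only the private data field. */
struct toy_req {
    unsigned char private_data[REQ_PRIVATE_DATA_SIZE];
};

/* Active side: copy the caller's bytes and zero-pad the rest of the field. */
void toy_send_req(struct toy_req *req, const void *data, size_t len)
{
    assert(len <= REQ_PRIVATE_DATA_SIZE);
    memset(req->private_data, 0, sizeof(req->private_data));
    memcpy(req->private_data, data, len);
}

/* Passive side: the sender's len is gone; only the field size is known. */
size_t toy_recv_private_data_len(const struct toy_req *req)
{
    return sizeof(req->private_data);
}
```

Under this model a consumer that needs the real length has to encode it
inside its own private data.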


From rdreier at cisco.com  Mon Sep 25 07:29:53 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Sep 2006 07:29:53 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060925074155.GB21836@mellanox.co.il> (Michael S.
	Tsirkin's message of "Mon, 25 Sep 2006 10:41:55 +0300")
References: <1158850657.24776.158.camel@localhost> <adahcz1hz4t.fsf@cisco.com>
	<62072.85.250.167.59.1158877997.squirrel@dev.mellanox.co.il>
	<adar6y4hlkf.fsf@cisco.com> <20060925074155.GB21836@mellanox.co.il>
Message-ID: <adau02wdnoe.fsf@cisco.com>

    Michael> Actually, the reason it is hard to come up with the name
    Michael> is that what this enables is the natural poll/request
    Michael> notification order.

Over the weekend I thought about this and came up with an idea I
kind of like, inspired by Todd Rimmer's comments about poll-and-notify.

We could change ib_req_notify_cq() to have an extra parameter:

static inline int ib_req_notify_cq(struct ib_cq *cq,
				   enum ib_cq_notify cq_notify,
				   int *lost_event_possible)

and if non-NULL is passed in for lost_event_possible, then
req_notify_cq should do the equivalent of a CQ peek after arming the
CQ event.

Of course mthca would just set *lost_event_possible to 0 without
needing to do any check.

(The reason I make it a pointer parameter rather than just using the
return value is so that consumers don't need to take the potential
cost of a CQ peek on devices where arming a CQ is cheap but peeking in
a CQ might require an extra lock or something).

What do you think?

 - R.
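
A rough sketch of how a consumer might use the proposed extra parameter,
written against a toy CQ model rather than the real verbs API. Everything
prefixed toy_ is hypothetical, including the `late` field that simulates a
completion landing just before the CQ is armed and the `arm_checks_queue`
flag that stands in for hardware which does not lose the event in that
window.

```c
#include <assert.h>
#include <stddef.h>

/* Toy CQ; all fields are assumptions made for illustration. */
struct toy_cq {
    int pending;          /* completions waiting in the CQ */
    int late;             /* completions that land just before arming */
    int armed;            /* 1 while a completion event is requested */
    int events;           /* completion events delivered so far */
    int arm_checks_queue; /* 1: arming a non-empty CQ still raises an event */
};

/* Returns 1 and consumes one completion, or 0 if the CQ is empty. */
int toy_poll_cq(struct toy_cq *cq)
{
    assert(cq != NULL);
    if (cq->pending == 0)
        return 0;
    cq->pending--;
    return 1;
}

/* Models the proposed call: arm, then optionally report a possible miss. */
int toy_req_notify_cq(struct toy_cq *cq, int *lost_event_possible)
{
    cq->pending += cq->late;   /* the racing completions arrive now */
    cq->late = 0;
    cq->armed = 1;
    if (cq->arm_checks_queue) {
        if (cq->pending > 0) { /* hardware raises the event by itself */
            cq->events++;
            cq->armed = 0;
        }
        if (lost_event_possible)
            *lost_event_possible = 0;
    } else if (lost_event_possible) {
        *lost_event_possible = cq->pending > 0; /* equivalent of a CQ peek */
    }
    return 0;
}

/* Consumer loop: drain, arm, and drain again only if told to. */
int toy_drain_and_arm(struct toy_cq *cq)
{
    int handled = 0;
    int again;

    do {
        while (toy_poll_cq(cq))
            handled++;
        again = 0;
        toy_req_notify_cq(cq, &again);
    } while (again);
    return handled;
}
```

In this model the consumer code is the same for both kinds of device; only
the driver side decides whether a peek is needed.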


From dotanb at dev.mellanox.co.il  Mon Sep 25 07:36:09 2006
From: dotanb at dev.mellanox.co.il (Dotan Barak)
Date: Mon, 25 Sep 2006 17:36:09 +0300
Subject: [openib-general] [cma] the private data length that arrives
 with the event RDMA_CM_EVENT_CONNECT_REQUEST is false
In-Reply-To: <aday7s8doif.fsf@cisco.com>
References: <4517E202.8080509@dev.mellanox.co.il> <aday7s8doif.fsf@cisco.com>
Message-ID: <4517E959.8010306@dev.mellanox.co.il>

Roland Dreier wrote:
>     Dotan> The problem is that the private_data_len in the second side
>     Dotan> (receiver) is not equal to the sent data (length).
>
> How do you expect the private data length to be passed from one side
> to the other?  There is no such field in the CM protocol.
>
> The only thing the RDMA CM can do is pass the maximum possible private
> data length to the passive consumer.
>
>  - R.
>   
You are right, the CM should support only private data (according to the 
IB spec chapter 12).
The CMA implementation in gen2 has the attribute private_data_len
in the rdma_cm_event structure.

So, what is the purpose of private_data_len (in the event structure)?

thanks
Dotan


From swise at opengridcomputing.com  Mon Sep 25 07:46:41 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 25 Sep 2006 09:46:41 -0500
Subject: [openib-general] [cma] the private data length that arrives
 with the event RDMA_CM_EVENT_CONNECT_REQUEST is false
In-Reply-To: <4517E959.8010306@dev.mellanox.co.il>
References: <4517E202.8080509@dev.mellanox.co.il>
	<aday7s8doif.fsf@cisco.com> <4517E959.8010306@dev.mellanox.co.il>
Message-ID: <1159195602.7283.2.camel@stevo-desktop>

For iWARP, the private data length is a field in the MPA startup packets
and thus can be passed up to the consumer in connect request events and
connect reply events.


On Mon, 2006-09-25 at 17:36 +0300, Dotan Barak wrote:
> Roland Dreier wrote:
> >     Dotan> The problem is that the private_data_len in the second side
> >     Dotan> (receiver) is not equal to the sent data (length).
> >
> > How do you expect the private data length to be passed from one side
> > to the other?  There is no such field in the CM protocol.
> >
> > The only thing the RDMA CM can do is pass the maximum possible private
> > data length to the passive consumer.
> >
> >  - R.
> >   
> You are right, the CM should support only private data (according to the 
> IB spec chapter 12).
> The CMA implementation in gen2 has the attribute private_data_len
> in the rdma_cm_event structure.
> 
> So, what is the purpose of private_data_len (in the event structure)?
> 
> thanks
> Dotan
> 


From mst at mellanox.co.il  Mon Sep 25 07:54:27 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 25 Sep 2006 17:54:27 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <adau02wdnoe.fsf@cisco.com>
References: <adau02wdnoe.fsf@cisco.com>
Message-ID: <20060925145427.GB23882@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/ipoib: NAPI
> 
>     Michael> Actually, the reason it is hard to come up with the name
>     Michael> is that what this enables is the natural poll/request
>     Michael> notification order.
> 
> Over the weekend I thought about this and came up with an idea I
> kind of like, inspired by Todd Rimmer's comments about poll-and-notify.
> 
> We could change ib_req_notify_cq() to have an extra parameter:
> 
> static inline int ib_req_notify_cq(struct ib_cq *cq,
> 				   enum ib_cq_notify cq_notify,
> 				   int *lost_event_possible)
> 
> and if non-NULL is passed in for lost_event_possible, then
> req_notify_cq should do the equivalent of a CQ peek after arming the
> CQ event.

I thought about this too.

But this has a disadvantage compared to the device-wide flag: when the flag is
device-wide, we can just have 2 polling routines - with and without peek - and
select the correct one at device open depending on the hardware capabilities.
Thus we can avoid a conditional branch on the fast path,
which I think is nice.

So I think if we want to enable an mthca-specific optimization,
the right way is with device flags.

On a separate note - ib_req_notify_cq is also testing the lost_event_possible flag -
so now we have 2 conditional branches on the fast path, and this hurts all ULPs. Ugh.

If we extend the interface, I would rather make a new call
	ib_req_notify_and_peek_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify)
that returns 0 on empty CQ, 1 on non-empty and negative on error.

-- 
MST
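
A sketch of the two ideas above under a toy model: a notify-and-peek call
with the 0 / 1 / negative return convention, and a polling routine picked
once at device open so that the fast path has no capability check. All toy_
names and the arm_needs_peek flag are hypothetical and are not part of any
existing API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_cq {
    int pending; /* completions waiting in the CQ */
    int armed;   /* 1 while a completion event is requested */
};

/* Returns 1 and consumes one completion, or 0 if the CQ is empty. */
int toy_poll_cq(struct toy_cq *cq)
{
    if (cq->pending == 0)
        return 0;
    cq->pending--;
    return 1;
}

/* 0 on empty CQ, 1 on non-empty, negative errno value on error. */
int toy_req_notify_and_peek_cq(struct toy_cq *cq)
{
    if (cq == NULL)
        return -EINVAL;
    cq->armed = 1;
    return cq->pending > 0;
}

/* Routine for devices where arming alone is enough: no peek. */
int toy_rx_no_peek(struct toy_cq *cq)
{
    int handled = 0;

    while (toy_poll_cq(cq))
        handled++;
    cq->armed = 1;
    return handled;
}

/* Routine for devices that need a peek after arming. */
int toy_rx_with_peek(struct toy_cq *cq)
{
    int handled = 0;

    do {
        while (toy_poll_cq(cq))
            handled++;
    } while (toy_req_notify_and_peek_cq(cq) > 0);
    return handled;
}

struct toy_dev {
    int arm_needs_peek;           /* hypothetical device-wide flag */
    int (*rx)(struct toy_cq *cq); /* chosen once, used on the fast path */
};

/* Select the polling routine at open time, not per poll. */
void toy_dev_open(struct toy_dev *dev)
{
    assert(dev != NULL);
    dev->rx = dev->arm_needs_peek ? toy_rx_with_peek : toy_rx_no_peek;
}
```

The only branch on the capability flag sits in toy_dev_open(); each receive
pass then goes through dev->rx without testing it again.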


From trimmer at silverstorm.com  Mon Sep 25 09:27:23 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Mon, 25 Sep 2006 12:27:23 -0400
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060925145427.GB23882@mellanox.co.il>
Message-ID: <D80D83302DEE6249A221093BF2BB69AE8EFB94@mail.silverstorm.com>

> From: Michael S. Tsirkin
> Sent: Monday, September 25, 2006 10:54 AM
> To: Roland Dreier
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] [PATCH] IB/ipoib: NAPI
> 
> Quoting r. Roland Dreier <rdreier at cisco.com>:
> > Subject: Re: [PATCH] IB/ipoib: NAPI
> >
> >     Michael> Actually, the reason it is hard to come up with the name
> >     Michael> is that what this enables is the natural poll/request
> >     Michael> notification order.
> >
> > Over the weekend I thought about this and came up with an idea I
> > kind of like, inspired by Todd Rimmer's comments about poll-and-notify.
> >
> > We could change ib_req_notify_cq() to have an extra parameter:
> >
> > static inline int ib_req_notify_cq(struct ib_cq *cq,
> > 				   enum ib_cq_notify cq_notify,
> > 				   int *lost_event_possible)
> >
> > and if non-NULL is passed in for lost_event_possible, then
> > req_notify_cq should do the equivalent of a CQ peek after arming the
> > CQ event.
> 
> I thought about this too.
> 
> But this has a disadvantage over the device-wide flag: when flag is
> device-wide, we can just have 2 polling routines - with and without
> peek - and select the correct one at device open depending on the
> hardware capabilities.
> Thus we can avoid a conditional branch on the fast path,
> which I think is nice.
> 
> So I think if we want to enable mthca-specific optimization,
> the right way is with device flags.
> 
> On a separate note - ib_req_notify_cq is also testing the
> lost_event_possible flag - so now we have 2 conditional branches on
> fast path, and this hurts all ULPs. Ugh.
> 
> If we extend the interface, I would rather make a new call
> 	ib_req_notify_and_peek_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify)
> that returns 0 on empty CQ, 1 on non-empty and negative on error.
> 
> --
> MST
> 

It's inefficient to peek the CQ if the next operation is likely to then
be a poll.  Performing the poll_and_notify in one call is more
efficient.

Then if you use poll_and_notify instead of poll_cq in the polling loops,
you can also be equally efficient for all HCA models without needing a
hardware capability flag and 2 polling algorithms in each ULP.  Instead
the HCA driver naturally provides the most efficient approach and all
callers use the same algorithm.

In the examples below, let's assume 2 CQEs are returned, then the CQ is
rearmed and is still empty afterward.

For example on Mellanox HCAs the actual sequence would be:
	poll_and_notify
		returns a CQE, tells caller to call it again
	poll_and_notify
		returns a CQE, tells caller to call it again
	poll_and_notify
		finds CQ empty, rearms CQ, tells caller it's done
		[note: no peek needed]
3 driver calls, 3 CQE accesses, 1 rearm

For other HCAs the actual sequence would be:
	poll_and_notify
		returns a CQE, tells caller to call it again
	poll_and_notify
		returns a CQE, tells caller to call it again
	poll_and_notify
		finds CQ empty, rearms CQ, peeks CQ
		if CQ empty, tells caller it's done [for this example, it's true]
		if CQ not empty, tells caller to loop on poll_cq
3 driver calls, 4 CQE accesses, 1 rearm

In comparison the present code (or with a device capability flag) is:
	poll_cq
		returns a CQE
	poll_cq
		returns a CQE
	poll_cq
		finds CQ empty
	notify_cq
		rearms CQ
	if non-Mellanox HCA
		poll_cq - finds CQ empty
4-5 driver calls, 3-4 CQE accesses, 1 rearm

With notify with an internal peek (lost event flag approach) it's:
	poll_cq
		returns a CQE
	poll_cq
		returns a CQE
	poll_cq
		finds CQ empty
	notify_cq
		rearms CQ; for non-Mellanox HCA, peeks CQ - finds CQ empty
	if lost events indicated [for this example it's false]
		poll_cq until empty
4-5 driver calls, 3-4 CQE accesses, 1 rearm

Hence for all HCA models, the poll_and_notify approach has fewer driver
calls (3 in the above example, compared to 4-5 for the other approaches).
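
The driver-call tallies above can be reproduced with a small user-space
simulation. This is only a sketch: sim_cq and the sim_* and drain_*
functions are invented for illustration, the "driver" is a counter, and
the peek that a non-Mellanox driver would do inside poll_and_notify is
not modelled, since in this example the CQ stays empty after the rearm.

```c
#include <assert.h>
#include <stdbool.h>

struct sim_cq {
	int pending;		/* CQEs waiting in the queue */
	bool armed;		/* notification requested */
	int driver_calls;	/* entries into the simulated driver */
};

/* Plain poll: returns 1 if a CQE was consumed, 0 if the CQ was empty. */
static int sim_poll_cq(struct sim_cq *cq)
{
	cq->driver_calls++;
	if (cq->pending == 0)
		return 0;
	cq->pending--;
	return 1;
}

/* Separate rearm call. */
static void sim_req_notify_cq(struct sim_cq *cq)
{
	cq->driver_calls++;
	cq->armed = true;
}

/*
 * Hypothetical combined call: consume one CQE, or rearm within the same
 * driver entry when the CQ is empty.  Returns 1 = call again, 0 = done.
 */
static int sim_poll_and_notify(struct sim_cq *cq)
{
	cq->driver_calls++;
	if (cq->pending > 0) {
		cq->pending--;
		return 1;
	}
	cq->armed = true;
	return 0;
}

/* Drain using only the combined call; returns the driver call count. */
static int drain_combined(int cqes)
{
	struct sim_cq cq = { .pending = cqes };

	assert(cqes >= 0);
	while (sim_poll_and_notify(&cq))
		;
	return cq.driver_calls;
}

/*
 * Present scheme: poll until empty, rearm, and poll again afterwards
 * when the hardware does not have exact notify semantics.
 */
static int drain_separate(int cqes, bool exact_notify)
{
	struct sim_cq cq = { .pending = cqes };

	assert(cqes >= 0);
	while (sim_poll_cq(&cq))
		;
	sim_req_notify_cq(&cq);
	if (!exact_notify)
		while (sim_poll_cq(&cq))
			;
	return cq.driver_calls;
}
```

With 2 CQEs queued, drain_combined(2) makes 3 driver calls, while
drain_separate(2, true) makes 4 and drain_separate(2, false) makes 5.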

In general driver calls are going to be the expensive factor in this
comparison.  The main difference in all the above examples will be the
spin_lock for the CQ.

Depending on HCA design, the poll_cq and/or notify_cq and/or peek_cq
operations may also incur an expensive PCI bus read or write.  However,
with the exception of the notify w/peek approach, those costs are the
same for all the above examples.

In the case (not shown above) where there was 1 additional CQE found
after the rearm [applicable only to non-Mellanox HCAs], the
poll_and_notify approach will also save 1 CQE access as compared to the
notify w/internal peek approach.

Todd Rimmer


From rdreier at cisco.com  Mon Sep 25 09:41:26 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Sep 2006 09:41:26 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <D80D83302DEE6249A221093BF2BB69AE8EFB94@mail.silverstorm.com>
	(Todd Rimmer's message of "Mon, 25 Sep 2006 12:27:23 -0400")
References: <D80D83302DEE6249A221093BF2BB69AE8EFB94@mail.silverstorm.com>
Message-ID: <adak63rew5l.fsf@cisco.com>

    Todd> It's inefficient to peek the CQ if the next operation is
    Todd> likely to then be a poll.  Performing the poll_and_notify in
    Todd> one call is more efficient.

Yes, but if you think carefully about how to implement NAPI for IPoIB,
you'll see that poll-and-notify is not a useful operation.  If a
device does not support the "exact" Mellanox CQ notify semantics, then
there is no way around using peek CQ somehow.

 - R.


From rdreier at cisco.com  Mon Sep 25 09:45:58 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Sep 2006 09:45:58 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060925145427.GB23882@mellanox.co.il> (Michael S.
	Tsirkin's message of "Mon, 25 Sep 2006 17:54:27 +0300")
References: <adau02wdnoe.fsf@cisco.com> <20060925145427.GB23882@mellanox.co.il>
Message-ID: <adafyefevy1.fsf@cisco.com>

    Michael> But this has a disadvantage over the device-wide flag:
    Michael> when flag is device-wide, we can just have 2 polling
    Michael> routines - with and without peek - and select the correct
    Michael> one at device open depending on the hardware
    Michael> capabilities.  Thus we can avoid a conditional branch on
    Michael> the fast path, which I think is nice.

Yeah, but I can't make up my mind whether two polling routines is a
good thing or a bad thing.  We get a very specific optimization, but
we have two copies of the same code then.

    Michael> On a separate note - ib_req_notify_cq is also testing the
    Michael> lost_event_possible flag - so now we have 2 conditional
    Michael> branches on fast path, and this hurts all ULPs. Ugh.

I suspect that the cost here is minimal -- lost_event_possible is
going to be in a register, etc.

    Michael> If we extend the interface, I would rather make a new
    Michael> call ib_req_notify_and_peek_cq(struct ib_cq *cq, enum
    Michael> ib_cq_notify cq_notify) that returns 0 on empty CQ, 1 on
    Michael> non-empty and negative on error.

And again, I don't want to make the interface too fat...

There are a few tradeoffs here: microoptimization
vs. maintainability, IPoIB & NAPI vs. all other ULPs...

 - R.


From mst at mellanox.co.il  Mon Sep 25 10:58:54 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 25 Sep 2006 20:58:54 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <adafyefevy1.fsf@cisco.com>
References: <adafyefevy1.fsf@cisco.com>
Message-ID: <20060925175854.GA25001@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/ipoib: NAPI
> 
>     Michael> But this has a disadvantage over the device-wide flag:
>     Michael> when flag is device-wide, we can just have 2 polling
>     Michael> routines - with and without peek - and select the correct
>     Michael> one at device open depending on the hardware
>     Michael> capabilities.  Thus we can avoid a conditional branch on
>     Michael> the fast path, which I think is nice.
> 
> Yeah, but I can't make up my mind whether two polling routines is a
> good thing or a bad thing.  We get a very specific optimization, but
> we have two copies of the same code then.

Well, with a flag the ULP can decide what it wants to do,
we are not forcing anything here.

>     Michael> On a separate note - ib_req_notify_cq is also testing the
>     Michael> lost_event_possible flag - so now we have 2 conditional
>     Michael> branches on fast path, and this hurts all ULPs. Ugh.
> 
> I suspect that the cost here is minimal -- lost_event_possible is
> going to be in a register, etc.

Hmm, since we are passing it by pointer to a function
called through a pointer, I don't see how gcc can
move it out of memory into a register. Am I wrong?

>     Michael> If we extend the interface, I would rather make a new
>     Michael> call ib_req_notify_and_peek_cq(struct ib_cq *cq, enum
>     Michael> ib_cq_notify cq_notify) that returns 0 on empty CQ, 1 on
>     Michael> non-empty and negative on error.
> 
> And again, I don't want to make the interface too fat...

Well, lots of flags that you are required to implement
amounts to the same thing from low level driver developer
perspective, isn't that right?

> There are a few tradeoffs here: microoptimization
> vs. maintainability, IPoIB & NAPI vs. all other ULPs...

I just find a flag + conditional peek a much simpler approach.
Since all our testing is done on mthca anyway, almost
all approaches amount to doing a NOP in various ways for us.

So I would suggest
- get Eli's patch with simple flag into shape & working on all hardware,
  push into git.
- people interested in specific hardware test performance and propose patches
  to improve it even further.

Does this sound good?


-- 
MST


From ardavis at ichips.intel.com  Mon Sep 25 11:37:53 2006
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Mon, 25 Sep 2006 11:37:53 -0700
Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to
 unmatched DREQ
In-Reply-To: <4514510F.3050400@ichips.intel.com>
References: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com>
	<4514510F.3050400@ichips.intel.com>
Message-ID: <45182201.4000105@ichips.intel.com>

Arlin Davis wrote:

>Sean Hefty wrote:
>
>  
>
>>Currently a DREP is only sent in response to a DREQ if a connection
>>has been found matching the DREQ, and it is in the proper state.  Once
>>a DREP is sent, the local connection moves into timewait.  Duplicate
>>DREQs received while in this state result in re-sending the DREP.
>>
>>However, it's likely that the local connection will enter and exit
>>timewait before the remote side times out a lost DREP and resends a DREQ.
>>There are a couple possible solutions to this.  One is to increase how
>>long a connection remains in timewait, by multiplying its wait time by
>>max_cm_retries.  This can greatly increase the timewait state before a QP
>>can be re-used when CM messages are not lost.
>>
>>An alternative is to send a DREP in response to a DREQ, even if a local
>>connection is not found, which is what this patch does.
>> 
>>
>>    
>>
>
>Would it be possible to get this fix in  rc7? I am consistently seeing 
>this problem with Intel MPI on a 64 node cluster.
>
>-arlin
>  
>
Aviram? Is there an rc7 and could this get in?



From trimmer at silverstorm.com  Mon Sep 25 11:50:19 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Mon, 25 Sep 2006 14:50:19 -0400
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <adak63rew5l.fsf@cisco.com>
Message-ID: <D80D83302DEE6249A221093BF2BB69AE8EFC0A@mail.silverstorm.com>

> From: Roland Dreier [mailto:rdreier at cisco.com]
> Sent: Monday, September 25, 2006 12:41 PM
> To: Rimmer, Todd
> Cc: Michael S. Tsirkin; openib-general at openib.org
> Subject: Re: [openib-general] [PATCH] IB/ipoib: NAPI
> 
>     Todd> It's inefficient to peek the CQ if the next operation is
>     Todd> likely to then be a poll.  Performing the poll_and_notify in
>     Todd> one call is more efficient.
> 
> Yes, but if you think carefully about how to implement NAPI for IPoIB,
> you'll see that poll-and-notify is not a useful operation.  If a
> device does not support the "exact" Mellanox CQ notify semantics, then
> there is no way around using peek CQ somehow.
> 
>  - R.
Roland,

What were your thoughts on how to handle this part of Eli's proposed
code:
	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
	/* TODO we need peek_cq here for hw devices that
	   would not generate interrupts for completions
	   arriving between end of polling till request notify */

	return 0;

On a non-Mellanox HCA, if the CQ is not empty here, isn't this required
to poll it til empty and process all the CQEs (otherwise we may not get
another interrupt).  If instead we return 1 from the dev->poll routine
here, we could be scheduled for a future poll and a future interrupt
(which might be bad).

Todd


From rdreier at cisco.com  Mon Sep 25 11:54:30 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Sep 2006 11:54:30 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <D80D83302DEE6249A221093BF2BB69AE8EFC0A@mail.silverstorm.com>
	(Todd Rimmer's message of "Mon, 25 Sep 2006 14:50:19 -0400")
References: <D80D83302DEE6249A221093BF2BB69AE8EFC0A@mail.silverstorm.com>
Message-ID: <ada3bafepzt.fsf@cisco.com>

 > What were your thoughts on how to handle this part of Eli's proposed
 > code:
 > 	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 > 	/* TODO we need peek_cq here for hw devices that
 > 	   would not generate interrupts for completions
 > 	   arriving between end of polling till request notify */
 > 
 > 	return 0;
 > 
 > On a non-Mellanox HCA, if the CQ is not empty here, isn't this required
 > to poll it til empty and process all the CQEs (otherwise we may not get
 > another interrupt).  If instead we return 1 from the dev->poll routine
 > here, we could be scheduled for a future poll and a future interrupt
 > (which might be bad).

That's exactly where we need peek CQ.  We can't repoll the CQ, because
netif_rx_complete() has already been called, so the poll routine might
already be running on another CPU.  The only thing I can see to do is
peek in the CQ, and if it's not empty, then go through the whole
netif_rx_reschedule() song and dance.
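
A toy model of that sequence, in user-space C. The toy_* functions are
stand-ins written for this sketch and do not have the signatures of
netif_rx_complete(), ib_req_notify_cq() or netif_rx_reschedule(); the
exact_notify parameter is an assumed hardware capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_napi {
	bool scheduled;		/* poll routine is on the poll list */
	int cq_pending;		/* completions sitting in the CQ */
	bool cq_armed;		/* completion event requested */
};

/* Stand-in for netif_rx_complete(): leave polling mode. */
static void toy_rx_complete(struct toy_napi *n)
{
	n->scheduled = false;
}

/* Stand-in for ib_req_notify_cq(): rearm the CQ event. */
static void toy_req_notify(struct toy_napi *n)
{
	n->cq_armed = true;
}

/* Stand-in for a CQ peek: report entries without consuming them. */
static int toy_peek_cq(const struct toy_napi *n)
{
	return n->cq_pending;
}

/* Stand-in for netif_rx_reschedule(): fails if already scheduled. */
static bool toy_rx_reschedule(struct toy_napi *n)
{
	if (n->scheduled)
		return false;
	n->scheduled = true;
	return true;
}

/*
 * Tail of the poll routine once the CQ looked empty.  Returns 1 if more
 * polling was arranged, 0 if we now wait for the next completion event.
 */
static int toy_poll_tail(struct toy_napi *n, bool exact_notify)
{
	assert(n != NULL);
	toy_rx_complete(n);	/* must come before the rearm */
	toy_req_notify(n);
	if (!exact_notify && toy_peek_cq(n) > 0 && toy_rx_reschedule(n))
		return 1;	/* a completion slipped in: poll again */
	return 0;
}
```

If the reschedule fails, a completion event has already scheduled the
poll routine, so nothing more needs to be done in this context.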

 - R.


From rdreier at cisco.com  Mon Sep 25 11:58:43 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Sep 2006 11:58:43 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060925175854.GA25001@mellanox.co.il> (Michael S.
	Tsirkin's message of "Mon, 25 Sep 2006 20:58:54 +0300")
References: <adafyefevy1.fsf@cisco.com> <20060925175854.GA25001@mellanox.co.il>
Message-ID: <aday7s7db8c.fsf@cisco.com>

 > I just find a flag + conditional peek a much simpler approach.
 > Since all our testing is done on mthca anyway, almost
 > all approaches amount to doing a NOP in various ways for us.

Umm, that's a pretty parochial attitude...

 > So I would suggest
 > - get Eli's patch with simple flag into shape & working on all hardware,
 >   push into git.
 > - people interested in specific hardware test performance and propose patches
 >   to improve it even further.

What is a 'simple flag'?  Who is going to implement peek_cq() for ehca
and ipath?

I'm not really that interested in the most micro-optimized approach.
I'd rather have something simple and easy to maintain -- in other
words, I don't want 2 or 3 completion handling paths in IPoIB.  And
from that perspective, extending req_notify_cq() looks pretty good to
me.

 - R.


From rdreier at cisco.com  Mon Sep 25 12:02:35 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Sep 2006 12:02:35 -0700
Subject: [openib-general] Question about ehca CQ handling
Message-ID: <adau02vdb1w.fsf@cisco.com>

While looking over the ehca driver from the perspective of adding a
"peek CQ" operation, I noticed some code that looked funny.

In hipz_set_cqx_n0() and hipz_set_cqx_n1(), what is the point of the
calls to hipz_galpa_load_cq()?  The return value is discarded.  I see
that hipz_galpa_load_cq() dereferences a volatile pointer internally,
so I'm guessing this is some sort of ordering constraint.  But would
it be just as good to do "barrier()" there?

 - R.


From rdreier at cisco.com  Mon Sep 25 12:08:05 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Sep 2006 12:08:05 -0700
Subject: [openib-general] timer_pending kernel assertion while stopping
	IPoIB
In-Reply-To: <4517AC88.9080202@voltaire.com> (Or Gerlitz's message of
	"Mon, 25 Sep 2006 13:16:40 +0300")
References: <Pine.LNX.4.64.0609211129040.28981@zuben>
	<adau031i6v7.fsf@cisco.com> <4517AC88.9080202@voltaire.com>
Message-ID: <adapsdjdasq.fsf@cisco.com>

    Or> OK, i will be able to test this with 2.6.18 later this week,
    Or> as for doing so with your for-2.6.19 branch, is it sufficient
    Or> to do (assuming the tree was cloned and now updated with git
    Or> pull)

    Or> $ git checkout -f for-2.6.19

    Or> to have the sources "state" be as of that branch? for example
    Or> following doing so i don't see the amso1100 directory below
    Or> drivers/infiniband/hw

Well, I guess it's too late now, since both Dave M. and I have merged
upstream with Linus.

But still it would be worth reproducing with Linus's latest git tree.

 - R.


From trimmer at silverstorm.com  Mon Sep 25 12:11:24 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Mon, 25 Sep 2006 15:11:24 -0400
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <ada3bafepzt.fsf@cisco.com>
Message-ID: <D80D83302DEE6249A221093BF2BB69AE8EFC21@mail.silverstorm.com>

> From: Roland Dreier [mailto:rdreier at cisco.com]
> Sent: Monday, September 25, 2006 2:55 PM
> To: Rimmer, Todd
> Cc: Michael S. Tsirkin; openib-general at openib.org
> Subject: Re: [openib-general] [PATCH] IB/ipoib: NAPI
> 
>  > What were your thoughts on how to handle this part of Eli's proposed
>  > code:
>  > 	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
>  > 	/* TODO we need peek_cq here for hw devices that
>  > 	   would not generate interrupts for completions
>  > 	   arriving between end of polling till request notify */
>  >
>  > 	return 0;
>  >
>  > On a non-Mellanox HCA, if the CQ is not empty here, isn't this required
>  > to poll it til empty and process all the CQEs (otherwise we may not get
>  > another interrupt).  If instead we return 1 from the dev->poll routine
>  > here, we could be scheduled for a future poll and a future interrupt
>  > (which might be bad).
> 
> That's exactly where we need peek CQ.  We can't repoll the CQ, because
> netif_rx_complete() has already been called, so the poll routine might
> already be running on another CPU.  The only thing I can see to do is
> peek in the CQ, and if it's not empty, then go through the whole
> netif_rx_reschedule() song and dance.
> 
>  - R.

I agree.  This would also mean the ipoib_warn in ipoib_ib_completion
would go away (would be a valid situation).

I'm going to keep thinking about this, seems like we can't call
req_notify until after netif_rx_complete, otherwise we have a different
race.  That leads to the req_notify and peek approach.

It's a shame, because for all other ULPs, the poll_and_notify approach
works well.

I too would prefer not to see dual algorithms and a device flag as it
could quickly lead to a lot of redundant code.

Todd Rimmer


From rdreier at cisco.com  Mon Sep 25 12:16:51 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Sep 2006 12:16:51 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <D80D83302DEE6249A221093BF2BB69AE8EFC21@mail.silverstorm.com>
	(Todd Rimmer's message of "Mon, 25 Sep 2006 15:11:24 -0400")
References: <D80D83302DEE6249A221093BF2BB69AE8EFC21@mail.silverstorm.com>
Message-ID: <adalko7dae4.fsf@cisco.com>

    Todd> I agree.  This would also mean the ipoib_warn in
    Todd> ipoib_ib_completion would go away (would be a valid
    Todd> situation).

Which warning?  I don't see anything that would change, and I don't
see any warnings at all in ipoib_ib_completion().

    Todd> I'm going to keep thinking about this, seems like we can't
    Todd> call req_notify until after netif_rx_complete, otherwise we
    Todd> have a different race.  That leads to the req_notify and
    Todd> peek approach.

Yes, that's right.  Doing req_notify before netif_rx_complete risks
triggering the event before netif_rx_complete, which leads to the poll
routine never getting scheduled at all.

 - R.


From mst at mellanox.co.il  Mon Sep 25 12:17:33 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 25 Sep 2006 22:17:33 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <aday7s7db8c.fsf@cisco.com>
References: <aday7s7db8c.fsf@cisco.com>
Message-ID: <20060925191733.GD25001@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/ipoib: NAPI
> 
>  > I just find a flag + conditional peek a much simpler approach.
>  > Since all our testing is done on mthca anyway, almost
>  > all approaches amount to doing a NOP in various ways for us.
> 
> Umm, that's a pretty parochial attitude...

But what do you suggest? I don't have all IB hardware.
All I was saying was that on mthca

ib_req_notify(&lost)
if (lost) {
	reschedule
}

and

ib_req_notify
if ((dev->flags & NEED) && peek_cq()) {
	reschedule
}

never calls reschedule, so they are equivalent from that POV.

>  > So I would suggest
>  > - get Eli's patch with simple flag into shape & working on all hardware,
>  >   push into git.
>  > - people interested in specific hardware test performance and propose patches
>  >   to improve it even further.
> 
> What is a 'simple flag'?

Something like IB_DEVICE_IMMEDIATE_COMPLETION_EVENT, set in mthca.

ib_req_notify
if (!(dev->flags & IB_DEVICE_IMMEDIATE_COMPLETION_EVENT) && peek_cq()) {
	reschedule
}

> Who is going to implement peek_cq() for ehca
> and ipath?

I can do that - it's easy. But it's still true I only really test mthca.

> I'm not really that interested in the most micro-optimized approach.

I'm still not convinced an extra peek cq is *that* expensive.
Maybe we can just always peek without any clever tricks?

> I'd rather have something simple and easy to maintain -- in other
> words, I don't want 2 or 3 completion handling paths in IPoIB.  And
> from that perspective, extending req_notify_cq() looks pretty good to
> me.

Hmm, OK I guess.
So, can you post the patch updating req_notify_cq?
Eli can then respin the IPoIB patch to use that.

-- 
MST


From trimmer at silverstorm.com  Mon Sep 25 12:32:42 2006
From: trimmer at silverstorm.com (Rimmer, Todd)
Date: Mon, 25 Sep 2006 15:32:42 -0400
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <adalko7dae4.fsf@cisco.com>
Message-ID: <D80D83302DEE6249A221093BF2BB69AE8EFC33@mail.silverstorm.com>


> From: Roland Dreier [mailto:rdreier at cisco.com]
> Sent: Monday, September 25, 2006 3:17 PM
> To: Rimmer, Todd
> Cc: Michael S. Tsirkin; openib-general at openib.org
> Subject: Re: [openib-general] [PATCH] IB/ipoib: NAPI
> 
>     Todd> I agree.  This would also mean the ipoib_warn in
>     Todd> ipoib_ib_completion would go away (would be a valid
>     Todd> situation).
> 
> Which warning?  I don't see anything that would change, and I don't
> see any warnings at all in ipoib_ib_completion().
> 

In Eli's proposed code:
+void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) {
+	struct net_device *dev = (struct net_device *) dev_ptr;
+	struct ipoib_dev_priv *priv = netdev_priv(dev);
+
+	/* tell the network layer we have packets */
+	if (netif_rx_schedule_prep(dev))
+		__netif_rx_schedule(dev);
+	else {
+		ipoib_warn(priv, "received interrupt while in polling\n");
+	}
 }

The call to ipoib_warn should be removed if you go with a peek and
reschedule approach, because after notify has been called, we could get
a completion callback after the peek has decided to reschedule the poll.

Todd Rimmer


From ishai at dev.mellanox.co.il  Mon Sep 25 12:31:13 2006
From: ishai at dev.mellanox.co.il (ishai at dev.mellanox.co.il)
Date: Mon, 25 Sep 2006 22:31:13 +0300 (IDT)
Subject: [openib-general] High Availability status in OFED (was Re:
 [openfabrics-ewg] Mellanox/Voltaire/QLogic/IBM SQA results for OFED 1.1?)
In-Reply-To: <A15335FBE9BD2449AF2C9EF3D1EB8EA30250611E@xmb-sjc-216.amer.cisco.com>
References: <A15335FBE9BD2449AF2C9EF3D1EB8EA30250611E@xmb-sjc-216.amer.cisco.com>
Message-ID: <1439.89.1.166.254.1159212673.squirrel@dev.mellanox.co.il>

Hi Scott,

The IPoIB HA (High Availability) solution in OFED 1.1 is a short-term
solution. (There is ongoing work on a full solution that uses
bonding.)
This short-term solution for IPoIB HA uses the command "ip monitor link"
to find out when a link goes down, and then updates the IP address of the
other port.

Apparently RHEL4 uses an old version of iproute package (iproute-2.6.9-3
with ip utility, iproute2-ss040831 in RHEL4.0 U4) in which there is no
unique indication when a port goes down. (It gives the same indication
when a port goes up or down).

In SLES10 there is a newer version of iproute and our solution works well
with this version.

In order to solve the problem, the next RC will also include an
installation of a newer version of iproute (iproute2-2.6.16-060323 with ip
utility, iproute2-ss060323). This version will be installed only for OFED
installations that include the IPoIB HA option, and only on RHEL4. The
package will be installed in a private directory inside the OFED directory
(it will not replace the iproute version of the distribution) and will be
accessed by the IPoIB scripts using the exact path.


As for SRP HA:
SRP HA is currently available only for SLES10. The reason is that SRP HA
uses the device-mapper multipath, which needs a recent version of udev
(>050). RHEL4 uses udev 039.


Ishai

>> As for the HA it works on SuSE but not on RH. Ishai will
>> issue a report.
>
> This will be fixed for 1.1, right?
>
> Scott


From mst at mellanox.co.il  Mon Sep 25 12:35:58 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 25 Sep 2006 22:35:58 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <D80D83302DEE6249A221093BF2BB69AE8EFC33@mail.silverstorm.com>
References: <D80D83302DEE6249A221093BF2BB69AE8EFC33@mail.silverstorm.com>
Message-ID: <20060925193558.GF25001@mellanox.co.il>

Quoting r. Rimmer, Todd <trimmer at silverstorm.com>:
> In Eli's proposed code:
> +void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) {
> +	struct net_device *dev = (struct net_device *) dev_ptr;
> +	struct ipoib_dev_priv *priv = netdev_priv(dev);
> +
> +	/* tell the network layer we have packets */
> +	if (netif_rx_schedule_prep(dev))
> +		__netif_rx_schedule(dev);
> +	else {
> +		ipoib_warn(priv, "received interrupt while in polling\n");
> +	}
>  }
> 
> The call to ipoib_warn should be removed if you go with a peek and
> reschedule approach, because after notify has been called, we could get
> a completion callback after the peek has decided to reschedule the poll.

right.

-- 
MST


From rdreier at cisco.com  Mon Sep 25 12:36:08 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Mon, 25 Sep 2006 12:36:08 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <D80D83302DEE6249A221093BF2BB69AE8EFC33@mail.silverstorm.com>
	(Todd Rimmer's message of "Mon, 25 Sep 2006 15:32:42 -0400")
References: <D80D83302DEE6249A221093BF2BB69AE8EFC33@mail.silverstorm.com>
Message-ID: <adahcyvd9hz.fsf@cisco.com>

    Todd> The call to ipoib_warn should be removed if you go with a
    Todd> peek and reschedule approach, because after notify has been
    Todd> called, we could get a completion callback after the peek
    Todd> has decided to reschedule the poll.

Right.  In fact I would just make ipoib_ib_completion() do nothing but
call netif_rx_schedule() (which encapsulates all this logic anyway).
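
To illustrate why no warning is needed, here is a user-space sketch in
which scheduling is idempotent. mini_dev, mini_rx_schedule() and
mini_completion() are invented for this example and only model the
"schedule if not already scheduled" behaviour; they are not the kernel
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mini_dev {
	bool poll_scheduled;	/* poll routine already queued */
	int schedule_count;	/* how many times it was actually queued */
};

/* Models the schedule call: a no-op when the poll is already queued. */
static void mini_rx_schedule(struct mini_dev *dev)
{
	if (dev->poll_scheduled)
		return;
	dev->poll_scheduled = true;
	dev->schedule_count++;
}

/* Completion handler: nothing but the schedule call, no warning path. */
static void mini_completion(struct mini_dev *dev)
{
	assert(dev != NULL);
	mini_rx_schedule(dev);
}
```

A completion callback that arrives while the poll is already scheduled
is then simply absorbed.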

 - R.


From mst at mellanox.co.il  Mon Sep 25 12:46:14 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Mon, 25 Sep 2006 22:46:14 +0300
Subject: [openib-general] openib.org svn upgrade to 1.4.0
In-Reply-To: <1158642390.7167.298.camel@localhost>
References: <1158642390.7167.298.camel@localhost>
Message-ID: <20060925194614.GG25001@mellanox.co.il>

Would it be possible to update openib.org svn server to 1.4.0?
It is still running 1.0.3 which is quite dated.

There were a lot of speed improvements in svn since then.

This would also enable svnsync - the ability to replicate history from one
repository to another, which is extremely important for people like me that
often work over a slow connection or offline.

-- 
MST


From sean.hefty at intel.com  Mon Sep 25 17:03:36 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Mon, 25 Sep 2006 17:03:36 -0700
Subject: [openib-general] [RFC] determining which changes in svn to merge
 upstream or remove
Message-ID: <000001c6e0ff$37474de0$a440e984@amr.corp.intel.com>

Now that changes from the iWarp branch have been merged upstream, I wanted to
get feedback about migrating existing changes in svn upstream, or removing
features from svn.  Specifically, the following features are in svn only:

* RDMA CM:
        - userspace support
        - multicast support
        - UD QP support (required for multicast)
        - IB specific options (set paths, CM timeouts)
* Local SA cache
* IB multicast module

Additionally, there have been patches to provide userspace SA support, which
requires additional discussion.  (Roland sent a proposal to run a daemon on top
of the existing usermad interface.)  Feedback on the userspace API, and
user/kernel API are welcome.

The main intent behind most of these changes is providing userspace multicast
support through both the RDMA CM, as well as raw IB.

I would like to get a better understanding of which features in svn are in use,
so that I can create patches for their inclusion for 2.6.20, and identify if any
features can be removed from svn.  Userspace support for any feature will target
2.6.20 as well, but may go into the -mm branch if people feel that it still
needs to mature.  My hope is that the differences between svn and the kernel can
be limited primarily to new modules (e.g. the local SA).

- Sean


From mst at mellanox.co.il  Mon Sep 25 22:51:10 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Sep 2006 08:51:10 +0300
Subject: [openib-general] [RFC] determining which changes in svn to
 merge upstream or remove
In-Reply-To: <000001c6e0ff$37474de0$a440e984@amr.corp.intel.com>
References: <000001c6e0ff$37474de0$a440e984@amr.corp.intel.com>
Message-ID: <20060926055109.GB21085@mellanox.co.il>

Quoting r. Sean Hefty <sean.hefty at intel.com>:
> Subject: [RFC] determining which changes in svn to merge upstream or remove
> 
> Now that changes from the iWarp branch have been merged upstream, I wanted to
> get feedback about migrating existing changes in svn upstream, or removing
> features from svn.  Specifically, the following features are in svn only:
> 
> * RDMA CM:

BTW, there was a set of bugfix patches for CMA posted that didn't get acked or
nacked yet.  They looked sane and I took them into ofed - could you take the
time to review please? Should I repost?  It might make sense to put stability
fixes in before adding more features.

>         - userspace support

I think we agreed that this will use timewait support in
core/low level drivers to handle timewait/stale packets right.
Is that right? If so, I really need to find the time to do this.

>         - multicast support
>         - UD QP support (required for multicast)
>         - IB specific options (set paths, CM timeouts)

I think that at some point we agreed that at least the option to set
retry count can be made generic (with a limit of 15 retries).
This kind of makes sense since TCP sockets have SYN retry option.

Wrt CM timeouts, asking the ULP to guess the timeout
does not make much sense to me - how does the ULP know?
IMO we need to implement a smarter heuristic that will set them
automatically somehow. Is RDMA CM using all data from the path record
query already? How about implementing exponential backoff? Other ideas?
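
As a rough sketch of what exponential backoff could look like, assuming
a millisecond base timeout that doubles on each retry up to a cap. The
helper name and its parameters are hypothetical, and how the base value
would be derived from the path record is left open.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: timeout for the given retry attempt (0-based),
 * doubling from base_ms on every retry and clamped to max_ms.
 */
static uint32_t backoff_timeout_ms(uint32_t base_ms, uint32_t max_ms,
				   unsigned int attempt)
{
	uint64_t t = base_ms;	/* 64-bit so the last doubling cannot wrap */

	assert(base_ms > 0 && base_ms <= max_ms);
	while (attempt-- > 0 && t < max_ms)
		t *= 2;
	return t > max_ms ? max_ms : (uint32_t)t;
}
```

With base_ms = 100 and max_ms = 5000 the successive timeouts would be
100, 200, 400, 800, 1600, 3200 and then 5000 for every later retry.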

> * Local SA cache

This is supposed to reduce the load on SM, but personally, I am still not
convinced this is actually necessary - we are seeing gen2 based clusters running
just fine without these tricks.

What is more, this seems to break the model of IB network as a centrally managed
fabric, and a look at the code gives me the feeling no one thought through how
this will interact with SM features such as QoS, balancing, tavor MTU
optimizations etc.

> * IB multicast module

Last time I tested this, there were still crashes with IPoIB.
If there's a patch that adds just this change, I might be able to test it.
OTOH, I'm still not sure why we are touching IPoIB at all, since
it seems unlikely any other ULP will want to share the IPoIB mcast group.


-- 
MST


From mst at mellanox.co.il  Mon Sep 25 22:56:59 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Sep 2006 08:56:59 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <adahcz1hz4t.fsf@cisco.com>
References: <1158850657.24776.158.camel@localhost> <adahcz1hz4t.fsf@cisco.com>
Message-ID: <20060926055659.GC21085@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> I took a stab at implementing this myself

BTW, are you taking over this work? Just implementing peek/request for
notification enhancements?  A couple of your comments sounded like you are - a
little coordination won't hurt here.

-- 
MST


From sean.hefty at intel.com  Tue Sep 26 01:12:17 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 26 Sep 2006 01:12:17 -0700
Subject: [openib-general] [RFC] determining which changes in svn to
 merge upstream or remove
In-Reply-To: <20060926055109.GB21085@mellanox.co.il>
Message-ID: <000001c6e143$7c010d60$0f74e984@amr.corp.intel.com>

>BTW, there was a set of bugfix patches for CMA posted that didn't get acked or
>nacked yet.  They looked sane and I took them into ofed - could you take the
>time to review please? Should I repost?  It might make sense to put stability
>fixes in before adding more features.

I've actually been on vacation for 2 of the last 3 weeks, so haven't had a
chance to review recent patches.  I should get to them by the end of this week.

I want to ensure that whatever ends up being submitted upstream makes the most
sense, including pushing fixes before other changes.  This adds more work, which
is why I'd like to get svn more in sync with the kernel.

>>         - userspace support
>
>I think we agreed that this will use timewait support in
>core/low level drivers to handle timewait/stale packets right.
>Is that right? If so, I really need to find the time to do this.

I think so.  I don't see a clean alternative.  I would also propose that we
discourage connecting QPs outside of the IB CM to allow for detecting duplicate
connections.  We don't necessarily need to enforce this in code, but changing
test programs to at least comment that connecting over sockets is discouraged
may help.

>>         - multicast support
>>         - UD QP support (required for multicast)
>>         - IB specific options (set paths, CM timeouts)
>
>I think that at some point we agreed that at least the option to set
>retry count can be made generic (with a limit of 15 retries).
>This kind of makes sense since TCP sockets have SYN retry option.
>
>Wrt CM timeouts, asking the ULP to guess the timeout
>does not make much sense to me - how does the ULP know?
>IMO we need to implement a smarter heuristic that will set them
>automatically somehow. Is RDMA CM using all data from the path record
>query already? How about implementing exponential backoff? Other ideas?

My thoughts on the options are to try to hold off merging them upstream.  The
option to get/set path records needs to be reworked.  Getting paths should be done
outside of the RDMA CM, such as through a userspace SA, and the user to kernel
interface should pass the attribute values as defined by the spec (i.e. in
network order) to avoid marshalling issues.
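
A small userspace sketch of the "network order at the interface" idea
follows. The struct is a hypothetical, cut-down stand-in for a path record,
not the real user to kernel ABI; only the byte-order handling is the point.

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * Hypothetical, cut-down path attribute. The fields hold big-endian
 * (network order) values, exactly as they would appear on the wire, so the
 * same bytes can cross the user/kernel boundary with no per-field
 * conversion.
 */
struct model_path_attr {
	uint16_t dlid_be;
	uint16_t slid_be;
};

/* Convert only at the edges, where a host-order value is actually needed. */
static void model_path_set_dlid(struct model_path_attr *p, uint16_t dlid)
{
	p->dlid_be = htons(dlid);
}

static uint16_t model_path_get_dlid(const struct model_path_attr *p)
{
	return ntohs(p->dlid_be);
}
```

Because the stored bytes are the same on every host, neither side of the
interface needs to know the other's endianness.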

The CM timeout/retry options are used by uDAPL, but the fix to increase the
default retry count to the maximum may help.  The RDMA CM uses the data from the
path record, but the ULP has the most data about how long the remote side might
take to respond to a CM REQ message (remote_cm_response_timeout).  We might be
able to have the RDMA CM make clever use of the MRA to avoid the issue, and even
in the short term, dumb use of the MRA may help.  (The issue is that as
connections are formed, they begin being used, which can greatly affect how
quickly new connections can be created.  We've seen them take up to 60 seconds.)

>
>> * Local SA cache
>
>This is supposed to reduce the load on SM, but personally, I am still not
>convinced this is actually necessary - we are seeing gen2 based clusters
>running just fine without these tricks.
>
>What is more, this seems to break the model of IB network as a centrally
>managed fabric, and a look at the code gives me the feeling no one thought
>through how this will interact with SM features such as QoS, balancing,
>tavor MTU optimizations etc.

This is a module that I'm not inclined to submit upstream.  It was requested as
part of the Path Forward work, but I haven't seen any feedback on its use or
performance.

>> * IB multicast module
>
>Last time I tested this, there still were crashes with the IPoIB.
>If there's a patch that adds just this change, I might be able to test it.
>OTOH, I'm still not sure why are we touching IPoIB at all since
>it seems unlikely any other ULP will want to share in the IPoIB mcast group.

Personally, I think ipoib should use it to reduce code duplication.  Declining
the change because there _might_ be a bug in the new module doesn't seem like
the right approach to take.  (Why accept changes in the HCA driver then?)  Plus
we continue to find bugs in the ipoib multicast code.  The main reason I
modified ipoib was so that ib_multicast had a real user that I could test with,
but there's nothing architecturally that prevents a process from joining the
ipoib multicast group (maybe to snoop traffic for some reason...).  I haven't
seen any crashes with ipoib using ib_multicast, and since my last fix, I haven't
seen any bug reports.

The patches for ipoib need to be regenerated because of changes since the svn
checkin.

- Sean


From mst at mellanox.co.il  Tue Sep 26 02:34:35 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Sep 2006 12:34:35 +0300
Subject: [openib-general] [RFC] determining which changes in svn to
 merge upstream or remove
In-Reply-To: <000001c6e143$7c010d60$0f74e984@amr.corp.intel.com>
References: <000001c6e143$7c010d60$0f74e984@amr.corp.intel.com>
Message-ID: <20060926093435.GA21473@mellanox.co.il>

Quoting r. Sean Hefty <sean.hefty at intel.com>:
> The CM timeout/retry options are used by uDAPL, but the fix to increase the
> default retry count to the maximum may help.  The RDMA CM uses the data from the
> path record, but the ULP has the most data about how long the remote side might
> take to respond to a CM REQ message (remote_cm_response_timeout).  We might be
> able to have the RDMA CM make clever use of the MRA to avoid the issue, and even
> in the short term, dumb use of the MRA may help.  (The issue is that as
> connections are formed, they begin being used, which can greatly affect how
> quickly new connections can be created.  We've seen them take up to 60 seconds.)

Connections taking 60 sec to create is an issue.
Can you please explain how the fact that some connections are used affects
the time it takes to send the response?
Why would sending MRA be faster than sending the response?

> >> * IB multicast module
> >
> >Last time I tested this, there still were crashes with the IPoIB.
> >If there's a patch that adds just this change, I might be able to test it.
> >OTOH, I'm still not sure why are we touching IPoIB at all since
> >it seems unlikely any other ULP will want to share in the IPoIB mcast group.
> 
> Personally, I think ipoib should use it to reduce code duplication.

I did not notice any significant reduction in ipoib LOC, but maybe I'm mistaken.
Let's see the updated patch.

-- 
MST


From mst at mellanox.co.il  Tue Sep 26 02:51:22 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Sep 2006 12:51:22 +0300
Subject: [openib-general] [RFC] determining which changes in svn to
 merge upstream or remove
In-Reply-To: <000001c6e143$7c010d60$0f74e984@amr.corp.intel.com>
References: <000001c6e143$7c010d60$0f74e984@amr.corp.intel.com>
Message-ID: <20060926095122.GB21473@mellanox.co.il>

Quoting r. Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: [RFC] determining which changes in svn to merge upstream or remove
> 
> >BTW, there was a set of bugfix patches for CMA posted that didn't get acked or
> >nacked yet.  They looked sane and I took them into ofed - could you take the
> >time to review please? Should I repost?  It might make sense to put stability
> >fixes in before adding more features.
> 
> I've actually been on vacation for 2 of the last 3 weeks, so haven't had a
> chance to review recent patches.  I should get to them by the end of this week.

You can get the full list of stuff we apply in ofed here:

git://www.mellanox.co.il/~git/infiniband ofed_addons

look in directory kernel_patches/fixes

Here's a list:

cma_list_init.patch
cma_mem_leak.patch
cma_race_fix.patch
cma_tavor_quirk.patch

and your own:
sean_cma_establish.patch

The following needs some discussion:
sean_cm_drep_on_not_found.patch

-- 
MST


From ishai at mellanox.co.il  Tue Sep 26 06:28:50 2006
From: ishai at mellanox.co.il (Ishai Rabinovitz)
Date: Tue, 26 Sep 2006 16:28:50 +0300
Subject: [openib-general] [PATCH] IB/SRP identify QP in error state
Message-ID: <20060926132850.GA17342@mellanox.co.il>

There is a bug in mthca low level driver. 
A call to ib_post_send that tries to post to a QP that is in error state does
not return immediately with error. It terminates with errors after a timeout.

This causes SRP to wait a long time to reconnect. (Each abort call and
each reset_device call performs post_send and waits for the timeout).
The following patch solves this problem by identifying the failure 
and returning an immediate error code.

Signed-off-by: Ishai Rabinovitz <ishai at mellanox.co.il>
---
Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===================================================================
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c	2006-09-25 13:51:47.000000000 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c	2006-09-25 15:40:04.000000000 +0300
@@ -543,6 +543,7 @@ static int srp_reconnect_target(struct s
 	target->tx_head	 = 0;
 	target->tx_tail  = 0;
 
+	target->need_reset = 0;
 	ret = srp_connect_target(target);
 	if (ret)
 		goto err;
@@ -858,6 +859,7 @@ static void srp_completion(struct ib_cq 
 			printk(KERN_ERR PFX "failed %s status %d\n",
 			       wc.wr_id & SRP_OP_RECV ? "receive" : "send",
 			       wc.status);
+			target->need_reset = 1;
 			break;
 		}
 
@@ -1313,6 +1315,8 @@ static int srp_abort(struct scsi_cmnd *s
 
 	printk(KERN_ERR "SRP abort called\n");
 
+	if (target->need_reset)
+		return FAILED;
 	if (srp_find_req(target, scmnd, &req))
 		return FAILED;
 	if (srp_send_tsk_mgmt(target, req, SRP_TSK_ABORT_TASK))
@@ -1341,6 +1345,8 @@ static int srp_reset_device(struct scsi_
 
 	printk(KERN_ERR "SRP reset_device called\n");
 
+	if (target->need_reset)
+		return FAILED;
 	if (srp_find_req(target, scmnd, &req))
 		return FAILED;
 	if (srp_send_tsk_mgmt(target, req, SRP_TSK_LUN_RESET))
@@ -1750,6 +1756,7 @@ static ssize_t srp_create_target(struct 
 		goto err_free;
 	}
 
+	target->need_reset = 0;
 	ret = srp_connect_target(target);
 	if (ret) {
 		printk(KERN_ERR PFX "Connection failed\n");
Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.h
===================================================================
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.h	2006-09-25 13:51:47.000000000 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.h	2006-09-25 14:00:36.000000000 +0300
@@ -158,6 +158,7 @@ struct srp_target_port {
 	struct completion	done;
 	int			status;
 	enum srp_target_state	state;
+	int			need_reset;
 };
 
 struct srp_iu {
-- 
Ishai Rabinovitz


From jackm at dev.mellanox.co.il  Tue Sep 26 07:07:37 2006
From: jackm at dev.mellanox.co.il (Jack Morgenstein)
Date: Tue, 26 Sep 2006 17:07:37 +0300
Subject: [openib-general] [Lustre-discuss] Re: problems with lustre o2ib
 module & ofed
In-Reply-To: <Pine.GSO.4.58.0609251305100.25974@seth.cpc.wmin.ac.uk>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<Pine.GSO.4.58.0609251244180.25974@seth.cpc.wmin.ac.uk>
	<Pine.GSO.4.58.0609251305100.25974@seth.cpc.wmin.ac.uk>
Message-ID: <200609261707.37720.jackm@dev.mellanox.co.il>

On Monday 25 September 2006 17:01, Thierry Delaitre wrote:

I noticed in the Lustre configure file the following
  --with-linux=path       set path to Linux source (default=/usr/src/linux)

Where does /usr/src/linux link to?

You might consider explicitly specifying the following options as well in the
Lustre ./configure step:

  --with-linux=path       set path to Linux source (default=/usr/src/linux)
  --with-linux-obj=path   set path to Linux objects dir (default=$LINUX)
  --with-linux-config=path
                          set path to Linux .conf (default=$LINUX_OBJ/.config)

- Jack
> 
> On Mon, 25 Sep 2006, Thierry Delaitre wrote:
> 
> >
> > It seems that lustre puts its modules in /lib/modules/2.6.16.21-0.8-default
> > despite the fact that my kernel is 2.6.16.21-0.8-smp !
> >
> > uname -a
> > Linux n32 2.6.16.21-0.8-smp #4 SMP Sun Sep 24 08:47:30 BST 2006 i686 i686 i386 GNU/Linux
> >
> > make[3]: Nothing to be done for `install-exec-am'.
> > /bin/sh ../../mkinstalldirs /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre
> >  /usr/bin/install -c -m 644 lquota.ko /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre/lquota
> >
> > I therefore ends up with a /lib/modules/2.6.16.21-0.8-smp and
> > /lib/modules/2.6.16.21-0.8-default
> >
> > i'm now searching why lustre thinks my kernel is 2.6.16.21-0.8-default and
> > not 2.6.16.21-0.8-smp
> 
> I've updated the UTS_RELEASE string in
> /usr/src/linux-2.6.16.21-0.8/include/linux/version.h from default to smp
> and deleted my /lib/modules/
> lustre now installs in /lib/modules/2.6.16.21-0.8-smp/kernel along with
> ofed ib drivers. i recompiled the kernel, ofed and lustre and still gets
> this:
> 
> ko2iblnd: disagrees about version of symbol ib_create_cq
> ko2iblnd: Unknown symbol ib_create_cq
> ko2iblnd: disagrees about version of symbol ib_dereg_mr
> ko2iblnd: Unknown symbol ib_dereg_mr
> ko2iblnd: disagrees about version of symbol ib_destroy_cq
> ko2iblnd: Unknown symbol ib_destroy_cq
> ko2iblnd: disagrees about version of symbol ib_get_dma_mr
> ko2iblnd: Unknown symbol ib_get_dma_mr
> ko2iblnd: disagrees about version of symbol ib_alloc_pd
> ko2iblnd: Unknown symbol ib_alloc_pd
> ko2iblnd: disagrees about version of symbol ib_modify_qp
> ko2iblnd: Unknown symbol ib_modify_qp
> ko2iblnd: disagrees about version of symbol ib_dealloc_pd
> ko2iblnd: Unknown symbol ib_dealloc_pd
> LustreError: 7430:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND
> o2ib, module ko2iblnd, rc=256
> 
> nm /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_create_cq
> 3cfe7afa A __crc_ib_create_cq
> 00000060 r __kcrctab_ib_create_cq
> 0000015f r __kstrtab_ib_create_cq
> 000000c0 r __ksymtab_ib_create_cq
> 00000d50 T ib_create_cq
> 
> i'm a bit stuck!
> 
> Thierry.
> 
> > Thierry.
> >
> > On Mon, 25 Sep 2006, Thierry Delaitre wrote:
> >
> > >
> > > On Mon, 25 Sep 2006, Michael S. Tsirkin wrote:
> > >
> > > > Quoting r. Thierry Delaitre <delaitt at cpc.wmin.ac.uk>:
> > > > >
> > > > > I've set the o2ib path to /usr/local/ofed/src/openib-1.1 as shown in the
> > > > > lustre's configure line below. Lustre's configure script looks for a
> > > > > driver/infiniband directory which only seems to exist under
> > > > > /usr/local/ofed/src/openib-1.1
> > > > >
> > > > > ./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/
> > > > >
> > > > > Thierry.
> > > > >
> > > > > > replace /usr/local/ofed with the prefix you specified.
> > > >
> > > > This looks wrong - openib-1.1 is the pristine sources.
> > > > openib/include is the exported interface and is what you should use
> > > > for dependent modules.
> > > > No idea why would lustre need drivers/infiniband.
> > > > Try creating a softlink:
> > > >
> > > > mkdir /usr/local/ofed/src/openib/drivers/infiniband
> > > > ln -s /usr/local/ofed/src/openib/include /usr/local/ofed/src/openib/drivers/infiniband
> > >
> > > I untarred lustre 1.5.95, compiled it (./configure
> > > --with-o2ib=/usr/local/ofed/src/openib) . did a make install, depmod -a
> > > and still get the following:
> > >
> > > my modprobe.conf is the following
> > >
> > > options lnet ip2nets="o2ib0 161.74.83.[0-255]"
> > >
> > > lctl network up
> > > LNET configure error 100: Network is down
> > >
> > > ko2iblnd: disagrees about version of symbol ib_create_cq
> > > ko2iblnd: Unknown symbol ib_create_cq
> > > ko2iblnd: disagrees about version of symbol ib_dereg_mr
> > > ko2iblnd: Unknown symbol ib_dereg_mr
> > > ko2iblnd: disagrees about version of symbol ib_destroy_cq
> > > ko2iblnd: Unknown symbol ib_destroy_cq
> > > ko2iblnd: disagrees about version of symbol ib_get_dma_mr
> > > ko2iblnd: Unknown symbol ib_get_dma_mr
> > > ko2iblnd: disagrees about version of symbol ib_alloc_pd
> > > ko2iblnd: Unknown symbol ib_alloc_pd
> > > ko2iblnd: disagrees about version of symbol ib_modify_qp
> > > ko2iblnd: Unknown symbol ib_modify_qp
> > > ko2iblnd: disagrees about version of symbol ib_dealloc_pd
> > > ko2iblnd: Unknown symbol ib_dealloc_pd
> > > LustreError: 4177:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND
> > > o2ib, module ko2iblnd, rc=256
> > >
> > > lsmod | grep ib
> > > libcfs                103060  1 lnet
> > > ib_ucm                 19332  0
> > > ib_addr                10756  1 rdma_cm
> > > ib_cm                  31968  2 ib_ucm,rdma_cm
> > > ib_ipoib               48400  0
> > > ib_sa                  16652  3 rdma_cm,ib_cm,ib_ipoib
> > > ib_uverbs              38312  2 rdma_ucm,ib_ucm
> > > ib_umad                17968  0
> > > ib_mthca              116240  0
> > > ib_mad                 36116  4 ib_cm,ib_sa,ib_umad,ib_mthca
> > > ib_core                49024  9
> > > ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
> > >
> > > nm /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_alloc_pd
> > > d5dcb698 A __crc_ib_alloc_pd
> > > 0000001c r __kcrctab_ib_alloc_pd
> > > 0000006a r __kstrtab_ib_alloc_pd
> > > 00000038 r __ksymtab_ib_alloc_pd
> > > 00000c65 T ib_alloc_pd
> > >
> > > from lustre's config.log:
> > >
> > > configure:6500: checking whether to enable OpenIB gen2 support
> > > configure:6586: cp conftest.c build && make modules CC=gcc -f
> > > /root/lustre-1.5.95/build/Makefile LUSTRE_LINUX
> > > _CONFIG=/usr/src/linux/.config -o tmp_include_depends -o scripts -o include/config/MARKER -C /usr/src/linux EXTRA_CFLAGS=-Werror-implicit-function-declaration -g -I/root/lustre-1.5.95/lnet/include -I/root/lustre-1.5.95/lustre/include -I/usr/local/ofed/src/openib/include  M=/root/lustre-1.5.95/build
> > > /root/lustre-1.5.95/build/conftest.c:42: warning: function declaration
> > > isn't a prototype
> > > /root/lustre-1.5.95/build/conftest.c: In function 'main':
> > > /root/lustre-1.5.95/build/conftest.c:49: warning: unused variable 'rej_reason'
> > > /root/lustre-1.5.95/build/conftest.c:48: warning: unused variable 'pool_fmr'
> > > /root/lustre-1.5.95/build/conftest.c:47: warning: unused variable 'qp_attr'
> > > /root/lustre-1.5.95/build/conftest.c:46: warning: unused variable 'device_attr'
> > > /root/lustre-1.5.95/build/conftest.c:45: warning: unused variable 'conn_param'
> > > WARNING: "rdma_create_id" [/root/lustre-1.5.95/build/conftest.ko] undefined!
> > > configure:6589: $? = 0
> > > configure:6591: test -s build/conftest.o
> > > configure:6594: $? = 0
> > > configure:6597: result: yes
> > >
> > >
> > > Thierry.
> > >
> > > _______________________________________________
> > > openib-general mailing list
> > > openib-general at openib.org
> > > http://openib.org/mailman/listinfo/openib-general
> > >
> > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> > >
> > >
> >
> > ----------------------------------------
> > Dr Thierry DELAITRE
> > Systems and Services Manager, CSCS
> > University of Westminster
> > 115 New Cavendish Street, London W1W 6UW
> >
> > Tel: 020 7911 5000 ext: 3586
> > Fax: 020 7911 5089
> > Mobile short dial code 1788
> >
> > http://www.cscs.wmin.ac.uk/~delaitt
> > ----------------------------------------
> >
> > This e-mail and its attachments are intended for the above named only
> > and may be confidential.  If they have come to you in error you must
> > not copy or show them to anyone, nor should you take any action based
> > on them, other than to notify the error by replying to the sender.
> >
> > _______________________________________________
> > Lustre-discuss mailing list
> > Lustre-discuss at clusterfs.com
> > https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
> >
> >
> 


From mst at mellanox.co.il  Tue Sep 26 07:24:00 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Sep 2006 17:24:00 +0300
Subject: [openib-general] [PATCH] IB/SRP identify QP in error state
In-Reply-To: <20060926132850.GA17342@mellanox.co.il>
References: <20060926132850.GA17342@mellanox.co.il>
Message-ID: <20060926142400.GF21473@mellanox.co.il>

Quoting r. Ishai Rabinovitz <ishai at mellanox.co.il>:
> Subject: [PATCH] IB/SRP identify QP in error state
> 
> There is a bug in mthca low level driver. 
> A call to ib_post_send that tries to post to a QP that is in error state does
> not return immediately with error. It terminates with errors after a timeout.

Let me rephrase: after post send/receive to QP in error state in mthca,
a completion with error might never get generated.
SRP will then timeout.

To fix mthca, we'd need to change QP state on completion with error
and on modify to error, and add actual code where it now says

/* XXX check that state is OK to post send */
/* XXX check that state is OK to post receive */

I guess the reason we never fixed this was because it did not
seem to actually hurt any real ULPs, and testing QP state will
affect fast path performance.

However, IB spec is quite explicit on this point, and fixing the low
level driver seems a better approach than adding work-arounds in ULPs.
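
To make the proposed driver-side check concrete, here is a minimal userspace
model in C. It is not mthca code: the enum, the function, and the three-way
result are hypothetical, and it assumes the driver keeps a cached copy of the
QP state. It follows the verbs rules as commonly documented: posting a send
in Reset, Init or RTR fails immediately, while a work request posted in the
Error state is accepted and must complete with a flush error.

```c
/* Hypothetical mirror of the verbs QP states, cached by the driver. */
enum model_qp_state {
	MODEL_QPS_RESET,
	MODEL_QPS_INIT,
	MODEL_QPS_RTR,
	MODEL_QPS_RTS,
	MODEL_QPS_SQD,
	MODEL_QPS_SQE,
	MODEL_QPS_ERR
};

enum model_post_result {
	MODEL_POST_OK,      /* hand the work request to hardware */
	MODEL_POST_REJECT,  /* fail the post call immediately */
	MODEL_POST_FLUSH    /* accept, but a flush-error completion is owed */
};

/* What "check that state is OK to post send" could look like. */
static enum model_post_result
model_check_post_send(enum model_qp_state cached)
{
	switch (cached) {
	case MODEL_QPS_RESET:
	case MODEL_QPS_INIT:
	case MODEL_QPS_RTR:
		return MODEL_POST_REJECT;
	case MODEL_QPS_ERR:
		return MODEL_POST_FLUSH;
	default:
		return MODEL_POST_OK;
	}
}
```

The check is a load and a branch on every post, which is the fast path cost
in question; how the flush case is then satisfied is left open in this sketch.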

Roland, what do you think?

-- 
MST


From ishai at mellanox.co.il  Tue Sep 26 07:45:41 2006
From: ishai at mellanox.co.il (Ishai Rabinovitz)
Date: Tue, 26 Sep 2006 17:45:41 +0300
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
Message-ID: <20060926144541.GA17938@mellanox.co.il>

Hi Roland,

SRP High Availability needs an initiator to connect to the same target 
several times, e.g., once from each IB port of the target (this way we can use
device mapper multipath for failover). Note that both connections are actually
active, e.g. multipath is issuing commands to get the remote scsi id.

Since multiple channel operation is currently disabled in connection request,
each new connection request will cause the target to disconnect
the existing connection which forces us to bounce a lot between the two channels.

This patch enables multiple channel operation in connection requests, to avoid getting
disconnects when multiple connections are active. There does not seem to be any harm
in doing this even when multipath is not used.
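
The target-side behaviour described here can be modelled in a few lines of C.
This is a hypothetical sketch of how a target might treat the multi-channel
action in a login request, not code from any real target; the enum values are
placeholders and are not meant to match the on-the-wire encoding.

```c
/* Hypothetical stand-in for the multi-channel action in the login request. */
enum model_multichan {
	MODEL_MULTICHAN_SINGLE,  /* new login replaces existing channels */
	MODEL_MULTICHAN_MULTI    /* new login is added next to existing ones */
};

struct model_target {
	unsigned int active_channels;  /* channels from this initiator */
};

/* Process one login; returns how many existing channels were torn down. */
static unsigned int model_target_login(struct model_target *t,
				       enum model_multichan action)
{
	unsigned int dropped = 0;

	if (action == MODEL_MULTICHAN_SINGLE) {
		dropped = t->active_channels;
		t->active_channels = 0;
	}
	t->active_channels++;
	return dropped;
}
```

With single-channel logins, the second login from the same initiator drops
the first connection, which is the bouncing described above; with
multi-channel logins both connections stay up.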

Signed-off-by: Ishai Rabinovitz <ishai at mellanox.co.il>

---

Index: last_stable/drivers/infiniband/ulp/srp/ib_srp.c
===================================================================
--- last_stable.orig/drivers/infiniband/ulp/srp/ib_srp.c	2006-09-26 09:22:13.000000000 +0300
+++ last_stable/drivers/infiniband/ulp/srp/ib_srp.c	2006-09-26 14:54:35.000000000 +0300
@@ -329,6 +329,7 @@ static int srp_send_req(struct srp_targe
 	req->priv.req_it_iu_len = cpu_to_be32(srp_max_iu_len);
 	req->priv.req_buf_fmt 	= cpu_to_be16(SRP_BUF_FORMAT_DIRECT |
 					      SRP_BUF_FORMAT_INDIRECT);
+	req->priv.req_flags	= SRP_MULTICHAN_MULTI;
 	/*
 	 * In the published SRP specification (draft rev. 16a), the 
 	 * port identifier format is 8 bytes of ID extension followed
-- 
Ishai Rabinovitz


From thomas.bub at thomson.net  Tue Sep 26 08:00:07 2006
From: thomas.bub at thomson.net (Bub Thomas)
Date: Tue, 26 Sep 2006 17:00:07 +0200
Subject: [openib-general] How to register, query and delete a service_id?
Message-ID: <B79FAF8BB536314E859EA1963CFFD222029AC478@wdtssmail01.eu.thmulti.com>

Hi,
as I'm porting my gen1 application to gen2 my last task is to port the
service_id registration, query and deletion to gen2.
With the help of Mellanox I got it running under gen1 using read/write
of mad messages on the device "/dev/ts_ua0".
I browsed through the ofed sources and got lost in there.
Is there a good, simple example that can help me find my way?
I assume I have to use the ibmad and/or ibumad library?
Thanks
Thomas Bub

............................................................
Thomas Bub
Grass Valley Germany GmbH
Brunnenweg 9
64331 Weiterstadt, Germany
Tel: +49 6150 104 147
Fax: +49 6150 104 656
Email: Thomas.Bub at thomson.net
www.GrassValley.com  <http://www.grassvalley.com> 
............................................................



From sashak at voltaire.com  Tue Sep 26 08:10:00 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 26 Sep 2006 18:10:00 +0300
Subject: [openib-general] [PATCH] osm: cosmetic changes in osmtest
 multicast flow
In-Reply-To: <yzsbqp4850f.fsf@kliteynik.yok.mtl.com>
References: <yzsbqp4850f.fsf@kliteynik.yok.mtl.com>
Message-ID: <20060926151000.GA8949@sashak.voltaire.com>

On 16:12 Mon 25 Sep     , Yevgeny Kliteynik wrote:
> Hi Hal
> 
> This patch is all about cosmetics - it improves
> the osmtest log readability, and it also has some 
> cosmetic additions in the code.
> 
> Yevgeny
> 
> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Applied (to trunk). Thanks.

Sasha


From rdreier at cisco.com  Tue Sep 26 08:07:45 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Sep 2006 08:07:45 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060926055659.GC21085@mellanox.co.il> (Michael S.
	Tsirkin's message of "Tue, 26 Sep 2006 08:56:59 +0300")
References: <1158850657.24776.158.camel@localhost>
	<adahcz1hz4t.fsf@cisco.com> <20060926055659.GC21085@mellanox.co.il>
Message-ID: <ada1wpyd5tq.fsf@cisco.com>

    Michael> BTW, are you taking over this work? Just implementing
    Michael> peek/request for notification enhancements?  A couple of
    Michael> your comments sounded like you are - a little
    Michael> coordination won't hurt here.

I think I can handle everything from here -- I will post patches based
on my current approach soon.

 - R.


From rdreier at cisco.com  Tue Sep 26 08:12:08 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Sep 2006 08:12:08 -0700
Subject: [openib-general] [PATCH] IB/SRP identify QP in error state
In-Reply-To: <20060926142400.GF21473@mellanox.co.il> (Michael S.
	Tsirkin's message of "Tue, 26 Sep 2006 17:24:00 +0300")
References: <20060926132850.GA17342@mellanox.co.il>
	<20060926142400.GF21473@mellanox.co.il>
Message-ID: <adau02ubr1z.fsf@cisco.com>

    Michael> Let me rephrase: after post send/receive to QP in error
    Michael> state in mthca, a completion with error might never get
    Michael> generated.

Won't a flush error be generated for every request posted to a QP in
the error state?

    Michael> To fix mthca, we'd need to change QP state on completion
    Michael> with error and on modify to error, and add actual code
    Michael> where it now says

    Michael> /* XXX check that state is OK to post send */
    Michael> /* XXX check that state is OK to post receive */

    Michael> I guess the reason we never fixed this was because it did
    Michael> not seem to actually hurt any real ULPs, and testing QP
    Michael> state will affect fast path performance.

    Michael> However, IB spec is quite explicit on this point, and
    Michael> fixing a low level drivers seems a better approach than
    Michael> adding work-arounds in ULPs.

    Michael> Roland, what do you think?

Yes, that was something I just never got around to implementing.  Of
course since transition to error state may be done asynchronously by
hardware, we still have the case where the consumer tries to post a
work request to a QP in error but the low-level driver still thinks
the QP is in RTS.

 - R.


From aviram at mellanox.co.il  Tue Sep 26 08:39:37 2006
From: aviram at mellanox.co.il (Aviram Gutman)
Date: Tue, 26 Sep 2006 18:39:37 +0300
Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to
 unmatched DREQ
Message-ID: <2D5DEE3C6A0E0244B0133244731D4C4B04D147@mtlexch01.mtl.com>

 
-----Original Message-----
From: Arlin Davis [mailto:ardavis at ichips.intel.com] 
Sent: Monday, September 25, 2006 9:38 PM
To: Arlin Davis
Cc: Sean Hefty; openib-general at openib.org; Aviram Gutman
Subject: Re: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response
to unmatched DREQ

Arlin Davis wrote:

>Sean Hefty wrote:
>
>  
>
>>Currently a DREP is only sent in response to a DREQ if a connection
>>has been found matching the DREQ, and it is in the proper state.  Once
>>a DREP is sent, the local connection moves into timewait.  Duplicate
>>DREQs received while in this state result in re-sending the DREP.
>>
>>However, it's likely that the local connection will enter and exit
>>timewait before the remote side times out a lost DREP and resends a
>>DREQ.
>>There are a couple possible solutions to this.  One is to increase how
>>long a connection remains in timewait, by multiplying its wait time by
>>max_cm_retries.  This can greatly increase the timewait state before a
>>QP can be re-used when CM messages are not lost.
>>
>>An alternative is to send a DREP in response to a DREQ, even if a
>>local connection is not found, which is what this patch does.
>> 
>>
>>    
>>
>
>Would it be possible to get this fix in  rc7? I am consistently seeing 
>this problem with Intel MPI on a 64 node cluster.
>
>-arlin
>  
>
> Aviram? Is there an rc7 and could this get in?


Yes, Michael Tsirkin added it.


From sean.hefty at intel.com  Tue Sep 26 08:48:11 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Tue, 26 Sep 2006 08:48:11 -0700
Subject: [openib-general] [RFC] determining which changes in svn to
 merge upstream or remove
In-Reply-To: <20060926093435.GA21473@mellanox.co.il>
Message-ID: <000001c6e183$2db6adf0$5f78e984@amr.corp.intel.com>

>Connections taking 60 sec to create is an issue.
>Can you please explain how the fact that some connections are used affect
>the time it takes to send the response?

This is in userspace, and IMO, an application issue.  Threads using established
connections simply begin consuming all processor time.  This is while running
under heavy load and trying to scale up the application.

>Why would sending MRA be faster than sending the response?

An MRA could be sent directly by the RDMA CM in the kernel in a REQ callback,
whereas the response requires the userspace application to poll the REQ and
generate a REP.

- Sean


From aviram at dev.mellanox.co.il  Tue Sep 26 09:00:34 2006
From: aviram at dev.mellanox.co.il (Aviram Gutman)
Date: Tue, 26 Sep 2006 19:00:34 +0300
Subject: [openib-general] OFED Status
Message-ID: <45194EA2.6070202@dev.mellanox.co.il>

Hi,

OFED 1.1 RC6 was released on Thu.

The issues that were resolved since are:

1) OpenIB Diags build on SLES10 ppc  - Solved by Moshe Katzir from Voltaire
2)  iSER build on SLES10 needs root privilege - Voltaire fixed it
3) Bug #233 SDP crash on ipath - I believe MST fixed. Betsy please confirm.
4) Fix IBDM to allow multiple devices on the same machine - Eitan Zahavi fixed
5) SRP HA - Fixed by Ishai
6) IPoIB HA on RH - Vlad made progress, issue is still not solved.
7) The CM fix that Arlin asked for - In

Once IPoIB HA is solved, we would like to issue RC7, which is supposed to
be final. Is everyone OK with this approach?


Aviram


From bos at pathscale.com  Tue Sep 26 09:58:27 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Tue, 26 Sep 2006 09:58:27 -0700
Subject: [openib-general] [openfabrics-ewg] OFED Status
In-Reply-To: <45194EA2.6070202@dev.mellanox.co.il>
References: <45194EA2.6070202@dev.mellanox.co.il>
Message-ID: <1159289907.9652.18.camel@chalcedony.pathscale.com>

On Tue, 2006-09-26 at 19:00 +0300, Aviram Gutman wrote:

> 3) Bug #233 SDP crash on ipath - I believe MST fixed. Betsy please confirm.

Yes, this seems to be fixed.

	<b


From delaitt at cpc.wmin.ac.uk  Tue Sep 26 10:40:44 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Tue, 26 Sep 2006 18:40:44 +0100 (BST)
Subject: [openib-general] [Lustre-discuss] Re: problems with lustre o2ib
 module & ofed
In-Reply-To: <200609261707.37720.jackm@dev.mellanox.co.il>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<Pine.GSO.4.58.0609251244180.25974@seth.cpc.wmin.ac.uk>
	<Pine.GSO.4.58.0609251305100.25974@seth.cpc.wmin.ac.uk>
	<200609261707.37720.jackm@dev.mellanox.co.il>
Message-ID: <Pine.GSO.4.58.0609261704080.25974@seth.cpc.wmin.ac.uk>


On Tue, 26 Sep 2006, Jack Morgenstein wrote:

> On Monday 25 September 2006 17:01, Thierry Delaitre wrote:
>
> I noticed in the Lustre configure file the following
>   --with-linux=path       set path to Linux source (default=/usr/src/linux)
>
> Where does /usr/src/linux link to?
>
> You might consider explicitly specifying the following options as well in the
> Lustre ./configure step:
>
>   --with-linux=path       set path to Linux source (default=/usr/src/linux)
>   --with-linux-obj=path   set path to Linux objects dir (default=$LINUX)
>   --with-linux-config=path
>                           set path to Linux .conf (default=$LINUX_OBJ/.config)

I specified all of these options explicitly and the result is still the same.

./configure --with-o2ib=/usr/local/ofed/src/openib --with-linux=/usr/src/linux-2.6.16.21-0.8 --with-linux-obj=/usr/src/linux-2.6.16.21-0.8 --with-linux-config=/usr/src/linux-2.6.16.21-0.8/.config

Thierry.

> - Jack
> >
> > On Mon, 25 Sep 2006, Thierry Delaitre wrote:
> >
> > >
> > > It seems that lustre puts its modules in /lib/modules/2.6.16.21-0.8-default
> > > despite the fact that my kernel is 2.6.16.21-0.8-smp !
> > >
> > > uname -a
> > > Linux n32 2.6.16.21-0.8-smp #4 SMP Sun Sep 24 08:47:30 BST 2006 i686 i686 i386 GNU/Linux
> > >
> > > make[3]: Nothing to be done for `install-exec-am'.
> > > /bin/sh ../../mkinstalldirs /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre
> > >  /usr/bin/install -c -m 644 lquota.ko /lib/modules/2.6.16.21-0.8-default/kernel/fs/lustre/lquota
> > >
> > > I therefore end up with both /lib/modules/2.6.16.21-0.8-smp and
> > > /lib/modules/2.6.16.21-0.8-default
> > >
> > > i'm now searching why lustre thinks my kernel is 2.6.16.21-0.8-default and
> > > not 2.6.16.21-0.8-smp
> >
> > I've updated the UTS_RELEASE string in
> > /usr/src/linux-2.6.16.21-0.8/include/linux/version.h from default to smp
> > and deleted my /lib/modules/
> > lustre now installs in /lib/modules/2.6.16.21-0.8-smp/kernel along with
> > ofed ib drivers. i recompiled the kernel, ofed and lustre and still get
> > this:
> >
> > ko2iblnd: disagrees about version of symbol ib_create_cq
> > ko2iblnd: Unknown symbol ib_create_cq
> > ko2iblnd: disagrees about version of symbol ib_dereg_mr
> > ko2iblnd: Unknown symbol ib_dereg_mr
> > ko2iblnd: disagrees about version of symbol ib_destroy_cq
> > ko2iblnd: Unknown symbol ib_destroy_cq
> > ko2iblnd: disagrees about version of symbol ib_get_dma_mr
> > ko2iblnd: Unknown symbol ib_get_dma_mr
> > ko2iblnd: disagrees about version of symbol ib_alloc_pd
> > ko2iblnd: Unknown symbol ib_alloc_pd
> > ko2iblnd: disagrees about version of symbol ib_modify_qp
> > ko2iblnd: Unknown symbol ib_modify_qp
> > ko2iblnd: disagrees about version of symbol ib_dealloc_pd
> > ko2iblnd: Unknown symbol ib_dealloc_pd
> > LustreError: 7430:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND
> > o2ib, module ko2iblnd, rc=256
> >
> > nm /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_create_cq
> > 3cfe7afa A __crc_ib_create_cq
> > 00000060 r __kcrctab_ib_create_cq
> > 0000015f r __kstrtab_ib_create_cq
> > 000000c0 r __ksymtab_ib_create_cq
> > 00000d50 T ib_create_cq
> >
> > i'm a bit stuck!
> >
> > Thierry.
> >
> > > Thierry.
> > >
> > > On Mon, 25 Sep 2006, Thierry Delaitre wrote:
> > >
> > > >
> > > > On Mon, 25 Sep 2006, Michael S. Tsirkin wrote:
> > > >
> > > > > Quoting r. Thierry Delaitre <delaitt at cpc.wmin.ac.uk>:
> > > > > >
> > > > > > I've set the o2ib path to /usr/local/ofed/src/openib-1.1 as shown in the
> > > > > > lustre's configure line below. Lustre's configure script looks for a
> > > > > > driver/infiniband directory which only seems to exist under
> > > > > > /usr/local/ofed/src/openib-1.1
> > > > > >
> > > > > > ./configure --with-o2ib=/usr/local/ofed/src/openib-1.1/
> > > > > >
> > > > > > Thierry.
> > > > > >
> > > > > > > replace /usr/local/ofed with the prefix you specified.
> > > > >
> > > > > This looks wrong - openib-1.1 is the pristine sources.
> > > > > openib/include is the exported interface and is what you should use
> > > > > for dependent modules.
> > > > > No idea why would lustre need drivers/infiniband.
> > > > > Try creating a softlink:
> > > > >
> > > > > mkdir /usr/local/ofed/src/openib/drivers/infiniband
> > > > > ln -s /usr/local/ofed/src/openib/include /usr/local/ofed/src/openib/drivers/infiniband
> > > >
> > > > I untarred lustre 1.5.95, compiled it (./configure
> > > > --with-o2ib=/usr/local/ofed/src/openib) . did a make install, depmod -a
> > > > and still get the following:
> > > >
> > > > my modprobe.conf is the following
> > > >
> > > > options lnet ip2nets="o2ib0 161.74.83.[0-255]"
> > > >
> > > > lctl network up
> > > > LNET configure error 100: Network is down
> > > >
> > > > ko2iblnd: disagrees about version of symbol ib_create_cq
> > > > ko2iblnd: Unknown symbol ib_create_cq
> > > > ko2iblnd: disagrees about version of symbol ib_dereg_mr
> > > > ko2iblnd: Unknown symbol ib_dereg_mr
> > > > ko2iblnd: disagrees about version of symbol ib_destroy_cq
> > > > ko2iblnd: Unknown symbol ib_destroy_cq
> > > > ko2iblnd: disagrees about version of symbol ib_get_dma_mr
> > > > ko2iblnd: Unknown symbol ib_get_dma_mr
> > > > ko2iblnd: disagrees about version of symbol ib_alloc_pd
> > > > ko2iblnd: Unknown symbol ib_alloc_pd
> > > > ko2iblnd: disagrees about version of symbol ib_modify_qp
> > > > ko2iblnd: Unknown symbol ib_modify_qp
> > > > ko2iblnd: disagrees about version of symbol ib_dealloc_pd
> > > > ko2iblnd: Unknown symbol ib_dealloc_pd
> > > > LustreError: 4177:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND
> > > > o2ib, module ko2iblnd, rc=256
> > > >
> > > > lsmod | grep ib
> > > > libcfs                103060  1 lnet
> > > > ib_ucm                 19332  0
> > > > ib_addr                10756  1 rdma_cm
> > > > ib_cm                  31968  2 ib_ucm,rdma_cm
> > > > ib_ipoib               48400  0
> > > > ib_sa                  16652  3 rdma_cm,ib_cm,ib_ipoib
> > > > ib_uverbs              38312  2 rdma_ucm,ib_ucm
> > > > ib_umad                17968  0
> > > > ib_mthca              116240  0
> > > > ib_mad                 36116  4 ib_cm,ib_sa,ib_umad,ib_mthca
> > > > ib_core                49024  9
> > > > ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
> > > >
> > > > nm /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/core/ib_core.ko | grep ib_alloc_pd
> > > > d5dcb698 A __crc_ib_alloc_pd
> > > > 0000001c r __kcrctab_ib_alloc_pd
> > > > 0000006a r __kstrtab_ib_alloc_pd
> > > > 00000038 r __ksymtab_ib_alloc_pd
> > > > 00000c65 T ib_alloc_pd
> > > >
> > > > from lustre's config.log:
> > > >
> > > > configure:6500: checking whether to enable OpenIB gen2 support
> > > > configure:6586: cp conftest.c build && make modules CC=gcc -f
> > > > /root/lustre-1.5.95/build/Makefile LUSTRE_LINUX
> > > > _CONFIG=/usr/src/linux/.config -o tmp_include_depends -o scripts -o include/config/MARKER -C /usr/src/linux EXTRA_CFLAGS=-Werror-implicit-function-declaration -g -I/root/lustre-1.5.95/lnet/include -I/root/lustre-1.5.95/lustre/include -I/usr/local/ofed/src/openib/include  M=/root/lustre-1.5.95/build
> > > > /root/lustre-1.5.95/build/conftest.c:42: warning: function declaration
> > > > isn't a prototype
> > > > /root/lustre-1.5.95/build/conftest.c: In function 'main':
> > > > /root/lustre-1.5.95/build/conftest.c:49: warning: unused variable 'rej_reason'
> > > > /root/lustre-1.5.95/build/conftest.c:48: warning: unused variable 'pool_fmr'
> > > > /root/lustre-1.5.95/build/conftest.c:47: warning: unused variable 'qp_attr'
> > > > /root/lustre-1.5.95/build/conftest.c:46: warning: unused variable 'device_attr'
> > > > /root/lustre-1.5.95/build/conftest.c:45: warning: unused variable 'conn_param'
> > > > WARNING: "rdma_create_id" [/root/lustre-1.5.95/build/conftest.ko] undefined!
> > > > configure:6589: $? = 0
> > > > configure:6591: test -s build/conftest.o
> > > > configure:6594: $? = 0
> > > > configure:6597: result: yes
> > > >
> > > >
> > > > Thierry.
> > > >
> > >
> > > ----------------------------------------
> > > Dr Thierry DELAITRE
> > > Systems and Services Manager, CSCS
> > > University of Westminster
> > > 115 New Cavendish Street, London W1W 6UW
> > >
> > > Tel: 020 7911 5000 ext: 3586
> > > Fax: 020 7911 5089
> > > Mobile short dial code 1788
> > >
> > > http://www.cscs.wmin.ac.uk/~delaitt
> > > ----------------------------------------
> > >
> > > This e-mail and its attachments are intended for the above named only
> > > and may be confidential.  If they have come to you in error you must
> > > not copy or show them to anyone, nor should you take any action based
> > > on them, other than to notify the error by replying to the sender.
> > >
> > > _______________________________________________
> > > Lustre-discuss mailing list
> > > Lustre-discuss at clusterfs.com
> > > https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
> > >
> > >
> >
>
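
The "disagrees about version of symbol" messages quoted above are what the
module loader prints when the CRC a module recorded for an imported symbol
differs from the CRC of the exported symbol. A minimal sketch of that
comparison, working on text only: the exporter lines are the nm output from
this thread, while the importer lines and their "<crc> <symbol>" layout are
an assumption about what `modprobe --dump-modversions ko2iblnd.ko` would
print, and the 0x11111111 value is made up. The function names are
hypothetical.

```python
def exported_crcs(nm_output):
    """Parse `nm module.ko` text and return {symbol: crc} for __crc_ entries."""
    crcs = {}
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[2].startswith("__crc_"):
            crcs[fields[2][len("__crc_"):]] = int(fields[0], 16)
    return crcs


def expected_crcs(modversions_output):
    """Parse '<crc> <symbol>' lines (assumed --dump-modversions layout)."""
    crcs = {}
    for line in modversions_output.splitlines():
        fields = line.split()
        if len(fields) == 2:
            crcs[fields[1]] = int(fields[0], 16)
    return crcs


def mismatches(exporter_nm, importer_modversions):
    """Return symbols whose expected CRC differs from the exported one."""
    exported = exported_crcs(exporter_nm)
    expected = expected_crcs(importer_modversions)
    return sorted(
        sym for sym, crc in expected.items()
        if sym in exported and exported[sym] != crc
    )


# Exporter lines are from the thread; the importer values are invented.
ib_core_nm = """\
3cfe7afa A __crc_ib_create_cq
00000d50 T ib_create_cq
d5dcb698 A __crc_ib_alloc_pd
"""
ko2iblnd_versions = """\
0x11111111\tib_create_cq
0xd5dcb698\tib_alloc_pd
"""
print(mismatches(ib_core_nm, ko2iblnd_versions))
```

A non-empty result would suggest that ko2iblnd was compiled against headers
or a Module.symvers that do not match the ib_core.ko actually installed.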

----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW

Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788

http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------

This e-mail and its attachments are intended for the above named only
and may be confidential.  If they have come to you in error you must
not copy or show them to anyone, nor should you take any action based
on them, other than to notify the error by replying to the sender.


From xma at us.ibm.com  Tue Sep 26 11:14:53 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 26 Sep 2006 11:14:53 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <ada1wpyd5tq.fsf@cisco.com>
Message-ID: <OF1CECF0B2.B5DFBD41-ON872571F5.0063FE0C-882571F5.00643C9E@us.ibm.com>


Roland,

We had a simple version of a NAPI patch. We saw a performance improvement
on mthca but not on ehca. We will test this NAPI patch on ehca when it's
available to see how it performs.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From xma at us.ibm.com  Tue Sep 26 11:21:14 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 26 Sep 2006 11:21:14 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <1158850657.24776.158.camel@localhost>
Message-ID: <OF2A2647E8.67324A49-ON872571F5.0064A835-882571F5.0064D189@us.ibm.com>


> This patch implements NAPI for ipoib. It is a draft implementation.
> I would like your opinion on whether we need a module parameter
> to control if NAPI should be activated or not.

It can be a configuration option to enable/disable NAPI, just like for other
network devices.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From xma at us.ibm.com  Tue Sep 26 11:28:49 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 26 Sep 2006 11:28:49 -0700
Subject: [openib-general] enable GSO over IPoIB
In-Reply-To: <OF1CECF0B2.B5DFBD41-ON872571F5.0063FE0C-882571F5.00643C9E@us.ibm.com>
Message-ID: <OF4E6C1736.BEF901BD-ON872571F5.006556F6-882571F5.00658369@us.ibm.com>


Since linux 2.6.18 supports GSO, I have patched IPoIB to enable GSO, but
haven't tested the performance yet. Has anyone tried already?

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From mlleinin at hpcn.ca.sandia.gov  Tue Sep 26 11:44:13 2006
From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger)
Date: Tue, 26 Sep 2006 11:44:13 -0700
Subject: [openib-general] OpenFabrics IBTA DevCon 2006 presentations
Message-ID: <1159296253.15009.57.camel@localhost>

Most of the presentations from the OpenFabrics IBTA DevCon 2006 in San
Francisco yesterday have been posted online at

http://openfabrics.org/conference/sep2006devcon/

and 

http://www.infinibandta.org/events/DevCon2006_presentations


Thanks to everyone who helped set up this event and to those that
participated.

  - Matt


From mst at mellanox.co.il  Tue Sep 26 12:20:07 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Sep 2006 22:20:07 +0300
Subject: [openib-general] [RFC] determining which changes in svn to
 merge upstream or remove
In-Reply-To: <000001c6e183$2db6adf0$5f78e984@amr.corp.intel.com>
References: <000001c6e183$2db6adf0$5f78e984@amr.corp.intel.com>
Message-ID: <20060926192007.GA24009@mellanox.co.il>

Quoting r. Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: [RFC] determining which changes in svn to merge upstream or remove
> 
> >Connections taking 60 sec to create is an issue.
> >Can you please explain how the fact that some connections are used affect
> >the time it takes to send the response?
> 
> This is in userspace, and IMO, an application issue.  Threads using established
> connections simply begin consuming all processor time.  This is while running
> under heavy load and trying to scale up the application.
>
> >Why would sending MRA be faster than sending the response?
> 
> An MRA could be sent directly by the RDMA CM in the kernel in a REQ callback,
> whereas the response requires the userspace application to poll the REQ and
> generate a REP.

I see. So it actually does look like for userspace clients, CMA should send MRA
immediately and then let userspace send REP in its own good time.
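
A rough timing model of that idea, as a sketch only. It ignores REQ retries,
and the parameter names and the 20 s / 120 s values below are hypothetical;
only the 60 s figure comes from this thread.

```python
def rep_arrives_in_time(app_delay, response_timeout,
                        mra_service_timeout=None, mra_delay=0.0):
    """Return True if the REP reaches the active side before it gives up.

    app_delay: seconds the userspace app needs to poll the REQ and send a REP.
    response_timeout: seconds the active side waits for a reply to its REQ.
    mra_service_timeout: extra seconds requested by an MRA, or None for no MRA.
    mra_delay: seconds until the kernel sends the MRA.
    """
    deadline = response_timeout
    if mra_service_timeout is not None and mra_delay <= response_timeout:
        # The MRA arrived in time, so the active side extends its wait.
        deadline = max(deadline, mra_delay + mra_service_timeout)
    return app_delay <= deadline


print(rep_arrives_in_time(60, 20))                           # no MRA
print(rep_arrives_in_time(60, 20, mra_service_timeout=120))  # MRA from kernel
```

The point of sending the MRA from the kernel callback is that mra_delay stays
small no matter how busy the userspace threads are.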

-- 
MST


From mst at mellanox.co.il  Tue Sep 26 12:28:22 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Sep 2006 22:28:22 +0300
Subject: [openib-general] [Lustre-discuss] Re: problems with lustre o2ib
 module & ofed
In-Reply-To: <Pine.GSO.4.58.0609261704080.25974@seth.cpc.wmin.ac.uk>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<Pine.GSO.4.58.0609251244180.25974@seth.cpc.wmin.ac.uk>
	<Pine.GSO.4.58.0609251305100.25974@seth.cpc.wmin.ac.uk>
	<200609261707.37720.jackm@dev.mellanox.co.il>
	<Pine.GSO.4.58.0609261704080.25974@seth.cpc.wmin.ac.uk>
Message-ID: <20060926192822.GD24009@mellanox.co.il>

Quoting r. Thierry Delaitre <delaitt at cpc.wmin.ac.uk>:
> Subject: Re: [Lustre-discuss] Re: problems with lustre o2ib module & ofed
> 
> 
> On Tue, 26 Sep 2006, Jack Morgenstein wrote:
> 
> > On Monday 25 September 2006 17:01, Thierry Delaitre wrote:
> >
> > I noticed in the Lustre configure file the following
> >   --with-linux=path       set path to Linux source (default=/usr/src/linux)
> >
> > Where does /usr/src/linux link to?
> >
> > You might consider explicitly specifying the following options as well in the
> > Lustre ./configure step:
> >
> >   --with-linux=path       set path to Linux source (default=/usr/src/linux)
> >   --with-linux-obj=path   set path to Linux objects dir (default=$LINUX)
> >   --with-linux-config=path
> >                           set path to Linux .conf (default=$LINUX_OBJ/.config)
> 
> I specified the whole string and still the same.
> 
> ./configure --with-o2ib=/usr/local/ofed/src/openib --with-linux=/usr/src/linux-2.6.16.21-0.8 --with-linux-obj=/usr/src/linux-2.6.16.21-0.8 --with-linux-config=/usr/src/linux-2.6.16.21-0.8/.config
> 
> Thierry.

1. Did you reboot after rebuilding everything?

2. Check the compiler command line used for building Lustre.
You must make sure the gen2 include directory comes before the Linux
kernel's in the -I flag list.
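
One way to check the second point is to pull the -I directories out of a
captured compiler command line and compare their order. This is a sketch
under assumptions: the helper names are hypothetical, the sample command is
assembled from paths mentioned in this thread rather than copied from a real
build log, and matching is by path prefix, so relative flags such as
-Iinclude would have to be made absolute first.

```python
import shlex


def include_dirs(command_line):
    """Return the -I directories of a compiler command line, in order."""
    dirs = []
    tokens = shlex.split(command_line)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-I" and i + 1 < len(tokens):
            dirs.append(tokens[i + 1])   # "-I dir" form
            i += 1
        elif tok.startswith("-I") and len(tok) > 2:
            dirs.append(tok[2:])         # "-Idir" form
        i += 1
    return dirs


def ofed_before_kernel(command_line, ofed_prefix, kernel_prefix):
    """True if an OFED include dir precedes every kernel include dir."""
    dirs = include_dirs(command_line)
    ofed = [i for i, d in enumerate(dirs) if d.startswith(ofed_prefix)]
    kernel = [i for i, d in enumerate(dirs) if d.startswith(kernel_prefix)]
    if not ofed:
        return False
    return not kernel or min(ofed) < min(kernel)


# Hypothetical command line built from paths that appear in the thread.
cmd = ("gcc -I/usr/src/linux-2.6.16.21-0.8/include "
       "-I/usr/local/ofed/src/openib/include -c conftest.c")
print(include_dirs(cmd))
print(ofed_before_kernel(cmd, "/usr/local/ofed", "/usr/src/linux"))
```

With the order shown in cmd the check reports False, which is the situation
where the in-kernel InfiniBand headers would shadow the OFED ones.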

-- 
MST


From narravul at cse.ohio-state.edu  Tue Sep 26 12:38:34 2006
From: narravul at cse.ohio-state.edu (Sundeep Narravula)
Date: Tue, 26 Sep 2006 15:38:34 -0400 (EDT)
Subject: [openib-general] Port reuse issue for rdma_cm/iwarp
Message-ID: <Pine.GSO.4.40.0609261530080.29485-100000@mu.cse.ohio-state.edu>


Hi,
 We are facing a problem while running back-to-back applications using the
same port number for rdma_cm over iwarp (Ammasso). The port seems to be
busy for about 60 seconds after each disconnect.

The first execution finishes without any problems or errors. When the
execution is repeated immediately, we see a RDMA_CM_EVENT_REJECTED event
on the active connect side. However, if we use a different port or if we
include a delay of more than 60 seconds between the runs, we do not see
this problem.

Is this a known issue? Is there any way to force an immediate reuse of the
port?

Thanks,
  --Sundeep.


From swise at opengridcomputing.com  Tue Sep 26 12:50:51 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 26 Sep 2006 14:50:51 -0500
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
Message-ID: <1159300251.11549.6.camel@stevo-desktop>

Roland, 

What's the status of the main trunk kernel code with 2.6.18?

I noticed that it doesn't build and needs something like this. I haven't
tested this yet...

Signed-off-by: Steve Wise <swise at opengridcomputing.com>


Index: uverbs_main.c
===================================================================
--- uverbs_main.c       (revision 9632)
+++ uverbs_main.c       (working copy)
@@ -820,11 +820,12 @@
        kref_put(&uverbs_dev->ref, ib_uverbs_release_dev);
 }

-static struct super_block *uverbs_event_get_sb(struct file_system_type *fs_type, int flags,
-                                              const char *dev_name, void *data)+static int uverbs_event_get_sb(struct file_system_type *fs_type, int flags,
+                              const char *dev_name, void *data,
+                              struct vfsmount *mnt)
 {
        return get_sb_pseudo(fs_type, "infinibandevent:", NULL,
-                            INFINIBANDEVENTFS_MAGIC);
+                            INFINIBANDEVENTFS_MAGIC, mnt);
 }

 static struct file_system_type uverbs_event_fs = {


From swise at opengridcomputing.com  Tue Sep 26 13:01:34 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Tue, 26 Sep 2006 15:01:34 -0500
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159300251.11549.6.camel@stevo-desktop>
References: <1159300251.11549.6.camel@stevo-desktop>
Message-ID: <1159300894.11549.11.camel@stevo-desktop>

On Tue, 2006-09-26 at 14:50 -0500, Steve Wise wrote:
> Roland, 
> 
> Whats the status with the main trunk kernel code and 2.6.18?  
> 
> I noticed that it doesn't build and needs something like this. I haven't
> tested this yet...
> 
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>

Oops, that patch was mangled.  Try this:

Index: uverbs_main.c
===================================================================
--- uverbs_main.c	(revision 9632)
+++ uverbs_main.c	(working copy)
@@ -820,11 +820,12 @@
 	kref_put(&uverbs_dev->ref, ib_uverbs_release_dev);
 }
 
-static struct super_block *uverbs_event_get_sb(struct file_system_type *fs_type, int flags,
-					       const char *dev_name, void *data)
+static int uverbs_event_get_sb(struct file_system_type *fs_type, int flags,
+			       const char *dev_name, void *data,
+			       struct vfsmount *mnt)
 {
 	return get_sb_pseudo(fs_type, "infinibandevent:", NULL,
-			     INFINIBANDEVENTFS_MAGIC);
+			     INFINIBANDEVENTFS_MAGIC, mnt);
 }
 
 static struct file_system_type uverbs_event_fs = {


From xma at us.ibm.com  Tue Sep 26 13:04:09 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 26 Sep 2006 13:04:09 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159300251.11549.6.camel@stevo-desktop>
Message-ID: <OF5CD701BD.6037C3D3-ON872571F5.006E0197-882571F5.006E3D9F@us.ibm.com>


> Whats the status with the main trunk kernel code and 2.6.18?
>
> I noticed that it doesn't build and needs something like this. I haven't
> tested this yet...

Yes. You need this patch, and you also need to change dev->xmit_lock to
dev->_xmit_lock in ipoib_multicast.c, to build the trunk on 2.6.18.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From xma at us.ibm.com  Tue Sep 26 13:12:04 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 26 Sep 2006 13:12:04 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <OF2A2647E8.67324A49-ON872571F5.0064A835-882571F5.0064D189@us.ibm.com>
Message-ID: <OFCFFF2378.2760CA3D-ON872571F5.006EAE0B-882571F5.006EF764@us.ibm.com>


We did some touch tests on the ehca driver and saw a performance drop. I
strongly recommend making NAPI a configurable option in ipoib, so customers
can turn it on or off based on their configurations.
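
As a toy model of what such a switch would toggle, the sketch below counts
receive interrupts with and without NAPI-style polling. It is not the ipoib
patch: the function name, the per-tick arrival list and the budget of 64 are
hypothetical, and the model says nothing about why one adapter gains and
another does not.

```python
def count_interrupts(bursts, use_napi, budget=64):
    """Count receive interrupts for a list of per-tick packet arrivals.

    bursts: number of packets arriving in each tick.
    use_napi: the on/off switch discussed in the thread, modelled as a flag.
    budget: max packets one poll may process per tick (hypothetical value).
    """
    if not use_napi:
        # Interrupt-driven receive: one interrupt per packet.
        return sum(bursts)

    interrupts = 0
    backlog = 0
    polling = False
    for arrived in bursts:
        backlog += arrived
        if backlog and not polling:
            # First packet raises an interrupt; further ones stay masked.
            interrupts += 1
            polling = True
        if polling:
            backlog -= min(backlog, budget)
            if backlog == 0:
                # Queue drained: stop polling and unmask interrupts.
                polling = False
    return interrupts


traffic = [100, 0, 1, 0]
print(count_interrupts(traffic, use_napi=False))
print(count_interrupts(traffic, use_napi=True))
```

Keeping the choice behind one flag mirrors the request above: the same
receive path can be run either way and measured per adapter.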

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From chris_youb at yahoo.ca  Tue Sep 26 13:21:14 2006
From: chris_youb at yahoo.ca (chris_youb at yahoo.ca)
Date: Wed, 27 Sep 2006 04:21:14 +0800
Subject: [openib-general] OpenSM -> 'open /dev/infiniband/umad1 failed'
Message-ID: <6169134.1159302074231.JavaMail.websites@opensubscriber>

I'm trying to set up OpenSM on one of our boxes.  I've installed the RPMs from ofed-1.0-sles10-rpms_i686.tar.gz and updated the firmware on our Mellanox card.
When I try to start opensm I get the following error message: 'umad_open_port: open /dev/infiniband/umad1 failed'.  Any suggestions of what I can try next?

******** Setup ********
H/W: Dell 1550
O/S: Suse 10.0 (linux 2.6.13-15.12-default)
HBC: Mellanox MT23108 rev 3.5.000
S/W: ofed-1.0-sles10-rpms_i686.tar.gz

******** OpenSM ********
linux:/usr/local/ofed/bin # ./opensm -V -d5
-------------------------------------------------
OpenSM Rev:openib-1.2.1
Based on OpenIB svn Exported revision
Command Line Arguments:
 Big V selected
 d level = 0x5
 Log File: /var/log/osm.log
-------------------------------------------------
OpenSM Rev:openib-1.2.1 OpenIB svn Exported revision

ibwarn: [6860] umad_init:
ibwarn: [6860] umad_get_cas_names: max 32
ibwarn: [6860] umad_get_cas_names: return 1 cas
ibwarn: [6860] umad_get_ca_portguids: ca name mthca0 max port guids 64
ibwarn: [6860] umad_get_ca: ca_name mthca0
ibwarn: [6860] umad_get_ca: opened mthca0
ibwarn: [6860] umad_get_ca_portguids: mthca0: 3 ports
ibwarn: [6860] umad_get_ca: ca_name mthca0
ibwarn: [6860] umad_get_ca: opened mthca0
ibwarn: [6860] umad_get_port: ca_name (null) portnum 0
ibwarn: [6860] umad_get_cas_names: max 20
ibwarn: [6860] umad_get_cas_names: return 1 cas
ibwarn: [6860] resolve_ca_name: checking ca 'mthca0'
ibwarn: [6860] resolve_ca_port: checking ca 'mthca0'
ibwarn: [6860] umad_get_ca: ca_name mthca0
ibwarn: [6860] umad_get_ca: opened mthca0
ibwarn: [6860] resolve_ca_port: checking port 0
ibwarn: [6860] resolve_ca_port: checking port 1
ibwarn: [6860] resolve_ca_port: checking port 2
ibwarn: [6860] resolve_ca_name: found ca mthca0 with port 2 type 0
ibwarn: [6860] resolve_ca_name: phys found 0 on mthca0 port 2
ibwarn: [6860] umad_release_port: port mthca0:2
ibwarn: [6860] umad_release_port: releasing mthca0:2
Using default GUID 0x2c90107fbfcf2
ibwarn: [6860] umad_get_ca_portguids: ca name mthca0 max port guids 32
ibwarn: [6860] umad_get_ca: ca_name mthca0
ibwarn: [6860] umad_get_ca: opened mthca0
ibwarn: [6860] umad_get_ca_portguids: mthca0: 3 ports
ibwarn: [6860] umad_get_ca: ca_name mthca0
ibwarn: [6860] umad_get_ca: opened mthca0
ibwarn: [6860] umad_get_port: ca_name mthca0 portnum 2
ibwarn: [6860] umad_open_port: ca mthca0 port 2
ibwarn: [6860] umad_open_port: opening mthca0 port 2
ibwarn: [6860] dev_to_umad_id: mapped mthca0 2 to 1
ibwarn: [6860] umad_open_port: open /dev/infiniband/umad1 failed

Error from osm_opensm_bind (0x2A)
Exiting SM

ibwarn: [6860] umad_done:

******** Drivers ********
ib_mthca               97692  0
ib_mad                 34324  2 ib_umad,ib_mthca
ib_core                39680  3 ib_umad,ib_mthca,ib_mad

******** Logs ********
linux:/usr/local/ofed/bin # tail -f /var/log/osm.log
Jan 28 14:35:41 017194 [4018DFE0] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
Jan 28 14:35:41 017349 [4018DFE0] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000
Jan 28 14:35:41 025501 [4018DFE0] -> osm_vendor_bind: Binding to port 0x2c90107fbfcf2
Jan 28 14:35:41 030909 [4018DFE0] -> osm_vendor_open_port: ERR 542C: umad_open_port() failed
Jan 28 14:35:41 030986 [4018DFE0] -> osm_vendor_bind: ERR 5424: Unable to Open Port 0x2c90107fbfcf2
Jan 28 14:35:41 031015 [4018DFE0] -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Jan 28 14:35:41 031228 [4018DFE0] -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Jan 28 14:35:41 031742 [4018DFE0] -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Jan 28 14:35:41 032313 [0000] -> Exiting SM


--
This message was sent on behalf of chris_youb at yahoo.ca at openSubscriber.com
http://www.opensubscriber.com/messages/openib-general at openib.org/topic.html
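
The log above shows libibumad mapping mthca0 port 2 to umad1 and then failing
to open the device node. A small sketch of the same lookup, to separate "no
sysfs entry" from "no device node": it assumes the ib_umad sysfs layout
(/sys/class/infiniband_mad/umadN/ibdev and .../port), and the function names
and messages are hypothetical.

```python
import os


def umad_id_for(ca_name, port, sys_root="/sys/class/infiniband_mad"):
    """Return N such that umadN serves (ca_name, port), or None."""
    try:
        entries = sorted(os.listdir(sys_root))
    except OSError:
        return None
    for entry in entries:
        if not entry.startswith("umad"):
            continue
        try:
            umad_id = int(entry[len("umad"):])
            with open(os.path.join(sys_root, entry, "ibdev")) as f:
                ibdev = f.read().strip()
            with open(os.path.join(sys_root, entry, "port")) as f:
                portnum = int(f.read().strip())
        except (OSError, ValueError):
            continue
        if ibdev == ca_name and portnum == port:
            return umad_id
    return None


def diagnose(ca_name, port, sys_root="/sys/class/infiniband_mad",
             dev_root="/dev/infiniband"):
    """Give a one-line hint about why opening the umad device might fail."""
    umad_id = umad_id_for(ca_name, port, sys_root)
    if umad_id is None:
        return "no umad entry in sysfs (is ib_umad loaded?)"
    node = os.path.join(dev_root, "umad%d" % umad_id)
    if not os.path.exists(node):
        return "%s is missing (udev rule or mknod needed?)" % node
    if not os.access(node, os.R_OK | os.W_OK):
        return "%s exists but is not readable and writable" % node
    return "%s looks usable" % node


if __name__ == "__main__":
    print(diagnose("mthca0", 2))
```

Since the sysfs mapping succeeded in the log, the second or third message
would be the ones to expect on this machine.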


From halr at voltaire.com  Tue Sep 26 13:32:42 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: Tue, 26 Sep 2006 23:32:42 +0300
Subject: [openib-general] OpenSM -> 'open /dev/infiniband/umad1 failed'
References: <6169134.1159302074231.JavaMail.websites@opensubscriber>
Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F589ADBC@taurus.voltaire.com>

Hi,
 
Do you have udev installed and configured? You may want to refer to the wiki (https://openib.org/tiki/tiki-index.php) for more troubleshooting info. There's some info in the cheat sheet (https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet) which may help.
 
-- Hal

________________________________

From: openib-general-bounces at openib.org on behalf of chris_youb at yahoo.ca
Sent: Tue 9/26/2006 4:21 PM
To: openib-general at openib.org
Subject: [openib-general] OpenSM -> 'open /dev/infiniband/umad1 failed'



--
This message was sent on behalf of chris_youb at yahoo.ca at openSubscriber.com
http://www.opensubscriber.com/messages/openib-general at openib.org/topic.html

_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


From caitlinb at broadcom.com  Tue Sep 26 13:36:43 2006
From: caitlinb at broadcom.com (Caitlin Bestler)
Date: Tue, 26 Sep 2006 13:36:43 -0700
Subject: [openib-general] Port reuse issue for rdma_cm/iwarp
In-Reply-To: <Pine.GSO.4.40.0609261530080.29485-100000@mu.cse.ohio-state.edu>
Message-ID: <54AD0F12E08D1541B826BE97C98F99F1A31348@NT-SJCA-0751.brcm.ad.broadcom.com>

openib-general-bounces at openib.org wrote:
> Hi,
>  We are facing a problem while running back-to-back
> applications using the same port number for rdma_cm over
> iwarp (Ammasso). The port seems to be busy for about 60
> seconds after each disconnect.
> 
> The first execution finishes without any problems or errors.
> When the execution is repeated immediately, we see a
> RDMA_CM_EVENT_REJECTED event on the active connect side.
> However, if we use a different port or if we include a delay
> of more than 60 seconds between the runs, we do not see this problem.
> 
> Is this a known issue? Is there any way to force an immediate
> reuse of the port?
> 

TCP restricts prompt re-use of the same Source/Destination
Address/Port pair while old traffic could still be in-flight.
This is generally not an issue because prompt re-use of the
exact four tuple is rare.

Is there a special reason why your application needs to
reuse the same port from the active side? If the port number
is being used to identify the rank, could private data be
used instead?
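
As a rough illustration of the private-data idea: the active side could
pack its rank into a small fixed-layout buffer and pass that buffer as
private data with the connect request, so the port number no longer has to
carry the rank. The struct layout, the magic value and both helper names
below are made up for this sketch; attaching the buffer to the connection
parameters is left out.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire layout for the private data; not taken from any spec. */
struct rank_pdata {
	uint32_t magic;	/* sanity check, network byte order */
	uint32_t rank;	/* job rank, network byte order */
};

#define RANK_PDATA_MAGIC 0x524e4b31u	/* arbitrary illustrative value */

/* Active side: fill buf; returns bytes written, or 0 if buf is too small. */
static size_t rank_pdata_pack(void *buf, size_t len, uint32_t rank)
{
	struct rank_pdata pd;

	if (len < sizeof(pd))
		return 0;
	pd.magic = htonl(RANK_PDATA_MAGIC);
	pd.rank = htonl(rank);
	memcpy(buf, &pd, sizeof(pd));
	return sizeof(pd);
}

/* Passive side: returns 0 and stores the rank, or -1 on a malformed buffer. */
static int rank_pdata_unpack(const void *buf, size_t len, uint32_t *rank)
{
	struct rank_pdata pd;

	if (len < sizeof(pd))
		return -1;
	memcpy(&pd, buf, sizeof(pd));
	if (ntohl(pd.magic) != RANK_PDATA_MAGIC)
		return -1;
	*rank = ntohl(pd.rank);
	return 0;
}
```

With something like this the active side can bind to any free source port,
which sidesteps the reuse restriction on the exact four-tuple.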


From sashak at voltaire.com  Tue Sep 26 13:44:08 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 26 Sep 2006 23:44:08 +0300
Subject: [openib-general] OpenSM -> 'open /dev/infiniband/umad1 failed'
In-Reply-To: <6169134.1159302074231.JavaMail.websites@opensubscriber>
References: <6169134.1159302074231.JavaMail.websites@opensubscriber>
Message-ID: <20060926204408.GA23096@sashak.voltaire.com>

Hi,

On 04:21 Wed 27 Sep     , chris_youb at yahoo.ca wrote:
> I'm trying to set up OpenSM on one of our boxes.  I've installed the RPMs from ofed-1.0-sles10-rpms_i686.tar.gz and updated the firmware on our Mellanox card.
> When I try to start opensm I get the following error message: 'umad_open_port: open /dev/infiniband/umad1 failed'.  Any suggestions of what I can try next?

Be sure that device node '/dev/infiniband/umad1' exists and you have
permission to access it for read/write.
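
A minimal sketch of that check in C, in case it helps to tell "node is
missing" apart from "no permission". The helper names are invented for
this example and are not part of libibumad. Note that access() tests the
real uid, not the effective one.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 0 if path is a character device that the caller may open
 * read/write, otherwise a negative errno value.
 */
static int check_umad_node(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -errno;		/* e.g. -ENOENT: node was never created */
	if (!S_ISCHR(st.st_mode))
		return -ENODEV;		/* exists, but is not a character device */
	if (access(path, R_OK | W_OK) < 0)
		return -errno;		/* e.g. -EACCES: wrong permissions */
	return 0;
}

/* Print a one-line diagnosis for path. */
static void report_umad_node(const char *path)
{
	int ret = check_umad_node(path);

	if (ret)
		printf("%s: %s\n", path, strerror(-ret));
	else
		printf("%s: ok\n", path);
}
```

Calling report_umad_node("/dev/infiniband/umad1") on the failing box
should show which of the two cases applies.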

Sasha



From sashak at voltaire.com  Tue Sep 26 13:51:30 2006
From: sashak at voltaire.com (Sasha Khapyorsky)
Date: Tue, 26 Sep 2006 23:51:30 +0300
Subject: [openib-general] [PATCH TRIVIAL] opensm: libibumad: show open()'s
	errno string.
Message-ID: <20060926205130.GB23096@sashak.voltaire.com>

Show errno string when open() fails.

Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>
---

 libibumad/src/umad.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/libibumad/src/umad.c b/libibumad/src/umad.c
index cb9eef6..7bf0048 100644
--- a/libibumad/src/umad.c
+++ b/libibumad/src/umad.c
@@ -575,7 +575,7 @@ umad_open_port(char *ca_name, int portnu
 		 UMAD_DEV_DIR , umad_id);
 
 	if ((port->dev_fd = open(port->dev_file, O_RDWR|O_NONBLOCK)) < 0) {
-		DEBUG("open %s failed", port->dev_file);
+		DEBUG("open %s failed: %s", port->dev_file, strerror(errno));
 		return -EIO;
 	}
 

From shemminger at osdl.org  Tue Sep 26 13:51:14 2006
From: shemminger at osdl.org (Stephen Hemminger)
Date: Tue, 26 Sep 2006 13:51:14 -0700
Subject: [openib-general] Compile warnings (cross build)
Message-ID: <20060926135114.1da96c1b@freekitty>

Hello,

At OSDL we have been running automated cross-compiles on the 
scsi-misc and scsi-rc-fixes trees and I thought it might be 
helpful to post the warnings and errors which appear compared 
to the tree it is based on.  SCSI is clean, so mostly these 
are warnings.  We do allmodconfig or defconfig on arm, I386, 
ia64, powerpc, ppc, sparc64, x86_64.  If there were no 
additional warnings, then that architecture is not in the output.

So, here are the _additional_ warnings from the linux-2.6.18-scsi-misc1 
compile outputs versus the linux-2.6.18 compile outputs.

Let me know if this is useful, or how it could be better.

WKR,
Judith Lebzelter
OSDL


*********ia64*********
> drivers/infiniband/hw/amso1100/c2_provider.c: In function `c2_reg_phys_mr':
> drivers/infiniband/hw/amso1100/c2_provider.c:397: warning: long long unsigned int format, long unsigned int arg (arg 6)
> drivers/infiniband/hw/amso1100/c2_provider.c:397: warning: long long unsigned int format, long unsigned int arg (arg 7)
> drivers/infiniband/hw/amso1100/c2_provider.c:397: warning: long long unsigned int format, long unsigned int arg (arg 8)
> drivers/infiniband/hw/amso1100/c2_rnic.c: In function `c2_rnic_init':
> drivers/infiniband/hw/amso1100/c2_rnic.c:529: warning: long long unsigned int format, dma_addr_t arg (arg 4)
> drivers/infiniband/hw/amso1100/c2_rnic.c:552: warning: long long unsigned int format, dma_addr_t arg (arg 4)
> drivers/infiniband/hw/amso1100/c2_alloc.c: In function `c2_alloc_mqsp':
> drivers/infiniband/hw/amso1100/c2_alloc.c:117: warning: long long unsigned int format, long unsigned int arg (arg 4)
> drivers/infiniband/hw/amso1100/c2_alloc.c:94: warning: 'mqsp' might be used uninitialized in this function
> drivers/infiniband/hw/amso1100/c2_ae.c: In function `c2_ae_event':
> drivers/infiniband/hw/amso1100/c2_ae.c:195: warning: long long unsigned int format, long unsigned int arg (arg 4)


*********powerpc*********
> drivers/infiniband/hw/amso1100/c2_provider.c: In function 'c2_reg_phys_mr':
> drivers/infiniband/hw/amso1100/c2_provider.c:397: warning: format '%llx' expects type 'long long unsigned int', but argument 6 has type 'u64'
> drivers/infiniband/hw/amso1100/c2_provider.c:397: warning: format '%llx' expects type 'long long unsigned int', but argument 7 has type 'u64'
> drivers/infiniband/hw/amso1100/c2_provider.c:397: warning: format '%llx' expects type 'long long unsigned int', but argument 8 has type 'u64'
> drivers/infiniband/hw/amso1100/c2_rnic.c: In function 'c2_rnic_init':
> drivers/infiniband/hw/amso1100/c2_rnic.c:529: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'dma_addr_t'
> drivers/infiniband/hw/amso1100/c2_rnic.c:552: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'dma_addr_t'
> drivers/infiniband/hw/amso1100/c2_alloc.c: In function 'c2_alloc_mqsp':
> drivers/infiniband/hw/amso1100/c2_alloc.c:117: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'dma_addr_t'
> drivers/infiniband/hw/amso1100/c2_ae.c: In function 'c2_ae_event':
> drivers/infiniband/hw/amso1100/c2_ae.c:195: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
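
For reference, the usual way to silence this class of warning is to cast
the argument to match the format, since a 64-bit typedef is 'unsigned long'
on some architectures and 'unsigned long long' on others. The snippet
below is a userspace sketch of that pattern with a made-up helper name; it
is not the actual amso1100 fix.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value as hex.  The explicit cast makes the argument
 * match %llx whether the 64-bit typedef is long or long long.
 */
static int format_addr(char *buf, size_t len, uint64_t addr)
{
	return snprintf(buf, len, "%llx", (unsigned long long)addr);
}
```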

-- 
Stephen Hemminger <shemminger at osdl.org>


From rdreier at cisco.com  Tue Sep 26 14:09:47 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Sep 2006 14:09:47 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <OF2A2647E8.67324A49-ON872571F5.0064A835-882571F5.0064D189@us.ibm.com>
	(Shirley Ma's message of "Tue, 26 Sep 2006 11:21:14 -0700")
References: <OF2A2647E8.67324A49-ON872571F5.0064A835-882571F5.0064D189@us.ibm.com>
Message-ID: <ada8xk6bahw.fsf@cisco.com>

    Shirley> It can be a configuration option to enable/disable NAPI,
    Shirley> just like other network device.

But is there any reason to keep the non-NAPI code around?  I hate to
have two codepaths to maintain.


From rdreier at cisco.com  Tue Sep 26 14:11:05 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Sep 2006 14:11:05 -0700
Subject: [openib-general] enable GSO over IPoIB
In-Reply-To: <OF4E6C1736.BEF901BD-ON872571F5.006556F6-882571F5.00658369@us.ibm.com>
	(Shirley Ma's message of "Tue, 26 Sep 2006 11:28:49 -0700")
References: <OF4E6C1736.BEF901BD-ON872571F5.006556F6-882571F5.00658369@us.ibm.com>
Message-ID: <ada4puubafq.fsf@cisco.com>

    Shirley> Since linux 2.6.18 supports GSO, I have patched IPoIB to
    Shirley> enable GSO, but haven't tested the performance yet. Has
    Shirley> anyone tried already?

No, I don't think anyone looked at that yet.  Could you post your
patch?  What is required?  Supporting gather/scatter?

 - R.


From rdreier at cisco.com  Tue Sep 26 14:13:32 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Sep 2006 14:13:32 -0700
Subject: [openib-general] Compile warnings (cross build)
In-Reply-To: <20060926135114.1da96c1b@freekitty> (Stephen Hemminger's
	message of "Tue, 26 Sep 2006 13:51:14 -0700")
References: <20060926135114.1da96c1b@freekitty>
Message-ID: <adazmcm9vr7.fsf@cisco.com>

 > At OSDL we have been running automated cross-compiles on the 
 > scsi-misc and scsi-rc-fixes trees and I thought it might be 
 > helpful to post the warnings and errors which appear compared 
 > to the tree it is based on.  SCSI is clean, so mostly these 
 > are warnings.  We do allmodconfig or defconfig on arm, I386, 
 > ia64, powerpc, ppc, sparc64, x86_64.  If there were no 
 > additional warnings, then that architecture is not in the output.

I assume you mean my infiniband.git tree?  (Probably cut-and-paste
from another email ;)

 > Let me know if this is useful, or how it could be better.

This is super-useful!  Please continue to post these reports.

I'll fix up these warnings and merge upstream.

Thanks,
  Roland


From rdreier at cisco.com  Tue Sep 26 14:15:52 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 26 Sep 2006 14:15:52 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <ada8xk6bahw.fsf@cisco.com> (Roland Dreier's message of
	"Tue, 26 Sep 2006 14:09:47 -0700")
References: <OF2A2647E8.67324A49-ON872571F5.0064A835-882571F5.0064D189@us.ibm.com>
	<ada8xk6bahw.fsf@cisco.com>
Message-ID: <adavena9vnb.fsf@cisco.com>

    Roland> But is there any reason to keep the non-NAPI code around?
    Roland> I hate to have two codepaths to maintain.

I see you said that ehca showed a performance drop with NAPI.  What
approach did you use to handle "rotting packets" (race between poll CQ
and request notification on a CQ)?

Do you know how ehca behaves?  Does it have that race?  ie what
happens in this situation:

    poll CQ -> CQ is empty
        (new completion is added to CQ)
    request notify on CQ
        (no more completions are added)

Mellanox HCAs will generate a CQ event in this case, although it's not
strictly required by the IB spec.  How will ehca behave?
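
To make the race concrete, here is a toy model of one common way a
consumer can protect itself: drain the CQ, request notification, then poll
once more before going to sleep. The toy_cq type and all function names
are invented for this sketch; it models the ordering only and is not the
ipoib or verbs code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy completion queue, standing in for a real CQ. */
struct toy_cq {
	int pending;	/* completions waiting to be polled */
	bool armed;	/* notification has been requested */
};

/* Poll one completion; returns 1 if one was consumed, 0 if the CQ is empty. */
static int toy_poll(struct toy_cq *cq)
{
	if (cq->pending == 0)
		return 0;
	cq->pending--;
	return 1;
}

static void toy_req_notify(struct toy_cq *cq)
{
	cq->armed = true;
}

/* Test hook: a completion that lands after the CQ was seen empty. */
static void late_completion(struct toy_cq *cq)
{
	cq->pending++;
}

/*
 * Drain the CQ, arm it, then poll once more.  The extra poll picks up a
 * completion that slipped in between the last empty poll and the arm, so
 * the consumer does not rely on the HCA raising an event for it.
 * 'racer' is called once in that window to simulate the race; may be NULL.
 * Returns the number of completions handled.
 */
static int drain_and_arm(struct toy_cq *cq, void (*racer)(struct toy_cq *))
{
	int done = 0;

	for (;;) {
		while (toy_poll(cq))
			done++;
		if (racer) {
			racer(cq);
			racer = NULL;
		}
		toy_req_notify(cq);
		if (!toy_poll(cq))
			break;	/* still empty after arming: safe to wait */
		done++;
	}
	return done;
}
```

With two completions queued and late_completion as the racer, all three
are handled before the function returns, with no event needed for the
third one.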

 - R.


From ttelford.groups at gmail.com  Tue Sep 26 14:19:02 2006
From: ttelford.groups at gmail.com (Troy Telford)
Date: Tue, 26 Sep 2006 15:19:02 -0600
Subject: [openib-general] DAPL setup/config help
Message-ID: <op.tgh750a7zidtg1@rygel.lnxi.com>

I've never set up dapl before, however I now have a reason to try...

The problem is, I can't seem to find any documentation on how to set it
up.  I've tried the sample /etc/dat.conf (modified for the IPoIB address
on the system), but I'm not sure I've been successful.

I've:
* compiled from OFED 1.0
* verified the library paths listed in /etc/dat.conf are correct
* I do know that things like IP over IB, MVAPICH, Open MPI, etc. work
fine; but they're not using DAPL
* tried the 'dapltest' and 'dtest' programs.

In both cases, I receive an error to the effect of:
DAT_PROVIDER_NOT_FOUND DAT_NAME_NOT_REGISTERED

Can anybody point me in the right direction (so I can RTFM and get on with
life?)


From wombat2 at us.ibm.com  Tue Sep 26 14:35:34 2006
From: wombat2 at us.ibm.com (Bernard King-Smith)
Date: Tue, 26 Sep 2006 17:35:34 -0400
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
Message-ID: <OF4B9A668B.79A63961-ON852571F5.00765CAE-852571F5.00769F14@us.ibm.com>

Eli and Roland,

Has anyone run the RR test in Netperf to look at latency? What 1-byte RR 
rates did you see before and after applying the patch?

Bernie King-Smith 
IBM Corporation
Server Group
Cluster System Performance 
wombat2 at us.ibm.com    (845)433-8483
Tie. 293-8483 or wombat2 on NOTES 

"We are not responsible for the world we are born into, only for the world 
we leave when we die.
So we have to accept what has gone before us and work to change the only 
thing we can,
-- The Future." William Shatner

From xma at us.ibm.com  Tue Sep 26 14:46:53 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 26 Sep 2006 14:46:53 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <adavena9vnb.fsf@cisco.com>
Message-ID: <OFD83F631F.331BD210-ON872571F5.00777082-882571F5.0077A568@us.ibm.com>


Roland,

> Do you know how ehca behaves?  Does it have that race?  ie what
> happens in this situation:
>
>     poll CQ -> CQ is empty
>         (new completion is added to CQ)
>     request notify on CQ
>         (no more completions are added)
>
> Mellanox HCAs will generate a CQ event in this case, although it's not
> strictly required by the IB spec.  How will ehca behave?
>
>  - R.

That could be the reason. I did see mthca poll an empty entry, but not
ehca. I will confirm this with the ehca team.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From narravul at cse.ohio-state.edu  Tue Sep 26 14:53:36 2006
From: narravul at cse.ohio-state.edu (Sundeep Narravula)
Date: Tue, 26 Sep 2006 17:53:36 -0400 (EDT)
Subject: [openib-general] Port reuse issue for rdma_cm/iwarp
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F1A31348@NT-SJCA-0751.brcm.ad.broadcom.com>
Message-ID: <Pine.GSO.4.40.0609261746390.29485-100000@mu.cse.ohio-state.edu>

> TCP restricts prompt re-use of the same Source/Destination
> Address/Port pair while old traffic could still be in-flight.
> This is generally not an issue because prompt re-use of the
> exact four tuple is rare.
>
> Is there a special reason why your application needs to
> reuse the same port from the active side? If the port number
> is being used to identify the rank, could private data be
> used instead?

Our application is primarily an invocation of multiple independent parallel
jobs which all need to connect to each other on each invocation. Since
this is a TCP limitation, is there any interface similar to setsockopt
with TCP_NODELAY? We probably need to use different ports otherwise.

Thanks,
  --Sundeep.

>
>
>


From mst at mellanox.co.il  Tue Sep 26 12:20:07 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 26 Sep 2006 22:20:07 +0300
Subject: [openib-general] [RFC] determining which changes in svn to
 merge upstream or remove
In-Reply-To: <000001c6e183$2db6adf0$5f78e984@amr.corp.intel.com>
References: <000001c6e183$2db6adf0$5f78e984@amr.corp.intel.com>
Message-ID: <20060926192007.GA24009@mellanox.co.il>

Quoting r. Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: [RFC] determining which changes in svn to merge upstream or remove
> 
> >Connections taking 60 sec to create is an issue.
> >Can you please explain how the fact that some connections are used affect
> >the time it takes to send the response?
> 
> This is in userspace, and IMO, an application issue.  Threads using established
> connections simply begin consuming all processor time.  This is while running
> under heavy load and trying to scale up the application.
>
> >Why would sending MRA be faster than sending the response?
> 
> An MRA could be sent directly by the RDMA CM in the kernel in a REQ callback,
> whereas the response requires the userspace application to poll the REQ and
> generate a REP.

I see. So it actually does look like for userspace clients, CMA should send MRA
immediately and then let userspace send REP in its own good time.

-- 
MST


From ardavis at ichips.intel.com  Tue Sep 26 15:15:58 2006
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Tue, 26 Sep 2006 15:15:58 -0700
Subject: [openib-general] DAPL setup/config help
In-Reply-To: <op.tgh750a7zidtg1@rygel.lnxi.com>
References: <op.tgh750a7zidtg1@rygel.lnxi.com>
Message-ID: <4519A69E.1070008@ichips.intel.com>

Troy Telford wrote:

>I've never set up dapl before, however I now have a reason to try...
>
>The problem is, I can't seem to find any documentation on how to set it
>up.  I've tried the sample /etc/dat.conf (modified for the IPoIB address
>on the system), but I'm not sure I've been successful.
>
>I've:
>* compiled from OFED 1.0
>* verified the library paths listed in /etc/dat.conf are correct
>* I do know that things like IP over IB, MVAPICH, Open MPI, etc. work
>fine; but they're not using DAPL
>* tried the 'dapltest' and 'dtest' programs.
>
>In both cases, I receive an error to the effect of:
>DAT_PROVIDER_NOT_FOUND DAT_NAME_NOT_REGISTERED
>  
>
The dapl provider name that your application uses for the open must 
match the ia_name entry in dat.conf.

sample dat.conf:

# Each entry should have the following fields:
#
# <ia_name> <api_version> <threadsafety> <default> <lib_path> \
#           <provider_version> <ia_params> <platform_params>
OpenIB-cma u1.2 nonthreadsafe default /usr/lib/libdaplcma.so mv_dapl.1.2 "ib0 0" ""


The dtest makefile with OFED 1.0 should use OpenIB-cma as the provider 
name instead of OpenIB-cma-ip. The default configuration was fixed in 
OFED 1.1.

For dapltest you must pass this dat.conf name as an argument to all 
scripts. For example "./srv.sh OpenIB-cma"
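
To illustrate why OpenIB-cma-ip fails against the entry above: the sketch
below assumes the name is compared as an exact string against the first
field of each dat.conf line. That is an assumption about the registry's
behaviour, and dat_line_matches is a made-up helper, not part of the DAT
library.

```c
#include <ctype.h>
#include <string.h>

/*
 * Returns 1 if the first field (ia_name) of a dat.conf line equals
 * 'provider', 0 otherwise.  Comment lines starting with '#' and blank
 * lines never match.
 */
static int dat_line_matches(const char *line, const char *provider)
{
	size_t n;

	while (isspace((unsigned char)*line))
		line++;
	if (*line == '\0' || *line == '#')
		return 0;
	n = strcspn(line, " \t\r\n");	/* length of the ia_name field */
	return n == strlen(provider) && strncmp(line, provider, n) == 0;
}
```

Against the sample entry, "OpenIB-cma" matches and "OpenIB-cma-ip" does
not, which would be consistent with the DAT_PROVIDER_NOT_FOUND error.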

-arlin

>Can anybody point me in the right direction (so I can RTFM and get on with
>life?)


From xma at us.ibm.com  Tue Sep 26 14:53:53 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 26 Sep 2006 14:53:53 -0700
Subject: [openib-general] enable GSO over IPoIB
In-Reply-To: <ada4puubafq.fsf@cisco.com>
Message-ID: <OFBB76619C.C348CC61-ON872571F5.0077B8EA-882571F5.0078499A@us.ibm.com>


>     Shirley> Since linux 2.6.18 supports GSO, I have patched IPoIB to
>     Shirley> enable GSO, but haven't tested the performance yet. Has
>     Shirley> anyone tried already?
>
> No, I don't think anyone looked at that yet.  Could you post your
> patch?  What is required?  Supporting gather/scatter?
>
>  - R.

Don't need to. GSO only improves sender-side performance. It allows ULPs
to send large packets and segments them in the interface layer just before
the driver xmit. GSO is normally enabled through ethtool. Since ipoib
doesn't support ethtool, I simply added a module parameter that sets the
interface GSO flag when the module is loaded. My next step is to enable
gather/scatter in ipoib send to chain multiple packets together for one
doorbell.

Thanks
Shirley Ma

From ttelford.groups at gmail.com  Tue Sep 26 16:05:05 2006
From: ttelford.groups at gmail.com (Troy Telford)
Date: Tue, 26 Sep 2006 17:05:05 -0600
Subject: [openib-general] DAPL setup/config help
In-Reply-To: <4519A69E.1070008@ichips.intel.com>
References: <op.tgh750a7zidtg1@rygel.lnxi.com>
	<4519A69E.1070008@ichips.intel.com>
Message-ID: <op.tgic2rxkzidtg1@rygel.lnxi.com>

So far, so good.  Thanks!

> The dapl provider name that your application uses for the open must  
> match the ia_name entry in dat.conf.
>
> sample dat.conf:
>
> # Each entry should have the following fields:
> #
> # <ia_name> <api_version> <threadsafety> <default> <lib_path> \
> #           <provider_version> <ia_params> <platform_params>
> OpenIB-cma u1.2 nonthreadsafe default /usr/lib/libdaplcma.so mv_dapl.1.2 "ib0 0" ""
>
>
> The dtest makefile with OFED 1.0 should use OpenIB-cma as the provider  
> name instead of OpenIB-cma-ip. The default configuration was fixed in  
> OFED 1.1.
>
> For dapltest you must pass this dat.conf name as an argument to all  
> scripts. For example "./srv.sh OpenIB-cma"
>
> -arlin
-- 
Troy


From geneing at gmail.com  Tue Sep 26 16:32:23 2006
From: geneing at gmail.com (EI)
Date: Tue, 26 Sep 2006 16:32:23 -0700
Subject: [openib-general] 90-ib.rules incorrect?
Message-ID: <eefa625e0609261632y153042ecq587b0fae8a05c45f@mail.gmail.com>

Isn't the format of 90-ib.rules in
https://openfabrics.org/svn/gen2/trunk/ofed/openib/scripts/90-ib.rules
incorrect?

We have

KERNEL="umad*", NAME="infiniband/%k", which should be
KERNEL=="umad*", NAME="infiniband/%k"

Am I missing something?

Eugene

From vuhuong at mellanox.com  Tue Sep 26 19:56:45 2006
From: vuhuong at mellanox.com (Vu Pham)
Date: Tue, 26 Sep 2006 19:56:45 -0700
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
In-Reply-To: <20060926144541.GA17938@mellanox.co.il>
References: <20060926144541.GA17938@mellanox.co.il>
Message-ID: <4519E86D.9030508@mellanox.com>

Ishai Rabinovitz wrote:
> Hi Roland,
> 
> SRP High Availability needs an initiator to connect to the same target 
> several times, e.g., once from each IB port of the target (this way we can use
> device mapper multipath for failover). Note that both connections are actually
> active, e.g. multipath is issuing commands to get the remote scsi id.
> 


It depends on how you define a path. A target-port-ID & 
initiator-port-ID tuple forms a path. The current srp 
implementation uses the port GID as initiator_port_ID, and 
target ioc_guid + id_ext as target_port_ID. With this 
implementation, a physical host port & physical target port 
form a path.

The multipath driver will see the same scsi_id of a lun through 
multiple paths.


> Since multiple channel operation is currently disabled in connection request,
> each new connection request will cause the target to disconnect
> the existing connection which forces us to bounce a lot between the two channels.

Either you can use multiple channels or derive different 
initiator_port_ID in the login req to have multiple paths on 
the same physical port

Most of srp targets that I tested don't support multiple 
channels.

-vu


From mst at mellanox.co.il  Tue Sep 26 20:48:51 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 27 Sep 2006 06:48:51 +0300
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
In-Reply-To: <4519E86D.9030508@mellanox.com>
References: <20060926144541.GA17938@mellanox.co.il>
	<4519E86D.9030508@mellanox.com>
Message-ID: <20060927034851.GH24009@mellanox.co.il>

Quoting r. Vu Pham <vuhuong at mellanox.com>:
> Most of srp targets that I tested don't support multiple 
> channels.

Which are these?
And what happens when you ask for multichannel support?

-- 
MST


From mst at mellanox.co.il  Tue Sep 26 21:07:45 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 27 Sep 2006 07:07:45 +0300
Subject: [openib-general] backporting fixes
Message-ID: <20060927040745.GI24009@mellanox.co.il>

Hi!
Now that  2.6.18 (with an additional patch) I looked at backporting bugfixes to
older kernels.  The main problem I see is that the neighbour destructor
interface change is not in 2.6.16, so IPoIB crashes randomly.

So approaches are
- Try to push the change into 2.6.16 by netdev
- Use the all-neighbour list as done by ofed
- Abandon the whole project

Ideas?

-- 
MST


From xma at us.ibm.com  Tue Sep 26 21:34:22 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 26 Sep 2006 21:34:22 -0700
Subject: [openib-general] heads-up - ipoib NAPI
In-Reply-To: <1158850592.24776.156.camel@localhost>
Message-ID: <OF6CC30638.D4F256AD-ON872571F6.001815B4-882571F5.0083AB74@us.ibm.com>


Hi, Eli,

> Hi,
> I have a draft implementation of NAPI in ipoib and got the following
> results:
> System descriptions
> ===================
> Quad CPU E64T 2.4 Ghz
> 4 GB RAM
> MT25204 Sinai HCA
> I used netperf for benchmarking, the BW test ran for 600 seconds with 8
> clients and 8 servers.
>
> The results I received are below:
>
> netperf TCP_STREAM:
>                 BW [MByte/sec]   client side [irqs/sec]   server side [irqs/sec]
>                 --------------   ----------------------   ----------------------
> without NAPI:        506                 86441                    66311
> with NAPI:           550                  6830                    13600
>
>
> netperf TCP_RR:
>                 rate [tran/sec]
>                 ---------------
> without NAPI:       39600
> with NAPI:          39470
>
>
>
> Please note this is still under work and we plan to do more tests and
> measure on other devices.

The NAPI patch moves the ipoib poll from hardware interrupt context to
softirq context. It would reduce the number of hardware interrupts, reduce
hardware latency and induce some network latency. It might reduce cpu
utilization. But I still question the BW improvement. I have seen varying
performance with the same test under the same conditions.

Have you tested this patch with different message sizes and different socket
sizes? Are these results consistently better?

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From mst at mellanox.co.il  Tue Sep 26 21:44:33 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 27 Sep 2006 07:44:33 +0300
Subject: [openib-general] heads-up - ipoib NAPI
In-Reply-To: <OF6CC30638.D4F256AD-ON872571F6.001815B4-882571F5.0083AB74@us.ibm.com>
References: <1158850592.24776.156.camel@localhost>
	<OF6CC30638.D4F256AD-ON872571F6.001815B4-882571F5.0083AB74@us.ibm.com>
Message-ID: <20060927044433.GK24009@mellanox.co.il>

Quoting r. Shirley Ma <xma at us.ibm.com>:
> It might reduce CPU utilization. But I still question the BW
> improvement.

Well, with enough sockets we are CPU-bound, so there's your answer as to
why BW would go up with NAPI.

-- 
MST


From mst at mellanox.co.il  Tue Sep 26 21:59:30 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 27 Sep 2006 07:59:30 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <OFCFFF2378.2760CA3D-ON872571F5.006EAE0B-882571F5.006EF764@us.ibm.com>
References: <OF2A2647E8.67324A49-ON872571F5.0064A835-882571F5.0064D189@us.ibm.com>
	<OFCFFF2378.2760CA3D-ON872571F5.006EAE0B-882571F5.006EF764@us.ibm.com>
Message-ID: <20060927045929.GL24009@mellanox.co.il>

Quoting r. Shirley Ma <xma at us.ibm.com>:
> Subject: Re: [PATCH] IB/ipoib: NAPI
> 
> We did some touch tests on the ehca driver and saw a performance drop.

Hmm, it seems ehca still defers the completion event to a tasklet.  It always
seemed weird to me.  So that could be the reason - with NAPI you now get 2
tasklet schedules, as you are actually doing part of what NAPI does, inside the
low level driver.  Try ripping that out and calling the event handler directly,
and see what it does to performance with NAPI.

> I strongly recommend making NAPI a configurable option in ipoib, so customers can turn it on or off based on their configurations.

I still hope ehca NAPI performance can be fixed. But if not, maybe we should
have the low level driver set a disable_napi flag rather than have users play
with module options.

-- 
MST


From xma at us.ibm.com  Tue Sep 26 22:08:35 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 26 Sep 2006 22:08:35 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <ada8xk6bahw.fsf@cisco.com>
Message-ID: <OFFBDFB0BD.26173320-ON872571F6.001BAE14-882571F6.0002F778@us.ibm.com>


Hi, Roland,

>     Shirley> It can be a configuration option to enable/disable NAPI,
>     Shirley> just like other network device.
>
> But is there any reason to keep the non-NAPI code around?  I hate to
> have  two codepaths to maintain.

If you would like to maintain only one code path, then we need to compare
the NAPI patch with the thread-context polling mode patch. I did see a big
performance improvement with the thread-context polling mode patch I have
been working on. (I used to split the CQ; I am now trying without splitting
it.) I also think it would improve multi-link performance when the links
share one EQ.

thanks
Shirley Ma

From xma at us.ibm.com  Tue Sep 26 22:55:11 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 26 Sep 2006 22:55:11 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060927045929.GL24009@mellanox.co.il>
Message-ID: <OF0ECEEC20.3A8E1746-ON872571F6.001FCAC6-882571F6.00073B6D@us.ibm.com>


"Michael S. Tsirkin" <mst at mellanox.co.il> wrote on 09/26/2006 09:59:30 PM:

> Quoting r. Shirley Ma <xma at us.ibm.com>:
> > Subject: Re: [PATCH] IB/ipoib: NAPI
> >
> > We did some touch tests on the ehca driver and saw a performance drop.
>
> Hmm, it seems ehca still defers the completion event to a tasklet.  It always
> seemed weird to me.  So that could be the reason - with NAPI you now get 2
> tasklet schedules, as you are actually doing part of what NAPI does, inside the
> low level driver.  Try ripping that out and calling the event handler directly,
> and see what it does to performance with NAPI.

The reason for this ehca implementation is that two ports/links share one
EQ. We are implementing multiple-EQ support for one adapter now. If that
works, then we can modify the ehca code to work as mthca does. Actually,
mthca has the same problem as ehca with two links on the same adapter:
performance over two links on the same adapter is very bad and does not
scale at all.

> > I strongly recommend making NAPI a configurable option in ipoib, so
> > customers can turn it on or off based on their configurations.
>
> I still hope ehca NAPI performance can be fixed. But if not, maybe we should
> have the low level driver set a disable_napi flag rather than have users play
> with module options.
>
> --
> MST

We have been working on this issue for some time; that's the reason we
haven't posted our NAPI patch. Hopefully we can fix it. If we can show that
NAPI performance (latency, BW, CPU utilization) is better in all cases (UP
vs. SMP, one socket vs. multiple sockets, one link vs. multiple links,
different message sizes, different socket sizes, ...), I will agree to turn
NAPI on by default.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From xma at us.ibm.com  Tue Sep 26 23:05:55 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 26 Sep 2006 23:05:55 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060927045929.GL24009@mellanox.co.il>
Message-ID: <OF16859FE9.35787750-ON872571F6.00214B1B-882571F6.000836F6@us.ibm.com>


"Michael S. Tsirkin" <mst at mellanox.co.il> wrote on 09/26/2006 09:59:30 PM:
> I still hope ehca NAPI performance can be fixed. But if not, maybe we should
> have the low level driver set a disable_napi flag rather than have users play
> with module options.
>
> --
> MST

I forgot to mention that the NAPI parameters, such as dev->weight, should be
tunable for different device drivers, or be set up in the lower-level driver.

thanks
Shirley Ma

From delaitt at cpc.wmin.ac.uk  Tue Sep 26 23:16:02 2006
From: delaitt at cpc.wmin.ac.uk (Thierry Delaitre)
Date: Wed, 27 Sep 2006 07:16:02 +0100 (BST)
Subject: [openib-general] [Lustre-discuss] Re: problems with lustre o2ib
 module & ofed
In-Reply-To: <20060926192822.GD24009@mellanox.co.il>
References: <Pine.GSO.4.58.0609241054510.27777@seth.cpc.wmin.ac.uk>
	<Pine.GSO.4.58.0609251244180.25974@seth.cpc.wmin.ac.uk>
	<Pine.GSO.4.58.0609251305100.25974@seth.cpc.wmin.ac.uk>
	<200609261707.37720.jackm@dev.mellanox.co.il>
	<Pine.GSO.4.58.0609261704080.25974@seth.cpc.wmin.ac.uk>
	<20060926192822.GD24009@mellanox.co.il>
Message-ID: <Pine.GSO.4.58.0609270714060.4854@seth.cpc.wmin.ac.uk>


On Tue, 26 Sep 2006, Michael S. Tsirkin wrote:

> Quoting r. Thierry Delaitre <delaitt at cpc.wmin.ac.uk>:
> > Subject: Re: [Lustre-discuss] Re: problems with lustre o2ib module & ofed
> >
> >
> > On Tue, 26 Sep 2006, Jack Morgenstein wrote:
> >
> > > On Monday 25 September 2006 17:01, Thierry Delaitre wrote:
> > >
> > > I noticed in the Lustre configure file the following
> > >   --with-linux=path       set path to Linux source (default=/usr/src/linux)
> > >
> > > Where does /usr/src/linux link to?
> > >
> > > You might consider explicitly specifying the following options as well in the
> > > Lustre ./configure step:
> > >
> > >   --with-linux=path       set path to Linux source (default=/usr/src/linux)
> > >   --with-linux-obj=path   set path to Linux objects dir (default=$LINUX)
> > >   --with-linux-config=path
> > >                           set path to Linux .conf (default=$LINUX_OBJ/.config)
> >
> > I specified the whole string and still the same.
> >
> > ./configure --with-o2ib=/usr/local/ofed/src/openib --with-linux=/usr/src/linux-2.6.16.21-0.8 --with-linux-obj=/usr/src/linux-2.6.16.21-0.8 --with-linux-config=/usr/src/linux-2.6.16.21-0.8/.config
> >
> > Thierry.
>
> 1. Did you reboot after rebuilding everything?
>
> 2. Try to check the compiler command line used for building lustre.
> You must make sure gen2 is before linux kernel in -I flag list.

I've managed to solve the problem by deleting
/usr/src/linux/include/rdma and pointing /usr/src/linux/driver/infiniband
to /usr/local/ofed/src/openib/driver/infiniband.

Thanks to all for your help.

Thierry.


From mst at mellanox.co.il  Tue Sep 26 23:23:16 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 27 Sep 2006 09:23:16 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <OF0ECEEC20.3A8E1746-ON872571F6.001FCAC6-882571F6.00073B6D@us.ibm.com>
References: <OF0ECEEC20.3A8E1746-ON872571F6.001FCAC6-882571F6.00073B6D@us.ibm.com>
Message-ID: <20060927062316.GO24009@mellanox.co.il>

Quoting r. Shirley Ma <xma at us.ibm.com>:
> We are implementing multiple EQs support for one adapter now.

I think with MSI we can have a per-interface EQ in mthca.
The main reason I'm not doing this is that I haven't figured out
the right interface to pass this information to the low level driver yet.

Maybe we should just assign EQs to CQs in a round-robin fashion
for now, and just hope typical use allocates CQs sequentially.
Worst case, we are back to where we are now, performance-wise.
Roland, how does this sound?
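
The round-robin idea above can be sketched in user space. This is only an
illustration under assumptions, not mthca code: `eq_rr_state` and
`eq_rr_next` are hypothetical names, and the device is assumed to have a
fixed, non-zero number of completion EQs.

```c
#include <assert.h>

/* Hypothetical per-device state: how many EQs exist and which one is next. */
struct eq_rr_state {
	unsigned int num_eqs;	/* number of completion EQs, must be > 0 */
	unsigned int next;	/* index of the EQ the next CQ will get */
};

/* Return the EQ index for a newly created CQ and advance the counter. */
static unsigned int eq_rr_next(struct eq_rr_state *st)
{
	unsigned int eq;

	assert(st->num_eqs > 0);

	eq = st->next;
	st->next = (st->next + 1) % st->num_eqs;
	return eq;
}
```

With two EQs, CQs created back to back alternate between EQ 0 and EQ 1, so
two interfaces that each create a CQ in turn end up on different EQs. If
allocation is interleaved differently, several busy CQs can still share an
EQ, which is the worst case mentioned above.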

> If that works, then
> we can modify the ehca code as mthca. Actually mthca has the same problem as
> ehca over two links on the same adapter.

OK, but if, as you point out, the issue is not device-specific, that's a
good reason to address it at the ULP level rather than do tricks in the
low-level driver to try and work around it.

-- 
MST


From mst at mellanox.co.il  Tue Sep 26 23:28:22 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 27 Sep 2006 09:28:22 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <OF16859FE9.35787750-ON872571F6.00214B1B-882571F6.000836F6@us.ibm.com>
References: <OF16859FE9.35787750-ON872571F6.00214B1B-882571F6.000836F6@us.ibm.com>
Message-ID: <20060927062822.GQ24009@mellanox.co.il>

Quoting r. Shirley Ma <xma at us.ibm.com>:
> Subject: Re: [PATCH] IB/ipoib: NAPI
> 
> "Michael S. Tsirkin" <mst at mellanox.co.il> wrote on 09/26/2006 09:59:30 PM:
> > I still hope ehca NAPI performance can be fixed. But if not, maybe we should
> > have the low level driver set a disable_napi flag rather than have users play
> > with module options.
>
> I forgot to mention these NAPI parameters should be tunable for different device drivers, like dev->weight, or set up in lower driver.

So we need something like poll_weight in struct ib_device, to give
a hint on how expensive an interrupt is versus poll?
Seems to make sense, and actually might be useful for other ULPs.
Roland, what do you think?

-- 
MST


From eli at dev.mellanox.co.il  Tue Sep 26 23:35:26 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Wed, 27 Sep 2006 09:35:26 +0300
Subject: [openib-general] heads-up - ipoib NAPI
In-Reply-To: <OF6CC30638.D4F256AD-ON872571F6.001815B4-882571F5.0083AB74@us.ibm.com>
References: <OF6CC30638.D4F256AD-ON872571F6.001815B4-882571F5.0083AB74@us.ibm.com>
Message-ID: <1159338926.27719.17.camel@localhost>

On Tue, 2006-09-26 at 21:34 -0700, Shirley Ma wrote:

> NAPI patch moves ipoib poll from hardware interrupt context to softirq
> context. It would reduce the hardware interrupts, reduce hardware
> latency and induce some network latency. It might reduce cpu
> utilization. But I still question about the BW improvement. I did see
> various performance with the same test under the same condition.
> 
When you open just one connection, you can see variations of around 10%
in the measured BW. But then you don't utilize all the CPU power you have,
and you don't reach the threshold where NAPI becomes effective.
Using multiple connections utilizes all CPUs in the system, increases the
send rate, and increases the chance that the receiver polls CQEs up to
its quota and is scheduled again without re-enabling interrupts.
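
The quota behaviour described above can be modelled in user space. This is a
minimal sketch, not the ipoib patch: `toy_cq`, `toy_napi_poll` and the
`irq_enabled` flag are hypothetical names, and the CQ is reduced to a counter
of waiting CQEs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy completion queue: only tracks how many CQEs are waiting. */
struct toy_cq {
	unsigned int pending;	/* CQEs not yet processed */
	bool irq_enabled;	/* would the HCA raise an interrupt? */
};

/*
 * Process at most `quota` CQEs and store the number handled in `*done`.
 * Returns true if the poller should stay scheduled (quota used up,
 * interrupts stay off), false if the queue was drained early and
 * interrupts were re-enabled.
 */
static bool toy_napi_poll(struct toy_cq *cq, unsigned int quota,
			  unsigned int *done)
{
	assert(quota > 0);

	*done = cq->pending < quota ? cq->pending : quota;
	cq->pending -= *done;

	if (*done == quota)
		return true;	/* poll again without taking an interrupt */

	cq->irq_enabled = true;	/* back to interrupt-driven notification */
	return false;
}
```

With 150 CQEs waiting and a quota of 64, the first two calls consume 64 CQEs
each and keep the poller scheduled; the third consumes the remaining 22 and
re-enables interrupts. A real driver would also have to close the race
between re-arming the CQ and completions that arrive in between, which this
sketch ignores.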


> Have you tested this patch with different message sizes, different
> socket sizes? Are these results consistent better?
> 
I used large socket sizes, but I think that with multiple
connections this parameter does not have a significant effect.


From eli at dev.mellanox.co.il  Tue Sep 26 23:38:31 2006
From: eli at dev.mellanox.co.il (Eli cohen)
Date: Wed, 27 Sep 2006 09:38:31 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <OF4B9A668B.79A63961-ON852571F5.00765CAE-852571F5.00769F14@us.ibm.com>
References: <OF4B9A668B.79A63961-ON852571F5.00765CAE-852571F5.00769F14@us.ibm.com>
Message-ID: <1159339111.27719.21.camel@localhost>

On Tue, 2006-09-26 at 17:35 -0400, Bernard King-Smith wrote:

> Has anyone run the RR test in Netperf to look at latency? What 1 byte
> RR rates did you see before and after applying the patch. 
> 
netperf TCP_RR:
                rate [tran/sec]
                ---------------
without NAPI:      39600
with NAPI:         39470

As you can see there is a minor difference.
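
To put a number on "minor", the relative change can be computed directly.
The helper below is a small illustrative sketch; `percent_change` is a
hypothetical name, and the inputs are the figures reported in this thread.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

Applied to the TCP_RR rates, 39600 to 39470 tran/sec is a drop of roughly
0.3%. For comparison, the TCP_STREAM figures reported in this thread, 506 to
550 MByte/sec, are a gain of roughly 8.7%.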


From xma at us.ibm.com  Tue Sep 26 23:39:37 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 26 Sep 2006 23:39:37 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060927062316.GO24009@mellanox.co.il>
Message-ID: <OF169AEDB2.69E90784-ON872571F6.002408AF-882571F6.000B4CEC@us.ibm.com>


"Michael S. Tsirkin" <mst at mellanox.co.il> wrote on 09/26/2006 11:23:16 PM:

> Quoting r. Shirley Ma <xma at us.ibm.com>:
> > We are implementing multiple EQs support for one adapter now.
>
> I think with MSI we can have a per-interface EQ in mthca.
> Main reason I'm not doing this is because I haven't figured out
> the right interface to pass this information to the low level driver yet.
>
> Maybe we should just assign EQs to CQs in a round-robin fashion
> for now, and just hope typical use allocates CQs sequentially.
> Worst case, we are back to where we are now, performance-wise.
> Roland, how does this sound?
>
> > If that works, then
> > we can modify the ehca code as mthca. Actually mthca has the same problem as
> > ehca over two links on the same adapter.
>
> OK, but if as you point out the issue is not device-specific -
> that's a good reason not to do tricks in low-level driver to try
> and work around this, but address this at ULP level.
>
> --
> MST

Yes. That's what we are working on: defining the right APIs to pass this
information to the low-level driver. For now we are trying one EQ per
interface; then we will extend the work to an N(CQ):M(EQ) mapping. ehca can
support up to 127 EQs, so I would suggest using a hash.
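
One way to read the hash suggestion is sketched below. It is an assumption
for illustration only, not ehca code: `cq_to_eq_hash` is a hypothetical
helper, and the multiplier is a widely used odd constant close to 2^32
divided by the golden ratio.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a CQ number onto one of `num_eqs` event queues with a 32-bit
 * multiplicative hash followed by a modulo reduction.
 */
static unsigned int cq_to_eq_hash(uint32_t cq_num, unsigned int num_eqs)
{
	uint32_t h;

	assert(num_eqs > 0);

	h = cq_num * UINT32_C(2654435761);	/* wraps modulo 2^32 */
	return h % num_eqs;
}
```

Unlike a round-robin counter, the mapping depends only on the CQ number, so
it needs no shared state and always returns the same EQ for the same CQ. How
evenly CQs spread over the 127 EQs depends on how CQ numbers are handed out.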

Thanks
Shirley Ma

From vuhuong at mellanox.com  Tue Sep 26 23:45:50 2006
From: vuhuong at mellanox.com (Vu Pham)
Date: Tue, 26 Sep 2006 23:45:50 -0700
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
In-Reply-To: <20060927034851.GH24009@mellanox.co.il>
References: <20060926144541.GA17938@mellanox.co.il>
	<4519E86D.9030508@mellanox.com> <20060927034851.GH24009@mellanox.co.il>
Message-ID: <451A1E1E.80203@mellanox.com>

Michael S. Tsirkin wrote:
> Quoting r. Vu Pham <vuhuong at mellanox.com>:
> 
>>Most of srp targets that I tested don't support multiple 
>>channels.
> 
> 
> Which are these?


The Mellanox reference SRP target, Texas Memory Systems' SSD, and
Engenio.

> And what happens when you ask for multichannel support?
> 

For the Texas Memory Systems SSD, the login request is rejected.

For the Mellanox SRP target, the new channel/connection is
established. However, if the host is in error recovery and
does a target reset, it should terminate all outstanding
channels/connections; otherwise the target is left with
outstanding I/Os dangling on the other channels/connections
and tries to complete them. This violates SCSI task management.


From mst at mellanox.co.il  Wed Sep 27 00:10:59 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 27 Sep 2006 10:10:59 +0300
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
In-Reply-To: <4519E86D.9030508@mellanox.com>
References: <20060926144541.GA17938@mellanox.co.il>
	<4519E86D.9030508@mellanox.com>
Message-ID: <20060927071059.GA21509@mellanox.co.il>

Quoting r. Vu Pham <vuhuong at mellanox.com>:
> Either you can use multiple channels or derive different 
> initiator_port_ID in the login req to have multiple paths on 
> the same physical port

So how about we just stick a pointer inside the identifier extension
instead of enabling multichannel?

-- 
MST


From sweitzen at cisco.com  Wed Sep 27 00:20:35 2006
From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen))
Date: Wed, 27 Sep 2006 00:20:35 -0700
Subject: [openib-general] [openfabrics-ewg] OFED Status
Message-ID: <A15335FBE9BD2449AF2C9EF3D1EB8EA302506AFF@xmb-sjc-216.amer.cisco.com>

Yes, this is fine with me.

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 

> -----Original Message-----
> From: openfabrics-ewg-bounces at openib.org 
> [mailto:openfabrics-ewg-bounces at openib.org] On Behalf Of Aviram Gutman
> Sent: Tuesday, September 26, 2006 9:01 AM
> To: EWG; Openib-General at Openib.Org
> Subject: [openfabrics-ewg] OFED Status
> 
> Hi,
> 
> OFED 1.1 RC6 was released on Thu.
> 
> The issues that were resolved since are:
> 
> 1) OpenIB Diags build on SLES10 ppc - Solved by Moshe Katzir from Voltaire
> 2) iSER build on SLES10 needs root privilege - Voltaire fixed it
> 3) Bug #233 SDP crash on ipath - I believe MST fixed. Betsy please confirm.
> 4) Fix IBDM to allow multiple devices on the same machine - Eitan Zahavi fixed
> 5) SRP HA - Fixed by Ishai
> 6) IPoIB HA on RH - Vlad made progress, issue is still not solved.
> 7) The CM fix that Arlin asked - In
> 
> Once the IPoIB HA issue is solved, we would like to issue RC7, which is
> supposed to be final. Is everyone OK with this approach?
> 
> 
> Aviram


From erezz at voltaire.com  Wed Sep 27 00:55:42 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Wed, 27 Sep 2006 10:55:42 +0300
Subject: [openib-general] oops after rmmod ib_cm when stopping iSER
Message-ID: <451A2E7E.8050504@voltaire.com>

Sean,

When stopping iSER, we run 'modprobe -r ib_iser'. Then, we see an oops 
(below). In order to check which module caused that oops, I replaced the 
'modprobe -r' call with rmmod for each module:

rmmod ib_iser
rmmod libiscsi
rmmod scsi_transport_iscsi
rmmod rdma_cm
rmmod ib_addr
rmmod ib_cm

If I wait a few seconds before the removal of ib_cm, everything is ok.

thyme login: Sep 27 09:50:08 thyme kernel: iser: iscsi_iser_ep_disconnect:ib conn ffff81005e426000 state 2
Sep 27 09:50:08 thyme kernel: iser: iser_cq_tasklet_fn:comp w. error op 0 status 5
Sep 27 09:50:08 thyme last message repeated 3 times
Sep 27 09:50:08 thyme kernel: iser: iser_cma_handler:event 10 conn ffff81005e426000 id ffff81006c304a00
Sep 27 09:50:08 thyme kernel: iser: iser_free_ib_conn_res:freeing conn ffff81005e426000 cma_id ffff81006c304a00 fmr pool ffff8100560f2e40 qp f0
Sep 27 09:50:08 thyme kernel: iser: iser_device_try_release:device ffff8100796037c0 refcount 0
Sep 27 09:50:09 thyme kernel: cma_cleanup: entry
Sep 27 09:50:09 thyme kernel: cma_cleanup: calling destroy_workqueue
Sep 27 09:50:09 thyme kernel: cma_cleanup: calling idr_destroy(&sdp_ps)
Sep 27 09:50:09 thyme kernel: cma_cleanup: calling idr_destroy(&tcp_ps)
Sep 27 09:50:09 thyme kernel: cma_cleanup: exit
Sep 27 09:50:09 thyme kernel: ib_cm_cleanup: entry
Sep 27 09:50:09 thyme kernel: ib_cm_cleanup: calling ib_unregister_client
Sep 27 09:50:09 thyme kernel: ib_cm_cleanup: calling idr_destroy
Sep 27 09:50:09 thyme kernel: ib_cm_cleanup: exit
Unable to handle kernel paging request at ffffffff8b02e017 RIP:
[<ffffffff8024133c>] delayed_work_timer_fn+0x2c/0x40
PGD 203027 PUD 205027 PMD 0
Oops: 0000 [1] SMP
CPU 3
Modules linked in: ib_uverbs ib_ipoib ib_sa autofs usbserial parport_pc lp parport edd cpufreq_userspace acpi_cpufreq thermal processor fan bud
Pid: 0, comm: swapper Not tainted 2.6.18-rc4-ga2d9f966-dirty #1
RIP: 0010:[<ffffffff8024133c>] [<ffffffff8024133c>] delayed_work_timer_fn+0x2c/0x40
RSP: 0018:ffff81007e36fef8 EFLAGS: 00010246
RAX: ffffffff8b02dfff RBX: 0000000000000100 RCX: ffff81006b152d20
RDX: 0000000000000003 RSI: ffff810068576a00 RDI: ffff810068576a00
RBP: ffff81007e340000 R08: efe9331445cb91ec R09: ffff81007e3a8008
R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff80241310
R13: ffff81007e36ff00 R14: 000000000000000a R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff81007e344b40(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffff8b02e017 CR3: 0000000060a44000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff81007e36a000, task ffff81007e347080)
Stack: ffffffff80239826 ffff81007e36ff00 ffff81007e36ff00 ffff81000102ac20
0000000000000000 0000000000000096 0000000000000011 ffffffff8065a110
ffffffff806b2b20 0000000000000003 ffffffff80235d0b ffff81007e36ff48
Call Trace:
<IRQ> [<ffffffff80239826>] run_timer_softirq+0x156/0x1e0
[<ffffffff80235d0b>] __do_softirq+0x6b/0xe0
[<ffffffff8020aee8>] call_softirq+0x1c/0x34
[<ffffffff8020c92c>] do_softirq+0x2c/0x90
[<ffffffff802083e0>] mwait_idle+0x0/0x50
[<ffffffff8020a886>] apic_timer_interrupt+0x66/0x6c
<EOI> [<ffffffff80208416>] mwait_idle+0x36/0x50
[<ffffffff80208e1a>] cpu_idle+0x6a/0x90
[<ffffffff8067f919>] start_secondary+0x499/0x4b0


Code: 48 8b 3c d0 e9 4b ff ff ff 66 66 66 90 66 66 66 90 66 66 90
RIP [<ffffffff8024133c>] delayed_work_timer_fn+0x2c/0x40
RSP <ffff81007e36fef8>
CR2: ffffffff8b02e017
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!

-- 

Erez Zilber | 972-9-971-7689
Software Engineer, Storage Team
Voltaire – The Grid Backbone
www.voltaire.com


From mst at mellanox.co.il  Wed Sep 27 01:30:03 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 27 Sep 2006 11:30:03 +0300
Subject: [openib-general] enable GSO over IPoIB
In-Reply-To: <OFBB76619C.C348CC61-ON872571F5.0077B8EA-882571F5.0078499A@us.ibm.com>
References: <ada4puubafq.fsf@cisco.com>
	<OFBB76619C.C348CC61-ON872571F5.0077B8EA-882571F5.0078499A@us.ibm.com>
Message-ID: <20060927083003.GB22263@mellanox.co.il>

Quoting r. Shirley Ma <xma at us.ibm.com>:
> Subject: Re: enable GSO over IPoIB
> 
> >     Shirley> Since linux 2.6.18 supports GSO, I have patched IPoIB to
> >     Shirley> enable GSO, but haven't tested the performance yet. Has
> >     Shirley> anyone tried already?
> >
> > No, I don't think anyone looked at that yet.  Could you post your
> > patch?  What is required?  Supporting gather/scatter?
> >
> >  - R.
> 
> No need to. GSO only improves sender-side performance. It allows ULPs to
> send large packets, which are then segmented in the interface layer just
> before the driver's xmit. GSO is enabled through ethtool. Since ipoib
> doesn't support ethtool, I simply added a module parameter to set the
> interface GSO flag when loading the module.

Any idea what ethtool does that IPoIB can't support?

-- 
MST


From xma at us.ibm.com  Wed Sep 27 01:37:04 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 27 Sep 2006 01:37:04 -0700
Subject: [openib-general] heads-up - ipoib NAPI
In-Reply-To: <1159338926.27719.17.camel@localhost>
Message-ID: <OF4396859A.3B8D4673-ON872571F6.002AA8AC-882571F6.00160D9A@us.ibm.com>


Hi, Eli,

Eli cohen <eli at dev.mellanox.co.il> wrote on 09/26/2006 11:35:26 PM:
> On Tue, 2006-09-26 at 21:34 -0700, Shirley Ma wrote:
>
> > NAPI patch moves ipoib poll from hardware interrupt context to softirq
> > context. It would reduce the hardware interrupts, reduce hardware
> > latency and induce some network latency. It might reduce cpu
> > utilization. But I still question about the BW improvement. I did see
> > various performance with the same test under the same condition.
> >
> When you open just one connection you can see around 10% of variations
> in BW measure. But then you don't utilize all the CPU power you have and
> you don't get to the threshold where NAPI becomes effective.
> Using multiple connections utilizes all CPUs in the system, increases
> send rate, and increases the chances of the receiver to poll CQEs up to
> its quota and be scheduled again without re-enabling interrupts.

The send rate shouldn't be limited by using one connection: the CPU is much
faster than the link. I don't think the send rate with multiple connections
is higher than with one connection. Do you have any data to show that?

When I monitored the CQEs, I didn't see many CQEs in the CQ per
notification, and I don't think moving the poll from hardware interrupt
context to softirq context would increase that number. Or perhaps the added
latency increases it: I did see that number go up, and performance improve,
with some udelay in hardware interrupt polling mode. If you saw the number
of packets increase, how many packets did you see per hardware interrupt
poll and per NAPI poll?

Your NAPI poll is driven either by the receiver quota or by any send CQE in
the CQ. Have you tested UDP performance? Any difference?

Thanks
Shirley Ma

From mst at mellanox.co.il  Wed Sep 27 01:45:25 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 27 Sep 2006 11:45:25 +0300
Subject: [openib-general] heads-up - ipoib NAPI
In-Reply-To: <OF4396859A.3B8D4673-ON872571F6.002AA8AC-882571F6.00160D9A@us.ibm.com>
References: <1159338926.27719.17.camel@localhost>
	<OF4396859A.3B8D4673-ON872571F6.002AA8AC-882571F6.00160D9A@us.ibm.com>
Message-ID: <20060927084525.GB22364@mellanox.co.il>

Quoting r. Shirley Ma <xma at us.ibm.com>:
> Your NAPI poll is driven either by receiver quota or any send CQE in CQ. Have you tested UDP performance? Any difference?

The thing to do currently is probably to wait for Roland to post an
updated patch, then test it.

-- 
MST


From xma at us.ibm.com  Wed Sep 27 01:45:49 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 27 Sep 2006 01:45:49 -0700
Subject: [openib-general] enable GSO over IPoIB
In-Reply-To: <20060927083003.GB22263@mellanox.co.il>
Message-ID: <OF8E461F65.0BD2E62A-ON872571F6.002FB912-882571F6.0016DABA@us.ibm.com>


"Michael S. Tsirkin" <mst at mellanox.co.il> wrote on 09/27/2006 01:30:03 AM:
> Any idea what ethtool does that IPoIB can't support?

ethtool is an Ethernet device tool. It's OK to partially implement ethtool
operations in IPoIB. We would also need to patch the user-level utility to
support ibX interfaces; for now it only supports ethX.

thanks
Shirley Ma
IBM Linux Technology Center

From dennis at osc.edu  Wed Sep 27 02:44:22 2006
From: dennis at osc.edu (Dennis Dalessandro)
Date: Wed, 27 Sep 2006 11:44:22 +0200
Subject: [openib-general] Port reuse issue for rdma_cm/iwarp
In-Reply-To: <Pine.GSO.4.40.0609261746390.29485-100000@mu.cse.ohio-state.edu>
References: <Pine.GSO.4.40.0609261746390.29485-100000@mu.cse.ohio-state.edu>
Message-ID: <1159350262.2785.3.camel@barney>

This has to do with the socket going into the TIME_WAIT state, which
happens because it is waiting for any packets that may still be in flight,
as Caitlin said.  From what I was told, there is not really any option to
get around this with the Ammasso card. That was back when they were
still in business, though, and for their ccil driver.  You are probably
better off using different ports.

-Dennis


On Tue, 2006-09-26 at 17:53 -0400, Sundeep Narravula wrote:
> > TCP restricts prompt re-use of the same Source/Destination
> > Address/Port pair while old traffic could still be in-flight.
> > This is generally not an issue because prompt re-use of the
> > exact four tuple is rare.
> >
> > Is there a special reason why your application needs to
> > reuse the same port from the active side? If the port number
> > is being used to identify the rank, could private data be
> > used instead?
> 
> Our application is primarily an invocation of multiple independent parallel
> jobs which all need to connect to each other on each invocation. Since
> this is a TCP limitation, is there any interface similar to setsockopt
> with TCP_NODELAY? We probably need to use different ports otherwise.
> 
> Thanks,
>   --Sundeep.


From erezz at voltaire.com  Wed Sep 27 05:21:35 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Wed, 27 Sep 2006 15:21:35 +0300 (IDT)
Subject: [openib-general] [PATCH 0/3] IB/iser: bug fixes for 2.6.19 rc1
Message-ID: <Pine.LNX.4.44.0609271518230.20024-100000@hydrus>

Roland,

Here is a series of patches for iSER. Most of them are bug fixes. I hope 
that they can be added to rc1.

Thanks
Erez


From erezz at voltaire.com  Wed Sep 27 05:27:10 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Wed, 27 Sep 2006 15:27:10 +0300 (IDT)
Subject: [openib-general] [PATCH 1/3] IB/iser: have iSER data transaction
 object pointing to iSER conn
In-Reply-To: <Pine.LNX.4.44.0609271518230.20024-100000@hydrus>
Message-ID: <Pine.LNX.4.44.0609271521540.20024-100000@hydrus>

iSER uses a data transaction object (struct iser_dto) as part
of its IB data descriptors (struct iser_desc) management.
It also uses a hierarchy of connection structures pointing to
each other. A DTO may exist even after the iscsi_iser connection
it points to has been destroyed (e.g. one that is bound to a posted
receive buffer which was flushed by the IB HW). Hence DTOs need to
point to the lowest connection, which is struct iser_conn.

Signed-off-by: Erez Zilber <erezz at voltaire.com>

---

 drivers/infiniband/ulp/iser/iscsi_iser.c     |    2 ++
 drivers/infiniband/ulp/iser/iscsi_iser.h     |    2 +-
 drivers/infiniband/ulp/iser/iser_initiator.c |   11 ++++++-----
 drivers/infiniband/ulp/iser/iser_verbs.c     |    8 +++++---
 4 files changed, 14 insertions(+), 9 deletions(-)

57b132002a5e3bf3ba0ae362f174404e29c69449
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 101e407..b37f429 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -317,6 +317,8 @@ iscsi_iser_conn_destroy(struct iscsi_cls
 	struct iscsi_iser_conn *iser_conn = conn->dd_data;
 
 	iscsi_conn_teardown(cls_conn);
+	if (iser_conn->ib_conn)
+		iser_conn->ib_conn->iser_conn = NULL;
 	kfree(iser_conn);
 }
 
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 7c3d0c9..7f44636 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -187,7 +187,7 @@ struct iser_regd_buf {
 
 struct iser_dto {
 	struct iscsi_iser_cmd_task *ctask;
-	struct iscsi_iser_conn     *conn;
+	struct iser_conn *ib_conn;
 	int                        notify_enable;
 
 	/* vector of registered buffers */
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index ccf56f6..14ae61e 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -249,7 +249,7 @@ static int iser_post_receive_control(str
 	}
 
 	recv_dto = &rx_desc->dto;
-	recv_dto->conn          = iser_conn;
+	recv_dto->ib_conn = iser_conn->ib_conn;
 	recv_dto->regd_vector_len = 0;
 
 	regd_hdr = &rx_desc->hdr_regd_buf;
@@ -296,7 +296,7 @@ static void iser_create_send_desc(struct
 	regd_hdr->virt_addr  = tx_desc; /* == &tx_desc->iser_header */
 	regd_hdr->data_size  = ISER_TOTAL_HEADERS_LEN;
 
-	send_dto->conn          = iser_conn;
+	send_dto->ib_conn         = iser_conn->ib_conn;
 	send_dto->notify_enable   = 1;
 	send_dto->regd_vector_len = 0;
 
@@ -588,7 +588,7 @@ void iser_rcv_completion(struct iser_des
 			 unsigned long dto_xfer_len)
 {
 	struct iser_dto        *dto = &rx_desc->dto;
-	struct iscsi_iser_conn *conn = dto->conn;
+	struct iscsi_iser_conn *conn = dto->ib_conn->iser_conn;
 	struct iscsi_session *session = conn->iscsi_conn->session;
 	struct iscsi_cmd_task *ctask;
 	struct iscsi_iser_cmd_task *iser_ctask;
@@ -641,7 +641,8 @@ void iser_rcv_completion(struct iser_des
 void iser_snd_completion(struct iser_desc *tx_desc)
 {
 	struct iser_dto        *dto = &tx_desc->dto;
-	struct iscsi_iser_conn *iser_conn = dto->conn;
+	struct iser_conn       *ib_conn = dto->ib_conn;
+	struct iscsi_iser_conn *iser_conn = ib_conn->iser_conn;
 	struct iscsi_conn      *conn = iser_conn->iscsi_conn;
 	struct iscsi_mgmt_task *mtask;
 
@@ -652,7 +653,7 @@ void iser_snd_completion(struct iser_des
 	if (tx_desc->type == ISCSI_TX_DATAOUT)
 		kmem_cache_free(ig.desc_cache, tx_desc);
 
-	atomic_dec(&iser_conn->ib_conn->post_send_buf_count);
+	atomic_dec(&ib_conn->post_send_buf_count);
 
 	write_lock(conn->recv_lock);
 	if (conn->suspend_tx) {
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 72febf1..11d4e87 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -570,6 +570,8 @@ void iser_conn_release(struct iser_conn 
 	/* on EVENT_ADDR_ERROR there's no device yet for this conn */
 	if (device != NULL)
 		iser_device_try_release(device);
+	if (ib_conn->iser_conn)
+		ib_conn->iser_conn->ib_conn = NULL;
 	kfree(ib_conn);
 }
 
@@ -692,7 +694,7 @@ int iser_post_recv(struct iser_desc *rx_
 	struct iser_dto   *recv_dto = &rx_desc->dto;
 
 	/* Retrieve conn */
-	ib_conn = recv_dto->conn->ib_conn;
+	ib_conn = recv_dto->ib_conn;
 
 	iser_dto_to_iov(recv_dto, iov, 2);
 
@@ -725,7 +727,7 @@ int iser_post_send(struct iser_desc *tx_
 	struct iser_conn  *ib_conn;
 	struct iser_dto   *dto = &tx_desc->dto;
 
-	ib_conn = dto->conn->ib_conn;
+	ib_conn = dto->ib_conn;
 
 	iser_dto_to_iov(dto, iov, MAX_REGD_BUF_VECTOR_LEN);
 
@@ -772,7 +774,7 @@ static void iser_comp_error_worker(void 
 static void iser_handle_comp_error(struct iser_desc *desc)
 {
 	struct iser_dto  *dto     = &desc->dto;
-	struct iser_conn *ib_conn = dto->conn->ib_conn;
+	struct iser_conn *ib_conn = dto->ib_conn;
 
 	iser_dto_buffs_release(dto);
 
-- 
1.2.6


From erezz at voltaire.com  Wed Sep 27 06:43:06 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Wed, 27 Sep 2006 16:43:06 +0300 (IDT)
Subject: [openib-general] [PATCH 2/3] IB/iser: dma unmap an unaligned for
 rdma data before touching it
In-Reply-To: <Pine.LNX.4.44.0609271518230.20024-100000@hydrus>
Message-ID: <Pine.LNX.4.44.0609271527420.20024-100000@hydrus>

iSER uses the dma mapping api to map the page holding the
scsi command data to the hca dma address space. When the
command data is not aligned for rdma, the data is copied
to/from an allocated buffer which in turn is used for
executing this command. The pages associated with the
command must be unmapped before being touched.

Signed-off-by: Erez Zilber <erezz at voltaire.com>

---

 drivers/infiniband/ulp/iser/iscsi_iser.h     |    7 ++++
 drivers/infiniband/ulp/iser/iser_initiator.c |   49 +++++---------------------
 drivers/infiniband/ulp/iser/iser_memory.c    |   42 ++++++++++++++++++++++
 3 files changed, 59 insertions(+), 39 deletions(-)

78a237418bd3547cfeb49828a8b857ac5241749f
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 7f44636..4a7069f 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -350,4 +350,11 @@ int  iser_post_send(struct iser_desc *tx
 
 int iser_conn_state_comp(struct iser_conn *ib_conn,
 			 enum iser_ib_conn_state comp);
+
+int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask,
+			    struct iser_data_buf       *data,
+			    enum   iser_data_dir       iser_dir,
+			    enum   dma_data_direction  dma_dir);
+
+void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask);
 #endif
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index 14ae61e..9b3d79c 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -66,42 +66,6 @@ static void iser_dto_add_regd_buff(struc
 	dto->regd_vector_len++;
 }
 
-static int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask,
-				  struct iser_data_buf       *data,
-				  enum   iser_data_dir       iser_dir,
-				  enum   dma_data_direction  dma_dir)
-{
-	struct device *dma_device;
-
-	iser_ctask->dir[iser_dir] = 1;
-	dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device;
-
-	data->dma_nents = dma_map_sg(dma_device, data->buf, data->size, dma_dir);
-	if (data->dma_nents == 0) {
-		iser_err("dma_map_sg failed!!!\n");
-		return -EINVAL;
-	}
-	return 0;
-}
-
-static void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask)
-{
-	struct device  *dma_device;
-	struct iser_data_buf *data;
-
-	dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device;
-
-	if (iser_ctask->dir[ISER_DIR_IN]) {
-		data = &iser_ctask->data[ISER_DIR_IN];
-		dma_unmap_sg(dma_device, data->buf, data->size, DMA_FROM_DEVICE);
-	}
-
-	if (iser_ctask->dir[ISER_DIR_OUT]) {
-		data = &iser_ctask->data[ISER_DIR_OUT];
-		dma_unmap_sg(dma_device, data->buf, data->size, DMA_TO_DEVICE);
-	}
-}
-
 /* Register user buffer memory and initialize passive rdma
  *  dto descriptor. Total data size is stored in
  *  iser_ctask->data[ISER_DIR_IN].data_len
@@ -699,14 +663,19 @@ void iser_ctask_rdma_init(struct iscsi_i
 void iser_ctask_rdma_finalize(struct iscsi_iser_cmd_task *iser_ctask)
 {
 	int deferred;
+	int is_rdma_aligned = 1;
 
 	/* if we were reading, copy back to unaligned sglist,
 	 * anyway dma_unmap and free the copy
 	 */
-	if (iser_ctask->data_copy[ISER_DIR_IN].copy_buf != NULL)
+	if (iser_ctask->data_copy[ISER_DIR_IN].copy_buf != NULL) {
+		is_rdma_aligned = 0;
 		iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_IN);
-	if (iser_ctask->data_copy[ISER_DIR_OUT].copy_buf != NULL)
+	}
+	if (iser_ctask->data_copy[ISER_DIR_OUT].copy_buf != NULL) {
+		is_rdma_aligned = 0;
 		iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_OUT);
+	}
 
 	if (iser_ctask->dir[ISER_DIR_IN]) {
 		deferred = iser_regd_buff_release
@@ -726,7 +695,9 @@ void iser_ctask_rdma_finalize(struct isc
 		}
 	}
 
-	iser_dma_unmap_task_data(iser_ctask);
+       /* if the data was unaligned, it was already unmapped and then copied */
+       if (is_rdma_aligned)
+		iser_dma_unmap_task_data(iser_ctask);
 }
 
 void iser_dto_buffs_release(struct iser_dto *dto)
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 31950a5..0f87163 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -360,6 +360,44 @@ static void iser_page_vec_build(struct i
 	}
 }
 
+int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask,
+			    struct iser_data_buf       *data,
+			    enum   iser_data_dir       iser_dir,
+			    enum   dma_data_direction  dma_dir)
+{
+	struct device *dma_device;
+
+	iser_ctask->dir[iser_dir] = 1;
+	dma_device =
+		iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device;
+
+	data->dma_nents = dma_map_sg(dma_device, data->buf, data->size, dma_dir);
+	if (data->dma_nents == 0) {
+		iser_err("dma_map_sg failed!!!\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask)
+{
+	struct device  *dma_device;
+	struct iser_data_buf *data;
+
+	dma_device =
+		iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device;
+
+	if (iser_ctask->dir[ISER_DIR_IN]) {
+		data = &iser_ctask->data[ISER_DIR_IN];
+		dma_unmap_sg(dma_device, data->buf, data->size, DMA_FROM_DEVICE);
+	}
+
+	if (iser_ctask->dir[ISER_DIR_OUT]) {
+		data = &iser_ctask->data[ISER_DIR_OUT];
+		dma_unmap_sg(dma_device, data->buf, data->size, DMA_TO_DEVICE);
+	}
+}
+
 /**
  * iser_reg_rdma_mem - Registers memory intended for RDMA,
  * obtaining rkey and va
@@ -382,6 +420,10 @@ int iser_reg_rdma_mem(struct iscsi_iser_
 		iser_err("rdma alignment violation %d/%d aligned\n",
 			 aligned_len, mem->size);
 		iser_data_buf_dump(mem);
+
+		/* unmap the command data before accessing it */
+		iser_dma_unmap_task_data(iser_ctask);
+
 		/* allocate copy buf, if we are writing, copy the */
 		/* unaligned scatterlist, dma map the copy        */
 		if (iser_start_rdma_unaligned_sg(iser_ctask, cmd_dir) != 0)
-- 
1.2.6


From erezz at voltaire.com  Wed Sep 27 06:48:57 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Wed, 27 Sep 2006 16:48:57 +0300 (IDT)
Subject: [openib-general] [PATCH 3/3] IB/iser: fix the description of iSER
	in Kconfig
In-Reply-To: <Pine.LNX.4.44.0609271518230.20024-100000@hydrus>
Message-ID: <Pine.LNX.4.44.0609271546020.20024-100000@hydrus>

Fix the description of iSER in Kconfig; the current text is not accurate.

Signed-off-by: Erez Zilber <erezz at voltaire.com>

---

 drivers/infiniband/ulp/iser/Kconfig |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

e6a8887cad4e2270c5173451e8b706b907b88133
diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
index fead87d..80f6716 100644
--- a/drivers/infiniband/ulp/iser/Kconfig
+++ b/drivers/infiniband/ulp/iser/Kconfig
@@ -1,11 +1,12 @@
 config INFINIBAND_ISER
-	tristate "ISCSI RDMA Protocol"
+	tristate "iSCSI Extensions for RDMA (iSER)"
 	depends on INFINIBAND && SCSI
 	select SCSI_ISCSI_ATTRS
 	---help---
-	  Support for the ISCSI RDMA Protocol over InfiniBand.  This
-	  allows you to access storage devices that speak ISER/ISCSI
+	  Support for the iSCSI Extensions for RDMA (iSER) Protocol over InfiniBand. This
+	  allows you to access storage devices that speak iSCSI over iSER
 	  over InfiniBand.
 
-	  The ISER protocol is defined by IETF.
-	  See <http://www.ietf.org/>.
+	  The iSER protocol is defined by IETF.
+	  See <http://www.ietf.org/internet-drafts/draft-ietf-ips-iser-05.txt>
+	  and <http://www.infinibandta.org/members/spec/iser_annex_060418.pdf>
-- 
1.2.6


From erezz at voltaire.com  Wed Sep 27 06:58:02 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Wed, 27 Sep 2006 16:58:02 +0300 (IDT)
Subject: [openib-general] [PATCH 1/3] IB/iser: have iSER data transaction
 object pointing to iSER conn
In-Reply-To: <Pine.LNX.4.44.0609271518230.20024-100000@hydrus>
Message-ID: <Pine.LNX.4.44.0609271655420.20024-100000@hydrus>

(This patch may be a duplicate. Something went wrong with my previous 
mail.)

iSER uses a data transaction object (struct iser_dto) as part
of its IB data descriptor (struct iser_desc) management.
It also uses a hierarchy of connection structures pointing to
each other. A DTO may outlive the iscsi_iser connection it
points to (e.g. one bound to a posted receive buffer that was
flushed by the IB HW). Hence DTOs need to point to the lowest
connection in the hierarchy, which is struct iser_conn.

Signed-off-by: Erez Zilber <erezz at voltaire.com>

---

 drivers/infiniband/ulp/iser/iscsi_iser.c     |    2 ++
 drivers/infiniband/ulp/iser/iscsi_iser.h     |    2 +-
 drivers/infiniband/ulp/iser/iser_initiator.c |   11 ++++++-----
 drivers/infiniband/ulp/iser/iser_verbs.c     |    8 +++++---
 4 files changed, 14 insertions(+), 9 deletions(-)

57b132002a5e3bf3ba0ae362f174404e29c69449
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 101e407..b37f429 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -317,6 +317,8 @@ iscsi_iser_conn_destroy(struct iscsi_cls
 	struct iscsi_iser_conn *iser_conn = conn->dd_data;
 
 	iscsi_conn_teardown(cls_conn);
+	if (iser_conn->ib_conn)
+		iser_conn->ib_conn->iser_conn = NULL;
 	kfree(iser_conn);
 }
 
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 7c3d0c9..7f44636 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -187,7 +187,7 @@ struct iser_regd_buf {
 
 struct iser_dto {
 	struct iscsi_iser_cmd_task *ctask;
-	struct iscsi_iser_conn     *conn;
+	struct iser_conn *ib_conn;
 	int                        notify_enable;
 
 	/* vector of registered buffers */
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index ccf56f6..14ae61e 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -249,7 +249,7 @@ static int iser_post_receive_control(str
 	}
 
 	recv_dto = &rx_desc->dto;
-	recv_dto->conn          = iser_conn;
+	recv_dto->ib_conn = iser_conn->ib_conn;
 	recv_dto->regd_vector_len = 0;
 
 	regd_hdr = &rx_desc->hdr_regd_buf;
@@ -296,7 +296,7 @@ static void iser_create_send_desc(struct
 	regd_hdr->virt_addr  = tx_desc; /* == &tx_desc->iser_header */
 	regd_hdr->data_size  = ISER_TOTAL_HEADERS_LEN;
 
-	send_dto->conn          = iser_conn;
+	send_dto->ib_conn         = iser_conn->ib_conn;
 	send_dto->notify_enable   = 1;
 	send_dto->regd_vector_len = 0;
 
@@ -588,7 +588,7 @@ void iser_rcv_completion(struct iser_des
 			 unsigned long dto_xfer_len)
 {
 	struct iser_dto        *dto = &rx_desc->dto;
-	struct iscsi_iser_conn *conn = dto->conn;
+	struct iscsi_iser_conn *conn = dto->ib_conn->iser_conn;
 	struct iscsi_session *session = conn->iscsi_conn->session;
 	struct iscsi_cmd_task *ctask;
 	struct iscsi_iser_cmd_task *iser_ctask;
@@ -641,7 +641,8 @@ void iser_rcv_completion(struct iser_des
 void iser_snd_completion(struct iser_desc *tx_desc)
 {
 	struct iser_dto        *dto = &tx_desc->dto;
-	struct iscsi_iser_conn *iser_conn = dto->conn;
+	struct iser_conn       *ib_conn = dto->ib_conn;
+	struct iscsi_iser_conn *iser_conn = ib_conn->iser_conn;
 	struct iscsi_conn      *conn = iser_conn->iscsi_conn;
 	struct iscsi_mgmt_task *mtask;
 
@@ -652,7 +653,7 @@ void iser_snd_completion(struct iser_des
 	if (tx_desc->type == ISCSI_TX_DATAOUT)
 		kmem_cache_free(ig.desc_cache, tx_desc);
 
-	atomic_dec(&iser_conn->ib_conn->post_send_buf_count);
+	atomic_dec(&ib_conn->post_send_buf_count);
 
 	write_lock(conn->recv_lock);
 	if (conn->suspend_tx) {
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 72febf1..11d4e87 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -570,6 +570,8 @@ void iser_conn_release(struct iser_conn 
 	/* on EVENT_ADDR_ERROR there's no device yet for this conn */
 	if (device != NULL)
 		iser_device_try_release(device);
+	if (ib_conn->iser_conn)
+		ib_conn->iser_conn->ib_conn = NULL;
 	kfree(ib_conn);
 }
 
@@ -692,7 +694,7 @@ int iser_post_recv(struct iser_desc *rx_
 	struct iser_dto   *recv_dto = &rx_desc->dto;
 
 	/* Retrieve conn */
-	ib_conn = recv_dto->conn->ib_conn;
+	ib_conn = recv_dto->ib_conn;
 
 	iser_dto_to_iov(recv_dto, iov, 2);
 
@@ -725,7 +727,7 @@ int iser_post_send(struct iser_desc *tx_
 	struct iser_conn  *ib_conn;
 	struct iser_dto   *dto = &tx_desc->dto;
 
-	ib_conn = dto->conn->ib_conn;
+	ib_conn = dto->ib_conn;
 
 	iser_dto_to_iov(dto, iov, MAX_REGD_BUF_VECTOR_LEN);
 
@@ -772,7 +774,7 @@ static void iser_comp_error_worker(void 
 static void iser_handle_comp_error(struct iser_desc *desc)
 {
 	struct iser_dto  *dto     = &desc->dto;
-	struct iser_conn *ib_conn = dto->conn->ib_conn;
+	struct iser_conn *ib_conn = dto->ib_conn;
 
 	iser_dto_buffs_release(dto);
 
-- 
1.2.6


From erezz at voltaire.com  Wed Sep 27 07:02:49 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Wed, 27 Sep 2006 17:02:49 +0300 (IDT)
Subject: [openib-general] [PATCH 2/3] IB/iser: dma unmap an unaligned for
 rdma data before touching it
In-Reply-To: <Pine.LNX.4.44.0609271518230.20024-100000@hydrus>
Message-ID: <Pine.LNX.4.44.0609271700440.20024-100000@hydrus>

(This patch may be a duplicate. Something went wrong with my previous 
mail.)

iSER uses the dma mapping api to map the page holding the
scsi command data to the hca dma address space. When the
command data is not aligned for rdma, the data is copied
to/from an allocated buffer which in turn is used for
executing this command. The pages associated with the
command must be unmapped before being touched.

Signed-off-by: Erez Zilber <erezz at voltaire.com>

---

 drivers/infiniband/ulp/iser/iscsi_iser.h     |    7 ++++
 drivers/infiniband/ulp/iser/iser_initiator.c |   49 +++++---------------------
 drivers/infiniband/ulp/iser/iser_memory.c    |   42 ++++++++++++++++++++++
 3 files changed, 59 insertions(+), 39 deletions(-)

78a237418bd3547cfeb49828a8b857ac5241749f
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 7f44636..4a7069f 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -350,4 +350,11 @@ int  iser_post_send(struct iser_desc *tx
 
 int iser_conn_state_comp(struct iser_conn *ib_conn,
 			 enum iser_ib_conn_state comp);
+
+int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask,
+			    struct iser_data_buf       *data,
+			    enum   iser_data_dir       iser_dir,
+			    enum   dma_data_direction  dma_dir);
+
+void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask);
 #endif
diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index 14ae61e..9b3d79c 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -66,42 +66,6 @@ static void iser_dto_add_regd_buff(struc
 	dto->regd_vector_len++;
 }
 
-static int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask,
-				  struct iser_data_buf       *data,
-				  enum   iser_data_dir       iser_dir,
-				  enum   dma_data_direction  dma_dir)
-{
-	struct device *dma_device;
-
-	iser_ctask->dir[iser_dir] = 1;
-	dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device;
-
-	data->dma_nents = dma_map_sg(dma_device, data->buf, data->size, dma_dir);
-	if (data->dma_nents == 0) {
-		iser_err("dma_map_sg failed!!!\n");
-		return -EINVAL;
-	}
-	return 0;
-}
-
-static void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask)
-{
-	struct device  *dma_device;
-	struct iser_data_buf *data;
-
-	dma_device = iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device;
-
-	if (iser_ctask->dir[ISER_DIR_IN]) {
-		data = &iser_ctask->data[ISER_DIR_IN];
-		dma_unmap_sg(dma_device, data->buf, data->size, DMA_FROM_DEVICE);
-	}
-
-	if (iser_ctask->dir[ISER_DIR_OUT]) {
-		data = &iser_ctask->data[ISER_DIR_OUT];
-		dma_unmap_sg(dma_device, data->buf, data->size, DMA_TO_DEVICE);
-	}
-}
-
 /* Register user buffer memory and initialize passive rdma
  *  dto descriptor. Total data size is stored in
  *  iser_ctask->data[ISER_DIR_IN].data_len
@@ -699,14 +663,19 @@ void iser_ctask_rdma_init(struct iscsi_i
 void iser_ctask_rdma_finalize(struct iscsi_iser_cmd_task *iser_ctask)
 {
 	int deferred;
+	int is_rdma_aligned = 1;
 
 	/* if we were reading, copy back to unaligned sglist,
 	 * anyway dma_unmap and free the copy
 	 */
-	if (iser_ctask->data_copy[ISER_DIR_IN].copy_buf != NULL)
+	if (iser_ctask->data_copy[ISER_DIR_IN].copy_buf != NULL) {
+		is_rdma_aligned = 0;
 		iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_IN);
-	if (iser_ctask->data_copy[ISER_DIR_OUT].copy_buf != NULL)
+	}
+	if (iser_ctask->data_copy[ISER_DIR_OUT].copy_buf != NULL) {
+		is_rdma_aligned = 0;
 		iser_finalize_rdma_unaligned_sg(iser_ctask, ISER_DIR_OUT);
+	}
 
 	if (iser_ctask->dir[ISER_DIR_IN]) {
 		deferred = iser_regd_buff_release
@@ -726,7 +695,9 @@ void iser_ctask_rdma_finalize(struct isc
 		}
 	}
 
-	iser_dma_unmap_task_data(iser_ctask);
+       /* if the data was unaligned, it was already unmapped and then copied */
+       if (is_rdma_aligned)
+		iser_dma_unmap_task_data(iser_ctask);
 }
 
 void iser_dto_buffs_release(struct iser_dto *dto)
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 31950a5..0f87163 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -360,6 +360,44 @@ static void iser_page_vec_build(struct i
 	}
 }
 
+int iser_dma_map_task_data(struct iscsi_iser_cmd_task *iser_ctask,
+			    struct iser_data_buf       *data,
+			    enum   iser_data_dir       iser_dir,
+			    enum   dma_data_direction  dma_dir)
+{
+	struct device *dma_device;
+
+	iser_ctask->dir[iser_dir] = 1;
+	dma_device =
+		iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device;
+
+	data->dma_nents = dma_map_sg(dma_device, data->buf, data->size, dma_dir);
+	if (data->dma_nents == 0) {
+		iser_err("dma_map_sg failed!!!\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+
+void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask)
+{
+	struct device  *dma_device;
+	struct iser_data_buf *data;
+
+	dma_device =
+		iser_ctask->iser_conn->ib_conn->device->ib_device->dma_device;
+
+	if (iser_ctask->dir[ISER_DIR_IN]) {
+		data = &iser_ctask->data[ISER_DIR_IN];
+		dma_unmap_sg(dma_device, data->buf, data->size, DMA_FROM_DEVICE);
+	}
+
+	if (iser_ctask->dir[ISER_DIR_OUT]) {
+		data = &iser_ctask->data[ISER_DIR_OUT];
+		dma_unmap_sg(dma_device, data->buf, data->size, DMA_TO_DEVICE);
+	}
+}
+
 /**
  * iser_reg_rdma_mem - Registers memory intended for RDMA,
  * obtaining rkey and va
@@ -382,6 +420,10 @@ int iser_reg_rdma_mem(struct iscsi_iser_
 		iser_err("rdma alignment violation %d/%d aligned\n",
 			 aligned_len, mem->size);
 		iser_data_buf_dump(mem);
+
+		/* unmap the command data before accessing it */
+		iser_dma_unmap_task_data(iser_ctask);
+
 		/* allocate copy buf, if we are writing, copy the */
 		/* unaligned scatterlist, dma map the copy        */
 		if (iser_start_rdma_unaligned_sg(iser_ctask, cmd_dir) != 0)
-- 
1.2.6


From RAISCH at de.ibm.com  Wed Sep 27 07:18:23 2006
From: RAISCH at de.ibm.com (Christoph Raisch)
Date: Wed, 27 Sep 2006 16:18:23 +0200
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <OFD83F631F.331BD210-ON872571F5.00777082-882571F5.0077A568@us.ibm.com>
Message-ID: <OF70336D84.1F4AFA38-ONC12571F6.004D7056-C12571F6.004E273C@de.ibm.com>

> Roland,
>
> > Do you know how ehca behaves?  Does it have that race?  ie what
> > happens in this situation:
> >
> >     poll CQ -> CQ is empty
> >         (new completion is added to CQ)
> >     request notify on CQ
> >         (no more completions are added)
> >
> > Mellanox HCAs will generate a CQ event in this case, although it's not
> > strictly required by the IB spec.  How will ehca behave?
> >
> >  - R.
>
> That could be the reason. I did see mthca poll empty entry, but not
> on ehca. I will confirm this with ehca team.
>
> Thanks
> Shirley Ma

It's possible that a race will happen between the interrupt handler, the
poll routine and the hardware.
By doing a

     poll CQ -> CQ is empty
         (new completion is added to CQ)
     request notify on CQ
         (no more completions are added)
     poll one more time

you should be on the safe side.
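
The poll / request-notify / poll-again sequence above can be sketched as a small self-contained C model. This is only an illustration under stated assumptions: struct mock_cq, mock_poll(), mock_req_notify() and drain_and_arm() are hypothetical names invented for this sketch, not the ib_verbs or ehca API, and the "late arrival" field simply simulates a completion that lands between the empty poll and the notify request on hardware that raises no event for it.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a completion queue; NOT the real verbs API. */
struct mock_cq {
	int  pending;        /* completions waiting to be polled */
	bool armed;          /* set once a notification has been requested */
	int  late_arrivals;  /* completions that race with the notify request */
};

/* Consume one completion; returns 1 if one was found, 0 if the CQ is empty. */
static int mock_poll(struct mock_cq *cq)
{
	if (cq->pending == 0)
		return 0;
	cq->pending--;
	return 1;
}

/*
 * Arm the CQ. To model the race, any "late" completions are queued just
 * before arming, and no event is generated for them.
 */
static void mock_req_notify(struct mock_cq *cq)
{
	cq->pending += cq->late_arrivals;
	cq->late_arrivals = 0;
	cq->armed = true;
}

/* Drain the CQ, arm it, then poll once more to catch a racing completion. */
static int drain_and_arm(struct mock_cq *cq)
{
	int handled = 0;

	for (;;) {
		while (mock_poll(cq))
			handled++;

		mock_req_notify(cq);

		/* poll one more time: nothing found means it is safe to stop */
		if (!mock_poll(cq))
			break;

		/* a completion slipped in; count it, then drain and re-arm */
		handled++;
	}
	return handled;
}
```

With two completions queued and one late arrival, drain_and_arm() handles all three and leaves the queue empty and armed. Without the extra poll, the late completion would sit in the queue until some unrelated event arrived.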

Gruss / Regards . . . Christoph Raisch


From moshek at voltaire.com  Wed Sep 27 07:42:41 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Wed, 27 Sep 2006 17:42:41 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64 and whendriver
 is not loaded on AMD
Message-ID: <D4F8F0B3820E754C887699BEF26A8940EB85DF@taurus.voltaire.com>

Michael,
 
Frank's new version was tested once more at Voltaire and works OK.
I tested  `./mstflint -d <lspci output> q`  both with the drivers loaded and
with the drivers not loaded. In all cases it worked OK.
 
The test was performed on the following environments:
 
-    IBM js21 ppc64 sles10 PCI-E
-    IBM js21 ppc64 sles9 sp3 PCI-E
-    IBM hs21 em64t redhat as 4 u3 PCI-E
-    IBM hs21 em64t sles 9 sp3 PCI-E
-    x86_64 sles10  PCI-E
-    MAC ppc64 sles10 PCI-X
-    MAC ppc64 sles10 PCI-E
 
Please consider including the patch in OFED.
 
Moshe
 
 
____________________________________________________________

Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)

 
Voltaire - The Grid Backbone

 
 www.voltaire.com <http://www.voltaire.com/> 

<mailto:g at voltaire.com> 

  
			-----Original Message-----
			From: Tseng-hui Lin [mailto:tsenglin at us.ibm.com]
			Sent: Monday, September 25, 2006 7:49 AM
			To: Moshe Kazir
			Subject: Re: FW: [openib-general] Mstflint - not working on ppc64 and whendriver is not loaded on AMD
			

			Moshe:
			
			This version should take care of the problem of mstflint not
			working on ppc64 and x86_64 without the mthca driver. I have
			tried it on my JS21 and PowerMAC G5 with PCIe HCAs and on an
			x86_64 blade with a PCI-X HCA. My change rewrites the mopen()
			function. Even though the change is not really big, it makes
			the OpenFabrics maintainer Michael uncomfortable integrating
			it into OFED. In this version I have made it easy to disable
			my changes on machines we have not tested. Hopefully Michael
			will pick the change up this time. The attached patch is made
			against the mstflint in OFED-1.1r6. I also attached the tar-gz
			in case you don't like to use patch. Please try it on the
			x86_64 machines that failed previously. Thanks.
			
			(See attached file: mstflint.patch)
			(See attached file: mstflint.tgz)
			
			Tseng-Hui (Frank) Lin, PhD
			Dept 7UEA, LTC Linux OS - Yellow, eServer I/O (Go Orange!)
			Building 902-6B007, 11501 Burnet Road, Austin, TX 78758
			Phone: (512)838-8312 T/L: 678-8312
			FAX: (512)838-8858 T/L: 678-8358
			e-mail: tsenglin at us.ibm.com
			
			"Moshe Kazir" <moshek at voltaire.com>
			09/19/2006 01:28 AM
			
			To: Tseng-hui Lin/Austin/IBM at IBMUS
			cc:
			Subject: FW: [openib-general] Mstflint - not working on ppc64 and whendriver is not loaded on AMD
			
			Frank,
			
			1. I sent you the latest OFED 1.0 mstflint and, as far as I
			know, it has not changed.
			
			2. You may download the latest OFED 1.1 release, OFED-1.1-RC5
			(see attached message). The most up-to-date mstflint directory
			is located in SOURCE/openib-1.1.tgz
			
			Moshe
	
____________________________________________________________
			Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
			
			Voltaire - The Grid Backbone
			
			www.voltaire.com
			
			
			-----Original Message-----
			From: Michael S. Tsirkin [mailto:mst at mellanox.co.il]
			Sent: Monday, September 18, 2006 8:22 PM
			To: Tseng-Hui (Frank) Lin
			Cc: Moshe Kazir; Or Gerlitz; Tseng-hui Lin; openib-general at openib.org
			Subject: Re: [openib-general] Mstflint - not working on ppc64 and whendriver is not loaded on AMD
			
			
			Quoting r. Tseng-Hui (Frank) Lin <thlin at us.ibm.com>:
			>     You mentioned "your version" of mstflint. Is that a different one
			> from the one in OFED-1.0? If it is, would you mind sending me a copy
			> of your version so that I can play with it as well? Thanks.
			
			Just the one in svn trunk/OFED 1.1 RC.
			
			-- 
			MST 
			
			----- Message from <vlad at dev.mellanox.co.il> on Thu, 14 Sep 2006 19:39:16 +0300 -----
			To: <openfabrics-ewg at openib.org>
			cc: <openib-general at openib.org>
			Subject: [openfabrics-ewg] OFED-1.1-RC5 is ready
			
			Hi,
			
			OFED-1.1-rc5 is available on
			https://openib.org/svn/gen2/branches/1.1/ofed/releases/
			File: OFED-1.1-rc5.tgz
			Please report any issues in bugzilla http://openib.org/bugzilla/
			
			
			Release details:
			================
			Build_id:
			
			OFED-1.1-rc5
			
			openib-1.1 (REV=9485)
			# User space
			https://openib.org/svn/gen2/branches/1.1/src/userspace
			Git: git://www.mellanox.co.il/~git/infinibandref: refs/heads/ofed_1_1
			commit 18c1cb87c4b16f1a1577807077bbdcba3f446f09
			
			# MPI
			mpi_osu-0.9.7-mlx2.2.0.tgz
			openmpi-1.1.1-1.src.rpm
			mpitests-2.0-0.src.rpm
			
			OS support:
			===========
			Novell:
			   - SLES 9.0 SP3
			   - SLES10
			Redhat:
			   - Redhat EL4 up3
			
			   - Redhat EL4 up4
			kernel.org:
			   - Kernel 2.6.17
			
			
			Bug fixes from OFED-1.1-rc4:
			==========================
			1. ISER compilation fixed on SLES10
			2. Fixed build on SLES9 PPC64
			3. Updated libehca
			4. OpenSM fixes
			5. Added tavor_quirk option to rdma_cm module (disabled by default):
			   Tavor performance quirk: limit MTU to 1K if > 0 (int)
			
			Known issues:
			=============
			libipathverbs compilation fails on SLES10 (Bug:204)
			
			
			OFED-1.1-rc6 (hopefully the last one) is planned to be released on
			Monday or Tuesday.
			
			
			Regards,
			Vladimir
			
			
			> Hi,
			>
			> The plan is to issue OFED RC5 on Thursday 9/14 and final release next
			> week. I am aware of the following issues:
			>
			>
			> 1) Compilation on SLES9 on PPC     - Jack Morgenstein
			> 2) Huge pages on PPC               - Eli Cohen
			> 3) libipathverbs:                  - Qlogic
			>             a) libipathverbs ABI issue
			>             b) libipathverbs build on SLES10
			> 4) SDP performance on Tavor        - Michael Tsirkin
			> 5) iSER issue on SLES10            - Voltaire
			>
			>
			> In order to meet tomorrow's RC5 release all owners please send your
			> patches by end of today.
			>
			>
			> Regards,
			>
			>     Aviram
			>
			>
			> _______________________________________________
			> openfabrics-ewg mailing list
			> openfabrics-ewg at openib.org
			> http://openib.org/mailman/listinfo/openfabrics-ewg
			>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mstflint.patch
Type: application/octet-stream
Size: 6903 bytes
Desc: mstflint.patch
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060927/d8d1a026/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mstflint.tgz
Type: application/octet-stream
Size: 47588 bytes
Desc: mstflint.tgz
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060927/d8d1a026/attachment-0001.obj>

From vuhuong at mellanox.com  Wed Sep 27 08:16:37 2006
From: vuhuong at mellanox.com (Vu Pham)
Date: Wed, 27 Sep 2006 08:16:37 -0700
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
In-Reply-To: <20060927071059.GA21509@mellanox.co.il>
References: <20060926144541.GA17938@mellanox.co.il>
	<4519E86D.9030508@mellanox.com> <20060927071059.GA21509@mellanox.co.il>
Message-ID: <451A95D5.7060409@mellanox.com>

Michael S. Tsirkin wrote:
> Quoting r. Vu Pham <vuhuong at mellanox.com>:
> 
>>Either you can use multiple channels or derive different 
>>initiator_port_ID in the login req to have multiple paths on 
>>the same physical port
> 
> 
> So how about we just stick a pointer inside the identifier extension
> instead of enabling multichannel?
> 

That's the simple change. Besides that, you have to maintain a 
list of connections/channels connected to the same target, 
manage and clean up the resources associated with these 
connections, and handle error recovery, especially target 
reset and host reset...

What is the advantage of having multiple connections/QPs on 
the same physical port to the same target? The disadvantages 
are wasted resources, instability, no fail-over on physical 
port error...


From mst at mellanox.co.il  Wed Sep 27 08:19:16 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 27 Sep 2006 18:19:16 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64 and
 whendriver is not loaded on AMD
In-Reply-To: <D4F8F0B3820E754C887699BEF26A8940EB85DF@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A8940EB85DF@taurus.voltaire.com>
Message-ID: <20060927151916.GB26351@mellanox.co.il>

Quoting r. Moshe Kazir <moshek at voltaire.com>:
> Subject: FW: [openib-general] Mstflint - not working on ppc64 and whendriver is not loaded on AMD
> 
> Michael,
>  
> Frank's new version was tested once more at Voltaire and is working OK.
> I tested  `./mstflint -d <lspci output> q`  when drivers are loaded and when drivers are not loaded. in all cases it worked o.k.

Thanks for testing, but I'd like to get a handle on what's going on first.

First, I'm pretty sure when driver is loaded things work OK on all systems.
When driver is not loaded - could you please answer whether using
/sys/bus/pci/devices/0000\:03\:00.0/resource0
works for you (on systems that have resource0)?

>  
> Test was performed on the following environments:
>  
> -    IBM js21 ppc64 sles10 PCI-E
> -    IBM js21 ppc64 sles9 sp3 PCI-E
> -    IBM hs21 em64t redhat as 4 u3 PCI-E
> -    IBM hs21 em64t sles 9 sp3 PCI-E
> -    x86_64 sles10  PCI-E
> -    MAC ppc64 sles10 PCI-X
> -    MAC ppc64 sles10 PCI-E
>
> Please consider inserting the patch to OFED .
>  
> Moshe

Since I don't consider this a critical fix (there's no reason driver won't go
up, and if it does not, there's a simple workaround by specifying the /proc
interface, that is slower but works), I don't think this should go into OFED 1.1.

Unfortunately, I never got a small bugfix patch against the latest mstflint -
the patch I saw posted touches all kinds of things all over the code -
so I can't insert it in trunk, either.

-- 
MST


From michaelc at cs.wisc.edu  Wed Sep 27 09:13:50 2006
From: michaelc at cs.wisc.edu (Mike Christie)
Date: Wed, 27 Sep 2006 11:13:50 -0500
Subject: [openib-general] [PATCH 1/3] IB/iser: have iSER data
 transaction object pointing to iSER conn
In-Reply-To: <Pine.LNX.4.44.0609271521540.20024-100000@hydrus>
References: <Pine.LNX.4.44.0609271521540.20024-100000@hydrus>
Message-ID: <451AA33E.2050009@cs.wisc.edu>

Erez Zilber wrote:
> iSER uses a data transaction object (struct iser_dto) as part
> of its IB data descriptors (struct iser_desc) management.
> It also uses a hierarchy of connection structures pointing to
> each other. A DTO may exist even after the iscsi_iser connection
> pointed by it is destructed (eg one that is bounded to post
> receive buffer which was flushed by the IB HW). Hence DTOs need
> point to the lowest connection, which is struct iser_conn.
> 
> Signed-off-by: Erez Zilber <erezz at voltaire.com>
> 

Both look fine to me.

One question not really related to your patches. How much work would you
guys have to do to iscsi_iser to support bi directional commands?


From vuhuong at mellanox.com  Wed Sep 27 09:21:01 2006
From: vuhuong at mellanox.com (Vu Pham)
Date: Wed, 27 Sep 2006 09:21:01 -0700
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
In-Reply-To: <451A95D5.7060409@mellanox.com>
References: <20060926144541.GA17938@mellanox.co.il>
	<4519E86D.9030508@mellanox.com> <20060927071059.GA21509@mellanox.co.il>
	<451A95D5.7060409@mellanox.com>
Message-ID: <451AA4ED.7010501@mellanox.com>

Vu Pham wrote:
> Michael S. Tsirkin wrote:
> 
>>Quoting r. Vu Pham <vuhuong at mellanox.com>:
>>
>>
>>>Either you can use multiple channels or derive different 
>>>initiator_port_ID in the login req to have multiple paths on 
>>>the same physical port
>>
>>
>>So how about we just stick a pointer inside the identifier extension
>>instead of enabling multichannel?
>>
> 
> 
> That's the simple change. Besides that, you have to maintain a 
> list of connections/channels connected to the same target, 
> manage and clean up the resources associated with these 
> connections, and handle error recovery, especially target 
> reset and host reset...
> 
> What is the advantage of having multiple connections/QPs on 
> the same physical port to the same target? The disadvantages 
> are wasted resources, instability, no fail-over on physical 
> port error...
> 

I see the limitation of the current srp implementation. If we 
have the following topology
host port 1 -- target port 1
host port 1 -- target port 2

the current srp implementation will use the same 
initiator_port_id for both login requests, and the target 
will reject the second login if you don't turn on 
SUPPORT_MULTI_CHANNEL.

Another way to solve this is to use a different 
initiator_port_id for each login, i.e.

path 1: initiator_port_ID{target_port1_GUID, 
initiator_port1_GUID} and target_port_ID{id_ext, ioc_guid}

path 2: 
initiator_port_ID{target_port2_GUID,initiator_port1_GUID} 
and target_port_ID

This will also guarantee the uniqueness of initiator_port_id 
in the fabric.
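
As an illustration only, the two-path scheme above could be sketched in user-space C as follows. The helper names put_be64 and build_initiator_port_id are hypothetical and not part of ib_srp; placing the target port GUID in the first 8 bytes and storing both GUIDs big-endian are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value big-endian (network order) into out[0..7]. */
static void put_be64(uint8_t *out, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        out[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/*
 * Hypothetical helper: fill a 16-byte initiator port ID with the
 * target port GUID in bytes 0..7 and the initiator port GUID in
 * bytes 8..15. The layout is an assumption for this sketch.
 */
static void build_initiator_port_id(uint8_t id[16],
                                    uint64_t target_port_guid,
                                    uint64_t initiator_port_guid)
{
    assert(id != NULL);
    put_be64(id, target_port_guid);
    put_be64(id + 8, initiator_port_guid);
}
```

With the same initiator port GUID and two different target port GUIDs, the two resulting IDs differ only in their first 8 bytes, so each login would present a distinct initiator port ID.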


From ishai at dev.mellanox.co.il  Wed Sep 27 09:42:33 2006
From: ishai at dev.mellanox.co.il (ishai at dev.mellanox.co.il)
Date: Wed, 27 Sep 2006 19:42:33 +0300 (IDT)
Subject: [openib-general] 90-ib.rules incorrect?
In-Reply-To: <eefa625e0609261632y153042ecq587b0fae8a05c45f@mail.gmail.com>
References: <eefa625e0609261632y153042ecq587b0fae8a05c45f@mail.gmail.com>
Message-ID: <16725.194.90.237.34.1159375353.squirrel@dev.mellanox.co.il>

In early versions of udev the syntax was different: it used (=)
and not (==).
RHEL4, for example, still uses such an old version of udev.

Apparently the new udev versions (used for example in SLES10) still
support the old syntax.

So this way we can have one file that suits both udev versions.

Ishai


> Isn't the format of 90-ib.rules in
> https://openfabrics.org/svn/gen2/trunk/ofed/openib/scripts/90-ib.rules incorrect.
>
> We have
>
> KERNEL="umad*", NAME="infiniband/%k", which should be
> KERNEL=="umad*", NAME="infiniband/%k"
>
> Am I missing something?
>
> Eugene


From thlin at us.ibm.com  Wed Sep 27 09:45:59 2006
From: thlin at us.ibm.com (Tseng-Hui (Frank) Lin)
Date: Wed, 27 Sep 2006 11:45:59 -0500
Subject: [openib-general] FW: Mstflint - not working on ppc64 and
 whendriver is not loaded on AMD
In-Reply-To: <20060927151916.GB26351@mellanox.co.il>
References: <D4F8F0B3820E754C887699BEF26A8940EB85DF@taurus.voltaire.com>
	<20060927151916.GB26351@mellanox.co.il>
Message-ID: <1159375559.21249.60.camel@flin.austin.ibm.com>

On Wed, 2006-09-27 at 18:19 +0300, Michael S. Tsirkin wrote:
> Quoting r. Moshe Kazir <moshek at voltaire.com>:
> > Subject: FW: [openib-general] Mstflint - not working on ppc64 and whendriver is not loaded on AMD
> > 
> > Michael,
> >  
> > Frank's new version was tested once more at Voltaire and is working OK.
> > I tested  `./mstflint -d <lspci output> q`  when drivers are loaded and when drivers are not loaded. in all cases it worked o.k.
> 
> Thanks for testing, but I'd like to get a handle on what's going on first.
> 
> First, I'm pretty sure when driver is loaded things work OK on all systems.
> When driver is not loaded - could you please answer whether using
> /sys/bus/pci/devices/0000\:03\:00.0/resource0
> works for you (on systems that have resource0)?
> 

It doesn't work.

> >  
> > Test was performed on the following environments:
> >  
> > -    IBM js21 ppc64 sles10 PCI-E
> > -    IBM js21 ppc64 sles9 sp3 PCI-E
> > -    IBM hs21 em64t redhat as 4 u3 PCI-E
> > -    IBM hs21 em64t sles 9 sp3 PCI-E
> > -    x86_64 sles10  PCI-E
> > -    MAC ppc64 sles10 PCI-X
> > -    MAC ppc64 sles10 PCI-E
> >
> > Please consider inserting the patch to OFED .
> >  
> > Moshe
> 
> Since I don't consider this a critical fix (there's no reason driver won't go
> up, and if it does not, there's a simple workaround by specifying the /proc
> interface, that is slower but works), I don't think this should go into OFED 1.1.
> 
> Unfortunately, I never got a small bugfix patch against the latest mstflint -
> the patch I saw posted touches all kinds of things all over the code -
> so I can't insert it in trunk, either.
> 

I agree this is not critical. The patch changes nothing but the way of
opening the device.

On some ppc64 and x86_64 machines, the I/O memory mapped by mmap() is
not accessible (reads return 0xFFFFFFFF) unless kernel code (usually the
device driver) does an ioremap. This is why mmap of resource0 does not work
on these machines. I am not aware of any way to do an ioremap from
user space code like mstflint. The only thing I can think of is to fall
back to the config space file in /proc/bus/pci/.

The (big) patch I made checks whether the faster way (mmap resource0) works.
If it doesn't, the patch tries the other, slower ways and uses the fastest
working one it can find. That's all the patch does. It is not a big
fix. It just saves users the trouble of trying all possible ways of
opening a device manually.
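
The fallback described above can be sketched roughly as below. This is not the code from the patch: struct access_method, pick_access_method and the two stub probes are hypothetical names, and treating an all-ones read as "not accessible" is an assumption taken from the 0xFFFFFFFF symptom mentioned above (a real register could legitimately hold that value).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One way of opening the device. probe() reads one 32-bit word and
 * returns 0 on success or -1 if the method cannot be used at all. */
struct access_method {
    const char *name;
    int (*probe)(uint32_t *value);
};

/*
 * Walk the methods in the order given (fastest first) and return the
 * index of the first one whose probe succeeds without reading back
 * all ones. Returns -1 if no method works.
 */
static int pick_access_method(const struct access_method *methods,
                              size_t count)
{
    assert(methods != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint32_t value = 0;

        if (methods[i].probe(&value) == 0 && value != 0xFFFFFFFFu)
            return (int)i;
    }
    return -1;
}

/* Stub: behaves like an mmap of resource0 with no ioremap behind it. */
static int probe_unmapped_mmio(uint32_t *value)
{
    *value = 0xFFFFFFFFu;
    return 0;
}

/* Stub: behaves like a slower but working config-space read. */
static int probe_config_space(uint32_t *value)
{
    *value = 0x12345678u; /* arbitrary placeholder data */
    return 0;
}
```

Listing the methods fastest first means the first working entry is also the fastest working one, which matches the selection rule described above.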

I understand applying a big patch is risky unless it can be thoroughly
tested. Unfortunately, no one has all the machines to test the patch.
Moshe and I have tested the patch on Power MAC, Squadrons, JS20, and
JS21 (almost all living ppc64 machines) as well as a few x86_64
machines. We believe this patch is safe for these machines. The patch
can be enabled by defining CONFIG_MOPEN_FALL_BACK to 1.
CONFIG_MOPEN_FALL_BACK is defined to 1 for ppc64 and x86_64 and 0 for
others. We can enable this patch on other machines once people who have
those machines have tested it.

I agree this is not a critical patch, but it is a useful one. Moreover,
it is well tested on the machines with the patch enabled and changes
nothing on the machines with the patch disabled. I believe this is a
safe patch. Please reconsider adding it. Thanks.


From ishai at mellanox.co.il  Wed Sep 27 10:00:06 2006
From: ishai at mellanox.co.il (Ishai Rabinovitz)
Date: Wed, 27 Sep 2006 20:00:06 +0300
Subject: [openib-general] [PATCH] IB/SRP: allowing multiple connections from
 taregt to initiator
Message-ID: <20060927170006.GC32010@mellanox.co.il>


SRP High Availability should enable an initiator to connect to the same target
several times, e.g., once from each IB port of the target.

Some targets do not support multichannel. In order to work with them as well
we will use another identifier_extension to the initiator port for each target
connection.

Signed-off-by: Ishai Rabinovitz <ishai at mellanox.co.il>

---

I think this is the best solution. It allows users to use all four physical
connections from the initiator to target.

It also allows users to have several logical connections on one physical
connection (if they want connections with different attributes, for example
a different max_cmd_per_lun).

It is SRP spec compliant.

I also added a module param, so it is possible to turn this option off.

Index: latest/drivers/infiniband/ulp/srp/ib_srp.c
===================================================================
--- latest.orig/drivers/infiniband/ulp/srp/ib_srp.c	2006-09-27 10:36:13.000000000 +0300
+++ latest/drivers/infiniband/ulp/srp/ib_srp.c	2006-09-27 16:48:12.000000000 +0300
@@ -85,6 +85,13 @@ MODULE_PARM_DESC(mellanox_workarounds,
 
 static const u8 mellanox_oui[3] = { 0x00, 0x02, 0xc9 };
 
+static int variable_identifier_extension = 1;
+
+module_param(variable_identifier_extension, int, 0444);
+MODULE_PARM_DESC(variable_identifier_extension,
+		 "Use another identifier_extension on each connection to target"
+		 ", allows multichannel connection on all targets if != 0");
+
 static void srp_add_one(struct ib_device *device);
 static void srp_remove_one(struct ib_device *device);
 static void srp_completion(struct ib_cq *cq, void *target_ptr);
@@ -329,6 +336,7 @@ static int srp_send_req(struct srp_targe
 	req->priv.req_it_iu_len = cpu_to_be32(srp_max_iu_len);
 	req->priv.req_buf_fmt 	= cpu_to_be16(SRP_BUF_FORMAT_DIRECT |
 					      SRP_BUF_FORMAT_INDIRECT);
+
 	/*
 	 * In the published SRP specification (draft rev. 16a), the 
 	 * port identifier format is 8 bytes of ID extension followed
@@ -341,13 +349,23 @@ static int srp_send_req(struct srp_targe
 	if (target->io_class == SRP_REV10_IB_IO_CLASS) {
 		memcpy(req->priv.initiator_port_id,
 		       target->srp_host->initiator_port_id + 8, 8);
-		memcpy(req->priv.initiator_port_id + 8,
-		       target->srp_host->initiator_port_id, 8);
+		if (variable_identifier_extension)
+			memcpy(req->priv.initiator_port_id + 8,
+			       &target, sizeof target);
+		else
+			memcpy(req->priv.initiator_port_id + 8,
+			       target->srp_host->initiator_port_id, 8);
 		memcpy(req->priv.target_port_id,     &target->ioc_guid, 8);
 		memcpy(req->priv.target_port_id + 8, &target->id_ext, 8);
 	} else {
-		memcpy(req->priv.initiator_port_id,
-		       target->srp_host->initiator_port_id, 16);
+		if (variable_identifier_extension)
+			memcpy(req->priv.initiator_port_id,
+			       &target, sizeof target);
+		else
+			memcpy(req->priv.initiator_port_id,
+			       target->srp_host->initiator_port_id, 8);
+		memcpy(req->priv.initiator_port_id + 8,
+		       target->srp_host->initiator_port_id + 8, 8);
 		memcpy(req->priv.target_port_id,     &target->id_ext, 8);
 		memcpy(req->priv.target_port_id + 8, &target->ioc_guid, 8);
 	}
@@ -1823,7 +1841,8 @@ static struct srp_host *srp_add_port(str
 	host->dev  = device;
 	host->port = port;
 
-	host->initiator_port_id[7] = port;
+	if (!variable_identifier_extension)
+		host->initiator_port_id[7] = port;
 	memcpy(host->initiator_port_id + 8, &device->dev->node_guid, 8);
 
 	host->class_dev.class = &srp_class;
-- 
Ishai Rabinovitz


From mshefty at ichips.intel.com  Wed Sep 27 10:04:46 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 27 Sep 2006 10:04:46 -0700
Subject: [openib-general] oops after rmmod ib_cm when stopping iSER
In-Reply-To: <451A2E7E.8050504@voltaire.com>
References: <451A2E7E.8050504@voltaire.com>
Message-ID: <451AAF2E.1060602@ichips.intel.com>

Erez Zilber wrote:
> When stopping iSER, we run 'modprobe -r ib_iser'. Then, we see an oops 
> (below). In order to check which module caused that oops, I replaced the 
> 'modprobe -r' call with rmmod for each module:
> 
> rmmod ib_iser
> rmmod libiscsi
> rmmod scsi_transport_iscsi
> rmmod rdma_cm
> rmmod ib_addr
> rmmod ib_cm
> 
> If I wait a few seconds before the removal of ib_cm, everything is ok.

Thanks for the info.  My guess is that the cm_id's are not taking a reference on 
the cm devices, which is allowing the module unload to proceed while cm_id's 
still remain in timewait.  I will look at this in more detail and work on a 
patch.  How reproducible is this?
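
A minimal sketch of the reference-counting idea guessed at above, in plain user-space C. All names are hypothetical and this is not the ib_cm code; the counter is not atomic, and real kernel code would need atomic operations and would wait for the count to drop to zero rather than just test it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a CM device; not the kernel's structure. */
struct example_cm_device {
    int refcount; /* one reference per id still using the device */
};

struct example_cm_id {
    struct example_cm_device *dev;
};

/* Associating an id with a device pins the device. */
static void example_id_attach(struct example_cm_id *id,
                              struct example_cm_device *dev)
{
    id->dev = dev;
    dev->refcount++;
}

/* Called only once the id has fully left timewait. */
static void example_id_release(struct example_cm_id *id)
{
    assert(id->dev != NULL && id->dev->refcount > 0);
    id->dev->refcount--;
    id->dev = NULL;
}

/* Device teardown may proceed only when no id holds a reference. */
static bool example_device_can_remove(const struct example_cm_device *dev)
{
    return dev->refcount == 0;
}
```

Under this scheme an id that is still in timewait keeps the count above zero, so teardown would be held off until the id is released.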

- Sean


From geneing at gmail.com  Wed Sep 27 10:48:56 2006
From: geneing at gmail.com (EI)
Date: Wed, 27 Sep 2006 10:48:56 -0700
Subject: [openib-general] 90-ib.rules incorrect?
In-Reply-To: <16725.194.90.237.34.1159375353.squirrel@dev.mellanox.co.il>
References: <eefa625e0609261632y153042ecq587b0fae8a05c45f@mail.gmail.com>
	<16725.194.90.237.34.1159375353.squirrel@dev.mellanox.co.il>
Message-ID: <eefa625e0609271048y577042efsbe2207fbd66356a0@mail.gmail.com>

Ishai,

udev in OpenSuSE 10.2 alpha gives an error with the current rules file,
which uses (=).

Eugene

On 9/27/06, ishai at dev.mellanox.co.il <ishai at dev.mellanox.co.il> wrote:
>
> In early versions of udev the syntax was different: it used (=)
> and not (==).
> RHEL4, for example, still uses such an old version of udev.
>
> Apparently the new udev versions (used for example in SLES10) still
> support the old syntax.
>
> So this way we can have one file that suits both udev versions.
>
> Ishai
>
>
> > Isn't the format of 90-ib.rules in
> > https://openfabrics.org/svn/gen2/trunk/ofed/openib/scripts/90-ib.rules incorrect.
> >
> > We have
> >
> > KERNEL="umad*", NAME="infiniband/%k", which should be
> > KERNEL=="umad*", NAME="infiniband/%k"
> >
> > Am I missing something?
> >
> > Eugene

From weiny2 at llnl.gov  Wed Sep 27 10:54:57 2006
From: weiny2 at llnl.gov (Ira Weiny)
Date: Wed, 27 Sep 2006 10:54:57 -0700
Subject: [openib-general] 5 new diag tools.
Message-ID: <20060927105457.7c147e0e.weiny2@llnl.gov>

The included patch is for 5 new diag tools which I have written in perl.  I
started out building this to combine the information of the other tools to give
more information about the port counters.  Information like the specific source
and destination port, link speed, etc.  In the process I found that it was also
beneficial to limit the amount of information reported as some of the lower
level diags report so much information that some people simply dismiss the
tools as useless.

I hope they are useful.

Thanks,
Ira Weiny
weiny2 at llnl.gov

-------------- next part --------------
A non-text attachment was scrubbed...
Name: new-ib-diags.patch
Type: application/octet-stream
Size: 43797 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060927/49185789/attachment.obj>

From bgreen at nas.nasa.gov  Wed Sep 27 10:58:48 2006
From: bgreen at nas.nasa.gov (Bryan Green)
Date: Wed, 27 Sep 2006 10:58:48 -0700
Subject: [openib-general] mvapich2-gen2 svn - vapi <--> gen2 ??
Message-ID: <200609271758.k8RHwm1h000536@ece06.nas.nasa.gov>

Hello,
Regarding mvapich2-gen2 in the openib svn,
can an mvapich2 vapi build on one machine
communicate with a gen2 build on another?

-bryan


From mshefty at ichips.intel.com  Wed Sep 27 11:02:32 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 27 Sep 2006 11:02:32 -0700
Subject: [openib-general] Different byte order between gen1 CM and gen2
 CM ->RE: How to connect gen2 CM to gen1 IBGD CM?
In-Reply-To: <45097457.5020007@ichips.intel.com>
References: <B79FAF8BB536314E859EA1963CFFD222029AC450@wdtssmail01.eu.thmulti.com>
	<45097457.5020007@ichips.intel.com>
Message-ID: <451ABCB8.2010605@ichips.intel.com>

Sean Hefty wrote:
> The byte ordering in the kernel APIs are fairly clear about this, but that 
> documentation didn't carry up to userspace everywhere.  I will update the 
> userspace documentation, but it may take me a few weeks to get to this.

I've added some additional comments next to structure fields that are specified 
in network-byte order.  Hopefully this will help others avoid running into 
similar issues.

- Sean


From mshefty at ichips.intel.com  Wed Sep 27 11:12:28 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 27 Sep 2006 11:12:28 -0700
Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to
 unmatched DREQ
In-Reply-To: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com>
References: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com>
Message-ID: <451ABF0C.90607@ichips.intel.com>

Sean Hefty wrote:
> Currently a DREP is only sent in response to a DREQ if a connection
> has been found matching the DREQ, and it is in the proper state.  Once
> a DREP is sent, the local connection moves into timewait.  Duplicate
> DREQs received while in this state result in re-sending the DREP.
> 
> However, it's likely that the local connection will enter and exit
> timewait before the remote side times out a lost DREP and resends a DREQ.
> There are a couple possible solutions to this.  One is to increase how
> long a connection remains in timewait, by multiplying its wait time by
> max_cm_retries.  This can greatly increase the timewait state before a QP
> can be re-used when CM messages are not lost.
> 
> An alternative is to send a DREP in response to a DREQ, even if a local
> connection is not found, which is what this patch does.

If there are no objections, I will commit this patch to svn, and submit for 
inclusion upstream.
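
The behavior described in the patch summary can be sketched as a small decision function. The enum and function names are hypothetical and are not the ib_cm API; a NULL connection stands for "no matching connection found", and dropping the DREQ in any other state is an assumption based on the "proper state" wording above.

```c
#include <stddef.h>

/* Hypothetical connection states relevant to DREQ handling. */
enum example_conn_state { EX_ESTABLISHED, EX_TIMEWAIT, EX_OTHER };

/* What the receiver of a DREQ does next. */
enum example_action {
    EX_SEND_DREP_ENTER_TIMEWAIT, /* normal disconnect */
    EX_RESEND_DREP,              /* duplicate DREQ while in timewait */
    EX_SEND_DREP_NO_CONN,        /* new: no connection found, reply anyway */
    EX_DROP                      /* connection found but in the wrong state */
};

/*
 * Decide how to answer a DREQ. conn is NULL when no connection matches,
 * for example because it already left timewait before the remote side
 * retried its DREQ.
 */
static enum example_action example_on_dreq(const enum example_conn_state *conn)
{
    if (conn == NULL)
        return EX_SEND_DREP_NO_CONN;

    switch (*conn) {
    case EX_ESTABLISHED:
        return EX_SEND_DREP_ENTER_TIMEWAIT;
    case EX_TIMEWAIT:
        return EX_RESEND_DREP;
    default:
        return EX_DROP;
    }
}
```

The only change relative to the previous behavior is the NULL case, which avoids lengthening timewait for connections whose CM messages were not lost.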

- Sean


From robert.j.woodruff at intel.com  Wed Sep 27 11:37:35 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Wed, 27 Sep 2006 11:37:35 -0700
Subject: [openib-general] [openfabrics-ewg] OFED Status
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691CBBF25A@orsmsx418.amr.corp.intel.com>

Aviram wrote,
>Pending that IPoIB HA is solved, we would like to issue RC7, which is supposed to
>be final. Is everyone OK with this approach?


>Aviram

Sounds good,

What is the target date for RC7 ?  


From mshefty at ichips.intel.com  Wed Sep 27 11:44:22 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 27 Sep 2006 11:44:22 -0700
Subject: [openib-general] [PATCH] ucma : Encapsulate duplicate code to
 common routine
In-Reply-To: <20060919070206.5476.64107.sendpatchset@localhost.localdomain>
References: <20060919070206.5476.64107.sendpatchset@localhost.localdomain>
Message-ID: <451AC686.4040703@ichips.intel.com>

Krishna Kumar wrote:
> Encapsulate duplicate code to common routine - avoid checking same
> errors in multiple places.

I went back and forth on this, but ended up committing it, since it does 
slightly simplify maintenance.

- Sean


From mshefty at ichips.intel.com  Wed Sep 27 11:48:03 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 27 Sep 2006 11:48:03 -0700
Subject: [openib-general] [PATCH] id_priv_list->list is not initialized
 sometimes
In-Reply-To: <20060919070210.5476.68607.sendpatchset@localhost.localdomain>
References: <20060919070210.5476.68607.sendpatchset@localhost.localdomain>
Message-ID: <451AC763.208@ichips.intel.com>

Krishna Kumar wrote:
> rdma_listen could be called from a context where id_priv->list
> is not initialized. Then at a later stage, a cma_cancel_listen
> does a list_del() which could oops since this element is not
> on any list. 
> 
> Eg, in rdma_listen(), if id->device is !NULL, it calls
> cma_ib_listen() which doesn't add this id to any list. A
> cma_cancel_listen() will do a list_del.

I don't think this is needed.  cma_cancel_listens() is only called if the id is 
listening across multiple devices (and id->device is NULL).  See 
cma_cancel_operation().

- Sean


From xma at us.ibm.com  Wed Sep 27 12:34:05 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 27 Sep 2006 12:34:05 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <OF70336D84.1F4AFA38-ONC12571F6.004D7056-C12571F6.004E273C@de.ibm.com>
Message-ID: <OF327E628E.E2603118-ON872571F6.006B5266-882571F6.0052347D@us.ibm.com>


I have created a patch to monitor CQ. That wasn't the reason for
performance drop. I couldn't see any race from the output.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From rdreier at cisco.com  Wed Sep 27 13:53:04 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 27 Sep 2006 13:53:04 -0700
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
In-Reply-To: <451AA4ED.7010501@mellanox.com> (Vu Pham's message of
	"Wed, 27 Sep 2006 09:21:01 -0700")
References: <20060926144541.GA17938@mellanox.co.il>
	<4519E86D.9030508@mellanox.com> <20060927071059.GA21509@mellanox.co.il>
	<451A95D5.7060409@mellanox.com> <451AA4ED.7010501@mellanox.com>
Message-ID: <adar6xx821b.fsf@cisco.com>

Maybe we should just use the port GUID instead of the node GUID to
form the initiator ID?  That would solve this pretty cleanly I think.


From mshefty at ichips.intel.com  Wed Sep 27 14:02:36 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 27 Sep 2006 14:02:36 -0700
Subject: [openib-general] [PATCH] Fix freed mem deref race in
 cma_process_remove/cma_req_handler
In-Reply-To: <20060918073545.26067.41763.sendpatchset@localhost.localdomain>
References: <20060918073545.26067.41763.sendpatchset@localhost.localdomain>
Message-ID: <451AE6EC.3000301@ichips.intel.com>

Good catch.  Thanks - committed.

- Sean


From ishai at dev.mellanox.co.il  Wed Sep 27 14:08:11 2006
From: ishai at dev.mellanox.co.il (Ishai Rabinovitz)
Date: Thu, 28 Sep 2006 00:08:11 +0300 (IDT)
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
In-Reply-To: <adar6xx821b.fsf@cisco.com>
References: <20060926144541.GA17938@mellanox.co.il>
	<4519E86D.9030508@mellanox.com> <20060927071059.GA21509@mellanox.co.il>
	<451A95D5.7060409@mellanox.com> <451AA4ED.7010501@mellanox.com>
	<adar6xx821b.fsf@cisco.com>
Message-ID: <1106.89.1.173.135.1159391291.squirrel@dev.mellanox.co.il>

Roland Dreier wrote:
> Maybe we should just use the port GUID instead of the node GUID to
> form the initiator ID?  That would solve this pretty cleanly I think.


This is also Vu's idea.

There are two issues:

1) My patch allows a sophisticated user to have two logical connections on
the same physical connection. He can use different connection parameters
(e.g., MAX_CMD_PER_LUN) according to the application's needs.
 Do you think there is such a need?

2) In the current implementation there is a problem when there are two
connections on the same physical connection: when the second connection
sends a REQ to the target, the target sends a DREQ to the first connection,
but when someone tries to access the first scsi_host, ib_srp tries to
reconnect the first connection and then the second connection gets a DREQ,
and so the ping-pong goes on.
If there is a multipath daemon that checks the status of the
connections, this ping-pong can go on forever.
We need to find a way to eliminate this behavior.

Ishai


From mshefty at ichips.intel.com  Wed Sep 27 14:10:21 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 27 Sep 2006 14:10:21 -0700
Subject: [openib-general] [PATCH] fix cma_leave_mc_groups
In-Reply-To: <20060919070203.5476.17650.sendpatchset@localhost.localdomain>
References: <20060919070203.5476.17650.sendpatchset@localhost.localdomain>
Message-ID: <451AE8BD.9050203@ichips.intel.com>

Krishna Kumar wrote:
> - cma_leave_mc_groups can race with other routines updating
>   or reading the mclist, so use lock. Eg while doing a
>   rdma_destroy_id(), other processes could be looking at
>   this id and de-referencing mclist.

I don't think that there's an issue here.

The mc_list is only accessed by other direct API calls.  For example, 
rdma_join_multicast() or rdma_leave_multicast().  A user cannot call 
rdma_destroy_id() concurrently with other API calls.

- Sean


From rdreier at cisco.com  Wed Sep 27 14:20:21 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 27 Sep 2006 14:20:21 -0700
Subject: [openib-general] backporting fixes
In-Reply-To: <20060927040745.GI24009@mellanox.co.il> (Michael S.
	Tsirkin's message of "Wed, 27 Sep 2006 07:07:45 +0300")
References: <20060927040745.GI24009@mellanox.co.il>
Message-ID: <adamz8l80ru.fsf@cisco.com>

 > Now that  2.6.18 (with an additional patch) I looked at backporting bugfixes to
 > older kernels.  The main problem I see is that the neighbour destructor
 > interface change is not in 2.6.16, so IPoIB crashes randomly.
 > 
 > So approaches are
 > - Try to push the change into 2.6.16 by netdev
 > - Use the all-neighbour list as done by ofed
 > - Abandon the whole project
 > 
 > Ideas?

Unfortunately I don't think this bug is very amenable to being fixed
in a 2.6.16/-stable tree.  So the third solution is probably the best
we can do at this point.

 - R.


From rdreier at cisco.com  Wed Sep 27 14:24:13 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 27 Sep 2006 14:24:13 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159300894.11549.11.camel@stevo-desktop> (Steve Wise's
	message of "Tue, 26 Sep 2006 15:01:34 -0500")
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop>
Message-ID: <adairj980le.fsf@cisco.com>

Do we have to keep the kernel modules in svn limping along?  As time
goes on, I have less and less patience for double maintenance.

Oh well, since you provided the patch I'll apply it.

 - R.


From mshefty at ichips.intel.com  Wed Sep 27 14:39:57 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 27 Sep 2006 14:39:57 -0700
Subject: [openib-general] RDMA CM callback status
In-Reply-To: <000001c6dd9e$9736a570$46fc070a@amr.corp.intel.com>
References: <000001c6dd9e$9736a570$46fc070a@amr.corp.intel.com>
Message-ID: <451AEFAD.4000708@ichips.intel.com>

Sean Hefty wrote:
>>1. Should I even be looking at event->status or does the event type tell me
>>  everything I need to know?  I've had a report that the assertion
>>  (event->status != 0) is failing on RDMA_CM_EVENT_ROUTE_ERROR.
> 
> It sounds like (and looks like from reading the code) that you've hit a bug with
> the ROUTE_ERROR event.  The failure status isn't being propagated up to the
> user.

I've committed a patch to svn which will set the event status correctly when a 
route error occurs.

- Sean


From bos at pathscale.com  Wed Sep 27 14:46:18 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Wed, 27 Sep 2006 14:46:18 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <adairj980le.fsf@cisco.com>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
Message-ID: <1159393578.21086.16.camel@chalcedony.pathscale.com>

On Wed, 2006-09-27 at 14:24 -0700, Roland Dreier wrote:
> Do we have to keep the kernel modules in svn limping along?  As time
> goes on, I have less and less patience for double maintenance.

I'm still all in favour of nuking them...

	<b


From rdreier at cisco.com  Wed Sep 27 14:47:02 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 27 Sep 2006 14:47:02 -0700
Subject: [openib-general] Compile warnings (cross build)
In-Reply-To: <20060926135114.1da96c1b@freekitty> (Stephen Hemminger's
	message of "Tue, 26 Sep 2006 13:51:14 -0700")
References: <20060926135114.1da96c1b@freekitty>
Message-ID: <adaac4l7zjd.fsf@cisco.com>

OK, this is what I just came up with to fix these.

Look OK to you Tom?
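
A small user-space sketch of the idiom used in the fixes below, assuming the warnings come from 64-bit types that are not `unsigned long long` on every architecture. format_addr is a hypothetical helper, and uint64_t stands in for the kernel's u64 and dma_addr_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit address as hex. uint64_t may be "unsigned long" or
 * "unsigned long long" depending on the platform, so the explicit cast
 * makes the argument match the %llx conversion everywhere.
 * Returns the value returned by snprintf().
 */
static int format_addr(char *buf, size_t len, uint64_t addr)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "dma %llx", (unsigned long long) addr);
}
```

Casting at the call site keeps the format string the same on 32-bit and 64-bit builds, at the cost of slightly longer argument lists.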

diff --git a/drivers/infiniband/hw/amso1100/c2_ae.c b/drivers/infiniband/hw/amso1100/c2_ae.c
index 08f46c8..3aae497 100644
--- a/drivers/infiniband/hw/amso1100/c2_ae.c
+++ b/drivers/infiniband/hw/amso1100/c2_ae.c
@@ -197,7 +197,7 @@ void c2_ae_event(struct c2_dev *c2dev, u
 			"resource=%x, qp_state=%s\n",
 			__FUNCTION__,
 			to_event_str(event_id),
-			be64_to_cpu(wr->ae.ae_generic.user_context),
+			(unsigned long long) be64_to_cpu(wr->ae.ae_generic.user_context),
 			be32_to_cpu(wr->ae.ae_generic.resource_type),
 			be32_to_cpu(wr->ae.ae_generic.resource),
 			to_qp_state_str(be32_to_cpu(wr->ae.ae_generic.qp_state)));
diff --git a/drivers/infiniband/hw/amso1100/c2_alloc.c b/drivers/infiniband/hw/amso1100/c2_alloc.c
index 1d25299..028a60b 100644
--- a/drivers/infiniband/hw/amso1100/c2_alloc.c
+++ b/drivers/infiniband/hw/amso1100/c2_alloc.c
@@ -115,7 +115,7 @@ u16 *c2_alloc_mqsp(struct c2_dev *c2dev,
 			    ((unsigned long) &(head->shared_ptr[mqsp]) -
 			     (unsigned long) head);
 		pr_debug("%s addr %p dma_addr %llx\n", __FUNCTION__,
-			 &(head->shared_ptr[mqsp]), (u64)*dma_addr);
+			 &(head->shared_ptr[mqsp]), (unsigned long long) *dma_addr);
 		return &(head->shared_ptr[mqsp]);
 	}
 	return NULL;
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index dd6af55..622d6f1 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -397,7 +397,9 @@ static struct ib_mr *c2_reg_phys_mr(stru
 	pr_debug("%s - page shift %d, pbl_depth %d, total_len %u, "
 		"*iova_start %llx, first pa %llx, last pa %llx\n",
 		__FUNCTION__, page_shift, pbl_depth, total_len,
-		*iova_start, page_list[0], page_list[pbl_depth-1]);
+		(unsigned long long) *iova_start,
+	       	(unsigned long long) page_list[0],
+	       	(unsigned long long) page_list[pbl_depth-1]);
   	err = c2_nsmr_register_phys_kern(to_c2dev(ib_pd->device), page_list,
  					 (1 << page_shift), pbl_depth,
 					 total_len, 0, iova_start,
diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c
index f49a32b..e37c568 100644
--- a/drivers/infiniband/hw/amso1100/c2_rnic.c
+++ b/drivers/infiniband/hw/amso1100/c2_rnic.c
@@ -527,7 +527,7 @@ int c2_rnic_init(struct c2_dev *c2dev)
 				      		DMA_FROM_DEVICE);
 	pci_unmap_addr_set(&c2dev->rep_vq, mapping, c2dev->rep_vq.host_dma);
 	pr_debug("%s rep_vq va %p dma %llx\n", __FUNCTION__, q1_pages,
-		 (u64)c2dev->rep_vq.host_dma);
+		 (unsigned long long) c2dev->rep_vq.host_dma);
 	c2_mq_rep_init(&c2dev->rep_vq,
 		   1,
 		   qsize,
@@ -550,7 +550,7 @@ int c2_rnic_init(struct c2_dev *c2dev)
 				      		DMA_FROM_DEVICE);
 	pci_unmap_addr_set(&c2dev->aeq, mapping, c2dev->aeq.host_dma);
 	pr_debug("%s aeq va %p dma %llx\n", __FUNCTION__, q1_pages,
-		 (u64)c2dev->rep_vq.host_dma);
+		 (unsigned long long) c2dev->rep_vq.host_dma);
 	c2_mq_rep_init(&c2dev->aeq,
 		       2,
 		       qsize,
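
A note on why the casts are needed, as a minimal user-space sketch rather
than kernel code: at the time of this thread u64 is unsigned long on some
64-bit architectures and unsigned long long on others, so passing a u64
straight to %llx warns on part of a cross build. Casting to unsigned long
long matches the format on every ABI. The helper name format_dma_addr and
the use of snprintf in place of pr_debug are assumptions made purely for
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit DMA address as hex into buf.
 * Returns the number of characters snprintf reports (excluding the NUL).
 * format_dma_addr is a hypothetical helper, not a kernel function.
 */
static int format_dma_addr(char *buf, size_t len, uint64_t dma_addr)
{
    assert(buf != NULL);
    /* uint64_t may be unsigned long or unsigned long long depending on
     * the ABI; the explicit cast makes the argument match %llx always. */
    return snprintf(buf, len, "dma %llx", (unsigned long long) dma_addr);
}
```

The same pattern applies to any fixed-width 64-bit value printed with a
long long conversion.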


From rdreier at cisco.com  Wed Sep 27 14:52:29 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 27 Sep 2006 14:52:29 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060927062822.GQ24009@mellanox.co.il> (Michael S.
	Tsirkin's message of "Wed, 27 Sep 2006 09:28:22 +0300")
References: <OF16859FE9.35787750-ON872571F6.00214B1B-882571F6.000836F6@us.ibm.com>
	<20060927062822.GQ24009@mellanox.co.il>
Message-ID: <ada3bad7zaa.fsf@cisco.com>

    Shirley> I forgot to mention these NAPI parameters should be
    Shirley> tunable for different device drivers, like dev->weight,
    Shirley> or set up in lower driver.

    Michael> So we need something like poll_weight in struct
    Michael> ib_device, to give a hint on how expensive an interrupt
    Michael> is versus poll?  Seems to make sense, and actually might
    Michael> be useful for other ULPs.  Roland, what do you think?

How could a low-level driver possibly know the cost of an interrupt vs
polling a CQ?  It depends on the particular CPU/cache/chipset details
of the system and it might not even be the same from one PCI slot to
another.

If this value makes a real difference in practice, we can make it
tunable but I would like to see some hard benchmarks that show it
making a big difference one way or another.  But we have too many
knobs as it is so I'm inclined to just pick a value that works OK.

 - R.


From rdreier at cisco.com  Wed Sep 27 14:54:05 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Wed, 27 Sep 2006 14:54:05 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060927062316.GO24009@mellanox.co.il> (Michael S.
	Tsirkin's message of "Wed, 27 Sep 2006 09:23:16 +0300")
References: <OF0ECEEC20.3A8E1746-ON872571F6.001FCAC6-882571F6.00073B6D@us.ibm.com>
	<20060927062316.GO24009@mellanox.co.il>
Message-ID: <aday7s56kn6.fsf@cisco.com>

    Michael> Maybe we should just assign EQs to CQs in a round-robin
    Michael> fashion for now, and just hope typical use allocates CQs
    Michael> sequentially.  Worst case, we are back to where we are
    Michael> now, performance-wise.  Roland, how does this sound?

I think what we should do is follow the IB verbs extensions and expose
multiple CQ event vectors, and let the consumer pick which one to use
when creating a CQ.  If IPoIB wants to go round robin itself, that
would be fine.

This is what I tried to set the userspace API up for.  Nothing in
userspace would have to change for this -- the kernel just needs to
add multiple EQ support.
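
The round-robin choice a consumer such as IPoIB might make could look
roughly like the sketch below. This is illustration only: struct
vector_picker and pick_vector_round_robin are hypothetical names, not part
of the verbs API, and the number of event vectors is assumed to be reported
by the device.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical per-device state: how many completion event vectors the
 * device exposes and which one will be handed out next.
 */
struct vector_picker {
    unsigned int num_vectors;
    unsigned int next;
};

/* Return the next event vector index, cycling through all vectors. */
static unsigned int pick_vector_round_robin(struct vector_picker *p)
{
    unsigned int v;

    assert(p != NULL && p->num_vectors > 0);
    v = p->next;
    p->next = (p->next + 1) % p->num_vectors;
    return v;
}
```

A consumer would pass the returned index as the event vector when it
creates each CQ, so sequentially created CQs spread across the vectors.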

 - R.


From tom at opengridcomputing.com  Wed Sep 27 20:57:03 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Wed, 27 Sep 2006 22:57:03 -0500
Subject: [openib-general] Compile warnings (cross build)
In-Reply-To: <adaac4l7zjd.fsf@cisco.com>
Message-ID: <C140B23F.A4DB%tom@opengridcomputing.com>

This all looks good to me.

Thanks,
Tom


On 9/27/06 4:47 PM, "Roland Dreier" <rdreier at cisco.com> wrote:

> OK, this is what I just came up with to fix these.
> 
> Look OK to you Tom?
> 
> diff --git a/drivers/infiniband/hw/amso1100/c2_ae.c
> b/drivers/infiniband/hw/amso1100/c2_ae.c
> index 08f46c8..3aae497 100644
> --- a/drivers/infiniband/hw/amso1100/c2_ae.c
> +++ b/drivers/infiniband/hw/amso1100/c2_ae.c
> @@ -197,7 +197,7 @@ void c2_ae_event(struct c2_dev *c2dev, u
> "resource=%x, qp_state=%s\n",
> __FUNCTION__,
> to_event_str(event_id),
> -   be64_to_cpu(wr->ae.ae_generic.user_context),
> +   (unsigned long long) be64_to_cpu(wr->ae.ae_generic.user_context),
> be32_to_cpu(wr->ae.ae_generic.resource_type),
> be32_to_cpu(wr->ae.ae_generic.resource),
> to_qp_state_str(be32_to_cpu(wr->ae.ae_generic.qp_state)));
> diff --git a/drivers/infiniband/hw/amso1100/c2_alloc.c
> b/drivers/infiniband/hw/amso1100/c2_alloc.c
> index 1d25299..028a60b 100644
> --- a/drivers/infiniband/hw/amso1100/c2_alloc.c
> +++ b/drivers/infiniband/hw/amso1100/c2_alloc.c
> @@ -115,7 +115,7 @@ u16 *c2_alloc_mqsp(struct c2_dev *c2dev,
>    ((unsigned long) &(head->shared_ptr[mqsp]) -
>     (unsigned long) head);
> pr_debug("%s addr %p dma_addr %llx\n", __FUNCTION__,
> -    &(head->shared_ptr[mqsp]), (u64)*dma_addr);
> +    &(head->shared_ptr[mqsp]), (unsigned long long) *dma_addr);
> return &(head->shared_ptr[mqsp]);
> }
> return NULL;
> diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c
> b/drivers/infiniband/hw/amso1100/c2_provider.c
> index dd6af55..622d6f1 100644
> --- a/drivers/infiniband/hw/amso1100/c2_provider.c
> +++ b/drivers/infiniband/hw/amso1100/c2_provider.c
> @@ -397,7 +397,9 @@ static struct ib_mr *c2_reg_phys_mr(stru
> pr_debug("%s - page shift %d, pbl_depth %d, total_len %u, "
> "*iova_start %llx, first pa %llx, last pa %llx\n",
> __FUNCTION__, page_shift, pbl_depth, total_len,
> -  *iova_start, page_list[0], page_list[pbl_depth-1]);
> +  (unsigned long long) *iova_start,
> +         (unsigned long long) page_list[0],
> +         (unsigned long long) page_list[pbl_depth-1]);
> err = c2_nsmr_register_phys_kern(to_c2dev(ib_pd->device), page_list,
> (1 << page_shift), pbl_depth,
> total_len, 0, iova_start,
> diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c
> b/drivers/infiniband/hw/amso1100/c2_rnic.c
> index f49a32b..e37c568 100644
> --- a/drivers/infiniband/hw/amso1100/c2_rnic.c
> +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c
> @@ -527,7 +527,7 @@ int c2_rnic_init(struct c2_dev *c2dev)
> DMA_FROM_DEVICE);
> pci_unmap_addr_set(&c2dev->rep_vq, mapping, c2dev->rep_vq.host_dma);
> pr_debug("%s rep_vq va %p dma %llx\n", __FUNCTION__, q1_pages,
> -   (u64)c2dev->rep_vq.host_dma);
> +   (unsigned long long) c2dev->rep_vq.host_dma);
> c2_mq_rep_init(&c2dev->rep_vq,
>   1,
>   qsize,
> @@ -550,7 +550,7 @@ int c2_rnic_init(struct c2_dev *c2dev)
> DMA_FROM_DEVICE);
> pci_unmap_addr_set(&c2dev->aeq, mapping, c2dev->aeq.host_dma);
> pr_debug("%s aeq va %p dma %llx\n", __FUNCTION__, q1_pages,
> -   (u64)c2dev->rep_vq.host_dma);
> +   (unsigned long long) c2dev->rep_vq.host_dma);
> c2_mq_rep_init(&c2dev->aeq,
>       2,
>       qsize,


From eitan at mellanox.co.il  Wed Sep 27 21:40:11 2006
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 28 Sep 2006 07:40:11 +0300
Subject: [openib-general] [PATCH] osm_vendor_mlx_sa.c - missing status on
 timeout SA query
Message-ID: <868xk4zjro.fsf@mtl066.yok.mtl.com>

Hi Hal

The same bug that Yevgeny discovered in osm_vendor_ibumad_sa.c also
occurs in osm_vendor_mlx_sa.c, and it causes osmtest to fail.
The issue is that the status of the query is not returned
as the result of the SA query.

Eitan

Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Index: libvendor/osm_vendor_mlx_sa.c
===================================================================
--- libvendor/osm_vendor_mlx_sa.c	(revision 9642)
+++ libvendor/osm_vendor_mlx_sa.c	(working copy)
@@ -219,7 +219,8 @@ __osmv_sa_mad_err_cb(
 
   query_res.status = IB_TIMEOUT;
   query_res.result_cnt = 0;
-
+  query_res.p_result_madw->status = IB_TIMEOUT;
+  p_madw->status = IB_TIMEOUT;
   query_res.query_type = p_query_req_copy->query_type;
 
   p_query_req_copy->pfn_query_cb( &query_res );
@@ -611,6 +612,7 @@ __osmv_send_sa_req(
              "Waiting for async event.\n" );
     cl_event_wait_on( &p_bind->sync_event, EVENT_NO_TIMEOUT, FALSE );
     cl_event_reset(&p_bind->sync_event);
+    status = p_madw->status;
   }
 
  Exit:


From erezz at voltaire.com  Wed Sep 27 22:25:09 2006
From: erezz at voltaire.com (Erez Zilber)
Date: Thu, 28 Sep 2006 08:25:09 +0300
Subject: [openib-general] oops after rmmod ib_cm when stopping iSER
In-Reply-To: <451AAF2E.1060602@ichips.intel.com>
References: <451A2E7E.8050504@voltaire.com> <451AAF2E.1060602@ichips.intel.com>
Message-ID: <451B5CB5.8090407@voltaire.com>

Sean Hefty wrote:
> Erez Zilber wrote:
>> When stopping iSER, we run 'modprobe -r ib_iser'. Then, we see an 
>> oops (below). In order to check which module caused that oops, I 
>> replaced the 'modprobe -r' call with rmmod for each module:
>>
>> rmmod ib_iser
>> rmmod libiscsi
>> rmmod scsi_transport_iscsi
>> rmmod rdma_cm
>> rmmod ib_addr
>> rmmod ib_cm
>>
>> If I wait a few seconds before the removal of ib_cm, everything is ok.
>
> Thanks for the info.  My guess is that the cm_id's are not taking a 
> reference on the cm devices, which is allowing the module unload to 
> proceed while cm_id's still remain in timewait.  I will look at this 
> in more detail and work on a patch.  How reproducible is this?
>
> - Sean
100% reproducible. It happens every time.

Erez

<http://www.voltaire.com/>

 
From moshek at voltaire.com  Wed Sep 27 22:38:31 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Thu, 28 Sep 2006 08:38:31 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
Message-ID: <D4F8F0B3820E754C887699BEF26A8940EB85E2@taurus.voltaire.com>

Michael wrote :
> Since I don't consider this a critical fix (there's no reason driver 
> won't go up, and if it does not, there's a simple workaround by 

Michael , 
The mstflint operated in the "classic way"  in OFED-1.1 is not working
on PPC64 sles10  !!!

Telling the customer to use a workaround (open /proc...) if their
platform is PPC64 is not nice !!

We need to fix the bug in the code !

Frank wrote :
>  The patch can be enabled by defining CONFIG_MOPEN_FALL_BACK to 1.
CONFIG_MOPEN_FALL_BACK is defined to 1 for ppc64 and x86_64 and 0 for
others

This define keeps the program from being damaged when running on other
platforms.

Can you have a look at the code once more and write how you want us (me
and Frank ) to refine it ?

It's o.k. for us if the fix goes into OFED-1.2, but we need
it in the code !

Moshe


____________________________________________________________
Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
 
Voltaire - The Grid Backbone
 
 www.voltaire.com

  
-----Original Message-----
From: Tseng-Hui (Frank) Lin [mailto:thlin at us.ibm.com] 
Sent: Wednesday, September 27, 2006 7:46 PM
To: Michael S. Tsirkin
Cc: Moshe Kazir; Tseng-hui Lin; openib-general at openib.org
Subject: Re: [openib-general] FW: Mstflint - not working on ppc64
andwhendriver is not loaded on AMD


On Wed, 2006-09-27 at 18:19 +0300, Michael S. Tsirkin wrote:
> Quoting r. Moshe Kazir <moshek at voltaire.com>:
> > Subject: FW: [openib-general] Mstflint - not working on ppc64 and 
> > whendriver is not loaded on AMD
> > 
> > Michael,
> >  
> > Frank new version was tested once more in Voltaire and is working 
> > o.k. . I tested  `./mstflint -d <lspci output> q`  when drivers are 
> > loaded and when drivers are not loaded. in all cases it worked o.k.
> 
> Thanks for testing, but I'd like to get a handle on what's going on 
> first.
> 
> First, I'm pretty sure when driver is loaded things work OK on all 
> systems. When driver is not loaded - could you please answer whether 
> using /sys/bus/pci/devices/0000\:03\:00.0/resource0
> works for you (on systems that have resource0)?
> 

It doesn't work.

> >  
> > Test was performed on the following environments :
> >  
> > -    IBM js21 ppc64 sles10 PCI-E
> > -    IBM js21 ppc64 sles9 sp3 PCI-E
> > -    IBM hs21 em64t redhat as 4 u3 PCI-E
> > -    IBM hs21 em64t sles 9 sp3 PCI-E
> > -    x86_64 sles10  PCI-E
> > -    MAC ppc64 sles10 PCI-X
> > -    MAC ppc64 sles10 PCI-E
> >
> > Please consider inserting the patch to OFED .
> >  
> > Moshe
> 
> Since I don't consider this a critical fix (there's no reason driver 
> won't go up, and if it does not, there's a simple workaround by 
> specifying the /proc interface, that is slower but works), I don't 
> think this should go into OFED 1.1.
> 
> Unfortunately, I never got a small bugfix patch against the latest 
> mstflint - the patch I saw posted touches all kind of things all over 
> the code - so I can't insert it in trunk, either.
> 

I agree this is not critical. The patch changes nothing but the way of
opening the device.

On some ppc64 and x86_64 machines, the I/O memory mapped by mmap() is
not accessible (reads return 0xFFFFFFFF) unless kernel code (usually the
device driver) does an ioremap. This is why mmap of resource0 does not
work on these machines. I am not aware of any way to do an ioremap from
user space code like mstflint. The only thing I can think of is to fall
back to using the config space file in /proc/bus/pci/.

The (big) patch I made checks whether the faster way (mmap resource0)
works. If it doesn't, the patch tries other, slower ways and uses the
fastest working way it can find. That's all the patch does. It is not a
big fix. It just saves users the trouble of trying all possible ways of
opening a device manually.
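
The selection logic described above can be sketched as follows. This is
not the code from the patch: struct access_method, pick_access_method and
the two stub probes are hypothetical names used only to show the idea of
trying methods from fastest to slowest and rejecting one whose test read
comes back as 0xFFFFFFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value the message reports for reads of inaccessible I/O memory. */
#define UNMAPPED_READ 0xFFFFFFFFu

/* Hypothetical access method: a label plus a probe that reads one
 * known register and returns 0 on success. */
struct access_method {
    const char *name;
    int (*probe_read)(uint32_t *value);
};

/* Stand-in probes for illustration; real ones would touch the device. */
static int probe_mmap_stub(uint32_t *value)
{
    *value = UNMAPPED_READ;   /* simulate mmap of resource0 not working */
    return 0;
}

static int probe_config_space_stub(uint32_t *value)
{
    *value = 0x00005a44u;     /* arbitrary plausible register content */
    return 0;
}

/* Walk the methods in order (fastest first) and return the first one
 * whose probe succeeds and does not read back as unmapped memory. */
static const struct access_method *
pick_access_method(const struct access_method *methods, size_t count)
{
    size_t i;

    assert(methods != NULL);
    for (i = 0; i < count; i++) {
        uint32_t value = 0;

        if (methods[i].probe_read(&value) != 0)
            continue;         /* could not even open or read */
        if (value == UNMAPPED_READ)
            continue;         /* mapping exists but is not accessible */
        return &methods[i];
    }
    return NULL;              /* nothing worked */
}
```

One caveat of this sketch: a register that legitimately holds 0xFFFFFFFF
would be misread as a failure, so the probe should target a register with
a known different value.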

I understand that applying a big patch is risky unless it can be
thoroughly tested. Unfortunately, no one has all the machines to test
the patch. Moshe and I have tested the patch on Power MAC, Squadrons,
JS20, and JS21 (almost all living ppc64 machines) as well as a few
x86_64 machines. We believe this patch is safe for these machines. The
patch can be enabled by defining CONFIG_MOPEN_FALL_BACK to 1.
CONFIG_MOPEN_FALL_BACK is defined to 1 for ppc64 and x86_64 and 0 for
others. We can enable this patch on other machines once people who have
these machines have tested it.

I agree this is not a critical patch, but it is a useful one. Moreover,
it is well tested on the machines with the patch enabled and changes
nothing on the machines with the patch disabled. I believe this is a
safe patch. Please re-consider adding it. Thanks.


From mst at mellanox.co.il  Wed Sep 27 23:03:48 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 09:03:48 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
In-Reply-To: <1159375559.21249.60.camel@flin.austin.ibm.com>
References: <1159375559.21249.60.camel@flin.austin.ibm.com>
Message-ID: <20060928060348.GB23828@mellanox.co.il>

Quoting r. Tseng-Hui (Frank) Lin <thlin at us.ibm.com>:
> On some ppc64 and x86_64 machines, the I/O memory mapped by mmap() is
> not accessible (return 0xFFFFFFFF) unless the kernel code (usually the
> device driver) does an ioremap. This is why mmap resource0 does not work
> on these machines.

Let's be exact here: ioremap *only* does not work if driver is not loaded.
Is that right? If yes, the typical and safe thing for the user is to have
driver loaded and do
-d /sys/class/infiniband/mthca0/device/resource0
without playing with lspci and other low level hacks,
and I would rather you told users to do *that*
(by the way, would it help if you could use "-d mthca0")?

> I am not aware of any way to do an ioremap from
> user space code like mstflint. The only thing I can think of is to fall
> back to using the config space file in /proc/bus/pci/.

How about write/read to/from resource0? Does that work?

> The (big) patch I made checks whether the faster way (mmap resource0)
> works. If it doesn't, the patch tries other, slower ways and uses the
> fastest working way it can find. That's all the patch does. It is not a
> big fix. It just saves users the trouble of trying all possible ways of
> opening a device manually.

I don't reject that approach, not on principle.
This is absolutely something we can consider for trunk.
But let's first try to make memory access work, even if
it's not with mmap.

-- 
MST


From mst at mellanox.co.il  Wed Sep 27 23:08:17 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 09:08:17 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <ada3bad7zaa.fsf@cisco.com>
References: <ada3bad7zaa.fsf@cisco.com>
Message-ID: <20060928060817.GD23828@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> If this value makes a real difference in practice, we can make it
> tunable but I would like to see some hard benchmarks that show it
> making a big difference one way or another.  But we have too many
> knobs as it is so I'm inclined to just pick a value that works OK.

Fair enough, let's start simple.
BTW, are you going to post the rewritten NAPI patch
for testing soon?

-- 
MST


From ogerlitz at voltaire.com  Wed Sep 27 23:18:45 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 28 Sep 2006 09:18:45 +0300
Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to
 unmatched DREQ
In-Reply-To: <451ABF0C.90607@ichips.intel.com>
References: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com>
	<451ABF0C.90607@ichips.intel.com>
Message-ID: <451B6945.1050707@voltaire.com>

Sean Hefty wrote:
> Sean Hefty wrote:

>> An alternative is to send a DREP in response to a DREQ, even if a local
>> connection is not found, which is what this patch does.

> If there are no objections, I will commit this patch to svn, and submit for 
> inclusion upstream.

Sean,

My understanding is that without this patch the side that sends the DREQ 
would resend the DREQ a few times, because the first DREPs were lost and 
no DREPs are sent once the id at the peer side has left the timewait 
state, correct?

Arlin,

Can you please share what the implications were for Intel MPI running a 
64-node (128 ranks?) job? Was the issue here just a longer ***job 
termination time***?

I have no objection to merging it, I just think it would be nice to 
understand better what problem this patch solves in terms of the use 
case that drove the fix.

Or.


From mst at mellanox.co.il  Wed Sep 27 23:26:14 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 09:26:14 +0300
Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to
 unmatched DREQ
In-Reply-To: <451ABF0C.90607@ichips.intel.com>
References: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com>
	<451ABF0C.90607@ichips.intel.com>
Message-ID: <20060928062614.GF23828@mellanox.co.il>

Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [RFC] [PATCH] ib_cm: send DREP in response to unmatched DREQ
> 
> Sean Hefty wrote:
> > Currently a DREP is only sent in response to a DREQ if a connection
> > has been found matching the DREQ, and it is in the proper state.  Once
> > a DREP is sent, the local connection moves into timewait.  Duplicate
> > DREQs received while in this state result in re-sending the DREP.
> > 
> > However, it's likely that the local connection will enter and exit
> > timewait before the remote side times out a lost DREP and resends a DREQ.
> > There are a couple possible solutions to this.  One is to increase how
> > long a connection remains in timewait, by multiplying its wait time by
> > max_cm_retries.  This can greatly increase the timewait state before a QP
> > can be re-used when CM messages are not lost.
> > 
> > An alternative is to send a DREP in response to a DREQ, even if a local
> > connection is not found, which is what this patch does.
> 
> If there are no objections, I will commit this patch to svn, and submit for 
> inclusion upstream.

I'm OK with this change.

-- 
MST


From mst at mellanox.co.il  Wed Sep 27 23:27:23 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 09:27:23 +0300
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159393578.21086.16.camel@chalcedony.pathscale.com>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
Message-ID: <20060928062723.GG23828@mellanox.co.il>

Quoting r. Bryan O'Sullivan <bos at pathscale.com>:
> Subject: Re: 2.6.18 kernel support in the main trunk.
> 
> On Wed, 2006-09-27 at 14:24 -0700, Roland Dreier wrote:
> > Do we have to keep the kernel modules in svn limping along?  As time
> > goes on, I have less and less patience for double maintenance.
> 
> I'm still all in favour of nuking them...

Me too.

-- 
MST


From mst at mellanox.co.il  Wed Sep 27 23:29:19 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 09:29:19 +0300
Subject: [openib-general] Compile warnings (cross build)
In-Reply-To: <adaac4l7zjd.fsf@cisco.com>
References: <20060926135114.1da96c1b@freekitty>
 <adaac4l7zjd.fsf@cisco.com>
Message-ID: <20060928062919.GH23828@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> -		 (u64)c2dev->rep_vq.host_dma);
> +		 (unsigned long long) c2dev->rep_vq.host_dma);

BTW, is there some printk format to print u64 type?

-- 
MST


From mst at mellanox.co.il  Wed Sep 27 23:31:33 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 09:31:33 +0300
Subject: [openib-general] RDMA CM callback status
In-Reply-To: <451AEFAD.4000708@ichips.intel.com>
References: <000001c6dd9e$9736a570$46fc070a@amr.corp.intel.com>
	<451AEFAD.4000708@ichips.intel.com>
Message-ID: <20060928063133.GI23828@mellanox.co.il>

Quoting r. Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: RDMA CM callback status
> 
> Sean Hefty wrote:
> >>1. Should I even be looking at event->status or does the event type tell me
> >>  everything I need to know?  I've had a report that the assertion
> >>  (event->status != 0) is failing on RDMA_CM_EVENT_ROUTE_ERROR.
> > 
> > It sounds like (and looks like from reading the code) that you've hit a bug with
> > the ROUTE_ERROR event.  The failure status isn't being propagated up to the
> > user.
> 
> I've committed a patch to svn which will set the event status correctly when a 
> route error occurs.

Can you post a patch pls?

-- 
MST


From mst at mellanox.co.il  Wed Sep 27 23:33:01 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 09:33:01 +0300
Subject: [openib-general] backporting fixes
In-Reply-To: <adamz8l80ru.fsf@cisco.com>
References: <20060927040745.GI24009@mellanox.co.il> <adamz8l80ru.fsf@cisco.com>
Message-ID: <20060928063301.GJ23828@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: backporting fixes
> 
>  > Now that  2.6.18 (with an additional patch) I looked at backporting bugfixes to
>  > older kernels.  The main problem I see is that the neighbour destructor
>  > interface change is not in 2.6.16, so IPoIB crashes randomly.
>  > 
>  > So approaches are
>  > - Try to push the change into 2.6.16 by netdev
>  > - Use the all-neighbour list as done by ofed
>  > - Abandon the whole project
>  > 
>  > Ideas?
> 
> Unfortunately I don't think this bug is very amenable to being fixed
> in a 2.6.16/-stable tree.  So the third solution is probably the best
> we can do at this point.

OK. How about 2.6.17.y?
I'm somewhat confused whether someone is still maintaining these.

-- 
MST


From mst at mellanox.co.il  Wed Sep 27 23:35:06 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 09:35:06 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
In-Reply-To: <D4F8F0B3820E754C887699BEF26A8940EB85E2@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A8940EB85E2@taurus.voltaire.com>
Message-ID: <20060928063506.GK23828@mellanox.co.il>

Quoting r. Moshe Kazir <moshek at voltaire.com>:
> The mstflint operated in the "classic way"  in OFED-1.1 is not working
> on PPC64 sles10  !!!

I consider the classic way to be
-d /sys/class/infiniband/mthca0/device/resource0

It does seem a bit verbose now that you mention this - would
a shortcut to allow just -d mthca0 help a lot?

-- 
MST


From ogerlitz at voltaire.com  Wed Sep 27 23:43:04 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 28 Sep 2006 09:43:04 +0300
Subject: [openib-general] is IB/cm: Randomize starting comm ID fix missing
	in OFED 1.1 ?!
Message-ID: <D4F8F0B3820E754C887699BEF26A894001444BAE@taurus.voltaire.com>

Michael,

I understand that OFED 1.1 is based on the IB code of 2.6.18-rc6, however, 
this patch which was pushed to 2.6.19-rc1 solves a real problem which was 
reported from a Lustre field install and can be easily reproduced in the lab. 

Can it go into rc7?

http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f06d26537559113207e4b73af6a22eaa5c5e9dc3

Or.


From gil at mellanox.co.il  Thu Sep 28 00:16:00 2006
From: gil at mellanox.co.il (Gil Bloch)
Date: Thu, 28 Sep 2006 10:16:00 +0300
Subject: [openib-general] mvapich2-gen2 svn - vapi <--> gen2 ??
Message-ID: <6C2C79E72C305246B504CBA17B5500C9059AD6@mtlexch01.mtl.com>

Bryan,

As far as I know, the mvapich2 libraries are not intended for
heterogeneous IB installations. We (and I think the same goes for OSU)
do not test them in such a configuration.
For more details you might want to contact the mvapich team:
http://nowlab.cse.ohio-state.edu/projects/mpi-iba/

Regards,
Gil Bloch
Mellanox Technologies


> -----Original Message-----
> From: openib-general-bounces at openib.org [mailto:openib-general-
> bounces at openib.org] On Behalf Of Bryan Green
> Sent: Wednesday, September 27, 2006 8:59 PM
> To: openib-general at openib.org
> Subject: [openib-general] mvapich2-gen2 svn - vapi <--> gen2 ??
> 
> Hello,
> Regarding mvapich2-gen2 in the openib svn,
> can an mvapich2 vapi build on one machine
> communicate with a gen2 build on another?
> 
> -bryan


From mst at mellanox.co.il  Thu Sep 28 01:21:02 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 11:21:02 +0300
Subject: [openib-general] is IB/cm: Randomize starting comm ID fix
 missing in OFED 1.1 ?!
In-Reply-To: <D4F8F0B3820E754C887699BEF26A894001444BAE@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A894001444BAE@taurus.voltaire.com>
Message-ID: <20060928082102.GD25010@mellanox.co.il>

Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> Subject: is IB/cm: Randomize starting comm ID fix missing in OFED 1.1 ?!
> 
> Michael,
> 
> I understand that OFED 1.1 is based on the IB code of 2.6.18-rc6, however, 
> this patch which was pushed to 2.6.19-rc1 solves a real problem which was 
> reported from a Lustre field install and can be easily reproduced in the lab. 
> 
> Can it go into rc7?
> 
> http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f06d26537559113207e4b73af6a22eaa5c5e9dc3
> 
> Or.
> 

Looks safe enough. OK.

-- 
MST


From mst at mellanox.co.il  Thu Sep 28 01:39:09 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 11:39:09 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <aday7s56kn6.fsf@cisco.com>
References: <OF0ECEEC20.3A8E1746-ON872571F6.001FCAC6-882571F6.00073B6D@us.ibm.com>
	<20060927062316.GO24009@mellanox.co.il> <aday7s56kn6.fsf@cisco.com>
Message-ID: <20060928083909.GF25010@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> I think what we should do is follow the IB verbs extensions and expose
> multiple CQ event vectors, and let the consumer pick which one to use
> when creating a CQ.  If IPoIB wants to go round robin itself, that
> would be fine.
> 
> This is what I tried to set the userspace API up for.  Nothing in
> userspace would have to change for this -- the kernel just needs to
> add multiple EQ support.

Sounds good.
Fancy taking it up now, or should I look into this?

-- 
MST


From moshek at voltaire.com  Thu Sep 28 02:00:10 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Thu, 28 Sep 2006 12:00:10 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
Message-ID: <D4F8F0B3820E754C887699BEF26A8940EB85E4@taurus.voltaire.com>

I prefer the "mstflint -d 0c:00.0 q" format, as it enables writing
scripts that extract the lspci info and get results ->

# mstflint -d `lspci | grep Mellanox |grep -v Bridge | cut -f1 -d" "` q
Image type:      Failsafe
I.S. Version:    1
Chip Revision:   A0
GUID Des:        Node             Port1            Port2            Sys image
GUIDs:           0008f1040398047c 0008f1040398047d 0008f1040398047e 0008f1040398047f
Board ID:         (0TLV00700003)
VSD:
PSID:            0TLV00700003
#

The format "mstflint -d mthca0" is good but not sufficient.

When the HCA is old/wrong/damaged, insmod may fail. In this case we'll
need mstflint to fix problems.
We must have a way to operate mstflint when the driver is not loaded.


> When mthca is loaded, what does
> mstflint -d /sys/class/infiniband/mthca0/device/resource0 q
> do on PPC?


On PPC64 sles10 with Frank's last fix:
 # ./mstflint -d /sys/class/infiniband/mthca0/device/resource0  q
*** ERROR *** Can not open /sys/class/infiniband/mthca0/device/resource0: Invalid argument
*** ERROR *** Can not get flash type using device /sys/class/infiniband/mthca0/device/resource0
 #

On PPC64 with OFED-1.1 rc6 original sources:

 # ./mstflint -d /sys/class/infiniband/mthca0/device/resource0  q
*** ERROR *** Can not open /sys/class/infiniband/mthca0/device/resource0: Invalid argument
*** ERROR *** Can not get flash type using device /sys/class/infiniband/mthca0/device/resource0
 #


Moshe

____________________________________________________________
Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
 
Voltaire - The Grid Backbone
 
 www.voltaire.com

  
-----Original Message-----
From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] 
Sent: Thursday, September 28, 2006 9:35 AM
To: Moshe Kazir
Cc: Tseng-Hui (Frank) Lin; openfabrics-ewg at openib.org;
openib-general at openib.org
Subject: Re: FW: Mstflint - not working on ppc64 andwhendriver is not
loaded on AMD


Quoting r. Moshe Kazir <moshek at voltaire.com>:
> The mstflint operated in the "classic way"  in OFED-1.1 is not working

> on PPC64 sles10  !!!

I consider the classic way to be
-d /sys/class/infiniband/mthca0/device/resource0

It does seem a bit verbose now that you mention this - would
a shortcut to allow just -d mthca0 help a lot?

-- 
MST


From mst at mellanox.co.il  Thu Sep 28 02:48:18 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 12:48:18 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
In-Reply-To: <D4F8F0B3820E754C887699BEF26A8940EB85E4@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A8940EB85E4@taurus.voltaire.com>
Message-ID: <20060928094818.GH25010@mellanox.co.il>

Quoting r. Moshe Kazir <moshek at voltaire.com>:
> > When mthca is loaded, what does
> > mstflint -d /sys/class/infiniband/mthca0/device/resource0 q
> > do on PPC?
> 
> 
> On PPC64 sles10 with Franks last fix 
>  # ./mstflint -d /sys/class/infiniband/mthca0/device/resource0  q
> *** ERROR *** Can not open

Does /sys/class/infiniband/mthca0/device/resource0 exist on this system?

Pls send output of
ls /sys/class/infiniband/mthca0/device/

-- 
MST


From mst at mellanox.co.il  Thu Sep 28 02:53:08 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 12:53:08 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
In-Reply-To: <D4F8F0B3820E754C887699BEF26A8940EB85E4@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A8940EB85E4@taurus.voltaire.com>
Message-ID: <20060928095308.GI25010@mellanox.co.il>

Quoting r. Moshe Kazir <moshek at voltaire.com>:
> Subject: Re: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD
> 
> I prefer the "mstflint -d 0c:00.0 q " format 

BTW, this won't work on systems with multiple domains - you must
add the domain as well:

mstflint -d 0000:0c:00.0 q
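
A script that feeds lspci output to mstflint could add the domain itself. The following is a minimal sketch of that step, assuming the short form always means domain 0000; normalize_pci_addr is a hypothetical helper for illustration, not part of mstflint.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand a PCI address to the full domain:bus:device.function form.
 * Accepts "bb:dd.f" (domain assumed to be 0000) or "dddd:bb:dd.f".
 * Returns 0 on success, -1 if the input does not parse or does not fit.
 * Hypothetical helper for illustration; not an mstflint function.
 */
int normalize_pci_addr(const char *in, char *out, size_t outlen)
{
    unsigned domain = 0, bus = 0, dev = 0, fn = 0;
    char tail;
    int n;

    assert(in != NULL && out != NULL);

    if (strlen(in) > 12)            /* longer than "dddd:bb:dd.f" */
        return -1;

    /* Try the long form first; the trailing %c rejects extra characters. */
    if (sscanf(in, "%x:%x:%x.%x%c", &domain, &bus, &dev, &fn, &tail) != 4) {
        domain = 0;
        if (sscanf(in, "%x:%x.%x%c", &bus, &dev, &fn, &tail) != 3)
            return -1;
    }
    if (domain > 0xffff || bus > 0xff || dev > 0x1f || fn > 7)
        return -1;

    n = snprintf(out, outlen, "%04x:%02x:%02x.%x", domain, bus, dev, fn);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

Alternatively, lspci -D always prints the domain, which avoids the conversion altogether.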

-- 
MST


From ogerlitz at voltaire.com  Thu Sep 28 04:05:52 2006
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 28 Sep 2006 14:05:52 +0300
Subject: [openib-general] is IB/cm: Randomize starting comm ID fix
 missing in OFED 1.1 ?!
In-Reply-To: <20060928082102.GD25010@mellanox.co.il>
References: <D4F8F0B3820E754C887699BEF26A894001444BAE@taurus.voltaire.com>
	<20060928082102.GD25010@mellanox.co.il>
Message-ID: <451BAC90.5020002@voltaire.com>

Michael S. Tsirkin wrote:
> Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
>> Subject: is IB/cm: Randomize starting comm ID fix missing in OFED 1.1 ?!
>>
>> Michael,
>>
>> I understand that OFED 1.1 is based on the IB code of 2.6.18-rc6, however, 
>> this patch which was pushed to 2.6.19-rc1 solves a real problem which was 
>> reported from a Lustre field install and can be easily reproduced in the lab. 
>>
>> Can it go into rc7?
>>
>> http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f06d26537559113207e4b73af6a22eaa5c5e9dc3
>>
>> Or.
>>
> 
> Looks safe enough. OK.

cool, thanks.

Or.


From moshek at voltaire.com  Thu Sep 28 04:16:26 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Thu, 28 Sep 2006 14:16:26 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
Message-ID: <D4F8F0B3820E754C887699BEF26A8940EB85E7@taurus.voltaire.com>

O.k.

mstflint -d `lspci | grep Mellanox |grep -v Bridge | cut -f1 -d" "` q 

Will do the job .
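
For a tool that wants to do the same selection without a shell, the grep/cut filter above could be sketched in C roughly as follows. pick_hca_addr is a hypothetical name, and the sketch assumes one lspci output line per call with the address as the first space-separated field.

```c
#include <assert.h>
#include <string.h>

/*
 * Given one line of lspci output, copy the leading address field into
 * out if the line mentions "Mellanox" and not "Bridge", the same
 * filter as the grep/cut pipeline. Returns 1 on a match, 0 otherwise.
 * Hypothetical helper for illustration only.
 */
int pick_hca_addr(const char *line, char *out, size_t outlen)
{
    size_t n;

    assert(line != NULL && out != NULL);

    if (strstr(line, "Mellanox") == NULL || strstr(line, "Bridge") != NULL)
        return 0;

    n = strcspn(line, " ");         /* length of the first field */
    if (n == 0 || n >= outlen)
        return 0;
    memcpy(out, line, n);
    out[n] = '\0';
    return 1;
}
```

Like the pipeline, this yields the short bus:device.function form, so the domain still has to be added on multi-domain systems.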

Moshe

____________________________________________________________
Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
 
Voltaire - The Grid Backbone
 
 www.voltaire.com

  


From moshek at voltaire.com  Thu Sep 28 04:25:32 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Thu, 28 Sep 2006 14:25:32 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
Message-ID: <D4F8F0B3820E754C887699BEF26A8940EB85E8@taurus.voltaire.com>


 # ls /sys/class/infiniband/mthca0/device/resource0
/sys/class/infiniband/mthca0/device/resource0

 # ls -ald /sys/class/infiniband/mthca0/device/*
lrwxrwxrwx 1 root root         0 Sep 27 11:33 /sys/class/infiniband/mthca0/device/bus -> ../../../../bus/pci
-r--r--r-- 1 root root      4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/class
-rw-r--r-- 1 root root       256 Sep 28 14:17 /sys/class/infiniband/mthca0/device/config
-r--r--r-- 1 root root      4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/device
-r--r--r-- 1 root root      4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/devspec
lrwxrwxrwx 1 root root         0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/driver -> ../../../../bus/pci/drivers/ib_mthca
lrwxrwxrwx 1 root root         0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/infiniband:mthca0 -> ../../../../class/infiniband/mthca0
lrwxrwxrwx 1 root root         0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/infiniband_mad:issm0 -> ../../../../class/infiniband_mad/issm0
lrwxrwxrwx 1 root root         0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/infiniband_mad:issm1 -> ../../../../class/infiniband_mad/issm1
lrwxrwxrwx 1 root root         0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/infiniband_mad:umad0 -> ../../../../class/infiniband_mad/umad0
lrwxrwxrwx 1 root root         0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/infiniband_mad:umad1 -> ../../../../class/infiniband_mad/umad1
lrwxrwxrwx 1 root root         0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/infiniband_verbs:uverbs0 -> ../../../../class/infiniband_verbs/uverbs0
-r--r--r-- 1 root root      4096 Sep 28 14:17 /sys/class/infiniband/mthca0/device/irq
-r--r--r-- 1 root root      4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/local_cpus
-r--r--r-- 1 root root      4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/modalias
lrwxrwxrwx 1 root root         0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/net:ib0 -> ../../../../class/net/ib0
lrwxrwxrwx 1 root root         0 Sep 28 11:43 /sys/class/infiniband/mthca0/device/net:ib1 -> ../../../../class/net/ib1
-r--r--r-- 1 root root      4096 Sep 28 11:43 /sys/class/infiniband/mthca0/device/pools
-r--r--r-- 1 root root      4096 Sep 28 14:17 /sys/class/infiniband/mthca0/device/resource
-rw------- 1 root root   1048576 Sep 28 14:17 /sys/class/infiniband/mthca0/device/resource0
-rw------- 1 root root   8388608 Sep 27 11:33 /sys/class/infiniband/mthca0/device/resource2
-rw------- 1 root root 134217728 Sep 27 11:33 /sys/class/infiniband/mthca0/device/resource4
-r--r--r-- 1 root root      4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/subsystem_device
-r--r--r-- 1 root root      4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/subsystem_vendor
--w------- 1 root root      4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/uevent
-r--r--r-- 1 root root      4096 Sep 27 11:33 /sys/class/infiniband/mthca0/device/vendor
 #


Moshe


____________________________________________________________
Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
 
Voltaire - The Grid Backbone
 
 www.voltaire.com

  


From mst at mellanox.co.il  Thu Sep 28 04:41:03 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 14:41:03 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
In-Reply-To: <D4F8F0B3820E754C887699BEF26A8940EB85E8@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A8940EB85E8@taurus.voltaire.com>
Message-ID: <20060928114103.GA26457@mellanox.co.il>

Quoting r. Moshe Kazir <moshek at voltaire.com>:
> Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD
> 
> 
>  # ls /sys/class/infiniband/mthca0/device/resource0
> /sys/class/infiniband/mthca0/device/resource0

OK, so can you try this please:

strace -f -v -o log  mstflint -d /sys/class/infiniband/mthca0/device/resource0 q

cat log

-- 
MST


From moshek at voltaire.com  Thu Sep 28 04:59:04 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Thu, 28 Sep 2006 14:59:04 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
Message-ID: <D4F8F0B3820E754C887699BEF26A8940EB85E9@taurus.voltaire.com>


See attached files.


Moshe

____________________________________________________________
Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
 
Voltaire - The Grid Backbone
 
 www.voltaire.com

  
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Franks.mstflint.trace.txt
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060928/3a5901dc/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: OFED-1.1-orig.mstflint.trace.txt
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060928/3a5901dc/attachment-0001.txt>

From mst at mellanox.co.il  Thu Sep 28 05:12:00 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 15:12:00 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
In-Reply-To: <D4F8F0B3820E754C887699BEF26A8940EB85E9@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A8940EB85E9@taurus.voltaire.com>
Message-ID: <20060928121200.GB26457@mellanox.co.il>

Quoting r. Moshe Kazir <moshek at voltaire.com>:
> Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD
> 
> 
> See attached files.

OK, so we can open the file, but can't mmap it.
Let's see if we can read it.
Pls compile the following test and run with strace:

>strace -f -x -v -o log a.out
>cat log

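#define _XOPEN_SOURCE 500
#define _FILE_OFFSET_BITS 64

#include <stdio.h>

#include <unistd.h>

#include <netinet/in.h>
#include <endian.h>
#include <byteswap.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <stdlib.h>

int main()
{
        int fd;
        ssize_t rc;
        unsigned value = 0;

        fd = open("/sys/class/infiniband/mthca0/device/resource0" ,O_RDWR | O_SYNC);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* Read the 4-byte word at offset 0xf0014 with pread instead of mmap. */
        rc = pread(fd, &value, 4, 0xf0014);
        if (rc < 0) {
                perror("pread");
                return 1;
        }
        if (rc != 4) {
                fprintf(stderr, "short read: %zd bytes\n", rc);
                return 1;
        }
        printf("0x%x\n", value);
        close(fd);
        return 0;
}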


-- 
MST


From mlakshmanan at silverstorm.com  Thu Sep 28 05:36:50 2006
From: mlakshmanan at silverstorm.com (Lakshmanan, Madhu)
Date: Thu, 28 Sep 2006 08:36:50 -0400
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
In-Reply-To: <1106.89.1.173.135.1159391291.squirrel@dev.mellanox.co.il>
Message-ID: <D80D83302DEE6249A221093BF2BB69AE937E63@mail.silverstorm.com>

>> Roland Dreier wrote:
>> Maybe we should just use the port GUID instead of the node GUID to
>> form the initiator ID?  That would solve this pretty cleanly I think.

> This is also Vu's idea.
>
> There are two issues:
>
> 1) My patch allows a sophisticated user to have two logical connections on
> the same physical solution. He can have different connection parameters
> (e.g., MAX_CMD_PER_LUN) according to the application needs.
> Do you think there is such need?
>
> 2) In the current implementation there is a problem when there are two
> connections on the same physical connection - when the second connection
> sends REQ to the target, the target sends a DREQ to the first connection,
> but when someone tries to access the first scsi_host, ib_srp tries to
> reconnect the first connection and then the second connection gets a DREQ
> - and so the ping pong goes.
> And if there is a multipath daemon that checks the status of the
> connections this ping pong can be for ever.
> We need to find a way to eliminate this behavior.

> Ishai

Silverstorm's native SRP implementation allows for the initiator ID to
be the port GUID and the initiator extension to be user-specified. This
approach is taken to initiate multiple connections to a single SRP
target from the same host; i.e. the initiator ID is kept the same (port
GUID) and a different initiator extension is specified.

Btw, could you point me to the latest source code? I didn't see it under
gen2/trunk/src/linux-kernel/infiniband/ulp/srp. I'd like to collaborate
with you on OFED SRP.

Madhu Lakshmanan
SilverStorm Technologies




From mst at mellanox.co.il  Thu Sep 28 05:53:57 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 15:53:57 +0300
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
In-Reply-To: <D80D83302DEE6249A221093BF2BB69AE937E63@mail.silverstorm.com>
References: <1106.89.1.173.135.1159391291.squirrel@dev.mellanox.co.il>
	<D80D83302DEE6249A221093BF2BB69AE937E63@mail.silverstorm.com>
Message-ID: <20060928125357.GA28381@mellanox.co.il>

Quoting r. Lakshmanan, Madhu <mlakshmanan at silverstorm.com>:
> gen2/trunk/src/linux-kernel/infiniband/ulp/srp.

This is deprecated.
You can get the exact code used for OFED 1.1 from ofed git tree.
The instructions are here:
https://openib.org/svn/gen2/branches/1.1/ofed/docs/HOWTO.build_ofed


> I'd like to collaborate with you on OFED SRP.

Please note that OFED 1.1 is in freeze and only critical and documentation
fixes are accepted.

Note also that OFED is a distribution testing and packaging effort, not a development
effort.  OFED backports kernel.org code to older kernels, so there's no "OFED
SRP" as such: to get your work into the next OFED release you should just work
against the latest kernel.org tree, and get Roland to accept your patches.

-- 
MST


From mst at mellanox.co.il  Thu Sep 28 06:00:52 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 16:00:52 +0300
Subject: [openib-general] [PATCH] IB/SRP: Enable multichannel
In-Reply-To: <adar6xx821b.fsf@cisco.com>
References: <20060926144541.GA17938@mellanox.co.il>
	<4519E86D.9030508@mellanox.com> <20060927071059.GA21509@mellanox.co.il>
	<451A95D5.7060409@mellanox.com> <451AA4ED.7010501@mellanox.com>
	<adar6xx821b.fsf@cisco.com>
Message-ID: <20060928130052.GB28381@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/SRP: Enable multichannel
> 
> Maybe we should just use the port GUID instead of the node GUID to
> form the initiator ID?  That would solve this pretty cleanly I think.

Sounds good.
I think we should also stick the pkey into the identifier extension -
I think it's nice for each partition to be able to act as a separate virtual network,
not affecting others.

What do you think?

-- 
MST


From mst at mellanox.co.il  Thu Sep 28 06:40:30 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 16:40:30 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
In-Reply-To: <D4F8F0B3820E754C887699BEF26A8940EB85E9@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A8940EB85E9@taurus.voltaire.com>
Message-ID: <20060928134029.GA25913@mellanox.co.il>

Quoting r. Moshe Kazir <moshek at voltaire.com>:
> 
> Quoting r. Moshe Kazir <moshek at voltaire.com>:
> > Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is not 
> > loaded on AMD
> > 
> > 
> >  # ls /sys/class/infiniband/mthca0/device/resource0
> > /sys/class/infiniband/mthca0/device/resource0
> 
> OK, so can you try this please:
> 
> strace -f -v -o log  mstflint -d
> /sys/class/infiniband/mthca0/device/resource0 q
> 
> cat log
> 
> -- 
> MST


> 30463 open("/sys/class/infiniband/mthca0/device/resource0", O_RDWR|O_SYNC|O_LARGEFILE) = 3
> 30463 mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = -1 EINVAL (Invalid argument)

So we see that mmap is failing with EINVAL.
But why? We seem to be passing all valid parameters to it.

I'm looking at arch/ppc/kernel/pci.c at the moment.
It seems that EINVAL is returned if __pci_mmap_make_offset
fails, and that seems to be only looking for a valid resource size.

Are you up to finding the root cause of the problem in arch/ppc/kernel/pci.c?

Maybe the resource offsets are wrong? What does
cat /sys/class/infiniband/mthca0/device/resource
show?
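
For reference when reading that file: each line of the sysfs resource file holds three hex numbers (start, end, flags) for one region. A minimal sketch of turning one line into a size, so it can be compared against the length passed to mmap, might look like this. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* One line of the sysfs "resource" file: start, end and flags. */
struct pci_res {
    unsigned long long start;
    unsigned long long end;
    unsigned long long flags;
};

/* Parse "0x<start> 0x<end> 0x<flags>"; returns 0 on success, -1 on error. */
int parse_resource_line(const char *line, struct pci_res *res)
{
    assert(line != NULL && res != NULL);

    if (sscanf(line, "%llx %llx %llx",
               &res->start, &res->end, &res->flags) != 3)
        return -1;
    return 0;
}

/* Size of the region in bytes; 0 for an unused (all-zero) or bogus entry. */
unsigned long long resource_size(const struct pci_res *res)
{
    assert(res != NULL);

    if (res->end <= res->start)
        return 0;
    return res->end - res->start + 1;
}
```

For resource0 the size computed this way would be expected to match the 1048576 bytes that ls reports for the file.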

Maybe there's some problem to map a full megabyte?
Here's a test that only maps 4K. Could you strace it please?

>>>>>>>>>>>

#define _XOPEN_SOURCE 500
#define _FILE_OFFSET_BITS 64

#include <stdio.h>

#include <unistd.h>

#include <netinet/in.h>
#include <endian.h>
#include <byteswap.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <stdlib.h>

#include <sys/pci.h>
#include <sys/ioctl.h>

#include <sys/mman.h>
#include <sys/stat.h>

int main()
{
        int fd;
        unsigned value = 0;
        volatile void *ptr;
        fd = open("/proc/bus/pci/00/00.0" ,O_RDWR | O_SYNC);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* ioctl(fd, PCIIOC_MMAP_IS_MEM); */
        /* Map a single 4K page at offset 0xf0000 instead of the full megabyte. */
        ptr = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0xf0000);
        if (ptr == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        memcpy(&value, (char *)ptr + 0x14, sizeof value);
        printf("0x%x\n", value);
        return 0;
}
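
The test above hardcodes a 4K page at 0xf0000. A more general sketch, assuming the register offset is arbitrary and the system page size may differ (for example 64K pages on some PPC64 kernels), would round the offset down to a page boundary first. mmap_read_u32 is a hypothetical helper; it opens the file read-only, which is an assumption that differs from the O_RDWR | O_SYNC used above.

```c
#define _XOPEN_SOURCE 500
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Read one 32-bit word at an arbitrary byte offset by mapping only the
 * page that contains it. Returns 0 on success, -1 on failure.
 * Hypothetical helper for illustration; not an mstflint function.
 */
int mmap_read_u32(const char *path, off_t offset, uint32_t *value)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t base, delta;
    void *map;
    int fd;

    assert(path != NULL && value != NULL);

    if (page <= 0 || offset < 0)
        return -1;

    base = offset & ~((off_t)page - 1);     /* page-aligned mmap offset */
    delta = offset - base;                  /* position inside the page */
    if ((size_t)delta + sizeof *value > (size_t)page)
        return -1;                          /* word would straddle pages */

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, base);
    close(fd);                              /* mapping survives the close */
    if (map == MAP_FAILED)
        return -1;

    memcpy(value, (const unsigned char *)map + delta, sizeof *value);
    munmap(map, (size_t)page);
    return 0;
}
```

With offset 0xf0014 this maps the page at 0xf0000 on both 4K and 64K page systems and reads at the matching position inside it.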


-- 
MST


From moshek at voltaire.com  Thu Sep 28 06:59:14 2006
From: moshek at voltaire.com (Moshe Kazir)
Date: Thu, 28 Sep 2006 16:59:14 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
Message-ID: <D4F8F0B3820E754C887699BEF26A8940EB85ED@taurus.voltaire.com>

Michael,

Frank found the cause of the problem in the implementation of
arch/ppc/kernel/pci.c,
and asked the IBM kernel group to send a bug fix to the Linux kernel
group.

The problem is:

1. This bug fix will not enter SLES10 as it is closed.
2. It also will not enter SLES9 :-) or Red Hat AS4 U4.

So we need a bug fix that will enable the use of mstflint on js21 PPC64
+ backport to old systems.

Frank's fix is based on two points (if I understand the code
correctly) -

1. It opens /proc/bus/pci... and not /sys/bus/pci/...
2. It performs an ioctl(fd, PCIIOC_MMAP_IS_MEM);
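
The two points above can be sketched as the following sequence. This is only an illustration of the open / ioctl / mmap order, not Frank's actual patch; map_pci_mem is a hypothetical name, and the meaning of the offset argument for a /proc/bus/pci file (assumed here to be chosen by the caller) is not confirmed by this thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/pci.h>      /* PCIIOC_MMAP_IS_MEM */
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Open a /proc/bus/pci/<bus>/<dev>.<fn> file, select memory-space
 * mapping with PCIIOC_MMAP_IS_MEM, then mmap the requested range.
 * Returns the mapping, or MAP_FAILED if any step fails.
 * Hypothetical helper for illustration only.
 */
void *map_pci_mem(const char *proc_path, off_t offset, size_t len)
{
    void *map;
    int fd;

    assert(proc_path != NULL);

    fd = open(proc_path, O_RDWR | O_SYNC);
    if (fd < 0)
        return MAP_FAILED;

    /* Point 2: tell the kernel the following mmap targets memory space. */
    if (ioctl(fd, PCIIOC_MMAP_IS_MEM) < 0) {
        close(fd);
        return MAP_FAILED;
    }

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    close(fd);              /* an established mapping survives the close */
    return map;
}
```

A caller would check the result against MAP_FAILED and munmap the range when done.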

Frank - am I right?

Can we put these two small changes into mstflint to have it working
on the PPC64 js21?

Moshe 


____________________________________________________________
Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
 
Voltaire - The Grid Backbone
 
 www.voltaire.com

  


From mst at mellanox.co.il  Thu Sep 28 07:17:15 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 17:17:15 +0300
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
In-Reply-To: <D4F8F0B3820E754C887699BEF26A8940EB85ED@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A8940EB85ED@taurus.voltaire.com>
Message-ID: <20060928141715.GB28790@mellanox.co.il>

Quoting r. Moshe Kazir <moshek at voltaire.com>:
> Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is not loaded on AMD
> 
> Michael,
> 
> Frank found the cause of the problem in the implementation of
> arch/ppc/kernel/pci.c,
> and asked the IBM kernel group to send a bug fix to the Linux kernel
> group.
> 
> The problem is:
> 
> 1. This bug fix will not enter SLES10 as it is closed.
> 2. It also will not enter SLES9 :-) or Red Hat AS4 U4.
> 
> So we need a bug fix that will enable the use of mstflint on js21 PPC64
> + backport to old systems.

OK, cool, but could I see this discussion/patch please, to understand the
solution?
Just googling for Frank's name only gets me something related to SIOCGIFCONF
ioctl.

> Frank's fix is based on two points (if I understand the code
> correctly) -
> 
> 1. It opens /proc/bus/pci... and not /sys/bus/pci/...
> 2. It performs an ioctl(fd, PCIIOC_MMAP_IS_MEM);
> 
> Frank - am I right?
> 
> Can we put these two small changes into mstflint to have it working
> on the PPC64 js21?

Oh, I was under the impression that we were falling back on pread/pwrite from /proc,
which is not safe without locking.

-- 
MST


From rdreier at cisco.com  Thu Sep 28 07:52:35 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 07:52:35 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060928060817.GD23828@mellanox.co.il> (Michael S.
	Tsirkin's message of "Thu, 28 Sep 2006 09:08:17 +0300")
References: <ada3bad7zaa.fsf@cisco.com> <20060928060817.GD23828@mellanox.co.il>
Message-ID: <adad59g6o24.fsf@cisco.com>

    Michael> Fair enough, let's start simple.  BTW, are you going to
    Michael> post the rewritten NAPI patch for testing soon?

Yes.  I need to finish the driver changes first.


From rdreier at cisco.com  Thu Sep 28 07:53:03 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 07:53:03 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060928083909.GF25010@mellanox.co.il> (Michael S.
	Tsirkin's message of "Thu, 28 Sep 2006 11:39:09 +0300")
References: <OF0ECEEC20.3A8E1746-ON872571F6.001FCAC6-882571F6.00073B6D@us.ibm.com>
	<20060927062316.GO24009@mellanox.co.il> <aday7s56kn6.fsf@cisco.com>
	<20060928083909.GF25010@mellanox.co.il>
Message-ID: <ada8xk46o1c.fsf@cisco.com>

    Michael> Sounds good.  Fancy taking it up now, or should I look
    Michael> into this?

Go ahead and work on it -- I've been meaning to for a year or so, and
I haven't started yet.


From rdreier at cisco.com  Thu Sep 28 07:53:24 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 07:53:24 -0700
Subject: [openib-general] Compile warnings (cross build)
In-Reply-To: <20060928062919.GH23828@mellanox.co.il> (Michael S.
	Tsirkin's message of "Thu, 28 Sep 2006 09:29:19 +0300")
References: <20060926135114.1da96c1b@freekitty>
	<adaac4l7zjd.fsf@cisco.com> <20060928062919.GH23828@mellanox.co.il>
Message-ID: <ada4pus6o0r.fsf@cisco.com>

    Michael> BTW, is there some printk format to print u64 type?

Not that I know of.
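
One common workaround, shown here as a userspace sketch with snprintf standing in for printk, is to cast the value to unsigned long long and use an ll format. format_u64_hex is a hypothetical helper name; the same cast-plus-%llx idea is assumed to carry over to printk.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value as 16 hex digits. The cast makes the argument
 * match %llx regardless of how the 64-bit type is defined on the
 * architecture. Returns the number of characters snprintf would write.
 */
int format_u64_hex(char *buf, size_t len, uint64_t v)
{
    assert(buf != NULL);
    return snprintf(buf, len, "0x%016llx", (unsigned long long)v);
}
```

The cast is what removes the format warning on architectures where the 64-bit type is unsigned long rather than unsigned long long.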


From bos at pathscale.com  Thu Sep 28 07:58:26 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 07:58:26 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <20060928062723.GG23828@mellanox.co.il>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
Message-ID: <1159455506.11976.1.camel@chalcedony.pathscale.com>

On Thu, 2006-09-28 at 09:27 +0300, Michael S. Tsirkin wrote:

> Me too.

Roland and I (following his example) checked in changes to the mthca and
ipath drivers in SVN yesterday that add a #warning to a core driver
source file saying "don't look here, look over there!"  That's a first
step towards dropping the drivers from SVN trunk altogether.

	<b


From mst at mellanox.co.il  Thu Sep 28 08:15:49 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 18:15:49 +0300
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <adad59g6o24.fsf@cisco.com>
References: <ada3bad7zaa.fsf@cisco.com>
	<20060928060817.GD23828@mellanox.co.il> <adad59g6o24.fsf@cisco.com>
Message-ID: <20060928151549.GG28790@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] IB/ipoib: NAPI
> 
>     Michael> Fair enough, let's start simple.  BTW, are you going to
>     Michael> post the rewritten NAPI patch for testing soon?
> 
> Yes.  I need to finish the driver changes first.

Looked pretty simple at the outset, but oh well. Keep us posted.

-- 
MST


From mst at mellanox.co.il  Thu Sep 28 08:18:14 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 28 Sep 2006 18:18:14 +0300
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159455506.11976.1.camel@chalcedony.pathscale.com>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
Message-ID: <20060928151814.GH28790@mellanox.co.il>

Quoting r. Bryan O'Sullivan <bos at pathscale.com>:
> Subject: Re: 2.6.18 kernel support in the main trunk.
> 
> On Thu, 2006-09-28 at 09:27 +0300, Michael S. Tsirkin wrote:
> 
> > Me too.
> 
> Roland and I (following his example) checked in changes to the mthca and
> ipath drivers in SVN yesterday that add a #warning to a core driver
> source file saying "don't look here, look over there!"  That's a first
> step towards dropping the drivers from SVN trunk altogether.

Good idea.

-- 
MST


From kliteyn at dev.mellanox.co.il  Thu Sep 28 08:16:36 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 28 Sep 2006 18:16:36 +0300
Subject: [openib-general] [PATCH 1/2] osm: osmtest ignores error status
Message-ID: <yzslko4owbv.fsf@kliteynik.yok.mtl.com>

Hi Hal.

This patch takes care of several cases where osmtest
ignored error status.

Yevgeny

Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Index: osmt_slvl_vl_arb.c
===================================================================
--- osmt_slvl_vl_arb.c	(revision 9661)
+++ osmt_slvl_vl_arb.c	(working copy)
@@ -164,12 +164,9 @@ osmt_query_vl_arb(
 
   if( status != IB_SUCCESS )
   {
-    if (status != IB_INVALID_PARAMETER)
-    {
-      osm_log( &p_osmt->log, OSM_LOG_ERROR,
-               "osmt_query_vl_arb: ERR 0466: "
-               "ib_query failed (%s)\n", ib_get_err_str( status ) );
-    }
+    osm_log( &p_osmt->log, OSM_LOG_ERROR,
+             "osmt_query_vl_arb: ERR 0466: "
+             "ib_query failed (%s)\n", ib_get_err_str( status ) );
 
     if( status == IB_REMOTE_ERROR )
     {
@@ -385,12 +382,9 @@ osmt_query_slvl_map(
 
   if( status != IB_SUCCESS )
   {
-    if (status != IB_INVALID_PARAMETER)
-    {
-      osm_log( &p_osmt->log, OSM_LOG_ERROR,
-               "osmt_query_slvl_map: ERR 0470: "
-               "ib_query failed (%s)\n", ib_get_err_str( status ) );
-    }
+    osm_log( &p_osmt->log, OSM_LOG_ERROR,
+             "osmt_query_slvl_map: ERR 0470: "
+             "ib_query failed (%s)\n", ib_get_err_str( status ) );
 
     if( status == IB_REMOTE_ERROR )
     {
Index: osmt_inform.c
===================================================================
--- osmt_inform.c	(revision 9661)
+++ osmt_inform.c	(working copy)
@@ -103,6 +103,7 @@ osmt_bind_inform_qp( IN osmtest_t * cons
     osm_log( p_log, OSM_LOG_ERROR,
              "osmt_bind_inform_qp: ERR 0109: "
              "Unable to obtain CA and port (%d).\n" );
+    status = IB_ERROR;
     goto Exit;
   }
 
@@ -579,6 +580,7 @@ osmt_send_trap_wait_for_forward( IN osmt
                "Did not receive a Report(Notice) but attr:%d\n",
                cl_ntoh16(p_sa_mad->attr_id)
                );
+      status = IB_ERROR;
     }
   }
   else
@@ -588,6 +590,7 @@ osmt_send_trap_wait_for_forward( IN osmt
              "Received an Unexpected Method:%d\n",
              p_smp->method
              );
+    status = IB_ERROR;
   }
 
  Exit:
@@ -666,6 +669,7 @@ osmt_trap_wait( IN osmtest_t * const    
                "Did not receive a Report(Notice) but attr:%d\n",
                cl_ntoh16(p_sa_mad->attr_id)
                );
+      status = IB_ERROR;
     }
   }
   else
@@ -675,6 +679,7 @@ osmt_trap_wait( IN osmtest_t * const    
              "Received an Unexpected Method:%d\n",
              p_smp->method
              );
+    status = IB_ERROR;
   }
 
  Exit:
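
The pattern these hunks fix can be shown in isolation. The sketch below is a simplified stand-in, not osmtest code: the status type, the constants and check_reply are hypothetical, and only the shape (log the problem, set the status, then fall through to the exit label) mirrors the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the status type and MAD constants. */
typedef enum { ST_SUCCESS = 0, ST_ERROR = 1 } status_t;

#define METHOD_REPORT 0x06      /* illustrative value */
#define ATTR_NOTICE   0x0002    /* illustrative value */

struct reply {
    int method;
    int attr_id;
};

/* Validate a reply; every failure path must set status before Exit. */
status_t check_reply(const struct reply *r)
{
    status_t status = ST_SUCCESS;

    assert(r != NULL);

    if (r->method != METHOD_REPORT) {
        fprintf(stderr, "Received an Unexpected Method:%d\n", r->method);
        status = ST_ERROR;      /* without this the caller sees success */
        goto Exit;
    }
    if (r->attr_id != ATTR_NOTICE) {
        fprintf(stderr, "Did not receive a Report(Notice) but attr:%d\n",
                r->attr_id);
        status = ST_ERROR;
    }

Exit:
    return status;
}
```

Because status starts out as success, any branch that only logs and jumps to the exit label silently reports success, which is the behavior the patch removes.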


From or.gerlitz at gmail.com  Thu Sep 28 08:20:57 2006
From: or.gerlitz at gmail.com (Or Gerlitz)
Date: Thu, 28 Sep 2006 17:20:57 +0200
Subject: [openib-general] [RFC] determining which changes in svn to
 merge upstream or remove
In-Reply-To: <000001c6e0ff$37474de0$a440e984@amr.corp.intel.com>
References: <000001c6e0ff$37474de0$a440e984@amr.corp.intel.com>
Message-ID: <15ddcffd0609280820r6e36d834q2c6e8802b180ae25@mail.gmail.com>

On 9/26/06, Sean Hefty <sean.hefty at intel.com> wrote:

> Specifically, the following features are in svn only:
> * RDMA CM:
>         - userspace support
>         - multicast support
>         - UD QP support (required for multicast)

I think all three of the above should be prioritized for pushing into 2.6.20,
and it would be good to have them in -mm before that so people can experiment
with them first (the latter two are not part of OFED).

The user space support for writing IB/RC ULPs is now in OFED and is used
by uDAPL, which is in turn used by Intel MPI and, I hope, more products to come.

Exposing the IB/UD/Mcast RDMA CM API in user space would allow offloading
UDP-based ULPs which use IP multicast, which is something that is also being
discussed here and there.

It makes sense to use the IB multicast module in the kernel for
keeping refs and managing the SA Join/Leave interaction of user space
processes. It also makes sense to port IPoIB to this module; I saw the
patch and it basically seems straightforward.

Or.


From kliteyn at dev.mellanox.co.il  Thu Sep 28 08:20:28 2006
From: kliteyn at dev.mellanox.co.il (Yevgeny Kliteynik)
Date: Thu, 28 Sep 2006 18:20:28 +0300
Subject: [openib-general] [PATCH 2/2] osm: osmtest ignores error status
Message-ID: <yzs8xk45877.fsf@kliteynik.yok.mtl.com>

Hi Hal.

This patch takes care of several cases where osmtest
ignored error status (plus some cosmetics).
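
As a rough sketch of the ordering this patch enforces (check the query
status first, and only then look at the returned record), something like
the following captures the idea. The type and function names here are
made up for illustration and are not osmtest or vendor-layer APIs:

```c
#include <stddef.h>

/* Hypothetical stand-ins for ib_api_status_t and the query context. */
enum query_status { QUERY_OK, QUERY_REMOTE_ERROR };

struct query_result {
	enum query_status status; /* status reported by the query itself */
	const int *record;        /* only meaningful when status == QUERY_OK */
};

/*
 * Validate a query result against an expected key.
 * The status is examined before the record is touched, so a failed
 * query is reported as such and its (possibly absent) record is never
 * dereferenced.
 */
static enum query_status check_service_record(const struct query_result *res,
					      int expected_key)
{
	if (res->status != QUERY_OK)
		return res->status;

	if (res->record == NULL || *res->record != expected_key)
		return QUERY_REMOTE_ERROR; /* data mismatch */

	return QUERY_OK;
}
```

With the check placed after the record comparison instead, a failed query
could be reported as a data mismatch, or the record could be read when
none was returned.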

Yevgeny

Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Index: osmt_service.c
===================================================================
--- osmt_service.c	(revision 9661)
+++ osmt_service.c	(working copy)
@@ -60,6 +60,9 @@
 #include <complib/cl_debug.h>
 #include "osmtest.h"
 
+/**********************************************************************
+ **********************************************************************/
+
 ib_api_status_t
 osmt_register_service( IN osmtest_t * const p_osmt,
                        IN ib_net64_t      service_id,
@@ -174,6 +177,9 @@ osmt_register_service( IN osmtest_t * co
   return status;
 }
 
+/**********************************************************************
+ **********************************************************************/
+
 ib_api_status_t
 osmt_register_service_with_full_key ( IN osmtest_t * const p_osmt,
                                       IN ib_net64_t      service_id,
@@ -260,6 +266,23 @@ osmt_register_service_with_full_key ( IN
   }
 
   status = context.result.status;
+  if( status != IB_SUCCESS )
+  {
+    osm_log( &p_osmt->log, OSM_LOG_ERROR,
+             "osmt_register_service_with_full_key: ERR 4A04: "
+             "ib_query failed (%s)\n", ib_get_err_str( status ) );
+
+    if( status == IB_REMOTE_ERROR )
+    {
+      osm_log( &p_osmt->log, OSM_LOG_ERROR,
+               "osmt_register_service_with_full_key: "
+               "Remote error = %s\n",
+               ib_get_mad_status_str( osm_madw_get_mad_ptr
+                                      ( context.result.
+                                        p_result_madw ) ) );
+    }
+    goto Exit;
+  }
 
   /*  Check service key on context to see if match */
   p_rec = osmv_get_query_svc_rec( context.result.p_result_madw, 0 );
@@ -277,30 +300,12 @@ osmt_register_service_with_full_key ( IN
   {
     status = IB_REMOTE_ERROR;
     osm_log( &p_osmt->log, OSM_LOG_ERROR,
-             "osmt_register_service_with_full_key:"
+             "osmt_register_service_with_full_key: ERR 4A34: "
              "Data mismatch in service_key\n"
              );
     goto Exit;
   }
 
-  if( status != IB_SUCCESS )
-  {
-    osm_log( &p_osmt->log, OSM_LOG_ERROR,
-             "osmt_register_service_with_full_key: ERR 4A04: "
-             "ib_query failed (%s)\n", ib_get_err_str( status ) );
-
-    if( status == IB_REMOTE_ERROR )
-    {
-      osm_log( &p_osmt->log, OSM_LOG_ERROR,
-               "osmt_register_service_with_full_key: "
-               "Remote error = %s\n",
-               ib_get_mad_status_str( osm_madw_get_mad_ptr
-                                      ( context.result.
-                                        p_result_madw ) ) );
-    }
-    goto Exit;
-  }
-
  Exit:
   if( context.result.p_result_madw != NULL )
   {
@@ -312,6 +317,9 @@ osmt_register_service_with_full_key ( IN
   return status;
 }
 
+/**********************************************************************
+ **********************************************************************/
+
 ib_api_status_t
 osmt_register_service_with_data( IN osmtest_t * const p_osmt,
                                  IN ib_net64_t      service_id,
@@ -478,6 +486,9 @@ osmt_register_service_with_data( IN osmt
   return status;
 }
 
+/**********************************************************************
+ **********************************************************************/
+
 ib_api_status_t
 osmt_get_service_by_id_and_name ( IN osmtest_t * const p_osmt,
                                   IN uint32_t rec_num,
@@ -618,6 +629,9 @@ osmt_get_service_by_id_and_name ( IN osm
   return status;
 }
 
+/**********************************************************************
+ **********************************************************************/
+
 ib_api_status_t
 osmt_get_service_by_id ( IN osmtest_t * const p_osmt,
                          IN uint32_t rec_num,
@@ -755,6 +769,9 @@ osmt_get_service_by_id ( IN osmtest_t * 
   return status;
 }
 
+/**********************************************************************
+ **********************************************************************/
+
 ib_api_status_t
 osmt_get_service_by_name_and_key ( IN osmtest_t * const p_osmt,
                                    IN char * sr_name,
@@ -907,6 +924,9 @@ osmt_get_service_by_name_and_key ( IN os
   return status;
 }
 
+/**********************************************************************
+ **********************************************************************/
+
 ib_api_status_t
 osmt_get_service_by_name( IN osmtest_t * const p_osmt,
                           IN char * sr_name,
@@ -1036,6 +1056,9 @@ osmt_get_service_by_name( IN osmtest_t *
   return status;
 }
 
+/**********************************************************************
+ **********************************************************************/
+
 #ifdef VENDOR_RMPP_SUPPORT
 ib_api_status_t
 osmt_get_all_services_and_check_names( IN osmtest_t * const p_osmt,
@@ -1170,6 +1193,9 @@ osmt_get_all_services_and_check_names( I
 }
 #endif
 
+/**********************************************************************
+ **********************************************************************/
+
 ib_api_status_t
 osmt_delete_service_by_name(IN osmtest_t * const p_osmt,
                             IN uint8_t IsServiceExist,
@@ -1293,6 +1319,9 @@ osmt_delete_service_by_name(IN osmtest_t
   return status;
 }
 
+/**********************************************************************
+ **********************************************************************/
+
 /*
  * Run a complete service records flow:
  * - register a service


From jlentini at netapp.com  Thu Sep 28 08:48:10 2006
From: jlentini at netapp.com (James Lentini)
Date: Thu, 28 Sep 2006 11:48:10 -0400 (EDT)
Subject: [openib-general] Compile warnings (cross build)
In-Reply-To: <ada4pus6o0r.fsf@cisco.com>
References: <20060926135114.1da96c1b@freekitty>
	<adaac4l7zjd.fsf@cisco.com> <20060928062919.GH23828@mellanox.co.il>
	<ada4pus6o0r.fsf@cisco.com>
Message-ID: <Pine.LNX.4.64.0609281145140.9963@jlentini-linux.nane.netapp.com>


On Thu, 28 Sep 2006, Roland Dreier wrote:

>     Michael> BTW, is there some printk format to print u64 type?

Try "%Lu". That will print an unsigned long long value.
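
A small userspace sketch of the same idea, assuming snprintf in place of
printk and using the standard C spelling "%llu" for the same width. The
helper name format_u64 is made up for illustration. Because a 64-bit
type may be plain unsigned long on some 64-bit architectures, the
explicit cast keeps the argument in agreement with the format:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a 64-bit unsigned value.
 * The cast to unsigned long long makes the argument type match "%llu"
 * regardless of how the 64-bit typedef is defined on the platform.
 */
static int format_u64(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "%llu", (unsigned long long)value);
}
```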


From bos at pathscale.com  Thu Sep 28 08:59:58 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 08:59:58 -0700
Subject: [openib-general] [PATCH 2 of 28] IB/ipath - fix memory leak if
	allocation fails
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <45079acba20851290d1f.1159459198@eng-12.pathscale.com>

If the second allocation failed, the first structure allocated in this
routine was not freed.
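
The general shape of the fix, as a userspace sketch: when the second
allocation fails, release the first and clear its pointer before
returning. The struct, the function names and the injectable allocators
below are assumptions made for illustration; the driver itself uses
dma_alloc_coherent() and dma_free_coherent().

```c
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t);

struct rcv_queue {
	void *hdrq;      /* first allocation */
	void *hdrq_tail; /* second allocation */
};

/* Allocator that always fails, used only to exercise the error path. */
static void *always_fail(size_t len)
{
	(void)len;
	return NULL;
}

/*
 * Returns 0 on success and -1 on failure. On failure nothing remains
 * allocated and no stale pointer is left behind in *q.
 * Both allocators must return memory that free() can release.
 */
static int rcv_queue_create(struct rcv_queue *q,
			    alloc_fn hdrq_alloc, alloc_fn tail_alloc)
{
	q->hdrq = hdrq_alloc(64);
	if (!q->hdrq)
		return -1;

	q->hdrq_tail = tail_alloc(8);
	if (!q->hdrq_tail) {
		/* Undo the first allocation so it does not leak. */
		free(q->hdrq);
		q->hdrq = NULL;
		return -1;
	}
	return 0;
}
```

Clearing the pointer after freeing matters as much as the free itself,
since later cleanup code may test it to decide whether to free again.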

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r c46292ccb0f5 -r 45079acba208 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
@@ -1326,6 +1326,9 @@ int ipath_create_rcvhdrq(struct ipath_de
 				      "for port %u rcvhdrqtailaddr failed\n",
 				      pd->port_port);
 			ret = -ENOMEM;
+			dma_free_coherent(&dd->pcidev->dev, amt,
+					  pd->port_rcvhdrq, pd->port_rcvhdrq_phys);
+			pd->port_rcvhdrq = NULL;
 			goto bail;
 		}
 		pd->port_rcvhdrqtailaddr_phys = phys_hdrqtail;


From bos at pathscale.com  Thu Sep 28 08:59:56 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 08:59:56 -0700
Subject: [openib-general] [PATCH 0 of 28] ipath patches for 2.6.19
Message-ID: <patchbomb.1159459196@eng-12.pathscale.com>

Hi, Roland -

This patch series brings the ipath driver almost up to date with what's
in our internal tree.  The only substantial thing missing is the
memcpy_cachebypass patch that I sent out a while back and haven't had
time to rework.

These patches have seen a lot of testing, including on a git snapshot
as of yesterday afternoon.  Please apply.

Thanks,

	<b


From bos at pathscale.com  Thu Sep 28 08:59:57 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 08:59:57 -0700
Subject: [openib-general] [PATCH 1 of 28] IB/ipath - limit # of packets sent
 without an ACK received
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <c46292ccb0f54abc77f7.1159459197@eng-12.pathscale.com>

The sender requests an ACK every 1/2 MB to avoid retransmit timeouts that
were causing MVAPICH mod_bw to fail after a predictable number of sends.
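
For readers following the diff below, here is a standalone sketch of the
credit-window arithmetic it relies on. PSNs are 24 bits wide, so
comparisons have to survive wraparound. The function names are made up
for illustration, and cmp24 here is a rewrite under that assumption, not
a copy of the driver's ipath_cmp24():

```c
#include <stdint.h>

#define PSN_CREDIT 2048 /* mirrors IPATH_PSN_CREDIT in the patch */

/*
 * Compare two 24-bit PSNs modulo 2^24. The difference is shifted into
 * the top 24 bits of a 32-bit word so that its sign bit reflects the
 * ordering: negative if a is before b, zero if equal, positive if after.
 */
static int32_t cmp24(uint32_t a, uint32_t b)
{
	return (int32_t)((uint32_t)(a - b) << 8);
}

/* Nonzero if the sender has run out of credit and must wait for an ACK. */
static int must_wait_for_ack(uint32_t psn, uint32_t last_acked_psn)
{
	return cmp24(psn, last_acked_psn + PSN_CREDIT) > 0;
}

/* Nonzero if this packet should carry the ACK-request bit. */
static int should_request_ack(uint32_t psn, uint32_t last_acked_psn)
{
	return cmp24(psn, last_acked_psn + PSN_CREDIT - 1) >= 0;
}
```

The ACK request is raised one packet before the window closes, so in the
common case the ACK arrives and reopens the window before the sender
actually has to stall.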

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r f1b431dca1f9 -r c46292ccb0f5 drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c	Thu Sep 28 08:57:12 2006 -0700
@@ -342,6 +342,7 @@ static void ipath_reset_qp(struct ipath_
 	qp->s_last = 0;
 	qp->s_ssn = 1;
 	qp->s_lsn = 0;
+	qp->s_wait_credit = 0;
 	if (qp->r_rq.wq) {
 		qp->r_rq.wq->head = 0;
 		qp->r_rq.wq->tail = 0;
@@ -516,7 +517,7 @@ int ipath_modify_qp(struct ib_qp *ibqp, 
 		qp->remote_qpn = attr->dest_qp_num;
 
 	if (attr_mask & IB_QP_SQ_PSN) {
-		qp->s_next_psn = attr->sq_psn;
+		qp->s_psn = qp->s_next_psn = attr->sq_psn;
 		qp->s_last_psn = qp->s_next_psn - 1;
 	}
 
diff -r f1b431dca1f9 -r c46292ccb0f5 drivers/infiniband/hw/ipath/ipath_rc.c
--- a/drivers/infiniband/hw/ipath/ipath_rc.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c	Thu Sep 28 08:57:12 2006 -0700
@@ -201,6 +201,18 @@ int ipath_make_rc_req(struct ipath_qp *q
 	    qp->s_rnr_timeout)
 		goto done;
 
+	/* Limit the number of packets sent without an ACK. */
+	if (ipath_cmp24(qp->s_psn, qp->s_last_psn + IPATH_PSN_CREDIT) > 0) {
+		qp->s_wait_credit = 1;
+		dev->n_rc_stalls++;
+		spin_lock(&dev->pending_lock);
+		if (list_empty(&qp->timerwait))
+			list_add_tail(&qp->timerwait,
+				      &dev->pending[dev->pending_index]);
+		spin_unlock(&dev->pending_lock);
+		goto done;
+	}
+
 	/* header size in 32-bit words LRH+BTH = (8+12)/4. */
 	hwords = 5;
 	bth0 = 0;
@@ -221,7 +233,7 @@ int ipath_make_rc_req(struct ipath_qp *q
 			/* Check if send work queue is empty. */
 			if (qp->s_tail == qp->s_head)
 				goto done;
-			qp->s_psn = wqe->psn = qp->s_next_psn;
+			wqe->psn = qp->s_next_psn;
 			newreq = 1;
 		}
 		/*
@@ -393,12 +405,6 @@ int ipath_make_rc_req(struct ipath_qp *q
 		ss = &qp->s_sge;
 		len = qp->s_len;
 		if (len > pmtu) {
-			/*
-			 * Request an ACK every 1/2 MB to avoid retransmit
-			 * timeouts.
-			 */
-			if (((wqe->length - len) % (512 * 1024)) == 0)
-				bth2 |= 1 << 31;
 			len = pmtu;
 			break;
 		}
@@ -435,12 +441,6 @@ int ipath_make_rc_req(struct ipath_qp *q
 		ss = &qp->s_sge;
 		len = qp->s_len;
 		if (len > pmtu) {
-			/*
-			 * Request an ACK every 1/2 MB to avoid retransmit
-			 * timeouts.
-			 */
-			if (((wqe->length - len) % (512 * 1024)) == 0)
-				bth2 |= 1 << 31;
 			len = pmtu;
 			break;
 		}
@@ -498,6 +498,8 @@ int ipath_make_rc_req(struct ipath_qp *q
 		 */
 		goto done;
 	}
+	if (ipath_cmp24(qp->s_psn, qp->s_last_psn + IPATH_PSN_CREDIT - 1) >= 0)
+		bth2 |= 1 << 31;	/* Request ACK. */
 	qp->s_len -= len;
 	qp->s_hdrwords = hwords;
 	qp->s_cur_sge = ss;
@@ -737,6 +739,15 @@ bail:
 	return;
 }
 
+static inline void update_last_psn(struct ipath_qp *qp, u32 psn)
+{
+	if (qp->s_wait_credit) {
+		qp->s_wait_credit = 0;
+		tasklet_hi_schedule(&qp->s_task);
+	}
+	qp->s_last_psn = psn;
+}
+
 /**
  * do_rc_ack - process an incoming RC ACK
  * @qp: the QP the ACK came in on
@@ -805,7 +816,7 @@ static int do_rc_ack(struct ipath_qp *qp
 			 * The last valid PSN seen is the previous
 			 * request's.
 			 */
-			qp->s_last_psn = wqe->psn - 1;
+			update_last_psn(qp, wqe->psn - 1);
 			/* Retry this request. */
 			ipath_restart_rc(qp, wqe->psn, &wc);
 			/*
@@ -864,7 +875,7 @@ static int do_rc_ack(struct ipath_qp *qp
 		ipath_get_credit(qp, aeth);
 		qp->s_rnr_retry = qp->s_rnr_retry_cnt;
 		qp->s_retry = qp->s_retry_cnt;
-		qp->s_last_psn = psn;
+		update_last_psn(qp, psn);
 		ret = 1;
 		goto bail;
 
@@ -883,7 +894,7 @@ static int do_rc_ack(struct ipath_qp *qp
 			goto bail;
 
 		/* The last valid PSN is the previous PSN. */
-		qp->s_last_psn = psn - 1;
+		update_last_psn(qp, psn - 1);
 
 		dev->n_rc_resends += (int)qp->s_psn - (int)psn;
 
@@ -898,7 +909,7 @@ static int do_rc_ack(struct ipath_qp *qp
 	case 3:		/* NAK */
 		/* The last valid PSN seen is the previous request's. */
 		if (qp->s_last != qp->s_tail)
-			qp->s_last_psn = wqe->psn - 1;
+			update_last_psn(qp, wqe->psn - 1);
 		switch ((aeth >> IPATH_AETH_CREDIT_SHIFT) &
 			IPATH_AETH_CREDIT_MASK) {
 		case 0:	/* PSN sequence error */
@@ -1071,7 +1082,7 @@ static inline void ipath_rc_rcv_resp(str
 		 * since we don't want s_sge modified.
 		 */
 		qp->s_len -= pmtu;
-		qp->s_last_psn = psn;
+		update_last_psn(qp, psn);
 		spin_unlock_irqrestore(&qp->s_lock, flags);
 		ipath_copy_sge(&qp->s_sge, data, pmtu);
 		goto bail;
diff -r f1b431dca1f9 -r c46292ccb0f5 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c	Thu Sep 28 08:57:12 2006 -0700
@@ -1683,6 +1683,7 @@ static ssize_t show_stats(struct class_d
 		      "RC OTH NAKs %d\n"
 		      "RC timeouts %d\n"
 		      "RC RDMA dup %d\n"
+		      "RC stalls   %d\n"
 		      "piobuf wait %d\n"
 		      "no piobuf   %d\n"
 		      "PKT drops   %d\n"
@@ -1690,7 +1691,7 @@ static ssize_t show_stats(struct class_d
 		      dev->n_rc_resends, dev->n_rc_qacks, dev->n_rc_acks,
 		      dev->n_seq_naks, dev->n_rdma_seq, dev->n_rnr_naks,
 		      dev->n_other_naks, dev->n_timeouts,
-		      dev->n_rdma_dup_busy, dev->n_piowait,
+		      dev->n_rdma_dup_busy, dev->n_rc_stalls, dev->n_piowait,
 		      dev->n_no_piobuf, dev->n_pkt_drops, dev->n_wqe_errs);
 	for (i = 0; i < ARRAY_SIZE(dev->opstats); i++) {
 		const struct ipath_opcode_stats *si = &dev->opstats[i];
diff -r f1b431dca1f9 -r c46292ccb0f5 drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h	Thu Sep 28 08:57:12 2006 -0700
@@ -370,6 +370,7 @@ struct ipath_qp {
 	u8 s_rnr_retry_cnt;
 	u8 s_retry;		/* requester retry counter */
 	u8 s_rnr_retry;		/* requester RNR retry counter */
+	u8 s_wait_credit;	/* limit number of unacked packets sent */
 	u8 s_pkey_index;	/* PKEY index to use */
 	u8 timeout;		/* Timeout for this QP */
 	enum ib_mtu path_mtu;
@@ -392,6 +393,8 @@ struct ipath_qp {
  */
 #define IPATH_S_BUSY		0
 #define IPATH_S_SIGNAL_REQ_WR	1
+
+#define IPATH_PSN_CREDIT	2048
 
 /*
  * Since struct ipath_swqe is not a fixed size, we can't simply index into
@@ -521,6 +524,7 @@ struct ipath_ibdev {
 	u32 n_rnr_naks;
 	u32 n_other_naks;
 	u32 n_timeouts;
+	u32 n_rc_stalls;
 	u32 n_pkt_drops;
 	u32 n_vl15_dropped;
 	u32 n_wqe_errs;


From bos at pathscale.com  Thu Sep 28 09:00:00 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:00 -0700
Subject: [openib-general] [PATCH 4 of 28] IB/ipath - support revision 2
 InfiniPath PCIE devices
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <a69f8b7a8a04a8742e0f.1159459200@eng-12.pathscale.com>

This also entailed a little GPIO-interrupt general cleanup.
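
The core of that cleanup is that the GPIO interrupt mask is now shared
between the port 0 receive bit and the new error bits, so it has to be
updated by read-modify-write rather than overwritten. A minimal sketch,
with the register modelled as a plain value and helper names that are
assumptions for illustration only:

```c
#include <stdint.h>

#define GPIO_PORT0_BIT    2    /* mirrors IPATH_GPIO_PORT0_BIT */
#define GPIO_ERRINTR_MASK 0x38 /* bits 3..5, mirrors IPATH_GPIO_ERRINTR_MASK */

/* Set the given bits in a mask value, leaving all other bits alone. */
static uint64_t gpio_mask_enable(uint64_t mask, uint64_t bits)
{
	return mask | bits;
}

/* Clear the given bits in a mask value, leaving all other bits alone. */
static uint64_t gpio_mask_disable(uint64_t mask, uint64_t bits)
{
	return mask & ~bits;
}
```

Writing a constant to the mask, as the old code did, would silently turn
off whichever other group of interrupts had been enabled earlier.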

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r 7f5b6127be15 -r a69f8b7a8a04 drivers/infiniband/hw/ipath/ipath_common.h
--- a/drivers/infiniband/hw/ipath/ipath_common.h	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_common.h	Thu Sep 28 08:57:12 2006 -0700
@@ -186,6 +186,8 @@ typedef enum _ipath_ureg {
 #define IPATH_RUNTIME_FORCE_WC_ORDER	0x4
 #define IPATH_RUNTIME_RCVHDR_COPY	0x8
 #define IPATH_RUNTIME_MASTER	0x10
+#define IPATH_RUNTIME_PBC_REWRITE 0x20
+#define IPATH_RUNTIME_LOOSE_DMA_ALIGN 0x40
 
 /*
  * This structure is returned by ipath_userinit() immediately after
diff -r 7f5b6127be15 -r a69f8b7a8a04 drivers/infiniband/hw/ipath/ipath_iba6120.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c	Thu Sep 28 08:57:12 2006 -0700
@@ -294,6 +294,13 @@ static const struct ipath_cregs ipath_pe
 #define IPATH_GPIO_SCL (1ULL << \
 	(_IPATH_GPIO_SCL_NUM+INFINIPATH_EXTC_GPIOOE_SHIFT))
 
+/*
+ * Rev2 silicon allows suppressing check for ArmLaunch errors.
+ * this can speed up short packet sends on systems that do
+ * not guarantee write-order.
+ */
+#define INFINIPATH_XGXS_SUPPRESS_ARMLAUNCH_ERR (1ULL<<63)
+
 /**
  * ipath_pe_handle_hwerrors - display hardware errors.
  * @dd: the infinipath device
@@ -571,9 +578,12 @@ static void ipath_pe_init_hwerrors(struc
 	if (!dd->ipath_boardrev)	// no PLL for Emulator
 		val &= ~INFINIPATH_HWE_SERDESPLLFAILED;
 
-	/* workaround bug 9460 in internal interface bus parity checking */
-	val &= ~INFINIPATH_HWE_PCIEBUSPARITYRADM;
-
+	if (dd->ipath_minrev < 2) {
+		/* workaround bug 9460 in internal interface bus parity
+		 * checking. Fixed (HW bug 9490) in Rev2.
+		 */
+		val &= ~INFINIPATH_HWE_PCIEBUSPARITYRADM;
+	}
 	dd->ipath_hwerrmask = val;
 }
 
@@ -583,8 +593,8 @@ static void ipath_pe_init_hwerrors(struc
  */
 static int ipath_pe_bringup_serdes(struct ipath_devdata *dd)
 {
-	u64 val, tmp, config1;
-	int ret = 0, change = 0;
+	u64 val, tmp, config1, prev_val;
+	int ret = 0;
 
 	ipath_dbg("Trying to bringup serdes\n");
 
@@ -641,6 +651,7 @@ static int ipath_pe_bringup_serdes(struc
 	val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch);
 
 	val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig);
+	prev_val = val;
 	if (((val >> INFINIPATH_XGXS_MDIOADDR_SHIFT) &
 	     INFINIPATH_XGXS_MDIOADDR_MASK) != 3) {
 		val &=
@@ -648,11 +659,9 @@ static int ipath_pe_bringup_serdes(struc
 			  INFINIPATH_XGXS_MDIOADDR_SHIFT);
 		/* MDIO address 3 */
 		val |= 3ULL << INFINIPATH_XGXS_MDIOADDR_SHIFT;
-		change = 1;
 	}
 	if (val & INFINIPATH_XGXS_RESET) {
 		val &= ~INFINIPATH_XGXS_RESET;
-		change = 1;
 	}
 	if (((val >> INFINIPATH_XGXS_RX_POL_SHIFT) &
 	     INFINIPATH_XGXS_RX_POL_MASK) != dd->ipath_rx_pol_inv ) {
@@ -661,9 +670,19 @@ static int ipath_pe_bringup_serdes(struc
 		         INFINIPATH_XGXS_RX_POL_SHIFT);
 		val |= dd->ipath_rx_pol_inv <<
 			INFINIPATH_XGXS_RX_POL_SHIFT;
-		change = 1;
-	}
-	if (change)
+	}
+	if (dd->ipath_minrev >= 2) {
+		/* Rev 2. can tolerate multiple writes to PBC, and
+		 * allowing them can provide lower latency on some
+		 * CPUs, but this feature is off by default, only
+		 * turned on by setting D63 of XGXSconfig reg.
+		 * May want to make this conditional more
+		 * fine-grained in future. This is not exactly
+		 * related to XGXS, but where the bit ended up.
+		 */
+		val |= INFINIPATH_XGXS_SUPPRESS_ARMLAUNCH_ERR;
+	}
+	if (val != prev_val)
 		ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val);
 
 	val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_serdesconfig0);
@@ -717,9 +736,25 @@ static void ipath_pe_quiet_serdes(struct
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_serdesconfig0, val);
 }
 
-/* this is not yet needed on this chip, so just return 0. */
 static int ipath_pe_intconfig(struct ipath_devdata *dd)
 {
+	u64 val;
+	u32 chiprev;
+
+	/* 
+	 * If the chip supports added error indication via GPIO pins,
+	 * enable interrupts on those bits so the interrupt routine
+	 * can count the events. Also set flag so interrupt routine
+	 * can know they are expected.
+	 */
+	chiprev = dd->ipath_revision >> INFINIPATH_R_CHIPREVMINOR_SHIFT;
+	if ((chiprev & INFINIPATH_R_CHIPREVMINOR_MASK) > 1) {
+		/* Rev2+ reports extra errors via internal GPIO pins */
+		dd->ipath_flags |= IPATH_GPIO_ERRINTRS;
+		val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_gpio_mask);
+		val |= IPATH_GPIO_ERRINTR_MASK;
+		ipath_write_kreg( dd, dd->ipath_kregs->kr_gpio_mask, val);
+	}
 	return 0;
 }
 
@@ -1082,6 +1117,45 @@ static void ipath_pe_put_tid(struct ipat
 	mmiowb();
 	spin_unlock_irqrestore(&dd->ipath_tid_lock, flags);
 }
+/**
+ * ipath_pe_put_tid_2 - write a TID in chip, Revision 2 or higher
+ * @dd: the infinipath device
+ * @tidptr: pointer to the expected TID (in chip) to update
+ * @tidtype: 0 for eager, 1 for expected
+ * @pa: physical address of in memory buffer; ipath_tidinvalid if freeing
+ *
+ * This exists as a separate routine to allow for selection of the
+ * appropriate "flavor". The static calls in cleanup just use the
+ * revision-agnostic form, as they are not performance critical.
+ */
+static void ipath_pe_put_tid_2(struct ipath_devdata *dd, u64 __iomem *tidptr,
+			     u32 type, unsigned long pa)
+{
+	u32 __iomem *tidp32 = (u32 __iomem *)tidptr;
+
+	if (pa != dd->ipath_tidinvalid) {
+		if (pa & ((1U << 11) - 1)) {
+			dev_info(&dd->pcidev->dev, "BUG: physaddr %lx "
+				 "not 4KB aligned!\n", pa);
+			return;
+		}
+		pa >>= 11;
+		/* paranoia check */
+		if (pa & (7<<29))
+			ipath_dev_err(dd,
+				      "BUG: Physical page address 0x%lx "
+				      "has bits set in 31-29\n", pa);
+
+		if (type == 0)
+			pa |= dd->ipath_tidtemplate;
+		else /* for now, always full 4KB page */
+			pa |= 2 << 29;
+	}
+	if (dd->ipath_kregbase)
+		writel(pa, tidp32);
+	mmiowb();
+}
+
 
 /**
  * ipath_pe_clear_tid - clear all TID entries for a port, expected and eager
@@ -1203,7 +1277,7 @@ int __attribute__((weak)) ipath_unordere
 
 /**
  * ipath_init_pe_get_base_info - set chip-specific flags for user code
- * @dd: the infinipath device
+ * @pd: the infinipath port
  * @kbase: ipath_base_info pointer
  *
  * We set the PCIE flag because the lower bandwidth on PCIe vs
@@ -1212,6 +1286,7 @@ static int ipath_pe_get_base_info(struct
 static int ipath_pe_get_base_info(struct ipath_portdata *pd, void *kbase)
 {
 	struct ipath_base_info *kinfo = kbase;
+	struct ipath_devdata *dd;
 
 	if (ipath_unordered_wc()) {
 		kinfo->spi_runtime_flags |= IPATH_RUNTIME_FORCE_WC_ORDER;
@@ -1220,8 +1295,20 @@ static int ipath_pe_get_base_info(struct
 	else
 		ipath_cdbg(PROC, "Not Intel processor, WC ordered\n");
 
+	if (pd == NULL)
+		goto done;
+
+	dd = pd->port_dd;
+
+	if (dd != NULL && dd->ipath_minrev >= 2) {
+		ipath_cdbg(PROC, "IBA6120 Rev2, allow multiple PBC write\n");
+		kinfo->spi_runtime_flags |= IPATH_RUNTIME_PBC_REWRITE;
+		ipath_cdbg(PROC, "IBA6120 Rev2, allow loose DMA alignment\n");
+		kinfo->spi_runtime_flags |= IPATH_RUNTIME_LOOSE_DMA_ALIGN;
+	}
+
+done:
 	kinfo->spi_runtime_flags |= IPATH_RUNTIME_PCIE;
-
 	return 0;
 }
 
@@ -1244,7 +1331,10 @@ void ipath_init_iba6120_funcs(struct ipa
 	dd->ipath_f_quiet_serdes = ipath_pe_quiet_serdes;
 	dd->ipath_f_bringup_serdes = ipath_pe_bringup_serdes;
 	dd->ipath_f_clear_tids = ipath_pe_clear_tids;
-	dd->ipath_f_put_tid = ipath_pe_put_tid;
+	if (dd->ipath_minrev >= 2)
+		dd->ipath_f_put_tid = ipath_pe_put_tid_2;
+	else
+		dd->ipath_f_put_tid = ipath_pe_put_tid;
 	dd->ipath_f_cleanup = ipath_setup_pe_cleanup;
 	dd->ipath_f_setextled = ipath_setup_pe_setextled;
 	dd->ipath_f_get_base_info = ipath_pe_get_base_info;
diff -r 7f5b6127be15 -r a69f8b7a8a04 drivers/infiniband/hw/ipath/ipath_intr.c
--- a/drivers/infiniband/hw/ipath/ipath_intr.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c	Thu Sep 28 08:57:12 2006 -0700
@@ -808,7 +808,7 @@ irqreturn_t ipath_intr(int irq, void *da
 	if (oldhead != curtail) {
 		if (dd->ipath_flags & IPATH_GPIO_INTR) {
 			ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear,
-					 (u64) (1 << 2));
+					 (u64) (1 << IPATH_GPIO_PORT0_BIT));
 			istat = port0rbits | INFINIPATH_I_GPIO;
 		}
 		else
@@ -867,25 +867,79 @@ irqreturn_t ipath_intr(int irq, void *da
 
 	if (istat & INFINIPATH_I_GPIO) {
 		/*
-		 * Packets are available in the port 0 rcv queue.
-		 * Eventually this needs to be generalized to check
-		 * IPATH_GPIO_INTR, and the specific GPIO bit, if
-		 * GPIO interrupts are used for anything else.
-		 */
-		if (unlikely(!(dd->ipath_flags & IPATH_GPIO_INTR))) {
-			u32 gpiostatus;
-			gpiostatus = ipath_read_kreg32(
-				dd, dd->ipath_kregs->kr_gpio_status);
-			ipath_dbg("Unexpected GPIO interrupt bits %x\n",
-				  gpiostatus);
+		 * GPIO interrupts fall in two broad classes:
+		 * GPIO_2 indicates (on some HT4xx boards) that a packet
+		 *        has arrived for Port 0. Checking for this
+		 *        is controlled by flag IPATH_GPIO_INTR.
+		 * GPIO_3..5 on IBA6120 Rev2 chips indicate errors
+		 *        that we need to count. Checking for this
+		 *        is controlled by flag IPATH_GPIO_ERRINTRS.
+		 */
+		u32 gpiostatus;
+		u32 to_clear = 0;
+
+		gpiostatus = ipath_read_kreg32(
+			dd, dd->ipath_kregs->kr_gpio_status);
+		/* First the error-counter case.
+		 */
+		if ((gpiostatus & IPATH_GPIO_ERRINTR_MASK) &&
+		    (dd->ipath_flags & IPATH_GPIO_ERRINTRS)) {
+			/* want to clear the bits we see asserted. */
+			to_clear |= (gpiostatus & IPATH_GPIO_ERRINTR_MASK);
+
+			/*
+			 * Count appropriately, clear bits out of our copy,
+			 * as they have been "handled".
+			 */
+			if (gpiostatus & (1 << IPATH_GPIO_RXUVL_BIT)) {
+				ipath_dbg("FlowCtl on UnsupVL\n");
+				dd->ipath_rxfc_unsupvl_errs++;
+			}
+			if (gpiostatus & (1 << IPATH_GPIO_OVRUN_BIT)) {
+				ipath_dbg("Overrun Threshold exceeded\n");
+				dd->ipath_overrun_thresh_errs++;
+			}
+			if (gpiostatus & (1 << IPATH_GPIO_LLI_BIT)) {
+				ipath_dbg("Local Link Integrity error\n");
+				dd->ipath_lli_errs++;
+			}
+			gpiostatus &= ~IPATH_GPIO_ERRINTR_MASK;
+		}
+		/* Now the Port0 Receive case */
+		if ((gpiostatus & (1 << IPATH_GPIO_PORT0_BIT)) &&
+		    (dd->ipath_flags & IPATH_GPIO_INTR)) {
+			/*
+			 * GPIO status bit 2 is set, and we expected it.
+			 * clear it and indicate in p0bits.
+			 * This probably only happens if a Port0 pkt
+			 * arrives at _just_ the wrong time, and we
+			 * handle that by setting chk0rcv;
+			 */
+			to_clear |= (1 << IPATH_GPIO_PORT0_BIT);
+			gpiostatus &= ~(1 << IPATH_GPIO_PORT0_BIT);
+			chk0rcv = 1;
+		}
+		if (unlikely(gpiostatus)) {
+			/*
+			 * Some unexpected bits remain. If they could have
+			 * caused the interrupt, complain and clear.
+			 * MEA: this is almost certainly non-ideal.
+			 * we should look into auto-disable of unexpected
+			 * GPIO interrupts, possibly on a "three strikes"
+			 * basis.
+			 */
+			u32 mask;
+			mask = ipath_read_kreg32(
+				dd, dd->ipath_kregs->kr_gpio_mask);
+			if (mask & gpiostatus) {
+				ipath_dbg("Unexpected GPIO IRQ bits %x\n",
+				  gpiostatus & mask);
+				to_clear |= (gpiostatus & mask);
+			}
+		}
+		if (to_clear) {
 			ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear,
-					 gpiostatus);
-		}
-		else {
-			/* Clear GPIO status bit 2 */
-			ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear,
-					(u64) (1 << 2));
-			chk0rcv = 1;
+					(u64) to_clear);
 		}
 	}
 	chk0rcv |= istat & port0rbits;
diff -r 7f5b6127be15 -r a69f8b7a8a04 drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Sep 28 08:57:12 2006 -0700
@@ -524,6 +524,15 @@ struct ipath_devdata {
 	u32 ipath_lli_counter;
 	/* local link integrity errors */
 	u32 ipath_lli_errors;
+	/*
+	 * Above counts only cases where _successive_ LocalLinkIntegrity
+	 * errors were seen in the receive headers of kern-packets.
+	 * Below are the three (monotonically increasing) counters
+	 * maintained via GPIO interrupts on iba6120-rev2.
+	 */
+	u32 ipath_rxfc_unsupvl_errs;
+	u32 ipath_overrun_thresh_errs;
+	u32 ipath_lli_errs;
 };
 
 /* Private data for file operations */
@@ -636,6 +645,15 @@ int ipath_set_rx_pol_inv(struct ipath_de
 		/* can miss port0 rx interrupts */
 #define IPATH_POLL_RX_INTR  0x40000
 #define IPATH_DISABLED      0x80000 /* administratively disabled */
+		/* Use GPIO interrupts for new counters */    
+#define IPATH_GPIO_ERRINTRS 0x100000
+
+/* Bits in GPIO for the added interrupts */
+#define IPATH_GPIO_PORT0_BIT 2
+#define IPATH_GPIO_RXUVL_BIT 3
+#define IPATH_GPIO_OVRUN_BIT 4
+#define IPATH_GPIO_LLI_BIT 5
+#define IPATH_GPIO_ERRINTR_MASK 0x38
 
 /* portdata flag bit offsets */
 		/* waiting for a packet to arrive */
diff -r 7f5b6127be15 -r a69f8b7a8a04 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c	Thu Sep 28 08:57:12 2006 -0700
@@ -898,7 +898,8 @@ int ipath_get_counters(struct ipath_devd
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_erricrccnt) +
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_errvcrccnt) +
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_errlpcrccnt) +
-		ipath_snap_cntr(dd, dd->ipath_cregs->cr_badformatcnt);
+		ipath_snap_cntr(dd, dd->ipath_cregs->cr_badformatcnt) +
+		dd->ipath_rxfc_unsupvl_errs;
 	cntrs->port_rcv_remphys_errors =
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_rcvebpcnt);
 	cntrs->port_xmit_discards =
@@ -911,8 +912,10 @@ int ipath_get_counters(struct ipath_devd
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_pktsendcnt);
 	cntrs->port_rcv_packets =
 		ipath_snap_cntr(dd, dd->ipath_cregs->cr_pktrcvcnt);
-	cntrs->local_link_integrity_errors = dd->ipath_lli_errors;
-	cntrs->excessive_buffer_overrun_errors = 0; /* XXX */
+	cntrs->local_link_integrity_errors =
+		(dd->ipath_flags & IPATH_GPIO_ERRINTRS) ?
+		dd->ipath_lli_errs : dd->ipath_lli_errors;
+	cntrs->excessive_buffer_overrun_errors = dd->ipath_overrun_thresh_errs;
 
 	ret = 0;
 
@@ -1380,11 +1383,13 @@ static int enable_timer(struct ipath_dev
 	 * processing.
 	 */
 	if (dd->ipath_flags & IPATH_GPIO_INTR) {
+		u64 val;
 		ipath_write_kreg(dd, dd->ipath_kregs->kr_debugportselect,
 				 0x2074076542310ULL);
 		/* Enable GPIO bit 2 interrupt */
-		ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_mask,
-				 (u64) (1 << 2));
+		val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_gpio_mask);
+		val |= (u64) (1 << IPATH_GPIO_PORT0_BIT);
+		ipath_write_kreg( dd, dd->ipath_kregs->kr_gpio_mask, val);
 	}
 
 	init_timer(&dd->verbs_timer);
@@ -1399,8 +1404,17 @@ static int disable_timer(struct ipath_de
 static int disable_timer(struct ipath_devdata *dd)
 {
 	/* Disable GPIO bit 2 interrupt */
-	if (dd->ipath_flags & IPATH_GPIO_INTR)
-		ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_mask, 0);
+	if (dd->ipath_flags & IPATH_GPIO_INTR) {
+		u64 val;
+		/* Disable GPIO bit 2 interrupt */
+		val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_gpio_mask);
+		val &= ~((u64) (1 << IPATH_GPIO_PORT0_BIT));
+		ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_mask, val);
+		/*
+		 * We might want to undo changes to debugportselect,
+		 * but how?
+		 */
+	}
 
 	del_timer_sync(&dd->verbs_timer);
 
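
The enable_timer() and disable_timer() hunks above stop overwriting
kr_gpio_mask and instead read it, change one bit, and write it back, so
other GPIO interrupt bits keep their state.  A minimal standalone sketch
of that read-modify-write pattern follows.  It is not driver code: a
plain variable stands in for the register, the helper names are
hypothetical, and GPIO_PORT0_BIT is assumed to be 2 only because the
patch comments say "GPIO bit 2" (the IPATH_GPIO_PORT0_BIT definition is
not part of this diff).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, based on the "GPIO bit 2" comment in the patch. */
#define GPIO_PORT0_BIT 2

/* Hypothetical stand-in for the kr_gpio_mask hardware register. */
static uint64_t fake_gpio_mask;

static uint64_t read_mask(void) { return fake_gpio_mask; }
static void write_mask(uint64_t val) { fake_gpio_mask = val; }

/* Set only the port 0 bit; bits owned by other users are preserved. */
static void gpio_port0_intr_enable(void)
{
	uint64_t val = read_mask();

	val |= (uint64_t) 1 << GPIO_PORT0_BIT;
	write_mask(val);
}

/* Clear only the port 0 bit; bits owned by other users are preserved. */
static void gpio_port0_intr_disable(void)
{
	uint64_t val = read_mask();

	val &= ~((uint64_t) 1 << GPIO_PORT0_BIT);
	write_mask(val);
}

/* Usage: bit 5 belongs to some other (hypothetical) user and survives. */
static void gpio_mask_demo(void)
{
	write_mask((uint64_t) 1 << 5);
	gpio_port0_intr_enable();
	assert(read_mask() == (((uint64_t) 1 << 5) | ((uint64_t) 1 << 2)));
	gpio_port0_intr_disable();
	assert(read_mask() == ((uint64_t) 1 << 5));
}
```

The sketch widens the constant before shifting, which keeps the
expression correct for bit positions of 31 and above; for bit 2 this
gives the same result as casting after the shift.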

From bos at pathscale.com  Thu Sep 28 09:00:01 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:01 -0700
Subject: [openib-general] [PATCH 5 of 28] IB/ipath - unregister from IB core
	early
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <e2916bbf09ed8c3adc05.1159459201@eng-12.pathscale.com>

This gives upper-level protocols a chance to unregister while the device
is still usable.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

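The diff below calls the unregister path from both ipath_remove_one()
and infinipath_cleanup(), guarded by a check of dd->verbs_dev that is
cleared after use.  A minimal standalone sketch of that "unregister at
most once" idiom follows.  The struct and function names are
hypothetical stand-ins, not the driver's types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the verbs device and struct ipath_devdata. */
struct verbs_dev { int registered; };
struct devdata { struct verbs_dev *verbs_dev; };

static int unregister_calls;

/* Stand-in for ipath_unregister_ib_device(); counts invocations. */
static void unregister_ib_device(struct verbs_dev *dev)
{
	dev->registered = 0;
	unregister_calls++;
}

/*
 * Unregister at most once: clearing the pointer makes a second call
 * (for example from module cleanup after device removal) a no-op.
 */
static void unregister_once(struct devdata *dd)
{
	assert(dd != NULL);
	if (dd->verbs_dev) {
		unregister_ib_device(dd->verbs_dev);
		dd->verbs_dev = NULL;
	}
}
```

With this guard, whichever teardown path runs first does the work and
the other one falls through.
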
diff -r a69f8b7a8a04 -r e2916bbf09ed drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
@@ -536,7 +536,12 @@ static void __devexit ipath_remove_one(s
 		return;
 
 	dd = pci_get_drvdata(pdev);
-	ipath_unregister_ib_device(dd->verbs_dev);
+
+	if (dd->verbs_dev) {
+		ipath_unregister_ib_device(dd->verbs_dev);
+		dd->verbs_dev = NULL;
+	}
+
 	ipath_diag_remove(dd);
 	ipath_user_remove(dd);
 	ipathfs_remove_device(dd);
@@ -2027,6 +2032,11 @@ static void __exit infinipath_cleanup(vo
 	list_for_each_entry_safe(dd, tmp, &ipath_dev_list, ipath_list) {
 		spin_unlock_irqrestore(&ipath_devs_lock, flags);
 
+		if (dd->verbs_dev) {
+			ipath_unregister_ib_device(dd->verbs_dev);
+			dd->verbs_dev = NULL;
+		}
+
 		if (dd->ipath_kregbase)
 			cleanup_device(dd);
 

From bos at pathscale.com  Thu Sep 28 08:59:59 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 08:59:59 -0700
Subject: [openib-general] [PATCH 3 of 28] IB/ipath - driver support for
 userspace sharing of HW contexts
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <7f5b6127be15cded56e1.1159459199@eng-12.pathscale.com>

This allows multiple userspace processes to share a single hardware
context in a master/slave arrangement.  It is backwards binary compatible
with existing userspace.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

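The diff below divides a port's TIDs and PIO buffers among the sharing
processes with the same arithmetic in several places: each slave gets
total / subport_cnt entries, the master gets that plus the remainder,
and the master's range sits after all of the slaves' ranges.  A minimal
standalone sketch of that split follows.  The struct and function names
are hypothetical; it assumes, as the diff suggests, that subport 0 is
the master, slaves are numbered from 1, and subport_cnt counts the
master as well.

```c
#include <assert.h>

/* Hypothetical result type: how many entries, starting at which offset. */
struct share { unsigned count; unsigned offset; };

/*
 * Split `total` entries among the processes sharing a port.
 * subport_cnt == 0 means the port is not shared.
 * subport == 0 is the master; 1..subport_cnt-1 are slaves.
 */
static struct share split_share(unsigned total, unsigned subport_cnt,
				unsigned subport)
{
	struct share s;

	if (!subport_cnt) {
		/* not shared: the single user owns everything */
		s.count = total;
		s.offset = 0;
	} else if (!subport) {
		/* master: equal share plus the remainder, placed last */
		s.count = total / subport_cnt + total % subport_cnt;
		s.offset = total - s.count;
	} else {
		/* slave: equal share, ranges packed from offset 0 */
		assert(subport < subport_cnt);
		s.count = total / subport_cnt;
		s.offset = s.count * (subport - 1);
	}
	return s;
}
```

For example, with 10 entries and subport_cnt of 3, the two slaves get
entries 0-2 and 3-5, and the master gets the remaining four, 6-9.
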
diff -r 45079acba208 -r 7f5b6127be15 drivers/infiniband/hw/ipath/ipath_common.h
--- a/drivers/infiniband/hw/ipath/ipath_common.h	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_common.h	Thu Sep 28 08:57:12 2006 -0700
@@ -185,6 +185,7 @@ typedef enum _ipath_ureg {
 #define IPATH_RUNTIME_PCIE	0x2
 #define IPATH_RUNTIME_FORCE_WC_ORDER	0x4
 #define IPATH_RUNTIME_RCVHDR_COPY	0x8
+#define IPATH_RUNTIME_MASTER	0x10
 
 /*
  * This structure is returned by ipath_userinit() immediately after
@@ -202,7 +203,8 @@ struct ipath_base_info {
 	/* version of software, for feature checking. */
 	__u32 spi_sw_version;
 	/* InfiniPath port assigned, goes into sent packets */
-	__u32 spi_port;
+	__u16 spi_port;
+	__u16 spi_subport;
 	/*
 	 * IB MTU, packets IB data must be less than this.
 	 * The MTU is in bytes, and will be a multiple of 4 bytes.
@@ -218,7 +220,7 @@ struct ipath_base_info {
 	__u32 spi_tidcnt;
 	/* size of the TID Eager list in infinipath, in entries */
 	__u32 spi_tidegrcnt;
-	/* size of a single receive header queue entry. */
+	/* size of a single receive header queue entry in words. */
 	__u32 spi_rcvhdrent_size;
 	/*
 	 * Count of receive header queue entries allocated.
@@ -310,6 +312,12 @@ struct ipath_base_info {
 	__u32 spi_filler_for_align;
 	/* address of readonly memory copy of the rcvhdrq tail register. */
 	__u64 spi_rcvhdr_tailaddr;
+
+	/* shared memory pages for subports if IPATH_RUNTIME_MASTER is set */
+	__u64 spi_subport_uregbase;
+	__u64 spi_subport_rcvegrbuf;
+	__u64 spi_subport_rcvhdr_base;
+
 } __attribute__ ((aligned(8)));
 
 
@@ -328,12 +336,12 @@ struct ipath_base_info {
 
 /*
  * Minor version differences are always compatible
- * a within a major version, however if if user software is larger
+ * within a major version; however, if user software is larger
  * than driver software, some new features and/or structure fields
  * may not be implemented; the user code must deal with this if it
- * cares, or it must abort after initialization reports the difference
- */
-#define IPATH_USER_SWMINOR 2
+ * cares, or it must abort after initialization reports the difference.
+ */
+#define IPATH_USER_SWMINOR 3
 
 #define IPATH_USER_SWVERSION ((IPATH_USER_SWMAJOR<<16) | IPATH_USER_SWMINOR)
 
@@ -379,7 +387,16 @@ struct ipath_user_info {
 	 */
 	__u32 spu_rcvhdrsize;
 
-	__u64 spu_unused; /* kept for compatible layout */
+	/*
+	 * If two or more processes wish to share a port, each process
+	 * must set the spu_subport_cnt and spu_subport_id to the same
+	 * values.  The only restriction on the spu_subport_id is that
+	 * it be unique for a given node.
+	 */
+	__u16 spu_subport_cnt;
+	__u16 spu_subport_id;
+
+	__u32 spu_unused; /* kept for compatible layout */
 
 	/*
 	 * address of struct base_info to write to
@@ -398,13 +415,17 @@ struct ipath_user_info {
 #define IPATH_CMD_TID_UPDATE	19	/* update expected TID entries */
 #define IPATH_CMD_TID_FREE	20	/* free expected TID entries */
 #define IPATH_CMD_SET_PART_KEY	21	/* add partition key */
-
-#define IPATH_CMD_MAX		21
+#define IPATH_CMD_SLAVE_INFO	22	/* return info on slave processes */
+
+#define IPATH_CMD_MAX		22
 
 struct ipath_port_info {
 	__u32 num_active;	/* number of active units */
 	__u32 unit;		/* unit (chip) assigned to caller */
-	__u32 port;		/* port on unit assigned to caller */
+	__u16 port;		/* port on unit assigned to caller */
+	__u16 subport;		/* subport on unit assigned to caller */
+	__u16 num_ports;	/* number of ports available on unit */
+	__u16 num_subports;	/* number of subport slaves opened on port */
 };
 
 struct ipath_tid_info {
@@ -435,6 +456,8 @@ struct ipath_cmd {
 		__u32 recv_ctrl;
 		/* partition key to set */
 		__u16 part_key;
+		/* user address of __u32 bitmask of active slaves */
+		__u64 slave_mask_addr;
 	} cmd;
 };
 
@@ -596,6 +619,10 @@ struct infinipath_counters {
 
 /* K_PktFlags bits */
 #define INFINIPATH_KPF_INTR 0x1
+#define INFINIPATH_KPF_SUBPORT_MASK 0x3
+#define INFINIPATH_KPF_SUBPORT_SHIFT 1
+
+#define INFINIPATH_MAX_SUBPORT	4
 
 /* SendPIO per-buffer control */
 #define INFINIPATH_SP_TEST    0x40
@@ -610,7 +637,7 @@ struct ipath_header {
 	/*
 	 * Version - 4 bits, Port - 4 bits, TID - 10 bits and Offset -
 	 * 14 bits before ECO change ~28 Dec 03.  After that, Vers 4,
-	 * Port 3, TID 11, offset 14.
+	 * Port 4, TID 11, offset 13.
 	 */
 	__le32 ver_port_tid_offset;
 	__le16 chksum;
diff -r 45079acba208 -r 7f5b6127be15 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
@@ -1827,9 +1827,9 @@ void ipath_free_pddata(struct ipath_devd
 			dma_free_coherent(&dd->pcidev->dev, size,
 				base, pd->port_rcvegrbuf_phys[e]);
 		}
-		vfree(pd->port_rcvegrbuf);
+		kfree(pd->port_rcvegrbuf);
 		pd->port_rcvegrbuf = NULL;
-		vfree(pd->port_rcvegrbuf_phys);
+		kfree(pd->port_rcvegrbuf_phys);
 		pd->port_rcvegrbuf_phys = NULL;
 		pd->port_rcvegrbuf_chunks = 0;
 	} else if (pd->port_port == 0 && dd->ipath_port0_skbs) {
@@ -1845,6 +1845,9 @@ void ipath_free_pddata(struct ipath_devd
 		vfree(skbs);
 	}
 	kfree(pd->port_tid_pg_list);
+	vfree(pd->subport_uregbase);
+	vfree(pd->subport_rcvegrbuf);
+	vfree(pd->subport_rcvhdr_base);
 	kfree(pd);
 }
 
diff -r 45079acba208 -r 7f5b6127be15 drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c	Thu Sep 28 08:57:12 2006 -0700
@@ -41,6 +41,12 @@
 #include "ipath_kernel.h"
 #include "ipath_common.h"
 
+/*
+ * mmap64 doesn't allow all 64 bits for 32-bit applications
+ * so only use the low 43 bits.
+ */
+#define MMAP64_MASK	0x7FFFFFFFFFFUL
+
 static int ipath_open(struct inode *, struct file *);
 static int ipath_close(struct inode *, struct file *);
 static ssize_t ipath_write(struct file *, const char __user *, size_t,
@@ -57,18 +63,35 @@ static struct file_operations ipath_file
 	.mmap = ipath_mmap
 };
 
-static int ipath_get_base_info(struct ipath_portdata *pd,
+static int ipath_get_base_info(struct file *fp,
 			       void __user *ubase, size_t ubase_size)
 {
+	struct ipath_portdata *pd = port_fp(fp);
 	int ret = 0;
 	struct ipath_base_info *kinfo = NULL;
 	struct ipath_devdata *dd = pd->port_dd;
-
-	if (ubase_size < sizeof(*kinfo)) {
+	unsigned subport_cnt;
+	int shared, master;
+	size_t sz;
+
+	subport_cnt = pd->port_subport_cnt;
+	if (!subport_cnt) {
+		shared = 0;
+		master = 0;
+		subport_cnt = 1;
+	} else {
+		shared = 1;
+		master = !subport_fp(fp);
+	}
+
+	sz = sizeof(*kinfo);
+	/* If port sharing is not requested, allow the old size structure */
+	if (!shared)
+		sz -= 3 * sizeof(u64);
+	if (ubase_size < sz) {
 		ipath_cdbg(PROC,
-			   "Base size %lu, need %lu (version mismatch?)\n",
-			   (unsigned long) ubase_size,
-			   (unsigned long) sizeof(*kinfo));
+			   "Base size %zu, need %zu (version mismatch?)\n",
+			   ubase_size, sz);
 		ret = -EINVAL;
 		goto bail;
 	}
@@ -95,7 +118,9 @@ static int ipath_get_base_info(struct ip
 	kinfo->spi_rcv_egrperchunk = pd->port_rcvegrbufs_perchunk;
 	kinfo->spi_rcv_egrchunksize = kinfo->spi_rcv_egrbuftotlen /
 		pd->port_rcvegrbuf_chunks;
-	kinfo->spi_tidcnt = dd->ipath_rcvtidcnt;
+	kinfo->spi_tidcnt = dd->ipath_rcvtidcnt / subport_cnt;
+	if (master)
+		kinfo->spi_tidcnt += dd->ipath_rcvtidcnt % subport_cnt;
 	/*
 	 * for this use, may be ipath_cfgports summed over all chips that
 	 * are are configured and present
@@ -118,30 +143,75 @@ static int ipath_get_base_info(struct ip
 	 * page_address() macro worked, but in 2.6.11, even that returns the
 	 * full 64 bit address (upper bits all 1's).  So far, using the
 	 * physical addresses (or chip offsets, for chip mapping) works, but
-	 * no doubt some future kernel release will chang that, and we'll be
-	 * on to yet another method of dealing with this
+	 * no doubt some future kernel release will change that, and we'll be
+	 * on to yet another method of dealing with this.
 	 */
 	kinfo->spi_rcvhdr_base = (u64) pd->port_rcvhdrq_phys;
-	kinfo->spi_rcvhdr_tailaddr = (u64)pd->port_rcvhdrqtailaddr_phys;
+	kinfo->spi_rcvhdr_tailaddr = (u64) pd->port_rcvhdrqtailaddr_phys;
 	kinfo->spi_rcv_egrbufs = (u64) pd->port_rcvegr_phys;
 	kinfo->spi_pioavailaddr = (u64) dd->ipath_pioavailregs_phys;
 	kinfo->spi_status = (u64) kinfo->spi_pioavailaddr +
 		(void *) dd->ipath_statusp -
 		(void *) dd->ipath_pioavailregs_dma;
-	kinfo->spi_piobufbase = (u64) pd->port_piobufs;
-	kinfo->__spi_uregbase =
-		dd->ipath_uregbase + dd->ipath_palign * pd->port_port;
-
-	kinfo->spi_pioindex = dd->ipath_pbufsport * (pd->port_port - 1);
-	kinfo->spi_piocnt = dd->ipath_pbufsport;
+	if (!shared) {
+		kinfo->spi_piocnt = dd->ipath_pbufsport;
+		kinfo->spi_piobufbase = (u64) pd->port_piobufs;
+		kinfo->__spi_uregbase = (u64) dd->ipath_uregbase +
+			dd->ipath_palign * pd->port_port;
+	} else if (master) {
+		kinfo->spi_piocnt = (dd->ipath_pbufsport / subport_cnt) +
+				    (dd->ipath_pbufsport % subport_cnt);
+		/* Master's PIO buffers are after all the slaves' */
+		kinfo->spi_piobufbase = (u64) pd->port_piobufs +
+			dd->ipath_palign *
+			(dd->ipath_pbufsport - kinfo->spi_piocnt);
+		kinfo->__spi_uregbase = (u64) dd->ipath_uregbase +
+			dd->ipath_palign * pd->port_port;
+	} else {
+		unsigned slave = subport_fp(fp) - 1;
+
+		kinfo->spi_piocnt = dd->ipath_pbufsport / subport_cnt;
+		kinfo->spi_piobufbase = (u64) pd->port_piobufs +
+			dd->ipath_palign * kinfo->spi_piocnt * slave;
+		kinfo->__spi_uregbase = ((u64) pd->subport_uregbase +
+			PAGE_SIZE * slave) & MMAP64_MASK;
+
+		kinfo->spi_rcvhdr_base = ((u64) pd->subport_rcvhdr_base +
+			pd->port_rcvhdrq_size * slave) & MMAP64_MASK;
+		kinfo->spi_rcvhdr_tailaddr =
+			(u64) pd->port_rcvhdrqtailaddr_phys & MMAP64_MASK;
+		kinfo->spi_rcv_egrbufs = ((u64) pd->subport_rcvegrbuf +
+			dd->ipath_rcvegrcnt * dd->ipath_rcvegrbufsize * slave) &
+			MMAP64_MASK;
+	}
+
+	kinfo->spi_pioindex = (kinfo->spi_piobufbase - dd->ipath_piobufbase) /
+		dd->ipath_palign;
 	kinfo->spi_pioalign = dd->ipath_palign;
 
 	kinfo->spi_qpair = IPATH_KD_QP;
 	kinfo->spi_piosize = dd->ipath_ibmaxlen;
 	kinfo->spi_mtu = dd->ipath_ibmaxlen;	/* maxlen, not ibmtu */
 	kinfo->spi_port = pd->port_port;
+	kinfo->spi_subport = subport_fp(fp);
 	kinfo->spi_sw_version = IPATH_KERN_SWVERSION;
 	kinfo->spi_hw_version = dd->ipath_revision;
+
+	if (master) {
+		kinfo->spi_runtime_flags |= IPATH_RUNTIME_MASTER;
+		kinfo->spi_subport_uregbase =
+			(u64) pd->subport_uregbase & MMAP64_MASK;
+		kinfo->spi_subport_rcvegrbuf =
+			(u64) pd->subport_rcvegrbuf & MMAP64_MASK;
+		kinfo->spi_subport_rcvhdr_base =
+			(u64) pd->subport_rcvhdr_base & MMAP64_MASK;
+		ipath_cdbg(PROC, "port %u flags %x %llx %llx %llx\n",
+			kinfo->spi_port,
+			kinfo->spi_runtime_flags,
+			kinfo->spi_subport_uregbase,
+			kinfo->spi_subport_rcvegrbuf,
+			kinfo->spi_subport_rcvhdr_base);
+	}
 
 	if (copy_to_user(ubase, kinfo, sizeof(*kinfo)))
 		ret = -EFAULT;
@@ -154,6 +224,7 @@ bail:
 /**
  * ipath_tid_update - update a port TID
  * @pd: the port
+ * @fp: the ipath device file
  * @ti: the TID information
  *
  * The new implementation as of Oct 2004 is that the driver assigns
@@ -176,11 +247,11 @@ bail:
  * virtually contiguous pages, that should change to improve
  * performance.
  */
-static int ipath_tid_update(struct ipath_portdata *pd,
+static int ipath_tid_update(struct ipath_portdata *pd, struct file *fp,
 			    const struct ipath_tid_info *ti)
 {
 	int ret = 0, ntids;
-	u32 tid, porttid, cnt, i, tidcnt;
+	u32 tid, porttid, cnt, i, tidcnt, tidoff;
 	u16 *tidlist;
 	struct ipath_devdata *dd = pd->port_dd;
 	u64 physaddr;
@@ -188,6 +259,7 @@ static int ipath_tid_update(struct ipath
 	u64 __iomem *tidbase;
 	unsigned long tidmap[8];
 	struct page **pagep = NULL;
+	unsigned subport = subport_fp(fp);
 
 	if (!dd->ipath_pageshadow) {
 		ret = -ENOMEM;
@@ -204,20 +276,34 @@ static int ipath_tid_update(struct ipath
 		ret = -EFAULT;
 		goto done;
 	}
-	tidcnt = dd->ipath_rcvtidcnt;
-	if (cnt >= tidcnt) {
+	porttid = pd->port_port * dd->ipath_rcvtidcnt;
+	if (!pd->port_subport_cnt) {
+		tidcnt = dd->ipath_rcvtidcnt;
+		tid = pd->port_tidcursor;
+		tidoff = 0;
+	} else if (!subport) {
+		tidcnt = (dd->ipath_rcvtidcnt / pd->port_subport_cnt) +
+			 (dd->ipath_rcvtidcnt % pd->port_subport_cnt);
+		tidoff = dd->ipath_rcvtidcnt - tidcnt;
+		porttid += tidoff;
+		tid = tidcursor_fp(fp);
+	} else {
+		tidcnt = dd->ipath_rcvtidcnt / pd->port_subport_cnt;
+		tidoff = tidcnt * (subport - 1);
+		porttid += tidoff;
+		tid = tidcursor_fp(fp);
+	}
+	if (cnt > tidcnt) {
 		/* make sure it all fits in port_tid_pg_list */
 		dev_info(&dd->pcidev->dev, "Process tried to allocate %u "
 			 "TIDs, only trying max (%u)\n", cnt, tidcnt);
 		cnt = tidcnt;
 	}
-	pagep = (struct page **)pd->port_tid_pg_list;
-	tidlist = (u16 *) (&pagep[cnt]);
+	pagep = &((struct page **) pd->port_tid_pg_list)[tidoff];
+	tidlist = &((u16 *) &pagep[dd->ipath_rcvtidcnt])[tidoff];
 
 	memset(tidmap, 0, sizeof(tidmap));
-	tid = pd->port_tidcursor;
 	/* before decrement; chip actual # */
-	porttid = pd->port_port * tidcnt;
 	ntids = tidcnt;
 	tidbase = (u64 __iomem *) (((char __iomem *) dd->ipath_kregbase) +
 				   dd->ipath_rcvtidbase +
@@ -274,9 +360,9 @@ static int ipath_tid_update(struct ipath
 			ret = -ENOMEM;
 			break;
 		}
-		tidlist[i] = tid;
+		tidlist[i] = tid + tidoff;
 		ipath_cdbg(VERBOSE, "Updating idx %u to TID %u, "
-			   "vaddr %lx\n", i, tid, vaddr);
+			   "vaddr %lx\n", i, tid + tidoff, vaddr);
 		/* we "know" system pages and TID pages are same size */
 		dd->ipath_pageshadow[porttid + tid] = pagep[i];
 		/*
@@ -341,7 +427,10 @@ static int ipath_tid_update(struct ipath
 		}
 		if (tid == tidcnt)
 			tid = 0;
-		pd->port_tidcursor = tid;
+		if (!pd->port_subport_cnt)
+			pd->port_tidcursor = tid;
+		else
+			tidcursor_fp(fp) = tid;
 	}
 
 done:
@@ -354,6 +443,7 @@ done:
 /**
  * ipath_tid_free - free a port TID
  * @pd: the port
+ * @subport: the subport
  * @ti: the TID info
  *
  * right now we are unlocking one page at a time, but since
@@ -367,7 +457,7 @@ done:
  * they pass in to us.
  */
 
-static int ipath_tid_free(struct ipath_portdata *pd,
+static int ipath_tid_free(struct ipath_portdata *pd, unsigned subport,
 			  const struct ipath_tid_info *ti)
 {
 	int ret = 0;
@@ -388,11 +478,20 @@ static int ipath_tid_free(struct ipath_p
 	}
 
 	porttid = pd->port_port * dd->ipath_rcvtidcnt;
+	if (!pd->port_subport_cnt)
+		tidcnt = dd->ipath_rcvtidcnt;
+	else if (!subport) {
+		tidcnt = (dd->ipath_rcvtidcnt / pd->port_subport_cnt) +
+			 (dd->ipath_rcvtidcnt % pd->port_subport_cnt);
+		porttid += dd->ipath_rcvtidcnt - tidcnt;
+	} else {
+		tidcnt = dd->ipath_rcvtidcnt / pd->port_subport_cnt;
+		porttid += tidcnt * (subport - 1);
+	}
 	tidbase = (u64 __iomem *) ((char __iomem *)(dd->ipath_kregbase) +
 				   dd->ipath_rcvtidbase +
 				   porttid * sizeof(*tidbase));
 
-	tidcnt = dd->ipath_rcvtidcnt;
 	limit = sizeof(tidmap) * BITS_PER_BYTE;
 	if (limit > tidcnt)
 		/* just in case size changes in future */
@@ -581,20 +680,24 @@ bail:
 /**
  * ipath_manage_rcvq - manage a port's receive queue
  * @pd: the port
+ * @subport: the subport
  * @start_stop: action to carry out
  *
  * start_stop == 0 disables receive on the port, for use in queue
  * overflow conditions.  start_stop==1 re-enables, to be used to
  * re-init the software copy of the head register
  */
-static int ipath_manage_rcvq(struct ipath_portdata *pd, int start_stop)
+static int ipath_manage_rcvq(struct ipath_portdata *pd, unsigned subport,
+			     int start_stop)
 {
 	struct ipath_devdata *dd = pd->port_dd;
 	u64 tval;
 
-	ipath_cdbg(PROC, "%sabling rcv for unit %u port %u\n",
+	ipath_cdbg(PROC, "%sabling rcv for unit %u port %u:%u\n",
 		   start_stop ? "en" : "dis", dd->ipath_unit,
-		   pd->port_port);
+		   pd->port_port, subport);
+	if (subport)
+		goto bail;
 	/* atomically clear receive enable port. */
 	if (start_stop) {
 		/*
@@ -630,6 +733,7 @@ static int ipath_manage_rcvq(struct ipat
 		tval = ipath_read_ureg32(dd, ur_rcvhdrtail, pd->port_port);
 	}
 	/* always; new head should be equal to new tail; see above */
+bail:
 	return 0;
 }
 
@@ -687,6 +791,36 @@ static void ipath_clean_part_key(struct 
 	}
 }
 
+/*
+ * Initialize the port data with the receive buffer sizes
+ * so this can be done while the master port is locked.
+ * Otherwise, there is a race with a slave opening the port
+ * and seeing these fields uninitialized.
+ */
+static void init_user_egr_sizes(struct ipath_portdata *pd)
+{
+	struct ipath_devdata *dd = pd->port_dd;
+	unsigned egrperchunk, egrcnt, size;
+
+	/*
+	 * to avoid wasting a lot of memory, we allocate 32KB chunks of
+	 * physically contiguous memory, advance through it until used up
+	 * and then allocate more.  Of course, we need memory to store those
+	 * extra pointers, now.  Started out with 256KB, but under heavy
+	 * memory pressure (creating large files and then copying them over
+	 * NFS while doing lots of MPI jobs), we hit some allocation
+	 * failures, even though we can sleep...  (2.6.10) Still get
+	 * failures at 64K.  32K is the lowest we can go without wasting
+	 * additional memory.
+	 */
+	size = 0x8000;
+	egrperchunk = size / dd->ipath_rcvegrbufsize;
+	egrcnt = dd->ipath_rcvegrcnt;
+	pd->port_rcvegrbuf_chunks = (egrcnt + egrperchunk - 1) / egrperchunk;
+	pd->port_rcvegrbufs_perchunk = egrperchunk;
+	pd->port_rcvegrbuf_size = size;
+}
+
 /**
  * ipath_create_user_egr - allocate eager TID buffers
  * @pd: the port to allocate TID buffers for
@@ -702,7 +836,7 @@ static int ipath_create_user_egr(struct 
 static int ipath_create_user_egr(struct ipath_portdata *pd)
 {
 	struct ipath_devdata *dd = pd->port_dd;
-	unsigned e, egrcnt, alloced, egrperchunk, chunk, egrsize, egroff;
+	unsigned e, egrcnt, egrperchunk, chunk, egrsize, egroff;
 	size_t size;
 	int ret;
 	gfp_t gfp_flags;
@@ -722,31 +856,18 @@ static int ipath_create_user_egr(struct 
 	ipath_cdbg(VERBOSE, "Allocating %d egr buffers, at egrtid "
 		   "offset %x, egrsize %u\n", egrcnt, egroff, egrsize);
 
-	/*
-	 * to avoid wasting a lot of memory, we allocate 32KB chunks of
-	 * physically contiguous memory, advance through it until used up
-	 * and then allocate more.  Of course, we need memory to store those
-	 * extra pointers, now.  Started out with 256KB, but under heavy
-	 * memory pressure (creating large files and then copying them over
-	 * NFS while doing lots of MPI jobs), we hit some allocation
-	 * failures, even though we can sleep...  (2.6.10) Still get
-	 * failures at 64K.  32K is the lowest we can go without wasting
-	 * additional memory.
-	 */
-	size = 0x8000;
-	alloced = ALIGN(egrsize * egrcnt, size);
-	egrperchunk = size / egrsize;
-	chunk = (egrcnt + egrperchunk - 1) / egrperchunk;
-	pd->port_rcvegrbuf_chunks = chunk;
-	pd->port_rcvegrbufs_perchunk = egrperchunk;
-	pd->port_rcvegrbuf_size = size;
-	pd->port_rcvegrbuf = vmalloc(chunk * sizeof(pd->port_rcvegrbuf[0]));
+	chunk = pd->port_rcvegrbuf_chunks;
+	egrperchunk = pd->port_rcvegrbufs_perchunk;
+	size = pd->port_rcvegrbuf_size;
+	pd->port_rcvegrbuf = kmalloc(chunk * sizeof(pd->port_rcvegrbuf[0]),
+				     GFP_KERNEL);
 	if (!pd->port_rcvegrbuf) {
 		ret = -ENOMEM;
 		goto bail;
 	}
 	pd->port_rcvegrbuf_phys =
-		vmalloc(chunk * sizeof(pd->port_rcvegrbuf_phys[0]));
+		kmalloc(chunk * sizeof(pd->port_rcvegrbuf_phys[0]),
+			GFP_KERNEL);
 	if (!pd->port_rcvegrbuf_phys) {
 		ret = -ENOMEM;
 		goto bail_rcvegrbuf;
@@ -791,94 +912,12 @@ bail_rcvegrbuf_phys:
 				  pd->port_rcvegrbuf_phys[e]);
 
 	}
-	vfree(pd->port_rcvegrbuf_phys);
+	kfree(pd->port_rcvegrbuf_phys);
 	pd->port_rcvegrbuf_phys = NULL;
 bail_rcvegrbuf:
-	vfree(pd->port_rcvegrbuf);
+	kfree(pd->port_rcvegrbuf);
 	pd->port_rcvegrbuf = NULL;
 bail:
-	return ret;
-}
-
-static int ipath_do_user_init(struct ipath_portdata *pd,
-			      const struct ipath_user_info *uinfo)
-{
-	int ret = 0;
-	struct ipath_devdata *dd = pd->port_dd;
-	u32 head32;
-
-	/* for now, if major version is different, bail */
-	if ((uinfo->spu_userversion >> 16) != IPATH_USER_SWMAJOR) {
-		dev_info(&dd->pcidev->dev,
-			 "User major version %d not same as driver "
-			 "major %d\n", uinfo->spu_userversion >> 16,
-			 IPATH_USER_SWMAJOR);
-		ret = -ENODEV;
-		goto done;
-	}
-
-	if ((uinfo->spu_userversion & 0xffff) != IPATH_USER_SWMINOR)
-		ipath_dbg("User minor version %d not same as driver "
-			  "minor %d\n", uinfo->spu_userversion & 0xffff,
-			  IPATH_USER_SWMINOR);
-
-	if (uinfo->spu_rcvhdrsize) {
-		ret = ipath_setrcvhdrsize(dd, uinfo->spu_rcvhdrsize);
-		if (ret)
-			goto done;
-	}
-
-	/* for now we do nothing with rcvhdrcnt: uinfo->spu_rcvhdrcnt */
-
-	/* for right now, kernel piobufs are at end, so port 1 is at 0 */
-	pd->port_piobufs = dd->ipath_piobufbase +
-		dd->ipath_pbufsport * (pd->port_port -
-				       1) * dd->ipath_palign;
-	ipath_cdbg(VERBOSE, "Set base of piobufs for port %u to 0x%x\n",
-		   pd->port_port, pd->port_piobufs);
-
-	/*
-	 * Now allocate the rcvhdr Q and eager TIDs; skip the TID
-	 * array for time being.  If pd->port_port > chip-supported,
-	 * we need to do extra stuff here to handle by handling overflow
-	 * through port 0, someday
-	 */
-	ret = ipath_create_rcvhdrq(dd, pd);
-	if (!ret)
-		ret = ipath_create_user_egr(pd);
-	if (ret)
-		goto done;
-
-	/*
-	 * set the eager head register for this port to the current values
-	 * of the tail pointers, since we don't know if they were
-	 * updated on last use of the port.
-	 */
-	head32 = ipath_read_ureg32(dd, ur_rcvegrindextail, pd->port_port);
-	ipath_write_ureg(dd, ur_rcvegrindexhead, head32, pd->port_port);
-	dd->ipath_lastegrheads[pd->port_port] = -1;
-	dd->ipath_lastrcvhdrqtails[pd->port_port] = -1;
-	ipath_cdbg(VERBOSE, "Wrote port%d egrhead %x from tail regs\n",
-		pd->port_port, head32);
-	pd->port_tidcursor = 0;	/* start at beginning after open */
-	/*
-	 * now enable the port; the tail registers will be written to memory
-	 * by the chip as soon as it sees the write to
-	 * dd->ipath_kregs->kr_rcvctrl.  The update only happens on
-	 * transition from 0 to 1, so clear it first, then set it as part of
-	 * enabling the port.  This will (very briefly) affect any other
-	 * open ports, but it shouldn't be long enough to be an issue.
-	 * We explictly set the in-memory copy to 0 beforehand, so we don't
-	 * have to wait to be sure the DMA update has happened.
-	 */
-	*pd->port_rcvhdrtail_kvaddr = 0ULL;
-	set_bit(INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port,
-		&dd->ipath_rcvctrl);
-	ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl,
-			 dd->ipath_rcvctrl & ~INFINIPATH_R_TAILUPD);
-	ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl,
-			 dd->ipath_rcvctrl);
-done:
 	return ret;
 }
 
@@ -957,7 +996,8 @@ static int mmap_ureg(struct vm_area_stru
 
 static int mmap_piobufs(struct vm_area_struct *vma,
 			struct ipath_devdata *dd,
-			struct ipath_portdata *pd)
+			struct ipath_portdata *pd,
+			unsigned piobufs, unsigned piocnt)
 {
 	unsigned long phys;
 	int ret;
@@ -968,16 +1008,15 @@ static int mmap_piobufs(struct vm_area_s
 	 * process data, and catches users who might try to read the i/o
 	 * space due to a bug.
 	 */
-	if ((vma->vm_end - vma->vm_start) >
-	    (dd->ipath_pbufsport * dd->ipath_palign)) {
+	if ((vma->vm_end - vma->vm_start) > (piocnt * dd->ipath_palign)) {
 		dev_info(&dd->pcidev->dev, "FAIL mmap piobufs: "
 			 "reqlen %lx > PAGE\n",
 			 vma->vm_end - vma->vm_start);
-		ret = -EFAULT;
-		goto bail;
-	}
-
-	phys = dd->ipath_physaddr + pd->port_piobufs;
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	phys = dd->ipath_physaddr + piobufs;
 
 	/*
 	 * Don't mark this as non-cached, or we don't get the
@@ -1021,7 +1060,7 @@ static int mmap_rcvegrbufs(struct vm_are
 			 "reqlen %lx > actual %lx\n",
 			 vma->vm_end - vma->vm_start,
 			 (unsigned long) total_size);
-		ret = -EFAULT;
+		ret = -EINVAL;
 		goto bail;
 	}
 
@@ -1043,6 +1082,122 @@ static int mmap_rcvegrbufs(struct vm_are
 		if (ret < 0)
 			goto bail;
 	}
+	ret = 0;
+
+bail:
+	return ret;
+}
+
+/*
+ * ipath_file_vma_nopage - handle a VMA page fault.
+ */
+static struct page *ipath_file_vma_nopage(struct vm_area_struct *vma,
+					  unsigned long address, int *type)
+{
+	unsigned long offset = address - vma->vm_start;
+	struct page *page = NOPAGE_SIGBUS;
+	void *pageptr;
+
+	/*
+	 * Convert the vmalloc address into a struct page.
+	 */
+	pageptr = (void *)(offset + (vma->vm_pgoff << PAGE_SHIFT));
+	page = vmalloc_to_page(pageptr);
+	if (!page)
+		goto out;
+
+	/* Increment the reference count. */
+	get_page(page);
+	if (type)
+		*type = VM_FAULT_MINOR;
+out:
+	return page;
+}
+
+static struct vm_operations_struct ipath_file_vm_ops = {
+	.nopage = ipath_file_vma_nopage,
+};
+
+static int mmap_kvaddr(struct vm_area_struct *vma, u64 pgaddr,
+		       struct ipath_portdata *pd, unsigned subport)
+{
+	unsigned long len;
+	struct ipath_devdata *dd;
+	void *addr;
+	size_t size;
+	int ret;
+
+	/* If the port is not shared, all addresses should be physical */
+	if (!pd->port_subport_cnt) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	dd = pd->port_dd;
+	size = pd->port_rcvegrbuf_chunks * pd->port_rcvegrbuf_size;
+
+	/*
+	 * Master has all the slave uregbase, rcvhdrq, and
+	 * rcvegrbufs mmapped.
+	 */
+	if (subport == 0) {
+		unsigned num_slaves = pd->port_subport_cnt - 1;
+
+		if (pgaddr == ((u64) pd->subport_uregbase & MMAP64_MASK)) {
+			addr = pd->subport_uregbase;
+			size = PAGE_SIZE * num_slaves;
+		} else if (pgaddr == ((u64) pd->subport_rcvhdr_base &
+				      MMAP64_MASK)) {
+			addr = pd->subport_rcvhdr_base;
+			size = pd->port_rcvhdrq_size * num_slaves;
+		} else if (pgaddr == ((u64) pd->subport_rcvegrbuf &
+				      MMAP64_MASK)) {
+			addr = pd->subport_rcvegrbuf;
+			size *= num_slaves;
+		} else {
+			ret = -EINVAL;
+			goto bail;
+		}
+	} else if (pgaddr == (((u64) pd->subport_uregbase +
+			       PAGE_SIZE * (subport - 1)) & MMAP64_MASK)) {
+		addr = pd->subport_uregbase + PAGE_SIZE * (subport - 1);
+		size = PAGE_SIZE;
+	} else if (pgaddr == (((u64) pd->subport_rcvhdr_base +
+			       pd->port_rcvhdrq_size * (subport - 1)) &
+			      MMAP64_MASK)) {
+		addr = pd->subport_rcvhdr_base +
+			pd->port_rcvhdrq_size * (subport - 1);
+		size = pd->port_rcvhdrq_size;
+	} else if (pgaddr == (((u64) pd->subport_rcvegrbuf +
+			       size * (subport - 1)) & MMAP64_MASK)) {
+		addr = pd->subport_rcvegrbuf + size * (subport - 1);
+		/* rcvegrbufs are read-only on the slave */
+		if (vma->vm_flags & VM_WRITE) {
+			dev_info(&dd->pcidev->dev,
+				 "Can't map eager buffers as "
+				 "writable (flags=%lx)\n", vma->vm_flags);
+			ret = -EPERM;
+			goto bail;
+		}
+		/*
+		 * Don't allow permission to later change to writable
+		 * with mprotect.
+		 */
+		vma->vm_flags &= ~VM_MAYWRITE;
+	} else {
+		ret = -EINVAL;
+		goto bail;
+	}
+	len = vma->vm_end - vma->vm_start;
+	if (len > size) {
+		ipath_cdbg(MM, "FAIL: reqlen %lx > %zx\n", len, size);
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	vma->vm_pgoff = (unsigned long) addr >> PAGE_SHIFT;
+	vma->vm_ops = &ipath_file_vm_ops;
+	vma->vm_flags |= VM_RESERVED | VM_DONTEXPAND;
 	ret = 0;
 
 bail:
@@ -1064,73 +1219,99 @@ static int ipath_mmap(struct file *fp, s
 	struct ipath_portdata *pd;
 	struct ipath_devdata *dd;
 	u64 pgaddr, ureg;
+	unsigned piobufs, piocnt;
 	int ret;
 
 	pd = port_fp(fp);
+	if (!pd) {
+		ret = -EINVAL;
+		goto bail;
+	}
 	dd = pd->port_dd;
 
 	/*
 	 * This is the ipath_do_user_init() code, mapping the shared buffers
 	 * into the user process. The address referred to by vm_pgoff is the
-	 * virtual, not physical, address; we only do one mmap for each
-	 * space mapped.
+	 * file offset passed via mmap().  For shared ports, this is the
+	 * kernel vmalloc() address of the pages to share with the master.
+	 * For non-shared or master ports, this is a physical address.
+	 * We only do one mmap for each space mapped.
 	 */
 	pgaddr = vma->vm_pgoff << PAGE_SHIFT;
 
 	/*
-	 * Must fit in 40 bits for our hardware; some checked elsewhere,
-	 * but we'll be paranoid.  Check for 0 is mostly in case one of the
-	 * allocations failed, but user called mmap anyway.   We want to catch
-	 * that before it can match.
+	 * Check for 0 in case one of the allocations failed, but user
+	 * called mmap anyway.
 	 */
-	if (!pgaddr || pgaddr >= (1ULL<<40))  {
-		ipath_dev_err(dd, "Bad phys addr %llx, start %lx, end %lx\n",
-			(unsigned long long)pgaddr, vma->vm_start, vma->vm_end);
-		return -EINVAL;
-	}
-
-	/* just the offset of the port user registers, not physical addr */
-	ureg = dd->ipath_uregbase + dd->ipath_palign * pd->port_port;
-
-	ipath_cdbg(MM, "ushare: pgaddr %llx vm_start=%lx, vmlen %lx\n",
+	if (!pgaddr) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	ipath_cdbg(MM, "pgaddr %llx vm_start=%lx len %lx port %u:%u:%u\n",
 		   (unsigned long long) pgaddr, vma->vm_start,
-		   vma->vm_end - vma->vm_start);
-
-	if (vma->vm_start & (PAGE_SIZE-1)) {
-		ipath_dev_err(dd,
-			"vm_start not aligned: %lx, end=%lx phys %lx\n",
-			vma->vm_start, vma->vm_end, (unsigned long)pgaddr);
+		   vma->vm_end - vma->vm_start, dd->ipath_unit,
+		   pd->port_port, subport_fp(fp));
+
+	/*
+	 * Physical addresses must fit in 40 bits for our hardware.
+	 * Check for kernel virtual addresses first; anything else must
+	 * match a HW or memory address.
+	 */
+	if (pgaddr >= (1ULL<<40)) {
+		ret = mmap_kvaddr(vma, pgaddr, pd, subport_fp(fp));
+		goto bail;
+	}
+
+	if (!pd->port_subport_cnt) {
+		/* port is not shared */
+		ureg = dd->ipath_uregbase + dd->ipath_palign * pd->port_port;
+		piocnt = dd->ipath_pbufsport;
+		piobufs = pd->port_piobufs;
+	} else if (!subport_fp(fp)) {
+		/* caller is the master */
+		ureg = dd->ipath_uregbase + dd->ipath_palign * pd->port_port;
+		piocnt = (dd->ipath_pbufsport / pd->port_subport_cnt) +
+			 (dd->ipath_pbufsport % pd->port_subport_cnt);
+		piobufs = pd->port_piobufs +
+			dd->ipath_palign * (dd->ipath_pbufsport - piocnt);
+	} else {
+		unsigned slave = subport_fp(fp) - 1;
+
+		/* caller is a slave */
+		ureg = 0;
+		piocnt = dd->ipath_pbufsport / pd->port_subport_cnt;
+		piobufs = pd->port_piobufs + dd->ipath_palign * piocnt * slave;
+	}
+
+	if (pgaddr == ureg)
+		ret = mmap_ureg(vma, dd, ureg);
+	else if (pgaddr == piobufs)
+		ret = mmap_piobufs(vma, dd, pd, piobufs, piocnt);
+	else if (pgaddr == dd->ipath_pioavailregs_phys)
+		/* in-memory copy of pioavail registers */
+		ret = ipath_mmap_mem(vma, pd, PAGE_SIZE, 0,
+			      	     dd->ipath_pioavailregs_phys,
+				     "pioavail registers");
+	else if (subport_fp(fp))
+		/* Subports don't mmap the physical receive buffers */
 		ret = -EINVAL;
-	}
-	else if (pgaddr == ureg)
-		ret = mmap_ureg(vma, dd, ureg);
-	else if (pgaddr == pd->port_piobufs)
-		ret = mmap_piobufs(vma, dd, pd);
-	else if (pgaddr == (u64) pd->port_rcvegr_phys)
+	else if (pgaddr == pd->port_rcvegr_phys)
 		ret = mmap_rcvegrbufs(vma, pd);
-	else if (pgaddr == (u64) pd->port_rcvhdrq_phys) {
+	else if (pgaddr == (u64) pd->port_rcvhdrq_phys)
 		/*
 		 * The rcvhdrq itself; readonly except on HT (so have
 		 * to allow writable mapping), multiple pages, contiguous
 		 * from an i/o perspective.
 		 */
-		unsigned total_size =
-			ALIGN(dd->ipath_rcvhdrcnt * dd->ipath_rcvhdrentsize
-			   * sizeof(u32), PAGE_SIZE);
-		ret = ipath_mmap_mem(vma, pd, total_size, 1,
+		ret = ipath_mmap_mem(vma, pd, pd->port_rcvhdrq_size, 1,
 				     pd->port_rcvhdrq_phys,
 				     "rcvhdrq");
-	}
-	else if (pgaddr == (u64)pd->port_rcvhdrqtailaddr_phys)
+	else if (pgaddr == (u64) pd->port_rcvhdrqtailaddr_phys)
 		/* in-memory copy of rcvhdrq tail register */
 		ret = ipath_mmap_mem(vma, pd, PAGE_SIZE, 0,
 				     pd->port_rcvhdrqtailaddr_phys,
 				     "rcvhdrq tail");
-	else if (pgaddr == dd->ipath_pioavailregs_phys)
-		/* in-memory copy of pioavail registers */
-		ret = ipath_mmap_mem(vma, pd, PAGE_SIZE, 0,
-				     dd->ipath_pioavailregs_phys,
-				     "pioavail registers");
 	else
 		ret = -EINVAL;
 
@@ -1138,9 +1319,10 @@ static int ipath_mmap(struct file *fp, s
 
 	if (ret < 0)
 		dev_info(&dd->pcidev->dev,
-			 "Failure %d on addr %lx, off %lx\n",
-			 -ret, vma->vm_start, vma->vm_pgoff);
-
+			 "Failure %d on off %llx len %lx\n",
+			 -ret, (unsigned long long)pgaddr,
+			 vma->vm_end - vma->vm_start);
+bail:
 	return ret;
 }
 
@@ -1154,6 +1336,8 @@ static unsigned int ipath_poll(struct fi
 	struct ipath_devdata *dd;
 
 	pd = port_fp(fp);
+	if (!pd)
+		goto bail;
 	dd = pd->port_dd;
 
 	bit = pd->port_port + INFINIPATH_R_INTRAVAIL_SHIFT;
@@ -1176,7 +1360,7 @@ static unsigned int ipath_poll(struct fi
 
 	if (tail == head) {
 		set_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag);
-		if(dd->ipath_rhdrhead_intr_off) /* arm rcv interrupt */
+		if (dd->ipath_rhdrhead_intr_off) /* arm rcv interrupt */
 			(void)ipath_write_ureg(dd, ur_rcvhdrhead,
 					       dd->ipath_rhdrhead_intr_off
 					       | head, pd->port_port);
@@ -1200,18 +1384,80 @@ static unsigned int ipath_poll(struct fi
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl,
 			 dd->ipath_rcvctrl);
 
+bail:
 	return pollflag;
 }
 
+static int init_subports(struct ipath_devdata *dd,
+			 struct ipath_portdata *pd,
+			 const struct ipath_user_info *uinfo)
+{
+	int ret = 0;
+	unsigned num_slaves;
+	size_t size;
+
+	/* Old user binaries don't know about subports */
+	if ((uinfo->spu_userversion & 0xffff) != IPATH_USER_SWMINOR)
+		goto bail;
+	/*
+	 * If the user is requesting zero or one port,
+	 * skip the subport allocation.
+	 */
+	if (uinfo->spu_subport_cnt <= 1)
+		goto bail;
+	if (uinfo->spu_subport_cnt > 4) {
+		ret = -EINVAL;
+		goto bail;
+	}
+
+	num_slaves = uinfo->spu_subport_cnt - 1;
+	pd->subport_uregbase = vmalloc(PAGE_SIZE * num_slaves);
+	if (!pd->subport_uregbase) {
+		ret = -ENOMEM;
+		goto bail;
+	}
+	/* Note: pd->port_rcvhdrq_size isn't initialized yet. */
+	size = ALIGN(dd->ipath_rcvhdrcnt * dd->ipath_rcvhdrentsize *
+		     sizeof(u32), PAGE_SIZE) * num_slaves;
+	pd->subport_rcvhdr_base = vmalloc(size);
+	if (!pd->subport_rcvhdr_base) {
+		ret = -ENOMEM;
+		goto bail_ureg;
+	}
+
+	pd->subport_rcvegrbuf = vmalloc(pd->port_rcvegrbuf_chunks *
+					pd->port_rcvegrbuf_size *
+					num_slaves);
+	if (!pd->subport_rcvegrbuf) {
+		ret = -ENOMEM;
+		goto bail_rhdr;
+	}
+
+	pd->port_subport_cnt = uinfo->spu_subport_cnt;
+	pd->port_subport_id = uinfo->spu_subport_id;
+	pd->active_slaves = 1;
+	goto bail;
+
+bail_rhdr:
+	vfree(pd->subport_rcvhdr_base);
+bail_ureg:
+	vfree(pd->subport_uregbase);
+	pd->subport_uregbase = NULL;
+bail:
+	return ret;
+}
+
 static int try_alloc_port(struct ipath_devdata *dd, int port,
-			  struct file *fp)
-{
+			  struct file *fp,
+			  const struct ipath_user_info *uinfo)
+{
+	struct ipath_portdata *pd;
 	int ret;
 
-	if (!dd->ipath_pd[port]) {
-		void *p, *ptmp;
-
-		p = kzalloc(sizeof(struct ipath_portdata), GFP_KERNEL);
+	if (!(pd = dd->ipath_pd[port])) {
+		void *ptmp;
+
+		pd = kzalloc(sizeof(struct ipath_portdata), GFP_KERNEL);
 
 		/*
 		 * Allocate memory for use in ipath_tid_update() just once
@@ -1221,34 +1467,36 @@ static int try_alloc_port(struct ipath_d
 		ptmp = kmalloc(dd->ipath_rcvtidcnt * sizeof(u16) +
 			       dd->ipath_rcvtidcnt * sizeof(struct page **),
 			       GFP_KERNEL);
-		if (!p || !ptmp) {
+		if (!pd || !ptmp) {
 			ipath_dev_err(dd, "Unable to allocate portdata "
 				      "memory, failing open\n");
 			ret = -ENOMEM;
-			kfree(p);
+			kfree(pd);
 			kfree(ptmp);
 			goto bail;
 		}
-		dd->ipath_pd[port] = p;
+		dd->ipath_pd[port] = pd;
 		dd->ipath_pd[port]->port_port = port;
 		dd->ipath_pd[port]->port_dd = dd;
 		dd->ipath_pd[port]->port_tid_pg_list = ptmp;
 		init_waitqueue_head(&dd->ipath_pd[port]->port_wait);
 	}
-	if (!dd->ipath_pd[port]->port_cnt) {
-		dd->ipath_pd[port]->port_cnt = 1;
-		fp->private_data = (void *) dd->ipath_pd[port];
+	if (!pd->port_cnt) {
+		pd->userversion = uinfo->spu_userversion;
+		init_user_egr_sizes(pd);
+		if ((ret = init_subports(dd, pd, uinfo)) != 0)
+			goto bail;
 		ipath_cdbg(PROC, "%s[%u] opened unit:port %u:%u\n",
 			   current->comm, current->pid, dd->ipath_unit,
 			   port);
-		dd->ipath_pd[port]->port_pid = current->pid;
-		strncpy(dd->ipath_pd[port]->port_comm, current->comm,
-			sizeof(dd->ipath_pd[port]->port_comm));
+		pd->port_cnt = 1;
+		port_fp(fp) = pd;
+		pd->port_pid = current->pid;
+		strncpy(pd->port_comm, current->comm, sizeof(pd->port_comm));
 		ipath_stats.sps_ports++;
 		ret = 0;
-		goto bail;
-	}
-	ret = -EBUSY;
+	} else
+		ret = -EBUSY;
 
 bail:
 	return ret;
@@ -1264,7 +1512,8 @@ static inline int usable(struct ipath_de
 				     | IPATH_LINKUNK));
 }
 
-static int find_free_port(int unit, struct file *fp)
+static int find_free_port(int unit, struct file *fp,
+			  const struct ipath_user_info *uinfo)
 {
 	struct ipath_devdata *dd = ipath_lookup(unit);
 	int ret, i;
@@ -1279,8 +1528,8 @@ static int find_free_port(int unit, stru
 		goto bail;
 	}
 
-	for (i = 0; i < dd->ipath_cfgports; i++) {
-		ret = try_alloc_port(dd, i, fp);
+	for (i = 1; i < dd->ipath_cfgports; i++) {
+		ret = try_alloc_port(dd, i, fp, uinfo);
 		if (ret != -EBUSY)
 			goto bail;
 	}
@@ -1290,13 +1539,14 @@ bail:
 	return ret;
 }
 
-static int find_best_unit(struct file *fp)
+static int find_best_unit(struct file *fp,
+			  const struct ipath_user_info *uinfo)
 {
 	int ret = 0, i, prefunit = -1, devmax;
 	int maxofallports, npresent, nup;
 	int ndev;
 
-	(void) ipath_count_units(&npresent, &nup, &maxofallports);
+	devmax = ipath_count_units(&npresent, &nup, &maxofallports);
 
 	/*
 	 * This code is present to allow a knowledgeable person to
@@ -1343,8 +1593,6 @@ static int find_best_unit(struct file *f
 
 	if (prefunit != -1)
 		devmax = prefunit + 1;
-	else
-		devmax = ipath_count_units(NULL, NULL, NULL);
 recheck:
 	for (i = 1; i < maxofallports; i++) {
 		for (ndev = prefunit != -1 ? prefunit : 0; ndev < devmax;
@@ -1359,7 +1607,7 @@ recheck:
 				 * next.
 				 */
 				continue;
-			ret = try_alloc_port(dd, i, fp);
+			ret = try_alloc_port(dd, i, fp, uinfo);
 			if (!ret)
 				goto done;
 		}
@@ -1395,22 +1643,174 @@ done:
 	return ret;
 }
 
+static int find_shared_port(struct file *fp,
+			    const struct ipath_user_info *uinfo)
+{
+	int devmax, ndev, i;
+	int ret = 0;
+
+	devmax = ipath_count_units(NULL, NULL, NULL);
+
+	for (ndev = 0; ndev < devmax; ndev++) {
+		struct ipath_devdata *dd = ipath_lookup(ndev);
+
+		if (!dd)
+			continue;
+		for (i = 1; i < dd->ipath_cfgports; i++) {
+			struct ipath_portdata *pd = dd->ipath_pd[i];
+
+			/* Skip ports which are not yet open */
+			if (!pd || !pd->port_cnt)
+				continue;
+			/* Skip port if it doesn't match the requested one */
+			if (pd->port_subport_id != uinfo->spu_subport_id)
+				continue;
+			/* Verify the sharing process matches the master */
+			if (pd->port_subport_cnt != uinfo->spu_subport_cnt ||
+			    pd->userversion != uinfo->spu_userversion ||
+			    pd->port_cnt >= pd->port_subport_cnt) {
+				ret = -EINVAL;
+				goto done;
+			}
+			port_fp(fp) = pd;
+			subport_fp(fp) = pd->port_cnt++;
+			tidcursor_fp(fp) = 0;
+			pd->active_slaves |= 1 << subport_fp(fp);
+			ipath_cdbg(PROC,
+				   "%s[%u] %u sharing %s[%u] unit:port %u:%u\n",
+				   current->comm, current->pid,
+				   subport_fp(fp),
+				   pd->port_comm, pd->port_pid,
+				   dd->ipath_unit, pd->port_port);
+			ret = 1;
+			goto done;
+		}
+	}
+
+done:
+	return ret;
+}
+
 static int ipath_open(struct inode *in, struct file *fp)
 {
-	int ret, user_minor;
+	/* The real work is performed later in ipath_do_user_init() */
+	fp->private_data = kzalloc(sizeof(struct ipath_filedata), GFP_KERNEL);
+	return fp->private_data ? 0 : -ENOMEM;
+}
+
+static int ipath_do_user_init(struct file *fp,
+			      const struct ipath_user_info *uinfo)
+{
+	int ret;
+	struct ipath_portdata *pd;
+	struct ipath_devdata *dd;
+	u32 head32;
+	int i_minor;
+	unsigned swminor;
+
+	/* Check to be sure we haven't already initialized this file */
+	if (port_fp(fp)) {
+		ret = -EINVAL;
+		goto done;
+	}
+
+	/* for now, if major version is different, bail */
+	if ((uinfo->spu_userversion >> 16) != IPATH_USER_SWMAJOR) {
+		ipath_dbg("User major version %d not same as driver "
+			  "major %d\n", uinfo->spu_userversion >> 16,
+			  IPATH_USER_SWMAJOR);
+		ret = -ENODEV;
+		goto done;
+	}
+
+	swminor = uinfo->spu_userversion & 0xffff;
+	if (swminor != IPATH_USER_SWMINOR)
+		ipath_dbg("User minor version %d not same as driver "
+			  "minor %d\n", swminor, IPATH_USER_SWMINOR);
 
 	mutex_lock(&ipath_mutex);
 
-	user_minor = iminor(in) - IPATH_USER_MINOR_BASE;
+	if (swminor == IPATH_USER_SWMINOR && uinfo->spu_subport_cnt &&
+	    (ret = find_shared_port(fp, uinfo))) {
+		mutex_unlock(&ipath_mutex);
+		if (ret > 0)
+			ret = 0;
+		goto done;
+	}
+
+	i_minor = iminor(fp->f_dentry->d_inode) - IPATH_USER_MINOR_BASE;
 	ipath_cdbg(VERBOSE, "open on dev %lx (minor %d)\n",
-		   (long)in->i_rdev, user_minor);
-
-	if (user_minor)
-		ret = find_free_port(user_minor - 1, fp);
+		   (long)fp->f_dentry->d_inode->i_rdev, i_minor);
+
+	if (i_minor)
+		ret = find_free_port(i_minor - 1, fp, uinfo);
 	else
-		ret = find_best_unit(fp);
+		ret = find_best_unit(fp, uinfo);
 
 	mutex_unlock(&ipath_mutex);
+
+	if (ret)
+		goto done;
+
+	pd = port_fp(fp);
+	dd = pd->port_dd;
+
+	if (uinfo->spu_rcvhdrsize) {
+		ret = ipath_setrcvhdrsize(dd, uinfo->spu_rcvhdrsize);
+		if (ret)
+			goto done;
+	}
+
+	/* for now we do nothing with rcvhdrcnt: uinfo->spu_rcvhdrcnt */
+
+	/* for right now, kernel piobufs are at end, so port 1 is at 0 */
+	pd->port_piobufs = dd->ipath_piobufbase +
+		dd->ipath_pbufsport * (pd->port_port - 1) * dd->ipath_palign;
+	ipath_cdbg(VERBOSE, "Set base of piobufs for port %u to 0x%x\n",
+		   pd->port_port, pd->port_piobufs);
+
+	/*
+	 * Now allocate the rcvhdr Q and eager TIDs; skip the TID
+	 * array for time being.  If pd->port_port > chip-supported,
+	 * we need to do extra work here to handle overflow
+	 * through port 0, someday
+	 */
+	ret = ipath_create_rcvhdrq(dd, pd);
+	if (!ret)
+		ret = ipath_create_user_egr(pd);
+	if (ret)
+		goto done;
+
+	/*
+	 * set the eager head register for this port to the current values
+	 * of the tail pointers, since we don't know if they were
+	 * updated on last use of the port.
+	 */
+	head32 = ipath_read_ureg32(dd, ur_rcvegrindextail, pd->port_port);
+	ipath_write_ureg(dd, ur_rcvegrindexhead, head32, pd->port_port);
+	dd->ipath_lastegrheads[pd->port_port] = -1;
+	dd->ipath_lastrcvhdrqtails[pd->port_port] = -1;
+	ipath_cdbg(VERBOSE, "Wrote port%d egrhead %x from tail regs\n",
+		pd->port_port, head32);
+	pd->port_tidcursor = 0;	/* start at beginning after open */
+	/*
+	 * now enable the port; the tail registers will be written to memory
+	 * by the chip as soon as it sees the write to
+	 * dd->ipath_kregs->kr_rcvctrl.  The update only happens on
+	 * transition from 0 to 1, so clear it first, then set it as part of
+	 * enabling the port.  This will (very briefly) affect any other
+	 * open ports, but it shouldn't be long enough to be an issue.
+	 * We explicitly set the in-memory copy to 0 beforehand, so we don't
+	 * have to wait to be sure the DMA update has happened.
+	 */
+	*pd->port_rcvhdrtail_kvaddr = 0ULL;
+	set_bit(INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port,
+		&dd->ipath_rcvctrl);
+	ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl,
+			 dd->ipath_rcvctrl & ~INFINIPATH_R_TAILUPD);
+	ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl,
+			 dd->ipath_rcvctrl);
+done:
 	return ret;
 }
 
@@ -1453,6 +1853,7 @@ static int ipath_close(struct inode *in,
 static int ipath_close(struct inode *in, struct file *fp)
 {
 	int ret = 0;
+	struct ipath_filedata *fd;
 	struct ipath_portdata *pd;
 	struct ipath_devdata *dd;
 	unsigned port;
@@ -1462,9 +1863,24 @@ static int ipath_close(struct inode *in,
 
 	mutex_lock(&ipath_mutex);
 
-	pd = port_fp(fp);
+	fd = (struct ipath_filedata *) fp->private_data;
+	fp->private_data = NULL;
+	pd = fd->pd;
+	if (!pd) {
+		mutex_unlock(&ipath_mutex);
+		goto bail;
+	}
+	if (--pd->port_cnt) {
+		/*
+		 * XXX If the master closes the port before the slave(s),
+		 * revoke the mmap for the eager receive queue so
+		 * the slave(s) don't wait for receive data forever.
+		 */
+		pd->active_slaves &= ~(1 << fd->subport);
+		mutex_unlock(&ipath_mutex);
+		goto bail;
+	}
 	port = pd->port_port;
-	fp->private_data = NULL;
 	dd = pd->port_dd;
 
 	if (pd->port_hdrqfull) {
@@ -1503,8 +1919,6 @@ static int ipath_close(struct inode *in,
 
 		/* clean up the pkeys for this port user */
 		ipath_clean_part_key(pd, dd);
-
-
 		/*
 		 * be paranoid, and never write 0's to these, just use an
 		 * unused part of the port 0 tail page.  Of course,
@@ -1533,35 +1947,55 @@ static int ipath_close(struct inode *in,
 		dd->ipath_f_clear_tids(dd, pd->port_port);
 	}
 
-	pd->port_cnt = 0;
 	pd->port_pid = 0;
-
 	dd->ipath_pd[pd->port_port] = NULL; /* before releasing mutex */
 	mutex_unlock(&ipath_mutex);
 	ipath_free_pddata(dd, pd); /* after releasing the mutex */
 
-	return ret;
-}
-
-static int ipath_port_info(struct ipath_portdata *pd,
+bail:
+	kfree(fd);
+	return ret;
+}
+
+static int ipath_port_info(struct ipath_portdata *pd, u16 subport,
 			   struct ipath_port_info __user *uinfo)
 {
 	struct ipath_port_info info;
 	int nup;
 	int ret;
+	size_t sz;
 
 	(void) ipath_count_units(NULL, &nup, NULL);
 	info.num_active = nup;
 	info.unit = pd->port_dd->ipath_unit;
 	info.port = pd->port_port;
-
-	if (copy_to_user(uinfo, &info, sizeof(info))) {
+	info.subport = subport;
+	/* Don't return new fields if old library opened the port. */
+	if ((pd->userversion & 0xffff) == IPATH_USER_SWMINOR) {
+		/* Number of user ports available for this device. */
+		info.num_ports = pd->port_dd->ipath_cfgports - 1;
+		info.num_subports = pd->port_subport_cnt;
+		sz = sizeof(info);
+	} else
+		sz = sizeof(info) - 2 * sizeof(u16);
+
+	if (copy_to_user(uinfo, &info, sz)) {
 		ret = -EFAULT;
 		goto bail;
 	}
 	ret = 0;
 
 bail:
+	return ret;
+}
+
+static int ipath_get_slave_info(struct ipath_portdata *pd,
+				void __user *slave_mask_addr)
+{
+	int ret = 0;
+
+	if (copy_to_user(slave_mask_addr, &pd->active_slaves, sizeof(u32)))
+		ret = -EFAULT;
 	return ret;
 }
 
@@ -1617,6 +2051,11 @@ static ssize_t ipath_write(struct file *
 		dest = &cmd.cmd.part_key;
 		src = &ucmd->cmd.part_key;
 		break;
+	case IPATH_CMD_SLAVE_INFO:
+		copy = sizeof(cmd.cmd.slave_mask_addr);
+		dest = &cmd.cmd.slave_mask_addr;
+		src = &ucmd->cmd.slave_mask_addr;
+		break;
 	default:
 		ret = -EINVAL;
 		goto bail;
@@ -1634,33 +2073,42 @@ static ssize_t ipath_write(struct file *
 
 	consumed += copy;
 	pd = port_fp(fp);
+	if (!pd && cmd.type != IPATH_CMD_USER_INIT) {
+		ret = -EINVAL;
+		goto bail;
+	}
 
 	switch (cmd.type) {
 	case IPATH_CMD_USER_INIT:
-		ret = ipath_do_user_init(pd, &cmd.cmd.user_info);
-		if (ret < 0)
+		ret = ipath_do_user_init(fp, &cmd.cmd.user_info);
+		if (ret)
 			goto bail;
 		ret = ipath_get_base_info(
-			pd, (void __user *) (unsigned long)
+			fp, (void __user *) (unsigned long)
 			cmd.cmd.user_info.spu_base_info,
 			cmd.cmd.user_info.spu_base_info_size);
 		break;
 	case IPATH_CMD_RECV_CTRL:
-		ret = ipath_manage_rcvq(pd, cmd.cmd.recv_ctrl);
+		ret = ipath_manage_rcvq(pd, subport_fp(fp), cmd.cmd.recv_ctrl);
 		break;
 	case IPATH_CMD_PORT_INFO:
-		ret = ipath_port_info(pd,
+		ret = ipath_port_info(pd, subport_fp(fp),
 				      (struct ipath_port_info __user *)
 				      (unsigned long) cmd.cmd.port_info);
 		break;
 	case IPATH_CMD_TID_UPDATE:
-		ret = ipath_tid_update(pd, &cmd.cmd.tid_info);
+		ret = ipath_tid_update(pd, fp, &cmd.cmd.tid_info);
 		break;
 	case IPATH_CMD_TID_FREE:
-		ret = ipath_tid_free(pd, &cmd.cmd.tid_info);
+		ret = ipath_tid_free(pd, subport_fp(fp), &cmd.cmd.tid_info);
 		break;
 	case IPATH_CMD_SET_PART_KEY:
 		ret = ipath_set_part_key(pd, cmd.cmd.part_key);
+		break;
+	case IPATH_CMD_SLAVE_INFO:
+		ret = ipath_get_slave_info(pd,
+					   (void __user *) (unsigned long)
+					   cmd.cmd.slave_mask_addr);
 		break;
 	}
 
@@ -1858,4 +2306,3 @@ bail:
 bail:
 	return;
 }
-
diff -r 45079acba208 -r 7f5b6127be15 drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Sep 28 08:57:12 2006 -0700
@@ -79,8 +79,8 @@ struct ipath_portdata {
 	dma_addr_t port_rcvhdrq_phys;
 	dma_addr_t port_rcvhdrqtailaddr_phys;
 	/*
-	 * number of opens on this instance (0 or 1; ignoring forks, dup,
-	 * etc. for now)
+	 * number of opens (including slave subports) on this instance
+	 * (ignoring forks, dup, etc. for now)
 	 */
 	int port_cnt;
 	/*
@@ -89,6 +89,10 @@ struct ipath_portdata {
 	 */
 	/* instead of calculating it */
 	unsigned port_port;
+	/* non-zero if port is being shared. */
+	u16 port_subport_cnt;
+	/* non-zero if port is being shared. */
+	u16 port_subport_id;
 	/* chip offset of PIO buffers for this port */
 	u32 port_piobufs;
 	/* how many alloc_pages() chunks in port_rcvegrbuf_pages */
@@ -121,6 +125,16 @@ struct ipath_portdata {
 	u16 port_pkeys[4];
 	/* so file ops can get at unit */
 	struct ipath_devdata *port_dd;
+	/* A page of memory for rcvhdrhead, rcvegrhead, rcvegrtail * N */
+	void *subport_uregbase;
+	/* An array of pages for the eager receive buffers * N */
+	void *subport_rcvegrbuf;
+	/* An array of pages for the eager header queue entries * N */
+	void *subport_rcvhdr_base;
+	/* The version of the library which opened this port */
+	u32 userversion;
+	/* Bitmask of active slaves */
+	u32 active_slaves;
 };
 
 struct sk_buff;
@@ -512,6 +526,12 @@ struct ipath_devdata {
 	u32 ipath_lli_errors;
 };
 
+/* Private data for file operations */
+struct ipath_filedata {
+	struct ipath_portdata *pd;
+	unsigned subport;
+	unsigned tidcursor;
+};
 extern struct list_head ipath_dev_list;
 extern spinlock_t ipath_devs_lock;
 extern struct ipath_devdata *ipath_lookup(int unit);
@@ -572,7 +592,11 @@ int ipath_set_rx_pol_inv(struct ipath_de
 int ipath_set_rx_pol_inv(struct ipath_devdata *dd, u8 new_pol_inv);
 
 /* for use in system calls, where we want to know device type, etc. */
-#define port_fp(fp) ((struct ipath_portdata *) (fp)->private_data)
+#define port_fp(fp) ((struct ipath_filedata *)(fp)->private_data)->pd
+#define subport_fp(fp) \
+	((struct ipath_filedata *)(fp)->private_data)->subport
+#define tidcursor_fp(fp) \
+	((struct ipath_filedata *)(fp)->private_data)->tidcursor
 
 /*
  * values for ipath_flags
diff -r 45079acba208 -r 7f5b6127be15 drivers/infiniband/hw/ipath/ipath_sysfs.c
--- a/drivers/infiniband/hw/ipath/ipath_sysfs.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c	Thu Sep 28 08:57:12 2006 -0700
@@ -295,6 +295,16 @@ static ssize_t show_nguid(struct device 
 	struct ipath_devdata *dd = dev_get_drvdata(dev);
 
 	return scnprintf(buf, PAGE_SIZE, "%u\n", dd->ipath_nguid);
+}
+
+static ssize_t show_nports(struct device *dev,
+			   struct device_attribute *attr,
+			   char *buf)
+{
+	struct ipath_devdata *dd = dev_get_drvdata(dev);
+
+	/* Return the number of user ports available. */
+	return scnprintf(buf, PAGE_SIZE, "%u\n", dd->ipath_cfgports - 1);
 }
 
 static ssize_t show_serial(struct device *dev,
@@ -608,6 +618,7 @@ static DEVICE_ATTR(mtu, S_IWUSR | S_IRUG
 static DEVICE_ATTR(mtu, S_IWUSR | S_IRUGO, show_mtu, store_mtu);
 static DEVICE_ATTR(enabled, S_IWUSR | S_IRUGO, show_enabled, store_enabled);
 static DEVICE_ATTR(nguid, S_IRUGO, show_nguid, NULL);
+static DEVICE_ATTR(nports, S_IRUGO, show_nports, NULL);
 static DEVICE_ATTR(reset, S_IWUSR, NULL, store_reset);
 static DEVICE_ATTR(serial, S_IRUGO, show_serial, NULL);
 static DEVICE_ATTR(status, S_IRUGO, show_status, NULL);
@@ -623,6 +634,7 @@ static struct attribute *dev_attributes[
 	&dev_attr_mlid.attr,
 	&dev_attr_mtu.attr,
 	&dev_attr_nguid.attr,
+	&dev_attr_nports.attr,
 	&dev_attr_serial.attr,
 	&dev_attr_status.attr,
 	&dev_attr_status_str.attr,
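
Editor's sketch (not part of the patch): the new ipath_mmap() code splits a
port's PIO buffers between the master and its slave subports. The arithmetic
can be tried in userspace as below. The names pio_slice and pio_slice_for are
hypothetical, and the sketch counts in whole buffers rather than in bytes (the
driver multiplies by dd->ipath_palign to get a chip offset).

```c
#include <stdint.h>

/* Hypothetical helper type: a run of PIO buffers, counted in buffers. */
struct pio_slice {
	uint32_t first;	/* index of the first buffer in the slice */
	uint32_t count;	/* number of buffers in the slice */
};

/*
 * Mirror the three cases in the patch:
 *   subport_cnt == 0  -> port is not shared, caller gets everything
 *   subport == 0      -> master: an equal share plus the remainder,
 *                        placed at the end of the port's buffers
 *   subport >= 1      -> slave (subport - 1): an equal share, packed
 *                        from the start of the port's buffers
 */
static struct pio_slice pio_slice_for(uint32_t pbufsport,
				      uint32_t subport_cnt,
				      uint32_t subport)
{
	struct pio_slice s;

	if (subport_cnt == 0) {
		s.first = 0;
		s.count = pbufsport;
	} else if (subport == 0) {
		s.count = pbufsport / subport_cnt + pbufsport % subport_cnt;
		s.first = pbufsport - s.count;
	} else {
		s.count = pbufsport / subport_cnt;
		s.first = s.count * (subport - 1);
	}
	return s;
}
```

With 10 buffers and spu_subport_cnt of 4, the three slaves take buffers 0-1,
2-3 and 4-5, and the master takes the remaining four (6-9), so the slices do
not overlap.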


From bos at pathscale.com  Thu Sep 28 09:00:05 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:05 -0700
Subject: [openib-general] [PATCH 9 of 28] IB/ipath - only allow complete
	writes to flash
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <934e5c1d6adecef606f8.1159459205@eng-12.pathscale.com>

Don't allow a write to the EEPROM from ipathfs unless the write is exactly
128 bytes and starts at offset 0.  Any other write now fails with -EINVAL.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r cc3350eeb557 -r 934e5c1d6ade drivers/infiniband/hw/ipath/ipath_fs.c
--- a/drivers/infiniband/hw/ipath/ipath_fs.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_fs.c	Thu Sep 28 08:57:12 2006 -0700
@@ -357,18 +357,15 @@ static ssize_t flash_write(struct file *
 
 	pos = *ppos;
 
-	if ( pos < 0) {
+	if (pos != 0) {
 		ret = -EINVAL;
 		goto bail;
 	}
 
-	if (pos >= sizeof(struct ipath_flash)) {
-		ret = 0;
-		goto bail;
-	}
-
-	if (count > sizeof(struct ipath_flash) - pos)
-		count = sizeof(struct ipath_flash) - pos;
+	if (count != sizeof(struct ipath_flash)) {
+		ret = -EINVAL;
+		goto bail;
+	}
 
 	tmp = kmalloc(count, GFP_KERNEL);
 	if (!tmp) {
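
Editor's sketch (not part of the patch): the check this patch introduces can
be read as a single predicate on the write position and length. The function
name flash_write_allowed and the constant FLASH_IMAGE_SIZE are hypothetical;
the value 128 is taken from the patch description, where it stands in for
sizeof(struct ipath_flash).

```c
#include <errno.h>
#include <stddef.h>

/* Assumed size of the flash image, per the patch description. */
#define FLASH_IMAGE_SIZE 128

/*
 * Return 0 if a write of 'count' bytes at offset 'pos' covers the
 * whole flash image, -EINVAL for any partial or offset write.
 */
static int flash_write_allowed(long long pos, size_t count)
{
	if (pos != 0)
		return -EINVAL;
	if (count != FLASH_IMAGE_SIZE)
		return -EINVAL;
	return 0;
}
```

Before the patch a short or offset write was clamped and accepted; with this
check only a complete image starting at offset 0 gets through.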


From bos at pathscale.com  Thu Sep 28 09:00:04 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:04 -0700
Subject: [openib-general] [PATCH 8 of 28] IB/ipath - count SRQs properly
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <cc3350eeb557466198c1.1159459204@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r fcd3e3bc98d8 -r cc3350eeb557 drivers/infiniband/hw/ipath/ipath_srq.c
--- a/drivers/infiniband/hw/ipath/ipath_srq.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_srq.c	Thu Sep 28 08:57:12 2006 -0700
@@ -103,11 +103,6 @@ struct ib_srq *ipath_create_srq(struct i
 	struct ipath_srq *srq;
 	u32 sz;
 	struct ib_srq *ret;
-
-	if (dev->n_srqs_allocated == ib_ipath_max_srqs) {
-		ret = ERR_PTR(-ENOMEM);
-		goto done;
-	}
 
 	if (srq_init_attr->attr.max_wr == 0) {
 		ret = ERR_PTR(-EINVAL);
@@ -180,10 +175,17 @@ struct ib_srq *ipath_create_srq(struct i
 	spin_lock_init(&srq->rq.lock);
 	srq->rq.wq->head = 0;
 	srq->rq.wq->tail = 0;
-	srq->rq.max_sge = srq_init_attr->attr.max_sge;
 	srq->limit = srq_init_attr->attr.srq_limit;
 
-	dev->n_srqs_allocated++;
+	spin_lock(&dev->n_srqs_lock);
+	if (dev->n_srqs_allocated == ib_ipath_max_srqs) {
+		spin_unlock(&dev->n_srqs_lock);
+		ret = ERR_PTR(-ENOMEM);
+		goto bail_wq;
+	}
+
+ 	dev->n_srqs_allocated++;
+	spin_unlock(&dev->n_srqs_lock);
 
 	ret = &srq->ibsrq;
 	goto done;
@@ -351,8 +353,13 @@ int ipath_destroy_srq(struct ib_srq *ibs
 	struct ipath_srq *srq = to_isrq(ibsrq);
 	struct ipath_ibdev *dev = to_idev(ibsrq->device);
 
+	spin_lock(&dev->n_srqs_lock);
 	dev->n_srqs_allocated--;
-	vfree(srq->rq.wq);
+	spin_unlock(&dev->n_srqs_lock);
+	if (srq->ip)
+		kref_put(&srq->ip->ref, ipath_release_mmap_info);
+	else
+		vfree(srq->rq.wq);
 	kfree(srq);
 
 	return 0;
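
Editor's sketch (not part of the patch): the patch moves the limit check next
to the increment and puts both under a lock, so two concurrent creators cannot
both pass the check and exceed the limit. Below is a userspace analogue that
uses a pthread mutex in place of the kernel spinlock; the names res_counter,
res_counter_reserve and res_counter_release are hypothetical.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace stand-in for n_srqs_allocated + n_srqs_lock. */
struct res_counter {
	pthread_mutex_t lock;
	unsigned allocated;
	unsigned max;
};

/* Check the limit and take a slot in one critical section. */
static int res_counter_reserve(struct res_counter *c)
{
	int ret = 0;

	pthread_mutex_lock(&c->lock);
	if (c->allocated == c->max)
		ret = -ENOMEM;
	else
		c->allocated++;
	pthread_mutex_unlock(&c->lock);
	return ret;
}

/* Give a slot back, as the destroy path does. */
static void res_counter_release(struct res_counter *c)
{
	pthread_mutex_lock(&c->lock);
	c->allocated--;
	pthread_mutex_unlock(&c->lock);
}
```

As in the patch, the reservation happens only after the allocations that can
fail, so an error path never has to undo the increment.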


From bos at pathscale.com  Thu Sep 28 09:00:02 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:02 -0700
Subject: [openib-general] [PATCH 6 of 28] IB/ipath - clean up handling of
	GUID 0
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <0fe847c544586f6f74d0.1159459202@eng-12.pathscale.com>

Respond with an error to the SM if our GUID is 0, and don't allow the
user to set our GUID to 0.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r e2916bbf09ed -r 0fe847c54458 drivers/infiniband/hw/ipath/ipath_mad.c
--- a/drivers/infiniband/hw/ipath/ipath_mad.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c	Thu Sep 28 08:57:12 2006 -0700
@@ -87,7 +87,8 @@ static int recv_subn_get_nodeinfo(struct
 	struct ipath_devdata *dd = to_idev(ibdev)->dd;
 	u32 vendor, majrev, minrev;
 
-	if (smp->attr_mod)
+	/* GUID 0 is illegal */
+	if (smp->attr_mod || (dd->ipath_guid == 0))
 		smp->status |= IB_SMP_INVALID_FIELD;
 
 	nip->base_version = 1;
@@ -131,10 +132,15 @@ static int recv_subn_get_guidinfo(struct
 	 * We only support one GUID for now.  If this changes, the
 	 * portinfo.guid_cap field needs to be updated too.
 	 */
-	if (startgx == 0)
-		/* The first is a copy of the read-only HW GUID. */
-		*p = to_idev(ibdev)->dd->ipath_guid;
-	else
+	if (startgx == 0) {
+		__be64 g = to_idev(ibdev)->dd->ipath_guid;
+		if (g == 0)
+			/* GUID 0 is illegal */
+			smp->status |= IB_SMP_INVALID_FIELD;
+		else
+			/* The first is a copy of the read-only HW GUID. */
+			*p = g;
+	} else
 		smp->status |= IB_SMP_INVALID_FIELD;
 
 	return reply(smp);
diff -r e2916bbf09ed -r 0fe847c54458 drivers/infiniband/hw/ipath/ipath_sysfs.c
--- a/drivers/infiniband/hw/ipath/ipath_sysfs.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c	Thu Sep 28 08:57:12 2006 -0700
@@ -257,7 +257,7 @@ static ssize_t store_guid(struct device 
 	struct ipath_devdata *dd = dev_get_drvdata(dev);
 	ssize_t ret;
 	unsigned short guid[8];
-	__be64 nguid;
+	__be64 new_guid;
 	u8 *ng;
 	int i;
 
@@ -266,7 +266,7 @@ static ssize_t store_guid(struct device 
 		   &guid[4], &guid[5], &guid[6], &guid[7]) != 8)
 		goto invalid;
 
-	ng = (u8 *) &nguid;
+	ng = (u8 *) &new_guid;
 
 	for (i = 0; i < 8; i++) {
 		if (guid[i] > 0xff)
@@ -274,7 +274,10 @@ static ssize_t store_guid(struct device 
 		ng[i] = guid[i];
 	}
 
-	dd->ipath_guid = nguid;
+	if (new_guid == 0)
+		goto invalid;
+
+	dd->ipath_guid = new_guid;
 	dd->ipath_nguid = 1;
 
 	ret = strlen(buf);
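
Editor's sketch (not part of the patch): a userspace version of the
store_guid() validation, including the new rejection of an all-zero GUID. The
function name parse_guid is hypothetical, and the colon-separated hex input
format is an assumption, since the sscanf() format string is outside the hunk
shown above.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse eight colon-separated hex bytes into out[].  Returns 0 on
 * success, -EINVAL if the text is malformed, a field exceeds 0xff,
 * or every byte is zero (GUID 0 is illegal).
 */
static int parse_guid(const char *buf, uint8_t out[8])
{
	unsigned short g[8];
	unsigned any = 0;
	int i;

	if (sscanf(buf, "%hx:%hx:%hx:%hx:%hx:%hx:%hx:%hx",
		   &g[0], &g[1], &g[2], &g[3],
		   &g[4], &g[5], &g[6], &g[7]) != 8)
		return -EINVAL;

	for (i = 0; i < 8; i++) {
		if (g[i] > 0xff)
			return -EINVAL;
		any |= g[i];
	}
	if (!any)
		return -EINVAL;

	/* Only touch the output once the whole value is known good. */
	for (i = 0; i < 8; i++)
		out[i] = (uint8_t) g[i];
	return 0;
}
```

The zero check works on the parsed bytes, so it does not depend on byte
order, much as the patch compares the assembled __be64 against 0.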


From bos at pathscale.com  Thu Sep 28 09:00:06 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:06 -0700
Subject: [openib-general] [PATCH 10 of 28] IB/ipath - RC and UC should
 validate SLID and DLID
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <f8c0eb9dc3b8ddcf8f4c.1159459206@eng-12.pathscale.com>

This is required for IB conformance (spec ch. 9.6.1.5).

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r 934e5c1d6ade -r f8c0eb9dc3b8 drivers/infiniband/hw/ipath/ipath_rc.c
--- a/drivers/infiniband/hw/ipath/ipath_rc.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c	Thu Sep 28 08:57:12 2006 -0700
@@ -1319,6 +1319,10 @@ void ipath_rc_rcv(struct ipath_ibdev *de
 	int diff;
 	struct ib_reth *reth;
 	int header_in_data;
+
+	/* Validate the SLID. See Ch. 9.6.1.5 */
+	if (unlikely(be16_to_cpu(hdr->lrh[3]) != qp->remote_ah_attr.dlid))
+		goto done;
 
 	/* Check for GRH */
 	if (!has_grh) {
diff -r 934e5c1d6ade -r f8c0eb9dc3b8 drivers/infiniband/hw/ipath/ipath_uc.c
--- a/drivers/infiniband/hw/ipath/ipath_uc.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_uc.c	Thu Sep 28 08:57:12 2006 -0700
@@ -246,6 +246,10 @@ void ipath_uc_rcv(struct ipath_ibdev *de
 	struct ib_reth *reth;
 	int header_in_data;
 
+	/* Validate the SLID. See Ch. 9.6.1.5 */
+	if (unlikely(be16_to_cpu(hdr->lrh[3]) != qp->remote_ah_attr.dlid))
+		goto done;
+
 	/* Check for GRH */
 	if (!has_grh) {
 		ohdr = &hdr->u.oth;
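
Editor's sketch (not part of the patch): the added check compares the source
LID carried in the packet's local route header with the DLID the QP expects
its peer to use. A userspace analogue is below; slid_matches is a hypothetical
name, and ntohs() stands in for the kernel's be16_to_cpu().

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * lrh[] holds the four 16-bit words of the local route header in
 * network (big-endian) byte order; the patch reads the SLID from
 * word 3.  Return non-zero if it matches the expected peer LID,
 * which is kept in host byte order.
 */
static int slid_matches(const uint16_t lrh[4], uint16_t expected_dlid)
{
	return ntohs(lrh[3]) == expected_dlid;
}
```

A packet that fails the comparison is dropped without further processing,
which is what the "goto done" in both receive paths does.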


From bos at pathscale.com  Thu Sep 28 09:00:10 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:10 -0700
Subject: [openib-general] [PATCH 14 of 28] IB/ipath - Fix mismatch in shifts
 and masks for printing debug info
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <42f82d2c62bce5aa8ae0.1159459210@eng-12.pathscale.com>

Fix the mismatched linkstate/linktrainingstate shifts and masks in the
IPATH_IBSTATE_MASK macro: each mask was paired with the other field's
shift.  This kept some linktrainingstates from being printed correctly in
debug output; as far as I can tell from the code, nothing else was affected.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r 2a328f7db58f -r 42f82d2c62bc drivers/infiniband/hw/ipath/ipath_registers.h
--- a/drivers/infiniband/hw/ipath/ipath_registers.h	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_registers.h	Thu Sep 28 08:57:12 2006 -0700
@@ -223,9 +223,9 @@
 
 /* combination link status states that we use with some frequency */
 #define IPATH_IBSTATE_MASK ((INFINIPATH_IBCS_LINKTRAININGSTATE_MASK \
-		<< INFINIPATH_IBCS_LINKSTATE_SHIFT) | \
+		<< INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) | \
 		(INFINIPATH_IBCS_LINKSTATE_MASK \
-		<<INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT))
+		<<INFINIPATH_IBCS_LINKSTATE_SHIFT))
 #define IPATH_IBSTATE_INIT ((INFINIPATH_IBCS_L_STATE_INIT \
 		<< INFINIPATH_IBCS_LINKSTATE_SHIFT) | \
 		(INFINIPATH_IBCS_LT_STATE_LINKUP \
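
Editor's sketch (not part of the patch): the effect of pairing each mask with
the wrong shift can be seen with a small example. The field layout below (a
4-bit training state at bit 0 and a 3-bit link state at bit 4) is an
assumption chosen for illustration; the real values live in
ipath_registers.h and are not shown in this hunk.

```c
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define LT_STATE_MASK	0xfu	/* link training state: 4 bits */
#define LT_STATE_SHIFT	0
#define L_STATE_MASK	0x7u	/* link state: 3 bits */
#define L_STATE_SHIFT	4

/* Before the patch: each mask is shifted by the other field's shift. */
#define IBSTATE_MASK_BUGGY \
	((LT_STATE_MASK << L_STATE_SHIFT) | (L_STATE_MASK << LT_STATE_SHIFT))

/* After the patch: each mask is shifted by its own field's shift. */
#define IBSTATE_MASK_FIXED \
	((LT_STATE_MASK << LT_STATE_SHIFT) | (L_STATE_MASK << L_STATE_SHIFT))

/* Keep only the link state and training state bits of a status word. */
static uint32_t ibstate_bits(uint32_t status, uint32_t mask)
{
	return status & mask;
}
```

Under this assumed layout the buggy mask is 0xf7 and the fixed mask is 0x7f,
so the old macro dropped the top bit of the training state: a status of 0x2a
(link state 2, training state 0xa) was reported as 0x22.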


From bos at pathscale.com  Thu Sep 28 09:00:03 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:03 -0700
Subject: [openib-general] [PATCH 7 of 28] IB/ipath - lock and count
 allocated CQs properly
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <fcd3e3bc98d8132c8fe9.1159459203@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r 0fe847c54458 -r fcd3e3bc98d8 drivers/infiniband/hw/ipath/ipath_cq.c
--- a/drivers/infiniband/hw/ipath/ipath_cq.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c	Thu Sep 28 08:57:12 2006 -0700
@@ -174,11 +174,6 @@ struct ib_cq *ipath_create_cq(struct ib_
 
 	if (entries < 1 || entries > ib_ipath_max_cqes) {
 		ret = ERR_PTR(-EINVAL);
-		goto done;
-	}
-
-	if (dev->n_cqs_allocated == ib_ipath_max_cqs) {
-		ret = ERR_PTR(-ENOMEM);
 		goto done;
 	}
 
@@ -237,6 +232,16 @@ struct ib_cq *ipath_create_cq(struct ib_
 	} else
 		cq->ip = NULL;
 
+	spin_lock(&dev->n_cqs_lock);
+	if (dev->n_cqs_allocated == ib_ipath_max_cqs) {
+		spin_unlock(&dev->n_cqs_lock);
+		ret = ERR_PTR(-ENOMEM);
+		goto bail_wc;
+	}
+
+	dev->n_cqs_allocated++;
+	spin_unlock(&dev->n_cqs_lock);
+
 	/*
 	 * ib_create_cq() will initialize cq->ibcq except for cq->ibcq.cqe.
 	 * The number of entries should be >= the number requested or return
@@ -253,7 +258,6 @@ struct ib_cq *ipath_create_cq(struct ib_
 
 	ret = &cq->ibcq;
 
-	dev->n_cqs_allocated++;
 	goto done;
 
 bail_wc:
@@ -280,7 +284,9 @@ int ipath_destroy_cq(struct ib_cq *ibcq)
 	struct ipath_cq *cq = to_icq(ibcq);
 
 	tasklet_kill(&cq->comptask);
+	spin_lock(&dev->n_cqs_lock);
 	dev->n_cqs_allocated--;
+	spin_unlock(&dev->n_cqs_lock);
 	if (cq->ip)
 		kref_put(&cq->ip->ref, ipath_release_mmap_info);
 	else
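
The patch above moves the limit check next to the increment and takes
dev->n_cqs_lock around both.  A userspace sketch of that pattern
follows, assuming a pthread mutex as a stand-in for the spinlock; the
ex_* names are hypothetical and not part of the driver.

```c
#include <pthread.h>

/* Hypothetical stand-in for the device-wide CQ bookkeeping. */
struct ex_dev {
	pthread_mutex_t lock;		/* protects n_allocated */
	unsigned int n_allocated;
	unsigned int max;
};

/* Reserve one slot; returns 0 on success, -1 if the limit is reached.
 * The limit check and the increment happen under the same lock, so two
 * callers cannot both pass the check and overshoot the limit. */
static int ex_reserve(struct ex_dev *dev)
{
	int ret = -1;

	pthread_mutex_lock(&dev->lock);
	if (dev->n_allocated < dev->max) {
		dev->n_allocated++;
		ret = 0;
	}
	pthread_mutex_unlock(&dev->lock);
	return ret;
}

/* Give a slot back; the decrement is protected by the same lock. */
static void ex_release(struct ex_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->n_allocated--;
	pthread_mutex_unlock(&dev->lock);
}
```

A caller would pair every successful ex_reserve() with exactly one
ex_release(), much as the create and destroy paths do in the patch.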


From bos at pathscale.com  Thu Sep 28 09:00:09 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:09 -0700
Subject: [openib-general] [PATCH 13 of 28] IB/ipath - fix compiler warnings
 and errors on non-x86_64 systems
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <2a328f7db58fad9a19ff.1159459209@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r a7ba4b73f972 -r 2a328f7db58f drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c	Thu Sep 28 08:57:12 2006 -0700
@@ -206,11 +206,10 @@ static int ipath_get_base_info(struct fi
 		kinfo->spi_subport_rcvhdr_base =
 			(u64) pd->subport_rcvhdr_base & MMAP64_MASK;
 		ipath_cdbg(PROC, "port %u flags %x %llx %llx %llx\n",
-			kinfo->spi_port,
-			kinfo->spi_runtime_flags,
-			kinfo->spi_subport_uregbase,
-			kinfo->spi_subport_rcvegrbuf,
-			kinfo->spi_subport_rcvhdr_base);
+			kinfo->spi_port, kinfo->spi_runtime_flags,
+			(unsigned long long) kinfo->spi_subport_uregbase,
+			(unsigned long long) kinfo->spi_subport_rcvegrbuf,
+			(unsigned long long) kinfo->spi_subport_rcvhdr_base);
 	}
 
 	if (copy_to_user(ubase, kinfo, sizeof(*kinfo)))
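
The casts in the patch above make the arguments match the %llx
conversions.  Presumably the warnings arise because a 64-bit integer
type is not unsigned long long on every architecture.  A userspace
sketch of the same idea, with a hypothetical helper name:

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit value as hex.  uint64_t may be unsigned long or
 * unsigned long long depending on the platform, so cast explicitly to
 * the type that %llx expects. */
static int ex_format_base(char *buf, size_t len, uint64_t base)
{
	return snprintf(buf, len, "base %llx", (unsigned long long) base);
}
```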


From bos at pathscale.com  Thu Sep 28 09:00:07 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:07 -0700
Subject: [openib-general] [PATCH 11 of 28] IB/ipath - ensure that PD of MR
 matches PD of QP checking the Rkey
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <4dbe5e686c780530dd04.1159459207@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r f8c0eb9dc3b8 -r 4dbe5e686c78 drivers/infiniband/hw/ipath/ipath_keys.c
--- a/drivers/infiniband/hw/ipath/ipath_keys.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_keys.c	Thu Sep 28 08:57:12 2006 -0700
@@ -118,9 +118,10 @@ void ipath_free_lkey(struct ipath_lkey_t
  * Check the IB SGE for validity and initialize our internal version
  * of it.
  */
-int ipath_lkey_ok(struct ipath_lkey_table *rkt, struct ipath_sge *isge,
+int ipath_lkey_ok(struct ipath_qp *qp, struct ipath_sge *isge,
 		  struct ib_sge *sge, int acc)
 {
+	struct ipath_lkey_table *rkt = &to_idev(qp->ibqp.device)->lk_table;
 	struct ipath_mregion *mr;
 	unsigned n, m;
 	size_t off;
@@ -140,7 +141,8 @@ int ipath_lkey_ok(struct ipath_lkey_tabl
 		goto bail;
 	}
 	mr = rkt->table[(sge->lkey >> (32 - ib_ipath_lkey_table_size))];
-	if (unlikely(mr == NULL || mr->lkey != sge->lkey)) {
+	if (unlikely(mr == NULL || mr->lkey != sge->lkey ||
+		     qp->ibqp.pd != mr->pd)) {
 		ret = 0;
 		goto bail;
 	}
@@ -188,9 +190,10 @@ bail:
  *
  * Return 1 if successful, otherwise 0.
  */
-int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss,
+int ipath_rkey_ok(struct ipath_qp *qp, struct ipath_sge_state *ss,
 		  u32 len, u64 vaddr, u32 rkey, int acc)
 {
+	struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
 	struct ipath_lkey_table *rkt = &dev->lk_table;
 	struct ipath_sge *sge = &ss->sge;
 	struct ipath_mregion *mr;
@@ -214,7 +217,8 @@ int ipath_rkey_ok(struct ipath_ibdev *de
 	}
 
 	mr = rkt->table[(rkey >> (32 - ib_ipath_lkey_table_size))];
-	if (unlikely(mr == NULL || mr->lkey != rkey)) {
+	if (unlikely(mr == NULL || mr->lkey != rkey ||
+		     qp->ibqp.pd != mr->pd)) {
 		ret = 0;
 		goto bail;
 	}
diff -r f8c0eb9dc3b8 -r 4dbe5e686c78 drivers/infiniband/hw/ipath/ipath_mr.c
--- a/drivers/infiniband/hw/ipath/ipath_mr.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_mr.c	Thu Sep 28 08:57:12 2006 -0700
@@ -138,6 +138,7 @@ struct ib_mr *ipath_reg_phys_mr(struct i
 		goto bail;
 	}
 
+	mr->mr.pd = pd;
 	mr->mr.user_base = *iova_start;
 	mr->mr.iova = *iova_start;
 	mr->mr.length = 0;
@@ -197,6 +198,7 @@ struct ib_mr *ipath_reg_user_mr(struct i
 		goto bail;
 	}
 
+	mr->mr.pd = pd;
 	mr->mr.user_base = region->user_base;
 	mr->mr.iova = region->virt_base;
 	mr->mr.length = region->length;
@@ -289,6 +291,7 @@ struct ib_fmr *ipath_alloc_fmr(struct ib
 	 * Resources are allocated but no valid mapping (RKEY can't be
 	 * used).
 	 */
+	fmr->mr.pd = pd;
 	fmr->mr.user_base = 0;
 	fmr->mr.iova = 0;
 	fmr->mr.length = 0;
diff -r f8c0eb9dc3b8 -r 4dbe5e686c78 drivers/infiniband/hw/ipath/ipath_rc.c
--- a/drivers/infiniband/hw/ipath/ipath_rc.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c	Thu Sep 28 08:57:12 2006 -0700
@@ -1234,7 +1234,7 @@ static inline int ipath_rc_rcv_error(str
 			 * Address range must be a subset of the original
 			 * request and start on pmtu boundaries.
 			 */
-			ok = ipath_rkey_ok(dev, &qp->s_rdma_sge,
+			ok = ipath_rkey_ok(qp, &qp->s_rdma_sge,
 					   qp->s_rdma_len, vaddr, rkey,
 					   IB_ACCESS_REMOTE_READ);
 			if (unlikely(!ok)) {
@@ -1532,7 +1532,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de
 			int ok;
 
 			/* Check rkey & NAK */
-			ok = ipath_rkey_ok(dev, &qp->r_sge,
+			ok = ipath_rkey_ok(qp, &qp->r_sge,
 					   qp->r_len, vaddr, rkey,
 					   IB_ACCESS_REMOTE_WRITE);
 			if (unlikely(!ok))
@@ -1574,7 +1574,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de
 			int ok;
 
 			/* Check rkey & NAK */
-			ok = ipath_rkey_ok(dev, &qp->s_rdma_sge,
+			ok = ipath_rkey_ok(qp, &qp->s_rdma_sge,
 					   qp->s_rdma_len, vaddr, rkey,
 					   IB_ACCESS_REMOTE_READ);
 			if (unlikely(!ok)) {
@@ -1633,7 +1633,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de
 			goto nack_inv;
 		rkey = be32_to_cpu(ateth->rkey);
 		/* Check rkey & NAK */
-		if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge,
+		if (unlikely(!ipath_rkey_ok(qp, &qp->r_sge,
 					    sizeof(u64), vaddr, rkey,
 					    IB_ACCESS_REMOTE_ATOMIC)))
 			goto nack_acc;
diff -r f8c0eb9dc3b8 -r 4dbe5e686c78 drivers/infiniband/hw/ipath/ipath_ruc.c
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c	Thu Sep 28 08:57:12 2006 -0700
@@ -108,7 +108,6 @@ void ipath_insert_rnr_queue(struct ipath
 
 static int init_sge(struct ipath_qp *qp, struct ipath_rwqe *wqe)
 {
-	struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
 	int user = to_ipd(qp->ibqp.pd)->user;
 	int i, j, ret;
 	struct ib_wc wc;
@@ -119,8 +118,7 @@ static int init_sge(struct ipath_qp *qp,
 			continue;
 		/* Check LKEY */
 		if ((user && wqe->sg_list[i].lkey == 0) ||
-		    !ipath_lkey_ok(&dev->lk_table,
-				   &qp->r_sg_list[j], &wqe->sg_list[i],
+		    !ipath_lkey_ok(qp, &qp->r_sg_list[j], &wqe->sg_list[i],
 				   IB_ACCESS_LOCAL_WRITE))
 			goto bad_lkey;
 		qp->r_len += wqe->sg_list[i].length;
@@ -326,7 +324,7 @@ again:
 	case IB_WR_RDMA_WRITE:
 		if (wqe->length == 0)
 			break;
-		if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, wqe->length,
+		if (unlikely(!ipath_rkey_ok(qp, &qp->r_sge, wqe->length,
 					    wqe->wr.wr.rdma.remote_addr,
 					    wqe->wr.wr.rdma.rkey,
 					    IB_ACCESS_REMOTE_WRITE))) {
@@ -350,7 +348,7 @@ again:
 		break;
 
 	case IB_WR_RDMA_READ:
-		if (unlikely(!ipath_rkey_ok(dev, &sqp->s_sge, wqe->length,
+		if (unlikely(!ipath_rkey_ok(qp, &sqp->s_sge, wqe->length,
 					    wqe->wr.wr.rdma.remote_addr,
 					    wqe->wr.wr.rdma.rkey,
 					    IB_ACCESS_REMOTE_READ)))
@@ -365,7 +363,7 @@ again:
 
 	case IB_WR_ATOMIC_CMP_AND_SWP:
 	case IB_WR_ATOMIC_FETCH_AND_ADD:
-		if (unlikely(!ipath_rkey_ok(dev, &qp->r_sge, sizeof(u64),
+		if (unlikely(!ipath_rkey_ok(qp, &qp->r_sge, sizeof(u64),
 					    wqe->wr.wr.rdma.remote_addr,
 					    wqe->wr.wr.rdma.rkey,
 					    IB_ACCESS_REMOTE_ATOMIC)))
@@ -575,8 +573,7 @@ int ipath_post_ruc_send(struct ipath_qp 
 		}
 		if (wr->sg_list[i].length == 0)
 			continue;
-		if (!ipath_lkey_ok(&to_idev(qp->ibqp.device)->lk_table,
-				   &wqe->sg_list[j], &wr->sg_list[i],
+		if (!ipath_lkey_ok(qp, &wqe->sg_list[j], &wr->sg_list[i],
 				   acc)) {
 			spin_unlock_irqrestore(&qp->s_lock, flags);
 			ret = -EINVAL;
diff -r f8c0eb9dc3b8 -r 4dbe5e686c78 drivers/infiniband/hw/ipath/ipath_uc.c
--- a/drivers/infiniband/hw/ipath/ipath_uc.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_uc.c	Thu Sep 28 08:57:12 2006 -0700
@@ -444,7 +444,7 @@ void ipath_uc_rcv(struct ipath_ibdev *de
 			int ok;
 
 			/* Check rkey */
-			ok = ipath_rkey_ok(dev, &qp->r_sge, qp->r_len,
+			ok = ipath_rkey_ok(qp, &qp->r_sge, qp->r_len,
 					   vaddr, rkey,
 					   IB_ACCESS_REMOTE_WRITE);
 			if (unlikely(!ok)) {
diff -r f8c0eb9dc3b8 -r 4dbe5e686c78 drivers/infiniband/hw/ipath/ipath_ud.c
--- a/drivers/infiniband/hw/ipath/ipath_ud.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ud.c	Thu Sep 28 08:57:12 2006 -0700
@@ -39,7 +39,6 @@ static int init_sge(struct ipath_qp *qp,
 static int init_sge(struct ipath_qp *qp, struct ipath_rwqe *wqe,
 		    u32 *lengthp, struct ipath_sge_state *ss)
 {
-	struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
 	int user = to_ipd(qp->ibqp.pd)->user;
 	int i, j, ret;
 	struct ib_wc wc;
@@ -50,8 +49,7 @@ static int init_sge(struct ipath_qp *qp,
 			continue;
 		/* Check LKEY */
 		if ((user && wqe->sg_list[i].lkey == 0) ||
-		    !ipath_lkey_ok(&dev->lk_table,
-				   j ? &ss->sg_list[j - 1] : &ss->sge,
+		    !ipath_lkey_ok(qp, j ? &ss->sg_list[j - 1] : &ss->sge,
 				   &wqe->sg_list[i], IB_ACCESS_LOCAL_WRITE))
 			goto bad_lkey;
 		*lengthp += wqe->sg_list[i].length;
@@ -343,7 +341,7 @@ int ipath_post_ud_send(struct ipath_qp *
 
 		if (wr->sg_list[i].length == 0)
 			continue;
-		if (!ipath_lkey_ok(&dev->lk_table, ss.num_sge ?
+		if (!ipath_lkey_ok(qp, ss.num_sge ?
 				   sg_list + ss.num_sge - 1 : &ss.sge,
 				   &wr->sg_list[i], 0)) {
 			ret = -EINVAL;
diff -r f8c0eb9dc3b8 -r 4dbe5e686c78 drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h	Thu Sep 28 08:57:12 2006 -0700
@@ -220,6 +220,7 @@ struct ipath_segarray {
 };
 
 struct ipath_mregion {
+	struct ib_pd *pd;	/* shares refcnt of ibmr.pd */
 	u64 user_base;		/* User's address for this region */
 	u64 iova;		/* IB start address of this region */
 	size_t length;
@@ -657,12 +658,6 @@ int ipath_verbs_send(struct ipath_devdat
 
 void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig);
 
-int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss,
-		  u32 len, u64 vaddr, u32 rkey, int acc);
-
-int ipath_lkey_ok(struct ipath_lkey_table *rkt, struct ipath_sge *isge,
-		  struct ib_sge *sge, int acc);
-
 void ipath_copy_sge(struct ipath_sge_state *ss, void *data, u32 length);
 
 void ipath_skip_sge(struct ipath_sge_state *ss, u32 length);
@@ -687,10 +682,10 @@ int ipath_alloc_lkey(struct ipath_lkey_t
 
 void ipath_free_lkey(struct ipath_lkey_table *rkt, u32 lkey);
 
-int ipath_lkey_ok(struct ipath_lkey_table *rkt, struct ipath_sge *isge,
+int ipath_lkey_ok(struct ipath_qp *qp, struct ipath_sge *isge,
 		  struct ib_sge *sge, int acc);
 
-int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss,
+int ipath_rkey_ok(struct ipath_qp *qp, struct ipath_sge_state *ss,
 		  u32 len, u64 vaddr, u32 rkey, int acc);
 
 int ipath_post_srq_receive(struct ib_srq *ibsrq, struct ib_recv_wr *wr,
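
The key checks in the patch above now also require that the memory
region belongs to the same protection domain as the QP.  A simplified
sketch of that rule follows.  The ex_* types are hypothetical, and the
linear scan is an assumption made for brevity; the driver indexes its
table by the upper bits of the key instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types; not the driver's structures. */
struct ex_pd { int id; };

struct ex_region {
	const struct ex_pd *pd;	/* domain the region was registered in */
	uint32_t key;
};

struct ex_qp { const struct ex_pd *pd; };

/* Return 1 if the key names a region the QP may use, otherwise 0.
 * A matching key is not enough: the region must also belong to the
 * same protection domain as the QP. */
static int ex_key_ok(const struct ex_qp *qp,
		     const struct ex_region *table, size_t n, uint32_t key)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].key != key)
			continue;
		return table[i].pd == qp->pd;
	}
	return 0;
}
```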


From bos at pathscale.com  Thu Sep 28 09:00:08 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:08 -0700
Subject: [openib-general] [PATCH 12 of 28] IB/ipath - print more informative
 parity error messages
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <a7ba4b73f972dba7eb77.1159459208@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r 4dbe5e686c78 -r a7ba4b73f972 drivers/infiniband/hw/ipath/ipath_iba6110.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c	Thu Sep 28 08:57:12 2006 -0700
@@ -389,17 +389,28 @@ static void hwerr_crcbits(struct ipath_d
 				     _IPATH_HTLINK1_CRCBITS)));
 }
 
+/* 6110 specific hardware errors... */
+static const struct ipath_hwerror_msgs ipath_6110_hwerror_msgs[] = {
+	INFINIPATH_HWE_MSG(HTCBUSIREQPARITYERR, "HTC Ireq Parity"),
+	INFINIPATH_HWE_MSG(HTCBUSTREQPARITYERR, "HTC Treq Parity"),
+	INFINIPATH_HWE_MSG(HTCBUSTRESPPARITYERR, "HTC Tresp Parity"),
+	INFINIPATH_HWE_MSG(HTCMISCERR5, "HT core Misc5"),
+	INFINIPATH_HWE_MSG(HTCMISCERR6, "HT core Misc6"),
+	INFINIPATH_HWE_MSG(HTCMISCERR7, "HT core Misc7"),
+	INFINIPATH_HWE_MSG(RXDSYNCMEMPARITYERR, "Rx Dsync"),
+	INFINIPATH_HWE_MSG(SERDESPLLFAILED, "SerDes PLL"),
+};
+
 /**
- * ipath_ht_handle_hwerrors - display hardware errors
+ * ipath_ht_handle_hwerrors - display hardware errors.
  * @dd: the infinipath device
  * @msg: the output buffer
  * @msgl: the size of the output buffer
  *
- * Use same msg buffer as regular errors to avoid
- * excessive stack use.  Most hardware errors are catastrophic, but for
- * right now, we'll print them and continue.
- * We reuse the same message buffer as ipath_handle_errors() to avoid
- * excessive stack usage.
+ * Use same msg buffer as regular errors to avoid excessive stack
+ * use.  Most hardware errors are catastrophic, but for right now,
+ * we'll print them and continue.  We reuse the same message buffer as
+ * ipath_handle_errors() to avoid excessive stack usage.
  */
 static void ipath_ht_handle_hwerrors(struct ipath_devdata *dd, char *msg,
 				     size_t msgl)
@@ -499,44 +510,16 @@ static void ipath_ht_handle_hwerrors(str
 			 bits);
 		strlcat(msg, bitsmsg, msgl);
 	}
-	if (hwerrs & (INFINIPATH_HWE_RXEMEMPARITYERR_MASK
-		      << INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT)) {
-		bits = (u32) ((hwerrs >>
-			       INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) &
-			      INFINIPATH_HWE_RXEMEMPARITYERR_MASK);
-		snprintf(bitsmsg, sizeof bitsmsg, "[RXE Parity Errs %x] ",
-			 bits);
-		strlcat(msg, bitsmsg, msgl);
-	}
-	if (hwerrs & (INFINIPATH_HWE_TXEMEMPARITYERR_MASK
-		      << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT)) {
-		bits = (u32) ((hwerrs >>
-			       INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT) &
-			      INFINIPATH_HWE_TXEMEMPARITYERR_MASK);
-		snprintf(bitsmsg, sizeof bitsmsg, "[TXE Parity Errs %x] ",
-			 bits);
-		strlcat(msg, bitsmsg, msgl);
-	}
-	if (hwerrs & INFINIPATH_HWE_IBCBUSTOSPCPARITYERR)
-		strlcat(msg, "[IB2IPATH Parity]", msgl);
-	if (hwerrs & INFINIPATH_HWE_IBCBUSFRSPCPARITYERR)
-		strlcat(msg, "[IPATH2IB Parity]", msgl);
-	if (hwerrs & INFINIPATH_HWE_HTCBUSIREQPARITYERR)
-		strlcat(msg, "[HTC Ireq Parity]", msgl);
-	if (hwerrs & INFINIPATH_HWE_HTCBUSTREQPARITYERR)
-		strlcat(msg, "[HTC Treq Parity]", msgl);
-	if (hwerrs & INFINIPATH_HWE_HTCBUSTRESPPARITYERR)
-		strlcat(msg, "[HTC Tresp Parity]", msgl);
+
+	ipath_format_hwerrors(hwerrs,
+			      ipath_6110_hwerror_msgs,
+			      sizeof(ipath_6110_hwerror_msgs) /
+			      sizeof(ipath_6110_hwerror_msgs[0]),
+			      msg, msgl);
 
 	if (hwerrs & (_IPATH_HTLINK0_CRCBITS | _IPATH_HTLINK1_CRCBITS))
 		hwerr_crcbits(dd, hwerrs, msg, msgl);
 
-	if (hwerrs & INFINIPATH_HWE_HTCMISCERR5)
-		strlcat(msg, "[HT core Misc5]", msgl);
-	if (hwerrs & INFINIPATH_HWE_HTCMISCERR6)
-		strlcat(msg, "[HT core Misc6]", msgl);
-	if (hwerrs & INFINIPATH_HWE_HTCMISCERR7)
-		strlcat(msg, "[HT core Misc7]", msgl);
 	if (hwerrs & INFINIPATH_HWE_MEMBISTFAILED) {
 		strlcat(msg, "[Memory BIST test failed, InfiniPath hardware unusable]",
 			msgl);
@@ -572,11 +555,6 @@ static void ipath_ht_handle_hwerrors(str
 		ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask,
 				 dd->ipath_hwerrmask);
 	}
-
-	if (hwerrs & INFINIPATH_HWE_RXDSYNCMEMPARITYERR)
-		strlcat(msg, "[Rx Dsync]", msgl);
-	if (hwerrs & INFINIPATH_HWE_SERDESPLLFAILED)
-		strlcat(msg, "[SerDes PLL]", msgl);
 
 	ipath_dev_err(dd, "%s hardware error\n", msg);
 	if (isfatal && !ipath_diag_inuse && dd->ipath_freezemsg)
diff -r 4dbe5e686c78 -r a7ba4b73f972 drivers/infiniband/hw/ipath/ipath_iba6120.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c	Thu Sep 28 08:57:12 2006 -0700
@@ -301,6 +301,26 @@ static const struct ipath_cregs ipath_pe
  */
 #define INFINIPATH_XGXS_SUPPRESS_ARMLAUNCH_ERR (1ULL<<63)
 
+/* 6120 specific hardware errors... */
+static const struct ipath_hwerror_msgs ipath_6120_hwerror_msgs[] = {
+	INFINIPATH_HWE_MSG(PCIEPOISONEDTLP, "PCIe Poisoned TLP"),
+	INFINIPATH_HWE_MSG(PCIECPLTIMEOUT, "PCIe completion timeout"),
+	/*
+	 * In practice, it's unlikely that we'll see PCIe PLL, or bus
+	 * parity or memory parity error failures, because most likely we
+	 * won't be able to talk to the core of the chip.  Nonetheless, we
+	 * might see them, if they are in parts of the PCIe core that aren't
+	 * essential.
+	 */
+	INFINIPATH_HWE_MSG(PCIE1PLLFAILED, "PCIePLL1"),
+	INFINIPATH_HWE_MSG(PCIE0PLLFAILED, "PCIePLL0"),
+	INFINIPATH_HWE_MSG(PCIEBUSPARITYXTLH, "PCIe XTLH core parity"),
+	INFINIPATH_HWE_MSG(PCIEBUSPARITYXADM, "PCIe ADM TX core parity"),
+	INFINIPATH_HWE_MSG(PCIEBUSPARITYRADM, "PCIe ADM RX core parity"),
+	INFINIPATH_HWE_MSG(RXDSYNCMEMPARITYERR, "Rx Dsync"),
+	INFINIPATH_HWE_MSG(SERDESPLLFAILED, "SerDes PLL"),
+};
+
 /**
  * ipath_pe_handle_hwerrors - display hardware errors.
  * @dd: the infinipath device
@@ -403,24 +423,13 @@ static void ipath_pe_handle_hwerrors(str
 		ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask,
 				 dd->ipath_hwerrmask);
 	}
-	if (hwerrs & (INFINIPATH_HWE_RXEMEMPARITYERR_MASK
-		      << INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT)) {
-		bits = (u32) ((hwerrs >>
-			       INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) &
-			      INFINIPATH_HWE_RXEMEMPARITYERR_MASK);
-		snprintf(bitsmsg, sizeof bitsmsg, "[RXE Parity Errs %x] ",
-			 bits);
-		strlcat(msg, bitsmsg, msgl);
-	}
-	if (hwerrs & (INFINIPATH_HWE_TXEMEMPARITYERR_MASK
-		      << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT)) {
-		bits = (u32) ((hwerrs >>
-			       INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT) &
-			      INFINIPATH_HWE_TXEMEMPARITYERR_MASK);
-		snprintf(bitsmsg, sizeof bitsmsg, "[TXE Parity Errs %x] ",
-			 bits);
-		strlcat(msg, bitsmsg, msgl);
-	}
+
+	ipath_format_hwerrors(hwerrs,
+			      ipath_6120_hwerror_msgs,
+			      sizeof(ipath_6120_hwerror_msgs)/
+			      sizeof(ipath_6120_hwerror_msgs[0]),
+			      msg, msgl);
+
 	if (hwerrs & (INFINIPATH_HWE_PCIEMEMPARITYERR_MASK
 		      << INFINIPATH_HWE_PCIEMEMPARITYERR_SHIFT)) {
 		bits = (u32) ((hwerrs >>
@@ -430,10 +439,6 @@ static void ipath_pe_handle_hwerrors(str
 			 "[PCIe Mem Parity Errs %x] ", bits);
 		strlcat(msg, bitsmsg, msgl);
 	}
-	if (hwerrs & INFINIPATH_HWE_IBCBUSTOSPCPARITYERR)
-		strlcat(msg, "[IB2IPATH Parity]", msgl);
-	if (hwerrs & INFINIPATH_HWE_IBCBUSFRSPCPARITYERR)
-		strlcat(msg, "[IPATH2IB Parity]", msgl);
 
 #define _IPATH_PLL_FAIL (INFINIPATH_HWE_COREPLL_FBSLIP |	\
 			 INFINIPATH_HWE_COREPLL_RFSLIP )
@@ -458,34 +463,6 @@ static void ipath_pe_handle_hwerrors(str
 		ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask,
 				 dd->ipath_hwerrmask);
 	}
-
-	if (hwerrs & INFINIPATH_HWE_PCIEPOISONEDTLP)
-		strlcat(msg, "[PCIe Poisoned TLP]", msgl);
-	if (hwerrs & INFINIPATH_HWE_PCIECPLTIMEOUT)
-		strlcat(msg, "[PCIe completion timeout]", msgl);
-
-	/*
-	 * In practice, it's unlikely wthat we'll see PCIe PLL, or bus
-	 * parity or memory parity error failures, because most likely we
-	 * won't be able to talk to the core of the chip.  Nonetheless, we
-	 * might see them, if they are in parts of the PCIe core that aren't
-	 * essential.
-	 */
-	if (hwerrs & INFINIPATH_HWE_PCIE1PLLFAILED)
-		strlcat(msg, "[PCIePLL1]", msgl);
-	if (hwerrs & INFINIPATH_HWE_PCIE0PLLFAILED)
-		strlcat(msg, "[PCIePLL0]", msgl);
-	if (hwerrs & INFINIPATH_HWE_PCIEBUSPARITYXTLH)
-		strlcat(msg, "[PCIe XTLH core parity]", msgl);
-	if (hwerrs & INFINIPATH_HWE_PCIEBUSPARITYXADM)
-		strlcat(msg, "[PCIe ADM TX core parity]", msgl);
-	if (hwerrs & INFINIPATH_HWE_PCIEBUSPARITYRADM)
-		strlcat(msg, "[PCIe ADM RX core parity]", msgl);
-
-	if (hwerrs & INFINIPATH_HWE_RXDSYNCMEMPARITYERR)
-		strlcat(msg, "[Rx Dsync]", msgl);
-	if (hwerrs & INFINIPATH_HWE_SERDESPLLFAILED)
-		strlcat(msg, "[SerDes PLL]", msgl);
 
 	ipath_dev_err(dd, "%s hardware error\n", msg);
 	if (isfatal && !ipath_diag_inuse && dd->ipath_freezemsg) {
diff -r 4dbe5e686c78 -r a7ba4b73f972 drivers/infiniband/hw/ipath/ipath_intr.c
--- a/drivers/infiniband/hw/ipath/ipath_intr.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c	Thu Sep 28 08:57:12 2006 -0700
@@ -132,6 +132,82 @@ static u64 handle_e_sum_errs(struct ipat
 	return ignore_this_time;
 }
 
+/* generic hw error messages... */
+#define INFINIPATH_HWE_TXEMEMPARITYERR_MSG(a) \
+	{ \
+		.mask = ( INFINIPATH_HWE_TXEMEMPARITYERR_##a <<    \
+			  INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT ),   \
+		.msg = "TXE " #a " Memory Parity"	     \
+	}
+#define INFINIPATH_HWE_RXEMEMPARITYERR_MSG(a) \
+	{ \
+		.mask = ( INFINIPATH_HWE_RXEMEMPARITYERR_##a <<    \
+			  INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT ),   \
+		.msg = "RXE " #a " Memory Parity"	     \
+	}
+
+static const struct ipath_hwerror_msgs ipath_generic_hwerror_msgs[] = {
+	INFINIPATH_HWE_MSG(IBCBUSFRSPCPARITYERR, "IPATH2IB Parity"),
+	INFINIPATH_HWE_MSG(IBCBUSTOSPCPARITYERR, "IB2IPATH Parity"),
+
+	INFINIPATH_HWE_TXEMEMPARITYERR_MSG(PIOBUF),
+	INFINIPATH_HWE_TXEMEMPARITYERR_MSG(PIOPBC),
+	INFINIPATH_HWE_TXEMEMPARITYERR_MSG(PIOLAUNCHFIFO),
+
+	INFINIPATH_HWE_RXEMEMPARITYERR_MSG(RCVBUF),
+	INFINIPATH_HWE_RXEMEMPARITYERR_MSG(LOOKUPQ),
+	INFINIPATH_HWE_RXEMEMPARITYERR_MSG(EAGERTID),
+	INFINIPATH_HWE_RXEMEMPARITYERR_MSG(EXPTID),
+	INFINIPATH_HWE_RXEMEMPARITYERR_MSG(FLAGBUF),
+	INFINIPATH_HWE_RXEMEMPARITYERR_MSG(DATAINFO),
+	INFINIPATH_HWE_RXEMEMPARITYERR_MSG(HDRINFO),
+};
+
+/**
+ * ipath_format_hwmsg - format a single hwerror message
+ * @msg message buffer
+ * @msgl length of message buffer
+ * @hwmsg message to add to message buffer
+ */
+static void ipath_format_hwmsg(char *msg, size_t msgl, const char *hwmsg)
+{
+	strlcat(msg, "[", msgl);
+	strlcat(msg, hwmsg, msgl);
+	strlcat(msg, "]", msgl);
+}
+
+/**
+ * ipath_format_hwerrors - format hardware error messages for display
+ * @hwerrs hardware errors bit vector
+ * @hwerrmsgs hardware error descriptions
+ * @nhwerrmsgs number of hwerrmsgs
+ * @msg message buffer
+ * @msgl message buffer length
+ */
+void ipath_format_hwerrors(u64 hwerrs,
+			   const struct ipath_hwerror_msgs *hwerrmsgs,
+			   size_t nhwerrmsgs,
+			   char *msg, size_t msgl)
+{
+	int i;
+	const int glen =
+	    sizeof(ipath_generic_hwerror_msgs) /
+	    sizeof(ipath_generic_hwerror_msgs[0]);
+
+	for (i=0; i<glen; i++) {
+		if (hwerrs & ipath_generic_hwerror_msgs[i].mask) {
+			ipath_format_hwmsg(msg, msgl,
+					   ipath_generic_hwerror_msgs[i].msg);
+		}
+	}
+
+	for (i=0; i<nhwerrmsgs; i++) {
+		if (hwerrs & hwerrmsgs[i].mask) {
+			ipath_format_hwmsg(msg, msgl, hwerrmsgs[i].msg);
+		}
+	}
+}
+
 /* return the strings for the most common link states */
 static char *ib_linkstate(u32 linkstate)
 {
diff -r 4dbe5e686c78 -r a7ba4b73f972 drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Sep 28 08:57:12 2006 -0700
@@ -897,4 +897,20 @@ extern struct mutex ipath_mutex;
 
 #endif /* _IPATH_DEBUGGING */
 
+/*
+ * this is used for formatting hw error messages...
+ */
+struct ipath_hwerror_msgs {
+	u64 mask;
+	const char *msg;
+};
+
+#define INFINIPATH_HWE_MSG(a, b) { .mask = INFINIPATH_HWE_##a, .msg = b }
+
+/* in ipath_intr.c... */
+void ipath_format_hwerrors(u64 hwerrs,
+			   const struct ipath_hwerror_msgs *hwerrmsgs,
+			   size_t nhwerrmsgs,
+			   char *msg, size_t lmsg);
+
 #endif				/* _IPATH_KERNEL_H */
diff -r 4dbe5e686c78 -r a7ba4b73f972 drivers/infiniband/hw/ipath/ipath_registers.h
--- a/drivers/infiniband/hw/ipath/ipath_registers.h	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_registers.h	Thu Sep 28 08:57:12 2006 -0700
@@ -134,10 +134,24 @@
 #define INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT 40
 #define INFINIPATH_HWE_RXEMEMPARITYERR_MASK 0x7FULL
 #define INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT 44
-#define INFINIPATH_HWE_RXDSYNCMEMPARITYERR  0x0000000400000000ULL
-#define INFINIPATH_HWE_MEMBISTFAILED        0x0040000000000000ULL
 #define INFINIPATH_HWE_IBCBUSTOSPCPARITYERR 0x4000000000000000ULL
 #define INFINIPATH_HWE_IBCBUSFRSPCPARITYERR 0x8000000000000000ULL
+/* txe mem parity errors (shift by INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT) */
+#define INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF	0x1ULL
+#define INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC	0x2ULL
+#define INFINIPATH_HWE_TXEMEMPARITYERR_PIOLAUNCHFIFO 0x4ULL
+/* rxe mem parity errors (shift by INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) */
+#define INFINIPATH_HWE_RXEMEMPARITYERR_RCVBUF   0x01ULL
+#define INFINIPATH_HWE_RXEMEMPARITYERR_LOOKUPQ  0x02ULL
+#define INFINIPATH_HWE_RXEMEMPARITYERR_EAGERTID 0x04ULL
+#define INFINIPATH_HWE_RXEMEMPARITYERR_EXPTID   0x08ULL
+#define INFINIPATH_HWE_RXEMEMPARITYERR_FLAGBUF  0x10ULL
+#define INFINIPATH_HWE_RXEMEMPARITYERR_DATAINFO 0x20ULL
+#define INFINIPATH_HWE_RXEMEMPARITYERR_HDRINFO  0x40ULL
+/* waldo specific -- find the rest in ipath_6110.c */
+#define INFINIPATH_HWE_RXDSYNCMEMPARITYERR  0x0000000400000000ULL
+/* monty specific -- find the rest in ipath_6120.c */
+#define INFINIPATH_HWE_MEMBISTFAILED	0x0040000000000000ULL
 
 /* kr_hwdiagctrl bits */
 #define INFINIPATH_DC_FORCETXEMEMPARITYERR_MASK 0xFULL
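
The patch above replaces a chain of if/strlcat statements with a table
of mask/message pairs that is walked once.  A userspace sketch of that
table-driven approach follows.  The bit assignments and ex_* names are
hypothetical, and snprintf() is used in place of the kernel's
strlcat().

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct ex_hwerror_msg {
	uint64_t mask;
	const char *msg;
};

/* Hypothetical bit assignments, for illustration only. */
static const struct ex_hwerror_msg ex_msgs[] = {
	{ 1ULL << 0, "Rx Dsync" },
	{ 1ULL << 1, "SerDes PLL" },
	{ 1ULL << 2, "Bus Parity" },
};

/* Append "[name]" for every set bit that has a table entry.  buf must
 * hold a NUL-terminated string on entry; output is truncated rather
 * than overrun if buf is too small. */
static void ex_format_hwerrors(uint64_t errs, char *buf, size_t len)
{
	size_t i, used = strlen(buf);

	for (i = 0; i < sizeof(ex_msgs) / sizeof(ex_msgs[0]); i++) {
		if (!(errs & ex_msgs[i].mask) || used >= len)
			continue;
		snprintf(buf + used, len - used, "[%s]", ex_msgs[i].msg);
		used = strlen(buf);
	}
}
```

Adding a new error then means adding one table entry rather than
another if statement.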


From bos at pathscale.com  Thu Sep 28 09:00:20 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:20 -0700
Subject: [openib-general] [PATCH 24 of 28] IB/mthca - Fix compiler warnings
 with gcc4 on possible unitialized variables
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <9fa624c592af68f7a851.1159459220@eng-12.pathscale.com>

From the compiler's perspective, f0 may be used uninitialized in two
functions (the warning shows up with gcc 4.0.2 on FC4, for example).
Initialize it to zero to silence the warning.
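
A simplified sketch of the kind of code that can trigger such a
warning.  The function below is hypothetical and only mimics the
general shape (a value assigned on the first loop iteration and read
after the loop); it is not the mthca code.

```c
#include <stdint.h>

/* f0 is only assigned on the first loop iteration and only read when
 * at least one request was processed.  That is safe at run time, but a
 * compiler may not be able to prove it, so f0 starts at zero. */
static uint32_t ex_first_flags(const uint32_t *flags, int nreq)
{
	int i;
	int size0 = 0;
	uint32_t f0 = 0;

	for (i = 0; i < nreq; i++) {
		if (!size0) {
			size0 = 1;
			f0 = flags[i];
		}
	}
	return nreq > 0 ? f0 : 0;
}
```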

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r 6a9a67c2b35a -r 9fa624c592af drivers/infiniband/hw/mthca/mthca_qp.c
--- a/drivers/infiniband/hw/mthca/mthca_qp.c	Thu Sep 28 08:57:13 2006 -0700
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c	Thu Sep 28 08:57:13 2006 -0700
@@ -1527,7 +1527,7 @@ int mthca_tavor_post_send(struct ib_qp *
 	int i;
 	int size;
 	int size0 = 0;
-	u32 f0;
+	u32 f0 = 0;
 	int ind;
 	u8 op0 = 0;
 
@@ -1870,7 +1870,7 @@ int mthca_arbel_post_send(struct ib_qp *
 	int i;
 	int size;
 	int size0 = 0;
-	u32 f0;
+	u32 f0 = 0;
 	int ind;
 	u8 op0 = 0;
 

From bos at pathscale.com  Thu Sep 28 09:00:12 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:12 -0700
Subject: [openib-general] [PATCH 16 of 28] IB/ipath - drop unnecessary
	"(void *)" casts
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <cdbbf110848d15d93674.1159459212@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r dcf5ac390abd -r cdbbf110848d drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
@@ -1350,7 +1350,7 @@ int ipath_create_rcvhdrq(struct ipath_de
 
 	/* clear for security and sanity on each use */
 	memset(pd->port_rcvhdrq, 0, pd->port_rcvhdrq_size);
-	memset((void *)pd->port_rcvhdrtail_kvaddr, 0, PAGE_SIZE);
+	memset(pd->port_rcvhdrtail_kvaddr, 0, PAGE_SIZE);
 
 	/*
 	 * tell chip each time we init it, even if we are re-using previous
@@ -1803,7 +1803,7 @@ void ipath_free_pddata(struct ipath_devd
 		pd->port_rcvhdrq = NULL;
 		if (pd->port_rcvhdrtail_kvaddr) {
 			dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE,
-					 (void *)pd->port_rcvhdrtail_kvaddr,
+					 pd->port_rcvhdrtail_kvaddr,
 					 pd->port_rcvhdrqtailaddr_phys);
 			pd->port_rcvhdrtail_kvaddr = NULL;
 		}
@@ -1934,7 +1934,7 @@ static void cleanup_device(struct ipath_
 
 	if (dd->ipath_pioavailregs_dma) {
 		dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE,
-				  (void *) dd->ipath_pioavailregs_dma,
+				  dd->ipath_pioavailregs_dma,
 				  dd->ipath_pioavailregs_phys);
 		dd->ipath_pioavailregs_dma = NULL;
 	}
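
For context on the patch above: in C, a pointer to any object type
converts to void * implicitly, provided it is not const- or
volatile-qualified.  The declared types of the two fields are not shown
in the patch, so the sketch below uses a hypothetical uint64_t buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A pointer to a non-const, non-volatile object type converts to
 * void * implicitly, so no cast is needed at the call site. */
static void ex_clear_tail(uint64_t *tail, size_t len)
{
	memset(tail, 0, len);
}
```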


From bos at pathscale.com  Thu Sep 28 09:00:19 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:19 -0700
Subject: [openib-general] [PATCH 23 of 28] IB/ipath - fix EEPROM read when
 driver is compiled with -Os
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <6a9a67c2b35aa7f6636f.1159459219@eng-12.pathscale.com>

The EEPROM is read via programmable I/O pins.  When the driver is
compiled with -Os, the CPU can speculatively read the I/O value before
it is valid.  This patch fixes the problem by adding a read barrier
after the scratch register read in i2c_wait_for_writes().
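
A rough userspace sketch of the "read, then barrier, then read"
ordering.  The register layout and ex_* names are hypothetical, and
atomic_thread_fence() is only a stand-in: it is not equivalent to the
kernel's rmb() for device memory.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical device with two memory-mapped registers. */
struct ex_regs {
	volatile uint32_t scratch;
	volatile uint32_t gpio_in;
};

/* Read the scratch register to wait for earlier writes, then issue a
 * fence before the read of gpio_in.  The fence is a userspace stand-in
 * for the barrier the patch adds. */
static uint32_t ex_wait_then_read(struct ex_regs *regs)
{
	(void) regs->scratch;
	atomic_thread_fence(memory_order_acquire);
	return regs->gpio_in;
}
```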

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r 5aea5f31529d -r 6a9a67c2b35a drivers/infiniband/hw/ipath/ipath_eeprom.c
--- a/drivers/infiniband/hw/ipath/ipath_eeprom.c	Thu Sep 28 08:57:13 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c	Thu Sep 28 08:57:13 2006 -0700
@@ -187,6 +187,7 @@ static void i2c_wait_for_writes(struct i
 static void i2c_wait_for_writes(struct ipath_devdata *dd)
 {
 	(void)ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
+	rmb();
 }
 
 static void scl_out(struct ipath_devdata *dd, u8 bit)


From bos at pathscale.com  Thu Sep 28 09:00:15 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:15 -0700
Subject: [openib-general] [PATCH 19 of 28] IB/ipath - call mtrr_del with
 correct arguments
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <858280e8cbab089eab00.1159459215@eng-12.pathscale.com>

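We were passing 0 for both base and length to mtrr_del().  That worked
on older kernels, but no longer seems to.  Record the base and length
when write combining is enabled and pass them back when it is disabled.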
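
A sketch of the "remember what you registered" pattern.  The one-slot
registry below is made up for illustration and is not a model of
mtrr_add() or mtrr_del(); all ex_* names are hypothetical.

```c
/* Hypothetical one-slot registry that insists on matching arguments. */
struct ex_registry {
	int in_use;
	unsigned long base, len;
};

/* Hypothetical device state mirroring the fields the patch adds. */
struct ex_dev {
	int cookie;		/* 0 means nothing registered */
	unsigned long wc_base;
	unsigned long wc_len;
};

static int ex_region_add(struct ex_registry *r,
			 unsigned long base, unsigned long len)
{
	if (r->in_use)
		return -1;
	r->in_use = 1;
	r->base = base;
	r->len = len;
	return 1;		/* cookie */
}

/* Fails unless base and len match what was registered. */
static int ex_region_del(struct ex_registry *r, int cookie,
			 unsigned long base, unsigned long len)
{
	if (cookie != 1 || !r->in_use || r->base != base || r->len != len)
		return -1;
	r->in_use = 0;
	return 0;
}

static void ex_enable(struct ex_dev *d, struct ex_registry *r,
		      unsigned long base, unsigned long len)
{
	int cookie = ex_region_add(r, base, len);

	if (cookie > 0) {
		d->cookie = cookie;
		d->wc_base = base;	/* saved for ex_disable() */
		d->wc_len = len;
	}
}

static int ex_disable(struct ex_dev *d, struct ex_registry *r)
{
	int ret = 0;

	if (d->cookie) {
		ret = ex_region_del(r, d->cookie, d->wc_base, d->wc_len);
		d->cookie = 0;	/* cleared even on failure */
	}
	return ret;
}
```

Storing the values next to the cookie keeps the teardown path from
having to recompute or guess them.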

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r de99d6fb2d1d -r 858280e8cbab drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Sep 28 08:57:12 2006 -0700
@@ -336,6 +336,8 @@ struct ipath_devdata {
 	u8 ipath_ht_slave_off;
 	/* for write combining settings */
 	unsigned long ipath_wc_cookie;
+	unsigned long ipath_wc_base;
+	unsigned long ipath_wc_len;
 	/* ref count for each pkey */
 	atomic_t ipath_pkeyrefs[4];
 	/* shadow copy of all exptids physaddr; used only by funcsim */
diff -r de99d6fb2d1d -r 858280e8cbab drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
--- a/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c	Thu Sep 28 08:57:12 2006 -0700
@@ -123,6 +123,8 @@ int ipath_enable_wc(struct ipath_devdata
 			ipath_cdbg(VERBOSE, "Set mtrr for chip to WC, "
 				   "cookie is %d\n", cookie);
 			dd->ipath_wc_cookie = cookie;
+			dd->ipath_wc_base = (unsigned long) pioaddr;
+			dd->ipath_wc_len = (unsigned long) piolen;
 		}
 	}
 
@@ -136,9 +138,16 @@ void ipath_disable_wc(struct ipath_devda
 void ipath_disable_wc(struct ipath_devdata *dd)
 {
 	if (dd->ipath_wc_cookie) {
+		int r;
 		ipath_cdbg(VERBOSE, "undoing WCCOMB on pio buffers\n");
-		mtrr_del(dd->ipath_wc_cookie, 0, 0);
-		dd->ipath_wc_cookie = 0;
+		r = mtrr_del(dd->ipath_wc_cookie, dd->ipath_wc_base,
+			     dd->ipath_wc_len);
+		if (r < 0)
+			dev_info(&dd->pcidev->dev,
+				 "mtrr_del(%lx, %lx, %lx) failed: %d\n",
+				 dd->ipath_wc_cookie, dd->ipath_wc_base,
+				 dd->ipath_wc_len, r);
+		dd->ipath_wc_cookie = 0; /* even on failure */
 	}
 }
 

From bos at pathscale.com  Thu Sep 28 09:00:14 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:14 -0700
Subject: [openib-general] [PATCH 18 of 28] IB/ipath - flush RWQEs if access
 error or invalid error seen
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <de99d6fb2d1db47c1048.1159459214@eng-12.pathscale.com>

If the receiver goes into the error state, we need to flush the
posted receive WQEs.
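
The bookkeeping can be sketched in user space roughly as follows. Only
the r_wrid_valid flag comes from the patch; the struct layout, the
fixed-size completion array and the function names are hypothetical, and
the flush of the remaining queued entries is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CQ_SLOTS 4

enum wc_status { WC_SUCCESS, WC_FLUSH_ERR, WC_REM_ACCESS_ERR };

struct completion {
	unsigned long wr_id;
	enum wc_status status;
};

struct qp {
	bool in_error;
	bool r_wrid_valid;	/* RWQE taken, completion not yet made */
	unsigned long r_wr_id;
	struct completion cq[CQ_SLOTS];
	size_t cq_len;
};

static void cq_push(struct qp *qp, unsigned long wr_id, enum wc_status st)
{
	assert(qp->cq_len < CQ_SLOTS);
	qp->cq[qp->cq_len].wr_id = wr_id;
	qp->cq[qp->cq_len].status = st;
	qp->cq_len++;
}

/* A receive WQE has been taken off the queue for an incoming message. */
void qp_take_rwqe(struct qp *qp, unsigned long wr_id)
{
	qp->r_wr_id = wr_id;
	qp->r_wrid_valid = true;
}

/* Normal path: the message finished, so complete the RWQE once. */
void qp_recv_done(struct qp *qp)
{
	if (!qp->r_wrid_valid)
		return;
	qp->r_wrid_valid = false;
	cq_push(qp, qp->r_wr_id, WC_SUCCESS);
}

/* Error path: report the in-progress RWQE with the given status. */
void qp_error(struct qp *qp, enum wc_status err)
{
	qp->in_error = true;
	if (qp->r_wrid_valid) {
		qp->r_wrid_valid = false;
		cq_push(qp, qp->r_wr_id, err);
	}
}
```

Clearing the flag before generating the completion means a second call
to qp_error() does not report the same RWQE twice.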

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r f6794c8289ab -r de99d6fb2d1d drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c	Thu Sep 28 08:57:12 2006 -0700
@@ -335,6 +335,7 @@ static void ipath_reset_qp(struct ipath_
 	qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;
 	qp->r_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;
 	qp->r_nak_state = 0;
+	qp->r_wrid_valid = 0;
 	qp->s_rnr_timeout = 0;
 	qp->s_head = 0;
 	qp->s_tail = 0;
@@ -353,12 +354,13 @@ static void ipath_reset_qp(struct ipath_
 /**
  * ipath_error_qp - put a QP into an error state
  * @qp: the QP to put into an error state
+ * @err: the receive completion error to signal if a RWQE is active
  *
  * Flushes both send and receive work queues.
  * QP s_lock should be held and interrupts disabled.
  */
 
-void ipath_error_qp(struct ipath_qp *qp)
+void ipath_error_qp(struct ipath_qp *qp, enum ib_wc_status err)
 {
 	struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
 	struct ib_wc wc;
@@ -374,7 +376,6 @@ void ipath_error_qp(struct ipath_qp *qp)
 		list_del_init(&qp->piowait);
 	spin_unlock(&dev->pending_lock);
 
-	wc.status = IB_WC_WR_FLUSH_ERR;
 	wc.vendor_err = 0;
 	wc.byte_len = 0;
 	wc.imm_data = 0;
@@ -386,6 +387,12 @@ void ipath_error_qp(struct ipath_qp *qp)
 	wc.sl = 0;
 	wc.dlid_path_bits = 0;
 	wc.port_num = 0;
+	if (qp->r_wrid_valid) {
+		qp->r_wrid_valid = 0;
+		wc.status = err;
+		ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 1);
+	}
+	wc.status = IB_WC_WR_FLUSH_ERR;
 
 	while (qp->s_last != qp->s_head) {
 		struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last);
@@ -502,7 +509,7 @@ int ipath_modify_qp(struct ib_qp *ibqp, 
 		break;
 
 	case IB_QPS_ERR:
-		ipath_error_qp(qp);
+		ipath_error_qp(qp, IB_WC_GENERAL_ERR);
 		break;
 
 	default:
diff -r f6794c8289ab -r de99d6fb2d1d drivers/infiniband/hw/ipath/ipath_rc.c
--- a/drivers/infiniband/hw/ipath/ipath_rc.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c	Thu Sep 28 08:57:12 2006 -0700
@@ -1293,6 +1293,14 @@ done:
 	return 1;
 }
 
+static void ipath_rc_error(struct ipath_qp *qp, enum ib_wc_status err)
+{
+	spin_lock_irq(&qp->s_lock);
+	qp->state = IB_QPS_ERR;
+	ipath_error_qp(qp, err);
+	spin_unlock_irq(&qp->s_lock);
+}
+
 /**
  * ipath_rc_rcv - process an incoming RC packet
  * @dev: the device this packet came in on
@@ -1385,8 +1393,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de
 		 */
 		if (qp->r_ack_state >= OP(COMPARE_SWAP))
 			goto send_ack;
-		/* XXX Flush WQEs */
-		qp->state = IB_QPS_ERR;
+		ipath_rc_error(qp, IB_WC_REM_INV_REQ_ERR);
 		qp->r_ack_state = OP(SEND_ONLY);
 		qp->r_nak_state = IB_NAK_INVALID_REQUEST;
 		qp->r_ack_psn = qp->r_psn;
@@ -1492,9 +1499,9 @@ void ipath_rc_rcv(struct ipath_ibdev *de
 			goto nack_inv;
 		ipath_copy_sge(&qp->r_sge, data, tlen);
 		qp->r_msn++;
-		if (opcode == OP(RDMA_WRITE_LAST) ||
-		    opcode == OP(RDMA_WRITE_ONLY))
+		if (!qp->r_wrid_valid)
 			break;
+		qp->r_wrid_valid = 0;
 		wc.wr_id = qp->r_wr_id;
 		wc.status = IB_WC_SUCCESS;
 		wc.opcode = IB_WC_RECV;
@@ -1685,8 +1692,7 @@ nack_acc:
 	 * is pending though.
 	 */
 	if (qp->r_ack_state < OP(COMPARE_SWAP)) {
-		/* XXX Flush WQEs */
-		qp->state = IB_QPS_ERR;
+		ipath_rc_error(qp, IB_WC_REM_ACCESS_ERR);
 		qp->r_ack_state = OP(RDMA_WRITE_ONLY);
 		qp->r_nak_state = IB_NAK_REMOTE_ACCESS_ERROR;
 		qp->r_ack_psn = qp->r_psn;
diff -r f6794c8289ab -r de99d6fb2d1d drivers/infiniband/hw/ipath/ipath_ruc.c
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c	Thu Sep 28 08:57:12 2006 -0700
@@ -229,6 +229,7 @@ int ipath_get_rwqe(struct ipath_qp *qp, 
 		}
 	}
 	spin_unlock_irqrestore(&rq->lock, flags);
+	qp->r_wrid_valid = 1;
 
 bail:
 	return ret;
diff -r f6794c8289ab -r de99d6fb2d1d drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h	Thu Sep 28 08:57:12 2006 -0700
@@ -365,6 +365,7 @@ struct ipath_qp {
 	u8 r_min_rnr_timer;	/* retry timeout value for RNR NAKs */
 	u8 r_reuse_sge;		/* for UC receive errors */
 	u8 r_sge_inx;		/* current index into sg_list */
+	u8 r_wrid_valid;	/* r_wrid set but CQ entry not yet made */
 	u8 qp_access_flags;
 	u8 s_max_sge;		/* size of s_wq->sg_list */
 	u8 s_retry_cnt;		/* number of times to retry */
@@ -639,6 +640,8 @@ struct ib_qp *ipath_create_qp(struct ib_
 
 int ipath_destroy_qp(struct ib_qp *ibqp);
 
+void ipath_error_qp(struct ipath_qp *qp, enum ib_wc_status err);
+
 int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 		    int attr_mask, struct ib_udata *udata);
 

From bos at pathscale.com  Thu Sep 28 09:00:11 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:11 -0700
Subject: [openib-general] [PATCH 15 of 28] IB/ipath - support multiple
 simultaneous devices of different types
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <dcf5ac390abd8d130212.1159459211@eng-12.pathscale.com>

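Prior to this change, the driver could not support an HT card and a PCIE
card present in the same machine at the same time. The chip-specific
masks and GPIO settings lived in global variables; they now live in
struct ipath_devdata.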
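
A small user-space sketch of why the values have to be per-device. The
9-bit and 5-bit mask widths are taken from the patch; the trimmed-down
struct and the helper rcvavail_ports() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cut-down stand-in for struct ipath_devdata. */
struct devdata {
	uint32_t i_rcvavail_mask;
	uint32_t i_rcvurg_mask;
};

/* HT chip (iba6110): 9 bits of receive-available/urgent status. */
void init_ht_variables(struct devdata *dd)
{
	dd->i_rcvavail_mask = (1U << 9) - 1;
	dd->i_rcvurg_mask = (1U << 9) - 1;
}

/* PCIE chip (iba6120): 5 bits of receive-available/urgent status. */
void init_pe_variables(struct devdata *dd)
{
	dd->i_rcvavail_mask = (1U << 5) - 1;
	dd->i_rcvurg_mask = (1U << 5) - 1;
}

/* Extract the receive-available bits using this device's own mask. */
uint32_t rcvavail_ports(const struct devdata *dd, uint32_t istat,
			unsigned int shift)
{
	assert(shift < 32);
	return (istat >> shift) & dd->i_rcvavail_mask;
}
```

With a single global mask, whichever chip initialized last would decide
how many status bits every device sees. With one mask per device, bit 8
is kept for the HT device and masked off for the PCIE device.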

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r 42f82d2c62bc -r dcf5ac390abd drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
@@ -94,16 +94,6 @@ const char *ipath_ibcstatus_str[] = {
 	"RecovWaitRmt",
 	"RecovIdle",
 };
-
-/*
- * These variables are initialized in the chip-specific files
- * but are defined here.
- */
-u16 ipath_gpio_sda_num, ipath_gpio_scl_num;
-u64 ipath_gpio_sda, ipath_gpio_scl;
-u64 infinipath_i_bitsextant;
-ipath_err_t infinipath_e_bitsextant, infinipath_hwe_bitsextant;
-u32 infinipath_i_rcvavail_mask, infinipath_i_rcvurg_mask;
 
 static void __devexit ipath_remove_one(struct pci_dev *);
 static int __devinit ipath_init_one(struct pci_dev *,
diff -r 42f82d2c62bc -r dcf5ac390abd drivers/infiniband/hw/ipath/ipath_eeprom.c
--- a/drivers/infiniband/hw/ipath/ipath_eeprom.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c	Thu Sep 28 08:57:12 2006 -0700
@@ -100,9 +100,9 @@ static int i2c_gpio_set(struct ipath_dev
 	gpioval = &dd->ipath_gpio_out;
 	read_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extctrl);
 	if (line == i2c_line_scl)
-		mask = ipath_gpio_scl;
+		mask = dd->ipath_gpio_scl;
 	else
-		mask = ipath_gpio_sda;
+		mask = dd->ipath_gpio_sda;
 
 	if (new_line_state == i2c_line_high)
 		/* tri-state the output rather than force high */
@@ -119,12 +119,12 @@ static int i2c_gpio_set(struct ipath_dev
 		write_val = 0x0UL;
 
 	if (line == i2c_line_scl) {
-		write_val <<= ipath_gpio_scl_num;
-		*gpioval = *gpioval & ~(1UL << ipath_gpio_scl_num);
+		write_val <<= dd->ipath_gpio_scl_num;
+		*gpioval = *gpioval & ~(1UL << dd->ipath_gpio_scl_num);
 		*gpioval |= write_val;
 	} else {
-		write_val <<= ipath_gpio_sda_num;
-		*gpioval = *gpioval & ~(1UL << ipath_gpio_sda_num);
+		write_val <<= dd->ipath_gpio_sda_num;
+		*gpioval = *gpioval & ~(1UL << dd->ipath_gpio_sda_num);
 		*gpioval |= write_val;
 	}
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_out, *gpioval);
@@ -157,9 +157,9 @@ static int i2c_gpio_get(struct ipath_dev
 	read_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extctrl);
 	/* config line to be an input */
 	if (line == i2c_line_scl)
-		mask = ipath_gpio_scl;
+		mask = dd->ipath_gpio_scl;
 	else
-		mask = ipath_gpio_sda;
+		mask = dd->ipath_gpio_sda;
 	write_val = read_val & ~mask;
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, write_val);
 	read_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extstatus);
diff -r 42f82d2c62bc -r dcf5ac390abd drivers/infiniband/hw/ipath/ipath_iba6110.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c	Thu Sep 28 08:57:12 2006 -0700
@@ -252,8 +252,8 @@ static const struct ipath_cregs ipath_ht
 };
 
 /* kr_intstatus, kr_intclear, kr_intmask bits */
-#define INFINIPATH_I_RCVURG_MASK 0x1FF
-#define INFINIPATH_I_RCVAVAIL_MASK 0x1FF
+#define INFINIPATH_I_RCVURG_MASK ((1U<<9)-1)
+#define INFINIPATH_I_RCVAVAIL_MASK ((1U<<9)-1)
 
 /* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */
 #define INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT 0
@@ -457,10 +457,10 @@ static void ipath_ht_handle_hwerrors(str
 			 "(cleared)\n", (unsigned long long) hwerrs);
 	dd->ipath_lasthwerror |= hwerrs;
 
-	if (hwerrs & ~infinipath_hwe_bitsextant)
+	if (hwerrs & ~dd->ipath_hwe_bitsextant)
 		ipath_dev_err(dd, "hwerror interrupt with unknown errors "
 			      "%llx set\n", (unsigned long long)
-			      (hwerrs & ~infinipath_hwe_bitsextant));
+			      (hwerrs & ~dd->ipath_hwe_bitsextant));
 
 	ctrl = ipath_read_kreg32(dd, dd->ipath_kregs->kr_control);
 	if (ctrl & INFINIPATH_C_FREEZEMODE) {
@@ -1059,21 +1059,21 @@ static void ipath_setup_ht_setextled(str
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, extctl);
 }
 
-static void ipath_init_ht_variables(void)
-{
-	ipath_gpio_sda_num = _IPATH_GPIO_SDA_NUM;
-	ipath_gpio_scl_num = _IPATH_GPIO_SCL_NUM;
-	ipath_gpio_sda = IPATH_GPIO_SDA;
-	ipath_gpio_scl = IPATH_GPIO_SCL;
-
-	infinipath_i_bitsextant =
+static void ipath_init_ht_variables(struct ipath_devdata *dd)
+{
+	dd->ipath_gpio_sda_num = _IPATH_GPIO_SDA_NUM;
+	dd->ipath_gpio_scl_num = _IPATH_GPIO_SCL_NUM;
+	dd->ipath_gpio_sda = IPATH_GPIO_SDA;
+	dd->ipath_gpio_scl = IPATH_GPIO_SCL;
+
+	dd->ipath_i_bitsextant =
 		(INFINIPATH_I_RCVURG_MASK << INFINIPATH_I_RCVURG_SHIFT) |
 		(INFINIPATH_I_RCVAVAIL_MASK <<
 		 INFINIPATH_I_RCVAVAIL_SHIFT) |
 		INFINIPATH_I_ERROR | INFINIPATH_I_SPIOSENT |
 		INFINIPATH_I_SPIOBUFAVAIL | INFINIPATH_I_GPIO;
 
-	infinipath_e_bitsextant =
+	dd->ipath_e_bitsextant =
 		INFINIPATH_E_RFORMATERR | INFINIPATH_E_RVCRC |
 		INFINIPATH_E_RICRC | INFINIPATH_E_RMINPKTLEN |
 		INFINIPATH_E_RMAXPKTLEN | INFINIPATH_E_RLONGPKTLEN |
@@ -1091,7 +1091,7 @@ static void ipath_init_ht_variables(void
 		INFINIPATH_E_INVALIDADDR | INFINIPATH_E_RESET |
 		INFINIPATH_E_HARDWARE;
 
-	infinipath_hwe_bitsextant =
+	dd->ipath_hwe_bitsextant =
 		(INFINIPATH_HWE_HTCMEMPARITYERR_MASK <<
 		 INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT) |
 		(INFINIPATH_HWE_TXEMEMPARITYERR_MASK <<
@@ -1120,8 +1120,8 @@ static void ipath_init_ht_variables(void
 		INFINIPATH_HWE_IBCBUSTOSPCPARITYERR |
 		INFINIPATH_HWE_IBCBUSFRSPCPARITYERR;
 
-	infinipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK;
-	infinipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK;
+	dd->ipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK;
+	dd->ipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK;
 }
 
 /**
@@ -1586,5 +1586,5 @@ void ipath_init_iba6110_funcs(struct ipa
 	 * do very early init that is needed before ipath_f_bus is
 	 * called
 	 */
-	ipath_init_ht_variables();
-}
+	ipath_init_ht_variables(dd);
+}
diff -r 42f82d2c62bc -r dcf5ac390abd drivers/infiniband/hw/ipath/ipath_iba6120.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c	Thu Sep 28 08:57:12 2006 -0700
@@ -263,8 +263,8 @@ static const struct ipath_cregs ipath_pe
 };
 
 /* kr_intstatus, kr_intclear, kr_intmask bits */
-#define INFINIPATH_I_RCVURG_MASK 0x1F
-#define INFINIPATH_I_RCVAVAIL_MASK 0x1F
+#define INFINIPATH_I_RCVURG_MASK ((1U<<5)-1)
+#define INFINIPATH_I_RCVAVAIL_MASK ((1U<<5)-1)
 
 /* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */
 #define INFINIPATH_HWE_PCIEMEMPARITYERR_MASK  0x000000000000003fULL
@@ -376,10 +376,10 @@ static void ipath_pe_handle_hwerrors(str
 			 "(cleared)\n", (unsigned long long) hwerrs);
 	dd->ipath_lasthwerror |= hwerrs;
 
-	if (hwerrs & ~infinipath_hwe_bitsextant)
+	if (hwerrs & ~dd->ipath_hwe_bitsextant)
 		ipath_dev_err(dd, "hwerror interrupt with unknown errors "
 			      "%llx set\n", (unsigned long long)
-			      (hwerrs & ~infinipath_hwe_bitsextant));
+			      (hwerrs & ~dd->ipath_hwe_bitsextant));
 
 	ctrl = ipath_read_kreg32(dd, dd->ipath_kregs->kr_control);
 	if (ctrl & INFINIPATH_C_FREEZEMODE) {
@@ -865,19 +865,19 @@ static int ipath_setup_pe_config(struct 
 	return 0;
 }
 
-static void ipath_init_pe_variables(void)
+static void ipath_init_pe_variables(struct ipath_devdata *dd)
 {
 	/*
 	 * bits for selecting i2c direction and values,
 	 * used for I2C serial flash
 	 */
-	ipath_gpio_sda_num = _IPATH_GPIO_SDA_NUM;
-	ipath_gpio_scl_num = _IPATH_GPIO_SCL_NUM;
-	ipath_gpio_sda = IPATH_GPIO_SDA;
-	ipath_gpio_scl = IPATH_GPIO_SCL;
+	dd->ipath_gpio_sda_num = _IPATH_GPIO_SDA_NUM;
+	dd->ipath_gpio_scl_num = _IPATH_GPIO_SCL_NUM;
+	dd->ipath_gpio_sda = IPATH_GPIO_SDA;
+	dd->ipath_gpio_scl = IPATH_GPIO_SCL;
 
 	/* variables for sanity checking interrupt and errors */
-	infinipath_hwe_bitsextant =
+	dd->ipath_hwe_bitsextant =
 		(INFINIPATH_HWE_RXEMEMPARITYERR_MASK <<
 		 INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) |
 		(INFINIPATH_HWE_PCIEMEMPARITYERR_MASK <<
@@ -895,13 +895,13 @@ static void ipath_init_pe_variables(void
 		INFINIPATH_HWE_SERDESPLLFAILED |
 		INFINIPATH_HWE_IBCBUSTOSPCPARITYERR |
 		INFINIPATH_HWE_IBCBUSFRSPCPARITYERR;
-	infinipath_i_bitsextant =
+	dd->ipath_i_bitsextant =
 		(INFINIPATH_I_RCVURG_MASK << INFINIPATH_I_RCVURG_SHIFT) |
 		(INFINIPATH_I_RCVAVAIL_MASK <<
 		 INFINIPATH_I_RCVAVAIL_SHIFT) |
 		INFINIPATH_I_ERROR | INFINIPATH_I_SPIOSENT |
 		INFINIPATH_I_SPIOBUFAVAIL | INFINIPATH_I_GPIO;
-	infinipath_e_bitsextant =
+	dd->ipath_e_bitsextant =
 		INFINIPATH_E_RFORMATERR | INFINIPATH_E_RVCRC |
 		INFINIPATH_E_RICRC | INFINIPATH_E_RMINPKTLEN |
 		INFINIPATH_E_RMAXPKTLEN | INFINIPATH_E_RLONGPKTLEN |
@@ -919,8 +919,8 @@ static void ipath_init_pe_variables(void
 		INFINIPATH_E_INVALIDADDR | INFINIPATH_E_RESET |
 		INFINIPATH_E_HARDWARE;
 
-	infinipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK;
-	infinipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK;
+	dd->ipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK;
+	dd->ipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK;
 }
 
 /* setup the MSI stuff again after a reset.  I'd like to just call
@@ -1326,6 +1326,6 @@ void ipath_init_iba6120_funcs(struct ipa
 	dd->ipath_kregs = &ipath_pe_kregs;
 	dd->ipath_cregs = &ipath_pe_cregs;
 
-	ipath_init_pe_variables();
-}
-
+	ipath_init_pe_variables(dd);
+}
+
diff -r 42f82d2c62bc -r dcf5ac390abd drivers/infiniband/hw/ipath/ipath_intr.c
--- a/drivers/infiniband/hw/ipath/ipath_intr.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c	Thu Sep 28 08:57:12 2006 -0700
@@ -480,10 +480,10 @@ static int handle_errors(struct ipath_de
 		dd->ipath_f_handle_hwerrors(dd, msg, sizeof msg);
 	}
 
-	if (!noprint && (errs & ~infinipath_e_bitsextant))
+	if (!noprint && (errs & ~dd->ipath_e_bitsextant))
 		ipath_dev_err(dd, "error interrupt with unknown errors "
 			      "%llx set\n", (unsigned long long)
-			      (errs & ~infinipath_e_bitsextant));
+			      (errs & ~dd->ipath_e_bitsextant));
 
 	if (errs & E_SUM_ERRS)
 		ignore_this_time = handle_e_sum_errs(dd, errs);
@@ -805,9 +805,9 @@ static void handle_urcv(struct ipath_dev
 	int rcvdint = 0;
 
 	portr = ((istat >> INFINIPATH_I_RCVAVAIL_SHIFT) &
-		 infinipath_i_rcvavail_mask)
+		 dd->ipath_i_rcvavail_mask)
 		| ((istat >> INFINIPATH_I_RCVURG_SHIFT) &
-		   infinipath_i_rcvurg_mask);
+		   dd->ipath_i_rcvurg_mask);
 	for (i = 1; i < dd->ipath_cfgports; i++) {
 		struct ipath_portdata *pd = dd->ipath_pd[i];
 		if (portr & (1 << i) && pd && pd->port_cnt &&
@@ -914,10 +914,10 @@ irqreturn_t ipath_intr(int irq, void *da
 	if (unexpected)
 		unexpected = 0;
 
-	if (unlikely(istat & ~infinipath_i_bitsextant))
+	if (unlikely(istat & ~dd->ipath_i_bitsextant))
 		ipath_dev_err(dd,
 			      "interrupt with unknown interrupts %x set\n",
-			      istat & (u32) ~ infinipath_i_bitsextant);
+			      istat & (u32) ~ dd->ipath_i_bitsextant);
 	else
 		ipath_cdbg(VERBOSE, "intr stat=0x%x\n", istat);
 
@@ -1041,9 +1041,9 @@ irqreturn_t ipath_intr(int irq, void *da
 		istat &= ~port0rbits;
 	}
 
-	if (istat & ((infinipath_i_rcvavail_mask <<
+	if (istat & ((dd->ipath_i_rcvavail_mask <<
 		      INFINIPATH_I_RCVAVAIL_SHIFT)
-		     | (infinipath_i_rcvurg_mask <<
+		     | (dd->ipath_i_rcvurg_mask <<
 			INFINIPATH_I_RCVURG_SHIFT)))
 		handle_urcv(dd, istat);
 
diff -r 42f82d2c62bc -r dcf5ac390abd drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Sep 28 08:57:12 2006 -0700
@@ -533,6 +533,30 @@ struct ipath_devdata {
 	u32 ipath_rxfc_unsupvl_errs;
 	u32 ipath_overrun_thresh_errs;
 	u32 ipath_lli_errs;
+
+	/*
+	 * Not all devices managed by a driver instance are the same
+	 * type, so these fields must be per-device.
+	 */
+	u64 ipath_i_bitsextant;
+	ipath_err_t ipath_e_bitsextant;
+	ipath_err_t ipath_hwe_bitsextant;
+
+	/*
+	 * Below should be computable from number of ports,
+	 * since they are never modified.
+	 */
+	u32 ipath_i_rcvavail_mask;
+	u32 ipath_i_rcvurg_mask;
+
+	/*
+	 * Register bits for selecting i2c direction and values, used for
+	 * I2C serial flash.
+	 */
+	u16 ipath_gpio_sda_num;
+	u16 ipath_gpio_scl_num;
+	u64 ipath_gpio_sda;
+	u64 ipath_gpio_scl;
 };
 
 /* Private data for file operations */
diff -r 42f82d2c62bc -r dcf5ac390abd drivers/infiniband/hw/ipath/ipath_registers.h
--- a/drivers/infiniband/hw/ipath/ipath_registers.h	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_registers.h	Thu Sep 28 08:57:12 2006 -0700
@@ -316,19 +316,23 @@
 
 typedef u64 ipath_err_t;
 
+/* The following change with the type of device, so
+ * need to be part of the ipath_devdata struct, or
+ * we could have problems plugging in devices of
+ * different types (e.g. one HT, one PCIE)
+ * in one system, to be managed by one driver.
+ * On the other hand, this file may also be included
+ * by other code, so leave the declarations here
+ * temporarily. Minor footprint issue if common-model
+ * linker used, none if C89+ linker used.
+ */
+
 /* mask of defined bits for various registers */
 extern u64 infinipath_i_bitsextant;
 extern ipath_err_t infinipath_e_bitsextant, infinipath_hwe_bitsextant;
 
 /* masks that are different in various chips, or only exist in some chips */
 extern u32 infinipath_i_rcvavail_mask, infinipath_i_rcvurg_mask;
-
-/*
- * register bits for selecting i2c direction and values, used for I2C serial
- * flash
- */
-extern u16 ipath_gpio_sda_num, ipath_gpio_scl_num;
-extern u64 ipath_gpio_sda, ipath_gpio_scl;
 
 /*
  * These are the infinipath general register numbers (not offsets).


From bos at pathscale.com  Thu Sep 28 09:00:22 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:22 -0700
Subject: [openib-general] [PATCH 26 of 28] IB/ipath - support new PCIE
	device, QLE7142
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <8b45b43df5adb4ea7dec.1159459222@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r 4269068599c2 -r 8b45b43df5ad drivers/infiniband/hw/ipath/ipath_iba6120.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c	Thu Sep 28 08:57:13 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c	Thu Sep 28 08:57:13 2006 -0700
@@ -538,6 +538,9 @@ static int ipath_pe_boardname(struct ipa
 	case 5:
 		n = "InfiniPath_QMH7140";
 		break;
+	case 6:
+		n = "InfiniPath_QLE7142";
+		break;
 	default:
 		ipath_dev_err(dd,
 			      "Don't yet know about board with ID %u\n",


From bos at pathscale.com  Thu Sep 28 09:00:17 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:17 -0700
Subject: [openib-general] [PATCH 21 of 28] IB/ipath - change HT CRC message
 to indicate how to resolve problem
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <a78c7b475df6e62937fa.1159459217@eng-12.pathscale.com>

The system must be powercycled to clear a HT CRC error; reloading the
driver is not enough.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r e3158e62d6bf -r a78c7b475df6 drivers/infiniband/hw/ipath/ipath_iba6110.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c	Thu Sep 28 08:57:13 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c	Thu Sep 28 08:57:13 2006 -0700
@@ -338,7 +338,7 @@ static void hwerr_crcbits(struct ipath_d
 	if (crcbits) {
 		u16 ctrl0, ctrl1;
 		snprintf(bitsmsg, sizeof bitsmsg,
-			 "[HT%s lane %s CRC (%llx); ignore till reload]",
+			 "[HT%s lane %s CRC (%llx); powercycle to completely clear]",
 			 !(crcbits & _IPATH_HTLINK1_CRCBITS) ?
 			 "0 (A)" : (!(crcbits & _IPATH_HTLINK0_CRCBITS)
 				    ? "1 (B)" : "0+1 (A+B)"),


From bos at pathscale.com  Thu Sep 28 09:00:23 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:23 -0700
Subject: [openib-general] [PATCH 27 of 28] IB/ipath - fix races with
	ib_resize_cq()
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <944d7e53a04937d73513.1159459223@eng-12.pathscale.com>

The resize CQ function changes the memory used to store the queue.
Other routines need to honor the lock before accessing the pointer
to the queue and verify that the head and tail are in range.
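
The poll side of this can be sketched in user space as below. A pthread
mutex stands in for the CQ spinlock, and the struct names, the fixed
RING_SLOTS size and the int payload are hypothetical. The two points
carried over from the patch are that the queue pointer is read only
after the lock is taken, and that the tail is clamped before it is used
as an index.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define RING_SLOTS 8

struct ring {
	uint32_t head;		/* may be user-writable in the real driver */
	uint32_t tail;
	int slots[RING_SLOTS];
};

struct cq {
	pthread_mutex_t lock;
	struct ring *ring;	/* a resize may replace this under the lock */
	uint32_t cqe;		/* highest valid index; cqe + 1 slots in use */
};

/* Copy up to max entries into out[]; returns the number copied. */
int cq_poll(struct cq *cq, int max, int *out)
{
	struct ring *r;
	uint32_t tail;
	int n;

	assert(cq->cqe < RING_SLOTS);
	pthread_mutex_lock(&cq->lock);

	r = cq->ring;		/* read the pointer only under the lock */
	tail = r->tail;
	if (tail > cq->cqe)	/* clamp a possibly bogus value */
		tail = cq->cqe;
	for (n = 0; n < max; n++) {
		if (tail == r->head)
			break;
		out[n] = r->slots[tail];
		tail = (tail >= cq->cqe) ? 0 : tail + 1;
	}
	r->tail = tail;		/* publish the new tail once */

	pthread_mutex_unlock(&cq->lock);
	return n;
}
```

Working on a local copy of the tail keeps every array index within
0..cqe even if the stored value was changed from outside.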

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r 8b45b43df5ad -r 944d7e53a049 drivers/infiniband/hw/ipath/ipath_cq.c
--- a/drivers/infiniband/hw/ipath/ipath_cq.c	Thu Sep 28 08:57:13 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c	Thu Sep 28 08:57:13 2006 -0700
@@ -46,7 +46,7 @@
  */
 void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int solicited)
 {
-	struct ipath_cq_wc *wc = cq->queue;
+	struct ipath_cq_wc *wc;
 	unsigned long flags;
 	u32 head;
 	u32 next;
@@ -57,6 +57,7 @@ void ipath_cq_enter(struct ipath_cq *cq,
 	 * Note that the head pointer might be writable by user processes.
 	 * Take care to verify it is a sane value.
 	 */
+	wc = cq->queue;
 	head = wc->head;
 	if (head >= (unsigned) cq->ibcq.cqe) {
 		head = cq->ibcq.cqe;
@@ -109,21 +110,27 @@ int ipath_poll_cq(struct ib_cq *ibcq, in
 int ipath_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry)
 {
 	struct ipath_cq *cq = to_icq(ibcq);
-	struct ipath_cq_wc *wc = cq->queue;
+	struct ipath_cq_wc *wc;
 	unsigned long flags;
 	int npolled;
+	u32 tail;
 
 	spin_lock_irqsave(&cq->lock, flags);
 
+	wc = cq->queue;
+	tail = wc->tail;
+	if (tail > (u32) cq->ibcq.cqe)
+		tail = (u32) cq->ibcq.cqe;
 	for (npolled = 0; npolled < num_entries; ++npolled, ++entry) {
-		if (wc->tail == wc->head)
+		if (tail == wc->head)
 			break;
-		*entry = wc->queue[wc->tail];
-		if (wc->tail >= cq->ibcq.cqe)
-			wc->tail = 0;
+		*entry = wc->queue[tail];
+		if (tail >= cq->ibcq.cqe)
+			tail = 0;
 		else
-			wc->tail++;
-	}
+			tail++;
+	}
+	wc->tail = tail;
 
 	spin_unlock_irqrestore(&cq->lock, flags);
 
@@ -322,10 +329,16 @@ int ipath_req_notify_cq(struct ib_cq *ib
 	return 0;
 }
 
+/**
+ * ipath_resize_cq - change the size of the CQ
+ * @ibcq: the completion queue
+ *
+ * Returns 0 for success.
+ */
 int ipath_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
 {
 	struct ipath_cq *cq = to_icq(ibcq);
-	struct ipath_cq_wc *old_wc = cq->queue;
+	struct ipath_cq_wc *old_wc;
 	struct ipath_cq_wc *wc;
 	u32 head, tail, n;
 	int ret;
@@ -361,6 +374,7 @@ int ipath_resize_cq(struct ib_cq *ibcq, 
 	 * Make sure head and tail are sane since they
 	 * might be user writable.
 	 */
+	old_wc = cq->queue;
 	head = old_wc->head;
 	if (head > (u32) cq->ibcq.cqe)
 		head = (u32) cq->ibcq.cqe;


From bos at pathscale.com  Thu Sep 28 09:00:24 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:24 -0700
Subject: [openib-general] [PATCH 28 of 28] IB/ipath - fix lockdep error upon
 "ifconfig ibN down"
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <c61b17b5602f2690dc3a.1159459224@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r 944d7e53a049 -r c61b17b5602f drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c	Thu Sep 28 08:57:13 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c	Thu Sep 28 08:57:13 2006 -0700
@@ -1202,6 +1202,7 @@ static struct ib_ah *ipath_create_ah(str
 	struct ipath_ah *ah;
 	struct ib_ah *ret;
 	struct ipath_ibdev *dev = to_idev(pd->device);
+	unsigned long flags;
 
 	/* A multicast address requires a GRH (see ch. 8.4.1). */
 	if (ah_attr->dlid >= IPATH_MULTICAST_LID_BASE &&
@@ -1228,16 +1229,16 @@ static struct ib_ah *ipath_create_ah(str
 		goto bail;
 	}
 
-	spin_lock(&dev->n_ahs_lock);
+	spin_lock_irqsave(&dev->n_ahs_lock, flags);
 	if (dev->n_ahs_allocated == ib_ipath_max_ahs) {
-		spin_unlock(&dev->n_ahs_lock);
+		spin_unlock_irqrestore(&dev->n_ahs_lock, flags);
 		kfree(ah);
 		ret = ERR_PTR(-ENOMEM);
 		goto bail;
 	}
 
 	dev->n_ahs_allocated++;
-	spin_unlock(&dev->n_ahs_lock);
+	spin_unlock_irqrestore(&dev->n_ahs_lock, flags);
 
 	/* ib_create_ah() will initialize ah->ibah. */
 	ah->attr = *ah_attr;
@@ -1258,10 +1259,11 @@ static int ipath_destroy_ah(struct ib_ah
 {
 	struct ipath_ibdev *dev = to_idev(ibah->device);
 	struct ipath_ah *ah = to_iah(ibah);
-
-	spin_lock(&dev->n_ahs_lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->n_ahs_lock, flags);
 	dev->n_ahs_allocated--;
-	spin_unlock(&dev->n_ahs_lock);
+	spin_unlock_irqrestore(&dev->n_ahs_lock, flags);
 
 	kfree(ah);
 

From bos at pathscale.com  Thu Sep 28 09:00:16 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:16 -0700
Subject: [openib-general] [PATCH 20 of 28] IB/ipath - clean up module exit
	code
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <e3158e62d6bf923cadc2.1159459216@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r 858280e8cbab -r e3158e62d6bf drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:13 2006 -0700
@@ -517,33 +517,146 @@ bail:
 	return ret;
 }
 
+static void __devexit cleanup_device(struct ipath_devdata *dd)
+{
+	int port;
+
+	ipath_shutdown_device(dd);
+
+	if (*dd->ipath_statusp & IPATH_STATUS_CHIP_PRESENT) {
+		/* can't do anything more with chip; needs re-init */
+		*dd->ipath_statusp &= ~IPATH_STATUS_CHIP_PRESENT;
+		if (dd->ipath_kregbase) {
+			/*
+			 * if we haven't already cleaned up, clear these
+			 * to ensure any register reads/writes "fail" until
+			 * re-init
+			 */
+			dd->ipath_kregbase = NULL;
+			dd->ipath_uregbase = 0;
+			dd->ipath_sregbase = 0;
+			dd->ipath_cregbase = 0;
+			dd->ipath_kregsize = 0;
+		}
+		ipath_disable_wc(dd);
+	}
+
+	if (dd->ipath_pioavailregs_dma) {
+		dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE,
+				  (void *) dd->ipath_pioavailregs_dma,
+				  dd->ipath_pioavailregs_phys);
+		dd->ipath_pioavailregs_dma = NULL;
+	}
+	if (dd->ipath_dummy_hdrq) {
+		dma_free_coherent(&dd->pcidev->dev,
+			dd->ipath_pd[0]->port_rcvhdrq_size,
+			dd->ipath_dummy_hdrq, dd->ipath_dummy_hdrq_phys);
+		dd->ipath_dummy_hdrq = NULL;
+	}
+
+	if (dd->ipath_pageshadow) {
+		struct page **tmpp = dd->ipath_pageshadow;
+		dma_addr_t *tmpd = dd->ipath_physshadow;
+		int i, cnt = 0;
+
+		ipath_cdbg(VERBOSE, "Unlocking any expTID pages still "
+			   "locked\n");
+		for (port = 0; port < dd->ipath_cfgports; port++) {
+			int port_tidbase = port * dd->ipath_rcvtidcnt;
+			int maxtid = port_tidbase + dd->ipath_rcvtidcnt;
+			for (i = port_tidbase; i < maxtid; i++) {
+				if (!tmpp[i])
+					continue;
+				pci_unmap_page(dd->pcidev, tmpd[i],
+					PAGE_SIZE, PCI_DMA_FROMDEVICE);
+				ipath_release_user_pages(&tmpp[i], 1);
+				tmpp[i] = NULL;
+				cnt++;
+			}
+		}
+		if (cnt) {
+			ipath_stats.sps_pageunlocks += cnt;
+			ipath_cdbg(VERBOSE, "There were still %u expTID "
+				   "entries locked\n", cnt);
+		}
+		if (ipath_stats.sps_pagelocks ||
+		    ipath_stats.sps_pageunlocks)
+			ipath_cdbg(VERBOSE, "%llu pages locked, %llu "
+				   "unlocked via ipath_m{un}lock\n",
+				   (unsigned long long)
+				   ipath_stats.sps_pagelocks,
+				   (unsigned long long)
+				   ipath_stats.sps_pageunlocks);
+
+		ipath_cdbg(VERBOSE, "Free shadow page tid array at %p\n",
+			   dd->ipath_pageshadow);
+		vfree(dd->ipath_pageshadow);
+		dd->ipath_pageshadow = NULL;
+	}
+
+	/*
+	 * free any resources still in use (usually just kernel ports)
+	 * at unload; we do for portcnt, not cfgports, because cfgports
+	 * could have changed while we were loaded.
+	 */
+	for (port = 0; port < dd->ipath_portcnt; port++) {
+		struct ipath_portdata *pd = dd->ipath_pd[port];
+		dd->ipath_pd[port] = NULL;
+		ipath_free_pddata(dd, pd);
+	}
+	kfree(dd->ipath_pd);
+	/*
+	 * debuggability, in case some cleanup path tries to use it
+	 * after this
+	 */
+	dd->ipath_pd = NULL;
+}
+
 static void __devexit ipath_remove_one(struct pci_dev *pdev)
 {
-	struct ipath_devdata *dd;
-
-	ipath_cdbg(VERBOSE, "removing, pdev=%p\n", pdev);
-	if (!pdev)
-		return;
-
-	dd = pci_get_drvdata(pdev);
-
-	if (dd->verbs_dev) {
+	struct ipath_devdata *dd = pci_get_drvdata(pdev);
+
+	ipath_cdbg(VERBOSE, "removing, pdev=%p, dd=%p\n", pdev, dd);
+
+	if (dd->verbs_dev)
 		ipath_unregister_ib_device(dd->verbs_dev);
-		dd->verbs_dev = NULL;
-	}
 
 	ipath_diag_remove(dd);
 	ipath_user_remove(dd);
 	ipathfs_remove_device(dd);
 	ipath_device_remove_group(&pdev->dev, dd);
+
 	ipath_cdbg(VERBOSE, "Releasing pci memory regions, dd %p, "
 		   "unit %u\n", dd, (u32) dd->ipath_unit);
-	if (dd->ipath_kregbase) {
-		ipath_cdbg(VERBOSE, "Unmapping kregbase %p\n",
-			   dd->ipath_kregbase);
-		iounmap((volatile void __iomem *) dd->ipath_kregbase);
-		dd->ipath_kregbase = NULL;
-	}
+
+	cleanup_device(dd);
+
+	/*
+	 * turn off rcv, send, and interrupts for all ports, all drivers
+	 * should also hard reset the chip here?
+	 * free up port 0 (kernel) rcvhdr, egr bufs, and eventually tid bufs
+	 * for all versions of the driver, if they were allocated
+	 */
+	if (pdev->irq) {
+		ipath_cdbg(VERBOSE,
+			   "unit %u free_irq of irq %x\n",
+			   dd->ipath_unit, pdev->irq);
+		free_irq(pdev->irq, dd);
+	} else
+		ipath_dbg("irq is 0, not doing free_irq "
+			  "for unit %u\n", dd->ipath_unit);
+	/*
+	 * we check for NULL here, because it's outside
+	 * the kregbase check, and we need to call it
+	 * after the free_irq.	Thus it's possible that
+	 * the function pointers were never initialized.
+	 */
+	if (dd->ipath_f_cleanup)
+		/* clean up chip-specific stuff */
+		dd->ipath_f_cleanup(dd);
+
+	ipath_cdbg(VERBOSE, "Unmapping kregbase %p\n", dd->ipath_kregbase);
+	iounmap((volatile void __iomem *) dd->ipath_kregbase);
 	pci_release_regions(pdev);
 	ipath_cdbg(VERBOSE, "calling pci_disable_device\n");
 	pci_disable_device(pdev);
@@ -1917,157 +2030,11 @@ bail:
 	return ret;
 }
 
-static void cleanup_device(struct ipath_devdata *dd)
-{
-	int port;
-
-	ipath_shutdown_device(dd);
-
-	if (*dd->ipath_statusp & IPATH_STATUS_CHIP_PRESENT) {
-		/* can't do anything more with chip; needs re-init */
-		*dd->ipath_statusp &= ~IPATH_STATUS_CHIP_PRESENT;
-		if (dd->ipath_kregbase) {
-			/*
-			 * if we haven't already cleaned up before these are
-			 * to ensure any register reads/writes "fail" until
-			 * re-init
-			 */
-			dd->ipath_kregbase = NULL;
-			dd->ipath_uregbase = 0;
-			dd->ipath_sregbase = 0;
-			dd->ipath_cregbase = 0;
-			dd->ipath_kregsize = 0;
-		}
-		ipath_disable_wc(dd);
-	}
-
-	if (dd->ipath_pioavailregs_dma) {
-		dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE,
-				  (void *) dd->ipath_pioavailregs_dma,
-				  dd->ipath_pioavailregs_phys);
-		dd->ipath_pioavailregs_dma = NULL;
-	}
-	if (dd->ipath_dummy_hdrq) {
-		dma_free_coherent(&dd->pcidev->dev,
-			dd->ipath_pd[0]->port_rcvhdrq_size,
-			dd->ipath_dummy_hdrq, dd->ipath_dummy_hdrq_phys);
-		dd->ipath_dummy_hdrq = NULL;
-	}
-
-	if (dd->ipath_pageshadow) {
-		struct page **tmpp = dd->ipath_pageshadow;
-		dma_addr_t *tmpd = dd->ipath_physshadow;
-		int i, cnt = 0;
-
-		ipath_cdbg(VERBOSE, "Unlocking any expTID pages still "
-			   "locked\n");
-		for (port = 0; port < dd->ipath_cfgports; port++) {
-			int port_tidbase = port * dd->ipath_rcvtidcnt;
-			int maxtid = port_tidbase + dd->ipath_rcvtidcnt;
-			for (i = port_tidbase; i < maxtid; i++) {
-				if (!tmpp[i])
-					continue;
-				pci_unmap_page(dd->pcidev, tmpd[i],
-					       PAGE_SIZE, PCI_DMA_FROMDEVICE);
-				ipath_release_user_pages(&tmpp[i], 1);
-				tmpp[i] = NULL;
-				cnt++;
-			}
-		}
-		if (cnt) {
-			ipath_stats.sps_pageunlocks += cnt;
-			ipath_cdbg(VERBOSE, "There were still %u expTID "
-				   "entries locked\n", cnt);
-		}
-		if (ipath_stats.sps_pagelocks ||
-		    ipath_stats.sps_pageunlocks)
-			ipath_cdbg(VERBOSE, "%llu pages locked, %llu "
-				   "unlocked via ipath_m{un}lock\n",
-				   (unsigned long long)
-				   ipath_stats.sps_pagelocks,
-				   (unsigned long long)
-				   ipath_stats.sps_pageunlocks);
-
-		ipath_cdbg(VERBOSE, "Free shadow page tid array at %p\n",
-			   dd->ipath_pageshadow);
-		vfree(dd->ipath_pageshadow);
-		dd->ipath_pageshadow = NULL;
-	}
-
-	/*
-	 * free any resources still in use (usually just kernel ports)
-	 * at unload; we do for portcnt, not cfgports, because cfgports
-	 * could have changed while we were loaded.
-	 */
-	for (port = 0; port < dd->ipath_portcnt; port++) {
-		struct ipath_portdata *pd = dd->ipath_pd[port];
-		dd->ipath_pd[port] = NULL;
-		ipath_free_pddata(dd, pd);
-	}
-	kfree(dd->ipath_pd);
-	/*
-	 * debuggability, in case some cleanup path tries to use it
-	 * after this
-	 */
-	dd->ipath_pd = NULL;
-}
-
 static void __exit infinipath_cleanup(void)
 {
-	struct ipath_devdata *dd, *tmp;
-	unsigned long flags;
-
-	ipath_diagpkt_remove();
-
 	ipath_exit_ipathfs();
 
 	ipath_driver_remove_group(&ipath_driver.driver);
-
-	spin_lock_irqsave(&ipath_devs_lock, flags);
-
-	/*
-	 * turn off rcv, send, and interrupts for all ports, all drivers
-	 * should also hard reset the chip here?
-	 * free up port 0 (kernel) rcvhdr, egr bufs, and eventually tid bufs
-	 * for all versions of the driver, if they were allocated
-	 */
-	list_for_each_entry_safe(dd, tmp, &ipath_dev_list, ipath_list) {
-		spin_unlock_irqrestore(&ipath_devs_lock, flags);
-
-		if (dd->verbs_dev) {
-			ipath_unregister_ib_device(dd->verbs_dev);
-			dd->verbs_dev = NULL;
-		}
-
-		if (dd->ipath_kregbase)
-			cleanup_device(dd);
-
-		if (dd->pcidev) {
-			if (dd->pcidev->irq) {
-				ipath_cdbg(VERBOSE,
-					   "unit %u free_irq of irq %x\n",
-					   dd->ipath_unit, dd->pcidev->irq);
-				free_irq(dd->pcidev->irq, dd);
-			} else
-				ipath_dbg("irq is 0, not doing free_irq "
-					  "for unit %u\n", dd->ipath_unit);
-
-			/*
-			 * we check for NULL here, because it's outside
-			 * the kregbase check, and we need to call it
-			 * after the free_irq.  Thus it's possible that
-			 * the function pointers were never initialized.
-			 */
-			if (dd->ipath_f_cleanup)
-				/* clean up chip-specific stuff */
-				dd->ipath_f_cleanup(dd);
-
-			dd->pcidev = NULL;
-		}
-		spin_lock_irqsave(&ipath_devs_lock, flags);
-	}
-
-	spin_unlock_irqrestore(&ipath_devs_lock, flags);
 
 	ipath_cdbg(VERBOSE, "Unregistering pci driver\n");
 	pci_unregister_driver(&ipath_driver);


From bos at pathscale.com  Thu Sep 28 09:00:18 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:18 -0700
Subject: [openib-general] [PATCH 22 of 28] IB/ipath - fix and recover TXE
 piobuf and PBC parity errors
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <5aea5f31529d9b8ff214.1159459218@eng-12.pathscale.com>

Processor speculative reads of our write-combined memory can sometimes
trigger parity errors (mostly seen on Woodcrest).  Add a stats counter
for these.
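
For illustration, a minimal user-space sketch of how a counter can take
one slot from a trailing pad array so that the overall size of a stats
block does not change.  The struct names below are hypothetical,
cut-down stand-ins and not the real struct infinipath_stats; only the
idea of shrinking the pad by one entry comes from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for a stats block before and
 * after adding one counter; the real struct has many more fields. */
struct stats_before {
	uint64_t sps_krdrops;
	uint64_t sps_pad[46];	/* pad for future growth */
};

struct stats_after {
	uint64_t sps_krdrops;
	uint64_t sps_txeparity;	/* new counter takes one pad slot */
	uint64_t sps_pad[45];
};

/* Returns 1 when both layouts have the same total size, so a reader
 * built against the old layout still gets a block of the same length. */
int stats_size_unchanged(void)
{
	return sizeof(struct stats_before) == sizeof(struct stats_after);
}

/* Aborts if the layouts ever diverge in size. */
void check_stats_layout(void)
{
	assert(stats_size_unchanged());
}
```

Existing fields keep their offsets because the new counter is placed
after them, immediately before the pad.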

Factor out the sendbuffererror buffer cancellation code so it can be
used in the new handling, and suppress the error messages likely to
follow if they arrive within two jiffies of the cancellation.
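
The suppression window can be sketched in user-space C roughly as
follows.  This is a sketch under stated assumptions, not the driver
code: the bit values and the names tick_after() and filter_errs() are
hypothetical, and the comparison is a wrap-safe one in the spirit of
the kernel's time_after(), whereas the patch itself compares the
deadline against jiffies with a plain '>'.

```c
#include <assert.h>

/* Wrap-safe "a is later than b" for an unsigned tick counter, in the
 * spirit of the kernel's time_after(); reimplemented here so the
 * sketch builds in user space. */
static int tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

#define ERR_SPKTLEN	0x1UL	/* hypothetical bit values */
#define ERR_ARMLAUNCH	0x2UL
#define ERR_OTHER	0x4UL

/* Drop the two follow-on error bits while 'now' is still before the
 * deadline recorded at cancellation; leave all other bits alone. */
unsigned long filter_errs(unsigned long errs, unsigned long now,
			  unsigned long lastcancel)
{
	if ((errs & (ERR_SPKTLEN | ERR_ARMLAUNCH)) &&
	    tick_after(lastcancel, now))
		errs &= ~(ERR_SPKTLEN | ERR_ARMLAUNCH);
	return errs;
}
```

With a deadline of cancel time + 3, errors seen in the tick of the
cancellation and in the two ticks after it are dropped; from the third
tick on they are reported again.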

Also restore two dropped TXE lines in hwe_bitsextant, noticed while
debugging.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r a78c7b475df6 -r 5aea5f31529d drivers/infiniband/hw/ipath/ipath_common.h
--- a/drivers/infiniband/hw/ipath/ipath_common.h	Thu Sep 28 08:57:13 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_common.h	Thu Sep 28 08:57:13 2006 -0700
@@ -141,8 +141,9 @@ struct infinipath_stats {
 	 * packets if ipath not configured, etc.)
 	 */
 	__u64 sps_krdrops;
+	__u64 sps_txeparity; // PIO buffer parity error, recovered
 	/* pad for future growth */
-	__u64 __sps_pad[46];
+	__u64 __sps_pad[45];
 };
 
 /*
diff -r a78c7b475df6 -r 5aea5f31529d drivers/infiniband/hw/ipath/ipath_iba6110.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6110.c	Thu Sep 28 08:57:13 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c	Thu Sep 28 08:57:13 2006 -0700
@@ -451,7 +451,10 @@ static void ipath_ht_handle_hwerrors(str
 	 * make sure we get this much out, unless told to be quiet,
 	 * or it's occurred within the last 5 seconds
 	 */
-	if ((hwerrs & ~dd->ipath_lasthwerror) ||
+	if ((hwerrs & ~(dd->ipath_lasthwerror |
+			((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF |
+			  INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC)
+			<< INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT))) ||
 	    (ipath_debug & __IPATH_VERBDBG))
 		dev_info(&dd->pcidev->dev, "Hardware error: hwerr=0x%llx "
 			 "(cleared)\n", (unsigned long long) hwerrs);
@@ -464,6 +467,33 @@ static void ipath_ht_handle_hwerrors(str
 
 	ctrl = ipath_read_kreg32(dd, dd->ipath_kregs->kr_control);
 	if (ctrl & INFINIPATH_C_FREEZEMODE) {
+		/*
+		 * parity errors in send memory are recoverable,
+		 * just cancel the send (if indicated in sendbuffererror),
+		 * count the occurrence, unfreeze (if no other handled
+		 * hardware error bits are set), and continue. They can
+		 * occur if a processor speculative read is done to the PIO
+		 * buffer while we are sending a packet, for example.
+		 */
+		if (hwerrs & ((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF |
+			       INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC)
+			      << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT)) {
+			ipath_stats.sps_txeparity++;
+			ipath_dbg("Recovering from TXE parity error (%llu), "
+			    	  "hwerrstatus=%llx\n",
+				  (unsigned long long) ipath_stats.sps_txeparity,
+				  (unsigned long long) hwerrs);
+			ipath_disarm_senderrbufs(dd);
+			hwerrs &= ~((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF |
+				     INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC)
+				    << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT);
+			if (!hwerrs) { // else leave in freeze mode
+				ipath_write_kreg(dd,
+						 dd->ipath_kregs->kr_control,
+						 dd->ipath_control);
+				return;
+			}
+		}
 		if (hwerrs) {
 			/*
 			 * if any set that we aren't ignoring; only
diff -r a78c7b475df6 -r 5aea5f31529d drivers/infiniband/hw/ipath/ipath_iba6120.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c	Thu Sep 28 08:57:13 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c	Thu Sep 28 08:57:13 2006 -0700
@@ -370,7 +370,10 @@ static void ipath_pe_handle_hwerrors(str
 	 * make sure we get this much out, unless told to be quiet,
 	 * or it's occurred within the last 5 seconds
 	 */
-	if ((hwerrs & ~dd->ipath_lasthwerror) ||
+	if ((hwerrs & ~(dd->ipath_lasthwerror |
+			((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF |
+			  INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC)
+			 << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT))) ||
 	    (ipath_debug & __IPATH_VERBDBG))
 		dev_info(&dd->pcidev->dev, "Hardware error: hwerr=0x%llx "
 			 "(cleared)\n", (unsigned long long) hwerrs);
@@ -383,6 +386,33 @@ static void ipath_pe_handle_hwerrors(str
 
 	ctrl = ipath_read_kreg32(dd, dd->ipath_kregs->kr_control);
 	if (ctrl & INFINIPATH_C_FREEZEMODE) {
+		/*
+		 * parity errors in send memory are recoverable,
+		 * just cancel the send (if indicated in sendbuffererror),
+		 * count the occurrence, unfreeze (if no other handled
+		 * hardware error bits are set), and continue. They can
+		 * occur if a processor speculative read is done to the PIO
+		 * buffer while we are sending a packet, for example.
+		 */
+		if (hwerrs & ((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF |
+			       INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC)
+			      << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT)) {
+			ipath_stats.sps_txeparity++;
+			ipath_dbg("Recovering from TXE parity error (%llu), "
+			    	  "hwerrstatus=%llx\n",
+				  (unsigned long long) ipath_stats.sps_txeparity,
+				  (unsigned long long) hwerrs);
+			ipath_disarm_senderrbufs(dd);
+			hwerrs &= ~((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF |
+				     INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC)
+				    << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT);
+			if (!hwerrs) { // else leave in freeze mode
+				ipath_write_kreg(dd,
+						 dd->ipath_kregs->kr_control,
+						 dd->ipath_control);
+				return;
+			}
+		}
 		if (hwerrs) {
 			/*
 			 * if any set that we aren't ignoring only make the
@@ -406,9 +436,8 @@ static void ipath_pe_handle_hwerrors(str
 		} else {
 			ipath_dbg("Clearing freezemode on ignored hardware "
 				  "error\n");
-			ctrl &= ~INFINIPATH_C_FREEZEMODE;
 			ipath_write_kreg(dd, dd->ipath_kregs->kr_control,
-					 ctrl);
+			   		 dd->ipath_control);
 		}
 	}
 
@@ -880,6 +909,8 @@ static void ipath_init_pe_variables(stru
 	dd->ipath_hwe_bitsextant =
 		(INFINIPATH_HWE_RXEMEMPARITYERR_MASK <<
 		 INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) |
+		(INFINIPATH_HWE_TXEMEMPARITYERR_MASK <<
+		 INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT) |
 		(INFINIPATH_HWE_PCIEMEMPARITYERR_MASK <<
 		 INFINIPATH_HWE_PCIEMEMPARITYERR_SHIFT) |
 		INFINIPATH_HWE_PCIE1PLLFAILED |
diff -r a78c7b475df6 -r 5aea5f31529d drivers/infiniband/hw/ipath/ipath_intr.c
--- a/drivers/infiniband/hw/ipath/ipath_intr.c	Thu Sep 28 08:57:13 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c	Thu Sep 28 08:57:13 2006 -0700
@@ -37,6 +37,50 @@
 #include "ipath_verbs.h"
 #include "ipath_common.h"
 
+/*
+ * Called when we might have an error that is specific to a particular
+ * PIO buffer, and may need to cancel that buffer, so it can be re-used.
+ */
+void ipath_disarm_senderrbufs(struct ipath_devdata *dd)
+{
+	u32 piobcnt;
+	unsigned long sbuf[4];
+	/*
+	 * it's possible that sendbuffererror could have bits set; might
+	 * have already done this as a result of hardware error handling
+	 */
+	piobcnt = dd->ipath_piobcnt2k + dd->ipath_piobcnt4k;
+	/* read these before writing errorclear */
+	sbuf[0] = ipath_read_kreg64(
+		dd, dd->ipath_kregs->kr_sendbuffererror);
+	sbuf[1] = ipath_read_kreg64(
+		dd, dd->ipath_kregs->kr_sendbuffererror + 1);
+	if (piobcnt > 128) {
+		sbuf[2] = ipath_read_kreg64(
+			dd, dd->ipath_kregs->kr_sendbuffererror + 2);
+		sbuf[3] = ipath_read_kreg64(
+			dd, dd->ipath_kregs->kr_sendbuffererror + 3);
+	}
+
+	if (sbuf[0] || sbuf[1] || (piobcnt > 128 && (sbuf[2] || sbuf[3]))) {
+		int i;
+		if (ipath_debug & (__IPATH_PKTDBG|__IPATH_DBG)) {
+			__IPATH_DBG_WHICH(__IPATH_PKTDBG|__IPATH_DBG,
+					  "SendbufErrs %lx %lx", sbuf[0],
+					  sbuf[1]);
+			if (ipath_debug & __IPATH_PKTDBG && piobcnt > 128)
+				printk(" %lx %lx ", sbuf[2], sbuf[3]);
+			printk("\n");
+		}
+
+		for (i = 0; i < piobcnt; i++)
+			if (test_bit(i, sbuf))
+				ipath_disarm_piobufs(dd, i, 1);
+		dd->ipath_lastcancel = jiffies+3; // no armlaunch for a bit
+	}
+}
+
+
 /* These are all rcv-related errors which we want to count for stats */
 #define E_SUM_PKTERRS \
 	(INFINIPATH_E_RHDRLEN | INFINIPATH_E_RBADTID | \
@@ -68,53 +112,9 @@
 
 static u64 handle_e_sum_errs(struct ipath_devdata *dd, ipath_err_t errs)
 {
-	unsigned long sbuf[4];
 	u64 ignore_this_time = 0;
-	u32 piobcnt;
-
-	/* if possible that sendbuffererror could be valid */
-	piobcnt = dd->ipath_piobcnt2k + dd->ipath_piobcnt4k;
-	/* read these before writing errorclear */
-	sbuf[0] = ipath_read_kreg64(
-		dd, dd->ipath_kregs->kr_sendbuffererror);
-	sbuf[1] = ipath_read_kreg64(
-		dd, dd->ipath_kregs->kr_sendbuffererror + 1);
-	if (piobcnt > 128) {
-		sbuf[2] = ipath_read_kreg64(
-			dd, dd->ipath_kregs->kr_sendbuffererror + 2);
-		sbuf[3] = ipath_read_kreg64(
-			dd, dd->ipath_kregs->kr_sendbuffererror + 3);
-	}
-
-	if (sbuf[0] || sbuf[1] || (piobcnt > 128 && (sbuf[2] || sbuf[3]))) {
-		int i;
-
-		ipath_cdbg(PKT, "SendbufErrs %lx %lx ", sbuf[0], sbuf[1]);
-		if (ipath_debug & __IPATH_PKTDBG && piobcnt > 128)
-			printk("%lx %lx ", sbuf[2], sbuf[3]);
-		for (i = 0; i < piobcnt; i++) {
-			if (test_bit(i, sbuf)) {
-				u32 __iomem *piobuf;
-				if (i < dd->ipath_piobcnt2k)
-					piobuf = (u32 __iomem *)
-						(dd->ipath_pio2kbase +
-						 i * dd->ipath_palign);
-				else
-					piobuf = (u32 __iomem *)
-						(dd->ipath_pio4kbase +
-						 (i - dd->ipath_piobcnt2k) *
-						 dd->ipath_4kalign);
-
-				ipath_cdbg(PKT,
-					   "PIObuf[%u] @%p pbc is %x; ",
-					   i, piobuf, readl(piobuf));
-
-				ipath_disarm_piobufs(dd, i, 1);
-			}
-		}
-		if (ipath_debug & __IPATH_PKTDBG)
-			printk("\n");
-	}
+
+	ipath_disarm_senderrbufs(dd);
 	if ((errs & E_SUM_LINK_PKTERRS) &&
 	    !(dd->ipath_flags & IPATH_LINKACTIVE)) {
 		/*
@@ -554,6 +554,14 @@ static int handle_errors(struct ipath_de
 			~(INFINIPATH_E_HARDWARE |
 			  INFINIPATH_E_IBSTATUSCHANGED);
 	}
+
+	// likely due to cancel, so suppress
+	if ((errs & (INFINIPATH_E_SPKTLEN | INFINIPATH_E_SPIOARMLAUNCH)) &&
+		dd->ipath_lastcancel > jiffies) {
+		ipath_dbg("Suppressed armlaunch/spktlen after error send cancel\n");
+		errs &= ~(INFINIPATH_E_SPIOARMLAUNCH | INFINIPATH_E_SPKTLEN);
+	}
+
 	if (!errs)
 		return 0;
 
diff -r a78c7b475df6 -r 5aea5f31529d drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Sep 28 08:57:13 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Sep 28 08:57:13 2006 -0700
@@ -427,6 +427,9 @@ struct ipath_devdata {
 	unsigned long ipath_rcvctrl;
 	/* shadow kr_sendctrl */
 	unsigned long ipath_sendctrl;
+	/* ports waiting for PIOavail intr */
+	unsigned long ipath_portpiowait;
+	unsigned long ipath_lastcancel; // to not count armlaunch after cancel
 
 	/* value we put in kr_rcvhdrcnt */
 	u32 ipath_rcvhdrcnt;
@@ -490,8 +493,6 @@ struct ipath_devdata {
 	u32 ipath_htwidth;
 	/* HT speed (200,400,800,1000) from HT config */
 	u32 ipath_htspeed;
-	/* ports waiting for PIOavail intr */
-	unsigned long ipath_portpiowait;
 	/*
 	 * number of sequential ibcstatus change for polling active/quiet
 	 * (i.e., link not coming up).
@@ -585,6 +586,7 @@ void ipath_disable_wc(struct ipath_devda
 void ipath_disable_wc(struct ipath_devdata *dd);
 int ipath_count_units(int *npresentp, int *nupp, u32 *maxportsp);
 void ipath_shutdown_device(struct ipath_devdata *);
+void ipath_disarm_senderrbufs(struct ipath_devdata *);
 
 struct file_operations;
 int ipath_cdev_init(int minor, char *name, struct file_operations *fops,


From bos at pathscale.com  Thu Sep 28 09:00:13 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:13 -0700
Subject: [openib-general] [PATCH 17 of 28] IB/ipath - improved support for
	powerpc
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <f6794c8289abafda12be.1159459213@eng-12.pathscale.com>

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c	Thu Sep 28 08:57:12 2006 -0700
@@ -755,8 +755,8 @@ static inline void *ipath_get_egrbuf(str
 static inline void *ipath_get_egrbuf(struct ipath_devdata *dd, u32 bufnum,
 				     int err)
 {
-	return dd->ipath_port0_skbs ?
-		(void *)dd->ipath_port0_skbs[bufnum]->data : NULL;
+	return dd->ipath_port0_skbinfo ?
+		(void *) dd->ipath_port0_skbinfo[bufnum].skb->data : NULL;
 }
 
 /**
@@ -778,31 +778,34 @@ struct sk_buff *ipath_alloc_skb(struct i
 	 */
 
 	/*
-	 * We need 4 extra bytes for unaligned transfer copying
+	 * We need 2 extra bytes for ipath_ether data sent in the
+	 * key header.  In order to keep everything dword aligned,
+	 * we'll reserve 4 bytes.
 	 */
+	len = dd->ipath_ibmaxlen + 4;
+
 	if (dd->ipath_flags & IPATH_4BYTE_TID) {
-		/* we need a 4KB multiple alignment, and there is no way
+		/* We need a 2KB multiple alignment, and there is no way
 		 * to do it except to allocate extra and then skb_reserve
 		 * enough to bring it up to the right alignment.
 		 */
-		len = dd->ipath_ibmaxlen + 4 + (1 << 11) - 1;
-	}
-	else
-		len = dd->ipath_ibmaxlen + 4;
+		len += 2047;
+	}
+
 	skb = __dev_alloc_skb(len, gfp_mask);
 	if (!skb) {
 		ipath_dev_err(dd, "Failed to allocate skbuff, length %u\n",
 			      len);
 		goto bail;
 	}
+
+	skb_reserve(skb, 4);
+
 	if (dd->ipath_flags & IPATH_4BYTE_TID) {
-		u32 una = ((1 << 11) - 1) & (unsigned long)(skb->data + 4);
+		u32 una = (unsigned long)skb->data & 2047;
 		if (una)
-			skb_reserve(skb, 4 + (1 << 11) - una);
-		else
-			skb_reserve(skb, 4);
-	} else
-		skb_reserve(skb, 4);
+			skb_reserve(skb, 2048 - una);
+	}
 
 bail:
 	return skb;
@@ -1345,8 +1348,9 @@ int ipath_create_rcvhdrq(struct ipath_de
 		ipath_cdbg(VERBOSE, "reuse port %d rcvhdrq @%p %llx phys; "
 			   "hdrtailaddr@%p %llx physical\n",
 			   pd->port_port, pd->port_rcvhdrq,
-			   pd->port_rcvhdrq_phys, pd->port_rcvhdrtail_kvaddr,
-			   (unsigned long long)pd->port_rcvhdrqtailaddr_phys);
+			   (unsigned long long) pd->port_rcvhdrq_phys,
+			   pd->port_rcvhdrtail_kvaddr, (unsigned long long)
+			   pd->port_rcvhdrqtailaddr_phys);
 
 	/* clear for security and sanity on each use */
 	memset(pd->port_rcvhdrq, 0, pd->port_rcvhdrq_size);
@@ -1827,17 +1831,22 @@ void ipath_free_pddata(struct ipath_devd
 		kfree(pd->port_rcvegrbuf_phys);
 		pd->port_rcvegrbuf_phys = NULL;
 		pd->port_rcvegrbuf_chunks = 0;
-	} else if (pd->port_port == 0 && dd->ipath_port0_skbs) {
+	} else if (pd->port_port == 0 && dd->ipath_port0_skbinfo) {
 		unsigned e;
-		struct sk_buff **skbs = dd->ipath_port0_skbs;
-
-		dd->ipath_port0_skbs = NULL;
-		ipath_cdbg(VERBOSE, "free closed port %d ipath_port0_skbs "
-			   "@ %p\n", pd->port_port, skbs);
+		struct ipath_skbinfo *skbinfo = dd->ipath_port0_skbinfo;
+
+		dd->ipath_port0_skbinfo = NULL;
+		ipath_cdbg(VERBOSE, "free closed port %d "
+			   "ipath_port0_skbinfo @ %p\n", pd->port_port,
+			   skbinfo);
 		for (e = 0; e < dd->ipath_rcvegrcnt; e++)
-			if (skbs[e])
-				dev_kfree_skb(skbs[e]);
-		vfree(skbs);
+			if (skbinfo[e].skb) {
+				pci_unmap_single(dd->pcidev, skbinfo[e].phys,
+						 dd->ipath_ibmaxlen,
+						 PCI_DMA_FROMDEVICE);
+				dev_kfree_skb(skbinfo[e].skb);
+			}
+		vfree(skbinfo);
 	}
 	kfree(pd->port_tid_pg_list);
 	vfree(pd->subport_uregbase);
@@ -1934,7 +1943,7 @@ static void cleanup_device(struct ipath_
 
 	if (dd->ipath_pioavailregs_dma) {
 		dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE,
-				  dd->ipath_pioavailregs_dma,
+				  (void *) dd->ipath_pioavailregs_dma,
 				  dd->ipath_pioavailregs_phys);
 		dd->ipath_pioavailregs_dma = NULL;
 	}
@@ -1947,6 +1956,7 @@ static void cleanup_device(struct ipath_
 
 	if (dd->ipath_pageshadow) {
 		struct page **tmpp = dd->ipath_pageshadow;
+		dma_addr_t *tmpd = dd->ipath_physshadow;
 		int i, cnt = 0;
 
 		ipath_cdbg(VERBOSE, "Unlocking any expTID pages still "
@@ -1957,6 +1967,8 @@ static void cleanup_device(struct ipath_
 			for (i = port_tidbase; i < maxtid; i++) {
 				if (!tmpp[i])
 					continue;
+				pci_unmap_page(dd->pcidev, tmpd[i],
+					       PAGE_SIZE, PCI_DMA_FROMDEVICE);
 				ipath_release_user_pages(&tmpp[i], 1);
 				tmpp[i] = NULL;
 				cnt++;
diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c	Thu Sep 28 08:57:12 2006 -0700
@@ -364,11 +364,14 @@ static int ipath_tid_update(struct ipath
 			   "vaddr %lx\n", i, tid + tidoff, vaddr);
 		/* we "know" system pages and TID pages are same size */
 		dd->ipath_pageshadow[porttid + tid] = pagep[i];
+		dd->ipath_physshadow[porttid + tid] = ipath_map_page(
+			dd->pcidev, pagep[i], 0, PAGE_SIZE,
+			PCI_DMA_FROMDEVICE);
 		/*
 		 * don't need atomic or it's overhead
 		 */
 		__set_bit(tid, tidmap);
-		physaddr = page_to_phys(pagep[i]);
+		physaddr = dd->ipath_physshadow[porttid + tid];
 		ipath_stats.sps_pagelocks++;
 		ipath_cdbg(VERBOSE,
 			   "TID %u, vaddr %lx, physaddr %llx pgp %p\n",
@@ -402,6 +405,9 @@ static int ipath_tid_update(struct ipath
 					   tid);
 				dd->ipath_f_put_tid(dd, &tidbase[tid], 1,
 						    dd->ipath_tidinvalid);
+				pci_unmap_page(dd->pcidev,
+					dd->ipath_physshadow[porttid + tid],
+					PAGE_SIZE, PCI_DMA_FROMDEVICE);
 				dd->ipath_pageshadow[porttid + tid] = NULL;
 				ipath_stats.sps_pageunlocks++;
 			}
@@ -515,6 +521,9 @@ static int ipath_tid_free(struct ipath_p
 				   pd->port_pid, tid);
 			dd->ipath_f_put_tid(dd, &tidbase[tid], 1,
 					    dd->ipath_tidinvalid);
+			pci_unmap_page(dd->pcidev,
+				dd->ipath_physshadow[porttid + tid],
+				PAGE_SIZE, PCI_DMA_FROMDEVICE);
 			ipath_release_user_pages(
 				&dd->ipath_pageshadow[porttid + tid], 1);
 			dd->ipath_pageshadow[porttid + tid] = NULL;
@@ -711,7 +720,7 @@ static int ipath_manage_rcvq(struct ipat
 		 * updated and correct itself, even in the face of software
 		 * bugs.
 		 */
-		*pd->port_rcvhdrtail_kvaddr = 0;
+		*(volatile u64 *)pd->port_rcvhdrtail_kvaddr = 0;
 		set_bit(INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port,
 			&dd->ipath_rcvctrl);
 	} else
@@ -923,11 +932,11 @@ bail:
 
 /* common code for the mappings on dma_alloc_coherent mem */
 static int ipath_mmap_mem(struct vm_area_struct *vma,
-			     struct ipath_portdata *pd, unsigned len,
-			     int write_ok, dma_addr_t addr, char *what)
+	struct ipath_portdata *pd, unsigned len, int write_ok,
+	void *kvaddr, char *what)
 {
 	struct ipath_devdata *dd = pd->port_dd;
-	unsigned pfn = (unsigned long)addr >> PAGE_SHIFT;
+	unsigned long pfn;
 	int ret;
 
 	if ((vma->vm_end - vma->vm_start) > len) {
@@ -950,17 +959,17 @@ static int ipath_mmap_mem(struct vm_area
 		vma->vm_flags &= ~VM_MAYWRITE;
 	}
 
+	pfn = virt_to_phys(kvaddr) >> PAGE_SHIFT;
 	ret = remap_pfn_range(vma, vma->vm_start, pfn,
 			      len, vma->vm_page_prot);
 	if (ret)
-		dev_info(&dd->pcidev->dev,
-			 "%s port%u mmap of %lx, %x bytes r%c failed: %d\n",
-			 what, pd->port_port, (unsigned long)addr, len,
-			 write_ok?'w':'o', ret);
+		dev_info(&dd->pcidev->dev, "%s port%u mmap of %lx, %x "
+			 "bytes r%c failed: %d\n", what, pd->port_port,
+			 pfn, len, write_ok?'w':'o', ret);
 	else
-		ipath_cdbg(VERBOSE, "%s port%u mmaped %lx, %x bytes r%c\n",
-			what, pd->port_port, (unsigned long)addr, len,
-			 write_ok?'w':'o');
+		ipath_cdbg(VERBOSE, "%s port%u mmaped %lx, %x bytes "
+			   "r%c\n", what, pd->port_port, pfn, len,
+			   write_ok?'w':'o');
 bail:
 	return ret;
 }
@@ -1049,7 +1058,7 @@ static int mmap_rcvegrbufs(struct vm_are
 	struct ipath_devdata *dd = pd->port_dd;
 	unsigned long start, size;
 	size_t total_size, i;
-	dma_addr_t *phys;
+	unsigned long pfn;
 	int ret;
 
 	size = pd->port_rcvegrbuf_size;
@@ -1073,11 +1082,11 @@ static int mmap_rcvegrbufs(struct vm_are
 	vma->vm_flags &= ~VM_MAYWRITE;
 
 	start = vma->vm_start;
-	phys = pd->port_rcvegrbuf_phys;
 
 	for (i = 0; i < pd->port_rcvegrbuf_chunks; i++, start += size) {
-		ret = remap_pfn_range(vma, start, phys[i] >> PAGE_SHIFT,
-				      size, vma->vm_page_prot);
+		pfn = virt_to_phys(pd->port_rcvegrbuf[i]) >> PAGE_SHIFT;
+		ret = remap_pfn_range(vma, start, pfn, size,
+				      vma->vm_page_prot);
 		if (ret < 0)
 			goto bail;
 	}
@@ -1290,7 +1299,7 @@ static int ipath_mmap(struct file *fp, s
 	else if (pgaddr == dd->ipath_pioavailregs_phys)
 		/* in-memory copy of pioavail registers */
 		ret = ipath_mmap_mem(vma, pd, PAGE_SIZE, 0,
-			      	     dd->ipath_pioavailregs_phys,
+			      	     (void *) dd->ipath_pioavailregs_dma,
 				     "pioavail registers");
 	else if (subport_fp(fp))
 		/* Subports don't mmap the physical receive buffers */
@@ -1304,12 +1313,12 @@ static int ipath_mmap(struct file *fp, s
 		 * from an i/o perspective.
 		 */
 		ret = ipath_mmap_mem(vma, pd, pd->port_rcvhdrq_size, 1,
-				     pd->port_rcvhdrq_phys,
+				     pd->port_rcvhdrq,
 				     "rcvhdrq");
 	else if (pgaddr == (u64) pd->port_rcvhdrqtailaddr_phys)
 		/* in-memory copy of rcvhdrq tail register */
 		ret = ipath_mmap_mem(vma, pd, PAGE_SIZE, 0,
-				     pd->port_rcvhdrqtailaddr_phys,
+				     pd->port_rcvhdrtail_kvaddr,
 				     "rcvhdrq tail");
 	else
 		ret = -EINVAL;
@@ -1802,7 +1811,7 @@ static int ipath_do_user_init(struct fil
 	 * We explictly set the in-memory copy to 0 beforehand, so we don't
 	 * have to wait to be sure the DMA update has happened.
 	 */
-	*pd->port_rcvhdrtail_kvaddr = 0ULL;
+	*(volatile u64 *)pd->port_rcvhdrtail_kvaddr = 0ULL;
 	set_bit(INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port,
 		&dd->ipath_rcvctrl);
 	ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl,
@@ -1832,6 +1841,8 @@ static void unlock_expected_tids(struct 
 		if (!dd->ipath_pageshadow[i])
 			continue;
 
+		pci_unmap_page(dd->pcidev, dd->ipath_physshadow[i],
+			PAGE_SIZE, PCI_DMA_FROMDEVICE);
 		ipath_release_user_pages_on_close(&dd->ipath_pageshadow[i],
 						  1);
 		dd->ipath_pageshadow[i] = NULL;
@@ -1936,14 +1947,14 @@ static int ipath_close(struct inode *in,
 		i = dd->ipath_pbufsport * (port - 1);
 		ipath_disarm_piobufs(dd, i, dd->ipath_pbufsport);
 
+		dd->ipath_f_clear_tids(dd, pd->port_port);
+
 		if (dd->ipath_pageshadow)
 			unlock_expected_tids(pd);
 		ipath_stats.sps_ports--;
 		ipath_cdbg(PROC, "%s[%u] closed port %u:%u\n",
 			   pd->port_comm, pd->port_pid,
 			   dd->ipath_unit, port);
-
-		dd->ipath_f_clear_tids(dd, pd->port_port);
 	}
 
 	pd->port_pid = 0;
diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_iba6120.c
--- a/drivers/infiniband/hw/ipath/ipath_iba6120.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c	Thu Sep 28 08:57:12 2006 -0700
@@ -1113,7 +1113,7 @@ static void ipath_pe_put_tid_2(struct ip
 	if (pa != dd->ipath_tidinvalid) {
 		if (pa & ((1U << 11) - 1)) {
 			dev_info(&dd->pcidev->dev, "BUG: physaddr %lx "
-				 "not 4KB aligned!\n", pa);
+				 "not 2KB aligned!\n", pa);
 			return;
 		}
 		pa >>= 11;
diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_init_chip.c
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c	Thu Sep 28 08:57:12 2006 -0700
@@ -88,13 +88,13 @@ static int create_port0_egr(struct ipath
 static int create_port0_egr(struct ipath_devdata *dd)
 {
 	unsigned e, egrcnt;
-	struct sk_buff **skbs;
+	struct ipath_skbinfo *skbinfo;
 	int ret;
 
 	egrcnt = dd->ipath_rcvegrcnt;
 
-	skbs = vmalloc(sizeof(*dd->ipath_port0_skbs) * egrcnt);
-	if (skbs == NULL) {
+	skbinfo = vmalloc(sizeof(*dd->ipath_port0_skbinfo) * egrcnt);
+	if (skbinfo == NULL) {
 		ipath_dev_err(dd, "allocation error for eager TID "
 			      "skb array\n");
 		ret = -ENOMEM;
@@ -109,13 +109,13 @@ static int create_port0_egr(struct ipath
 		 * 4 bytes so that the data buffer stays word aligned.
 		 * See ipath_kreceive() for more details.
 		 */
-		skbs[e] = ipath_alloc_skb(dd, GFP_KERNEL);
-		if (!skbs[e]) {
+		skbinfo[e].skb = ipath_alloc_skb(dd, GFP_KERNEL);
+		if (!skbinfo[e].skb) {
 			ipath_dev_err(dd, "SKB allocation error for "
 				      "eager TID %u\n", e);
 			while (e != 0)
-				dev_kfree_skb(skbs[--e]);
-			vfree(skbs);
+				dev_kfree_skb(skbinfo[--e].skb);
+			vfree(skbinfo);
 			ret = -ENOMEM;
 			goto bail;
 		}
@@ -124,14 +124,17 @@ static int create_port0_egr(struct ipath
 	 * After loop above, so we can test non-NULL to see if ready
 	 * to use at receive, etc.
 	 */
-	dd->ipath_port0_skbs = skbs;
+	dd->ipath_port0_skbinfo = skbinfo;
 
 	for (e = 0; e < egrcnt; e++) {
-		unsigned long phys =
-			virt_to_phys(dd->ipath_port0_skbs[e]->data);
+		dd->ipath_port0_skbinfo[e].phys =
+		  ipath_map_single(dd->pcidev,
+				   dd->ipath_port0_skbinfo[e].skb->data,
+				   dd->ipath_ibmaxlen, PCI_DMA_FROMDEVICE);
 		dd->ipath_f_put_tid(dd, e + (u64 __iomem *)
 				    ((char __iomem *) dd->ipath_kregbase +
-				     dd->ipath_rcvegrbase), 0, phys);
+				     dd->ipath_rcvegrbase), 0,
+				    dd->ipath_port0_skbinfo[e].phys);
 	}
 
 	ret = 0;
@@ -432,16 +435,33 @@ done:
  */
 static void init_shadow_tids(struct ipath_devdata *dd)
 {
-	dd->ipath_pageshadow = (struct page **)
-		vmalloc(dd->ipath_cfgports * dd->ipath_rcvtidcnt *
+	struct page **pages;
+	dma_addr_t *addrs;
+
+	pages = vmalloc(dd->ipath_cfgports * dd->ipath_rcvtidcnt *
 			sizeof(struct page *));
-	if (!dd->ipath_pageshadow)
+	if (!pages) {
 		ipath_dev_err(dd, "failed to allocate shadow page * "
 			      "array, no expected sends!\n");
-	else
-		memset(dd->ipath_pageshadow, 0,
-		       dd->ipath_cfgports * dd->ipath_rcvtidcnt *
-		       sizeof(struct page *));
+		dd->ipath_pageshadow = NULL;
+		return;
+	}
+
+	addrs = vmalloc(dd->ipath_cfgports * dd->ipath_rcvtidcnt *
+			sizeof(dma_addr_t));
+	if (!addrs) {
+		ipath_dev_err(dd, "failed to allocate shadow dma handle "
+			      "array, no expected sends!\n");
+		vfree(pages);
+		dd->ipath_pageshadow = NULL;
+		return;
+	}
+
+	memset(pages, 0, dd->ipath_cfgports * dd->ipath_rcvtidcnt *
+	       sizeof(struct page *));
+
+	dd->ipath_pageshadow = pages;
+	dd->ipath_physshadow = addrs;
 }
 
 static void enable_chip(struct ipath_devdata *dd,
diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_intr.c
--- a/drivers/infiniband/hw/ipath/ipath_intr.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c	Thu Sep 28 08:57:12 2006 -0700
@@ -605,7 +605,7 @@ static int handle_errors(struct ipath_de
 				 * don't report same point multiple times,
 				 * except kernel
 				 */
-				tl = (u32) * pd->port_rcvhdrtail_kvaddr;
+				tl = *(u64 *) pd->port_rcvhdrtail_kvaddr;
 				if (tl == dd->ipath_lastrcvhdrqtails[i])
 					continue;
 				hd = ipath_read_ureg32(dd, ur_rcvhdrhead,
diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h	Thu Sep 28 08:57:12 2006 -0700
@@ -39,6 +39,8 @@
  */
 
 #include <linux/interrupt.h>
+#include <linux/pci.h>
+#include <linux/dma-mapping.h>
 #include <asm/io.h>
 
 #include "ipath_common.h"
@@ -62,7 +64,7 @@ struct ipath_portdata {
 	/* rcvhdrq base, needs mmap before useful */
 	void *port_rcvhdrq;
 	/* kernel virtual address where hdrqtail is updated */
-	volatile __le64 *port_rcvhdrtail_kvaddr;
+	void *port_rcvhdrtail_kvaddr;
 	/*
 	 * temp buffer for expected send setup, allocated at open, instead
 	 * of each setup call
@@ -146,6 +148,11 @@ struct _ipath_layer {
 	void *l_arg;
 };
 
+struct ipath_skbinfo {
+	struct sk_buff *skb;
+	dma_addr_t phys;
+};
+
 struct ipath_devdata {
 	struct list_head ipath_list;
 
@@ -168,7 +175,7 @@ struct ipath_devdata {
 	/* ipath_cfgports pointers */
 	struct ipath_portdata **ipath_pd;
 	/* sk_buffs used by port 0 eager receive queue */
-	struct sk_buff **ipath_port0_skbs;
+	struct ipath_skbinfo *ipath_port0_skbinfo;
 	/* kvirt address of 1st 2k pio buffer */
 	void __iomem *ipath_pio2kbase;
 	/* kvirt address of 1st 4k pio buffer */
@@ -335,6 +342,8 @@ struct ipath_devdata {
 	u64 *ipath_tidsimshadow;
 	/* shadow copy of struct page *'s for exp tid pages */
 	struct page **ipath_pageshadow;
+	/* shadow copy of dma handles for exp tid pages */
+	dma_addr_t *ipath_physshadow;
 	/* lock to workaround chip bug 9437 */
 	spinlock_t ipath_tid_lock;
 
@@ -865,6 +874,13 @@ int ipathfs_remove_device(struct ipath_d
 int ipathfs_remove_device(struct ipath_devdata *);
 
 /*
+ * dma_addr wrappers - all 0's invalid for hw
+ */
+dma_addr_t ipath_map_page(struct pci_dev *, struct page *, unsigned long,
+			  size_t, int);
+dma_addr_t ipath_map_single(struct pci_dev *, void *, size_t, int);
+
+/*
  * Flush write combining store buffers (if present) and perform a write
  * barrier.
  */
diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_user_pages.c
--- a/drivers/infiniband/hw/ipath/ipath_user_pages.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_user_pages.c	Thu Sep 28 08:57:12 2006 -0700
@@ -90,6 +90,62 @@ bail:
 }
 
 /**
+ * ipath_map_page - a safety wrapper around pci_map_page()
+ *
+ * A dma_addr of all 0's is interpreted by the chip as "disabled".
+ * Unfortunately, it can also be a valid dma_addr returned on some
+ * architectures.
+ *
+ * The powerpc iommu assigns dma_addrs in ascending order, so we don't
+ * have to bother with retries or mapping a dummy page to insure we
+ * don't just get the same mapping again.
+ *
+ * I'm sure we won't be so lucky with other iommu's, so FIXME.
+ */
+dma_addr_t ipath_map_page(struct pci_dev *hwdev, struct page *page,
+	unsigned long offset, size_t size, int direction)
+{
+	dma_addr_t phys;
+
+	phys = pci_map_page(hwdev, page, offset, size, direction);
+
+	if (phys == 0) {
+		pci_unmap_page(hwdev, phys, size, direction);
+		phys = pci_map_page(hwdev, page, offset, size, direction);
+		/*
+		 * FIXME: If we get 0 again, we should keep this page,
+		 * map another, then free the 0 page.
+		 */
+	}
+
+	return phys;
+}
+
+/**
+ * ipath_map_single - a safety wrapper around pci_map_single()
+ *
+ * Same idea as ipath_map_page().
+ */
+dma_addr_t ipath_map_single(struct pci_dev *hwdev, void *ptr, size_t size,
+	int direction)
+{
+	dma_addr_t phys;
+
+	phys = pci_map_single(hwdev, ptr, size, direction);
+
+	if (phys == 0) {
+		pci_unmap_single(hwdev, phys, size, direction);
+		phys = pci_map_single(hwdev, ptr, size, direction);
+		/*
+		 * FIXME: If we get 0 again, we should keep this page,
+		 * map another, then free the 0 page.
+		 */
+	}
+
+	return phys;
+}
+
+/**
  * ipath_get_user_pages - lock user pages into memory
  * @start_page: the start page
  * @num_pages: the number of pages
diff -r cdbbf110848d -r f6794c8289ab drivers/infiniband/hw/ipath/ipath_wc_ppc64.c
--- a/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c	Thu Sep 28 08:57:12 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c	Thu Sep 28 08:57:12 2006 -0700
@@ -38,13 +38,23 @@
 #include "ipath_kernel.h"
 
 /**
- * ipath_unordered_wc - indicate whether write combining is ordered
+ * ipath_enable_wc - enable write combining for MMIO writes to the device
+ * @dd: infinipath device
  *
- * PowerPC systems (at least those in the 970 processor family)
- * write partially filled store buffers in address order, but will write
- * completely filled store buffers in "random" order, and therefore must
- * have serialization for correctness with current InfiniPath chips.
+ * Nothing to do on PowerPC, so just return without error.
+ */
+int ipath_enable_wc(struct ipath_devdata *dd)
+{
+	return 0;
+}
+
+/**
+ * ipath_unordered_wc - indicate whether write combining is unordered
  *
+ * Because our performance depends on our ability to do write
+ * combining mmio writes in the most efficient way, we need to
+ * know if we are on a processor that may reorder stores when
+ * write combining.
  */
 int ipath_unordered_wc(void)
 {


From bos at pathscale.com  Thu Sep 28 09:00:21 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:00:21 -0700
Subject: [openib-general] [PATCH 25 of 28] IB/ipath - Set CPU affinity early
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <4269068599c270538c2e.1159459221@eng-12.pathscale.com>

This change moves around port assignment so that it happens before any
memory is allocated.  This allows memory to be allocated on an appropriate
CPU, which improves performance for users of /dev/ipath.
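
The ordering this enables can be sketched in plain C as follows.  This is
an illustration only, not the driver code: the enum values mirror the
command numbers in the patch below, but struct fake_file, assign_port(),
user_init() and handle_cmd() are hypothetical stand-ins.

```c
#include <errno.h>

enum cmd_type {
	CMD_OLD_USER_INIT = 16,	/* legacy: assign port and init in one step */
	CMD_ASSIGN_PORT   = 23,	/* new: pick HCA and port only */
	CMD_USER_INIT     = 24,	/* new: set up userspace, port already chosen */
};

/* Hypothetical per-open state; the real driver keeps far more than this. */
struct fake_file {
	int port;		/* -1 until a port has been assigned */
	int initialized;
};

static int assign_port(struct fake_file *fp)
{
	if (fp->port >= 0)
		return -EBUSY;	/* already assigned */
	fp->port = 1;		/* pretend we found a free port */
	return 0;
}

static int user_init(struct fake_file *fp)
{
	if (fp->port < 0)
		return -EINVAL;	/* no port yet: nothing to initialize */
	fp->initialized = 1;	/* memory would be allocated here */
	return 0;
}

static int handle_cmd(struct fake_file *fp, enum cmd_type type)
{
	int ret;

	switch (type) {
	case CMD_ASSIGN_PORT:
		return assign_port(fp);
	case CMD_OLD_USER_INIT:
		/* backwards compatibility: get the port first */
		ret = assign_port(fp);
		if (ret)
			return ret;
		/* fall through */
	case CMD_USER_INIT:
		return user_init(fp);
	default:
		return -EINVAL;
	}
}
```

A new client would issue the assign command, presumably set its CPU
affinity, and only then issue the init command; an old client issuing the
legacy command gets both steps in a single call.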

Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r 9fa624c592af -r 4269068599c2 drivers/infiniband/hw/ipath/ipath_common.h
--- a/drivers/infiniband/hw/ipath/ipath_common.h	Thu Sep 28 08:57:13 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_common.h	Thu Sep 28 08:57:13 2006 -0700
@@ -412,15 +412,17 @@ struct ipath_user_info {
 
 #define IPATH_CMD_MIN		16
 
-#define IPATH_CMD_USER_INIT	16	/* set up userspace */
+#define __IPATH_CMD_USER_INIT	16	/* old set up userspace (for old user code) */
 #define IPATH_CMD_PORT_INFO	17	/* find out what resources we got */
 #define IPATH_CMD_RECV_CTRL	18	/* control receipt of packets */
 #define IPATH_CMD_TID_UPDATE	19	/* update expected TID entries */
 #define IPATH_CMD_TID_FREE	20	/* free expected TID entries */
 #define IPATH_CMD_SET_PART_KEY	21	/* add partition key */
 #define IPATH_CMD_SLAVE_INFO	22	/* return info on slave processes */
-
-#define IPATH_CMD_MAX		22
+#define IPATH_CMD_ASSIGN_PORT	23	/* allocate HCA and port */
+#define IPATH_CMD_USER_INIT 	24	/* set up userspace */
+
+#define IPATH_CMD_MAX		24
 
 struct ipath_port_info {
 	__u32 num_active;	/* number of active units */
diff -r 9fa624c592af -r 4269068599c2 drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c	Thu Sep 28 08:57:13 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c	Thu Sep 28 08:57:13 2006 -0700
@@ -1701,18 +1701,17 @@ done:
 
 static int ipath_open(struct inode *in, struct file *fp)
 {
-	/* The real work is performed later in ipath_do_user_init() */
+	/* The real work is performed later in ipath_assign_port() */
 	fp->private_data = kzalloc(sizeof(struct ipath_filedata), GFP_KERNEL);
 	return fp->private_data ? 0 : -ENOMEM;
 }
 
-static int ipath_do_user_init(struct file *fp,
+
+// Get port early, so can set affinity prior to memory allocation
+static int ipath_assign_port(struct file *fp,
 			      const struct ipath_user_info *uinfo)
 {
 	int ret;
-	struct ipath_portdata *pd;
-	struct ipath_devdata *dd;
-	u32 head32;
 	int i_minor;
 	unsigned swminor;
 
@@ -1757,8 +1756,18 @@ static int ipath_do_user_init(struct fil
 
 	mutex_unlock(&ipath_mutex);
 
-	if (ret)
-		goto done;
+done:
+	return ret;
+}
+
+
+static int ipath_do_user_init(struct file *fp,
+			      const struct ipath_user_info *uinfo)
+{
+	int ret;
+	struct ipath_portdata *pd;
+	struct ipath_devdata *dd;
+	u32 head32;
 
 	pd = port_fp(fp);
 	dd = pd->port_dd;
@@ -2035,6 +2044,8 @@ static ssize_t ipath_write(struct file *
 	consumed = sizeof(cmd.type);
 
 	switch (cmd.type) {
+	case IPATH_CMD_ASSIGN_PORT:
+	case __IPATH_CMD_USER_INIT:
 	case IPATH_CMD_USER_INIT:
 		copy = sizeof(cmd.cmd.user_info);
 		dest = &cmd.cmd.user_info;
@@ -2083,12 +2094,24 @@ static ssize_t ipath_write(struct file *
 
 	consumed += copy;
 	pd = port_fp(fp);
-	if (!pd && cmd.type != IPATH_CMD_USER_INIT) {
+	if (!pd && cmd.type != __IPATH_CMD_USER_INIT &&
+		cmd.type != IPATH_CMD_ASSIGN_PORT) {
 		ret = -EINVAL;
 		goto bail;
 	}
 
 	switch (cmd.type) {
+	case IPATH_CMD_ASSIGN_PORT:
+		ret = ipath_assign_port(fp, &cmd.cmd.user_info);
+		if (ret)
+			goto bail;
+		break;
+	case __IPATH_CMD_USER_INIT:
+		// backwards compatibility, get port first
+		ret = ipath_assign_port(fp, &cmd.cmd.user_info);
+		if (ret)
+			goto bail;
+		// and fall through to current version.
 	case IPATH_CMD_USER_INIT:
 		ret = ipath_do_user_init(fp, &cmd.cmd.user_info);
 		if (ret)


From sean.hefty at intel.com  Thu Sep 28 09:02:14 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Thu, 28 Sep 2006 09:02:14 -0700
Subject: [openib-general] RDMA CM callback status
In-Reply-To: <20060928063133.GI23828@mellanox.co.il>
Message-ID: <000601c6e317$76ca4c00$8698070a@amr.corp.intel.com>

>Can you post a patch pls?

This was the patch committed to svn.  I'm creating a patch set for review for
2.6.19/2.6.20 to merge the svn code upstream.  I will post those patches against
the 2.6.19 code tree when they are ready.

Signed-off-by: Sean Hefty <sean.hefty at intel.com>

Index: core/cma.c
===================================================================
--- core/cma.c	(revision 9652)
+++ core/cma.c	(revision 9653)
@@ -1245,6 +1245,7 @@
 		work->old_state = CMA_ROUTE_QUERY;
 		work->new_state = CMA_ADDR_RESOLVED;
 		work->event.event = RDMA_CM_EVENT_ROUTE_ERROR;
+		work->event.status = status;
 	}
 
 	queue_work(cma_wq, &work->work);


From rdreier at cisco.com  Thu Sep 28 09:11:11 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 09:11:11 -0700
Subject: [openib-general] Compile warnings (cross build)
In-Reply-To: <Pine.LNX.4.64.0609281145140.9963@jlentini-linux.nane.netapp.com>
	(James Lentini's message of "Thu, 28 Sep 2006 11:48:10 -0400 (EDT)")
References: <20060926135114.1da96c1b@freekitty>
	<adaac4l7zjd.fsf@cisco.com> <20060928062919.GH23828@mellanox.co.il>
	<ada4pus6o0r.fsf@cisco.com>
	<Pine.LNX.4.64.0609281145140.9963@jlentini-linux.nane.netapp.com>
Message-ID: <adaslic55uo.fsf@cisco.com>

    Michael> BTW, is there some printk format to print u64 type?

    James> Try "%Lu", That will print a long long unsigned value.

That's the problem -- u64 is not always unsigned long long.  For
example on ppc64, u64 is just unsigned long.

 - R.


From rdreier at cisco.com  Thu Sep 28 09:11:47 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 09:11:47 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <20060928151549.GG28790@mellanox.co.il> (Michael S.
	Tsirkin's message of "Thu, 28 Sep 2006 18:15:49 +0300")
References: <ada3bad7zaa.fsf@cisco.com>
	<20060928060817.GD23828@mellanox.co.il> <adad59g6o24.fsf@cisco.com>
	<20060928151549.GG28790@mellanox.co.il>
Message-ID: <adaodt055to.fsf@cisco.com>

    Michael> Looked pretty simple on the outset, but oh well. Keep us
    Michael> posted.

I just work slowly.

Anyway I don't think this is that urgent -- we've dumped enough stuff
into 2.6.19, so I think this should wait for 2.6.20 at the earliest anyway.


From mshefty at ichips.intel.com  Thu Sep 28 09:09:58 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 28 Sep 2006 09:09:58 -0700
Subject: [openib-general] [RFC] [PATCH] ib_cm: send DREP in response to
 unmatched DREQ
In-Reply-To: <451B6945.1050707@voltaire.com>
References: <000401c6d853$f5dfd510$97d8180a@amr.corp.intel.com>
	<451ABF0C.90607@ichips.intel.com> <451B6945.1050707@voltaire.com>
Message-ID: <451BF3D6.7080403@ichips.intel.com>

Or Gerlitz wrote:
> My understanding is that without this patch the side that sends the DREQ 
> would do few DREQ resends as of the "firsts" DREPs being lost and no 
> DREPs sent once the id at the peer side left the timewait state, correct?

This is correct.  Note that the number of DREQ retries was changed to 15 now.

> Can you please share what were the implications with intel MPI running a 
> 64 nodes (128 ranks?) job? was the issue here just making the ***job 
> termination time*** bigger?

The job termination time was taking about a minute waiting for the DREQ to 
timeout.  When running a series of tests, this becomes a fairly large issue.

- Sean


From jlentini at netapp.com  Thu Sep 28 09:21:52 2006
From: jlentini at netapp.com (James Lentini)
Date: Thu, 28 Sep 2006 12:21:52 -0400 (EDT)
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159455506.11976.1.camel@chalcedony.pathscale.com>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
Message-ID: <Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>


On Thu, 28 Sep 2006, Bryan O'Sullivan wrote:

> On Thu, 2006-09-28 at 09:27 +0300, Michael S. Tsirkin wrote:
> 
> > Me too.
> 
> Roland and I (following his example) checked in changes to the mthca and
> ipath drivers in SVN yesterday that add a #warning to a core driver
> source file saying "don't look here, look over there!"  That's a first
> step towards dropping the drivers from SVN trunk altogether.

As a user of the SVN repository, I'm confused about what this means 
going forward. 

Are you going to completely remove the mthca and ipath code from SVN 
or just stop updating the code that is there?

Will the other components that are upstream (SRP, iSER, IPoIB, CM, 
RDMA CM, SA, MAD, CORE, ...) be removed? What rules are you using to 
determine if the SVN version will be kept up to date?

To date, the process for using and testing new OFA features has been 
very simple. Users simply downloaded the latest stable kernel release 
and replaced the drivers/infiniband directory with the sources out of 
SVN. This worked for the development of new components (e.g. eHCA, 
ipath, RDMA cm, iSER, SRP, etc.). In the future, how will users work 
with new features that are not yet upstream?


From shemminger at osdl.org  Thu Sep 28 08:39:02 2006
From: shemminger at osdl.org (Stephen Hemminger)
Date: Thu, 28 Sep 2006 08:39:02 -0700
Subject: [openib-general] Compile warnings (cross build)
In-Reply-To: <adaslic55uo.fsf@cisco.com>
References: <20060926135114.1da96c1b@freekitty>
	<adaac4l7zjd.fsf@cisco.com> <20060928062919.GH23828@mellanox.co.il>
	<ada4pus6o0r.fsf@cisco.com>
	<Pine.LNX.4.64.0609281145140.9963@jlentini-linux.nane.netapp.com>
	<adaslic55uo.fsf@cisco.com>
Message-ID: <20060928083902.62850820@freekitty>

On Thu, 28 Sep 2006 09:11:11 -0700
Roland Dreier <rdreier at cisco.com> wrote:

>     Michael> BTW, is there some printk format to print u64 type?
> 
>     James> Try "%Lu", That will print a long long unsigned value.
> 
> That's the problem -- u64 is not always unsigned long long.  For
> example on ppc64, u64 is just unsigned long.
> 
>  - R.

The only safe way is to cast u64 to unsigned long long and then use
either %Lu or %llu as the format string.  It means that on 64-bit platforms
the u64 will end up getting extended, but it's harmless.
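
A minimal userspace sketch of that cast, assuming standard C99 types in
place of the kernel's u64 and printk (the helper name format_u64 is
hypothetical):

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value portably.  uint64_t stands in for the kernel's
 * u64 here; it may be unsigned long or unsigned long long depending on
 * the architecture, so cast explicitly to match the %llu conversion.
 * Returns the number of characters snprintf() would have written.
 */
static int format_u64(char *buf, size_t len, uint64_t val)
{
	return snprintf(buf, len, "%llu", (unsigned long long) val);
}
```

In kernel code the same cast would go directly in the printk() argument
list, next to a %llu in the format string.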

-- 
Stephen Hemminger <shemminger at osdl.org>


From bos at pathscale.com  Thu Sep 28 09:31:33 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 09:31:33 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
Message-ID: <1159461093.5010.8.camel@chalcedony.pathscale.com>

On Thu, 2006-09-28 at 12:21 -0400, James Lentini wrote:

> As a user of the SVN repository, I'm confused about what this means 
> going forward. 
> 
> Are you going to completely remove the mthca and ipath code from SVN 
> or just stop updating the code that is there?

I will let Roland speak for the mthca driver, but we have stopped
maintaining the ipath driver in the SVN tree, and I expect that we will
remove it entirely in perhaps a month or so.

> Will the other components that are upstream (SRP, iSER, IPoIB, CM, 
> RDMA CM, SA, MAD, CORE, ...) be removed? What rules are you using to 
> determine if the SVN version will be kept up to date?

I have no stake in what happens to those components, but I would not
personally mind if they moved into Roland's git tree.  I don't care for
git, but I vastly prefer using it to waiting for SVN.

> In the future, how will users work 
> with new features that are not yet upstream?

One possibility would be to pull the same components out of a branch of
a git tree; same procedure, different source.

	<b


From robert.j.woodruff at intel.com  Thu Sep 28 09:58:06 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Thu, 28 Sep 2006 09:58:06 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>

James wrote,
>As a user of the SVN repository, I'm confused about what this means 
>going forward. 

>Are you going to completely remove the mthca and ipath code from SVN 
>or just stop updating the code that is there?

I have said this before, but I will repeat myself once again.
I really do not care where the latest code is, but there needs
to be ONE place where we can get all the latest code for development
and testing. Right now there are three branches, SVN which has some
of Sean's latest changes, Roland's git tree, and the OFED git tree.
All three of these have slightly different code bases and thus
there is no one "latest" code base anymore and that is really
confusing for people trying to use and test with the latest code
to make sure their components work properly. 
It also multiplies the testing effort: do we test with the SVN version,
the OFED version, or Roland's version?
As an ISV of a 3rd party ULP (Intel MPI) this is making my life
much more difficult than it should be. 
Please get your act together and let's get back to ONE database
for the trunk code. I can live with having a branch for OFED
releases as I see the need to branch and stabilize periodically
for releases, but having 2 different development trees (Roland's git tree
and SVN) for development is not working very well.

my 2 cents.

woody


From mlleinin at hpcn.ca.sandia.gov  Thu Sep 28 10:19:59 2006
From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger)
Date: Thu, 28 Sep 2006 10:19:59 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159461093.5010.8.camel@chalcedony.pathscale.com>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
Message-ID: <1159463999.15009.207.camel@localhost>

  If we move forward with a git repository then we should move all
kernel code into git.  I don't want to get into a situation where kernel
components are spread out over various repositories and servers.  I'm
all for making your development lives easier.  The entire development
tree has gotten very confusing over the past few months.  The ipath
driver is never up to date (therefore it's always broken).  Iwarp is
upstream but not in the main line development tree.  If a simpler
process can fix this then I'm all for it.

  So what is your proposal (Roland and Bryan)?  Do you want to move all
kernel development into Roland's git tree, and have the user space code
stay in svn (at least for the time being)?  This would allow OFED
releases to be pulled direct from Roland's git tree (kernel) and the
openfabrics svn (user space).   BTW if it is useful we can set up a git
repository on openfabrics once we move the server to its new provider.

 Thanks,

  - Matt


On Thu, 2006-09-28 at 09:31 -0700, Bryan O'Sullivan wrote:
> On Thu, 2006-09-28 at 12:21 -0400, James Lentini wrote:
> 
> > As a user of the SVN repository, I'm confused about what this means 
> > going forward. 
> > 
> > Are you going to completely remove the mthca and ipath code from SVN 
> > or just stop updating the code that is there?
> 
> I will let Roland speak for the mthca driver, but we have stopped
> maintaining the ipath driver in the SVN tree, and I expect that we will
> remove it entirely in perhaps a month or so.
> 
> > Will the other components that are upstream (SRP, iSER, IPoIB, CM, 
> > RDMA CM, SA, MAD, CORE, ...) be removed? What rules are you using to 
> > determine if the SVN version will be kept up to date?
> 
> I have no stake in what happens to those components, but I would not
> personally mind if they moved into Roland's git tree.  I don't care for
> git, but I vastly prefer using it to waiting for SVN.
> 
> > In the future, how will users work 
> > with new features that are not yet upstream?
> 
> One possibility would be to pull the same components out of a branch of
> a git tree; same procedure, different source.
> 
> 	<b


From rdreier at cisco.com  Thu Sep 28 10:33:17 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 10:33:17 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159463999.15009.207.camel@localhost> (Matt Leininger's
	message of "Thu, 28 Sep 2006 10:19:59 -0700")
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
	<1159463999.15009.207.camel@localhost>
Message-ID: <adad59f6gma.fsf@cisco.com>

    Matt>   So what is your proposal (Roland and Bryan)?  Do you want
    Matt> to move all kernel development into Roland's git tree, and
    Matt> have the user space code stay in svn (at least for the time
    Matt> being)?

My proposal would be to leave userspace in svn, and make Linus's git
tree the definitive source for Linux kernel code.  My git tree may be
useful for people who want to try things that haven't been merged
upstream yet, but other developers of Linux kernel code may want to
host their work too (either as a git tree, a patch set, or however
else they want).  This would match existing practice for other
subsystems pretty closely.

 - R.


From rdreier at cisco.com  Thu Sep 28 10:31:07 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 10:31:07 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>
	(Robert J. Woodruff's message of "Thu, 28 Sep 2006 09:58:06 -0700")
References: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>
Message-ID: <adahcyr6gpw.fsf@cisco.com>

 > I have said this before, but I will repeat myself once again.
 > I really do not care where the latest code is, but there needs
 > to be ONE place where we can get all the latest code for development
 > and testing.

I'll repeat my usual response: the notion of a single "latest" tree
doesn't match reality, and any attempt to coerce things into that mold
just causes problems.  There's not necessarily any correlation between
the newest ipath code and Sean's RDMA CM.

git (or any other true distributed SCM system) makes this easier to
handle: you can easily merge the branches you're interested in trying
into your local tree.

 - R.


From bos at pathscale.com  Thu Sep 28 10:43:20 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 10:43:20 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>
Message-ID: <1159465400.5010.49.camel@chalcedony.pathscale.com>

On Thu, 2006-09-28 at 09:58 -0700, Woodruff, Robert J wrote:

> I have said this before, but I will repeat myself once again.
> I really do not care where the latest code is, but there needs
> to be ONE place where we can get all the latest code for development
> and testing. Right now there are three branches, SVN which has some
> of Sean's latest changes, Roland's git tree, and the OFED git tree.

If you want to focus on one thing to test, use Linus's current git tree
or a release candidate tarball from it.  That way, all of the extraneous
cruft that sits in SVN doesn't matter until someone actually submits it,
and everyone has a shared understanding of what bits to bang on.  The
OFED tree gets built from what's in Linus's tree, so if something gets
fixed in Linus's tree, the fix will percolate into OFED.

This might lose you the ability to look in a single place to test stuff
like SDP that's not yet upstream, but as far as I'm concerned, that's
*good*.  At least you'll then know for sure "I'm testing something that
is different from what other people are working with".

	<b


From mlleinin at hpcn.ca.sandia.gov  Thu Sep 28 10:52:34 2006
From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger)
Date: Thu, 28 Sep 2006 10:52:34 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <adad59f6gma.fsf@cisco.com>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
	<1159463999.15009.207.camel@localhost>  <adad59f6gma.fsf@cisco.com>
Message-ID: <1159465954.15009.223.camel@localhost>

On Thu, 2006-09-28 at 10:33 -0700, Roland Dreier wrote:
>     Matt>   So what is your proposal (Roland and Bryan)?  Do you want
>     Matt> to move all kernel development into Roland's git tree, and
>     Matt> have the user space code stay in svn (at least for the time
>     Matt> being)?
> 
> My proposal would be to leave userspace in svn, and make Linus's git
> tree the definitive source for Linux kernel code.  My git tree may be
> useful for people who want to try things that haven't been merged
> upstream yet, but other developers of Linux kernel code may want to
> host their work too (either as a git tree, a patch set, or however
> else they want).  This would match existing practice for other
> subsystems pretty closely.
> 
  That sounds reasonable to me.  

  I'd add one more thing.  To make the OFED release process go more
smoothly I'd like to see the maintainers for each stack component spin
out releases from time to time.  Roland has been doing this with
libmthca and libibverbs.  If we had the development releases for other
kernel and all user space components then OFED could simply combine the
latest development releases and start more thorough testing.

  Thoughts?

  - Matt


From robert.j.woodruff at intel.com  Thu Sep 28 10:53:45 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Thu, 28 Sep 2006 10:53:45 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691CBEED61@orsmsx418.amr.corp.intel.com>

Bryan wrote,
>If you want to focus on one thing to test, use Linus's current git tree
>or a release candidate tarball from it.  That way, all of the extraneous
>cruft that sits in SVN doesn't matter until someone actually submits it,
>and everyone has a shared understanding of what bits to bang on.  The
>OFED tree gets built from what's in Linus's tree, so if something gets
>fixed in Linus's tree, the fix will percolate into OFED.

The problem I have is that I need to test new code that is not yet
ready to be merged upstream and thus is not yet in Linus's tree.
We have deliverables to the National labs as part of the Pathforward
work and we need a tree where we can put the code, they can
pull it and test it and provide feedback prior to us submitting
it upstream. In the past, all of the latest code has been
put into SVN which allowed us and our customers the ability to
try it out and provide feedback so we know it is what they
want/need before it is submitted to Linus via Roland. 

Perhaps we need our own Pathforward tree for this, but we would
rather not have to maintain a separate tree for this work
and would prefer the model that we used over the last couple of
years, where there was one main trunk development branch.

woody


From rdreier at cisco.com  Thu Sep 28 10:56:18 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 10:56:18 -0700
Subject: [openib-general] [PATCH 0/3] IB/iser: bug fixes for 2.6.19 rc1
In-Reply-To: <Pine.LNX.4.44.0609271518230.20024-100000@hydrus> (Erez
	Zilber's message of "Wed, 27 Sep 2006 15:21:35 +0300 (IDT)")
References: <Pine.LNX.4.44.0609271518230.20024-100000@hydrus>
Message-ID: <ada1wpv6fjx.fsf@cisco.com>

Thanks, applied, although I had to fix up patch 3/3 by hand, since it
did not apply to my tree.
<standard whine>
I merge > 100 patches every kernel release.  If I have to spend an
extra 5 minutes for each one fixing a patch or pulling it out of svn,
then I end up burning an extra 9 hours of stupid work.  If 20+ people
who contribute patches sent me clean patches, then everyone will be
happier because I'll be able to merge things quicker and focus on
productive work.
</standard whine>


From rdreier at cisco.com  Thu Sep 28 10:59:12 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 10:59:12 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159465954.15009.223.camel@localhost> (Matt Leininger's
	message of "Thu, 28 Sep 2006 10:52:34 -0700")
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
	<1159463999.15009.207.camel@localhost> <adad59f6gma.fsf@cisco.com>
	<1159465954.15009.223.camel@localhost>
Message-ID: <adawt7n50un.fsf@cisco.com>

    Matt>   I'd add one more thing.  To make the OFED release process
    Matt> go more smoothly I'd like to see the maintainers for each
    Matt> stack component spin out releases from time to time.  Roland
    Matt> has been doing this with libmthca and libibverbs.  If we had
    Matt> the development releases for other kernel and all user space
    Matt> components then OFED could simply combine the latest
    Matt> development releases and start more thorough testing.

Yes, I strongly support that, although the OFED benefits are just a
side effect to me.  The real reason to have these releases is to
support distributions other than OFED -- for example having tarball
releases of all the components makes it possible to get this stuff
further upstream into real Linux distros.

eg I have gotten libibverbs/libmthca into Fedora Extras and
Debian/Ubuntu, so users of those distros can install them natively
using standard distro tools.

 - R.


From mshefty at ichips.intel.com  Thu Sep 28 10:59:41 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 28 Sep 2006 10:59:41 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159465954.15009.223.camel@localhost>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
	<1159463999.15009.207.camel@localhost> <adad59f6gma.fsf@cisco.com>
	<1159465954.15009.223.camel@localhost>
Message-ID: <451C0D8D.5030601@ichips.intel.com>

Matt Leininger wrote:
>   I'd add one more thing.  To make the OFED release process go more
> smoothly I'd like to see the maintainers for each stack component spin
> out releases from time to time.  Roland has been doing this with
> libmthca and libibverbs.  If we had the development releases for other
> kernel and all user space components then OFED could simply combine the
> latest development releases and start more thorough testing.

I agree, but to clarify:  The rdma_cm does not have kernel support for userspace 
upstream yet.  A release of librdmacm before that seems premature.  Likewise, 
the libibcm is, for practical purposes, useless without a userspace SA solution.

- Sean


From halr at voltaire.com  Thu Sep 28 10:59:06 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Sep 2006 13:59:06 -0400
Subject: [openib-general] [PATCH TRIVIAL] opensm: libibumad: show
 open()'s errno string.
In-Reply-To: <20060926205130.GB23096@sashak.voltaire.com>
References: <20060926205130.GB23096@sashak.voltaire.com>
Message-ID: <1159466342.4353.317825.camel@hal.voltaire.com>

On Tue, 2006-09-26 at 16:51, Sasha Khapyorsky wrote:
> Show errno string when open() fails.
> 
> Signed-off-by: Sasha Khapyorsky <sashak at voltaire.com>

Thanks. Applied to trunk only.

-- Hal


From bos at pathscale.com  Thu Sep 28 11:00:41 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 11:00:41 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691CBEED61@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CBEED61@orsmsx418.amr.corp.intel.com>
Message-ID: <1159466441.5010.58.camel@chalcedony.pathscale.com>

On Thu, 2006-09-28 at 10:53 -0700, Woodruff, Robert J wrote:

> Perhaps we need our own Pathforward tree for this, but we would
> rather not have to maintain a separate tree for this work
> and would prefer the model that we used over the last couple of
> years, where there was one main trunk development branch.

I understand your desire to have a single tree, but it's just not
feasible.  For Pathforward, you have presumably got a bunch of features
to deal with that as a non-Pathforward participant I don't want to be
troubled by as I try to assure that the ipath driver is in reasonable
shape.

And really, I think that if you give it a try, you will not find
maintaining a git or whatever tree to be much work; in fact, it's vastly
easier in my experience than having a rat's nest of unrelated things all
artificially crammed into a single branch.

	<b


From robert.j.woodruff at intel.com  Thu Sep 28 11:11:10 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Thu, 28 Sep 2006 11:11:10 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691CBEEDCB@orsmsx418.amr.corp.intel.com>

Bryan wrote,
>I understand your desire to have a single tree, but it's just not
>feasible.  For Pathforward, you have presumably got a bunch of features
>to deal with that as a non-Pathforward participant I don't want to be
>troubled by as I try to assure that the ipath driver is in reasonable
>shape.

Given that people like the Labs are the customers that buy your
hardware, you should be concerned that your driver works with the
features that they want, even if you are not a pathforward
participant.  If not, and your driver does not work,
they will just buy someone else's
hardware that does work with those features. So it is really in
your best interest to have a working driver in the same
development tree that is being used for the Pathforward development;
right now that is SVN.


woody


From parks at lanl.gov  Thu Sep 28 11:16:14 2006
From: parks at lanl.gov (Parks Fields)
Date: Thu, 28 Sep 2006 12:16:14 -0600
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159463999.15009.207.camel@localhost>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
	<1159463999.15009.207.camel@localhost>
Message-ID: <7.0.1.0.2.20060928121458.025b0310@lanl.gov>

At 11:19 AM 9/28/2006, Matt Leininger wrote:
>   If we move forward with a git repository then we should move all
>kernel code into git.  I don't want to get into a situation where kernel
>components are spread out over various repositories and servers.  I'm
>all for making your development lives easier.  The entire development
>tree has gotten very confusing over the past few months.  The ipath
>driver is never up to date (therefore it's always broken).  Iwarp is
>upstream but not in the main line development tree.  If a simpler
>process can fix this then I'm all for it.


I agree.   We need to make this usable by the majority of the people. I 
know you can't please all the people all the time.




From rdreier at cisco.com  Thu Sep 28 11:14:28 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 11:14:28 -0700
Subject: [openib-general] [PATCH 24 of 28] IB/mthca - Fix compiler
 warnings with gcc4 on possible unitialized variables
In-Reply-To: <9fa624c592af68f7a851.1159459220@eng-12.pathscale.com> (
	Bryan O'Sullivan's message of "Thu, 28 Sep 2006 09:00:20 -0700")
References: <9fa624c592af68f7a851.1159459220@eng-12.pathscale.com>
Message-ID: <adaslib5057.fsf@cisco.com>

NAK -- I don't want to generate worse code to fix a compiler warning
false positive.

 - R.


From rdreier at cisco.com  Thu Sep 28 11:15:03 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 11:15:03 -0700
Subject: [openib-general] [PATCH 4 of 28] IB/ipath - support revision 2
 InfiniPath PCIE devices
In-Reply-To: <a69f8b7a8a04a8742e0f.1159459200@eng-12.pathscale.com> (
	Bryan O'Sullivan's message of "Thu, 28 Sep 2006 09:00:00 -0700")
References: <a69f8b7a8a04a8742e0f.1159459200@eng-12.pathscale.com>
Message-ID: <adaodsz5048.fsf@cisco.com>

 > +	/* 

 > +		/* Use GPIO interrupts for new counters */    

trailing whitespace...


From rdreier at cisco.com  Thu Sep 28 11:15:45 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 11:15:45 -0700
Subject: [openib-general] [PATCH 25 of 28] IB/ipath - Set CPU affinity
	early
In-Reply-To: <4269068599c270538c2e.1159459221@eng-12.pathscale.com> (
	Bryan O'Sullivan's message of "Thu, 28 Sep 2006 09:00:21 -0700")
References: <4269068599c270538c2e.1159459221@eng-12.pathscale.com>
Message-ID: <adak63n5032.fsf@cisco.com>

 > +// Get port early, so can set affinity prior to memory allocation

C++ style comments are frowned on in the kernel.

I fixed all the new ones up to "/* */" style when applying the
patches.


From rdreier at cisco.com  Thu Sep 28 11:16:57 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 11:16:57 -0700
Subject: [openib-general] [PATCH 1 of 28] IB/ipath - limit # of packets
 sent without an ACK received
In-Reply-To: <c46292ccb0f54abc77f7.1159459197@eng-12.pathscale.com> (
	Bryan O'Sullivan's message of "Thu, 28 Sep 2006 08:59:57 -0700")
References: <c46292ccb0f54abc77f7.1159459197@eng-12.pathscale.com>
Message-ID: <adafyeb5012.fsf@cisco.com>

I applied all except #24 with minor comments as sent separately.

 - R.


From rdreier at cisco.com  Thu Sep 28 11:20:16 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 11:20:16 -0700
Subject: [openib-general] [GIT PULL] Please pull infiniband.git
Message-ID: <adabqoz4zvj.fsf@cisco.com>

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will merge:
 - ipath updates
 - iSER updates
 - a few amso1100 Coverity fixes and warning cleanups

Bryan O'Sullivan:
      IB/ipath: Limit # of packets sent without an ACK received
      IB/ipath: Fix memory leak if allocation fails
      IB/ipath: Driver support for userspace sharing of HW contexts
      IB/ipath: Support revision 2 InfiniPath PCIE devices
      IB/ipath: Unregister from IB core early
      IB/ipath: Clean up handling of GUID 0
      IB/ipath: Lock and count allocated CQs properly
      IB/ipath: Count SRQs properly
      IB/ipath: Only allow complete writes to flash
      IB/ipath: RC and UC should validate SLID and DLID
      IB/ipath: Ensure that PD of MR matches PD of QP checking the Rkey
      IB/ipath: Print more informative parity error messages
      IB/ipath: Fix compiler warnings and errors on non-x86_64 systems
      IB/ipath: Fix mismatch in shifts and masks for printing debug info
      IB/ipath: Support multiple simultaneous devices of different types
      IB/ipath: Drop unnecessary "(void *)" casts
      IB/ipath: Improved support for PowerPC
      IB/ipath: Flush RWQEs if access error or invalid error seen
      IB/ipath: Call mtrr_del with correct arguments
      IB/ipath: Clean up module exit code
      IB/ipath: Change HT CRC message to indicate how to resolve problem
      IB/ipath: Fix and recover TXE piobuf and PBC parity errors
      IB/ipath: Fix EEPROM read when driver is compiled with -Os
      IB/ipath: Set CPU affinity early
      IB/ipath: Support new PCIE device, QLE7142
      IB/ipath: Fix races with ib_resize_cq()
      IB/ipath: Fix lockdep error upon "ifconfig ibN down"

Erez Zilber:
      IB/iser: Have iSER data transaction object point to iSER conn
      IB/iser: DMA unmap unaligned for RDMA data before touching it
      IB/iser: Fix the description of iSER in Kconfig

Eric Sesterhenn:
      RDMA/amso1100: Fix error path in c2_llp_accept()

Roland Dreier:
      RDMA/amso1100: Fix compile warnings
      RDMA/amso1100: Fix memory leak in c2_reg_phys_mr()

 drivers/infiniband/hw/amso1100/c2_ae.c         |    2 
 drivers/infiniband/hw/amso1100/c2_alloc.c      |    2 
 drivers/infiniband/hw/amso1100/c2_cm.c         |   15 
 drivers/infiniband/hw/amso1100/c2_provider.c   |    8 
 drivers/infiniband/hw/amso1100/c2_rnic.c       |    4 
 drivers/infiniband/hw/ipath/ipath_common.h     |   54 +
 drivers/infiniband/hw/ipath/ipath_cq.c         |   48 +
 drivers/infiniband/hw/ipath/ipath_driver.c     |  359 ++++-----
 drivers/infiniband/hw/ipath/ipath_eeprom.c     |   17 
 drivers/infiniband/hw/ipath/ipath_file_ops.c   |  974 ++++++++++++++++++------
 drivers/infiniband/hw/ipath/ipath_fs.c         |    9 
 drivers/infiniband/hw/ipath/ipath_iba6110.c    |  132 ++-
 drivers/infiniband/hw/ipath/ipath_iba6120.c    |  263 ++++--
 drivers/infiniband/hw/ipath/ipath_init_chip.c  |   56 +
 drivers/infiniband/hw/ipath/ipath_intr.c       |  280 +++++--
 drivers/infiniband/hw/ipath/ipath_kernel.h     |  116 +++
 drivers/infiniband/hw/ipath/ipath_keys.c       |   12 
 drivers/infiniband/hw/ipath/ipath_mad.c        |   16 
 drivers/infiniband/hw/ipath/ipath_mr.c         |    3 
 drivers/infiniband/hw/ipath/ipath_qp.c         |   16 
 drivers/infiniband/hw/ipath/ipath_rc.c         |   77 +-
 drivers/infiniband/hw/ipath/ipath_registers.h  |   40 +
 drivers/infiniband/hw/ipath/ipath_ruc.c        |   14 
 drivers/infiniband/hw/ipath/ipath_srq.c        |   23 -
 drivers/infiniband/hw/ipath/ipath_sysfs.c      |   21 -
 drivers/infiniband/hw/ipath/ipath_uc.c         |    6 
 drivers/infiniband/hw/ipath/ipath_ud.c         |    6 
 drivers/infiniband/hw/ipath/ipath_user_pages.c |   56 +
 drivers/infiniband/hw/ipath/ipath_verbs.c      |   43 +
 drivers/infiniband/hw/ipath/ipath_verbs.h      |   18 
 drivers/infiniband/hw/ipath/ipath_wc_ppc64.c   |   20 
 drivers/infiniband/hw/ipath/ipath_wc_x86_64.c  |   13 
 drivers/infiniband/ulp/iser/Kconfig            |   13 
 drivers/infiniband/ulp/iser/iscsi_iser.c       |    2 
 drivers/infiniband/ulp/iser/iscsi_iser.h       |    9 
 drivers/infiniband/ulp/iser/iser_initiator.c   |   60 -
 drivers/infiniband/ulp/iser/iser_memory.c      |   42 +
 drivers/infiniband/ulp/iser/iser_verbs.c       |    8 
 38 files changed, 1973 insertions(+), 884 deletions(-)


From halr at voltaire.com  Thu Sep 28 11:26:52 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Sep 2006 14:26:52 -0400
Subject: [openib-general] [PATCH 3/3] IB/iser: fix the description of
 iSER in Kconfig
In-Reply-To: <Pine.LNX.4.44.0609271546020.20024-100000@hydrus>
References: <Pine.LNX.4.44.0609271546020.20024-100000@hydrus>
Message-ID: <1159468012.4353.318794.camel@hal.voltaire.com>

On Wed, 2006-09-27 at 09:48, Erez Zilber wrote:
> fix the description of iSER in Kconfig. It is not accurate.
> 
> Signed-off-by: Erez Zilber <erezz at voltaire.com>
> 
> ---
> 
>  drivers/infiniband/ulp/iser/Kconfig |   11 ++++++-----
>  1 files changed, 6 insertions(+), 5 deletions(-)
> 
> e6a8887cad4e2270c5173451e8b706b907b88133
> diff --git a/drivers/infiniband/ulp/iser/Kconfig b/drivers/infiniband/ulp/iser/Kconfig
> index fead87d..80f6716 100644
> --- a/drivers/infiniband/ulp/iser/Kconfig
> +++ b/drivers/infiniband/ulp/iser/Kconfig
> @@ -1,11 +1,12 @@
>  config INFINIBAND_ISER
> -	tristate "ISCSI RDMA Protocol"
> +	tristate "iSCSI Extensions for RDMA (iSER)"
>  	depends on INFINIBAND && SCSI
>  	select SCSI_ISCSI_ATTRS
>  	---help---
> -	  Support for the ISCSI RDMA Protocol over InfiniBand.  This
> -	  allows you to access storage devices that speak ISER/ISCSI
> +	  Support for the iSCSI Extensions for RDMA (iSER) Protocol over InfiniBand. This
> +	  allows you to access storage devices that speak iSCSI over iSER
>  	  over InfiniBand.
>  
> -	  The ISER protocol is defined by IETF.
> -	  See <http://www.ietf.org/>.
> +	  The iSER protocol is defined by IETF.
> +	  See <http://www.ietf.org/internet-drafts/draft-ietf-ips-iser-05.txt>
> +	  and <http://www.infinibandta.org/members/spec/iser_annex_060418.pdf>

This spec is now officially released from IBTA as an annex and the URL
is different.

-- Hal


From bos at pathscale.com  Thu Sep 28 11:33:52 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Thu, 28 Sep 2006 11:33:52 -0700
Subject: [openib-general] [PATCH 1 of 28] IB/ipath - limit # of packets
 sent without an ACK received
In-Reply-To: <adafyeb5012.fsf@cisco.com>
References: <c46292ccb0f54abc77f7.1159459197@eng-12.pathscale.com>
	<adafyeb5012.fsf@cisco.com>
Message-ID: <1159468432.5010.60.camel@chalcedony.pathscale.com>

On Thu, 2006-09-28 at 11:16 -0700, Roland Dreier wrote:
> I applied all except #24 with minor comments as sent separately.

Thanks!

	<b


From mlleinin at hpcn.ca.sandia.gov  Thu Sep 28 12:22:30 2006
From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger)
Date: Thu, 28 Sep 2006 12:22:30 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <adawt7n50un.fsf@cisco.com>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
	<1159463999.15009.207.camel@localhost> <adad59f6gma.fsf@cisco.com>
	<1159465954.15009.223.camel@localhost>  <adawt7n50un.fsf@cisco.com>
Message-ID: <1159471350.15009.237.camel@localhost>

On Thu, 2006-09-28 at 10:59 -0700, Roland Dreier wrote:
>     Matt>   I'd add one more thing.  To make the OFED release process
>     Matt> go more smoothly I'd like to see the maintainers for each
>     Matt> stack component spin out releases from time to time.  Roland
>     Matt> has been doing this with libmthca and libibverbs.  If we had
>     Matt> the development releases for other kernel and all user space
>     Matt> components then OFED could simply combine the latest
>     Matt> development releases and start more thorough testing.
> 
> Yes, I strongly support that, although the OFED benefits are just a
> side effect to me.  The real reason to have these releases is to
> support distributions other than OFED -- for example having tarball
> releases of all the components makes it possible to get this stuff
> further upstream into real Linux distros.
> 
  RedHat and SuSE have stated several times that they want an OFED-like
process that takes the OF code and runs it through a rigorous suite of
regression and performance tests.  The purpose of OFED is to get into
the commercially supported distros (e.g. RHEL and SLES).   That is what
the majority of end customers want/need.  That said, spinning out
"pre-OFED" releases of each component would help to get the code into
the other distros (FC, Debian, Ubuntu, Gentoo, etc.) which, of course,
is a very good thing to do.

   Thanks,

	- Matt
 

From halr at voltaire.com  Thu Sep 28 12:20:49 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Sep 2006 15:20:49 -0400
Subject: [openib-general] [PATCH] osm_vendor_mlx_sa.c - missing status
 on timeout SA query
In-Reply-To: <868xk4zjro.fsf@mtl066.yok.mtl.com>
References: <868xk4zjro.fsf@mtl066.yok.mtl.com>
Message-ID: <1159471248.4353.320594.camel@hal.voltaire.com>

On Thu, 2006-09-28 at 00:40, Eitan Zahavi wrote:
> Hi Hal
> 
> Similar to the bug discovered by Yevgeny on the osm_vendor_ibumad_sa.c
> the very same bug happens on osm_vendor_mlx_sa.c which fails osmtest.
> The issue is that the status of the result of the query is not returned 
> as the result of the SA query.
> 
> Eitan
> 
> Signed-off-by:  Eitan Zahavi <eitan at mellanox.co.il>

Thanks. Applied to trunk only.

-- Hal


From thlin at us.ibm.com  Thu Sep 28 12:43:15 2006
From: thlin at us.ibm.com (Tseng-Hui (Frank) Lin)
Date: Thu, 28 Sep 2006 14:43:15 -0500
Subject: [openib-general] FW: Mstflint - not working on ppc64
 andwhendriver is not loaded on AMD
In-Reply-To: <D4F8F0B3820E754C887699BEF26A8940EB85ED@taurus.voltaire.com>
References: <D4F8F0B3820E754C887699BEF26A8940EB85ED@taurus.voltaire.com>
Message-ID: <1159472595.21249.79.camel@flin.austin.ibm.com>

The ppc64 problem is actually in pci_64.c. Here is the patch:

============ cut here =============
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 4c4449b..490403c 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -734,9 +734,7 @@ static struct resource *__pci_mmap_make_
 	if (hose == 0)
 		return NULL;		/* should never happen */
 
-	/* If memory, add on the PCI bridge address offset */
 	if (mmap_state == pci_mmap_mem) {
-		*offset += hose->pci_mem_offset;
 		res_bit = IORESOURCE_MEM;
 	} else {
 		io_offset = (unsigned long)hose->io_base_virt - pci_io_base;
============= end cut =============

The mmap() system call on resource0 does not work on ppc64 without this
patch. PowerMAC G5 got away with this because its hose->pci_mem_offset
was set to 0.

The fix was made on 8/21. It may be able to make it into 2.6.19, but it
certainly won't get into SLES10, SLES9-SP3, or RHEL4-U4, which have
already been released.

To cover both cases, with and without the fix, my patch tries to
mmap /sys/bus/pci/..../resource0 first. If that fails, it tries to
mmap /proc/bus/pci/.... If that fails too, we have no choice but to fall
back to using PCI config space.
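
A minimal sketch of the fallback order described above, not the actual mstflint patch. The names try_map() and pick_access(), the access_method enum, and the procfs_off parameter are hypothetical and exist only for illustration; the ioctl(fd, PCIIOC_MMAP_IS_MEM) step on the /proc file is taken from the description quoted below in this thread. Paths are passed in by the caller because the thread elides them.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
#include <linux/pci.h>          /* PCIIOC_MMAP_IS_MEM */
#endif

/* Hypothetical result type: which access path ended up being usable. */
enum access_method { ACCESS_SYSFS_MMAP, ACCESS_PROCFS_MMAP, ACCESS_PCI_CONFIG };

/* Try to map len bytes from one file; returns MAP_FAILED on any error. */
void *try_map(const char *path, size_t len, off_t off, int is_procfs)
{
        void *p;
        int fd = open(path, O_RDWR | O_SYNC);

        if (fd < 0)
                return MAP_FAILED;
#ifdef PCIIOC_MMAP_IS_MEM
        /* For /proc/bus/pci files, ask for a memory (not I/O) mapping. */
        if (is_procfs && ioctl(fd, PCIIOC_MMAP_IS_MEM) < 0) {
                close(fd);
                return MAP_FAILED;
        }
#else
        (void)is_procfs;
#endif
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);
        close(fd);              /* an established mapping survives close() */
        return p;
}

/*
 * Fallback order: sysfs resource file, then the procfs device file, then
 * PCI config space.  procfs_off is an assumption: the offset the caller
 * wants to pass to mmap() for the procfs file.
 */
enum access_method pick_access(const char *sysfs_path, const char *procfs_path,
                               size_t len, off_t procfs_off, void **map_out)
{
        void *p = try_map(sysfs_path, len, 0, 0);

        if (p != MAP_FAILED) {
                *map_out = p;
                return ACCESS_SYSFS_MMAP;
        }
        p = try_map(procfs_path, len, procfs_off, 1);
        if (p != MAP_FAILED) {
                *map_out = p;
                return ACCESS_PROCFS_MMAP;
        }
        *map_out = NULL;        /* nothing mapped; caller uses config cycles */
        return ACCESS_PCI_CONFIG;
}
```

A caller would only touch *map_out when the return value is one of the two mmap cases; ACCESS_PCI_CONFIG means every register access has to go through config space instead.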


On Thu, 2006-09-28 at 16:59 +0300, Moshe Kazir wrote:
> Michael,
> 
> Frank found the cause of the problem in the implementation of
> arch/ppc/kernel/pci.c,
> and asked the IBM kernel group to send a bug fix to the Linux kernel
> group.
> 
> The problem is :
> 
> 1. This bug fix will not enter SLES10 as it is closed.
> 2. It also will not enter SLES9 :-) or Red Hat AS4 U4.
> 
> So we need a bug fix that will enable the use of mstflint on js21 PPC64
> + backport to old systems  .
> 
> Frank's fix is based on two points (if I understand the code
> correctly):
> 
> 1. It opens /proc/bus/pci... and not /sys/bus/pci/...
> 2. It performs an ioctl(fd, PCIIOC_MMAP_IS_MEM);
> 
> Frank - am I right?
> 
> Can we enter these two small changes to the mstflint to have it working
> on the PPC64 js21 ?
> 
> Moshe 
> 
> 
> 
> 
> 
> ____________________________________________________________
> Moshe Katzir   |  +972-9971-8639 (o)   |   +972-52-860-6042  (m)
>  
> Voltaire - The Grid Backbone
>  
> www.voltaire.com
> 
>   
> 
> 
> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] 
> Sent: Thursday, September 28, 2006 4:41 PM
> To: Moshe Kazir
> Cc: Tseng-Hui (Frank) Lin; openfabrics-ewg at openib.org;
> openib-general at openib.org
> Subject: Re: FW: Mstflint - not working on ppc64 andwhendriver is not
> loaded on AMD
> 
> 
> Quoting r. Moshe Kazir <moshek at voltaire.com>:
> > 
> > Quoting r. Moshe Kazir <moshek at voltaire.com>:
> > > Subject: RE: FW: Mstflint - not working on ppc64 andwhendriver is 
> > > not
> > > loaded on AMD
> > > 
> > > 
> > >  # ls /sys/class/infiniband/mthca0/device/resource0
> > > /sys/class/infiniband/mthca0/device/resource0
> > 
> > OK, so can you try this please:
> > 
> > strace -f -v -o log  mstflint -d 
> > /sys/class/infiniband/mthca0/device/resource0 q
> > 
> > cat log
> > 
> > --
> > MST
> 
> 
> 
> > 30463 open("/sys/class/infiniband/mthca0/device/resource0",
> O_RDWR|O_SYNC|O_LARGEFILE) = 3
> > 30463 mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) =
> -1 EINVAL (Invalid argument)
> 
> So we see that mmap is failing with EINVAL.
> But why? We seem to be passing all valid parameters to it.
> 
> I'm looking at arch/ppc/kernel/pci.c at the moment.
> It seems that EINVAL is returned if __pci_mmap_make_offset
> fails, and that seems to be only looking for a valid resource size.
> 
> Are you up to finding the root cause of the problem in
> arch/ppc/kernel/pci.c?
> 
> Maybe the resource offsets are wrong? What does
> cat /sys/class/infiniband/mthca0/device/resource
> show?
> 
> Maybe there's some problem to map a full megabyte?
> Here's a test that only maps 4K. Could you strace it please?
> 
> >>>>>>>>>>>
> 
> #define _XOPEN_SOURCE 500
> #define _FILE_OFFSET_BITS 64
> 
> #include <stdio.h>
> 
> #include <unistd.h>
> 
> #include <netinet/in.h>
> #include <endian.h>
> #include <byteswap.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <string.h>
> #include <stdlib.h>
> 
> #include <sys/pci.h>
> #include <sys/ioctl.h>
> 
> #include <sys/mman.h>
> #include <sys/pci.h>
> #include <sys/stat.h>
> /* #include <sys/ioctl.h>
>  * #include <sys/types.h> */
> 
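> int main()
> {
>         int fd;
>         unsigned value;
>         volatile void *ptr;
>         fd = open("/proc/bus/pci/00/00.0" ,O_RDWR | O_SYNC);
>         if (fd < 0) {
>                 perror("open");
>                 return 1;
>         }
> 
>         /* ioctl(fd, PCIIOC_MMAP_IS_MEM); */
>         ptr = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
> 0xf0000);
>         if (ptr == MAP_FAILED) {
>                 perror("mmap");
>                 return 1;
>         }
>         /* read the 32-bit register at offset 0x14 of the mapping */
>         memcpy(&value, (char *)ptr + 0x14, sizeof value);
>         printf("0x%x\n", value);
>         return 0;
> }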
> 
> 
> 


From halr at voltaire.com  Thu Sep 28 13:21:10 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Sep 2006 16:21:10 -0400
Subject: [openib-general] [PATCH 1/2] osm: osmtest ignores error status
In-Reply-To: <yzslko4owbv.fsf@kliteynik.yok.mtl.com>
References: <yzslko4owbv.fsf@kliteynik.yok.mtl.com>
Message-ID: <1159474864.4353.322640.camel@hal.voltaire.com>

On Thu, 2006-09-28 at 11:16, Yevgeny Kliteynik wrote:
> Hi Hal.
> 
> This patch takes care of several cases where osmtest
> ignored error status.
> 
> Yevgeny
> 
> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Thanks. Applied to trunk only.

-- Hal


From halr at voltaire.com  Thu Sep 28 13:44:11 2006
From: halr at voltaire.com (Hal Rosenstock)
Date: 28 Sep 2006 16:44:11 -0400
Subject: [openib-general] [PATCH 2/2] osm: osmtest ignores error status
In-Reply-To: <yzs8xk45877.fsf@kliteynik.yok.mtl.com>
References: <yzs8xk45877.fsf@kliteynik.yok.mtl.com>
Message-ID: <1159476246.4353.323381.camel@hal.voltaire.com>

On Thu, 2006-09-28 at 11:20, Yevgeny Kliteynik wrote:
> Hi Hal.
> 
> This patch takes care of several cases where osmtest
> ignored error status (plus some cosmetics).
> 
> Yevgeny
> 
> Signed-off-by:  Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>

Thanks. Applied to trunk only.

-- Hal


From jeremy at goop.org  Thu Sep 28 13:46:19 2006
From: jeremy at goop.org (Jeremy Fitzhardinge)
Date: Thu, 28 Sep 2006 13:46:19 -0700
Subject: [openib-general] [PATCH 24 of 28] IB/mthca - Fix compiler
 warnings with gcc4 on possible unitialized variables
In-Reply-To: <adaslib5057.fsf@cisco.com>
References: <9fa624c592af68f7a851.1159459220@eng-12.pathscale.com>
	<adaslib5057.fsf@cisco.com>
Message-ID: <451C349B.9020102@goop.org>

Roland Dreier wrote:
> NAK -- I don't want to generate worse code to fix a compiler warning
> false positive.
>   

Maybe we should have a "make defined" operation for this kind of thing:

    #define DEFVALUE(x)   asm("" : "=rm" (x))

Which is pretty ugly, I admit...
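
A rough sketch of how such a macro might be used, assuming GCC or a compatible compiler. The helper find_slot() and the caller slot_or_minus_one() are hypothetical and only stand in for the kind of code where the compiler cannot prove a variable is written before it is read. The macro is spelled with __asm__ here so it also builds under strict -std= modes; otherwise it is the macro proposed above.

```c
#include <stddef.h>

/*
 * Empty asm statement with x as an output operand: the compiler treats x
 * as written, so "may be used uninitialized" goes away, but no
 * instruction is emitted and the actual value of x stays unspecified.
 */
#define DEFVALUE(x)   __asm__("" : "=rm" (x))

/* Hypothetical helper: stores the index in *out only when key is found. */
int find_slot(const int *table, size_t n, int key, size_t *out)
{
        size_t i;

        for (i = 0; i < n; i++) {
                if (table[i] == key) {
                        *out = i;
                        return 0;
                }
        }
        return -1;
}

/* Hypothetical caller: idx is only read after find_slot() succeeded. */
int slot_or_minus_one(const int *table, size_t n, int key)
{
        size_t idx;

        DEFVALUE(idx);          /* mark idx as defined without storing to it */
        if (find_slot(table, n, key, &idx) != 0)
                return -1;
        return (int)idx;
}
```

The macro only silences the warning; the code is correct solely because idx is never read on the failure path, which is the same property that made the warning a false positive in the first place.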

    J


From swise at opengridcomputing.com  Thu Sep 28 13:49:45 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 28 Sep 2006 15:49:45 -0500
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <adahcyr6gpw.fsf@cisco.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>
	<adahcyr6gpw.fsf@cisco.com>
Message-ID: <1159476585.30153.80.camel@stevo-desktop>

On Thu, 2006-09-28 at 10:31 -0700, Roland Dreier wrote:
>  > I have said this before, but I will repeat myself once again.
>  > I really do not care where the latest code is, but there needs
>  > to be ONE place where we can get all the latest code for development
>  > and testing.
> 
> I'll repeat my usual response: the notion of a single "latest" tree
> doesn't match reality, and any attempt to coerce things into that mold
> just causes problems.  There's not necessarily any correlation between
> the newest ipath code and Sean's RDMA CM.
> 
> git (or any other true distributed SCM system) makes this easier to
> handle: you can easily merge the branches you're interested in trying
> into your local tree.
> 
>  - R.

So I think we all agree on the need for a way to get a "latest" snapshot
of the kernel code (we argue a LOT about how this is done :).     

And at this point in time it's definitely _not_ the svn trunk for some
kernel components.  Like infiniband/core, which is behind linus's git
tree for some things (eg iwarp), and ahead of linus's git tree for
others (eg ucma).  This is bad.  There's no way to get the latest code
with all features (eg iwarp + user cma).

The model we should adopt IMO is: linus's git tree + some set of patches
that compose the latest open fabrics kernel code.  The patches are all
in-process for going into linus's tree at some point.  And the
maintainer of that technology, (eg sean for ucma) will keep that patch
set up to date for folks to pull until it gets pulled into an upstream
git tree (like linus's or roland's).  With git and stg this is pretty
easy IMO.

So the kernel developers all adopt git and maintain their latest changes
that are always on top of linus's git tree, or roland's infiniband git
tree.  And we document where each component's patches or git tree is
located.  Perhaps on the openib wiki.  

OFED and others who build snapshots will have to pull from these
different components, merge them, test them, then release it as a
"snapshot".

Here is what we did for iwarp, which is an example of one of these
components:


Setup initial patch set:

- clone linus's git tree 

- clone roland's git tree and reference linus's tree

- create your stacked patchset


Whenever you want to get upstream updates from roland's tree and/or
linus's tree you do this:

- pop your patchset

- git pull from linus's tree to update your clone of his git tree.

- re-clone roland's tree again referencing linus's tree (this makes the
operation quicker). 

- push (and merge) your patchset back on top 


Dunno if all this helps the conversation, but we've argued about it for
a long time, and I thought maybe getting more specific on process and
how to maintain patches might help move things along...
 

STeve.


From rdreier at cisco.com  Thu Sep 28 13:55:31 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 13:55:31 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159471350.15009.237.camel@localhost> (Matt Leininger's
	message of "Thu, 28 Sep 2006 12:22:30 -0700")
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
	<1159463999.15009.207.camel@localhost> <adad59f6gma.fsf@cisco.com>
	<1159465954.15009.223.camel@localhost> <adawt7n50un.fsf@cisco.com>
	<1159471350.15009.237.camel@localhost>
Message-ID: <adawt7n3e4c.fsf@cisco.com>

    Matt>   RedHat and SuSE have stated several times that they want
    Matt> an OFED-like process that takes the OF code and runs it
    Matt> through a rigorous suite of regression and performance
    Matt> tests.  The purpose of OFED is to get into the commercially
    Matt> supported distros (e.g. RHEL and SLES).  That is what the
    Matt> majority of end customers want/need.  That said, spinning out
    Matt> "pre-OFED" releases of each component would help to get the
    Matt> code into the other distros (FC, Debian, Ubuntu, Gentoo,
    Matt> etc.) which, of course, is a very good thing to do.

I think we've gotten mixed up about "release" vs. "distribution"
again.  I would say that all the packaging crap, which OFED does as a
short-term thing to make it possible for naive users to install, is
actually a big negative for RH and Novell -- they would rather package
and build software themselves.

What is missing is the tested, coordinated tarball release of OF
userspace stuff -- http://www.gnome.org/start/2.16/ might be a useful
model, particularly the "Getting GNOME 2.16" section.

Then if the OFED group wants to build a distribution, that's fine and
healthy.

 - R.


From rdreier at cisco.com  Thu Sep 28 14:00:59 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 14:00:59 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159476585.30153.80.camel@stevo-desktop> (Steve Wise's
	message of "Thu, 28 Sep 2006 15:49:45 -0500")
References: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>
	<adahcyr6gpw.fsf@cisco.com> <1159476585.30153.80.camel@stevo-desktop>
Message-ID: <adaslib3dv8.fsf@cisco.com>

    Steve> So I think we all agree on the need for a way to get a
    Steve> "latest" snapshot of the kernel code (we argue a LOT about
    Steve> how this is done :).

Not to be difficult -- but I disagree.  I think this statement doesn't
actually make sense, because: ** what does "latest" mean?? **

Does someone who wants to check if the new ipath tree fixed a bug
really want to run my bleeding-edge IPoIB NAPI stuff?  Does someone
who wants to try IPoIB NAPI want to run possibly-broken bleeding edge
RDMA CM code?  etc. etc.

    Steve> The model we should adopt IMO is: linus's git tree + some
    Steve> set of patches that compose the latest open fabrics kernel
    Steve> code.  The patches are all in-process for going into
    Steve> linus's tree at some point.  And the maintainer of that
    Steve> technology, (eg sean for ucma) will keep that patch set up
    Steve> to date for folks to pull until it gets pulled into an
    Steve> upstream git tree (like linus's or roland's).  With git and
    Steve> stg this is pretty easy IMO.

Well, I think that's sort of reasonable, except that it has to be more
than one git branch.  All the in-process stuff should be on logically
separate "topic branches".  I'm happy to maintain for-2.6.x trees that
represent stuff queued for the current and next kernel release, but
stuff that hasn't been fully stabilized and reviewed should be kept
separate, and I'm happy to create branches in my git tree for any
other patch sets for developers who don't want to use git or don't
have a place to host things.

 - R.


From rdreier at cisco.com  Thu Sep 28 14:03:32 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 14:03:32 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <adawt7n3e4c.fsf@cisco.com> (Roland Dreier's message of
	"Thu, 28 Sep 2006 13:55:31 -0700")
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
	<1159463999.15009.207.camel@localhost> <adad59f6gma.fsf@cisco.com>
	<1159465954.15009.223.camel@localhost> <adawt7n50un.fsf@cisco.com>
	<1159471350.15009.237.camel@localhost> <adawt7n3e4c.fsf@cisco.com>
Message-ID: <adaodsz3dqz.fsf@cisco.com>

BTW, http://www.kernel.org/git/?p=linux/kernel/git/jgarzik/libata-dev.git;a=summary
is an example of what I'm talking about: notice how Jeff has branches
for specific changesets.

 - R.


From swise at opengridcomputing.com  Thu Sep 28 14:13:24 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 28 Sep 2006 16:13:24 -0500
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <adaslib3dv8.fsf@cisco.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>
	<adahcyr6gpw.fsf@cisco.com> <1159476585.30153.80.camel@stevo-desktop>
	<adaslib3dv8.fsf@cisco.com>
Message-ID: <1159478004.30153.98.camel@stevo-desktop>

On Thu, 2006-09-28 at 14:00 -0700, Roland Dreier wrote:
>     Steve> So I think we all agree on the need for a way to get a
>     Steve> "latest" snapshot of the kernel code (we argue a LOT about
>     Steve> how this is done :).
> 
> Not to be difficult -- but I disagree.  I think this statement doesn't
> actually make sense, because: ** what does "latest" mean?? **
> 

Perhaps "latest" was a bad word.

> Does someone who wants to check if the new ipath tree fixed a bug
> really want to run my bleeding-edge IPoIB NAPI stuff?  

No, they just want to test a bug in the ipath code.  They probably
don't care about iwarp or rdma cm either.

> Does someone
> who wants to try IPoIB NAPI want to run possibly-broken bleeding edge
> RDMA CM code?  etc. etc.
> 

Right,  there are users who DON'T want that.  But there are users who
want:  dapl + user mode rdma cm + user mode iwarp + rdma cm kernel +
iwarp kernel + chelsio driver + ipath driver, for example.  They should
be able to pull these into a single tree and build it somehow.
Previously it was easy because everyone pushed their code into the svn
repos.  With the changing focus on feeding things into kernel.org I
think we need a new process. 

>     Steve> The model we should adopt IMO is: linus's git tree + some
>     Steve> set of patches that compose the latest open fabrics kernel
>     Steve> code.  The patches are all in-process for going into
>     Steve> linus's tree at some point.  And the maintainer of that
>     Steve> technology, (eg sean for ucma) will keep that patch set up
>     Steve> to date for folks to pull until it gets pulled into an
>     Steve> upstream git tree (like linus's or roland's).  With git and
>     Steve> stg this is pretty easy IMO.
> 
> Well, I think that's sort of reasonable, except that it has to be more
> than one git branch.  All the in-process stuff should be on logically
> separate "topic branches".  I'm happy to maintain for-2.6.x trees that
> represent stuff queued for the current and next kernel release, but
> stuff that hasn't been fully stabilized and reviewed should be kept
> separate, and I'm happy to create branches in my git tree for any
> other patch sets for developers who don't want to use git or don't
> have a place to host things.
> 

ok.  topic branches in your git tree or a set of git trees sounds
reasonable.    But to facilitate those trying to assemble bits and
pieces, we should provide documentation on where they get this stuff.
This _might_ help convince those who are hanging on to the svn idea to
adopt this new scheme...

Steve.


From robert.j.woodruff at intel.com  Thu Sep 28 14:25:35 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Thu, 28 Sep 2006 14:25:35 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691CBEF188@orsmsx418.amr.corp.intel.com>

Steve wrote,
>ok.  topic branches in your git tree or a set of git trees sounds
>reasonable.    But to facilitate those trying to assemble bits and
>pieces, we should provide documentation on where they get this stuff.
>This _might_ help convince those who are hanging on to the svn idea to
>adopt this new scheme...

>Steve.

Perhaps we need something similar to the concept of an MM tree
where new, more experimental patches, can be applied and tested
together before going into Roland's mainline git tree that is queued for
kernel.org.

Again, some sort of development branch like what we used to have
with SVN. Does not matter to me if this is git or SVN, but
a central data base is desirable so that people don't have to
get things from all over the place. 

There are definitely going to be early adopters that want to 
try out several of the new things, iWarp, rdma_cm, SDP, etc.
all at once from one code base, so having a way for them to
get versions of all the various components that are still under
development is what is needed.

What I don't want to see is what we have now. Things like iWarp that
are submitted upstream but do not work with some of the latest 
development code (in SVN) for the rdma_cm, SDP, etc. In that model,
there is no easy way for someone to get a version of all of the 
different pieces that all work together.

woody


From swise at opengridcomputing.com  Thu Sep 28 14:30:52 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 28 Sep 2006 16:30:52 -0500
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <adaodsz3dqz.fsf@cisco.com>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
	<1159463999.15009.207.camel@localhost> <adad59f6gma.fsf@cisco.com>
	<1159465954.15009.223.camel@localhost> <adawt7n50un.fsf@cisco.com>
	<1159471350.15009.237.camel@localhost> <adawt7n3e4c.fsf@cisco.com>
	<adaodsz3dqz.fsf@cisco.com>
Message-ID: <1159479052.30153.103.camel@stevo-desktop>


On Thu, 2006-09-28 at 14:03 -0700, Roland Dreier wrote:
> BTW, http://www.kernel.org/git/?p=linux/kernel/git/jgarzik/libata-dev.git;a=summary
> is an example of what I'm talking about: notice how Jeff has branches
> for specific changesets.
> 

I see.  

What might the branch layout look like today for openib?  This might
help clarify the idea.

Steve.


From mshefty at ichips.intel.com  Thu Sep 28 14:30:42 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 28 Sep 2006 14:30:42 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <adaslib3dv8.fsf@cisco.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>
	<adahcyr6gpw.fsf@cisco.com> <1159476585.30153.80.camel@stevo-desktop>
	<adaslib3dv8.fsf@cisco.com>
Message-ID: <451C3F02.3000907@ichips.intel.com>

Roland Dreier wrote:
> Not to be difficult -- but I disagree.  I think this statement doesn't
> actually make sense, because: ** what does "latest" mean?? **

I think this is more a matter of whether there's a single, "main" development 
branch somewhere, or if one even needs to exist.

> Well, I think that's sort of reasonable, except that it has to be more
> than one git branch.  All the in-process stuff should be on logically
> separate "topic branches".

agreed

> I'm happy to create branches in my git tree for any
> other patch sets for developers who don't want to use git or don't
> have a place to host things.

Someday soon I hear, OFA will be able to host git repositories, so my preference 
is to delay any svn to git transition until then.  (I cannot host git from 
inside Intel's firewall, nor can I access a git repository which isn't hosted at 
kernel.org.)  How would you handle merging in changes from the main branch to 
side branches?

- Sean


From robert.j.woodruff at intel.com  Thu Sep 28 14:32:53 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Thu, 28 Sep 2006 14:32:53 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691CBEF1B4@orsmsx418.amr.corp.intel.com>

Steve wrote

>The model we should adopt IMO is: linus's git tree + some set of patches
>that compose the latest open fabrics kernel code.  The patches are all
>in-process for going into linus's tree at some point.  And the
>maintainer of that technology, (eg sean for ucma) will keep that patch
>set up to date for folks to pull until it gets pulled into an upstream
>git tree (like linus's or roland's).  With git and stg this is pretty
>easy IMO.

>So the kernel developers all adopt git and maintain their latest changes
>that are always on top of linus's git tree, or roland's infiniband git
>tree.  And we document where each component's patches or git tree is
>located.  Perhaps on the openib wiki.

Perhaps if we used an MM tree model, the initial MM tree would be
cloned from Linus's git tree or Roland's tree that is queued for
Linus, then people that want to test their new code with other
more experimental code would submit a patch for the MM tree.
When a component is thought to be stable enough to go upstream,
a patch is then submitted to Roland for his git tree.

If there are changes made to Roland's tree for non-experimental
components, the MM tree maintainer would periodically sync
the MM tree to the mainline (Roland's) tree.

Would something like this work?

woody


From tom at opengridcomputing.com  Thu Sep 28 14:40:15 2006
From: tom at opengridcomputing.com (Tom Tucker)
Date: Thu, 28 Sep 2006 16:40:15 -0500
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691CBEF188@orsmsx418.amr.corp.intel.com>
Message-ID: <C141AB6F.A6EC%tom@opengridcomputing.com>


On 9/28/06 4:25 PM, "Woodruff, Robert J" <robert.j.woodruff at intel.com>
wrote:

> Steve wrote,
>> ok.  topic branches in your git tree or a set of git trees sounds
>> reasonable.    But to facilitate those trying to assemble bits and
>> pieces, we should provide documentation on where they get this stuff.
>> This _might_ help convince those who are hanging on to the svn idea to
>> adopt this new scheme...
> 
>> Steve.
> 
> Perhaps we need something similar to the concept of an MM tree
> where new, more experimental patches, can be applied and tested
> together before going into Roland's mainline git tree that is queued for
> kernel.org.
> 
> Again, some sort of development branch like what we used to have
> with SVN. Does not matter to me if this is git or SVN, but
> a central data base is desirable so that people don't have to
> get things from all over the place.

I think that there is an elephant in this room that everyone seems to be
ignoring -- no one is signed up to select and merge the relevant topic
branches together to create a unified, working "release candidate" and then
post it in a convenient place for you to pull from.

Unless this developer resource problem is solved, you will be left with a
well defined (but empty) branch to pull from.


> 
> There are definitely going to be early adopters that want to
> try out several of the new things, iWarp, rdma_cm, SDP, etc.
> all at once from one code base, so having a way for them to
> get versions of all the various components that are still under
> development is what is needed.
> 
> What I don't want to see is what we have now. Things like iWarp that
> are submitted upstream but do not work with some of the latest
> development code (in SVN) for the rdma_cm, SDP, etc. In that model,
> there is no easy way for someone to get a version of all of the
> different pieces that all work together.
> 
> woody
> 


From swise at opengridcomputing.com  Thu Sep 28 14:40:59 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Thu, 28 Sep 2006 16:40:59 -0500
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691CBEF1B4@orsmsx418.amr.corp.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CBEF1B4@orsmsx418.amr.corp.intel.com>
Message-ID: <1159479659.30153.108.camel@stevo-desktop>

On Thu, 2006-09-28 at 14:32 -0700, Woodruff, Robert J wrote:
> Steve wrote
> 
> >The model we should adopt IMO is: linus's git tree + some set of patches
> >that compose the latest open fabrics kernel code.  The patches are all
> >in-process for going into linus's tree at some point.  And the
> >maintainer of that technology, (eg sean for ucma) will keep that patch
> >set up to date for folks to pull until it gets pulled into an upstream
> >git tree (like linus's or roland's).  With git and stg this is pretty
> >easy IMO.
> 
> >So the kernel developers all adopt git and maintain their latest changes
> >that are always on top of linus's git tree, or roland's infiniband git
> >tree.  And we document where each component's patches or git tree is
> >located.  Perhaps on the openib wiki.
> 
> Perhaps if we used an MM tree model, the initial MM tree would be
> cloned from Linus's git tree or Roland's tree that is queued for
> Linus, then people that want to test their new code with other
> more experimental code would submit a patch for the MM tree.
> When a component is thought to be stable enough to go upstream,
> a patch is then submitted to Roland for his git tree.
> 
> If there are changes made to Roland's tree for non-experimental
> components, the MM tree maintainer would periodically sync
> the MM tree to the mainline (Roland's) tree.
> 
> Would something like this work?
> 

Yes, that seems like it would work.  We need an Andrew Morton. ;-)

Seriously, I think part of the issue here is getting the warm body that
will do that work...  

Steve.


From robert.j.woodruff at intel.com  Thu Sep 28 14:59:32 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Thu, 28 Sep 2006 14:59:32 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691CBEF237@orsmsx418.amr.corp.intel.com>

Steve Wise wrote,
>Yes, that seems like it would work.  We need an Andrew Morton. ;-)

>Seriously, I think part of the issue here is getting the warm body that
>will do that work...  

>Steve.

Perhaps we (Sean, Hal, and/or myself) could maintain such a
development (MM) tree once OpenFabrics is able to host git, as we
already have to maintain separate clones anyway to use for development
of new features for the Labs.
I'll talk offline with Sean and Hal and see if we have the time to
maintain an MM-like development branch.
But until OpenFabrics can host git, I think we are stuck with SVN and
the current mess unless we asked to host the MM tree branch at
kernel.org, and I am not sure what it takes to get kernel.org to host
a git tree.

Thoughts?

woody


From rdreier at cisco.com  Thu Sep 28 16:16:37 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 16:16:37 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159479052.30153.103.camel@stevo-desktop> (Steve Wise's
	message of "Thu, 28 Sep 2006 16:30:52 -0500")
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
	<1159463999.15009.207.camel@localhost> <adad59f6gma.fsf@cisco.com>
	<1159465954.15009.223.camel@localhost> <adawt7n50un.fsf@cisco.com>
	<1159471350.15009.237.camel@localhost> <adawt7n3e4c.fsf@cisco.com>
	<adaodsz3dqz.fsf@cisco.com> <1159479052.30153.103.camel@stevo-desktop>
Message-ID: <adak63n37l6.fsf@cisco.com>

    Steve> What might the branch layout look like today for openib?
    Steve> This might help clarify the idea.

I don't really know everything people are working on.  But we might
have ipoib-napi, ipath, ehca, ucma branches at least.


From rdreier at cisco.com  Thu Sep 28 16:18:24 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 16:18:24 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159479659.30153.108.camel@stevo-desktop> (Steve Wise's
	message of "Thu, 28 Sep 2006 16:40:59 -0500")
References: <BAE9DCEF64577A439B3A37F36F9B691CBEF1B4@orsmsx418.amr.corp.intel.com>
	<1159479659.30153.108.camel@stevo-desktop>
Message-ID: <adafyeb37i7.fsf@cisco.com>

    Steve> Yes, that seems like it would work.  We need an Andrew
    Steve> Morton. ;-)

    Steve> Seriously, I think part of the issue here is getting the
    Steve> warm body that will do that work...

It would be fairly easy to create a "union of all git development
branches" git branch, as long as I can use native git to get to all
the branches.  So I'm happy to maintain that, with updates once or
twice a week say.

 - R.


From rdreier at cisco.com  Thu Sep 28 16:19:37 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 16:19:37 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <BAE9DCEF64577A439B3A37F36F9B691CBEF237@orsmsx418.amr.corp.intel.com>
	(Robert J. Woodruff's message of "Thu, 28 Sep 2006 14:59:32 -0700")
References: <BAE9DCEF64577A439B3A37F36F9B691CBEF237@orsmsx418.amr.corp.intel.com>
Message-ID: <adabqoz37g6.fsf@cisco.com>

    Robert> such a development (MM) tree once OpenFabrics is able to
    Robert> host git as we already have to maintain separate clones
    Robert> anyway to use for development of new features for the
    Robert> Labs.  I'll talk offline with Sean and Hal and see if we
    Robert> have the time to maintain an MM-like development branch.
    Robert> But until OpenFabrics can host git, I think we are stuck
    Robert> with SVN and the current mess unless we asked to host the
    Robert> MM tree branch at kernel.org, and I am not sure what it
    Robert> takes to get kernel.org to host a git tree.

It's really easy to host git trees at kernel.org.  I don't really know
what the criteria are for getting a kernel.org account but I don't
think they're that stringent.

 - R.


From mlleinin at hpcn.ca.sandia.gov  Thu Sep 28 16:29:38 2006
From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger)
Date: Thu, 28 Sep 2006 16:29:38 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <adawt7n3e4c.fsf@cisco.com>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
	<1159463999.15009.207.camel@localhost> <adad59f6gma.fsf@cisco.com>
	<1159465954.15009.223.camel@localhost> <adawt7n50un.fsf@cisco.com>
	<1159471350.15009.237.camel@localhost>  <adawt7n3e4c.fsf@cisco.com>
Message-ID: <1159486178.15009.302.camel@localhost>

On Thu, 2006-09-28 at 13:55 -0700, Roland Dreier wrote:
>     Matt>   RedHat and SuSE have stated several times that they want
>     Matt> an OFED like process that takes the OF code and runs it
>     Matt> through a rigorous suite of regression and performance
>     Matt> tests.  The purpose of OFED is to get into the commercially
>     Matt> supported distros (e.g RHEL and SLES).  That is what the
>     Matt> majority of end customers want/need.  That said spinning out
>     Matt> "pre-OFED" releases of each component would help to get the
>     Matt> code into the other distros (FC, Debian, Ubuntu, Gentoo,
>     Matt> etc.) which, of course, is a very good thing to do.
> 
> I think we've gotten mixed up about "release" vs. "distribution"
> again.  I would say that all the packaging crap, which OFED does as a
> short-term thing to make it possible for naive users to install, is
> actually a big negative for RH and Novell -- they would rather package
> and build software themselves.

  Fair point.  I don't like the way OFED is packaged.  It's messy and
just causes more problems than it is worth.  What I do like about OFED
is the rigorous testing that each company does.  It would be great if we
can include this rigorous testing into the OF release process. 

  
> 
> What is missing is the tested, coordinated tarball release of OF
> userspace stuff -- http://www.gnome.org/start/2.16/ might be a useful
> model, particularly the "Getting GNOME 2.16" section.
> 
  Yes, we need something like this.

  - Matt


From rdreier at cisco.com  Thu Sep 28 16:53:18 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Thu, 28 Sep 2006 16:53:18 -0700
Subject: [openib-general] Coverity found iSER bug?
In-Reply-To: <ada1wpv6fjx.fsf@cisco.com> (Roland Dreier's message of
	"Thu, 28 Sep 2006 10:56:18 -0700")
References: <Pine.LNX.4.44.0609271518230.20024-100000@hydrus>
	<ada1wpv6fjx.fsf@cisco.com>
Message-ID: <ada7izn35w1.fsf_-_@cisco.com>

(This is from the Coverity scanner, CID 1396)

In iser_initiator.c there is suspicious code in iser_rcv_completion().
We start with

	char   *rx_data = NULL;
	int     rx_data_len = 0;

and then do

	if (dto_xfer_len > ISER_TOTAL_HEADERS_LEN) { /* we have data */
		rx_data_len = dto_xfer_len - ISER_TOTAL_HEADERS_LEN;
		rx_data     = dto->regd[1]->virt_addr;
		rx_data    += dto->offset[1];
	}

I see no assignment to rx_data if dto_xfer_len <= ISER_TOTAL_HEADERS_LEN.
Then after a bunch of other stuff, we do

	iscsi_iser_recv(conn->iscsi_conn, hdr, rx_data, rx_data_len);

Coverity eventually follows this path to iscsi_scsi_cmd_rsp(), which
might dereference rx_data directly.

Is this a "can't happen" false positive or is there really a problem here?
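
To make the question concrete, here is a minimal, self-contained sketch
of the pattern. It is not the actual iSER code: sketch_split(),
sketch_first_byte(), struct sketch_payload and the value of
SKETCH_HEADERS_LEN are hypothetical stand-ins chosen for illustration,
in place of dto_xfer_len and ISER_TOTAL_HEADERS_LEN quoted above. It
only shows that a header-only transfer leaves the data pointer NULL, so
a consumer would need to check it before dereferencing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ISER_TOTAL_HEADERS_LEN; the value is arbitrary. */
#define SKETCH_HEADERS_LEN 64

/* Hypothetical result type: what the receive path hands to the upper layer. */
struct sketch_payload {
	const char *data;	/* stays NULL when there is no data segment */
	int         len;	/* stays 0 when there is no data segment */
};

/*
 * Mirrors the shape of the quoted logic: data and len are assigned only
 * when the transfer is longer than the headers.
 */
struct sketch_payload sketch_split(const char *buf, int xfer_len)
{
	struct sketch_payload p = { NULL, 0 };

	assert(buf != NULL);
	if (xfer_len > SKETCH_HEADERS_LEN) {	/* we have data */
		p.len  = xfer_len - SKETCH_HEADERS_LEN;
		p.data = buf + SKETCH_HEADERS_LEN;
	}
	return p;
}

/*
 * A consumer that is safe on the header-only path: it checks the pointer
 * and length before touching the data. Returns the first data byte, or
 * -1 when there is no data.
 */
int sketch_first_byte(struct sketch_payload p)
{
	if (p.data == NULL || p.len <= 0)
		return -1;
	return (unsigned char)p.data[0];
}
```

If every consumer of rx_data behaves like sketch_first_byte(), the
report would be a false positive; a consumer that reads the data
without such a check would hit the NULL pointer on the header-only
path.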

 - R.


From swise at opengridcomputing.com  Fri Sep 29 06:38:12 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Sep 2006 08:38:12 -0500
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <adak63n37l6.fsf@cisco.com>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
	<1159463999.15009.207.camel@localhost> <adad59f6gma.fsf@cisco.com>
	<1159465954.15009.223.camel@localhost> <adawt7n50un.fsf@cisco.com>
	<1159471350.15009.237.camel@localhost> <adawt7n3e4c.fsf@cisco.com>
	<adaodsz3dqz.fsf@cisco.com> <1159479052.30153.103.camel@stevo-desktop>
	<adak63n37l6.fsf@cisco.com>
Message-ID: <1159537092.21613.14.camel@stevo-desktop>

On Thu, 2006-09-28 at 16:16 -0700, Roland Dreier wrote:
>     Steve> What might the branch layout look like today for openib?
>     Steve> This might help clarify the idea.
> 
> I don't really know everything people are working on.  But we might
> have ipoib-napi, ipath, ehca, ucma branches at least.
> 

Add to that cxgb3 for chelsio's T3 drivers.


From jlentini at netapp.com  Fri Sep 29 07:58:45 2006
From: jlentini at netapp.com (James Lentini)
Date: Fri, 29 Sep 2006 10:58:45 -0400 (EDT)
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <C141AB6F.A6EC%tom@opengridcomputing.com>
References: <C141AB6F.A6EC%tom@opengridcomputing.com>
Message-ID: <Pine.LNX.4.64.0609291054580.9963@jlentini-linux.nane.netapp.com>


On Thu, 28 Sep 2006, Tom Tucker wrote:

> I think that there is an elephant in this room that everyone seems 
> to be ignoring -- no one is signed up to select and merge the 
> relevant topic branches together to create a unified, working 
> "release candidate" and then post it in a convenient place for 
> you to pull from.
> 
> Unless this developer resource problem is solved, you will be left 
> with a well defined (but empty) branch to pull from.

I think there are two elephants in the room. What about the dual 
license policy that is enforced by the OpenFabrics Alliance? 

Currently the OpenFabrics Alliance members require that all code 
committed to the OFA repository will be dual GPL/BSD licensed. If the 
source code is no longer hosted on OFA servers, who is going to 
guarantee that?


From jlentini at netapp.com  Fri Sep 29 09:26:41 2006
From: jlentini at netapp.com (James Lentini)
Date: Fri, 29 Sep 2006 12:26:41 -0400 (EDT)
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <451C3F02.3000907@ichips.intel.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>
	<adahcyr6gpw.fsf@cisco.com> <1159476585.30153.80.camel@stevo-desktop>
	<adaslib3dv8.fsf@cisco.com> <451C3F02.3000907@ichips.intel.com>
Message-ID: <Pine.LNX.4.64.0609291216241.9963@jlentini-linux.nane.netapp.com>


On Thu, 28 Sep 2006, Sean Hefty wrote:

> Someday soon I hear, OFA will be able to host git repositories, so 
> my preference is to delay any svn to git transition until then.  (I 
> cannot host git from inside Intel's firewall, nor can I access a git 
> repository which isn't hosted at kernel.org.)  

Sean's concern brings to mind an important issue. The OFA repository 
is a common, neutral area to which we can all contribute. It would be 
a shame if we went back to the "dark ages" before OFA where every 
vendor had their own slightly different software stack.

Balkanizing the OFA repository into corporate repositories would be a 
mistake. It is likely that companies will restrict developers at HCA 
vendor X from contributing code to HCA vendor Y's repository.


From swise at opengridcomputing.com  Fri Sep 29 09:34:14 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Sep 2006 11:34:14 -0500
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <Pine.LNX.4.64.0609291216241.9963@jlentini-linux.nane.netapp.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>
	<adahcyr6gpw.fsf@cisco.com> <1159476585.30153.80.camel@stevo-desktop>
	<adaslib3dv8.fsf@cisco.com> <451C3F02.3000907@ichips.intel.com>
	<Pine.LNX.4.64.0609291216241.9963@jlentini-linux.nane.netapp.com>
Message-ID: <1159547654.21613.71.camel@stevo-desktop>

On Fri, 2006-09-29 at 12:26 -0400, James Lentini wrote:
> 
> On Thu, 28 Sep 2006, Sean Hefty wrote:
> 
> > Someday soon I hear, OFA will be able to host git repositories, so 
> > my preference is to delay any svn to git transition until then.  (I 
> > cannot host git from inside Intel's firewall, nor can I access a git 
> > repository which isn't hosted at kernel.org.)  
> 
> Sean's concern brings to mind an important issue. The OFA repository 
> is a common, neutral area to which we can all contribute. It would be 
> a shame if we went back to the "dark ages" before OFA where every 
> vendor had their own slightly different software stack.
> 
> Balkanizing the OFA repository into corporate repositories would be a 
> mistake. It is likely that companies will restrict developers at HCA 
> vendor X from contributing code to HCA vendor Y's repository.

I don't think anybody is suggesting corporate private git trees...

But just to state it clearly:  We either host git trees on kernel.org or
on openfabrics.org.


Steve.


From bos at pathscale.com  Fri Sep 29 10:03:51 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 29 Sep 2006 10:03:51 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159486178.15009.302.camel@localhost>
References: <1159300251.11549.6.camel@stevo-desktop>
	<1159300894.11549.11.camel@stevo-desktop> <adairj980le.fsf@cisco.com>
	<1159393578.21086.16.camel@chalcedony.pathscale.com>
	<20060928062723.GG23828@mellanox.co.il>
	<1159455506.11976.1.camel@chalcedony.pathscale.com>
	<Pine.LNX.4.64.0609281201210.9963@jlentini-linux.nane.netapp.com>
	<1159461093.5010.8.camel@chalcedony.pathscale.com>
	<1159463999.15009.207.camel@localhost> <adad59f6gma.fsf@cisco.com>
	<1159465954.15009.223.camel@localhost> <adawt7n50un.fsf@cisco.com>
	<1159471350.15009.237.camel@localhost> <adawt7n3e4c.fsf@cisco.com>
	<1159486178.15009.302.camel@localhost>
Message-ID: <1159549431.17595.21.camel@sardonyx>

On Thu, 2006-09-28 at 16:29 -0700, Matt Leininger wrote:

>   Fair point.  I don't like the way OFED is packaged.  It's messy and
> just causes more problems than it is worth.

+10

>   What I do like about OFED
> is the rigorous testing that each company does.  It would be great if we
> can include this rigorous testing into the OF release process. 

+1

	<b


From bos at pathscale.com  Fri Sep 29 10:24:27 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 29 Sep 2006 10:24:27 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <Pine.LNX.4.64.0609291216241.9963@jlentini-linux.nane.netapp.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>
	<adahcyr6gpw.fsf@cisco.com> <1159476585.30153.80.camel@stevo-desktop>
	<adaslib3dv8.fsf@cisco.com> <451C3F02.3000907@ichips.intel.com>
	<Pine.LNX.4.64.0609291216241.9963@jlentini-linux.nane.netapp.com>
Message-ID: <1159550667.17595.29.camel@sardonyx>

On Fri, 2006-09-29 at 12:26 -0400, James Lentini wrote:

> Balkanizing the OFA repository into corporate repositories would be a 
> mistake.

Nobody is suggesting this.  However, separating the mess that is the
current SVN trunk into a set of well-understood branches, each of which
sees some testing by its authors in isolation, can *only* be a good
thing for ensuring a higher-quality OF process in general.

>  It is likely that companies will restrict developers at HCA 
> vendor X from contributing code to HCA vendor Y's repository.

I doubt it.  As a practical matter, having your driver in the kernel
tree means it's open season for anyone who wants to take a crack at it.
Just look at the number of IB/10gbE/iWarp hardware vendors that have
fingerprints all over each other's code in drivers/infiniband/hw for an
example.

	<b


From bos at pathscale.com  Fri Sep 29 10:25:57 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 29 Sep 2006 10:25:57 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <Pine.LNX.4.64.0609291054580.9963@jlentini-linux.nane.netapp.com>
References: <C141AB6F.A6EC%tom@opengridcomputing.com>
	<Pine.LNX.4.64.0609291054580.9963@jlentini-linux.nane.netapp.com>
Message-ID: <1159550757.17595.31.camel@sardonyx>

On Fri, 2006-09-29 at 10:58 -0400, James Lentini wrote:

> Currently the OpenFabrics Alliance members require that all code 
> committed to the OFA repository will be dual GPL/BSD licensed. If the 
> source code is no longer hosted on OFA servers, who is going to 
> guarantee that?

It's been the responsibility of OFA members to ensure that all along.
Just because people are using a different revision control tool doesn't
have much bearing on that.

	<b


From xma at us.ibm.com  Fri Sep 29 10:19:15 2006
From: xma at us.ibm.com (Shirley Ma)
Date: Fri, 29 Sep 2006 10:19:15 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <adaodt055to.fsf@cisco.com>
Message-ID: <OF769BC08C.9DA4407E-ON872571F8.005EF945-882571F8.005F24F3@us.ibm.com>


openib-general-bounces at openib.org wrote on 09/28/2006 09:11:47 AM:

>     Michael> Looked pretty simple on the outset, but oh well. Keep us
>     Michael> posted.
>
> I just work slowly.
>
> Anyway I don't think this is that urgent -- we've dumped enough stuff
> into 2.6.19, so I think this should wait for 2.6.20 at the earliest
> anyway.

Please wait until the other device drivers have finished their performance
tests. This NAPI patch somehow kills ehca performance; the results are
extremely bad.

Shirley Ma

From robert.j.woodruff at intel.com  Fri Sep 29 10:55:53 2006
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 29 Sep 2006 10:55:53 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
Message-ID: <BAE9DCEF64577A439B3A37F36F9B691CC1EA55@orsmsx418.amr.corp.intel.com>

Steve wrote,
>I don't think anybody is suggesting corporate private git trees...

>But just to state it clearly:  We either host git trees on kernel.org or
>on openfabrics.org.

>Steve.

Right now it looks like the OFED git tree is hosted on the mellanox site,
not kernel.org or openfabrics.

From the HOW to build documentation in OFED 1.1...

      mkdir gitdir
      cd gitdir
      git clone -s --bare git://www.mellanox.co.il/~git/infiniband .git
      git checkout ofed_1_1 `git-ls-tree -r --name-only ofed_1_1 \
                   include/rdma include/scsi/srp.h drivers/infiniband \
                   Documentation/infiniband ofed_scripts kernel_patches`


_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general


From sean.hefty at intel.com  Fri Sep 29 11:47:06 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 29 Sep 2006 11:47:06 -0700
Subject: [openib-general] [PATCH 1/5] 2.6.19 rdma_cm: fix leak of cm_id's in
 case of failures
Message-ID: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com>

cma_connect_ib and cma_connect_iw leak cm_id's in failure cases.

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
Steve, I modified Krishna's patch to include a fix for iWarp as well.
Please verify that it looks okay.

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 1178bd4..69bb089 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1862,6 +1862,11 @@ static int cma_connect_ib(struct rdma_id
 
 	ret = ib_send_cm_req(id_priv->cm_id.ib, &req);
 out:
+	if (ret && !IS_ERR(id_priv->cm_id.ib)) {
+		ib_destroy_cm_id(id_priv->cm_id.ib);
+		id_priv->cm_id.ib = NULL;
+	}
+
 	kfree(private_data);
 	return ret;
 }
@@ -1889,10 +1894,8 @@ static int cma_connect_iw(struct rdma_id
 	cm_id->remote_addr = *sin;
 
 	ret = cma_modify_qp_rtr(&id_priv->id);
-	if (ret) {
-		iw_destroy_cm_id(cm_id);
-		return ret;
-	}
+	if (ret)
+		goto out;
 
 	iw_param.ord = conn_param->initiator_depth;
 	iw_param.ird = conn_param->responder_resources;
@@ -1904,6 +1907,10 @@ static int cma_connect_iw(struct rdma_id
 		iw_param.qpn = conn_param->qp_num;
 	ret = iw_cm_connect(cm_id, &iw_param);
 out:
+	if (ret && !IS_ERR(cm_id)) {
+		iw_destroy_cm_id(cm_id);
+		id_priv->cm_id.iw = NULL;
+	}
 	return ret;
 }
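
The pattern used in this patch (send every failure through one exit label,
release the handle there, and clear the stored pointer) can be modelled in
userspace. This is only a minimal sketch: struct conn, handle_create(),
handle_destroy() and conn_connect() are hypothetical names for illustration
and are not part of the rdma_cm code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a CM id; not a kernel type. */
struct handle {
	int unused;
};

struct conn {
	struct handle *h;	/* owned by the connection once created */
};

static int live_handles;	/* handles created and not yet destroyed */

static struct handle *handle_create(void)
{
	struct handle *h = malloc(sizeof *h);

	if (h)
		live_handles++;
	return h;
}

static void handle_destroy(struct handle *h)
{
	free(h);
	live_handles--;
}

/*
 * step_result simulates the outcome of the later setup steps:
 * 0 for success, a negative errno-style value for failure.
 */
static int conn_connect(struct conn *c, int step_result)
{
	int ret;

	c->h = handle_create();
	if (!c->h)
		return -1;

	ret = step_result;
	if (ret)
		goto out;

	/* Further setup steps would go here, each jumping to out on error. */
out:
	if (ret && c->h) {
		handle_destroy(c->h);
		c->h = NULL;	/* later teardown cannot free it twice */
	}
	return ret;
}
```

With a single exit label, a new setup step added later cannot forget the
cleanup, and the cleared pointer keeps a later destroy path from releasing
the same handle again.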
 

From sean.hefty at intel.com  Fri Sep 29 11:51:49 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 29 Sep 2006 11:51:49 -0700
Subject: [openib-general] [PATCH 2/5] 2.6.19 rdma_cm: fix device removal race
In-Reply-To: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com>
Message-ID: <000201c6e3f8$51de5e40$ff0da8c0@amr.corp.intel.com>

The race is as follows:

Process A : cma_process_remove() calls cma_remove_id_dev(),
	    which sets the id state to CMA_DEVICE_REMOVAL and
	    calls wait_event(dev_remove).

Process B : cma_req_handler() has already incremented dev_remove.
	    It calls cma_acquire_ib_dev() and, on failure,
	    calls cma_release_remove(), which does a
	    wake_up of cma_process_remove(). Then
	    cma_req_handler() calls rdma_destroy_id().

Process A : cma_remove_id_dev() is woken and checks the
	    state of the id. Since it is still (wrongly)
	    CMA_DEVICE_REMOVAL, it calls notify_user(id),
	    and if that fails, the caller, cma_process_remove(),
	    calls rdma_destroy_id(id). Two processes can thus
	    call rdma_destroy_id(), resulting in one of them
	    dereferencing the kfree'd id_priv.

The fix is for process B to set CMA_DESTROYING in cma_req_handler()
so that process A returns instead of calling rdma_destroy_id().

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 69bb089..f383a4f 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -932,6 +932,7 @@ static int cma_req_handler(struct ib_cm_
 	mutex_unlock(&lock);
 	if (ret) {
 		ret = -ENODEV;
+		cma_exch(conn_id, CMA_DESTROYING);
 		cma_release_remove(conn_id);
 		rdma_destroy_id(&conn_id->id);
 		goto out;
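
The effect of the state exchange can be shown with a small single-threaded
model. This is a sketch under stated assumptions, not the rdma_cm code:
struct id, id_exch(), handler_fail() and removal_after_wakeup() are
hypothetical names, C11 atomics stand in for the kernel primitives, and the
model assumes the worst case where the user notification fails so that the
removal path would go on to destroy the id.

```c
#include <stdatomic.h>

enum id_state { ST_CONNECT, ST_DEVICE_REMOVAL, ST_DESTROYING };

struct id {
	_Atomic int state;
	int destroy_calls;	/* more than one means a double destroy */
};

static void id_init(struct id *id, int state)
{
	atomic_init(&id->state, state);
	id->destroy_calls = 0;
}

/* Swap in a new state and return the previous one. */
static int id_exch(struct id *id, int new_state)
{
	return atomic_exchange(&id->state, new_state);
}

static void id_destroy(struct id *id)
{
	id->destroy_calls++;
}

/* Path B: the request handler fails and tears the id down itself. */
static void handler_fail(struct id *id)
{
	id_exch(id, ST_DESTROYING);	/* claim teardown before waking A */
	id_destroy(id);
}

/*
 * Path A: the removal path after it has been woken.
 * Returns 1 if it went on to destroy the id, 0 if it backed off.
 */
static int removal_after_wakeup(struct id *id)
{
	if (atomic_load(&id->state) != ST_DEVICE_REMOVAL)
		return 0;	/* someone else owns teardown */
	id_destroy(id);
	return 1;
}
```

If handler_fail() skipped the exchange, removal_after_wakeup() would still
see ST_DEVICE_REMOVAL and destroy_calls would reach 2, which corresponds to
the double rdma_destroy_id() described in the changelog.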


From sean.hefty at intel.com  Fri Sep 29 11:57:09 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 29 Sep 2006 11:57:09 -0700
Subject: [openib-general] [PATCH 3/5] 2.6.19 rdma_cm: set status correct on
 route resolution error
In-Reply-To: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com>
Message-ID: <000301c6e3f9$10eae830$ff0da8c0@amr.corp.intel.com>

On reporting a route error, also include the status for the error, rather than
indicating a status of 0 when an error has occurred.

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index f383a4f..d10fdf1 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1308,6 +1308,7 @@ static void cma_query_handler(int status
 		work->old_state = CMA_ROUTE_QUERY;
 		work->new_state = CMA_ADDR_RESOLVED;
 		work->event.event = RDMA_CM_EVENT_ROUTE_ERROR;
+		work->event.status = status;
 	}
 
 	queue_work(cma_wq, &work->work);


From sean.hefty at intel.com  Fri Sep 29 12:03:35 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 29 Sep 2006 12:03:35 -0700
Subject: [openib-general] [PATCH 4/5] 2.6.19 rdma_cm: eliminate unnecessary
	remove list
In-Reply-To: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com>
Message-ID: <000401c6e3f9$f718b030$ff0da8c0@amr.corp.intel.com>

Eliminate remove_list by using list_del_init instead during device removal
handling.

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
This removes a stack variable and simplifies the code, but does not fix
any bugs.  We can defer this to 2.6.20 if necessary.

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d10fdf1..3982b81 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2151,12 +2151,9 @@ static int cma_remove_id_dev(struct rdma
 
 static void cma_process_remove(struct cma_device *cma_dev)
 {
-	struct list_head remove_list;
 	struct rdma_id_private *id_priv;
 	int ret;
 
-	INIT_LIST_HEAD(&remove_list);
-
 	mutex_lock(&lock);
 	while (!list_empty(&cma_dev->id_list)) {
 		id_priv = list_entry(cma_dev->id_list.next,
@@ -2167,8 +2164,7 @@ static void cma_process_remove(struct cm
 			continue;
 		}
 
-		list_del(&id_priv->list);
-		list_add_tail(&id_priv->list, &remove_list);
+		list_del_init(&id_priv->list);
 		atomic_inc(&id_priv->refcount);
 		mutex_unlock(&lock);
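
For readers unfamiliar with the difference: list_del_init() unlinks an entry
and leaves it pointing at itself, so the entry does not need a second list
to live on and can safely be tested or unlinked again. A minimal userspace
sketch of that behaviour follows; struct node and the node_*() helpers are
hypothetical names and only imitate the kernel's list.h.

```c
/* Circular doubly linked list node, in the style of struct list_head. */
struct node {
	struct node *next, *prev;
};

static void node_init(struct node *n)
{
	n->next = n;
	n->prev = n;
}

/* True for an empty list head and for an unlinked, re-initialized node. */
static int node_empty(const struct node *n)
{
	return n->next == n;
}

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n from whatever list it is on and leave it self-linked. */
static void node_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}
```

Calling node_del_init() a second time on the same node only rewrites its own
pointers, which is why a later unlink in the destroy path stays harmless.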
 

From sean.hefty at intel.com  Fri Sep 29 12:09:51 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 29 Sep 2006 12:09:51 -0700
Subject: [openib-general] [PATCH 5/5] 2.6.19 rdma_cm: optimize error handling
In-Reply-To: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com>
Message-ID: <000501c6e3fa$d6f90150$ff0da8c0@amr.corp.intel.com>

Re-organize code relating to cma_get_net_info() and rdma_create_id() to
optimize error case handling (no need to alloc memory/etc. as part of
rdma_create_id() if input parameters are wrong).

Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
This does not fix a bug.  We can defer this to 2.6.20 if necessary.

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 3982b81..9ae4f3a 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -874,23 +874,25 @@ static struct rdma_id_private *cma_new_i
 	__u16 port;
 	u8 ip_ver;
 
+	if (cma_get_net_info(ib_event->private_data, listen_id->ps,
+			     &ip_ver, &port, &src, &dst))
+		goto err;
+
 	id = rdma_create_id(listen_id->event_handler, listen_id->context,
 			    listen_id->ps);
 	if (IS_ERR(id))
-		return NULL;
+		goto err;
+
+	cma_save_net_info(&id->route.addr, &listen_id->route.addr,
+			  ip_ver, port, src, dst);
 
 	rt = &id->route;
 	rt->num_paths = ib_event->param.req_rcvd.alternate_path ? 2 : 1;
-	rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths, GFP_KERNEL);
+	rt->path_rec = kmalloc(sizeof *rt->path_rec * rt->num_paths,
+			       GFP_KERNEL);
 	if (!rt->path_rec)
-		goto err;
+		goto destroy_id;
 
-	if (cma_get_net_info(ib_event->private_data, listen_id->ps,
-			     &ip_ver, &port, &src, &dst))
-		goto err;
-
-	cma_save_net_info(&id->route.addr, &listen_id->route.addr,
-			  ip_ver, port, src, dst);
 	rt->path_rec[0] = *ib_event->param.req_rcvd.primary_path;
 	if (rt->num_paths == 2)
 		rt->path_rec[1] = *ib_event->param.req_rcvd.alternate_path;
@@ -903,8 +905,10 @@ static struct rdma_id_private *cma_new_i
 	id_priv = container_of(id, struct rdma_id_private, id);
 	id_priv->state = CMA_CONNECT;
 	return id_priv;
-err:
+
+destroy_id:
 	rdma_destroy_id(id);
+err:
 	return NULL;
 }
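
The reordering can be summarised as "validate first, allocate second, and
unwind in reverse order". A minimal userspace sketch, assuming hypothetical
names (struct req, struct route, route_from_req()) that only mirror the
shape of the function changed above:

```c
#include <stdlib.h>

struct path {
	int id;
};

struct route {
	struct path *paths;
	int num_paths;
};

struct req {
	int valid;		/* stands in for the private data check */
	int has_alternate;
};

static int alloc_calls;		/* how often route_create() was reached */

static struct route *route_create(void)
{
	alloc_calls++;
	return calloc(1, sizeof(struct route));
}

static void route_destroy(struct route *rt)
{
	if (rt)
		free(rt->paths);
	free(rt);
}

static struct route *route_from_req(const struct req *req)
{
	struct route *rt;

	if (!req->valid)
		goto err;	/* nothing allocated yet, nothing to undo */

	rt = route_create();
	if (!rt)
		goto err;

	rt->num_paths = req->has_alternate ? 2 : 1;
	rt->paths = malloc(sizeof *rt->paths * rt->num_paths);
	if (!rt->paths)
		goto destroy;

	return rt;

destroy:
	route_destroy(rt);
err:
	return NULL;
}
```

The two labels are stacked so that each failure point jumps to exactly the
cleanup it needs: a bad request never reaches the allocator, and a failed
path allocation releases only the route object.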
 

From rdreier at cisco.com  Fri Sep 29 12:45:23 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 29 Sep 2006 12:45:23 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159547654.21613.71.camel@stevo-desktop> (Steve Wise's
	message of "Fri, 29 Sep 2006 11:34:14 -0500")
References: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>
	<adahcyr6gpw.fsf@cisco.com> <1159476585.30153.80.camel@stevo-desktop>
	<adaslib3dv8.fsf@cisco.com> <451C3F02.3000907@ichips.intel.com>
	<Pine.LNX.4.64.0609291216241.9963@jlentini-linux.nane.netapp.com>
	<1159547654.21613.71.camel@stevo-desktop>
Message-ID: <adalko21mp8.fsf@cisco.com>

    Steve> I don't think anybody is suggesting corporate private git
    Steve> trees...

    Steve> But just to state it clearly: We either host git trees on
    Steve> kernel.org or on openfabrics.org.

Why?  I don't see anything wrong with the git trees that are at
www.mellanox.co.il right now.

As long as we agree that Linus's tree is the ultimate destination for
Linux drivers, I don't think the domain name that people use to
publish their work-in-progress trees matters at all.

 - R.


From swise at opengridcomputing.com  Fri Sep 29 12:47:50 2006
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 29 Sep 2006 14:47:50 -0500
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <adalko21mp8.fsf@cisco.com>
References: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>
	<adahcyr6gpw.fsf@cisco.com> <1159476585.30153.80.camel@stevo-desktop>
	<adaslib3dv8.fsf@cisco.com> <451C3F02.3000907@ichips.intel.com>
	<Pine.LNX.4.64.0609291216241.9963@jlentini-linux.nane.netapp.com>
	<1159547654.21613.71.camel@stevo-desktop> <adalko21mp8.fsf@cisco.com>
Message-ID: <1159559270.21613.94.camel@stevo-desktop>

On Fri, 2006-09-29 at 12:45 -0700, Roland Dreier wrote:
>     Steve> I don't think anybody is suggesting corporate private git
>     Steve> trees...
> 
>     Steve> But just to state it clearly: We either host git trees on
>     Steve> kernel.org or on openfabrics.org.
> 
> Why?  I don't see anything wrong with the git trees that are at
> www.mellanox.co.il right now.
> 
> As long as we agree that Linus's tree is the ultimate destination for
> Linux drivers, I don't think the domain name that people use to
> publish their work-in-progress trees matters at all.
> 
>  - R.

Just trying to simplify things and centralize the technology location...


From rdreier at cisco.com  Fri Sep 29 12:48:37 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 29 Sep 2006 12:48:37 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <Pine.LNX.4.64.0609291054580.9963@jlentini-linux.nane.netapp.com>
	(James Lentini's message of "Fri, 29 Sep 2006 10:58:45 -0400 (EDT)")
References: <C141AB6F.A6EC%tom@opengridcomputing.com>
	<Pine.LNX.4.64.0609291054580.9963@jlentini-linux.nane.netapp.com>
Message-ID: <adahcyq1mju.fsf@cisco.com>

    James> I think there are two elephants in the room. What about the
    James> dual license policy that is enforced by the OpenFabrics
    James> Alliance?

    James> Currently the OpenFabrics Alliance members require that all
    James> code committed to the OFA repository will be dual GPL/BSD
    James> licensed. If the source code is no longer hosted on OFA
    James> servers, who is going to guarantee that?

I would call this more of a red herring than an elephant.  Right now
there is nothing that prevents me or anyone else from writing GPL-only
code and getting it merged into Linus's tree.  When I pointed this out
before, your response was that such code would not be part of the
OpenFabrics stack -- and I think that's exactly the answer to the
issue you're raising:

For better or for worse, the OFA marketing has created a peer pressure
situation that all the IB and RDMA vendors feel compelled to play
along with.  And if GPL-only code doesn't get the OFA stamp of
approval, then vendors aren't going to do that.  The domain name of a
source code repository is pretty irrelevant here.

(Not to mention the fact that no one is enforcing the dual license on
things that _are_ checked into openib.org anyway...)

 - R.


From rdreier at cisco.com  Fri Sep 29 12:49:14 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 29 Sep 2006 12:49:14 -0700
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <OF769BC08C.9DA4407E-ON872571F8.005EF945-882571F8.005F24F3@us.ibm.com>
	(Shirley Ma's message of "Fri, 29 Sep 2006 10:19:15 -0700")
References: <OF769BC08C.9DA4407E-ON872571F8.005EF945-882571F8.005F24F3@us.ibm.com>
Message-ID: <adad59e1mit.fsf@cisco.com>

    Shirley> Please wait for other device drivers to finish the
    Shirley> performance test. This NAPI patch somehow kills ehca
    Shirley> performance, extremely bad.

Which NAPI patch?  The last one posted can't work for non-Mellanox
devices, since it doesn't handle the "rotting packet" issue.

 - R.


From rdreier at cisco.com  Fri Sep 29 12:50:42 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 29 Sep 2006 12:50:42 -0700
Subject: [openib-general] [PATCH 1/5] 2.6.19 rdma_cm: fix leak of
 cm_id's in case of failures
In-Reply-To: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com> (Sean
	Hefty's message of "Fri, 29 Sep 2006 11:47:06 -0700")
References: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com>
Message-ID: <ada8xk21mgd.fsf@cisco.com>

Are all the patches here that are 'Signed-off-by: Krishna Kumar <krkumar2 at in.ibm.com>'
also 'From: Krishna Kumar <krkumar2 at in.ibm.com>' ?

I try to keep the right authorship information when merging stuff...


From sean.hefty at intel.com  Fri Sep 29 13:37:06 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 29 Sep 2006 13:37:06 -0700
Subject: [openib-general] [PATCH 1/5] 2.6.19 rdma_cm: fix leak of
 cm_id's in case of failures
In-Reply-To: <ada8xk21mgd.fsf@cisco.com>
Message-ID: <000601c6e407$07b31a40$ff0da8c0@amr.corp.intel.com>

>Are all the patches here that are 'Signed-off-by: Krishna Kumar
><krkumar2 at in.ibm.com>'
>also 'From: Krishna Kumar <krkumar2 at in.ibm.com>' ?
>
>I try to keep the right authorship information when merging stuff...

Yes.  I made additional changes to patch 1 to include iwarp in the fixes.  The
rest are unmodified patches from Krishna.

- Sean


From mshefty at ichips.intel.com  Fri Sep 29 13:44:28 2006
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 29 Sep 2006 13:44:28 -0700
Subject: [openib-general] 2.6.18 kernel support in the main trunk.
In-Reply-To: <1159559270.21613.94.camel@stevo-desktop>
References: <BAE9DCEF64577A439B3A37F36F9B691CBEEC08@orsmsx418.amr.corp.intel.com>
	<adahcyr6gpw.fsf@cisco.com> <1159476585.30153.80.camel@stevo-desktop>
	<adaslib3dv8.fsf@cisco.com> <451C3F02.3000907@ichips.intel.com>
	<Pine.LNX.4.64.0609291216241.9963@jlentini-linux.nane.netapp.com>
	<1159547654.21613.71.camel@stevo-desktop> <adalko21mp8.fsf@cisco.com>
	<1159559270.21613.94.camel@stevo-desktop>
Message-ID: <451D85AC.60603@ichips.intel.com>

Steve Wise wrote:
>>Why?  I don't see anything wrong with the git trees that are at
>>www.mellanox.co.il right now.
> 
> Just trying to simplify things and centralize the technology location...

Well, for myself, I have been unable to access the git trees at mellanox.  For 
me to access git repositories, I need access through Intel's firewall.  Today 
I'm restricted to kernel.org only, unless the git repositories are accessible 
using http.  (Yes, it's a personal problem, but not one easily fixed...)

- Sean


From bos at pathscale.com  Fri Sep 29 14:37:51 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 29 Sep 2006 14:37:51 -0700
Subject: [openib-general] [PATCH] IB/ipath - fix RDMA reads
Message-ID: <7b2b5b33a24891601ac1.1159565871@eng-12.pathscale.com>

The PSN used to generate the request following an RDMA read was incorrect
and some state bookkeeping wasn't maintained correctly.
This patch fixes that.

Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan at qlogic.com>

diff -r ac3953427dbf -r 7b2b5b33a248 drivers/infiniband/hw/ipath/ipath_rc.c
--- a/drivers/infiniband/hw/ipath/ipath_rc.c	Fri Sep 29 14:20:17 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c	Fri Sep 29 14:20:40 2006 -0700
@@ -241,10 +241,7 @@ int ipath_make_rc_req(struct ipath_qp *q
 		 * original work request since we may need to resend
 		 * it.
 		 */
-		qp->s_sge.sge = wqe->sg_list[0];
-		qp->s_sge.sg_list = wqe->sg_list + 1;
-		qp->s_sge.num_sge = wqe->wr.num_sge;
-		qp->s_len = len = wqe->length;
+		len = wqe->length;
 		ss = &qp->s_sge;
 		bth2 = 0;
 		switch (wqe->wr.opcode) {
@@ -368,14 +365,23 @@ int ipath_make_rc_req(struct ipath_qp *q
 		default:
 			goto done;
 		}
+		qp->s_sge.sge = wqe->sg_list[0];
+		qp->s_sge.sg_list = wqe->sg_list + 1;
+		qp->s_sge.num_sge = wqe->wr.num_sge;
+		qp->s_len = wqe->length;
 		if (newreq) {
 			qp->s_tail++;
 			if (qp->s_tail >= qp->s_size)
 				qp->s_tail = 0;
 		}
-		bth2 |= qp->s_psn++ & IPATH_PSN_MASK;
-		if ((int)(qp->s_psn - qp->s_next_psn) > 0)
-			qp->s_next_psn = qp->s_psn;
+		bth2 |= qp->s_psn & IPATH_PSN_MASK;
+		if (wqe->wr.opcode == IB_WR_RDMA_READ)
+			qp->s_psn = wqe->lpsn + 1;
+		else {
+			qp->s_psn++;
+			if ((int)(qp->s_psn - qp->s_next_psn) > 0)
+				qp->s_next_psn = qp->s_psn;
+		}
 		/*
 		 * Put the QP on the pending list so lost ACKs will cause
 		 * a retry.  More than one request can be pending so the
@@ -690,13 +696,6 @@ void ipath_restart_rc(struct ipath_qp *q
 	struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last);
 	struct ipath_ibdev *dev;
 
-	/*
-	 * If there are no requests pending, we are done.
-	 */
-	if (ipath_cmp24(psn, qp->s_next_psn) >= 0 ||
-	    qp->s_last == qp->s_tail)
-		goto done;
-
 	if (qp->s_retry == 0) {
 		wc->wr_id = wqe->wr.wr_id;
 		wc->status = IB_WC_RETRY_EXC_ERR;
@@ -731,8 +730,6 @@ void ipath_restart_rc(struct ipath_qp *q
 		dev->n_rc_resends += (int)qp->s_psn - (int)psn;
 
 	reset_psn(qp, psn);
-
-done:
 	tasklet_hi_schedule(&qp->s_task);
 
 bail:
@@ -765,6 +762,7 @@ static int do_rc_ack(struct ipath_qp *qp
 	struct ib_wc wc;
 	struct ipath_swqe *wqe;
 	int ret = 0;
+	u32 ack_psn;
 
 	/*
 	 * Remove the QP from the timeout queue (or RNR timeout queue).
@@ -777,26 +775,26 @@ static int do_rc_ack(struct ipath_qp *qp
 		list_del_init(&qp->timerwait);
 	spin_unlock(&dev->pending_lock);
 
+	/* Nothing is pending to ACK/NAK. */
+	if (unlikely(qp->s_last == qp->s_tail))
+		goto bail;
+
 	/*
 	 * Note that NAKs implicitly ACK outstanding SEND and RDMA write
 	 * requests and implicitly NAK RDMA read and atomic requests issued
 	 * before the NAK'ed request.  The MSN won't include the NAK'ed
 	 * request but will include an ACK'ed request(s).
 	 */
+	ack_psn = psn;
+	if (aeth >> 29)
+		ack_psn--;
 	wqe = get_swqe_ptr(qp, qp->s_last);
-
-	/* Nothing is pending to ACK/NAK. */
-	if (qp->s_last == qp->s_tail)
-		goto bail;
 
 	/*
 	 * The MSN might be for a later WQE than the PSN indicates so
 	 * only complete WQEs that the PSN finishes.
 	 */
-	while (ipath_cmp24(psn, wqe->lpsn) >= 0) {
-		/* If we are ACKing a WQE, the MSN should be >= the SSN. */
-		if (ipath_cmp24(aeth, wqe->ssn) < 0)
-			break;
+	while (ipath_cmp24(ack_psn, wqe->lpsn) >= 0) {
 		/*
 		 * If this request is a RDMA read or atomic, and the ACK is
 		 * for a later operation, this ACK NAKs the RDMA read or
@@ -807,7 +805,8 @@ static int do_rc_ack(struct ipath_qp *qp
 		 * is sent but before the response is received.
 		 */
 		if ((wqe->wr.opcode == IB_WR_RDMA_READ &&
-		     opcode != OP(RDMA_READ_RESPONSE_LAST)) ||
+		     (opcode != OP(RDMA_READ_RESPONSE_LAST) ||
+		       ipath_cmp24(ack_psn, wqe->lpsn) != 0)) ||
 		    ((wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
 		      wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) &&
 		     (opcode != OP(ATOMIC_ACKNOWLEDGE) ||
@@ -825,6 +824,10 @@ static int do_rc_ack(struct ipath_qp *qp
 			 */
 			goto bail;
 		}
+		if (wqe->wr.opcode == IB_WR_RDMA_READ ||
+		    wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
+		    wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD)
+			tasklet_hi_schedule(&qp->s_task);
 		/* Post a send completion queue entry if requested. */
 		if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &qp->s_flags) ||
 		    (wqe->wr.send_flags & IB_SEND_SIGNALED)) {
@@ -1055,7 +1058,8 @@ static inline void ipath_rc_rcv_resp(str
 		/* no AETH, no ACK */
 		if (unlikely(ipath_cmp24(psn, qp->s_last_psn + 1))) {
 			dev->n_rdma_seq++;
-			ipath_restart_rc(qp, qp->s_last_psn + 1, &wc);
+			if (qp->s_last != qp->s_tail)
+				ipath_restart_rc(qp, qp->s_last_psn + 1, &wc);
 			goto ack_done;
 		}
 	rdma_read:
@@ -1091,7 +1095,8 @@ static inline void ipath_rc_rcv_resp(str
 		/* ACKs READ req. */
 		if (unlikely(ipath_cmp24(psn, qp->s_last_psn + 1))) {
 			dev->n_rdma_seq++;
-			ipath_restart_rc(qp, qp->s_last_psn + 1, &wc);
+			if (qp->s_last != qp->s_tail)
+				ipath_restart_rc(qp, qp->s_last_psn + 1, &wc);
 			goto ack_done;
 		}
 		/* FALLTHROUGH */
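
Two ideas in this patch lend themselves to a small illustration: PSNs are
24-bit sequence numbers that wrap, so they have to be compared with a signed
24-bit difference, and an RDMA read occupies a range of PSNs (one per
expected response packet), so the next request starts after the read's last
PSN instead of at psn + 1. The sketch below is a userspace model under those
assumptions; psn_cmp24() and next_psn() are hypothetical helpers, not the
driver's ipath_cmp24() or its send path.

```c
#include <stdint.h>

#define PSN_MASK 0xFFFFFFu	/* PSNs are 24 bits wide */

/*
 * Signed distance from b to a in 24-bit sequence space:
 * negative if a is before b, zero if equal, positive if a is after b.
 */
static int psn_cmp24(uint32_t a, uint32_t b)
{
	uint32_t d = (a - b) & PSN_MASK;

	if (d & 0x800000u)		/* bit 23 set: difference is negative */
		return (int)d - 0x1000000;
	return (int)d;
}

/*
 * PSN for the request that follows the current one.  For an RDMA read,
 * last_psn is the last PSN reserved for its responses; it is ignored
 * for other request types in this model.
 */
static uint32_t next_psn(uint32_t psn, int is_rdma_read, uint32_t last_psn)
{
	if (is_rdma_read)
		return (last_psn + 1) & PSN_MASK;
	return (psn + 1) & PSN_MASK;
}
```

For example, psn_cmp24(0, 0xFFFFFF) is positive because PSN 0 follows
0xFFFFFF after wraparound, which a plain integer comparison would get wrong.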


From rdreier at cisco.com  Fri Sep 29 14:39:32 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 29 Sep 2006 14:39:32 -0700
Subject: [openib-general] [PATCH] IB/ipath - fix RDMA reads
In-Reply-To: <7b2b5b33a24891601ac1.1159565871@eng-12.pathscale.com> (
	Bryan O'Sullivan's message of "Fri, 29 Sep 2006 14:37:51 -0700")
References: <7b2b5b33a24891601ac1.1159565871@eng-12.pathscale.com>
Message-ID: <ada4puq1hez.fsf@cisco.com>

I assume this is 'From: Ralph Campbell <ralph.campbell at qlogic.com>'
(based on the sign-off)?


From bos at pathscale.com  Fri Sep 29 14:41:36 2006
From: bos at pathscale.com (Bryan O'Sullivan)
Date: Fri, 29 Sep 2006 14:41:36 -0700
Subject: [openib-general] [PATCH] IB/ipath - fix RDMA reads
In-Reply-To: <ada4puq1hez.fsf@cisco.com>
References: <7b2b5b33a24891601ac1.1159565871@eng-12.pathscale.com>
	<ada4puq1hez.fsf@cisco.com>
Message-ID: <1159566096.4841.13.camel@chalcedony.pathscale.com>

On Fri, 2006-09-29 at 14:39 -0700, Roland Dreier wrote:
> I assume this is 'From: Ralph Campbell <ralph.campbell at qlogic.com>'
> (based on the sign-off)?

Yes, my patchbomb script ate the "From:".

Thanks,

	<b


From rdreier at cisco.com  Fri Sep 29 14:44:59 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 29 Sep 2006 14:44:59 -0700
Subject: [openib-general] [PATCH 1/5] 2.6.19 rdma_cm: fix leak of
 cm_id's in case of failures
In-Reply-To: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com> (Sean
	Hefty's message of "Fri, 29 Sep 2006 11:47:06 -0700")
References: <000101c6e3f7$a998d2b0$ff0da8c0@amr.corp.intel.com>
Message-ID: <adazmciz6sk.fsf@cisco.com>

Thanks, applied 1-5 to for-2.6.19.

The merge worked very smoothly -- my only request would be to track
authorship of patches you forward on (by including a "From:" line with
the original author).


From rdreier at cisco.com  Fri Sep 29 14:46:42 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 29 Sep 2006 14:46:42 -0700
Subject: [openib-general] [PATCH] IB/ipath - fix RDMA reads
In-Reply-To: <7b2b5b33a24891601ac1.1159565871@eng-12.pathscale.com> (
	Bryan O'Sullivan's message of "Fri, 29 Sep 2006 14:37:51 -0700")
References: <7b2b5b33a24891601ac1.1159565871@eng-12.pathscale.com>
Message-ID: <adaven6z6pp.fsf@cisco.com>

Thanks, applied (I assumed Ralph was the author when merging, please
let me know if that was wrong)


From ralphc at pathscale.com  Fri Sep 29 15:14:48 2006
From: ralphc at pathscale.com (Ralph Campbell)
Date: Fri, 29 Sep 2006 15:14:48 -0700
Subject: [openib-general] [PATCH] IB/ipath - fix RDMA reads
In-Reply-To: <adaven6z6pp.fsf@cisco.com>
References: <7b2b5b33a24891601ac1.1159565871@eng-12.pathscale.com>
	<adaven6z6pp.fsf@cisco.com>
Message-ID: <1159568088.29948.14.camel@brick.pathscale.com>

Yes, I am the author.

On Fri, 2006-09-29 at 14:46 -0700, Roland Dreier wrote:
> Thanks, applied (I assumed Ralph was the author when merging, please
> let me know if that was wrong)


From sean.hefty at intel.com  Fri Sep 29 16:52:26 2006
From: sean.hefty at intel.com (Sean Hefty)
Date: Fri, 29 Sep 2006 16:52:26 -0700
Subject: [openib-general] [PATCH] ib_cm: fix module unload race with timewait
In-Reply-To: <451A2E7E.8050504@voltaire.com>
Message-ID: <000701c6e422$51589a60$ff0da8c0@amr.corp.intel.com>

If the ib_cm module is unloaded while id's are still in timewait,
the CM will destroy the work queue used to process timewait.  Once
the id's exit timewait, their timers will fire, leading to a crash
trying to access the destroyed work queue.

We need to track id's that are in timewait, and cancel their deferred
work on module unload.

Signed-off-by: Sean Hefty <sean.hefty at intel.com>
---
Erez, can you see if this fixes the crash problem that you're seeing?

Index: cm.c
===================================================================
--- cm.c	(revision 9680)
+++ cm.c	(working copy)
@@ -75,6 +75,7 @@
 	struct rb_root remote_sidr_table;
 	struct idr local_id_table;
 	__be32 random_id_operand;
+	struct list_head timewait_list;
 	struct workqueue_struct *wq;
 } cm;
 
@@ -112,6 +113,7 @@
 
 struct cm_timewait_info {
 	struct cm_work work;			/* Must be first. */
+	struct list_head list;
 	struct rb_node remote_qp_node;
 	struct rb_node remote_id_node;
 	__be64 remote_ca_guid;
@@ -648,13 +650,6 @@
 
 static void cm_cleanup_timewait(struct cm_timewait_info *timewait_info)
 {
-	unsigned long flags;
-
-	if (!timewait_info->inserted_remote_id &&
-	    !timewait_info->inserted_remote_qp)
-	    return;
-
-	spin_lock_irqsave(&cm.lock, flags);
 	if (timewait_info->inserted_remote_id) {
 		rb_erase(&timewait_info->remote_id_node, &cm.remote_id_table);
 		timewait_info->inserted_remote_id = 0;
@@ -664,7 +659,6 @@
 		rb_erase(&timewait_info->remote_qp_node, &cm.remote_qp_table);
 		timewait_info->inserted_remote_qp = 0;
 	}
-	spin_unlock_irqrestore(&cm.lock, flags);
 }
 
 static struct cm_timewait_info * cm_create_timewait_info(__be32 local_id)
@@ -685,8 +679,12 @@
 static void cm_enter_timewait(struct cm_id_private *cm_id_priv)
 {
 	int wait_time;
+	unsigned long flags;
 
+	spin_lock_irqsave(&cm.lock, flags);
 	cm_cleanup_timewait(cm_id_priv->timewait_info);
+	list_add_tail(&cm_id_priv->timewait_info->list, &cm.timewait_list);
+	spin_unlock_irqrestore(&cm.lock, flags);
 
 	/*
 	 * The cm_id could be destroyed by the user before we exit timewait.
@@ -702,9 +700,13 @@
 
 static void cm_reset_to_idle(struct cm_id_private *cm_id_priv)
 {
+	unsigned long flags;
+
 	cm_id_priv->id.state = IB_CM_IDLE;
 	if (cm_id_priv->timewait_info) {
+		spin_lock_irqsave(&cm.lock, flags);
 		cm_cleanup_timewait(cm_id_priv->timewait_info);
+		spin_unlock_irqrestore(&cm.lock, flags);
 		kfree(cm_id_priv->timewait_info);
 		cm_id_priv->timewait_info = NULL;
 	}
@@ -1308,6 +1310,7 @@
 	if (timewait_info) {
 		cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
 					   timewait_info->work.remote_id);
+		cm_cleanup_timewait(cm_id_priv->timewait_info);
 		spin_unlock_irqrestore(&cm.lock, flags);
 		if (cur_cm_id_priv) {
 			cm_dup_req_handler(work, cur_cm_id_priv);
@@ -1316,7 +1319,8 @@
 			cm_issue_rej(work->port, work->mad_recv_wc,
 				     IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ,
 				     NULL, 0);
-		goto error;
+		listen_cm_id_priv = NULL;
+		goto out;
 	}
 
 	/* Find matching listen request. */
@@ -1324,21 +1328,20 @@
 					   req_msg->service_id,
 					   req_msg->private_data);
 	if (!listen_cm_id_priv) {
+		cm_cleanup_timewait(cm_id_priv->timewait_info);
 		spin_unlock_irqrestore(&cm.lock, flags);
 		cm_issue_rej(work->port, work->mad_recv_wc,
 			     IB_CM_REJ_INVALID_SERVICE_ID, CM_MSG_RESPONSE_REQ,
 			     NULL, 0);
-		goto error;
+		goto out;
 	}
 	atomic_inc(&listen_cm_id_priv->refcount);
 	atomic_inc(&cm_id_priv->refcount);
 	cm_id_priv->id.state = IB_CM_REQ_RCVD;
 	atomic_inc(&cm_id_priv->work_count);
 	spin_unlock_irqrestore(&cm.lock, flags);
+out:
 	return listen_cm_id_priv;
-
-error:	cm_cleanup_timewait(cm_id_priv->timewait_info);
-	return NULL;
 }
 
 static int cm_req_handler(struct cm_work *work)
@@ -2634,7 +2637,9 @@
 	int ret;
 
 	timewait_info = (struct cm_timewait_info *)work;
-	cm_cleanup_timewait(timewait_info);
+	spin_lock_irqsave(&cm.lock, flags);
+	list_del(&timewait_info->list);
+	spin_unlock_irqrestore(&cm.lock, flags);
 
 	cm_id_priv = cm_acquire_id(timewait_info->work.local_id,
 				   timewait_info->work.remote_id);
@@ -3434,6 +3439,7 @@
 	idr_init(&cm.local_id_table);
 	get_random_bytes(&cm.random_id_operand, sizeof cm.random_id_operand);
 	idr_pre_get(&cm.local_id_table, GFP_KERNEL);
+	INIT_LIST_HEAD(&cm.timewait_list);
 
 	cm.wq = create_workqueue("ib_cm");
 	if (!cm.wq)
@@ -3451,7 +3457,23 @@
 
 static void __exit ib_cm_cleanup(void)
 {
+	struct cm_timewait_info *timewait_info;
+	unsigned long flags;
+
+	spin_lock_irqsave(&cm.lock, flags);
+	list_for_each_entry(timewait_info, &cm.timewait_list, list)
+		cancel_delayed_work(&timewait_info->work.work);
+	spin_unlock_irqrestore(&cm.lock, flags);
+
 	destroy_workqueue(cm.wq);
+
+	while (!list_empty(&cm.timewait_list)) {
+		timewait_info = container_of(cm.timewait_list.next,
+					     struct cm_timewait_info, list);
+		list_del(&timewait_info->list);
+		kfree(timewait_info);
+	}
+
 	ib_unregister_client(&cm_client);
 	idr_destroy(&cm.local_id_table);
 }


From rdreier at cisco.com  Fri Sep 29 17:10:05 2006
From: rdreier at cisco.com (Roland Dreier)
Date: Fri, 29 Sep 2006 17:10:05 -0700
Subject: [openib-general] [PATCH] ib_cm: fix module unload race with
 timewait
In-Reply-To: <000701c6e422$51589a60$ff0da8c0@amr.corp.intel.com> (Sean
	Hefty's message of "Fri, 29 Sep 2006 16:52:26 -0700")
References: <000701c6e422$51589a60$ff0da8c0@amr.corp.intel.com>
Message-ID: <adamz8iz02q.fsf@cisco.com>

Some of these spin_lock_irqsave()s are too conservative.  For example:

 >  static void __exit ib_cm_cleanup(void)
 >  {
 > +	struct cm_timewait_info *timewait_info;
 > +	unsigned long flags;
 > +
 > +	spin_lock_irqsave(&cm.lock, flags);
 > +	list_for_each_entry(timewait_info, &cm.timewait_list, list)
 > +		cancel_delayed_work(&timewait_info->work.work);
 > +	spin_unlock_irqrestore(&cm.lock, flags);
 > +
 >  	destroy_workqueue(cm.wq);

destroy_workqueue() can only be called in process context -- so it
is fine to just use spin_lock_irq() above.

 > +
 > +	while (!list_empty(&cm.timewait_list)) {
 > +		timewait_info = container_of(cm.timewait_list.next,
 > +					     struct cm_timewait_info, list);
 > +		list_del(&timewait_info->list);
 > +		kfree(timewait_info);
 > +	}

list_for_each_entry_safe() here?  I assume nothing is getting added to
the list while the loop runs...

 > +
 >  	ib_unregister_client(&cm_client);
 >  	idr_destroy(&cm.local_id_table);
 >  }
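
To illustrate the list_for_each_entry_safe() suggestion above, here is a
minimal userspace sketch of the "safe" traversal idea: the successor is
saved before the current node is unlinked and freed, so the walk never
touches freed memory. Everything below (struct list_node, list_del_node(),
struct timewait_entry, free_all_entries(), add_entries()) is a hypothetical
stand-in written for this example, not the kernel's <linux/list.h>, and it
assumes, as the comment above does, that nothing is added to the list while
the loop runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-in for a circular doubly linked list head/node. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail_node(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del_node(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = NULL;
}

/* Hypothetical payload; only the embedded list member matters here. */
struct timewait_entry {
	int id;
	struct list_node list;
};

/* Recover the containing entry from a pointer to its list member. */
#define entry_of(ptr) \
	((struct timewait_entry *)((char *)(ptr) - offsetof(struct timewait_entry, list)))

/* Unlink and free every entry; returns the number of entries freed. */
static int free_all_entries(struct list_node *head)
{
	struct list_node *pos, *tmp;
	int freed = 0;

	/* tmp holds the successor, read before pos is unlinked and freed. */
	for (pos = head->next, tmp = pos->next; pos != head;
	     pos = tmp, tmp = pos->next) {
		list_del_node(pos);
		free(entry_of(pos));
		freed++;
	}
	return freed;
}

/* Helper for experiments: append n freshly allocated entries. */
static void add_entries(struct list_node *head, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		struct timewait_entry *e = malloc(sizeof(*e));

		assert(e != NULL);
		e->id = i;
		list_add_tail_node(&e->list, head);
	}
}
```

The while (!list_empty()) loop in the patch is also safe, since it re-reads
the head on every pass; the saved-successor form mainly makes the intent
visible in the loop header.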


From ebiederm at xmission.com  Fri Sep 29 17:36:25 2006
From: ebiederm at xmission.com (ebiederm at xmission.com)
Date: Fri, 29 Sep 2006 18:36:25 -0600
Subject: [openib-general] [PATCH 0 of 28] ipath patches for 2.6.19
In-Reply-To: <patchbomb.1159459196@eng-12.pathscale.com> (Bryan
	O'Sullivan's message of "Thu, 28 Sep 2006 08:59:56 -0700")
References: <patchbomb.1159459196@eng-12.pathscale.com>
Message-ID: <m1irj6gph2.fsf@ebiederm.dsl.xmission.com>

"Bryan O'Sullivan" <bos at pathscale.com> writes:

> Hi, Roland -
>
> This patch series brings the ipath driver almost up to date with what's
> in our internal tree.  The only substantial thing missing is the
> memcpy_cachebypass patch that I sent out a while back and haven't had
> time to rework.
>
> These patches have seen a lot of testing, including on a git snapshot
> as of yesterday afternoon.  Please apply.

Have you tested your driver against the -mm tree?

To the best of my knowledge the irq handling of your hypertransport card
is a complete and total hack that works only by chance.

In the -mm tree I have added a first pass at proper support for the
hypertransport interrupt capability.  As this code is slated to go into
2.6.19 could you please test against that?

I would have tested it myself except when I mentioned this earlier I was told
that your card does not actually implement the hypertransport interrupt
capability properly.  

The practical reason for PathScale to work on this is that the genirq work
in 2.6.19 changes the internal implementation detail your hypertransport
card has been relying on, so the card will not work without fixes.

Thanks,
Eric


From bugzilla-daemon at openib.org  Sat Sep 30 21:14:00 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Sat, 30 Sep 2006 21:14:00 -0700 (PDT)
Subject: [openib-general] [Bug 256] New: Missing include in ib_verbs.h
Message-ID: <20061001041400.0A8962283D4@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=256

           Summary: Missing include in ib_verbs.h
           Product: OpenFabrics Linux
           Version: 1.1rc6
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Verbs
        AssignedTo: bugzilla at openib.org
        ReportedBy: rk at scali.com
                CC: sp at scali.com


ib_verbs.h uses struct kref, but fails to include kref.h

One practical effect of this is that Lustre fails to compile the o2ib module.
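
A minimal sketch of why the include matters, assuming the header embeds a
struct kref by value (a pointer member would only need a forward
declaration). The demo_* names are hypothetical stand-ins for this example;
they are not the real kref or ib_verbs.h definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for what a header like kref.h provides: the complete type. */
struct demo_kref {
	int refcount;
};

/*
 * Stand-in for a structure declared in a header such as ib_verbs.h.
 * A by-value member needs the complete type above.  If the definition of
 * struct demo_kref were not visible at this point, the compiler would
 * reject the field as having an incomplete type, which is what a consumer
 * sees when the header does not pull in its own dependency.
 */
struct demo_uobject {
	int id;
	struct demo_kref ref;
};

/* A pointer member, by contrast, compiles with a forward declaration only. */
struct demo_opaque;
struct demo_holder {
	struct demo_opaque *p;
};

static void demo_uobject_init(struct demo_uobject *obj, int id)
{
	assert(obj != NULL);
	obj->id = id;
	obj->ref.refcount = 1;
}
```

A header that includes everything it uses compiles no matter which headers
the consumer happened to include first.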


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at openib.org  Sat Sep 30 21:14:44 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Sat, 30 Sep 2006 21:14:44 -0700 (PDT)
Subject: [openib-general] [Bug 256] Missing include in ib_verbs.h
Message-ID: <20061001041444.DAF8B2283D4@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=256


------- Comment #1 from rk at scali.com  2006-09-30 21:14 -------
Created an attachment (id=47)
 --> (http://openib.org/bugzilla/attachment.cgi?id=47&action=view)
Add missing include


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at openib.org  Sat Sep 30 23:00:22 2006
From: bugzilla-daemon at openib.org (bugzilla-daemon at openib.org)
Date: Sat, 30 Sep 2006 23:00:22 -0700 (PDT)
Subject: [openib-general] [Bug 256] Missing include in ib_verbs.h
Message-ID: <20061001060022.89BF82283D8@openib.ca.sandia.gov>

http://openib.org/bugzilla/show_bug.cgi?id=256


vlad at mellanox.co.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bugzilla at openib.org         |jackm at mellanox.co.il


------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From mst at mellanox.co.il  Sat Sep 30 23:52:07 2006
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 1 Oct 2006 08:52:07 +0200
Subject: [openib-general] [PATCH] IB/ipoib: NAPI
In-Reply-To: <adaodt055to.fsf@cisco.com>
References: <adaodt055to.fsf@cisco.com>
Message-ID: <20061001065206.GA888@mellanox.co.il>

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Anyway I don't think this is that urgent -- we've dumped enough stuff
> into 2.6.19, so I think this should wait for 2.6.20 at the earliest anyway.

Hmm, iwarp went in, ehca went in, OK. Pathscale are dumping out their internal
tree at a high rate.  But if you look at IPoIB over mthca for example, there
were almost no changes.  Isn't that true?  And isn't the NAPI patch quite small?

Maybe, if you are worried about stability, we can make NAPI optional and off by
default in 2.6.19? There's precedent for this with e1000.  This would also give
low level driver maintainers the chance to experiment and select the best APIs,
instead of just guessing which one is best. In 2.6.20 we'll be able to remove
the non-NAPI path if it works out well.

Want to see a patch like this?

-- 
MST