From ogerlitz at voltaire.com Tue Apr 1 00:00:46 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 01 Apr 2008 10:00:46 +0300 Subject: [ofa-general] Re: IB/core: Add creation flags to struct ib_qp_init_attr In-Reply-To: References: <1205767427.25950.137.camel@mtls03> Message-ID: <47F1DD9E.7040304@voltaire.com> Roland Dreier wrote: > Subject: [PATCH] IB/core: Add creation flags to struct ib_qp_init_attr > > Add a create_flags member to struct ib_qp_init_attr that will allow a > kernel verbs consumer to pass special flags when creating a QP. > Add a flag value for telling low-level drivers that a QP will be used > for IPoIB UD LSO. The create_flags member will also be useful for XRC > and ehca low-latency QP support. Roland, can you please comment on the approach you prefer for the --user space-- implementation of features such as the ehca low-latency and the mlx4 block-loopback QP "types"? Do you want to go the XRC way of not breaking the ABI by introducing a new create-qp verb per feature, as Jack said they did: > I got around the create_flags problem by adding a new verb to userspace > (ibv_create_xrc_rcv_qp() ) with its own ABI to kernel space. Since the kernel-space > function (added to uverbs_cmd: ib_uverbs_create_xrc_rcv_qp() ) "knew" that it > was creating an XRC_RCV qp, it set the flag in ib_qp_init_attr appropriately. If this is what you prefer to see, does it make sense to have one new verb that can be used for XRC, ehca-ll, mlx4 block loopback, and whatever new features we want to add for user-space QPs in the future? Or.
From eli at dev.mellanox.co.il Tue Apr 1 00:53:12 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 01 Apr 2008 10:53:12 +0300 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: <15ddcffd0803312353l57b0cfaft273c9f809387fb68@mail.gmail.com> References: <1206452112.25950.360.camel@mtls03> <15ddcffd0803312353l57b0cfaft273c9f809387fb68@mail.gmail.com> Message-ID: <1207036392.22081.12.camel@mtls03> On Tue, 2008-04-01 at 09:53 +0300, Or Gerlitz wrote: > On Tue, Mar 25, 2008 at 4:35 PM, Eli Cohen > wrote: > Add LSO support to the mlx4 driver so that it will be able > to send SKBs passed down from a driver that advertises NETIF_F_TSO. > > Signed-off-by: Eli Cohen > --- > Changes since last post: > 1. Verify that header length does not exceed 60 bytes. > 2. Remove unnecessary printk calls > > > OK, so this patch would complete the LSO merging! > > Eli - what is the status here: is some editing needed to comply > with the way the rest of the patches were merged (qp creation flags, > etc), or is it applicable to review/merge as posted? > I think it is, though I didn't yet have a chance to check it on top of Roland's latest commits.
From HNGUYEN at de.ibm.com Tue Apr 1 01:16:35 2008 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Tue, 1 Apr 2008 10:16:35 +0200 Subject: [ofa-general] Re: [PATCH 2/10] IB/core: Add creation flags to QPs In-Reply-To: Message-ID: Hi Roland! > Thanks, I applied this with some extra code in all the low-level > drivers to make sure that the create_flags are passed in as 0. Does > that make sense to everyone? The changes below make sense to me, as I would have to check the flags anyway when introducing the LL QP flag for ehca later. BTW: If you have a few minutes, please let us agree on the encoding scheme for qp_types and create_flags as discussed in this thread. Thanks! Nam > diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c > b/drivers/infiniband/hw/ehca/ehca_qp.c > index a9fd419..3eb14a5 100644 > --- a/drivers/infiniband/hw/ehca/ehca_qp.c > +++ b/drivers/infiniband/hw/ehca/ehca_qp.c > @@ -421,6 +421,9 @@ static struct ehca_qp *internal_create_qp( > u32 swqe_size = 0, rwqe_size = 0, ib_qp_num; > unsigned long flags; > > + if (init_attr->create_flags) > + return ERR_PTR(-EINVAL); > + > memset(&parms, 0, sizeof(parms)); > qp_type = init_attr->qp_type; > From ogerlitz at voltaire.com Tue Apr 1 01:18:08 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 01 Apr 2008 11:18:08 +0300 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: <1207036392.22081.12.camel@mtls03> References: <1206452112.25950.360.camel@mtls03> <15ddcffd0803312353l57b0cfaft273c9f809387fb68@mail.gmail.com> <1207036392.22081.12.camel@mtls03> Message-ID: <47F1EFC0.70504@voltaire.com> Eli Cohen wrote: > On Tue, 2008-04-01 at 09:53 +0300, Or Gerlitz wrote: > >> Eli - what is the status here: is some editing needed to comply >> with the way the rest of the patches were merged (qp creation flags, >> etc), or is it applicable to review/merge as posted? >> > > I think it is, though I didn't yet have a chance to check it on top > of Roland's latest commits. > I see. 2.6.25 is at RC7 and we still have the interrupt moderation patches pending completion of review and merging, so... Or.
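The create_flags validation pattern quoted above for ehca — reject any flag the driver does not understand so that new flags can be added without silently misbehaving on old drivers — can be sketched in plain user-space C. The struct and flag names below are simplified stand-ins for the kernel's ib_qp_init_attr and IB_QP_CREATE_* values, not the real API:

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-in for the kernel's struct ib_qp_init_attr. */
struct qp_init_attr {
	int qp_type;
	unsigned int create_flags;	/* the new member under discussion */
};

/* Hypothetical flag name: the one creation flag the "mlx4-like" driver knows. */
#define QP_CREATE_IPOIB_UD_LSO (1u << 0)

/* A driver that supports no creation flags rejects any non-zero value,
 * mirroring the ehca hunk quoted above. Returns 0 or a negative errno. */
static int ehca_like_create_qp(const struct qp_init_attr *attr)
{
	if (attr->create_flags)
		return -EINVAL;
	return 0;
}

/* A driver that supports LSO accepts that one flag and rejects all others,
 * so future flags it does not understand still fail loudly. */
static int mlx4_like_create_qp(const struct qp_init_attr *attr)
{
	if (attr->create_flags & ~QP_CREATE_IPOIB_UD_LSO)
		return -EINVAL;
	return 0;
}
```

The mask-and-reject form in the second function is what lets the same create_flags field later carry XRC, low-latency, or block-loopback bits: each driver advertises only what it handles.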
From dotanb at dev.mellanox.co.il Tue Apr 1 05:45:10 2008 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 1 Apr 2008 15:45:10 +0300 Subject: [ofa-general] [PATCH] librdmacm: fix typos in examples + start add port support Message-ID: <200804011545.10650.dotanb@dev.mellanox.co.il> Fixed a typo in a test name, plus spelling typos. Started adding support for controlling the port number from the command line.
Signed-off-by: Dotan Barak --- diff --git a/examples/cmatose.c b/examples/cmatose.c index 2f6e5f6..ba6299e 100644 --- a/examples/cmatose.c +++ b/examples/cmatose.c @@ -80,6 +80,7 @@ static struct cmatest test; static int connections = 1; static int message_size = 100; static int message_count = 10; +static uint16_t port = 7471; static uint8_t set_tos = 0; static uint8_t tos; static uint8_t migrate = 0; @@ -536,7 +537,7 @@ static int run_server(void) } else test.src_in.sin_family = PF_INET; - test.src_in.sin_port = 7471; + test.src_in.sin_port = port; ret = rdma_bind_addr(listen_id, test.src_addr); if (ret) { printf("cmatose: bind address failed: %d\n", ret); @@ -613,7 +614,7 @@ static int run_client(void) if (ret) return ret; - test.dst_in.sin_port = 7471; + test.dst_in.sin_port = port; printf("cmatose: connecting\n"); for (i = 0; i < connections; i++) { @@ -666,7 +667,7 @@ int main(int argc, char **argv) { int op, ret; - while ((op = getopt(argc, argv, "s:b:c:C:S:t:m")) != -1) { + while ((op = getopt(argc, argv, "s:b:c:C:S:t:p:m")) != -1) { switch (op) { case 's': dst_addr = optarg; @@ -687,6 +688,9 @@ int main(int argc, char **argv) set_tos = 1; tos = (uint8_t) atoi(optarg); break; + case 'p': + port = atoi(optarg); + break; case 'm': migrate = 1; break; @@ -698,6 +702,7 @@ int main(int argc, char **argv) printf("\t[-C message_count]\n"); printf("\t[-S message_size]\n"); printf("\t[-t type_of_service]\n"); + printf("\t[-p port_number]\n"); printf("\t[-m(igrate)]\n"); exit(1); } diff --git a/examples/rping.c b/examples/rping.c index 983ce1c..8bfa053 100644 --- a/examples/rping.c +++ b/examples/rping.c @@ -123,7 +123,7 @@ struct rping_cb { struct rping_rdma_info recv_buf;/* malloc'd buffer */ struct ibv_mr *recv_mr; /* MR associated with this buffer */ - struct ibv_send_wr sq_wr; /* send work requrest record */ + struct ibv_send_wr sq_wr; /* send work request record */ struct ibv_sge send_sgl; struct rping_rdma_info send_buf;/* single send buf */ struct ibv_mr 
*send_mr; @@ -600,7 +600,7 @@ static void *cq_thread(void *arg) pthread_exit(NULL); } if (ev_cq != cb->cq) { - fprintf(stderr, "Unkown CQ!\n"); + fprintf(stderr, "Unknown CQ!\n"); pthread_exit(NULL); } ret = ibv_req_notify_cq(cb->cq, 0); diff --git a/examples/udaddy.c b/examples/udaddy.c index 60d9e16..0d69b05 100644 --- a/examples/udaddy.c +++ b/examples/udaddy.c @@ -74,6 +74,7 @@ static struct cmatest test; static int connections = 1; static int message_size = 100; static int message_count = 10; +static uint16_t port = 7174; static uint8_t set_tos = 0; static uint8_t tos; static char *dst_addr; @@ -244,7 +245,7 @@ static int addr_handler(struct cmatest_node *node) ret = rdma_set_option(node->cma_id, RDMA_OPTION_ID, RDMA_OPTION_ID_TOS, &tos, sizeof tos); if (ret) - printf("cmatose: set TOS option failed: %d\n", ret); + printf("udaddy: set TOS option failed: %d\n", ret); } ret = rdma_resolve_route(node->cma_id, 2000); @@ -542,7 +543,7 @@ static int run_server(void) } else test.src_in.sin_family = PF_INET; - test.src_in.sin_port = 7174; + test.src_in.sin_port = port; ret = rdma_bind_addr(listen_id, test.src_addr); if (ret) { printf("udaddy: bind address failed: %d\n", ret); @@ -595,7 +596,7 @@ static int run_client(void) if (ret) return ret; - test.dst_in.sin_port = 7174; + test.dst_in.sin_port = port; printf("udaddy: connecting\n"); for (i = 0; i < connections; i++) { From dotanb at dev.mellanox.co.il Tue Apr 1 06:02:04 2008 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Tue, 01 Apr 2008 16:02:04 +0300 Subject: [ofa-general] Re: the port numbers in some of the rdmacm examples is a fixed value In-Reply-To: <000101c8934b$265a46e0$37fc070a@amr.corp.intel.com> References: <47EBBC81.4030501@dev.mellanox.co.il> <000101c89022$ce0b3d30$9c98070a@amr.corp.intel.com> <47EF2A80.1020804@dev.mellanox.co.il> <000101c8934b$265a46e0$37fc070a@amr.corp.intel.com> Message-ID: <47F2324C.9060002@dev.mellanox.co.il> Sean Hefty wrote: >> I started to work on this patch and for 
>> ucmatose everything is fine. >> The problem is with udaddy: the parameter "-p" is only being used for >> the port space ... >> (I really would like to have the same parameter for controlling the port >> number for ALL of the examples >> of the librdmacm, but I must admit that without doing some changes it >> won't happen) ... >> > > I'd prefer to use the same parameter as well. If no one objects, I'm okay with > modifying the udaddy -p parameter. > > >> what do you think? >> (do you want me to send you the changes I made so far?) >> > > Sending me the changes that you have would be fine. I can finish them up when I > get some time. > OK, I sent you one patch which contains: 1) typo fixes (in the test name of an error message) plus spelling typos, and 2) the start of support for controlling the port numbers from the command line (if you wish, I can supply two separate patches). Only a minute of work is required to close this issue and fix udaddy's port number support. thanks Dotan
From swise at opengridcomputing.com Tue Apr 1 06:38:06 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 01 Apr 2008 08:38:06 -0500 Subject: [ofa-general] Re: summary on OFED 1.4 plans - re adding new kernel features/verbs through ofed In-Reply-To: <47F1D4E6.2010108@voltaire.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F108D2.1030606@opengridcomputing.com> <47F1D4E6.2010108@voltaire.com> Message-ID: <47F23ABE.8060102@opengridcomputing.com> Or Gerlitz wrote: > Steve Wise wrote: >> Tziporet Koren wrote: >>> >>> * OFED 1.4: * >>> 1. Kernel base: since we target the 1.4 release for September, we target a >>> kernel base of 2.6.27. >>> This is a good target, but we may need to stay with 2.6.26 if the >>> kernel's progress is not aligned. >>> 2. Suggestions for new features: >>> >>> * Verbs: Reliable Multicast (to be presented at Sonoma) >>> * IPoIB - continue with performance enhancements >>> >> Sorry I missed these meetings. For iWARP, here is my plan: >> New iWARP Verbs: >> - stag_alloc/dealloc >> - nsmr_fastreg >> - read-with-inv-local-stag >> - inv-local-stag >> Note the above verbs might be transport-independent. I believe the >> IBTA has defined a fastreg verb too? >> - peer-2-peer support in IWCM/Drivers > Steve, Tziporet, > > So you are talking about adding new verbs/features to the Linux RDMA > stack. Are you intending to do this through the mainline kernel cycles, > e.g. the general list, the maintainer (Roland), etc.? And if not, why? > Of course. All the work I've done and will do for the Linux RDMA core and Chelsio drivers is pushed upstream first, then submitted to OFED. Steve.
From sashak at voltaire.com Tue Apr 1 10:45:08 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 1 Apr 2008 17:45:08 +0000 Subject: [ofa-general] [PATCH] opensm/configure.in: improve readability of configured config files In-Reply-To: <47F023B6.2070302@mellanox.co.il> References: <20080330232119.GM13708@sashak.voltaire.com> <47F023B6.2070302@mellanox.co.il> Message-ID: <20080401174508.GB27321@sashak.voltaire.com> When the ./configure script is executed, it will show the values used for these config files, like this: checking for --with-opensm-conf-sub-dir... /etc/opensm checking for --with-node-name-map ... ib-node-name-map checking for --with-partitions-conf... partitions.conf checking for --with-qos-policy-conf... qos-policy.conf (note that for --with-opensm-conf-sub-dir the full path is shown) rather than just whether a value was redefined from its default (checking for --with-partitions-conf... no, etc.). Signed-off-by: Sasha Khapyorsky --- opensm/configure.in | 35 +++++++++++++++-------------------- 1 files changed, 15 insertions(+), 20 deletions(-) diff --git a/opensm/configure.in b/opensm/configure.in index a5b7c5a..0da402a 100644 --- a/opensm/configure.in +++ b/opensm/configure.in @@ -82,6 +82,10 @@ OPENIB_OSM_CONSOLE_SOCKET_SEL dnl select performance manager or not OPENIB_OSM_PERF_MGR_SEL +dnl resolve config dir.
+conf_dir_tmp1="`eval echo ${sysconfdir} | sed 's/^NONE/$ac_default_prefix/'`" +SYS_CONFIG_DIR="`eval echo $conf_dir_tmp1`" + dnl Check for a different subdir for the config files. OPENSM_CONF_SUB_DIR=opensm AC_MSG_CHECKING(for --with-opensm-conf-sub-dir) @@ -92,23 +96,17 @@ AC_ARG_WITH(opensm-conf-sub-dir, no) ;; *) - withopensmconfsubdir=yes OPENSM_CONF_SUB_DIR=$withval ;; esac ] ) -AC_MSG_RESULT(${withopensmconfsubdir=no}) -AC_SUBST(OPENSM_CONF_SUB_DIR) - -dnl Set up /opensm config dir. -CONF_DIR_TMP1="`eval echo ${sysconfdir}/$OPENSM_CONF_SUB_DIR`" -CONF_DIR_TMP2="`echo $CONF_DIR_TMP1 | sed 's/^NONE/$ac_default_prefix/'`" -CONF_DIR="`eval echo $CONF_DIR_TMP2`" - +OPENSM_CONFIG_DIR=$SYS_CONFIG_DIR/$OPENSM_CONF_SUB_DIR +AC_MSG_RESULT($OPENSM_CONFIG_DIR) AC_DEFINE_UNQUOTED(OPENSM_CONFIG_DIR, - ["$CONF_DIR"], + ["$OPENSM_CONFIG_DIR"], [Define OpenSM config directory]) -AC_SUBST(CONF_DIR) +AC_SUBST(OPENSM_CONF_SUB_DIR) +AC_SUBST(CONF_DIR,$OPENSM_CONFIG_DIR) dnl Check for a different default node name map file NODENAMEMAPFILE=ib-node-name-map @@ -120,14 +118,13 @@ AC_ARG_WITH(node-name-map, no) ;; *) - withnodenamemap=yes NODENAMEMAPFILE=$withval ;; esac ] ) -AC_MSG_RESULT(${withnodenamemap=no}) +AC_MSG_RESULT($NODENAMEMAPFILE) AC_DEFINE_UNQUOTED(HAVE_DEFAULT_NODENAME_MAP, - ["$CONF_DIR/$NODENAMEMAPFILE"], + ["$OPENSM_CONFIG_DIR/$NODENAMEMAPFILE"], [Define a default node name map file]) AC_SUBST(NODENAMEMAPFILE) @@ -141,14 +138,13 @@ AC_ARG_WITH(partitions-conf, no) ;; *) - withpartitionsconf=yes PARTITION_CONFIG_FILE=$withval ;; esac ] ) -AC_MSG_RESULT(${withpartitionsconf=no}) +AC_MSG_RESULT($PARTITION_CONFIG_FILE) AC_DEFINE_UNQUOTED(HAVE_DEFAULT_PARTITION_CONFIG_FILE, - ["$CONF_DIR/$PARTITION_CONFIG_FILE"], + ["$OPENSM_CONFIG_DIR/$PARTITION_CONFIG_FILE"], [Define a Partition config file]) AC_SUBST(PARTITION_CONFIG_FILE) @@ -162,14 +158,13 @@ AC_ARG_WITH(qos-policy-conf, no) ;; *) - withqospolicyconf=yes QOS_POLICY_FILE=$withval ;; esac ] ) 
-AC_MSG_RESULT(${withqospolicyconf=no}) +AC_MSG_RESULT($QOS_POLICY_FILE) AC_DEFINE_UNQUOTED(HAVE_DEFAULT_QOS_POLICY_FILE, - ["$CONF_DIR/$QOS_POLICY_FILE"], + ["$OPENSM_CONFIG/$QOS_POLICY_FILE"], [Define a QOS policy config file]) AC_SUBST(QOS_POLICY_FILE) -- 1.5.4.1.122.gaa8d From sashak at voltaire.com Tue Apr 1 10:52:46 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 1 Apr 2008 17:52:46 +0000 Subject: [ofa-general] [PATCH] opensm/configure.in: replace CONF_DIR config var by OSM_CONFIG_DIR In-Reply-To: <20080401174508.GB27321@sashak.voltaire.com> References: <20080330232119.GM13708@sashak.voltaire.com> <47F023B6.2070302@mellanox.co.il> <20080401174508.GB27321@sashak.voltaire.com> Message-ID: <20080401175246.GC27321@sashak.voltaire.com> Replace CONF_DIR config variable by OSM_CONFIG_DIR in substitution patterns. Remove not needed OPENSM_CONF_SUB_DIR var. Signed-off-by: Sasha Khapyorsky --- opensm/configure.in | 8 +++----- opensm/man/opensm.8.in | 16 ++++++++-------- opensm/opensm.spec.in | 8 ++++---- opensm/scripts/opensmd.in | 4 ++-- opensm/scripts/redhat-opensm.init.in | 4 ++-- 5 files changed, 19 insertions(+), 21 deletions(-) diff --git a/opensm/configure.in b/opensm/configure.in index 0da402a..1f2bed5 100644 --- a/opensm/configure.in +++ b/opensm/configure.in @@ -87,7 +87,7 @@ conf_dir_tmp1="`eval echo ${sysconfdir} | sed 's/^NONE/$ac_default_prefix/'`" SYS_CONFIG_DIR="`eval echo $conf_dir_tmp1`" dnl Check for a different subdir for the config files. 
-OPENSM_CONF_SUB_DIR=opensm +OPENSM_CONFIG_DIR=$SYS_CONFIG_DIR/opensm AC_MSG_CHECKING(for --with-opensm-conf-sub-dir) AC_ARG_WITH(opensm-conf-sub-dir, AC_HELP_STRING([--with-opensm-conf-sub-dir=dir], @@ -96,17 +96,15 @@ AC_ARG_WITH(opensm-conf-sub-dir, no) ;; *) - OPENSM_CONF_SUB_DIR=$withval + OPENSM_CONFIG_DIR=$SYS_CONFIG_DIR/$withval ;; esac ] ) -OPENSM_CONFIG_DIR=$SYS_CONFIG_DIR/$OPENSM_CONF_SUB_DIR AC_MSG_RESULT($OPENSM_CONFIG_DIR) AC_DEFINE_UNQUOTED(OPENSM_CONFIG_DIR, ["$OPENSM_CONFIG_DIR"], [Define OpenSM config directory]) -AC_SUBST(OPENSM_CONF_SUB_DIR) -AC_SUBST(CONF_DIR,$OPENSM_CONFIG_DIR) +AC_SUBST(OPENSM_CONFIG_DIR) dnl Check for a different default node name map file NODENAMEMAPFILE=ib-node-name-map diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in index 93fa95c..e93844d 100644 --- a/opensm/man/opensm.8.in +++ b/opensm/man/opensm.8.in @@ -201,21 +201,21 @@ is accumulative. .TP \fB\-P\fR, \fB\-\-Pconfig\fR This option defines the optional partition configuration file. -The default name is \fB\%@CONF_DIR@/@PARTITION_CONFIG_FILE@\fP. +The default name is \fB\%@OPENSM_CONFIG_DIR@/@PARTITION_CONFIG_FILE@\fP. .TP .BI --prefix_routes_file= path Prefix routes control how the SA responds to path record queries for off-subnet DGIDs. By default, the SA fails such queries. The .B PREFIX ROUTES section below describes the format of the configuration file. -The default path is \fB\%@CONF_DIR@/prefix\-routes.conf\fP. +The default path is \fB\%@OPENSM_CONFIG_DIR@/prefix\-routes.conf\fP. .TP \fB\-Q\fR, \fB\-\-qos\fR This option enables QoS setup. It is disabled by default. .TP \fB\-Y\fR, \fB\-\-qos_policy_file\fR This option defines the optional QoS policy file. The default -name is \fB\%@CONF_DIR@/@QOS_POLICY_FILE@\fP. +name is \fB\%@OPENSM_CONFIG_DIR@/@QOS_POLICY_FILE@\fP. .TP \fB\-N\fR, \fB\-\-no_part_enforce\fR This option disables partition enforcement on switch external ports. @@ -331,7 +331,7 @@ logrotate purposes. 
.SH PARTITION CONFIGURATION .PP The default name of OpenSM partitions configuration file is -\fB\%@CONF_DIR@/@PARTITION_CONFIG_FILE@\fP. The default may be changed by using +\fB\%@OPENSM_CONFIG_DIR@/@PARTITION_CONFIG_FILE@\fP. The default may be changed by using --Pconfig (-P) option with OpenSM. The default partition will be created by OpenSM unconditionally even @@ -926,19 +926,19 @@ Both or one of options -U and -M can be specified together with \'-R file\'. .SH FILES .TP -.B @CONF_DIR@/@NODENAMEMAPFILE@ +.B @OPENSM_CONFIG_DIR@/@NODENAMEMAPFILE@ default node name map file. See ibnetdiscover for more information on format. .TP -.B @CONF_DIR@/@PARTITION_CONFIG_FILE@ +.B @OPENSM_CONFIG_DIR@/@PARTITION_CONFIG_FILE@ default partition config file .TP -.B @CONF_DIR@/@QOS_POLICY_FILE@ +.B @OPENSM_CONFIG_DIR@/@QOS_POLICY_FILE@ default QOS policy config file .TP -.B @CONF_DIR@/prefix-routes.conf +.B @OPENSM_CONFIG_DIR@/prefix-routes.conf default prefix routes file. .SH AUTHORS diff --git a/opensm/opensm.spec.in b/opensm/opensm.spec.in index 882e6e4..feabfef 100644 --- a/opensm/opensm.spec.in +++ b/opensm/opensm.spec.in @@ -94,9 +94,9 @@ if [ -f /etc/redhat-release -o -s /etc/redhat-release ]; then else REDHAT="" fi -mkdir -p $etc/{init.d, at OPENSM_CONF_SUB_DIR@,logrotate.d} +mkdir -p $etc/{init.d,logrotate.d} @OPENSM_CONFIG_DIR@ install -m 755 scripts/${REDHAT}opensm.init $etc/init.d/opensmd -install -m 644 scripts/opensm.conf $etc/@OPENSM_CONF_SUB_DIR@/opensm.conf +install -m 644 scripts/opensm.conf @OPENSM_CONFIG_DIR@/opensm.conf install -m 644 scripts/opensm.logrotate $etc/logrotate.d/opensm install -m 755 scripts/sldd.sh $RPM_BUILD_ROOT%{_sbindir}/sldd.sh @@ -128,10 +128,10 @@ fi %doc AUTHORS COPYING README %{_sysconfdir}/init.d/opensmd %{_sbindir}/sldd.sh -%config(noreplace) %{_sysconfdir}/@OPENSM_CONF_SUB_DIR@/opensm.conf +%config(noreplace) @OPENSM_CONFIG_DIR@/opensm.conf %config(noreplace) %{_sysconfdir}/logrotate.d/opensm %dir /var/cache/opensm -%dir 
%{_sysconfdir}/@OPENSM_CONF_SUB_DIR@ +%dir @OPENSM_CONFIG_DIR@ %files libs %defattr(-,root,root,-) diff --git a/opensm/scripts/opensmd.in b/opensm/scripts/opensmd.in index 434a92c..7e5d868 100755 --- a/opensm/scripts/opensmd.in +++ b/opensm/scripts/opensmd.in @@ -28,13 +28,13 @@ # # # processname: @sbindir@/opensm -# config: @sysconfig@/opensm.conf +# config: @OPENSM_CONFIG_DIR@/opensm.conf # pidfile: /var/run/opensm.pid prefix=@prefix@ exec_prefix=@exec_prefix@ -CONFIG=@sysconfdir@/@OPENSM_CONF_SUB_DIR@/opensm.conf +CONFIG=@OPENSM_CONFIG_DIR@/opensm.conf if [ ! -f $CONFIG ]; then exit 0 diff --git a/opensm/scripts/redhat-opensm.init.in b/opensm/scripts/redhat-opensm.init.in index 689ffa0..5cc9079 100755 --- a/opensm/scripts/redhat-opensm.init.in +++ b/opensm/scripts/redhat-opensm.init.in @@ -38,7 +38,7 @@ # $Id: openib-1.0-opensm.init,v 1.5 2006/08/02 18:18:23 dledford Exp $ # # processname: @sbindir@/opensm -# config: @sysconfdir@/@OPENSM_CONF_SUB_DIR@/opensm.conf +# config: @OPENSM_CONFIG_DIR@/opensm.conf # pidfile: /var/run/opensm.pid prefix=@prefix@ @@ -46,7 +46,7 @@ exec_prefix=@exec_prefix@ . /etc/rc.d/init.d/functions -CONFIG=@sysconfdir@/@OPENSM_CONF_SUB_DIR@/opensm.conf +CONFIG=@OPENSM_CONFIG_DIR@/opensm.conf if [ ! -f $CONFIG ]; then exit 0 fi -- 1.5.4.1.122.gaa8d
Investors are discovering this hidden gem Grab this gem while its in cents it wont last there long. Ride the gains with DCNM DnC Multimedia Corporation Today From eli at dev.mellanox.co.il Tue Apr 1 08:35:46 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 01 Apr 2008 18:35:46 +0300 Subject: [ofa-general] Re: [PATCH 3/10] IB/core: Add LSO support In-Reply-To: References: <1205767431.25950.138.camel@mtls03> Message-ID: <1207064146.3781.19.camel@mtls03> Roland, would like me to re-generate the mlx4 LSO patch to match this commit or would you do the adjustments? On Fri, 2008-03-28 at 14:39 -0700, Roland Dreier wrote: > thanks, applied as below. > > For now I left the IB_WR_LSO opcode rather than a send flag, since the > mlx4 internal implementation is as a new opcode. However since this > is kernel-internal we can revisit this and I'm happy if the discussion > continues. > > From 86a0dd93c39739a39d6b5f7f67d4b2456c5f45ae Mon Sep 17 00:00:00 2001 > From: Eli Cohen > Date: Mon, 17 Mar 2008 17:23:51 +0200 > Subject: [PATCH] IB/core: Add IPoIB UD LSO support > > LSO (large send offload) allows the networking stack to pass SKBs with > data size larger than the MTU to the IPoIB driver and have the HCA HW > fragment the data to multiple MSS-sized packets. Add a device > capability flag IB_DEVICE_UD_TSO for devices that can perform TCP > segmentation offload, a new send work request opcode IB_WR_LSO, > header, hlen and mss fields for the work request structure, and a new > IB_WC_LSO completion type. > > Signed-off-by: Eli Cohen > Signed-off-by: Roland Dreier > --- > include/rdma/ib_verbs.h | 8 +++++++- > 1 files changed, 7 insertions(+), 1 deletions(-) > > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h > index 3ac7371..5fe7723 100644 > --- a/include/rdma/ib_verbs.h > +++ b/include/rdma/ib_verbs.h > @@ -104,6 +104,7 @@ enum ib_device_cap_flags { > * IPoIB driver may set NETIF_F_IP_CSUM for datagram mode. 
> */ > IB_DEVICE_UD_IP_CSUM = (1<<18), > + IB_DEVICE_UD_TSO = (1<<19), > IB_DEVICE_SEND_W_INV = (1<<21), > }; > > @@ -412,6 +413,7 @@ enum ib_wc_opcode { > IB_WC_COMP_SWAP, > IB_WC_FETCH_ADD, > IB_WC_BIND_MW, > + IB_WC_LSO, > /* > * Set value of IB_WC_RECV so consumers can test if a completion is a > * receive by testing (opcode & IB_WC_RECV). > @@ -623,7 +625,8 @@ enum ib_wr_opcode { > IB_WR_SEND_WITH_IMM, > IB_WR_RDMA_READ, > IB_WR_ATOMIC_CMP_AND_SWP, > - IB_WR_ATOMIC_FETCH_AND_ADD > + IB_WR_ATOMIC_FETCH_AND_ADD, > + IB_WR_LSO > }; > > enum ib_send_flags { > @@ -662,6 +665,9 @@ struct ib_send_wr { > } atomic; > struct { > struct ib_ah *ah; > + void *header; > + int hlen; > + int mss; > u32 remote_qpn; > u32 remote_qkey; > u16 pkey_index; /* valid for GSI only */ From sashak at voltaire.com Tue Apr 1 11:45:14 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 1 Apr 2008 18:45:14 +0000 Subject: [ofa-general] [PATCH] opensm/configure.in: make prefix routes config file configurable In-Reply-To: <20080401175246.GC27321@sashak.voltaire.com> References: <20080330232119.GM13708@sashak.voltaire.com> <47F023B6.2070302@mellanox.co.il> <20080401174508.GB27321@sashak.voltaire.com> <20080401175246.GC27321@sashak.voltaire.com> Message-ID: <20080401184514.GD27321@sashak.voltaire.com> Add configuration ability for prefix routes config file, similar to other OpenSM config files.
Signed-off-by: Sasha Khapyorsky --- opensm/configure.in | 20 ++++++++++++++++++++ opensm/man/opensm.8.in | 2 +- 2 files changed, 21 insertions(+), 1 deletions(-) diff --git a/opensm/configure.in b/opensm/configure.in index 1f2bed5..7cf7076 100644 --- a/opensm/configure.in +++ b/opensm/configure.in @@ -166,6 +166,26 @@ AC_DEFINE_UNQUOTED(HAVE_DEFAULT_QOS_POLICY_FILE, [Define a QOS policy config file]) AC_SUBST(QOS_POLICY_FILE) +dnl Check for a different prefix-routes file +PREFIX_ROUTES_FILE=prefix-routes.conf +AC_MSG_CHECKING(for --with-prefix-routes-conf) +AC_ARG_WITH(prefix-routes-conf, + AC_HELP_STRING([--with-prefix-routes-conf=file], + [define a Prefix Routes config file (default is prefix-routes.conf)]), + [ case "$withval" in + no) + ;; + *) + PREFIX_ROUTES_FILE=$withval + ;; + esac ] +) +AC_MSG_RESULT($PREFIX_ROUTES_FILE) +AC_DEFINE_UNQUOTED(HAVE_DEFAULT_PREFIX_ROUTES_FILE, + ["$OPENSM_CONFIG/$PREFIX_ROUTES_FILE"], + [Define a Prefix Routes config file]) +AC_SUBST(PREFIX_ROUTES_FILE) + dnl select example event plugin or not OPENIB_OSM_DEFAULT_EVENT_PLUGIN_SEL diff --git a/opensm/man/opensm.8.in b/opensm/man/opensm.8.in index e93844d..1c47160 100644 --- a/opensm/man/opensm.8.in +++ b/opensm/man/opensm.8.in @@ -938,7 +938,7 @@ default partition config file default QOS policy config file .TP -.B @OPENSM_CONFIG_DIR@/prefix-routes.conf +.B @OPENSM_CONFIG_DIR@/@PREFIX_ROUTES_FILE@ default prefix routes file. 
.SH AUTHORS -- 1.5.4.1.122.gaa8d From sashak at voltaire.com Tue Apr 1 11:55:09 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 1 Apr 2008 18:55:09 +0000 Subject: [ofa-general] [PATCH] opensm/osm_base.h: use OPENSM_CONFIG_DIR in config files paths definitions In-Reply-To: <20080401184514.GD27321@sashak.voltaire.com> References: <20080330232119.GM13708@sashak.voltaire.com> <47F023B6.2070302@mellanox.co.il> <20080401174508.GB27321@sashak.voltaire.com> <20080401175246.GC27321@sashak.voltaire.com> <20080401184514.GD27321@sashak.voltaire.com> Message-ID: <20080401185509.GE27321@sashak.voltaire.com> Use OPENSM_CONFIG_DIR for config file path definitions when the appropriate HAVE_*_FILE macros are not set. Use /etc/opensm as the default OpenSM config directory. Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_base.h | 32 ++++++++++++++++---------------- 1 files changed, 16 insertions(+), 16 deletions(-) diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h index 1a9abf0..cbe8205 100644 --- a/opensm/include/opensm/osm_base.h +++ b/opensm/include/opensm/osm_base.h @@ -224,12 +224,12 @@ BEGIN_C_DECLS */ #ifdef __WIN__ #define OSM_DEFAULT_PARTITION_CONFIG_FILE strcat(GetOsmCachePath(), "osm-partitions.conf") -#else /* !__WIN__ */ -# ifdef HAVE_DEFAULT_PARTITION_CONFIG_FILE -# define OSM_DEFAULT_PARTITION_CONFIG_FILE HAVE_DEFAULT_PARTITION_CONFIG_FILE -# else /* !HAVE_DEFAULT_PARTITION_CONFIG_FILE */ -# define OSM_DEFAULT_PARTITION_CONFIG_FILE "/etc/ofa/opensm-partitions.conf" -# endif /* HAVE_DEFAULT_PARTITION_CONFIG_FILE */ +#elif defined(HAVE_DEFAULT_PARTITION_CONFIG_FILE) +#define OSM_DEFAULT_PARTITION_CONFIG_FILE HAVE_DEFAULT_PARTITION_CONFIG_FILE +#elif defined(OPENSM_CONFIG_DIR) +#define OSM_DEFAULT_PARTITION_CONFIG_FILE OPENSM_CONFIG_DIR "/partitions.conf" +#else +#define OSM_DEFAULT_PARTITION_CONFIG_FILE "/etc/opensm/partitions.conf" #endif /* __WIN__ */ /***********/ @@ -244,12 +244,12 @@ BEGIN_C_DECLS */ #ifdef __WIN__
#define OSM_DEFAULT_QOS_POLICY_FILE strcat(GetOsmCachePath(), "osm-qos-policy.conf") -#else /* !__WIN__ */ -# ifdef HAVE_DEFAULT_QOS_POLICY_FILE -# define OSM_DEFAULT_QOS_POLICY_FILE HAVE_DEFAULT_QOS_POLICY_FILE -# else /* !HAVE_DEFAULT_QOS_POLICY_FILE */ -# define OSM_DEFAULT_QOS_POLICY_FILE "/etc/ofa/opensm-qos-policy.conf" -# endif /* HAVE_DEFAULT_QOS_POLICY_FILE */ +#elif defined(HAVE_DEFAULT_QOS_POLICY_FILE) +#define OSM_DEFAULT_QOS_POLICY_FILE HAVE_DEFAULT_QOS_POLICY_FILE +#elif defined(OPENSM_CONFIG_DIR) +#define OSM_DEFAULT_QOS_POLICY_FILE OPENSM_CONFIG_DIR "/qos-policy.conf" +#else +#define OSM_DEFAULT_QOS_POLICY_FILE "/etc/opensm/qos-policy.conf" #endif /* __WIN__ */ /***********/ @@ -264,12 +264,12 @@ BEGIN_C_DECLS */ #ifdef __WIN__ #define OSM_DEFAULT_PREFIX_ROUTES_FILE strcat(GetOsmCachePath(), "osm-prefix-routes.conf") -#else -#ifdef OPENSM_CONFIG_DIR +#elif defined(HAVE_DEFAULT_PREFIX_ROUTES_FILE) +#define OSM_DEFAULT_PREFIX_ROUTES_FILE HAVE_DEFAULT_PREFIX_ROUTES_FILE +#elif defined(OPENSM_CONFIG_DIR) #define OSM_DEFAULT_PREFIX_ROUTES_FILE OPENSM_CONFIG_DIR "/prefix-routes.conf" #else -#define OSM_DEFAULT_PREFIX_ROUTES_FILE "/etc/ofa/opensm-prefix-routes.conf" -#endif +#define OSM_DEFAULT_PREFIX_ROUTES_FILE "/etc/opensm/prefix-routes.conf" #endif /***********/ -- 1.5.4.1.122.gaa8d From gstreiff at NetEffect.com Tue Apr 1 09:11:47 2008 From: gstreiff at NetEffect.com (Glenn Streiff) Date: Tue, 1 Apr 2008 11:11:47 -0500 Subject: [ofa-general] RE: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC079500B6@venom2> > OFED March 24 meeting summary about OFED 1.4 and 1.3.1 plans: > 1.3.1 Release: > As we decided we should do a release in 2-3 months after 1.3. > In addition if there are any special fixes as outcome from the > interop we can do a release earlier.
> All - please send me your requests for fixed issues and needed time > frame and I will publish 1.3.1 schedule based on this. Hi, Tziporet. Just to refresh what I said at the last conference call, NetEffect has at least one fix (already upstream) that we would like to see in an OFED 1.3.1 build. In terms of desired timeframe...late April or early May? Have fun at Sonoma. Dave Sommers from NetEffect will be there while I work through my backlog. :-/ Regards, Glenn From Arkady.Kanevsky at netapp.com Tue Apr 1 09:30:25 2008 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Tue, 1 Apr 2008 12:30:25 -0400 Subject: [ofa-general] Re: files preamble In-Reply-To: <47EF7054.6020503@voltaire.com> References: <47E5EF49.9080506@voltaire.com> <47EC06AF.8000309@sun.com> <47EF6B57.40502@voltaire.com> <47EF7054.6020503@voltaire.com> Message-ID: I am very doubtful that you can remove it. Some of that is based on earlier work by IBM in DAPL, which was submitted under 3 licenses. My goal was to globally change the preamble from OpenIB to OpenFabrics. Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 > -----Original Message----- > From: Or Gerlitz [mailto:ogerlitz at voltaire.com] > Sent: Sunday, March 30, 2008 6:50 AM > To: Ted H. Kim > Cc: openib-general at openib.org > Subject: Re: [ofa-general] Re: files preamble > > Or Gerlitz wrote: > > Ted H. Kim wrote: > >> For example, it appears addr.c, cma.c, ib_addr.h, rdma_cm.h and > >> rdma_cm_ib.h -- all have the "Common Public License 1.0" > >> mentioned. > >> > > I have no idea what the common public license is, generally > speaking I > > think it would be fine if you send a patch that removes it from all > > the files under drivers/infiniband and include/rdma.
> Hi Ted, > > OK, as its about legals, if you want to drive the removal of > this license from the files, best if its first being > discussed in an appropriate forum which I am not sure if this > list is, so in that respect, I take back my proposal for you > to send a patch... > > Or. > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From arlin.r.davis at intel.com Tue Apr 1 11:17:22 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Tue, 1 Apr 2008 11:17:22 -0700 Subject: [ofa-general] [PATCH 1/1][v2] dapl: calculate private data size based on transport type and cma_hdr overhead Message-ID: <000001c89424$a1673c10$9f97070a@amr.corp.intel.com> Need to adjust CM private date size based on different transport types. Add hca_ptr to dapls_ib_private_data_size call for transport type validation via verbs device. Add definitions to include iWARP size of 512 and subtract 36 bytes for cma_hdr overhead. 
Signed-off-by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_adapter_util.h | 3 ++- dapl/common/dapl_cr_callback.c | 3 ++- dapl/common/dapl_ep_connect.c | 4 +++- dapl/common/dapl_evd_connection_callb.c | 4 +++- dapl/common/dapl_ia_query.c | 5 +++-- dapl/ibal-scm/dapl_ibal-scm_cm.c | 4 +++- dapl/ibal/dapl_ibal_cm.c | 2 ++ dapl/openib/dapl_ib_cm.c | 4 +++- dapl/openib_cma/dapl_ib_cm.c | 10 ++++++++-- dapl/openib_cma/dapl_ib_util.h | 14 ++++++++------ dapl/openib_scm/dapl_ib_cm.c | 4 +++- 11 files changed, 40 insertions(+), 17 deletions(-) diff --git a/dapl/common/dapl_adapter_util.h b/dapl/common/dapl_adapter_util.h index 6738d6a..d664bf6 100755 --- a/dapl/common/dapl_adapter_util.h +++ b/dapl/common/dapl_adapter_util.h @@ -239,7 +239,8 @@ DAT_RETURN dapls_ib_cm_remote_addr ( int dapls_ib_private_data_size ( IN DAPL_PRIVATE *prd_ptr, - IN DAPL_PDATA_OP conn_op); + IN DAPL_PDATA_OP conn_op, + IN DAPL_HCA *hca_ptr); void dapls_query_provider_specific_attr( diff --git a/dapl/common/dapl_cr_callback.c b/dapl/common/dapl_cr_callback.c index e8f58a4..46d2b4c 100644 --- a/dapl/common/dapl_cr_callback.c +++ b/dapl/common/dapl_cr_callback.c @@ -378,7 +378,8 @@ dapli_connection_request ( else { cr_ptr->param.private_data_size = - dapls_ib_private_data_size (prd_ptr, DAPL_PDATA_CONN_REQ); + dapls_ib_private_data_size(prd_ptr, DAPL_PDATA_CONN_REQ, + sp_ptr->header.owner_ia->hca_ptr); } if (cr_ptr->param.private_data_size > 0) { diff --git a/dapl/common/dapl_ep_connect.c b/dapl/common/dapl_ep_connect.c index 12d391f..f290ebe 100755 --- a/dapl/common/dapl_ep_connect.c +++ b/dapl/common/dapl_ep_connect.c @@ -258,7 +258,9 @@ dapl_ep_connect ( */ req_hdr_size = (sizeof (DAPL_PRIVATE) - DAPL_MAX_PRIVATE_DATA_SIZE); - max_req_pdata_size = dapls_ib_private_data_size (NULL, DAPL_PDATA_CONN_REQ); + max_req_pdata_size = dapls_ib_private_data_size( + NULL, DAPL_PDATA_CONN_REQ, + ep_ptr->header.owner_ia->hca_ptr); if (private_data_size + req_hdr_size >
(DAT_COUNT)max_req_pdata_size) { diff --git a/dapl/common/dapl_evd_connection_callb.c b/dapl/common/dapl_evd_connection_callb.c index 3c4e0cb..d3a39a6 100644 --- a/dapl/common/dapl_evd_connection_callb.c +++ b/dapl/common/dapl_evd_connection_callb.c @@ -148,7 +148,9 @@ dapl_evd_connection_callback ( else { private_data_size = - dapls_ib_private_data_size (prd_ptr, DAPL_PDATA_CONN_REP); + dapls_ib_private_data_size( + prd_ptr, DAPL_PDATA_CONN_REP, + ep_ptr->header.owner_ia->hca_ptr); } if (private_data_size > 0) diff --git a/dapl/common/dapl_ia_query.c b/dapl/common/dapl_ia_query.c index 593f356..a8c39a3 100755 --- a/dapl/common/dapl_ia_query.c +++ b/dapl/common/dapl_ia_query.c @@ -156,8 +156,9 @@ dapl_ia_query ( * to 0 unless IBHOSTS_NAMING is enabled. */ provider_attr->max_private_data_size = - dapls_ib_private_data_size (NULL, DAPL_PDATA_CONN_REQ) - - (sizeof (DAPL_PRIVATE) - DAPL_MAX_PRIVATE_DATA_SIZE); + dapls_ib_private_data_size(NULL, DAPL_PDATA_CONN_REQ, + ia_ptr->hca_ptr) - + (sizeof(DAPL_PRIVATE) - DAPL_MAX_PRIVATE_DATA_SIZE); provider_attr->supports_multipath = DAT_FALSE; provider_attr->ep_creator = DAT_PSP_CREATES_EP_NEVER; provider_attr->optimal_buffer_alignment = DAT_OPTIMAL_ALIGNMENT; diff --git a/dapl/ibal-scm/dapl_ibal-scm_cm.c b/dapl/ibal-scm/dapl_ibal-scm_cm.c index 692e5b9..fcf5215 100644 --- a/dapl/ibal-scm/dapl_ibal-scm_cm.c +++ b/dapl/ibal-scm/dapl_ibal-scm_cm.c @@ -1019,6 +1019,7 @@ dapls_ib_cm_remote_addr ( * Input: * prd_ptr private data pointer * conn_op connection operation type + * hca_ptr hca pointer, needed for transport type * * If prd_ptr is NULL, this is a query for the max size supported by * the provider, otherwise it is the actual size of the private data @@ -1034,7 +1035,8 @@ dapls_ib_cm_remote_addr ( */ int dapls_ib_private_data_size ( IN DAPL_PRIVATE *prd_ptr, - IN DAPL_PDATA_OP conn_op) + IN DAPL_PDATA_OP conn_op, + IN DAPL_HCA *hca_ptr) { int size; diff --git a/dapl/ibal/dapl_ibal_cm.c b/dapl/ibal/dapl_ibal_cm.c index 
9f3ffc4..6cd652f 100644 --- a/dapl/ibal/dapl_ibal_cm.c +++ b/dapl/ibal/dapl_ibal_cm.c @@ -1679,6 +1679,7 @@ dapls_ib_cr_handoff ( * Return the size of private data given a connection op type * * Input: + * hca_ptr hca pointer, needed for transport type * prd_ptr private data pointer * conn_op connection operation type * @@ -1697,6 +1698,7 @@ dapls_ib_cr_handoff ( */ int dapls_ib_private_data_size ( + IN DAPL_HCA *hca_ptr, IN DAPL_PRIVATE *prd_ptr, IN DAPL_PDATA_OP conn_op) { diff --git a/dapl/openib/dapl_ib_cm.c b/dapl/openib/dapl_ib_cm.c index 2ff2ba0..76d5968 100644 --- a/dapl/openib/dapl_ib_cm.c +++ b/dapl/openib/dapl_ib_cm.c @@ -1049,6 +1049,7 @@ dapls_ib_cm_remote_addr ( * Input: * prd_ptr private data pointer * conn_op connection operation type + * hca_ptr hca pointer, needed for transport type * * If prd_ptr is NULL, this is a query for the max size supported by * the provider, otherwise it is the actual size of the private data @@ -1064,7 +1065,8 @@ dapls_ib_cm_remote_addr ( */ int dapls_ib_private_data_size ( IN DAPL_PRIVATE *prd_ptr, - IN DAPL_PDATA_OP conn_op) + IN DAPL_PDATA_OP conn_op, + IN DAPL_HCA *hca_ptr) { int size; diff --git a/dapl/openib_cma/dapl_ib_cm.c b/dapl/openib_cma/dapl_ib_cm.c index 04b9e41..cf79142 100755 --- a/dapl/openib_cma/dapl_ib_cm.c +++ b/dapl/openib_cma/dapl_ib_cm.c @@ -972,6 +972,7 @@ dapls_ib_cm_remote_addr(IN DAT_HANDLE dat_handle, OUT DAT_SOCK_ADDR6 *raddr) * Input: * prd_ptr private data pointer * conn_op connection operation type + * hca_ptr hca pointer, needed for transport type * * If prd_ptr is NULL, this is a query for the max size supported by * the provider, otherwise it is the actual size of the private data @@ -985,11 +986,16 @@ dapls_ib_cm_remote_addr(IN DAT_HANDLE dat_handle, OUT DAT_SOCK_ADDR6 *raddr) * length of private data * */ -int dapls_ib_private_data_size(IN DAPL_PRIVATE *prd_ptr, - IN DAPL_PDATA_OP conn_op) +int dapls_ib_private_data_size( IN DAPL_PRIVATE *prd_ptr, + IN DAPL_PDATA_OP conn_op, + IN 
DAPL_HCA *hca_ptr) { int size; + if (hca_ptr->ib_hca_handle->device->transport_type + == IBV_TRANSPORT_IWARP) + return(IWARP_MAX_PDATA_SIZE); + switch(conn_op) { case DAPL_PDATA_CONN_REQ: diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h index 2f01fc3..f35cb9d 100755 --- a/dapl/openib_cma/dapl_ib_util.h +++ b/dapl/openib_cma/dapl_ib_util.h @@ -113,12 +113,14 @@ typedef struct _ib_wait_obj_handle /* inline send rdma threshold */ #define INLINE_SEND_DEFAULT 64 -/* CM private data areas */ -#define IB_MAX_REQ_PDATA_SIZE 48 -#define IB_MAX_REP_PDATA_SIZE 196 -#define IB_MAX_REJ_PDATA_SIZE 148 -#define IB_MAX_DREQ_PDATA_SIZE 220 -#define IB_MAX_DREP_PDATA_SIZE 224 +/* CMA private data areas */ +#define CMA_PDATA_HDR 36 +#define IB_MAX_REQ_PDATA_SIZE (92-CMA_PDATA_HDR) +#define IB_MAX_REP_PDATA_SIZE (196-CMA_PDATA_HDR) +#define IB_MAX_REJ_PDATA_SIZE (148-CMA_PDATA_HDR) +#define IB_MAX_DREQ_PDATA_SIZE (220-CMA_PDATA_HDR) +#define IB_MAX_DREP_PDATA_SIZE (224-CMA_PDATA_HDR) +#define IWARP_MAX_PDATA_SIZE (512-CMA_PDATA_HDR) /* DTO OPs, ordered for DAPL ENUM definitions */ #define OP_RDMA_WRITE IBV_WR_RDMA_WRITE diff --git a/dapl/openib_scm/dapl_ib_cm.c b/dapl/openib_scm/dapl_ib_cm.c index f534e8d..485ab9b 100644 --- a/dapl/openib_scm/dapl_ib_cm.c +++ b/dapl/openib_scm/dapl_ib_cm.c @@ -827,6 +827,7 @@ dapls_ib_cm_remote_addr ( * Input: * prd_ptr private data pointer * conn_op connection operation type + * hca_ptr hca pointer, needed for transport type * * If prd_ptr is NULL, this is a query for the max size supported by * the provider, otherwise it is the actual size of the private data @@ -842,7 +843,8 @@ dapls_ib_cm_remote_addr ( */ int dapls_ib_private_data_size ( IN DAPL_PRIVATE *prd_ptr, - IN DAPL_PDATA_OP conn_op) + IN DAPL_PDATA_OP conn_op, + IN DAPL_HCA *hca_ptr) { int size; -- 1.5.2.5 From chu11 at llnl.gov Tue Apr 1 11:29:39 2008 From: chu11 at llnl.gov (Al Chu) Date: Tue, 01 Apr 2008 11:29:39 -0700 Subject: [ofa-general] 
[Infiniband-Diags] [PATCH] saquery exit with non-zero code on bad input Message-ID: <1207074579.15637.153.camel@cardanus.llnl.gov> Hey Sasha, If an input into saquery isn't found, saquery still exits with '0' status, so it poses a problem in scripting. This patch exits w/ non-zero if the input isn't found by saquery. The actual status code I selected to return can be revised. I just sort of picked one. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-exit-non-zero-if-saquery-input-not-found.patch Type: text/x-patch Size: 2134 bytes Desc: not available URL: From ttubby at pearlriverresort.com Tue Apr 1 11:46:51 2008 From: ttubby at pearlriverresort.com (Rosendo Fox) Date: Tue, 1 Apr 2008 13:46:51 -0500 Subject: [ofa-general] Compare prices and buy here Message-ID: <249563945.37193639129250@pearlriverresort.com> Get your software without delay. Simply pay and download immediately. Programs are available here in all European languages, built for Windows and Macintosh. All software is very inexpensive; every copy is guaranteed to be an original, complete and fully functional version. Professional, personal advice from our customer center will help you with the software installation. Fast answers guaranteed. A money-back guarantee is available! Buy perfectly working software: http://geocities.com/wall_damion/ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mashirle at us.ibm.com Tue Apr 1 03:52:04 2008 From: mashirle at us.ibm.com (Shirley Ma) Date: Tue, 01 Apr 2008 03:52:04 -0700 Subject: [ofa-general] Re: [RFC][1/2] IPoIB UD 4K MTU support In-Reply-To: <47E61B05.2020003@voltaire.com> References: <1206005880.8399.20.camel@localhost.localdomain> <47E61B05.2020003@voltaire.com> Message-ID: <1207047124.4593.38.camel@localhost.localdomain> Hello Or, Thanks for your review. > Reading ipoib_ud_skb_put_frags below and its usage in the patch that follows, it's unclear to me if IPOIB_UD_MAX_PAYLOAD is being made of (4K - IPOIB_ENCAP_LEN) + IPOIB_ENCAP_LEN or from adjustment to some IP header alignment constraint. Specifically, the design I'd like to see here is that the IPoIB header telling the type of the frame (ARP, IPv4, IPv6, etc) is provided up to the stack as part of the packet in the skb (e.g. it's very useful with tcpdump/etc filters). The max payload is the max IB MTU here. It's 4K. IPoIB MTU = IB MTU - IPoIB header = 4K - 4. > Reading earlier threads I see that Roland suggested to allow for up to a 4K-4 MTU towards the stack and use some internal buffer for the GRH, where this buffer can be allocated and dma mapped once and then forgotten about until the driver cleans up, etc. Was there any problem with this approach? The implementation here uses one buffer when PAGE_SIZE is greater than 4K, and two buffers when PAGE_SIZE = 4K. One buffer holds the 4K-4 bytes of IPoIB MTU data; the other is a 44-byte header (GRH header + IPoIB header). I use a generic routine for the IPoIB receive path regardless of MTU size; it significantly reduces the size of the patch. We can't just dma map this combined header (GRH header + IPoIB header) buffer once: the GRH header is discarded, but the IPoIB header is not, right? What Roland suggested before was to have the GRH in one buffer, and the IPoIB header and data in the second buffer.
If we do so, the total size of the second buffer is 4K; with the IP header alignment (12 bytes) added, it will exceed one page, which is the problem we are trying to solve here. > > +static inline void ipoib_ud_skb_put_frags(struct ipoib_dev_priv *priv, > > + struct sk_buff *skb, > > + unsigned int length) > > +{ > > + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { > > + skb_frag_t *frag = &skb_shinfo(skb)->frags[0]; > > + /* > > + * There is only two buffers needed for max_payload = 4K, > > + * first buf size is IPOIB_UD_HEAD_SIZE > > + */ > > + skb->tail += IPOIB_UD_HEAD_SIZE; > > + frag->size = length - IPOIB_UD_HEAD_SIZE; > > + skb->data_len += frag->size; > > + skb->truesize += frag->size; > > + skb->len += length; > > + } else > > + skb_put(skb, length); > > + > > +} > > > I fail to follow what this code really wants to do and how it does it. > Is it really necessary to touch "by hand" all the internal skb fields? Since there are only two S/G entries, this approach uses fewer instructions to adjust the length of an skb with fragments to match the received data. You can refer to other driver code; for example, how skb_put_frags() is used in ipoib-cm. > also > this function is called once by ipoib_ib_handle_rx_wc in the patch that > follows, any reason not to make it static over there? It doesn't need to be there; I can move this function to ipoib_ib.c instead. Thanks Shirley From or.gerlitz at gmail.com Tue Apr 1 11:54:26 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Tue, 1 Apr 2008 21:54:26 +0300 Subject: [ofa-general] Re: files preamble In-Reply-To: References: <47E5EF49.9080506@voltaire.com> <47EC06AF.8000309@sun.com> <47EF6B57.40502@voltaire.com> <47EF7054.6020503@voltaire.com> Message-ID: <15ddcffd0804011154o3fa5c18bu552952bbfce5902f@mail.gmail.com> >> Ted H. Kim wrote: >>> For example, it appears addr.c, cma.c, ib_addr.h, rdma_cm.h and >>> rdma_cm_ib.h -- all have the "Common Public License 1.0" On 4/1/08, Kanevsky, Arkady wrote: > > I am very doubtful that you can remove it.
> Some of that is based on earlier work by IBM in DAPL which was submitted > under 3 licenses. > > What?! As far as I know all these files were written by Sean from scratch. Or. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Tue Apr 1 12:41:26 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 12:41:26 -0700 Subject: [ofa-general] Re: [PATCH 3/10] IB/core: Add LSO support In-Reply-To: <1207064146.3781.19.camel@mtls03> (Eli Cohen's message of "Tue, 01 Apr 2008 18:35:46 +0300") References: <1205767431.25950.138.camel@mtls03> <1207064146.3781.19.camel@mtls03> Message-ID: > would like me to re-generate the mlx4 LSO patch to match this commit or > would you do the adjustments? Sorry for being so slow. Anyway I did the adjustments as below. I also removed the "reserve" variable and moved the 64 byte extra for LSO into send_wqe_overhead(), since it seemed that the only place where you used send_wqe_overhead() without adding in reserve was actually a bug. I also did various changes other places, and maybe introduced a bug: when I try NPtcp between two systems (once running unmodified 2.6.25-rc8, the other running my for-2.6.26 branch, both with ConnectX with FW 2.3.000), on the side with the LSO patch, I eventually get a "local length error" or "local QP operation err" on a send. It is an LSO send of length 63744 with 17 fragments and an mss of 1992, so it should be segmented into 32 packets. Some of these sends complete successfully but eventually one fails. I'm still debugging but maybe you have some idea? When I get the local QP operation error, I get this in case it helps: local QP operation err (QPN 000048, WQE index affa, vendor syndrome 6f, opcode = 5e) CQE contents 00000048 00000000 00000000 00000000 00000000 00000000 affa6f02 0000005e - R. 
>From 141035c707b81638659ada01f456d066f2b353f7 Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Tue, 25 Mar 2008 15:35:12 +0200 Subject: [PATCH] IB/mlx4: Add IPoIB LSO support Add TSO support to the mlx4_ib driver. Signed-off-by: Eli Cohen Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mlx4/cq.c | 3 + drivers/infiniband/hw/mlx4/main.c | 2 + drivers/infiniband/hw/mlx4/mlx4_ib.h | 5 ++ drivers/infiniband/hw/mlx4/qp.c | 72 +++++++++++++++++++++++++++++---- drivers/net/mlx4/fw.c | 9 ++++ drivers/net/mlx4/fw.h | 1 + drivers/net/mlx4/main.c | 1 + include/linux/mlx4/device.h | 1 + include/linux/mlx4/qp.h | 5 ++ 9 files changed, 90 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index d2e32b0..7d70af7 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -420,6 +420,9 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq, case MLX4_OPCODE_BIND_MW: wc->opcode = IB_WC_BIND_MW; break; + case MLX4_OPCODE_LSO: + wc->opcode = IB_WC_LSO; + break; } } else { wc->byte_len = be32_to_cpu(cqe->byte_cnt); diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 6ea4746..e9330a0 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -101,6 +101,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev, props->device_cap_flags |= IB_DEVICE_UD_AV_PORT_ENFORCE; if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_IPOIB_CSUM) props->device_cap_flags |= IB_DEVICE_UD_IP_CSUM; + if (dev->dev->caps.max_gso_sz) + props->device_cap_flags |= IB_DEVICE_UD_TSO; props->vendor_id = be32_to_cpup((__be32 *) (out_mad->data + 36)) & 0xffffff; diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 3726e45..3f8bd0a 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -110,6 +110,10 @@ struct mlx4_ib_wq { unsigned tail; }; +enum mlx4_ib_qp_flags { + MLX4_IB_QP_LSO = 1 
<< 0 +}; + struct mlx4_ib_qp { struct ib_qp ibqp; struct mlx4_qp mqp; @@ -129,6 +133,7 @@ struct mlx4_ib_qp { struct mlx4_mtt mtt; int buf_size; struct mutex mutex; + u32 flags; u8 port; u8 alt_port; u8 atomic_rd_en; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 320c25f..8ddb97e 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -71,6 +71,7 @@ enum { static const __be32 mlx4_ib_opcode[] = { [IB_WR_SEND] = __constant_cpu_to_be32(MLX4_OPCODE_SEND), + [IB_WR_LSO] = __constant_cpu_to_be32(MLX4_OPCODE_LSO), [IB_WR_SEND_WITH_IMM] = __constant_cpu_to_be32(MLX4_OPCODE_SEND_IMM), [IB_WR_RDMA_WRITE] = __constant_cpu_to_be32(MLX4_OPCODE_RDMA_WRITE), [IB_WR_RDMA_WRITE_WITH_IMM] = __constant_cpu_to_be32(MLX4_OPCODE_RDMA_WRITE_IMM), @@ -242,7 +243,7 @@ static void mlx4_ib_qp_event(struct mlx4_qp *qp, enum mlx4_event type) } } -static int send_wqe_overhead(enum ib_qp_type type) +static int send_wqe_overhead(enum ib_qp_type type, u32 flags) { /* * UD WQEs must have a datagram segment. @@ -253,7 +254,8 @@ static int send_wqe_overhead(enum ib_qp_type type) switch (type) { case IB_QPT_UD: return sizeof (struct mlx4_wqe_ctrl_seg) + - sizeof (struct mlx4_wqe_datagram_seg); + sizeof (struct mlx4_wqe_datagram_seg) + + ((flags & MLX4_IB_QP_LSO) ? 64 : 0); case IB_QPT_UC: return sizeof (struct mlx4_wqe_ctrl_seg) + sizeof (struct mlx4_wqe_raddr_seg); @@ -315,7 +317,7 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, /* Sanity check SQ size before proceeding */ if (cap->max_send_wr > dev->dev->caps.max_wqes || cap->max_send_sge > dev->dev->caps.max_sq_sg || - cap->max_inline_data + send_wqe_overhead(type) + + cap->max_inline_data + send_wqe_overhead(type, qp->flags) + sizeof (struct mlx4_wqe_inline_seg) > dev->dev->caps.max_sq_desc_sz) return -EINVAL; @@ -329,7 +331,7 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg), cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) + - send_wqe_overhead(type); + send_wqe_overhead(type, qp->flags); /* * Hermon supports shrinking WQEs, such that a single work @@ -394,7 +396,8 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, } qp->sq.max_gs = ((qp->sq_max_wqes_per_wr << qp->sq.wqe_shift) - - send_wqe_overhead(type)) / sizeof (struct mlx4_wqe_data_seg); + send_wqe_overhead(type, qp->flags)) / + sizeof (struct mlx4_wqe_data_seg); qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) + (qp->sq.wqe_cnt << qp->sq.wqe_shift); @@ -503,6 +506,9 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, } else { qp->sq_no_prefetch = 0; + if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO) + qp->flags |= MLX4_IB_QP_LSO; + err = set_kernel_sq_size(dev, &init_attr->cap, init_attr->qp_type, qp); if (err) goto err; @@ -673,7 +679,11 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd, struct mlx4_ib_qp *qp; int err; - if (init_attr->create_flags) + /* We only support LSO, and only for kernel UD QPs.
*/ + if (init_attr->create_flags & ~IB_QP_CREATE_IPOIB_UD_LSO) + return ERR_PTR(-EINVAL); + if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO && + (pd->uobject || init_attr->qp_type != IB_QPT_UD)) return ERR_PTR(-EINVAL); switch (init_attr->qp_type) { @@ -879,10 +889,15 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, } } - if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI || - ibqp->qp_type == IB_QPT_UD) + if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI) context->mtu_msgmax = (IB_MTU_4096 << 5) | 11; - else if (attr_mask & IB_QP_PATH_MTU) { + else if (ibqp->qp_type == IB_QPT_UD) { + if (qp->flags & MLX4_IB_QP_LSO) + context->mtu_msgmax = (IB_MTU_4096 << 5) | + ilog2(dev->dev->caps.max_gso_sz); + else + context->mtu_msgmax = (IB_MTU_4096 << 5) | 11; + } else if (attr_mask & IB_QP_PATH_MTU) { if (attr->path_mtu < IB_MTU_256 || attr->path_mtu > IB_MTU_4096) { printk(KERN_ERR "path MTU (%u) is invalid\n", attr->path_mtu); @@ -1399,6 +1414,34 @@ static void __set_data_seg(struct mlx4_wqe_data_seg *dseg, struct ib_sge *sg) dseg->addr = cpu_to_be64(sg->addr); } +static int build_lso_seg(struct mlx4_lso_seg *wqe, struct ib_send_wr *wr, + struct mlx4_ib_qp *qp, unsigned *lso_seg_len) +{ + unsigned halign = ALIGN(wr->wr.ud.hlen, 16); + + /* + * This is a temporary limitation and will be removed in + * a forthcoming FW release: + */ + if (unlikely(wr->wr.ud.hlen) > 60) + return -EINVAL; + + if (unlikely(!(qp->flags & MLX4_IB_QP_LSO) && + wr->num_sge > qp->sq.max_gs - (halign >> 4))) + return -EINVAL; + + memcpy(wqe->header, wr->wr.ud.header, wr->wr.ud.hlen); + + /* make sure LSO header is written before overwriting stamping */ + wmb(); + + wqe->mss_hdr_size = cpu_to_be32((wr->wr.ud.mss - wr->wr.ud.hlen) << 16 | + wr->wr.ud.hlen); + + *lso_seg_len = halign; + return 0; +} + int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, struct ib_send_wr **bad_wr) { @@ -1412,6 +1455,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct 
ib_send_wr *wr, unsigned ind; int uninitialized_var(stamp); int uninitialized_var(size); + unsigned seglen; int i; spin_lock_irqsave(&qp->sq.lock, flags); @@ -1490,6 +1534,16 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, set_datagram_seg(wqe, wr); wqe += sizeof (struct mlx4_wqe_datagram_seg); size += sizeof (struct mlx4_wqe_datagram_seg) / 16; + + if (wr->opcode == IB_WR_LSO) { + err = build_lso_seg(wqe, wr, qp, &seglen); + if (err) { + *bad_wr = wr; + goto out; + } + wqe += seglen; + size += seglen / 16; + } break; case IB_QPT_SMI: diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index f494c3e..d82f275 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -133,6 +133,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) #define QUERY_DEV_CAP_MAX_AV_OFFSET 0x27 #define QUERY_DEV_CAP_MAX_REQ_QP_OFFSET 0x29 #define QUERY_DEV_CAP_MAX_RES_QP_OFFSET 0x2b +#define QUERY_DEV_CAP_MAX_GSO_OFFSET 0x2d #define QUERY_DEV_CAP_MAX_RDMA_OFFSET 0x2f #define QUERY_DEV_CAP_RSZ_SRQ_OFFSET 0x33 #define QUERY_DEV_CAP_ACK_DELAY_OFFSET 0x35 @@ -215,6 +216,13 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_requester_per_qp = 1 << (field & 0x3f); MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_RES_QP_OFFSET); dev_cap->max_responder_per_qp = 1 << (field & 0x3f); + MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GSO_OFFSET); + field &= 0x1f; + if (!field) + dev_cap->max_gso_sz = 0; + else + dev_cap->max_gso_sz = 1 << field; + MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_RDMA_OFFSET); dev_cap->max_rdma_global = 1 << (field & 0x3f); MLX4_GET(field, outbox, QUERY_DEV_CAP_ACK_DELAY_OFFSET); @@ -377,6 +385,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_sq_desc_sz, dev_cap->max_sq_sg); mlx4_dbg(dev, "Max RQ desc size: %d, max RQ S/G: %d\n", dev_cap->max_rq_desc_sz, dev_cap->max_rq_sg); + mlx4_dbg(dev, "Max GSO size: %d\n", dev_cap->max_gso_sz); 
dump_dev_cap_flags(dev, dev_cap->flags); diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index e16dec8..306cb9b 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -96,6 +96,7 @@ struct mlx4_dev_cap { u8 bmme_flags; u32 reserved_lkey; u64 max_icm_sz; + int max_gso_sz; }; struct mlx4_adapter { diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 08bfc13..7cfbe75 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -159,6 +159,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.page_size_cap = ~(u32) (dev_cap->min_page_sz - 1); dev->caps.flags = dev_cap->flags; dev->caps.stat_rate_support = dev_cap->stat_rate_support; + dev->caps.max_gso_sz = dev_cap->max_gso_sz; return 0; } diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 6cdf813..ff7df1a 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -186,6 +186,7 @@ struct mlx4_caps { u32 flags; u16 stat_rate_support; u8 port_width_cap[MLX4_MAX_PORTS + 1]; + int max_gso_sz; }; struct mlx4_buf_list { diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h index 31f9eb3..cf0bf4e 100644 --- a/include/linux/mlx4/qp.h +++ b/include/linux/mlx4/qp.h @@ -219,6 +219,11 @@ struct mlx4_wqe_datagram_seg { __be32 reservd[2]; }; +struct mlx4_lso_seg { + __be32 mss_hdr_size; + __be32 header[0]; +}; + struct mlx4_wqe_bind_seg { __be32 flags1; __be32 flags2; -- 1.5.4.5 From rdreier at cisco.com Tue Apr 1 12:41:26 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 12:41:26 -0700 Subject: [ofa-general] Re: [PATCH 3/10] IB/core: Add LSO support In-Reply-To: <1207064146.3781.19.camel@mtls03> (Eli Cohen's message of "Tue, 01 Apr 2008 18:35:46 +0300") References: <1205767431.25950.138.camel@mtls03> <1207064146.3781.19.camel@mtls03> Message-ID: > would like me to re-generate the mlx4 LSO patch to match this commit or > would you do the adjustments? Sorry for being so slow. 
Anyway I did the adjustments as below. I also removed the "reserve" variable and moved the 64 byte extra for LSO into send_wqe_overhead(), since it seemed that the only place where you used send_wqe_overhead() without adding in reserve was actually a bug. I also did various changes in other places, and maybe introduced a bug: when I try NPtcp between two systems (one running unmodified 2.6.25-rc8, the other running my for-2.6.26 branch, both with ConnectX with FW 2.3.000), on the side with the LSO patch, I eventually get a "local length error" or "local QP operation err" on a send. It is an LSO send of length 63744 with 17 fragments and an mss of 1992, so it should be segmented into 32 packets. Some of these sends complete successfully but eventually one fails. I'm still debugging but maybe you have some idea? When I get the local QP operation error, I get this in case it helps: local QP operation err (QPN 000048, WQE index affa, vendor syndrome 6f, opcode = 5e) CQE contents 00000048 00000000 00000000 00000000 00000000 00000000 affa6f02 0000005e - R. >From 141035c707b81638659ada01f456d066f2b353f7 Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Tue, 25 Mar 2008 15:35:12 +0200 Subject: [PATCH] IB/mlx4: Add IPoIB LSO support Add TSO support to the mlx4_ib driver.
Signed-off-by: Eli Cohen Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mlx4/cq.c | 3 + drivers/infiniband/hw/mlx4/main.c | 2 + drivers/infiniband/hw/mlx4/mlx4_ib.h | 5 ++ drivers/infiniband/hw/mlx4/qp.c | 72 +++++++++++++++++++++++++++++---- drivers/net/mlx4/fw.c | 9 ++++ drivers/net/mlx4/fw.h | 1 + drivers/net/mlx4/main.c | 1 + include/linux/mlx4/device.h | 1 + include/linux/mlx4/qp.h | 5 ++ 9 files changed, 90 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index d2e32b0..7d70af7 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -420,6 +420,9 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq, case MLX4_OPCODE_BIND_MW: wc->opcode = IB_WC_BIND_MW; break; + case MLX4_OPCODE_LSO: + wc->opcode = IB_WC_LSO; + break; } } else { wc->byte_len = be32_to_cpu(cqe->byte_cnt); diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 6ea4746..e9330a0 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -101,6 +101,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev, props->device_cap_flags |= IB_DEVICE_UD_AV_PORT_ENFORCE; if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_IPOIB_CSUM) props->device_cap_flags |= IB_DEVICE_UD_IP_CSUM; + if (dev->dev->caps.max_gso_sz) + props->device_cap_flags |= IB_DEVICE_UD_TSO; props->vendor_id = be32_to_cpup((__be32 *) (out_mad->data + 36)) & 0xffffff; diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 3726e45..3f8bd0a 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -110,6 +110,10 @@ struct mlx4_ib_wq { unsigned tail; }; +enum mlx4_ib_qp_flags { + MLX4_IB_QP_LSO = 1 << 0 +}; + struct mlx4_ib_qp { struct ib_qp ibqp; struct mlx4_qp mqp; @@ -129,6 +133,7 @@ struct mlx4_ib_qp { struct mlx4_mtt mtt; int buf_size; struct mutex mutex; + u32 flags; u8 port; u8 alt_port; u8 
atomic_rd_en; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 320c25f..8ddb97e 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -71,6 +71,7 @@ enum { static const __be32 mlx4_ib_opcode[] = { [IB_WR_SEND] = __constant_cpu_to_be32(MLX4_OPCODE_SEND), + [IB_WR_LSO] = __constant_cpu_to_be32(MLX4_OPCODE_LSO), [IB_WR_SEND_WITH_IMM] = __constant_cpu_to_be32(MLX4_OPCODE_SEND_IMM), [IB_WR_RDMA_WRITE] = __constant_cpu_to_be32(MLX4_OPCODE_RDMA_WRITE), [IB_WR_RDMA_WRITE_WITH_IMM] = __constant_cpu_to_be32(MLX4_OPCODE_RDMA_WRITE_IMM), @@ -242,7 +243,7 @@ static void mlx4_ib_qp_event(struct mlx4_qp *qp, enum mlx4_event type) } } -static int send_wqe_overhead(enum ib_qp_type type) +static int send_wqe_overhead(enum ib_qp_type type, u32 flags) { /* * UD WQEs must have a datagram segment. @@ -253,7 +254,8 @@ static int send_wqe_overhead(enum ib_qp_type type) switch (type) { case IB_QPT_UD: return sizeof (struct mlx4_wqe_ctrl_seg) + - sizeof (struct mlx4_wqe_datagram_seg); + sizeof (struct mlx4_wqe_datagram_seg) + + ((flags & MLX4_IB_QP_LSO) ? 64 : 0); case IB_QPT_UC: return sizeof (struct mlx4_wqe_ctrl_seg) + sizeof (struct mlx4_wqe_raddr_seg); @@ -315,7 +317,7 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, /* Sanity check SQ size before proceeding */ if (cap->max_send_wr > dev->dev->caps.max_wqes || cap->max_send_sge > dev->dev->caps.max_sq_sg || - cap->max_inline_data + send_wqe_overhead(type) + + cap->max_inline_data + send_wqe_overhead(type, qp->flags) + sizeof (struct mlx4_wqe_inline_seg) > dev->dev->caps.max_sq_desc_sz) return -EINVAL; @@ -329,7 +331,7 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg), cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) + - send_wqe_overhead(type); + send_wqe_overhead(type, qp->flags); /* * Hermon supports shrinking WQEs, such that a single work @@ -394,7 +396,8 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, } qp->sq.max_gs = ((qp->sq_max_wqes_per_wr << qp->sq.wqe_shift) - - send_wqe_overhead(type)) / sizeof (struct mlx4_wqe_data_seg); + send_wqe_overhead(type, qp->flags)) / + sizeof (struct mlx4_wqe_data_seg); qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) + (qp->sq.wqe_cnt << qp->sq.wqe_shift); @@ -503,6 +506,9 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, } else { qp->sq_no_prefetch = 0; + if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO) + qp->flags |= MLX4_IB_QP_LSO; + err = set_kernel_sq_size(dev, &init_attr->cap, init_attr->qp_type, qp); if (err) goto err; @@ -673,7 +679,11 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd, struct mlx4_ib_qp *qp; int err; - if (init_attr->create_flags) + /* We only support LSO, and only for kernel UD QPs.
*/ + if (init_attr->create_flags & ~IB_QP_CREATE_IPOIB_UD_LSO) + return ERR_PTR(-EINVAL); + if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO && + (pd->uobject || init_attr->qp_type != IB_QPT_UD)) return ERR_PTR(-EINVAL); switch (init_attr->qp_type) { @@ -879,10 +889,15 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, } } - if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI || - ibqp->qp_type == IB_QPT_UD) + if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI) context->mtu_msgmax = (IB_MTU_4096 << 5) | 11; - else if (attr_mask & IB_QP_PATH_MTU) { + else if (ibqp->qp_type == IB_QPT_UD) { + if (qp->flags & MLX4_IB_QP_LSO) + context->mtu_msgmax = (IB_MTU_4096 << 5) | + ilog2(dev->dev->caps.max_gso_sz); + else + context->mtu_msgmax = (IB_MTU_4096 << 5) | 11; + } else if (attr_mask & IB_QP_PATH_MTU) { if (attr->path_mtu < IB_MTU_256 || attr->path_mtu > IB_MTU_4096) { printk(KERN_ERR "path MTU (%u) is invalid\n", attr->path_mtu); @@ -1399,6 +1414,34 @@ static void __set_data_seg(struct mlx4_wqe_data_seg *dseg, struct ib_sge *sg) dseg->addr = cpu_to_be64(sg->addr); } +static int build_lso_seg(struct mlx4_lso_seg *wqe, struct ib_send_wr *wr, + struct mlx4_ib_qp *qp, unsigned *lso_seg_len) +{ + unsigned halign = ALIGN(wr->wr.ud.hlen, 16); + + /* + * This is a temporary limitation and will be removed in + * a forthcoming FW release: + */ + if (unlikely(wr->wr.ud.hlen > 60)) + return -EINVAL; + + if (unlikely(!(qp->flags & MLX4_IB_QP_LSO) && + wr->num_sge > qp->sq.max_gs - (halign >> 4))) + return -EINVAL; + + memcpy(wqe->header, wr->wr.ud.header, wr->wr.ud.hlen); + + /* make sure LSO header is written before overwriting stamping */ + wmb(); + + wqe->mss_hdr_size = cpu_to_be32((wr->wr.ud.mss - wr->wr.ud.hlen) << 16 | + wr->wr.ud.hlen); + + *lso_seg_len = halign; + return 0; +} + int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, struct ib_send_wr **bad_wr) { @@ -1412,6 +1455,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct
ib_send_wr *wr, unsigned ind; int uninitialized_var(stamp); int uninitialized_var(size); + unsigned seglen; int i; spin_lock_irqsave(&qp->sq.lock, flags); @@ -1490,6 +1534,16 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, set_datagram_seg(wqe, wr); wqe += sizeof (struct mlx4_wqe_datagram_seg); size += sizeof (struct mlx4_wqe_datagram_seg) / 16; + + if (wr->opcode == IB_WR_LSO) { + err = build_lso_seg(wqe, wr, qp, &seglen); + if (err) { + *bad_wr = wr; + goto out; + } + wqe += seglen; + size += seglen / 16; + } break; case IB_QPT_SMI: diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index f494c3e..d82f275 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -133,6 +133,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) #define QUERY_DEV_CAP_MAX_AV_OFFSET 0x27 #define QUERY_DEV_CAP_MAX_REQ_QP_OFFSET 0x29 #define QUERY_DEV_CAP_MAX_RES_QP_OFFSET 0x2b +#define QUERY_DEV_CAP_MAX_GSO_OFFSET 0x2d #define QUERY_DEV_CAP_MAX_RDMA_OFFSET 0x2f #define QUERY_DEV_CAP_RSZ_SRQ_OFFSET 0x33 #define QUERY_DEV_CAP_ACK_DELAY_OFFSET 0x35 @@ -215,6 +216,13 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_requester_per_qp = 1 << (field & 0x3f); MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_RES_QP_OFFSET); dev_cap->max_responder_per_qp = 1 << (field & 0x3f); + MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GSO_OFFSET); + field &= 0x1f; + if (!field) + dev_cap->max_gso_sz = 0; + else + dev_cap->max_gso_sz = 1 << field; + MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_RDMA_OFFSET); dev_cap->max_rdma_global = 1 << (field & 0x3f); MLX4_GET(field, outbox, QUERY_DEV_CAP_ACK_DELAY_OFFSET); @@ -377,6 +385,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_sq_desc_sz, dev_cap->max_sq_sg); mlx4_dbg(dev, "Max RQ desc size: %d, max RQ S/G: %d\n", dev_cap->max_rq_desc_sz, dev_cap->max_rq_sg); + mlx4_dbg(dev, "Max GSO size: %d\n", dev_cap->max_gso_sz); 
dump_dev_cap_flags(dev, dev_cap->flags); diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index e16dec8..306cb9b 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -96,6 +96,7 @@ struct mlx4_dev_cap { u8 bmme_flags; u32 reserved_lkey; u64 max_icm_sz; + int max_gso_sz; }; struct mlx4_adapter { diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 08bfc13..7cfbe75 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -159,6 +159,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.page_size_cap = ~(u32) (dev_cap->min_page_sz - 1); dev->caps.flags = dev_cap->flags; dev->caps.stat_rate_support = dev_cap->stat_rate_support; + dev->caps.max_gso_sz = dev_cap->max_gso_sz; return 0; } diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 6cdf813..ff7df1a 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -186,6 +186,7 @@ struct mlx4_caps { u32 flags; u16 stat_rate_support; u8 port_width_cap[MLX4_MAX_PORTS + 1]; + int max_gso_sz; }; struct mlx4_buf_list { diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h index 31f9eb3..cf0bf4e 100644 --- a/include/linux/mlx4/qp.h +++ b/include/linux/mlx4/qp.h @@ -219,6 +219,11 @@ struct mlx4_wqe_datagram_seg { __be32 reservd[2]; }; +struct mlx4_lso_seg { + __be32 mss_hdr_size; + __be32 header[0]; +}; + struct mlx4_wqe_bind_seg { __be32 flags1; __be32 flags2; -- 1.5.4.5 From sean.hefty at intel.com Tue Apr 1 12:50:15 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 1 Apr 2008 12:50:15 -0700 Subject: [ofa-general] RE: the port numbers in some of the rdmacm examples is a fixed value In-Reply-To: <47F2324C.9060002@dev.mellanox.co.il> References: <47EBBC81.4030501@dev.mellanox.co.il> <000101c89022$ce0b3d30$9c98070a@amr.corp.intel.com> <47EF2A80.1020804@dev.mellanox.co.il> <000101c8934b$265a46e0$37fc070a@amr.corp.intel.com> <47F2324C.9060002@dev.mellanox.co.il> Message-ID: 
<000001c89431$9c3a1660$9b37170a@amr.corp.intel.com> >O.k., I sent you one patch which contains: >1) typo fixes (in test name of error message) + spelling typos >2) start of port support to control the port numbers from the command line >(if you wish, I can supply two different patches) > >Only one minute of work is required to close this issue and fix the port >number support of the udaddy. Thanks - I'll separate the patches and finish them. From rdreier at cisco.com Tue Apr 1 12:59:21 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 12:59:21 -0700 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: <1206452112.25950.360.camel@mtls03> (Eli Cohen's message of "Tue, 25 Mar 2008 15:35:12 +0200") References: <1206452112.25950.360.camel@mtls03> Message-ID: > + halign = ALIGN(wr->wr.ud.hlen, 16); This doesn't seem connected to the problem I see, but is this correct? Suppose hlen is 48... then halign will be 48 but it really should be 64 I think. Do we really want halign = ALIGN(wr->wr.ud.hlen + sizeof *wqe, 16); instead? - R.
From rdreier at cisco.com Tue Apr 1 13:02:18 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 13:02:18 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) Message-ID: The 2.6.26 merge window will open soon, so it's time to review my plans for when it opens. As usual, patch review by non-me people is always welcome. Anyway, here are all the pending things that I'm aware of. As usual, if something isn't already in my tree and isn't listed below, I probably missed it or dropped it by mistake. Please remind me again in that case. Core: - I did a bunch of cleanups all over drivers/infiniband and the gcc and sparse warning noise is down to a pretty reasonable level. Further cleanups welcome of course. ULPs: - I merged Eli's IPoIB stateless offload changes for checksum offload and LSO. The interrupt moderation changes are next, and should not be a problem to merge. Please test IPoIB on all sorts of hardware! - Shirley's IPoIB 4 KB MTU changes. I expect these to make it in, although I would certainly appreciate review from Eli or anyone else. HW specific: - Vlad's mlx4 resize CQ support. Looks basically OK, so I think we should be able to get it in. - ipath support for 7220 HCAs. I don't expect any issues here once the patches appear. Here are a few topics that I believe will not be ready in time for the 2.6.26 window and will need to wait for 2.6.27 at least: - XRC. I still don't have a good feeling that we have settled on all the nuances of the ABI we want to expose to userspace for this, and ideally I would like to understand how ehca LL QPs fit into the picture as well. - Remove LLTX from IPoIB. I haven't had time to finish this yet, so I guess it will probably wait for 2.6.27 now... - Multiple CQ event vector support.
I still haven't seen any discussions about how ULPs or userspace apps should decide which vector to use, and hence no progress has been made since we deferred this during the 2.6.23 merge window. Here are all the patches I already have in my for-2.6.26 branch: Arthur Jones (4): IB/ipath: Fix sparse warning about pointer signedness IB/ipath: Misc sparse warning cleanup IB/ipath: Provide I/O bus speeds for diagnostic purposes IB/ipath: Fix link up LED display Dave Olson (4): IB/ipath: Make some constants chip-specific, related cleanup IB/ipath: Shared context code needs to be sure device is usable IB/ipath: Enable 4KB MTU IB/ipath: HW workaround for case where chip can send but not receive David Dillow (1): IB/srp: Enforce protocol limit on srp_sg_tablesize Eli Cohen (7): IPoIB: Use checksum offload support if available IB/mlx4: Add IPoIB checksum offload support IB/mthca: Add IPoIB checksum offload support IB/core: Add creation flags to struct ib_qp_init_attr IB/core: Add IPoIB UD LSO support IPoIB: Add LSO support IB/mlx4: Add IPoIB LSO support Harvey Harrison (1): IB: Replace remaining __FUNCTION__ occurrences with __func__ Hoang-Nam Nguyen (1): IB/ehca: Remove tgid checking John Gregor (1): IB/ipath: Head of Line blocking vs forward progress of user apps Julia Lawall (1): RDMA/iwcm: Test rdma_create_id() for IS_ERR rather than 0 Michael Albaugh (2): IB/ipath: Prevent link-recovery code from negating admin disable IB/ipath: EEPROM support for 7220 devices, robustness improvements, cleanup Ralph Campbell (11): IB/ipath: Fix byte order of pioavail in handle_errors() IB/ipath: Fix error recovery for send buffer status after chip freeze mode IB/ipath: Don't try to handle freeze mode HW errors if diagnostic mode IB/ipath: Make debug error message match the constraint that is checked for IB/ipath: Add code to support multiple link speeds and widths IB/ipath: Remove useless comments IB/ipath: Fix sanity checks on QP number of WRs and SGEs IB/ipath: Change the module author
IB/ipath: Remove some useless (void) casts IB/ipath: Make send buffers available for kernel if not allocated to user IB/ipath: Use PIO buffer for RC ACKs Robert P. J. Day (2): IB: Use shorter list_splice_init() for brevity RDMA/nes: Use more concise list_for_each_entry() Roland Dreier (28): IB/mthca: Formatting cleanups IB/mlx4: Convert "if(foo)" to "if (foo)" mlx4_core: Move opening brace of function onto a new line RDMA/amso1100: Don't use 0UL as a NULL pointer RDMA/cxgb3: IDR IDs are signed IB: Make struct ib_uobject.id a signed int IB/ipath: Fix sparse warning about shadowed symbol IB/mlx4: Endianness annotations IB/cm: Endianness annotations RDMA/ucma: Endian annotation RDMA/nes: Trivial endianness annotations RDMA/nes: Delete unused variables RDMA/amso1100: Start of endianness annotation RDMA/amso1100: Endian annotate mqsq allocator mlx4_core: Fix confusion between mlx4_event and mlx4_dev_event enums IB/uverbs: Don't store struct file * for event files IB/uverbs: Use alloc_file() instead of get_empty_filp() RDMA/nes: Remove redundant NULL check in nes_unregister_ofa_device() RDMA/nes: Remove unused nes_netdev_exit() function RDMA/nes: Use proper format and cast to print dma_addr_t RDMA/nes: Make symbols used only in a single source file static IB/ehca: Make symbols used only in a single source file static IB/core: Add support for "send with invalidate" work requests RDMA/amso1100: Add support for "send with invalidate" work requests IB/mthca: Avoid integer overflow when dealing with profile size IB/mthca: Avoid integer overflow when allocating huge ICM table IB/ipath: Fix PCI config write size used to clear linkctrl error bits RDMA/nes: Remove session_id from nes_cm stuff From sean.hefty at intel.com Tue Apr 1 13:03:32 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 1 Apr 2008 13:03:32 -0700 Subject: [ofa-general] Re: files preamble In-Reply-To: <15ddcffd0804011154o3fa5c18bu552952bbfce5902f@mail.gmail.com> References: 
<47E5EF49.9080506@voltaire.com> <47EC06AF.8000309@sun.com> <47EF6B57.40502@voltaire.com> <47EF7054.6020503@voltaire.com> <15ddcffd0804011154o3fa5c18bu552952bbfce5902f@mail.gmail.com> Message-ID: <000101c89433$76531b20$9b37170a@amr.corp.intel.com> >> I am very doubtful that you can remove it. >> Some of that is based on earlier work by IBM in DAPL which was submitted >> under 3 licenses. > > What?! As far as I know all these files were written by Sean from scratch. The code is not based on DAPL, and was written from scratch. I'm pretty sure that the 3 licenses are simple copy-paste mistakes. I started work on the rdma_cm at the same time that someone else on the list started working on it. (I can't recall who at the moment.) I'm guessing that the original file came with the wrong license for OFA, and I copied it into the other files without bothering to read it all that carefully. I don't know if this is something easily fixed or not. I'd have to search through the mail list archives to get more of the details. - Sean From rdreier at cisco.com Tue Apr 1 13:28:43 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 13:28:43 -0700 Subject: [ofa-general] [PATCH/RFC] IB/mlx4: Micro-optimize mlx4_ib_post_send() Message-ID: Rather than have build_mlx_header() return a negative value on failure and the length of the segments it builds on success, add a pointer parameter to return the length and return 0 on success.
This matches the calling convention used for build_lso_seg() and generates slightly smaller code -- eg, on 64-bit x86: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-19 (-19) function old new delta mlx4_ib_post_send 1999 1980 -19 Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mlx4/qp.c | 16 ++++++++-------- 1 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 8ddb97e..f805e8a 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -1200,7 +1200,7 @@ out: } static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, - void *wqe) + void *wqe, unsigned *mlx_seg_len) { struct ib_device *ib_dev = &to_mdev(sqp->qp.ibqp.device)->ib_dev; struct mlx4_wqe_mlx_seg *mlx = wqe; @@ -1321,7 +1321,9 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, i = 2; } - return ALIGN(i * sizeof (struct mlx4_wqe_inline_seg) + header_size, 16); + *mlx_seg_len = + ALIGN(i * sizeof (struct mlx4_wqe_inline_seg) + header_size, 16); + return 0; } static int mlx4_wq_overflow(struct mlx4_ib_wq *wq, int nreq, struct ib_cq *ib_cq) @@ -1548,15 +1550,13 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, case IB_QPT_SMI: case IB_QPT_GSI: - err = build_mlx_header(to_msqp(qp), wr, ctrl); - if (err < 0) { + err = build_mlx_header(to_msqp(qp), wr, ctrl, &seglen); + if (err) { *bad_wr = wr; goto out; } - wqe += err; - size += err / 16; - - err = 0; + wqe += seglen; + size += seglen / 16; break; default: -- 1.5.4.5 From rdreier at cisco.com Tue Apr 1 13:39:21 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 13:39:21 -0700 Subject: [ofa-general] Re: IB/core: Add creation flags to QPs In-Reply-To: (Hoang-Nam Nguyen's message of "Fri, 28 Mar 2008 19:53:07 +0100") References: Message-ID: > What is your recommendation wrt/ encoding scheme for qp_type and > create_flags? 
I don't think I know enough to make a pronouncement yet. Maybe someone can summarize the possibilities and see how they work for XRC, ehca LL, block-loopback, etc? Bumping ABI is painful but on the other hand an explosion of new verbs is ugly. So it's all going to be a tradeoff. - R. From clameter at sgi.com Tue Apr 1 13:55:33 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:33 -0700 Subject: [ofa-general] [patch 2/9] Move tlb flushing into free_pgtables References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205636.048829606@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: move_tlb_flush URL: From clameter at sgi.com Tue Apr 1 13:55:31 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:31 -0700 Subject: [ofa-general] [patch 0/9] [RFC] EMM Notifier V2 Message-ID: <20080401205531.986291575@sgi.com> [Note that I will be giving talks next week at the OpenFabrics Forum and at the Linux Collab Summit in Austin on memory pinning etc. It would be great if I could get some feedback on the approach then] V1->V2: - Additional optimizations in the VM - Convert vm spinlocks to rw sems. - Add XPMEM driver (requires sleeping in callbacks) - Add XPMEM example This patch implements a simple callback for device drivers that establish their own references to pages (KVM, GRU, XPmem, RDMA/Infiniband, DMA engines etc). These references are unknown to the VM (therefore external). With these callbacks it is possible for the device driver to release external references when the VM requests it. This enables swapping, page migration and allows support of remapping, permission changes etc etc for the externally mapped memory. With this functionality it becomes also possible to avoid pinning or mlocking pages (commonly done to stop the VM from unmapping device mapped pages). 
A device driver must subscribe to a process using emm_register_notifier(struct emm_notifier *, struct mm_struct *). The VM will then perform callbacks for operations that unmap or change permissions of pages in that address space. When the process terminates, the callback function is called with emm_release. Callbacks are performed before and after the unmapping action of the VM: emm_invalidate_start before, emm_invalidate_end after. The device driver must hold off establishing new references to pages in the specified range between a callback with emm_invalidate_start and the subsequent callback with emm_invalidate_end. This allows the VM to ensure that no concurrent driver actions are performed on an address range while it performs remapping or unmapping operations. This patchset contains additional modifications needed to ensure that the callbacks can sleep. For that purpose two key locks in the VM need to be converted to rw_sems. These patches are brand new, invasive and need extensive discussion and evaluation. The first patch alone may be applied if callbacks in atomic context are sufficient for a device driver (likely the case for KVM, GRU and simple DMA drivers). Following the VM modifications is the XPMEM device driver that allows sharing of memory between processes running on different instances of Linux. This is also a prototype. It is known to run the trivial sample programs included as the last patch. -- From clameter at sgi.com Tue Apr 1 13:55:34 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:34 -0700 Subject: [ofa-general] [patch 3/9] Convert i_mmap_lock to i_mmap_sem References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205636.312140500@sgi.com> An embedded and charset-unspecified text was scrubbed...
Name: emm_immap_sem URL: From clameter at sgi.com Tue Apr 1 13:55:38 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:38 -0700 Subject: [ofa-general] [patch 7/9] Locking rules for taking multiple mmap_sem locks. References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205637.230854375@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: xpmem_v003_lock-rule URL: From clameter at sgi.com Tue Apr 1 13:55:32 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:32 -0700 Subject: [ofa-general] [patch 1/9] EMM Notifier: The notifier calls References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205635.793766935@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: emm_notifier URL: From clameter at sgi.com Tue Apr 1 13:55:37 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:37 -0700 Subject: [ofa-general] [patch 6/9] This patch exports zap_page_range as it is needed by XPMEM. References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205637.025425911@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: xpmem_v003_export-zap_page_range URL: From clameter at sgi.com Tue Apr 1 13:55:36 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:36 -0700 Subject: [ofa-general] [patch 5/9] Convert anon_vma lock to rw_sem and refcount References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205636.777127252@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: emm_anon_vma_sem URL: From clameter at sgi.com Tue Apr 1 13:55:40 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:40 -0700 Subject: [ofa-general] [patch 9/9] XPMEM: Simple example References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205637.839049326@sgi.com> An embedded and charset-unspecified text was scrubbed... 
Name: xpmem_test URL: From clameter at sgi.com Tue Apr 1 13:55:35 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:35 -0700 Subject: [ofa-general] [patch 4/9] Remove tlb pointer from the parameters of unmap vmas References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205636.524832964@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: cleanup_unmap_vmas URL: From clameter at sgi.com Tue Apr 1 13:55:39 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 01 Apr 2008 13:55:39 -0700 Subject: [ofa-general] [patch 8/9] XPMEM: The device driver References: <20080401205531.986291575@sgi.com> Message-ID: <20080401205637.474020250@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: xpmem_v003_emm_SSI_v3 URL: From rdreier at cisco.com Tue Apr 1 14:24:09 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 14:24:09 -0700 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs Message-ID: In kernel commit c80cf84d ("IB/core: Add support for "send with invalidate" work requests"), which is currently queued for 2.6.26, I added support for send with invalidate work requests on the kernel side of things. This patch adds the matching support to libibverbs. There is one part that's a bit tricky: in ibv_cmd_query_device(), I added a bit of code to move IBV_DEVICE_SEND_W_INV to the reserved bit where it used to be. This is to make sure that the userspace low-level driver for the device in question really supports send with invalidate.
To see why this is necessary, suppose that we didn't do this and a user had a system with - a new kernel with a low-level driver that sets the IB_DEVICE_SEND_W_INV bit - a new libibverbs with send with invalidate support - an old userspace driver that has no send with invalidate support In this case send with invalidate requests would be silently turned into plain send requests with no way for an application to know this. With the approach in my patch, the application will not see IBV_DEVICE_SEND_W_INV set and hence should not use send with invalidate requests. This scheme means that low-level drivers that support send with invalidate should add some autoconf code that checks if IBV_DEVICE_KERNEL_SEND_W_INV is defined, and if so, compile in code in the query_device method that sets IBV_DEVICE_SEND_W_INV if ibv_cmd_query_device() returns IBV_DEVICE_KERNEL_SEND_W_INV set. This patch also adds enum values for a few more device capability bits defined in the kernel. Does this approach make sense to people? --- diff --git a/include/infiniband/kern-abi.h b/include/infiniband/kern-abi.h index 0db083a..ee799bb 100644 --- a/include/infiniband/kern-abi.h +++ b/include/infiniband/kern-abi.h @@ -592,6 +592,10 @@ struct ibv_kern_send_wr { __u32 remote_qkey; __u32 reserved; } ud; + struct { + __u32 rkey; + __u32 reserved; + } invalidate; } wr; }; diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h index a51bb9d..679386a 100644 --- a/include/infiniband/verbs.h +++ b/include/infiniband/verbs.h @@ -92,7 +92,18 @@ enum ibv_device_cap_flags { IBV_DEVICE_SYS_IMAGE_GUID = 1 << 11, IBV_DEVICE_RC_RNR_NAK_GEN = 1 << 12, IBV_DEVICE_SRQ_RESIZE = 1 << 13, - IBV_DEVICE_N_NOTIFY_CQ = 1 << 14 + IBV_DEVICE_N_NOTIFY_CQ = 1 << 14, + IBV_DEVICE_ZERO_STAG = 1 << 15, + /* + * IBV_DEVICE_KERNEL_SEND_W_INV is used by libibverbs to + * signal to low-level driver libraries that the kernel set + * the "send with invalidate" capability bit.
Applications + * should only test IBV_DEVICE_SEND_W_INV and never look at + * IBV_DEVICE_KERNEL_SEND_W_INV. + */ + IBV_DEVICE_KERNEL_SEND_W_INV = 1 << 16, + IBV_DEVICE_MEM_WINDOW = 1 << 17, + IBV_DEVICE_SEND_W_INV = 1 << 21 }; enum ibv_atomic_cap { @@ -492,7 +503,8 @@ enum ibv_send_flags { IBV_SEND_FENCE = 1 << 0, IBV_SEND_SIGNALED = 1 << 1, IBV_SEND_SOLICITED = 1 << 2, - IBV_SEND_INLINE = 1 << 3 + IBV_SEND_INLINE = 1 << 3, + IBV_SEND_INVALIDATE = 1 << 6 }; struct ibv_sge { @@ -525,6 +537,9 @@ struct ibv_send_wr { uint32_t remote_qpn; uint32_t remote_qkey; } ud; + struct { + uint32_t rkey; + } invalidate; } wr; }; diff --git a/src/cmd.c b/src/cmd.c index 9db8aa6..3e0ff0a 100644 --- a/src/cmd.c +++ b/src/cmd.c @@ -159,6 +159,17 @@ int ibv_cmd_query_device(struct ibv_context *context, device_attr->local_ca_ack_delay = resp.local_ca_ack_delay; device_attr->phys_port_cnt = resp.phys_port_cnt; + /* + * If the kernel driver says that it supports send with + * invalidate work requests, then move the flag to + * IBV_DEVICE_KERNEL_SEND_W_INV so that the low-level driver + * gets a chance to make sure it supports the operation as well. + */ + if (device_attr->device_cap_flags & IBV_DEVICE_SEND_W_INV) { + device_attr->device_cap_flags &= ~IBV_DEVICE_SEND_W_INV; + device_attr->device_cap_flags |= ~IBV_DEVICE_KERNEL_SEND_W_INV; + } + return 0; } @@ -859,6 +870,11 @@ int ibv_cmd_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr, i->wr.rdma.remote_addr; tmp->wr.rdma.rkey = i->wr.rdma.rkey; break; + case IBV_WR_SEND: + case IBV_WR_SEND_WITH_IMM: + tmp->wr.invalidate.rkey = + i->wr.invalidate.rkey; + break; case IBV_WR_ATOMIC_CMP_AND_SWP: case IBV_WR_ATOMIC_FETCH_AND_ADD: tmp->wr.atomic.remote_addr = From rdreier at cisco.com Tue Apr 1 14:37:04 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 14:37:04 -0700 Subject: [ofa-general] Distribution packaging? 
(was: [ewg] Re: [ANNOUNCE] librdmacm release 1.0.7) In-Reply-To: (Roland Dreier's message of "Sat, 29 Mar 2008 16:33:17 -0700") References: <000001c89105$a3da0ee0$b137170a@amr.corp.intel.com> Message-ID: By the way, the current status of my Debian and Fedora packaging efforts for userspace code that I use is the following: libibverbs: libmthca: libmlx4: librdmacm: Up-to-date packages included in Debian and Fedora. libipathverbs: I have Debian packaging prepared and I will probably submit it for inclusion in Debian soon. The spec file looks like it would only need minor changes for Fedora inclusion and if I have spare time I may work on getting it into Fedora (I use Debian for development but I'm not a Fedora user so my motivation for working on Fedora packages is not that great). libcxgb3: Current tarball release (1.1.4) is a snapshot of the raw development tree, not the output of "make dist". This makes packaging ugly. I have Debian packaging ready and the spec file looks close to what is needed for Fedora, so once a good release appears it shouldn't be too hard to get into distributions. libnes: No tarball release available. Same implication as libcxgb3: I have Debian packages ready to go once a good release appears, and the spec file probably wouldn't need too much work. Do other people find this work useful? I personally really like being able to install a new system and get up-to-date userspace packages without having to mess around with OFED or building by hand, and of course being able to do "aptitude upgrade" to update the versions on a system is very nice. If there is value to this, then it would be nice if I could get "official" releases made with "make dist" from the libcxgb3 and libnes maintainers -- this makes the job of getting packages into the upstream distribution much simpler. Also, since I am not much of a Fedora person, I wouldn't mind if other people claimed the job of getting packages into Fedora.
There is excellent step-by-step documentation at http://fedoraproject.org/wiki/PackageMaintainers/Join - R. From sean.hefty at intel.com Tue Apr 1 15:10:05 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 1 Apr 2008 15:10:05 -0700 Subject: [ofa-general] RE: Distribution packaging? (was: [ewg] Re: [ANNOUNCE] librdmacm release 1.0.7) In-Reply-To: References: <000001c89105$a3da0ee0$b137170a@amr.corp.intel.com> Message-ID: <001001c89445$23ff3040$9b37170a@amr.corp.intel.com> >Do other people find this work useful?
I personally really like being >able to install a new system and get up-to-date userspace packages >without having to mess around with OFED or building by hand, and of >course being able to do "aptitude upgrade" to update the versions on a >system is very nice. I don't use Fedora or Debian myself, but I appreciate that you create these packages. - Sean From swise at opengridcomputing.com Tue Apr 1 15:21:24 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 01 Apr 2008 17:21:24 -0500 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: References: Message-ID: <47F2B564.10203@opengridcomputing.com> looks ok to me. Roland Dreier wrote: > In kernel commit c80cf84d ("IB/core: Add support for "send with > invalidate" work requests"), which is currently queued for 2.6.26, I > added support for send with invalidate work requests on the kernel side > of things. This patch adds the matching support to libibverbs. > [...] > Does this approach make sense to people? > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From akepner at sgi.com Tue Apr 1 15:54:18 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Tue, 1 Apr 2008 15:54:18 -0700 Subject: [ofa-general] Re: [PATCH] libibmad/dump: support VLArb table size, fix printing In-Reply-To: <20080329121252.GY13708@sashak.voltaire.com> References: <20080329121252.GY13708@sashak.voltaire.com> Message-ID: <20080401225418.GF29410@sgi.com> On Sat, Mar 29, 2008 at 12:12:52PM +0000, Sasha Khapyorsky wrote: > > Add support for VLArb table size.
Fix printing, eliminate intermediate > buffers, some other cleanups. > > Signed-off-by: Sasha Khapyorsky > --- > > Arthur, could you try this? > .... Tested-by: Arthur Kepner Yes, I tried it (along with the infiniband-diags patch) and that fixes things. Thanks! Before the patch was applied, I'd get: # smpquery vlarb 2 # VLArbitration tables: Lid 2 port 0 LowCap 8 HighCap 8 # Low priority VL Arbitration Table: VL : |0x3 | WEIGHT: |0x3 | But the tables as reported by smpdump looked OK - the weird weights here are for experimentation, and they are correct: # smpdump 2 0x18 0x00010000 0000 0101 0201 0301 0003 0103 0203 0303 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 # smpdump 2 0x18 0x00030000 0002 0102 0202 0302 0008 0108 0208 0308 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 After the patch is applied, smpquery does what I'd expect: # smpquery vlarb 2 # VLArbitration tables: Lid 2 port 0 LowCap 8 HighCap 8 # Low priority VL Arbitration Table: VL : |0x0 |0x1 |0x2 |0x3 |0x0 |0x1 |0x2 |0x3 | WEIGHT: |0x0 |0x1 |0x1 |0x1 |0x3 |0x3 |0x3 |0x3 | # High priority VL Arbitration Table: VL : |0x0 |0x1 |0x2 |0x3 |0x0 |0x1 |0x2 |0x3 | WEIGHT: |0x2 |0x2 |0x2 |0x2 |0x8 |0x8 |0x8 |0x8 | -- Arthur From mashirle at us.ibm.com Tue Apr 1 09:55:55 2008 From: mashirle at us.ibm.com (Shirley Ma) Date: Tue, 01 Apr 2008 09:55:55 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: Message-ID: <1207068955.4593.45.camel@localhost.localdomain> On Tue, 2008-04-01 at 13:02 -0700, Roland Dreier wrote: > - Multiple CQ event vector support. I still haven't seen any > discussions about how ULPs or userspace apps should decide which > vector to use, and hence no progress has been made since we > deferred this during the 2.6.23 merge window. 
I did some prototyping for IPoIB to enable multiple CQ event support. I did see that the approach improved multiple link aggregation performance. I also see some customers' requirements in userspace. I will start the discussion as soon as possible, but it would most likely miss the 2.6.26 window. Thanks Shirley From rdreier at cisco.com Tue Apr 1 20:41:57 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 20:41:57 -0700 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: (Roland Dreier's message of "Tue, 01 Apr 2008 14:24:09 -0700") References: Message-ID: > @@ -525,6 +537,9 @@ struct ibv_send_wr { > uint32_t remote_qpn; > uint32_t remote_qkey; > } ud; > + struct { > + uint32_t rkey; > + } invalidate; > } wr; > }; Thinking about this a bit further... this doesn't work for iWARP "RDMA read with invalidate" work requests, since this is inside a union, so the invalidate rkey and the RDMA read remote_addr fields stomp on each other. And since we have to figure out how to marshal this into the kernel, that is a bit of a problem. Does anyone see a problem with putting the invalidate rkey inside the rdma part of the wr union as a new field? That is, @@ -513,6 +525,7 @@ struct ibv_send_wr { struct { uint64_t remote_addr; uint32_t rkey; + uint32_t invalidate_rkey; } rdma; and similar on the kernel side? (And there is no invalidate member of this union added) - R.
From rdreier at cisco.com Tue Apr 1 20:51:08 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 01 Apr 2008 20:51:08 -0700 Subject: [ofa-general] Re: [PATCH] core: check optional verbs before using them In-Reply-To: <200803311750.02916.dotanb@dev.mellanox.co.il> (Dotan Barak's message of "Mon, 31 Mar 2008 17:50:02 +0300") References: <200803311750.02916.dotanb@dev.mellanox.co.il> Message-ID: > Check that all optional verbs are implemented in the device > before using them. Some parts make sense, eg: > @@ -248,7 +248,9 @@ int ib_modify_srq(struct ib_srq *srq, > struct ib_srq_attr *srq_attr, > enum ib_srq_attr_mask srq_attr_mask) > { > - return srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL); > + return srq->device->modify_srq ? > + srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL) : > + -ENOSYS; on the other hand: > @@ -265,6 +267,9 @@ int ib_destroy_srq(struct ib_srq *srq) > struct ib_pd *pd; > int ret; > > + if (!srq->device->destroy_srq) > + return -ENOSYS; > + I think it's safe to assume that a driver that allows SRQs to be created will allow them to be destroyed, and code that destroys a non-existent SRQ is buggy. So I don't think this is worth it. Same for dealloc MW and dealloc FMR. The reg_phys_mr change is sane too. So I applied this: commit 3926318b1e52568b10a9275b34e0a1fdef6c10e8 Author: Dotan Barak Date: Mon Mar 31 17:50:02 2008 +0300 IB/core: Check optional verbs before using them Make sure that a device implements the modify_srq and reg_phys_mr optional methods before calling them. 
Signed-off-by: Dotan Barak Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 86ed8af..8ffb5f2 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -248,7 +248,9 @@ int ib_modify_srq(struct ib_srq *srq, struct ib_srq_attr *srq_attr, enum ib_srq_attr_mask srq_attr_mask) { - return srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL); + return srq->device->modify_srq ? + srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL) : + -ENOSYS; } EXPORT_SYMBOL(ib_modify_srq); @@ -672,6 +674,9 @@ struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, { struct ib_mr *mr; + if (!pd->device->reg_phys_mr) + return ERR_PTR(-ENOSYS); + mr = pd->device->reg_phys_mr(pd, phys_buf_array, num_phys_buf, mr_access_flags, iova_start); From dotanb at dev.mellanox.co.il Tue Apr 1 23:10:17 2008 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 02 Apr 2008 09:10:17 +0300 Subject: [ofa-general] Re: the port numbers in some of the rdmacm examples is a fixed value In-Reply-To: <000001c89431$9c3a1660$9b37170a@amr.corp.intel.com> References: <47EBBC81.4030501@dev.mellanox.co.il> <000101c89022$ce0b3d30$9c98070a@amr.corp.intel.com> <47EF2A80.1020804@dev.mellanox.co.il> <000101c8934b$265a46e0$37fc070a@amr.corp.intel.com> <47F2324C.9060002@dev.mellanox.co.il> <000001c89431$9c3a1660$9b37170a@amr.corp.intel.com> Message-ID: <47F32349.3080409@dev.mellanox.co.il> Sean Hefty wrote: >> O.k., I sent you one patch which contains: >> 1) typo fixes (in test name of error message) + spelling typos >> 2) start of port support to control the port numbers from the command line >> (if you wish, i can supply two different patches) >> >> Only a minute of work is required to close this issue and fix the port >> number support of the udaddy. >> > > Thanks - I'll separate the patches and finish them. > > great, thanks. Dotan From dotanb at dev.mellanox.co.il Tue Apr 1 23:11:53 2008 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 02 Apr 2008 09:11:53 +0300 Subject: [ofa-general] Re: [PATCH] core: check optional verbs before using them In-Reply-To: References: <200803311750.02916.dotanb@dev.mellanox.co.il> Message-ID: <47F323A9.5020701@dev.mellanox.co.il> I would like to protect against buggy SW as well, but you are right - kernel coding is for people who know what they are doing... thanks Dotan Roland Dreier wrote: > > Check that all optional verbs are implemented in the device > > before using them. > > Some parts make sense, eg: > > > @@ -248,7 +248,9 @@ int ib_modify_srq(struct ib_srq *srq, > > struct ib_srq_attr *srq_attr, > > enum ib_srq_attr_mask srq_attr_mask) > > { > > - return srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL); > > + return srq->device->modify_srq ?
> > + srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL) : > > + -ENOSYS; > > on the other hand: > > > @@ -265,6 +267,9 @@ int ib_destroy_srq(struct ib_srq *srq) > > struct ib_pd *pd; > > int ret; > > > > + if (!srq->device->destroy_srq) > > + return -ENOSYS; > > + > > I think it's safe to assume that a driver that allows SRQs to be created > will allow them to be destroyed, and code that destroys a non-existent > SRQ is buggy. So I don't think this is worth it. Same for dealloc MW > and dealloc FMR. > > The reg_phys_mr change is sane too. So I applied this: > > commit 3926318b1e52568b10a9275b34e0a1fdef6c10e8 > Author: Dotan Barak > Date: Mon Mar 31 17:50:02 2008 +0300 > > IB/core: Check optional verbs before using them > > Make sure that a device implements the modify_srq and reg_phys_mr > optional methods before calling them. > > Signed-off-by: Dotan Barak > Signed-off-by: Roland Dreier > > diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c > index 86ed8af..8ffb5f2 100644 > --- a/drivers/infiniband/core/verbs.c > +++ b/drivers/infiniband/core/verbs.c > @@ -248,7 +248,9 @@ int ib_modify_srq(struct ib_srq *srq, > struct ib_srq_attr *srq_attr, > enum ib_srq_attr_mask srq_attr_mask) > { > - return srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL); > + return srq->device->modify_srq ? 
> + srq->device->modify_srq(srq, srq_attr, srq_attr_mask, NULL) : > + -ENOSYS; > } > EXPORT_SYMBOL(ib_modify_srq); > > @@ -672,6 +674,9 @@ struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, > { > struct ib_mr *mr; > > + if (!pd->device->reg_phys_mr) > + return ERR_PTR(-ENOSYS); > + > mr = pd->device->reg_phys_mr(pd, phys_buf_array, num_phys_buf, > mr_access_flags, iova_start); > > > From andrea at qumranet.com Tue Apr 1 23:49:52 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 2 Apr 2008 08:49:52 +0200 Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: <20080401205635.793766935@sgi.com> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> Message-ID: <20080402064952.GF19189@duo.random> On Tue, Apr 01, 2008 at 01:55:32PM -0700, Christoph Lameter wrote: > +/* Perform a callback */ > +int __emm_notify(struct mm_struct *mm, enum emm_operation op, > + unsigned long start, unsigned long end) > +{ > + struct emm_notifier *e = rcu_dereference(mm)->emm_notifier; > + int x; > + > + while (e) { > + > + if (e->callback) { > + x = e->callback(e, mm, op, start, end); > + if (x) > + return x; There are much bigger issues besides the rcu safety in this patch; proper aging of the secondary mmu through access bits set by hardware is unfixable with this model (you would need to do age |= e->callback), which is the proof of why this isn't flexible enough by forcing the same parameters and retvals for all methods. No idea why you go for such an inferior solution that will never get the aging right and will likely fall apart if we add more methods in the future. For example the "switch" you have to add in xpmem_emm_notifier_callback doesn't look good; at least gcc may be able to optimize it with an array indexing simulating proper pointers to functions like in #v9. Most other patches will apply cleanly on top of my coming mmu notifiers #v10 that I hope will go in -mm.
For #v10 the only two left open issues to discuss are: 1) the moment you remove rcu_read_lock from the methods (my #v9 had rcu_read_lock so synchronize_rcu() in Jack's patch was working with my #v9) GRU has no way to ensure the methods will fire immediately after registering. To fix this race after removing the rcu_read_lock (to prepare for the later patches that allow the VM to schedule when the mmu notifier methods are invoked) I can replace rcu_read_lock with seqlock locking in the same way as I did in a previous patch posted here (seqlock_write around the registration method, and seqlock_read replaying all callbacks if the race happened). Then synchronize_rcu becomes unnecessary and the methods will be correctly replayed, allowing GRU not to corrupt memory after the registration method. EMM would also need a fix like this for GRU to be safe on top of EMM. Another less obviously safe approach is to allow the register method to succeed only when mm_users=1 and the task is single threaded. This way, if all the places where the mmu notifiers are invoked on the mm not by the current task are only doing invalidates after/before zapping ptes, and if the instantiation of new ptes is single threaded too, we shouldn't worry if we miss an invalidate for a pte that is zero and doesn't point to any physical page. In the places where current->mm != mm I'm using invalidate_page 99% of the time, and that only follows the ptep_clear_flush. The problem is the range_begin calls that will happen before zapping the pte in places where current->mm != mm. Unfortunately in my incremental patch where I move all invalidate_page outside of the PT lock to prepare for allowing sleeping inside the mmu notifiers, I used range_begin/end in places like try_to_unmap_cluster where current->mm != mm. In general this solution looks more fragile than the seqlock. 2) I'm uncertain how the driver can handle a range_end called before range_begin.
Also, multiple range_begin calls can happen in parallel, each later followed by a range_end, so a global seqlock that serializes the secondary mmu page fault will screw up (you can't seqlock_write in range_begin and sequnlock_write in range_end). The write side of the seqlock must be serialized, and calling seqlock_write twice in a row before any sequnlock operation will break. A recursive rwsem taken in range_begin and released in range_end seems to be the only way to stop the secondary mmu page faults. If I removed all range_begin/end in places where current->mm != mm, then I could as well bail out in mmu_notifier_register if mm_users != 1 to solve problem 2 too. My solution is this: I believe the driver is safe skipping a range_end, as long as range_end is followed by an invalidate event like in invalidate_range_end, so the driver can just keep a static value that records whether range_begin has ever happened, and simply return from range_end without doing anything if no range_begin ever happened. Notably I'll be trying to use range_begin in KVM too, so I have to deal with 2) as well. For Nick: the reason for using range_begin is supposedly an optimization, to guarantee that the last free of the page will happen outside the mmu_lock, so KVM, internally to the mmu_lock, is free to do: spin_lock(kvm->mmu_lock); put_page(); spte = nonpresent; flush secondary tlb(); spin_unlock(kvm->mmu_lock). The above ordering is unsafe if the page could ever reach the freelist before the tlb flush happened. The range_begin will take the mmu_lock and hold off new kvm page faults, allowing kvm to free as many pages as it wants, invalidate all ptes and only at the end do a single tlb flush, while still being allowed to madvise(don't need) or munmap parts of the memory mapped by sptes.
It's uncertain if the ordering should be changed to be robust against put_page putting the page in the freelist immediately, instead of using range_begin to serialize against the page going out of ptes immediately after put_page is called. If we go for a range_end-only usage of the mmu notifiers, kvm will need some reordering, and zapping a large number of ptes will require multiple tlb flushes, as the pages have to be pointed to by an array and the array is of limited size (the size of the array decides the frequency of the tlb flushes). The suggested usage of range_begin allows a single tlb flush for an unlimited number of sptes being zapped. From dotanb at dev.mellanox.co.il Wed Apr 2 00:39:35 2008 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 02 Apr 2008 10:39:35 +0300 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: References: Message-ID: <47F33837.60701@dev.mellanox.co.il> Roland Dreier wrote: > diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h > index a51bb9d..679386a 100644 > --- a/include/infiniband/verbs.h > +++ b/include/infiniband/verbs.h > @@ -92,7 +92,18 @@ enum ibv_device_cap_flags { > IBV_DEVICE_SYS_IMAGE_GUID = 1 << 11, > IBV_DEVICE_RC_RNR_NAK_GEN = 1 << 12, > IBV_DEVICE_SRQ_RESIZE = 1 << 13, > - IBV_DEVICE_N_NOTIFY_CQ = 1 << 14 > + IBV_DEVICE_N_NOTIFY_CQ = 1 << 14, > + IBV_DEVICE_ZERO_STAG = 1 << 15, > + /* > + * IBV_DEVICE_KERNEL_SEND_W_INV is used by libibverbs to > + * signal to low-level driver libraries that the kernel set > + * the "send with invalidate" capaibility bit. Applications > + * should only test IBV_DEVICE_SEND_W_INV and never look at > + * IBV_DEVICE_KERNEL_SEND_W_INV. > + */ > + IBV_DEVICE_KERNEL_SEND_W_INV = 1 << 16, > + IBV_DEVICE_MEM_WINDOW = 1 << 17, > + IBV_DEVICE_SEND_W_INV = 1 << 21 > }; > Why do you need the flag IBV_DEVICE_MEM_WINDOW?
If the value of device_attributes.num_mw is more than zero => the device supports memory windows, so I think this flag can be safely removed. > > enum ibv_atomic_cap { > @@ -492,7 +503,8 @@ enum ibv_send_flags { > IBV_SEND_FENCE = 1 << 0, > IBV_SEND_SIGNALED = 1 << 1, > IBV_SEND_SOLICITED = 1 << 2, > - IBV_SEND_INLINE = 1 << 3 > + IBV_SEND_INLINE = 1 << 3, > + IBV_SEND_INVALIDATE = 1 << 6 > }; > I think that the send & invalidate should be a new opcode instead of a send flag. Thanks Dotan From eli at dev.mellanox.co.il Wed Apr 2 03:04:39 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 02 Apr 2008 13:04:39 +0300 Subject: [ofa-general] Re: [PATCH 3/10] IB/core: Add LSO support In-Reply-To: References: <1205767431.25950.138.camel@mtls03> <1207064146.3781.19.camel@mtls03> Message-ID: <1207130679.3781.50.camel@mtls03> Oof, that was a bad one and the following patch fixes the problem. diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index f805e8a..4eaee27 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -255,7 +255,7 @@ static int send_wqe_overhead(enum ib_qp_type type, u32 flags) case IB_QPT_UD: return sizeof (struct mlx4_wqe_ctrl_seg) + sizeof (struct mlx4_wqe_datagram_seg) + - (flags & MLX4_IB_QP_LSO) ? 64 : 0; + ((flags & MLX4_IB_QP_LSO) ? 64 : 0); case IB_QPT_UC: return sizeof (struct mlx4_wqe_ctrl_seg) + sizeof (struct mlx4_wqe_raddr_seg); The explanation is this: since '+' has higher precedence than the '?:' operator, the expression evaluated is: sizeof (struct mlx4_wqe_ctrl_seg) + sizeof (struct mlx4_wqe_datagram_seg) + (flags & MLX4_IB_QP_LSO) which is always nonzero (hence true), so the value returned is 64. The parentheses around the '?:' give the desired result. On Tue, 2008-04-01 at 12:41 -0700, Roland Dreier wrote: > > would like me to re-generate the mlx4 LSO patch to match this commit or > > would you do the adjustments? > > Sorry for being so slow. > > Anyway I did the adjustments as below.
I also removed the "reserve" > variable and moved the 64 byte extra for LSO into send_wqe_overhead(), > since it seemed that the only place where you used send_wqe_overhead() > without adding in reserve was actually a bug. > > I also made various changes in other places, and maybe introduced a bug: > when I try NPtcp between two systems (one running unmodified > 2.6.25-rc8, the other running my for-2.6.26 branch, both with ConnectX > with FW 2.3.000), on the side with the LSO patch, I eventually get a > "local length error" or "local QP operation err" on a send. It is an > LSO send of length 63744 with 17 fragments and an mss of 1992, so it > should be segmented into 32 packets. Some of these sends complete > successfully but eventually one fails. I'm still debugging but maybe > you have some idea? > > When I get the local QP operation error, I get this in case it helps: > > local QP operation err (QPN 000048, WQE index affa, vendor syndrome 6f, opcode = 5e) > CQE contents 00000048 00000000 00000000 00000000 00000000 00000000 affa6f02 0000005e > > - R. > > From 141035c707b81638659ada01f456d066f2b353f7 Mon Sep 17 00:00:00 2001 > From: Eli Cohen > Date: Tue, 25 Mar 2008 15:35:12 +0200 > Subject: [PATCH] IB/mlx4: Add IPoIB LSO support > > Add TSO support to the mlx4_ib driver.
> > Signed-off-by: Eli Cohen > Signed-off-by: Roland Dreier > --- > drivers/infiniband/hw/mlx4/cq.c | 3 + > drivers/infiniband/hw/mlx4/main.c | 2 + > drivers/infiniband/hw/mlx4/mlx4_ib.h | 5 ++ > drivers/infiniband/hw/mlx4/qp.c | 72 +++++++++++++++++++++++++++++---- > drivers/net/mlx4/fw.c | 9 ++++ > drivers/net/mlx4/fw.h | 1 + > drivers/net/mlx4/main.c | 1 + > include/linux/mlx4/device.h | 1 + > include/linux/mlx4/qp.h | 5 ++ > 9 files changed, 90 insertions(+), 9 deletions(-) > > diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c > index d2e32b0..7d70af7 100644 > --- a/drivers/infiniband/hw/mlx4/cq.c > +++ b/drivers/infiniband/hw/mlx4/cq.c > @@ -420,6 +420,9 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq, > case MLX4_OPCODE_BIND_MW: > wc->opcode = IB_WC_BIND_MW; > break; > + case MLX4_OPCODE_LSO: > + wc->opcode = IB_WC_LSO; > + break; > } > } else { > wc->byte_len = be32_to_cpu(cqe->byte_cnt); > diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c > index 6ea4746..e9330a0 100644 > --- a/drivers/infiniband/hw/mlx4/main.c > +++ b/drivers/infiniband/hw/mlx4/main.c > @@ -101,6 +101,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev, > props->device_cap_flags |= IB_DEVICE_UD_AV_PORT_ENFORCE; > if (dev->dev->caps.flags & MLX4_DEV_CAP_FLAG_IPOIB_CSUM) > props->device_cap_flags |= IB_DEVICE_UD_IP_CSUM; > + if (dev->dev->caps.max_gso_sz) > + props->device_cap_flags |= IB_DEVICE_UD_TSO; > > props->vendor_id = be32_to_cpup((__be32 *) (out_mad->data + 36)) & > 0xffffff; > diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h > index 3726e45..3f8bd0a 100644 > --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h > +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h > @@ -110,6 +110,10 @@ struct mlx4_ib_wq { > unsigned tail; > }; > > +enum mlx4_ib_qp_flags { > + MLX4_IB_QP_LSO = 1 << 0 > +}; > + > struct mlx4_ib_qp { > struct ib_qp ibqp; > struct mlx4_qp mqp; > @@ -129,6 +133,7 @@ struct 
mlx4_ib_qp { > struct mlx4_mtt mtt; > int buf_size; > struct mutex mutex; > + u32 flags; > u8 port; > u8 alt_port; > u8 atomic_rd_en; > diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c > index 320c25f..8ddb97e 100644 > --- a/drivers/infiniband/hw/mlx4/qp.c > +++ b/drivers/infiniband/hw/mlx4/qp.c > @@ -71,6 +71,7 @@ enum { > > static const __be32 mlx4_ib_opcode[] = { > [IB_WR_SEND] = __constant_cpu_to_be32(MLX4_OPCODE_SEND), > + [IB_WR_LSO] = __constant_cpu_to_be32(MLX4_OPCODE_LSO), > [IB_WR_SEND_WITH_IMM] = __constant_cpu_to_be32(MLX4_OPCODE_SEND_IMM), > [IB_WR_RDMA_WRITE] = __constant_cpu_to_be32(MLX4_OPCODE_RDMA_WRITE), > [IB_WR_RDMA_WRITE_WITH_IMM] = __constant_cpu_to_be32(MLX4_OPCODE_RDMA_WRITE_IMM), > @@ -242,7 +243,7 @@ static void mlx4_ib_qp_event(struct mlx4_qp *qp, enum mlx4_event type) > } > } > > -static int send_wqe_overhead(enum ib_qp_type type) > +static int send_wqe_overhead(enum ib_qp_type type, u32 flags) > { > /* > * UD WQEs must have a datagram segment. > @@ -253,7 +254,8 @@ static int send_wqe_overhead(enum ib_qp_type type) > switch (type) { > case IB_QPT_UD: > return sizeof (struct mlx4_wqe_ctrl_seg) + > - sizeof (struct mlx4_wqe_datagram_seg); > + sizeof (struct mlx4_wqe_datagram_seg) + > + (flags & MLX4_IB_QP_LSO) ? 
64 : 0; > case IB_QPT_UC: > return sizeof (struct mlx4_wqe_ctrl_seg) + > sizeof (struct mlx4_wqe_raddr_seg); > @@ -315,7 +317,7 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, > /* Sanity check SQ size before proceeding */ > if (cap->max_send_wr > dev->dev->caps.max_wqes || > cap->max_send_sge > dev->dev->caps.max_sq_sg || > - cap->max_inline_data + send_wqe_overhead(type) + > + cap->max_inline_data + send_wqe_overhead(type, qp->flags) + > sizeof (struct mlx4_wqe_inline_seg) > dev->dev->caps.max_sq_desc_sz) > return -EINVAL; > > @@ -329,7 +331,7 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, > > s = max(cap->max_send_sge * sizeof (struct mlx4_wqe_data_seg), > cap->max_inline_data + sizeof (struct mlx4_wqe_inline_seg)) + > - send_wqe_overhead(type); > + send_wqe_overhead(type, qp->flags); > > /* > * Hermon supports shrinking WQEs, such that a single work > @@ -394,7 +396,8 @@ static int set_kernel_sq_size(struct mlx4_ib_dev *dev, struct ib_qp_cap *cap, > } > > qp->sq.max_gs = ((qp->sq_max_wqes_per_wr << qp->sq.wqe_shift) - > - send_wqe_overhead(type)) / sizeof (struct mlx4_wqe_data_seg); > + send_wqe_overhead(type, qp->flags)) / > + sizeof (struct mlx4_wqe_data_seg); > > qp->buf_size = (qp->rq.wqe_cnt << qp->rq.wqe_shift) + > (qp->sq.wqe_cnt << qp->sq.wqe_shift); > @@ -503,6 +506,9 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, > } else { > qp->sq_no_prefetch = 0; > > + if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO) > + qp->flags |= MLX4_IB_QP_LSO; > + > err = set_kernel_sq_size(dev, &init_attr->cap, init_attr->qp_type, qp); > if (err) > goto err; > @@ -673,7 +679,11 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd, > struct mlx4_ib_qp *qp; > int err; > > - if (init_attr->create_flags) > + /* We only support LSO, and only for kernel UD QPs. 
*/ > + if (init_attr->create_flags & ~IB_QP_CREATE_IPOIB_UD_LSO) > + return ERR_PTR(-EINVAL); > + if (init_attr->create_flags & IB_QP_CREATE_IPOIB_UD_LSO && > + (pd->uobject || init_attr->qp_type != IB_QPT_UD)) > return ERR_PTR(-EINVAL); > > switch (init_attr->qp_type) { > @@ -879,10 +889,15 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp, > } > } > > - if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI || > - ibqp->qp_type == IB_QPT_UD) > + if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI) > context->mtu_msgmax = (IB_MTU_4096 << 5) | 11; > - else if (attr_mask & IB_QP_PATH_MTU) { > + else if (ibqp->qp_type == IB_QPT_UD) { > + if (qp->flags & MLX4_IB_QP_LSO) > + context->mtu_msgmax = (IB_MTU_4096 << 5) | > + ilog2(dev->dev->caps.max_gso_sz); > + else > + context->mtu_msgmax = (IB_MTU_4096 << 5) | 11; > + } else if (attr_mask & IB_QP_PATH_MTU) { > if (attr->path_mtu < IB_MTU_256 || attr->path_mtu > IB_MTU_4096) { > printk(KERN_ERR "path MTU (%u) is invalid\n", > attr->path_mtu); > @@ -1399,6 +1414,34 @@ static void __set_data_seg(struct mlx4_wqe_data_seg *dseg, struct ib_sge *sg) > dseg->addr = cpu_to_be64(sg->addr); > } > > +static int build_lso_seg(struct mlx4_lso_seg *wqe, struct ib_send_wr *wr, > + struct mlx4_ib_qp *qp, unsigned *lso_seg_len) > +{ > + unsigned halign = ALIGN(wr->wr.ud.hlen, 16); > + > + /* > + * This is a temporary limitation and will be removed in > + * a forthcoming FW release: > + */ > + if (unlikely(wr->wr.ud.hlen) > 60) > + return -EINVAL; > + > + if (unlikely(!(qp->flags & MLX4_IB_QP_LSO) && > + wr->num_sge > qp->sq.max_gs - (halign >> 4))) > + return -EINVAL; > + > + memcpy(wqe->header, wr->wr.ud.header, wr->wr.ud.hlen); > + > + /* make sure LSO header is written before overwriting stamping */ > + wmb(); > + > + wqe->mss_hdr_size = cpu_to_be32((wr->wr.ud.mss - wr->wr.ud.hlen) << 16 | > + wr->wr.ud.hlen); > + > + *lso_seg_len = halign; > + return 0; > +} > + > int mlx4_ib_post_send(struct ib_qp *ibqp, struct 
ib_send_wr *wr, > struct ib_send_wr **bad_wr) > { > @@ -1412,6 +1455,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, > unsigned ind; > int uninitialized_var(stamp); > int uninitialized_var(size); > + unsigned seglen; > int i; > > spin_lock_irqsave(&qp->sq.lock, flags); > @@ -1490,6 +1534,16 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, > set_datagram_seg(wqe, wr); > wqe += sizeof (struct mlx4_wqe_datagram_seg); > size += sizeof (struct mlx4_wqe_datagram_seg) / 16; > + > + if (wr->opcode == IB_WR_LSO) { > + err = build_lso_seg(wqe, wr, qp, &seglen); > + if (err) { > + *bad_wr = wr; > + goto out; > + } > + wqe += seglen; > + size += seglen / 16; > + } > break; > > case IB_QPT_SMI: > diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c > index f494c3e..d82f275 100644 > --- a/drivers/net/mlx4/fw.c > +++ b/drivers/net/mlx4/fw.c > @@ -133,6 +133,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) > #define QUERY_DEV_CAP_MAX_AV_OFFSET 0x27 > #define QUERY_DEV_CAP_MAX_REQ_QP_OFFSET 0x29 > #define QUERY_DEV_CAP_MAX_RES_QP_OFFSET 0x2b > +#define QUERY_DEV_CAP_MAX_GSO_OFFSET 0x2d > #define QUERY_DEV_CAP_MAX_RDMA_OFFSET 0x2f > #define QUERY_DEV_CAP_RSZ_SRQ_OFFSET 0x33 > #define QUERY_DEV_CAP_ACK_DELAY_OFFSET 0x35 > @@ -215,6 +216,13 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) > dev_cap->max_requester_per_qp = 1 << (field & 0x3f); > MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_RES_QP_OFFSET); > dev_cap->max_responder_per_qp = 1 << (field & 0x3f); > + MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GSO_OFFSET); > + field &= 0x1f; > + if (!field) > + dev_cap->max_gso_sz = 0; > + else > + dev_cap->max_gso_sz = 1 << field; > + > MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_RDMA_OFFSET); > dev_cap->max_rdma_global = 1 << (field & 0x3f); > MLX4_GET(field, outbox, QUERY_DEV_CAP_ACK_DELAY_OFFSET); > @@ -377,6 +385,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap 
*dev_cap) > dev_cap->max_sq_desc_sz, dev_cap->max_sq_sg); > mlx4_dbg(dev, "Max RQ desc size: %d, max RQ S/G: %d\n", > dev_cap->max_rq_desc_sz, dev_cap->max_rq_sg); > + mlx4_dbg(dev, "Max GSO size: %d\n", dev_cap->max_gso_sz); > > dump_dev_cap_flags(dev, dev_cap->flags); > > diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h > index e16dec8..306cb9b 100644 > --- a/drivers/net/mlx4/fw.h > +++ b/drivers/net/mlx4/fw.h > @@ -96,6 +96,7 @@ struct mlx4_dev_cap { > u8 bmme_flags; > u32 reserved_lkey; > u64 max_icm_sz; > + int max_gso_sz; > }; > > struct mlx4_adapter { > diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c > index 08bfc13..7cfbe75 100644 > --- a/drivers/net/mlx4/main.c > +++ b/drivers/net/mlx4/main.c > @@ -159,6 +159,7 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) > dev->caps.page_size_cap = ~(u32) (dev_cap->min_page_sz - 1); > dev->caps.flags = dev_cap->flags; > dev->caps.stat_rate_support = dev_cap->stat_rate_support; > + dev->caps.max_gso_sz = dev_cap->max_gso_sz; > > return 0; > } > diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h > index 6cdf813..ff7df1a 100644 > --- a/include/linux/mlx4/device.h > +++ b/include/linux/mlx4/device.h > @@ -186,6 +186,7 @@ struct mlx4_caps { > u32 flags; > u16 stat_rate_support; > u8 port_width_cap[MLX4_MAX_PORTS + 1]; > + int max_gso_sz; > }; > > struct mlx4_buf_list { > diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h > index 31f9eb3..cf0bf4e 100644 > --- a/include/linux/mlx4/qp.h > +++ b/include/linux/mlx4/qp.h > @@ -219,6 +219,11 @@ struct mlx4_wqe_datagram_seg { > __be32 reservd[2]; > }; > > +struct mlx4_lso_seg { > + __be32 mss_hdr_size; > + __be32 header[0]; > +}; > + > struct mlx4_wqe_bind_seg { > __be32 flags1; > __be32 flags2; From sashak at voltaire.com Wed Apr 2 06:42:50 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 2 Apr 2008 13:42:50 +0000 Subject: [ofa-general] Re: [PATCH] libibmad/dump: support VLArb
table size, fix printing In-Reply-To: <20080401225418.GF29410@sgi.com> References: <20080329121252.GY13708@sashak.voltaire.com> <20080401225418.GF29410@sgi.com> Message-ID: <20080402134250.GH30617@sashak.voltaire.com> On 15:54 Tue 01 Apr , akepner at sgi.com wrote: > On Sat, Mar 29, 2008 at 12:12:52PM +0000, Sasha Khapyorsky wrote: > > > > Add support for VLArb table size. Fix printing, eliminate intermediate > > buffers, some other cleanups. > > > > Signed-off-by: Sasha Khapyorsky > > --- > > > > Arthur, could you try this? > > .... > > Tested-by: Arthur Kepner > > Yes, I tried it (along with the infiniband-diags patch) and > that fixes things. Thanks! Thanks for looking at this. I committed the fixes. Sasha From holt at sgi.com Wed Apr 2 03:59:25 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 2 Apr 2008 05:59:25 -0500 Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: <20080402064952.GF19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> Message-ID: <20080402105925.GC22493@sgi.com> On Wed, Apr 02, 2008 at 08:49:52AM +0200, Andrea Arcangeli wrote: > Most other patches will apply cleanly on top of my coming mmu > notifiers #v10 that I hope will go in -mm. > > For #v10 the only two left open issues to discuss are: Does your v10 allow sleeping inside the callbacks? Thanks, Robin From andrea at qumranet.com Wed Apr 2 04:16:51 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 2 Apr 2008 13:16:51 +0200 Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: <20080402105925.GC22493@sgi.com> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402105925.GC22493@sgi.com> Message-ID: <20080402111651.GN19189@duo.random> On Wed, Apr 02, 2008 at 05:59:25AM -0500, Robin Holt wrote: > On Wed, Apr 02, 2008 at 08:49:52AM +0200, Andrea Arcangeli wrote: > > Most other patches will apply cleanly on top of my coming mmu > > notifiers #v10 that I hope will go in -mm.
> > > > For #v10 the only two left open issues to discuss are: > > Does your v10 allow sleeping inside the callbacks? Yes if you apply all the patches. But not if you apply the first patch only; most patches in the EMM series will apply cleanly or with minor rejects to #v10 too. Christoph's further work to make EMM sleep capable looks very good and it's going to be 100% shared; it's also going to be a lot more controversial for merging than the two #v10 or EMM first patch. EMM also doesn't allow sleeping inside the callbacks if you only apply the first patch in the series. My priority is to get #v9 or the coming #v10 merged in -mm (only difference will be the replacement of rcu_read_lock with the seqlock to avoid breaking the synchronize_rcu in GRU code). I will mix seqlock with rcu ordered writes. EMM indeed breaks GRU by making synchronize_rcu a noop and by not providing any alternative (I will obsolete synchronize_rcu making it a noop instead). This assumes Jack used synchronize_rcu for whatever good reason. But this isn't the real strong point against EMM; adding seqlock to EMM is as easy as adding it to #v10 (admittedly with #v10 it is a bit easier because I didn't expand the hlist operations for zero gain like in EMM).
URL: From eli at dev.mellanox.co.il Wed Apr 2 04:41:33 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 02 Apr 2008 14:41:33 +0300 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: References: <1206452112.25950.360.camel@mtls03> Message-ID: <1207136493.3781.59.camel@mtls03> On Tue, 2008-04-01 at 12:59 -0700, Roland Dreier wrote: > > + halign = ALIGN(wr->wr.ud.hlen, 16); > > This doesn't seem connected to the problem I see, but is this correct? > Suppose hlen is 48... then halign will be 48 but it really should be > 64 I think. Do we really want > > halign = ALIGN(wr->wr.ud.hlen + sizeof *wqe, 16); > > instead? > I don't think so, at least in the case that hlen equals 48 which is a valid one since the total length used by the LSO segment would be 48 + 4 which requires 4 * 16 bytes chunks. If we'd use the above statement the send would fail. Anyway I think this function should look like this: static int build_lso_seg(struct mlx4_lso_seg *wqe, struct ib_send_wr *wr, struct mlx4_ib_qp *qp, unsigned *lso_seg_len) { unsigned halign = ALIGN(wr->wr.ud.hlen + 4, 16); /* * This is a temporary limitation and will be removed in * a forthcoming FW release: */ if (unlikely(halign > 64)) return -EINVAL; if (unlikely(!(qp->flags & MLX4_IB_QP_LSO) && wr->num_sge > qp->sq.max_gs - (halign >> 4))) return -EINVAL; memcpy(wqe->header, wr->wr.ud.header, wr->wr.ud.hlen); /* make sure LSO header is written before overwriting stamping */ wmb(); wqe->mss_hdr_size = cpu_to_be32((wr->wr.ud.mss - wr->wr.ud.hlen) << 16 | wr->wr.ud.hlen); *lso_seg_len = halign; return 0; } And also I suggest to use these too: @@ -1539,7 +1539,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, if (wr->opcode == IB_WR_LSO) { err = build_lso_seg(wqe, wr, qp, &seglen); - if (err) { + if (unlikely(err)) { *bad_wr = wr; goto out; } @@ -1551,7 +1551,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, case IB_QPT_SMI: case IB_QPT_GSI: err = 
build_mlx_header(to_msqp(qp), wr, ctrl, &seglen); - if (err) { + if (unlikely(err)) { *bad_wr = wr; goto out; } @@ -1594,7 +1594,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, */ wmb(); - if (wr->opcode < 0 || wr->opcode >= ARRAY_SIZE(mlx4_ib_opcode)) { + if (unlikely(wr->opcode < 0 || wr->opcode >= ARRAY_SIZE(mlx4_ib_opcode))) { err = -EINVAL; goto out; }
From eli at dev.mellanox.co.il Wed Apr 2 04:52:57 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 02 Apr 2008 14:52:57 +0300 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: References: Message-ID: <1207137177.3781.67.camel@mtls03> WRT fb9fbf7cc5301a914e099d95d8f9a46a34e58aee Since send with immediate and send with invalidate are mutually exclusive, wouldn't it make sense to use a union for both the immediate value and the invalidated rkey?
Also it seems like this commit touches code in both ib core and in hw drivers. From tziporet at dev.mellanox.co.il Wed Apr 2 05:31:32 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Wed, 02 Apr 2008 15:31:32 +0300 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: Message-ID: <47F37CA4.8000109@mellanox.co.il> Roland Dreier wrote: > Core: > > - I did a bunch of cleanups all over drivers/infiniband and the > gcc and sparse warning noise is down to a pretty reasonable level. > Further cleanups welcome of course. > We want to add send with invalidate & masked compare and swap. Eli will be able to send the patches next week, and since they are small I think they can be in for 2.6.26. > ULPs: > > - I merged Eli's IPoIB stateless offload changes for checksum > offload and LSO changes. The interrupt moderation changes are > next, and should not be a problem to merge. Please test IPoIB > on all sorts of hardware! > What about the split CQ for UD mode? It has improved IPoIB performance for small messages significantly. > > HW specific: > > mlx4 - we plan to send patches for the low-level driver only, to enable mlx4_en. These only affect our low-level driver. Should be ready next week. I hope these can get in too. > Here are a few topics that I believe will not be ready in time for the > 2.6.26 window and will need to wait for 2.6.27 at least: > > - XRC. I still don't have a good feeling that we have settled on all > the nuances of the ABI we want to expose to userspace for this, and > ideally I would like to understand how ehca LL QPs fit into the > picture as well. > I think we should try to push for XRC in 2.6.26, since there are already MPI implementations that use it, and this ties them to OFED only. Also, this feature is stable and is now being defined in the IBTA. Not taking it causes divergence between OFED, the kernel, and your libibverbs, and we wish to avoid such gaps.
Is there anything we can do to help and make it into 2.6.26? From jackm at dev.mellanox.co.il Wed Apr 2 06:15:44 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Wed, 2 Apr 2008 16:15:44 +0300 Subject: [ofa-general] [PATCH] mlx4: make firmware diagnostic counters available via sysfs Message-ID: <200804021615.44982.jackm@dev.mellanox.co.il> mlx4: make firmware diagnostic counters available via sysfs. Developed by: Gabi Liron of Mellanox. Signed-off-by: Jack Morgenstein --- Roland, Please queue this up for kernel 2.6.26. Thanks! Jack Index: infiniband/drivers/net/mlx4/fw.c =================================================================== --- infiniband.orig/drivers/net/mlx4/fw.c 2008-02-05 09:32:14.000000000 +0200 +++ infiniband/drivers/net/mlx4/fw.c 2008-04-02 16:06:05.000000000 +0300 @@ -827,3 +827,40 @@ int mlx4_NOP(struct mlx4_dev *dev) /* Input modifier of 0x1f means "finish as soon as possible." */ return mlx4_cmd(dev, 0, 0x1f, 0, MLX4_CMD_NOP, 100); } + +int mlx4_query_diag_counters(struct mlx4_dev *dev, int array_length, + int in_modifier, unsigned int in_offset[], + u32 counter_out[]) +{ + struct mlx4_cmd_mailbox *mailbox; + u32 *outbox; + u32 op_modifer = (u32)in_modifier; + int ret; + int i; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + outbox = mailbox->buf; + + ret = mlx4_cmd_box(dev, 0, mailbox->dma, 0, op_modifer, + MLX4_CMD_DIAG_RPRT, MLX4_CMD_TIME_CLASS_A); + if (ret) + goto out; + + for (i = 0; i < array_length; i++) { + if (in_offset[i] > MLX4_MAILBOX_SIZE) { + ret = -1; + goto out; + } + + MLX4_GET(counter_out[i], outbox, in_offset[i]); + } + ret = 0; + +out: + mlx4_free_cmd_mailbox(dev, mailbox); + return ret; +} +EXPORT_SYMBOL_GPL(mlx4_query_diag_counters); + Index: infiniband/include/linux/mlx4/device.h =================================================================== --- infiniband.orig/include/linux/mlx4/device.h 2008-02-10 16:33:29.000000000 +0200 +++ infiniband/include/linux/mlx4/device.h 2008-04-02 16:06:05.000000000
+0300 @@ -368,5 +368,8 @@ void mlx4_fmr_unmap(struct mlx4_dev *dev u32 *lkey, u32 *rkey); int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr); int mlx4_SYNC_TPT(struct mlx4_dev *dev); +int mlx4_query_diag_counters(struct mlx4_dev *melx4_dev, int array_length, + int in_modifier, unsigned int in_offset[], + u32 counter_out[]); #endif /* MLX4_DEVICE_H */ Index: infiniband/drivers/infiniband/hw/mlx4/main.c =================================================================== --- infiniband.orig/drivers/infiniband/hw/mlx4/main.c 2008-02-27 16:21:35.000000000 +0200 +++ infiniband/drivers/infiniband/hw/mlx4/main.c 2008-04-02 16:06:05.000000000 +0300 @@ -515,6 +515,155 @@ static struct class_device_attribute *ml &class_device_attr_board_id }; +/* + * create 2 functions (show, store) and a class_device_attribute struct + * pointing to the functions for _name + */ +#define CLASS_DEVICE_DIAG_CLR_RPRT_ATTR(_name, _offset, _in_mod) \ +static ssize_t store_rprt_##_name(struct class_device *cdev, \ + const char *buf, size_t length) { \ + return store_diag_rprt(cdev, buf, length, _offset, _in_mod); \ +} \ +static ssize_t show_rprt_##_name(struct class_device *cdev, char *buf) { \ + return show_diag_rprt(cdev, buf, _offset, _in_mod); \ +} \ +static CLASS_DEVICE_ATTR(_name, S_IRUGO | S_IWUGO, \ + show_rprt_##_name, store_rprt_##_name); + +/* + * create show function and a class_device_attribute struct pointing to + * the function for _name + */ +#define CLASS_DEVICE_DIAG_RPRT_ATTR(_name, _offset, _in_mod) \ +static ssize_t show_rprt_##_name(struct class_device *cdev, char *buf){ \ + return show_diag_rprt(cdev, buf, _offset, _in_mod); \ +} \ +static CLASS_DEVICE_ATTR(_name, S_IRUGO, show_rprt_##_name, NULL); + +static ssize_t show_diag_rprt(struct class_device *cdev, char *buf, + int offset, int in_mod) +{ + size_t ret = -1; + u32 counter_offset = offset; + u32 diag_counter = 0; + struct mlx4_ib_dev *dev = container_of(cdev, struct mlx4_ib_dev, + ib_dev.class_dev); + /* clear 
counters file, can't read it */ + if(offset < 0) + return sprintf(buf,"This file is write only\n"); + + ret = mlx4_query_diag_counters(dev->dev, 1, in_mod, &counter_offset, + &diag_counter); + if (ret < 0) + { + sprintf(buf,"Operation failed\n"); + return ret; + } + + return sprintf(buf,"%d\n", diag_counter); +} + +/* the store function is used for counter clear */ +static ssize_t store_diag_rprt(struct class_device *cdev, + const char *buf, size_t length, + int offset, int in_mod) +{ + size_t ret = -1; + u32 counter_offset = 0; + u32 diag_counter; + struct mlx4_ib_dev *dev = container_of(cdev, struct mlx4_ib_dev, + ib_dev.class_dev); + + ret = mlx4_query_diag_counters(dev->dev, 1, in_mod, &counter_offset, + &diag_counter); + if (ret) + return ret; + + return length; +} + +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_lle , 0x00, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_lle , 0x04, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_lqpoe , 0x08, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_lqpoe , 0x0C, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_leeoe , 0x10, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_leeoe , 0x14, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_lpe , 0x18, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_lpe , 0x1C, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_wrfe , 0x20, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_wrfe , 0x24, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_mwbe , 0x2C, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_bre , 0x34, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_lae , 0x38, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_rire , 0x44, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_rire , 0x48, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_rae , 0x4C, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_rae , 0x50, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_roe , 0x54, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_tree , 0x5C, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_rree , 0x64, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_rnr , 0x68, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_rnr , 0x6C, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_rabrte , 0x7C, 2); 
+CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_ieecne , 0x84, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_ieecse , 0x8C, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_oos , 0x100, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_oos , 0x104, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_mce , 0x108, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_rsync , 0x110, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(sq_num_rsync , 0x114, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_udsdprd , 0x118, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(rq_num_ucsdprd , 0x120, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(num_cqovf , 0x1A0, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(num_eqovf , 0x1A4, 2); +CLASS_DEVICE_DIAG_RPRT_ATTR(num_baddb , 0x1A8, 2); +CLASS_DEVICE_DIAG_CLR_RPRT_ATTR(clear_diag , -1 , 3); + +static struct attribute *diag_rprt_attrs[] = { + &class_device_attr_rq_num_lle.attr, + &class_device_attr_sq_num_lle.attr, + &class_device_attr_rq_num_lqpoe.attr, + &class_device_attr_sq_num_lqpoe.attr, + &class_device_attr_rq_num_leeoe.attr, + &class_device_attr_sq_num_leeoe.attr, + &class_device_attr_rq_num_lpe.attr, + &class_device_attr_sq_num_lpe.attr, + &class_device_attr_rq_num_wrfe.attr, + &class_device_attr_sq_num_wrfe.attr, + &class_device_attr_sq_num_mwbe.attr, + &class_device_attr_sq_num_bre.attr, + &class_device_attr_rq_num_lae.attr, + &class_device_attr_sq_num_rire.attr, + &class_device_attr_rq_num_rire.attr, + &class_device_attr_sq_num_rae.attr, + &class_device_attr_rq_num_rae.attr, + &class_device_attr_sq_num_roe.attr, + &class_device_attr_sq_num_tree.attr, + &class_device_attr_sq_num_rree.attr, + &class_device_attr_rq_num_rnr.attr, + &class_device_attr_sq_num_rnr.attr, + &class_device_attr_sq_num_rabrte.attr, + &class_device_attr_sq_num_ieecne.attr, + &class_device_attr_sq_num_ieecse.attr, + &class_device_attr_rq_num_oos.attr, + &class_device_attr_sq_num_oos.attr, + &class_device_attr_rq_num_mce.attr, + &class_device_attr_rq_num_rsync.attr, + &class_device_attr_sq_num_rsync.attr, + &class_device_attr_rq_num_udsdprd.attr, + 
&class_device_attr_rq_num_ucsdprd.attr, + &class_device_attr_num_cqovf.attr, + &class_device_attr_num_eqovf.attr, + &class_device_attr_num_baddb.attr, + &class_device_attr_clear_diag.attr, + NULL +}; + +static struct attribute_group diag_counters_group = { + .name = "diag_counters", + .attrs = diag_rprt_attrs +}; + static void *mlx4_ib_add(struct mlx4_dev *dev) { static int mlx4_ib_version_printed; @@ -638,8 +787,14 @@ static void *mlx4_ib_add(struct mlx4_dev goto err_reg; } + if(sysfs_create_group(&ibdev->ib_dev.class_dev.kobj, &diag_counters_group)) + goto err_diag; + return ibdev; +err_diag: + ib_unregister_device(&ibdev->ib_dev); + err_reg: ib_unregister_device(&ibdev->ib_dev); @@ -663,6 +818,8 @@ static void mlx4_ib_remove(struct mlx4_d struct mlx4_ib_dev *ibdev = ibdev_ptr; int p; + sysfs_remove_group(&ibdev->ib_dev.class_dev.kobj, &diag_counters_group); + for (p = 1; p <= dev->caps.num_ports; ++p) mlx4_CLOSE_PORT(dev, p); From holt at sgi.com Wed Apr 2 07:26:10 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 2 Apr 2008 09:26:10 -0500 Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: <20080402111651.GN19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402105925.GC22493@sgi.com> <20080402111651.GN19189@duo.random> Message-ID: <20080402142609.GD22493@sgi.com> I must have missed v10. Could you repost so I can build xpmem against it to see how it operates? To help reduce confusion, you should probably commandeer the patches from Christoph's set which you think are needed to make it sleep.
Thanks, Robin On Wed, Apr 02, 2008 at 01:16:51PM +0200, Andrea Arcangeli wrote: > On Wed, Apr 02, 2008 at 05:59:25AM -0500, Robin Holt wrote: > > On Wed, Apr 02, 2008 at 08:49:52AM +0200, Andrea Arcangeli wrote: > > > Most other patches will apply cleanly on top of my coming mmu > > > notifiers #v10 that I hope will go in -mm. > > > > > > For #v10 the only two left open issues to discuss are: > > > > Does your v10 allow sleeping inside the callbacks? > > Yes if you apply all the patches. But not if you apply the first patch > only, most patches in EMM serie will apply cleanly or with minor > rejects to #v10 too, Christoph's further work to make EEM sleep > capable looks very good and it's going to be 100% shared, it's also > going to be a lot more controversial for merging than the two #v10 or > EMM first patch. EMM also doesn't allow sleeping inside the callbacks > if you only apply the first patch in the serie. > > My priority is to get #v9 or the coming #v10 merged in -mm (only > difference will be the replacement of rcu_read_lock with the seqlock > to avoid breaking the synchronize_rcu in GRU code). I will mix seqlock > with rcu ordered writes. EMM indeed breaks GRU by making > synchronize_rcu a noop and by not providing any alternative (I will > obsolete synchronize_rcu making it a noop instead). This assumes Jack > used synchronize_rcu for whatever good reason. But this isn't the real > strong point against EMM, adding seqlock to EMM is as easy as adding > it to #v10 (admittedly with #v10 is a bit easier because I didn't > expand the hlist operations for zero gain like in EMM). 
> -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ From mashirle at us.ibm.com Wed Apr 2 00:22:12 2008 From: mashirle at us.ibm.com (Shirley Ma) Date: Wed, 02 Apr 2008 00:22:12 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: Message-ID: <1207120932.4593.47.camel@localhost.localdomain> What's the status of RDS? Thanks Shirley From rdreier at cisco.com Wed Apr 2 08:22:04 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 08:22:04 -0700 Subject: [ofa-general] Re: [PATCH 3/10] IB/core: Add LSO support In-Reply-To: <1207130679.3781.50.camel@mtls03> (Eli Cohen's message of "Wed, 02 Apr 2008 13:04:39 +0300") References: <1205767431.25950.138.camel@mtls03> <1207064146.3781.19.camel@mtls03> <1207130679.3781.50.camel@mtls03> Message-ID: > - (flags & MLX4_IB_QP_LSO) ? 64 : 0; > + ((flags & MLX4_IB_QP_LSO) ? 64 : 0); Ugh, thanks, I've rolled that up into the patch. Sorry for messing things up...
From rdreier at cisco.com Wed Apr 2 08:26:58 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 08:26:58 -0700 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: <1207136493.3781.59.camel@mtls03> (Eli Cohen's message of "Wed, 02 Apr 2008 14:41:33 +0300") References: <1206452112.25950.360.camel@mtls03> <1207136493.3781.59.camel@mtls03> Message-ID: Not sure I follow. Given that we have struct mlx4_lso_seg { __be32 mss_hdr_size; __be32 header[0]; }; I don't see much difference between my proposal > halign = ALIGN(wr->wr.ud.hlen + sizeof *wqe, 16); and yours > halign = ALIGN(wr->wr.ud.hlen + 4, 16); since isn't sizeof *wqe == 4? > I don't think so, at least in the case that hlen equals 48 which is a > valid one since the total length used by the LSO segment would be 48 + 4 > which requires 4 * 16 bytes chunks. If we'd use the above statement the > send would fail. But the point is that the current code would only bump the wqe pointer by 48 bytes and the last 4 bytes of the header would be overwritten by the next data segment. - R.
From rdreier at cisco.com Wed Apr 2 08:27:36 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 08:27:36 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: <1207120932.4593.47.camel@localhost.localdomain> (Shirley Ma's message of "Wed, 02 Apr 2008 00:22:12 -0700") References: <1207120932.4593.47.camel@localhost.localdomain> Message-ID: > What's the status of RDS? I've never seen any patches. I guess ask the RDS guys if/when they want to start working on getting RDS merged. - R. From eli at dev.mellanox.co.il Wed Apr 2 08:57:30 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 02 Apr 2008 18:57:30 +0300 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: References: <1206452112.25950.360.camel@mtls03> <1207136493.3781.59.camel@mtls03> Message-ID: <1207151850.3781.86.camel@mtls03> On Wed, 2008-04-02 at 08:26 -0700, Roland Dreier wrote: > Not sure I follow. Given that we have > > struct mlx4_lso_seg { > __be32 mss_hdr_size; > __be32 header[0]; > }; > > I don't see much difference between my proposal > > > halign = ALIGN(wr->wr.ud.hlen + sizeof *wqe, 16); > > and yours > > > halign = ALIGN(wr->wr.ud.hlen + 4, 16); > > since isn't sizeof *wqe == 4? Right, I missed that. > > > I don't think so, at least in the case that hlen equals 48 which is a > > valid one since the total length used by the LSO segment would be 48 + 4 > > which requires 4 * 16 bytes chunks. If we'd use the above statement the > send would fail. > > But the point is that the current code would only bump the wqe pointer > by 48 bytes and the last 4 bytes of the header would be overwritten by > the next data segment.
> Given the fact that sizeof *wqe == 4 then what you propose seems to be a correct approach. But I do think that this is equivalent but looks cleaner: - unsigned halign = ALIGN(wr->wr.ud.hlen, 16); + unsigned halign = ALIGN(wr->wr.ud.hlen + sizeof *wqe, 16); - if (unlikely(wr->wr.ud.hlen) > 60) + if (unlikely(halign > 64)) return -EINVAL;
From rdreier at cisco.com Wed Apr 2 09:02:57 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:02:57 -0700 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: <1207151850.3781.86.camel@mtls03> (Eli Cohen's message of "Wed, 02 Apr 2008 18:57:30 +0300") References: <1206452112.25950.360.camel@mtls03> <1207136493.3781.59.camel@mtls03> <1207151850.3781.86.camel@mtls03> Message-ID: > - unsigned halign = ALIGN(wr->wr.ud.hlen, 16); > + unsigned halign = ALIGN(wr->wr.ud.hlen + sizeof *wqe, 16); > > > - if (unlikely(wr->wr.ud.hlen) > 60) > + if (unlikely(halign > 64)) Sure, makes sense.
From rdreier at cisco.com Wed Apr 2 09:04:14 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:04:14 -0700 Subject: [ofa-general] [PATCH 6/10 v1] IB/mlx4: Add LSO support In-Reply-To: (Roland Dreier's message of "Wed, 02 Apr 2008 09:02:57 -0700") References: <1206452112.25950.360.camel@mtls03> <1207136493.3781.59.camel@mtls03> <1207151850.3781.86.camel@mtls03> Message-ID: > - if (unlikely(wr->wr.ud.hlen) > 60) > + if (unlikely(halign > 64)) heh, just noticed that we used to have unlikely(wr->wr.ud.hlen) compared to 60. So the annotation was messed up :) - R. From richard.frank at oracle.com Wed Apr 2 10:11:15 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 12:11:15 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: <1207120932.4593.47.camel@localhost.localdomain> Message-ID: <47F3BE33.4000204@oracle.com> What is the work we need to do here - I was thinking RDS should just work? Roland Dreier wrote: > > What's the status of RDS? > > I've never seen any patches. I guess ask the RDS guys if/when they want > to start working on getting RDS merged. > > - R.
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Wed Apr 2 09:15:36 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:15:36 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: <47F3BE33.4000204@oracle.com> (Richard Frank's message of "Wed, 02 Apr 2008 12:11:15 -0500") References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> Message-ID: > What is the work we need to do here - I was thinking RDS should just work ? Stuff doesn't get merged into the kernel on its own. If you want RDS upstream then the first step is to post patches in a form suitable for reviewing. Then respond to the review comments. The files Documentation/SubmittingPatches and to some extent Documentation/SubmittingDrivers in the kernel source have more info. - R. From richard.frank at oracle.com Wed Apr 2 10:18:36 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 12:18:36 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> Message-ID: <47F3BFEC.1000400@oracle.com> Yes, I see this is for pushing RDS upstream - but what about running RDS as is over IWARP NICs - that should just work right ? Roland Dreier wrote: > > What is the work we need to do here - I was thinking RDS should just work ? > > Stuff doesn't get merged into the kernel on its own. If you want RDS > upstream then the first step is to post patches in a form suitable for > reviewing. Then respond to the review comments. 
> > The files Documentation/SubmittingPatches and to some extent > Documentation/SubmittingDrivers in the kernel source have more info. > > - R. > From rdreier at cisco.com Wed Apr 2 09:19:17 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:19:17 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: <47F37CA4.8000109@mellanox.co.il> (Tziporet Koren's message of "Wed, 02 Apr 2008 15:31:32 +0300") References: <47F37CA4.8000109@mellanox.co.il> Message-ID: > We want to add send with invalidate & masked compare and swap. > Eli will be able to send the patches next week and since they are > small I think they can be in for 2.6.26 Send with invalidate should be OK. Let's see about the masked atomics stuff -- we have a ton of new verbs and I think we might want to slow down and make sure it all makes sense. > What about the split CQ for UD mode? It's improved the IPoIB > performance for small messages significantly. Oh yeah... I'll try to get that in too. > mlx4- we plan to send patches for the low level driver only to enable > mlx4_en. These only affect our low level driver. No problem in principle, let's see the actual patches. > I think we should try to push for XRC in 2.6.26 since there are > already MPI implementations that use it and this ties them to use OFED > only. > Also this feature is stable and now being defined in IBTA > Not taking it causes divergence between OFED and the kernel and your > libibverbs, and we wish to avoid such gaps. > Is there anything we can do to help and make it into 2.6.26? I don't have a good feeling that the user-kernel interface is well thought out, so I want to consider XRC + ehca LL stuff + new iWARP verbs and make sure we have something that makes sense for the future. - R.
From rdreier at cisco.com Wed Apr 2 09:21:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:21:23 -0700 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: <1207137177.3781.67.camel@mtls03> (Eli Cohen's message of "Wed, 02 Apr 2008 14:52:57 +0300") References: <1207137177.3781.67.camel@mtls03> Message-ID: > Since send with immediate and send with invalidate are mutually > exclusive, wouldn't it make sense to use a union for both the immediate > value and the invalidated rkey? maybe although that would be hard to do in libibverbs without changing the API. I'm not a big fan of anonymous unions and I don't see any other good way to do it. > Also it seems like this commit touches code in both ib core and in hw > drivers. Yes, I explained why in the changelog. - R. From richard.frank at oracle.com Wed Apr 2 10:24:07 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 12:24:07 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> Message-ID: <47F3C137.3070209@oracle.com> WRT to merging RDS into the kernel - our current plans are to wait to see RDS adopted by more than Oracle - before approaching the kernel community about inclusion of RDS. Roland Dreier wrote: > > What is the work we need to do here - I was thinking RDS should just work ? > > Stuff doesn't get merged into the kernel on its own. If you want RDS > upstream then the first step is to post patches in a form suitable for > reviewing. Then respond to the review comments. > > The files Documentation/SubmittingPatches and to some extent > Documentation/SubmittingDrivers in the kernel source have more info. > > - R. 
> From rdreier at cisco.com Wed Apr 2 09:25:35 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:25:35 -0700 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: <47F33837.60701@dev.mellanox.co.il> (Dotan Barak's message of "Wed, 02 Apr 2008 10:39:35 +0300") References: <47F33837.60701@dev.mellanox.co.il> Message-ID: > Why do you need the flag IBV_DEVICE_MEM_WINDOW? > If the value of device_attributes.num_mw is more than zero => the > device supports memory windows, so i think this flag > can be safely removed. OK, I'll delete it from the libibverbs changes. I guess we can kill it on the kernel side too. > I think that the send & invalidate should be a new opcode instead of a > send flag. That makes sense. All existing hardware seems to use a separate opcode in the HW WQE format, so it makes things cleaner to use a new opcode at the verbs API level too. I'll update my patches. Thanks, Roland From rdreier at cisco.com Wed Apr 2 09:26:26 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:26:26 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: <47F3BFEC.1000400@oracle.com> (Richard Frank's message of "Wed, 02 Apr 2008 12:18:36 -0500") References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> <47F3BFEC.1000400@oracle.com> Message-ID: > Yes, I see this is for pushing RDS upstream - but what about running > RDS as is over IWARP NICs - that should just work right ? No idea. It depends on whether you took into account the differences between IB and iWARP. Anyway that's not really what this thread was about. 
From richard.frank at oracle.com Wed Apr 2 10:28:44 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 12:28:44 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> <47F3BFEC.1000400@oracle.com> Message-ID: <47F3C24C.1090904@oracle.com> got it... Roland Dreier wrote: > > Yes, I see this is for pushing RDS upstream - but what about running > > RDS as is over IWARP NICs - that should just work right ? > > No idea. It depends on whether you took into account the differences > between IB and iWARP. Anyway that's not really what this thread was about. > From sweitzen at cisco.com Wed Apr 2 09:29:43 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 09:29:43 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what'sin infiniband.git) In-Reply-To: <47F3C137.3070209@oracle.com> References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> <47F3C137.3070209@oracle.com> Message-ID: > WRT to merging RDS into the kernel - our current plans are to wait to > see RDS adopted by more than Oracle - before approaching the kernel > community about inclusion of RDS. I've seen statements before from someone from Oracle that RDS was only for Oracle's use, for example, that person did not want netperf changed to support RDS. Scott Weitzenkamp SQA and Release Manager Data Center Access Engineering Cisco Systems From richard.frank at oracle.com Wed Apr 2 10:31:27 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 12:31:27 -0500 Subject: [ofa-general] Has anyone tried running RDS over 10GE / IWARP NICs ? Message-ID: <47F3C2EF.6010304@oracle.com> We'd appreciate some feed back on your experience and would like to sort out any issues ASAP. 
Rick From xma at us.ibm.com Wed Apr 2 09:36:12 2008 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 2 Apr 2008 09:36:12 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: <47F3C24C.1090904@oracle.com> Message-ID: > got it... Can the maintainer submit RDS patch for mainline kernel, in 2.6.26 or 2.6.27 window? It's hard for Distros pick this feature without mainline kernel acceptance. Thanks Shirley -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Wed Apr 2 09:37:53 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 02 Apr 2008 09:37:53 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: (Shirley Ma's message of "Wed, 2 Apr 2008 09:36:12 -0700") References: Message-ID: > Can the maintainer submit RDS patch for mainline kernel, in 2.6.26 or > 2.6.27 window? It's hard for Distros pick this feature without mainline > kernel acceptance. At least as a first order approximation, there is no chance of RDS being merged for 2.6.26 even if patches appear right this second... From richard.frank at oracle.com Wed Apr 2 10:37:45 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 12:37:45 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what'sin infiniband.git) In-Reply-To: References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> <47F3C137.3070209@oracle.com> Message-ID: <47F3C469.1020803@oracle.com> I believe there is a patch for NetPerf which supports RDS - although it may need to be updated - and submitted. The only prior discussion I can think of - was whether or not NetPerf exercises RDS as Oracle would. I'm not proposing that we should enhance NetPerf to do that (but that's OK with me). We created a tool rds-stress which does that. 
Scott Weitzenkamp (sweitzen) wrote: >> WRT to merging RDS into the kernel - our current plans are to wait to >> see RDS adopted by more than Oracle - before approaching the kernel >> community about inclusion of RDS. >> > > I've seen statements before from someone from Oracle that RDS was only > for Oracle's use, for example, that person did not want netperf changed > to support RDS. > > Scott Weitzenkamp > SQA and Release Manager > Data Center Access Engineering > Cisco Systems > From sweitzen at cisco.com Wed Apr 2 09:41:30 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 09:41:30 -0700 Subject: [ofa-general] RE: [rds-devel] Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F3C2EF.6010304@oracle.com> References: <47F3C2EF.6010304@oracle.com> Message-ID: Doesn't appear to work with Chelsio and OFED 1.3: [root at svbu-qa2950-1 counters]# ethtool -i eth2 driver: cxgb3 version: 1.0-ofed firmware-version: T 5.0.0 TP 1.1.0 bus-info: 0000:0b:00.0 [root at svbu-qa2950-1 counters]# ifconfig eth2 eth2 Link encap:Ethernet HWaddr 00:07:43:05:43:9F inet addr:192.168.0.198 Bcast:192.168.0.255 Mask:255.255.255.0 inet6 addr: fe80::207:43ff:fe05:439f/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:144770 errors:0 dropped:0 overruns:0 frame:0 TX packets:144781 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:207891512 (198.2 MiB) TX bytes:9348152 (8.9 MiB) Interrupt:169 Memory:fceff000-fcefffff [root at svbu-qa2950-1 counters]# rds-sink -s 192.168.0.198:22222 -i 1 rds-sink: Unable to bind socket: Cannot assign requested address Scott Weitzenkamp SQA and Release Manager Data Center Access Engineering Cisco Systems > -----Original Message----- > From: rds-devel-bounces at oss.oracle.com > [mailto:rds-devel-bounces at oss.oracle.com] On Behalf Of Richard Frank > Sent: Wednesday, April 02, 2008 10:31 AM > To: rds-devel at oss.oracle.com; [ofa_general] > Subject: [rds-devel]
Has anyone tried running RDS over 10GE / > IWARP NICs ? > > We'd appreciate some feed back on your experience and would > like to sort > out any issues ASAP. > > Rick > > _______________________________________________ > rds-devel mailing list > rds-devel at oss.oracle.com > http://oss.oracle.com/mailman/listinfo/rds-devel > From richard.frank at oracle.com Wed Apr 2 10:43:45 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 12:43:45 -0500 Subject: [ofa-general] Re: [rds-devel] Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: References: <47F3C2EF.6010304@oracle.com> Message-ID: <47F3C5D1.5000003@oracle.com> is the rds driver loaded (modprobe rds) Scott Weitzenkamp (sweitzen) wrote: > Does't appear to work with Chelsio and OFED 1.3: > > [root at svbu-qa2950-1 counters]# ethtool -i eth2 > driver: cxgb3 > version: 1.0-ofed > firmware-version: T 5.0.0 TP 1.1.0 > bus-info: 0000:0b:00.0 > [root at svbu-qa2950-1 counters]# ifconfig eth2 > eth2 Link encap:Ethernet HWaddr 00:07:43:05:43:9F > inet addr:192.168.0.198 Bcast:192.168.0.255 > Mask:255.255.255.0 > inet6 addr: fe80::207:43ff:fe05:439f/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:144770 errors:0 dropped:0 overruns:0 frame:0 > TX packets:144781 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:207891512 (198.2 MiB) TX bytes:9348152 (8.9 MiB) > Interrupt:169 Memory:fceff000-fcefffff > > [root at svbu-qa2950-1 counters]# rds-sink -s 192.168.0.198:22222 -i 1 > rds-sink: Unable to bind socket: Cannot assign requested address > > Scott Weitzenkamp > SQA and Release Manager > Data Center Access Engineering > Cisco Systems > > > > > >> -----Original Message----- >> From: rds-devel-bounces at oss.oracle.com >> [mailto:rds-devel-bounces at oss.oracle.com] On Behalf Of Richard Frank >> Sent: Wednesday, April 02, 2008 10:31 AM >> To: rds-devel at oss.oracle.com; [ofa_general] >> Subject: [rds-devel] Has anyone tried 
running RDS over 10GE / >> IWARP NICs ? >> >> We'd appreciate some feed back on your experience and would >> like to sort >> out any issues ASAP. >> >> Rick >> >> _______________________________________________ >> rds-devel mailing list >> rds-devel at oss.oracle.com >> http://oss.oracle.com/mailman/listinfo/rds-devel >> >> From sweitzen at cisco.com Wed Apr 2 09:46:48 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 09:46:48 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what'sin infiniband.git) In-Reply-To: <47F3C469.1020803@oracle.com> References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> <47F3C137.3070209@oracle.com> <47F3C469.1020803@oracle.com> Message-ID: Rich, On Nov 1, 2007, you wrote this to rds-devel: "Netperf is too simplistic in that all it seems to do is stream data in a simple loop. This is not how Oracle uses the IPC and again does not reflect what it would take to make UDP reliable. For this reason we are not interested in having Netperf support RDS and or seeing Netperf data." I would like to see RDS supported by existing common tools like netperf, iperf, etc. so we can easily compare how RDS performs to UDP for IPC models other than Oracle. Scott Weitzenkamp SQA and Release Manager Data Center Access Engineering Cisco Systems > -----Original Message----- > From: Richard Frank [mailto:richard.frank at oracle.com] > Sent: Wednesday, April 02, 2008 10:38 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Roland Dreier (rdreier); rds-devel at oss.oracle.com; > linux-kernel at vger.kernel.org; general at lists.openfabrics.org > Subject: Re: [ofa-general] InfiniBand/iWARP/RDMA merge plans > for 2.6.26 (what'sin infiniband.git) > > I believe there is a patch for NetPerf which supports RDS - > although it > may need to be updated - and submitted. > > The only prior discussion I can think of - was whether or not NetPerf > exercises RDS as Oracle would. 
> > I'm not proposing that we should enhance NetPerf to do that > (but that's > OK with me). > > We created a tool rds-stress which does that. > > Scott Weitzenkamp (sweitzen) wrote: > >> WRT to merging RDS into the kernel - our current plans are > to wait to > >> see RDS adopted by more than Oracle - before approaching > the kernel > >> community about inclusion of RDS. > >> > > > > I've seen statements before from someone from Oracle that > RDS was only > > for Oracle's use, for example, that person did not want > netperf changed > > to support RDS. > > > > Scott Weitzenkamp > > SQA and Release Manager > > Data Center Access Engineering > > Cisco Systems > > > From sweitzen at cisco.com Wed Apr 2 09:47:18 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 09:47:18 -0700 Subject: [ofa-general] RE: [rds-devel] Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F3C5D1.5000003@oracle.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> Message-ID: Yes, it's loaded, and dmesg says this: Registered RDS/ib transport Registered RDS/tcp transport NET: Registered protocol family 28 Scott > -----Original Message----- > From: Richard Frank [mailto:richard.frank at oracle.com] > Sent: Wednesday, April 02, 2008 10:44 AM > To: Scott Weitzenkamp (sweitzen) > Cc: rds-devel at oss.oracle.com; [ofa_general] > Subject: Re: [rds-devel] Has anyone tried running RDS over > 10GE / IWARP NICs ? 
> > is the rds driver loaded (modprobe rds) > > Scott Weitzenkamp (sweitzen) wrote: > > Does't appear to work with Chelsio and OFED 1.3: > > > > [root at svbu-qa2950-1 counters]# ethtool -i eth2 > > driver: cxgb3 > > version: 1.0-ofed > > firmware-version: T 5.0.0 TP 1.1.0 > > bus-info: 0000:0b:00.0 > > [root at svbu-qa2950-1 counters]# ifconfig eth2 > > eth2 Link encap:Ethernet HWaddr 00:07:43:05:43:9F > > inet addr:192.168.0.198 Bcast:192.168.0.255 > > Mask:255.255.255.0 > > inet6 addr: fe80::207:43ff:fe05:439f/64 Scope:Link > > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > > RX packets:144770 errors:0 dropped:0 overruns:0 frame:0 > > TX packets:144781 errors:0 dropped:0 overruns:0 carrier:0 > > collisions:0 txqueuelen:1000 > > RX bytes:207891512 (198.2 MiB) TX bytes:9348152 (8.9 MiB) > > Interrupt:169 Memory:fceff000-fcefffff > > > > [root at svbu-qa2950-1 counters]# rds-sink -s 192.168.0.198:22222 -i 1 > > rds-sink: Unable to bind socket: Cannot assign requested address > > > > Scott Weitzenkamp > > SQA and Release Manager > > Data Center Access Engineering > > Cisco Systems > > > > > > > > > > > >> -----Original Message----- > >> From: rds-devel-bounces at oss.oracle.com > >> [mailto:rds-devel-bounces at oss.oracle.com] On Behalf Of > Richard Frank > >> Sent: Wednesday, April 02, 2008 10:31 AM > >> To: rds-devel at oss.oracle.com; [ofa_general] > >> Subject: [rds-devel] Has anyone tried running RDS over 10GE / > >> IWARP NICs ? > >> > >> We'd appreciate some feed back on your experience and would > >> like to sort > >> out any issues ASAP. 
> >> > >> Rick > >> > >> _______________________________________________ > >> rds-devel mailing list > >> rds-devel at oss.oracle.com > >> http://oss.oracle.com/mailman/listinfo/rds-devel > >> > >> > From weiny2 at llnl.gov Wed Apr 2 09:50:57 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Wed, 2 Apr 2008 09:50:57 -0700 Subject: [ofa-general] Reminder: OpenSM BOF at OFA Sonoma Workshop Message-ID: <20080402095057.360cbff1.weiny2@llnl.gov> Just a reminder that we are going to have a BOF for OpenSM Monday the 7th at 6:30pm; room is TBA. Please come and share your use, experience and desires for OpenSM. Or if you have yet to try OpenSM, listen in on what others are doing with it. Thanks, Ira Weiny Comp Sci./Math Prog. Lawrence Livermore National Lab weiny2 at llnl.gov From richard.frank at oracle.com Wed Apr 2 11:00:27 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 13:00:27 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what'sin infiniband.git) In-Reply-To: References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> <47F3C137.3070209@oracle.com> <47F3C469.1020803@oracle.com> Message-ID: <47F3C9BB.5040009@oracle.com> OK - and the conversation was about using NetPerf to compare performance of RDS to UDP relative to suitability for Oracle use ... so I think those statements still illustrate my points... 1) NetPerf does not do what Oracle does - and hence is not useful from Oracle's perspective in comparing ULPs. 2) For some metrics - it's not valid to compare a non-reliable IPC to a reliable IPC - it's not an apples to apples comparison. Especially when the app is considered and what the app must do to use UDP vs RDS. I did not say that NetPerf should not be extended to support RDS - just that using it to do a comparison of ULPs to determine how well Oracle would run - is not what we (Oracle) would want - at least that was my intention.. 
Scott Weitzenkamp (sweitzen) wrote: > Rich, > > On Nov 1, 2007, you wrote this to rds-devel: > > "Netperf is too simplistic in that all it seems to do is stream data > in a > simple loop. This is not how Oracle uses the IPC and again does not > reflect what it would take to make UDP reliable. > > For this reason we are not interested in having Netperf support RDS > and > or seeing Netperf data." > > I would like to see RDS supported by existing common tools like netperf, > iperf, etc. so we can easily compare how RDS performs to UDP for IPC > models other than Oracle. > > Scott Weitzenkamp > SQA and Release Manager > Data Center Access Engineering > Cisco Systems > > > > > >> -----Original Message----- >> From: Richard Frank [mailto:richard.frank at oracle.com] >> Sent: Wednesday, April 02, 2008 10:38 AM >> To: Scott Weitzenkamp (sweitzen) >> Cc: Roland Dreier (rdreier); rds-devel at oss.oracle.com; >> linux-kernel at vger.kernel.org; general at lists.openfabrics.org >> Subject: Re: [ofa-general] InfiniBand/iWARP/RDMA merge plans >> for 2.6.26 (what'sin infiniband.git) >> >> I believe there is a patch for NetPerf which supports RDS - >> although it >> may need to be updated - and submitted. >> >> The only prior discussion I can think of - was whether or not NetPerf >> exercises RDS as Oracle would. >> >> I'm not proposing that we should enhance NetPerf to do that >> (but that's >> OK with me). >> >> We created a tool rds-stress which does that. >> >> Scott Weitzenkamp (sweitzen) wrote: >> >>>> WRT to merging RDS into the kernel - our current plans are >>>> >> to wait to >> >>>> see RDS adopted by more than Oracle - before approaching >>>> >> the kernel >> >>>> community about inclusion of RDS. >>>> >>>> >>> I've seen statements before from someone from Oracle that >>> >> RDS was only >> >>> for Oracle's use, for example, that person did not want >>> >> netperf changed >> >>> to support RDS. 
>>> >>> Scott Weitzenkamp >>> SQA and Release Manager >>> Data Center Access Engineering >>> Cisco Systems >>> >>> From sweitzen at cisco.com Wed Apr 2 10:04:23 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 10:04:23 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what'sin infiniband.git) In-Reply-To: <47F3C9BB.5040009@oracle.com> References: <1207120932.4593.47.camel@localhost.localdomain> <47F3BE33.4000204@oracle.com> <47F3C137.3070209@oracle.com> <47F3C469.1020803@oracle.com> <47F3C9BB.5040009@oracle.com> Message-ID: I'd like to see netperf comparisions of UDP_STREAM/UDP_RR vs RDS_STREAM/RDS_RR, does anyone have a patch that will apply cleanly to a recent netperf? Scott Weitzenkamp SQA and Release Manager Data Center Access Engineering Cisco Systems > -----Original Message----- > From: Richard Frank [mailto:richard.frank at oracle.com] > Sent: Wednesday, April 02, 2008 11:00 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Roland Dreier (rdreier); rds-devel at oss.oracle.com; > linux-kernel at vger.kernel.org; general at lists.openfabrics.org > Subject: Re: [ofa-general] InfiniBand/iWARP/RDMA merge plans > for 2.6.26 (what'sin infiniband.git) > > OK - and the conversation was about using NetPerf to compare > performance > of RDS to UDP relative to suitability for Oracle use ... so I think > those statements still illustrate my points... > > 1) NetPerf does not do what Oracle does - and hence is not > useful from > Oracle's perspective in comparing ULPs. > 2) For some metrics - it's not valid to compare a > non-reliable IPC to a > reliable IPC - it's not an apples to apples comparison. > Especially when > the app is considered and what the app must do to use UDP vs RDS. > > I did not say that NetPerf should not be extended to support > RDS - just > that using it to do a comparison of ULPs to determine how well Oracle > would run - is not what we (Oracle) would want - at least that was my > intention.. 
> > Scott Weitzenkamp (sweitzen) wrote: > > Rich, > > > > On Nov 1, 2007, you wrote this to rds-devel: > > > > "Netperf is too simplistic in that all it seems to do is > stream data > > in a > > simple loop. This is not how Oracle uses the IPC and > again does not > > reflect what it would take to make UDP reliable. > > > > For this reason we are not interested in having Netperf > support RDS > > and > > or seeing Netperf data." > > > > I would like to see RDS supported by existing common tools > like netperf, > > iperf, etc. so we can easily compare how RDS performs to UDP for IPC > > models other than Oracle. > > > > Scott Weitzenkamp > > SQA and Release Manager > > Data Center Access Engineering > > Cisco Systems > > > > > > > > > > > >> -----Original Message----- > >> From: Richard Frank [mailto:richard.frank at oracle.com] > >> Sent: Wednesday, April 02, 2008 10:38 AM > >> To: Scott Weitzenkamp (sweitzen) > >> Cc: Roland Dreier (rdreier); rds-devel at oss.oracle.com; > >> linux-kernel at vger.kernel.org; general at lists.openfabrics.org > >> Subject: Re: [ofa-general] InfiniBand/iWARP/RDMA merge plans > >> for 2.6.26 (what'sin infiniband.git) > >> > >> I believe there is a patch for NetPerf which supports RDS - > >> although it > >> may need to be updated - and submitted. > >> > >> The only prior discussion I can think of - was whether or > not NetPerf > >> exercises RDS as Oracle would. > >> > >> I'm not proposing that we should enhance NetPerf to do that > >> (but that's > >> OK with me). > >> > >> We created a tool rds-stress which does that. > >> > >> Scott Weitzenkamp (sweitzen) wrote: > >> > >>>> WRT to merging RDS into the kernel - our current plans are > >>>> > >> to wait to > >> > >>>> see RDS adopted by more than Oracle - before approaching > >>>> > >> the kernel > >> > >>>> community about inclusion of RDS. 
> >>>> > >>>> > >>> I've seen statements before from someone from Oracle that > >>> > >> RDS was only > >> > >>> for Oracle's use, for example, that person did not want > >>> > >> netperf changed > >> > >>> to support RDS. > >>> > >>> Scott Weitzenkamp > >>> SQA and Release Manager > >>> Data Center Access Engineering > >>> Cisco Systems > >>> > >>> > From richard.frank at oracle.com Wed Apr 2 11:03:53 2008 From: richard.frank at oracle.com (Richard Frank) Date: Wed, 02 Apr 2008 13:03:53 -0500 Subject: [ofa-general] Re: [rds-devel] Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F3C5D1.5000003@oracle.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> Message-ID: <47F3CA89.9080406@oracle.com> RDS does not run over regular 10G NICs - that appear as simple NICS - this was disabled in 1.3. For now we are interested in RDS over IWARP NICS - configured as accessible via the verbs interfaces. Richard Frank wrote: > is the rds driver loaded (modprobe rds) > > Scott Weitzenkamp (sweitzen) wrote: > >> Does't appear to work with Chelsio and OFED 1.3: >> >> [root at svbu-qa2950-1 counters]# ethtool -i eth2 >> driver: cxgb3 >> version: 1.0-ofed >> firmware-version: T 5.0.0 TP 1.1.0 >> bus-info: 0000:0b:00.0 >> [root at svbu-qa2950-1 counters]# ifconfig eth2 >> eth2 Link encap:Ethernet HWaddr 00:07:43:05:43:9F >> inet addr:192.168.0.198 Bcast:192.168.0.255 >> Mask:255.255.255.0 >> inet6 addr: fe80::207:43ff:fe05:439f/64 Scope:Link >> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >> RX packets:144770 errors:0 dropped:0 overruns:0 frame:0 >> TX packets:144781 errors:0 dropped:0 overruns:0 carrier:0 >> collisions:0 txqueuelen:1000 >> RX bytes:207891512 (198.2 MiB) TX bytes:9348152 (8.9 MiB) >> Interrupt:169 Memory:fceff000-fcefffff >> >> [root at svbu-qa2950-1 counters]# rds-sink -s 192.168.0.198:22222 -i 1 >> rds-sink: Unable to bind socket: Cannot assign requested address >> >> Scott Weitzenkamp >> SQA and Release Manager >> Data 
Center Access Engineering >> Cisco Systems >> >> >> >> >> >> >>> -----Original Message----- >>> From: rds-devel-bounces at oss.oracle.com >>> [mailto:rds-devel-bounces at oss.oracle.com] On Behalf Of Richard Frank >>> Sent: Wednesday, April 02, 2008 10:31 AM >>> To: rds-devel at oss.oracle.com; [ofa_general] >>> Subject: [rds-devel] Has anyone tried running RDS over 10GE / >>> IWARP NICs ? >>> >>> We'd appreciate some feed back on your experience and would >>> like to sort >>> out any issues ASAP. >>> >>> Rick >>> >>> _______________________________________________ >>> rds-devel mailing list >>> rds-devel at oss.oracle.com >>> http://oss.oracle.com/mailman/listinfo/rds-devel >>> >>> >>> > > _______________________________________________ > rds-devel mailing list > rds-devel at oss.oracle.com > http://oss.oracle.com/mailman/listinfo/rds-devel > From sweitzen at cisco.com Wed Apr 2 10:09:14 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 10:09:14 -0700 Subject: [ofa-general] RE: [rds-devel] Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F3CA89.9080406@oracle.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> Message-ID: Yes, it's an iWARP NIC, and the OFED 1.3 perftest ib_rdma_lat program is working. Scott > -----Original Message----- > From: Richard Frank [mailto:richard.frank at oracle.com] > Sent: Wednesday, April 02, 2008 11:04 AM > To: Scott Weitzenkamp (sweitzen) > Cc: rds-devel at oss.oracle.com; [ofa_general] > Subject: Re: [rds-devel] Has anyone tried running RDS over > 10GE / IWARP NICs ? > > RDS does not run over regular 10G NICs - that appear as simple NICS - > this was disabled in 1.3. > > For now we are interested in RDS over IWARP NICS - configured as > accessible via the verbs interfaces. 
> > Richard Frank wrote: > > is the rds driver loaded (modprobe rds) > > > > Scott Weitzenkamp (sweitzen) wrote: > > > >> Does't appear to work with Chelsio and OFED 1.3: > >> > >> [root at svbu-qa2950-1 counters]# ethtool -i eth2 > >> driver: cxgb3 > >> version: 1.0-ofed > >> firmware-version: T 5.0.0 TP 1.1.0 > >> bus-info: 0000:0b:00.0 > >> [root at svbu-qa2950-1 counters]# ifconfig eth2 > >> eth2 Link encap:Ethernet HWaddr 00:07:43:05:43:9F > >> inet addr:192.168.0.198 Bcast:192.168.0.255 > >> Mask:255.255.255.0 > >> inet6 addr: fe80::207:43ff:fe05:439f/64 Scope:Link > >> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > >> RX packets:144770 errors:0 dropped:0 overruns:0 frame:0 > >> TX packets:144781 errors:0 dropped:0 overruns:0 carrier:0 > >> collisions:0 txqueuelen:1000 > >> RX bytes:207891512 (198.2 MiB) TX bytes:9348152 > (8.9 MiB) > >> Interrupt:169 Memory:fceff000-fcefffff > >> > >> [root at svbu-qa2950-1 counters]# rds-sink -s 192.168.0.198:22222 -i 1 > >> rds-sink: Unable to bind socket: Cannot assign requested address > >> > >> Scott Weitzenkamp > >> SQA and Release Manager > >> Data Center Access Engineering > >> Cisco Systems > >> > >> > >> > >> > >> > >> > >>> -----Original Message----- > >>> From: rds-devel-bounces at oss.oracle.com > >>> [mailto:rds-devel-bounces at oss.oracle.com] On Behalf Of > Richard Frank > >>> Sent: Wednesday, April 02, 2008 10:31 AM > >>> To: rds-devel at oss.oracle.com; [ofa_general] > >>> Subject: [rds-devel] Has anyone tried running RDS over 10GE / > >>> IWARP NICs ? > >>> > >>> We'd appreciate some feed back on your experience and would > >>> like to sort > >>> out any issues ASAP. 
> >>> > >>> Rick > >>> > >>> _______________________________________________ > >>> rds-devel mailing list > >>> rds-devel at oss.oracle.com > >>> http://oss.oracle.com/mailman/listinfo/rds-devel > >>> > >>> > >>> > > > > _______________________________________________ > > rds-devel mailing list > > rds-devel at oss.oracle.com > > http://oss.oracle.com/mailman/listinfo/rds-devel > > > From Thomas.Talpey at netapp.com Wed Apr 2 10:21:39 2008 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Wed, 02 Apr 2008 13:21:39 -0400 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: <47F33837.60701@dev.mellanox.co.il> References: <47F33837.60701@dev.mellanox.co.il> Message-ID: At 03:39 AM 4/2/2008, Dotan Barak wrote: >If the value of device_attributes.num_mw is more than zero => the device >supports memory windows, so i think this flag >can be safely removed. I agree with removing the flag, but if you mean "max_mw", looking at the tree, there are a few problems with the > zero assertion. :-) drivers/infiniband/hw/ehca/ehca_hca.c 376: props->max_mw = min_t(unsigned, rblock->max_mw, INT_MAX); drivers/infiniband/hw/nes/nes_verbs.c 3915: props->max_mw = nesibdev->max_mr; Note, ehca may set it to huge negative values, and nes puts the wrong value in the attribute field! (typo?) The good news is, the AMSO1100 seems to get it right. ;-) I'm still looking to be able to test the NFS/RDMA client over memory windows. The code's all there in the RPC layer, just not in the providers. Tom. 
From Jeffrey.C.Becker at nasa.gov Wed Apr 2 10:41:58 2008 From: Jeffrey.C.Becker at nasa.gov (Jeff Becker) Date: Wed, 02 Apr 2008 10:41:58 -0700 Subject: [ofa-general] Spam on mailing list general@openib.org In-Reply-To: <47F1D77F.7030104@voltaire.com> References: <47EBB5A0.6030000@isomerica.net> <20080327152720.GB24509@cefeid.wcss.wroc.pl> <47EBBE4B.5090706@isomerica.net> <47EBCF04.6040208@nasa.gov> <47F11D54.601@nasa.gov> <47F1D77F.7030104@voltaire.com> Message-ID: <47F3C566.6000602@nasa.gov> Since I didn't hear any other votes, I commented out (turned off) the old openib lists. I can reinstate them if needed. I hope this helps. Thanks. -jeff Or Gerlitz wrote: > Jeff Becker wrote: >> Hi. Since valid stuff is being rejected, I reset the SPAM filter for >> general to its previous setting. As Tom noted, most of the spam comes >> through the old openib.org lists. Is there any reason to keep these? >> If not, I can see about turning them off in order to improve the >> situation. Thanks. > I vote for turning off the old openib.org lists, best if you can set > some automatic reply on them redirecting the sender to the > openfabrics.org lists, etc. > > Or. > From andrea at qumranet.com Wed Apr 2 10:50:58 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 2 Apr 2008 19:50:58 +0200 Subject: [ofa-general] Re: [patch 5/9] Convert anon_vma lock to rw_sem and refcount In-Reply-To: <20080401205636.777127252@sgi.com> References: <20080401205531.986291575@sgi.com> <20080401205636.777127252@sgi.com> Message-ID: <20080402175058.GR19189@duo.random> On Tue, Apr 01, 2008 at 01:55:36PM -0700, Christoph Lameter wrote: > This results in f.e. the Aim9 brk performance test to got down by 10-15%. I guess it's more likely because of overscheduling for small critical sections; did you count the total number of context switches? I guess there will be a lot more with your patch applied. 
That regression is a showstopper and it is the reason why I've suggested before to add a CONFIG_XPMEM or CONFIG_MMU_NOTIFIER_SLEEP config option to make the VM locks sleep capable only when XPMEM=y (PREEMPT_RT will enable it too). Thanks for doing the benchmark work! From clameter at sgi.com Wed Apr 2 10:59:50 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 10:59:50 -0700 (PDT) Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: <20080402064952.GF19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> Message-ID: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > There are much bigger issues besides the rcu safety in this patch, > proper aging of the secondary mmu through access bits set by hardware > is unfixable with this model (you would need to do age |= > e->callback), which is the proof of why this isn't flexibile enough by > forcing the same parameter and retvals for all methods. No idea why > you go for such inferior solution that will never get the aging right > and will likely fall apart if we add more methods in the future. There is always the possibility to add special functions in the same way as done in the mmu notifier series if it really becomes necessary. EMM in no way precludes that. Here, f.e., we can add a special emm_age() function that iterates differently and does the | for you. > For example the "switch" you have to add in > xpmem_emm_notifier_callback doesn't look good, at least gcc may be > able to optimize it with an array indexing simulating proper pointer > to function like in #v9. Actually the switch looks really good because it allows code to run for all callbacks like f.e. xpmem_tg_ref(). Otherwise the refcounting code would have to be added to each callback. > > Most other patches will apply cleanly on top of my coming mmu > notifiers #v10 that I hope will go in -mm. 
> > For #v10 the only two left open issues to discuss are: Did I see #v10? Could you start a new subject when you post please? Do not respond to some old message otherwise the threading will be wrong. > methods will be correctly replied allowing GRU not to corrupt > memory after the registration method. EMM would also need a fix > like this for GRU to be safe on top of EMM. How exactly does the GRU corrupt memory? > Another less obviously safe approach is to allow the register > method to succeed only when mm_users=1 and the task is single > threaded. This way if all the places where the mmu notifers aren't > invoked on the mm not by the current task, are only doing > invalidates after/before zapping ptes, if the istantiation of new > ptes is single threaded too, we shouldn't worry if we miss an > invalidate for a pte that is zero and doesn't point to any physical > page. In the places where current->mm != mm I'm using > invalidate_page 99% of the time, and that only follows the > ptep_clear_flush. The problem are the range_begin that will happen > before zapping the pte in places where current->mm != > mm. Unfortunately in my incremental patch where I move all > invalidate_page outside of the PT lock to prepare for allowing > sleeping inside the mmu notifiers, I used range_begin/end in places > like try_to_unmap_cluster where current->mm != mm. In general > this solution looks more fragile than the seqlock. Hmmm... Okay that is one solution that would just require a BUG_ON in the registration methods. > 2) I'm uncertain how the driver can handle a range_end called before > range_begin. Also multiple range_begin can happen in parallel later > followed by range_end, so if there's a global seqlock that > serializes the secondary mmu page fault, that will screwup (you > can't seqlock_write in range_begin and sequnlock_write in > range_end). 
The write side of the seqlock must be serialized and > calling seqlock_write twice in a row before any sequnlock operation > will break. Well, doesn't the requirement of just one execution thread also deal with that issue? From clameter at sgi.com Wed Apr 2 11:15:26 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 11:15:26 -0700 (PDT) Subject: [ofa-general] Re: [patch 5/9] Convert anon_vma lock to rw_sem and refcount In-Reply-To: <20080402175058.GR19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205636.777127252@sgi.com> <20080402175058.GR19189@duo.random> Message-ID: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > On Tue, Apr 01, 2008 at 01:55:36PM -0700, Christoph Lameter wrote: > > This results in f.e. the Aim9 brk performance test to got down by 10-15%. > > I guess it's more likely because of overscheduling for small crtitical > sections, did you counted the total number of context switches? I > guess there will be a lot more with your patch applied. That > regression is a showstopper and it is the reason why I've suggested > before to add a CONFIG_XPMEM or CONFIG_MMU_NOTIFIER_SLEEP config > option to make the VM locks sleep capable only when XPMEM=y > (PREEMPT_RT will enable it too). Thanks for doing the benchmark work! There are more context switches if locks are contended. But that actually also has some good aspects, because we avoid busy loops and can potentially continue work in another process. From clameter at sgi.com Wed Apr 2 12:03:50 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 12:03:50 -0700 (PDT) Subject: [ofa-general] EMM: Fixup return value handling of emm_notify() In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> Message-ID: On Wed, 2 Apr 2008, Christoph Lameter wrote: > Here f.e. We can add a special emm_age() function that iterates > differently and does the | for you. 
Well, maybe not really necessary. How about this fix? It's likely a problem to stop callbacks if one callback returned an error. Subject: EMM: Fixup return value handling of emm_notify() Right now we stop calling additional subsystems if one callback returned an error. That has the potential for causing additional trouble with the subsystems that do not receive the callbacks they expect if one has failed. So change the handling of error codes to continue callbacks to other subsystems but return the first error code encountered. If a callback returns a positive return value then add up all the values from all the calls. That can be used to establish how many references exist (xpmem may want this feature at some point) or ensure that the aging works the way Andrea wants it to (KVM, XPmem so far do not care too much). Signed-off-by: Christoph Lameter --- mm/rmap.c | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-) Index: linux-2.6/mm/rmap.c =================================================================== --- linux-2.6.orig/mm/rmap.c 2008-04-02 11:46:20.738342852 -0700 +++ linux-2.6/mm/rmap.c 2008-04-02 12:03:57.672494320 -0700 @@ -299,27 +299,45 @@ void emm_notifier_register(struct emm_no } EXPORT_SYMBOL_GPL(emm_notifier_register); -/* Perform a callback */ +/* + * Perform a callback + * + * The return of this function is either a negative error of the first + * callback that failed or a consolidated count of all the positive + * values that were returned by the callbacks. + */ int __emm_notify(struct mm_struct *mm, enum emm_operation op, unsigned long start, unsigned long end) { struct emm_notifier *e = rcu_dereference(mm->emm_notifier); int x; + int result = 0; while (e) { - if (e->callback) { x = e->callback(e, mm, op, start, end); - if (x) - return x; + + /* + * Callback may return a positive value to indicate a count + * or a negative error code. 
We keep the first error code + * but continue to perform callbacks to other subscribed + * subsystems. + */ + if (x && result >= 0) { + if (x >= 0) + result += x; + else + result = x; + } } + /* * emm_notifier contents (e) must be fetched after * the retrival of the pointer to the notifier. */ e = rcu_dereference(e->next); } - return 0; + return result; } EXPORT_SYMBOL_GPL(__emm_notify); #endif From clameter at sgi.com Wed Apr 2 14:05:28 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 14:05:28 -0700 (PDT) Subject: [ofa-general] EMM: Require single threadedness for registration. In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> Message-ID: Here is a patch to require single threaded execution during emm_register. This also allows an easy implementation of an unregister function and gets rid of the races that Andrea worried about. The approach here is similar to what was used in selinux for security context changes (see selinux_setprocattr). Is it okay for the users of emm to require single threadedness for registration? Subject: EMM: Require single threaded execution for register and unregister We can avoid the concurrency issues arising at registration if we only allow registration of notifiers when the process has only a single thread. That even allows to avoid the use of rcu. Signed-off-by: Christoph Lameter --- mm/rmap.c | 46 +++++++++++++++++++++++++++++++++++++--------- 1 file changed, 37 insertions(+), 9 deletions(-) Index: linux-2.6/mm/rmap.c =================================================================== --- linux-2.6.orig/mm/rmap.c 2008-04-02 13:53:46.002473685 -0700 +++ linux-2.6/mm/rmap.c 2008-04-02 14:03:05.872199896 -0700 @@ -286,20 +286,48 @@ void emm_notifier_release(struct mm_stru } } -/* Register a notifier */ +/* + * Register a notifier + * + * mmap_sem is held writably. + * + * Process must be single threaded. 
+ */ void emm_notifier_register(struct emm_notifier *e, struct mm_struct *mm) { + BUG_ON(atomic_read(&mm->mm_users) != 1); + e->next = mm->emm_notifier; - /* - * The update to emm_notifier (e->next) must be visible - * before the pointer becomes visible. - * rcu_assign_pointer() does exactly what we need. - */ - rcu_assign_pointer(mm->emm_notifier, e); + mm->emm_notifier = e; } EXPORT_SYMBOL_GPL(emm_notifier_register); /* + * Unregister a notifier + * + * mmap_sem is held writably + * + * Process must be single threaded + */ +void emm_notifier_unregister(struct emm_notifier *e, struct mm_struct *mm) +{ + struct emm_notifier *p = mm->emm_notifier; + + BUG_ON(atomic_read(&mm->mm_users) != 1); + + if (e == p) + mm->emm_notifier = e->next; + else { + while (p->next != e) + p = p->next; + + p->next = e->next; + } + e->callback(e, mm, emm_release, 0, TASK_SIZE); +} +EXPORT_SYMBOL_GPL(emm_notifier_unregister); + +/* * Perform a callback * * The return of this function is either a negative error of the first @@ -309,7 +337,7 @@ EXPORT_SYMBOL_GPL(emm_notifier_register) int __emm_notify(struct mm_struct *mm, enum emm_operation op, unsigned long start, unsigned long end) { - struct emm_notifier *e = rcu_dereference(mm->emm_notifier); + struct emm_notifier *e = mm->emm_notifier; int x; int result = 0; @@ -335,7 +363,7 @@ int __emm_notify(struct mm_struct *mm, e * emm_notifier contents (e) must be fetched after * the retrival of the pointer to the notifier. 
*/ - e = rcu_dereference(e->next); + e = e->next; } return result; } From andrea at qumranet.com Wed Apr 2 14:25:15 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 2 Apr 2008 23:25:15 +0200 Subject: [ofa-general] Re: EMM: Fixup return value handling of emm_notify() In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> Message-ID: <20080402212515.GS19189@duo.random> On Wed, Apr 02, 2008 at 12:03:50PM -0700, Christoph Lameter wrote: > + /* > + * Callback may return a positive value to indicate a count > + * or a negative error code. We keep the first error code > + * but continue to perform callbacks to other subscribed > + * subsystems. > + */ > + if (x && result >= 0) { > + if (x >= 0) > + result += x; > + else > + result = x; > + } > } > + Now think of when one of the kernel janitors will micro-optimize PG_dirty to be returned by invalidate_page so a single set_page_dirty will be invoked... Keep in mind this is a kernel-internal API; ask Greg if we can change it in order to optimize later in the future. I think my #v9 is optimal enough while being simple at the same time, but anyway it's silly to be hardwired to such an interface that worst of all requires switch statements instead of proper pointers to functions and a fixed set of parameters and retval semantics for all methods. 
From clameter at sgi.com Wed Apr 2 14:33:51 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 14:33:51 -0700 (PDT) Subject: [ofa-general] Re: EMM: Fixup return value handling of emm_notify() In-Reply-To: <20080402212515.GS19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402212515.GS19189@duo.random> Message-ID: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > but anyway it's silly to be hardwired to such an interface that worst > of all requires switch statements instead of proper pointer to > functions and a fixed set of parameters and retval semantics for all > methods. The EMM API with a single callback is the simplest approach at this point. A common callback for all operations allows the driver to implement common entry and exit code as seen in XPMem. I guess we can complicate this more by switching to a different API or adding additional emm_xxx() callback if need be but I really want to have a strong case for why this would be needed. There is the danger of adding frills with special callbacks in this and that situation that could make the notifier complicated and specific to a certain usage scenario. Having this generic simple interface will hopefully avoid such things. From andrea at qumranet.com Wed Apr 2 14:30:01 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:01 +0200 Subject: [ofa-general] [PATCH 0 of 8] mmu notifiers #v10 Message-ID: Hello, this is the mmu notifier #v10. Patches 1 and 2 are the only difference between this and EMM V2. The rest is the same as with Christoph's patches. I think maximum priority should be given in merging patch 1 and 2 into -mm and ASAP in mainline. 
Patches from 3 to 8 can go in -mm for testing but I'm not sure if we should support sleep capable notifiers in mainline unless we make the VM locking conditional to avoid overscheduling for extremely small critical sections in the common case. I only rediffed Christoph's patches on top of the mmu notifier patches. KVM current plans are to heavily depend on mmu notifiers for swapping, to optimize the spte faults, and we need it for smp guest ballooning with madvise(DONT_NEED) and other optimizations and features. Patches from 3 to 8 are Christoph's work ported on top of #v10 to make the #v10 mmu notifiers sleep capable (at least supposedly). I didn't test the scheduling, but I assume you'll quickly test XPMEM on top of this. From andrea at qumranet.com Wed Apr 2 14:30:02 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:02 +0200 Subject: [ofa-general] [PATCH 1 of 8] Core of mmu notifiers In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207158873 -7200 # Node ID a406c0cc686d0ca94a4d890d661cdfa48cfba09f # Parent 249e077dc932a5322e04ac1d69326622ea4023b8 Core of mmu notifiers. 
Signed-off-by: Andrea Arcangeli Signed-off-by: Nick Piggin Signed-off-by: Christoph Lameter diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -10,6 +10,7 @@ #include #include #include +#include #include #include @@ -225,6 +226,10 @@ #ifdef CONFIG_CGROUP_MEM_RES_CTLR struct mem_cgroup *mem_cgroup; #endif +#ifdef CONFIG_MMU_NOTIFIER + struct hlist_head mmu_notifier_list; + seqlock_t mmu_notifier_lock; +#endif }; #endif /* _LINUX_MM_TYPES_H */ diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h new file mode 100644 --- /dev/null +++ b/include/linux/mmu_notifier.h @@ -0,0 +1,181 @@ +#ifndef _LINUX_MMU_NOTIFIER_H +#define _LINUX_MMU_NOTIFIER_H + +#include +#include +#include + +struct mmu_notifier; +struct mmu_notifier_ops; + +#ifdef CONFIG_MMU_NOTIFIER + +struct mmu_notifier_ops { + /* + * Called when nobody can register any more notifier in the mm + * and after the "mn" notifier has been disarmed already. + */ + void (*release)(struct mmu_notifier *mn, + struct mm_struct *mm); + + /* + * clear_flush_young is called after the VM is + * test-and-clearing the young/accessed bitflag in the + * pte. This way the VM will provide proper aging to the + * accesses to the page through the secondary MMUs and not + * only to the ones through the Linux pte. + */ + int (*clear_flush_young)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * Before this is invoked any secondary MMU is still ok to + * read/write to the page previously pointed by the Linux pte + * because the old page hasn't been freed yet. If required + * set_page_dirty has to be called internally to this method. + */ + void (*invalidate_page)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * invalidate_range_start() and invalidate_range_end() must be + * paired. Multiple invalidate_range_start/ends may be nested + * or called concurrently. 
+ */ + void (*invalidate_range_start)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); + void (*invalidate_range_end)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); +}; + +struct mmu_notifier { + struct hlist_node hlist; + const struct mmu_notifier_ops *ops; +}; + +static inline int mm_has_notifiers(struct mm_struct *mm) +{ + return unlikely(!hlist_empty(&mm->mmu_notifier_list)); +} + +/* + * Must hold the mmap_sem for write. + * + * RCU is used to traverse the list. + */ +extern void mmu_notifier_register(struct mmu_notifier *mn, + struct mm_struct *mm); +extern void __mmu_notifier_release(struct mm_struct *mm); +extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end); +extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end); + + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_release(mm); +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + return __mmu_notifier_clear_flush_young(mm, address); + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_page(mm, address); +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_start(mm, start, end); +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) 
+ __mmu_notifier_invalidate_range_end(mm, start, end); +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ + INIT_HLIST_HEAD(&mm->mmu_notifier_list); + seqlock_init(&mm->mmu_notifier_lock); +} + +#define ptep_clear_flush_notify(__vma, __address, __ptep) \ +({ \ + pte_t __pte; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __pte = ptep_clear_flush(___vma, ___address, __ptep); \ + mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ + __pte; \ +}) + +#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ +({ \ + int __young; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ + __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ + ___address); \ + __young; \ +}) + +#else /* CONFIG_MMU_NOTIFIER */ + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ +} + +#define ptep_clear_flush_young_notify ptep_clear_flush_young +#define ptep_clear_flush_notify ptep_clear_flush + +#endif /* CONFIG_MMU_NOTIFIER */ + +#endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include @@ -362,6 +363,7 @@ if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; + mmu_notifier_mm_init(mm); return mm; } diff --git a/mm/Kconfig 
b/mm/Kconfig --- a/mm/Kconfig +++ b/mm/Kconfig @@ -193,3 +193,7 @@ config VIRT_TO_BUS def_bool y depends on !ARCH_NO_VIRT_TO_BUS + +config MMU_NOTIFIER + def_bool y + bool "MMU notifier, for paging KVM/RDMA" diff --git a/mm/Makefile b/mm/Makefile --- a/mm/Makefile +++ b/mm/Makefile @@ -33,4 +33,4 @@ obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o - +obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,7 +194,7 @@ if (pte) { /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -214,7 +215,9 @@ spin_unlock(&mapping->i_mmap_lock); } + mmu_notifier_invalidate_range_start(mm, start, start + size); err = populate_range(mm, vma, start, size, pgoff); + mmu_notifier_invalidate_range_end(mm, start, start + size); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include @@ -799,6 +800,7 @@ BUG_ON(start & ~HPAGE_MASK); BUG_ON(end & ~HPAGE_MASK); + mmu_notifier_invalidate_range_start(mm, start, end); spin_lock(&mm->page_table_lock); for (address = start; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -819,6 +821,7 @@ } spin_unlock(&mm->page_table_lock); flush_tlb_range(vma, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); list_for_each_entry_safe(page, tmp, &page_list, lru) { list_del(&page->lru); 
put_page(page); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -611,6 +612,9 @@ if (is_vm_hugetlb_page(vma)) return copy_hugetlb_page_range(dst_mm, src_mm, vma); + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_start(src_mm, addr, end); + dst_pgd = pgd_offset(dst_mm, addr); src_pgd = pgd_offset(src_mm, addr); do { @@ -621,6 +625,11 @@ vma, addr, next)) return -ENOMEM; } while (dst_pgd++, src_pgd++, addr = next, addr != end); + + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_end(src_mm, + vma->vm_start, end); + return 0; } @@ -897,7 +906,9 @@ lru_add_drain(); tlb = tlb_gather_mmu(mm, 0); update_hiwater_rss(mm); + mmu_notifier_invalidate_range_start(mm, address, end); end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details); + mmu_notifier_invalidate_range_end(mm, address, end); if (tlb) tlb_finish_mmu(tlb, address, end); return end; @@ -1463,10 +1474,11 @@ { pgd_t *pgd; unsigned long next; - unsigned long end = addr + size; + unsigned long start = addr, end = addr + size; int err; BUG_ON(addr >= end); + mmu_notifier_invalidate_range_start(mm, start, end); pgd = pgd_offset(mm, addr); do { next = pgd_addr_end(addr, end); @@ -1474,6 +1486,7 @@ if (err) break; } while (pgd++, addr = next, addr != end); + mmu_notifier_invalidate_range_end(mm, start, end); return err; } EXPORT_SYMBOL_GPL(apply_to_page_range); @@ -1675,7 +1688,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. 
*/ - ptep_clear_flush(vma, address, page_table); + ptep_clear_flush_notify(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -1747,11 +1748,13 @@ lru_add_drain(); tlb = tlb_gather_mmu(mm, 0); update_hiwater_rss(mm); + mmu_notifier_invalidate_range_start(mm, start, end); unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); tlb_finish_mmu(tlb, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); } /* @@ -2037,6 +2040,7 @@ unsigned long end; /* mm's last user has gone, and its about to be pulled down */ + mmu_notifier_release(mm); arch_exit_mmap(mm); lru_add_drain(); diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c new file mode 100644 --- /dev/null +++ b/mm/mmu_notifier.c @@ -0,0 +1,121 @@ +/* + * linux/mm/mmu_notifier.c + * + * Copyright (C) 2008 Qumranet, Inc. + * Copyright (C) 2008 SGI + * Christoph Lameter + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include +#include +#include + +/* + * No synchronization. This function can only be called when only a single + * process remains that performs teardown. 
+ */ +void __mmu_notifier_release(struct mm_struct *mm) +{ + struct mmu_notifier *mn; + unsigned seq; + + seq = read_seqbegin(&mm->mmu_notifier_lock); + while (unlikely(!hlist_empty(&mm->mmu_notifier_list))) { + mn = hlist_entry(mm->mmu_notifier_list.first, + struct mmu_notifier, + hlist); + hlist_del(&mn->hlist); + if (mn->ops->release) + mn->ops->release(mn, mm); + BUG_ON(read_seqretry(&mm->mmu_notifier_lock, seq)); + } +} + +/* + * If no young bitflag is supported by the hardware, ->clear_flush_young can + * unmap the address and return 1 or 0 depending if the mapping previously + * existed or not. + */ +int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int young = 0; + unsigned seq; + + seq = read_seqbegin(&mm->mmu_notifier_lock); + do { + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->clear_flush_young) + young |= mn->ops->clear_flush_young(mn, mm, + address); + } + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); + + return young; +} + +void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + unsigned seq; + + seq = read_seqbegin(&mm->mmu_notifier_lock); + do { + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_page) + mn->ops->invalidate_page(mn, mm, address); + } + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); +} + +void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + unsigned seq; + + seq = read_seqbegin(&mm->mmu_notifier_lock); + do { + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_start) + mn->ops->invalidate_range_start(mn, mm, + start, end); + } + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); +} + +void 
__mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + unsigned seq; + + seq = read_seqbegin(&mm->mmu_notifier_lock); + do { + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_end) + mn->ops->invalidate_range_end(mn, mm, + start, end); + } + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); +} + +/* + * Must hold mmap_sem writably when calling registration functions. + */ +void mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm) +{ + write_seqlock(&mm->mmu_notifier_lock); + hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier_list); + write_sequnlock(&mm->mmu_notifier_lock); +} +EXPORT_SYMBOL_GPL(mmu_notifier_register); diff --git a/mm/mprotect.c b/mm/mprotect.c --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -198,10 +199,12 @@ dirty_accountable = 1; } + mmu_notifier_invalidate_range_start(mm, start, end); if (is_vm_hugetlb_page(vma)) hugetlb_change_protection(vma, start, end, vma->vm_page_prot); else change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable); + mmu_notifier_invalidate_range_end(mm, start, end); vm_stat_account(mm, oldflags, vma->vm_file, -nrpages); vm_stat_account(mm, newflags, vma->vm_file, nrpages); return 0; diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -74,7 +75,11 @@ struct mm_struct *mm = vma->vm_mm; pte_t *old_pte, *new_pte, pte; spinlock_t *old_ptl, *new_ptl; + unsigned long old_start; + old_start = old_addr; + mmu_notifier_invalidate_range_start(vma->vm_mm, + old_start, old_end); if (vma->vm_file) { /* * Subtle point from Rajesh Venkatasubramanian: before @@ -116,6 +121,7 @@ pte_unmap_unlock(old_pte - 1, old_ptl); if (mapping) spin_unlock(&mapping->i_mmap_lock); + 
mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } #define LATENCY_LIMIT (64 * PAGE_SIZE) diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -49,6 +49,7 @@ #include #include #include +#include #include @@ -287,7 +288,7 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young(vma, address, pte)) + } else if (ptep_clear_flush_young_notify(vma, address, pte)) referenced++; /* Pretend the page is referenced if the task has the @@ -456,7 +457,7 @@ pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); - entry = ptep_clear_flush(vma, address, pte); + entry = ptep_clear_flush_notify(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -717,14 +718,14 @@ * skipped over this mm) then we should reactivate it. */ if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young(vma, address, pte)))) { + (ptep_clear_flush_young_notify(vma, address, pte)))) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. */ if (pte_dirty(pteval)) @@ -849,12 +850,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young(vma, address, pte)) + if (ptep_clear_flush_young_notify(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* If nonlinear, store the file page offset in the pte. 
*/ if (page->index != linear_page_index(vma, address)) From andrea at qumranet.com Wed Apr 2 14:30:03 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:03 +0200 Subject: [ofa-general] [PATCH 2 of 8] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207159010 -7200 # Node ID fe00cb9deeb31467396370c835cb808f4b85209a # Parent a406c0cc686d0ca94a4d890d661cdfa48cfba09f Moves all mmu notifier methods outside the PT lock (first and not last step to make them sleep capable). Signed-off-by: Andrea Arcangeli diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -121,27 +121,6 @@ seqlock_init(&mm->mmu_notifier_lock); } -#define ptep_clear_flush_notify(__vma, __address, __ptep) \ -({ \ - pte_t __pte; \ - struct vm_area_struct *___vma = __vma; \ - unsigned long ___address = __address; \ - __pte = ptep_clear_flush(___vma, ___address, __ptep); \ - mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ - __pte; \ -}) - -#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ -({ \ - int __young; \ - struct vm_area_struct *___vma = __vma; \ - unsigned long ___address = __address; \ - __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ - __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ - ___address); \ - __young; \ -}) - #else /* CONFIG_MMU_NOTIFIER */ static inline void mmu_notifier_release(struct mm_struct *mm) @@ -173,9 +152,6 @@ { } -#define ptep_clear_flush_young_notify ptep_clear_flush_young -#define ptep_clear_flush_notify ptep_clear_flush - #endif /* CONFIG_MMU_NOTIFIER */ #endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,11 +194,13 @@ if (pte) { /* Nuke the page table entry. 
*/ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); pte_unmap_unlock(pte, ptl); + /* must invalidate_page _before_ freeing the page */ + mmu_notifier_invalidate_page(mm, address); page_cache_release(page); } } diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -1626,9 +1626,10 @@ */ page_table = pte_offset_map_lock(mm, pmd, address, &ptl); - page_cache_release(old_page); + new_page = NULL; if (!pte_same(*page_table, orig_pte)) goto unlock; + page_cache_release(old_page); page_mkwrite = 1; } @@ -1644,6 +1645,7 @@ if (ptep_set_access_flags(vma, address, page_table, entry,1)) update_mmu_cache(vma, address, entry); ret |= VM_FAULT_WRITE; + old_page = new_page = NULL; goto unlock; } @@ -1688,7 +1690,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. 
*/ - ptep_clear_flush_notify(vma, address, page_table); + ptep_clear_flush(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); @@ -1700,12 +1702,18 @@ } else mem_cgroup_uncharge_page(new_page); - if (new_page) +unlock: + pte_unmap_unlock(page_table, ptl); + + if (new_page) { + if (new_page == old_page) + /* cow happened, notify before releasing old_page */ + mmu_notifier_invalidate_page(mm, address); page_cache_release(new_page); + } if (old_page) page_cache_release(old_page); -unlock: - pte_unmap_unlock(page_table, ptl); + if (dirty_page) { if (vma->vm_file) file_update_time(vma->vm_file); diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -275,7 +275,7 @@ unsigned long address; pte_t *pte; spinlock_t *ptl; - int referenced = 0; + int referenced = 0, clear_flush_young = 0; address = vma_address(page, vma); if (address == -EFAULT) @@ -288,8 +288,11 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young_notify(vma, address, pte)) - referenced++; + } else { + clear_flush_young = 1; + if (ptep_clear_flush_young(vma, address, pte)) + referenced++; + } /* Pretend the page is referenced if the task has the swap token and is in the middle of a page fault. 
*/ @@ -299,6 +302,10 @@ (*mapcount)--; pte_unmap_unlock(pte, ptl); + + if (clear_flush_young) + referenced += mmu_notifier_clear_flush_young(mm, address); + out: return referenced; } @@ -457,7 +464,7 @@ pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); - entry = ptep_clear_flush_notify(vma, address, pte); + entry = ptep_clear_flush(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -465,6 +472,10 @@ } pte_unmap_unlock(pte, ptl); + + if (ret) + mmu_notifier_invalidate_page(mm, address); + out: return ret; } @@ -717,15 +728,14 @@ * If it's recently referenced (perhaps page_referenced * skipped over this mm) then we should reactivate it. */ - if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young_notify(vma, address, pte)))) { + if (!migration && (vma->vm_flags & VM_LOCKED)) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. 
*/ if (pte_dirty(pteval)) @@ -780,6 +790,8 @@ out_unmap: pte_unmap_unlock(pte, ptl); + if (ret != SWAP_FAIL) + mmu_notifier_invalidate_page(mm, address); out: return ret; } @@ -818,7 +830,7 @@ spinlock_t *ptl; struct page *page; unsigned long address; - unsigned long end; + unsigned long start, end; address = (vma->vm_start + cursor) & CLUSTER_MASK; end = address + CLUSTER_SIZE; @@ -839,6 +851,8 @@ if (!pmd_present(*pmd)) return; + start = address; + mmu_notifier_invalidate_range_start(mm, start, end); pte = pte_offset_map_lock(mm, pmd, address, &ptl); /* Update high watermark before we lower rss */ @@ -850,12 +864,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young_notify(vma, address, pte)) + if (ptep_clear_flush_young(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); /* If nonlinear, store the file page offset in the pte. */ if (page->index != linear_page_index(vma, address)) @@ -871,6 +885,7 @@ (*mapcount)--; } pte_unmap_unlock(pte - 1, ptl); + mmu_notifier_invalidate_range_end(mm, start, end); } static int try_to_unmap_anon(struct page *page, int migration) From andrea at qumranet.com Wed Apr 2 14:30:04 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:04 +0200 Subject: [ofa-general] [PATCH 3 of 8] Move the tlb flushing into free_pgtables. The conversion of the locks In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207159010 -7200 # Node ID d880c227ddf345f5d577839d36d150c37b653bfd # Parent fe00cb9deeb31467396370c835cb808f4b85209a Move the tlb flushing into free_pgtables. The conversion of the locks taken for reverse map scanning would require taking sleeping locks in free_pgtables(). Moving the tlb flushing into free_pgtables allows sleeping in parts of free_pgtables(). 
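The gather-per-iteration pattern this patch introduces (open a TLB gather, batch the frees, flush, then allow sleeping before the next iteration) can be modeled in plain userspace C. This is only an illustrative sketch, not kernel code: `gather_start`, `gather_finish`, and `free_all` are invented names standing in for `tlb_gather_mmu()`, `tlb_finish_mmu()`, and the `free_pgtables()` loop.

```c
#include <assert.h>
#include <stddef.h>

struct gather {
	int batched;	/* pages collected since gather_start() */
	int *flushes;	/* external counter of completed flushes */
};

static struct gather gather_start(int *flush_counter)
{
	struct gather g = { 0, flush_counter };
	return g;
}

static void gather_page(struct gather *g)
{
	g->batched++;
}

static void gather_finish(struct gather *g)
{
	if (g->batched)
		(*g->flushes)++;	/* one "TLB flush" retires the whole batch */
	g->batched = 0;
}

/* free_pgtables()-style loop: each "vma" opens and closes its own
 * gather, so the code between iterations is free to sleep. */
static int free_all(const int *pages_per_vma, size_t nvmas)
{
	int flushes = 0;

	for (size_t i = 0; i < nvmas; i++) {
		struct gather g = gather_start(&flushes);

		for (int p = 0; p < pages_per_vma[i]; p++)
			gather_page(&g);
		gather_finish(&g);	/* flush before potentially sleeping */
	}
	return flushes;
}
```

The cost the commit message concedes shows up directly in the model: one flush per VMA instead of one flush per batch of VMAs.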
This means that we do a tlb_finish_mmu() before freeing the page tables.
Strictly speaking there may not be the need to do another tlb flush after
freeing the tables. But it's the only way to free a series of page table
pages from the tlb list. And we do not want to call into the page allocator
for performance reasons.

Aim9 numbers look okay after this patch.

Signed-off-by: Christoph Lameter

diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -751,8 +751,8 @@
 			void *private);
 void free_pgd_range(struct mmu_gather **tlb, unsigned long addr,
 		unsigned long end, unsigned long floor, unsigned long ceiling);
-void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *start_vma,
-		unsigned long floor, unsigned long ceiling);
+void free_pgtables(struct vm_area_struct *start_vma, unsigned long floor,
+		unsigned long ceiling);
 int copy_page_range(struct mm_struct *dst, struct mm_struct *src,
 		struct vm_area_struct *vma);
 void unmap_mapping_range(struct address_space *mapping,
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -272,9 +272,11 @@
 	} while (pgd++, addr = next, addr != end);
 }

-void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma,
-		unsigned long floor, unsigned long ceiling)
+void free_pgtables(struct vm_area_struct *vma, unsigned long floor,
+		unsigned long ceiling)
 {
+	struct mmu_gather *tlb;
+
 	while (vma) {
 		struct vm_area_struct *next = vma->vm_next;
 		unsigned long addr = vma->vm_start;
@@ -286,8 +288,10 @@
 		unlink_file_vma(vma);

 		if (is_vm_hugetlb_page(vma)) {
-			hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
+			tlb = tlb_gather_mmu(vma->vm_mm, 0);
+			hugetlb_free_pgd_range(&tlb, addr, vma->vm_end,
				floor, next?
next->vm_start: ceiling); + tlb_finish_mmu(tlb, addr, vma->vm_end); } else { /* * Optimization: gather nearby vmas into one call down @@ -299,8 +303,10 @@ anon_vma_unlink(vma); unlink_file_vma(vma); } - free_pgd_range(tlb, addr, vma->vm_end, + tlb = tlb_gather_mmu(vma->vm_mm, 0); + free_pgd_range(&tlb, addr, vma->vm_end, floor, next? next->vm_start: ceiling); + tlb_finish_mmu(tlb, addr, vma->vm_end); } vma = next; } diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1751,9 +1751,9 @@ mmu_notifier_invalidate_range_start(mm, start, end); unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS, + tlb_finish_mmu(tlb, start, end); + free_pgtables(vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); - tlb_finish_mmu(tlb, start, end); mmu_notifier_invalidate_range_end(mm, start, end); } @@ -2050,8 +2050,8 @@ /* Use -1 here to ensure all VMAs in the mm are unmapped */ end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0); tlb_finish_mmu(tlb, 0, end); + free_pgtables(vma, FIRST_USER_ADDRESS, 0); /* * Walk the list again, actually closing and freeing it, From andrea at qumranet.com Wed Apr 2 14:30:05 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:05 +0200 Subject: [ofa-general] [PATCH 4 of 8] The conversion to a rwsem allows callbacks during rmap traversal In-Reply-To: Message-ID: <3c3787c496cab1fc590b.1207171805@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1207159011 -7200 # Node ID 3c3787c496cab1fc590ba3f97e7904bdfaab5375 # Parent d880c227ddf345f5d577839d36d150c37b653bfd The conversion to a rwsem allows callbacks during rmap traversal for files in a non atomic context. A rw style lock also allows concurrent walking of the reverse map. 
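The spinlock-to-rwsem conversion described above changes the lock's admission rule, not the data it protects. A minimal single-threaded model of rw-semaphore semantics (illustrative only; these userspace stand-ins merely assert the invariants that the kernel primitives enforce by blocking):

```c
#include <assert.h>

/* Model of rwsem semantics: any number of readers may hold the lock at
 * once (concurrent rmap walkers), while a writer (truncate, vma
 * insertion) requires exclusive access. */
struct rwsem {
	int readers;	/* current shared holders */
	int writer;	/* 1 while held exclusively */
};

static void down_read(struct rwsem *sem)
{
	assert(!sem->writer);	/* readers never overlap a writer */
	sem->readers++;
}

static void up_read(struct rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

static void down_write(struct rwsem *sem)
{
	assert(!sem->writer && sem->readers == 0);	/* exclusive */
	sem->writer = 1;
}

static void up_write(struct rwsem *sem)
{
	assert(sem->writer);
	sem->writer = 0;
}
```

Under the old `i_mmap_lock` spinlock, even two readers excluded each other and neither could sleep; with `i_mmap_sem`, several `down_read()` holders can walk the prio tree at once, which is exactly what sleepable mmu-notifier callbacks during rmap traversal require.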
This is fairly straightforward if one removes pieces of the resched checking. [Restarting unmapping is an issue to be discussed]. This slightly increases Aim9 performance results on an 8p. Signed-off-by: Andrea Arcangeli Signed-off-by: Christoph Lameter diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -69,7 +69,7 @@ if (!vma_shareable(vma, addr)) return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(svma, &iter, &mapping->i_mmap, idx, idx) { if (svma == vma) continue; @@ -94,7 +94,7 @@ put_page(virt_to_page(spte)); spin_unlock(&mm->page_table_lock); out: - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -454,10 +454,10 @@ pgoff = offset >> PAGE_SHIFT; i_size_write(inode, offset); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); if (!prio_tree_empty(&mapping->i_mmap)) hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff); - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); truncate_hugepages(inode, offset); return 0; } diff --git a/fs/inode.c b/fs/inode.c --- a/fs/inode.c +++ b/fs/inode.c @@ -210,7 +210,7 @@ INIT_LIST_HEAD(&inode->i_devices); INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC); rwlock_init(&inode->i_data.tree_lock); - spin_lock_init(&inode->i_data.i_mmap_lock); + init_rwsem(&inode->i_data.i_mmap_sem); INIT_LIST_HEAD(&inode->i_data.private_list); spin_lock_init(&inode->i_data.private_lock); INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap); diff --git a/include/linux/fs.h b/include/linux/fs.h --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -503,7 +503,7 @@ unsigned int i_mmap_writable;/* count VM_SHARED mappings */ struct prio_tree_root i_mmap; /* tree of private and shared mappings */ struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR 
mappings */ - spinlock_t i_mmap_lock; /* protect tree, count, list */ + struct rw_semaphore i_mmap_sem; /* protect tree, count, list */ unsigned int truncate_count; /* Cover race condition with truncate */ unsigned long nrpages; /* number of total pages */ pgoff_t writeback_index;/* writeback starts here */ diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -716,7 +716,7 @@ struct address_space *check_mapping; /* Check page->mapping if set */ pgoff_t first_index; /* Lowest page->index to unmap */ pgoff_t last_index; /* Highest page->index to unmap */ - spinlock_t *i_mmap_lock; /* For unmap_mapping_range: */ + struct rw_semaphore *i_mmap_sem; /* For unmap_mapping_range: */ unsigned long truncate_count; /* Compare vm_truncate_count */ }; diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -274,12 +274,12 @@ atomic_dec(&inode->i_writecount); /* insert tmp into the share list, just after mpnt */ - spin_lock(&file->f_mapping->i_mmap_lock); + down_write(&file->f_mapping->i_mmap_sem); tmp->vm_truncate_count = mpnt->vm_truncate_count; flush_dcache_mmap_lock(file->f_mapping); vma_prio_tree_add(tmp, mpnt); flush_dcache_mmap_unlock(file->f_mapping); - spin_unlock(&file->f_mapping->i_mmap_lock); + up_write(&file->f_mapping->i_mmap_sem); } /* diff --git a/mm/filemap.c b/mm/filemap.c --- a/mm/filemap.c +++ b/mm/filemap.c @@ -61,16 +61,16 @@ /* * Lock ordering: * - * ->i_mmap_lock (vmtruncate) + * ->i_mmap_sem (vmtruncate) * ->private_lock (__free_pte->__set_page_dirty_buffers) * ->swap_lock (exclusive_swap_page, others) * ->mapping->tree_lock * * ->i_mutex - * ->i_mmap_lock (truncate->unmap_mapping_range) + * ->i_mmap_sem (truncate->unmap_mapping_range) * * ->mmap_sem - * ->i_mmap_lock + * ->i_mmap_sem * ->page_table_lock or pte_lock (various, mainly in memory.c) * ->mapping->tree_lock (arch-dependent flush_dcache_mmap_lock) * @@ -87,7 +87,7 @@ * ->sb_lock (fs/fs-writeback.c) * 
->mapping->tree_lock (__sync_single_inode) * - * ->i_mmap_lock + * ->i_mmap_sem * ->anon_vma.lock (vma_adjust) * * ->anon_vma.lock diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -184,7 +184,7 @@ if (!page) return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { mm = vma->vm_mm; address = vma->vm_start + @@ -204,7 +204,7 @@ page_cache_release(page); } } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -206,13 +206,13 @@ } goto out; } - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); flush_dcache_mmap_lock(mapping); vma->vm_flags |= VM_NONLINEAR; vma_prio_tree_remove(vma, &mapping->i_mmap); vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear); flush_dcache_mmap_unlock(mapping); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } mmu_notifier_invalidate_range_start(mm, start, start + size); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -790,7 +790,7 @@ struct page *page; struct page *tmp; /* - * A page gathering list, protected by per file i_mmap_lock. The + * A page gathering list, protected by per file i_mmap_sem. The * lock is used to avoid list corruption from multiple unmapping * of the same page since we are using page->lru. */ @@ -840,9 +840,9 @@ * do nothing in this case. 
*/ if (vma->vm_file) { - spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); + down_write(&vma->vm_file->f_mapping->i_mmap_sem); __unmap_hugepage_range(vma, start, end); - spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); + up_write(&vma->vm_file->f_mapping->i_mmap_sem); } } @@ -1085,7 +1085,7 @@ BUG_ON(address >= end); flush_cache_range(vma, address, end); - spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); + down_write(&vma->vm_file->f_mapping->i_mmap_sem); spin_lock(&mm->page_table_lock); for (; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -1100,7 +1100,7 @@ } } spin_unlock(&mm->page_table_lock); - spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); + up_write(&vma->vm_file->f_mapping->i_mmap_sem); flush_tlb_range(vma, start, end); } diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -838,7 +838,6 @@ unsigned long tlb_start = 0; /* For tlb_finish_mmu */ int tlb_start_valid = 0; unsigned long start = start_addr; - spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL; int fullmm = (*tlbp)->fullmm; for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) { @@ -875,22 +874,12 @@ } tlb_finish_mmu(*tlbp, tlb_start, start); - - if (need_resched() || - (i_mmap_lock && spin_needbreak(i_mmap_lock))) { - if (i_mmap_lock) { - *tlbp = NULL; - goto out; - } - cond_resched(); - } - + cond_resched(); *tlbp = tlb_gather_mmu(vma->vm_mm, fullmm); tlb_start_valid = 0; zap_work = ZAP_BLOCK_SIZE; } } -out: return start; /* which is now the end (or restart) address */ } @@ -1752,7 +1741,7 @@ /* * Helper functions for unmap_mapping_range(). 
* - * __ Notes on dropping i_mmap_lock to reduce latency while unmapping __ + * __ Notes on dropping i_mmap_sem to reduce latency while unmapping __ * * We have to restart searching the prio_tree whenever we drop the lock, * since the iterator is only valid while the lock is held, and anyway @@ -1771,7 +1760,7 @@ * can't efficiently keep all vmas in step with mapping->truncate_count: * so instead reset them all whenever it wraps back to 0 (then go to 1). * mapping->truncate_count and vma->vm_truncate_count are protected by - * i_mmap_lock. + * i_mmap_sem. * * In order to make forward progress despite repeatedly restarting some * large vma, note the restart_addr from unmap_vmas when it breaks out: @@ -1821,7 +1810,7 @@ restart_addr = zap_page_range(vma, start_addr, end_addr - start_addr, details); - need_break = need_resched() || spin_needbreak(details->i_mmap_lock); + need_break = need_resched(); if (restart_addr >= end_addr) { /* We have now completed this vma: mark it so */ @@ -1835,9 +1824,9 @@ goto again; } - spin_unlock(details->i_mmap_lock); + up_write(details->i_mmap_sem); cond_resched(); - spin_lock(details->i_mmap_lock); + down_write(details->i_mmap_sem); return -EINTR; } @@ -1931,9 +1920,9 @@ details.last_index = hba + hlen - 1; if (details.last_index < details.first_index) details.last_index = ULONG_MAX; - details.i_mmap_lock = &mapping->i_mmap_lock; + details.i_mmap_sem = &mapping->i_mmap_sem; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); /* Protect against endless unmapping loops */ mapping->truncate_count++; @@ -1948,7 +1937,7 @@ unmap_mapping_range_tree(&mapping->i_mmap, &details); if (unlikely(!list_empty(&mapping->i_mmap_nonlinear))) unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } EXPORT_SYMBOL(unmap_mapping_range); diff --git a/mm/migrate.c b/mm/migrate.c --- a/mm/migrate.c +++ b/mm/migrate.c @@ -211,12 +211,12 @@ if (!mapping) 
return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) remove_migration_pte(vma, old, new); - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -187,7 +187,7 @@ } /* - * Requires inode->i_mapping->i_mmap_lock + * Requires inode->i_mapping->i_mmap_sem */ static void __remove_shared_vm_struct(struct vm_area_struct *vma, struct file *file, struct address_space *mapping) @@ -215,9 +215,9 @@ if (file) { struct address_space *mapping = file->f_mapping; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); __remove_shared_vm_struct(vma, file, mapping); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } } @@ -440,7 +440,7 @@ mapping = vma->vm_file->f_mapping; if (mapping) { - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); vma->vm_truncate_count = mapping->truncate_count; } anon_vma_lock(vma); @@ -450,7 +450,7 @@ anon_vma_unlock(vma); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); mm->map_count++; validate_mm(mm); @@ -537,7 +537,7 @@ mapping = file->f_mapping; if (!(vma->vm_flags & VM_NONLINEAR)) root = &mapping->i_mmap; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); if (importer && vma->vm_truncate_count != next->vm_truncate_count) { /* @@ -621,7 +621,7 @@ if (anon_vma) spin_unlock(&anon_vma->lock); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); if (remove_next) { if (file) @@ -2065,7 +2065,7 @@ /* Insert vm structure into process list sorted by address * and into the inode's i_mmap tree. If vm_file is non-NULL - * then i_mmap_lock is taken here. + * then i_mmap_sem is taken here. 
*/ int insert_vm_struct(struct mm_struct * mm, struct vm_area_struct * vma) { diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -88,7 +88,7 @@ * and we propagate stale pages into the dst afterward. */ mapping = vma->vm_file->f_mapping; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); if (new_vma->vm_truncate_count && new_vma->vm_truncate_count != vma->vm_truncate_count) new_vma->vm_truncate_count = 0; @@ -120,7 +120,7 @@ pte_unmap_nested(new_pte - 1); pte_unmap_unlock(old_pte - 1, old_ptl); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -24,7 +24,7 @@ * inode->i_alloc_sem (vmtruncate_range) * mm->mmap_sem * page->flags PG_locked (lock_page) - * mapping->i_mmap_lock + * mapping->i_mmap_sem * anon_vma->lock * mm->page_table_lock or pte_lock * zone->lru_lock (in mark_page_accessed, isolate_lru_page) @@ -373,14 +373,14 @@ * The page lock not only makes sure that page->mapping cannot * suddenly be NULLified by truncation, it makes sure that the * structure at mapping cannot be freed and reused yet, - * so we can safely take mapping->i_mmap_lock. + * so we can safely take mapping->i_mmap_sem. */ BUG_ON(!PageLocked(page)); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); /* - * i_mmap_lock does not stabilize mapcount at all, but mapcount + * i_mmap_sem does not stabilize mapcount at all, but mapcount * is more likely to be accurate if we note it after spinning. 
*/ mapcount = page_mapcount(page); @@ -403,7 +403,7 @@ break; } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); return referenced; } @@ -489,12 +489,12 @@ BUG_ON(PageAnon(page)); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { if (vma->vm_flags & VM_SHARED) ret += page_mkclean_one(page, vma); } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); return ret; } @@ -930,7 +930,7 @@ unsigned long max_nl_size = 0; unsigned int mapcount; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { ret = try_to_unmap_one(page, vma, migration); if (ret == SWAP_FAIL || !page_mapped(page)) @@ -967,7 +967,6 @@ mapcount = page_mapcount(page); if (!mapcount) goto out; - cond_resched_lock(&mapping->i_mmap_lock); max_nl_size = (max_nl_size + CLUSTER_SIZE - 1) & CLUSTER_MASK; if (max_nl_cursor == 0) @@ -989,7 +988,6 @@ } vma->vm_private_data = (void *) max_nl_cursor; } - cond_resched_lock(&mapping->i_mmap_lock); max_nl_cursor += CLUSTER_SIZE; } while (max_nl_cursor <= max_nl_size); @@ -1001,7 +999,7 @@ list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list) vma->vm_private_data = NULL; out: - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); return ret; } From andrea at qumranet.com Wed Apr 2 14:30:06 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:06 +0200 Subject: [ofa-general] [PATCH 5 of 8] We no longer abort unmapping in unmap vmas because we can reschedule while In-Reply-To: Message-ID: <316e5b1e4bf388ef0198.1207171806@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1207159055 -7200 # Node ID 316e5b1e4bf388ef0198c91b3067ed1e4171d7f6 # Parent 3c3787c496cab1fc590ba3f97e7904bdfaab5375 We no longer abort unmapping in unmap vmas because we can reschedule while unmapping since we are 
holding a semaphore. This would allow moving more of the TLB flushing into
unmap_vmas(), reducing code in various places.

Signed-off-by: Christoph Lameter

diff --git a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -723,8 +723,7 @@
 struct page *vm_normal_page(struct vm_area_struct *, unsigned long, pte_t);
 unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
 		unsigned long size, struct zap_details *);
-unsigned long unmap_vmas(struct mmu_gather **tlb,
-		struct vm_area_struct *start_vma, unsigned long start_addr,
+unsigned long unmap_vmas(struct vm_area_struct *start_vma, unsigned long start_addr,
 		unsigned long end_addr, unsigned long *nr_accounted,
 		struct zap_details *);
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -805,7 +805,6 @@
 /**
  * unmap_vmas - unmap a range of memory covered by a list of vma's
- * @tlbp: address of the caller's struct mmu_gather
  * @vma: the starting vma
  * @start_addr: virtual address at which to start unmapping
  * @end_addr: virtual address at which to end unmapping
@@ -817,20 +816,13 @@
  * Unmap all pages in the vma list.
  *
  * We aim to not hold locks for too long (for scheduling latency reasons).
- * So zap pages in ZAP_BLOCK_SIZE bytecounts. This means we need to
- * return the ending mmu_gather to the caller.
+ * So zap pages in ZAP_BLOCK_SIZE bytecounts.
  *
  * Only addresses between `start' and `end' will be unmapped.
  *
  * The VMA list must be sorted in ascending virtual address order.
- *
- * unmap_vmas() assumes that the caller will flush the whole unmapped address
- * range after unmap_vmas() returns. So the only responsibility here is to
- * ensure that any thus-far unmapped pages are flushed before unmap_vmas()
- * drops the lock and schedules.
*/ -unsigned long unmap_vmas(struct mmu_gather **tlbp, - struct vm_area_struct *vma, unsigned long start_addr, +unsigned long unmap_vmas(struct vm_area_struct *vma, unsigned long start_addr, unsigned long end_addr, unsigned long *nr_accounted, struct zap_details *details) { @@ -838,7 +830,15 @@ unsigned long tlb_start = 0; /* For tlb_finish_mmu */ int tlb_start_valid = 0; unsigned long start = start_addr; - int fullmm = (*tlbp)->fullmm; + int fullmm; + struct mmu_gather *tlb; + struct mm_struct *mm = vma->vm_mm; + + mmu_notifier_invalidate_range_start(mm, start_addr, end_addr); + lru_add_drain(); + tlb = tlb_gather_mmu(mm, 0); + update_hiwater_rss(mm); + fullmm = tlb->fullmm; for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) { unsigned long end; @@ -865,7 +865,7 @@ (HPAGE_SIZE / PAGE_SIZE); start = end; } else - start = unmap_page_range(*tlbp, vma, + start = unmap_page_range(tlb, vma, start, end, &zap_work, details); if (zap_work > 0) { @@ -873,13 +873,15 @@ break; } - tlb_finish_mmu(*tlbp, tlb_start, start); + tlb_finish_mmu(tlb, tlb_start, start); cond_resched(); - *tlbp = tlb_gather_mmu(vma->vm_mm, fullmm); + tlb = tlb_gather_mmu(vma->vm_mm, fullmm); tlb_start_valid = 0; zap_work = ZAP_BLOCK_SIZE; } } + tlb_finish_mmu(tlb, start_addr, end_addr); + mmu_notifier_invalidate_range_end(mm, start_addr, end_addr); return start; /* which is now the end (or restart) address */ } @@ -893,20 +895,10 @@ unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, struct zap_details *details) { - struct mm_struct *mm = vma->vm_mm; - struct mmu_gather *tlb; unsigned long end = address + size; unsigned long nr_accounted = 0; - lru_add_drain(); - tlb = tlb_gather_mmu(mm, 0); - update_hiwater_rss(mm); - mmu_notifier_invalidate_range_start(mm, address, end); - end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details); - mmu_notifier_invalidate_range_end(mm, address, end); - if (tlb) - tlb_finish_mmu(tlb, address, end); - 
return end; + return unmap_vmas(vma, address, end, &nr_accounted, details); } /* diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1742,19 +1742,12 @@ unsigned long start, unsigned long end) { struct vm_area_struct *next = prev? prev->vm_next: mm->mmap; - struct mmu_gather *tlb; unsigned long nr_accounted = 0; - lru_add_drain(); - tlb = tlb_gather_mmu(mm, 0); - update_hiwater_rss(mm); - mmu_notifier_invalidate_range_start(mm, start, end); - unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); + unmap_vmas(vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - tlb_finish_mmu(tlb, start, end); free_pgtables(vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); - mmu_notifier_invalidate_range_end(mm, start, end); } /* @@ -2034,7 +2027,6 @@ /* Release all mmaps. */ void exit_mmap(struct mm_struct *mm) { - struct mmu_gather *tlb; struct vm_area_struct *vma = mm->mmap; unsigned long nr_accounted = 0; unsigned long end; @@ -2045,12 +2037,9 @@ lru_add_drain(); flush_cache_mm(mm); - tlb = tlb_gather_mmu(mm, 1); - /* Don't update_hiwater_rss(mm) here, do_exit already did */ - /* Use -1 here to ensure all VMAs in the mm are unmapped */ - end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL); + + end = unmap_vmas(vma, 0, -1, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - tlb_finish_mmu(tlb, 0, end); free_pgtables(vma, FIRST_USER_ADDRESS, 0); /* From andrea at qumranet.com Wed Apr 2 14:30:08 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:08 +0200 Subject: [ofa-general] [PATCH 7 of 8] XPMEM would have used sys_madvise() except that madvise_dontneed() In-Reply-To: Message-ID: <31fc23193bd039cc595f.1207171808@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1207159059 -7200 # Node ID 31fc23193bd039cc595fba1ca149a9715f7d0fb2 # Parent dd918e267ce1d054e8364a53adcecf3c7439cff4 XPMEM would have used sys_madvise() except that madvise_dontneed() returns an -EINVAL if 
VM_PFNMAP is set, which is always true for the pages XPMEM imports from other partitions and is also true for uncached pages allocated locally via the mspec allocator. XPMEM needs zap_page_range() functionality for these types of pages as well as 'normal' pages. Signed-off-by: Dean Nelson diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -900,6 +900,7 @@ return unmap_vmas(vma, address, end, &nr_accounted, details); } +EXPORT_SYMBOL_GPL(zap_page_range); /* * Do a quick page-table lookup for a single page. From andrea at qumranet.com Wed Apr 2 14:30:09 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:09 +0200 Subject: [ofa-general] [PATCH 8 of 8] This patch adds a lock ordering rule to avoid a potential deadlock when In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207159059 -7200 # Node ID f3f119118b0abd9c4624263ef388dc7230d937fe # Parent 31fc23193bd039cc595fba1ca149a9715f7d0fb2 This patch adds a lock ordering rule to avoid a potential deadlock when multiple mmap_sems need to be locked. Signed-off-by: Dean Nelson diff --git a/mm/filemap.c b/mm/filemap.c --- a/mm/filemap.c +++ b/mm/filemap.c @@ -79,6 +79,9 @@ * * ->i_mutex (generic_file_buffered_write) * ->mmap_sem (fault_in_pages_readable->do_page_fault) + * + * When taking multiple mmap_sems, one should lock the lowest-addressed + * one first proceeding on up to the highest-addressed one. * * ->i_mutex * ->i_alloc_sem (various) From andrea at qumranet.com Wed Apr 2 14:30:07 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 02 Apr 2008 23:30:07 +0200 Subject: [ofa-general] [PATCH 6 of 8] Convert the anon_vma spinlock to a rw semaphore. This allows concurrent In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207159058 -7200 # Node ID dd918e267ce1d054e8364a53adcecf3c7439cff4 # Parent 316e5b1e4bf388ef0198c91b3067ed1e4171d7f6 Convert the anon_vma spinlock to a rw semaphore. 
This allows concurrent traversal of reverse maps for try_to_unmap and page_mkclean. It also allows the calling of sleeping functions from reverse map traversal. An additional complication is that rcu is used in some contexts to guarantee the presence of the anon_vma while we acquire the lock. We cannot take a semaphore within an rcu critical section. Add a refcount to the anon_vma structure which allows us to give an existence guarantee for the anon_vma structure independent of the spinlock or the list contents. The refcount can then be taken within the RCU section. If it has been taken successfully, then the refcount guarantees the existence of the anon_vma. The refcount in anon_vma also allows us to fix a nasty issue in page migration where we fudged by using rcu for a long code path to guarantee the existence of the anon_vma. The refcount in general allows a shortening of RCU critical sections since we can do an rcu_read_unlock after taking the refcount. This is particularly relevant if the anon_vma chains contain hundreds of entries. Issues: - Atomic overhead increases in situations where a new reference to the anon_vma has to be established or removed. Overhead also increases when a speculative reference is used (try_to_unmap, page_mkclean, page migration). There are also more frequent processor changes due to up_xxx letting waiting tasks run first. This results in, e.g., the Aim9 brk performance test going down by 10-15%. Signed-off-by: Christoph Lameter diff --git a/include/linux/rmap.h b/include/linux/rmap.h --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -25,7 +25,8 @@ * pointing to this anon_vma once its vma list is empty.
*/ struct anon_vma { - spinlock_t lock; /* Serialize access to vma list */ + atomic_t refcount; /* vmas on the list */ + struct rw_semaphore sem;/* Serialize access to vma list */ struct list_head head; /* List of private "related" vmas */ }; @@ -43,18 +44,31 @@ kmem_cache_free(anon_vma_cachep, anon_vma); } +struct anon_vma *grab_anon_vma(struct page *page); + +static inline void get_anon_vma(struct anon_vma *anon_vma) +{ + atomic_inc(&anon_vma->refcount); +} + +static inline void put_anon_vma(struct anon_vma *anon_vma) +{ + if (atomic_dec_and_test(&anon_vma->refcount)) + anon_vma_free(anon_vma); +} + static inline void anon_vma_lock(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); } static inline void anon_vma_unlock(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); } /* diff --git a/mm/migrate.c b/mm/migrate.c --- a/mm/migrate.c +++ b/mm/migrate.c @@ -235,15 +235,16 @@ return; /* - * We hold the mmap_sem lock. So no need to call page_lock_anon_vma. + * We hold either the mmap_sem lock or a reference on the + * anon_vma. So no need to call page_lock_anon_vma. */ anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON); - spin_lock(&anon_vma->lock); + down_read(&anon_vma->sem); list_for_each_entry(vma, &anon_vma->head, anon_vma_node) remove_migration_pte(vma, old, new); - spin_unlock(&anon_vma->lock); + up_read(&anon_vma->sem); } /* @@ -623,7 +624,7 @@ int rc = 0; int *result = NULL; struct page *newpage = get_new_page(page, private, &result); - int rcu_locked = 0; + struct anon_vma *anon_vma = NULL; int charge = 0; if (!newpage) @@ -647,16 +648,14 @@ } /* * By try_to_unmap(), page->mapcount goes down to 0 here. In this case, - * we cannot notice that anon_vma is freed while we migrates a page. + * we cannot notice that anon_vma is freed while we migrate a page. 
* This rcu_read_lock() delays freeing anon_vma pointer until the end * of migration. File cache pages are no problem because of page_lock() * File Caches may use write_page() or lock_page() in migration, then, * just care Anon page here. */ - if (PageAnon(page)) { - rcu_read_lock(); - rcu_locked = 1; - } + if (PageAnon(page)) + anon_vma = grab_anon_vma(page); /* * Corner case handling: @@ -674,10 +673,7 @@ if (!PageAnon(page) && PagePrivate(page)) { /* * Go direct to try_to_free_buffers() here because - * a) that's what try_to_release_page() would do anyway - * b) we may be under rcu_read_lock() here, so we can't - * use GFP_KERNEL which is what try_to_release_page() - * needs to be effective. + * that's what try_to_release_page() would do anyway */ try_to_free_buffers(page); } @@ -698,8 +694,8 @@ } else if (charge) mem_cgroup_end_migration(newpage); rcu_unlock: - if (rcu_locked) - rcu_read_unlock(); + if (anon_vma) + put_anon_vma(anon_vma); unlock: diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -565,7 +565,7 @@ if (vma->anon_vma) anon_vma = vma->anon_vma; if (anon_vma) { - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); /* * Easily overlooked: when mprotect shifts the boundary, * make sure the expanding vma has anon_vma set if the @@ -619,7 +619,7 @@ } if (anon_vma) - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); if (mapping) up_write(&mapping->i_mmap_sem); diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -69,7 +69,7 @@ if (anon_vma) { allocated = NULL; locked = anon_vma; - spin_lock(&locked->lock); + down_write(&locked->sem); } else { anon_vma = anon_vma_alloc(); if (unlikely(!anon_vma)) @@ -81,6 +81,7 @@ /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { + get_anon_vma(anon_vma); vma->anon_vma = anon_vma; list_add_tail(&vma->anon_vma_node, &anon_vma->head); allocated = NULL; @@ -88,7 +89,7 @@ spin_unlock(&mm->page_table_lock); if 
(locked) - spin_unlock(&locked->lock); + up_write(&locked->sem); if (unlikely(allocated)) anon_vma_free(allocated); } @@ -99,14 +100,17 @@ { BUG_ON(vma->anon_vma != next->anon_vma); list_del(&next->anon_vma_node); + put_anon_vma(vma->anon_vma); } void __anon_vma_link(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; - if (anon_vma) + if (anon_vma) { + get_anon_vma(anon_vma); list_add_tail(&vma->anon_vma_node, &anon_vma->head); + } } void anon_vma_link(struct vm_area_struct *vma) @@ -114,36 +118,32 @@ struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) { - spin_lock(&anon_vma->lock); + get_anon_vma(anon_vma); + down_write(&anon_vma->sem); list_add_tail(&vma->anon_vma_node, &anon_vma->head); - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); } } void anon_vma_unlink(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; - int empty; if (!anon_vma) return; - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); list_del(&vma->anon_vma_node); - - /* We must garbage collect the anon_vma if it's empty */ - empty = list_empty(&anon_vma->head); - spin_unlock(&anon_vma->lock); - - if (empty) - anon_vma_free(anon_vma); + up_write(&anon_vma->sem); + put_anon_vma(anon_vma); } static void anon_vma_ctor(struct kmem_cache *cachep, void *data) { struct anon_vma *anon_vma = data; - spin_lock_init(&anon_vma->lock); + init_rwsem(&anon_vma->sem); + atomic_set(&anon_vma->refcount, 0); INIT_LIST_HEAD(&anon_vma->head); } @@ -157,9 +157,9 @@ * Getting a lock on a stable anon_vma from a page off the LRU is * tricky: page_lock_anon_vma rely on RCU to guard against the races. 
*/ -static struct anon_vma *page_lock_anon_vma(struct page *page) +struct anon_vma *grab_anon_vma(struct page *page) { - struct anon_vma *anon_vma; + struct anon_vma *anon_vma = NULL; unsigned long anon_mapping; rcu_read_lock(); @@ -170,17 +170,26 @@ goto out; anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); - spin_lock(&anon_vma->lock); - return anon_vma; + if (!atomic_inc_not_zero(&anon_vma->refcount)) + anon_vma = NULL; out: rcu_read_unlock(); - return NULL; + return anon_vma; +} + +static struct anon_vma *page_lock_anon_vma(struct page *page) +{ + struct anon_vma *anon_vma = grab_anon_vma(page); + + if (anon_vma) + down_read(&anon_vma->sem); + return anon_vma; } static void page_unlock_anon_vma(struct anon_vma *anon_vma) { - spin_unlock(&anon_vma->lock); - rcu_read_unlock(); + up_read(&anon_vma->sem); + put_anon_vma(anon_vma); } /* From andrea at qumranet.com Wed Apr 2 14:53:34 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 2 Apr 2008 23:53:34 +0200 Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> Message-ID: <20080402215334.GT19189@duo.random> On Wed, Apr 02, 2008 at 10:59:50AM -0700, Christoph Lameter wrote: > Did I see #v10? Could you start a new subject when you post please? Do > not respond to some old message otherwise the threading will be wrong. I wasn't clear enough, #v10 was in the works... I was thinking about the last two issues before posting it. > How exactly does the GRU corrupt memory? Jack added synchronize_rcu, I assume for a reason. > > > Another less obviously safe approach is to allow the register > > method to succeed only when mm_users=1 and the task is single > > threaded. 
This way if all the places where the mmu notifiers aren't > > invoked on the mm not by the current task, are only doing > > invalidates after/before zapping ptes, if the instantiation of new > > ptes is single threaded too, we shouldn't worry if we miss an > > invalidate for a pte that is zero and doesn't point to any physical > > page. In the places where current->mm != mm I'm using > > invalidate_page 99% of the time, and that only follows the > > ptep_clear_flush. The problem is the range_begin that will happen > > before zapping the pte in places where current->mm != > > mm. Unfortunately in my incremental patch where I move all > > invalidate_page outside of the PT lock to prepare for allowing > > sleeping inside the mmu notifiers, I used range_begin/end in places > > like try_to_unmap_cluster where current->mm != mm. In general > > this solution looks more fragile than the seqlock. > > Hmmm... Okay that is one solution that would just require a BUG_ON in the > registration methods. Perhaps you didn't notice that this solution can't work if you call range_begin/end not in the "current" context, and try_to_unmap_cluster does exactly that for both my patchset and yours. Missing an _end is ok; missing a _begin is never ok. > Well doesn't the requirement of just one execution thread also deal with > that issue? Yes, except again it can't work for try_to_unmap_cluster. This solution is only applicable to #v10 if I fix try_to_unmap_cluster to only call invalidate_page (relying on the fact that the VM holds a pin and a lock on any page that is being mmu-notifier-invalidated). You can't use the single-threaded approach to solve either 1 or 2, because your _begin call is called anywhere, and that's where you call the secondary-tlb flush and it's fatal to miss it. invalidate_page is always called after, so it enforces the tlb flush to be called _after_ and is therefore inherently safe.
From andrea at qumranet.com Wed Apr 2 14:56:04 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 2 Apr 2008 23:56:04 +0200 Subject: [ofa-general] Re: [patch 5/9] Convert anon_vma lock to rw_sem and refcount In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205636.777127252@sgi.com> <20080402175058.GR19189@duo.random> Message-ID: <20080402215604.GU19189@duo.random> On Wed, Apr 02, 2008 at 11:15:26AM -0700, Christoph Lameter wrote: > On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > > > On Tue, Apr 01, 2008 at 01:55:36PM -0700, Christoph Lameter wrote: > > > This results in f.e. the Aim9 brk performance test to got down by 10-15%. > > > > I guess it's more likely because of overscheduling for small critical > > sections, did you count the total number of context switches? I > > guess there will be a lot more with your patch applied. That > > regression is a showstopper and it is the reason why I've suggested > > before to add a CONFIG_XPMEM or CONFIG_MMU_NOTIFIER_SLEEP config > > option to make the VM locks sleep capable only when XPMEM=y > > (PREEMPT_RT will enable it too). Thanks for doing the benchmark work! > > There are more context switches if locks are contended. > > But that has actually also some good aspects because we avoid busy loops > and can potentially continue work in another process. That would be the case if the "wait time" were longer than the scheduling time; the whole point is that with anon-vma the write side is so fast it's likely never worth scheduling (probably not even with preempt-rt for the write side; the read side is an entirely different matter, but the read side can run concurrently if the system is paging heavily), hence the slowdown. What you benchmarked is the write side, which is also the fast path when the system is heavily CPU bound. I have to say AIM is a great benchmark for testing this regression. But I think a config option will solve all of this.
From clameter at sgi.com Wed Apr 2 14:54:52 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 14:54:52 -0700 (PDT) Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: <20080402215334.GT19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402215334.GT19189@duo.random> Message-ID: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > > Hmmm... Okay that is one solution that would just require a BUG_ON in the > > registration methods. > > Perhaps you didn't notice that this solution can't work if you call > range_begin/end not in the "current" context and try_to_unmap_cluster > does exactly that for both my patchset and yours. Missing an _end is > ok, missing a _begin is never ok. If you look at the patch you will see a requirement of holding a writelock on mmap_sem which will keep out get_user_pages(). From clameter at sgi.com Wed Apr 2 14:56:25 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 14:56:25 -0700 (PDT) Subject: [ofa-general] Re: [patch 5/9] Convert anon_vma lock to rw_sem and refcount In-Reply-To: <20080402215604.GU19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205636.777127252@sgi.com> <20080402175058.GR19189@duo.random> <20080402215604.GU19189@duo.random> Message-ID: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > paging), hence the slowdown. What you benchmarked is the write side, > which is also the fast path when the system is heavily CPU bound. I've > to say aim is a great benchmark to test this regression. I am a bit surprised that brk performance is that important. There may be other measurements that have to be made to assess how this would impact a real load.
From andrea at qumranet.com Wed Apr 2 15:01:48 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 3 Apr 2008 00:01:48 +0200 Subject: [ofa-general] Re: EMM: Require single threadedness for registration. In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> Message-ID: <20080402220148.GV19189@duo.random> On Wed, Apr 02, 2008 at 02:05:28PM -0700, Christoph Lameter wrote: > Here is a patch to require single threaded execution during emm_register. > This also allows an easy implementation of an unregister function and gets > rid of the races that Andrea worried about. That would work for #v10 if I remove the invalidate_range_start from try_to_unmap_cluster, but it can't work for EMM because you have emm_invalidate_start firing anywhere outside the context of the current task (even regular rmap code, not just the nonlinear corner case, will trigger the race). In short, the single-threaded approach would be workable only thanks to the fact that #v10 has the notion of invalidate_page for flushing the tlb _after_ and for avoiding blocking the secondary page fault during swapping. In the kvm case I don't want to block the page fault for anything but madvise, which is strictly only used after the guest inflated the balloon; the existence of invalidate_page allows that optimization and avoids serializing against the kvm page fault during all regular page faults, since invalidate_page is called while the page is pinned by the VM. The requirement for invalidate_page is that the pte and linux tlb are flushed _before_ and the page is freed _after_ the invalidate_page method. That's not the case for _begin/_end. The page is freed well before _end runs, hence the need for _begin and for blocking the secondary mmu page fault during the vma-mangling operations.
#v10 takes care of all this, and although I could perhaps fix the remaining two issues using the single-threaded enforcement I suggested, I preferred to play it safe and spend an unsigned per-mm in case anybody needs to attach at runtime; the single-threaded restriction didn't look very clean. From clameter at sgi.com Wed Apr 2 15:03:12 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 15:03:12 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 2 of 8] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: References: Message-ID: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > diff --git a/mm/memory.c b/mm/memory.c > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1626,9 +1626,10 @@ > */ > page_table = pte_offset_map_lock(mm, pmd, address, > &ptl); > - page_cache_release(old_page); > + new_page = NULL; > if (!pte_same(*page_table, orig_pte)) > goto unlock; > + page_cache_release(old_page); > > page_mkwrite = 1; > } This is deferring frees and not moving the callouts. KVM specific? What exactly is this doing? A significant portion of this seems to be undoing what the first patch did. From clameter at sgi.com Wed Apr 2 15:06:19 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 15:06:19 -0700 (PDT) Subject: [ofa-general] Re: EMM: Require single threadedness for registration. In-Reply-To: <20080402220148.GV19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> Message-ID: On Thu, 3 Apr 2008, Andrea Arcangeli wrote: > That would work for #v10 if I remove the invalidate_range_start from > try_to_unmap_cluster, it can't work for EMM because you've > emm_invalidate_start firing anywhere outside the context of the > current task (even regular rmap code, not just nonlinear corner case > will trigger the race).
In short the single threaded approach would be But in that case it will be firing for a callback to another mm_struct. The notifiers are bound to mm_structs and keep separate contexts. > The requirement for invalidate_page is that the pte and linux tlb are > flushed _before_ and the page is freed _after_ the invalidate_page > method. that's not the case for _begin/_end. The page is freed well > before _end runs, hence the need of _begin and to block the secondary > mmu page fault during the vma-mangling operations. You could flush in _begin and free on _end? I thought you were taking a refcount on the page? You can drop the refcount only on _end to ensure that the page does not go away before. From andrea at qumranet.com Wed Apr 2 15:09:36 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 3 Apr 2008 00:09:36 +0200 Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402215334.GT19189@duo.random> Message-ID: <20080402220936.GW19189@duo.random> On Wed, Apr 02, 2008 at 02:54:52PM -0700, Christoph Lameter wrote: > On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > > > > Hmmm... Okay that is one solution that would just require a BUG_ON in the > > > registration methods. > > > > Perhaps you didn't notice that this solution can't work if you call > > range_begin/end not in the "current" context and try_to_unmap_cluster > > does exactly that for both my patchset and yours. Missing an _end is > > ok, missing a _begin is never ok. > > If you look at the patch you will see a requirement of holding a > writelock on mmap_sem which will keep out get_user_pages(). I said try_to_unmap_cluster, not get_user_pages.
   CPU0                                     CPU1
   try_to_unmap_cluster:
     emm_invalidate_start in EMM (or
     mmu_notifier_invalidate_range_start
     in #v10)
     walking the list by hand in EMM
     (or with hlist cleaner in #v10)
     xpmem method invoked
     schedule for a long while inside
     invalidate_range_start while skbs
     are sent
                                            gru registers
                                            synchronize_rcu (sorry useless now)
                                            single threaded, so taking a page fault
                                            secondary tlb instantiated
     xpmem method returns
     end of the list (didn't notice that
     it has to restart to flush the gru)
     zap pte
     free the page
                                            gru corrupts memory

CPU 1 was single threaded, CPU0 doesn't hold any mmap_sem or any other lock that could ever serialize against the GRU as far as I can tell. In general my #v10 solution mixing seqlock + rcu looks more robust and allows multithreaded attachment of mmu notifiers as well. I could have fixed it with the single-threaded approach thanks to the fact that the only place outside the mm->mmap_sem is try_to_unmap_cluster for me, but it wasn't simple to convert, nor worth it, given nonlinear isn't worth optimizing for (not even the core VM cares about try_to_unmap_cluster, which is in fact the only place in the VM with an O(N) complexity for each try_to_unmap call, where N is the size of the mapping divided by page_size). From andrea at qumranet.com Wed Apr 2 15:12:28 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 3 Apr 2008 00:12:28 +0200 Subject: [ofa-general] Re: [patch 5/9] Convert anon_vma lock to rw_sem and refcount In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205636.777127252@sgi.com> <20080402175058.GR19189@duo.random> <20080402215604.GU19189@duo.random> Message-ID: <20080402221228.GX19189@duo.random> On Wed, Apr 02, 2008 at 02:56:25PM -0700, Christoph Lameter wrote: > I am a bit surprised that brk performance is that important. There may be I think it's not brk but fork that is being slowed down, did you oprofile? AIM forks a lot...
The write-side fast path generating the overscheduling, I guess, is when the new vmas are created for the child and queued in the parent anon-vma in O(1), so it's immediate; even preempt-rt would be ok with it spinning and not scheduling, since it's just a list_add (much faster than schedule() indeed). Every time there's a collision, when multiple children fork simultaneously and they all try to queue in the same anon-vma, things will slow down. From andrea at qumranet.com Wed Apr 2 15:17:16 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 3 Apr 2008 00:17:16 +0200 Subject: [ofa-general] Re: EMM: Require single threadedness for registration. In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> Message-ID: <20080402221716.GY19189@duo.random> On Wed, Apr 02, 2008 at 03:06:19PM -0700, Christoph Lameter wrote: > On Thu, 3 Apr 2008, Andrea Arcangeli wrote: > > > That would work for #v10 if I remove the invalidate_range_start from > > try_to_unmap_cluster, it can't work for EMM because you've > > emm_invalidate_start firing anywhere outside the context of the > > current task (even regular rmap code, not just nonlinear corner case > > will trigger the race). In short the single threaded approach would be > > But in that case it will be firing for a callback to another mm_struct. > The notifiers are bound to mm_structs and keep separate contexts. Why can't it fire on the mm_struct where GRU just registered? That mm_struct existed way before GRU registered, and the VM is free to unmap it w/o mmap_sem if there was any memory pressure. > You could flush in _begin and free on _end? I thought you are taking a > refcount on the page? You can drop the refcount only on _end to ensure > that the page does not go away before. We're going to lock + flush on _begin and unlock on _end w/o refcounting, to microoptimize. Free is done by unmap_vmas/madvise/munmap at will.
That's a very slow path, inflating the balloon is not problematic. But invalidate_page allows to avoid blocking page faults during swapping so minor faults can happen and refresh the pte young bits etc... When the VM unmaps the page while holding the page pin, there's no race and that's where invalidate_page is being used to generate lower invalidation overhead. From clameter at sgi.com Wed Apr 2 15:34:01 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 15:34:01 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 1 of 8] Core of mmu notifiers In-Reply-To: References: Message-ID: On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > + void (*invalidate_page)(struct mmu_notifier *mn, > + struct mm_struct *mm, > + unsigned long address); > + > + void (*invalidate_range_start)(struct mmu_notifier *mn, > + struct mm_struct *mm, > + unsigned long start, unsigned long end); > + void (*invalidate_range_end)(struct mmu_notifier *mn, > + struct mm_struct *mm, > + unsigned long start, unsigned long end); Still two methods ... > +void __mmu_notifier_release(struct mm_struct *mm) > +{ > + struct mmu_notifier *mn; > + unsigned seq; > + > + seq = read_seqbegin(&mm->mmu_notifier_lock); > + while (unlikely(!hlist_empty(&mm->mmu_notifier_list))) { > + mn = hlist_entry(mm->mmu_notifier_list.first, > + struct mmu_notifier, > + hlist); > + hlist_del(&mn->hlist); > + if (mn->ops->release) > + mn->ops->release(mn, mm); > + BUG_ON(read_seqretry(&mm->mmu_notifier_lock, seq)); > + } > +} seqlock just taken for checking if everything is ok? > + > +/* > + * If no young bitflag is supported by the hardware, ->clear_flush_young can > + * unmap the address and return 1 or 0 depending if the mapping previously > + * existed or not. 
> + */ > +int __mmu_notifier_clear_flush_young(struct mm_struct *mm, > + unsigned long address) > +{ > + struct mmu_notifier *mn; > + struct hlist_node *n; > + int young = 0; > + unsigned seq; > + > + seq = read_seqbegin(&mm->mmu_notifier_lock); > + do { > + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { > + if (mn->ops->clear_flush_young) > + young |= mn->ops->clear_flush_young(mn, mm, > + address); > + } > + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); > + The critical section could be run multiple times for one callback which could result in multiple callbacks to clear the young bit. Guess not that big of an issue? > +void __mmu_notifier_invalidate_page(struct mm_struct *mm, > + unsigned long address) > +{ > + struct mmu_notifier *mn; > + struct hlist_node *n; > + unsigned seq; > + > + seq = read_seqbegin(&mm->mmu_notifier_lock); > + do { > + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { > + if (mn->ops->invalidate_page) > + mn->ops->invalidate_page(mn, mm, address); > + } > + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); > +} Ok. Retry would try to invalidate the page a second time which is not a problem unless you would drop the refcount or make other state changes that require correspondence with mapping. I guess this is the reason that you stopped adding a refcount? > +void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, > + unsigned long start, unsigned long end) > +{ > + struct mmu_notifier *mn; > + struct hlist_node *n; > + unsigned seq; > + > + seq = read_seqbegin(&mm->mmu_notifier_lock); > + do { > + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { > + if (mn->ops->invalidate_range_start) > + mn->ops->invalidate_range_start(mn, mm, > + start, end); > + } > + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); > +} Multiple invalidate_range_starts on the same range? This means the driver needs to be able to deal with the situation and ignore the repeated call? 
> +void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, > + unsigned long start, unsigned long end) > +{ > + struct mmu_notifier *mn; > + struct hlist_node *n; > + unsigned seq; > + > + seq = read_seqbegin(&mm->mmu_notifier_lock); > + do { > + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { > + if (mn->ops->invalidate_range_end) > + mn->ops->invalidate_range_end(mn, mm, > + start, end); > + } > + } while (read_seqretry(&mm->mmu_notifier_lock, seq)); > +} Retry can lead to multiple invalidate_range callbacks with the same parameters? Driver needs to ignore if the range is already clear? From clameter at sgi.com Wed Apr 2 15:41:34 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 15:41:34 -0700 (PDT) Subject: [ofa-general] Re: EMM: Require single threadedness for registration. In-Reply-To: <20080402221716.GY19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> Message-ID: On Thu, 3 Apr 2008, Andrea Arcangeli wrote: > Why can't it fire on the mm_struct where GRU just registered? That > mm_struct existed way before GRU registered, and VM is free to unmap > it w/o mmap_sem if there was any memory pressure. Right. Hmmm... Bad situation. We would have invalidate_start take a lock to prevent registration until _end has run. We could use stop_machine_run to register the notifier.... ;-). From ralph.campbell at qlogic.com Wed Apr 2 15:49:01 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:01 -0700 Subject: [ofa-general] [PATCH 0/20] IB/ipath -- DDR HCA patches in for-roland for 2.6.26 Message-ID: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> The following patches add the remaining changes needed to fully support the QLogic 7220 DDR HCAs. 
These also will make 2.6.26 match what is in OFED-1.3 plus some recent minor fixes and code style clean up. These can also be pulled into Roland's infiniband.git for-2.6.26 repo using: git pull git://git.qlogic.com/ipath-linux-2.6 for-roland From ralph.campbell at qlogic.com Wed Apr 2 15:49:06 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:06 -0700 Subject: [ofa-general] [PATCH 01/20] IB/ipath - Allow old and new diagnostic packet formats In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224906.28598.75040.stgit@eng-46.mv.qlogic.com> From: Michael Albaugh This patch checks for old and new format writes to send a packet via the diagnostic interface. Signed-off-by: Michael Albaugh --- drivers/infiniband/hw/ipath/ipath_diag.c | 9 +++++++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_diag.c b/drivers/infiniband/hw/ipath/ipath_diag.c index af59bf3..c9bfd82 100644 --- a/drivers/infiniband/hw/ipath/ipath_diag.c +++ b/drivers/infiniband/hw/ipath/ipath_diag.c @@ -332,12 +332,17 @@ static ssize_t ipath_diagpkt_write(struct file *fp, u64 val; u32 l_state, lt_state; /* LinkState, LinkTrainingState */ - if (count != sizeof(dp)) { + if (count < sizeof(odp)) { ret = -EINVAL; goto bail; } - if (copy_from_user(&dp, data, sizeof(dp))) { + if (count == sizeof(dp)) { + if (copy_from_user(&dp, data, sizeof(dp))) { + ret = -EFAULT; + goto bail; + } + } else if (copy_from_user(&odp, data, sizeof(odp))) { ret = -EFAULT; goto bail; } From ralph.campbell at qlogic.com Wed Apr 2 15:49:11 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:11 -0700 Subject: [ofa-general] [PATCH 02/20] IB/ipath - fix some white space and code style issues In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: 
<20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224911.28598.34434.stgit@eng-46.mv.qlogic.com> This patch makes some white space changes and minor non-functional changes to more closely match the code in OFED-1.3. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_driver.c | 29 ++++++++++++++----------- drivers/infiniband/hw/ipath/ipath_init_chip.c | 7 +++--- drivers/infiniband/hw/ipath/ipath_intr.c | 16 +++++++------- drivers/infiniband/hw/ipath/ipath_kernel.h | 4 ++- drivers/infiniband/hw/ipath/ipath_registers.h | 4 ++- drivers/infiniband/hw/ipath/ipath_stats.c | 13 ++++++----- 6 files changed, 38 insertions(+), 35 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index dfa009a..f79d9cc 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -627,7 +627,8 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, goto bail; bail_irqsetup: - if (pdev->irq) free_irq(pdev->irq, dd); + if (pdev->irq) + free_irq(pdev->irq, dd); bail_iounmap: iounmap((volatile void __iomem *) dd->ipath_kregbase); @@ -1704,7 +1705,10 @@ bail: void ipath_cancel_sends(struct ipath_devdata *dd, int restore_sendctrl) { ipath_dbg("Cancelling all in-progress send buffers\n"); - dd->ipath_lastcancel = jiffies+HZ/2; /* skip armlaunch errs a bit */ + + /* skip armlaunch errs for a while */ + dd->ipath_lastcancel = jiffies + HZ / 2; + /* * the abort bit is auto-clearing. We read scratch to be sure * that cancels and the abort have taken effect in the chip. 
@@ -2070,9 +2074,8 @@ void ipath_set_led_override(struct ipath_devdata *dd, unsigned int val) dd->ipath_led_override_timer.data = (unsigned long) dd; dd->ipath_led_override_timer.expires = jiffies + 1; add_timer(&dd->ipath_led_override_timer); - } else { + } else atomic_dec(&dd->ipath_led_override_timer_active); - } } /** @@ -2220,12 +2223,12 @@ void ipath_free_pddata(struct ipath_devdata *dd, struct ipath_portdata *pd) "ipath_port0_skbinfo @ %p\n", pd->port_port, skbinfo); for (e = 0; e < dd->ipath_rcvegrcnt; e++) - if (skbinfo[e].skb) { - pci_unmap_single(dd->pcidev, skbinfo[e].phys, - dd->ipath_ibmaxlen, - PCI_DMA_FROMDEVICE); - dev_kfree_skb(skbinfo[e].skb); - } + if (skbinfo[e].skb) { + pci_unmap_single(dd->pcidev, skbinfo[e].phys, + dd->ipath_ibmaxlen, + PCI_DMA_FROMDEVICE); + dev_kfree_skb(skbinfo[e].skb); + } vfree(skbinfo); } kfree(pd->port_tid_pg_list); @@ -2468,10 +2471,10 @@ void ipath_hol_event(unsigned long opaque) int ipath_set_rx_pol_inv(struct ipath_devdata *dd, u8 new_pol_inv) { u64 val; - if ( new_pol_inv > INFINIPATH_XGXS_RX_POL_MASK ) { + + if (new_pol_inv > INFINIPATH_XGXS_RX_POL_MASK) return -1; - } - if ( dd->ipath_rx_pol_inv != new_pol_inv ) { + if (dd->ipath_rx_pol_inv != new_pol_inv) { dd->ipath_rx_pol_inv = new_pol_inv; val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig); val &= ~(INFINIPATH_XGXS_RX_POL_MASK << diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 786a5e0..94f938f 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -645,7 +645,6 @@ done: return ret; } - /** * ipath_init_chip - do the actual initialization sequence on the chip * @dd: the infinipath device @@ -754,7 +753,7 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) dd->ipath_f_early_init(dd); /* - * cancel any possible active sends from early driver load. + * Cancel any possible active sends from early driver load. 
* Follows early_init because some chips have to initialize * PIO buffers in early_init to avoid false parity errors. */ @@ -884,7 +883,7 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) &dd->pcidev->dev, pd->port_rcvhdrq_size, &dd->ipath_dummy_hdrq_phys, gfp_flags); - if (!dd->ipath_dummy_hdrq ) { + if (!dd->ipath_dummy_hdrq) { dev_info(&dd->pcidev->dev, "Couldn't allocate 0x%lx bytes for dummy hdrq\n", pd->port_rcvhdrq_size); @@ -899,7 +898,7 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) */ ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, 0ULL); - if(!dd->ipath_stats_timer_active) { + if (!dd->ipath_stats_timer_active) { /* * first init, or after an admin disable/enable * set up stats retrieval timer, even if we had errors diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index d1e13a4..41329e7 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -590,18 +590,19 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) * ones on this particular interrupt, which also isn't great */ dd->ipath_maskederrs |= dd->ipath_lasterror | errs; + dd->ipath_errormask &= ~dd->ipath_maskederrs; ipath_write_kreg(dd, dd->ipath_kregs->kr_errormask, - dd->ipath_errormask); + dd->ipath_errormask); s_iserr = ipath_decode_err(msg, sizeof msg, - dd->ipath_maskederrs); + dd->ipath_maskederrs); if (dd->ipath_maskederrs & - ~(INFINIPATH_E_RRCVEGRFULL | - INFINIPATH_E_RRCVHDRFULL | INFINIPATH_E_PKTERRS)) + ~(INFINIPATH_E_RRCVEGRFULL | + INFINIPATH_E_RRCVHDRFULL | INFINIPATH_E_PKTERRS)) ipath_dev_err(dd, "Temporarily disabling " "error(s) %llx reporting; too frequent (%s)\n", - (unsigned long long)dd->ipath_maskederrs, + (unsigned long long) dd->ipath_maskederrs, msg); else { /* @@ -786,7 +787,6 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) return chkerrpkts; } - /* * try to cleanup as much as possible for anything that might 
have gone * wrong while in freeze mode, such as pio buffers being written by user @@ -974,6 +974,7 @@ static void handle_urcv(struct ipath_devdata *dd, u32 istat) dd->ipath_i_rcvurg_mask); for (i = 1; i < dd->ipath_cfgports; i++) { struct ipath_portdata *pd = dd->ipath_pd[i]; + if (portr & (1 << i) && pd && pd->port_cnt) { if (test_and_clear_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag)) { @@ -1095,8 +1096,7 @@ irqreturn_t ipath_intr(int irq, void *data) gpiostatus = ipath_read_kreg32( dd, dd->ipath_kregs->kr_gpio_status); - /* First the error-counter case. - */ + /* First the error-counter case. */ if ((gpiostatus & IPATH_GPIO_ERRINTR_MASK) && (dd->ipath_flags & IPATH_GPIO_ERRINTRS)) { /* want to clear the bits we see asserted. */ diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index f10442f..8018383 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -812,7 +812,7 @@ void ipath_hol_event(unsigned long); */ /* chip can report link latency (IB 1.2) */ #define IPATH_HAS_LINK_LATENCY 0x1 -/* The chip is up and initted */ + /* The chip is up and initted */ #define IPATH_INITTED 0x2 /* set if any user code has set kr_rcvhdrsize */ #define IPATH_RCVHDRSZ_SET 0x4 @@ -1148,7 +1148,7 @@ extern struct mutex ipath_mutex; # define __IPATH_DBG_WHICH(which,fmt,...) 
\ do { \ - if(unlikely(ipath_debug&(which))) \ + if (unlikely(ipath_debug & (which))) \ printk(KERN_DEBUG IPATH_DRV_NAME ": %s: " fmt, \ __func__,##__VA_ARGS__); \ } while(0) diff --git a/drivers/infiniband/hw/ipath/ipath_registers.h b/drivers/infiniband/hw/ipath/ipath_registers.h index 61e5621..f49f184 100644 --- a/drivers/infiniband/hw/ipath/ipath_registers.h +++ b/drivers/infiniband/hw/ipath/ipath_registers.h @@ -186,8 +186,8 @@ #define INFINIPATH_IBCC_LINKINITCMD_SLEEP 3 #define INFINIPATH_IBCC_LINKINITCMD_SHIFT 16 #define INFINIPATH_IBCC_LINKCMD_MASK 0x3ULL -#define INFINIPATH_IBCC_LINKCMD_DOWN 1 /* move to 0x11 */ -#define INFINIPATH_IBCC_LINKCMD_ARMED 2 /* move to 0x21 */ +#define INFINIPATH_IBCC_LINKCMD_DOWN 1 /* move to 0x11 */ +#define INFINIPATH_IBCC_LINKCMD_ARMED 2 /* move to 0x21 */ #define INFINIPATH_IBCC_LINKCMD_ACTIVE 3 /* move to 0x31 */ #define INFINIPATH_IBCC_LINKCMD_SHIFT 18 #define INFINIPATH_IBCC_MAXPKTLEN_MASK 0x7FFULL diff --git a/drivers/infiniband/hw/ipath/ipath_stats.c b/drivers/infiniband/hw/ipath/ipath_stats.c index d2725cd..57eb1d5 100644 --- a/drivers/infiniband/hw/ipath/ipath_stats.c +++ b/drivers/infiniband/hw/ipath/ipath_stats.c @@ -293,8 +293,8 @@ void ipath_get_faststats(unsigned long opaque) iserr = ipath_decode_err(ebuf, sizeof ebuf, dd->ipath_maskederrs); if (dd->ipath_maskederrs & - ~(INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL | - INFINIPATH_E_PKTERRS )) + ~(INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL | + INFINIPATH_E_PKTERRS)) ipath_dev_err(dd, "Re-enabling masked errors " "(%s)\n", ebuf); else { @@ -306,17 +306,18 @@ void ipath_get_faststats(unsigned long opaque) * level. 
*/ if (iserr) - ipath_dbg("Re-enabling queue full errors (%s)\n", - ebuf); + ipath_dbg( + "Re-enabling queue full errors (%s)\n", + ebuf); else ipath_cdbg(ERRPKT, "Re-enabling packet" - " problem interrupt (%s)\n", ebuf); + " problem interrupt (%s)\n", ebuf); } /* re-enable masked errors */ dd->ipath_errormask |= dd->ipath_maskederrs; ipath_write_kreg(dd, dd->ipath_kregs->kr_errormask, - dd->ipath_errormask); + dd->ipath_errormask); dd->ipath_maskederrs = 0; } From ralph.campbell at qlogic.com Wed Apr 2 15:49:16 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:16 -0700 Subject: [ofa-general] [PATCH 03/20] IB/ipath - add support for 7220 receive queue changes In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224916.28598.52413.stgit@eng-46.mv.qlogic.com> Newer HCAs have a HW option to write a sequence number to each receive queue entry and avoid a separate DMA of the tail register to memory. This patch adds support for these changes. 
Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_common.h | 31 ++++ drivers/infiniband/hw/ipath/ipath_driver.c | 194 ++++++++++++++----------- drivers/infiniband/hw/ipath/ipath_file_ops.c | 34 ++-- drivers/infiniband/hw/ipath/ipath_iba6110.c | 2 drivers/infiniband/hw/ipath/ipath_iba6120.c | 2 drivers/infiniband/hw/ipath/ipath_init_chip.c | 152 +++++++++++--------- drivers/infiniband/hw/ipath/ipath_intr.c | 46 +++--- drivers/infiniband/hw/ipath/ipath_kernel.h | 53 +++++-- drivers/infiniband/hw/ipath/ipath_registers.h | 2 drivers/infiniband/hw/ipath/ipath_stats.c | 14 +- 10 files changed, 305 insertions(+), 225 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h index 591901a..edd4183 100644 --- a/drivers/infiniband/hw/ipath/ipath_common.h +++ b/drivers/infiniband/hw/ipath/ipath_common.h @@ -198,7 +198,7 @@ typedef enum _ipath_ureg { #define IPATH_RUNTIME_FORCE_WC_ORDER 0x4 #define IPATH_RUNTIME_RCVHDR_COPY 0x8 #define IPATH_RUNTIME_MASTER 0x10 -/* 0x20 and 0x40 are no longer used, but are reserved for ABI compatibility */ +#define IPATH_RUNTIME_NODMA_RTAIL 0x80 #define IPATH_RUNTIME_FORCE_PIOAVAIL 0x400 #define IPATH_RUNTIME_PIO_REGSWAPPED 0x800 @@ -662,8 +662,12 @@ struct infinipath_counters { #define INFINIPATH_RHF_LENGTH_SHIFT 0 #define INFINIPATH_RHF_RCVTYPE_MASK 0x7 #define INFINIPATH_RHF_RCVTYPE_SHIFT 11 -#define INFINIPATH_RHF_EGRINDEX_MASK 0x7FF +#define INFINIPATH_RHF_EGRINDEX_MASK 0xFFF #define INFINIPATH_RHF_EGRINDEX_SHIFT 16 +#define INFINIPATH_RHF_SEQ_MASK 0xF +#define INFINIPATH_RHF_SEQ_SHIFT 0 +#define INFINIPATH_RHF_HDRQ_OFFSET_MASK 0x7FF +#define INFINIPATH_RHF_HDRQ_OFFSET_SHIFT 4 #define INFINIPATH_RHF_H_ICRCERR 0x80000000 #define INFINIPATH_RHF_H_VCRCERR 0x40000000 #define INFINIPATH_RHF_H_PARITYERR 0x20000000 @@ -673,6 +677,8 @@ struct infinipath_counters { #define INFINIPATH_RHF_H_TIDERR 0x02000000 #define INFINIPATH_RHF_H_MKERR 0x01000000 #define 
INFINIPATH_RHF_H_IBERR 0x00800000 +#define INFINIPATH_RHF_H_ERR_MASK 0xFF800000 +#define INFINIPATH_RHF_L_USE_EGR 0x80000000 #define INFINIPATH_RHF_L_SWA 0x00008000 #define INFINIPATH_RHF_L_SWB 0x00004000 @@ -696,6 +702,7 @@ struct infinipath_counters { /* SendPIO per-buffer control */ #define INFINIPATH_SP_TEST 0x40 #define INFINIPATH_SP_TESTEBP 0x20 +#define INFINIPATH_SP_TRIGGER_SHIFT 15 /* SendPIOAvail bits */ #define INFINIPATH_SENDPIOAVAIL_BUSY_SHIFT 1 @@ -762,6 +769,7 @@ struct ether_header { #define IPATH_MSN_MASK 0xFFFFFF #define IPATH_QPN_MASK 0xFFFFFF #define IPATH_MULTICAST_LID_BASE 0xC000 +#define IPATH_EAGER_TID_ID INFINIPATH_I_TID_MASK #define IPATH_MULTICAST_QPN 0xFFFFFF /* Receive Header Queue: receive type (from infinipath) */ @@ -781,7 +789,7 @@ struct ether_header { */ static inline __u32 ipath_hdrget_err_flags(const __le32 * rbuf) { - return __le32_to_cpu(rbuf[1]); + return __le32_to_cpu(rbuf[1]) & INFINIPATH_RHF_H_ERR_MASK; } static inline __u32 ipath_hdrget_rcv_type(const __le32 * rbuf) @@ -802,6 +810,23 @@ static inline __u32 ipath_hdrget_index(const __le32 * rbuf) & INFINIPATH_RHF_EGRINDEX_MASK; } +static inline __u32 ipath_hdrget_seq(const __le32 *rbuf) +{ + return (__le32_to_cpu(rbuf[1]) >> INFINIPATH_RHF_SEQ_SHIFT) + & INFINIPATH_RHF_SEQ_MASK; +} + +static inline __u32 ipath_hdrget_offset(const __le32 *rbuf) +{ + return (__le32_to_cpu(rbuf[1]) >> INFINIPATH_RHF_HDRQ_OFFSET_SHIFT) + & INFINIPATH_RHF_HDRQ_OFFSET_MASK; +} + +static inline __u32 ipath_hdrget_use_egr_buf(const __le32 *rbuf) +{ + return __le32_to_cpu(rbuf[0]) & INFINIPATH_RHF_L_USE_EGR; +} + static inline __u32 ipath_hdrget_ipath_ver(__le32 hdrword) { return (__le32_to_cpu(hdrword) >> INFINIPATH_I_VERS_SHIFT) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index f79d9cc..eef2599 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -41,7 +41,6 @@ #include "ipath_kernel.h" 
#include "ipath_verbs.h" -#include "ipath_common.h" static void ipath_update_pio_bufs(struct ipath_devdata *); @@ -720,6 +719,8 @@ static void __devexit cleanup_device(struct ipath_devdata *dd) tmpp = dd->ipath_pageshadow; dd->ipath_pageshadow = NULL; vfree(tmpp); + + dd->ipath_egrtidbase = NULL; } /* @@ -1078,18 +1079,17 @@ static void ipath_rcv_hdrerr(struct ipath_devdata *dd, u32 eflags, u32 l, u32 etail, - u64 *rc) + __le32 *rhf_addr, + struct ipath_message_header *hdr) { char emsg[128]; - struct ipath_message_header *hdr; get_rhf_errstring(eflags, emsg, sizeof emsg); - hdr = (struct ipath_message_header *)&rc[1]; ipath_cdbg(PKT, "RHFerrs %x hdrqtail=%x typ=%u " "tlen=%x opcode=%x egridx=%x: %s\n", eflags, l, - ipath_hdrget_rcv_type((__le32 *) rc), - ipath_hdrget_length_in_bytes((__le32 *) rc), + ipath_hdrget_rcv_type(rhf_addr), + ipath_hdrget_length_in_bytes(rhf_addr), be32_to_cpu(hdr->bth[0]) >> 24, etail, emsg); @@ -1114,55 +1114,52 @@ static void ipath_rcv_hdrerr(struct ipath_devdata *dd, */ void ipath_kreceive(struct ipath_portdata *pd) { - u64 *rc; struct ipath_devdata *dd = pd->port_dd; + __le32 *rhf_addr; void *ebuf; const u32 rsize = dd->ipath_rcvhdrentsize; /* words */ const u32 maxcnt = dd->ipath_rcvhdrcnt * rsize; /* words */ u32 etail = -1, l, hdrqtail; struct ipath_message_header *hdr; - u32 eflags, i, etype, tlen, pkttot = 0, updegr=0, reloop=0; + u32 eflags, i, etype, tlen, pkttot = 0, updegr = 0, reloop = 0; static u64 totcalls; /* stats, may eventually remove */ - - if (!dd->ipath_hdrqtailptr) { - ipath_dev_err(dd, - "hdrqtailptr not set, can't do receives\n"); - goto bail; - } + int last; l = pd->port_head; - hdrqtail = ipath_get_rcvhdrtail(pd); - if (l == hdrqtail) - goto bail; + rhf_addr = (__le32 *) pd->port_rcvhdrq + l + dd->ipath_rhf_offset; + if (dd->ipath_flags & IPATH_NODMA_RTAIL) { + u32 seq = ipath_hdrget_seq(rhf_addr); -reloop: - for (i = 0; l != hdrqtail; i++) { - u32 qp; - u8 *bthbytes; - - rc = (u64 *) (pd->port_rcvhdrq + (l << 
2)); - hdr = (struct ipath_message_header *)&rc[1]; - /* - * could make a network order version of IPATH_KD_QP, and - * do the obvious shift before masking to speed this up. - */ - qp = ntohl(hdr->bth[1]) & 0xffffff; - bthbytes = (u8 *) hdr->bth; + if (seq != pd->port_seq_cnt) + goto bail; + hdrqtail = 0; + } else { + hdrqtail = ipath_get_rcvhdrtail(pd); + if (l == hdrqtail) + goto bail; + smp_rmb(); + } - eflags = ipath_hdrget_err_flags((__le32 *) rc); - etype = ipath_hdrget_rcv_type((__le32 *) rc); +reloop: + for (last = 0, i = 1; !last; i++) { + hdr = dd->ipath_f_get_msgheader(dd, rhf_addr); + eflags = ipath_hdrget_err_flags(rhf_addr); + etype = ipath_hdrget_rcv_type(rhf_addr); /* total length */ - tlen = ipath_hdrget_length_in_bytes((__le32 *) rc); + tlen = ipath_hdrget_length_in_bytes(rhf_addr); ebuf = NULL; - if (etype != RCVHQ_RCV_TYPE_EXPECTED) { + if ((dd->ipath_flags & IPATH_NODMA_RTAIL) ? + ipath_hdrget_use_egr_buf(rhf_addr) : + (etype != RCVHQ_RCV_TYPE_EXPECTED)) { /* - * it turns out that the chips uses an eager buffer + * It turns out that the chip uses an eager buffer * for all non-expected packets, whether it "needs" * one or not. So always get the index, but don't * set ebuf (so we try to copy data) unless the * length requires it. */ - etail = ipath_hdrget_index((__le32 *) rc); + etail = ipath_hdrget_index(rhf_addr); + updegr = 1; if (tlen > sizeof(*hdr) || etype == RCVHQ_RCV_TYPE_NON_KD) ebuf = ipath_get_egrbuf(dd, etail); @@ -1173,75 +1170,91 @@ reloop: * packets; only ipathhdrerr should be set. 
*/ - if (etype != RCVHQ_RCV_TYPE_NON_KD && etype != - RCVHQ_RCV_TYPE_ERROR && ipath_hdrget_ipath_ver( - hdr->iph.ver_port_tid_offset) != - IPS_PROTO_VERSION) { + if (etype != RCVHQ_RCV_TYPE_NON_KD && + etype != RCVHQ_RCV_TYPE_ERROR && + ipath_hdrget_ipath_ver(hdr->iph.ver_port_tid_offset) != + IPS_PROTO_VERSION) ipath_cdbg(PKT, "Bad InfiniPath protocol version " "%x\n", etype); - } if (unlikely(eflags)) - ipath_rcv_hdrerr(dd, eflags, l, etail, rc); + ipath_rcv_hdrerr(dd, eflags, l, etail, rhf_addr, hdr); else if (etype == RCVHQ_RCV_TYPE_NON_KD) { - ipath_ib_rcv(dd->verbs_dev, rc + 1, ebuf, tlen); + ipath_ib_rcv(dd->verbs_dev, (u32 *)hdr, ebuf, tlen); if (dd->ipath_lli_counter) dd->ipath_lli_counter--; + } else if (etype == RCVHQ_RCV_TYPE_EAGER) { + u8 opcode = be32_to_cpu(hdr->bth[0]) >> 24; + u32 qp = be32_to_cpu(hdr->bth[1]) & 0xffffff; ipath_cdbg(PKT, "typ %x, opcode %x (eager, " "qp=%x), len %x; ignored\n", - etype, bthbytes[0], qp, tlen); + etype, opcode, qp, tlen); } - else if (etype == RCVHQ_RCV_TYPE_EAGER) - ipath_cdbg(PKT, "typ %x, opcode %x (eager, " - "qp=%x), len %x; ignored\n", - etype, bthbytes[0], qp, tlen); else if (etype == RCVHQ_RCV_TYPE_EXPECTED) ipath_dbg("Bug: Expected TID, opcode %x; ignored\n", - be32_to_cpu(hdr->bth[0]) & 0xff); + be32_to_cpu(hdr->bth[0]) >> 24); else { /* * error packet, type of error unknown. * Probably type 3, but we don't know, so don't * even try to print the opcode, etc. + * Usually caused by a "bad packet", that has no + * BTH, when the LRH says it should. */ - ipath_dbg("Error Pkt, but no eflags! egrbuf %x, " - "len %x\nhdrq@%lx;hdrq+%x rhf: %llx; " - "hdr %llx %llx %llx %llx %llx\n", - etail, tlen, (unsigned long) rc, l, - (unsigned long long) rc[0], - (unsigned long long) rc[1], - (unsigned long long) rc[2], - (unsigned long long) rc[3], - (unsigned long long) rc[4], - (unsigned long long) rc[5]); + ipath_cdbg(ERRPKT, "Error Pkt, but no eflags! 
egrbuf" + " %x, len %x hdrq+%x rhf: %Lx\n", + etail, tlen, l, + le64_to_cpu(*(__le64 *) rhf_addr)); + if (ipath_debug & __IPATH_ERRPKTDBG) { + u32 j, *d, dw = rsize-2; + if (rsize > (tlen>>2)) + dw = tlen>>2; + d = (u32 *)hdr; + printk(KERN_DEBUG "EPkt rcvhdr(%x dw):\n", + dw); + for (j = 0; j < dw; j++) + printk(KERN_DEBUG "%8x%s", d[j], + (j%8) == 7 ? "\n" : " "); + printk(KERN_DEBUG ".\n"); + } } l += rsize; if (l >= maxcnt) l = 0; - if (etype != RCVHQ_RCV_TYPE_EXPECTED) - updegr = 1; + rhf_addr = (__le32 *) pd->port_rcvhdrq + + l + dd->ipath_rhf_offset; + if (dd->ipath_flags & IPATH_NODMA_RTAIL) { + u32 seq = ipath_hdrget_seq(rhf_addr); + + if (++pd->port_seq_cnt > 13) + pd->port_seq_cnt = 1; + if (seq != pd->port_seq_cnt) + last = 1; + } else if (l == hdrqtail) + last = 1; /* * update head regs on last packet, and every 16 packets. * Reduce bus traffic, while still trying to prevent * rcvhdrq overflows, for when the queue is nearly full */ - if (l == hdrqtail || (i && !(i&0xf))) { - u64 lval; - if (l == hdrqtail) - /* request IBA6120 interrupt only on last */ - lval = dd->ipath_rhdrhead_intr_off | l; - else - lval = l; - ipath_write_ureg(dd, ur_rcvhdrhead, lval, 0); + if (last || !(i & 0xf)) { + u64 lval = l; + + /* request IBA6120 and 7220 interrupt only on last */ + if (last) + lval |= dd->ipath_rhdrhead_intr_off; + ipath_write_ureg(dd, ur_rcvhdrhead, lval, + pd->port_port); if (updegr) { ipath_write_ureg(dd, ur_rcvegrindexhead, - etail, 0); + etail, pd->port_port); updegr = 0; } } } - if (!dd->ipath_rhdrhead_intr_off && !reloop) { + if (!dd->ipath_rhdrhead_intr_off && !reloop && + !(dd->ipath_flags & IPATH_NODMA_RTAIL)) { /* IBA6110 workaround; we can have a race clearing chip * interrupt with another interrupt about to be delivered, * and can clear it before it is delivered on the GPIO @@ -1638,19 +1651,27 @@ int ipath_create_rcvhdrq(struct ipath_devdata *dd, ret = -ENOMEM; goto bail; } - pd->port_rcvhdrtail_kvaddr = dma_alloc_coherent( - &dd->pcidev->dev, 
PAGE_SIZE, &phys_hdrqtail, GFP_KERNEL); - if (!pd->port_rcvhdrtail_kvaddr) { - ipath_dev_err(dd, "attempt to allocate 1 page " - "for port %u rcvhdrqtailaddr failed\n", - pd->port_port); - ret = -ENOMEM; - dma_free_coherent(&dd->pcidev->dev, amt, - pd->port_rcvhdrq, pd->port_rcvhdrq_phys); - pd->port_rcvhdrq = NULL; - goto bail; + + if (!(dd->ipath_flags & IPATH_NODMA_RTAIL)) { + pd->port_rcvhdrtail_kvaddr = dma_alloc_coherent( + &dd->pcidev->dev, PAGE_SIZE, &phys_hdrqtail, + GFP_KERNEL); + if (!pd->port_rcvhdrtail_kvaddr) { + ipath_dev_err(dd, "attempt to allocate 1 page " + "for port %u rcvhdrqtailaddr " + "failed\n", pd->port_port); + ret = -ENOMEM; + dma_free_coherent(&dd->pcidev->dev, amt, + pd->port_rcvhdrq, + pd->port_rcvhdrq_phys); + pd->port_rcvhdrq = NULL; + goto bail; + } + pd->port_rcvhdrqtailaddr_phys = phys_hdrqtail; + ipath_cdbg(VERBOSE, "port %d hdrtailaddr, %llx " + "physical\n", pd->port_port, + (unsigned long long) phys_hdrqtail); } - pd->port_rcvhdrqtailaddr_phys = phys_hdrqtail; pd->port_rcvhdrq_size = amt; @@ -1660,10 +1681,6 @@ int ipath_create_rcvhdrq(struct ipath_devdata *dd, (unsigned long) pd->port_rcvhdrq_phys, (unsigned long) pd->port_rcvhdrq_size, pd->port_port); - - ipath_cdbg(VERBOSE, "port %d hdrtailaddr, %llx physical\n", - pd->port_port, - (unsigned long long) phys_hdrqtail); } else ipath_cdbg(VERBOSE, "reuse port %d rcvhdrq @%p %llx phys; " @@ -1687,7 +1704,6 @@ int ipath_create_rcvhdrq(struct ipath_devdata *dd, ipath_write_kreg_port(dd, dd->ipath_kregs->kr_rcvhdraddr, pd->port_port, pd->port_rcvhdrq_phys); - ret = 0; bail: return ret; } @@ -2222,7 +2238,7 @@ void ipath_free_pddata(struct ipath_devdata *dd, struct ipath_portdata *pd) ipath_cdbg(VERBOSE, "free closed port %d " "ipath_port0_skbinfo @ %p\n", pd->port_port, skbinfo); - for (e = 0; e < dd->ipath_rcvegrcnt; e++) + for (e = 0; e < dd->ipath_p0_rcvegrcnt; e++) if (skbinfo[e].skb) { pci_unmap_single(dd->pcidev, skbinfo[e].phys, dd->ipath_ibmaxlen, diff --git 
a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index 1b232b2..17d4e97 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -1930,22 +1930,25 @@ static int ipath_do_user_init(struct file *fp, pd->port_hdrqfull_poll = pd->port_hdrqfull; /* - * now enable the port; the tail registers will be written to memory - * by the chip as soon as it sees the write to - * dd->ipath_kregs->kr_rcvctrl. The update only happens on - * transition from 0 to 1, so clear it first, then set it as part of - * enabling the port. This will (very briefly) affect any other - * open ports, but it shouldn't be long enough to be an issue. - * We explictly set the in-memory copy to 0 beforehand, so we don't - * have to wait to be sure the DMA update has happened. + * Now enable the port for receive. + * For chips that are set to DMA the tail register to memory + * when they change (and when the update bit transitions from + * 0 to 1. So for those chips, we turn it off and then back on. + * This will (very briefly) affect any other open ports, but the + * duration is very short, and therefore isn't an issue. We + * explictly set the in-memory tail copy to 0 beforehand, so we + * don't have to wait to be sure the DMA update has happened + * (chip resets head/tail to 0 on transition to enable). 
*/ - if (pd->port_rcvhdrtail_kvaddr) - ipath_clear_rcvhdrtail(pd); set_bit(dd->ipath_r_portenable_shift + pd->port_port, &dd->ipath_rcvctrl); - ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, + if (!(dd->ipath_flags & IPATH_NODMA_RTAIL)) { + if (pd->port_rcvhdrtail_kvaddr) + ipath_clear_rcvhdrtail(pd); + ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl & ~(1ULL << dd->ipath_r_tailupd_shift)); + } ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl); /* Notify any waiting slaves */ @@ -1973,14 +1976,15 @@ static void unlock_expected_tids(struct ipath_portdata *pd) ipath_cdbg(VERBOSE, "Port %u unlocking any locked expTID pages\n", pd->port_port); for (i = port_tidbase; i < maxtid; i++) { - if (!dd->ipath_pageshadow[i]) + struct page *ps = dd->ipath_pageshadow[i]; + + if (!ps) continue; + dd->ipath_pageshadow[i] = NULL; pci_unmap_page(dd->pcidev, dd->ipath_physshadow[i], PAGE_SIZE, PCI_DMA_FROMDEVICE); - ipath_release_user_pages_on_close(&dd->ipath_pageshadow[i], - 1); - dd->ipath_pageshadow[i] = NULL; + ipath_release_user_pages_on_close(&ps, 1); cnt++; ipath_stats.sps_pageunlocks++; } diff --git a/drivers/infiniband/hw/ipath/ipath_iba6110.c b/drivers/infiniband/hw/ipath/ipath_iba6110.c index d241f1c..02831ad 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6110.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6110.c @@ -306,7 +306,9 @@ static const struct ipath_cregs ipath_ht_cregs = { /* kr_intstatus, kr_intclear, kr_intmask bits */ #define INFINIPATH_I_RCVURG_MASK ((1U<<9)-1) +#define INFINIPATH_I_RCVURG_SHIFT 0 #define INFINIPATH_I_RCVAVAIL_MASK ((1U<<9)-1) +#define INFINIPATH_I_RCVAVAIL_SHIFT 12 /* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */ #define INFINIPATH_HWE_HTCMEMPARITYERR_SHIFT 0 diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c index ce0f40f..907b61b 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c @@ 
-316,7 +316,9 @@ static const struct ipath_cregs ipath_pe_cregs = { /* kr_intstatus, kr_intclear, kr_intmask bits */ #define INFINIPATH_I_RCVURG_MASK ((1U<<5)-1) +#define INFINIPATH_I_RCVURG_SHIFT 0 #define INFINIPATH_I_RCVAVAIL_MASK ((1U<<5)-1) +#define INFINIPATH_I_RCVAVAIL_SHIFT 12 /* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */ #define INFINIPATH_HWE_PCIEMEMPARITYERR_MASK 0x000000000000003fULL diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 94f938f..720ff4d 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -219,14 +219,14 @@ static struct ipath_portdata *create_portdata0(struct ipath_devdata *dd) pd->port_cnt = 1; /* The port 0 pkey table is used by the layer interface. */ pd->port_pkeys[0] = IPATH_DEFAULT_P_KEY; + pd->port_seq_cnt = 1; } return pd; } -static int init_chip_first(struct ipath_devdata *dd, - struct ipath_portdata **pdp) +static int init_chip_first(struct ipath_devdata *dd) { - struct ipath_portdata *pd = NULL; + struct ipath_portdata *pd; int ret = 0; u64 val; @@ -242,12 +242,14 @@ static int init_chip_first(struct ipath_devdata *dd, else if (ipath_cfgports <= dd->ipath_portcnt) { dd->ipath_cfgports = ipath_cfgports; ipath_dbg("Configured to use %u ports out of %u in chip\n", - dd->ipath_cfgports, dd->ipath_portcnt); + dd->ipath_cfgports, ipath_read_kreg32(dd, + dd->ipath_kregs->kr_portcnt)); } else { dd->ipath_cfgports = dd->ipath_portcnt; ipath_dbg("Tried to configured to use %u ports; chip " "only supports %u\n", ipath_cfgports, - dd->ipath_portcnt); + ipath_read_kreg32(dd, + dd->ipath_kregs->kr_portcnt)); } /* * Allocate full portcnt array, rather than just cfgports, because @@ -324,36 +326,39 @@ static int init_chip_first(struct ipath_devdata *dd, mutex_init(&dd->ipath_eep_lock); done: - *pdp = pd; return ret; } /** * init_chip_reset - re-initialize after a reset, or enable * @dd: the infinipath device - * 
@pdp: output for port data * * sanity check at least some of the values after reset, and * ensure no receive or transmit (explictly, in case reset * failed */ -static int init_chip_reset(struct ipath_devdata *dd, - struct ipath_portdata **pdp) +static int init_chip_reset(struct ipath_devdata *dd) { u32 rtmp; + int i; + + /* + * ensure chip does no sends or receives, tail updates, or + * pioavail updates while we re-initialize + */ + dd->ipath_rcvctrl &= ~(1ULL << dd->ipath_r_tailupd_shift); + for (i = 0; i < dd->ipath_portcnt; i++) { + clear_bit(dd->ipath_r_portenable_shift + i, + &dd->ipath_rcvctrl); + clear_bit(dd->ipath_r_intravail_shift + i, + &dd->ipath_rcvctrl); + } + ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, + dd->ipath_rcvctrl); - *pdp = dd->ipath_pd[0]; - /* ensure chip does no sends or receives while we re-initialize */ - dd->ipath_control = dd->ipath_sendctrl = dd->ipath_rcvctrl = 0U; - ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl); ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); ipath_write_kreg(dd, dd->ipath_kregs->kr_control, dd->ipath_control); - rtmp = ipath_read_kreg32(dd, dd->ipath_kregs->kr_portcnt); - if (dd->ipath_portcnt != rtmp) - dev_info(&dd->pcidev->dev, "portcnt was %u before " - "reset, now %u, using original\n", - dd->ipath_portcnt, rtmp); rtmp = ipath_read_kreg32(dd, dd->ipath_kregs->kr_rcvtidcnt); if (rtmp != dd->ipath_rcvtidcnt) dev_info(&dd->pcidev->dev, "tidcnt was %u before " @@ -456,10 +461,10 @@ static void init_shadow_tids(struct ipath_devdata *dd) dd->ipath_physshadow = addrs; } -static void enable_chip(struct ipath_devdata *dd, - struct ipath_portdata *pd, int reinit) +static void enable_chip(struct ipath_devdata *dd, int reinit) { u32 val; + u64 rcvmask; unsigned long flags; int i; @@ -478,12 +483,15 @@ static void enable_chip(struct ipath_devdata *dd, spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); /* - * enable port 0 receive, and receive interrupt. 
other ports - * done as user opens and inits them. + * Enable kernel ports' receive and receive interrupt. + * Other ports done as user opens and inits them. */ - dd->ipath_rcvctrl = (1ULL << dd->ipath_r_tailupd_shift) | - (1ULL << dd->ipath_r_portenable_shift) | - (1ULL << dd->ipath_r_intravail_shift); + rcvmask = 1ULL; + dd->ipath_rcvctrl |= (rcvmask << dd->ipath_r_portenable_shift) | + (rcvmask << dd->ipath_r_intravail_shift); + if (!(dd->ipath_flags & IPATH_NODMA_RTAIL)) + dd->ipath_rcvctrl |= (1ULL << dd->ipath_r_tailupd_shift); + ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl); @@ -494,8 +502,8 @@ static void enable_chip(struct ipath_devdata *dd, dd->ipath_flags |= IPATH_INITTED; /* - * init our shadow copies of head from tail values, and write - * head values to match. + * Init our shadow copies of head from tail values, + * and write head values to match. */ val = ipath_read_ureg32(dd, ur_rcvegrindextail, 0); ipath_write_ureg(dd, ur_rcvegrindexhead, val, 0); @@ -529,8 +537,7 @@ static void enable_chip(struct ipath_devdata *dd, dd->ipath_flags |= IPATH_PRESENT; } -static int init_housekeeping(struct ipath_devdata *dd, - struct ipath_portdata **pdp, int reinit) +static int init_housekeeping(struct ipath_devdata *dd, int reinit) { char boardn[32]; int ret = 0; @@ -591,18 +598,9 @@ static int init_housekeeping(struct ipath_devdata *dd, ipath_write_kreg(dd, dd->ipath_kregs->kr_errorclear, INFINIPATH_E_RESET); - if (reinit) - ret = init_chip_reset(dd, pdp); - else - ret = init_chip_first(dd, pdp); - - if (ret) - goto done; - - ipath_cdbg(VERBOSE, "Revision %llx (PCI %x), %u ports, %u tids, " - "%u egrtids\n", (unsigned long long) dd->ipath_revision, - dd->ipath_pcirev, dd->ipath_portcnt, dd->ipath_rcvtidcnt, - dd->ipath_rcvegrcnt); + ipath_cdbg(VERBOSE, "Revision %llx (PCI %x)\n", + (unsigned long long) dd->ipath_revision, + dd->ipath_pcirev); if (((dd->ipath_revision >> INFINIPATH_R_SOFTWARE_SHIFT) & INFINIPATH_R_SOFTWARE_MASK) != 
IPATH_CHIP_SWVERSION) { @@ -641,6 +639,14 @@ static int init_housekeeping(struct ipath_devdata *dd, ipath_dbg("%s", dd->ipath_boardversion); + if (ret) + goto done; + + if (reinit) + ret = init_chip_reset(dd); + else + ret = init_chip_first(dd); + done: return ret; } @@ -666,11 +672,11 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) u32 val32, kpiobufs; u32 piobufs, uports; u64 val; - struct ipath_portdata *pd = NULL; /* keep gcc4 happy */ + struct ipath_portdata *pd; gfp_t gfp_flags = GFP_USER | __GFP_COMP; unsigned long flags; - ret = init_housekeeping(dd, &pd, reinit); + ret = init_housekeeping(dd, reinit); if (ret) goto done; @@ -690,7 +696,7 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) * we now use routines that backend onto __get_free_pages, the * rest would be wasted. */ - dd->ipath_rcvhdrcnt = dd->ipath_rcvegrcnt; + dd->ipath_rcvhdrcnt = max(dd->ipath_p0_rcvegrcnt, dd->ipath_rcvegrcnt); ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvhdrcnt, dd->ipath_rcvhdrcnt); @@ -721,8 +727,8 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) if (kpiobufs + (uports * IPATH_MIN_USER_PORT_BUFCNT) > piobufs) { int i = (int) piobufs - (int) (uports * IPATH_MIN_USER_PORT_BUFCNT); - if (i < 0) - i = 0; + if (i < 1) + i = 1; dev_info(&dd->pcidev->dev, "Allocating %d PIO bufs of " "%d for kernel leaves too few for %d user ports " "(%d each); using %u\n", kpiobufs, @@ -741,6 +747,7 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) ipath_dbg("allocating %u pbufs/port leaves %u unused, " "add to kernel\n", dd->ipath_pbufsport, val32); dd->ipath_lastport_piobuf -= val32; + kpiobufs += val32; ipath_dbg("%u pbufs/port leaves %u unused, add to kernel\n", dd->ipath_pbufsport, val32); } @@ -759,8 +766,10 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) */ ipath_cancel_sends(dd, 0); - /* early_init sets rcvhdrentsize and rcvhdrsize, so this must be - * done after early_init */ + /* + * Early_init sets rcvhdrentsize and rcvhdrsize, 
so this must be + * done after early_init. + */ dd->ipath_hdrqlast = dd->ipath_rcvhdrentsize * (dd->ipath_rcvhdrcnt - 1); ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvhdrentsize, @@ -835,58 +844,65 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) /* enable errors that are masked, at least this first time. */ ipath_write_kreg(dd, dd->ipath_kregs->kr_errormask, ~dd->ipath_maskederrs); - dd->ipath_errormask = ipath_read_kreg64(dd, - dd->ipath_kregs->kr_errormask); + dd->ipath_maskederrs = 0; /* don't re-enable ignored in timer */ + dd->ipath_errormask = + ipath_read_kreg64(dd, dd->ipath_kregs->kr_errormask); /* clear any interrupts up to this point (ints still not enabled) */ ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, -1LL); + dd->ipath_f_tidtemplate(dd); + /* * Set up the port 0 (kernel) rcvhdr q and egr TIDs. If doing * re-init, the simplest way to handle this is to free * existing, and re-allocate. * Need to re-create rest of port 0 portdata as well. */ + pd = dd->ipath_pd[0]; if (reinit) { - /* Alloc and init new ipath_portdata for port0, + struct ipath_portdata *npd; + + /* + * Alloc and init new ipath_portdata for port0, * Then free old pd. Could lead to fragmentation, but also * makes later support for hot-swap easier. 
*/ - struct ipath_portdata *npd; npd = create_portdata0(dd); if (npd) { ipath_free_pddata(dd, pd); - dd->ipath_pd[0] = pd = npd; + dd->ipath_pd[0] = npd; + pd = npd; } else { - ipath_dev_err(dd, "Unable to allocate portdata for" - " port 0, failing\n"); + ipath_dev_err(dd, "Unable to allocate portdata" + " for port 0, failing\n"); ret = -ENOMEM; goto done; } } - dd->ipath_f_tidtemplate(dd); ret = ipath_create_rcvhdrq(dd, pd); - if (!ret) { - dd->ipath_hdrqtailptr = - (volatile __le64 *)pd->port_rcvhdrtail_kvaddr; + if (!ret) ret = create_port0_egr(dd); - } - if (ret) - ipath_dev_err(dd, "failed to allocate port 0 (kernel) " + if (ret) { + ipath_dev_err(dd, "failed to allocate kernel port's " "rcvhdrq and/or egr bufs\n"); + goto done; + } else - enable_chip(dd, pd, reinit); + enable_chip(dd, reinit); - - if (!ret && !reinit) { - /* used when we close a port, for DMA already in flight at close */ + if (!reinit) { + /* + * Used when we close a port, for DMA already in flight + * at close. + */ dd->ipath_dummy_hdrq = dma_alloc_coherent( - &dd->pcidev->dev, pd->port_rcvhdrq_size, + &dd->pcidev->dev, dd->ipath_pd[0]->port_rcvhdrq_size, &dd->ipath_dummy_hdrq_phys, gfp_flags); if (!dd->ipath_dummy_hdrq) { dev_info(&dd->pcidev->dev, "Couldn't allocate 0x%lx bytes for dummy hdrq\n", - pd->port_rcvhdrq_size); + dd->ipath_pd[0]->port_rcvhdrq_size); /* fallback to just 0'ing */ dd->ipath_dummy_hdrq_phys = 0UL; } diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index 41329e7..826b96b 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -695,8 +695,7 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) struct ipath_portdata *pd = dd->ipath_pd[i]; if (i == 0) { hd = pd->port_head; - tl = (u32) le64_to_cpu( - *dd->ipath_hdrqtailptr); + tl = ipath_get_hdrqtail(pd); } else if (pd && pd->port_cnt && pd->port_rcvhdrtail_kvaddr) { /* @@ -732,8 +731,7 @@ static int 
handle_errors(struct ipath_devdata *dd, ipath_err_t errs) * vs user) */ ipath_stats.sps_etidfull++; - if (pd->port_head != - (u32) le64_to_cpu(*dd->ipath_hdrqtailptr)) + if (pd->port_head != ipath_get_hdrqtail(pd)) chkerrpkts = 1; } @@ -952,7 +950,7 @@ set: * process was waiting for a packet to arrive, and didn't want * to poll */ -static void handle_urcv(struct ipath_devdata *dd, u32 istat) +static void handle_urcv(struct ipath_devdata *dd, u64 istat) { u64 portr; int i; @@ -968,10 +966,10 @@ static void handle_urcv(struct ipath_devdata *dd, u32 istat) * and ipath_poll_next()... */ rmb(); - portr = ((istat >> INFINIPATH_I_RCVAVAIL_SHIFT) & - dd->ipath_i_rcvavail_mask) - | ((istat >> INFINIPATH_I_RCVURG_SHIFT) & - dd->ipath_i_rcvurg_mask); + portr = ((istat >> dd->ipath_i_rcvavail_shift) & + dd->ipath_i_rcvavail_mask) | + ((istat >> dd->ipath_i_rcvurg_shift) & + dd->ipath_i_rcvurg_mask); for (i = 1; i < dd->ipath_cfgports; i++) { struct ipath_portdata *pd = dd->ipath_pd[i]; @@ -991,7 +989,7 @@ static void handle_urcv(struct ipath_devdata *dd, u32 istat) } if (rcvdint) { /* only want to take one interrupt, so turn off the rcv - * interrupt for all the ports that we did the wakeup on + * interrupt for all the ports that we set the rcv_waiting * (but never for kernel port) */ ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, @@ -1006,8 +1004,7 @@ irqreturn_t ipath_intr(int irq, void *data) ipath_err_t estat = 0; irqreturn_t ret; static unsigned unexpected = 0; - static const u32 port0rbits = (1U<ipath_kregs->kr_intclear, istat); /* - * handle port0 receive before checking for pio buffers available, - * since receives can overflow; piobuf waiters can afford a few - * extra cycles, since they were waiting anyway, and user's waiting - * for receive are at the bottom. 
+ * Handle kernel receive queues before checking for pio buffers + * available since receives can overflow; piobuf waiters can afford + * a few extra cycles, since they were waiting anyway, and user's + * waiting for receive are at the bottom. */ - if (chk0rcv) { + kportrbits = (1ULL << dd->ipath_i_rcvavail_shift) | + (1ULL << dd->ipath_i_rcvurg_shift); + if (chk0rcv || (istat & kportrbits)) { + istat &= ~kportrbits; ipath_kreceive(dd->ipath_pd[0]); - istat &= ~port0rbits; } - if (istat & ((dd->ipath_i_rcvavail_mask << - INFINIPATH_I_RCVAVAIL_SHIFT) - | (dd->ipath_i_rcvurg_mask << - INFINIPATH_I_RCVURG_SHIFT))) + if (istat & ((dd->ipath_i_rcvavail_mask << dd->ipath_i_rcvavail_shift) | + (dd->ipath_i_rcvurg_mask << dd->ipath_i_rcvurg_shift))) handle_urcv(dd, istat); if (istat & INFINIPATH_I_SPIOBUFAVAIL) { diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 8018383..7fae888 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -175,6 +175,8 @@ struct ipath_portdata { u16 poll_type; /* port rcvhdrq head offset */ u32 port_head; + /* receive packet sequence counter */ + u32 port_seq_cnt; }; struct sk_buff; @@ -224,11 +226,6 @@ struct ipath_devdata { unsigned long ipath_physaddr; /* base of memory alloced for ipath_kregbase, for free */ u64 *ipath_kregalloc; - /* - * virtual address where port0 rcvhdrqtail updated for this unit. - * only written to by the chip, not the driver. 
- */ - volatile __le64 *ipath_hdrqtailptr; /* ipath_cfgports pointers */ struct ipath_portdata **ipath_pd; /* sk_buffs used by port 0 eager receive queue */ @@ -286,6 +283,7 @@ struct ipath_devdata { /* per chip actions needed for IB Link up/down changes */ int (*ipath_f_ib_updown)(struct ipath_devdata *, int, u64); + unsigned ipath_lastegr_idx; struct ipath_ibdev *verbs_dev; struct timer_list verbs_timer; /* total dwords sent (summed from counter) */ @@ -593,14 +591,6 @@ struct ipath_devdata { u8 ipath_minrev; /* board rev, from ipath_revision */ u8 ipath_boardrev; - - u8 ipath_r_portenable_shift; - u8 ipath_r_intravail_shift; - u8 ipath_r_tailupd_shift; - u8 ipath_r_portcfg_shift; - - /* unit # of this chip, if present */ - int ipath_unit; /* saved for restore after reset */ u8 ipath_pci_cacheline; /* LID mask control */ @@ -616,6 +606,14 @@ struct ipath_devdata { /* Rx Polarity inversion (compensate for ~tx on partner) */ u8 ipath_rx_pol_inv; + u8 ipath_r_portenable_shift; + u8 ipath_r_intravail_shift; + u8 ipath_r_tailupd_shift; + u8 ipath_r_portcfg_shift; + + /* unit # of this chip, if present */ + int ipath_unit; + /* local link integrity counter */ u32 ipath_lli_counter; /* local link integrity errors */ @@ -645,8 +643,8 @@ struct ipath_devdata { * Below should be computable from number of ports, * since they are never modified. 
*/ - u32 ipath_i_rcvavail_mask; - u32 ipath_i_rcvurg_mask; + u64 ipath_i_rcvavail_mask; + u64 ipath_i_rcvurg_mask; u16 ipath_i_rcvurg_shift; u16 ipath_i_rcvavail_shift; @@ -836,6 +834,8 @@ void ipath_hol_event(unsigned long); #define IPATH_LINKUNK 0x400 /* Write combining flush needed for PIO */ #define IPATH_PIO_FLUSH_WC 0x1000 + /* DMA Receive tail pointer */ +#define IPATH_NODMA_RTAIL 0x2000 /* no IB cable, or no device on IB cable */ #define IPATH_NOCABLE 0x4000 /* Supports port zero per packet receive interrupts via @@ -846,9 +846,9 @@ void ipath_hol_event(unsigned long); /* packet/word counters are 32 bit, else those 4 counters * are 64bit */ #define IPATH_32BITCOUNTERS 0x20000 - /* can miss port0 rx interrupts */ /* Interrupt register is 64 bits */ #define IPATH_INTREG_64 0x40000 + /* can miss port0 rx interrupts */ #define IPATH_DISABLED 0x80000 /* administratively disabled */ /* Use GPIO interrupts for new counters */ #define IPATH_GPIO_ERRINTRS 0x100000 @@ -1036,6 +1036,27 @@ static inline u32 ipath_get_rcvhdrtail(const struct ipath_portdata *pd) pd->port_rcvhdrtail_kvaddr)); } +static inline u32 ipath_get_hdrqtail(const struct ipath_portdata *pd) +{ + const struct ipath_devdata *dd = pd->port_dd; + u32 hdrqtail; + + if (dd->ipath_flags & IPATH_NODMA_RTAIL) { + __le32 *rhf_addr; + u32 seq; + + rhf_addr = (__le32 *) pd->port_rcvhdrq + + pd->port_head + dd->ipath_rhf_offset; + seq = ipath_hdrget_seq(rhf_addr); + hdrqtail = pd->port_head; + if (seq == pd->port_seq_cnt) + hdrqtail++; + } else + hdrqtail = ipath_get_rcvhdrtail(pd); + + return hdrqtail; +} + static inline u64 ipath_read_ireg(const struct ipath_devdata *dd, ipath_kreg r) { return (dd->ipath_flags & IPATH_INTREG_64) ? 
diff --git a/drivers/infiniband/hw/ipath/ipath_registers.h b/drivers/infiniband/hw/ipath/ipath_registers.h index f49f184..b7d87d3 100644 --- a/drivers/infiniband/hw/ipath/ipath_registers.h +++ b/drivers/infiniband/hw/ipath/ipath_registers.h @@ -86,8 +86,6 @@ #define INFINIPATH_R_QPMAP_ENABLE (1ULL << 38) /* kr_intstatus, kr_intclear, kr_intmask bits */ -#define INFINIPATH_I_RCVURG_SHIFT 0 -#define INFINIPATH_I_RCVAVAIL_SHIFT 12 #define INFINIPATH_I_ERROR 0x80000000 #define INFINIPATH_I_SPIOSENT 0x40000000 #define INFINIPATH_I_SPIOBUFAVAIL 0x20000000 diff --git a/drivers/infiniband/hw/ipath/ipath_stats.c b/drivers/infiniband/hw/ipath/ipath_stats.c index 57eb1d5..adff2f1 100644 --- a/drivers/infiniband/hw/ipath/ipath_stats.c +++ b/drivers/infiniband/hw/ipath/ipath_stats.c @@ -136,6 +136,7 @@ static void ipath_qcheck(struct ipath_devdata *dd) struct ipath_portdata *pd = dd->ipath_pd[0]; size_t blen = 0; char buf[128]; + u32 hdrqtail; *buf = 0; if (pd->port_hdrqfull != dd->ipath_p0_hdrqfull) { @@ -174,17 +175,18 @@ static void ipath_qcheck(struct ipath_devdata *dd) if (blen) ipath_dbg("%s\n", buf); - if (pd->port_head != (u32) - le64_to_cpu(*dd->ipath_hdrqtailptr)) { + hdrqtail = ipath_get_hdrqtail(pd); + if (pd->port_head != hdrqtail) { if (dd->ipath_lastport0rcv_cnt == ipath_stats.sps_port0pkts) { ipath_cdbg(PKT, "missing rcv interrupts? 
" - "port0 hd=%llx tl=%x; port0pkts %llx\n", - (unsigned long long) - le64_to_cpu(*dd->ipath_hdrqtailptr), - pd->port_head, + "port0 hd=%x tl=%x; port0pkts %llx; write" + " hd (w/intr)\n", + pd->port_head, hdrqtail, (unsigned long long) ipath_stats.sps_port0pkts); + ipath_write_ureg(dd, ur_rcvhdrhead, hdrqtail | + dd->ipath_rhdrhead_intr_off, pd->port_port); } dd->ipath_lastport0rcv_cnt = ipath_stats.sps_port0pkts; } From ralph.campbell at qlogic.com Wed Apr 2 15:49:22 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:22 -0700 Subject: [ofa-general] [PATCH 04/20] IB/ipath - Make link state transition code ignore (transient) link recovery In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224921.28598.25504.stgit@eng-46.mv.qlogic.com> From: Dave Olson The hardware-based recovery doesn't need any intervention, and in a few cases we can get a bit confused about state and skip steps such as turning off the link state LED when we consider recovery to be "down". So ignore this transition: either we recover in hardware, or we transition to down and handle it then.
Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_intr.c | 16 +++++++++++++++- 1 files changed, 15 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index 826b96b..3bad601 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -300,6 +300,18 @@ static void handle_e_ibstatuschanged(struct ipath_devdata *dd, ltstate = ipath_ib_linktrstate(dd, ibcs); /* linktrainingtate */ /* + * Since going into a recovery state causes the link state to go + * down and since recovery is transitory, it is better if we "miss" + * ever seeing the link training state go into recovery (i.e., + * ignore this transition for link state special handling purposes) + * without even updating ipath_lastibcstat. + */ + if ((ltstate == INFINIPATH_IBCS_LT_STATE_RECOVERRETRAIN) || + (ltstate == INFINIPATH_IBCS_LT_STATE_RECOVERWAITRMT) || + (ltstate == INFINIPATH_IBCS_LT_STATE_RECOVERIDLE)) + goto done; + + /* * if linkstate transitions into INIT from any of the various down * states, or if it transitions from any of the up (INIT or better) * states into any of the down states (except link recovery), then @@ -316,7 +328,7 @@ static void handle_e_ibstatuschanged(struct ipath_devdata *dd, } } else if ((lastlstate >= INFINIPATH_IBCS_L_STATE_INIT || (dd->ipath_flags & IPATH_IB_FORCE_NOTIFY)) && - ltstate <= INFINIPATH_IBCS_LT_STATE_CFGDEBOUNCE && + ltstate <= INFINIPATH_IBCS_LT_STATE_CFGWAITRMT && ltstate != INFINIPATH_IBCS_LT_STATE_LINKUP) { int handled; handled = dd->ipath_f_ib_updown(dd, 0, ibcs); @@ -460,6 +472,8 @@ static void handle_e_ibstatuschanged(struct ipath_devdata *dd, skip_ibchange: dd->ipath_lastibcstat = ibcs; +done: + return; } static void handle_supp_msgs(struct ipath_devdata *dd, From ralph.campbell at qlogic.com Wed Apr 2 15:49:27 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:27 -0700 Subject: [ofa-general] 
[PATCH 05/20] IB/ipath - Add support for IBTA 1.2 Heartbeat In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224927.28598.57384.stgit@eng-46.mv.qlogic.com> From: Dave Olson This patch adds code to enable/disable the IBTA 1.2 heartbeat for testing if the HCA supports it. Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_common.h | 2 ++ drivers/infiniband/hw/ipath/ipath_driver.c | 31 +++++++++++++++++++++++++--- 2 files changed, 30 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h index edd4183..3c05d4b 100644 --- a/drivers/infiniband/hw/ipath/ipath_common.h +++ b/drivers/infiniband/hw/ipath/ipath_common.h @@ -80,6 +80,8 @@ #define IPATH_IB_LINKDOWN_DISABLE 5 #define IPATH_IB_LINK_LOOPBACK 6 /* enable local loopback */ #define IPATH_IB_LINK_EXTERNAL 7 /* normal, disable local loopback */ +#define IPATH_IB_LINK_NO_HRTBT 8 /* disable Heartbeat, e.g. 
for loopback */ +#define IPATH_IB_LINK_HRTBT 9 /* enable heartbeat, normal, non-loopback */ /* * These 3 values (SDR and DDR may be ORed for auto-speed diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index eef2599..58aa255 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -1880,16 +1880,41 @@ int ipath_set_linkstate(struct ipath_devdata *dd, u8 newstate) dd->ipath_ibcctrl |= INFINIPATH_IBCC_LOOPBACK; ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcctrl, dd->ipath_ibcctrl); + + /* turn heartbeat off, as it causes loopback to fail */ + dd->ipath_f_set_ib_cfg(dd, IPATH_IB_CFG_HRTBT, + IPATH_IB_HRTBT_OFF); + /* don't wait */ ret = 0; - goto bail; // no state change to wait for + goto bail; case IPATH_IB_LINK_EXTERNAL: - dev_info(&dd->pcidev->dev, "Disabling IB local loopback (normal)\n"); + dev_info(&dd->pcidev->dev, + "Disabling IB local loopback (normal)\n"); + dd->ipath_f_set_ib_cfg(dd, IPATH_IB_CFG_HRTBT, + IPATH_IB_HRTBT_ON); dd->ipath_ibcctrl &= ~INFINIPATH_IBCC_LOOPBACK; ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcctrl, dd->ipath_ibcctrl); + /* don't wait */ ret = 0; - goto bail; // no state change to wait for + goto bail; + + /* + * Heartbeat can be explicitly enabled by the user via + * "hrtbt_enable" "file", and if disabled, trying to enable here + * will have no effect. Implicit changes (heartbeat off when + * loopback on, and vice versa) are included to ease testing. 
+ */ + case IPATH_IB_LINK_HRTBT: + ret = dd->ipath_f_set_ib_cfg(dd, IPATH_IB_CFG_HRTBT, + IPATH_IB_HRTBT_ON); + goto bail; + + case IPATH_IB_LINK_NO_HRTBT: + ret = dd->ipath_f_set_ib_cfg(dd, IPATH_IB_CFG_HRTBT, + IPATH_IB_HRTBT_OFF); + goto bail; default: ipath_dbg("Invalid linkstate 0x%x requested\n", newstate); From ralph.campbell at qlogic.com Wed Apr 2 15:49:32 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:32 -0700 Subject: [ofa-general] [PATCH 06/20] IB/ipath - set LID filtering for HCAs that support it. In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224932.28598.58514.stgit@eng-46.mv.qlogic.com> From: Dave Olson Whenever the LID is set, notify the HCA specific code so that the appropriate HW registers can be updated. Also log the info on the console at low priority. Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_driver.c | 9 +++++++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 58aa255..53f8ae4 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -2006,11 +2006,16 @@ bail: return ret; } -int ipath_set_lid(struct ipath_devdata *dd, u32 arg, u8 lmc) +int ipath_set_lid(struct ipath_devdata *dd, u32 lid, u8 lmc) { - dd->ipath_lid = arg; + dd->ipath_lid = lid; dd->ipath_lmc = lmc; + dd->ipath_f_set_ib_cfg(dd, IPATH_IB_CFG_LIDLMC, lid | + (~((1U << lmc) - 1)) << 16); + + dev_info(&dd->pcidev->dev, "We got a lid: 0x%x\n", lid); + return 0; } From ralph.campbell at qlogic.com Wed Apr 2 15:49:37 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:37 -0700 Subject: [ofa-general] [PATCH 07/20] IB/ipath - Enable reduced PIO updated for HCAs that support it. 
In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224937.28598.35413.stgit@eng-46.mv.qlogic.com> From: Dave Olson Newer HCAs have a threshold counter to reduce the number of DMAs the chip makes to update the PIO buffer availability status bits. This patch enables the feature. Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_file_ops.c | 23 +++++++++++++++++++++++ drivers/infiniband/hw/ipath/ipath_init_chip.c | 22 +++++++++++++++++++++- drivers/infiniband/hw/ipath/ipath_kernel.h | 1 + 3 files changed, 45 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index 17d4e97..eab69df 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -184,6 +184,29 @@ static int ipath_get_base_info(struct file *fp, kinfo->spi_piobufbase = (u64) pd->port_piobufs + dd->ipath_palign * kinfo->spi_piocnt * slave; } + + /* + * Set the PIO avail update threshold to no larger + * than the number of buffers per process. Note that + * we decrease it here, but won't ever increase it. 
+ */ + if (dd->ipath_pioupd_thresh && + kinfo->spi_piocnt < dd->ipath_pioupd_thresh) { + unsigned long flags; + + dd->ipath_pioupd_thresh = kinfo->spi_piocnt; + ipath_dbg("Decreased pio update threshold to %u\n", + dd->ipath_pioupd_thresh); + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl &= ~(INFINIPATH_S_UPDTHRESH_MASK + << INFINIPATH_S_UPDTHRESH_SHIFT); + dd->ipath_sendctrl |= dd->ipath_pioupd_thresh + << INFINIPATH_S_UPDTHRESH_SHIFT; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, + dd->ipath_sendctrl); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + } + if (shared) { kinfo->spi_port_uregbase = (u64) dd->ipath_uregbase + dd->ipath_ureg_align * pd->port_port; diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 720ff4d..1adafa9 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -341,6 +341,7 @@ static int init_chip_reset(struct ipath_devdata *dd) { u32 rtmp; int i; + unsigned long flags; /* * ensure chip does no sends or receives, tail updates, or @@ -356,8 +357,13 @@ static int init_chip_reset(struct ipath_devdata *dd) ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl); + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl = 0U; /* no sdma, etc */ ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); - ipath_write_kreg(dd, dd->ipath_kregs->kr_control, dd->ipath_control); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + + ipath_write_kreg(dd, dd->ipath_kregs->kr_control, 0ULL); rtmp = ipath_read_kreg32(dd, dd->ipath_kregs->kr_rcvtidcnt); if (rtmp != dd->ipath_rcvtidcnt) @@ -478,6 +484,14 @@ static void enable_chip(struct ipath_devdata *dd, int reinit) /* Enable PIO send, and update of PIOavail regs to memory. 
*/ dd->ipath_sendctrl = INFINIPATH_S_PIOENABLE | INFINIPATH_S_PIOBUFAVAILUPD; + + /* + * Set the PIO avail update threshold to host memory + * on chips that support it. + */ + if (dd->ipath_pioupd_thresh) + dd->ipath_sendctrl |= dd->ipath_pioupd_thresh + << INFINIPATH_S_UPDTHRESH_SHIFT; ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); @@ -757,6 +771,12 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) ipath_cdbg(VERBOSE, "%d PIO bufs for kernel out of %d total %u " "each for %u user ports\n", kpiobufs, piobufs, dd->ipath_pbufsport, uports); + if (dd->ipath_pioupd_thresh) { + if (dd->ipath_pbufsport < dd->ipath_pioupd_thresh) + dd->ipath_pioupd_thresh = dd->ipath_pbufsport; + if (kpiobufs < dd->ipath_pioupd_thresh) + dd->ipath_pioupd_thresh = kpiobufs; + } dd->ipath_f_early_init(dd); /* diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 7fae888..e96eec2 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -349,6 +349,7 @@ struct ipath_devdata { u32 ipath_lastrpkts; /* pio bufs allocated per port */ u32 ipath_pbufsport; + u32 ipath_pioupd_thresh; /* update threshold, some chips */ /* * number of ports configured as max; zero is set to number chip * supports, less gives more pio bufs/port, etc. From ralph.campbell at qlogic.com Wed Apr 2 15:49:42 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:42 -0700 Subject: [ofa-general] [PATCH 08/20] IB/ipath - fix check for no interrupts to reliably fallback to INTx In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224942.28598.29782.stgit@eng-46.mv.qlogic.com> From: Dave Olson Newer HCAs support MSI interrupts and also INTx interrupts. 
Fix the code so that INTx can be reliably enabled if MSI interrupts are not working. Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/ipath_driver.c | 23 +++------------- drivers/infiniband/hw/ipath/ipath_init_chip.c | 36 +++++++++++++++++++++++++ drivers/infiniband/hw/ipath/ipath_kernel.h | 5 +-- 3 files changed, 42 insertions(+), 22 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 53f8ae4..b4a69ef 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -138,19 +138,6 @@ static struct pci_driver ipath_driver = { }, }; -static void ipath_check_status(struct work_struct *work) -{ - struct ipath_devdata *dd = container_of(work, struct ipath_devdata, - status_work.work); - - /* - * If we don't have any interrupts, let the user know and - * don't bother checking again. - */ - if (dd->ipath_int_counter == 0) - dev_err(&dd->pcidev->dev, "No interrupts detected.\n"); -} - static inline void read_bars(struct ipath_devdata *dd, struct pci_dev *dev, u32 *bar0, u32 *bar1) { @@ -218,8 +205,6 @@ static struct ipath_devdata *ipath_alloc_devdata(struct pci_dev *pdev) dd->pcidev = pdev; pci_set_drvdata(pdev, dd); - INIT_DELAYED_WORK(&dd->status_work, ipath_check_status); - list_add(&dd->ipath_list, &ipath_dev_list); bail_unlock: @@ -620,9 +605,6 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, ipath_diag_add(dd); ipath_register_ib_device(dd); - /* Check that card status in STATUS_TIMEOUT seconds. 
*/ - schedule_delayed_work(&dd->status_work, HZ * STATUS_TIMEOUT); - goto bail; bail_irqsetup: @@ -753,7 +735,6 @@ static void __devexit ipath_remove_one(struct pci_dev *pdev) */ ipath_shutdown_device(dd); - cancel_delayed_work(&dd->status_work); flush_scheduled_work(); if (dd->verbs_dev) @@ -2195,6 +2176,10 @@ void ipath_shutdown_device(struct ipath_devdata *dd) del_timer_sync(&dd->ipath_stats_timer); dd->ipath_stats_timer_active = 0; } + if (dd->ipath_intrchk_timer.data) { + del_timer_sync(&dd->ipath_intrchk_timer); + dd->ipath_intrchk_timer.data = 0; + } /* * clear all interrupts and errors, so that the next time the driver diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 1adafa9..0db19c1 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -665,6 +665,28 @@ done: return ret; } +static void verify_interrupt(unsigned long opaque) +{ + struct ipath_devdata *dd = (struct ipath_devdata *) opaque; + + if (!dd) + return; /* being torn down */ + + /* + * If we don't have any interrupts, let the user know and + * don't bother checking again. 
+ */ + if (dd->ipath_int_counter == 0) { + if (!dd->ipath_f_intr_fallback(dd)) + dev_err(&dd->pcidev->dev, "No interrupts detected, " + "not usable.\n"); + else /* re-arm the timer to see if fallback works */ + mod_timer(&dd->ipath_intrchk_timer, jiffies + HZ/2); + } else + ipath_cdbg(VERBOSE, "%u interrupts at timer check\n", + dd->ipath_int_counter); +} + /** * ipath_init_chip - do the actual initialization sequence on the chip * @dd: the infinipath device @@ -968,6 +990,20 @@ done: 0ULL); /* chip is usable; mark it as initialized */ *dd->ipath_statusp |= IPATH_STATUS_INITTED; + + /* + * setup to verify we get an interrupt, and fallback + * to an alternate if necessary and possible + */ + if (!reinit) { + init_timer(&dd->ipath_intrchk_timer); + dd->ipath_intrchk_timer.function = + verify_interrupt; + dd->ipath_intrchk_timer.data = + (unsigned long) dd; + } + dd->ipath_intrchk_timer.expires = jiffies + HZ/2; + add_timer(&dd->ipath_intrchk_timer); } else ipath_dev_err(dd, "No interrupts enabled, couldn't " "setup interrupt address\n"); diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index e96eec2..90bbbc7 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -426,6 +426,8 @@ struct ipath_devdata { struct class_device *diag_class_dev; /* timer used to prevent stats overflow, error throttling, etc. */ struct timer_list ipath_stats_timer; + /* timer to verify interrupts work, and fallback if possible */ + struct timer_list ipath_intrchk_timer; void *ipath_dummy_hdrq; /* used after port close */ dma_addr_t ipath_dummy_hdrq_phys; @@ -629,9 +631,6 @@ struct ipath_devdata { u32 ipath_overrun_thresh_errs; u32 ipath_lli_errs; - /* status check work */ - struct delayed_work status_work; - /* * Not all devices managed by a driver instance are the same * type, so these fields must be per-device. 
From ralph.campbell at qlogic.com Wed Apr 2 15:49:47 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:47 -0700 Subject: [ofa-general] [PATCH 09/20] IB/ipath - fix up error handling In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224947.28598.52232.stgit@eng-46.mv.qlogic.com> This patch makes chip reset more robust and reduces lock contention between user and kernel TID register updates. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_iba6120.c | 79 ++++++++++++++++++++----- drivers/infiniband/hw/ipath/ipath_init_chip.c | 2 - drivers/infiniband/hw/ipath/ipath_kernel.h | 2 - 3 files changed, 66 insertions(+), 17 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c index 907b61b..c8d8f1a 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c @@ -558,12 +558,40 @@ static void ipath_pe_handle_hwerrors(struct ipath_devdata *dd, char *msg, dd->ipath_hwerrmask); } - if (*msg) + if (hwerrs) { + /* + * if any set that we aren't ignoring; only + * make the complaint once, in case it's stuck + * or recurring, and we get here multiple + * times. 
+ */ ipath_dev_err(dd, "%s hardware error\n", msg); - if (isfatal && !ipath_diag_inuse && dd->ipath_freezemsg) { + if (dd->ipath_flags & IPATH_INITTED) { + ipath_set_linkstate(dd, IPATH_IB_LINKDOWN); + ipath_setup_pe_setextled(dd, + INFINIPATH_IBCS_L_STATE_DOWN, + INFINIPATH_IBCS_LT_STATE_DISABLED); + ipath_dev_err(dd, "Fatal Hardware Error (freeze " + "mode), no longer usable, SN %.16s\n", + dd->ipath_serial); + isfatal = 1; + } + *dd->ipath_statusp &= ~IPATH_STATUS_IB_READY; + /* mark as having had error */ + *dd->ipath_statusp |= IPATH_STATUS_HWERROR; + /* + * mark as not usable, at a minimum until driver + * is reloaded, probably until reboot, since no + * other reset is possible. + */ + dd->ipath_flags &= ~IPATH_INITTED; + } else + *msg = 0; /* recovered from all of them */ + + if (isfatal && !ipath_diag_inuse && dd->ipath_freezemsg && msg) { /* - * for /sys status file ; if no trailing } is copied, we'll - * know it was truncated. + * for /sys status file ; if no trailing brace is copied, + * we'll know it was truncated. */ snprintf(dd->ipath_freezemsg, dd->ipath_freezelen, "{%s}", msg); @@ -1127,10 +1155,7 @@ static void ipath_init_pe_variables(struct ipath_devdata *dd) INFINIPATH_HWE_RXEMEMPARITYERR_MASK << INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT; - dd->ipath_eep_st_masks[2].errs_to_log = - INFINIPATH_E_INVALIDADDR | INFINIPATH_E_RESET; - - + dd->ipath_eep_st_masks[2].errs_to_log = INFINIPATH_E_RESET; dd->delay_mult = 2; /* SDR, 4X, can't change */ } @@ -1204,6 +1229,9 @@ static int ipath_setup_pe_reset(struct ipath_devdata *dd) u64 val; int i; int ret; + u16 cmdval; + + pci_read_config_word(dd->pcidev, PCI_COMMAND, &cmdval); /* Use ERROR so it shows up in logs, etc. 
*/ ipath_dev_err(dd, "Resetting InfiniPath unit %u\n", dd->ipath_unit); @@ -1231,10 +1259,14 @@ static int ipath_setup_pe_reset(struct ipath_devdata *dd) ipath_dev_err(dd, "rewrite of BAR1 failed: %d\n", r); /* now re-enable memory access */ + pci_write_config_word(dd->pcidev, PCI_COMMAND, cmdval); if ((r = pci_enable_device(dd->pcidev))) ipath_dev_err(dd, "pci_enable_device failed after " "reset: %d\n", r); - /* whether it worked or not, mark as present, again */ + /* + * whether it fully enabled or not, mark as present, + * again (but not INITTED) + */ dd->ipath_flags |= IPATH_PRESENT; val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_revision); if (val == dd->ipath_revision) { @@ -1273,6 +1305,11 @@ static void ipath_pe_put_tid(struct ipath_devdata *dd, u64 __iomem *tidptr, { u32 __iomem *tidp32 = (u32 __iomem *)tidptr; unsigned long flags = 0; /* keep gcc quiet */ + int tidx; + spinlock_t *tidlockp; + + if (!dd->ipath_kregbase) + return; if (pa != dd->ipath_tidinvalid) { if (pa & ((1U << 11) - 1)) { @@ -1302,14 +1339,22 @@ static void ipath_pe_put_tid(struct ipath_devdata *dd, u64 __iomem *tidptr, * call can be done from interrupt level for the port 0 eager TIDs, * so we have to use irqsave locks. */ - spin_lock_irqsave(&dd->ipath_tid_lock, flags); + /* + * Assumes tidptr always > ipath_egrtidbase + * if type == RCVHQ_RCV_TYPE_EAGER. + */ + tidx = tidptr - dd->ipath_egrtidbase; + + tidlockp = (type == RCVHQ_RCV_TYPE_EAGER && tidx < dd->ipath_rcvegrcnt) + ? 
&dd->ipath_kernel_tid_lock : &dd->ipath_user_tid_lock; + spin_lock_irqsave(tidlockp, flags); ipath_write_kreg(dd, dd->ipath_kregs->kr_scratch, 0xfeeddeaf); - if (dd->ipath_kregbase) - writel(pa, tidp32); + writel(pa, tidp32); ipath_write_kreg(dd, dd->ipath_kregs->kr_scratch, 0xdeadbeef); mmiowb(); - spin_unlock_irqrestore(&dd->ipath_tid_lock, flags); + spin_unlock_irqrestore(tidlockp, flags); } + /** * ipath_pe_put_tid_2 - write a TID in chip, Revision 2 or higher * @dd: the infinipath device @@ -1325,6 +1370,10 @@ static void ipath_pe_put_tid_2(struct ipath_devdata *dd, u64 __iomem *tidptr, u32 type, unsigned long pa) { u32 __iomem *tidp32 = (u32 __iomem *)tidptr; + u32 tidx; + + if (!dd->ipath_kregbase) + return; if (pa != dd->ipath_tidinvalid) { if (pa & ((1U << 11) - 1)) { @@ -1344,8 +1393,8 @@ static void ipath_pe_put_tid_2(struct ipath_devdata *dd, u64 __iomem *tidptr, else /* for now, always full 4KB page */ pa |= 2 << 29; } - if (dd->ipath_kregbase) - writel(pa, tidp32); + tidx = tidptr - dd->ipath_egrtidbase; + writel(pa, tidp32); mmiowb(); } diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 0db19c1..8d8e572 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -319,7 +319,7 @@ static int init_chip_first(struct ipath_devdata *dd) else ipath_dbg("%u 2k piobufs @ %p\n", dd->ipath_piobcnt2k, dd->ipath_pio2kbase); - spin_lock_init(&dd->ipath_tid_lock); + spin_lock_init(&dd->ipath_user_tid_lock); spin_lock_init(&dd->ipath_sendctrl_lock); spin_lock_init(&dd->ipath_gpio_lock); spin_lock_init(&dd->ipath_eep_st_lock); diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 90bbbc7..0504937 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -407,7 +407,7 @@ struct ipath_devdata { u64 __iomem *ipath_egrtidbase; /* lock to workaround chip 
bug 9437 and others */ spinlock_t ipath_kernel_tid_lock; - spinlock_t ipath_tid_lock; + spinlock_t ipath_user_tid_lock; spinlock_t ipath_sendctrl_lock; /* From ralph.campbell at qlogic.com Wed Apr 2 15:49:52 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:52 -0700 Subject: [ofa-general] [PATCH 10/20] IB/ipath - Header file changes to support IBA7220 In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224952.28598.48402.stgit@eng-46.mv.qlogic.com> This is part of a patch series to add support for a new HCA. This patch adds new fields to the header files. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_common.h | 3 drivers/infiniband/hw/ipath/ipath_kernel.h | 165 ++++++++++++++++++++++++- drivers/infiniband/hw/ipath/ipath_registers.h | 138 +++++++++++++++------ drivers/infiniband/hw/ipath/ipath_verbs.h | 32 ++++- 4 files changed, 284 insertions(+), 54 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h index 3c05d4b..02fd310 100644 --- a/drivers/infiniband/hw/ipath/ipath_common.h +++ b/drivers/infiniband/hw/ipath/ipath_common.h @@ -201,6 +201,7 @@ typedef enum _ipath_ureg { #define IPATH_RUNTIME_RCVHDR_COPY 0x8 #define IPATH_RUNTIME_MASTER 0x10 #define IPATH_RUNTIME_NODMA_RTAIL 0x80 +#define IPATH_RUNTIME_SDMA 0x200 #define IPATH_RUNTIME_FORCE_PIOAVAIL 0x400 #define IPATH_RUNTIME_PIO_REGSWAPPED 0x800 @@ -539,7 +540,7 @@ struct ipath_diag_pkt { /* The second diag_pkt struct is the expanded version that allows * more control over the packet, specifically, by allowing a custom - * pbc (+ extra) qword, so that special modes and deliberate + * pbc (+ static rate) qword, so that special modes and deliberate * changes to CRCs can be used. The elements were also re-ordered * for better alignment and to avoid padding issues. 
*/ diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 0504937..8cdeab8 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -42,6 +42,8 @@ #include #include #include +#include +#include #include #include @@ -180,6 +182,8 @@ struct ipath_portdata { }; struct sk_buff; +struct ipath_sge_state; +struct ipath_verbs_txreq; /* * control information for layered drivers @@ -193,6 +197,37 @@ struct ipath_skbinfo { dma_addr_t phys; }; +struct ipath_sdma_txreq { + int flags; + int sg_count; + union { + struct scatterlist *sg; + void *map_addr; + }; + void (*callback)(void *, int); + void *callback_cookie; + int callback_status; + u16 start_idx; /* sdma private */ + u16 next_descq_idx; /* sdma private */ + struct list_head list; /* sdma private */ +}; + +struct ipath_sdma_desc { + __le64 qw[2]; +}; + +#define IPATH_SDMA_TXREQ_F_USELARGEBUF 0x1 +#define IPATH_SDMA_TXREQ_F_HEADTOHOST 0x2 +#define IPATH_SDMA_TXREQ_F_INTREQ 0x4 +#define IPATH_SDMA_TXREQ_F_FREEBUF 0x8 +#define IPATH_SDMA_TXREQ_F_FREEDESC 0x10 +#define IPATH_SDMA_TXREQ_F_VL15 0x20 + +#define IPATH_SDMA_TXREQ_S_OK 0 +#define IPATH_SDMA_TXREQ_S_SENDERROR 1 +#define IPATH_SDMA_TXREQ_S_ABORTED 2 +#define IPATH_SDMA_TXREQ_S_SHUTDOWN 3 + /* max dwords in small buffer packet */ #define IPATH_SMALLBUF_DWORDS (dd->ipath_piosize2k >> 2) @@ -385,6 +420,15 @@ struct ipath_devdata { u32 ipath_pcibar0; /* so we can rewrite it after a chip reset */ u32 ipath_pcibar1; + u32 ipath_x1_fix_tries; + u32 ipath_autoneg_tries; + u32 serdes_first_init_done; + + struct ipath_relock { + atomic_t ipath_relock_timer_active; + struct timer_list ipath_relock_timer; + unsigned int ipath_relock_interval; /* in jiffies */ + } ipath_relock_singleton; /* interrupt number */ int ipath_irq; @@ -431,8 +475,38 @@ struct ipath_devdata { void *ipath_dummy_hdrq; /* used after port close */ dma_addr_t ipath_dummy_hdrq_phys; + /* SendDMA related 
entries */ + spinlock_t ipath_sdma_lock; + u64 ipath_sdma_status; + unsigned long ipath_sdma_abort_jiffies; + unsigned long ipath_sdma_abort_intr_timeout; + unsigned long ipath_sdma_buf_jiffies; + struct ipath_sdma_desc *ipath_sdma_descq; + u64 ipath_sdma_descq_added; + u64 ipath_sdma_descq_removed; + int ipath_sdma_desc_nreserved; + u16 ipath_sdma_descq_cnt; + u16 ipath_sdma_descq_tail; + u16 ipath_sdma_descq_head; + u16 ipath_sdma_next_intr; + u16 ipath_sdma_reset_wait; + u8 ipath_sdma_generation; + struct tasklet_struct ipath_sdma_abort_task; + struct tasklet_struct ipath_sdma_notify_task; + struct list_head ipath_sdma_activelist; + struct list_head ipath_sdma_notifylist; + atomic_t ipath_sdma_vl15_count; + struct timer_list ipath_sdma_vl15_timer; + + dma_addr_t ipath_sdma_descq_phys; + volatile __le64 *ipath_sdma_head_dma; + dma_addr_t ipath_sdma_head_phys; + unsigned long ipath_ureg_align; /* user register alignment */ + struct delayed_work ipath_autoneg_work; + wait_queue_head_t ipath_autoneg_wait; + /* HoL blocking / user app forward-progress state */ unsigned ipath_hol_state; unsigned ipath_hol_next; @@ -485,6 +559,8 @@ struct ipath_devdata { u64 ipath_intconfig; /* kr_sendpiobufbase value */ u64 ipath_piobufbase; + /* kr_ibcddrctrl shadow */ + u64 ipath_ibcddrctrl; /* these are the "32 bit" regs */ @@ -501,7 +577,10 @@ struct ipath_devdata { unsigned long ipath_rcvctrl; /* shadow kr_sendctrl */ unsigned long ipath_sendctrl; - unsigned long ipath_lastcancel; /* to not count armlaunch after cancel */ + /* to not count armlaunch after cancel */ + unsigned long ipath_lastcancel; + /* count cases where special trigger was needed (double write) */ + unsigned long ipath_spectriggerhit; /* value we put in kr_rcvhdrcnt */ u32 ipath_rcvhdrcnt; @@ -523,6 +602,7 @@ struct ipath_devdata { u32 ipath_piobcnt4k; /* size in bytes of "4KB" PIO buffers */ u32 ipath_piosize4k; + u32 ipath_pioreserved; /* reserved special-inkernel; */ /* kr_rcvegrbase value */ u32 
ipath_rcvegrbase; /* kr_rcvegrcnt value */ @@ -586,7 +666,7 @@ struct ipath_devdata { */ u8 ipath_serial[16]; /* human readable board version */ - u8 ipath_boardversion[80]; + u8 ipath_boardversion[96]; u8 ipath_lbus_info[32]; /* human readable localbus info */ /* chip major rev, from ipath_revision */ u8 ipath_majrev; @@ -715,6 +795,13 @@ struct ipath_devdata { /* interrupt mitigation reload register info */ u16 ipath_jint_idle_ticks; /* idle clock ticks */ u16 ipath_jint_max_packets; /* max packets across all ports */ + + /* + * lock for access to SerDes, and flags to sequence preset + * versus steady-state. 7220-only at the moment. + */ + spinlock_t ipath_sdepb_lock; + u8 ipath_presets_needed; /* Set if presets to be restored next DOWN */ }; /* ipath_hol_state values (stopping/starting user proc, send flushing) */ @@ -724,11 +811,35 @@ struct ipath_devdata { #define IPATH_HOL_DOWNSTOP 0 #define IPATH_HOL_DOWNCONT 1 +/* bit positions for sdma_status */ +#define IPATH_SDMA_ABORTING 0 +#define IPATH_SDMA_DISARMED 1 +#define IPATH_SDMA_DISABLED 2 +#define IPATH_SDMA_LAYERBUF 3 +#define IPATH_SDMA_RUNNING 62 +#define IPATH_SDMA_SHUTDOWN 63 + +/* bit combinations that correspond to abort states */ +#define IPATH_SDMA_ABORT_NONE 0 +#define IPATH_SDMA_ABORT_ABORTING (1UL << IPATH_SDMA_ABORTING) +#define IPATH_SDMA_ABORT_DISARMED ((1UL << IPATH_SDMA_ABORTING) | \ + (1UL << IPATH_SDMA_DISARMED)) +#define IPATH_SDMA_ABORT_DISABLED ((1UL << IPATH_SDMA_ABORTING) | \ + (1UL << IPATH_SDMA_DISABLED)) +#define IPATH_SDMA_ABORT_ABORTED ((1UL << IPATH_SDMA_ABORTING) | \ + (1UL << IPATH_SDMA_DISARMED) | (1UL << IPATH_SDMA_DISABLED)) +#define IPATH_SDMA_ABORT_MASK ((1UL<private_data)->pd @@ -804,6 +919,8 @@ void ipath_hol_event(unsigned long); ((struct ipath_filedata *)(fp)->private_data)->subport #define tidcursor_fp(fp) \ ((struct ipath_filedata *)(fp)->private_data)->tidcursor +#define user_sdma_queue_fp(fp) \ + ((struct ipath_filedata *)(fp)->private_data)->pq /* * values for 
ipath_flags @@ -853,9 +970,16 @@ void ipath_hol_event(unsigned long); /* Use GPIO interrupts for new counters */ #define IPATH_GPIO_ERRINTRS 0x100000 #define IPATH_SWAP_PIOBUFS 0x200000 + /* Supports Send DMA */ +#define IPATH_HAS_SEND_DMA 0x400000 + /* Supports Send Count (not just word count) in PBC */ +#define IPATH_HAS_PBC_CNT 0x800000 /* Suppress heartbeat, even if turning off loopback */ #define IPATH_NO_HRTBT 0x1000000 +#define IPATH_HAS_THRESH_UPDATE 0x4000000 #define IPATH_HAS_MULT_IB_SPEED 0x8000000 +#define IPATH_IB_AUTONEG_INPROG 0x10000000 +#define IPATH_IB_AUTONEG_FAILED 0x20000000 /* Linkdown-disable intentionally, Do not attempt to bring up */ #define IPATH_IB_LINK_DISABLED 0x40000000 #define IPATH_IB_FORCE_NOTIFY 0x80000000 /* force notify on next ib change */ @@ -880,6 +1004,7 @@ void ipath_free_data(struct ipath_portdata *dd); u32 __iomem *ipath_getpiobuf(struct ipath_devdata *, u32, u32 *); void ipath_chg_pioavailkernel(struct ipath_devdata *dd, unsigned start, unsigned len, int avail); +void ipath_init_iba7220_funcs(struct ipath_devdata *); void ipath_init_iba6120_funcs(struct ipath_devdata *); void ipath_init_iba6110_funcs(struct ipath_devdata *); void ipath_get_eeprom_info(struct ipath_devdata *); @@ -898,6 +1023,33 @@ void signal_ib_event(struct ipath_devdata *dd, enum ib_event_type ev); #define IPATH_LED_LOG 2 /* Logical (link) YELLOW LED */ void ipath_set_led_override(struct ipath_devdata *dd, unsigned int val); +/* send dma routines */ +int setup_sdma(struct ipath_devdata *); +void teardown_sdma(struct ipath_devdata *); +void ipath_sdma_intr(struct ipath_devdata *); +int ipath_sdma_verbs_send(struct ipath_devdata *, struct ipath_sge_state *, + u32, struct ipath_verbs_txreq *); +/* ipath_sdma_lock should be locked before calling this. 
*/ +int ipath_sdma_make_progress(struct ipath_devdata *dd); + +/* must be called under ipath_sdma_lock */ +static inline u16 ipath_sdma_descq_freecnt(const struct ipath_devdata *dd) +{ + return dd->ipath_sdma_descq_cnt - + (dd->ipath_sdma_descq_added - dd->ipath_sdma_descq_removed) - + 1 - dd->ipath_sdma_desc_nreserved; +} + +static inline void ipath_sdma_desc_reserve(struct ipath_devdata *dd, u16 cnt) +{ + dd->ipath_sdma_desc_nreserved += cnt; +} + +static inline void ipath_sdma_desc_unreserve(struct ipath_devdata *dd, u16 cnt) +{ + dd->ipath_sdma_desc_nreserved -= cnt; +} + /* * number of words used for protocol header if not set by ipath_userinit(); */ @@ -926,8 +1078,7 @@ void ipath_write_kreg_port(const struct ipath_devdata *, ipath_kreg, /* * At the moment, none of the s-registers are writable, so no - * ipath_write_sreg(), and none of the c-registers are writable, so no - * ipath_write_creg(). + * ipath_write_sreg(). */ /** @@ -1124,6 +1275,7 @@ int ipathfs_remove_device(struct ipath_devdata *); dma_addr_t ipath_map_page(struct pci_dev *, struct page *, unsigned long, size_t, int); dma_addr_t ipath_map_single(struct pci_dev *, void *, size_t, int); +const char *ipath_get_unit_name(int unit); /* * Flush write combining store buffers (if present) and perform a write @@ -1138,11 +1290,6 @@ dma_addr_t ipath_map_single(struct pci_dev *, void *, size_t, int); extern unsigned ipath_debug; /* debugging bit mask */ extern unsigned ipath_linkrecovery; extern unsigned ipath_mtu4096; - -#define IPATH_MAX_PARITY_ATTEMPTS 10000 /* max times to try recovery */ - -const char *ipath_get_unit_name(int unit); - extern struct mutex ipath_mutex; #define IPATH_DRV_NAME "ib_ipath" diff --git a/drivers/infiniband/hw/ipath/ipath_registers.h b/drivers/infiniband/hw/ipath/ipath_registers.h index b7d87d3..8f44d0c 100644 --- a/drivers/infiniband/hw/ipath/ipath_registers.h +++ b/drivers/infiniband/hw/ipath/ipath_registers.h @@ -73,56 +73,82 @@ #define IPATH_S_PIOINTBUFAVAIL 1 #define 
IPATH_S_PIOBUFAVAILUPD 2 #define IPATH_S_PIOENABLE 3 +#define IPATH_S_SDMAINTENABLE 9 +#define IPATH_S_SDMASINGLEDESCRIPTOR 10 +#define IPATH_S_SDMAENABLE 11 +#define IPATH_S_SDMAHALT 12 #define IPATH_S_DISARM 31 #define INFINIPATH_S_ABORT (1U << IPATH_S_ABORT) #define INFINIPATH_S_PIOINTBUFAVAIL (1U << IPATH_S_PIOINTBUFAVAIL) #define INFINIPATH_S_PIOBUFAVAILUPD (1U << IPATH_S_PIOBUFAVAILUPD) #define INFINIPATH_S_PIOENABLE (1U << IPATH_S_PIOENABLE) +#define INFINIPATH_S_SDMAINTENABLE (1U << IPATH_S_SDMAINTENABLE) +#define INFINIPATH_S_SDMASINGLEDESCRIPTOR \ + (1U << IPATH_S_SDMASINGLEDESCRIPTOR) +#define INFINIPATH_S_SDMAENABLE (1U << IPATH_S_SDMAENABLE) +#define INFINIPATH_S_SDMAHALT (1U << IPATH_S_SDMAHALT) #define INFINIPATH_S_DISARM (1U << IPATH_S_DISARM) -/* kr_rcvctrl bits */ +/* kr_rcvctrl bits that are the same on multiple chips */ #define INFINIPATH_R_PORTENABLE_SHIFT 0 #define INFINIPATH_R_QPMAP_ENABLE (1ULL << 38) /* kr_intstatus, kr_intclear, kr_intmask bits */ -#define INFINIPATH_I_ERROR 0x80000000 -#define INFINIPATH_I_SPIOSENT 0x40000000 -#define INFINIPATH_I_SPIOBUFAVAIL 0x20000000 -#define INFINIPATH_I_GPIO 0x10000000 +#define INFINIPATH_I_SDMAINT 0x8000000000000000ULL +#define INFINIPATH_I_SDMADISABLED 0x4000000000000000ULL +#define INFINIPATH_I_ERROR 0x0000000080000000ULL +#define INFINIPATH_I_SPIOSENT 0x0000000040000000ULL +#define INFINIPATH_I_SPIOBUFAVAIL 0x0000000020000000ULL +#define INFINIPATH_I_GPIO 0x0000000010000000ULL +#define INFINIPATH_I_JINT 0x0000000004000000ULL /* kr_errorstatus, kr_errorclear, kr_errormask bits */ -#define INFINIPATH_E_RFORMATERR 0x0000000000000001ULL -#define INFINIPATH_E_RVCRC 0x0000000000000002ULL -#define INFINIPATH_E_RICRC 0x0000000000000004ULL -#define INFINIPATH_E_RMINPKTLEN 0x0000000000000008ULL -#define INFINIPATH_E_RMAXPKTLEN 0x0000000000000010ULL -#define INFINIPATH_E_RLONGPKTLEN 0x0000000000000020ULL -#define INFINIPATH_E_RSHORTPKTLEN 0x0000000000000040ULL -#define INFINIPATH_E_RUNEXPCHAR 
0x0000000000000080ULL -#define INFINIPATH_E_RUNSUPVL 0x0000000000000100ULL -#define INFINIPATH_E_REBP 0x0000000000000200ULL -#define INFINIPATH_E_RIBFLOW 0x0000000000000400ULL -#define INFINIPATH_E_RBADVERSION 0x0000000000000800ULL -#define INFINIPATH_E_RRCVEGRFULL 0x0000000000001000ULL -#define INFINIPATH_E_RRCVHDRFULL 0x0000000000002000ULL -#define INFINIPATH_E_RBADTID 0x0000000000004000ULL -#define INFINIPATH_E_RHDRLEN 0x0000000000008000ULL -#define INFINIPATH_E_RHDR 0x0000000000010000ULL -#define INFINIPATH_E_RIBLOSTLINK 0x0000000000020000ULL -#define INFINIPATH_E_SMINPKTLEN 0x0000000020000000ULL -#define INFINIPATH_E_SMAXPKTLEN 0x0000000040000000ULL -#define INFINIPATH_E_SUNDERRUN 0x0000000080000000ULL -#define INFINIPATH_E_SPKTLEN 0x0000000100000000ULL -#define INFINIPATH_E_SDROPPEDSMPPKT 0x0000000200000000ULL -#define INFINIPATH_E_SDROPPEDDATAPKT 0x0000000400000000ULL -#define INFINIPATH_E_SPIOARMLAUNCH 0x0000000800000000ULL -#define INFINIPATH_E_SUNEXPERRPKTNUM 0x0000001000000000ULL -#define INFINIPATH_E_SUNSUPVL 0x0000002000000000ULL -#define INFINIPATH_E_IBSTATUSCHANGED 0x0001000000000000ULL -#define INFINIPATH_E_INVALIDADDR 0x0002000000000000ULL -#define INFINIPATH_E_RESET 0x0004000000000000ULL -#define INFINIPATH_E_HARDWARE 0x0008000000000000ULL +#define INFINIPATH_E_RFORMATERR 0x0000000000000001ULL +#define INFINIPATH_E_RVCRC 0x0000000000000002ULL +#define INFINIPATH_E_RICRC 0x0000000000000004ULL +#define INFINIPATH_E_RMINPKTLEN 0x0000000000000008ULL +#define INFINIPATH_E_RMAXPKTLEN 0x0000000000000010ULL +#define INFINIPATH_E_RLONGPKTLEN 0x0000000000000020ULL +#define INFINIPATH_E_RSHORTPKTLEN 0x0000000000000040ULL +#define INFINIPATH_E_RUNEXPCHAR 0x0000000000000080ULL +#define INFINIPATH_E_RUNSUPVL 0x0000000000000100ULL +#define INFINIPATH_E_REBP 0x0000000000000200ULL +#define INFINIPATH_E_RIBFLOW 0x0000000000000400ULL +#define INFINIPATH_E_RBADVERSION 0x0000000000000800ULL +#define INFINIPATH_E_RRCVEGRFULL 0x0000000000001000ULL +#define 
INFINIPATH_E_RRCVHDRFULL 0x0000000000002000ULL +#define INFINIPATH_E_RBADTID 0x0000000000004000ULL +#define INFINIPATH_E_RHDRLEN 0x0000000000008000ULL +#define INFINIPATH_E_RHDR 0x0000000000010000ULL +#define INFINIPATH_E_RIBLOSTLINK 0x0000000000020000ULL +#define INFINIPATH_E_SENDSPECIALTRIGGER 0x0000000008000000ULL +#define INFINIPATH_E_SDMADISABLED 0x0000000010000000ULL +#define INFINIPATH_E_SMINPKTLEN 0x0000000020000000ULL +#define INFINIPATH_E_SMAXPKTLEN 0x0000000040000000ULL +#define INFINIPATH_E_SUNDERRUN 0x0000000080000000ULL +#define INFINIPATH_E_SPKTLEN 0x0000000100000000ULL +#define INFINIPATH_E_SDROPPEDSMPPKT 0x0000000200000000ULL +#define INFINIPATH_E_SDROPPEDDATAPKT 0x0000000400000000ULL +#define INFINIPATH_E_SPIOARMLAUNCH 0x0000000800000000ULL +#define INFINIPATH_E_SUNEXPERRPKTNUM 0x0000001000000000ULL +#define INFINIPATH_E_SUNSUPVL 0x0000002000000000ULL +#define INFINIPATH_E_SENDBUFMISUSE 0x0000004000000000ULL +#define INFINIPATH_E_SDMAGENMISMATCH 0x0000008000000000ULL +#define INFINIPATH_E_SDMAOUTOFBOUND 0x0000010000000000ULL +#define INFINIPATH_E_SDMATAILOUTOFBOUND 0x0000020000000000ULL +#define INFINIPATH_E_SDMABASE 0x0000040000000000ULL +#define INFINIPATH_E_SDMA1STDESC 0x0000080000000000ULL +#define INFINIPATH_E_SDMARPYTAG 0x0000100000000000ULL +#define INFINIPATH_E_SDMADWEN 0x0000200000000000ULL +#define INFINIPATH_E_SDMAMISSINGDW 0x0000400000000000ULL +#define INFINIPATH_E_SDMAUNEXPDATA 0x0000800000000000ULL +#define INFINIPATH_E_IBSTATUSCHANGED 0x0001000000000000ULL +#define INFINIPATH_E_INVALIDADDR 0x0002000000000000ULL +#define INFINIPATH_E_RESET 0x0004000000000000ULL +#define INFINIPATH_E_HARDWARE 0x0008000000000000ULL +#define INFINIPATH_E_SDMADESCADDRMISALIGN 0x0010000000000000ULL +#define INFINIPATH_E_INVALIDEEPCMD 0x0020000000000000ULL /* * this is used to print "common" packet errors only when the @@ -133,6 +159,17 @@ | INFINIPATH_E_RICRC | INFINIPATH_E_RSHORTPKTLEN \ | INFINIPATH_E_REBP ) +/* Convenience for decoding Send DMA errors 
*/ +#define INFINIPATH_E_SDMAERRS ( \ + INFINIPATH_E_SDMAGENMISMATCH | INFINIPATH_E_SDMAOUTOFBOUND | \ + INFINIPATH_E_SDMATAILOUTOFBOUND | INFINIPATH_E_SDMABASE | \ + INFINIPATH_E_SDMA1STDESC | INFINIPATH_E_SDMARPYTAG | \ + INFINIPATH_E_SDMADWEN | INFINIPATH_E_SDMAMISSINGDW | \ + INFINIPATH_E_SDMAUNEXPDATA | \ + INFINIPATH_E_SDMADESCADDRMISALIGN | \ + INFINIPATH_E_SDMADISABLED | \ + INFINIPATH_E_SENDBUFMISUSE) + /* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */ /* TXEMEMPARITYERR bit 0: PIObuf, 1: PIOpbc, 2: launchfifo * RXEMEMPARITYERR bit 0: rcvbuf, 1: lookupq, 2: expTID, 3: eagerTID @@ -157,7 +194,7 @@ #define INFINIPATH_HWE_RXEMEMPARITYERR_HDRINFO 0x40ULL /* waldo specific -- find the rest in ipath_6110.c */ #define INFINIPATH_HWE_RXDSYNCMEMPARITYERR 0x0000000400000000ULL -/* monty specific -- find the rest in ipath_6120.c */ +/* 6120/7220 specific -- find the rest in ipath_6120.c and ipath_7220.c */ #define INFINIPATH_HWE_MEMBISTFAILED 0x0040000000000000ULL /* kr_hwdiagctrl bits */ @@ -202,7 +239,7 @@ /* kr_ibcstatus bits */ #define INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT 0 #define INFINIPATH_IBCS_LINKSTATE_MASK 0x7 -#define INFINIPATH_IBCS_LINKSTATE_SHIFT 4 + #define INFINIPATH_IBCS_TXREADY 0x40000000 #define INFINIPATH_IBCS_TXCREDITOK 0x80000000 /* link training states (shift by @@ -267,7 +304,7 @@ /* L1 Power down; use with RXDETECT, Otherwise not used on IB side */ #define INFINIPATH_SERDC0_L1PWR_DN 0xF0ULL -/* kr_xgxsconfig bits */ +/* common kr_xgxsconfig bits (or safe in all, even if not implemented) */ #define INFINIPATH_XGXS_RX_POL_SHIFT 19 #define INFINIPATH_XGXS_RX_POL_MASK 0xfULL @@ -397,6 +434,29 @@ struct ipath_kregs { ipath_kreg kr_pcieq1serdesconfig0; ipath_kreg kr_pcieq1serdesconfig1; ipath_kreg kr_pcieq1serdesstatus; + ipath_kreg kr_hrtbt_guid; + ipath_kreg kr_ibcddrctrl; + ipath_kreg kr_ibcddrstatus; + ipath_kreg kr_jintreload; + + /* send dma related regs */ + ipath_kreg kr_senddmabase; + ipath_kreg kr_senddmalengen; + ipath_kreg 
kr_senddmatail; + ipath_kreg kr_senddmahead; + ipath_kreg kr_senddmaheadaddr; + ipath_kreg kr_senddmabufmask0; + ipath_kreg kr_senddmabufmask1; + ipath_kreg kr_senddmabufmask2; + ipath_kreg kr_senddmastatus; + + /* SerDes related regs (IBA7220-only) */ + ipath_kreg kr_ibserdesctrl; + ipath_kreg kr_ib_epbacc; + ipath_kreg kr_ib_epbtrans; + ipath_kreg kr_pcie_epbacc; + ipath_kreg kr_pcie_epbtrans; + ipath_kreg kr_ib_ddsrxeq; }; struct ipath_cregs { diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h index 3d59736..056e741 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.h +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h @@ -138,6 +138,11 @@ struct ipath_ib_header { } u; } __attribute__ ((packed)); +struct ipath_pio_header { + __le32 pbc[2]; + struct ipath_ib_header hdr; +} __attribute__ ((packed)); + /* * There is one struct ipath_mcast for each multicast GID. * All attached QPs are then stored as a list of @@ -319,6 +324,7 @@ struct ipath_sge_state { struct ipath_sge *sg_list; /* next SGE to be used if any */ struct ipath_sge sge; /* progress state for the current SGE */ u8 num_sge; + u8 static_rate; }; /* @@ -356,6 +362,7 @@ struct ipath_qp { struct tasklet_struct s_task; struct ipath_mmap_info *ip; struct ipath_sge_state *s_cur_sge; + struct ipath_verbs_txreq *s_tx; struct ipath_sge_state s_sge; /* current send request data */ struct ipath_ack_entry s_ack_queue[IPATH_MAX_RDMA_ATOMIC + 1]; struct ipath_sge_state s_ack_rdma_sge; @@ -363,7 +370,8 @@ struct ipath_qp { struct ipath_sge_state r_sge; /* current receive data */ spinlock_t s_lock; unsigned long s_busy; - u32 s_hdrwords; /* size of s_hdr in 32 bit words */ + u16 s_pkt_delay; + u16 s_hdrwords; /* size of s_hdr in 32 bit words */ u32 s_cur_size; /* size of send packet in bytes */ u32 s_len; /* total length of s_sge */ u32 s_rdma_read_len; /* total length of s_rdma_read_sge */ @@ -387,7 +395,6 @@ struct ipath_qp { u8 r_nak_state; /* non-zero if NAK is pending 
*/ u8 r_min_rnr_timer; /* retry timeout value for RNR NAKs */ u8 r_reuse_sge; /* for UC receive errors */ - u8 r_sge_inx; /* current index into sg_list */ u8 r_wrid_valid; /* r_wrid set but CQ entry not yet made */ u8 r_max_rd_atomic; /* max number of RDMA read/atomic to receive */ u8 r_head_ack_queue; /* index into s_ack_queue[] */ @@ -403,6 +410,7 @@ struct ipath_qp { u8 s_num_rd_atomic; /* number of RDMA read/atomic pending */ u8 s_tail_ack_queue; /* index into s_ack_queue[] */ u8 s_flags; + u8 s_dmult; u8 timeout; /* Timeout for this QP */ enum ib_mtu path_mtu; u32 remote_qpn; @@ -510,6 +518,8 @@ struct ipath_ibdev { struct ipath_lkey_table lk_table; struct list_head pending[3]; /* FIFO of QPs waiting for ACKs */ struct list_head piowait; /* list for wait PIO buf */ + struct list_head txreq_free; + void *txreq_bufs; /* list of QPs waiting for RNR timer */ struct list_head rnrwait; spinlock_t pending_lock; @@ -570,6 +580,7 @@ struct ipath_ibdev { u32 n_rdma_dup_busy; u32 n_piowait; u32 n_no_piobuf; + u32 n_unaligned; u32 port_cap_flags; u32 pma_sample_start; u32 pma_sample_interval; @@ -581,7 +592,6 @@ struct ipath_ibdev { u16 pending_index; /* which pending queue is active */ u8 pma_sample_status; u8 subnet_timeout; - u8 link_width_enabled; u8 vl_high_limit; struct ipath_opcode_stats opstats[128]; }; @@ -602,6 +612,16 @@ struct ipath_verbs_counters { u32 vl15_dropped; }; +struct ipath_verbs_txreq { + struct ipath_qp *qp; + struct ipath_swqe *wqe; + u32 map_len; + u32 len; + struct ipath_sge_state *ss; + struct ipath_pio_header hdr; + struct ipath_sdma_txreq txreq; +}; + static inline struct ipath_mr *to_imr(struct ib_mr *ibmr) { return container_of(ibmr, struct ipath_mr, ibmr); @@ -694,11 +714,13 @@ void ipath_sqerror_qp(struct ipath_qp *qp, struct ib_wc *wc); void ipath_get_credit(struct ipath_qp *qp, u32 aeth); +unsigned ipath_ib_rate_to_mult(enum ib_rate rate); + +enum ib_rate ipath_mult_to_ib_rate(unsigned mult); + int ipath_verbs_send(struct ipath_qp *qp, 
struct ipath_ib_header *hdr, u32 hdrwords, struct ipath_sge_state *ss, u32 len); -void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig); - void ipath_copy_sge(struct ipath_sge_state *ss, void *data, u32 length); void ipath_skip_sge(struct ipath_sge_state *ss, u32 length); From ralph.campbell at qlogic.com Wed Apr 2 15:49:57 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:49:57 -0700 Subject: [ofa-general] [PATCH 11/20] IB/ipath - isolate 7220-specific content In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402224957.28598.51916.stgit@eng-46.mv.qlogic.com> From: Michael Albaugh This patch adds a new ASIC-specific header file for the HCAs using the IBA7220. Signed-off-by: Michael Albaugh --- drivers/infiniband/hw/ipath/ipath_7220.h | 57 ++++++++++++++++++++++++++++++ 1 files changed, 57 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_7220.h b/drivers/infiniband/hw/ipath/ipath_7220.h new file mode 100644 index 0000000..74fa5cc --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_7220.h @@ -0,0 +1,57 @@ +#ifndef _IPATH_7220_H +#define _IPATH_7220_H +/* + * Copyright (c) 2007 QLogic Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +/* + * This header file provides the declarations and common definitions + * for (mostly) manipulation of the SerDes blocks within the IBA7220. + * The functions declared should only be called from within other + * 7220-related files such as ipath_iba7220.c or ipath_sd7220.c. 
+ */ +int ipath_sd7220_presets(struct ipath_devdata *dd); +int ipath_sd7220_init(struct ipath_devdata *dd, int was_reset); +int ipath_sd7220_prog_ld(struct ipath_devdata *dd, int sdnum, u8 *img, + int len, int offset); +int ipath_sd7220_prog_vfy(struct ipath_devdata *dd, int sdnum, const u8 *img, + int len, int offset); +/* + * Below used for sdnum parameter, selecting one of the two sections + * used for PCIe, or the single SerDes used for IB, which is the + * only one currently used + */ +#define IB_7220_SERDES 2 + +int ipath_sd7220_ib_load(struct ipath_devdata *dd); +int ipath_sd7220_ib_vfy(struct ipath_devdata *dd); + +#endif /* _IPATH_7220_H */ From ralph.campbell at qlogic.com Wed Apr 2 15:50:02 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:02 -0700 Subject: [ofa-general] [PATCH 12/20] IB/ipath - HCA specific code to support IBA7220 In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225002.28598.23449.stgit@eng-46.mv.qlogic.com> This patch adds the HCA specific code for the IBA7220 HCA. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_iba7220.c | 2571 +++++++++++++++++++++++++++ 1 files changed, 2571 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_iba7220.c b/drivers/infiniband/hw/ipath/ipath_iba7220.c new file mode 100644 index 0000000..1b2de2c --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_iba7220.c @@ -0,0 +1,2571 @@ +/* + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +/* + * This file contains all of the code that is specific to the + * InfiniPath 7220 chip (except that specific to the SerDes) + */ + +#include +#include +#include +#include +#include + +#include "ipath_kernel.h" +#include "ipath_registers.h" +#include "ipath_7220.h" + +static void ipath_setup_7220_setextled(struct ipath_devdata *, u64, u64); + +static unsigned ipath_compat_ddr_negotiate = 1; + +module_param_named(compat_ddr_negotiate, ipath_compat_ddr_negotiate, uint, + S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(compat_ddr_negotiate, + "Attempt pre-IBTA 1.2 DDR speed negotiation"); + +static unsigned ipath_sdma_fetch_arb = 1; +module_param_named(fetch_arb, ipath_sdma_fetch_arb, uint, S_IRUGO); +MODULE_PARM_DESC(fetch_arb, "IBA7220: change SDMA descriptor arbitration"); + +/* + * This file contains almost all the chip-specific register information and + * access functions for the QLogic InfiniPath 7220 PCI-Express chip, with the + * exception of SerDes support, which is in ipath_sd7220.c. + * + * This lists the InfiniPath registers, in the actual chip layout. + * This structure should never be directly accessed. 
+ */ +struct _infinipath_do_not_use_kernel_regs { + unsigned long long Revision; + unsigned long long Control; + unsigned long long PageAlign; + unsigned long long PortCnt; + unsigned long long DebugPortSelect; + unsigned long long DebugSigsIntSel; /* was Reserved0;*/ + unsigned long long SendRegBase; + unsigned long long UserRegBase; + unsigned long long CounterRegBase; + unsigned long long Scratch; + unsigned long long EEPROMAddrCmd; /* was Reserved1; */ + unsigned long long EEPROMData; /* was Reserved2; */ + unsigned long long IntBlocked; + unsigned long long IntMask; + unsigned long long IntStatus; + unsigned long long IntClear; + unsigned long long ErrorMask; + unsigned long long ErrorStatus; + unsigned long long ErrorClear; + unsigned long long HwErrMask; + unsigned long long HwErrStatus; + unsigned long long HwErrClear; + unsigned long long HwDiagCtrl; + unsigned long long MDIO; + unsigned long long IBCStatus; + unsigned long long IBCCtrl; + unsigned long long ExtStatus; + unsigned long long ExtCtrl; + unsigned long long GPIOOut; + unsigned long long GPIOMask; + unsigned long long GPIOStatus; + unsigned long long GPIOClear; + unsigned long long RcvCtrl; + unsigned long long RcvBTHQP; + unsigned long long RcvHdrSize; + unsigned long long RcvHdrCnt; + unsigned long long RcvHdrEntSize; + unsigned long long RcvTIDBase; + unsigned long long RcvTIDCnt; + unsigned long long RcvEgrBase; + unsigned long long RcvEgrCnt; + unsigned long long RcvBufBase; + unsigned long long RcvBufSize; + unsigned long long RxIntMemBase; + unsigned long long RxIntMemSize; + unsigned long long RcvPartitionKey; + unsigned long long RcvQPMulticastPort; + unsigned long long RcvPktLEDCnt; + unsigned long long IBCDDRCtrl; + unsigned long long HRTBT_GUID; + unsigned long long IB_SDTEST_IF_TX; + unsigned long long IB_SDTEST_IF_RX; + unsigned long long IBCDDRCtrl2; + unsigned long long IBCDDRStatus; + unsigned long long JIntReload; + unsigned long long IBNCModeCtrl; + unsigned long long 
SendCtrl; + unsigned long long SendBufBase; + unsigned long long SendBufSize; + unsigned long long SendBufCnt; + unsigned long long SendAvailAddr; + unsigned long long TxIntMemBase; + unsigned long long TxIntMemSize; + unsigned long long SendDmaBase; + unsigned long long SendDmaLenGen; + unsigned long long SendDmaTail; + unsigned long long SendDmaHead; + unsigned long long SendDmaHeadAddr; + unsigned long long SendDmaBufMask0; + unsigned long long SendDmaBufMask1; + unsigned long long SendDmaBufMask2; + unsigned long long SendDmaStatus; + unsigned long long SendBufferError; + unsigned long long SendBufferErrorCONT1; + unsigned long long SendBufErr2; /* was Reserved6SBE[0/6] */ + unsigned long long Reserved6L[2]; + unsigned long long AvailUpdCount; + unsigned long long RcvHdrAddr0; + unsigned long long RcvHdrAddrs[16]; /* Why enumerate? */ + unsigned long long Reserved7hdtl; /* Align next to 300 */ + unsigned long long RcvHdrTailAddr0; /* 300, like others */ + unsigned long long RcvHdrTailAddrs[16]; + unsigned long long Reserved9SW[7]; /* was [8]; we have 17 ports */ + unsigned long long IbsdEpbAccCtl; /* IB Serdes EPB access control */ + unsigned long long IbsdEpbTransReg; /* IB Serdes EPB Transaction */ + unsigned long long Reserved10sds; /* was SerdesStatus on */ + unsigned long long XGXSConfig; + unsigned long long IBSerDesCtrl; /* Was IBPLLCfg on Monty */ + unsigned long long EEPCtlStat; /* for "boot" EEPROM/FLASH */ + unsigned long long EEPAddrCmd; + unsigned long long EEPData; + unsigned long long PcieEpbAccCtl; + unsigned long long PcieEpbTransCtl; + unsigned long long EfuseCtl; /* E-Fuse control */ + unsigned long long EfuseData[4]; + unsigned long long ProcMon; + /* this chip moves following two from previous 200, 208 */ + unsigned long long PCIeRBufTestReg0; + unsigned long long PCIeRBufTestReg1; + /* added for this chip */ + unsigned long long PCIeRBufTestReg2; + unsigned long long PCIeRBufTestReg3; + /* added for this chip, debug only */ + unsigned long 
long SPC_JTAG_ACCESS_REG; + unsigned long long LAControlReg; + unsigned long long GPIODebugSelReg; + unsigned long long DebugPortValueReg; + /* added for this chip, DMA */ + unsigned long long SendDmaBufUsed[3]; + unsigned long long SendDmaReqTagUsed; + /* + * added for this chip, EFUSE: note that these program 64-bit + * words 2 and 3 */ + unsigned long long efuse_pgm_data[2]; + unsigned long long Reserved11LAalign[10]; /* Skip 4B0..4F8 */ + /* we have 30 regs for DDS and RXEQ in IB SERDES */ + unsigned long long SerDesDDSRXEQ[30]; + unsigned long long Reserved12LAalign[2]; /* Skip 5F0, 5F8 */ + /* added for LA debug support */ + unsigned long long LAMemory[32]; +}; + +struct _infinipath_do_not_use_counters { + __u64 LBIntCnt; + __u64 LBFlowStallCnt; + __u64 TxSDmaDescCnt; /* was Reserved1 */ + __u64 TxUnsupVLErrCnt; + __u64 TxDataPktCnt; + __u64 TxFlowPktCnt; + __u64 TxDwordCnt; + __u64 TxLenErrCnt; + __u64 TxMaxMinLenErrCnt; + __u64 TxUnderrunCnt; + __u64 TxFlowStallCnt; + __u64 TxDroppedPktCnt; + __u64 RxDroppedPktCnt; + __u64 RxDataPktCnt; + __u64 RxFlowPktCnt; + __u64 RxDwordCnt; + __u64 RxLenErrCnt; + __u64 RxMaxMinLenErrCnt; + __u64 RxICRCErrCnt; + __u64 RxVCRCErrCnt; + __u64 RxFlowCtrlErrCnt; + __u64 RxBadFormatCnt; + __u64 RxLinkProblemCnt; + __u64 RxEBPCnt; + __u64 RxLPCRCErrCnt; + __u64 RxBufOvflCnt; + __u64 RxTIDFullErrCnt; + __u64 RxTIDValidErrCnt; + __u64 RxPKeyMismatchCnt; + __u64 RxP0HdrEgrOvflCnt; + __u64 RxP1HdrEgrOvflCnt; + __u64 RxP2HdrEgrOvflCnt; + __u64 RxP3HdrEgrOvflCnt; + __u64 RxP4HdrEgrOvflCnt; + __u64 RxP5HdrEgrOvflCnt; + __u64 RxP6HdrEgrOvflCnt; + __u64 RxP7HdrEgrOvflCnt; + __u64 RxP8HdrEgrOvflCnt; + __u64 RxP9HdrEgrOvflCnt; /* was Reserved6 */ + __u64 RxP10HdrEgrOvflCnt; /* was Reserved7 */ + __u64 RxP11HdrEgrOvflCnt; /* new for IBA7220 */ + __u64 RxP12HdrEgrOvflCnt; /* new for IBA7220 */ + __u64 RxP13HdrEgrOvflCnt; /* new for IBA7220 */ + __u64 RxP14HdrEgrOvflCnt; /* new for IBA7220 */ + __u64 RxP15HdrEgrOvflCnt; /* new for IBA7220 */ 
+ __u64 RxP16HdrEgrOvflCnt; /* new for IBA7220 */ + __u64 IBStatusChangeCnt; + __u64 IBLinkErrRecoveryCnt; + __u64 IBLinkDownedCnt; + __u64 IBSymbolErrCnt; + /* The following are new for IBA7220 */ + __u64 RxVL15DroppedPktCnt; + __u64 RxOtherLocalPhyErrCnt; + __u64 PcieRetryBufDiagQwordCnt; + __u64 ExcessBufferOvflCnt; + __u64 LocalLinkIntegrityErrCnt; + __u64 RxVlErrCnt; + __u64 RxDlidFltrCnt; + __u64 Reserved8[7]; + __u64 PSStat; + __u64 PSStart; + __u64 PSInterval; + __u64 PSRcvDataCount; + __u64 PSRcvPktsCount; + __u64 PSXmitDataCount; + __u64 PSXmitPktsCount; + __u64 PSXmitWaitCount; +}; + +#define IPATH_KREG_OFFSET(field) (offsetof( \ + struct _infinipath_do_not_use_kernel_regs, field) / sizeof(u64)) +#define IPATH_CREG_OFFSET(field) (offsetof( \ + struct _infinipath_do_not_use_counters, field) / sizeof(u64)) + +static const struct ipath_kregs ipath_7220_kregs = { + .kr_control = IPATH_KREG_OFFSET(Control), + .kr_counterregbase = IPATH_KREG_OFFSET(CounterRegBase), + .kr_debugportselect = IPATH_KREG_OFFSET(DebugPortSelect), + .kr_errorclear = IPATH_KREG_OFFSET(ErrorClear), + .kr_errormask = IPATH_KREG_OFFSET(ErrorMask), + .kr_errorstatus = IPATH_KREG_OFFSET(ErrorStatus), + .kr_extctrl = IPATH_KREG_OFFSET(ExtCtrl), + .kr_extstatus = IPATH_KREG_OFFSET(ExtStatus), + .kr_gpio_clear = IPATH_KREG_OFFSET(GPIOClear), + .kr_gpio_mask = IPATH_KREG_OFFSET(GPIOMask), + .kr_gpio_out = IPATH_KREG_OFFSET(GPIOOut), + .kr_gpio_status = IPATH_KREG_OFFSET(GPIOStatus), + .kr_hwdiagctrl = IPATH_KREG_OFFSET(HwDiagCtrl), + .kr_hwerrclear = IPATH_KREG_OFFSET(HwErrClear), + .kr_hwerrmask = IPATH_KREG_OFFSET(HwErrMask), + .kr_hwerrstatus = IPATH_KREG_OFFSET(HwErrStatus), + .kr_ibcctrl = IPATH_KREG_OFFSET(IBCCtrl), + .kr_ibcstatus = IPATH_KREG_OFFSET(IBCStatus), + .kr_intblocked = IPATH_KREG_OFFSET(IntBlocked), + .kr_intclear = IPATH_KREG_OFFSET(IntClear), + .kr_intmask = IPATH_KREG_OFFSET(IntMask), + .kr_intstatus = IPATH_KREG_OFFSET(IntStatus), + .kr_mdio = IPATH_KREG_OFFSET(MDIO), + 
.kr_pagealign = IPATH_KREG_OFFSET(PageAlign), + .kr_partitionkey = IPATH_KREG_OFFSET(RcvPartitionKey), + .kr_portcnt = IPATH_KREG_OFFSET(PortCnt), + .kr_rcvbthqp = IPATH_KREG_OFFSET(RcvBTHQP), + .kr_rcvbufbase = IPATH_KREG_OFFSET(RcvBufBase), + .kr_rcvbufsize = IPATH_KREG_OFFSET(RcvBufSize), + .kr_rcvctrl = IPATH_KREG_OFFSET(RcvCtrl), + .kr_rcvegrbase = IPATH_KREG_OFFSET(RcvEgrBase), + .kr_rcvegrcnt = IPATH_KREG_OFFSET(RcvEgrCnt), + .kr_rcvhdrcnt = IPATH_KREG_OFFSET(RcvHdrCnt), + .kr_rcvhdrentsize = IPATH_KREG_OFFSET(RcvHdrEntSize), + .kr_rcvhdrsize = IPATH_KREG_OFFSET(RcvHdrSize), + .kr_rcvintmembase = IPATH_KREG_OFFSET(RxIntMemBase), + .kr_rcvintmemsize = IPATH_KREG_OFFSET(RxIntMemSize), + .kr_rcvtidbase = IPATH_KREG_OFFSET(RcvTIDBase), + .kr_rcvtidcnt = IPATH_KREG_OFFSET(RcvTIDCnt), + .kr_revision = IPATH_KREG_OFFSET(Revision), + .kr_scratch = IPATH_KREG_OFFSET(Scratch), + .kr_sendbuffererror = IPATH_KREG_OFFSET(SendBufferError), + .kr_sendctrl = IPATH_KREG_OFFSET(SendCtrl), + .kr_sendpioavailaddr = IPATH_KREG_OFFSET(SendAvailAddr), + .kr_sendpiobufbase = IPATH_KREG_OFFSET(SendBufBase), + .kr_sendpiobufcnt = IPATH_KREG_OFFSET(SendBufCnt), + .kr_sendpiosize = IPATH_KREG_OFFSET(SendBufSize), + .kr_sendregbase = IPATH_KREG_OFFSET(SendRegBase), + .kr_txintmembase = IPATH_KREG_OFFSET(TxIntMemBase), + .kr_txintmemsize = IPATH_KREG_OFFSET(TxIntMemSize), + .kr_userregbase = IPATH_KREG_OFFSET(UserRegBase), + + .kr_xgxsconfig = IPATH_KREG_OFFSET(XGXSConfig), + + /* send dma related regs */ + .kr_senddmabase = IPATH_KREG_OFFSET(SendDmaBase), + .kr_senddmalengen = IPATH_KREG_OFFSET(SendDmaLenGen), + .kr_senddmatail = IPATH_KREG_OFFSET(SendDmaTail), + .kr_senddmahead = IPATH_KREG_OFFSET(SendDmaHead), + .kr_senddmaheadaddr = IPATH_KREG_OFFSET(SendDmaHeadAddr), + .kr_senddmabufmask0 = IPATH_KREG_OFFSET(SendDmaBufMask0), + .kr_senddmabufmask1 = IPATH_KREG_OFFSET(SendDmaBufMask1), + .kr_senddmabufmask2 = IPATH_KREG_OFFSET(SendDmaBufMask2), + .kr_senddmastatus = 
IPATH_KREG_OFFSET(SendDmaStatus), + + /* SerDes related regs */ + .kr_ibserdesctrl = IPATH_KREG_OFFSET(IBSerDesCtrl), + .kr_ib_epbacc = IPATH_KREG_OFFSET(IbsdEpbAccCtl), + .kr_ib_epbtrans = IPATH_KREG_OFFSET(IbsdEpbTransReg), + .kr_pcie_epbacc = IPATH_KREG_OFFSET(PcieEpbAccCtl), + .kr_pcie_epbtrans = IPATH_KREG_OFFSET(PcieEpbTransCtl), + .kr_ib_ddsrxeq = IPATH_KREG_OFFSET(SerDesDDSRXEQ), + + /* + * These should not be used directly via ipath_read_kreg64(), + * use them with ipath_read_kreg64_port() + */ + .kr_rcvhdraddr = IPATH_KREG_OFFSET(RcvHdrAddr0), + .kr_rcvhdrtailaddr = IPATH_KREG_OFFSET(RcvHdrTailAddr0), + + /* + * The rcvpktled register controls one of the debug port signals, so + * a packet activity LED can be connected to it. + */ + .kr_rcvpktledcnt = IPATH_KREG_OFFSET(RcvPktLEDCnt), + .kr_pcierbuftestreg0 = IPATH_KREG_OFFSET(PCIeRBufTestReg0), + .kr_pcierbuftestreg1 = IPATH_KREG_OFFSET(PCIeRBufTestReg1), + + .kr_hrtbt_guid = IPATH_KREG_OFFSET(HRTBT_GUID), + .kr_ibcddrctrl = IPATH_KREG_OFFSET(IBCDDRCtrl), + .kr_ibcddrstatus = IPATH_KREG_OFFSET(IBCDDRStatus), + .kr_jintreload = IPATH_KREG_OFFSET(JIntReload) +}; + +static const struct ipath_cregs ipath_7220_cregs = { + .cr_badformatcnt = IPATH_CREG_OFFSET(RxBadFormatCnt), + .cr_erricrccnt = IPATH_CREG_OFFSET(RxICRCErrCnt), + .cr_errlinkcnt = IPATH_CREG_OFFSET(RxLinkProblemCnt), + .cr_errlpcrccnt = IPATH_CREG_OFFSET(RxLPCRCErrCnt), + .cr_errpkey = IPATH_CREG_OFFSET(RxPKeyMismatchCnt), + .cr_errrcvflowctrlcnt = IPATH_CREG_OFFSET(RxFlowCtrlErrCnt), + .cr_err_rlencnt = IPATH_CREG_OFFSET(RxLenErrCnt), + .cr_errslencnt = IPATH_CREG_OFFSET(TxLenErrCnt), + .cr_errtidfull = IPATH_CREG_OFFSET(RxTIDFullErrCnt), + .cr_errtidvalid = IPATH_CREG_OFFSET(RxTIDValidErrCnt), + .cr_errvcrccnt = IPATH_CREG_OFFSET(RxVCRCErrCnt), + .cr_ibstatuschange = IPATH_CREG_OFFSET(IBStatusChangeCnt), + .cr_intcnt = IPATH_CREG_OFFSET(LBIntCnt), + .cr_invalidrlencnt = IPATH_CREG_OFFSET(RxMaxMinLenErrCnt), + .cr_invalidslencnt = 
IPATH_CREG_OFFSET(TxMaxMinLenErrCnt), + .cr_lbflowstallcnt = IPATH_CREG_OFFSET(LBFlowStallCnt), + .cr_pktrcvcnt = IPATH_CREG_OFFSET(RxDataPktCnt), + .cr_pktrcvflowctrlcnt = IPATH_CREG_OFFSET(RxFlowPktCnt), + .cr_pktsendcnt = IPATH_CREG_OFFSET(TxDataPktCnt), + .cr_pktsendflowcnt = IPATH_CREG_OFFSET(TxFlowPktCnt), + .cr_portovflcnt = IPATH_CREG_OFFSET(RxP0HdrEgrOvflCnt), + .cr_rcvebpcnt = IPATH_CREG_OFFSET(RxEBPCnt), + .cr_rcvovflcnt = IPATH_CREG_OFFSET(RxBufOvflCnt), + .cr_senddropped = IPATH_CREG_OFFSET(TxDroppedPktCnt), + .cr_sendstallcnt = IPATH_CREG_OFFSET(TxFlowStallCnt), + .cr_sendunderruncnt = IPATH_CREG_OFFSET(TxUnderrunCnt), + .cr_wordrcvcnt = IPATH_CREG_OFFSET(RxDwordCnt), + .cr_wordsendcnt = IPATH_CREG_OFFSET(TxDwordCnt), + .cr_unsupvlcnt = IPATH_CREG_OFFSET(TxUnsupVLErrCnt), + .cr_rxdroppktcnt = IPATH_CREG_OFFSET(RxDroppedPktCnt), + .cr_iblinkerrrecovcnt = IPATH_CREG_OFFSET(IBLinkErrRecoveryCnt), + .cr_iblinkdowncnt = IPATH_CREG_OFFSET(IBLinkDownedCnt), + .cr_ibsymbolerrcnt = IPATH_CREG_OFFSET(IBSymbolErrCnt), + .cr_vl15droppedpktcnt = IPATH_CREG_OFFSET(RxVL15DroppedPktCnt), + .cr_rxotherlocalphyerrcnt = + IPATH_CREG_OFFSET(RxOtherLocalPhyErrCnt), + .cr_excessbufferovflcnt = IPATH_CREG_OFFSET(ExcessBufferOvflCnt), + .cr_locallinkintegrityerrcnt = + IPATH_CREG_OFFSET(LocalLinkIntegrityErrCnt), + .cr_rxvlerrcnt = IPATH_CREG_OFFSET(RxVlErrCnt), + .cr_rxdlidfltrcnt = IPATH_CREG_OFFSET(RxDlidFltrCnt), + .cr_psstat = IPATH_CREG_OFFSET(PSStat), + .cr_psstart = IPATH_CREG_OFFSET(PSStart), + .cr_psinterval = IPATH_CREG_OFFSET(PSInterval), + .cr_psrcvdatacount = IPATH_CREG_OFFSET(PSRcvDataCount), + .cr_psrcvpktscount = IPATH_CREG_OFFSET(PSRcvPktsCount), + .cr_psxmitdatacount = IPATH_CREG_OFFSET(PSXmitDataCount), + .cr_psxmitpktscount = IPATH_CREG_OFFSET(PSXmitPktsCount), + .cr_psxmitwaitcount = IPATH_CREG_OFFSET(PSXmitWaitCount), +}; + +/* kr_control bits */ +#define INFINIPATH_C_RESET (1U<<7) + +/* kr_intstatus, kr_intclear, kr_intmask bits */ +#define 
INFINIPATH_I_RCVURG_MASK ((1ULL<<17)-1) +#define INFINIPATH_I_RCVURG_SHIFT 32 +#define INFINIPATH_I_RCVAVAIL_MASK ((1ULL<<17)-1) +#define INFINIPATH_I_RCVAVAIL_SHIFT 0 +#define INFINIPATH_I_SERDESTRIMDONE (1ULL<<27) + +/* kr_hwerrclear, kr_hwerrmask, kr_hwerrstatus, bits */ +#define INFINIPATH_HWE_PCIEMEMPARITYERR_MASK 0x00000000000000ffULL +#define INFINIPATH_HWE_PCIEMEMPARITYERR_SHIFT 0 +#define INFINIPATH_HWE_PCIEPOISONEDTLP 0x0000000010000000ULL +#define INFINIPATH_HWE_PCIECPLTIMEOUT 0x0000000020000000ULL +#define INFINIPATH_HWE_PCIEBUSPARITYXTLH 0x0000000040000000ULL +#define INFINIPATH_HWE_PCIEBUSPARITYXADM 0x0000000080000000ULL +#define INFINIPATH_HWE_PCIEBUSPARITYRADM 0x0000000100000000ULL +#define INFINIPATH_HWE_COREPLL_FBSLIP 0x0080000000000000ULL +#define INFINIPATH_HWE_COREPLL_RFSLIP 0x0100000000000000ULL +#define INFINIPATH_HWE_PCIE1PLLFAILED 0x0400000000000000ULL +#define INFINIPATH_HWE_PCIE0PLLFAILED 0x0800000000000000ULL +#define INFINIPATH_HWE_SERDESPLLFAILED 0x1000000000000000ULL +/* specific to this chip */ +#define INFINIPATH_HWE_PCIECPLDATAQUEUEERR 0x0000000000000040ULL +#define INFINIPATH_HWE_PCIECPLHDRQUEUEERR 0x0000000000000080ULL +#define INFINIPATH_HWE_SDMAMEMREADERR 0x0000000010000000ULL +#define INFINIPATH_HWE_CLK_UC_PLLNOTLOCKED 0x2000000000000000ULL +#define INFINIPATH_HWE_PCIESERDESQ0PCLKNOTDETECT 0x0100000000000000ULL +#define INFINIPATH_HWE_PCIESERDESQ1PCLKNOTDETECT 0x0200000000000000ULL +#define INFINIPATH_HWE_PCIESERDESQ2PCLKNOTDETECT 0x0400000000000000ULL +#define INFINIPATH_HWE_PCIESERDESQ3PCLKNOTDETECT 0x0800000000000000ULL +#define INFINIPATH_HWE_DDSRXEQMEMORYPARITYERR 0x0000008000000000ULL +#define INFINIPATH_HWE_IB_UC_MEMORYPARITYERR 0x0000004000000000ULL +#define INFINIPATH_HWE_PCIE_UC_OCT0MEMORYPARITYERR 0x0000001000000000ULL +#define INFINIPATH_HWE_PCIE_UC_OCT1MEMORYPARITYERR 0x0000002000000000ULL + +#define IBA7220_IBCS_LINKTRAININGSTATE_MASK 0x1F +#define IBA7220_IBCS_LINKSTATE_SHIFT 5 +#define 
IBA7220_IBCS_LINKSPEED_SHIFT 8 +#define IBA7220_IBCS_LINKWIDTH_SHIFT 9 + +#define IBA7220_IBCC_LINKINITCMD_MASK 0x7ULL +#define IBA7220_IBCC_LINKCMD_SHIFT 19 +#define IBA7220_IBCC_MAXPKTLEN_SHIFT 21 + +/* kr_ibcddrctrl bits */ +#define IBA7220_IBC_DLIDLMC_MASK 0xFFFFFFFFUL +#define IBA7220_IBC_DLIDLMC_SHIFT 32 +#define IBA7220_IBC_HRTBT_MASK 3 +#define IBA7220_IBC_HRTBT_SHIFT 16 +#define IBA7220_IBC_HRTBT_ENB 0x10000UL +#define IBA7220_IBC_LANE_REV_SUPPORTED (1<<8) +#define IBA7220_IBC_LREV_MASK 1 +#define IBA7220_IBC_LREV_SHIFT 8 +#define IBA7220_IBC_RXPOL_MASK 1 +#define IBA7220_IBC_RXPOL_SHIFT 7 +#define IBA7220_IBC_WIDTH_SHIFT 5 +#define IBA7220_IBC_WIDTH_MASK 0x3 +#define IBA7220_IBC_WIDTH_1X_ONLY (0<ipath_p0_rcvegrcnt + + (port-1) * dd->ipath_rcvegrcnt : 0; +} + +static void ipath_7220_txe_recover(struct ipath_devdata *dd) +{ + ++ipath_stats.sps_txeparity; + + dev_info(&dd->pcidev->dev, + "Recovering from TXE PIO parity error\n"); + ipath_disarm_senderrbufs(dd, 1); +} + + +/** + * ipath_7220_handle_hwerrors - display hardware errors. + * @dd: the infinipath device + * @msg: the output buffer + * @msgl: the size of the output buffer + * + * Use same msg buffer as regular errors to avoid excessive stack + * use. Most hardware errors are catastrophic, but for right now, + * we'll print them and continue. We reuse the same message buffer as + * ipath_handle_errors() to avoid excessive stack usage. + */ +static void ipath_7220_handle_hwerrors(struct ipath_devdata *dd, char *msg, + size_t msgl) +{ + ipath_err_t hwerrs; + u32 bits, ctrl; + int isfatal = 0; + char bitsmsg[64]; + int log_idx; + + hwerrs = ipath_read_kreg64(dd, dd->ipath_kregs->kr_hwerrstatus); + if (!hwerrs) { + /* + * better than printing confusing messages + * This seems to be related to clearing the crc error, or + * the pll error during init. 
+ */ + ipath_cdbg(VERBOSE, "Called but no hardware errors set\n"); + goto bail; + } else if (hwerrs == ~0ULL) { + ipath_dev_err(dd, "Read of hardware error status failed " + "(all bits set); ignoring\n"); + goto bail; + } + ipath_stats.sps_hwerrs++; + + /* + * Always clear the error status register, except MEMBISTFAIL, + * regardless of whether we continue or stop using the chip. + * We want that set so we know it failed, even across driver reload. + * We'll still ignore it in the hwerrmask. We do this partly for + * diagnostics, but also for support. + */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrclear, + hwerrs&~INFINIPATH_HWE_MEMBISTFAILED); + + hwerrs &= dd->ipath_hwerrmask; + + /* We log some errors to EEPROM, check if we have any of those. */ + for (log_idx = 0; log_idx < IPATH_EEP_LOG_CNT; ++log_idx) + if (hwerrs & dd->ipath_eep_st_masks[log_idx].hwerrs_to_log) + ipath_inc_eeprom_err(dd, log_idx, 1); + /* + * Make sure we get this much out, unless told to be quiet, + * or it's occurred within the last 5 seconds. 
+ */ + if ((hwerrs & ~(dd->ipath_lasthwerror | + ((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF | + INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC) + << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT))) || + (ipath_debug & __IPATH_VERBDBG)) + dev_info(&dd->pcidev->dev, "Hardware error: hwerr=0x%llx " + "(cleared)\n", (unsigned long long) hwerrs); + dd->ipath_lasthwerror |= hwerrs; + + if (hwerrs & ~dd->ipath_hwe_bitsextant) + ipath_dev_err(dd, "hwerror interrupt with unknown errors " + "%llx set\n", (unsigned long long) + (hwerrs & ~dd->ipath_hwe_bitsextant)); + + if (hwerrs & INFINIPATH_HWE_IB_UC_MEMORYPARITYERR) + ipath_sd7220_clr_ibpar(dd); + + ctrl = ipath_read_kreg32(dd, dd->ipath_kregs->kr_control); + if ((ctrl & INFINIPATH_C_FREEZEMODE) && !ipath_diag_inuse) { + /* + * Parity errors in send memory are recoverable, + * just cancel the send (if indicated in sendbuffererror), + * count the occurrence, unfreeze (if no other handled + * hardware error bits are set), and continue. + */ + if (hwerrs & ((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF | + INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC) + << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT)) { + ipath_7220_txe_recover(dd); + hwerrs &= ~((INFINIPATH_HWE_TXEMEMPARITYERR_PIOBUF | + INFINIPATH_HWE_TXEMEMPARITYERR_PIOPBC) + << INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT); + if (!hwerrs) { + /* else leave in freeze mode */ + ipath_write_kreg(dd, + dd->ipath_kregs->kr_control, + dd->ipath_control); + goto bail; + } + } + if (hwerrs) { + /* + * If any set that we aren't ignoring, only make the + * complaint once, in case it's stuck or recurring, + * and we get here multiple times. + * Force link down, so switch knows, and + * LEDs are turned off. 
+ */ + if (dd->ipath_flags & IPATH_INITTED) { + ipath_set_linkstate(dd, IPATH_IB_LINKDOWN); + ipath_setup_7220_setextled(dd, + INFINIPATH_IBCS_L_STATE_DOWN, + INFINIPATH_IBCS_LT_STATE_DISABLED); + ipath_dev_err(dd, "Fatal Hardware Error " + "(freeze mode), no longer" + " usable, SN %.16s\n", + dd->ipath_serial); + isfatal = 1; + } + /* + * Mark as having had an error for driver, and also + * for /sys and status word mapped to user programs. + * This marks unit as not usable, until reset. + */ + *dd->ipath_statusp &= ~IPATH_STATUS_IB_READY; + *dd->ipath_statusp |= IPATH_STATUS_HWERROR; + dd->ipath_flags &= ~IPATH_INITTED; + } else { + ipath_dbg("Clearing freezemode on ignored hardware " + "error\n"); + ipath_clear_freeze(dd); + } + } + + *msg = '\0'; + + if (hwerrs & INFINIPATH_HWE_MEMBISTFAILED) { + strlcat(msg, "[Memory BIST test failed, " + "InfiniPath hardware unusable]", msgl); + /* ignore from now on, so disable until driver reloaded */ + *dd->ipath_statusp |= IPATH_STATUS_HWERROR; + dd->ipath_hwerrmask &= ~INFINIPATH_HWE_MEMBISTFAILED; + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask, + dd->ipath_hwerrmask); + } + + ipath_format_hwerrors(hwerrs, + ipath_7220_hwerror_msgs, + ARRAY_SIZE(ipath_7220_hwerror_msgs), + msg, msgl); + + if (hwerrs & (INFINIPATH_HWE_PCIEMEMPARITYERR_MASK + << INFINIPATH_HWE_PCIEMEMPARITYERR_SHIFT)) { + bits = (u32) ((hwerrs >> + INFINIPATH_HWE_PCIEMEMPARITYERR_SHIFT) & + INFINIPATH_HWE_PCIEMEMPARITYERR_MASK); + snprintf(bitsmsg, sizeof bitsmsg, + "[PCIe Mem Parity Errs %x] ", bits); + strlcat(msg, bitsmsg, msgl); + } + +#define _IPATH_PLL_FAIL (INFINIPATH_HWE_COREPLL_FBSLIP | \ + INFINIPATH_HWE_COREPLL_RFSLIP) + + if (hwerrs & _IPATH_PLL_FAIL) { + snprintf(bitsmsg, sizeof bitsmsg, + "[PLL failed (%llx), InfiniPath hardware unusable]", + (unsigned long long) hwerrs & _IPATH_PLL_FAIL); + strlcat(msg, bitsmsg, msgl); + /* ignore from now on, so disable until driver reloaded */ + dd->ipath_hwerrmask &= ~(hwerrs & _IPATH_PLL_FAIL); + 
ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask, + dd->ipath_hwerrmask); + } + + if (hwerrs & INFINIPATH_HWE_SERDESPLLFAILED) { + /* + * If it occurs, it is left masked since the external + * interface is unused. + */ + dd->ipath_hwerrmask &= ~INFINIPATH_HWE_SERDESPLLFAILED; + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask, + dd->ipath_hwerrmask); + } + + ipath_dev_err(dd, "%s hardware error\n", msg); + /* + * For /sys status file. If no trailing } is copied, we'll + * know it was truncated. + */ + if (isfatal && !ipath_diag_inuse && dd->ipath_freezemsg) + snprintf(dd->ipath_freezemsg, dd->ipath_freezelen, + "{%s}", msg); +bail:; +} + +/** + * ipath_7220_boardname - fill in the board name + * @dd: the infinipath device + * @name: the output buffer + * @namelen: the size of the output buffer + * + * info is based on the board revision register + */ +static int ipath_7220_boardname(struct ipath_devdata *dd, char *name, + size_t namelen) +{ + char *n = NULL; + u8 boardrev = dd->ipath_boardrev; + int ret; + + if (boardrev == 15) { + /* + * Emulator sometimes comes up all-ones, rather than zero. 
+ */ + boardrev = 0; + dd->ipath_boardrev = boardrev; + } + switch (boardrev) { + case 0: + n = "InfiniPath_7220_Emulation"; + break; + case 1: + n = "InfiniPath_QLE7240"; + break; + case 2: + n = "InfiniPath_QLE7280"; + break; + case 3: + n = "InfiniPath_QLE7242"; + break; + case 4: + n = "InfiniPath_QEM7240"; + break; + case 5: + n = "InfiniPath_QMI7240"; + break; + case 6: + n = "InfiniPath_QMI7264"; + break; + case 7: + n = "InfiniPath_QMH7240"; + break; + case 8: + n = "InfiniPath_QME7240"; + break; + case 9: + n = "InfiniPath_QLE7250"; + break; + case 10: + n = "InfiniPath_QLE7290"; + break; + case 11: + n = "InfiniPath_QEM7250"; + break; + case 12: + n = "InfiniPath_QLE-Bringup"; + break; + default: + ipath_dev_err(dd, + "Don't yet know about board with ID %u\n", + boardrev); + snprintf(name, namelen, "Unknown_InfiniPath_PCIe_%u", + boardrev); + break; + } + if (n) + snprintf(name, namelen, "%s", n); + + if (dd->ipath_majrev != 5 || !dd->ipath_minrev || + dd->ipath_minrev > 2) { + ipath_dev_err(dd, "Unsupported InfiniPath hardware " + "revision %u.%u!\n", + dd->ipath_majrev, dd->ipath_minrev); + ret = 1; + } else if (dd->ipath_minrev == 1) { + /* Rev1 chips are prototype. Complain, but allow use */ + ipath_dev_err(dd, "Unsupported hardware " + "revision %u.%u, Contact support at qlogic.com\n", + dd->ipath_majrev, dd->ipath_minrev); + ret = 0; + } else + ret = 0; + + /* + * Set here not in ipath_init_*_funcs because we have to do + * it after we can read chip registers. 
+ */ + dd->ipath_ureg_align = 0x10000; /* 64KB alignment */ + + return ret; +} + +/** + * ipath_7220_init_hwerrors - enable hardware errors + * @dd: the infinipath device + * + * now that we have finished initializing everything that might reasonably + * cause a hardware error, and cleared those error bits as they occur, + * we can enable hardware errors in the mask (potentially enabling + * freeze mode), and enable hardware errors as errors (along with + * everything else) in errormask + */ +static void ipath_7220_init_hwerrors(struct ipath_devdata *dd) +{ + ipath_err_t val; + u64 extsval; + + extsval = ipath_read_kreg64(dd, dd->ipath_kregs->kr_extstatus); + + if (!(extsval & (INFINIPATH_EXTS_MEMBIST_ENDTEST | + INFINIPATH_EXTS_MEMBIST_DISABLED))) + ipath_dev_err(dd, "MemBIST did not complete!\n"); + if (extsval & INFINIPATH_EXTS_MEMBIST_DISABLED) + dev_info(&dd->pcidev->dev, "MemBIST is disabled.\n"); + + val = ~0ULL; /* barring bugs, all hwerrors become interrupts, */ + + if (!dd->ipath_boardrev) /* no PLL for Emulator */ + val &= ~INFINIPATH_HWE_SERDESPLLFAILED; + + if (dd->ipath_minrev == 1) + val &= ~(1ULL << 42); /* TXE LaunchFIFO Parity rev1 issue */ + + val &= ~INFINIPATH_HWE_IB_UC_MEMORYPARITYERR; + dd->ipath_hwerrmask = val; + + /* + * special trigger "error" is for debugging purposes. It + * works around a processor/chipset problem. The error + * interrupt allows us to count occurrences, but we don't + * want to pay the overhead for normal use. Emulation only + */ + if (!dd->ipath_boardrev) + dd->ipath_maskederrs = INFINIPATH_E_SENDSPECIALTRIGGER; +} + +/* + * All detailed interaction with the SerDes has been moved to ipath_sd7220.c + * + * The portion of IBA7220-specific bringup_serdes() that actually deals with + * registers and memory within the SerDes itself is ipath_sd7220_init(). 
+ */ + +/** + * ipath_7220_bringup_serdes - bring up the serdes + * @dd: the infinipath device + */ +static int ipath_7220_bringup_serdes(struct ipath_devdata *dd) +{ + int ret = 0; + u64 val, prev_val, guid; + int was_reset; /* Note whether uC was reset */ + + ipath_dbg("Trying to bringup serdes\n"); + + if (ipath_read_kreg64(dd, dd->ipath_kregs->kr_hwerrstatus) & + INFINIPATH_HWE_SERDESPLLFAILED) { + ipath_dbg("At start, serdes PLL failed bit set " + "in hwerrstatus, clearing and continuing\n"); + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrclear, + INFINIPATH_HWE_SERDESPLLFAILED); + } + + if (!dd->ipath_ibcddrctrl) { + /* not on re-init after reset */ + dd->ipath_ibcddrctrl = + ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcddrctrl); + + if (dd->ipath_link_speed_enabled == + (IPATH_IB_SDR | IPATH_IB_DDR)) + dd->ipath_ibcddrctrl |= + IBA7220_IBC_SPEED_AUTONEG_MASK | + IBA7220_IBC_IBTA_1_2_MASK; + else + dd->ipath_ibcddrctrl |= + dd->ipath_link_speed_enabled == IPATH_IB_DDR + ? IBA7220_IBC_SPEED_DDR : + IBA7220_IBC_SPEED_SDR; + if ((dd->ipath_link_width_enabled & (IB_WIDTH_1X | + IB_WIDTH_4X)) == (IB_WIDTH_1X | IB_WIDTH_4X)) + dd->ipath_ibcddrctrl |= IBA7220_IBC_WIDTH_AUTONEG; + else + dd->ipath_ibcddrctrl |= + dd->ipath_link_width_enabled == IB_WIDTH_4X + ? 
IBA7220_IBC_WIDTH_4X_ONLY : + IBA7220_IBC_WIDTH_1X_ONLY; + + /* always enable these on driver reload, not sticky */ + dd->ipath_ibcddrctrl |= + IBA7220_IBC_RXPOL_MASK << IBA7220_IBC_RXPOL_SHIFT; + dd->ipath_ibcddrctrl |= + IBA7220_IBC_HRTBT_MASK << IBA7220_IBC_HRTBT_SHIFT; + /* + * automatic lane reversal detection for receive + * doesn't work correctly in rev 1, so disable it + * on that rev, otherwise enable (disabling not + * sticky across reload for >rev1) + */ + if (dd->ipath_minrev == 1) + dd->ipath_ibcddrctrl &= + ~IBA7220_IBC_LANE_REV_SUPPORTED; + else + dd->ipath_ibcddrctrl |= + IBA7220_IBC_LANE_REV_SUPPORTED; + } + + ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcddrctrl, + dd->ipath_ibcddrctrl); + + ipath_write_kreg(dd, IPATH_KREG_OFFSET(IBNCModeCtrl), 0Ull); + + /* IBA7220 has SERDES MPU reset in D0 of what _was_ IBPLLCfg */ + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibserdesctrl); + /* remember if uC was in Reset or not, for dactrim */ + was_reset = (val & 1); + ipath_cdbg(VERBOSE, "IBReset %s xgxsconfig %llx\n", + was_reset ? "Asserted" : "Negated", (unsigned long long) + ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig)); + + if (dd->ipath_boardrev) { + /* + * Hardware is not emulator, and may have been reset. Init it. 
+ * Below will release reset, but needs to know if chip was + * originally in reset, to only trim DACs on first time + * after chip reset or powercycle (not driver reload) + */ + ret = ipath_sd7220_init(dd, was_reset); + } + + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig); + prev_val = val; + val |= INFINIPATH_XGXS_FC_SAFE; + if (val != prev_val) { + ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val); + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + } + if (val & INFINIPATH_XGXS_RESET) + val &= ~INFINIPATH_XGXS_RESET; + if (val != prev_val) + ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val); + + ipath_cdbg(VERBOSE, "done: xgxs=%llx from %llx\n", + (unsigned long long) + ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig), + prev_val); + + guid = be64_to_cpu(dd->ipath_guid); + + if (!guid) { + /* have to have something, so use likely unique tsc */ + guid = get_cycles(); + ipath_dbg("No GUID for heartbeat, faking %llx\n", + (unsigned long long)guid); + } else + ipath_cdbg(VERBOSE, "Wrote %llX to HRTBT_GUID\n", guid); + ipath_write_kreg(dd, dd->ipath_kregs->kr_hrtbt_guid, guid); + return ret; +} + +static void ipath_7220_config_jint(struct ipath_devdata *dd, + u16 idle_ticks, u16 max_packets) +{ + + /* + * We can request a receive interrupt for 1 or more packets + * from current offset. + */ + if (idle_ticks == 0 || max_packets == 0) + /* interrupt after one packet if no mitigation */ + dd->ipath_rhdrhead_intr_off = + 1ULL << IBA7220_HDRHEAD_PKTINT_SHIFT; + else + /* Turn off RcvHdrHead interrupts if using mitigation */ + dd->ipath_rhdrhead_intr_off = 0ULL; + + /* refresh kernel RcvHdrHead registers... 
*/ + ipath_write_ureg(dd, ur_rcvhdrhead, + dd->ipath_rhdrhead_intr_off | + dd->ipath_pd[0]->port_head, 0); + + dd->ipath_jint_max_packets = max_packets; + dd->ipath_jint_idle_ticks = idle_ticks; + ipath_write_kreg(dd, dd->ipath_kregs->kr_jintreload, + ((u64) max_packets << INFINIPATH_JINT_PACKETSHIFT) | + idle_ticks); +} + +/** + * ipath_7220_quiet_serdes - set serdes to txidle + * @dd: the infinipath device + * Called when driver is being unloaded + */ +static void ipath_7220_quiet_serdes(struct ipath_devdata *dd) +{ + u64 val; + dd->ipath_flags &= ~IPATH_IB_AUTONEG_INPROG; + wake_up(&dd->ipath_autoneg_wait); + cancel_delayed_work(&dd->ipath_autoneg_work); + flush_scheduled_work(); + ipath_shutdown_relock_poll(dd); + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig); + val |= INFINIPATH_XGXS_RESET; + ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val); +} + +static int ipath_7220_intconfig(struct ipath_devdata *dd) +{ + ipath_7220_config_jint(dd, dd->ipath_jint_idle_ticks, + dd->ipath_jint_max_packets); + return 0; +} + +/** + * ipath_setup_7220_setextled - set the state of the two external LEDs + * @dd: the infinipath device + * @lst: the L state + * @ltst: the LT state + * + * These LEDs indicate the physical and logical state of IB link. + * For this chip (at least with recommended board pinouts), LED1 + * is Yellow (logical state) and LED2 is Green (physical state), + * + * Note: We try to match the Mellanox HCA LED behavior as best + * we can. Green indicates physical link state is OK (something is + * plugged in, and we can train). + * Amber indicates the link is logically up (ACTIVE). + * Mellanox further blinks the amber LED to indicate data packet + * activity, but we have no hardware support for that, so it would + * require waking up every 10-20 msecs and checking the counters + * on the chip, and then turning the LED off if appropriate. That's + * visible overhead, so not something we will do. 
+ * + */ +static void ipath_setup_7220_setextled(struct ipath_devdata *dd, u64 lst, + u64 ltst) +{ + u64 extctl, ledblink = 0; + unsigned long flags = 0; + + /* the diags use the LED to indicate diag info, so we leave + * the external LED alone when the diags are running */ + if (ipath_diag_inuse) + return; + + /* Allow override of LED display for, e.g. Locating system in rack */ + if (dd->ipath_led_override) { + ltst = (dd->ipath_led_override & IPATH_LED_PHYS) + ? INFINIPATH_IBCS_LT_STATE_LINKUP + : INFINIPATH_IBCS_LT_STATE_DISABLED; + lst = (dd->ipath_led_override & IPATH_LED_LOG) + ? INFINIPATH_IBCS_L_STATE_ACTIVE + : INFINIPATH_IBCS_L_STATE_DOWN; + } + + spin_lock_irqsave(&dd->ipath_gpio_lock, flags); + extctl = dd->ipath_extctrl & ~(INFINIPATH_EXTC_LED1PRIPORT_ON | + INFINIPATH_EXTC_LED2PRIPORT_ON); + if (ltst == INFINIPATH_IBCS_LT_STATE_LINKUP) { + extctl |= INFINIPATH_EXTC_LED1PRIPORT_ON; + /* + * counts are in chip clock (4ns) periods. + * This is 1/16 sec (66.6ms) on, + * 3/16 sec (187.5 ms) off, with packets rcvd + */ + ledblink = ((66600*1000UL/4) << IBA7220_LEDBLINK_ON_SHIFT) + | ((187500*1000UL/4) << IBA7220_LEDBLINK_OFF_SHIFT); + } + if (lst == INFINIPATH_IBCS_L_STATE_ACTIVE) + extctl |= INFINIPATH_EXTC_LED2PRIPORT_ON; + dd->ipath_extctrl = extctl; + ipath_write_kreg(dd, dd->ipath_kregs->kr_extctrl, extctl); + spin_unlock_irqrestore(&dd->ipath_gpio_lock, flags); + + if (ledblink) /* blink the LED on packet receive */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvpktledcnt, + ledblink); +} + +/* + * Similar to pci_intx(pdev, 1), except that we make sure + * msi is off... 
+ */ +static void ipath_enable_intx(struct pci_dev *pdev) +{ + u16 cw, new; + int pos; + + /* first, turn on INTx */ + pci_read_config_word(pdev, PCI_COMMAND, &cw); + new = cw & ~PCI_COMMAND_INTX_DISABLE; + if (new != cw) + pci_write_config_word(pdev, PCI_COMMAND, new); + + /* then turn off MSI */ + pos = pci_find_capability(pdev, PCI_CAP_ID_MSI); + if (pos) { + pci_read_config_word(pdev, pos + PCI_MSI_FLAGS, &cw); + new = cw & ~PCI_MSI_FLAGS_ENABLE; + if (new != cw) + pci_write_config_word(pdev, pos + PCI_MSI_FLAGS, new); + } +} + +static int ipath_msi_enabled(struct pci_dev *pdev) +{ + int pos, ret = 0; + + pos = pci_find_capability(pdev, PCI_CAP_ID_MSI); + if (pos) { + u16 cw; + + pci_read_config_word(pdev, pos + PCI_MSI_FLAGS, &cw); + ret = !!(cw & PCI_MSI_FLAGS_ENABLE); + } + return ret; +} + +/* + * disable msi interrupt if enabled, and clear the flag. + * flag is used primarily for the fallback to IntX, but + * is also used in reinit after reset as a flag. + */ +static void ipath_7220_nomsi(struct ipath_devdata *dd) +{ + dd->ipath_msi_lo = 0; +#ifdef CONFIG_PCI_MSI + if (ipath_msi_enabled(dd->pcidev)) { + /* + * free, but don't zero; later kernels require + * it be freed before disable_msi, so the intx + * setup has to request it again. + */ + if (dd->ipath_irq) + free_irq(dd->ipath_irq, dd); + pci_disable_msi(dd->pcidev); + } +#endif +} + +/* + * ipath_setup_7220_cleanup - clean up any per-chip chip-specific stuff + * @dd: the infinipath device + * + * Nothing but msi interrupt cleanup for now. + * + * This is called during driver unload. 
+ */
+static void ipath_setup_7220_cleanup(struct ipath_devdata *dd)
+{
+	ipath_7220_nomsi(dd);
+}
+
+
+static void ipath_7220_pcie_params(struct ipath_devdata *dd, u32 boardrev)
+{
+	u16 linkstat, minwidth, speed;
+	int pos;
+
+	pos = pci_find_capability(dd->pcidev, PCI_CAP_ID_EXP);
+	if (!pos) {
+		ipath_dev_err(dd, "Can't find PCI Express capability!\n");
+		goto bail;
+	}
+
+	pci_read_config_word(dd->pcidev, pos + PCI_EXP_LNKSTA,
+			     &linkstat);
+	/*
+	 * speed is bits 0-3, linkwidth is bits 4-8
+	 * no defines for them in headers
+	 */
+	speed = linkstat & 0xf;
+	linkstat >>= 4;
+	linkstat &= 0x1f;
+	dd->ipath_lbus_width = linkstat;
+	switch (boardrev) {
+	case 0:
+	case 2:
+	case 10:
+	case 12:
+		minwidth = 16; /* x16 capable boards */
+		break;
+	default:
+		minwidth = 8; /* x8 capable boards */
+		break;
+	}
+
+	switch (speed) {
+	case 1:
+		dd->ipath_lbus_speed = 2500; /* Gen1, 2.5GHz */
+		break;
+	case 2:
+		dd->ipath_lbus_speed = 5000; /* Gen2, 5GHz */
+		break;
+	default: /* not defined, assume gen1 */
+		dd->ipath_lbus_speed = 2500;
+		break;
+	}
+
+	if (linkstat < minwidth)
+		ipath_dev_err(dd,
+			      "PCIe width %u (x%u HCA), performance "
+			      "reduced\n", linkstat, minwidth);
+	else
+		ipath_cdbg(VERBOSE, "PCIe speed %u width %u (x%u HCA)\n",
+			   dd->ipath_lbus_speed, linkstat, minwidth);
+
+	if (speed != 1)
+		ipath_dev_err(dd,
+			      "PCIe linkspeed %u is incorrect; "
+			      "should be 1 (2500)!\n", speed);
+
+bail:
+	/* fill in string, even on errors */
+	snprintf(dd->ipath_lbus_info, sizeof(dd->ipath_lbus_info),
+		 "PCIe,%uMHz,x%u\n",
+		 dd->ipath_lbus_speed,
+		 dd->ipath_lbus_width);
+	return;
+}
+
+
+/**
+ * ipath_setup_7220_config - setup PCIe config related stuff
+ * @dd: the infinipath device
+ * @pdev: the PCI device
+ *
+ * The pci_enable_msi() call will fail on systems with MSI quirks
+ * such as those with AMD8131, even if the device of interest is not
+ * attached to that device (in the 2.6.13 - 2.6.15 kernels, at least; fixed
+ * late in 2.6.16).
+ * All that can be done is to edit the kernel source to remove the quirk
+ * check until that is fixed.
+ * We do not need to call enable_msi() for our HyperTransport chip,
+ * even though it uses MSI, and we want to avoid the quirk warning, so
+ * we call enable_msi only for PCIe. If we do end up needing
+ * pci_enable_msi at some point in the future for HT, we'll move the
+ * call back into the main init_one code.
+ * We save the msi lo and hi values, so we can restore them after
+ * chip reset (the kernel PCI infrastructure doesn't yet handle that
+ * correctly).
+ */
+static int ipath_setup_7220_config(struct ipath_devdata *dd,
+				   struct pci_dev *pdev)
+{
+	int pos, ret = -1;
+	u32 boardrev;
+
+	dd->ipath_msi_lo = 0;	/* used as a flag during reset processing */
+#ifdef CONFIG_PCI_MSI
+	pos = pci_find_capability(pdev, PCI_CAP_ID_MSI);
+	if (!strcmp(int_type, "force_msi") || !strcmp(int_type, "auto"))
+		ret = pci_enable_msi(pdev);
+	if (ret) {
+		if (!strcmp(int_type, "force_msi")) {
+			ipath_dev_err(dd, "pci_enable_msi failed: %d, "
+				      "force_msi is on, so not continuing.\n",
+				      ret);
+			return ret;
+		}
+
+		ipath_enable_intx(pdev);
+		if (!strcmp(int_type, "auto"))
+			ipath_dev_err(dd, "pci_enable_msi failed: %d, "
+				      "falling back to INTx\n", ret);
+	} else if (pos) {
+		u16 control;
+		pci_read_config_dword(pdev, pos + PCI_MSI_ADDRESS_LO,
+				      &dd->ipath_msi_lo);
+		pci_read_config_dword(pdev, pos + PCI_MSI_ADDRESS_HI,
+				      &dd->ipath_msi_hi);
+		pci_read_config_word(pdev, pos + PCI_MSI_FLAGS,
+				     &control);
+		/* now save the data (vector) info */
+		pci_read_config_word(pdev,
+				     pos + ((control & PCI_MSI_FLAGS_64BIT)
+					    ?
PCI_MSI_DATA_64 : + PCI_MSI_DATA_32), + &dd->ipath_msi_data); + } else + ipath_dev_err(dd, "Can't find MSI capability, " + "can't save MSI settings for reset\n"); +#else + ipath_dbg("PCI_MSI not configured, using IntX interrupts\n"); + ipath_enable_intx(pdev); +#endif + + dd->ipath_irq = pdev->irq; + + /* + * We save the cachelinesize also, although it doesn't + * really matter. + */ + pci_read_config_byte(pdev, PCI_CACHE_LINE_SIZE, + &dd->ipath_pci_cacheline); + + /* + * this function called early, ipath_boardrev not set yet. Can't + * use ipath_read_kreg64() yet, too early in init, so use readq() + */ + boardrev = (readq(&dd->ipath_kregbase[dd->ipath_kregs->kr_revision]) + >> INFINIPATH_R_BOARDID_SHIFT) & INFINIPATH_R_BOARDID_MASK; + + ipath_7220_pcie_params(dd, boardrev); + + dd->ipath_flags |= IPATH_NODMA_RTAIL | IPATH_HAS_SEND_DMA | + IPATH_HAS_PBC_CNT | IPATH_HAS_THRESH_UPDATE; + dd->ipath_pioupd_thresh = 4U; /* set default update threshold */ + return 0; +} + +static void ipath_init_7220_variables(struct ipath_devdata *dd) +{ + /* + * setup the register offsets, since they are different for each + * chip + */ + dd->ipath_kregs = &ipath_7220_kregs; + dd->ipath_cregs = &ipath_7220_cregs; + + /* + * bits for selecting i2c direction and values, + * used for I2C serial flash + */ + dd->ipath_gpio_sda_num = _IPATH_GPIO_SDA_NUM; + dd->ipath_gpio_scl_num = _IPATH_GPIO_SCL_NUM; + dd->ipath_gpio_sda = IPATH_GPIO_SDA; + dd->ipath_gpio_scl = IPATH_GPIO_SCL; + + /* + * Fill in data for field-values that change in IBA7220. + * We dynamically specify only the mask for LINKTRAININGSTATE + * and only the shift for LINKSTATE, as they are the only ones + * that change. Also precalculate the 3 link states of interest + * and the combined mask. 
+ */ + dd->ibcs_ls_shift = IBA7220_IBCS_LINKSTATE_SHIFT; + dd->ibcs_lts_mask = IBA7220_IBCS_LINKTRAININGSTATE_MASK; + dd->ibcs_mask = (INFINIPATH_IBCS_LINKSTATE_MASK << + dd->ibcs_ls_shift) | dd->ibcs_lts_mask; + dd->ib_init = (INFINIPATH_IBCS_LT_STATE_LINKUP << + INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) | + (INFINIPATH_IBCS_L_STATE_INIT << dd->ibcs_ls_shift); + dd->ib_arm = (INFINIPATH_IBCS_LT_STATE_LINKUP << + INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) | + (INFINIPATH_IBCS_L_STATE_ARM << dd->ibcs_ls_shift); + dd->ib_active = (INFINIPATH_IBCS_LT_STATE_LINKUP << + INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) | + (INFINIPATH_IBCS_L_STATE_ACTIVE << dd->ibcs_ls_shift); + + /* + * Fill in data for ibcc field-values that change in IBA7220. + * We dynamically specify only the mask for LINKINITCMD + * and only the shift for LINKCMD and MAXPKTLEN, as they are + * the only ones that change. + */ + dd->ibcc_lic_mask = IBA7220_IBCC_LINKINITCMD_MASK; + dd->ibcc_lc_shift = IBA7220_IBCC_LINKCMD_SHIFT; + dd->ibcc_mpl_shift = IBA7220_IBCC_MAXPKTLEN_SHIFT; + + /* Fill in shifts for RcvCtrl. 
*/ + dd->ipath_r_portenable_shift = INFINIPATH_R_PORTENABLE_SHIFT; + dd->ipath_r_intravail_shift = IBA7220_R_INTRAVAIL_SHIFT; + dd->ipath_r_tailupd_shift = IBA7220_R_TAILUPD_SHIFT; + dd->ipath_r_portcfg_shift = IBA7220_R_PORTCFG_SHIFT; + + /* variables for sanity checking interrupt and errors */ + dd->ipath_hwe_bitsextant = + (INFINIPATH_HWE_RXEMEMPARITYERR_MASK << + INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT) | + (INFINIPATH_HWE_TXEMEMPARITYERR_MASK << + INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT) | + (INFINIPATH_HWE_PCIEMEMPARITYERR_MASK << + INFINIPATH_HWE_PCIEMEMPARITYERR_SHIFT) | + INFINIPATH_HWE_PCIE1PLLFAILED | + INFINIPATH_HWE_PCIE0PLLFAILED | + INFINIPATH_HWE_PCIEPOISONEDTLP | + INFINIPATH_HWE_PCIECPLTIMEOUT | + INFINIPATH_HWE_PCIEBUSPARITYXTLH | + INFINIPATH_HWE_PCIEBUSPARITYXADM | + INFINIPATH_HWE_PCIEBUSPARITYRADM | + INFINIPATH_HWE_MEMBISTFAILED | + INFINIPATH_HWE_COREPLL_FBSLIP | + INFINIPATH_HWE_COREPLL_RFSLIP | + INFINIPATH_HWE_SERDESPLLFAILED | + INFINIPATH_HWE_IBCBUSTOSPCPARITYERR | + INFINIPATH_HWE_IBCBUSFRSPCPARITYERR | + INFINIPATH_HWE_PCIECPLDATAQUEUEERR | + INFINIPATH_HWE_PCIECPLHDRQUEUEERR | + INFINIPATH_HWE_SDMAMEMREADERR | + INFINIPATH_HWE_CLK_UC_PLLNOTLOCKED | + INFINIPATH_HWE_PCIESERDESQ0PCLKNOTDETECT | + INFINIPATH_HWE_PCIESERDESQ1PCLKNOTDETECT | + INFINIPATH_HWE_PCIESERDESQ2PCLKNOTDETECT | + INFINIPATH_HWE_PCIESERDESQ3PCLKNOTDETECT | + INFINIPATH_HWE_DDSRXEQMEMORYPARITYERR | + INFINIPATH_HWE_IB_UC_MEMORYPARITYERR | + INFINIPATH_HWE_PCIE_UC_OCT0MEMORYPARITYERR | + INFINIPATH_HWE_PCIE_UC_OCT1MEMORYPARITYERR; + dd->ipath_i_bitsextant = + INFINIPATH_I_SDMAINT | INFINIPATH_I_SDMADISABLED | + (INFINIPATH_I_RCVURG_MASK << INFINIPATH_I_RCVURG_SHIFT) | + (INFINIPATH_I_RCVAVAIL_MASK << + INFINIPATH_I_RCVAVAIL_SHIFT) | + INFINIPATH_I_ERROR | INFINIPATH_I_SPIOSENT | + INFINIPATH_I_SPIOBUFAVAIL | INFINIPATH_I_GPIO | + INFINIPATH_I_JINT | INFINIPATH_I_SERDESTRIMDONE; + dd->ipath_e_bitsextant = + INFINIPATH_E_RFORMATERR | INFINIPATH_E_RVCRC | + 
INFINIPATH_E_RICRC | INFINIPATH_E_RMINPKTLEN | + INFINIPATH_E_RMAXPKTLEN | INFINIPATH_E_RLONGPKTLEN | + INFINIPATH_E_RSHORTPKTLEN | INFINIPATH_E_RUNEXPCHAR | + INFINIPATH_E_RUNSUPVL | INFINIPATH_E_REBP | + INFINIPATH_E_RIBFLOW | INFINIPATH_E_RBADVERSION | + INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL | + INFINIPATH_E_RBADTID | INFINIPATH_E_RHDRLEN | + INFINIPATH_E_RHDR | INFINIPATH_E_RIBLOSTLINK | + INFINIPATH_E_SENDSPECIALTRIGGER | + INFINIPATH_E_SDMADISABLED | INFINIPATH_E_SMINPKTLEN | + INFINIPATH_E_SMAXPKTLEN | INFINIPATH_E_SUNDERRUN | + INFINIPATH_E_SPKTLEN | INFINIPATH_E_SDROPPEDSMPPKT | + INFINIPATH_E_SDROPPEDDATAPKT | + INFINIPATH_E_SPIOARMLAUNCH | INFINIPATH_E_SUNEXPERRPKTNUM | + INFINIPATH_E_SUNSUPVL | INFINIPATH_E_SENDBUFMISUSE | + INFINIPATH_E_SDMAGENMISMATCH | INFINIPATH_E_SDMAOUTOFBOUND | + INFINIPATH_E_SDMATAILOUTOFBOUND | INFINIPATH_E_SDMABASE | + INFINIPATH_E_SDMA1STDESC | INFINIPATH_E_SDMARPYTAG | + INFINIPATH_E_SDMADWEN | INFINIPATH_E_SDMAMISSINGDW | + INFINIPATH_E_SDMAUNEXPDATA | + INFINIPATH_E_IBSTATUSCHANGED | INFINIPATH_E_INVALIDADDR | + INFINIPATH_E_RESET | INFINIPATH_E_HARDWARE | + INFINIPATH_E_SDMADESCADDRMISALIGN | + INFINIPATH_E_INVALIDEEPCMD; + + dd->ipath_i_rcvavail_mask = INFINIPATH_I_RCVAVAIL_MASK; + dd->ipath_i_rcvurg_mask = INFINIPATH_I_RCVURG_MASK; + dd->ipath_i_rcvavail_shift = INFINIPATH_I_RCVAVAIL_SHIFT; + dd->ipath_i_rcvurg_shift = INFINIPATH_I_RCVURG_SHIFT; + dd->ipath_flags |= IPATH_INTREG_64 | IPATH_HAS_MULT_IB_SPEED + | IPATH_HAS_LINK_LATENCY; + + /* + * EEPROM error log 0 is TXE Parity errors. 1 is RXE Parity. + * 2 is Some Misc, 3 is reserved for future. 
+ */
+	dd->ipath_eep_st_masks[0].hwerrs_to_log =
+		INFINIPATH_HWE_TXEMEMPARITYERR_MASK <<
+		INFINIPATH_HWE_TXEMEMPARITYERR_SHIFT;
+
+	dd->ipath_eep_st_masks[1].hwerrs_to_log =
+		INFINIPATH_HWE_RXEMEMPARITYERR_MASK <<
+		INFINIPATH_HWE_RXEMEMPARITYERR_SHIFT;
+
+	dd->ipath_eep_st_masks[2].errs_to_log = INFINIPATH_E_RESET;
+
+	ipath_linkrecovery = 0;
+
+	init_waitqueue_head(&dd->ipath_autoneg_wait);
+	INIT_DELAYED_WORK(&dd->ipath_autoneg_work, autoneg_work);
+
+	dd->ipath_link_width_supported = IB_WIDTH_1X | IB_WIDTH_4X;
+	dd->ipath_link_speed_supported = IPATH_IB_SDR | IPATH_IB_DDR;
+
+	dd->ipath_link_width_enabled = dd->ipath_link_width_supported;
+	dd->ipath_link_speed_enabled = dd->ipath_link_speed_supported;
+	/*
+	 * set the initial values to reasonable defaults; they will be set
+	 * for real when the link is up.
+	 */
+	dd->ipath_link_width_active = IB_WIDTH_4X;
+	dd->ipath_link_speed_active = IPATH_IB_SDR;
+	dd->delay_mult = rate_to_delay[0][1];
+}
+
+
+/*
+ * Set up the MSI stuff again after a reset. I'd like to just call
+ * pci_enable_msi() and request_irq() again, but when I do that,
+ * the MSI enable bit doesn't get set in the command word, and
+ * we switch to a different interrupt vector, which is confusing,
+ * so I instead just do it all inline. Perhaps we can somehow tie this
+ * into the PCIe hotplug support at some point.
+ * Note: because I'm doing it all here, I don't call pci_disable_msi()
+ * or free_irq() at the start of ipath_setup_7220_reset().
+ */ +static int ipath_reinit_msi(struct ipath_devdata *dd) +{ + int ret = 0; +#ifdef CONFIG_PCI_MSI + int pos; + u16 control; + if (!dd->ipath_msi_lo) /* Using intX, or init problem */ + goto bail; + + pos = pci_find_capability(dd->pcidev, PCI_CAP_ID_MSI); + if (!pos) { + ipath_dev_err(dd, "Can't find MSI capability, " + "can't restore MSI settings\n"); + goto bail; + } + ipath_cdbg(VERBOSE, "Writing msi_lo 0x%x to config offset 0x%x\n", + dd->ipath_msi_lo, pos + PCI_MSI_ADDRESS_LO); + pci_write_config_dword(dd->pcidev, pos + PCI_MSI_ADDRESS_LO, + dd->ipath_msi_lo); + ipath_cdbg(VERBOSE, "Writing msi_lo 0x%x to config offset 0x%x\n", + dd->ipath_msi_hi, pos + PCI_MSI_ADDRESS_HI); + pci_write_config_dword(dd->pcidev, pos + PCI_MSI_ADDRESS_HI, + dd->ipath_msi_hi); + pci_read_config_word(dd->pcidev, pos + PCI_MSI_FLAGS, &control); + if (!(control & PCI_MSI_FLAGS_ENABLE)) { + ipath_cdbg(VERBOSE, "MSI control at off %x was %x, " + "setting MSI enable (%x)\n", pos + PCI_MSI_FLAGS, + control, control | PCI_MSI_FLAGS_ENABLE); + control |= PCI_MSI_FLAGS_ENABLE; + pci_write_config_word(dd->pcidev, pos + PCI_MSI_FLAGS, + control); + } + /* now rewrite the data (vector) info */ + pci_write_config_word(dd->pcidev, pos + + ((control & PCI_MSI_FLAGS_64BIT) ? 12 : 8), + dd->ipath_msi_data); + ret = 1; +bail: +#endif + if (!ret) { + ipath_dbg("Using IntX, MSI disabled or not configured\n"); + ipath_enable_intx(dd->pcidev); + ret = 1; + } + /* + * We restore the cachelinesize also, although it doesn't really + * matter. + */ + pci_write_config_byte(dd->pcidev, PCI_CACHE_LINE_SIZE, + dd->ipath_pci_cacheline); + /* and now set the pci master bit again */ + pci_set_master(dd->pcidev); + + return ret; +} + +/* + * This routine sleeps, so it can only be called from user context, not + * from interrupt context. If we need interrupt context, we can split + * it into two routines. 
+ */ +static int ipath_setup_7220_reset(struct ipath_devdata *dd) +{ + u64 val; + int i; + int ret; + u16 cmdval; + + pci_read_config_word(dd->pcidev, PCI_COMMAND, &cmdval); + + /* Use dev_err so it shows up in logs, etc. */ + ipath_dev_err(dd, "Resetting InfiniPath unit %u\n", dd->ipath_unit); + + /* keep chip from being accessed in a few places */ + dd->ipath_flags &= ~(IPATH_INITTED | IPATH_PRESENT); + val = dd->ipath_control | INFINIPATH_C_RESET; + ipath_write_kreg(dd, dd->ipath_kregs->kr_control, val); + mb(); + + for (i = 1; i <= 5; i++) { + int r; + + /* + * Allow MBIST, etc. to complete; longer on each retry. + * We sometimes get machine checks from bus timeout if no + * response, so for now, make it *really* long. + */ + msleep(1000 + (1 + i) * 2000); + r = pci_write_config_dword(dd->pcidev, PCI_BASE_ADDRESS_0, + dd->ipath_pcibar0); + if (r) + ipath_dev_err(dd, "rewrite of BAR0 failed: %d\n", r); + r = pci_write_config_dword(dd->pcidev, PCI_BASE_ADDRESS_1, + dd->ipath_pcibar1); + if (r) + ipath_dev_err(dd, "rewrite of BAR1 failed: %d\n", r); + /* now re-enable memory access */ + pci_write_config_word(dd->pcidev, PCI_COMMAND, cmdval); + r = pci_enable_device(dd->pcidev); + if (r) + ipath_dev_err(dd, "pci_enable_device failed after " + "reset: %d\n", r); + /* + * whether it fully enabled or not, mark as present, + * again (but not INITTED) + */ + dd->ipath_flags |= IPATH_PRESENT; + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_revision); + if (val == dd->ipath_revision) { + ipath_cdbg(VERBOSE, "Got matching revision " + "register %llx on try %d\n", + (unsigned long long) val, i); + ret = ipath_reinit_msi(dd); + goto bail; + } + /* Probably getting -1 back */ + ipath_dbg("Didn't get expected revision register, " + "got %llx, try %d\n", (unsigned long long) val, + i + 1); + } + ret = 0; /* failed */ + +bail: + if (ret) + ipath_7220_pcie_params(dd, dd->ipath_boardrev); + + return ret; +} + +/** + * ipath_7220_put_tid - write a TID to the chip + * @dd: the 
infinipath device
+ * @tidptr: pointer to the expected TID (in chip) to update
+ * @type: 0 for eager, 1 for expected
+ * @pa: physical address of in memory buffer; ipath_tidinvalid if freeing
+ *
+ * This exists as a separate routine to allow for selection of the
+ * appropriate "flavor". The static calls in cleanup just use the
+ * revision-agnostic form, as they are not performance critical.
+ */
+static void ipath_7220_put_tid(struct ipath_devdata *dd, u64 __iomem *tidptr,
+			       u32 type, unsigned long pa)
+{
+	if (pa != dd->ipath_tidinvalid) {
+		u64 chippa = pa >> IBA7220_TID_PA_SHIFT;
+
+		/* paranoia checks */
+		if (pa != (chippa << IBA7220_TID_PA_SHIFT)) {
+			dev_info(&dd->pcidev->dev, "BUG: physaddr %lx "
+				 "not 2KB aligned!\n", pa);
+			return;
+		}
+		if (pa >= (1UL << IBA7220_TID_SZ_SHIFT)) {
+			ipath_dev_err(dd,
+				      "BUG: Physical page address 0x%lx "
+				      "larger than supported\n", pa);
+			return;
+		}
+
+		if (type == RCVHQ_RCV_TYPE_EAGER)
+			chippa |= dd->ipath_tidtemplate;
+		else /* for now, always full 4KB page */
+			chippa |= IBA7220_TID_SZ_4K;
+		writeq(chippa, tidptr);
+	} else
+		writeq(pa, tidptr);
+	mmiowb();
+}
+
+/**
+ * ipath_7220_clear_tids - clear all TID entries for a port, expected and eager
+ * @dd: the infinipath device
+ * @port: the port
+ *
+ * clear all TID entries for a port, expected and eager.
+ * Used from ipath_close().
On this chip, TIDs are only 32 bits,
+ * not 64, but they are still on 64 bit boundaries, so tidbase
+ * is declared as u64 * for the pointer math, even though we write 32 bits
+ */
+static void ipath_7220_clear_tids(struct ipath_devdata *dd, unsigned port)
+{
+	u64 __iomem *tidbase;
+	unsigned long tidinv;
+	int i;
+
+	if (!dd->ipath_kregbase)
+		return;
+
+	ipath_cdbg(VERBOSE, "Invalidate TIDs for port %u\n", port);
+
+	tidinv = dd->ipath_tidinvalid;
+	tidbase = (u64 __iomem *)
+		((char __iomem *)(dd->ipath_kregbase) +
+		 dd->ipath_rcvtidbase +
+		 port * dd->ipath_rcvtidcnt * sizeof(*tidbase));
+
+	for (i = 0; i < dd->ipath_rcvtidcnt; i++)
+		ipath_7220_put_tid(dd, &tidbase[i], RCVHQ_RCV_TYPE_EXPECTED,
+				   tidinv);
+
+	tidbase = (u64 __iomem *)
+		((char __iomem *)(dd->ipath_kregbase) +
+		 dd->ipath_rcvegrbase + port_egrtid_idx(dd, port)
+		 * sizeof(*tidbase));
+
+	for (i = port ? dd->ipath_rcvegrcnt : dd->ipath_p0_rcvegrcnt; i; i--)
+		ipath_7220_put_tid(dd, &tidbase[i-1], RCVHQ_RCV_TYPE_EAGER,
+				   tidinv);
+}
+
+/**
+ * ipath_7220_tidtemplate - setup constants for TID updates
+ * @dd: the infinipath device
+ *
+ * We set up stuff that we use a lot, to avoid calculating each time
+ */
+static void ipath_7220_tidtemplate(struct ipath_devdata *dd)
+{
+	/* For now, we always allocate 4KB buffers (at init) so we can
+	 * receive max size packets. We may want a module parameter to
+	 * specify 2KB or 4KB and/or make it per port instead of per device
+	 * for those who want to reduce memory footprint. Note that the
+	 * ipath_rcvhdrentsize size must be large enough to hold the largest
+	 * IB header (currently 96 bytes) that we expect to handle (plus of
+	 * course the 2 dwords of RHF).
+ */ + if (dd->ipath_rcvegrbufsize == 2048) + dd->ipath_tidtemplate = IBA7220_TID_SZ_2K; + else if (dd->ipath_rcvegrbufsize == 4096) + dd->ipath_tidtemplate = IBA7220_TID_SZ_4K; + else { + dev_info(&dd->pcidev->dev, "BUG: unsupported egrbufsize " + "%u, using %u\n", dd->ipath_rcvegrbufsize, + 4096); + dd->ipath_tidtemplate = IBA7220_TID_SZ_4K; + } + dd->ipath_tidinvalid = 0; +} + +static int ipath_7220_early_init(struct ipath_devdata *dd) +{ + u32 i, s; + + if (strcmp(int_type, "auto") && + strcmp(int_type, "force_msi") && + strcmp(int_type, "force_intx")) { + ipath_dev_err(dd, "Invalid interrupt_type: '%s', expecting " + "auto, force_msi or force_intx\n", int_type); + return -EINVAL; + } + + /* + * Control[4] has been added to change the arbitration within + * the SDMA engine between favoring data fetches over descriptor + * fetches. ipath_sdma_fetch_arb==0 gives data fetches priority. + */ + if (ipath_sdma_fetch_arb && (dd->ipath_minrev > 1)) + dd->ipath_control |= 1<<4; + + dd->ipath_flags |= IPATH_4BYTE_TID; + + /* + * For openfabrics, we need to be able to handle an IB header of + * 24 dwords. HT chip has arbitrary sized receive buffers, so we + * made them the same size as the PIO buffers. This chip does not + * handle arbitrary size buffers, so we need the header large enough + * to handle largest IB header, but still have room for a 2KB MTU + * standard IB packet. + */ + dd->ipath_rcvhdrentsize = 24; + dd->ipath_rcvhdrsize = IPATH_DFLT_RCVHDRSIZE; + dd->ipath_rhf_offset = + dd->ipath_rcvhdrentsize - sizeof(u64) / sizeof(u32); + + dd->ipath_rcvegrbufsize = ipath_mtu4096 ? 4096 : 2048; + /* + * the min() check here is currently a nop, but it may not always + * be, depending on just how we do ipath_rcvegrbufsize + */ + dd->ipath_ibmaxlen = min(ipath_mtu4096 ? 
dd->ipath_piosize4k :
+				 dd->ipath_piosize2k,
+				 dd->ipath_rcvegrbufsize +
+				 (dd->ipath_rcvhdrentsize << 2));
+	dd->ipath_init_ibmaxlen = dd->ipath_ibmaxlen;
+
+	ipath_7220_config_jint(dd, INFINIPATH_JINT_DEFAULT_IDLE_TICKS,
+			       INFINIPATH_JINT_DEFAULT_MAX_PACKETS);
+
+	if (dd->ipath_boardrev)		/* no eeprom on emulator */
+		ipath_get_eeprom_info(dd);
+
+	/* start of code to check and print procmon */
+	s = ipath_read_kreg32(dd, IPATH_KREG_OFFSET(ProcMon));
+	s &= ~(1U<<31);	/* clear done bit */
+	s |= 1U<<14;	/* clear counter (write 1 to clear) */
+	ipath_write_kreg(dd, IPATH_KREG_OFFSET(ProcMon), s);
+	/* make sure clear_counter low long enough before start */
+	ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
+	ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
+
+	s &= ~(1U<<14);	/* allow counter to count (before starting) */
+	ipath_write_kreg(dd, IPATH_KREG_OFFSET(ProcMon), s);
+	ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
+	ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
+	s = ipath_read_kreg32(dd, IPATH_KREG_OFFSET(ProcMon));
+
+	s |= 1U<<15;	/* start the counter */
+	s &= ~(1U<<31);	/* clear done bit */
+	s &= ~0x7ffU;	/* clear frequency bits */
+	s |= 0xe29;	/* set frequency bits, in case cleared */
+	ipath_write_kreg(dd, IPATH_KREG_OFFSET(ProcMon), s);
+
+	s = 0;
+	for (i = 500; i > 0 && !(s&(1ULL<<31)); i--) {
+		ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
+		s = ipath_read_kreg32(dd, IPATH_KREG_OFFSET(ProcMon));
+	}
+	if (!(s&(1U<<31)))
+		ipath_dev_err(dd, "ProcMon register not valid: 0x%x\n", s);
+	else
+		ipath_dbg("ProcMon=0x%x, count=0x%x\n", s, (s>>16)&0x1ff);
+
+	return 0;
+}
+
+/**
+ * ipath_7220_get_base_info - set chip-specific flags for user code
+ * @pd: the infinipath port
+ * @kbase: ipath_base_info pointer
+ *
+ * We set the PCIE flag because the lower bandwidth on PCIe vs
+ * HyperTransport can affect some user packet algorithms.
+ */ +static int ipath_7220_get_base_info(struct ipath_portdata *pd, void *kbase) +{ + struct ipath_base_info *kinfo = kbase; + + kinfo->spi_runtime_flags |= + IPATH_RUNTIME_PCIE | IPATH_RUNTIME_NODMA_RTAIL | + IPATH_RUNTIME_SDMA; + + return 0; +} + +static void ipath_7220_free_irq(struct ipath_devdata *dd) +{ + free_irq(dd->ipath_irq, dd); + dd->ipath_irq = 0; +} + +static struct ipath_message_header * +ipath_7220_get_msgheader(struct ipath_devdata *dd, __le32 *rhf_addr) +{ + u32 offset = ipath_hdrget_offset(rhf_addr); + + return (struct ipath_message_header *) + (rhf_addr - dd->ipath_rhf_offset + offset); +} + +static void ipath_7220_config_ports(struct ipath_devdata *dd, ushort cfgports) +{ + u32 nchipports; + + nchipports = ipath_read_kreg32(dd, dd->ipath_kregs->kr_portcnt); + if (!cfgports) { + int ncpus = num_online_cpus(); + + if (ncpus <= 4) + dd->ipath_portcnt = 5; + else if (ncpus <= 8) + dd->ipath_portcnt = 9; + if (dd->ipath_portcnt) + ipath_dbg("Auto-configured for %u ports, %d cpus " + "online\n", dd->ipath_portcnt, ncpus); + } else if (cfgports <= nchipports) + dd->ipath_portcnt = cfgports; + if (!dd->ipath_portcnt) /* none of the above, set to max */ + dd->ipath_portcnt = nchipports; + /* + * chip can be configured for 5, 9, or 17 ports, and choice + * affects number of eager TIDs per port (1K, 2K, 4K). 
+ */ + if (dd->ipath_portcnt > 9) + dd->ipath_rcvctrl |= 2ULL << IBA7220_R_PORTCFG_SHIFT; + else if (dd->ipath_portcnt > 5) + dd->ipath_rcvctrl |= 1ULL << IBA7220_R_PORTCFG_SHIFT; + /* else configure for default 5 receive ports */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, + dd->ipath_rcvctrl); + dd->ipath_p0_rcvegrcnt = 2048; /* always */ + if (dd->ipath_flags & IPATH_HAS_SEND_DMA) + dd->ipath_pioreserved = 1; /* reserve a buffer */ +} + + +static int ipath_7220_get_ib_cfg(struct ipath_devdata *dd, int which) +{ + int lsb, ret = 0; + u64 maskr; /* right-justified mask */ + + switch (which) { + case IPATH_IB_CFG_HRTBT: /* Get Heartbeat off/enable/auto */ + lsb = IBA7220_IBC_HRTBT_SHIFT; + maskr = IBA7220_IBC_HRTBT_MASK; + break; + + case IPATH_IB_CFG_LWID_ENB: /* Get allowed Link-width */ + ret = dd->ipath_link_width_enabled; + goto done; + + case IPATH_IB_CFG_LWID: /* Get currently active Link-width */ + ret = dd->ipath_link_width_active; + goto done; + + case IPATH_IB_CFG_SPD_ENB: /* Get allowed Link speeds */ + ret = dd->ipath_link_speed_enabled; + goto done; + + case IPATH_IB_CFG_SPD: /* Get current Link spd */ + ret = dd->ipath_link_speed_active; + goto done; + + case IPATH_IB_CFG_RXPOL_ENB: /* Get Auto-RX-polarity enable */ + lsb = IBA7220_IBC_RXPOL_SHIFT; + maskr = IBA7220_IBC_RXPOL_MASK; + break; + + case IPATH_IB_CFG_LREV_ENB: /* Get Auto-Lane-reversal enable */ + lsb = IBA7220_IBC_LREV_SHIFT; + maskr = IBA7220_IBC_LREV_MASK; + break; + + case IPATH_IB_CFG_LINKLATENCY: + ret = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcddrstatus) + & IBA7220_DDRSTAT_LINKLAT_MASK; + goto done; + + default: + ret = -ENOTSUPP; + goto done; + } + ret = (int)((dd->ipath_ibcddrctrl >> lsb) & maskr); +done: + return ret; +} + +static int ipath_7220_set_ib_cfg(struct ipath_devdata *dd, int which, u32 val) +{ + int lsb, ret = 0, setforce = 0; + u64 maskr; /* right-justified mask */ + + switch (which) { + case IPATH_IB_CFG_LIDLMC: + /* + * Set LID and LMC. 
Combined to avoid possible hazard;
+		 * caller puts LMC in 16MSbits, DLID in 16LSbits of val
+		 */
+		lsb = IBA7220_IBC_DLIDLMC_SHIFT;
+		maskr = IBA7220_IBC_DLIDLMC_MASK;
+		break;
+
+	case IPATH_IB_CFG_HRTBT: /* set Heartbeat off/enable/auto */
+		if (val & IPATH_IB_HRTBT_ON &&
+		    (dd->ipath_flags & IPATH_NO_HRTBT))
+			goto bail;
+		lsb = IBA7220_IBC_HRTBT_SHIFT;
+		maskr = IBA7220_IBC_HRTBT_MASK;
+		break;
+
+	case IPATH_IB_CFG_LWID_ENB: /* set allowed Link-width */
+		/*
+		 * As with speed, only write the actual register if
+		 * the link is currently down; otherwise it takes effect
+		 * on the next link change.
+		 */
+		dd->ipath_link_width_enabled = val;
+		if ((dd->ipath_flags & (IPATH_LINKDOWN|IPATH_LINKINIT)) !=
+		    IPATH_LINKDOWN)
+			goto bail;
+		/*
+		 * We set the IPATH_IB_FORCE_NOTIFY bit so updown
+		 * will get called, because we want to update
+		 * link_width_active, and the change may not take
+		 * effect for some time (if we are in POLL), so this
+		 * flag will force the updown routine to be called
+		 * on the next ibstatuschange down interrupt, even
+		 * if it's not a down->up transition.
+		 */
+		val--; /* convert from IB to chip */
+		maskr = IBA7220_IBC_WIDTH_MASK;
+		lsb = IBA7220_IBC_WIDTH_SHIFT;
+		setforce = 1;
+		dd->ipath_flags |= IPATH_IB_FORCE_NOTIFY;
+		break;
+
+	case IPATH_IB_CFG_SPD_ENB: /* set allowed Link speeds */
+		/*
+		 * If we turn off IB1.2, we need to preset SerDes defaults,
+		 * but not right now. Set a flag for the next time
+		 * we command the link down. As with width, only write the
+		 * actual register if the link is currently down; otherwise
+		 * it takes effect on the next link change. Since the setting
+		 * is being explicitly requested (via MAD or sysfs), clear
+		 * autoneg failure status if speed autoneg is enabled.
+ */ + dd->ipath_link_speed_enabled = val; + if (dd->ipath_ibcddrctrl & IBA7220_IBC_IBTA_1_2_MASK && + !(val & (val - 1))) + dd->ipath_presets_needed = 1; + if ((dd->ipath_flags & (IPATH_LINKDOWN|IPATH_LINKINIT)) != + IPATH_LINKDOWN) + goto bail; + /* + * We set the IPATH_IB_FORCE_NOTIFY bit so updown + * will get called because we want to update + * link_speed_active, and the change may not take + * effect for some time (if we are in POLL), so this + * flag will force the updown routine to be called + * on the next ibstatuschange down interrupt, even + * if it's not a down->up transition. When setting + * speed autoneg, clear AUTONEG_FAILED. + */ + if (val == (IPATH_IB_SDR | IPATH_IB_DDR)) { + val = IBA7220_IBC_SPEED_AUTONEG_MASK | + IBA7220_IBC_IBTA_1_2_MASK; + dd->ipath_flags &= ~IPATH_IB_AUTONEG_FAILED; + } else + val = val == IPATH_IB_DDR ? IBA7220_IBC_SPEED_DDR + : IBA7220_IBC_SPEED_SDR; + maskr = IBA7220_IBC_SPEED_AUTONEG_MASK | + IBA7220_IBC_IBTA_1_2_MASK; + lsb = 0; /* speed bits are low bits */ + setforce = 1; + break; + + case IPATH_IB_CFG_RXPOL_ENB: /* set Auto-RX-polarity enable */ + lsb = IBA7220_IBC_RXPOL_SHIFT; + maskr = IBA7220_IBC_RXPOL_MASK; + break; + + case IPATH_IB_CFG_LREV_ENB: /* set Auto-Lane-reversal enable */ + lsb = IBA7220_IBC_LREV_SHIFT; + maskr = IBA7220_IBC_LREV_MASK; + break; + + default: + ret = -ENOTSUPP; + goto bail; + } + dd->ipath_ibcddrctrl &= ~(maskr << lsb); + dd->ipath_ibcddrctrl |= (((u64) val & maskr) << lsb); + ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcddrctrl, + dd->ipath_ibcddrctrl); + if (setforce) + dd->ipath_flags |= IPATH_IB_FORCE_NOTIFY; +bail: + return ret; +} + +static void ipath_7220_read_counters(struct ipath_devdata *dd, + struct infinipath_counters *cntrs) +{ + u64 *counters = (u64 *) cntrs; + int i; + + for (i = 0; i < sizeof(*cntrs) / sizeof(u64); i++) + counters[i] = ipath_snap_cntr(dd, i); +} + +/* if we are using MSI, try to fall back to IntX */ +static int ipath_7220_intr_fallback(struct ipath_devdata
*dd) +{ + if (dd->ipath_msi_lo) { + dev_info(&dd->pcidev->dev, "MSI interrupt not detected," + " trying IntX interrupts\n"); + ipath_7220_nomsi(dd); + ipath_enable_intx(dd->pcidev); + /* + * some newer kernels require free_irq before disable_msi, + * and irq can be changed during disable and intx enable + * and we need to therefore use the pcidev->irq value, + * not our saved MSI value. + */ + dd->ipath_irq = dd->pcidev->irq; + if (request_irq(dd->ipath_irq, ipath_intr, IRQF_SHARED, + IPATH_DRV_NAME, dd)) + ipath_dev_err(dd, + "Could not re-request_irq for IntX\n"); + return 1; + } + return 0; +} + +/* + * reset the XGXS (between serdes and IBC). Slightly less intrusive + * than resetting the IBC or external link state, and useful in some + * cases to cause some retraining. To do this right, we reset IBC + * as well. + */ +static void ipath_7220_xgxs_reset(struct ipath_devdata *dd) +{ + u64 val, prev_val; + + prev_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_xgxsconfig); + val = prev_val | INFINIPATH_XGXS_RESET; + prev_val &= ~INFINIPATH_XGXS_RESET; /* be sure */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_control, + dd->ipath_control & ~INFINIPATH_C_LINKENABLE); + ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, val); + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + ipath_write_kreg(dd, dd->ipath_kregs->kr_xgxsconfig, prev_val); + ipath_write_kreg(dd, dd->ipath_kregs->kr_control, + dd->ipath_control); +} + + +/* Still needs cleanup, too much hardwired stuff */ +static void autoneg_send(struct ipath_devdata *dd, + u32 *hdr, u32 dcnt, u32 *data) +{ + int i; + u64 cnt; + u32 __iomem *piobuf; + u32 pnum; + + i = 0; + cnt = 7 + dcnt + 1; /* 7 dword header, dword data, icrc */ + while (!(piobuf = ipath_getpiobuf(dd, cnt, &pnum))) { + if (i++ > 15) { + ipath_dbg("Couldn't get pio buffer for send\n"); + return; + } + udelay(2); + } + if (dd->ipath_flags&IPATH_HAS_PBC_CNT) + cnt |= 0x80000000UL<<32; /* mark as VL15 */ + writeq(cnt, piobuf); + ipath_flush_wc(); 
+ __iowrite32_copy(piobuf + 2, hdr, 7); + __iowrite32_copy(piobuf + 9, data, dcnt); + ipath_flush_wc(); +} + +/* + * _start packet gets sent twice at start, _done gets sent twice at end + */ +static void ipath_autoneg_send(struct ipath_devdata *dd, int which) +{ + static u32 swapped; + u32 dw, i, hcnt, dcnt, *data; + static u32 hdr[7] = { 0xf002ffff, 0x48ffff, 0x6400abba }; + static u32 madpayload_start[0x40] = { + 0x1810103, 0x1, 0x0, 0x0, 0x2c90000, 0x2c9, 0x0, 0x0, + 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + 0x1, 0x1388, 0x15e, 0x1, /* rest 0's */ + }; + static u32 madpayload_done[0x40] = { + 0x1810103, 0x1, 0x0, 0x0, 0x2c90000, 0x2c9, 0x0, 0x0, + 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + 0x40000001, 0x1388, 0x15e, /* rest 0's */ + }; + dcnt = sizeof(madpayload_start)/sizeof(madpayload_start[0]); + hcnt = sizeof(hdr)/sizeof(hdr[0]); + if (!swapped) { + /* for maintainability, do it at runtime */ + for (i = 0; i < hcnt; i++) { + dw = (__force u32) cpu_to_be32(hdr[i]); + hdr[i] = dw; + } + for (i = 0; i < dcnt; i++) { + dw = (__force u32) cpu_to_be32(madpayload_start[i]); + madpayload_start[i] = dw; + dw = (__force u32) cpu_to_be32(madpayload_done[i]); + madpayload_done[i] = dw; + } + swapped = 1; + } + + data = which ? madpayload_done : madpayload_start; + ipath_cdbg(PKT, "Sending %s special MADs\n", which?"done":"start"); + + autoneg_send(dd, hdr, dcnt, data); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + udelay(2); + autoneg_send(dd, hdr, dcnt, data); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + udelay(2); +} + + + +/* + * Do the absolute minimum to cause an IB speed change, and make it + * ready, but don't actually trigger the change. 
The caller will + * do that when ready (if link is in Polling training state, it will + * happen immediately, otherwise when link next goes down) + * + * This routine should only be used as part of the DDR autonegotiation + * code for devices that are not compliant with IB 1.2 (or code that + * fixes things up for same). + * + * When link has gone down, and autoneg enabled, or autoneg has + * failed and we give up until next time we set both speeds, and + * then we want IBTA enabled as well as "use max enabled speed". + */ +static void set_speed_fast(struct ipath_devdata *dd, u32 speed) +{ + dd->ipath_ibcddrctrl &= ~(IBA7220_IBC_SPEED_AUTONEG_MASK | + IBA7220_IBC_IBTA_1_2_MASK | + (IBA7220_IBC_WIDTH_MASK << IBA7220_IBC_WIDTH_SHIFT)); + + if (speed == (IPATH_IB_SDR | IPATH_IB_DDR)) + dd->ipath_ibcddrctrl |= IBA7220_IBC_SPEED_AUTONEG_MASK | + IBA7220_IBC_IBTA_1_2_MASK; + else + dd->ipath_ibcddrctrl |= speed == IPATH_IB_DDR ? + IBA7220_IBC_SPEED_DDR : IBA7220_IBC_SPEED_SDR; + + /* + * Convert from IB-style 1 = 1x, 2 = 4x, 3 = auto + * to chip-centric 0 = 1x, 1 = 4x, 2 = auto + */ + dd->ipath_ibcddrctrl |= (u64)(dd->ipath_link_width_enabled - 1) << + IBA7220_IBC_WIDTH_SHIFT; + ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcddrctrl, + dd->ipath_ibcddrctrl); + ipath_cdbg(VERBOSE, "setup for IB speed (%x) done\n", speed); +} + + +/* + * This routine is only used when we are not talking to another + * IB 1.2-compliant device that we think can do DDR. + * (This includes all existing switch chips as of Oct 2007.) + * 1.2-compliant devices go directly to DDR prior to reaching INIT + */ +static void try_auto_neg(struct ipath_devdata *dd) +{ + /* + * required for older non-IB1.2 DDR switches. Newer + * non-IB-compliant switches don't need it, but so far, + * aren't bothered by it either.
"Magic constant" + */ + ipath_write_kreg(dd, IPATH_KREG_OFFSET(IBNCModeCtrl), + 0x3b9dc07); + dd->ipath_flags |= IPATH_IB_AUTONEG_INPROG; + ipath_autoneg_send(dd, 0); + set_speed_fast(dd, IPATH_IB_DDR); + ipath_toggle_rclkrls(dd); + /* 2 msec is minimum length of a poll cycle */ + schedule_delayed_work(&dd->ipath_autoneg_work, + msecs_to_jiffies(2)); +} + + +static int ipath_7220_ib_updown(struct ipath_devdata *dd, int ibup, u64 ibcs) +{ + int ret = 0; + u32 ltstate = ipath_ib_linkstate(dd, ibcs); + + dd->ipath_link_width_active = + ((ibcs >> IBA7220_IBCS_LINKWIDTH_SHIFT) & 1) ? + IB_WIDTH_4X : IB_WIDTH_1X; + dd->ipath_link_speed_active = + ((ibcs >> IBA7220_IBCS_LINKSPEED_SHIFT) & 1) ? + IPATH_IB_DDR : IPATH_IB_SDR; + + if (!ibup) { + /* + * when link goes down we don't want aeq running, so it + * won't't interfere with IBC training, etc., and we need + * to go back to the static SerDes preset values + */ + if (dd->ipath_x1_fix_tries && + ltstate <= INFINIPATH_IBCS_LT_STATE_SLEEPQUIET && + ltstate != INFINIPATH_IBCS_LT_STATE_LINKUP) + dd->ipath_x1_fix_tries = 0; + if (!(dd->ipath_flags & (IPATH_IB_AUTONEG_FAILED | + IPATH_IB_AUTONEG_INPROG))) + set_speed_fast(dd, dd->ipath_link_speed_enabled); + if (!(dd->ipath_flags & IPATH_IB_AUTONEG_INPROG)) { + ipath_cdbg(VERBOSE, "Setting RXEQ defaults\n"); + ipath_sd7220_presets(dd); + } + /* this might better in ipath_sd7220_presets() */ + ipath_set_relock_poll(dd, ibup); + } else { + if (ipath_compat_ddr_negotiate && + !(dd->ipath_flags & (IPATH_IB_AUTONEG_FAILED | + IPATH_IB_AUTONEG_INPROG)) && + dd->ipath_link_speed_active == IPATH_IB_SDR && + (dd->ipath_link_speed_enabled & + (IPATH_IB_DDR | IPATH_IB_SDR)) == + (IPATH_IB_DDR | IPATH_IB_SDR) && + dd->ipath_autoneg_tries < IPATH_AUTONEG_TRIES) { + /* we are SDR, and DDR auto-negotiation enabled */ + ++dd->ipath_autoneg_tries; + ipath_dbg("DDR negotiation try, %u/%u\n", + dd->ipath_autoneg_tries, + IPATH_AUTONEG_TRIES); + try_auto_neg(dd); + ret = 1; /* no other IB status 
change processing */ + } else if ((dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) + && dd->ipath_link_speed_active == IPATH_IB_SDR) { + ipath_autoneg_send(dd, 1); + set_speed_fast(dd, IPATH_IB_DDR); + udelay(2); + ipath_toggle_rclkrls(dd); + ret = 1; /* no other IB status change processing */ + } else { + if ((dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) && + (dd->ipath_link_speed_active & IPATH_IB_DDR)) { + ipath_dbg("Got to INIT with DDR autoneg\n"); + dd->ipath_flags &= ~(IPATH_IB_AUTONEG_INPROG + | IPATH_IB_AUTONEG_FAILED); + dd->ipath_autoneg_tries = 0; + /* re-enable SDR, for next link down */ + set_speed_fast(dd, + dd->ipath_link_speed_enabled); + wake_up(&dd->ipath_autoneg_wait); + } else if (dd->ipath_flags & IPATH_IB_AUTONEG_FAILED) { + /* + * clear autoneg failure flag, and do setup + * so we'll try next time link goes down and + * back to INIT (possibly connected to different + * device). + */ + ipath_dbg("INIT %sDR after autoneg failure\n", + (dd->ipath_link_speed_active & + IPATH_IB_DDR) ? "D" : "S"); + dd->ipath_flags &= ~IPATH_IB_AUTONEG_FAILED; + dd->ipath_ibcddrctrl |= + IBA7220_IBC_IBTA_1_2_MASK; + ipath_write_kreg(dd, + IPATH_KREG_OFFSET(IBNCModeCtrl), 0); + } + } + /* + * if we are in 1X, and are in autoneg width, it + * could be due to an xgxs problem, so if we haven't + * already tried, try twice to get to 4X; if we + * tried, and couldn't, report it, since it will + * probably not be what is desired. 
+ */ + if ((dd->ipath_link_width_enabled & (IB_WIDTH_1X | + IB_WIDTH_4X)) == (IB_WIDTH_1X | IB_WIDTH_4X) + && dd->ipath_link_width_active == IB_WIDTH_1X + && dd->ipath_x1_fix_tries < 3) { + if (++dd->ipath_x1_fix_tries == 3) + dev_info(&dd->pcidev->dev, + "IB link is in 1X mode\n"); + else { + ipath_cdbg(VERBOSE, "IB 1X in " + "auto-width, try %u to be " + "sure it's really 1X; " + "ltstate %u\n", + dd->ipath_x1_fix_tries, + ltstate); + dd->ipath_f_xgxs_reset(dd); + ret = 1; /* skip other processing */ + } + } + + if (!ret) { + dd->delay_mult = rate_to_delay + [(ibcs >> IBA7220_IBCS_LINKSPEED_SHIFT) & 1] + [(ibcs >> IBA7220_IBCS_LINKWIDTH_SHIFT) & 1]; + + ipath_set_relock_poll(dd, ibup); + } + } + + if (!ret) + ipath_setup_7220_setextled(dd, ipath_ib_linkstate(dd, ibcs), + ltstate); + return ret; +} + + +/* + * Handle the empirically determined mechanism for auto-negotiation + * of DDR speed with switches. + */ +static void autoneg_work(struct work_struct *work) +{ + struct ipath_devdata *dd; + u64 startms; + u32 lastlts, i; + + dd = container_of(work, struct ipath_devdata, + ipath_autoneg_work.work); + + startms = jiffies_to_msecs(jiffies); + + /* + * busy wait for this first part, it should be at most a + * few hundred usec, since we scheduled ourselves for 2msec. 
+ */ + for (i = 0; i < 25; i++) { + lastlts = ipath_ib_linktrstate(dd, dd->ipath_lastibcstat); + if (lastlts == INFINIPATH_IBCS_LT_STATE_POLLQUIET) { + ipath_set_linkstate(dd, IPATH_IB_LINKDOWN_DISABLE); + break; + } + udelay(100); + } + + if (!(dd->ipath_flags & IPATH_IB_AUTONEG_INPROG)) + goto done; /* we got there early or told to stop */ + + /* we expect this to timeout */ + if (wait_event_timeout(dd->ipath_autoneg_wait, + !(dd->ipath_flags & IPATH_IB_AUTONEG_INPROG), + msecs_to_jiffies(90))) + goto done; + + ipath_toggle_rclkrls(dd); + + /* we expect this to timeout */ + if (wait_event_timeout(dd->ipath_autoneg_wait, + !(dd->ipath_flags & IPATH_IB_AUTONEG_INPROG), + msecs_to_jiffies(1700))) + goto done; + + set_speed_fast(dd, IPATH_IB_SDR); + ipath_toggle_rclkrls(dd); + + /* + * wait up to 250 msec for link to train and get to INIT at DDR; + * this should terminate early + */ + wait_event_timeout(dd->ipath_autoneg_wait, + !(dd->ipath_flags & IPATH_IB_AUTONEG_INPROG), + msecs_to_jiffies(250)); +done: + if (dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) { + ipath_dbg("Did not get to DDR INIT (%x) after %Lu msecs\n", + ipath_ib_state(dd, dd->ipath_lastibcstat), + jiffies_to_msecs(jiffies)-startms); + dd->ipath_flags &= ~IPATH_IB_AUTONEG_INPROG; + if (dd->ipath_autoneg_tries == IPATH_AUTONEG_TRIES) { + dd->ipath_flags |= IPATH_IB_AUTONEG_FAILED; + ipath_dbg("Giving up on DDR until next IB " + "link Down\n"); + dd->ipath_autoneg_tries = 0; + } + set_speed_fast(dd, dd->ipath_link_speed_enabled); + } +} + + +/** + * ipath_init_iba7220_funcs - set up the chip-specific function pointers + * @dd: the infinipath device + * + * This is global, and is called directly at init to set up the + * chip-specific function pointers for later use. 
+ */ +void ipath_init_iba7220_funcs(struct ipath_devdata *dd) +{ + dd->ipath_f_intrsetup = ipath_7220_intconfig; + dd->ipath_f_bus = ipath_setup_7220_config; + dd->ipath_f_reset = ipath_setup_7220_reset; + dd->ipath_f_get_boardname = ipath_7220_boardname; + dd->ipath_f_init_hwerrors = ipath_7220_init_hwerrors; + dd->ipath_f_early_init = ipath_7220_early_init; + dd->ipath_f_handle_hwerrors = ipath_7220_handle_hwerrors; + dd->ipath_f_quiet_serdes = ipath_7220_quiet_serdes; + dd->ipath_f_bringup_serdes = ipath_7220_bringup_serdes; + dd->ipath_f_clear_tids = ipath_7220_clear_tids; + dd->ipath_f_put_tid = ipath_7220_put_tid; + dd->ipath_f_cleanup = ipath_setup_7220_cleanup; + dd->ipath_f_setextled = ipath_setup_7220_setextled; + dd->ipath_f_get_base_info = ipath_7220_get_base_info; + dd->ipath_f_free_irq = ipath_7220_free_irq; + dd->ipath_f_tidtemplate = ipath_7220_tidtemplate; + dd->ipath_f_intr_fallback = ipath_7220_intr_fallback; + dd->ipath_f_xgxs_reset = ipath_7220_xgxs_reset; + dd->ipath_f_get_ib_cfg = ipath_7220_get_ib_cfg; + dd->ipath_f_set_ib_cfg = ipath_7220_set_ib_cfg; + dd->ipath_f_config_jint = ipath_7220_config_jint; + dd->ipath_f_config_ports = ipath_7220_config_ports; + dd->ipath_f_read_counters = ipath_7220_read_counters; + dd->ipath_f_get_msgheader = ipath_7220_get_msgheader; + dd->ipath_f_ib_updown = ipath_7220_ib_updown; + + /* initialize chip-specific variables */ + ipath_init_7220_variables(dd); +} From ralph.campbell at qlogic.com Wed Apr 2 15:50:08 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:08 -0700 Subject: [ofa-general] [PATCH 13/20] IB/ipath -- support for SerDes portion of IBA7220 In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225007.28598.50581.stgit@eng-46.mv.qlogic.com> From: Michael Albaugh The control and initialization of the SerDes blocks of the IBA7220 is sufficiently complex to 
merit a separate file. This is that file. Signed-off-by: Michael Albaugh --- drivers/infiniband/hw/ipath/ipath_sd7220.c | 1462 ++++++++++++++++++++++++++++ 1 files changed, 1462 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_sd7220.c b/drivers/infiniband/hw/ipath/ipath_sd7220.c new file mode 100644 index 0000000..aa47eb5 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_sd7220.c @@ -0,0 +1,1462 @@ +/* + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. + * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +/* + * This file contains all of the code that is specific to the SerDes + * on the InfiniPath 7220 chip. + */ + +#include +#include + +#include "ipath_kernel.h" +#include "ipath_registers.h" +#include "ipath_7220.h" + +/* + * The IBSerDesMappTable is a memory that holds values to be stored in + * various SerDes registers by IBC. It is not part of the normal kregs + * map and is used in exactly one place, hence the #define below. + */ +#define KR_IBSerDesMappTable (0x94000 / (sizeof(uint64_t))) + +/* + * Below used for sdnum parameter, selecting one of the two sections + * used for PCIe, or the single SerDes used for IB. + */ +#define PCIE_SERDES0 0 +#define PCIE_SERDES1 1 + +/* + * The EPB requires addressing in a particular form. EPB_LOC() is intended + * to make #definitions a little more readable. + */ +#define EPB_ADDR_SHF 8 +#define EPB_LOC(chn, elt, reg) \ + (((elt & 0xf) | ((chn & 7) << 4) | ((reg & 0x3f) << 9)) << \ + EPB_ADDR_SHF) +#define EPB_IB_QUAD0_CS_SHF (25) +#define EPB_IB_QUAD0_CS (1U << EPB_IB_QUAD0_CS_SHF) +#define EPB_IB_UC_CS_SHF (26) +#define EPB_PCIE_UC_CS_SHF (27) +#define EPB_GLOBAL_WR (1U << (EPB_ADDR_SHF + 8)) + +/* Forward declarations. 
*/ +static int ipath_sd7220_reg_mod(struct ipath_devdata *dd, int sdnum, u32 loc, + u32 data, u32 mask); +static int ibsd_mod_allchnls(struct ipath_devdata *dd, int loc, int val, + int mask); +static int ipath_sd_trimdone_poll(struct ipath_devdata *dd); +static void ipath_sd_trimdone_monitor(struct ipath_devdata *dd, + const char *where); +static int ipath_sd_setvals(struct ipath_devdata *dd); +static int ipath_sd_early(struct ipath_devdata *dd); +static int ipath_sd_dactrim(struct ipath_devdata *dd); +/* Set the registers that IBC may muck with to their default "preset" values */ +int ipath_sd7220_presets(struct ipath_devdata *dd); +static int ipath_internal_presets(struct ipath_devdata *dd); +/* Tweak the register (CMUCTRL5) that contains the TRIMSELF controls */ +static int ipath_sd_trimself(struct ipath_devdata *dd, int val); +static int epb_access(struct ipath_devdata *dd, int sdnum, int claim); + +void ipath_set_relock_poll(struct ipath_devdata *dd, int ibup); + +/* + * Below keeps track of whether the "once per power-on" initialization has + * been done, because uC code Version 1.32.17 or higher allows the uC to + * be reset at will, and Automatic Equalization may require it. So the + * state of the reset "pin", as reflected in was_reset parameter to + * ipath_sd7220_init() is no longer valid. Instead, we check for the + * actual uC code having been loaded. + */ +static int ipath_ibsd_ucode_loaded(struct ipath_devdata *dd) +{ + if (!dd->serdes_first_init_done && (ipath_sd7220_ib_vfy(dd) > 0)) + dd->serdes_first_init_done = 1; + return dd->serdes_first_init_done; +} + +/* repeat #define for local use. 
"Real" #define is in ipath_iba7220.c */ +#define INFINIPATH_HWE_IB_UC_MEMORYPARITYERR 0x0000004000000000ULL +#define IB_MPREG5 (EPB_LOC(6, 0, 0xE) | (1L << EPB_IB_UC_CS_SHF)) +#define IB_MPREG6 (EPB_LOC(6, 0, 0xF) | (1U << EPB_IB_UC_CS_SHF)) +#define UC_PAR_CLR_D 8 +#define UC_PAR_CLR_M 0xC +#define IB_CTRL2(chn) (EPB_LOC(chn, 7, 3) | EPB_IB_QUAD0_CS) +#define START_EQ1(chan) EPB_LOC(chan, 7, 0x27) + +void ipath_sd7220_clr_ibpar(struct ipath_devdata *dd) +{ + int ret; + + /* clear, then re-enable parity errs */ + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, IB_MPREG6, + UC_PAR_CLR_D, UC_PAR_CLR_M); + if (ret < 0) { + ipath_dev_err(dd, "Failed clearing IBSerDes Parity err\n"); + goto bail; + } + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, IB_MPREG6, 0, + UC_PAR_CLR_M); + + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + udelay(4); + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrclear, + INFINIPATH_HWE_IB_UC_MEMORYPARITYERR); + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); +bail: + return; +} + +/* + * After a reset or other unusual event, the epb interface may need + * to be re-synchronized, between the host and the uC. 
+ * returns <0 for failure to resync within IBSD_RESYNC_TRIES (not expected) + */ +#define IBSD_RESYNC_TRIES 3 +#define IB_PGUDP(chn) (EPB_LOC((chn), 2, 1) | EPB_IB_QUAD0_CS) +#define IB_CMUDONE(chn) (EPB_LOC((chn), 7, 0xF) | EPB_IB_QUAD0_CS) + +static int ipath_resync_ibepb(struct ipath_devdata *dd) +{ + int ret, pat, tries, chn; + u32 loc; + + ret = -1; + chn = 0; + for (tries = 0; tries < (4 * IBSD_RESYNC_TRIES); ++tries) { + loc = IB_PGUDP(chn); + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, loc, 0, 0); + if (ret < 0) { + ipath_dev_err(dd, "Failed read in resync\n"); + continue; + } + if (ret != 0xF0 && ret != 0x55 && tries == 0) + ipath_dev_err(dd, "unexpected pattern in resync\n"); + pat = ret ^ 0xA5; /* alternate F0 and 55 */ + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, loc, pat, 0xFF); + if (ret < 0) { + ipath_dev_err(dd, "Failed write in resync\n"); + continue; + } + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, loc, 0, 0); + if (ret < 0) { + ipath_dev_err(dd, "Failed re-read in resync\n"); + continue; + } + if (ret != pat) { + ipath_dev_err(dd, "Failed compare1 in resync\n"); + continue; + } + loc = IB_CMUDONE(chn); + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, loc, 0, 0); + if (ret < 0) { + ipath_dev_err(dd, "Failed CMUDONE rd in resync\n"); + continue; + } + if ((ret & 0x70) != ((chn << 4) | 0x40)) { + ipath_dev_err(dd, "Bad CMUDONE value %02X, chn %d\n", + ret, chn); + continue; + } + if (++chn == 4) + break; /* Success */ + } + ipath_cdbg(VERBOSE, "Resync in %d tries\n", tries); + return (ret > 0) ? 0 : ret; +} + +/* + * Localize the stuff that should be done to change IB uC reset + * returns <0 for errors. + */ +static int ipath_ibsd_reset(struct ipath_devdata *dd, int assert_rst) +{ + u64 rst_val; + int ret = 0; + unsigned long flags; + + rst_val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibserdesctrl); + if (assert_rst) { + /* + * Vendor recommends "interrupting" uC before reset, to + * minimize possible glitches. 
+ */ + spin_lock_irqsave(&dd->ipath_sdepb_lock, flags); + epb_access(dd, IB_7220_SERDES, 1); + rst_val |= 1ULL; + /* Squelch possible parity error from _asserting_ reset */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask, + dd->ipath_hwerrmask & + ~INFINIPATH_HWE_IB_UC_MEMORYPARITYERR); + ipath_write_kreg(dd, dd->ipath_kregs->kr_ibserdesctrl, rst_val); + /* flush write, delay to ensure it took effect */ + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + udelay(2); + /* once it's reset, can remove interrupt */ + epb_access(dd, IB_7220_SERDES, -1); + spin_unlock_irqrestore(&dd->ipath_sdepb_lock, flags); + } else { + /* + * Before we de-assert reset, we need to deal with + * possible glitch on the Parity-error line. + * Suppress it around the reset, both in chip-level + * hwerrmask and in IB uC control reg. uC will allow + * it again during startup. + */ + u64 val; + rst_val &= ~(1ULL); + ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask, + dd->ipath_hwerrmask & + ~INFINIPATH_HWE_IB_UC_MEMORYPARITYERR); + + ret = ipath_resync_ibepb(dd); + if (ret < 0) + ipath_dev_err(dd, "unable to re-sync IB EPB\n"); + + /* set uC control regs to suppress parity errs */ + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, IB_MPREG5, 1, 1); + if (ret < 0) + goto bail; + /* IB uC code past Version 1.32.17 allow suppression of wdog */ + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, IB_MPREG6, 0x80, + 0x80); + if (ret < 0) { + ipath_dev_err(dd, "Failed to set WDOG disable\n"); + goto bail; + } + ipath_write_kreg(dd, dd->ipath_kregs->kr_ibserdesctrl, rst_val); + /* flush write, delay for startup */ + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + udelay(1); + /* clear, then re-enable parity errs */ + ipath_sd7220_clr_ibpar(dd); + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_hwerrstatus); + if (val & INFINIPATH_HWE_IB_UC_MEMORYPARITYERR) { + ipath_dev_err(dd, "IBUC Parity still set after RST\n"); + dd->ipath_hwerrmask &= + ~INFINIPATH_HWE_IB_UC_MEMORYPARITYERR; + } + 
ipath_write_kreg(dd, dd->ipath_kregs->kr_hwerrmask, + dd->ipath_hwerrmask); + } + +bail: + return ret; +} + +static void ipath_sd_trimdone_monitor(struct ipath_devdata *dd, + const char *where) +{ + int ret, chn, baduns; + u64 val; + + if (!where) + where = "?"; + + /* give time for reset to settle out in EPB */ + udelay(2); + + ret = ipath_resync_ibepb(dd); + if (ret < 0) + ipath_dev_err(dd, "not able to re-sync IB EPB (%s)\n", where); + + /* Do "sacrificial read" to get EPB in sane state after reset */ + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, IB_CTRL2(0), 0, 0); + if (ret < 0) + ipath_dev_err(dd, "Failed TRIMDONE 1st read, (%s)\n", where); + + /* Check/show "summary" Trim-done bit in IBCStatus */ + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcstatus); + if (val & (1ULL << 11)) + ipath_cdbg(VERBOSE, "IBCS TRIMDONE set (%s)\n", where); + else + ipath_dev_err(dd, "IBCS TRIMDONE clear (%s)\n", where); + + udelay(2); + + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, IB_MPREG6, 0x80, 0x80); + if (ret < 0) + ipath_dev_err(dd, "Failed Dummy RMW, (%s)\n", where); + udelay(10); + + baduns = 0; + + for (chn = 3; chn >= 0; --chn) { + /* Read CTRL reg for each channel to check TRIMDONE */ + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, + IB_CTRL2(chn), 0, 0); + if (ret < 0) + ipath_dev_err(dd, "Failed checking TRIMDONE, chn %d" + " (%s)\n", chn, where); + + if (!(ret & 0x10)) { + int probe; + baduns |= (1 << chn); + ipath_dev_err(dd, "TRIMDONE cleared on chn %d (%02X)." 
+ " (%s)\n", chn, ret, where); + probe = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, + IB_PGUDP(0), 0, 0); + ipath_dev_err(dd, "probe is %d (%02X)\n", + probe, probe); + probe = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, + IB_CTRL2(chn), 0, 0); + ipath_dev_err(dd, "re-read: %d (%02X)\n", + probe, probe); + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, + IB_CTRL2(chn), 0x10, 0x10); + if (ret < 0) + ipath_dev_err(dd, + "Err on TRIMDONE rewrite1\n"); + } + } + for (chn = 3; chn >= 0; --chn) { + /* Read CTRL reg for each channel to check TRIMDONE */ + if (baduns & (1 << chn)) { + ipath_dev_err(dd, + "Reseting TRIMDONE on chn %d (%s)\n", + chn, where); + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, + IB_CTRL2(chn), 0x10, 0x10); + if (ret < 0) + ipath_dev_err(dd, "Failed re-setting " + "TRIMDONE, chn %d (%s)\n", + chn, where); + } + } +} + +/* + * Below is portion of IBA7220-specific bringup_serdes() that actually + * deals with registers and memory within the SerDes itself. + * Post IB uC code version 1.32.17, was_reset being 1 is not really + * informative, so we double-check. + */ +int ipath_sd7220_init(struct ipath_devdata *dd, int was_reset) +{ + int ret = 1; /* default to failure */ + int first_reset; + int val_stat; + + if (!was_reset) { + /* entered with reset not asserted, we need to do it */ + ipath_ibsd_reset(dd, 1); + ipath_sd_trimdone_monitor(dd, "Driver-reload"); + } + + /* Substitute our deduced value for was_reset */ + ret = ipath_ibsd_ucode_loaded(dd); + if (ret < 0) { + ret = 1; + goto done; + } + first_reset = !ret; /* First reset if IBSD uCode not yet loaded */ + + /* + * Alter some regs per vendor latest doc, reset-defaults + * are not right for IB. + */ + ret = ipath_sd_early(dd); + if (ret < 0) { + ipath_dev_err(dd, "Failed to set IB SERDES early defaults\n"); + ret = 1; + goto done; + } + + /* + * Set DAC manual trim IB. + * We only do this once after chip has been reset (usually + * same as once per system boot). 
+ */ + if (first_reset) { + ret = ipath_sd_dactrim(dd); + if (ret < 0) { + ipath_dev_err(dd, "Failed IB SERDES DAC trim\n"); + ret = 1; + goto done; + } + } + + /* + * Set various registers (DDS and RXEQ) that will be + * controlled by IBC (in 1.2 mode) to reasonable preset values + * Calling the "internal" version avoids the "check for needed" + * and "trimdone monitor" that might be counter-productive. + */ + ret = ipath_internal_presets(dd); + if (ret < 0) { + ipath_dev_err(dd, "Failed to set IB SERDES presets\n"); + ret = 1; + goto done; + } + ret = ipath_sd_trimself(dd, 0x80); + if (ret < 0) { + ipath_dev_err(dd, "Failed to set IB SERDES TRIMSELF\n"); + ret = 1; + goto done; + } + + /* Load image, then try to verify */ + ret = 0; /* Assume success */ + if (first_reset) { + int vfy; + int trim_done; + ipath_dbg("SerDes uC was reset, reloading PRAM\n"); + ret = ipath_sd7220_ib_load(dd); + if (ret < 0) { + ipath_dev_err(dd, "Failed to load IB SERDES image\n"); + ret = 1; + goto done; + } + + /* Loaded image, try to verify */ + vfy = ipath_sd7220_ib_vfy(dd); + if (vfy != ret) { + ipath_dev_err(dd, "SERDES PRAM VFY failed\n"); + ret = 1; + goto done; + } + /* + * Loaded and verified. Almost good... + * hold "success" in ret + */ + ret = 0; + + /* + * Prev steps all worked, continue bringup + * De-assert RESET to uC, only in first reset, to allow + * trimming. + * + * Since our default setup sets START_EQ1 to + * PRESET, we need to clear that for this very first run. + */ + ret = ibsd_mod_allchnls(dd, START_EQ1(0), 0, 0x38); + if (ret < 0) { + ipath_dev_err(dd, "Failed clearing START_EQ1\n"); + ret = 1; + goto done; + } + + ipath_ibsd_reset(dd, 0); + /* + * If this is not the first reset, trimdone should be set + * already. + */ + trim_done = ipath_sd_trimdone_poll(dd); + /* + * Whether or not trimdone succeeded, we need to put the + * uC back into reset to avoid a possible fight with the + * IBC state-machine. 
+ */ + ipath_ibsd_reset(dd, 1); + + if (!trim_done) { + ipath_dev_err(dd, "No TRIMDONE seen\n"); + ret = 1; + goto done; + } + + ipath_sd_trimdone_monitor(dd, "First-reset"); + /* Remember so we do not re-do the load, dactrim, etc. */ + dd->serdes_first_init_done = 1; + } + /* + * Setup for channel training and load values for + * RxEq and DDS in tables used by IBC in IB1.2 mode + */ + + val_stat = ipath_sd_setvals(dd); + if (val_stat < 0) + ret = 1; +done: + /* start relock timer regardless, but start at 1 second */ + ipath_set_relock_poll(dd, -1); + return ret; +} + +#define EPB_ACC_REQ 1 +#define EPB_ACC_GNT 0x100 +#define EPB_DATA_MASK 0xFF +#define EPB_RD (1ULL << 24) +#define EPB_TRANS_RDY (1ULL << 31) +#define EPB_TRANS_ERR (1ULL << 30) +#define EPB_TRANS_TRIES 5 + +/* + * query, claim, release ownership of the EPB (External Parallel Bus) + * for a specified SERDES. + * the "claim" parameter is >0 to claim, <0 to release, 0 to query. + * Returns <0 for errors, >0 if we had ownership, else 0. + */ +static int epb_access(struct ipath_devdata *dd, int sdnum, int claim) +{ + u16 acc; + u64 accval; + int owned = 0; + u64 oct_sel = 0; + + switch (sdnum) { + case IB_7220_SERDES : + /* + * The IB SERDES "ownership" is fairly simple. A single each + * request/grant. + */ + acc = dd->ipath_kregs->kr_ib_epbacc; + break; + case PCIE_SERDES0 : + case PCIE_SERDES1 : + /* PCIe SERDES has two "octants", need to select which */ + acc = dd->ipath_kregs->kr_pcie_epbacc; + oct_sel = (2 << (sdnum - PCIE_SERDES0)); + break; + default : + return 0; + } + + /* Make sure any outstanding transaction was seen */ + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + udelay(15); + + accval = ipath_read_kreg32(dd, acc); + + owned = !!(accval & EPB_ACC_GNT); + if (claim < 0) { + /* Need to release */ + u64 pollval; + /* + * The only writeable bits are the request and CS. 
+ * Both should be clear. + */ + u64 newval = 0; + ipath_write_kreg(dd, acc, newval); + /* First read after write is not trustworthy */ + pollval = ipath_read_kreg32(dd, acc); + udelay(5); + pollval = ipath_read_kreg32(dd, acc); + if (pollval & EPB_ACC_GNT) + owned = -1; + } else if (claim > 0) { + /* Need to claim */ + u64 pollval; + u64 newval = EPB_ACC_REQ | oct_sel; + ipath_write_kreg(dd, acc, newval); + /* First read after write is not trustworthy */ + pollval = ipath_read_kreg32(dd, acc); + udelay(5); + pollval = ipath_read_kreg32(dd, acc); + if (!(pollval & EPB_ACC_GNT)) + owned = -1; + } + return owned; +} + +/* + * Helper to deal with the race condition of a write..read sequence to EPB regs + */ +static int epb_trans(struct ipath_devdata *dd, u16 reg, u64 i_val, u64 *o_vp) +{ + int tries; + u64 transval; + + ipath_write_kreg(dd, reg, i_val); + /* Throw away first read, as RDY bit may be stale */ + transval = ipath_read_kreg64(dd, reg); + + for (tries = EPB_TRANS_TRIES; tries; --tries) { + transval = ipath_read_kreg32(dd, reg); + if (transval & EPB_TRANS_RDY) + break; + udelay(5); + } + if (transval & EPB_TRANS_ERR) + return -1; + if (tries > 0 && o_vp) + *o_vp = transval; + return tries; +} + +/** + * ipath_sd7220_reg_mod - modify SERDES register + * @dd: the infinipath device + * @sdnum: which SERDES to access + * @loc: location - channel, element, register, as packed by EPB_LOC() macro. + * @wd: Write Data - value to set in register + * @mask: ones where data should be spliced into reg. + * + * Basic register read/modify/write, with un-needed accesses elided. That is, + * a mask of zero will prevent write, while a mask of 0xFF will prevent read. + * Returns the current (presumed, if a write was done) contents of the selected + * register, or <0 on errors.
+ */ +static int ipath_sd7220_reg_mod(struct ipath_devdata *dd, int sdnum, u32 loc, + u32 wd, u32 mask) +{ + u16 trans; + u64 transval; + int owned; + int tries, ret; + unsigned long flags; + + switch (sdnum) { + case IB_7220_SERDES : + trans = dd->ipath_kregs->kr_ib_epbtrans; + break; + case PCIE_SERDES0 : + case PCIE_SERDES1 : + trans = dd->ipath_kregs->kr_pcie_epbtrans; + break; + default : + return -1; + } + + /* + * All access is locked in software (vs other host threads) and + * hardware (vs uC access). + */ + spin_lock_irqsave(&dd->ipath_sdepb_lock, flags); + + owned = epb_access(dd, sdnum, 1); + if (owned < 0) { + spin_unlock_irqrestore(&dd->ipath_sdepb_lock, flags); + return -1; + } + ret = 0; + for (tries = EPB_TRANS_TRIES; tries; --tries) { + transval = ipath_read_kreg32(dd, trans); + if (transval & EPB_TRANS_RDY) + break; + udelay(5); + } + + if (tries > 0) { + tries = 1; /* to make read-skip work */ + if (mask != 0xFF) { + /* + * Not a pure write, so need to read. + * loc encodes chip-select as well as address + */ + transval = loc | EPB_RD; + tries = epb_trans(dd, trans, transval, &transval); + } + if (tries > 0 && mask != 0) { + /* + * Not a pure read, so need to write. + */ + wd = (wd & mask) | (transval & ~mask); + transval = loc | (wd & EPB_DATA_MASK); + tries = epb_trans(dd, trans, transval, &transval); + } + } + /* else, failed to see ready, what error-handling? */ + + /* + * Release bus. Failure is an error. + */ + if (epb_access(dd, sdnum, -1) < 0) + ret = -1; + else + ret = transval & EPB_DATA_MASK; + + spin_unlock_irqrestore(&dd->ipath_sdepb_lock, flags); + if (tries <= 0) + ret = -1; + return ret; +} + +#define EPB_ROM_R (2) +#define EPB_ROM_W (1) +/* + * Below, all uC-related, use appropriate UC_CS, depending + * on which SerDes is used. 
+ */ +#define EPB_UC_CTL EPB_LOC(6, 0, 0) +#define EPB_MADDRL EPB_LOC(6, 0, 2) +#define EPB_MADDRH EPB_LOC(6, 0, 3) +#define EPB_ROMDATA EPB_LOC(6, 0, 4) +#define EPB_RAMDATA EPB_LOC(6, 0, 5) + +/* Transfer data to/from uC Program RAM of IB or PCIe SerDes */ +static int ipath_sd7220_ram_xfer(struct ipath_devdata *dd, int sdnum, u32 loc, + u8 *buf, int cnt, int rd_notwr) +{ + u16 trans; + u64 transval; + u64 csbit; + int owned; + int tries; + int sofar; + int addr; + int ret; + unsigned long flags; + const char *op; + + /* Pick appropriate transaction reg and "Chip select" for this serdes */ + switch (sdnum) { + case IB_7220_SERDES : + csbit = 1ULL << EPB_IB_UC_CS_SHF; + trans = dd->ipath_kregs->kr_ib_epbtrans; + break; + case PCIE_SERDES0 : + case PCIE_SERDES1 : + /* PCIe SERDES has uC "chip select" in different bit, too */ + csbit = 1ULL << EPB_PCIE_UC_CS_SHF; + trans = dd->ipath_kregs->kr_pcie_epbtrans; + break; + default : + return -1; + } + + op = rd_notwr ? "Rd" : "Wr"; + spin_lock_irqsave(&dd->ipath_sdepb_lock, flags); + + owned = epb_access(dd, sdnum, 1); + if (owned < 0) { + spin_unlock_irqrestore(&dd->ipath_sdepb_lock, flags); + ipath_dbg("Could not get %s access to %s EPB: %X, loc %X\n", + op, (sdnum == IB_7220_SERDES) ? "IB" : "PCIe", + owned, loc); + return -1; + } + + /* + * In future code, we may need to distinguish several address ranges, + * and select various memories based on this. For now, just trim + * "loc" (location including address and memory select) to + * "addr" (address within memory). We will only support PRAM. + * The memory is 8KB. + */ + addr = loc & 0x1FFF; + for (tries = EPB_TRANS_TRIES; tries; --tries) { + transval = ipath_read_kreg32(dd, trans); + if (transval & EPB_TRANS_RDY) + break; + udelay(5); + } + + sofar = 0; + if (tries <= 0) + ipath_dbg("No initial RDY on EPB access request\n"); + else { + /* + * Every "memory" access is doubly-indirect. + * We set two bytes of address, then read/write + * one or more bytes of data.
+ */ + + /* First, we set control to "Read" or "Write" */ + transval = csbit | EPB_UC_CTL | + (rd_notwr ? EPB_ROM_R : EPB_ROM_W); + tries = epb_trans(dd, trans, transval, &transval); + if (tries <= 0) + ipath_dbg("No EPB response to uC %s cmd\n", op); + while (tries > 0 && sofar < cnt) { + if (!sofar) { + /* Only set address at start of chunk */ + int addrbyte = (addr + sofar) >> 8; + transval = csbit | EPB_MADDRH | addrbyte; + tries = epb_trans(dd, trans, transval, + &transval); + if (tries <= 0) { + ipath_dbg("No EPB response ADDRH\n"); + break; + } + addrbyte = (addr + sofar) & 0xFF; + transval = csbit | EPB_MADDRL | addrbyte; + tries = epb_trans(dd, trans, transval, + &transval); + if (tries <= 0) { + ipath_dbg("No EPB response ADDRL\n"); + break; + } + } + + if (rd_notwr) + transval = csbit | EPB_ROMDATA | EPB_RD; + else + transval = csbit | EPB_ROMDATA | buf[sofar]; + tries = epb_trans(dd, trans, transval, &transval); + if (tries <= 0) { + ipath_dbg("No EPB response DATA\n"); + break; + } + if (rd_notwr) + buf[sofar] = transval & EPB_DATA_MASK; + ++sofar; + } + /* Finally, clear control-bit for Read or Write */ + transval = csbit | EPB_UC_CTL; + tries = epb_trans(dd, trans, transval, &transval); + if (tries <= 0) + ipath_dbg("No EPB response to drop of uC %s cmd\n", op); + } + + ret = sofar; + /* Release bus. 
Failure is an error */ + if (epb_access(dd, sdnum, -1) < 0) + ret = -1; + + spin_unlock_irqrestore(&dd->ipath_sdepb_lock, flags); + if (tries <= 0) { + ipath_dbg("SERDES PRAM %s failed after %d bytes\n", op, sofar); + ret = -1; + } + return ret; +} + +#define PROG_CHUNK 64 + +int ipath_sd7220_prog_ld(struct ipath_devdata *dd, int sdnum, + u8 *img, int len, int offset) +{ + int cnt, sofar, req; + + sofar = 0; + while (sofar < len) { + req = len - sofar; + if (req > PROG_CHUNK) + req = PROG_CHUNK; + cnt = ipath_sd7220_ram_xfer(dd, sdnum, offset + sofar, + img + sofar, req, 0); + if (cnt < req) { + sofar = -1; + break; + } + sofar += req; + } + return sofar; +} + +#define VFY_CHUNK 64 +#define SD_PRAM_ERROR_LIMIT 42 + +int ipath_sd7220_prog_vfy(struct ipath_devdata *dd, int sdnum, + const u8 *img, int len, int offset) +{ + int cnt, sofar, req, idx, errors; + unsigned char readback[VFY_CHUNK]; + + errors = 0; + sofar = 0; + while (sofar < len) { + req = len - sofar; + if (req > VFY_CHUNK) + req = VFY_CHUNK; + cnt = ipath_sd7220_ram_xfer(dd, sdnum, sofar + offset, + readback, req, 1); + if (cnt < req) { + /* failed in read itself */ + sofar = -1; + break; + } + for (idx = 0; idx < cnt; ++idx) { + if (readback[idx] != img[idx+sofar]) + ++errors; + } + sofar += cnt; + } + return errors ? -errors : sofar; +} + +/* IRQ not set up at this point in init, so we poll. */ +#define IB_SERDES_TRIM_DONE (1ULL << 11) +#define TRIM_TMO (30) + +static int ipath_sd_trimdone_poll(struct ipath_devdata *dd) +{ + int trim_tmo, ret; + uint64_t val; + + /* + * Default to failure, so IBC will not start + * without IB_SERDES_TRIM_DONE. 
+ */ + ret = 0; + for (trim_tmo = 0; trim_tmo < TRIM_TMO; ++trim_tmo) { + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcstatus); + if (val & IB_SERDES_TRIM_DONE) { + ipath_cdbg(VERBOSE, "TRIMDONE after %d\n", trim_tmo); + ret = 1; + break; + } + msleep(10); + } + if (trim_tmo >= TRIM_TMO) { + ipath_dev_err(dd, "No TRIMDONE in %d tries\n", trim_tmo); + ret = 0; + } + return ret; +} + +#define TX_FAST_ELT (9) + +/* + * Set the "negotiation" values for SERDES. These are used by the IB1.2 + * link negotiation. The macros below are an attempt to keep the values a + * little more human-editable. + * First, values related to Drive De-emphasis Settings. + */ + +#define NUM_DDS_REGS 6 +#define DDS_REG_MAP 0x76A910 /* LSB-first list of regs (in elt 9) to mod */ + +#define DDS_VAL(amp_d, main_d, ipst_d, ipre_d, amp_s, main_s, ipst_s, ipre_s) \ + { { ((amp_d & 0x1F) << 1) | 1, ((amp_s & 0x1F) << 1) | 1, \ + (main_d << 3) | 4 | (ipre_d >> 2), \ + (main_s << 3) | 4 | (ipre_s >> 2), \ + ((ipst_d & 0xF) << 1) | ((ipre_d & 3) << 6) | 0x21, \ + ((ipst_s & 0xF) << 1) | ((ipre_s & 3) << 6) | 0x21 } } + +static struct dds_init { + uint8_t reg_vals[NUM_DDS_REGS]; +} dds_init_vals[] = { + /* DDR(FDR) SDR(HDR) */ + /* Vendor recommends below for 3m cable */ +#define DDS_3M 0 + DDS_VAL(31, 19, 12, 0, 29, 22, 9, 0), + DDS_VAL(31, 12, 15, 4, 31, 15, 15, 1), + DDS_VAL(31, 13, 15, 3, 31, 16, 15, 0), + DDS_VAL(31, 14, 15, 2, 31, 17, 14, 0), + DDS_VAL(31, 15, 15, 1, 31, 18, 13, 0), + DDS_VAL(31, 16, 15, 0, 31, 19, 12, 0), + DDS_VAL(31, 17, 14, 0, 31, 20, 11, 0), + DDS_VAL(31, 18, 13, 0, 30, 21, 10, 0), + DDS_VAL(31, 20, 11, 0, 28, 23, 8, 0), + DDS_VAL(31, 21, 10, 0, 27, 24, 7, 0), + DDS_VAL(31, 22, 9, 0, 26, 25, 6, 0), + DDS_VAL(30, 23, 8, 0, 25, 26, 5, 0), + DDS_VAL(29, 24, 7, 0, 23, 27, 4, 0), + /* Vendor recommends below for 1m cable */ +#define DDS_1M 13 + DDS_VAL(28, 25, 6, 0, 21, 28, 3, 0), + DDS_VAL(27, 26, 5, 0, 19, 29, 2, 0), + DDS_VAL(25, 27, 4, 0, 17, 30, 1, 0) +}; + +/* + * Next,
values related to Receive Equalization. + * In comments, FDR (Full) is IB DDR, HDR (Half) is IB SDR + */ +/* Hardware packs an element number and register address thus: */ +#define RXEQ_INIT_RDESC(elt, addr) (((elt) & 0xF) | ((addr) << 4)) +#define RXEQ_VAL(elt, adr, val0, val1, val2, val3) \ + {RXEQ_INIT_RDESC((elt), (adr)), {(val0), (val1), (val2), (val3)} } + +#define RXEQ_VAL_ALL(elt, adr, val) \ + {RXEQ_INIT_RDESC((elt), (adr)), {(val), (val), (val), (val)} } + +#define RXEQ_SDR_DFELTH 0 +#define RXEQ_SDR_TLTH 0 +#define RXEQ_SDR_G1CNT_Z1CNT 0x11 +#define RXEQ_SDR_ZCNT 23 + +static struct rxeq_init { + u16 rdesc; /* in form used in SerDesDDSRXEQ */ + u8 rdata[4]; +} rxeq_init_vals[] = { + /* Set Rcv Eq. to Preset mode */ + RXEQ_VAL_ALL(7, 0x27, 0x10), + /* Set DFELTHFDR/HDR thresholds */ + RXEQ_VAL(7, 8, 0, 0, 0, 0), /* FDR */ + RXEQ_VAL(7, 0x21, 0, 0, 0, 0), /* HDR */ + /* Set TLTHFDR/HDR thresholds */ + RXEQ_VAL(7, 9, 2, 2, 2, 2), /* FDR */ + RXEQ_VAL(7, 0x23, 2, 2, 2, 2), /* HDR */ + /* Set Preamp setting 2 (ZFR/ZCNT) */ + RXEQ_VAL(7, 0x1B, 12, 12, 12, 12), /* FDR */ + RXEQ_VAL(7, 0x1C, 12, 12, 12, 12), /* HDR */ + /* Set Preamp DC gain and Setting 1 (GFR/GHR) */ + RXEQ_VAL(7, 0x1E, 0x10, 0x10, 0x10, 0x10), /* FDR */ + RXEQ_VAL(7, 0x1F, 0x10, 0x10, 0x10, 0x10), /* HDR */ + /* Toggle RELOCK (in VCDL_CTRL0) to lock to data */ + RXEQ_VAL_ALL(6, 6, 0x20), /* Set D5 High */ + RXEQ_VAL_ALL(6, 6, 0), /* Set D5 Low */ +}; + +/* There are 17 values from vendor, but IBC only accesses the first 16 */ +#define DDS_ROWS (16) +#define RXEQ_ROWS ARRAY_SIZE(rxeq_init_vals) + +static int ipath_sd_setvals(struct ipath_devdata *dd) +{ + int idx, midx; + int min_idx; /* Minimum index for this portion of table */ + uint32_t dds_reg_map; + u64 __iomem *taddr, *iaddr; + uint64_t data; + uint64_t sdctl; + + taddr = dd->ipath_kregbase + KR_IBSerDesMappTable; + iaddr = dd->ipath_kregbase + dd->ipath_kregs->kr_ib_ddsrxeq; + + /* + * Init the DDS section of the table.
+ * Each "row" of the table provokes NUM_DDS_REGS writes, to the + * registers indicated in DDS_REG_MAP. + */ + sdctl = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibserdesctrl); + sdctl = (sdctl & ~(0x1f << 8)) | (NUM_DDS_REGS << 8); + sdctl = (sdctl & ~(0x1f << 13)) | (RXEQ_ROWS << 13); + ipath_write_kreg(dd, dd->ipath_kregs->kr_ibserdesctrl, sdctl); + + /* + * Iterate down the table within the loop for each register to store. + */ + dds_reg_map = DDS_REG_MAP; + for (idx = 0; idx < NUM_DDS_REGS; ++idx) { + data = ((dds_reg_map & 0xF) << 4) | TX_FAST_ELT; + writeq(data, iaddr + idx); + mmiowb(); + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + dds_reg_map >>= 4; + for (midx = 0; midx < DDS_ROWS; ++midx) { + u64 __iomem *daddr = taddr + ((midx << 4) + idx); + data = dds_init_vals[midx].reg_vals[idx]; + writeq(data, daddr); + mmiowb(); + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + } /* End inner for (vals for this reg, each row) */ + } /* end outer for (regs to be stored) */ + + /* + * Init the RXEQ section of the table. As explained in the comment + * above rxeq_init_vals[], this runs in a different order, as the pattern + * of register references is more complex, but there are only + * four "data" values per register.
+ */ + min_idx = idx; /* RXEQ indices pick up where DDS left off */ + taddr += 0x100; /* RXEQ data is in second half of table */ + /* Iterate through RXEQ register addresses */ + for (idx = 0; idx < RXEQ_ROWS; ++idx) { + int didx; /* "destination" */ + int vidx; + + /* didx is offset by min_idx to address RXEQ range of regs */ + didx = idx + min_idx; + /* Store the next RXEQ register address */ + writeq(rxeq_init_vals[idx].rdesc, iaddr + didx); + mmiowb(); + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + /* Iterate through RXEQ values */ + for (vidx = 0; vidx < 4; vidx++) { + data = rxeq_init_vals[idx].rdata[vidx]; + writeq(data, taddr + (vidx << 6) + idx); + mmiowb(); + ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch); + } + } /* end outer for (Reg-writes for RXEQ) */ + return 0; +} + +#define CMUCTRL5 EPB_LOC(7, 0, 0x15) +#define RXHSCTRL0(chan) EPB_LOC(chan, 6, 0) +#define VCDL_DAC2(chan) EPB_LOC(chan, 6, 5) +#define VCDL_CTRL0(chan) EPB_LOC(chan, 6, 6) +#define VCDL_CTRL2(chan) EPB_LOC(chan, 6, 8) +#define START_EQ2(chan) EPB_LOC(chan, 7, 0x28) + +static int ibsd_sto_noisy(struct ipath_devdata *dd, int loc, int val, int mask) +{ + int ret = -1; + int sloc; /* shifted loc, for messages */ + + loc |= (1U << EPB_IB_QUAD0_CS_SHF); + sloc = loc >> EPB_ADDR_SHF; + + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, loc, val, mask); + if (ret < 0) + ipath_dev_err(dd, "Write failed: elt %d," + " addr 0x%X, chnl %d, val 0x%02X, mask 0x%02X\n", + (sloc & 0xF), (sloc >> 9) & 0x3f, (sloc >> 4) & 7, + val & 0xFF, mask & 0xFF); + return ret; +} + +/* + * Repeat a "store" across all channels of the IB SerDes. + * Although nominally it inherits the "read value" of the last + * channel it modified, the only really useful return is <0 for + * failure, >= 0 for success. The parameter 'loc' is assumed to + * be the location for the channel-0 copy of the register to + * be modified. 
+ */ +static int ibsd_mod_allchnls(struct ipath_devdata *dd, int loc, int val, + int mask) +{ + int ret = -1; + int chnl; + + if (loc & EPB_GLOBAL_WR) { + /* + * Our caller has assured us that we can set all four + * channels at once. Trust that. If mask is not 0xFF, + * we will read the _specified_ channel for our starting + * value. + */ + loc |= (1U << EPB_IB_QUAD0_CS_SHF); + chnl = (loc >> (4 + EPB_ADDR_SHF)) & 7; + if (mask != 0xFF) { + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, + loc & ~EPB_GLOBAL_WR, 0, 0); + if (ret < 0) { + int sloc = loc >> EPB_ADDR_SHF; + ipath_dev_err(dd, "pre-read failed: elt %d," + " addr 0x%X, chnl %d\n", (sloc & 0xF), + (sloc >> 9) & 0x3f, chnl); + return ret; + } + val = (ret & ~mask) | (val & mask); + } + loc &= ~(7 << (4+EPB_ADDR_SHF)); + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, loc, val, 0xFF); + if (ret < 0) { + int sloc = loc >> EPB_ADDR_SHF; + ipath_dev_err(dd, "Global WR failed: elt %d," + " addr 0x%X, val %02X\n", + (sloc & 0xF), (sloc >> 9) & 0x3f, val); + } + return ret; + } + /* Clear "channel" and set CS so we can simply iterate */ + loc &= ~(7 << (4+EPB_ADDR_SHF)); + loc |= (1U << EPB_IB_QUAD0_CS_SHF); + for (chnl = 0; chnl < 4; ++chnl) { + int cloc; + cloc = loc | (chnl << (4+EPB_ADDR_SHF)); + ret = ipath_sd7220_reg_mod(dd, IB_7220_SERDES, cloc, val, mask); + if (ret < 0) { + int sloc = loc >> EPB_ADDR_SHF; + ipath_dev_err(dd, "Write failed: elt %d," + " addr 0x%X, chnl %d, val 0x%02X," + " mask 0x%02X\n", + (sloc & 0xF), (sloc >> 9) & 0x3f, chnl, + val & 0xFF, mask & 0xFF); + break; + } + } + return ret; +} + +/* + * Set the Tx values normally modified by IBC in IB1.2 mode to default + * values, as gotten from first row of init table. 
+ */ +static int set_dds_vals(struct ipath_devdata *dd, struct dds_init *ddi) +{ + int ret; + int idx, reg, data; + uint32_t regmap; + + regmap = DDS_REG_MAP; + for (idx = 0; idx < NUM_DDS_REGS; ++idx) { + reg = (regmap & 0xF); + regmap >>= 4; + data = ddi->reg_vals[idx]; + /* Vendor says RMW not needed for these regs, use 0xFF mask */ + ret = ibsd_mod_allchnls(dd, EPB_LOC(0, 9, reg), data, 0xFF); + if (ret < 0) + break; + } + return ret; +} + +/* + * Set the Rx values normally modified by IBC in IB1.2 mode to default + * values, as gotten from selected column of init table. + */ +static int set_rxeq_vals(struct ipath_devdata *dd, int vsel) +{ + int ret; + int ridx; + int cnt = ARRAY_SIZE(rxeq_init_vals); + + for (ridx = 0; ridx < cnt; ++ridx) { + int elt, reg, val, loc; + elt = rxeq_init_vals[ridx].rdesc & 0xF; + reg = rxeq_init_vals[ridx].rdesc >> 4; + loc = EPB_LOC(0, elt, reg); + val = rxeq_init_vals[ridx].rdata[vsel]; + /* mask of 0xFF, because hardware does full-byte store. */ + ret = ibsd_mod_allchnls(dd, loc, val, 0xFF); + if (ret < 0) + break; + } + return ret; +} + +/* + * Set the default values (row 0) for DDR Driver De-emphasis. + * We do this initially and whenever we turn off IB-1.2. + * The "default" values for Rx equalization are also stored to + * SerDes registers. Formerly (and still default), we used set 2. + * For experimenting with cables and link-partners, we allow changing + * that via a module parameter.
+ */ +static unsigned ipath_rxeq_set = 2; +module_param_named(rxeq_default_set, ipath_rxeq_set, uint, + S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(rxeq_default_set, + "Which set [0..3] of Rx Equalization values is default"); + +static int ipath_internal_presets(struct ipath_devdata *dd) +{ + int ret = 0; + + ret = set_dds_vals(dd, dds_init_vals + DDS_3M); + + if (ret < 0) + ipath_dev_err(dd, "Failed to set default DDS values\n"); + ret = set_rxeq_vals(dd, ipath_rxeq_set & 3); + if (ret < 0) + ipath_dev_err(dd, "Failed to set default RXEQ values\n"); + return ret; +} + +int ipath_sd7220_presets(struct ipath_devdata *dd) +{ + int ret = 0; + + if (!dd->ipath_presets_needed) + return ret; + dd->ipath_presets_needed = 0; + /* Assert uC reset, so we don't clash with it. */ + ipath_ibsd_reset(dd, 1); + udelay(2); + ipath_sd_trimdone_monitor(dd, "link-down"); + + ret = ipath_internal_presets(dd); + return ret; +} + +static int ipath_sd_trimself(struct ipath_devdata *dd, int val) +{ + return ibsd_sto_noisy(dd, CMUCTRL5, val, 0xFF); +} + +static int ipath_sd_early(struct ipath_devdata *dd) +{ + int ret = -1; /* Default failed */ + int chnl; + + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, RXHSCTRL0(chnl), 0xD4, 0xFF); + if (ret < 0) + goto bail; + } + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, VCDL_DAC2(chnl), 0x2D, 0xFF); + if (ret < 0) + goto bail; + } + /* more fine-tuning of what will be default */ + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, VCDL_CTRL2(chnl), 3, 0xF); + if (ret < 0) + goto bail; + } + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, START_EQ1(chnl), 0x10, 0xFF); + if (ret < 0) + goto bail; + } + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, START_EQ2(chnl), 0x30, 0xFF); + if (ret < 0) + goto bail; + } +bail: + return ret; +} + +#define BACTRL(chnl) EPB_LOC(chnl, 6, 0x0E) +#define LDOUTCTRL1(chnl) EPB_LOC(chnl, 7, 6) +#define RXHSSTATUS(chnl) EPB_LOC(chnl, 6, 0xF) + +static int
ipath_sd_dactrim(struct ipath_devdata *dd) +{ + int ret = -1; /* Default failed */ + int chnl; + + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, BACTRL(chnl), 0x40, 0xFF); + if (ret < 0) + goto bail; + } + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, LDOUTCTRL1(chnl), 0x04, 0xFF); + if (ret < 0) + goto bail; + } + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, RXHSSTATUS(chnl), 0x04, 0xFF); + if (ret < 0) + goto bail; + } + /* + * Delay for the max possible number of steps, with slop. + * Each step is about 4usec. + */ + udelay(415); + for (chnl = 0; chnl < 4; ++chnl) { + ret = ibsd_sto_noisy(dd, LDOUTCTRL1(chnl), 0x00, 0xFF); + if (ret < 0) + goto bail; + } +bail: + return ret; +} + +#define RELOCK_FIRST_MS 3 +#define RXLSPPM(chan) EPB_LOC(chan, 0, 2) +void ipath_toggle_rclkrls(struct ipath_devdata *dd) +{ + int loc = RXLSPPM(0) | EPB_GLOBAL_WR; + int ret; + + ret = ibsd_mod_allchnls(dd, loc, 0, 0x80); + if (ret < 0) + ipath_dev_err(dd, "RCLKRLS failed to clear D7\n"); + else { + udelay(1); + ibsd_mod_allchnls(dd, loc, 0x80, 0x80); + } + /* And again for good measure */ + udelay(1); + ret = ibsd_mod_allchnls(dd, loc, 0, 0x80); + if (ret < 0) + ipath_dev_err(dd, "RCLKRLS failed to clear D7\n"); + else { + udelay(1); + ibsd_mod_allchnls(dd, loc, 0x80, 0x80); + } + /* Now reset xgxs and IBC to complete the recovery */ + dd->ipath_f_xgxs_reset(dd); +} + +/* + * Shut down the timer that polls for relock occasions, if needed. + * This is "hooked" from ipath_7220_quiet_serdes(), which is called + * just before ipath_shutdown_device() in ipath_driver.c shuts down all + * the other timers. + */ +void ipath_shutdown_relock_poll(struct ipath_devdata *dd) +{ + struct ipath_relock *irp = &dd->ipath_relock_singleton; + if (atomic_read(&irp->ipath_relock_timer_active)) { + del_timer_sync(&irp->ipath_relock_timer); + atomic_set(&irp->ipath_relock_timer_active, 0); + } +} + +static unsigned ipath_relock_by_timer = 1;
+module_param_named(relock_by_timer, ipath_relock_by_timer, uint, + S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(relock_by_timer, "Allow relock attempt if link not up"); + +static void ipath_run_relock(unsigned long opaque) +{ + struct ipath_devdata *dd = (struct ipath_devdata *)opaque; + struct ipath_relock *irp = &dd->ipath_relock_singleton; + u64 val, ltstate; + + if (!(dd->ipath_flags & IPATH_INITTED)) { + /* Not yet up, just reenable the timer for later */ + irp->ipath_relock_interval = HZ; + mod_timer(&irp->ipath_relock_timer, jiffies + HZ); + return; + } + + /* + * Check link-training state for "stuck" state. + * If found, try relock and schedule another try at + * an exponentially growing delay, maxed at one second. + * If not stuck, our work is done. + */ + val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcstatus); + ltstate = ipath_ib_linktrstate(dd, val); + + if (ltstate <= INFINIPATH_IBCS_LT_STATE_CFGWAITRMT + && ltstate != INFINIPATH_IBCS_LT_STATE_LINKUP) { + int timeoff; + /* Not up yet.
Try again, if allowed by module-param */ + if (ipath_relock_by_timer) { + if (dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) + ipath_cdbg(VERBOSE, "Skip RELOCK in AUTONEG\n"); + else if (!(dd->ipath_flags & IPATH_IB_LINK_DISABLED)) { + ipath_cdbg(VERBOSE, "RELOCK\n"); + ipath_toggle_rclkrls(dd); + } + } + /* re-set timer for next check */ + timeoff = irp->ipath_relock_interval << 1; + if (timeoff > HZ) + timeoff = HZ; + irp->ipath_relock_interval = timeoff; + + mod_timer(&irp->ipath_relock_timer, jiffies + timeoff); + } else { + /* Up, so no more need to check so often */ + mod_timer(&irp->ipath_relock_timer, jiffies + HZ); + } +} + +void ipath_set_relock_poll(struct ipath_devdata *dd, int ibup) +{ + struct ipath_relock *irp = &dd->ipath_relock_singleton; + + if (ibup > 0) { + /* we are now up, so relax timer to 1 second interval */ + if (atomic_read(&irp->ipath_relock_timer_active)) + mod_timer(&irp->ipath_relock_timer, jiffies + HZ); + } else { + /* Transition to down, (re-)set timer to short interval. */ + int timeout; + timeout = (HZ * ((ibup == -1) ? 1000 : RELOCK_FIRST_MS))/1000; + if (timeout == 0) + timeout = 1; + /* If timer has not yet been started, do so. 
*/ + if (atomic_inc_return(&irp->ipath_relock_timer_active) == 1) { + init_timer(&irp->ipath_relock_timer); + irp->ipath_relock_timer.function = ipath_run_relock; + irp->ipath_relock_timer.data = (unsigned long) dd; + irp->ipath_relock_interval = timeout; + irp->ipath_relock_timer.expires = jiffies + timeout; + add_timer(&irp->ipath_relock_timer); + } else { + irp->ipath_relock_interval = timeout; + mod_timer(&irp->ipath_relock_timer, jiffies + timeout); + atomic_dec(&irp->ipath_relock_timer_active); + } + } +} + From ralph.campbell at qlogic.com Wed Apr 2 15:50:13 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:13 -0700 Subject: [ofa-general] [PATCH 14/20] IB/ipath - Add IBA7220 specific initialization data In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225013.28598.14756.stgit@eng-46.mv.qlogic.com> This patch adds binary data to initialize the IB SERDES. Signed-off-by: Michael Albaugh --- drivers/infiniband/hw/ipath/ipath_sd7220_img.c | 1082 ++++++++++++++++++++++++ 1 files changed, 1082 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_sd7220_img.c b/drivers/infiniband/hw/ipath/ipath_sd7220_img.c new file mode 100644 index 0000000..5ef59da --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_sd7220_img.c @@ -0,0 +1,1082 @@ +/* + * Copyright (c) 2007, 2008 QLogic Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +/* + * This file contains the memory image from the vendor, to be copied into + * the IB SERDES of the IBA7220 during initialization. + * The file also includes the two functions which use this image. 
+ */ +#include +#include + +#include "ipath_kernel.h" +#include "ipath_registers.h" +#include "ipath_7220.h" + +static unsigned char ipath_sd7220_ib_img[] = { +/*0000*/0x02, 0x0A, 0x29, 0x02, 0x0A, 0x87, 0xE5, 0xE6, + 0x30, 0xE6, 0x04, 0x7F, 0x01, 0x80, 0x02, 0x7F, +/*0010*/0x00, 0xE5, 0xE2, 0x30, 0xE4, 0x04, 0x7E, 0x01, + 0x80, 0x02, 0x7E, 0x00, 0xEE, 0x5F, 0x60, 0x08, +/*0020*/0x53, 0xF9, 0xF7, 0xE4, 0xF5, 0xFE, 0x80, 0x08, + 0x7F, 0x0A, 0x12, 0x17, 0x31, 0x12, 0x0E, 0xA2, +/*0030*/0x75, 0xFC, 0x08, 0xE4, 0xF5, 0xFD, 0xE5, 0xE7, + 0x20, 0xE7, 0x03, 0x43, 0xF9, 0x08, 0x22, 0x00, +/*0040*/0x01, 0x20, 0x11, 0x00, 0x04, 0x20, 0x00, 0x75, + 0x51, 0x01, 0xE4, 0xF5, 0x52, 0xF5, 0x53, 0xF5, +/*0050*/0x52, 0xF5, 0x7E, 0x7F, 0x04, 0x02, 0x04, 0x38, + 0xC2, 0x36, 0x05, 0x52, 0xE5, 0x52, 0xD3, 0x94, +/*0060*/0x0C, 0x40, 0x05, 0x75, 0x52, 0x01, 0xD2, 0x36, + 0x90, 0x07, 0x0C, 0x74, 0x07, 0xF0, 0xA3, 0x74, +/*0070*/0xFF, 0xF0, 0xE4, 0xF5, 0x0C, 0xA3, 0xF0, 0x90, + 0x07, 0x14, 0xF0, 0xA3, 0xF0, 0x75, 0x0B, 0x20, +/*0080*/0xF5, 0x09, 0xE4, 0xF5, 0x08, 0xE5, 0x08, 0xD3, + 0x94, 0x30, 0x40, 0x03, 0x02, 0x04, 0x04, 0x12, +/*0090*/0x00, 0x06, 0x15, 0x0B, 0xE5, 0x08, 0x70, 0x04, + 0x7F, 0x01, 0x80, 0x02, 0x7F, 0x00, 0xE5, 0x09, +/*00A0*/0x70, 0x04, 0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00, + 0xEE, 0x5F, 0x60, 0x05, 0x12, 0x18, 0x71, 0xD2, +/*00B0*/0x35, 0x53, 0xE1, 0xF7, 0xE5, 0x08, 0x45, 0x09, + 0xFF, 0xE5, 0x0B, 0x25, 0xE0, 0x25, 0xE0, 0x24, +/*00C0*/0x83, 0xF5, 0x82, 0xE4, 0x34, 0x07, 0xF5, 0x83, + 0xEF, 0xF0, 0x85, 0xE2, 0x20, 0xE5, 0x52, 0xD3, +/*00D0*/0x94, 0x01, 0x40, 0x0D, 0x12, 0x19, 0xF3, 0xE0, + 0x54, 0xA0, 0x64, 0x40, 0x70, 0x03, 0x02, 0x03, +/*00E0*/0xFB, 0x53, 0xF9, 0xF8, 0x90, 0x94, 0x70, 0xE4, + 0xF0, 0xE0, 0xF5, 0x10, 0xAF, 0x09, 0x12, 0x1E, +/*00F0*/0xB3, 0xAF, 0x08, 0xEF, 0x44, 0x08, 0xF5, 0x82, + 0x75, 0x83, 0x80, 0xE0, 0xF5, 0x29, 0xEF, 0x44, +/*0100*/0x07, 0x12, 0x1A, 0x3C, 0xF5, 0x22, 0x54, 0x40, + 0xD3, 0x94, 0x00, 0x40, 0x1E, 0xE5, 0x29, 0x54, +/*0110*/0xF0, 0x70, 
0x21, 0x12, 0x19, 0xF3, 0xE0, 0x44, + 0x80, 0xF0, 0xE5, 0x22, 0x54, 0x30, 0x65, 0x08, +/*0120*/0x70, 0x09, 0x12, 0x19, 0xF3, 0xE0, 0x54, 0xBF, + 0xF0, 0x80, 0x09, 0x12, 0x19, 0xF3, 0x74, 0x40, +/*0130*/0xF0, 0x02, 0x03, 0xFB, 0x12, 0x1A, 0x12, 0x75, + 0x83, 0xAE, 0x74, 0xFF, 0xF0, 0xAF, 0x08, 0x7E, +/*0140*/0x00, 0xEF, 0x44, 0x07, 0xF5, 0x82, 0xE0, 0xFD, + 0xE5, 0x0B, 0x25, 0xE0, 0x25, 0xE0, 0x24, 0x81, +/*0150*/0xF5, 0x82, 0xE4, 0x34, 0x07, 0xF5, 0x83, 0xED, + 0xF0, 0x90, 0x07, 0x0E, 0xE0, 0x04, 0xF0, 0xEF, +/*0160*/0x44, 0x07, 0xF5, 0x82, 0x75, 0x83, 0x98, 0xE0, + 0xF5, 0x28, 0x12, 0x1A, 0x23, 0x40, 0x0C, 0x12, +/*0170*/0x19, 0xF3, 0xE0, 0x44, 0x01, 0x12, 0x1A, 0x32, + 0x02, 0x03, 0xF6, 0xAF, 0x08, 0x7E, 0x00, 0x74, +/*0180*/0x80, 0xCD, 0xEF, 0xCD, 0x8D, 0x82, 0xF5, 0x83, + 0xE0, 0x30, 0xE0, 0x0A, 0x12, 0x19, 0xF3, 0xE0, +/*0190*/0x44, 0x20, 0xF0, 0x02, 0x03, 0xFB, 0x12, 0x19, + 0xF3, 0xE0, 0x54, 0xDF, 0xF0, 0xEE, 0x44, 0xAE, +/*01A0*/0x12, 0x1A, 0x43, 0x30, 0xE4, 0x03, 0x02, 0x03, + 0xFB, 0x74, 0x9E, 0x12, 0x1A, 0x05, 0x20, 0xE0, +/*01B0*/0x03, 0x02, 0x03, 0xFB, 0x8F, 0x82, 0x8E, 0x83, + 0xE0, 0x20, 0xE0, 0x03, 0x02, 0x03, 0xFB, 0x12, +/*01C0*/0x19, 0xF3, 0xE0, 0x44, 0x10, 0xF0, 0xE5, 0xE3, + 0x20, 0xE7, 0x08, 0xE5, 0x08, 0x12, 0x1A, 0x3A, +/*01D0*/0x44, 0x04, 0xF0, 0xAF, 0x08, 0x7E, 0x00, 0xEF, + 0x12, 0x1A, 0x3A, 0x20, 0xE2, 0x34, 0x12, 0x19, +/*01E0*/0xF3, 0xE0, 0x44, 0x08, 0xF0, 0xE5, 0xE4, 0x30, + 0xE6, 0x04, 0x7D, 0x01, 0x80, 0x02, 0x7D, 0x00, +/*01F0*/0xE5, 0x7E, 0xC3, 0x94, 0x04, 0x50, 0x04, 0x7C, + 0x01, 0x80, 0x02, 0x7C, 0x00, 0xEC, 0x4D, 0x60, +/*0200*/0x05, 0xC2, 0x35, 0x02, 0x03, 0xFB, 0xEE, 0x44, + 0xD2, 0x12, 0x1A, 0x43, 0x44, 0x40, 0xF0, 0x02, +/*0210*/0x03, 0xFB, 0x12, 0x19, 0xF3, 0xE0, 0x54, 0xF7, + 0xF0, 0x12, 0x1A, 0x12, 0x75, 0x83, 0xD2, 0xE0, +/*0220*/0x54, 0xBF, 0xF0, 0x90, 0x07, 0x14, 0xE0, 0x04, + 0xF0, 0xE5, 0x7E, 0x70, 0x03, 0x75, 0x7E, 0x01, +/*0230*/0xAF, 0x08, 0x7E, 0x00, 0x12, 0x1A, 0x23, 0x40, + 0x12, 0x12, 0x19, 0xF3, 0xE0, 0x44, 
0x01, 0x12, +/*0240*/0x19, 0xF2, 0xE0, 0x54, 0x02, 0x12, 0x1A, 0x32, + 0x02, 0x03, 0xFB, 0x12, 0x19, 0xF3, 0xE0, 0x44, +/*0250*/0x02, 0x12, 0x19, 0xF2, 0xE0, 0x54, 0xFE, 0xF0, + 0xC2, 0x35, 0xEE, 0x44, 0x8A, 0x8F, 0x82, 0xF5, +/*0260*/0x83, 0xE0, 0xF5, 0x17, 0x54, 0x8F, 0x44, 0x40, + 0xF0, 0x74, 0x90, 0xFC, 0xE5, 0x08, 0x44, 0x07, +/*0270*/0xFD, 0xF5, 0x82, 0x8C, 0x83, 0xE0, 0x54, 0x3F, + 0x90, 0x07, 0x02, 0xF0, 0xE0, 0x54, 0xC0, 0x8D, +/*0280*/0x82, 0x8C, 0x83, 0xF0, 0x74, 0x92, 0x12, 0x1A, + 0x05, 0x90, 0x07, 0x03, 0x12, 0x1A, 0x19, 0x74, +/*0290*/0x82, 0x12, 0x1A, 0x05, 0x90, 0x07, 0x04, 0x12, + 0x1A, 0x19, 0x74, 0xB4, 0x12, 0x1A, 0x05, 0x90, +/*02A0*/0x07, 0x05, 0x12, 0x1A, 0x19, 0x74, 0x94, 0xFE, + 0xE5, 0x08, 0x44, 0x06, 0x12, 0x1A, 0x0A, 0xF5, +/*02B0*/0x10, 0x30, 0xE0, 0x04, 0xD2, 0x37, 0x80, 0x02, + 0xC2, 0x37, 0xE5, 0x10, 0x54, 0x7F, 0x8F, 0x82, +/*02C0*/0x8E, 0x83, 0xF0, 0x30, 0x44, 0x30, 0x12, 0x1A, + 0x03, 0x54, 0x80, 0xD3, 0x94, 0x00, 0x40, 0x04, +/*02D0*/0xD2, 0x39, 0x80, 0x02, 0xC2, 0x39, 0x8F, 0x82, + 0x8E, 0x83, 0xE0, 0x44, 0x80, 0xF0, 0x12, 0x1A, +/*02E0*/0x03, 0x54, 0x40, 0xD3, 0x94, 0x00, 0x40, 0x04, + 0xD2, 0x3A, 0x80, 0x02, 0xC2, 0x3A, 0x8F, 0x82, +/*02F0*/0x8E, 0x83, 0xE0, 0x44, 0x40, 0xF0, 0x74, 0x92, + 0xFE, 0xE5, 0x08, 0x44, 0x06, 0x12, 0x1A, 0x0A, +/*0300*/0x30, 0xE7, 0x04, 0xD2, 0x38, 0x80, 0x02, 0xC2, + 0x38, 0x8F, 0x82, 0x8E, 0x83, 0xE0, 0x54, 0x7F, +/*0310*/0xF0, 0x12, 0x1E, 0x46, 0xE4, 0xF5, 0x0A, 0x20, + 0x03, 0x02, 0x80, 0x03, 0x30, 0x43, 0x03, 0x12, +/*0320*/0x19, 0x95, 0x20, 0x02, 0x02, 0x80, 0x03, 0x30, + 0x42, 0x03, 0x12, 0x0C, 0x8F, 0x30, 0x30, 0x06, +/*0330*/0x12, 0x19, 0x95, 0x12, 0x0C, 0x8F, 0x12, 0x0D, + 0x47, 0x12, 0x19, 0xF3, 0xE0, 0x54, 0xFB, 0xF0, +/*0340*/0xE5, 0x0A, 0xC3, 0x94, 0x01, 0x40, 0x46, 0x43, + 0xE1, 0x08, 0x12, 0x19, 0xF3, 0xE0, 0x44, 0x04, +/*0350*/0xF0, 0xE5, 0xE4, 0x20, 0xE7, 0x2A, 0x12, 0x1A, + 0x12, 0x75, 0x83, 0xD2, 0xE0, 0x54, 0x08, 0xD3, +/*0360*/0x94, 0x00, 0x40, 0x04, 0x7F, 0x01, 0x80, 0x02, + 
0x7F, 0x00, 0xE5, 0x0A, 0xC3, 0x94, 0x01, 0x40, +/*0370*/0x04, 0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00, 0xEF, + 0x5E, 0x60, 0x05, 0x12, 0x1D, 0xD7, 0x80, 0x17, +/*0380*/0x12, 0x1A, 0x12, 0x75, 0x83, 0xD2, 0xE0, 0x44, + 0x08, 0xF0, 0x02, 0x03, 0xFB, 0x12, 0x1A, 0x12, +/*0390*/0x75, 0x83, 0xD2, 0xE0, 0x54, 0xF7, 0xF0, 0x12, + 0x1E, 0x46, 0x7F, 0x08, 0x12, 0x17, 0x31, 0x74, +/*03A0*/0x8E, 0xFE, 0x12, 0x1A, 0x12, 0x8E, 0x83, 0xE0, + 0xF5, 0x10, 0x54, 0xFE, 0xF0, 0xE5, 0x10, 0x44, +/*03B0*/0x01, 0xFF, 0xE5, 0x08, 0xFD, 0xED, 0x44, 0x07, + 0xF5, 0x82, 0xEF, 0xF0, 0xE5, 0x10, 0x54, 0xFE, +/*03C0*/0xFF, 0xED, 0x44, 0x07, 0xF5, 0x82, 0xEF, 0x12, + 0x1A, 0x11, 0x75, 0x83, 0x86, 0xE0, 0x44, 0x10, +/*03D0*/0x12, 0x1A, 0x11, 0xE0, 0x44, 0x10, 0xF0, 0x12, + 0x19, 0xF3, 0xE0, 0x54, 0xFD, 0x44, 0x01, 0xFF, +/*03E0*/0x12, 0x19, 0xF3, 0xEF, 0x12, 0x1A, 0x32, 0x30, + 0x32, 0x0C, 0xE5, 0x08, 0x44, 0x08, 0xF5, 0x82, +/*03F0*/0x75, 0x83, 0x82, 0x74, 0x05, 0xF0, 0xAF, 0x0B, + 0x12, 0x18, 0xD7, 0x74, 0x10, 0x25, 0x08, 0xF5, +/*0400*/0x08, 0x02, 0x00, 0x85, 0x05, 0x09, 0xE5, 0x09, + 0xD3, 0x94, 0x07, 0x50, 0x03, 0x02, 0x00, 0x82, +/*0410*/0xE5, 0x7E, 0xD3, 0x94, 0x00, 0x40, 0x04, 0x7F, + 0x01, 0x80, 0x02, 0x7F, 0x00, 0xE5, 0x7E, 0xC3, +/*0420*/0x94, 0xFA, 0x50, 0x04, 0x7E, 0x01, 0x80, 0x02, + 0x7E, 0x00, 0xEE, 0x5F, 0x60, 0x02, 0x05, 0x7E, +/*0430*/0x30, 0x35, 0x0B, 0x43, 0xE1, 0x01, 0x7F, 0x09, + 0x12, 0x17, 0x31, 0x02, 0x00, 0x58, 0x53, 0xE1, +/*0440*/0xFE, 0x02, 0x00, 0x58, 0x8E, 0x6A, 0x8F, 0x6B, + 0x8C, 0x6C, 0x8D, 0x6D, 0x75, 0x6E, 0x01, 0x75, +/*0450*/0x6F, 0x01, 0x75, 0x70, 0x01, 0xE4, 0xF5, 0x73, + 0xF5, 0x74, 0xF5, 0x75, 0x90, 0x07, 0x2F, 0xF0, +/*0460*/0xF5, 0x3C, 0xF5, 0x3E, 0xF5, 0x46, 0xF5, 0x47, + 0xF5, 0x3D, 0xF5, 0x3F, 0xF5, 0x6F, 0xE5, 0x6F, +/*0470*/0x70, 0x0F, 0xE5, 0x6B, 0x45, 0x6A, 0x12, 0x07, + 0x2A, 0x75, 0x83, 0x80, 0x74, 0x3A, 0xF0, 0x80, +/*0480*/0x09, 0x12, 0x07, 0x2A, 0x75, 0x83, 0x80, 0x74, + 0x1A, 0xF0, 0xE4, 0xF5, 0x6E, 0xC3, 0x74, 0x3F, +/*0490*/0x95, 0x6E, 
0xFF, 0x12, 0x08, 0x65, 0x75, 0x83, + 0x82, 0xEF, 0xF0, 0x12, 0x1A, 0x4D, 0x12, 0x08, +/*04A0*/0xC6, 0xE5, 0x33, 0xF0, 0x12, 0x08, 0xFA, 0x12, + 0x08, 0xB1, 0x40, 0xE1, 0xE5, 0x6F, 0x70, 0x0B, +/*04B0*/0x12, 0x07, 0x2A, 0x75, 0x83, 0x80, 0x74, 0x36, + 0xF0, 0x80, 0x09, 0x12, 0x07, 0x2A, 0x75, 0x83, +/*04C0*/0x80, 0x74, 0x16, 0xF0, 0x75, 0x6E, 0x01, 0x12, + 0x07, 0x2A, 0x75, 0x83, 0xB4, 0xE5, 0x6E, 0xF0, +/*04D0*/0x12, 0x1A, 0x4D, 0x74, 0x3F, 0x25, 0x6E, 0xF5, + 0x82, 0xE4, 0x34, 0x00, 0xF5, 0x83, 0xE5, 0x33, +/*04E0*/0xF0, 0x74, 0xBF, 0x25, 0x6E, 0xF5, 0x82, 0xE4, + 0x34, 0x00, 0x12, 0x08, 0xB1, 0x40, 0xD8, 0xE4, +/*04F0*/0xF5, 0x70, 0xF5, 0x46, 0xF5, 0x47, 0xF5, 0x6E, + 0x12, 0x08, 0xFA, 0xF5, 0x83, 0xE0, 0xFE, 0x12, +/*0500*/0x08, 0xC6, 0xE0, 0x7C, 0x00, 0x24, 0x00, 0xFF, + 0xEC, 0x3E, 0xFE, 0xAD, 0x3B, 0xD3, 0xEF, 0x9D, +/*0510*/0xEE, 0x9C, 0x50, 0x04, 0x7B, 0x01, 0x80, 0x02, + 0x7B, 0x00, 0xE5, 0x70, 0x70, 0x04, 0x7A, 0x01, +/*0520*/0x80, 0x02, 0x7A, 0x00, 0xEB, 0x5A, 0x60, 0x06, + 0x85, 0x6E, 0x46, 0x75, 0x70, 0x01, 0xD3, 0xEF, +/*0530*/0x9D, 0xEE, 0x9C, 0x50, 0x04, 0x7F, 0x01, 0x80, + 0x02, 0x7F, 0x00, 0xE5, 0x70, 0xB4, 0x01, 0x04, +/*0540*/0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00, 0xEF, 0x5E, + 0x60, 0x03, 0x85, 0x6E, 0x47, 0x05, 0x6E, 0xE5, +/*0550*/0x6E, 0x64, 0x7F, 0x70, 0xA3, 0xE5, 0x46, 0x60, + 0x05, 0xE5, 0x47, 0xB4, 0x7E, 0x03, 0x85, 0x46, +/*0560*/0x47, 0xE5, 0x6F, 0x70, 0x08, 0x85, 0x46, 0x76, + 0x85, 0x47, 0x77, 0x80, 0x0E, 0xC3, 0x74, 0x7F, +/*0570*/0x95, 0x46, 0xF5, 0x78, 0xC3, 0x74, 0x7F, 0x95, + 0x47, 0xF5, 0x79, 0xE5, 0x6F, 0x70, 0x37, 0xE5, +/*0580*/0x46, 0x65, 0x47, 0x70, 0x0C, 0x75, 0x73, 0x01, + 0x75, 0x74, 0x01, 0xF5, 0x3C, 0xF5, 0x3D, 0x80, +/*0590*/0x35, 0xE4, 0xF5, 0x4E, 0xC3, 0xE5, 0x47, 0x95, + 0x46, 0xF5, 0x3C, 0xC3, 0x13, 0xF5, 0x71, 0x25, +/*05A0*/0x46, 0xF5, 0x72, 0xC3, 0x94, 0x3F, 0x40, 0x05, + 0xE4, 0xF5, 0x3D, 0x80, 0x40, 0xC3, 0x74, 0x3F, +/*05B0*/0x95, 0x72, 0xF5, 0x3D, 0x80, 0x37, 0xE5, 0x46, + 0x65, 0x47, 0x70, 0x0F, 0x75, 0x73, 
0x01, 0x75, +/*05C0*/0x75, 0x01, 0xF5, 0x3E, 0xF5, 0x3F, 0x75, 0x4E, + 0x01, 0x80, 0x22, 0xE4, 0xF5, 0x4E, 0xC3, 0xE5, +/*05D0*/0x47, 0x95, 0x46, 0xF5, 0x3E, 0xC3, 0x13, 0xF5, + 0x71, 0x25, 0x46, 0xF5, 0x72, 0xD3, 0x94, 0x3F, +/*05E0*/0x50, 0x05, 0xE4, 0xF5, 0x3F, 0x80, 0x06, 0xE5, + 0x72, 0x24, 0xC1, 0xF5, 0x3F, 0x05, 0x6F, 0xE5, +/*05F0*/0x6F, 0xC3, 0x94, 0x02, 0x50, 0x03, 0x02, 0x04, + 0x6E, 0xE5, 0x6D, 0x45, 0x6C, 0x70, 0x02, 0x80, +/*0600*/0x04, 0xE5, 0x74, 0x45, 0x75, 0x90, 0x07, 0x2F, + 0xF0, 0x7F, 0x01, 0xE5, 0x3E, 0x60, 0x04, 0xE5, +/*0610*/0x3C, 0x70, 0x14, 0xE4, 0xF5, 0x3C, 0xF5, 0x3D, + 0xF5, 0x3E, 0xF5, 0x3F, 0x12, 0x08, 0xD2, 0x70, +/*0620*/0x04, 0xF0, 0x02, 0x06, 0xA4, 0x80, 0x7A, 0xE5, + 0x3C, 0xC3, 0x95, 0x3E, 0x40, 0x07, 0xE5, 0x3C, +/*0630*/0x95, 0x3E, 0xFF, 0x80, 0x06, 0xC3, 0xE5, 0x3E, + 0x95, 0x3C, 0xFF, 0xE5, 0x76, 0xD3, 0x95, 0x79, +/*0640*/0x40, 0x05, 0x85, 0x76, 0x7A, 0x80, 0x03, 0x85, + 0x79, 0x7A, 0xE5, 0x77, 0xC3, 0x95, 0x78, 0x50, +/*0650*/0x05, 0x85, 0x77, 0x7B, 0x80, 0x03, 0x85, 0x78, + 0x7B, 0xE5, 0x7B, 0xD3, 0x95, 0x7A, 0x40, 0x30, +/*0660*/0xE5, 0x7B, 0x95, 0x7A, 0xF5, 0x3C, 0xF5, 0x3E, + 0xC3, 0xE5, 0x7B, 0x95, 0x7A, 0x90, 0x07, 0x19, +/*0670*/0xF0, 0xE5, 0x3C, 0xC3, 0x13, 0xF5, 0x71, 0x25, + 0x7A, 0xF5, 0x72, 0xC3, 0x94, 0x3F, 0x40, 0x05, +/*0680*/0xE4, 0xF5, 0x3D, 0x80, 0x1F, 0xC3, 0x74, 0x3F, + 0x95, 0x72, 0xF5, 0x3D, 0xF5, 0x3F, 0x80, 0x14, +/*0690*/0xE4, 0xF5, 0x3C, 0xF5, 0x3E, 0x90, 0x07, 0x19, + 0xF0, 0x12, 0x08, 0xD2, 0x70, 0x03, 0xF0, 0x80, +/*06A0*/0x03, 0x74, 0x01, 0xF0, 0x12, 0x08, 0x65, 0x75, + 0x83, 0xD0, 0xE0, 0x54, 0x0F, 0xFE, 0xAD, 0x3C, +/*06B0*/0x70, 0x02, 0x7E, 0x07, 0xBE, 0x0F, 0x02, 0x7E, + 0x80, 0xEE, 0xFB, 0xEF, 0xD3, 0x9B, 0x74, 0x80, +/*06C0*/0xF8, 0x98, 0x40, 0x1F, 0xE4, 0xF5, 0x3C, 0xF5, + 0x3E, 0x12, 0x08, 0xD2, 0x70, 0x03, 0xF0, 0x80, +/*06D0*/0x12, 0x74, 0x01, 0xF0, 0xE5, 0x08, 0xFB, 0xEB, + 0x44, 0x07, 0xF5, 0x82, 0x75, 0x83, 0xD2, 0xE0, +/*06E0*/0x44, 0x10, 0xF0, 0xE5, 0x08, 0xFB, 0xEB, 0x44, + 
0x09, 0xF5, 0x82, 0x75, 0x83, 0x9E, 0xED, 0xF0, +/*06F0*/0xEB, 0x44, 0x07, 0xF5, 0x82, 0x75, 0x83, 0xCA, + 0xED, 0xF0, 0x12, 0x08, 0x65, 0x75, 0x83, 0xCC, +/*0700*/0xEF, 0xF0, 0x22, 0xE5, 0x08, 0x44, 0x07, 0xF5, + 0x82, 0x75, 0x83, 0xBC, 0xE0, 0x54, 0xF0, 0xF0, +/*0710*/0xE5, 0x08, 0x44, 0x07, 0xF5, 0x82, 0x75, 0x83, + 0xBE, 0xE0, 0x54, 0xF0, 0xF0, 0xE5, 0x08, 0x44, +/*0720*/0x07, 0xF5, 0x82, 0x75, 0x83, 0xC0, 0xE0, 0x54, + 0xF0, 0xF0, 0xE5, 0x08, 0x44, 0x07, 0xF5, 0x82, +/*0730*/0x22, 0xF0, 0x90, 0x07, 0x28, 0xE0, 0xFE, 0xA3, + 0xE0, 0xF5, 0x82, 0x8E, 0x83, 0x22, 0x85, 0x42, +/*0740*/0x42, 0x85, 0x41, 0x41, 0x85, 0x40, 0x40, 0x74, + 0xC0, 0x2F, 0xF5, 0x82, 0x74, 0x02, 0x3E, 0xF5, +/*0750*/0x83, 0xE5, 0x42, 0xF0, 0x74, 0xE0, 0x2F, 0xF5, + 0x82, 0x74, 0x02, 0x3E, 0xF5, 0x83, 0x22, 0xE5, +/*0760*/0x42, 0x29, 0xFD, 0xE4, 0x33, 0xFC, 0xE5, 0x3C, + 0xC3, 0x9D, 0xEC, 0x64, 0x80, 0xF8, 0x74, 0x80, +/*0770*/0x98, 0x22, 0xF5, 0x83, 0xE0, 0x90, 0x07, 0x22, + 0x54, 0x1F, 0xFD, 0xE0, 0xFA, 0xA3, 0xE0, 0xF5, +/*0780*/0x82, 0x8A, 0x83, 0xED, 0xF0, 0x22, 0x90, 0x07, + 0x22, 0xE0, 0xFC, 0xA3, 0xE0, 0xF5, 0x82, 0x8C, +/*0790*/0x83, 0x22, 0x90, 0x07, 0x24, 0xFF, 0xED, 0x44, + 0x07, 0xCF, 0xF0, 0xA3, 0xEF, 0xF0, 0x22, 0x85, +/*07A0*/0x38, 0x38, 0x85, 0x39, 0x39, 0x85, 0x3A, 0x3A, + 0x74, 0xC0, 0x2F, 0xF5, 0x82, 0x74, 0x02, 0x3E, +/*07B0*/0xF5, 0x83, 0x22, 0x90, 0x07, 0x26, 0xFF, 0xED, + 0x44, 0x07, 0xCF, 0xF0, 0xA3, 0xEF, 0xF0, 0x22, +/*07C0*/0xF0, 0x74, 0xA0, 0x2F, 0xF5, 0x82, 0x74, 0x02, + 0x3E, 0xF5, 0x83, 0x22, 0x74, 0xC0, 0x25, 0x11, +/*07D0*/0xF5, 0x82, 0xE4, 0x34, 0x01, 0xF5, 0x83, 0x22, + 0x74, 0x00, 0x25, 0x11, 0xF5, 0x82, 0xE4, 0x34, +/*07E0*/0x02, 0xF5, 0x83, 0x22, 0x74, 0x60, 0x25, 0x11, + 0xF5, 0x82, 0xE4, 0x34, 0x03, 0xF5, 0x83, 0x22, +/*07F0*/0x74, 0x80, 0x25, 0x11, 0xF5, 0x82, 0xE4, 0x34, + 0x03, 0xF5, 0x83, 0x22, 0x74, 0xE0, 0x25, 0x11, +/*0800*/0xF5, 0x82, 0xE4, 0x34, 0x03, 0xF5, 0x83, 0x22, + 0x74, 0x40, 0x25, 0x11, 0xF5, 0x82, 0xE4, 0x34, +/*0810*/0x06, 0xF5, 
0x83, 0x22, 0x74, 0x80, 0x2F, 0xF5, + 0x82, 0x74, 0x02, 0x3E, 0xF5, 0x83, 0x22, 0xAF, +/*0820*/0x08, 0x7E, 0x00, 0xEF, 0x44, 0x07, 0xF5, 0x82, + 0x22, 0xF5, 0x83, 0xE5, 0x82, 0x44, 0x07, 0xF5, +/*0830*/0x82, 0xE5, 0x40, 0xF0, 0x22, 0x74, 0x40, 0x25, + 0x11, 0xF5, 0x82, 0xE4, 0x34, 0x02, 0xF5, 0x83, +/*0840*/0x22, 0x74, 0xC0, 0x25, 0x11, 0xF5, 0x82, 0xE4, + 0x34, 0x03, 0xF5, 0x83, 0x22, 0x74, 0x00, 0x25, +/*0850*/0x11, 0xF5, 0x82, 0xE4, 0x34, 0x06, 0xF5, 0x83, + 0x22, 0x74, 0x20, 0x25, 0x11, 0xF5, 0x82, 0xE4, +/*0860*/0x34, 0x06, 0xF5, 0x83, 0x22, 0xE5, 0x08, 0xFD, + 0xED, 0x44, 0x07, 0xF5, 0x82, 0x22, 0xE5, 0x41, +/*0870*/0xF0, 0xE5, 0x65, 0x64, 0x01, 0x45, 0x64, 0x22, + 0x7E, 0x00, 0xFB, 0x7A, 0x00, 0xFD, 0x7C, 0x00, +/*0880*/0x22, 0x74, 0x20, 0x25, 0x11, 0xF5, 0x82, 0xE4, + 0x34, 0x02, 0x22, 0x74, 0xA0, 0x25, 0x11, 0xF5, +/*0890*/0x82, 0xE4, 0x34, 0x03, 0x22, 0x85, 0x3E, 0x42, + 0x85, 0x3F, 0x41, 0x8F, 0x40, 0x22, 0x85, 0x3C, +/*08A0*/0x42, 0x85, 0x3D, 0x41, 0x8F, 0x40, 0x22, 0x75, + 0x45, 0x3F, 0x90, 0x07, 0x20, 0xE4, 0xF0, 0xA3, +/*08B0*/0x22, 0xF5, 0x83, 0xE5, 0x32, 0xF0, 0x05, 0x6E, + 0xE5, 0x6E, 0xC3, 0x94, 0x40, 0x22, 0xF0, 0xE5, +/*08C0*/0x08, 0x44, 0x06, 0xF5, 0x82, 0x22, 0x74, 0x00, + 0x25, 0x6E, 0xF5, 0x82, 0xE4, 0x34, 0x00, 0xF5, +/*08D0*/0x83, 0x22, 0xE5, 0x6D, 0x45, 0x6C, 0x90, 0x07, + 0x2F, 0x22, 0xE4, 0xF9, 0xE5, 0x3C, 0xD3, 0x95, +/*08E0*/0x3E, 0x22, 0x74, 0x80, 0x2E, 0xF5, 0x82, 0xE4, + 0x34, 0x02, 0xF5, 0x83, 0xE0, 0x22, 0x74, 0xA0, +/*08F0*/0x2E, 0xF5, 0x82, 0xE4, 0x34, 0x02, 0xF5, 0x83, + 0xE0, 0x22, 0x74, 0x80, 0x25, 0x6E, 0xF5, 0x82, +/*0900*/0xE4, 0x34, 0x00, 0x22, 0x25, 0x42, 0xFD, 0xE4, + 0x33, 0xFC, 0x22, 0x85, 0x42, 0x42, 0x85, 0x41, +/*0910*/0x41, 0x85, 0x40, 0x40, 0x22, 0xED, 0x4C, 0x60, + 0x03, 0x02, 0x09, 0xE5, 0xEF, 0x4E, 0x70, 0x37, +/*0920*/0x90, 0x07, 0x26, 0x12, 0x07, 0x89, 0xE0, 0xFD, + 0x12, 0x07, 0xCC, 0xED, 0xF0, 0x90, 0x07, 0x28, +/*0930*/0x12, 0x07, 0x89, 0xE0, 0xFD, 0x12, 0x07, 0xD8, + 0xED, 0xF0, 0x12, 0x07, 0x86, 0xE0, 
0x54, 0x1F, +/*0940*/0xFD, 0x12, 0x08, 0x81, 0xF5, 0x83, 0xED, 0xF0, + 0x90, 0x07, 0x24, 0x12, 0x07, 0x89, 0xE0, 0x54, +/*0950*/0x1F, 0xFD, 0x12, 0x08, 0x35, 0xED, 0xF0, 0xEF, + 0x64, 0x04, 0x4E, 0x70, 0x37, 0x90, 0x07, 0x26, +/*0960*/0x12, 0x07, 0x89, 0xE0, 0xFD, 0x12, 0x07, 0xE4, + 0xED, 0xF0, 0x90, 0x07, 0x28, 0x12, 0x07, 0x89, +/*0970*/0xE0, 0xFD, 0x12, 0x07, 0xF0, 0xED, 0xF0, 0x12, + 0x07, 0x86, 0xE0, 0x54, 0x1F, 0xFD, 0x12, 0x08, +/*0980*/0x8B, 0xF5, 0x83, 0xED, 0xF0, 0x90, 0x07, 0x24, + 0x12, 0x07, 0x89, 0xE0, 0x54, 0x1F, 0xFD, 0x12, +/*0990*/0x08, 0x41, 0xED, 0xF0, 0xEF, 0x64, 0x01, 0x4E, + 0x70, 0x04, 0x7D, 0x01, 0x80, 0x02, 0x7D, 0x00, +/*09A0*/0xEF, 0x64, 0x02, 0x4E, 0x70, 0x04, 0x7F, 0x01, + 0x80, 0x02, 0x7F, 0x00, 0xEF, 0x4D, 0x60, 0x78, +/*09B0*/0x90, 0x07, 0x26, 0x12, 0x07, 0x35, 0xE0, 0xFF, + 0x12, 0x07, 0xFC, 0xEF, 0x12, 0x07, 0x31, 0xE0, +/*09C0*/0xFF, 0x12, 0x08, 0x08, 0xEF, 0xF0, 0x90, 0x07, + 0x22, 0x12, 0x07, 0x35, 0xE0, 0x54, 0x1F, 0xFF, +/*09D0*/0x12, 0x08, 0x4D, 0xEF, 0xF0, 0x90, 0x07, 0x24, + 0x12, 0x07, 0x35, 0xE0, 0x54, 0x1F, 0xFF, 0x12, +/*09E0*/0x08, 0x59, 0xEF, 0xF0, 0x22, 0x12, 0x07, 0xCC, + 0xE4, 0xF0, 0x12, 0x07, 0xD8, 0xE4, 0xF0, 0x12, +/*09F0*/0x08, 0x81, 0xF5, 0x83, 0xE4, 0xF0, 0x12, 0x08, + 0x35, 0x74, 0x14, 0xF0, 0x12, 0x07, 0xE4, 0xE4, +/*0A00*/0xF0, 0x12, 0x07, 0xF0, 0xE4, 0xF0, 0x12, 0x08, + 0x8B, 0xF5, 0x83, 0xE4, 0xF0, 0x12, 0x08, 0x41, +/*0A10*/0x74, 0x14, 0xF0, 0x12, 0x07, 0xFC, 0xE4, 0xF0, + 0x12, 0x08, 0x08, 0xE4, 0xF0, 0x12, 0x08, 0x4D, +/*0A20*/0xE4, 0xF0, 0x12, 0x08, 0x59, 0x74, 0x14, 0xF0, + 0x22, 0x53, 0xF9, 0xF7, 0x75, 0xFC, 0x10, 0xE4, +/*0A30*/0xF5, 0xFD, 0x75, 0xFE, 0x30, 0xF5, 0xFF, 0xE5, + 0xE7, 0x20, 0xE7, 0x03, 0x43, 0xF9, 0x08, 0xE5, +/*0A40*/0xE6, 0x20, 0xE7, 0x0B, 0x78, 0xFF, 0xE4, 0xF6, + 0xD8, 0xFD, 0x53, 0xE6, 0xFE, 0x80, 0x09, 0x78, +/*0A50*/0x08, 0xE4, 0xF6, 0xD8, 0xFD, 0x53, 0xE6, 0xFE, + 0x75, 0x81, 0x80, 0xE4, 0xF5, 0xA8, 0xD2, 0xA8, +/*0A60*/0xC2, 0xA9, 0xD2, 0xAF, 0xE5, 0xE2, 0x20, 0xE5, + 
0x05, 0x20, 0xE6, 0x02, 0x80, 0x03, 0x43, 0xE1, +/*0A70*/0x02, 0xE5, 0xE2, 0x20, 0xE0, 0x0E, 0x90, 0x00, + 0x00, 0x7F, 0x00, 0x7E, 0x08, 0xE4, 0xF0, 0xA3, +/*0A80*/0xDF, 0xFC, 0xDE, 0xFA, 0x02, 0x0A, 0xDB, 0x43, + 0xFA, 0x01, 0xC0, 0xE0, 0xC0, 0xF0, 0xC0, 0x83, +/*0A90*/0xC0, 0x82, 0xC0, 0xD0, 0x12, 0x1C, 0xE7, 0xD0, + 0xD0, 0xD0, 0x82, 0xD0, 0x83, 0xD0, 0xF0, 0xD0, +/*0AA0*/0xE0, 0x53, 0xFA, 0xFE, 0x32, 0x02, 0x1B, 0x55, + 0xE4, 0x93, 0xA3, 0xF8, 0xE4, 0x93, 0xA3, 0xF6, +/*0AB0*/0x08, 0xDF, 0xF9, 0x80, 0x29, 0xE4, 0x93, 0xA3, + 0xF8, 0x54, 0x07, 0x24, 0x0C, 0xC8, 0xC3, 0x33, +/*0AC0*/0xC4, 0x54, 0x0F, 0x44, 0x20, 0xC8, 0x83, 0x40, + 0x04, 0xF4, 0x56, 0x80, 0x01, 0x46, 0xF6, 0xDF, +/*0AD0*/0xE4, 0x80, 0x0B, 0x01, 0x02, 0x04, 0x08, 0x10, + 0x20, 0x40, 0x80, 0x90, 0x00, 0x3F, 0xE4, 0x7E, +/*0AE0*/0x01, 0x93, 0x60, 0xC1, 0xA3, 0xFF, 0x54, 0x3F, + 0x30, 0xE5, 0x09, 0x54, 0x1F, 0xFE, 0xE4, 0x93, +/*0AF0*/0xA3, 0x60, 0x01, 0x0E, 0xCF, 0x54, 0xC0, 0x25, + 0xE0, 0x60, 0xAD, 0x40, 0xB8, 0x80, 0xFE, 0x8C, +/*0B00*/0x64, 0x8D, 0x65, 0x8A, 0x66, 0x8B, 0x67, 0xE4, + 0xF5, 0x69, 0xEF, 0x4E, 0x70, 0x03, 0x02, 0x1D, +/*0B10*/0x55, 0xE4, 0xF5, 0x68, 0xE5, 0x67, 0x45, 0x66, + 0x70, 0x32, 0x12, 0x07, 0x2A, 0x75, 0x83, 0x90, +/*0B20*/0xE4, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC2, 0xE4, + 0x12, 0x07, 0x29, 0x75, 0x83, 0xC4, 0xE4, 0x12, +/*0B30*/0x08, 0x70, 0x70, 0x29, 0x12, 0x07, 0x2A, 0x75, + 0x83, 0x92, 0xE4, 0x12, 0x07, 0x29, 0x75, 0x83, +/*0B40*/0xC6, 0xE4, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC8, + 0xE4, 0xF0, 0x80, 0x11, 0x90, 0x07, 0x26, 0x12, +/*0B50*/0x07, 0x35, 0xE4, 0x12, 0x08, 0x70, 0x70, 0x05, + 0x12, 0x07, 0x32, 0xE4, 0xF0, 0x12, 0x1D, 0x55, +/*0B60*/0x12, 0x1E, 0xBF, 0xE5, 0x67, 0x45, 0x66, 0x70, + 0x33, 0x12, 0x07, 0x2A, 0x75, 0x83, 0x90, 0xE5, +/*0B70*/0x41, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC2, 0xE5, + 0x41, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC4, 0x12, +/*0B80*/0x08, 0x6E, 0x70, 0x29, 0x12, 0x07, 0x2A, 0x75, + 0x83, 0x92, 0xE5, 0x40, 0x12, 0x07, 0x29, 0x75, +/*0B90*/0x83, 0xC6, 
0xE5, 0x40, 0x12, 0x07, 0x29, 0x75, + 0x83, 0xC8, 0x80, 0x0E, 0x90, 0x07, 0x26, 0x12, +/*0BA0*/0x07, 0x35, 0x12, 0x08, 0x6E, 0x70, 0x06, 0x12, + 0x07, 0x32, 0xE5, 0x40, 0xF0, 0xAF, 0x69, 0x7E, +/*0BB0*/0x00, 0xAD, 0x67, 0xAC, 0x66, 0x12, 0x04, 0x44, + 0x12, 0x07, 0x2A, 0x75, 0x83, 0xCA, 0xE0, 0xD3, +/*0BC0*/0x94, 0x00, 0x50, 0x0C, 0x05, 0x68, 0xE5, 0x68, + 0xC3, 0x94, 0x05, 0x50, 0x03, 0x02, 0x0B, 0x14, +/*0BD0*/0x22, 0x8C, 0x60, 0x8D, 0x61, 0x12, 0x08, 0xDA, + 0x74, 0x20, 0x40, 0x0D, 0x2F, 0xF5, 0x82, 0x74, +/*0BE0*/0x03, 0x3E, 0xF5, 0x83, 0xE5, 0x3E, 0xF0, 0x80, + 0x0B, 0x2F, 0xF5, 0x82, 0x74, 0x03, 0x3E, 0xF5, +/*0BF0*/0x83, 0xE5, 0x3C, 0xF0, 0xE5, 0x3C, 0xD3, 0x95, + 0x3E, 0x40, 0x3C, 0xE5, 0x61, 0x45, 0x60, 0x70, +/*0C00*/0x10, 0xE9, 0x12, 0x09, 0x04, 0xE5, 0x3E, 0x12, + 0x07, 0x68, 0x40, 0x3B, 0x12, 0x08, 0x95, 0x80, +/*0C10*/0x18, 0xE5, 0x3E, 0xC3, 0x95, 0x38, 0x40, 0x1D, + 0x85, 0x3E, 0x38, 0xE5, 0x3E, 0x60, 0x05, 0x85, +/*0C20*/0x3F, 0x39, 0x80, 0x03, 0x85, 0x39, 0x39, 0x8F, + 0x3A, 0x12, 0x08, 0x14, 0xE5, 0x3E, 0x12, 0x07, +/*0C30*/0xC0, 0xE5, 0x3F, 0xF0, 0x22, 0x80, 0x43, 0xE5, + 0x61, 0x45, 0x60, 0x70, 0x19, 0x12, 0x07, 0x5F, +/*0C40*/0x40, 0x05, 0x12, 0x08, 0x9E, 0x80, 0x27, 0x12, + 0x09, 0x0B, 0x12, 0x08, 0x14, 0xE5, 0x42, 0x12, +/*0C50*/0x07, 0xC0, 0xE5, 0x41, 0xF0, 0x22, 0xE5, 0x3C, + 0xC3, 0x95, 0x38, 0x40, 0x1D, 0x85, 0x3C, 0x38, +/*0C60*/0xE5, 0x3C, 0x60, 0x05, 0x85, 0x3D, 0x39, 0x80, + 0x03, 0x85, 0x39, 0x39, 0x8F, 0x3A, 0x12, 0x08, +/*0C70*/0x14, 0xE5, 0x3C, 0x12, 0x07, 0xC0, 0xE5, 0x3D, + 0xF0, 0x22, 0x85, 0x38, 0x38, 0x85, 0x39, 0x39, +/*0C80*/0x85, 0x3A, 0x3A, 0x12, 0x08, 0x14, 0xE5, 0x38, + 0x12, 0x07, 0xC0, 0xE5, 0x39, 0xF0, 0x22, 0x7F, +/*0C90*/0x06, 0x12, 0x17, 0x31, 0x12, 0x1D, 0x23, 0x12, + 0x0E, 0x04, 0x12, 0x0E, 0x33, 0xE0, 0x44, 0x0A, +/*0CA0*/0xF0, 0x74, 0x8E, 0xFE, 0x12, 0x0E, 0x04, 0x12, + 0x0E, 0x0B, 0xEF, 0xF0, 0xE5, 0x28, 0x30, 0xE5, +/*0CB0*/0x03, 0xD3, 0x80, 0x01, 0xC3, 0x40, 0x05, 0x75, + 0x14, 0x20, 0x80, 0x03, 0x75, 0x14, 
0x08, 0x12, +/*0CC0*/0x0E, 0x04, 0x75, 0x83, 0x8A, 0xE5, 0x14, 0xF0, + 0xB4, 0xFF, 0x05, 0x75, 0x12, 0x80, 0x80, 0x06, +/*0CD0*/0xE5, 0x14, 0xC3, 0x13, 0xF5, 0x12, 0xE4, 0xF5, + 0x16, 0xF5, 0x7F, 0x12, 0x19, 0x36, 0x12, 0x13, +/*0CE0*/0xA3, 0xE5, 0x0A, 0xC3, 0x94, 0x01, 0x50, 0x09, + 0x05, 0x16, 0xE5, 0x16, 0xC3, 0x94, 0x14, 0x40, +/*0CF0*/0xEA, 0xE5, 0xE4, 0x20, 0xE7, 0x28, 0x12, 0x0E, + 0x04, 0x75, 0x83, 0xD2, 0xE0, 0x54, 0x08, 0xD3, +/*0D00*/0x94, 0x00, 0x40, 0x04, 0x7F, 0x01, 0x80, 0x02, + 0x7F, 0x00, 0xE5, 0x0A, 0xC3, 0x94, 0x01, 0x40, +/*0D10*/0x04, 0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00, 0xEF, + 0x5E, 0x60, 0x03, 0x12, 0x1D, 0xD7, 0xE5, 0x7F, +/*0D20*/0xC3, 0x94, 0x11, 0x40, 0x14, 0x12, 0x0E, 0x04, + 0x75, 0x83, 0xD2, 0xE0, 0x44, 0x80, 0xF0, 0xE5, +/*0D30*/0xE4, 0x20, 0xE7, 0x0F, 0x12, 0x1D, 0xD7, 0x80, + 0x0A, 0x12, 0x0E, 0x04, 0x75, 0x83, 0xD2, 0xE0, +/*0D40*/0x54, 0x7F, 0xF0, 0x12, 0x1D, 0x23, 0x22, 0x74, + 0x8A, 0x85, 0x08, 0x82, 0xF5, 0x83, 0xE5, 0x17, +/*0D50*/0xF0, 0x12, 0x0E, 0x3A, 0xE4, 0xF0, 0x90, 0x07, + 0x02, 0xE0, 0x12, 0x0E, 0x17, 0x75, 0x83, 0x90, +/*0D60*/0xEF, 0xF0, 0x74, 0x92, 0xFE, 0xE5, 0x08, 0x44, + 0x07, 0xFF, 0xF5, 0x82, 0x8E, 0x83, 0xE0, 0x54, +/*0D70*/0xC0, 0xFD, 0x90, 0x07, 0x03, 0xE0, 0x54, 0x3F, + 0x4D, 0x8F, 0x82, 0x8E, 0x83, 0xF0, 0x90, 0x07, +/*0D80*/0x04, 0xE0, 0x12, 0x0E, 0x17, 0x75, 0x83, 0x82, + 0xEF, 0xF0, 0x90, 0x07, 0x05, 0xE0, 0xFF, 0xED, +/*0D90*/0x44, 0x07, 0xF5, 0x82, 0x75, 0x83, 0xB4, 0xEF, + 0x12, 0x0E, 0x03, 0x75, 0x83, 0x80, 0xE0, 0x54, +/*0DA0*/0xBF, 0xF0, 0x30, 0x37, 0x0A, 0x12, 0x0E, 0x91, + 0x75, 0x83, 0x94, 0xE0, 0x44, 0x80, 0xF0, 0x30, +/*0DB0*/0x38, 0x0A, 0x12, 0x0E, 0x91, 0x75, 0x83, 0x92, + 0xE0, 0x44, 0x80, 0xF0, 0xE5, 0x28, 0x30, 0xE4, +/*0DC0*/0x1A, 0x20, 0x39, 0x0A, 0x12, 0x0E, 0x04, 0x75, + 0x83, 0x88, 0xE0, 0x54, 0x7F, 0xF0, 0x20, 0x3A, +/*0DD0*/0x0A, 0x12, 0x0E, 0x04, 0x75, 0x83, 0x88, 0xE0, + 0x54, 0xBF, 0xF0, 0x74, 0x8C, 0xFE, 0x12, 0x0E, +/*0DE0*/0x04, 0x8E, 0x83, 0xE0, 0x54, 0x0F, 0x12, 0x0E, + 
0x03, 0x75, 0x83, 0x86, 0xE0, 0x54, 0xBF, 0xF0, +/*0DF0*/0xE5, 0x08, 0x44, 0x06, 0x12, 0x0D, 0xFD, 0x75, + 0x83, 0x8A, 0xE4, 0xF0, 0x22, 0xF5, 0x82, 0x75, +/*0E00*/0x83, 0x82, 0xE4, 0xF0, 0xE5, 0x08, 0x44, 0x07, + 0xF5, 0x82, 0x22, 0x8E, 0x83, 0xE0, 0xF5, 0x10, +/*0E10*/0x54, 0xFE, 0xF0, 0xE5, 0x10, 0x44, 0x01, 0xFF, + 0xE5, 0x08, 0xFD, 0xED, 0x44, 0x07, 0xF5, 0x82, +/*0E20*/0x22, 0xE5, 0x15, 0xC4, 0x54, 0x07, 0xFF, 0xE5, + 0x08, 0xFD, 0xED, 0x44, 0x08, 0xF5, 0x82, 0x75, +/*0E30*/0x83, 0x82, 0x22, 0x75, 0x83, 0x80, 0xE0, 0x44, + 0x40, 0xF0, 0xE5, 0x08, 0x44, 0x08, 0xF5, 0x82, +/*0E40*/0x75, 0x83, 0x8A, 0x22, 0xE5, 0x16, 0x25, 0xE0, + 0x25, 0xE0, 0x24, 0xAF, 0xF5, 0x82, 0xE4, 0x34, +/*0E50*/0x1A, 0xF5, 0x83, 0xE4, 0x93, 0xF5, 0x0D, 0x22, + 0x43, 0xE1, 0x10, 0x43, 0xE1, 0x80, 0x53, 0xE1, +/*0E60*/0xFD, 0x85, 0xE1, 0x10, 0x22, 0xE5, 0x16, 0x25, + 0xE0, 0x25, 0xE0, 0x24, 0xB2, 0xF5, 0x82, 0xE4, +/*0E70*/0x34, 0x1A, 0xF5, 0x83, 0xE4, 0x93, 0x22, 0x85, + 0x55, 0x82, 0x85, 0x54, 0x83, 0xE5, 0x15, 0xF0, +/*0E80*/0x22, 0xE5, 0xE2, 0x54, 0x20, 0xD3, 0x94, 0x00, + 0x22, 0xE5, 0xE2, 0x54, 0x40, 0xD3, 0x94, 0x00, +/*0E90*/0x22, 0xE5, 0x08, 0x44, 0x06, 0xF5, 0x82, 0x22, + 0xFD, 0xE5, 0x08, 0xFB, 0xEB, 0x44, 0x07, 0xF5, +/*0EA0*/0x82, 0x22, 0x53, 0xF9, 0xF7, 0x75, 0xFE, 0x30, + 0x22, 0xEF, 0x4E, 0x70, 0x26, 0x12, 0x07, 0xCC, +/*0EB0*/0xE0, 0xFD, 0x90, 0x07, 0x26, 0x12, 0x07, 0x7B, + 0x12, 0x07, 0xD8, 0xE0, 0xFD, 0x90, 0x07, 0x28, +/*0EC0*/0x12, 0x07, 0x7B, 0x12, 0x08, 0x81, 0x12, 0x07, + 0x72, 0x12, 0x08, 0x35, 0xE0, 0x90, 0x07, 0x24, +/*0ED0*/0x12, 0x07, 0x78, 0xEF, 0x64, 0x04, 0x4E, 0x70, + 0x29, 0x12, 0x07, 0xE4, 0xE0, 0xFD, 0x90, 0x07, +/*0EE0*/0x26, 0x12, 0x07, 0x7B, 0x12, 0x07, 0xF0, 0xE0, + 0xFD, 0x90, 0x07, 0x28, 0x12, 0x07, 0x7B, 0x12, +/*0EF0*/0x08, 0x8B, 0x12, 0x07, 0x72, 0x12, 0x08, 0x41, + 0xE0, 0x54, 0x1F, 0xFD, 0x90, 0x07, 0x24, 0x12, +/*0F00*/0x07, 0x7B, 0xEF, 0x64, 0x01, 0x4E, 0x70, 0x04, + 0x7D, 0x01, 0x80, 0x02, 0x7D, 0x00, 0xEF, 0x64, +/*0F10*/0x02, 0x4E, 
0x70, 0x04, 0x7F, 0x01, 0x80, 0x02, + 0x7F, 0x00, 0xEF, 0x4D, 0x60, 0x35, 0x12, 0x07, +/*0F20*/0xFC, 0xE0, 0xFF, 0x90, 0x07, 0x26, 0x12, 0x07, + 0x89, 0xEF, 0xF0, 0x12, 0x08, 0x08, 0xE0, 0xFF, +/*0F30*/0x90, 0x07, 0x28, 0x12, 0x07, 0x89, 0xEF, 0xF0, + 0x12, 0x08, 0x4D, 0xE0, 0x54, 0x1F, 0xFF, 0x12, +/*0F40*/0x07, 0x86, 0xEF, 0xF0, 0x12, 0x08, 0x59, 0xE0, + 0x54, 0x1F, 0xFF, 0x90, 0x07, 0x24, 0x12, 0x07, +/*0F50*/0x89, 0xEF, 0xF0, 0x22, 0xE4, 0xF5, 0x53, 0x12, + 0x0E, 0x81, 0x40, 0x04, 0x7F, 0x01, 0x80, 0x02, +/*0F60*/0x7F, 0x00, 0x12, 0x0E, 0x89, 0x40, 0x04, 0x7E, + 0x01, 0x80, 0x02, 0x7E, 0x00, 0xEE, 0x4F, 0x70, +/*0F70*/0x03, 0x02, 0x0F, 0xF6, 0x85, 0xE1, 0x10, 0x43, + 0xE1, 0x02, 0x53, 0xE1, 0x0F, 0x85, 0xE1, 0x10, +/*0F80*/0xE4, 0xF5, 0x51, 0xE5, 0xE3, 0x54, 0x3F, 0xF5, + 0x52, 0x12, 0x0E, 0x89, 0x40, 0x1D, 0xAD, 0x52, +/*0F90*/0xAF, 0x51, 0x12, 0x11, 0x18, 0xEF, 0x60, 0x08, + 0x85, 0xE1, 0x10, 0x43, 0xE1, 0x40, 0x80, 0x0B, +/*0FA0*/0x53, 0xE1, 0xBF, 0x12, 0x0E, 0x58, 0x12, 0x00, + 0x06, 0x80, 0xFB, 0xE5, 0xE3, 0x54, 0x3F, 0xF5, +/*0FB0*/0x51, 0xE5, 0xE4, 0x54, 0x3F, 0xF5, 0x52, 0x12, + 0x0E, 0x81, 0x40, 0x1D, 0xAD, 0x52, 0xAF, 0x51, +/*0FC0*/0x12, 0x11, 0x18, 0xEF, 0x60, 0x08, 0x85, 0xE1, + 0x10, 0x43, 0xE1, 0x20, 0x80, 0x0B, 0x53, 0xE1, +/*0FD0*/0xDF, 0x12, 0x0E, 0x58, 0x12, 0x00, 0x06, 0x80, + 0xFB, 0x12, 0x0E, 0x81, 0x40, 0x04, 0x7F, 0x01, +/*0FE0*/0x80, 0x02, 0x7F, 0x00, 0x12, 0x0E, 0x89, 0x40, + 0x04, 0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00, 0xEE, +/*0FF0*/0x4F, 0x60, 0x03, 0x12, 0x0E, 0x5B, 0x22, 0x12, + 0x0E, 0x21, 0xEF, 0xF0, 0x12, 0x10, 0x91, 0x22, +/*1000*/0x02, 0x11, 0x00, 0x02, 0x10, 0x40, 0x02, 0x10, + 0x90, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1010*/0x01, 0x20, 0x01, 0x20, 0xE4, 0xF5, 0x57, 0x12, + 0x16, 0xBD, 0x12, 0x16, 0x44, 0xE4, 0x12, 0x10, +/*1020*/0x56, 0x12, 0x14, 0xB7, 0x90, 0x07, 0x26, 0x12, + 0x07, 0x35, 0xE4, 0x12, 0x07, 0x31, 0xE4, 0xF0, +/*1030*/0x12, 0x10, 0x56, 0x12, 0x14, 0xB7, 0x90, 0x07, + 0x26, 0x12, 0x07, 0x35, 0xE5, 0x41, 
0x12, 0x07, +/*1040*/0x31, 0xE5, 0x40, 0xF0, 0xAF, 0x57, 0x7E, 0x00, + 0xAD, 0x56, 0x7C, 0x00, 0x12, 0x04, 0x44, 0xAF, +/*1050*/0x56, 0x7E, 0x00, 0x02, 0x11, 0xEE, 0xFF, 0x90, + 0x07, 0x20, 0xA3, 0xE0, 0xFD, 0xE4, 0xF5, 0x56, +/*1060*/0xF5, 0x40, 0xFE, 0xFC, 0xAB, 0x56, 0xFA, 0x12, + 0x11, 0x51, 0x7F, 0x0F, 0x7D, 0x18, 0xE4, 0xF5, +/*1070*/0x56, 0xF5, 0x40, 0xFE, 0xFC, 0xAB, 0x56, 0xFA, + 0x12, 0x15, 0x41, 0xAF, 0x56, 0x7E, 0x00, 0x12, +/*1080*/0x1A, 0xFF, 0xE4, 0xFF, 0xF5, 0x56, 0x7D, 0x1F, + 0xF5, 0x40, 0xFE, 0xFC, 0xAB, 0x56, 0xFA, 0x22, +/*1090*/0x22, 0xE4, 0xF5, 0x55, 0xE5, 0x08, 0xFD, 0x74, + 0xA0, 0xF5, 0x56, 0xED, 0x44, 0x07, 0xF5, 0x57, +/*10A0*/0xE5, 0x28, 0x30, 0xE5, 0x03, 0xD3, 0x80, 0x01, + 0xC3, 0x40, 0x05, 0x7F, 0x28, 0xEF, 0x80, 0x04, +/*10B0*/0x7F, 0x14, 0xEF, 0xC3, 0x13, 0xF5, 0x54, 0xE4, + 0xF9, 0x12, 0x0E, 0x18, 0x75, 0x83, 0x8E, 0xE0, +/*10C0*/0xF5, 0x10, 0xCE, 0xEF, 0xCE, 0xEE, 0xD3, 0x94, + 0x00, 0x40, 0x26, 0xE5, 0x10, 0x54, 0xFE, 0x12, +/*10D0*/0x0E, 0x98, 0x75, 0x83, 0x8E, 0xED, 0xF0, 0xE5, + 0x10, 0x44, 0x01, 0xFD, 0xEB, 0x44, 0x07, 0xF5, +/*10E0*/0x82, 0xED, 0xF0, 0x85, 0x57, 0x82, 0x85, 0x56, + 0x83, 0xE0, 0x30, 0xE3, 0x01, 0x09, 0x1E, 0x80, +/*10F0*/0xD4, 0xC2, 0x34, 0xE9, 0xC3, 0x95, 0x54, 0x40, + 0x02, 0xD2, 0x34, 0x22, 0x02, 0x00, 0x06, 0x22, +/*1100*/0x30, 0x30, 0x11, 0x90, 0x10, 0x00, 0xE4, 0x93, + 0xF5, 0x10, 0x90, 0x10, 0x10, 0xE4, 0x93, 0xF5, +/*1110*/0x10, 0x12, 0x10, 0x90, 0x12, 0x11, 0x50, 0x22, + 0xE4, 0xFC, 0xC3, 0xED, 0x9F, 0xFA, 0xEF, 0xF5, +/*1120*/0x83, 0x75, 0x82, 0x00, 0x79, 0xFF, 0xE4, 0x93, + 0xCC, 0x6C, 0xCC, 0xA3, 0xD9, 0xF8, 0xDA, 0xF6, +/*1130*/0xE5, 0xE2, 0x30, 0xE4, 0x02, 0x8C, 0xE5, 0xED, + 0x24, 0xFF, 0xFF, 0xEF, 0x75, 0x82, 0xFF, 0xF5, +/*1140*/0x83, 0xE4, 0x93, 0x6C, 0x70, 0x03, 0x7F, 0x01, + 0x22, 0x7F, 0x00, 0x22, 0x22, 0x11, 0x00, 0x00, +/*1150*/0x22, 0x8E, 0x58, 0x8F, 0x59, 0x8C, 0x5A, 0x8D, + 0x5B, 0x8A, 0x5C, 0x8B, 0x5D, 0x75, 0x5E, 0x01, +/*1160*/0xE4, 0xF5, 0x5F, 0xF5, 0x60, 0xF5, 0x62, 0x12, + 
0x07, 0x2A, 0x75, 0x83, 0xD0, 0xE0, 0xFF, 0xC4, +/*1170*/0x54, 0x0F, 0xF5, 0x61, 0x12, 0x1E, 0xA5, 0x85, + 0x59, 0x5E, 0xD3, 0xE5, 0x5E, 0x95, 0x5B, 0xE5, +/*1180*/0x5A, 0x12, 0x07, 0x6B, 0x50, 0x4B, 0x12, 0x07, + 0x03, 0x75, 0x83, 0xBC, 0xE0, 0x45, 0x5E, 0x12, +/*1190*/0x07, 0x29, 0x75, 0x83, 0xBE, 0xE0, 0x45, 0x5E, + 0x12, 0x07, 0x29, 0x75, 0x83, 0xC0, 0xE0, 0x45, +/*11A0*/0x5E, 0xF0, 0xAF, 0x5F, 0xE5, 0x60, 0x12, 0x08, + 0x78, 0x12, 0x0A, 0xFF, 0xAF, 0x62, 0x7E, 0x00, +/*11B0*/0xAD, 0x5D, 0xAC, 0x5C, 0x12, 0x04, 0x44, 0xE5, + 0x61, 0xAF, 0x5E, 0x7E, 0x00, 0xB4, 0x03, 0x05, +/*11C0*/0x12, 0x1E, 0x21, 0x80, 0x07, 0xAD, 0x5D, 0xAC, + 0x5C, 0x12, 0x13, 0x17, 0x05, 0x5E, 0x02, 0x11, +/*11D0*/0x7A, 0x12, 0x07, 0x03, 0x75, 0x83, 0xBC, 0xE0, + 0x45, 0x40, 0x12, 0x07, 0x29, 0x75, 0x83, 0xBE, +/*11E0*/0xE0, 0x45, 0x40, 0x12, 0x07, 0x29, 0x75, 0x83, + 0xC0, 0xE0, 0x45, 0x40, 0xF0, 0x22, 0x8E, 0x58, +/*11F0*/0x8F, 0x59, 0x75, 0x5A, 0x01, 0x79, 0x01, 0x75, + 0x5B, 0x01, 0xE4, 0xFB, 0x12, 0x07, 0x2A, 0x75, +/*1200*/0x83, 0xAE, 0xE0, 0x54, 0x1A, 0xFF, 0x12, 0x08, + 0x65, 0xE0, 0xC4, 0x13, 0x54, 0x07, 0xFE, 0xEF, +/*1210*/0x70, 0x0C, 0xEE, 0x65, 0x35, 0x70, 0x07, 0x90, + 0x07, 0x2F, 0xE0, 0xB4, 0x01, 0x0D, 0xAF, 0x35, +/*1220*/0x7E, 0x00, 0x12, 0x0E, 0xA9, 0xCF, 0xEB, 0xCF, + 0x02, 0x1E, 0x60, 0xE5, 0x59, 0x64, 0x02, 0x45, +/*1230*/0x58, 0x70, 0x04, 0x7F, 0x01, 0x80, 0x02, 0x7F, + 0x00, 0xE5, 0x59, 0x45, 0x58, 0x70, 0x04, 0x7E, +/*1240*/0x01, 0x80, 0x02, 0x7E, 0x00, 0xEE, 0x4F, 0x60, + 0x23, 0x85, 0x41, 0x49, 0x85, 0x40, 0x4B, 0xE5, +/*1250*/0x59, 0x45, 0x58, 0x70, 0x2C, 0xAF, 0x5A, 0xFE, + 0xCD, 0xE9, 0xCD, 0xFC, 0xAB, 0x59, 0xAA, 0x58, +/*1260*/0x12, 0x0A, 0xFF, 0xAF, 0x5B, 0x7E, 0x00, 0x12, + 0x1E, 0x60, 0x80, 0x15, 0xAF, 0x5B, 0x7E, 0x00, +/*1270*/0x12, 0x1E, 0x60, 0x90, 0x07, 0x26, 0x12, 0x07, + 0x35, 0xE5, 0x49, 0x12, 0x07, 0x31, 0xE5, 0x4B, +/*1280*/0xF0, 0xE4, 0xFD, 0xAF, 0x35, 0xFE, 0xFC, 0x12, + 0x09, 0x15, 0x22, 0x8C, 0x64, 0x8D, 0x65, 0x12, +/*1290*/0x08, 0xDA, 
0x40, 0x3C, 0xE5, 0x65, 0x45, 0x64, + 0x70, 0x10, 0x12, 0x09, 0x04, 0xC3, 0xE5, 0x3E, +/*12A0*/0x12, 0x07, 0x69, 0x40, 0x3B, 0x12, 0x08, 0x95, + 0x80, 0x18, 0xE5, 0x3E, 0xC3, 0x95, 0x38, 0x40, +/*12B0*/0x1D, 0x85, 0x3E, 0x38, 0xE5, 0x3E, 0x60, 0x05, + 0x85, 0x3F, 0x39, 0x80, 0x03, 0x85, 0x39, 0x39, +/*12C0*/0x8F, 0x3A, 0x12, 0x07, 0xA8, 0xE5, 0x3E, 0x12, + 0x07, 0x53, 0xE5, 0x3F, 0xF0, 0x22, 0x80, 0x3B, +/*12D0*/0xE5, 0x65, 0x45, 0x64, 0x70, 0x11, 0x12, 0x07, + 0x5F, 0x40, 0x05, 0x12, 0x08, 0x9E, 0x80, 0x1F, +/*12E0*/0x12, 0x07, 0x3E, 0xE5, 0x41, 0xF0, 0x22, 0xE5, + 0x3C, 0xC3, 0x95, 0x38, 0x40, 0x1D, 0x85, 0x3C, +/*12F0*/0x38, 0xE5, 0x3C, 0x60, 0x05, 0x85, 0x3D, 0x39, + 0x80, 0x03, 0x85, 0x39, 0x39, 0x8F, 0x3A, 0x12, +/*1300*/0x07, 0xA8, 0xE5, 0x3C, 0x12, 0x07, 0x53, 0xE5, + 0x3D, 0xF0, 0x22, 0x12, 0x07, 0x9F, 0xE5, 0x38, +/*1310*/0x12, 0x07, 0x53, 0xE5, 0x39, 0xF0, 0x22, 0x8C, + 0x63, 0x8D, 0x64, 0x12, 0x08, 0xDA, 0x40, 0x3C, +/*1320*/0xE5, 0x64, 0x45, 0x63, 0x70, 0x10, 0x12, 0x09, + 0x04, 0xC3, 0xE5, 0x3E, 0x12, 0x07, 0x69, 0x40, +/*1330*/0x3B, 0x12, 0x08, 0x95, 0x80, 0x18, 0xE5, 0x3E, + 0xC3, 0x95, 0x38, 0x40, 0x1D, 0x85, 0x3E, 0x38, +/*1340*/0xE5, 0x3E, 0x60, 0x05, 0x85, 0x3F, 0x39, 0x80, + 0x03, 0x85, 0x39, 0x39, 0x8F, 0x3A, 0x12, 0x07, +/*1350*/0xA8, 0xE5, 0x3E, 0x12, 0x07, 0x53, 0xE5, 0x3F, + 0xF0, 0x22, 0x80, 0x3B, 0xE5, 0x64, 0x45, 0x63, +/*1360*/0x70, 0x11, 0x12, 0x07, 0x5F, 0x40, 0x05, 0x12, + 0x08, 0x9E, 0x80, 0x1F, 0x12, 0x07, 0x3E, 0xE5, +/*1370*/0x41, 0xF0, 0x22, 0xE5, 0x3C, 0xC3, 0x95, 0x38, + 0x40, 0x1D, 0x85, 0x3C, 0x38, 0xE5, 0x3C, 0x60, +/*1380*/0x05, 0x85, 0x3D, 0x39, 0x80, 0x03, 0x85, 0x39, + 0x39, 0x8F, 0x3A, 0x12, 0x07, 0xA8, 0xE5, 0x3C, +/*1390*/0x12, 0x07, 0x53, 0xE5, 0x3D, 0xF0, 0x22, 0x12, + 0x07, 0x9F, 0xE5, 0x38, 0x12, 0x07, 0x53, 0xE5, +/*13A0*/0x39, 0xF0, 0x22, 0xE5, 0x0D, 0xFE, 0xE5, 0x08, + 0x8E, 0x54, 0x44, 0x05, 0xF5, 0x55, 0x75, 0x15, +/*13B0*/0x0F, 0xF5, 0x82, 0x12, 0x0E, 0x7A, 0x12, 0x17, + 0xA3, 0x20, 0x31, 0x05, 0x75, 0x15, 
0x03, 0x80, +/*13C0*/0x03, 0x75, 0x15, 0x0B, 0xE5, 0x0A, 0xC3, 0x94, + 0x01, 0x50, 0x38, 0x12, 0x14, 0x20, 0x20, 0x31, +/*13D0*/0x06, 0x05, 0x15, 0x05, 0x15, 0x80, 0x04, 0x15, + 0x15, 0x15, 0x15, 0xE5, 0x0A, 0xC3, 0x94, 0x01, +/*13E0*/0x50, 0x21, 0x12, 0x14, 0x20, 0x20, 0x31, 0x04, + 0x05, 0x15, 0x80, 0x02, 0x15, 0x15, 0xE5, 0x0A, +/*13F0*/0xC3, 0x94, 0x01, 0x50, 0x0E, 0x12, 0x0E, 0x77, + 0x12, 0x17, 0xA3, 0x20, 0x31, 0x05, 0x05, 0x15, +/*1400*/0x12, 0x0E, 0x77, 0xE5, 0x15, 0xB4, 0x08, 0x04, + 0x7F, 0x01, 0x80, 0x02, 0x7F, 0x00, 0xE5, 0x15, +/*1410*/0xB4, 0x07, 0x04, 0x7E, 0x01, 0x80, 0x02, 0x7E, + 0x00, 0xEE, 0x4F, 0x60, 0x02, 0x05, 0x7F, 0x22, +/*1420*/0x85, 0x55, 0x82, 0x85, 0x54, 0x83, 0xE5, 0x15, + 0xF0, 0x12, 0x17, 0xA3, 0x22, 0x12, 0x07, 0x2A, +/*1430*/0x75, 0x83, 0xAE, 0x74, 0xFF, 0x12, 0x07, 0x29, + 0xE0, 0x54, 0x1A, 0xF5, 0x34, 0xE0, 0xC4, 0x13, +/*1440*/0x54, 0x07, 0xF5, 0x35, 0x24, 0xFE, 0x60, 0x24, + 0x24, 0xFE, 0x60, 0x3C, 0x24, 0x04, 0x70, 0x63, +/*1450*/0x75, 0x31, 0x2D, 0xE5, 0x08, 0xFD, 0x74, 0xB6, + 0x12, 0x07, 0x92, 0x74, 0xBC, 0x90, 0x07, 0x22, +/*1460*/0x12, 0x07, 0x95, 0x74, 0x90, 0x12, 0x07, 0xB3, + 0x74, 0x92, 0x80, 0x3C, 0x75, 0x31, 0x3A, 0xE5, +/*1470*/0x08, 0xFD, 0x74, 0xBA, 0x12, 0x07, 0x92, 0x74, + 0xC0, 0x90, 0x07, 0x22, 0x12, 0x07, 0xB6, 0x74, +/*1480*/0xC4, 0x12, 0x07, 0xB3, 0x74, 0xC8, 0x80, 0x20, + 0x75, 0x31, 0x35, 0xE5, 0x08, 0xFD, 0x74, 0xB8, +/*1490*/0x12, 0x07, 0x92, 0x74, 0xBE, 0xFF, 0xED, 0x44, + 0x07, 0x90, 0x07, 0x22, 0xCF, 0xF0, 0xA3, 0xEF, +/*14A0*/0xF0, 0x74, 0xC2, 0x12, 0x07, 0xB3, 0x74, 0xC6, + 0xFF, 0xED, 0x44, 0x07, 0xA3, 0xCF, 0xF0, 0xA3, +/*14B0*/0xEF, 0xF0, 0x22, 0x75, 0x34, 0x01, 0x22, 0x8E, + 0x58, 0x8F, 0x59, 0x8C, 0x5A, 0x8D, 0x5B, 0x8A, +/*14C0*/0x5C, 0x8B, 0x5D, 0x75, 0x5E, 0x01, 0xE4, 0xF5, + 0x5F, 0x12, 0x1E, 0xA5, 0x85, 0x59, 0x5E, 0xD3, +/*14D0*/0xE5, 0x5E, 0x95, 0x5B, 0xE5, 0x5A, 0x12, 0x07, + 0x6B, 0x50, 0x57, 0xE5, 0x5D, 0x45, 0x5C, 0x70, +/*14E0*/0x30, 0x12, 0x07, 0x2A, 0x75, 0x83, 0x92, 0xE5, + 
0x5E, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC6, 0xE5, +/*14F0*/0x5E, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC8, 0xE5, + 0x5E, 0x12, 0x07, 0x29, 0x75, 0x83, 0x90, 0xE5, +/*1500*/0x5E, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC2, 0xE5, + 0x5E, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC4, 0x80, +/*1510*/0x03, 0x12, 0x07, 0x32, 0xE5, 0x5E, 0xF0, 0xAF, + 0x5F, 0x7E, 0x00, 0xAD, 0x5D, 0xAC, 0x5C, 0x12, +/*1520*/0x04, 0x44, 0xAF, 0x5E, 0x7E, 0x00, 0xAD, 0x5D, + 0xAC, 0x5C, 0x12, 0x0B, 0xD1, 0x05, 0x5E, 0x02, +/*1530*/0x14, 0xCF, 0xAB, 0x5D, 0xAA, 0x5C, 0xAD, 0x5B, + 0xAC, 0x5A, 0xAF, 0x59, 0xAE, 0x58, 0x02, 0x1B, +/*1540*/0xFB, 0x8C, 0x5C, 0x8D, 0x5D, 0x8A, 0x5E, 0x8B, + 0x5F, 0x75, 0x60, 0x01, 0xE4, 0xF5, 0x61, 0xF5, +/*1550*/0x62, 0xF5, 0x63, 0x12, 0x1E, 0xA5, 0x8F, 0x60, + 0xD3, 0xE5, 0x60, 0x95, 0x5D, 0xE5, 0x5C, 0x12, +/*1560*/0x07, 0x6B, 0x50, 0x61, 0xE5, 0x5F, 0x45, 0x5E, + 0x70, 0x27, 0x12, 0x07, 0x2A, 0x75, 0x83, 0xB6, +/*1570*/0xE5, 0x60, 0x12, 0x07, 0x29, 0x75, 0x83, 0xB8, + 0xE5, 0x60, 0x12, 0x07, 0x29, 0x75, 0x83, 0xBA, +/*1580*/0xE5, 0x60, 0xF0, 0xAF, 0x61, 0x7E, 0x00, 0xE5, + 0x62, 0x12, 0x08, 0x7A, 0x12, 0x0A, 0xFF, 0x80, +/*1590*/0x19, 0x90, 0x07, 0x24, 0x12, 0x07, 0x35, 0xE5, + 0x60, 0x12, 0x07, 0x29, 0x75, 0x83, 0x8E, 0xE4, +/*15A0*/0x12, 0x07, 0x29, 0x74, 0x01, 0x12, 0x07, 0x29, + 0xE4, 0xF0, 0xAF, 0x63, 0x7E, 0x00, 0xAD, 0x5F, +/*15B0*/0xAC, 0x5E, 0x12, 0x04, 0x44, 0xAF, 0x60, 0x7E, + 0x00, 0xAD, 0x5F, 0xAC, 0x5E, 0x12, 0x12, 0x8B, +/*15C0*/0x05, 0x60, 0x02, 0x15, 0x58, 0x22, 0x90, 0x11, + 0x4D, 0xE4, 0x93, 0x90, 0x07, 0x2E, 0xF0, 0x12, +/*15D0*/0x08, 0x1F, 0x75, 0x83, 0xAE, 0xE0, 0x54, 0x1A, + 0xF5, 0x34, 0x70, 0x67, 0xEF, 0x44, 0x07, 0xF5, +/*15E0*/0x82, 0x75, 0x83, 0xCE, 0xE0, 0xFF, 0x13, 0x13, + 0x13, 0x54, 0x07, 0xF5, 0x36, 0x54, 0x0F, 0xD3, +/*15F0*/0x94, 0x00, 0x40, 0x06, 0x12, 0x14, 0x2D, 0x12, + 0x1B, 0xA9, 0xE5, 0x36, 0x54, 0x0F, 0x24, 0xFE, +/*1600*/0x60, 0x0C, 0x14, 0x60, 0x0C, 0x14, 0x60, 0x19, + 0x24, 0x03, 0x70, 0x37, 0x80, 0x10, 0x02, 0x1E, +/*1610*/0x91, 0x12, 
0x1E, 0x91, 0x12, 0x07, 0x2A, 0x75, + 0x83, 0xCE, 0xE0, 0x54, 0xEF, 0xF0, 0x02, 0x1D, +/*1620*/0xAE, 0x12, 0x10, 0x14, 0xE4, 0xF5, 0x55, 0x12, + 0x1D, 0x85, 0x05, 0x55, 0xE5, 0x55, 0xC3, 0x94, +/*1630*/0x05, 0x40, 0xF4, 0x12, 0x07, 0x2A, 0x75, 0x83, + 0xCE, 0xE0, 0x54, 0xC7, 0x12, 0x07, 0x29, 0xE0, +/*1640*/0x44, 0x08, 0xF0, 0x22, 0xE4, 0xF5, 0x58, 0xF5, + 0x59, 0xAF, 0x08, 0xEF, 0x44, 0x07, 0xF5, 0x82, +/*1650*/0x75, 0x83, 0xD0, 0xE0, 0xFD, 0xC4, 0x54, 0x0F, + 0xF5, 0x5A, 0xEF, 0x44, 0x07, 0xF5, 0x82, 0x75, +/*1660*/0x83, 0x80, 0x74, 0x01, 0xF0, 0x12, 0x08, 0x21, + 0x75, 0x83, 0x82, 0xE5, 0x45, 0xF0, 0xEF, 0x44, +/*1670*/0x07, 0xF5, 0x82, 0x75, 0x83, 0x8A, 0x74, 0xFF, + 0xF0, 0x12, 0x1A, 0x4D, 0x12, 0x07, 0x2A, 0x75, +/*1680*/0x83, 0xBC, 0xE0, 0x54, 0xEF, 0x12, 0x07, 0x29, + 0x75, 0x83, 0xBE, 0xE0, 0x54, 0xEF, 0x12, 0x07, +/*1690*/0x29, 0x75, 0x83, 0xC0, 0xE0, 0x54, 0xEF, 0x12, + 0x07, 0x29, 0x75, 0x83, 0xBC, 0xE0, 0x44, 0x10, +/*16A0*/0x12, 0x07, 0x29, 0x75, 0x83, 0xBE, 0xE0, 0x44, + 0x10, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC0, 0xE0, +/*16B0*/0x44, 0x10, 0xF0, 0xAF, 0x58, 0xE5, 0x59, 0x12, + 0x08, 0x78, 0x02, 0x0A, 0xFF, 0xE4, 0xF5, 0x58, +/*16C0*/0x7D, 0x01, 0xF5, 0x59, 0xAF, 0x35, 0xFE, 0xFC, + 0x12, 0x09, 0x15, 0x12, 0x07, 0x2A, 0x75, 0x83, +/*16D0*/0xB6, 0x74, 0x10, 0x12, 0x07, 0x29, 0x75, 0x83, + 0xB8, 0x74, 0x10, 0x12, 0x07, 0x29, 0x75, 0x83, +/*16E0*/0xBA, 0x74, 0x10, 0x12, 0x07, 0x29, 0x75, 0x83, + 0xBC, 0x74, 0x10, 0x12, 0x07, 0x29, 0x75, 0x83, +/*16F0*/0xBE, 0x74, 0x10, 0x12, 0x07, 0x29, 0x75, 0x83, + 0xC0, 0x74, 0x10, 0x12, 0x07, 0x29, 0x75, 0x83, +/*1700*/0x90, 0xE4, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC2, + 0xE4, 0x12, 0x07, 0x29, 0x75, 0x83, 0xC4, 0xE4, +/*1710*/0x12, 0x07, 0x29, 0x75, 0x83, 0x92, 0xE4, 0x12, + 0x07, 0x29, 0x75, 0x83, 0xC6, 0xE4, 0x12, 0x07, +/*1720*/0x29, 0x75, 0x83, 0xC8, 0xE4, 0xF0, 0xAF, 0x58, + 0xFE, 0xE5, 0x59, 0x12, 0x08, 0x7A, 0x02, 0x0A, +/*1730*/0xFF, 0xE5, 0xE2, 0x30, 0xE4, 0x6C, 0xE5, 0xE7, + 0x54, 0xC0, 0x64, 0x40, 0x70, 0x64, 
0xE5, 0x09, +/*1740*/0xC4, 0x54, 0x30, 0xFE, 0xE5, 0x08, 0x25, 0xE0, + 0x25, 0xE0, 0x54, 0xC0, 0x4E, 0xFE, 0xEF, 0x54, +/*1750*/0x3F, 0x4E, 0xFD, 0xE5, 0x2B, 0xAE, 0x2A, 0x78, + 0x02, 0xC3, 0x33, 0xCE, 0x33, 0xCE, 0xD8, 0xF9, +/*1760*/0xF5, 0x82, 0x8E, 0x83, 0xED, 0xF0, 0xE5, 0x2B, + 0xAE, 0x2A, 0x78, 0x02, 0xC3, 0x33, 0xCE, 0x33, +/*1770*/0xCE, 0xD8, 0xF9, 0xFF, 0xF5, 0x82, 0x8E, 0x83, + 0xA3, 0xE5, 0xFE, 0xF0, 0x8F, 0x82, 0x8E, 0x83, +/*1780*/0xA3, 0xA3, 0xE5, 0xFD, 0xF0, 0x8F, 0x82, 0x8E, + 0x83, 0xA3, 0xA3, 0xA3, 0xE5, 0xFC, 0xF0, 0xC3, +/*1790*/0xE5, 0x2B, 0x94, 0xFA, 0xE5, 0x2A, 0x94, 0x00, + 0x50, 0x08, 0x05, 0x2B, 0xE5, 0x2B, 0x70, 0x02, +/*17A0*/0x05, 0x2A, 0x22, 0xE4, 0xFF, 0xE4, 0xF5, 0x58, + 0xF5, 0x56, 0xF5, 0x57, 0x74, 0x82, 0xFC, 0x12, +/*17B0*/0x0E, 0x04, 0x8C, 0x83, 0xE0, 0xF5, 0x10, 0x54, + 0x7F, 0xF0, 0xE5, 0x10, 0x44, 0x80, 0x12, 0x0E, +/*17C0*/0x98, 0xED, 0xF0, 0x7E, 0x0A, 0x12, 0x0E, 0x04, + 0x75, 0x83, 0xA0, 0xE0, 0x20, 0xE0, 0x26, 0xDE, +/*17D0*/0xF4, 0x05, 0x57, 0xE5, 0x57, 0x70, 0x02, 0x05, + 0x56, 0xE5, 0x14, 0x24, 0x01, 0xFD, 0xE4, 0x33, +/*17E0*/0xFC, 0xD3, 0xE5, 0x57, 0x9D, 0xE5, 0x56, 0x9C, + 0x40, 0xD9, 0xE5, 0x0A, 0x94, 0x20, 0x50, 0x02, +/*17F0*/0x05, 0x0A, 0x43, 0xE1, 0x08, 0xC2, 0x31, 0x12, + 0x0E, 0x04, 0x75, 0x83, 0xA6, 0xE0, 0x55, 0x12, +/*1800*/0x65, 0x12, 0x70, 0x03, 0xD2, 0x31, 0x22, 0xC2, + 0x31, 0x22, 0x90, 0x07, 0x26, 0xE0, 0xFA, 0xA3, +/*1810*/0xE0, 0xF5, 0x82, 0x8A, 0x83, 0xE0, 0xF5, 0x41, + 0xE5, 0x39, 0xC3, 0x95, 0x41, 0x40, 0x26, 0xE5, +/*1820*/0x39, 0x95, 0x41, 0xC3, 0x9F, 0xEE, 0x12, 0x07, + 0x6B, 0x40, 0x04, 0x7C, 0x01, 0x80, 0x02, 0x7C, +/*1830*/0x00, 0xE5, 0x41, 0x64, 0x3F, 0x60, 0x04, 0x7B, + 0x01, 0x80, 0x02, 0x7B, 0x00, 0xEC, 0x5B, 0x60, +/*1840*/0x29, 0x05, 0x41, 0x80, 0x28, 0xC3, 0xE5, 0x41, + 0x95, 0x39, 0xC3, 0x9F, 0xEE, 0x12, 0x07, 0x6B, +/*1850*/0x40, 0x04, 0x7F, 0x01, 0x80, 0x02, 0x7F, 0x00, + 0xE5, 0x41, 0x60, 0x04, 0x7E, 0x01, 0x80, 0x02, +/*1860*/0x7E, 0x00, 0xEF, 0x5E, 0x60, 0x04, 0x15, 0x41, + 
0x80, 0x03, 0x85, 0x39, 0x41, 0x85, 0x3A, 0x40, +/*1870*/0x22, 0xE5, 0xE2, 0x30, 0xE4, 0x60, 0xE5, 0xE1, + 0x30, 0xE2, 0x5B, 0xE5, 0x09, 0x70, 0x04, 0x7F, +/*1880*/0x01, 0x80, 0x02, 0x7F, 0x00, 0xE5, 0x08, 0x70, + 0x04, 0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00, 0xEE, +/*1890*/0x5F, 0x60, 0x43, 0x53, 0xF9, 0xF8, 0xE5, 0xE2, + 0x30, 0xE4, 0x3B, 0xE5, 0xE1, 0x30, 0xE2, 0x2E, +/*18A0*/0x43, 0xFA, 0x02, 0x53, 0xFA, 0xFB, 0xE4, 0xF5, + 0x10, 0x90, 0x94, 0x70, 0xE5, 0x10, 0xF0, 0xE5, +/*18B0*/0xE1, 0x30, 0xE2, 0xE7, 0x90, 0x94, 0x70, 0xE0, + 0x65, 0x10, 0x60, 0x03, 0x43, 0xFA, 0x04, 0x05, +/*18C0*/0x10, 0x90, 0x94, 0x70, 0xE5, 0x10, 0xF0, 0x70, + 0xE6, 0x12, 0x00, 0x06, 0x80, 0xE1, 0x53, 0xFA, +/*18D0*/0xFD, 0x53, 0xFA, 0xFB, 0x80, 0xC0, 0x22, 0x8F, + 0x54, 0x12, 0x00, 0x06, 0xE5, 0xE1, 0x30, 0xE0, +/*18E0*/0x04, 0x7F, 0x01, 0x80, 0x02, 0x7F, 0x00, 0xE5, + 0x7E, 0xD3, 0x94, 0x05, 0x40, 0x04, 0x7E, 0x01, +/*18F0*/0x80, 0x02, 0x7E, 0x00, 0xEE, 0x4F, 0x60, 0x3D, + 0x85, 0x54, 0x11, 0xE5, 0xE2, 0x20, 0xE1, 0x32, +/*1900*/0x74, 0xCE, 0x12, 0x1A, 0x05, 0x30, 0xE7, 0x04, + 0x7D, 0x01, 0x80, 0x02, 0x7D, 0x00, 0x8F, 0x82, +/*1910*/0x8E, 0x83, 0xE0, 0x30, 0xE6, 0x04, 0x7F, 0x01, + 0x80, 0x02, 0x7F, 0x00, 0xEF, 0x5D, 0x70, 0x15, +/*1920*/0x12, 0x15, 0xC6, 0x74, 0xCE, 0x12, 0x1A, 0x05, + 0x30, 0xE6, 0x07, 0xE0, 0x44, 0x80, 0xF0, 0x43, +/*1930*/0xF9, 0x80, 0x12, 0x18, 0x71, 0x22, 0x12, 0x0E, + 0x44, 0xE5, 0x16, 0x25, 0xE0, 0x25, 0xE0, 0x24, +/*1940*/0xB0, 0xF5, 0x82, 0xE4, 0x34, 0x1A, 0xF5, 0x83, + 0xE4, 0x93, 0xF5, 0x0F, 0xE5, 0x16, 0x25, 0xE0, +/*1950*/0x25, 0xE0, 0x24, 0xB1, 0xF5, 0x82, 0xE4, 0x34, + 0x1A, 0xF5, 0x83, 0xE4, 0x93, 0xF5, 0x0E, 0x12, +/*1960*/0x0E, 0x65, 0xF5, 0x10, 0xE5, 0x0F, 0x54, 0xF0, + 0x12, 0x0E, 0x17, 0x75, 0x83, 0x8C, 0xEF, 0xF0, +/*1970*/0xE5, 0x0F, 0x30, 0xE0, 0x0C, 0x12, 0x0E, 0x04, + 0x75, 0x83, 0x86, 0xE0, 0x44, 0x40, 0xF0, 0x80, +/*1980*/0x0A, 0x12, 0x0E, 0x04, 0x75, 0x83, 0x86, 0xE0, + 0x54, 0xBF, 0xF0, 0x12, 0x0E, 0x91, 0x75, 0x83, +/*1990*/0x82, 0xE5, 
0x0E, 0xF0, 0x22, 0x7F, 0x05, 0x12, + 0x17, 0x31, 0x12, 0x0E, 0x04, 0x12, 0x0E, 0x33, +/*19A0*/0x74, 0x02, 0xF0, 0x74, 0x8E, 0xFE, 0x12, 0x0E, + 0x04, 0x12, 0x0E, 0x0B, 0xEF, 0xF0, 0x75, 0x15, +/*19B0*/0x70, 0x12, 0x0F, 0xF7, 0x20, 0x34, 0x05, 0x75, + 0x15, 0x10, 0x80, 0x03, 0x75, 0x15, 0x50, 0x12, +/*19C0*/0x0F, 0xF7, 0x20, 0x34, 0x04, 0x74, 0x10, 0x80, + 0x02, 0x74, 0xF0, 0x25, 0x15, 0xF5, 0x15, 0x12, +/*19D0*/0x0E, 0x21, 0xEF, 0xF0, 0x12, 0x10, 0x91, 0x20, + 0x34, 0x17, 0xE5, 0x15, 0x64, 0x30, 0x60, 0x0C, +/*19E0*/0x74, 0x10, 0x25, 0x15, 0xF5, 0x15, 0xB4, 0x80, + 0x03, 0xE4, 0xF5, 0x15, 0x12, 0x0E, 0x21, 0xEF, +/*19F0*/0xF0, 0x22, 0xF0, 0xE5, 0x0B, 0x25, 0xE0, 0x25, + 0xE0, 0x24, 0x82, 0xF5, 0x82, 0xE4, 0x34, 0x07, +/*1A00*/0xF5, 0x83, 0x22, 0x74, 0x88, 0xFE, 0xE5, 0x08, + 0x44, 0x07, 0xFF, 0xF5, 0x82, 0x8E, 0x83, 0xE0, +/*1A10*/0x22, 0xF0, 0xE5, 0x08, 0x44, 0x07, 0xF5, 0x82, + 0x22, 0xF0, 0xE0, 0x54, 0xC0, 0x8F, 0x82, 0x8E, +/*1A20*/0x83, 0xF0, 0x22, 0xEF, 0x44, 0x07, 0xF5, 0x82, + 0x75, 0x83, 0x86, 0xE0, 0x54, 0x10, 0xD3, 0x94, +/*1A30*/0x00, 0x22, 0xF0, 0x90, 0x07, 0x15, 0xE0, 0x04, + 0xF0, 0x22, 0x44, 0x06, 0xF5, 0x82, 0x75, 0x83, +/*1A40*/0x9E, 0xE0, 0x22, 0xFE, 0xEF, 0x44, 0x07, 0xF5, + 0x82, 0x8E, 0x83, 0xE0, 0x22, 0xE4, 0x90, 0x07, +/*1A50*/0x2A, 0xF0, 0xA3, 0xF0, 0x12, 0x07, 0x2A, 0x75, + 0x83, 0x82, 0xE0, 0x54, 0x7F, 0x12, 0x07, 0x29, +/*1A60*/0xE0, 0x44, 0x80, 0xF0, 0x12, 0x10, 0xFC, 0x12, + 0x08, 0x1F, 0x75, 0x83, 0xA0, 0xE0, 0x20, 0xE0, +/*1A70*/0x1A, 0x90, 0x07, 0x2B, 0xE0, 0x04, 0xF0, 0x70, + 0x06, 0x90, 0x07, 0x2A, 0xE0, 0x04, 0xF0, 0x90, +/*1A80*/0x07, 0x2A, 0xE0, 0xB4, 0x10, 0xE1, 0xA3, 0xE0, + 0xB4, 0x00, 0xDC, 0xEE, 0x44, 0xA6, 0xFC, 0xEF, +/*1A90*/0x44, 0x07, 0xF5, 0x82, 0x8C, 0x83, 0xE0, 0xF5, + 0x32, 0xEE, 0x44, 0xA8, 0xFE, 0xEF, 0x44, 0x07, +/*1AA0*/0xF5, 0x82, 0x8E, 0x83, 0xE0, 0xF5, 0x33, 0x22, + 0x01, 0x20, 0x11, 0x00, 0x04, 0x20, 0x00, 0x90, +/*1AB0*/0x00, 0x20, 0x0F, 0x92, 0x00, 0x21, 0x0F, 0x94, + 0x00, 0x22, 0x0F, 0x96, 0x00, 0x23, 
0x0F, 0x98, +/*1AC0*/0x00, 0x24, 0x0F, 0x9A, 0x00, 0x25, 0x0F, 0x9C, + 0x00, 0x26, 0x0F, 0x9E, 0x00, 0x27, 0x0F, 0xA0, +/*1AD0*/0x01, 0x20, 0x01, 0xA2, 0x01, 0x21, 0x01, 0xA4, + 0x01, 0x22, 0x01, 0xA6, 0x01, 0x23, 0x01, 0xA8, +/*1AE0*/0x01, 0x24, 0x01, 0xAA, 0x01, 0x25, 0x01, 0xAC, + 0x01, 0x26, 0x01, 0xAE, 0x01, 0x27, 0x01, 0xB0, +/*1AF0*/0x01, 0x28, 0x01, 0xB4, 0x00, 0x28, 0x0F, 0xB6, + 0x40, 0x28, 0x0F, 0xB8, 0x61, 0x28, 0x01, 0xCB, +/*1B00*/0xEF, 0xCB, 0xCA, 0xEE, 0xCA, 0x7F, 0x01, 0xE4, + 0xFD, 0xEB, 0x4A, 0x70, 0x24, 0xE5, 0x08, 0xF5, +/*1B10*/0x82, 0x74, 0xB6, 0x12, 0x08, 0x29, 0xE5, 0x08, + 0xF5, 0x82, 0x74, 0xB8, 0x12, 0x08, 0x29, 0xE5, +/*1B20*/0x08, 0xF5, 0x82, 0x74, 0xBA, 0x12, 0x08, 0x29, + 0x7E, 0x00, 0x7C, 0x00, 0x12, 0x0A, 0xFF, 0x80, +/*1B30*/0x12, 0x90, 0x07, 0x26, 0x12, 0x07, 0x35, 0xE5, + 0x41, 0xF0, 0x90, 0x07, 0x24, 0x12, 0x07, 0x35, +/*1B40*/0xE5, 0x40, 0xF0, 0x12, 0x07, 0x2A, 0x75, 0x83, + 0x8E, 0xE4, 0x12, 0x07, 0x29, 0x74, 0x01, 0x12, +/*1B50*/0x07, 0x29, 0xE4, 0xF0, 0x22, 0xE4, 0xF5, 0x26, + 0xF5, 0x27, 0x53, 0xE1, 0xFE, 0xF5, 0x2A, 0x75, +/*1B60*/0x2B, 0x01, 0xF5, 0x08, 0x7F, 0x01, 0x12, 0x17, + 0x31, 0x30, 0x30, 0x1C, 0x90, 0x1A, 0xA9, 0xE4, +/*1B70*/0x93, 0xF5, 0x10, 0x90, 0x1F, 0xF9, 0xE4, 0x93, + 0xF5, 0x10, 0x90, 0x00, 0x41, 0xE4, 0x93, 0xF5, +/*1B80*/0x10, 0x90, 0x1E, 0xCA, 0xE4, 0x93, 0xF5, 0x10, + 0x7F, 0x02, 0x12, 0x17, 0x31, 0x12, 0x0F, 0x54, +/*1B90*/0x7F, 0x03, 0x12, 0x17, 0x31, 0x12, 0x00, 0x06, + 0xE5, 0xE2, 0x30, 0xE7, 0x09, 0x12, 0x10, 0x00, +/*1BA0*/0x30, 0x30, 0x03, 0x12, 0x11, 0x00, 0x02, 0x00, + 0x47, 0x12, 0x08, 0x1F, 0x75, 0x83, 0xD0, 0xE0, +/*1BB0*/0xC4, 0x54, 0x0F, 0xFD, 0x75, 0x43, 0x01, 0x75, + 0x44, 0xFF, 0x12, 0x08, 0xAA, 0x74, 0x04, 0xF0, +/*1BC0*/0x75, 0x3B, 0x01, 0xED, 0x14, 0x60, 0x0C, 0x14, + 0x60, 0x0B, 0x14, 0x60, 0x0F, 0x24, 0x03, 0x70, +/*1BD0*/0x0B, 0x80, 0x09, 0x80, 0x00, 0x12, 0x08, 0xA7, + 0x04, 0xF0, 0x80, 0x06, 0x12, 0x08, 0xA7, 0x74, +/*1BE0*/0x04, 0xF0, 0xEE, 0x44, 0x82, 0xFE, 0xEF, 0x44, + 
0x07, 0xF5, 0x82, 0x8E, 0x83, 0xE5, 0x45, 0x12, +/*1BF0*/0x08, 0xBE, 0x75, 0x83, 0x82, 0xE5, 0x31, 0xF0, + 0x02, 0x11, 0x4C, 0x8E, 0x60, 0x8F, 0x61, 0x12, +/*1C00*/0x1E, 0xA5, 0xE4, 0xFF, 0xCE, 0xED, 0xCE, 0xEE, + 0xD3, 0x95, 0x61, 0xE5, 0x60, 0x12, 0x07, 0x6B, +/*1C10*/0x40, 0x39, 0x74, 0x20, 0x2E, 0xF5, 0x82, 0xE4, + 0x34, 0x03, 0xF5, 0x83, 0xE0, 0x70, 0x03, 0xFF, +/*1C20*/0x80, 0x26, 0x12, 0x08, 0xE2, 0xFD, 0xC3, 0x9F, + 0x40, 0x1E, 0xCF, 0xED, 0xCF, 0xEB, 0x4A, 0x70, +/*1C30*/0x0B, 0x8D, 0x42, 0x12, 0x08, 0xEE, 0xF5, 0x41, + 0x8E, 0x40, 0x80, 0x0C, 0x12, 0x08, 0xE2, 0xF5, +/*1C40*/0x38, 0x12, 0x08, 0xEE, 0xF5, 0x39, 0x8E, 0x3A, + 0x1E, 0x80, 0xBC, 0x22, 0x75, 0x58, 0x01, 0xE5, +/*1C50*/0x35, 0x70, 0x0C, 0x12, 0x07, 0xCC, 0xE0, 0xF5, + 0x4A, 0x12, 0x07, 0xD8, 0xE0, 0xF5, 0x4C, 0xE5, +/*1C60*/0x35, 0xB4, 0x04, 0x0C, 0x12, 0x07, 0xE4, 0xE0, + 0xF5, 0x4A, 0x12, 0x07, 0xF0, 0xE0, 0xF5, 0x4C, +/*1C70*/0xE5, 0x35, 0xB4, 0x01, 0x04, 0x7F, 0x01, 0x80, + 0x02, 0x7F, 0x00, 0xE5, 0x35, 0xB4, 0x02, 0x04, +/*1C80*/0x7E, 0x01, 0x80, 0x02, 0x7E, 0x00, 0xEE, 0x4F, + 0x60, 0x0C, 0x12, 0x07, 0xFC, 0xE0, 0xF5, 0x4A, +/*1C90*/0x12, 0x08, 0x08, 0xE0, 0xF5, 0x4C, 0x85, 0x41, + 0x49, 0x85, 0x40, 0x4B, 0x22, 0x75, 0x5B, 0x01, +/*1CA0*/0x90, 0x07, 0x24, 0x12, 0x07, 0x35, 0xE0, 0x54, + 0x1F, 0xFF, 0xD3, 0x94, 0x02, 0x50, 0x04, 0x8F, +/*1CB0*/0x58, 0x80, 0x05, 0xEF, 0x24, 0xFE, 0xF5, 0x58, + 0xEF, 0xC3, 0x94, 0x18, 0x40, 0x05, 0x75, 0x59, +/*1CC0*/0x18, 0x80, 0x04, 0xEF, 0x04, 0xF5, 0x59, 0x85, + 0x43, 0x5A, 0xAF, 0x58, 0x7E, 0x00, 0xAD, 0x59, +/*1CD0*/0x7C, 0x00, 0xAB, 0x5B, 0x7A, 0x00, 0x12, 0x15, + 0x41, 0xAF, 0x5A, 0x7E, 0x00, 0x12, 0x18, 0x0A, +/*1CE0*/0xAF, 0x5B, 0x7E, 0x00, 0x02, 0x1A, 0xFF, 0xE5, + 0xE2, 0x30, 0xE7, 0x0E, 0x12, 0x10, 0x03, 0xC2, +/*1CF0*/0x30, 0x30, 0x30, 0x03, 0x12, 0x10, 0xFF, 0x20, + 0x33, 0x28, 0xE5, 0xE7, 0x30, 0xE7, 0x05, 0x12, +/*1D00*/0x0E, 0xA2, 0x80, 0x0D, 0xE5, 0xFE, 0xC3, 0x94, + 0x20, 0x50, 0x06, 0x12, 0x0E, 0xA2, 0x43, 0xF9, +/*1D10*/0x08, 0xE5, 
0xF2, 0x30, 0xE7, 0x03, 0x53, 0xF9, + 0x7F, 0xE5, 0xF1, 0x54, 0x70, 0xD3, 0x94, 0x00, +/*1D20*/0x50, 0xD8, 0x22, 0x12, 0x0E, 0x04, 0x75, 0x83, + 0x80, 0xE4, 0xF0, 0xE5, 0x08, 0x44, 0x07, 0x12, +/*1D30*/0x0D, 0xFD, 0x75, 0x83, 0x84, 0x12, 0x0E, 0x02, + 0x75, 0x83, 0x86, 0x12, 0x0E, 0x02, 0x75, 0x83, +/*1D40*/0x8C, 0xE0, 0x54, 0xF3, 0x12, 0x0E, 0x03, 0x75, + 0x83, 0x8E, 0x12, 0x0E, 0x02, 0x75, 0x83, 0x94, +/*1D50*/0xE0, 0x54, 0xFB, 0xF0, 0x22, 0x12, 0x07, 0x2A, + 0x75, 0x83, 0x8E, 0xE4, 0x12, 0x07, 0x29, 0x74, +/*1D60*/0x01, 0x12, 0x07, 0x29, 0xE4, 0x12, 0x08, 0xBE, + 0x75, 0x83, 0x8C, 0xE0, 0x44, 0x20, 0x12, 0x08, +/*1D70*/0xBE, 0xE0, 0x54, 0xDF, 0xF0, 0x74, 0x84, 0x85, + 0x08, 0x82, 0xF5, 0x83, 0xE0, 0x54, 0x7F, 0xF0, +/*1D80*/0xE0, 0x44, 0x80, 0xF0, 0x22, 0x75, 0x56, 0x01, + 0xE4, 0xFD, 0xF5, 0x57, 0xAF, 0x35, 0xFE, 0xFC, +/*1D90*/0x12, 0x09, 0x15, 0x12, 0x1C, 0x9D, 0x12, 0x1E, + 0x7A, 0x12, 0x1C, 0x4C, 0xAF, 0x57, 0x7E, 0x00, +/*1DA0*/0xAD, 0x56, 0x7C, 0x00, 0x12, 0x04, 0x44, 0xAF, + 0x56, 0x7E, 0x00, 0x02, 0x11, 0xEE, 0x75, 0x56, +/*1DB0*/0x01, 0xE4, 0xFD, 0xF5, 0x57, 0xAF, 0x35, 0xFE, + 0xFC, 0x12, 0x09, 0x15, 0x12, 0x1C, 0x9D, 0x12, +/*1DC0*/0x1E, 0x7A, 0x12, 0x1C, 0x4C, 0xAF, 0x57, 0x7E, + 0x00, 0xAD, 0x56, 0x7C, 0x00, 0x12, 0x04, 0x44, +/*1DD0*/0xAF, 0x56, 0x7E, 0x00, 0x02, 0x11, 0xEE, 0xE4, + 0xF5, 0x16, 0x12, 0x0E, 0x44, 0xFE, 0xE5, 0x08, +/*1DE0*/0x44, 0x05, 0xFF, 0x12, 0x0E, 0x65, 0x8F, 0x82, + 0x8E, 0x83, 0xF0, 0x05, 0x16, 0xE5, 0x16, 0xC3, +/*1DF0*/0x94, 0x14, 0x40, 0xE6, 0xE5, 0x08, 0x12, 0x0E, + 0x2B, 0xE4, 0xF0, 0x22, 0xE4, 0xF5, 0x58, 0xF5, +/*1E00*/0x59, 0xF5, 0x5A, 0xFF, 0xFE, 0xAD, 0x58, 0xFC, + 0x12, 0x09, 0x15, 0x7F, 0x04, 0x7E, 0x00, 0xAD, +/*1E10*/0x58, 0x7C, 0x00, 0x12, 0x09, 0x15, 0x7F, 0x02, + 0x7E, 0x00, 0xAD, 0x58, 0x7C, 0x00, 0x02, 0x09, +/*1E20*/0x15, 0xE5, 0x3C, 0x25, 0x3E, 0xFC, 0xE5, 0x42, + 0x24, 0x00, 0xFB, 0xE4, 0x33, 0xFA, 0xEC, 0xC3, +/*1E30*/0x9B, 0xEA, 0x12, 0x07, 0x6B, 0x40, 0x0B, 0x8C, + 0x42, 0xE5, 0x3D, 0x25, 0x3F, 0xF5, 
0x41, 0x8F, +/*1E40*/0x40, 0x22, 0x12, 0x09, 0x0B, 0x22, 0x74, 0x84, + 0xF5, 0x18, 0x85, 0x08, 0x19, 0x85, 0x19, 0x82, +/*1E50*/0x85, 0x18, 0x83, 0xE0, 0x54, 0x7F, 0xF0, 0xE0, + 0x44, 0x80, 0xF0, 0xE0, 0x44, 0x80, 0xF0, 0x22, +/*1E60*/0xEF, 0x4E, 0x70, 0x0B, 0x12, 0x07, 0x2A, 0x75, + 0x83, 0xD2, 0xE0, 0x54, 0xDF, 0xF0, 0x22, 0x12, +/*1E70*/0x07, 0x2A, 0x75, 0x83, 0xD2, 0xE0, 0x44, 0x20, + 0xF0, 0x22, 0x75, 0x58, 0x01, 0x90, 0x07, 0x26, +/*1E80*/0x12, 0x07, 0x35, 0xE0, 0x54, 0x3F, 0xF5, 0x41, + 0x12, 0x07, 0x32, 0xE0, 0x54, 0x3F, 0xF5, 0x40, +/*1E90*/0x22, 0x75, 0x56, 0x02, 0xE4, 0xF5, 0x57, 0x12, + 0x1D, 0xFC, 0xAF, 0x57, 0x7E, 0x00, 0xAD, 0x56, +/*1EA0*/0x7C, 0x00, 0x02, 0x04, 0x44, 0xE4, 0xF5, 0x42, + 0xF5, 0x41, 0xF5, 0x40, 0xF5, 0x38, 0xF5, 0x39, +/*1EB0*/0xF5, 0x3A, 0x22, 0xEF, 0x54, 0x07, 0xFF, 0xE5, + 0xF9, 0x54, 0xF8, 0x4F, 0xF5, 0xF9, 0x22, 0x7F, +/*1EC0*/0x01, 0xE4, 0xFE, 0x0F, 0x0E, 0xBE, 0xFF, 0xFB, + 0x22, 0x01, 0x20, 0x00, 0x01, 0x04, 0x20, 0x00, +/*1ED0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1EE0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1EF0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F00*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F10*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F20*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F30*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F40*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F50*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F60*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F70*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F80*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1F90*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1FA0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1FB0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1FC0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1FD0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1FE0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +/*1FF0*/0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x01, 0x20, 0x11, 0x00, 0x04, 0x20, 0x00, 0x81 +}; + +int ipath_sd7220_ib_load(struct ipath_devdata *dd) +{ + return ipath_sd7220_prog_ld(dd, IB_7220_SERDES, ipath_sd7220_ib_img, + sizeof(ipath_sd7220_ib_img), 0); +} + +int ipath_sd7220_ib_vfy(struct ipath_devdata *dd) +{ + return ipath_sd7220_prog_vfy(dd, IB_7220_SERDES, ipath_sd7220_ib_img, + sizeof(ipath_sd7220_ib_img), 0); +} From ralph.campbell at qlogic.com Wed Apr 2 15:50:18 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:18 -0700 Subject: [ofa-general] [PATCH 15/20] IB/ipath - Add code for IBA7220 send DMA In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225018.28598.2373.stgit@eng-46.mv.qlogic.com> From: John Gregor The IBA7220 HCA has a new feature to DMA data to the on chip send buffers instead of or in addition to the host CPU doing the data transfer. This patch adds code to support the send DMA queue. 
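[Editorial illustration, not part of the patch: the descriptor layout the patch uses can be sketched as a standalone program. The field positions below are taken from the comments in make_sdma_desc() and the length extraction in unmap_desc(); the function names pack_sdma_desc, desc_addr and desc_byte_len are hypothetical helpers for illustration only.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone sketch of the two-qword SDMA descriptor format
 * used by make_sdma_desc() in this patch:
 *   qword 1: SDmaPhyAddr[47:32]
 *   qword 0: SDmaPhyAddr[31:0] in bits 63:32, SDmaGeneration[1:0] in
 *            bits 31:30, SDmaDwordCount[10:0] in bits 26:16,
 *            SDmaBufOffset[12:2] in bits 10:0. */
static void pack_sdma_desc(uint64_t desc[2], uint64_t addr,
                           uint64_t generation, uint64_t dwlen,
                           uint64_t dwoffset)
{
    desc[1] = addr >> 32;                    /* SDmaPhyAddr[47:32] */
    desc[0] = (addr & 0xfffffffcULL) << 32;  /* SDmaPhyAddr[31:0] */
    desc[0] |= (generation & 3ULL) << 30;    /* SDmaGeneration[1:0] */
    desc[0] |= (dwlen & 0x7ffULL) << 16;     /* SDmaDwordCount[10:0] */
    desc[0] |= dwoffset & 0x7ffULL;          /* SDmaBufOffset[12:2] */
}

/* Recover the DMA address, as in unmap_desc(). */
static uint64_t desc_addr(const uint64_t desc[2])
{
    return (desc[1] << 32) | (desc[0] >> 32);
}

/* Recover the length in bytes, using the same trick as unmap_desc():
 * shifting the dword count down by 14 instead of 16 leaves it
 * pre-multiplied by 4, so the masked result is already a byte count. */
static uint64_t desc_byte_len(const uint64_t desc[2])
{
    return (desc[0] >> 14) & (0x7ffULL << 2);
}
```

A 0x40-dword buffer at a 4-byte-aligned address round-trips through these helpers to the same address and a 256-byte length, matching what unmap_desc() would hand to dma_unmap_single().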
Signed-off-by: John Gregor --- drivers/infiniband/hw/ipath/ipath_sdma.c | 743 ++++++++++++++++++++++++++++++ 1 files changed, 743 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_sdma.c b/drivers/infiniband/hw/ipath/ipath_sdma.c new file mode 100644 index 0000000..5918caf --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_sdma.c @@ -0,0 +1,743 @@ +/* + * Copyright (c) 2007, 2008 QLogic Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include + +#include "ipath_kernel.h" +#include "ipath_verbs.h" +#include "ipath_common.h" + +#define SDMA_DESCQ_SZ PAGE_SIZE /* 256 entries per 4KB page */ + +static void vl15_watchdog_enq(struct ipath_devdata *dd) +{ + /* ipath_sdma_lock must already be held */ + if (atomic_inc_return(&dd->ipath_sdma_vl15_count) == 1) { + unsigned long interval = (HZ + 19) / 20; + dd->ipath_sdma_vl15_timer.expires = jiffies + interval; + add_timer(&dd->ipath_sdma_vl15_timer); + } +} + +static void vl15_watchdog_deq(struct ipath_devdata *dd) +{ + /* ipath_sdma_lock must already be held */ + if (atomic_dec_return(&dd->ipath_sdma_vl15_count) != 0) { + unsigned long interval = (HZ + 19) / 20; + mod_timer(&dd->ipath_sdma_vl15_timer, jiffies + interval); + } else { + del_timer(&dd->ipath_sdma_vl15_timer); + } +} + +static void vl15_watchdog_timeout(unsigned long opaque) +{ + struct ipath_devdata *dd = (struct ipath_devdata *)opaque; + + if (atomic_read(&dd->ipath_sdma_vl15_count) != 0) { + ipath_dbg("vl15 watchdog timeout - clearing\n"); + ipath_cancel_sends(dd, 1); + ipath_hol_down(dd); + } else { + ipath_dbg("vl15 watchdog timeout - " + "condition already cleared\n"); + } +} + +static void unmap_desc(struct ipath_devdata *dd, unsigned head) +{ + __le64 *descqp = &dd->ipath_sdma_descq[head].qw[0]; + u64 desc[2]; + dma_addr_t addr; + size_t len; + + desc[0] = le64_to_cpu(descqp[0]); + desc[1] = le64_to_cpu(descqp[1]); + + addr = (desc[1] << 32) | (desc[0] >> 32); + len = (desc[0] >> 14) & (0x7ffULL << 2); + dma_unmap_single(&dd->pcidev->dev, addr, len, DMA_TO_DEVICE); +} + +/* + * ipath_sdma_lock should be locked before calling this. 
+ */ +int ipath_sdma_make_progress(struct ipath_devdata *dd) +{ + struct list_head *lp = NULL; + struct ipath_sdma_txreq *txp = NULL; + u16 dmahead; + u16 start_idx = 0; + int progress = 0; + + if (!list_empty(&dd->ipath_sdma_activelist)) { + lp = dd->ipath_sdma_activelist.next; + txp = list_entry(lp, struct ipath_sdma_txreq, list); + start_idx = txp->start_idx; + } + + /* + * Read the SDMA head register in order to know that the + * interrupt clear has been written to the chip. + * Otherwise, we may not get an interrupt for the last + * descriptor in the queue. + */ + dmahead = (u16)ipath_read_kreg32(dd, dd->ipath_kregs->kr_senddmahead); + /* sanity check return value for error handling (chip reset, etc.) */ + if (dmahead >= dd->ipath_sdma_descq_cnt) + goto done; + + while (dd->ipath_sdma_descq_head != dmahead) { + if (txp && txp->flags & IPATH_SDMA_TXREQ_F_FREEDESC && + dd->ipath_sdma_descq_head == start_idx) { + unmap_desc(dd, dd->ipath_sdma_descq_head); + start_idx++; + if (start_idx == dd->ipath_sdma_descq_cnt) + start_idx = 0; + } + + /* increment free count and head */ + dd->ipath_sdma_descq_removed++; + if (++dd->ipath_sdma_descq_head == dd->ipath_sdma_descq_cnt) + dd->ipath_sdma_descq_head = 0; + + if (txp && txp->next_descq_idx == dd->ipath_sdma_descq_head) { + /* move to notify list */ + if (txp->flags & IPATH_SDMA_TXREQ_F_VL15) + vl15_watchdog_deq(dd); + list_move_tail(lp, &dd->ipath_sdma_notifylist); + if (!list_empty(&dd->ipath_sdma_activelist)) { + lp = dd->ipath_sdma_activelist.next; + txp = list_entry(lp, struct ipath_sdma_txreq, + list); + start_idx = txp->start_idx; + } else { + lp = NULL; + txp = NULL; + } + } + progress = 1; + } + + if (progress) + tasklet_hi_schedule(&dd->ipath_sdma_notify_task); + +done: + return progress; +} + +static void ipath_sdma_notify(struct ipath_devdata *dd, struct list_head *list) +{ + struct ipath_sdma_txreq *txp, *txp_next; + + list_for_each_entry_safe(txp, txp_next, list, list) { + list_del_init(&txp->list); + + 
if (txp->callback) + (*txp->callback)(txp->callback_cookie, + txp->callback_status); + } +} + +static void sdma_notify_taskbody(struct ipath_devdata *dd) +{ + unsigned long flags; + struct list_head list; + + INIT_LIST_HEAD(&list); + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + + list_splice_init(&dd->ipath_sdma_notifylist, &list); + + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + ipath_sdma_notify(dd, &list); + + /* + * The IB verbs layer needs to see the callback before getting + * the call to ipath_ib_piobufavail() because the callback + * handles releasing resources the next send will need. + * Otherwise, we could do these calls in + * ipath_sdma_make_progress(). + */ + ipath_ib_piobufavail(dd->verbs_dev); +} + +static void sdma_notify_task(unsigned long opaque) +{ + struct ipath_devdata *dd = (struct ipath_devdata *)opaque; + + if (!test_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status)) + sdma_notify_taskbody(dd); +} + +static void dump_sdma_state(struct ipath_devdata *dd) +{ + unsigned long reg; + + reg = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmastatus); + ipath_cdbg(VERBOSE, "kr_senddmastatus: 0x%016lx\n", reg); + + reg = ipath_read_kreg64(dd, dd->ipath_kregs->kr_sendctrl); + ipath_cdbg(VERBOSE, "kr_sendctrl: 0x%016lx\n", reg); + + reg = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmabufmask0); + ipath_cdbg(VERBOSE, "kr_senddmabufmask0: 0x%016lx\n", reg); + + reg = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmabufmask1); + ipath_cdbg(VERBOSE, "kr_senddmabufmask1: 0x%016lx\n", reg); + + reg = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmabufmask2); + ipath_cdbg(VERBOSE, "kr_senddmabufmask2: 0x%016lx\n", reg); + + reg = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmatail); + ipath_cdbg(VERBOSE, "kr_senddmatail: 0x%016lx\n", reg); + + reg = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmahead); + ipath_cdbg(VERBOSE, "kr_senddmahead: 0x%016lx\n", reg); +} + +static void sdma_abort_task(unsigned long opaque) +{ + struct 
ipath_devdata *dd = (struct ipath_devdata *) opaque; + int kick = 0; + u64 status; + unsigned long flags; + + if (test_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status)) + return; + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + + status = dd->ipath_sdma_status & IPATH_SDMA_ABORT_MASK; + + /* nothing to do */ + if (status == IPATH_SDMA_ABORT_NONE) + goto unlock; + + /* ipath_sdma_abort() is done, waiting for interrupt */ + if (status == IPATH_SDMA_ABORT_DISARMED) { + if (jiffies < dd->ipath_sdma_abort_intr_timeout) + goto resched_noprint; + /* give up, intr got lost somewhere */ + ipath_dbg("give up waiting for SDMADISABLED intr\n"); + __set_bit(IPATH_SDMA_DISABLED, &dd->ipath_sdma_status); + status = IPATH_SDMA_ABORT_ABORTED; + } + + /* everything is stopped, time to clean up and restart */ + if (status == IPATH_SDMA_ABORT_ABORTED) { + struct ipath_sdma_txreq *txp, *txpnext; + u64 hwstatus; + int notify = 0; + + hwstatus = ipath_read_kreg64(dd, + dd->ipath_kregs->kr_senddmastatus); + + if (/* ScoreBoardDrainInProg */ + test_bit(63, &hwstatus) || + /* AbortInProg */ + test_bit(62, &hwstatus) || + /* InternalSDmaEnable */ + test_bit(61, &hwstatus) || + /* ScbEmpty */ + !test_bit(30, &hwstatus)) { + if (dd->ipath_sdma_reset_wait > 0) { + /* not done shutting down sdma */ + --dd->ipath_sdma_reset_wait; + goto resched; + } + ipath_cdbg(VERBOSE, "gave up waiting for quiescent " + "status after SDMA reset, continuing\n"); + dump_sdma_state(dd); + } + + /* dequeue all "sent" requests */ + list_for_each_entry_safe(txp, txpnext, + &dd->ipath_sdma_activelist, list) { + txp->callback_status = IPATH_SDMA_TXREQ_S_ABORTED; + if (txp->flags & IPATH_SDMA_TXREQ_F_VL15) + vl15_watchdog_deq(dd); + list_move_tail(&txp->list, &dd->ipath_sdma_notifylist); + notify = 1; + } + if (notify) + tasklet_hi_schedule(&dd->ipath_sdma_notify_task); + + /* reset our notion of head and tail */ + dd->ipath_sdma_descq_tail = 0; + dd->ipath_sdma_descq_head = 0; + dd->ipath_sdma_head_dma[0] = 0; + 
dd->ipath_sdma_generation = 0; + dd->ipath_sdma_descq_removed = dd->ipath_sdma_descq_added; + + /* Reset SendDmaLenGen */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmalengen, + (u64) dd->ipath_sdma_descq_cnt | (1ULL << 18)); + + /* done with sdma state for a bit */ + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + /* restart sdma engine */ + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl &= ~INFINIPATH_S_SDMAENABLE; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, + dd->ipath_sendctrl); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + dd->ipath_sendctrl |= INFINIPATH_S_SDMAENABLE; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, + dd->ipath_sendctrl); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + kick = 1; + ipath_dbg("sdma restarted from abort\n"); + + /* now clear status bits */ + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + __clear_bit(IPATH_SDMA_ABORTING, &dd->ipath_sdma_status); + __clear_bit(IPATH_SDMA_DISARMED, &dd->ipath_sdma_status); + __clear_bit(IPATH_SDMA_DISABLED, &dd->ipath_sdma_status); + + /* make sure I see next message */ + dd->ipath_sdma_abort_jiffies = 0; + + goto unlock; + } + +resched: + /* + * for now, keep spinning + * JAG - this is bad to just have default be a loop without + * state change + */ + if (jiffies > dd->ipath_sdma_abort_jiffies) { + ipath_dbg("looping with status 0x%016llx\n", + dd->ipath_sdma_status); + dd->ipath_sdma_abort_jiffies = jiffies + 5 * HZ; + } +resched_noprint: + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + if (!test_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status)) + tasklet_hi_schedule(&dd->ipath_sdma_abort_task); + return; + +unlock: + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + /* kick upper layers */ + if (kick) + ipath_ib_piobufavail(dd->verbs_dev); +} + +/* + * This is called from interrupt context. 
+ */ +void ipath_sdma_intr(struct ipath_devdata *dd) +{ + unsigned long flags; + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + + (void) ipath_sdma_make_progress(dd); + + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); +} + +static int alloc_sdma(struct ipath_devdata *dd) +{ + int ret = 0; + + /* Allocate memory for SendDMA descriptor FIFO */ + dd->ipath_sdma_descq = dma_alloc_coherent(&dd->pcidev->dev, + SDMA_DESCQ_SZ, &dd->ipath_sdma_descq_phys, GFP_KERNEL); + + if (!dd->ipath_sdma_descq) { + ipath_dev_err(dd, "failed to allocate SendDMA descriptor " + "FIFO memory\n"); + ret = -ENOMEM; + goto done; + } + + dd->ipath_sdma_descq_cnt = + SDMA_DESCQ_SZ / sizeof(struct ipath_sdma_desc); + + /* Allocate memory for DMA of head register to memory */ + dd->ipath_sdma_head_dma = dma_alloc_coherent(&dd->pcidev->dev, + PAGE_SIZE, &dd->ipath_sdma_head_phys, GFP_KERNEL); + if (!dd->ipath_sdma_head_dma) { + ipath_dev_err(dd, "failed to allocate SendDMA head memory\n"); + ret = -ENOMEM; + goto cleanup_descq; + } + dd->ipath_sdma_head_dma[0] = 0; + + init_timer(&dd->ipath_sdma_vl15_timer); + dd->ipath_sdma_vl15_timer.function = vl15_watchdog_timeout; + dd->ipath_sdma_vl15_timer.data = (unsigned long)dd; + atomic_set(&dd->ipath_sdma_vl15_count, 0); + + goto done; + +cleanup_descq: + dma_free_coherent(&dd->pcidev->dev, SDMA_DESCQ_SZ, + (void *)dd->ipath_sdma_descq, dd->ipath_sdma_descq_phys); + dd->ipath_sdma_descq = NULL; + dd->ipath_sdma_descq_phys = 0; +done: + return ret; +} + +int setup_sdma(struct ipath_devdata *dd) +{ + int ret = 0; + unsigned i, n; + u64 tmp64; + u64 senddmabufmask[3] = { 0 }; + unsigned long flags; + + ret = alloc_sdma(dd); + if (ret) + goto done; + + if (!dd->ipath_sdma_descq) { + ipath_dev_err(dd, "SendDMA memory not allocated\n"); + goto done; + } + + dd->ipath_sdma_status = 0; + dd->ipath_sdma_abort_jiffies = 0; + dd->ipath_sdma_generation = 0; + dd->ipath_sdma_descq_tail = 0; + dd->ipath_sdma_descq_head = 0; + dd->ipath_sdma_descq_removed = 
0; + dd->ipath_sdma_descq_added = 0; + + /* Set SendDmaBase */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabase, + dd->ipath_sdma_descq_phys); + /* Set SendDmaLenGen */ + tmp64 = dd->ipath_sdma_descq_cnt; + tmp64 |= 1<<18; /* enable generation checking */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmalengen, tmp64); + /* Set SendDmaTail */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmatail, + dd->ipath_sdma_descq_tail); + /* Set SendDmaHeadAddr */ + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmaheadaddr, + dd->ipath_sdma_head_phys); + + /* Reserve all the former "kernel" piobufs */ + n = dd->ipath_piobcnt2k + dd->ipath_piobcnt4k - dd->ipath_pioreserved; + for (i = dd->ipath_lastport_piobuf; i < n; ++i) { + unsigned word = i / 64; + unsigned bit = i & 63; + BUG_ON(word >= 3); + senddmabufmask[word] |= 1ULL << bit; + } + ipath_chg_pioavailkernel(dd, dd->ipath_lastport_piobuf, + n - dd->ipath_lastport_piobuf, 0); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabufmask0, + senddmabufmask[0]); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabufmask1, + senddmabufmask[1]); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabufmask2, + senddmabufmask[2]); + + INIT_LIST_HEAD(&dd->ipath_sdma_activelist); + INIT_LIST_HEAD(&dd->ipath_sdma_notifylist); + + tasklet_init(&dd->ipath_sdma_notify_task, sdma_notify_task, + (unsigned long) dd); + tasklet_init(&dd->ipath_sdma_abort_task, sdma_abort_task, + (unsigned long) dd); + + /* Turn on SDMA */ + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl |= INFINIPATH_S_SDMAENABLE | + INFINIPATH_S_SDMAINTENABLE; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + __set_bit(IPATH_SDMA_RUNNING, &dd->ipath_sdma_status); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + +done: + return ret; +} + +void teardown_sdma(struct ipath_devdata *dd) +{ + struct ipath_sdma_txreq *txp, *txpnext; + unsigned long flags; + 
dma_addr_t sdma_head_phys = 0; + dma_addr_t sdma_descq_phys = 0; + void *sdma_descq = NULL; + void *sdma_head_dma = NULL; + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + __clear_bit(IPATH_SDMA_RUNNING, &dd->ipath_sdma_status); + __set_bit(IPATH_SDMA_ABORTING, &dd->ipath_sdma_status); + __set_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status); + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + tasklet_kill(&dd->ipath_sdma_abort_task); + tasklet_kill(&dd->ipath_sdma_notify_task); + + /* turn off sdma */ + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl &= ~INFINIPATH_S_SDMAENABLE; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, + dd->ipath_sendctrl); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + /* dequeue all "sent" requests */ + list_for_each_entry_safe(txp, txpnext, &dd->ipath_sdma_activelist, + list) { + txp->callback_status = IPATH_SDMA_TXREQ_S_SHUTDOWN; + if (txp->flags & IPATH_SDMA_TXREQ_F_VL15) + vl15_watchdog_deq(dd); + list_move_tail(&txp->list, &dd->ipath_sdma_notifylist); + } + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + sdma_notify_taskbody(dd); + + del_timer_sync(&dd->ipath_sdma_vl15_timer); + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + + dd->ipath_sdma_abort_jiffies = 0; + + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabase, 0); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmalengen, 0); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmatail, 0); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmaheadaddr, 0); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabufmask0, 0); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabufmask1, 0); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmabufmask2, 0); + + if (dd->ipath_sdma_head_dma) { + sdma_head_dma = (void *) dd->ipath_sdma_head_dma; + sdma_head_phys = dd->ipath_sdma_head_phys; + dd->ipath_sdma_head_dma = NULL; + 
dd->ipath_sdma_head_phys = 0; + } + + if (dd->ipath_sdma_descq) { + sdma_descq = dd->ipath_sdma_descq; + sdma_descq_phys = dd->ipath_sdma_descq_phys; + dd->ipath_sdma_descq = NULL; + dd->ipath_sdma_descq_phys = 0; + } + + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + if (sdma_head_dma) + dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE, + sdma_head_dma, sdma_head_phys); + + if (sdma_descq) + dma_free_coherent(&dd->pcidev->dev, SDMA_DESCQ_SZ, + sdma_descq, sdma_descq_phys); +} + +static inline void make_sdma_desc(struct ipath_devdata *dd, + u64 *sdmadesc, u64 addr, u64 dwlen, u64 dwoffset) +{ + WARN_ON(addr & 3); + /* SDmaPhyAddr[47:32] */ + sdmadesc[1] = addr >> 32; + /* SDmaPhyAddr[31:0] */ + sdmadesc[0] = (addr & 0xfffffffcULL) << 32; + /* SDmaGeneration[1:0] */ + sdmadesc[0] |= (dd->ipath_sdma_generation & 3ULL) << 30; + /* SDmaDwordCount[10:0] */ + sdmadesc[0] |= (dwlen & 0x7ffULL) << 16; + /* SDmaBufOffset[12:2] */ + sdmadesc[0] |= dwoffset & 0x7ffULL; +} + +/* + * This function queues one IB packet onto the send DMA queue per call. + * The caller is responsible for checking: + * 1) The number of send DMA descriptor entries is less than the size of + * the descriptor queue. + * 2) The IB SGE addresses and lengths are 32-bit aligned + * (except possibly the last SGE's length) + * 3) The SGE addresses are suitable for passing to dma_map_single(). 
+ */ +int ipath_sdma_verbs_send(struct ipath_devdata *dd, + struct ipath_sge_state *ss, u32 dwords, + struct ipath_verbs_txreq *tx) +{ + + unsigned long flags; + struct ipath_sge *sge; + int ret = 0; + u16 tail; + __le64 *descqp; + u64 sdmadesc[2]; + u32 dwoffset; + dma_addr_t addr; + + if ((tx->map_len + (dwords<<2)) > dd->ipath_ibmaxlen) { + ipath_dbg("packet size %X > ibmax %X, fail\n", + tx->map_len + (dwords<<2), dd->ipath_ibmaxlen); + ret = -EMSGSIZE; + goto fail; + } + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + +retry: + if (unlikely(test_bit(IPATH_SDMA_ABORTING, &dd->ipath_sdma_status))) { + ret = -EBUSY; + goto unlock; + } + + if (tx->txreq.sg_count > ipath_sdma_descq_freecnt(dd)) { + if (ipath_sdma_make_progress(dd)) + goto retry; + ret = -ENOBUFS; + goto unlock; + } + + addr = dma_map_single(&dd->pcidev->dev, tx->txreq.map_addr, + tx->map_len, DMA_TO_DEVICE); + if (dma_mapping_error(addr)) { + ret = -EIO; + goto unlock; + } + + dwoffset = tx->map_len >> 2; + make_sdma_desc(dd, sdmadesc, (u64) addr, dwoffset, 0); + + /* SDmaFirstDesc */ + sdmadesc[0] |= 1ULL << 12; + if (tx->txreq.flags & IPATH_SDMA_TXREQ_F_USELARGEBUF) + sdmadesc[0] |= 1ULL << 14; /* SDmaUseLargeBuf */ + + /* write to the descq */ + tail = dd->ipath_sdma_descq_tail; + descqp = &dd->ipath_sdma_descq[tail].qw[0]; + *descqp++ = cpu_to_le64(sdmadesc[0]); + *descqp++ = cpu_to_le64(sdmadesc[1]); + + if (tx->txreq.flags & IPATH_SDMA_TXREQ_F_FREEDESC) + tx->txreq.start_idx = tail; + + /* increment the tail */ + if (++tail == dd->ipath_sdma_descq_cnt) { + tail = 0; + descqp = &dd->ipath_sdma_descq[0].qw[0]; + ++dd->ipath_sdma_generation; + } + + sge = &ss->sge; + while (dwords) { + u32 dw; + u32 len; + + len = dwords << 2; + if (len > sge->length) + len = sge->length; + if (len > sge->sge_length) + len = sge->sge_length; + BUG_ON(len == 0); + dw = (len + 3) >> 2; + addr = dma_map_single(&dd->pcidev->dev, sge->vaddr, dw << 2, + DMA_TO_DEVICE); + make_sdma_desc(dd, sdmadesc, (u64) addr, 
dw, dwoffset); + /* SDmaUseLargeBuf has to be set in every descriptor */ + if (tx->txreq.flags & IPATH_SDMA_TXREQ_F_USELARGEBUF) + sdmadesc[0] |= 1ULL << 14; + /* write to the descq */ + *descqp++ = cpu_to_le64(sdmadesc[0]); + *descqp++ = cpu_to_le64(sdmadesc[1]); + + /* increment the tail */ + if (++tail == dd->ipath_sdma_descq_cnt) { + tail = 0; + descqp = &dd->ipath_sdma_descq[0].qw[0]; + ++dd->ipath_sdma_generation; + } + sge->vaddr += len; + sge->length -= len; + sge->sge_length -= len; + if (sge->sge_length == 0) { + if (--ss->num_sge) + *sge = *ss->sg_list++; + } else if (sge->length == 0 && sge->mr != NULL) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = + sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = + sge->mr->map[sge->m]->segs[sge->n].length; + } + + dwoffset += dw; + dwords -= dw; + } + + if (!tail) + descqp = &dd->ipath_sdma_descq[dd->ipath_sdma_descq_cnt].qw[0]; + descqp -= 2; + /* SDmaLastDesc */ + descqp[0] |= __constant_cpu_to_le64(1ULL << 11); + if (tx->txreq.flags & IPATH_SDMA_TXREQ_F_INTREQ) { + /* SDmaIntReq */ + descqp[0] |= __constant_cpu_to_le64(1ULL << 15); + } + + /* Commit writes to memory and advance the tail on the chip */ + wmb(); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmatail, tail); + + tx->txreq.next_descq_idx = tail; + tx->txreq.callback_status = IPATH_SDMA_TXREQ_S_OK; + dd->ipath_sdma_descq_tail = tail; + dd->ipath_sdma_descq_added += tx->txreq.sg_count; + list_add_tail(&tx->txreq.list, &dd->ipath_sdma_activelist); + if (tx->txreq.flags & IPATH_SDMA_TXREQ_F_VL15) + vl15_watchdog_enq(dd); + +unlock: + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); +fail: + return ret; +} From ralph.campbell at qlogic.com Wed Apr 2 15:50:23 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:23 -0700 Subject: [ofa-general] [PATCH 16/20] IB/ipath - user mode send DMA header file In-Reply-To: 
<20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225023.28598.53701.stgit@eng-46.mv.qlogic.com> From: Arthur Jones A new header file which allows the iba7220 send DMA engine to be used from userland. The definitions here are not used yet, that will happen in a follow-on patch... Signed-off-by: Arthur Jones --- drivers/infiniband/hw/ipath/ipath_user_sdma.h | 56 +++++++++++++++++++++++++ 1 files changed, 56 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_user_sdma.h b/drivers/infiniband/hw/ipath/ipath_user_sdma.h new file mode 100644 index 0000000..ce0448f --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_user_sdma.h @@ -0,0 +1,56 @@ +/* + * Copyright (c) 2007, 2008 QLogic Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ +#include + +struct ipath_user_sdma_queue; + +struct ipath_user_sdma_queue * +ipath_user_sdma_queue_create(struct device *dev, int unit, int port, int sport); +void ipath_user_sdma_queue_destroy(struct ipath_user_sdma_queue *pq); + +int ipath_user_sdma_writev(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq, + const struct iovec *iov, + unsigned long dim); + +int ipath_user_sdma_make_progress(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq); + +int ipath_user_sdma_pkt_sent(const struct ipath_user_sdma_queue *pq, + u32 counter); +void ipath_user_sdma_queue_drain(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq); + +u32 ipath_user_sdma_complete_counter(const struct ipath_user_sdma_queue *pq); +void ipath_user_sdma_set_complete_counter(struct ipath_user_sdma_queue *pq, + u32 c); +u32 ipath_user_sdma_inflight_counter(struct ipath_user_sdma_queue *pq); From ralph.campbell at qlogic.com Wed Apr 2 15:50:28 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:28 -0700 Subject: [ofa-general] [PATCH 17/20] IB/ipath - user mode send DMA In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225028.28598.648.stgit@eng-46.mv.qlogic.com> From: Arthur Jones A new file which allows the iba7220 send DMA engine to be used from userland. The routines here are not linked in yet, that will happen in a follow-on patch... 
Signed-off-by: Arthur Jones --- drivers/infiniband/hw/ipath/ipath_user_sdma.c | 888 +++++++++++++++++++++++++ 1 files changed, 888 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_user_sdma.c b/drivers/infiniband/hw/ipath/ipath_user_sdma.c new file mode 100644 index 0000000..44020c8 --- /dev/null +++ b/drivers/infiniband/hw/ipath/ipath_user_sdma.c @@ -0,0 +1,888 @@ +/* + * Copyright (c) 2007, 2008 QLogic Corporation. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ipath_kernel.h" +#include "ipath_user_sdma.h" + +/* minimum size of header */ +#define IPATH_USER_SDMA_MIN_HEADER_LENGTH 64 +/* expected size of headers (for dma_pool) */ +#define IPATH_USER_SDMA_EXP_HEADER_LENGTH 64 +/* length mask in PBC (lower 11 bits) */ +#define IPATH_PBC_LENGTH_MASK ((1 << 11) - 1) + +struct ipath_user_sdma_pkt { + u8 naddr; /* dimension of addr (1..3) ... */ + u32 counter; /* sdma pkts queued counter for this entry */ + u64 added; /* global descq number of entries */ + + struct { + u32 offset; /* offset for kvaddr, addr */ + u32 length; /* length in page */ + u8 put_page; /* should we put_page? */ + u8 dma_mapped; /* is page dma_mapped? */ + struct page *page; /* may be NULL (coherent mem) */ + void *kvaddr; /* FIXME: only for pio hack */ + dma_addr_t addr; + } addr[4]; /* max pages, any more and we coalesce */ + struct list_head list; /* list element */ +}; + +struct ipath_user_sdma_queue { + /* + * pkts sent to dma engine are queued on this + * list head. the type of the elements of this + * list are struct ipath_user_sdma_pkt... + */ + struct list_head sent; + + /* headers with expected length are allocated from here... */ + char header_cache_name[64]; + struct dma_pool *header_cache; + + /* packets are allocated from the slab cache... */ + char pkt_slab_name[64]; + struct kmem_cache *pkt_slab; + + /* as packets go on the queued queue, they are counted... */ + u32 counter; + u32 sent_counter; + + /* dma page table */ + struct rb_root dma_pages_root; + + /* protect everything above... 
*/ + struct mutex lock; +}; + +struct ipath_user_sdma_queue * +ipath_user_sdma_queue_create(struct device *dev, int unit, int port, int sport) +{ + struct ipath_user_sdma_queue *pq = + kmalloc(sizeof(struct ipath_user_sdma_queue), GFP_KERNEL); + + if (!pq) + goto done; + + pq->counter = 0; + pq->sent_counter = 0; + INIT_LIST_HEAD(&pq->sent); + + mutex_init(&pq->lock); + + snprintf(pq->pkt_slab_name, sizeof(pq->pkt_slab_name), + "ipath-user-sdma-pkts-%u-%02u.%02u", unit, port, sport); + pq->pkt_slab = kmem_cache_create(pq->pkt_slab_name, + sizeof(struct ipath_user_sdma_pkt), + 0, 0, NULL); + + if (!pq->pkt_slab) + goto err_kfree; + + snprintf(pq->header_cache_name, sizeof(pq->header_cache_name), + "ipath-user-sdma-headers-%u-%02u.%02u", unit, port, sport); + pq->header_cache = dma_pool_create(pq->header_cache_name, + dev, + IPATH_USER_SDMA_EXP_HEADER_LENGTH, + 4, 0); + if (!pq->header_cache) + goto err_slab; + + pq->dma_pages_root = RB_ROOT; + + goto done; + +err_slab: + kmem_cache_destroy(pq->pkt_slab); +err_kfree: + kfree(pq); + pq = NULL; + +done: + return pq; +} + +static void ipath_user_sdma_init_frag(struct ipath_user_sdma_pkt *pkt, + int i, size_t offset, size_t len, + int put_page, int dma_mapped, + struct page *page, + void *kvaddr, dma_addr_t dma_addr) +{ + pkt->addr[i].offset = offset; + pkt->addr[i].length = len; + pkt->addr[i].put_page = put_page; + pkt->addr[i].dma_mapped = dma_mapped; + pkt->addr[i].page = page; + pkt->addr[i].kvaddr = kvaddr; + pkt->addr[i].addr = dma_addr; +} + +static void ipath_user_sdma_init_header(struct ipath_user_sdma_pkt *pkt, + u32 counter, size_t offset, + size_t len, int dma_mapped, + struct page *page, + void *kvaddr, dma_addr_t dma_addr) +{ + pkt->naddr = 1; + pkt->counter = counter; + ipath_user_sdma_init_frag(pkt, 0, offset, len, 0, dma_mapped, page, + kvaddr, dma_addr); +} + +/* we've too many pages in the iovec, coalesce to a single page */ +static int ipath_user_sdma_coalesce(const struct ipath_devdata *dd, + struct 
ipath_user_sdma_pkt *pkt, + const struct iovec *iov, + unsigned long niov) { + int ret = 0; + struct page *page = alloc_page(GFP_KERNEL); + void *mpage_save; + char *mpage; + int i; + int len = 0; + dma_addr_t dma_addr; + + if (!page) { + ret = -ENOMEM; + goto done; + } + + mpage = kmap(page); + mpage_save = mpage; + for (i = 0; i < niov; i++) { + int cfur; + + cfur = copy_from_user(mpage, + iov[i].iov_base, iov[i].iov_len); + if (cfur) { + ret = -EFAULT; + goto free_unmap; + } + + mpage += iov[i].iov_len; + len += iov[i].iov_len; + } + + dma_addr = dma_map_page(&dd->pcidev->dev, page, 0, len, + DMA_TO_DEVICE); + if (dma_mapping_error(dma_addr)) { + ret = -ENOMEM; + goto free_unmap; + } + + ipath_user_sdma_init_frag(pkt, 1, 0, len, 0, 1, page, mpage_save, + dma_addr); + pkt->naddr = 2; + + goto done; + +free_unmap: + kunmap(page); + __free_page(page); +done: + return ret; +} + +/* how many pages in this iovec element? */ +static int ipath_user_sdma_num_pages(const struct iovec *iov) +{ + const unsigned long addr = (unsigned long) iov->iov_base; + const unsigned long len = iov->iov_len; + const unsigned long spage = addr & PAGE_MASK; + const unsigned long epage = (addr + len - 1) & PAGE_MASK; + + return 1 + ((epage - spage) >> PAGE_SHIFT); +} + +/* truncate length to page boundary */ +static int ipath_user_sdma_page_length(unsigned long addr, unsigned long len) +{ + const unsigned long offset = addr & ~PAGE_MASK; + + return ((offset + len) > PAGE_SIZE) ?
(PAGE_SIZE - offset) : len; +} + +static void ipath_user_sdma_free_pkt_frag(struct device *dev, + struct ipath_user_sdma_queue *pq, + struct ipath_user_sdma_pkt *pkt, + int frag) +{ + const int i = frag; + + if (pkt->addr[i].page) { + if (pkt->addr[i].dma_mapped) + dma_unmap_page(dev, + pkt->addr[i].addr, + pkt->addr[i].length, + DMA_TO_DEVICE); + + if (pkt->addr[i].kvaddr) + kunmap(pkt->addr[i].page); + + if (pkt->addr[i].put_page) + put_page(pkt->addr[i].page); + else + __free_page(pkt->addr[i].page); + } else if (pkt->addr[i].kvaddr) + /* free coherent mem from cache... */ + dma_pool_free(pq->header_cache, + pkt->addr[i].kvaddr, pkt->addr[i].addr); +} + +/* return number of pages pinned... */ +static int ipath_user_sdma_pin_pages(const struct ipath_devdata *dd, + struct ipath_user_sdma_pkt *pkt, + unsigned long addr, int tlen, int npages) +{ + struct page *pages[2]; + int j; + int ret; + + ret = get_user_pages(current, current->mm, addr, + npages, 0, 1, pages, NULL); + + if (ret != npages) { + int i; + + for (i = 0; i < ret; i++) + put_page(pages[i]); + + ret = -ENOMEM; + goto done; + } + + for (j = 0; j < npages; j++) { + /* map the pages... 
*/ + const int flen = + ipath_user_sdma_page_length(addr, tlen); + dma_addr_t dma_addr = + dma_map_page(&dd->pcidev->dev, + pages[j], 0, flen, DMA_TO_DEVICE); + unsigned long fofs = addr & ~PAGE_MASK; + + if (dma_mapping_error(dma_addr)) { + ret = -ENOMEM; + goto done; + } + + ipath_user_sdma_init_frag(pkt, pkt->naddr, fofs, flen, 1, 1, + pages[j], kmap(pages[j]), + dma_addr); + + pkt->naddr++; + addr += flen; + tlen -= flen; + } + +done: + return ret; +} + +static int ipath_user_sdma_pin_pkt(const struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq, + struct ipath_user_sdma_pkt *pkt, + const struct iovec *iov, + unsigned long niov) +{ + int ret = 0; + unsigned long idx; + + for (idx = 0; idx < niov; idx++) { + const int npages = ipath_user_sdma_num_pages(iov + idx); + const unsigned long addr = (unsigned long) iov[idx].iov_base; + + ret = ipath_user_sdma_pin_pages(dd, pkt, + addr, iov[idx].iov_len, + npages); + if (ret < 0) + goto free_pkt; + } + + goto done; + +free_pkt: + for (idx = 0; idx < pkt->naddr; idx++) + ipath_user_sdma_free_pkt_frag(&dd->pcidev->dev, pq, pkt, idx); + +done: + return ret; +} + +static int ipath_user_sdma_init_payload(const struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq, + struct ipath_user_sdma_pkt *pkt, + const struct iovec *iov, + unsigned long niov, int npages) +{ + int ret = 0; + + if (npages >= ARRAY_SIZE(pkt->addr)) + ret = ipath_user_sdma_coalesce(dd, pkt, iov, niov); + else + ret = ipath_user_sdma_pin_pkt(dd, pq, pkt, iov, niov); + + return ret; +} + +/* free a packet list -- return counter value of last packet */ +static void ipath_user_sdma_free_pkt_list(struct device *dev, + struct ipath_user_sdma_queue *pq, + struct list_head *list) +{ + struct ipath_user_sdma_pkt *pkt, *pkt_next; + + list_for_each_entry_safe(pkt, pkt_next, list, list) { + int i; + + for (i = 0; i < pkt->naddr; i++) + ipath_user_sdma_free_pkt_frag(dev, pq, pkt, i); + + kmem_cache_free(pq->pkt_slab, pkt); + } +} + +/* + * copy headers, 
coalesce etc -- pq->lock must be held + * + * we queue all the packets to list, returning the + * number of bytes total. list must be empty initially, + * as, if there is an error we clean it... + */ +static int ipath_user_sdma_queue_pkts(const struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq, + struct list_head *list, + const struct iovec *iov, + unsigned long niov, + int maxpkts) +{ + unsigned long idx = 0; + int ret = 0; + int npkts = 0; + struct page *page = NULL; + __le32 *pbc; + dma_addr_t dma_addr; + struct ipath_user_sdma_pkt *pkt = NULL; + size_t len; + size_t nw; + u32 counter = pq->counter; + int dma_mapped = 0; + + while (idx < niov && npkts < maxpkts) { + const unsigned long addr = (unsigned long) iov[idx].iov_base; + const unsigned long idx_save = idx; + unsigned pktnw; + unsigned pktnwc; + int nfrags = 0; + int npages = 0; + int cfur; + + dma_mapped = 0; + len = iov[idx].iov_len; + nw = len >> 2; + page = NULL; + + pkt = kmem_cache_alloc(pq->pkt_slab, GFP_KERNEL); + if (!pkt) { + ret = -ENOMEM; + goto free_list; + } + + if (len < IPATH_USER_SDMA_MIN_HEADER_LENGTH || + len > PAGE_SIZE || len & 3 || addr & 3) { + ret = -EINVAL; + goto free_pkt; + } + + if (len == IPATH_USER_SDMA_EXP_HEADER_LENGTH) + pbc = dma_pool_alloc(pq->header_cache, GFP_KERNEL, + &dma_addr); + else + pbc = NULL; + + if (!pbc) { + page = alloc_page(GFP_KERNEL); + if (!page) { + ret = -ENOMEM; + goto free_pkt; + } + pbc = kmap(page); + } + + cfur = copy_from_user(pbc, iov[idx].iov_base, len); + if (cfur) { + ret = -EFAULT; + goto free_pbc; + } + + /* + * this assignment is a bit strange. it's because the + * pbc counts the number of 32 bit words in the full + * packet _except_ the first word of the pbc itself... + */ + pktnwc = nw - 1; + + /* + * pktnw computation yields the number of 32 bit words + * that the caller has indicated in the PBC.
note that + * this is one less than the total number of words that + * goes to the send DMA engine as the first 32 bit word + * of the PBC itself is not counted. Armed with this count, + * we can verify that the packet is consistent with the + * iovec lengths. + */ + pktnw = le32_to_cpu(*pbc) & IPATH_PBC_LENGTH_MASK; + if (pktnw < pktnwc || pktnw > pktnwc + (PAGE_SIZE >> 2)) { + ret = -EINVAL; + goto free_pbc; + } + + + idx++; + while (pktnwc < pktnw && idx < niov) { + const size_t slen = iov[idx].iov_len; + const unsigned long faddr = + (unsigned long) iov[idx].iov_base; + + if (slen & 3 || faddr & 3 || !slen || + slen > PAGE_SIZE) { + ret = -EINVAL; + goto free_pbc; + } + + npages++; + if ((faddr & PAGE_MASK) != + ((faddr + slen - 1) & PAGE_MASK)) + npages++; + + pktnwc += slen >> 2; + idx++; + nfrags++; + } + + if (pktnwc != pktnw) { + ret = -EINVAL; + goto free_pbc; + } + + if (page) { + dma_addr = dma_map_page(&dd->pcidev->dev, + page, 0, len, DMA_TO_DEVICE); + if (dma_mapping_error(dma_addr)) { + ret = -ENOMEM; + goto free_pbc; + } + + dma_mapped = 1; + } + + ipath_user_sdma_init_header(pkt, counter, 0, len, dma_mapped, + page, pbc, dma_addr); + + if (nfrags) { + ret = ipath_user_sdma_init_payload(dd, pq, pkt, + iov + idx_save + 1, + nfrags, npages); + if (ret < 0) + goto free_pbc_dma; + } + + counter++; + npkts++; + + list_add_tail(&pkt->list, list); + } + + ret = idx; + goto done; + +free_pbc_dma: + if (dma_mapped) + dma_unmap_page(&dd->pcidev->dev, dma_addr, len, DMA_TO_DEVICE); +free_pbc: + if (page) { + kunmap(page); + __free_page(page); + } else + dma_pool_free(pq->header_cache, pbc, dma_addr); +free_pkt: + kmem_cache_free(pq->pkt_slab, pkt); +free_list: + ipath_user_sdma_free_pkt_list(&dd->pcidev->dev, pq, list); +done: + return ret; +} + +/* try to clean out queue -- needs pq->lock */ +static int ipath_user_sdma_queue_clean(const struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq) +{ + struct list_head free_list; + struct ipath_user_sdma_pkt 
*pkt; + struct ipath_user_sdma_pkt *pkt_prev; + int ret = 0; + + INIT_LIST_HEAD(&free_list); + + list_for_each_entry_safe(pkt, pkt_prev, &pq->sent, list) { + s64 descd = dd->ipath_sdma_descq_removed - pkt->added; + + if (descd < 0) + break; + + list_move_tail(&pkt->list, &free_list); + + /* one more packet cleaned */ + ret++; + } + + if (!list_empty(&free_list)) { + u32 counter; + + pkt = list_entry(free_list.prev, + struct ipath_user_sdma_pkt, list); + counter = pkt->counter; + + ipath_user_sdma_free_pkt_list(&dd->pcidev->dev, pq, &free_list); + ipath_user_sdma_set_complete_counter(pq, counter); + } + + return ret; +} + +void ipath_user_sdma_queue_destroy(struct ipath_user_sdma_queue *pq) +{ + if (!pq) + return; + + kmem_cache_destroy(pq->pkt_slab); + dma_pool_destroy(pq->header_cache); + kfree(pq); +} + +/* clean descriptor queue, returns > 0 if some elements cleaned */ +static int ipath_user_sdma_hwqueue_clean(struct ipath_devdata *dd) +{ + int ret; + unsigned long flags; + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + ret = ipath_sdma_make_progress(dd); + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + return ret; +} + +/* we're in close, drain packets so that we can cleanup successfully... 
*/ +void ipath_user_sdma_queue_drain(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq) +{ + int i; + + if (!pq) + return; + + for (i = 0; i < 100; i++) { + mutex_lock(&pq->lock); + if (list_empty(&pq->sent)) { + mutex_unlock(&pq->lock); + break; + } + ipath_user_sdma_hwqueue_clean(dd); + ipath_user_sdma_queue_clean(dd, pq); + mutex_unlock(&pq->lock); + msleep(10); + } + + if (!list_empty(&pq->sent)) { + struct list_head free_list; + + printk(KERN_INFO "drain: lists not empty: forcing!\n"); + INIT_LIST_HEAD(&free_list); + mutex_lock(&pq->lock); + list_splice_init(&pq->sent, &free_list); + ipath_user_sdma_free_pkt_list(&dd->pcidev->dev, pq, &free_list); + mutex_unlock(&pq->lock); + } +} + +static inline __le64 ipath_sdma_make_desc0(struct ipath_devdata *dd, + u64 addr, u64 dwlen, u64 dwoffset) +{ + return cpu_to_le64(/* SDmaPhyAddr[31:0] */ + ((addr & 0xfffffffcULL) << 32) | + /* SDmaGeneration[1:0] */ + ((dd->ipath_sdma_generation & 3ULL) << 30) | + /* SDmaDwordCount[10:0] */ + ((dwlen & 0x7ffULL) << 16) | + /* SDmaBufOffset[12:2] */ + (dwoffset & 0x7ffULL)); +} + +static inline __le64 ipath_sdma_make_first_desc0(__le64 descq) +{ + return descq | __constant_cpu_to_le64(1ULL << 12); +} + +static inline __le64 ipath_sdma_make_last_desc0(__le64 descq) +{ + /* last */ /* dma head */ + return descq | __constant_cpu_to_le64(1ULL << 11 | 1ULL << 13); +} + +static inline __le64 ipath_sdma_make_desc1(u64 addr) +{ + /* SDmaPhyAddr[47:32] */ + return cpu_to_le64(addr >> 32); +} + +static void ipath_user_sdma_send_frag(struct ipath_devdata *dd, + struct ipath_user_sdma_pkt *pkt, int idx, + unsigned ofs, u16 tail) +{ + const u64 addr = (u64) pkt->addr[idx].addr + + (u64) pkt->addr[idx].offset; + const u64 dwlen = (u64) pkt->addr[idx].length / 4; + __le64 *descqp; + __le64 descq0; + + descqp = &dd->ipath_sdma_descq[tail].qw[0]; + + descq0 = ipath_sdma_make_desc0(dd, addr, dwlen, ofs); + if (idx == 0) + descq0 = ipath_sdma_make_first_desc0(descq0); + if (idx == 
pkt->naddr - 1) + descq0 = ipath_sdma_make_last_desc0(descq0); + + descqp[0] = descq0; + descqp[1] = ipath_sdma_make_desc1(addr); +} + +/* pq->lock must be held, get packets on the wire... */ +static int ipath_user_sdma_push_pkts(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq, + struct list_head *pktlist) +{ + int ret = 0; + unsigned long flags; + u16 tail; + + if (list_empty(pktlist)) + return 0; + + if (unlikely(!(dd->ipath_flags & IPATH_LINKACTIVE))) + return -ECOMM; + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + + if (unlikely(dd->ipath_sdma_status & IPATH_SDMA_ABORT_MASK)) { + ret = -ECOMM; + goto unlock; + } + + tail = dd->ipath_sdma_descq_tail; + while (!list_empty(pktlist)) { + struct ipath_user_sdma_pkt *pkt = + list_entry(pktlist->next, struct ipath_user_sdma_pkt, + list); + int i; + unsigned ofs = 0; + u16 dtail = tail; + + if (pkt->naddr > ipath_sdma_descq_freecnt(dd)) + goto unlock_check_tail; + + for (i = 0; i < pkt->naddr; i++) { + ipath_user_sdma_send_frag(dd, pkt, i, ofs, tail); + ofs += pkt->addr[i].length >> 2; + + if (++tail == dd->ipath_sdma_descq_cnt) { + tail = 0; + ++dd->ipath_sdma_generation; + } + } + + if ((ofs<<2) > dd->ipath_ibmaxlen) { + ipath_dbg("packet size %X > ibmax %X, fail\n", + ofs<<2, dd->ipath_ibmaxlen); + ret = -EMSGSIZE; + goto unlock; + } + + /* + * if the packet is >= 2KB mtu equivalent, we have to use + * the large buffers, and have to mark each descriptor as + * part of a large buffer packet. 
+ */ + if (ofs >= IPATH_SMALLBUF_DWORDS) { + for (i = 0; i < pkt->naddr; i++) { + dd->ipath_sdma_descq[dtail].qw[0] |= + __constant_cpu_to_le64(1ULL << 14); + if (++dtail == dd->ipath_sdma_descq_cnt) + dtail = 0; + } + } + + dd->ipath_sdma_descq_added += pkt->naddr; + pkt->added = dd->ipath_sdma_descq_added; + list_move_tail(&pkt->list, &pq->sent); + ret++; + } + +unlock_check_tail: + /* advance the tail on the chip if necessary */ + if (dd->ipath_sdma_descq_tail != tail) { + wmb(); + ipath_write_kreg(dd, dd->ipath_kregs->kr_senddmatail, tail); + dd->ipath_sdma_descq_tail = tail; + } + +unlock: + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + + return ret; +} + +int ipath_user_sdma_writev(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq, + const struct iovec *iov, + unsigned long dim) +{ + int ret = 0; + struct list_head list; + int npkts = 0; + + INIT_LIST_HEAD(&list); + + mutex_lock(&pq->lock); + + if (dd->ipath_sdma_descq_added != dd->ipath_sdma_descq_removed) { + ipath_user_sdma_hwqueue_clean(dd); + ipath_user_sdma_queue_clean(dd, pq); + } + + while (dim) { + const int mxp = 8; + + down_write(¤t->mm->mmap_sem); + ret = ipath_user_sdma_queue_pkts(dd, pq, &list, iov, dim, mxp); + up_write(¤t->mm->mmap_sem); + + if (ret <= 0) + goto done_unlock; + else { + dim -= ret; + iov += ret; + } + + /* force packets onto the sdma hw queue... */ + if (!list_empty(&list)) { + /* + * lazily clean hw queue. the 4 is a guess of about + * how many sdma descriptors a packet will take (it + * doesn't have to be perfect). 
+ */ + if (ipath_sdma_descq_freecnt(dd) < ret * 4) { + ipath_user_sdma_hwqueue_clean(dd); + ipath_user_sdma_queue_clean(dd, pq); + } + + ret = ipath_user_sdma_push_pkts(dd, pq, &list); + if (ret < 0) + goto done_unlock; + else { + npkts += ret; + pq->counter += ret; + + if (!list_empty(&list)) + goto done_unlock; + } + } + } + +done_unlock: + if (!list_empty(&list)) + ipath_user_sdma_free_pkt_list(&dd->pcidev->dev, pq, &list); + mutex_unlock(&pq->lock); + + return (ret < 0) ? ret : npkts; +} + +int ipath_user_sdma_make_progress(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq) +{ + int ret = 0; + + mutex_lock(&pq->lock); + ipath_user_sdma_hwqueue_clean(dd); + ret = ipath_user_sdma_queue_clean(dd, pq); + mutex_unlock(&pq->lock); + + return ret; +} + +int ipath_user_sdma_pkt_sent(const struct ipath_user_sdma_queue *pq, + u32 counter) +{ + const u32 scounter = ipath_user_sdma_complete_counter(pq); + const s32 dcounter = scounter - counter; + + return dcounter >= 0; +} + +u32 ipath_user_sdma_complete_counter(const struct ipath_user_sdma_queue *pq) +{ + return pq->sent_counter; +} + +void ipath_user_sdma_set_complete_counter(struct ipath_user_sdma_queue *pq, + u32 c) +{ + pq->sent_counter = c; +} + +u32 ipath_user_sdma_inflight_counter(struct ipath_user_sdma_queue *pq) +{ + return pq->counter; +} + From ralph.campbell at qlogic.com Wed Apr 2 15:50:33 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:33 -0700 Subject: [ofa-general] [PATCH 18/20] IB/ipath - misc changes to prepare for iba7220 introduction In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225033.28598.56443.stgit@eng-46.mv.qlogic.com> From: Arthur Jones The patch adds a number of minor changes to support newer HCAs * New send buffer control bits * New error condition bits * Locking and initialization changes * More send buffers Signed-off-by: Ralph 
Campbell --- drivers/infiniband/hw/ipath/ipath_driver.c | 61 ++++++++++++++++++++----- drivers/infiniband/hw/ipath/ipath_file_ops.c | 2 - drivers/infiniband/hw/ipath/ipath_init_chip.c | 24 +++++++--- drivers/infiniband/hw/ipath/ipath_intr.c | 11 +++-- drivers/infiniband/hw/ipath/ipath_kernel.h | 1 drivers/infiniband/hw/ipath/ipath_sysfs.c | 18 ++++--- 6 files changed, 83 insertions(+), 34 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index b4a69ef..66982a9 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -89,6 +89,10 @@ MODULE_LICENSE("GPL"); MODULE_AUTHOR("QLogic "); MODULE_DESCRIPTION("QLogic InfiniPath driver"); +/* + * Table to translate the LINKTRAININGSTATE portion of + * IBCStatus to a human-readable form. + */ const char *ipath_ibcstatus_str[] = { "Disabled", "LinkUp", @@ -103,9 +107,20 @@ const char *ipath_ibcstatus_str[] = { "CfgWaitRmt", "CfgIdle", "RecovRetrain", - "LState0xD", /* unused */ + "CfgTxRevLane", /* unused before IBA7220 */ "RecovWaitRmt", "RecovIdle", + /* below were added for IBA7220 */ + "CfgEnhanced", + "CfgTest", + "CfgWaitRmtTest", + "CfgWaitCfgEnhanced", + "SendTS_T", + "SendTstIdles", + "RcvTS_T", + "SendTst_TS1s", + "LTState18", "LTState19", "LTState1A", "LTState1B", + "LTState1C", "LTState1D", "LTState1E", "LTState1F" }; static void __devexit ipath_remove_one(struct pci_dev *); @@ -333,7 +348,14 @@ static void ipath_verify_pioperf(struct ipath_devdata *dd) ipath_disable_armlaunch(dd); - writeq(0, piobuf); /* length 0, no dwords actually sent */ + /* + * length 0, no dwords actually sent, and mark as VL15 + * on chips where that may matter (due to IB flowcontrol) + */ + if ((dd->ipath_flags & IPATH_HAS_PBC_CNT)) + writeq(1UL << 63, piobuf); + else + writeq(0, piobuf); ipath_flush_wc(); /* @@ -374,6 +396,7 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, struct ipath_devdata *dd; unsigned long 
long addr; u32 bar0 = 0, bar1 = 0; + u8 rev; dd = ipath_alloc_devdata(pdev); if (IS_ERR(dd)) { @@ -405,7 +428,7 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, } addr = pci_resource_start(pdev, 0); len = pci_resource_len(pdev, 0); - ipath_cdbg(VERBOSE, "regbase (0) %llx len %d pdev->irq %d, vend %x/%x " + ipath_cdbg(VERBOSE, "regbase (0) %llx len %d irq %d, vend %x/%x " "driver_data %lx\n", addr, len, pdev->irq, ent->vendor, ent->device, ent->driver_data); @@ -530,7 +553,13 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, goto bail_regions; } - dd->ipath_pcirev = pdev->revision; + ret = pci_read_config_byte(pdev, PCI_REVISION_ID, &rev); + if (ret) { + ipath_dev_err(dd, "Failed to read PCI revision ID unit " + "%u: err %d\n", dd->ipath_unit, -ret); + goto bail_regions; /* shouldn't ever happen */ + } + dd->ipath_pcirev = rev; #if defined(__powerpc__) /* There isn't a generic way to specify writethrough mappings */ @@ -553,14 +582,6 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, ipath_cdbg(VERBOSE, "mapped io addr %llx to kregbase %p\n", addr, dd->ipath_kregbase); - /* - * clear ipath_flags here instead of in ipath_init_chip as it is set - * by ipath_setup_htconfig. 
- */ - dd->ipath_flags = 0; - dd->ipath_lli_counter = 0; - dd->ipath_lli_errors = 0; - if (dd->ipath_f_bus(dd, pdev)) ipath_dev_err(dd, "Failed to setup config space; " "continuing anyway\n"); @@ -649,6 +670,10 @@ static void __devexit cleanup_device(struct ipath_devdata *dd) ipath_disable_wc(dd); } + if (dd->ipath_spectriggerhit) + dev_info(&dd->pcidev->dev, "%lu special trigger hits\n", + dd->ipath_spectriggerhit); + if (dd->ipath_pioavailregs_dma) { dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE, (void *) dd->ipath_pioavailregs_dma, @@ -857,7 +882,7 @@ int ipath_wait_linkstate(struct ipath_devdata *dd, u32 state, int msecs) (unsigned long long) ipath_read_kreg64( dd, dd->ipath_kregs->kr_ibcctrl), (unsigned long long) val, - ipath_ibcstatus_str[val & 0xf]); + ipath_ibcstatus_str[val & dd->ibcs_lts_mask]); } return (dd->ipath_flags & state) ? 0 : -ETIMEDOUT; } @@ -906,6 +931,8 @@ int ipath_decode_err(char *buf, size_t blen, ipath_err_t err) strlcat(buf, "rbadversion ", blen); if (err & INFINIPATH_E_RHDR) strlcat(buf, "rhdr ", blen); + if (err & INFINIPATH_E_SENDSPECIALTRIGGER) + strlcat(buf, "sendspecialtrigger ", blen); if (err & INFINIPATH_E_RLONGPKTLEN) strlcat(buf, "rlongpktlen ", blen); if (err & INFINIPATH_E_RMAXPKTLEN) @@ -948,6 +975,8 @@ int ipath_decode_err(char *buf, size_t blen, ipath_err_t err) strlcat(buf, "hardware ", blen); if (err & INFINIPATH_E_RESET) strlcat(buf, "reset ", blen); + if (err & INFINIPATH_E_INVALIDEEPCMD) + strlcat(buf, "invalideepromcmd ", blen); done: return iserr; } @@ -1701,6 +1730,10 @@ bail: */ void ipath_cancel_sends(struct ipath_devdata *dd, int restore_sendctrl) { + if (dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) { + ipath_cdbg(VERBOSE, "Ignore while in autonegotiation\n"); + goto bail; + } ipath_dbg("Cancelling all in-progress send buffers\n"); /* skip armlaunch errs for a while */ @@ -1721,6 +1754,7 @@ void ipath_cancel_sends(struct ipath_devdata *dd, int restore_sendctrl) /* and again, be sure all have hit the chip */ 
ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); +bail:; } /* @@ -2282,6 +2316,7 @@ static int __init infinipath_init(void) */ idr_init(&unit_table); if (!idr_pre_get(&unit_table, GFP_KERNEL)) { + printk(KERN_ERR IPATH_DRV_NAME ": idr_pre_get() failed\n"); ret = -ENOMEM; goto bail; } diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index eab69df..b87d312 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -2074,7 +2074,7 @@ static int ipath_close(struct inode *in, struct file *fp) pd->port_rcvnowait = pd->port_pionowait = 0; } if (pd->port_flag) { - ipath_dbg("port %u port_flag still set to 0x%lx\n", + ipath_cdbg(PROC, "port %u port_flag set: 0x%lx\n", pd->port_port, pd->port_flag); pd->port_flag = 0; } diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index 8d8e572..c012e05 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -230,6 +230,15 @@ static int init_chip_first(struct ipath_devdata *dd) int ret = 0; u64 val; + spin_lock_init(&dd->ipath_kernel_tid_lock); + spin_lock_init(&dd->ipath_user_tid_lock); + spin_lock_init(&dd->ipath_sendctrl_lock); + spin_lock_init(&dd->ipath_sdma_lock); + spin_lock_init(&dd->ipath_gpio_lock); + spin_lock_init(&dd->ipath_eep_st_lock); + spin_lock_init(&dd->ipath_sdepb_lock); + mutex_init(&dd->ipath_eep_lock); + /* * skip cfgports stuff because we are not allocating memory, * and we don't want problems if the portcnt changed due to @@ -319,12 +328,6 @@ static int init_chip_first(struct ipath_devdata *dd) else ipath_dbg("%u 2k piobufs @ %p\n", dd->ipath_piobcnt2k, dd->ipath_pio2kbase); - spin_lock_init(&dd->ipath_user_tid_lock); - spin_lock_init(&dd->ipath_sendctrl_lock); - spin_lock_init(&dd->ipath_gpio_lock); - spin_lock_init(&dd->ipath_eep_st_lock); - mutex_init(&dd->ipath_eep_lock); - done: 
return ret; } @@ -553,7 +556,7 @@ static void enable_chip(struct ipath_devdata *dd, int reinit) static int init_housekeeping(struct ipath_devdata *dd, int reinit) { - char boardn[32]; + char boardn[40]; int ret = 0; /* @@ -800,7 +803,12 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) dd->ipath_pioupd_thresh = kpiobufs; } - dd->ipath_f_early_init(dd); + ret = dd->ipath_f_early_init(dd); + if (ret) { + ipath_dev_err(dd, "Early initialization failure\n"); + goto done; + } + /* * Cancel any possible active sends from early driver load. * Follows early_init because some chips have to initialize diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index 3bad601..90b972f 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -73,7 +73,7 @@ static void ipath_clrpiobuf(struct ipath_devdata *dd, u32 pnum) * If rewrite is true, and bits are set in the sendbufferror registers, * we'll write to the buffer, for error recovery on parity errors. 
*/ -static void ipath_disarm_senderrbufs(struct ipath_devdata *dd, int rewrite) +void ipath_disarm_senderrbufs(struct ipath_devdata *dd, int rewrite) { u32 piobcnt; unsigned long sbuf[4]; @@ -87,12 +87,14 @@ static void ipath_disarm_senderrbufs(struct ipath_devdata *dd, int rewrite) dd, dd->ipath_kregs->kr_sendbuffererror); sbuf[1] = ipath_read_kreg64( dd, dd->ipath_kregs->kr_sendbuffererror + 1); - if (piobcnt > 128) { + if (piobcnt > 128) sbuf[2] = ipath_read_kreg64( dd, dd->ipath_kregs->kr_sendbuffererror + 2); + if (piobcnt > 192) sbuf[3] = ipath_read_kreg64( dd, dd->ipath_kregs->kr_sendbuffererror + 3); - } + else + sbuf[3] = 0; if (sbuf[0] || sbuf[1] || (piobcnt > 128 && (sbuf[2] || sbuf[3]))) { int i; @@ -365,7 +367,8 @@ static void handle_e_ibstatuschanged(struct ipath_devdata *dd, */ if (lastlts == INFINIPATH_IBCS_LT_STATE_POLLACTIVE || lastlts == INFINIPATH_IBCS_LT_STATE_POLLQUIET) { - if (++dd->ipath_ibpollcnt == 40) { + if (!(dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) && + (++dd->ipath_ibpollcnt == 40)) { dd->ipath_flags |= IPATH_NOCABLE; *dd->ipath_statusp |= IPATH_STATUS_IB_NOCABLE; diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 8cdeab8..1d5adf6 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -1011,6 +1011,7 @@ void ipath_get_eeprom_info(struct ipath_devdata *); int ipath_update_eeprom_log(struct ipath_devdata *dd); void ipath_inc_eeprom_err(struct ipath_devdata *dd, u32 eidx, u32 incr); u64 ipath_snap_cntr(struct ipath_devdata *, ipath_creg); +void ipath_disarm_senderrbufs(struct ipath_devdata *, int); void ipath_force_pio_avail_update(struct ipath_devdata *); void signal_ib_event(struct ipath_devdata *dd, enum ib_event_type ev); diff --git a/drivers/infiniband/hw/ipath/ipath_sysfs.c b/drivers/infiniband/hw/ipath/ipath_sysfs.c index 7961d26..2e6d2aa 100644 --- a/drivers/infiniband/hw/ipath/ipath_sysfs.c +++ 
b/drivers/infiniband/hw/ipath/ipath_sysfs.c
@@ -34,6 +34,7 @@
 #include

 #include "ipath_kernel.h"
+#include "ipath_verbs.h"
 #include "ipath_common.h"

 /**
@@ -320,6 +321,8 @@ static ssize_t store_guid(struct device *dev,

 	dd->ipath_guid = new_guid;
 	dd->ipath_nguid = 1;
+	if (dd->verbs_dev)
+		dd->verbs_dev->ibdev.node_guid = new_guid;

 	ret = strlen(buf);
 	goto bail;
@@ -928,18 +931,17 @@ static ssize_t store_rx_polinv_enb(struct device *dev,
 	u16 val;

 	ret = ipath_parse_ushort(buf, &val);
-	if (ret < 0 || val > 1)
-		goto invalid;
+	if (ret >= 0 && val > 1) {
+		ipath_dev_err(dd,
+			"attempt to set invalid Rx Polarity (enable)\n");
+		ret = -EINVAL;
+		goto bail;
+	}

 	r = dd->ipath_f_set_ib_cfg(dd, IPATH_IB_CFG_RXPOL_ENB, val);
-	if (r < 0) {
+	if (r < 0)
 		ret = r;
-		goto bail;
-	}
-
-	goto bail;
-invalid:
-	ipath_dev_err(dd, "attempt to set invalid Rx Polarity (enable)\n");

 bail:
 	return ret;
 }

From ralph.campbell at qlogic.com Wed Apr 2 15:50:38 2008
From: ralph.campbell at qlogic.com (Ralph Campbell)
Date: Wed, 02 Apr 2008 15:50:38 -0700
Subject: [ofa-general] [PATCH 19/20] IB/ipath - add calls to new 7220 code and enable in build
In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com>
References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com>
Message-ID: <20080402225038.28598.43308.stgit@eng-46.mv.qlogic.com>

From: Dave Olson

This patch adds the initialization calls into the new 7220 HCA files, changes the Makefile to compile and link the new files, and adds code to handle send DMA.
Signed-off-by: Dave Olson --- drivers/infiniband/hw/ipath/Makefile | 3 drivers/infiniband/hw/ipath/ipath_common.h | 16 + drivers/infiniband/hw/ipath/ipath_driver.c | 150 ++++++++-- drivers/infiniband/hw/ipath/ipath_file_ops.c | 97 ++++++ drivers/infiniband/hw/ipath/ipath_init_chip.c | 4 drivers/infiniband/hw/ipath/ipath_intr.c | 239 +++++++++++---- drivers/infiniband/hw/ipath/ipath_kernel.h | 4 drivers/infiniband/hw/ipath/ipath_qp.c | 14 + drivers/infiniband/hw/ipath/ipath_ruc.c | 18 + drivers/infiniband/hw/ipath/ipath_sdma.c | 91 ++++-- drivers/infiniband/hw/ipath/ipath_stats.c | 4 drivers/infiniband/hw/ipath/ipath_ud.c | 1 drivers/infiniband/hw/ipath/ipath_verbs.c | 391 ++++++++++++++++++++++++- 13 files changed, 896 insertions(+), 136 deletions(-) diff --git a/drivers/infiniband/hw/ipath/Makefile b/drivers/infiniband/hw/ipath/Makefile index fe67388..75a6c91 100644 --- a/drivers/infiniband/hw/ipath/Makefile +++ b/drivers/infiniband/hw/ipath/Makefile @@ -20,17 +20,20 @@ ib_ipath-y := \ ipath_qp.o \ ipath_rc.o \ ipath_ruc.o \ + ipath_sdma.o \ ipath_srq.o \ ipath_stats.o \ ipath_sysfs.o \ ipath_uc.o \ ipath_ud.o \ ipath_user_pages.o \ + ipath_user_sdma.o \ ipath_verbs_mcast.o \ ipath_verbs.o ib_ipath-$(CONFIG_HT_IRQ) += ipath_iba6110.o ib_ipath-$(CONFIG_PCI_MSI) += ipath_iba6120.o +ib_ipath-$(CONFIG_PCI_MSI) += ipath_iba7220.o ipath_sd7220.o ipath_sd7220_img.o ib_ipath-$(CONFIG_X86_64) += ipath_wc_x86_64.o ib_ipath-$(CONFIG_PPC64) += ipath_wc_ppc64.o diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h index 02fd310..2cf7cd2 100644 --- a/drivers/infiniband/hw/ipath/ipath_common.h +++ b/drivers/infiniband/hw/ipath/ipath_common.h @@ -447,8 +447,9 @@ struct ipath_user_info { #define IPATH_CMD_PIOAVAILUPD 27 /* force an update of PIOAvail reg */ #define IPATH_CMD_POLL_TYPE 28 /* set the kind of polling we want */ #define IPATH_CMD_ARMLAUNCH_CTRL 29 /* armlaunch detection control */ - -#define IPATH_CMD_MAX 29 +/* 30 is 
unused */ +#define IPATH_CMD_SDMA_INFLIGHT 31 /* sdma inflight counter request */ +#define IPATH_CMD_SDMA_COMPLETE 32 /* sdma completion counter request */ /* * Poll types @@ -486,6 +487,17 @@ struct ipath_cmd { union { struct ipath_tid_info tid_info; struct ipath_user_info user_info; + + /* + * address in userspace where we should put the sdma + * inflight counter + */ + __u64 sdma_inflight; + /* + * address in userspace where we should put the sdma + * completion counter + */ + __u64 sdma_complete; /* address in userspace of struct ipath_port_info to write result to */ __u64 port_info; diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 66982a9..8ccc915 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -129,8 +129,10 @@ static int __devinit ipath_init_one(struct pci_dev *, /* Only needed for registration, nothing else needs this info */ #define PCI_VENDOR_ID_PATHSCALE 0x1fc1 +#define PCI_VENDOR_ID_QLOGIC 0x1077 #define PCI_DEVICE_ID_INFINIPATH_HT 0xd #define PCI_DEVICE_ID_INFINIPATH_PE800 0x10 +#define PCI_DEVICE_ID_INFINIPATH_7220 0x7220 /* Number of seconds before our card status check... 
*/ #define STATUS_TIMEOUT 60 @@ -138,6 +140,7 @@ static int __devinit ipath_init_one(struct pci_dev *, static const struct pci_device_id ipath_pci_tbl[] = { { PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_INFINIPATH_HT) }, { PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE, PCI_DEVICE_ID_INFINIPATH_PE800) }, + { PCI_DEVICE(PCI_VENDOR_ID_QLOGIC, PCI_DEVICE_ID_INFINIPATH_7220) }, { 0, } }; @@ -532,6 +535,13 @@ static int __devinit ipath_init_one(struct pci_dev *pdev, "CONFIG_PCI_MSI is not enabled\n", ent->device); return -ENODEV; #endif + case PCI_DEVICE_ID_INFINIPATH_7220: +#ifndef CONFIG_PCI_MSI + ipath_dbg("CONFIG_PCI_MSI is not enabled, " + "using IntX for unit %u\n", dd->ipath_unit); +#endif + ipath_init_iba7220_funcs(dd); + break; default: ipath_dev_err(dd, "Found unknown QLogic deviceid 0x%x, " "failing\n", ent->device); @@ -887,13 +897,47 @@ int ipath_wait_linkstate(struct ipath_devdata *dd, u32 state, int msecs) return (dd->ipath_flags & state) ? 0 : -ETIMEDOUT; } +static void decode_sdma_errs(struct ipath_devdata *dd, ipath_err_t err, + char *buf, size_t blen) +{ + static const struct { + ipath_err_t err; + const char *msg; + } errs[] = { + { INFINIPATH_E_SDMAGENMISMATCH, "SDmaGenMismatch" }, + { INFINIPATH_E_SDMAOUTOFBOUND, "SDmaOutOfBound" }, + { INFINIPATH_E_SDMATAILOUTOFBOUND, "SDmaTailOutOfBound" }, + { INFINIPATH_E_SDMABASE, "SDmaBase" }, + { INFINIPATH_E_SDMA1STDESC, "SDma1stDesc" }, + { INFINIPATH_E_SDMARPYTAG, "SDmaRpyTag" }, + { INFINIPATH_E_SDMADWEN, "SDmaDwEn" }, + { INFINIPATH_E_SDMAMISSINGDW, "SDmaMissingDw" }, + { INFINIPATH_E_SDMAUNEXPDATA, "SDmaUnexpData" }, + { INFINIPATH_E_SDMADESCADDRMISALIGN, "SDmaDescAddrMisalign" }, + { INFINIPATH_E_SENDBUFMISUSE, "SendBufMisuse" }, + { INFINIPATH_E_SDMADISABLED, "SDmaDisabled" }, + }; + int i; + int expected; + size_t bidx = 0; + + for (i = 0; i < ARRAY_SIZE(errs); i++) { + expected = (errs[i].err != INFINIPATH_E_SDMADISABLED) ? 
0 : + test_bit(IPATH_SDMA_ABORTING, &dd->ipath_sdma_status); + if ((err & errs[i].err) && !expected) + bidx += snprintf(buf + bidx, blen - bidx, + "%s ", errs[i].msg); + } +} + /* * Decode the error status into strings, deciding whether to always * print * it or not depending on "normal packet errors" vs everything * else. Return 1 if "real" errors, otherwise 0 if only packet * errors, so caller can decide what to print with the string. */ -int ipath_decode_err(char *buf, size_t blen, ipath_err_t err) +int ipath_decode_err(struct ipath_devdata *dd, char *buf, size_t blen, + ipath_err_t err) { int iserr = 1; *buf = '\0'; @@ -975,6 +1019,8 @@ int ipath_decode_err(char *buf, size_t blen, ipath_err_t err) strlcat(buf, "hardware ", blen); if (err & INFINIPATH_E_RESET) strlcat(buf, "reset ", blen); + if (err & INFINIPATH_E_SDMAERRS) + decode_sdma_errs(dd, err, buf, blen); if (err & INFINIPATH_E_INVALIDEEPCMD) strlcat(buf, "invalideepromcmd ", blen); done: @@ -1730,30 +1776,80 @@ bail: */ void ipath_cancel_sends(struct ipath_devdata *dd, int restore_sendctrl) { + unsigned long flags; + if (dd->ipath_flags & IPATH_IB_AUTONEG_INPROG) { ipath_cdbg(VERBOSE, "Ignore while in autonegotiation\n"); goto bail; } + /* + * If we have SDMA, and it's not disabled, we have to kick off the + * abort state machine, provided we aren't already aborting. + * If we are in the process of aborting SDMA (!DISABLED, but ABORTING), + * we skip the rest of this routine. 
It is already "in progress" + */ + if (dd->ipath_flags & IPATH_HAS_SEND_DMA) { + int skip_cancel; + u64 *statp = &dd->ipath_sdma_status; + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + skip_cancel = + !test_bit(IPATH_SDMA_DISABLED, statp) && + test_and_set_bit(IPATH_SDMA_ABORTING, statp); + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + if (skip_cancel) + goto bail; + } + ipath_dbg("Cancelling all in-progress send buffers\n"); /* skip armlaunch errs for a while */ dd->ipath_lastcancel = jiffies + HZ / 2; /* - * the abort bit is auto-clearing. We read scratch to be sure - * that cancels and the abort have taken effect in the chip. + * The abort bit is auto-clearing. We also don't want pioavail + * update happening during this, and we don't want any other + * sends going out, so turn those off for the duration. We read + * the scratch register to be sure that cancels and the abort + * have taken effect in the chip. Otherwise two parts are same + * as ipath_force_pio_avail_update() */ + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl &= ~(INFINIPATH_S_PIOBUFAVAILUPD + | INFINIPATH_S_PIOENABLE); ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, - INFINIPATH_S_ABORT); + dd->ipath_sendctrl | INFINIPATH_S_ABORT); ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + + /* disarm all send buffers */ ipath_disarm_piobufs(dd, 0, - (unsigned)(dd->ipath_piobcnt2k + dd->ipath_piobcnt4k)); - if (restore_sendctrl) /* else done by caller later */ + dd->ipath_piobcnt2k + dd->ipath_piobcnt4k); + + if (restore_sendctrl) { + /* else done by caller later if needed */ + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl |= INFINIPATH_S_PIOBUFAVAILUPD | + INFINIPATH_S_PIOENABLE; ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, - dd->ipath_sendctrl); + dd->ipath_sendctrl); + /* and again, be sure all have hit the chip */ + ipath_read_kreg64(dd, 
dd->ipath_kregs->kr_scratch); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + } - /* and again, be sure all have hit the chip */ - ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + if ((dd->ipath_flags & IPATH_HAS_SEND_DMA) && + !test_bit(IPATH_SDMA_DISABLED, &dd->ipath_sdma_status) && + test_bit(IPATH_SDMA_RUNNING, &dd->ipath_sdma_status)) { + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + /* only wait so long for intr */ + dd->ipath_sdma_abort_intr_timeout = jiffies + HZ; + dd->ipath_sdma_reset_wait = 200; + __set_bit(IPATH_SDMA_DISARMED, &dd->ipath_sdma_status); + if (!test_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status)) + tasklet_hi_schedule(&dd->ipath_sdma_abort_task); + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + } bail:; } @@ -1952,7 +2048,7 @@ bail: * sanity checking on this, and we don't deal with what happens to * programs that are already running when the size changes. * NOTE: changing the MTU will usually cause the IBC to go back to - * link initialize (IPATH_IBSTATE_INIT) state... + * link INIT state... */ int ipath_set_mtu(struct ipath_devdata *dd, u16 arg) { @@ -2092,9 +2188,8 @@ static void ipath_run_led_override(unsigned long opaque) * but leave that to per-chip functions. */ val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcstatus); - ltstate = (val >> INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) & - dd->ibcs_lts_mask; - lstate = (val >> dd->ibcs_ls_shift) & INFINIPATH_IBCS_LINKSTATE_MASK; + ltstate = ipath_ib_linktrstate(dd, val); + lstate = ipath_ib_linkstate(dd, val); dd->ipath_f_setextled(dd, lstate, ltstate); mod_timer(&dd->ipath_led_override_timer, jiffies + timeoff); @@ -2170,6 +2265,9 @@ void ipath_shutdown_device(struct ipath_devdata *dd) ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl, dd->ipath_rcvctrl); + if (dd->ipath_flags & IPATH_HAS_SEND_DMA) + teardown_sdma(dd); + /* * gracefully stop all sends allowing any in progress to trickle out * first. 
@@ -2187,9 +2285,16 @@ void ipath_shutdown_device(struct ipath_devdata *dd) */ udelay(5); + dd->ipath_f_setextled(dd, 0, 0); /* make sure LEDs are off */ + ipath_set_ib_lstate(dd, 0, INFINIPATH_IBCC_LINKINITCMD_DISABLE); ipath_cancel_sends(dd, 0); + /* + * we are shutting down, so tell components that care. We don't do + * this on just a link state change, much like ethernet, a cable + * unplug, etc. doesn't change driver state + */ signal_ib_event(dd, IB_EVENT_PORT_ERR); /* disable IBC */ @@ -2214,6 +2319,10 @@ void ipath_shutdown_device(struct ipath_devdata *dd) del_timer_sync(&dd->ipath_intrchk_timer); dd->ipath_intrchk_timer.data = 0; } + if (atomic_read(&dd->ipath_led_override_timer_active)) { + del_timer_sync(&dd->ipath_led_override_timer); + atomic_set(&dd->ipath_led_override_timer_active, 0); + } /* * clear all interrupts and errors, so that the next time the driver @@ -2408,13 +2517,18 @@ int ipath_reset_device(int unit) } } + if (dd->ipath_flags & IPATH_HAS_SEND_DMA) + teardown_sdma(dd); + dd->ipath_flags &= ~IPATH_INITTED; + ipath_write_kreg(dd, dd->ipath_kregs->kr_intmask, 0ULL); ret = dd->ipath_f_reset(dd); - if (ret != 1) - ipath_dbg("reset was not successful\n"); - ipath_dbg("Trying to reinitialize unit %u after reset attempt\n", - unit); - ret = ipath_init_chip(dd, 1); + if (ret == 1) { + ipath_dbg("Reinitializing unit %u after reset attempt\n", + unit); + ret = ipath_init_chip(dd, 1); + } else + ret = -EAGAIN; if (ret) ipath_dev_err(dd, "Reinitialize unit %u after " "reset failed with %d\n", unit, ret); diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index b87d312..d38ba29 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -36,21 +36,28 @@ #include #include #include +#include +#include +#include #include #include "ipath_kernel.h" #include "ipath_common.h" +#include "ipath_user_sdma.h" static int ipath_open(struct inode *, struct 
file *); static int ipath_close(struct inode *, struct file *); static ssize_t ipath_write(struct file *, const char __user *, size_t, loff_t *); +static ssize_t ipath_writev(struct kiocb *, const struct iovec *, + unsigned long , loff_t); static unsigned int ipath_poll(struct file *, struct poll_table_struct *); static int ipath_mmap(struct file *, struct vm_area_struct *); static const struct file_operations ipath_file_ops = { .owner = THIS_MODULE, .write = ipath_write, + .aio_write = ipath_writev, .open = ipath_open, .release = ipath_close, .poll = ipath_poll, @@ -1870,10 +1877,9 @@ static int ipath_assign_port(struct file *fp, if (ipath_compatible_subports(swmajor, swminor) && uinfo->spu_subport_cnt && (ret = find_shared_port(fp, uinfo))) { - mutex_unlock(&ipath_mutex); if (ret > 0) ret = 0; - goto done; + goto done_chk_sdma; } i_minor = iminor(fp->f_path.dentry->d_inode) - IPATH_USER_MINOR_BASE; @@ -1885,6 +1891,21 @@ static int ipath_assign_port(struct file *fp, else ret = find_best_unit(fp, uinfo); +done_chk_sdma: + if (!ret) { + struct ipath_filedata *fd = fp->private_data; + const struct ipath_portdata *pd = fd->pd; + const struct ipath_devdata *dd = pd->port_dd; + + fd->pq = ipath_user_sdma_queue_create(&dd->pcidev->dev, + dd->ipath_unit, + pd->port_port, + fd->subport); + + if (!fd->pq) + ret = -ENOMEM; + } + mutex_unlock(&ipath_mutex); done: @@ -2042,6 +2063,13 @@ static int ipath_close(struct inode *in, struct file *fp) mutex_unlock(&ipath_mutex); goto bail; } + + dd = pd->port_dd; + + /* drain user sdma queue */ + ipath_user_sdma_queue_drain(dd, fd->pq); + ipath_user_sdma_queue_destroy(fd->pq); + if (--pd->port_cnt) { /* * XXX If the master closes the port before the slave(s), @@ -2054,7 +2082,6 @@ static int ipath_close(struct inode *in, struct file *fp) goto bail; } port = pd->port_port; - dd = pd->port_dd; if (pd->port_hdrqfull) { ipath_cdbg(PROC, "%s[%u] had %u rcvhdrqfull errors " @@ -2176,6 +2203,35 @@ static int ipath_get_slave_info(struct 
ipath_portdata *pd, return ret; } +static int ipath_sdma_get_inflight(struct ipath_user_sdma_queue *pq, + u32 __user *inflightp) +{ + const u32 val = ipath_user_sdma_inflight_counter(pq); + + if (put_user(val, inflightp)) + return -EFAULT; + + return 0; +} + +static int ipath_sdma_get_complete(struct ipath_devdata *dd, + struct ipath_user_sdma_queue *pq, + u32 __user *completep) +{ + u32 val; + int err; + + err = ipath_user_sdma_make_progress(dd, pq); + if (err < 0) + return err; + + val = ipath_user_sdma_complete_counter(pq); + if (put_user(val, completep)) + return -EFAULT; + + return 0; +} + static ssize_t ipath_write(struct file *fp, const char __user *data, size_t count, loff_t *off) { @@ -2250,6 +2306,16 @@ static ssize_t ipath_write(struct file *fp, const char __user *data, dest = &cmd.cmd.armlaunch_ctrl; src = &ucmd->cmd.armlaunch_ctrl; break; + case IPATH_CMD_SDMA_INFLIGHT: + copy = sizeof(cmd.cmd.sdma_inflight); + dest = &cmd.cmd.sdma_inflight; + src = &ucmd->cmd.sdma_inflight; + break; + case IPATH_CMD_SDMA_COMPLETE: + copy = sizeof(cmd.cmd.sdma_complete); + dest = &cmd.cmd.sdma_complete; + src = &ucmd->cmd.sdma_complete; + break; default: ret = -EINVAL; goto bail; @@ -2331,6 +2397,17 @@ static ssize_t ipath_write(struct file *fp, const char __user *data, else ipath_disable_armlaunch(pd->port_dd); break; + case IPATH_CMD_SDMA_INFLIGHT: + ret = ipath_sdma_get_inflight(user_sdma_queue_fp(fp), + (u32 __user *) (unsigned long) + cmd.cmd.sdma_inflight); + break; + case IPATH_CMD_SDMA_COMPLETE: + ret = ipath_sdma_get_complete(pd->port_dd, + user_sdma_queue_fp(fp), + (u32 __user *) (unsigned long) + cmd.cmd.sdma_complete); + break; } if (ret >= 0) @@ -2340,6 +2417,20 @@ bail: return ret; } +static ssize_t ipath_writev(struct kiocb *iocb, const struct iovec *iov, + unsigned long dim, loff_t off) +{ + struct file *filp = iocb->ki_filp; + struct ipath_filedata *fp = filp->private_data; + struct ipath_portdata *pd = port_fp(filp); + struct ipath_user_sdma_queue *pq 
= fp->pq; + + if (!dim) + return -EINVAL; + + return ipath_user_sdma_writev(pd->port_dd, pq, iov, dim); +} + static struct class *ipath_class; static int init_cdev(int minor, char *name, const struct file_operations *fops, diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index c012e05..b43c2a1 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -980,6 +980,10 @@ int ipath_init_chip(struct ipath_devdata *dd, int reinit) dd->ipath_stats_timer_active = 1; } + /* Set up SendDMA if chip supports it */ + if (dd->ipath_flags & IPATH_HAS_SEND_DMA) + ret = setup_sdma(dd); + /* Set up HoL state */ init_timer(&dd->ipath_hol_timer); dd->ipath_hol_timer.function = ipath_hol_event; diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index 90b972f..d0088d5 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -433,6 +433,8 @@ static void handle_e_ibstatuschanged(struct ipath_devdata *dd, dd->ipath_flags &= ~(IPATH_LINKUNK | IPATH_LINKINIT | IPATH_LINKDOWN | IPATH_LINKARMED | IPATH_NOCABLE); + if (dd->ipath_flags & IPATH_HAS_SEND_DMA) + ipath_restart_sdma(dd); signal_ib_event(dd, IB_EVENT_PORT_ACTIVE); /* LED active not handled in chip _f_updown */ dd->ipath_f_setextled(dd, lstate, ltstate); @@ -480,7 +482,7 @@ done: } static void handle_supp_msgs(struct ipath_devdata *dd, - unsigned supp_msgs, char *msg, int msgsz) + unsigned supp_msgs, char *msg, u32 msgsz) { /* * Print the message unless it's ibc status change only, which @@ -488,12 +490,19 @@ static void handle_supp_msgs(struct ipath_devdata *dd, */ if (dd->ipath_lasterror & ~INFINIPATH_E_IBSTATUSCHANGED) { int iserr; - iserr = ipath_decode_err(msg, msgsz, + ipath_err_t mask; + iserr = ipath_decode_err(dd, msg, msgsz, dd->ipath_lasterror & ~INFINIPATH_E_IBSTATUSCHANGED); - if (dd->ipath_lasterror & - 
~(INFINIPATH_E_RRCVEGRFULL | - INFINIPATH_E_RRCVHDRFULL | INFINIPATH_E_PKTERRS)) + + mask = INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL | + INFINIPATH_E_PKTERRS | INFINIPATH_E_SDMADISABLED; + + /* if we're in debug, then don't mask SDMADISABLED msgs */ + if (ipath_debug & __IPATH_DBG) + mask &= ~INFINIPATH_E_SDMADISABLED; + + if (dd->ipath_lasterror & ~mask) ipath_dev_err(dd, "Suppressed %u messages for " "fast-repeating errors (%s) (%llx)\n", supp_msgs, msg, @@ -520,7 +529,7 @@ static void handle_supp_msgs(struct ipath_devdata *dd, static unsigned handle_frequent_errors(struct ipath_devdata *dd, ipath_err_t errs, char *msg, - int msgsz, int *noprint) + u32 msgsz, int *noprint) { unsigned long nc; static unsigned long nextmsg_time; @@ -550,19 +559,125 @@ static unsigned handle_frequent_errors(struct ipath_devdata *dd, return supp_msgs; } +static void handle_sdma_errors(struct ipath_devdata *dd, ipath_err_t errs) +{ + unsigned long flags; + int expected; + + if (ipath_debug & __IPATH_DBG) { + char msg[128]; + ipath_decode_err(dd, msg, sizeof msg, errs & + INFINIPATH_E_SDMAERRS); + ipath_dbg("errors %lx (%s)\n", (unsigned long)errs, msg); + } + if (ipath_debug & __IPATH_VERBDBG) { + unsigned long tl, hd, status, lengen; + tl = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmatail); + hd = ipath_read_kreg64(dd, dd->ipath_kregs->kr_senddmahead); + status = ipath_read_kreg64(dd + , dd->ipath_kregs->kr_senddmastatus); + lengen = ipath_read_kreg64(dd, + dd->ipath_kregs->kr_senddmalengen); + ipath_cdbg(VERBOSE, "sdma tl 0x%lx hd 0x%lx status 0x%lx " + "lengen 0x%lx\n", tl, hd, status, lengen); + } + + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + __set_bit(IPATH_SDMA_DISABLED, &dd->ipath_sdma_status); + expected = test_bit(IPATH_SDMA_ABORTING, &dd->ipath_sdma_status); + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + if (!expected) + ipath_cancel_sends(dd, 1); +} + +static void handle_sdma_intr(struct ipath_devdata *dd, u64 istat) +{ + unsigned long 
flags; + int expected; + + if ((istat & INFINIPATH_I_SDMAINT) && + !test_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status)) + ipath_sdma_intr(dd); + + if (istat & INFINIPATH_I_SDMADISABLED) { + expected = test_bit(IPATH_SDMA_ABORTING, + &dd->ipath_sdma_status); + ipath_dbg("%s SDmaDisabled intr\n", + expected ? "expected" : "unexpected"); + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + __set_bit(IPATH_SDMA_DISABLED, &dd->ipath_sdma_status); + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + if (!expected) + ipath_cancel_sends(dd, 1); + if (!test_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status)) + tasklet_hi_schedule(&dd->ipath_sdma_abort_task); + } +} + +static int handle_hdrq_full(struct ipath_devdata *dd) +{ + int chkerrpkts = 0; + u32 hd, tl; + u32 i; + + ipath_stats.sps_hdrqfull++; + for (i = 0; i < dd->ipath_cfgports; i++) { + struct ipath_portdata *pd = dd->ipath_pd[i]; + + if (i == 0) { + /* + * For kernel receive queues, we just want to know + * if there are packets in the queue that we can + * process. + */ + if (pd->port_head != ipath_get_hdrqtail(pd)) + chkerrpkts |= 1 << i; + continue; + } + + /* Skip if user context is not open */ + if (!pd || !pd->port_cnt) + continue; + + /* Don't report the same point multiple times. 
*/ + if (dd->ipath_flags & IPATH_NODMA_RTAIL) + tl = ipath_read_ureg32(dd, ur_rcvhdrtail, i); + else + tl = ipath_get_rcvhdrtail(pd); + if (tl == pd->port_lastrcvhdrqtail) + continue; + + hd = ipath_read_ureg32(dd, ur_rcvhdrhead, i); + if (hd == (tl + 1) || (!hd && tl == dd->ipath_hdrqlast)) { + pd->port_lastrcvhdrqtail = tl; + pd->port_hdrqfull++; + /* flush hdrqfull so that poll() sees it */ + wmb(); + wake_up_interruptible(&pd->port_wait); + } + } + + return chkerrpkts; +} + static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) { char msg[128]; u64 ignore_this_time = 0; - int i, iserr = 0; + u64 iserr = 0; int chkerrpkts = 0, noprint = 0; unsigned supp_msgs; int log_idx; - supp_msgs = handle_frequent_errors(dd, errs, msg, sizeof msg, &noprint); + /* + * don't report errors that are masked, either at init + * (not set in ipath_errormask), or temporarily (set in + * ipath_maskederrs) + */ + errs &= dd->ipath_errormask & ~dd->ipath_maskederrs; - /* don't report errors that are masked */ - errs &= ~dd->ipath_maskederrs; + supp_msgs = handle_frequent_errors(dd, errs, msg, (u32)sizeof msg, + &noprint); /* do these first, they are most important */ if (errs & INFINIPATH_E_HARDWARE) { @@ -577,6 +692,9 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) } } + if (errs & INFINIPATH_E_SDMAERRS) + handle_sdma_errors(dd, errs); + if (!noprint && (errs & ~dd->ipath_e_bitsextant)) ipath_dev_err(dd, "error interrupt with unknown errors " "%llx set\n", (unsigned long long) @@ -611,7 +729,7 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) dd->ipath_errormask &= ~dd->ipath_maskederrs; ipath_write_kreg(dd, dd->ipath_kregs->kr_errormask, dd->ipath_errormask); - s_iserr = ipath_decode_err(msg, sizeof msg, + s_iserr = ipath_decode_err(dd, msg, sizeof msg, dd->ipath_maskederrs); if (dd->ipath_maskederrs & @@ -661,26 +779,43 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) INFINIPATH_E_IBSTATUSCHANGED); } - 
/* likely due to cancel, so suppress */ + if (errs & INFINIPATH_E_SENDSPECIALTRIGGER) { + dd->ipath_spectriggerhit++; + ipath_dbg("%lu special trigger hits\n", + dd->ipath_spectriggerhit); + } + + /* likely due to cancel; so suppress message unless verbose */ if ((errs & (INFINIPATH_E_SPKTLEN | INFINIPATH_E_SPIOARMLAUNCH)) && dd->ipath_lastcancel > jiffies) { - ipath_dbg("Suppressed armlaunch/spktlen after error send cancel\n"); + /* armlaunch takes precedence; it often causes both. */ + ipath_cdbg(VERBOSE, + "Suppressed %s error (%llx) after sendbuf cancel\n", + (errs & INFINIPATH_E_SPIOARMLAUNCH) ? + "armlaunch" : "sendpktlen", (unsigned long long)errs); errs &= ~(INFINIPATH_E_SPIOARMLAUNCH | INFINIPATH_E_SPKTLEN); } if (!errs) return 0; - if (!noprint) + if (!noprint) { + ipath_err_t mask; /* - * the ones we mask off are handled specially below or above + * The ones we mask off are handled specially below + * or above. Also mask SDMADISABLED by default as it + * is too chatty. */ - ipath_decode_err(msg, sizeof msg, - errs & ~(INFINIPATH_E_IBSTATUSCHANGED | - INFINIPATH_E_RRCVEGRFULL | - INFINIPATH_E_RRCVHDRFULL | - INFINIPATH_E_HARDWARE)); - else + mask = INFINIPATH_E_IBSTATUSCHANGED | + INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL | + INFINIPATH_E_HARDWARE | INFINIPATH_E_SDMADISABLED; + + /* if we're in debug, then don't mask SDMADISABLED msgs */ + if (ipath_debug & __IPATH_DBG) + mask &= ~INFINIPATH_E_SDMADISABLED; + + ipath_decode_err(dd, msg, sizeof msg, errs & ~mask); + } else /* so we don't need if (!noprint) at strlcat's below */ *msg = 0; @@ -705,39 +840,8 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) * fast_stats, no more than every 5 seconds, user ports get printed * on close */ - if (errs & INFINIPATH_E_RRCVHDRFULL) { - u32 hd, tl; - ipath_stats.sps_hdrqfull++; - for (i = 0; i < dd->ipath_cfgports; i++) { - struct ipath_portdata *pd = dd->ipath_pd[i]; - if (i == 0) { - hd = pd->port_head; - tl = ipath_get_hdrqtail(pd); 
- } else if (pd && pd->port_cnt && - pd->port_rcvhdrtail_kvaddr) { - /* - * don't report same point multiple times, - * except kernel - */ - tl = *(u64 *) pd->port_rcvhdrtail_kvaddr; - if (tl == pd->port_lastrcvhdrqtail) - continue; - hd = ipath_read_ureg32(dd, ur_rcvhdrhead, - i); - } else - continue; - if (hd == (tl + 1) || - (!hd && tl == dd->ipath_hdrqlast)) { - if (i == 0) - chkerrpkts = 1; - pd->port_lastrcvhdrqtail = tl; - pd->port_hdrqfull++; - /* flush hdrqfull so that poll() sees it */ - wmb(); - wake_up_interruptible(&pd->port_wait); - } - } - } + if (errs & INFINIPATH_E_RRCVHDRFULL) + chkerrpkts |= handle_hdrq_full(dd); if (errs & INFINIPATH_E_RRCVEGRFULL) { struct ipath_portdata *pd = dd->ipath_pd[0]; @@ -749,7 +853,7 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) */ ipath_stats.sps_etidfull++; if (pd->port_head != ipath_get_hdrqtail(pd)) - chkerrpkts = 1; + chkerrpkts |= 1; } /* @@ -788,9 +892,6 @@ static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs) if (!noprint && *msg) { if (iserr) ipath_dev_err(dd, "%s error\n", msg); - else - dev_info(&dd->pcidev->dev, "%s packet problems\n", - msg); } if (dd->ipath_state_wanted & dd->ipath_flags) { ipath_cdbg(VERBOSE, "driver wanted state %x, iflags now %x, " @@ -1017,7 +1118,7 @@ static void handle_urcv(struct ipath_devdata *dd, u64 istat) irqreturn_t ipath_intr(int irq, void *data) { struct ipath_devdata *dd = data; - u32 istat, chk0rcv = 0; + u64 istat, chk0rcv = 0; ipath_err_t estat = 0; irqreturn_t ret; static unsigned unexpected = 0; @@ -1070,17 +1171,17 @@ irqreturn_t ipath_intr(int irq, void *data) if (unlikely(istat & ~dd->ipath_i_bitsextant)) ipath_dev_err(dd, - "interrupt with unknown interrupts %x set\n", - istat & (u32) ~ dd->ipath_i_bitsextant); - else - ipath_cdbg(VERBOSE, "intr stat=0x%x\n", istat); + "interrupt with unknown interrupts %Lx set\n", + istat & ~dd->ipath_i_bitsextant); + else if (istat & ~INFINIPATH_I_ERROR) /* errors do own printing */ + 
ipath_cdbg(VERBOSE, "intr stat=0x%Lx\n", istat); - if (unlikely(istat & INFINIPATH_I_ERROR)) { + if (istat & INFINIPATH_I_ERROR) { ipath_stats.sps_errints++; estat = ipath_read_kreg64(dd, dd->ipath_kregs->kr_errorstatus); if (!estat) - dev_info(&dd->pcidev->dev, "error interrupt (%x), " + dev_info(&dd->pcidev->dev, "error interrupt (%Lx), " "but no error bits set!\n", istat); else if (estat == -1LL) /* @@ -1198,6 +1299,9 @@ irqreturn_t ipath_intr(int irq, void *data) (dd->ipath_i_rcvurg_mask << dd->ipath_i_rcvurg_shift))) handle_urcv(dd, istat); + if (istat & (INFINIPATH_I_SDMAINT | INFINIPATH_I_SDMADISABLED)) + handle_sdma_intr(dd, istat); + if (istat & INFINIPATH_I_SPIOBUFAVAIL) { unsigned long flags; @@ -1208,7 +1312,10 @@ irqreturn_t ipath_intr(int irq, void *data) ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); - handle_layer_pioavail(dd); + if (!(dd->ipath_flags & IPATH_HAS_SEND_DMA)) + handle_layer_pioavail(dd); + else + ipath_dbg("unexpected BUFAVAIL intr\n"); } ret = IRQ_HANDLED; diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index 1d5adf6..a4857b9 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -872,7 +872,8 @@ struct sk_buff *ipath_alloc_skb(struct ipath_devdata *dd, gfp_t); extern int ipath_diag_inuse; irqreturn_t ipath_intr(int irq, void *devid); -int ipath_decode_err(char *buf, size_t blen, ipath_err_t err); +int ipath_decode_err(struct ipath_devdata *dd, char *buf, size_t blen, + ipath_err_t err); #if __IPATH_INFO || __IPATH_DBG extern const char *ipath_ibcstatus_str[]; #endif @@ -1027,6 +1028,7 @@ void ipath_set_led_override(struct ipath_devdata *dd, unsigned int val); /* send dma routines */ int setup_sdma(struct ipath_devdata *); void teardown_sdma(struct ipath_devdata *); +void ipath_restart_sdma(struct ipath_devdata *); void ipath_sdma_intr(struct ipath_devdata *); int 
ipath_sdma_verbs_send(struct ipath_devdata *, struct ipath_sge_state *, u32, struct ipath_verbs_txreq *); diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c index 812b42c..ded970b 100644 --- a/drivers/infiniband/hw/ipath/ipath_qp.c +++ b/drivers/infiniband/hw/ipath/ipath_qp.c @@ -340,6 +340,7 @@ static void ipath_reset_qp(struct ipath_qp *qp, enum ib_qp_type type) qp->s_flags &= IPATH_S_SIGNAL_REQ_WR; qp->s_hdrwords = 0; qp->s_wqe = NULL; + qp->s_pkt_delay = 0; qp->s_psn = 0; qp->r_psn = 0; qp->r_msn = 0; @@ -563,8 +564,10 @@ int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, if (attr_mask & IB_QP_ACCESS_FLAGS) qp->qp_access_flags = attr->qp_access_flags; - if (attr_mask & IB_QP_AV) + if (attr_mask & IB_QP_AV) { qp->remote_ah_attr = attr->ah_attr; + qp->s_dmult = ipath_ib_rate_to_mult(attr->ah_attr.static_rate); + } if (attr_mask & IB_QP_PATH_MTU) qp->path_mtu = attr->path_mtu; @@ -850,6 +853,7 @@ struct ib_qp *ipath_create_qp(struct ib_pd *ibpd, goto bail_qp; } qp->ip = NULL; + qp->s_tx = NULL; ipath_reset_qp(qp, init_attr->qp_type); break; @@ -955,12 +959,20 @@ int ipath_destroy_qp(struct ib_qp *ibqp) /* Stop the sending tasklet. */ tasklet_kill(&qp->s_task); + if (qp->s_tx) { + atomic_dec(&qp->refcount); + if (qp->s_tx->txreq.flags & IPATH_SDMA_TXREQ_F_FREEBUF) + kfree(qp->s_tx->txreq.map_addr); + } + /* Make sure the QP isn't on the timeout list. 
*/ spin_lock_irqsave(&dev->pending_lock, flags); if (!list_empty(&qp->timerwait)) list_del_init(&qp->timerwait); if (!list_empty(&qp->piowait)) list_del_init(&qp->piowait); + if (qp->s_tx) + list_add(&qp->s_tx->txreq.list, &dev->txreq_free); spin_unlock_irqrestore(&dev->pending_lock, flags); /* diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c index a59bdbd..bcaa291 100644 --- a/drivers/infiniband/hw/ipath/ipath_ruc.c +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c @@ -483,14 +483,16 @@ done: static void want_buffer(struct ipath_devdata *dd) { - unsigned long flags; - - spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); - dd->ipath_sendctrl |= INFINIPATH_S_PIOINTBUFAVAIL; - ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, - dd->ipath_sendctrl); - ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); - spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + if (!(dd->ipath_flags & IPATH_HAS_SEND_DMA)) { + unsigned long flags; + + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + dd->ipath_sendctrl |= INFINIPATH_S_PIOINTBUFAVAIL; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, + dd->ipath_sendctrl); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + } } /** diff --git a/drivers/infiniband/hw/ipath/ipath_sdma.c b/drivers/infiniband/hw/ipath/ipath_sdma.c index 5918caf..1974df7 100644 --- a/drivers/infiniband/hw/ipath/ipath_sdma.c +++ b/drivers/infiniband/hw/ipath/ipath_sdma.c @@ -230,7 +230,6 @@ static void dump_sdma_state(struct ipath_devdata *dd) static void sdma_abort_task(unsigned long opaque) { struct ipath_devdata *dd = (struct ipath_devdata *) opaque; - int kick = 0; u64 status; unsigned long flags; @@ -308,30 +307,26 @@ static void sdma_abort_task(unsigned long opaque) /* done with sdma state for a bit */ spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); - /* restart sdma engine */ + /* + * Don't restart sdma here. 
Wait until link is up to ACTIVE. + * VL15 MADs used to bring the link up use PIO, and multiple + * link transitions otherwise cause the sdma engine to be + * stopped and started multiple times. + * The disable is done here, including the shadow, so the + * state is kept consistent. + * See ipath_restart_sdma() for the actual starting of sdma. + */ spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); dd->ipath_sendctrl &= ~INFINIPATH_S_SDMAENABLE; ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); - dd->ipath_sendctrl |= INFINIPATH_S_SDMAENABLE; - ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, - dd->ipath_sendctrl); - ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); - kick = 1; - ipath_dbg("sdma restarted from abort\n"); - - /* now clear status bits */ - spin_lock_irqsave(&dd->ipath_sdma_lock, flags); - __clear_bit(IPATH_SDMA_ABORTING, &dd->ipath_sdma_status); - __clear_bit(IPATH_SDMA_DISARMED, &dd->ipath_sdma_status); - __clear_bit(IPATH_SDMA_DISABLED, &dd->ipath_sdma_status); /* make sure I see next message */ dd->ipath_sdma_abort_jiffies = 0; - goto unlock; + goto done; } resched: @@ -353,10 +348,8 @@ resched_noprint: unlock: spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); - - /* kick upper layers */ - if (kick) - ipath_ib_piobufavail(dd->verbs_dev); +done: + return; } /* @@ -481,10 +474,14 @@ int setup_sdma(struct ipath_devdata *dd) tasklet_init(&dd->ipath_sdma_abort_task, sdma_abort_task, (unsigned long) dd); - /* Turn on SDMA */ + /* + * No use to turn on SDMA here, as link is probably not ACTIVE + * Just mark it RUNNING and enable the interrupt, and let the + * ipath_restart_sdma() on link transition to ACTIVE actually + * enable it. 
+ */ spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); - dd->ipath_sendctrl |= INFINIPATH_S_SDMAENABLE | - INFINIPATH_S_SDMAINTENABLE; + dd->ipath_sendctrl |= INFINIPATH_S_SDMAINTENABLE; ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); __set_bit(IPATH_SDMA_RUNNING, &dd->ipath_sdma_status); @@ -572,6 +569,56 @@ void teardown_sdma(struct ipath_devdata *dd) sdma_descq, sdma_descq_phys); } +/* + * [Re]start SDMA, if we use it, and it's not already OK. + * This is called on transition to link ACTIVE, either the first or + * subsequent times. + */ +void ipath_restart_sdma(struct ipath_devdata *dd) +{ + unsigned long flags; + int needed = 1; + + if (!(dd->ipath_flags & IPATH_HAS_SEND_DMA)) + goto bail; + + /* + * First, make sure we should, which is to say, + * check that we are "RUNNING" (not in teardown) + * and not "SHUTDOWN" + */ + spin_lock_irqsave(&dd->ipath_sdma_lock, flags); + if (!test_bit(IPATH_SDMA_RUNNING, &dd->ipath_sdma_status) + || test_bit(IPATH_SDMA_SHUTDOWN, &dd->ipath_sdma_status)) + needed = 0; + else { + __clear_bit(IPATH_SDMA_DISABLED, &dd->ipath_sdma_status); + __clear_bit(IPATH_SDMA_DISARMED, &dd->ipath_sdma_status); + __clear_bit(IPATH_SDMA_ABORTING, &dd->ipath_sdma_status); + } + spin_unlock_irqrestore(&dd->ipath_sdma_lock, flags); + if (!needed) { + ipath_dbg("invalid attempt to restart SDMA, status 0x%016llx\n", + dd->ipath_sdma_status); + goto bail; + } + spin_lock_irqsave(&dd->ipath_sendctrl_lock, flags); + /* + * First clear, just to be safe. 
Enable is only done + * in chip on 0->1 transition + */ + dd->ipath_sendctrl &= ~INFINIPATH_S_SDMAENABLE; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + dd->ipath_sendctrl |= INFINIPATH_S_SDMAENABLE; + ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl, dd->ipath_sendctrl); + ipath_read_kreg64(dd, dd->ipath_kregs->kr_scratch); + spin_unlock_irqrestore(&dd->ipath_sendctrl_lock, flags); + +bail: + return; +} + static inline void make_sdma_desc(struct ipath_devdata *dd, u64 *sdmadesc, u64 addr, u64 dwlen, u64 dwoffset) { diff --git a/drivers/infiniband/hw/ipath/ipath_stats.c b/drivers/infiniband/hw/ipath/ipath_stats.c index adff2f1..1e36bac 100644 --- a/drivers/infiniband/hw/ipath/ipath_stats.c +++ b/drivers/infiniband/hw/ipath/ipath_stats.c @@ -292,8 +292,8 @@ void ipath_get_faststats(unsigned long opaque) && time_after(jiffies, dd->ipath_unmasktime)) { char ebuf[256]; int iserr; - iserr = ipath_decode_err(ebuf, sizeof ebuf, - dd->ipath_maskederrs); + iserr = ipath_decode_err(dd, ebuf, sizeof ebuf, + dd->ipath_maskederrs); if (dd->ipath_maskederrs & ~(INFINIPATH_E_RRCVEGRFULL | INFINIPATH_E_RRCVHDRFULL | INFINIPATH_E_PKTERRS)) diff --git a/drivers/infiniband/hw/ipath/ipath_ud.c b/drivers/infiniband/hw/ipath/ipath_ud.c index de67eed..4d4d58d 100644 --- a/drivers/infiniband/hw/ipath/ipath_ud.c +++ b/drivers/infiniband/hw/ipath/ipath_ud.c @@ -303,6 +303,7 @@ int ipath_make_ud_req(struct ipath_qp *qp) qp->s_hdrwords = 7; qp->s_cur_size = wqe->length; qp->s_cur_sge = &qp->s_sge; + qp->s_dmult = ah_attr->static_rate; qp->s_wqe = wqe; qp->s_sge.sge = wqe->sg_list[0]; qp->s_sge.sg_list = wqe->sg_list + 1; diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index 2e6b6f6..434a0d8 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -242,6 +242,93 @@ static void ipath_flush_wqe(struct ipath_qp *qp, 
struct ib_send_wr *wr) ipath_cq_enter(to_icq(qp->ibqp.send_cq), &wc, 1); } +/* + * Count the number of DMA descriptors needed to send length bytes of data. + * Don't modify the ipath_sge_state to get the count. + * Return zero if any of the segments is not aligned. + */ +static u32 ipath_count_sge(struct ipath_sge_state *ss, u32 length) +{ + struct ipath_sge *sg_list = ss->sg_list; + struct ipath_sge sge = ss->sge; + u8 num_sge = ss->num_sge; + u32 ndesc = 1; /* count the header */ + + while (length) { + u32 len = sge.length; + + if (len > length) + len = length; + if (len > sge.sge_length) + len = sge.sge_length; + BUG_ON(len == 0); + if (((long) sge.vaddr & (sizeof(u32) - 1)) || + (len != length && (len & (sizeof(u32) - 1)))) { + ndesc = 0; + break; + } + ndesc++; + sge.vaddr += len; + sge.length -= len; + sge.sge_length -= len; + if (sge.sge_length == 0) { + if (--num_sge) + sge = *sg_list++; + } else if (sge.length == 0 && sge.mr != NULL) { + if (++sge.n >= IPATH_SEGSZ) { + if (++sge.m >= sge.mr->mapsz) + break; + sge.n = 0; + } + sge.vaddr = + sge.mr->map[sge.m]->segs[sge.n].vaddr; + sge.length = + sge.mr->map[sge.m]->segs[sge.n].length; + } + length -= len; + } + return ndesc; +} + +/* + * Copy from the SGEs to the data buffer. 
+ */ +static void ipath_copy_from_sge(void *data, struct ipath_sge_state *ss, + u32 length) +{ + struct ipath_sge *sge = &ss->sge; + + while (length) { + u32 len = sge->length; + + if (len > length) + len = length; + if (len > sge->sge_length) + len = sge->sge_length; + BUG_ON(len == 0); + memcpy(data, sge->vaddr, len); + sge->vaddr += len; + sge->length -= len; + sge->sge_length -= len; + if (sge->sge_length == 0) { + if (--ss->num_sge) + *sge = *ss->sg_list++; + } else if (sge->length == 0 && sge->mr != NULL) { + if (++sge->n >= IPATH_SEGSZ) { + if (++sge->m >= sge->mr->mapsz) + break; + sge->n = 0; + } + sge->vaddr = + sge->mr->map[sge->m]->segs[sge->n].vaddr; + sge->length = + sge->mr->map[sge->m]->segs[sge->n].length; + } + data += len; + length -= len; + } +} + /** * ipath_post_one_send - post one RC, UC, or UD send work request * @qp: the QP to post on @@ -866,13 +953,231 @@ static void copy_io(u32 __iomem *piobuf, struct ipath_sge_state *ss, __raw_writel(last, piobuf); } -static int ipath_verbs_send_pio(struct ipath_qp *qp, u32 *hdr, u32 hdrwords, +/* + * Convert IB rate to delay multiplier. 
+ */ +unsigned ipath_ib_rate_to_mult(enum ib_rate rate) +{ + switch (rate) { + case IB_RATE_2_5_GBPS: return 8; + case IB_RATE_5_GBPS: return 4; + case IB_RATE_10_GBPS: return 2; + case IB_RATE_20_GBPS: return 1; + default: return 0; + } +} + +/* + * Convert delay multiplier to IB rate + */ +enum ib_rate ipath_mult_to_ib_rate(unsigned mult) +{ + switch (mult) { + case 8: return IB_RATE_2_5_GBPS; + case 4: return IB_RATE_5_GBPS; + case 2: return IB_RATE_10_GBPS; + case 1: return IB_RATE_20_GBPS; + default: return IB_RATE_PORT_CURRENT; + } +} + +static inline struct ipath_verbs_txreq *get_txreq(struct ipath_ibdev *dev) +{ + struct ipath_verbs_txreq *tx = NULL; + unsigned long flags; + + spin_lock_irqsave(&dev->pending_lock, flags); + if (!list_empty(&dev->txreq_free)) { + struct list_head *l = dev->txreq_free.next; + + list_del(l); + tx = list_entry(l, struct ipath_verbs_txreq, txreq.list); + } + spin_unlock_irqrestore(&dev->pending_lock, flags); + return tx; +} + +static inline void put_txreq(struct ipath_ibdev *dev, + struct ipath_verbs_txreq *tx) +{ + unsigned long flags; + + spin_lock_irqsave(&dev->pending_lock, flags); + list_add(&tx->txreq.list, &dev->txreq_free); + spin_unlock_irqrestore(&dev->pending_lock, flags); +} + +static void sdma_complete(void *cookie, int status) +{ + struct ipath_verbs_txreq *tx = cookie; + struct ipath_qp *qp = tx->qp; + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + + /* Generate a completion queue entry if needed */ + if (qp->ibqp.qp_type != IB_QPT_RC && tx->wqe) { + enum ib_wc_status ibs = status == IPATH_SDMA_TXREQ_S_OK ? + IB_WC_SUCCESS : IB_WC_WR_FLUSH_ERR; + + ipath_send_complete(qp, tx->wqe, ibs); + } + + if (tx->txreq.flags & IPATH_SDMA_TXREQ_F_FREEBUF) + kfree(tx->txreq.map_addr); + put_txreq(dev, tx); + + if (atomic_dec_and_test(&qp->refcount)) + wake_up(&qp->wait); +} + +/* + * Compute the number of clock cycles of delay before sending the next packet. 
+ * The multipliers reflect the number of clocks for the fastest rate so + * one tick at 4xDDR is 8 ticks at 1xSDR. + * If the destination port will take longer to receive a packet than + * the outgoing link can send it, we need to delay sending the next packet + * by the difference in time it takes the receiver to receive and the sender + * to send this packet. + * Note that this delay is always correct for UC and RC but not always + * optimal for UD. For UD, the destination HCA can be different for each + * packet, in which case, we could send packets to a different destination + * while "waiting" for the delay. The overhead for doing this without + * HW support is more than just paying the cost of delaying some packets + * unnecessarily. + */ +static inline unsigned ipath_pkt_delay(u32 plen, u8 snd_mult, u8 rcv_mult) +{ + return (rcv_mult > snd_mult) ? + (plen * (rcv_mult - snd_mult) + 1) >> 1 : 0; +} + +static int ipath_verbs_send_dma(struct ipath_qp *qp, + struct ipath_ib_header *hdr, u32 hdrwords, + struct ipath_sge_state *ss, u32 len, + u32 plen, u32 dwords) +{ + struct ipath_ibdev *dev = to_idev(qp->ibqp.device); + struct ipath_devdata *dd = dev->dd; + struct ipath_verbs_txreq *tx; + u32 *piobuf; + u32 control; + u32 ndesc; + int ret; + + tx = qp->s_tx; + if (tx) { + qp->s_tx = NULL; + /* resend previously constructed packet */ + ret = ipath_sdma_verbs_send(dd, tx->ss, tx->len, tx); + if (ret) + qp->s_tx = tx; + goto bail; + } + + tx = get_txreq(dev); + if (!tx) { + ret = -EBUSY; + goto bail; + } + + /* + * Get the saved delay count we computed for the previous packet + * and save the delay count for this packet to be used next time + * we get here. 
+ */ + control = qp->s_pkt_delay; + qp->s_pkt_delay = ipath_pkt_delay(plen, dd->delay_mult, qp->s_dmult); + + tx->qp = qp; + atomic_inc(&qp->refcount); + tx->wqe = qp->s_wqe; + tx->txreq.callback = sdma_complete; + tx->txreq.callback_cookie = tx; + tx->txreq.flags = IPATH_SDMA_TXREQ_F_HEADTOHOST | + IPATH_SDMA_TXREQ_F_INTREQ | IPATH_SDMA_TXREQ_F_FREEDESC; + if (plen + 1 >= IPATH_SMALLBUF_DWORDS) + tx->txreq.flags |= IPATH_SDMA_TXREQ_F_USELARGEBUF; + + /* VL15 packets bypass credit check */ + if ((be16_to_cpu(hdr->lrh[0]) >> 12) == 15) { + control |= 1ULL << 31; + tx->txreq.flags |= IPATH_SDMA_TXREQ_F_VL15; + } + + if (len) { + /* + * Don't try to DMA if it takes more descriptors than + * the queue holds. + */ + ndesc = ipath_count_sge(ss, len); + if (ndesc >= dd->ipath_sdma_descq_cnt) + ndesc = 0; + } else + ndesc = 1; + if (ndesc) { + tx->hdr.pbc[0] = cpu_to_le32(plen); + tx->hdr.pbc[1] = cpu_to_le32(control); + memcpy(&tx->hdr.hdr, hdr, hdrwords << 2); + tx->txreq.sg_count = ndesc; + tx->map_len = (hdrwords + 2) << 2; + tx->txreq.map_addr = &tx->hdr; + ret = ipath_sdma_verbs_send(dd, ss, dwords, tx); + if (ret) { + /* save ss and length in dwords */ + tx->ss = ss; + tx->len = dwords; + qp->s_tx = tx; + } + goto bail; + } + + /* Allocate a buffer and copy the header and payload to it. */ + tx->map_len = (plen + 1) << 2; + piobuf = kmalloc(tx->map_len, GFP_ATOMIC); + if (unlikely(piobuf == NULL)) { + ret = -EBUSY; + goto err_tx; + } + tx->txreq.map_addr = piobuf; + tx->txreq.flags |= IPATH_SDMA_TXREQ_F_FREEBUF; + tx->txreq.sg_count = 1; + + *piobuf++ = (__force u32) cpu_to_le32(plen); + *piobuf++ = (__force u32) cpu_to_le32(control); + memcpy(piobuf, hdr, hdrwords << 2); + ipath_copy_from_sge(piobuf + hdrwords, ss, len); + + ret = ipath_sdma_verbs_send(dd, NULL, 0, tx); + /* + * If we couldn't queue the DMA request, save the info + * and try again later rather than destroying the + * buffer and undoing the side effects of the copy. 
+ */ + if (ret) { + tx->ss = NULL; + tx->len = 0; + qp->s_tx = tx; + } + dev->n_unaligned++; + goto bail; + +err_tx: + if (atomic_dec_and_test(&qp->refcount)) + wake_up(&qp->wait); + put_txreq(dev, tx); +bail: + return ret; +} + +static int ipath_verbs_send_pio(struct ipath_qp *qp, + struct ipath_ib_header *ibhdr, u32 hdrwords, struct ipath_sge_state *ss, u32 len, u32 plen, u32 dwords) { struct ipath_devdata *dd = to_idev(qp->ibqp.device)->dd; + u32 *hdr = (u32 *) ibhdr; u32 __iomem *piobuf; unsigned flush_wc; + u32 control; int ret; piobuf = ipath_getpiobuf(dd, plen, NULL); @@ -882,11 +1187,23 @@ static int ipath_verbs_send_pio(struct ipath_qp *qp, u32 *hdr, u32 hdrwords, } /* - * Write len to control qword, no flags. + * Get the saved delay count we computed for the previous packet + * and save the delay count for this packet to be used next time + * we get here. + */ + control = qp->s_pkt_delay; + qp->s_pkt_delay = ipath_pkt_delay(plen, dd->delay_mult, qp->s_dmult); + + /* VL15 packets bypass credit check */ + if ((be16_to_cpu(ibhdr->lrh[0]) >> 12) == 15) + control |= 1ULL << 31; + + /* + * Write the length to the control qword plus any needed flags. * We have to flush after the PBC for correctness on some cpus * or WC buffer can be written out of order. */ - writeq(plen, piobuf); + writeq(((u64) control << 32) | plen, piobuf); piobuf += 2; flush_wc = dd->ipath_flags & IPATH_PIO_FLUSH_WC; @@ -961,15 +1278,25 @@ int ipath_verbs_send(struct ipath_qp *qp, struct ipath_ib_header *hdr, */ plen = hdrwords + dwords + 1; - /* Drop non-VL15 packets if we are not in the active state */ - if (!(dd->ipath_flags & IPATH_LINKACTIVE) && - qp->ibqp.qp_type != IB_QPT_SMI) { + /* + * VL15 packets (IB_QPT_SMI) will always use PIO, so we + * can defer SDMA restart until link goes ACTIVE without + * worrying about just how we got there. 
+ */ + if (qp->ibqp.qp_type == IB_QPT_SMI) + ret = ipath_verbs_send_pio(qp, hdr, hdrwords, ss, len, + plen, dwords); + /* All non-VL15 packets are dropped if link is not ACTIVE */ + else if (!(dd->ipath_flags & IPATH_LINKACTIVE)) { if (qp->s_wqe) ipath_send_complete(qp, qp->s_wqe, IB_WC_SUCCESS); ret = 0; - } else - ret = ipath_verbs_send_pio(qp, (u32 *) hdr, hdrwords, - ss, len, plen, dwords); + } else if (dd->ipath_flags & IPATH_HAS_SEND_DMA) + ret = ipath_verbs_send_dma(qp, hdr, hdrwords, ss, len, + plen, dwords); + else + ret = ipath_verbs_send_pio(qp, hdr, hdrwords, ss, len, + plen, dwords); return ret; } @@ -1038,6 +1365,12 @@ int ipath_get_counters(struct ipath_devdata *dd, ipath_snap_cntr(dd, crp->cr_errlpcrccnt) + ipath_snap_cntr(dd, crp->cr_badformatcnt) + dd->ipath_rxfc_unsupvl_errs; + if (crp->cr_rxotherlocalphyerrcnt) + cntrs->port_rcv_errors += + ipath_snap_cntr(dd, crp->cr_rxotherlocalphyerrcnt); + if (crp->cr_rxvlerrcnt) + cntrs->port_rcv_errors += + ipath_snap_cntr(dd, crp->cr_rxvlerrcnt); cntrs->port_rcv_remphys_errors = ipath_snap_cntr(dd, crp->cr_rcvebpcnt); cntrs->port_xmit_discards = ipath_snap_cntr(dd, crp->cr_unsupvlcnt); @@ -1046,9 +1379,16 @@ int ipath_get_counters(struct ipath_devdata *dd, cntrs->port_xmit_packets = ipath_snap_cntr(dd, crp->cr_pktsendcnt); cntrs->port_rcv_packets = ipath_snap_cntr(dd, crp->cr_pktrcvcnt); cntrs->local_link_integrity_errors = - (dd->ipath_flags & IPATH_GPIO_ERRINTRS) ? - dd->ipath_lli_errs : dd->ipath_lli_errors; - cntrs->excessive_buffer_overrun_errors = dd->ipath_overrun_thresh_errs; + crp->cr_locallinkintegrityerrcnt ? + ipath_snap_cntr(dd, crp->cr_locallinkintegrityerrcnt) : + ((dd->ipath_flags & IPATH_GPIO_ERRINTRS) ? + dd->ipath_lli_errs : dd->ipath_lli_errors); + cntrs->excessive_buffer_overrun_errors = + crp->cr_excessbufferovflcnt ? + ipath_snap_cntr(dd, crp->cr_excessbufferovflcnt) : + dd->ipath_overrun_thresh_errs; + cntrs->vl15_dropped = crp->cr_vl15droppedpktcnt ? 
+ ipath_snap_cntr(dd, crp->cr_vl15droppedpktcnt) : 0; ret = 0; @@ -1396,6 +1736,7 @@ static struct ib_ah *ipath_create_ah(struct ib_pd *pd, /* ib_create_ah() will initialize ah->ibah. */ ah->attr = *ah_attr; + ah->attr.static_rate = ipath_ib_rate_to_mult(ah_attr->static_rate); ret = &ah->ibah; @@ -1429,6 +1770,7 @@ static int ipath_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr) struct ipath_ah *ah = to_iah(ibah); *ah_attr = ah->attr; + ah_attr->static_rate = ipath_mult_to_ib_rate(ah->attr.static_rate); return 0; } @@ -1578,6 +1920,8 @@ int ipath_register_ib_device(struct ipath_devdata *dd) struct ipath_verbs_counters cntrs; struct ipath_ibdev *idev; struct ib_device *dev; + struct ipath_verbs_txreq *tx; + unsigned i; int ret; idev = (struct ipath_ibdev *)ib_alloc_device(sizeof *idev); @@ -1588,6 +1932,17 @@ int ipath_register_ib_device(struct ipath_devdata *dd) dev = &idev->ibdev; + if (dd->ipath_sdma_descq_cnt) { + tx = kmalloc(dd->ipath_sdma_descq_cnt * sizeof *tx, + GFP_KERNEL); + if (tx == NULL) { + ret = -ENOMEM; + goto err_tx; + } + } else + tx = NULL; + idev->txreq_bufs = tx; + /* Only need to initialize non-zero fields. 
*/ spin_lock_init(&idev->n_pds_lock); spin_lock_init(&idev->n_ahs_lock); @@ -1628,6 +1983,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd) INIT_LIST_HEAD(&idev->pending[2]); INIT_LIST_HEAD(&idev->piowait); INIT_LIST_HEAD(&idev->rnrwait); + INIT_LIST_HEAD(&idev->txreq_free); idev->pending_index = 0; idev->port_cap_flags = IB_PORT_SYS_IMAGE_GUID_SUP | IB_PORT_CLIENT_REG_SUP; @@ -1659,6 +2015,9 @@ int ipath_register_ib_device(struct ipath_devdata *dd) cntrs.excessive_buffer_overrun_errors; idev->z_vl15_dropped = cntrs.vl15_dropped; + for (i = 0; i < dd->ipath_sdma_descq_cnt; i++, tx++) + list_add(&tx->txreq.list, &idev->txreq_free); + /* * The system image GUID is supposed to be the same for all * IB HCAs in a single system but since there can be other @@ -1708,6 +2067,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd) dev->phys_port_cnt = 1; dev->num_comp_vectors = 1; dev->dma_device = &dd->pcidev->dev; + dev->class_dev.dev = dev->dma_device; dev->query_device = ipath_query_device; dev->modify_device = ipath_modify_device; dev->query_port = ipath_query_port; @@ -1772,6 +2132,8 @@ err_reg: err_lk: kfree(idev->qp_table.table); err_qp: + kfree(idev->txreq_bufs); +err_tx: ib_dealloc_device(dev); ipath_dev_err(dd, "cannot register verbs: %d!\n", -ret); idev = NULL; @@ -1806,6 +2168,7 @@ void ipath_unregister_ib_device(struct ipath_ibdev *dev) ipath_free_all_qps(&dev->qp_table); kfree(dev->qp_table.table); kfree(dev->lk_table.table); + kfree(dev->txreq_bufs); ib_dealloc_device(ibdev); } @@ -1853,13 +2216,15 @@ static ssize_t show_stats(struct class_device *cdev, char *buf) "RC stalls %d\n" "piobuf wait %d\n" "no piobuf %d\n" + "unaligned %d\n" "PKT drops %d\n" "WQE errs %d\n", dev->n_rc_resends, dev->n_rc_qacks, dev->n_rc_acks, dev->n_seq_naks, dev->n_rdma_seq, dev->n_rnr_naks, dev->n_other_naks, dev->n_timeouts, dev->n_rdma_dup_busy, dev->n_rc_stalls, dev->n_piowait, - dev->n_no_piobuf, dev->n_pkt_drops, dev->n_wqe_errs); + dev->n_no_piobuf, 
dev->n_unaligned, + dev->n_pkt_drops, dev->n_wqe_errs); for (i = 0; i < ARRAY_SIZE(dev->opstats); i++) { const struct ipath_opcode_stats *si = &dev->opstats[i]; From ralph.campbell at qlogic.com Wed Apr 2 15:50:43 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 02 Apr 2008 15:50:43 -0700 Subject: [ofa-general] [PATCH 20/20] IB/ipath - Update copyright dates for files changed in 2008 In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: <20080402225043.28598.98936.stgit@eng-46.mv.qlogic.com> This patch updates the copyright date for files modified in 2008. Signed-off-by: Ralph Campbell --- drivers/infiniband/hw/ipath/ipath_common.h | 2 +- drivers/infiniband/hw/ipath/ipath_diag.c | 2 +- drivers/infiniband/hw/ipath/ipath_driver.c | 2 +- drivers/infiniband/hw/ipath/ipath_eeprom.c | 2 +- drivers/infiniband/hw/ipath/ipath_file_ops.c | 2 +- drivers/infiniband/hw/ipath/ipath_iba6120.c | 2 +- drivers/infiniband/hw/ipath/ipath_init_chip.c | 2 +- drivers/infiniband/hw/ipath/ipath_intr.c | 2 +- drivers/infiniband/hw/ipath/ipath_kernel.h | 2 +- drivers/infiniband/hw/ipath/ipath_mad.c | 2 +- drivers/infiniband/hw/ipath/ipath_qp.c | 2 +- drivers/infiniband/hw/ipath/ipath_rc.c | 2 +- drivers/infiniband/hw/ipath/ipath_srq.c | 2 +- drivers/infiniband/hw/ipath/ipath_stats.c | 2 +- drivers/infiniband/hw/ipath/ipath_sysfs.c | 2 +- drivers/infiniband/hw/ipath/ipath_ud.c | 2 +- drivers/infiniband/hw/ipath/ipath_verbs.c | 2 +- drivers/infiniband/hw/ipath/ipath_verbs.h | 2 +- 18 files changed, 18 insertions(+), 18 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_common.h b/drivers/infiniband/hw/ipath/ipath_common.h index 2cf7cd2..28cfe97 100644 --- a/drivers/infiniband/hw/ipath/ipath_common.h +++ b/drivers/infiniband/hw/ipath/ipath_common.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. 
+ * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_diag.c b/drivers/infiniband/hw/ipath/ipath_diag.c index c9bfd82..6d49d2f 100644 --- a/drivers/infiniband/hw/ipath/ipath_diag.c +++ b/drivers/infiniband/hw/ipath/ipath_diag.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c index 8ccc915..9121529 100644 --- a/drivers/infiniband/hw/ipath/ipath_driver.c +++ b/drivers/infiniband/hw/ipath/ipath_driver.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_eeprom.c b/drivers/infiniband/hw/ipath/ipath_eeprom.c index 72f90e8..dc37277 100644 --- a/drivers/infiniband/hw/ipath/ipath_eeprom.c +++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. 
* * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_file_ops.c b/drivers/infiniband/hw/ipath/ipath_file_ops.c index d38ba29..1e627aa 100644 --- a/drivers/infiniband/hw/ipath/ipath_file_ops.c +++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_iba6120.c b/drivers/infiniband/hw/ipath/ipath_iba6120.c index c8d8f1a..421cc2a 100644 --- a/drivers/infiniband/hw/ipath/ipath_iba6120.c +++ b/drivers/infiniband/hw/ipath/ipath_iba6120.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_init_chip.c b/drivers/infiniband/hw/ipath/ipath_init_chip.c index b43c2a1..27dd894 100644 --- a/drivers/infiniband/hw/ipath/ipath_init_chip.c +++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_intr.c b/drivers/infiniband/hw/ipath/ipath_intr.c index d0088d5..1b58f47 100644 --- a/drivers/infiniband/hw/ipath/ipath_intr.c +++ b/drivers/infiniband/hw/ipath/ipath_intr.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. 
All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_kernel.h b/drivers/infiniband/hw/ipath/ipath_kernel.h index a4857b9..f856015 100644 --- a/drivers/infiniband/hw/ipath/ipath_kernel.h +++ b/drivers/infiniband/hw/ipath/ipath_kernel.h @@ -1,7 +1,7 @@ #ifndef _IPATH_KERNEL_H #define _IPATH_KERNEL_H /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_mad.c b/drivers/infiniband/hw/ipath/ipath_mad.c index babcc09..1ff46ae 100644 --- a/drivers/infiniband/hw/ipath/ipath_mad.c +++ b/drivers/infiniband/hw/ipath/ipath_mad.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_qp.c b/drivers/infiniband/hw/ipath/ipath_qp.c index ded970b..dd5b6e9 100644 --- a/drivers/infiniband/hw/ipath/ipath_qp.c +++ b/drivers/infiniband/hw/ipath/ipath_qp.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. 
* * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c index f765d48..4679819 100644 --- a/drivers/infiniband/hw/ipath/ipath_rc.c +++ b/drivers/infiniband/hw/ipath/ipath_rc.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_srq.c b/drivers/infiniband/hw/ipath/ipath_srq.c index 3366d66..e3d80ca 100644 --- a/drivers/infiniband/hw/ipath/ipath_srq.c +++ b/drivers/infiniband/hw/ipath/ipath_srq.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_stats.c b/drivers/infiniband/hw/ipath/ipath_stats.c index 1e36bac..c8e3d65 100644 --- a/drivers/infiniband/hw/ipath/ipath_stats.c +++ b/drivers/infiniband/hw/ipath/ipath_stats.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2003, 2004, 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_sysfs.c b/drivers/infiniband/hw/ipath/ipath_sysfs.c index 2e6d2aa..a6c8efb 100644 --- a/drivers/infiniband/hw/ipath/ipath_sysfs.c +++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. 
All rights reserved. * Copyright (c) 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_ud.c b/drivers/infiniband/hw/ipath/ipath_ud.c index 4d4d58d..918f520 100644 --- a/drivers/infiniband/hw/ipath/ipath_ud.c +++ b/drivers/infiniband/hw/ipath/ipath_ud.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index 434a0d8..d174694 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h index 056e741..65ddfc9 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.h +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2006, 2007 QLogic Corporation. All rights reserved. + * Copyright (c) 2006, 2007, 2008 QLogic Corporation. All rights reserved. * Copyright (c) 2005, 2006 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two
From sweitzen at cisco.com Wed Apr 2 16:00:20 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 16:00:20 -0700 Subject: [ofa-general] how do I use uDAPL with iWARP? Message-ID: I have OFED 1.3 and a Chelsio S310E-SR+ iWARP 10GE NIC. I have ib_rdma_lat working, so I know IB verbs are working. How do I use uDAPL, though? All the default /etc/dat.conf entries have IPoIB or bonding interfaces in them. Scott Weitzenkamp SQA and Release Manager Data Center Access Engineering Cisco Systems -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbernstein at penguincomputing.com Wed Apr 2 16:04:53 2008 From: jbernstein at penguincomputing.com (Joshua Bernstein) Date: Wed, 2 Apr 2008 16:04:53 -0700 Subject: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: Message-ID: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> Scott, On Apr 2, 2008, at 4:00 PM, Scott Weitzenkamp (sweitzen) wrote: > I have OFED 1.3 and a Chelsio S310E-SR+ iWARP 10GE NIC. I have > ib_rdma_lat working, so I know IB verbs are working. > > How do I use uDAPL, though? All the default /etc/dat.conf entries > have IPoIB or bonding interfaces in them. What you will want to do is edit /etc/ofed/dat64.conf or other related dat.conf file and change the name of the device from "ib0" to the name of the interface that the Chelsio card came up as.
For example, with my NetXen cards coming up as eth2, the first two lines of my /etc/ofed/dat64.conf file look like this: OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl.1.2 "eth2 0" "" #OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl.1.2 "ib0 0" "" Notice how I've commented out the ib0 line and simply changed it to eth2. Then you can use, say, HP-MPI with the -UDAPL option. Other MPI stacks have similar methods of telling them to use the uDAPL transport. -Joshua Bernstein Software Engineer Penguin Computing From clameter at sgi.com Wed Apr 2 16:04:42 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 16:04:42 -0700 (PDT) Subject: [ofa-general] Re: [patch 1/9] EMM Notifier: The notifier calls In-Reply-To: <20080402220936.GW19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402215334.GT19189@duo.random> <20080402220936.GW19189@duo.random> Message-ID: On Thu, 3 Apr 2008, Andrea Arcangeli wrote: > I said try_to_unmap_cluster, not get_user_pages. > > CPU0 CPU1 > try_to_unmap_cluster: > emm_invalidate_start in EMM (or mmu_notifier_invalidate_range_start in #v10) > walking the list by hand in EMM (or with hlist cleaner in #v10) > xpmem method invoked > schedule for a long while inside invalidate_range_start while skbs are sent > gru registers > synchronize_rcu (sorry useless now) All of this would be much easier if you could stop the drivel. The sync rcu was for an earlier release of the mmu notifier. Why the sniping? > single threaded, so taking a page fault > secondary tlb instantiated The driver must not allow faults to occur between start and end. The trouble here is that GRU and xpmem are mixed. If CPU0 would have been running GRU instead of XPMEM then the fault would not have occurred because the gru would have noticed that a range op is active.
If both systems would have run xpmem then the same would have worked. I guess this means that an address space cannot reliably be registered to multiple subsystems if some of those do not take a refcount. If all drivers would be required to take a refcount then this would also not occur. > In general my #v10 solution mixing seqlock + rcu looks more robust and > allows multithreaded attachment of mmu notifers as well. I could have Well it's easy to say that if no one else has looked at it yet. I expressed some concerns in reply to your post of #v10. From sweitzen at cisco.com Wed Apr 2 16:07:37 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 16:07:37 -0700 Subject: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> Message-ID: I tried that, and it didn't work: [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" "" [root at svbu-qa2950-1 ~]# dtest 10194 Running as server - OpenIB-cma 10194 Error dat_ep_create: DAT_INVALID_HANDLE 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP 10194: DAPL Test Complete.
10194: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 usec, pc= 0 10194: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 usec, pc= 0 10194: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 10194: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 10194: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 10194: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 10194: open: 32254.93 usec 10194: close: 31936.17 usec 10194: PZ create: 7.15 usec 10194: PZ free: 4.05 usec 10194: LMR create: 36.00 usec 10194: LMR free: 22.89 usec 10194: EVD create: 6.91 usec 10194: EVD free: 11.92 usec 10194: EP create: 28.85 usec 10194: EP free: 0.00 usec 10194: TOTAL: 106.57 usec Scott Weitzenkamp SQA and Release Manager Data Center Access Engineering Cisco Systems > -----Original Message----- > From: Joshua Bernstein [mailto:jbernstein at penguincomputing.com] > Sent: Wednesday, April 02, 2008 4:05 PM > To: Scott Weitzenkamp (sweitzen) > Cc: [ofa_general]; OpenFabrics EWG > Subject: Re: [ofa-general] how do I use uDAPL with iWARP? > > Scott, > > On Apr 2, 2008, at 4:00 PM, Scott Weitzenkamp (sweitzen) wrote: > > I have OFED 1.3 and a Chelsio S310E-SR+ iWARP 10GE NIC. I have > > ib_rdma_lat working, so I know IB verbs are working. > > > > How do I use uDAPL, though? All the default /etc/dat.conf entries > > have IPoIB or bonding interfaces in them. > > What you will want to do is edit /etc/ofed/dat64.conf or other > related dat.conf file and change the name of the device from > "ib0" to > the name of the interface that the Chelsio card came up as. So for > example with my NetXen cards coming up at eth2, so for example the > first two lines of my /etc/ofed/dat64.conf file look like this: > > OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl. > 1.2 "eth2 0" "" > #OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl. 
> 1.2 "ib0 0" "" > > Notice how I've commented out the ib0 line and simply changed > that to > be eth2. Then you can use say HP-MPI for example using the -UDAPL > option. Other MPI stacks have similar methods of telling them to use > the UDAPL transport. > > -Joshua Bernstein > Software Engineer > Penguin Computing > From jbernstein at penguincomputing.com Wed Apr 2 16:09:10 2008 From: jbernstein at penguincomputing.com (Joshua Bernstein) Date: Wed, 2 Apr 2008 16:09:10 -0700 Subject: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> Message-ID: On Apr 2, 2008, at 4:07 PM, Scott Weitzenkamp (sweitzen) wrote: > I tried that, and it didn't work: > > [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf > OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 > "eth2 0" > "" > [root at svbu-qa2950-1 ~]# dtest > 10194 Running as server - OpenIB-cma > 10194 Error dat_ep_create: DAT_INVALID_HANDLE > 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP Ah, it is using the correct device then. Do you have the rdma_ucm modules loaded? -Josh >> -----Original Message----- >> From: Joshua Bernstein [mailto:jbernstein at penguincomputing.com] >> Sent: Wednesday, April 02, 2008 4:05 PM >> To: Scott Weitzenkamp (sweitzen) >> Cc: [ofa_general]; OpenFabrics EWG >> Subject: Re: [ofa-general] how do I use uDAPL with iWARP? >> >> Scott, >> >> On Apr 2, 2008, at 4:00 PM, Scott Weitzenkamp (sweitzen) wrote: >>> I have OFED 1.3 and a Chelsio S310E-SR+ iWARP 10GE NIC. I have >>> ib_rdma_lat working, so I know IB verbs are working. >>> >>> How do I use uDAPL, though? All the default /etc/dat.conf entries >>> have IPoIB or bonding interfaces in them. >> >> What you will want to do is edit /etc/ofed/dat64.conf or other >> related dat.conf file and change the name of the device from >> "ib0" to >> the name of the interface that the Chelsio card came up as. 
So for >> example with my NetXen cards coming up at eth2, so for example the >> first two lines of my /etc/ofed/dat64.conf file look like this: >> >> OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl. >> 1.2 "eth2 0" "" >> #OpenIB-cma u1.2 nonthreadsafe default /usr/lib64/libdaplcma.so dapl. >> 1.2 "ib0 0" "" >> >> Notice how I've commented out the ib0 line and simply changed >> that to >> be eth2. Then you can use say HP-MPI for example using the -UDAPL >> option. Other MPI stacks have similar methods of telling them to use >> the UDAPL transport. >> >> -Joshua Bernstein >> Software Engineer >> Penguin Computing >> From sweitzen at cisco.com Wed Apr 2 16:28:31 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Wed, 2 Apr 2008 16:28:31 -0700 Subject: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> Message-ID: > > I tried that, and it didn't work: > > > > [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf > > OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 > > "eth2 0" > > "" > > [root at svbu-qa2950-1 ~]# dtest > > 10194 Running as server - OpenIB-cma > > 10194 Error dat_ep_create: DAT_INVALID_HANDLE > > 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP > > Ah, it is using the correct device then. Do you have the rdma_ucm > modules loaded? 
Yes, I do: [root at svbu-qa2950-1 ~]# lsmod | grep cm rdma_ucm 47232 0 ib_uverbs 75568 1 rdma_ucm rdma_cm 67348 2 rdma_ucm,ib_sdp ib_cm 67496 2 ib_ipoib,rdma_cm iw_cm 43656 1 rdma_cm ib_sa 74632 3 ib_ipoib,rdma_cm,ib_cm ib_mad 70948 5 ib_umad,mlx4_ib,ib_mthca,ib_cm,ib_sa ib_core 97664 13 rdma_ucm,ib_sdp,ib_ipoib,ib_uverbs,ib_umad,iw_cxgb3,mlx4_ib,ib_mthca,rdma_cm,ib_cm,iw_cm,ib_sa,ib_mad ib_addr 41992 1 rdma_cm From arlin.r.davis at intel.com Wed Apr 2 17:40:15 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Wed, 2 Apr 2008 17:40:15 -0700 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> Message-ID: >-----Original Message----- >From: ewg-bounces at lists.openfabrics.org >[mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Scott >Weitzenkamp (sweitzen) >Sent: Wednesday, April 02, 2008 4:29 PM >To: Joshua Bernstein >Cc: OpenFabrics EWG; [ofa_general] >Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? > >> > I tried that, and it didn't work: >> > >> > [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf >> > OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 >> > "eth2 0" >> > "" >> > [root at svbu-qa2950-1 ~]# dtest >> > 10194 Running as server - OpenIB-cma >> > 10194 Error dat_ep_create: DAT_INVALID_HANDLE >> > 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP >> Scott, I don't have any iWARP adapters so I am guessing here. Usually it is an attribute issue with QP create. The dtest is possibly setting QP attributes beyond the device max values. Can you run ibv_devinfo -v and send the output. Also, do you know if this device supports inline_data? uDAPL creates the QP with inline_data set to 64 bytes by default. You can override this with the environment variable DAPL_MAX_INLINE. Also, uDAPL uses cma. Did you happen to test with "ib_rdma_lat -c" ?
-arlin From andrea at qumranet.com Wed Apr 2 17:42:46 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 3 Apr 2008 02:42:46 +0200 Subject: [ofa-general] Re: [PATCH 1 of 8] Core of mmu notifiers In-Reply-To: References: Message-ID: <20080403004246.GA16633@duo.random> On Wed, Apr 02, 2008 at 03:34:01PM -0700, Christoph Lameter wrote: > Still two methods ... Yes, the invalidate_page is called with the core VM holding a reference on the page _after_ the tlb flush. The invalidate_end is called after the page has been freed already and after the tlb flush. They've different semantics and with invalidate_page there's no need to block the kvm fault handler. But invalidate_page is only the most efficient for operations that aren't creating holes in the vma, for the rest invalidate_range_start/end provides the best performance by reducing the number of tlb flushes. > seqlock just taken for checking if everything is ok? Exactly. > The critical section could be run multiple times for one callback which > could result in multiple callbacks to clear the young bit. Guess not that > big of an issue? Yes, that's ok. > Ok. Retry would try to invalidate the page a second time which is not a > problem unless you would drop the refcount or make other state changes > that require correspondence with mapping. I guess this is the reason > that you stopped adding a refcount? The current patch using mmu notifiers is already robust against multiple invalidates. The refcounting represent a spte mapping, if we already invalidated it, the spte will be nonpresent and there's no page to unpin. The removal of the refcount is only a microoptimization. > Multiple invalidate_range_starts on the same range? This means the driver > needs to be able to deal with the situation and ignore the repeated > call? The driver would need to store current->pid in a list and remove it in range_stop. And range_stop would need to do nothing at all, if the pid isn't found in the list. 
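The pid bookkeeping just described (range_start records current->pid in a list, range_stop drops it and ignores an entry that was never added) can be sketched in a few lines. This is a hypothetical userspace model for illustration only, not kernel code; the function names and the fixed-size list are assumptions:

```c
#include <assert.h>

/* Hypothetical userspace model of the driver-side bookkeeping:
 * one slot per invalidate_range_start that has not yet seen its
 * matching range_stop. */
#define MAX_INFLIGHT 16
static int inflight[MAX_INFLIGHT];
static int ninflight;

/* range_start: remember that this pid has an invalidate in flight. */
static void range_start_track(int pid)
{
    if (ninflight < MAX_INFLIGHT)
        inflight[ninflight++] = pid;
}

/* range_stop: drop the entry if present; return 1 if it was found,
 * 0 for an unmatched stop, which the driver must simply ignore. */
static int range_stop_track(int pid)
{
    for (int i = 0; i < ninflight; i++) {
        if (inflight[i] == pid) {
            inflight[i] = inflight[--ninflight]; /* swap-remove */
            return 1;
        }
    }
    return 0;
}

/* A secondary page fault may only instantiate an spte while no
 * invalidation is in flight. */
static int faults_allowed(void)
{
    return ninflight == 0;
}
```

As the thread notes, this handles repeated or unmatched callbacks, but it does not by itself solve a registration landing in the middle of a start/stop pair.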
But thinking more I'm not convinced the driver is safe by ignoring if range_end runs before range_begin (pid not found in the list). And I don't see a clear way to fix it not internally to the device driver nor externally. So the repeated call is easy to handle for the driver. What is not trivial is to block the secondary page faults when mmu_notifier_register happens in the middle of range_start/end critical section. sptes can be established in between range_start/_end and that shouldn't happen. So the core problem returns to be how to handle mmu_notifier_register happening in the middle of _range_start/_end, dismissing it as a job for the driver seems not feasible (you have the same problem with EMM of course). > Retry can lead to multiple invalidate_range callbacks with the same > parameters? Driver needs to ignore if the range is already clear? Mostly covered above. From clameter at sgi.com Wed Apr 2 18:03:50 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 18:03:50 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 1 of 8] Core of mmu notifiers In-Reply-To: <20080403004246.GA16633@duo.random> References: <20080403004246.GA16633@duo.random> Message-ID: Thinking about this adventurous locking some more: I think you are misunderstanding what a seqlock is. It is *not* a spinlock. The critical read section with the reading of a version before and after allows you access to a certain version of memory how it is or was some time ago (caching effect).
It does not mean that the current state of memory is fixed and neither does it allow syncing when an item is added to the list. So it could be that you are traversing a list that is missing one item because it is not visible to this processor yet. You may just see a state from the past. I would think that you will need a real lock in order to get the desired effect. From clameter at sgi.com Wed Apr 2 18:24:15 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 2 Apr 2008 18:24:15 -0700 (PDT) Subject: [ofa-general] EMM: disable other notifiers before register and unregister In-Reply-To: <20080402221716.GY19189@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> Message-ID: OK, let's forget about the single threaded thing to solve the registration races. As Andrea pointed out this still has issues with other subscribed subsystems (and also try_to_unmap). We could do something like what stop_machine_run does: First disable all running subsystems before registering a new one. Maybe this is a possible solution. Subject: EMM: disable other notifiers before register and unregister As Andrea has pointed out: There are races during registration if other subsystem notifiers are active while we register a callback. Solve that issue by adding two new notifiers: emm_stop Stops the notifier operations. Notifier must block on invalidate_start and emm_referenced from this point on. If an invalidate_start has not been completed by a call to invalidate_end then the driver must wait until the operation is complete before returning. emm_start Restart notifier operations. Before registration all other subscribed subsystems are stopped. Then the new subsystem is subscribed and things can get running without consistency issues. Subsystems are restarted after the lists have been updated. This also works for unregistering.
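To make the emm_stop/emm_start contract concrete, here is a minimal userspace model of one subscriber. This is an illustrative sketch only: the real callback takes an emm_notifier, an mm_struct, and a range, and the "wait for pending invalidates" step would actually block rather than assert.

```c
#include <assert.h>

/* Illustrative model of one subscribed subsystem's state. */
static int stopped;               /* set between emm_stop and emm_start */
static int invalidates_in_flight; /* starts without a matching end */

static void my_invalidate_start(void)
{
    /* Must not begin a new invalidate while stopped. */
    assert(!stopped);
    invalidates_in_flight++;
}

static void my_invalidate_end(void)
{
    invalidates_in_flight--;
}

/* emm_stop: refuse further operations; the real code would wait here
 * until every pending invalidate_start has seen its invalidate_end. */
static void my_emm_stop(void)
{
    stopped = 1;
    assert(invalidates_in_flight == 0); /* stand-in for waiting */
}

static void my_emm_start(void)
{
    stopped = 0;
}

/* Secondary faults are only serviced while running and quiescent. */
static int my_fault_allowed(void)
{
    return !stopped && invalidates_in_flight == 0;
}
```

With all subscribers quiescent between my_emm_stop() and my_emm_start(), the list of notifiers can be edited without racing against invalidations, which is exactly what the register/unregister paths above rely on.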
If we can get all subsystems to stop then we can also reliably unregister a subsystem. So provide that callback. Signed-off-by: Christoph Lameter --- include/linux/rmap.h | 10 +++++++--- mm/rmap.c | 30 ++++++++++++++++++++++++++++++ 2 files changed, 37 insertions(+), 3 deletions(-) Index: linux-2.6/include/linux/rmap.h =================================================================== --- linux-2.6.orig/include/linux/rmap.h 2008-04-02 18:16:07.906032549 -0700 +++ linux-2.6/include/linux/rmap.h 2008-04-02 18:17:10.291070009 -0700 @@ -94,7 +94,9 @@ enum emm_operation { emm_release, /* Process exiting, */ emm_invalidate_start, /* Before the VM unmaps pages */ emm_invalidate_end, /* After the VM unmapped pages */ - emm_referenced /* Check if a range was referenced */ + emm_referenced, /* Check if a range was referenced */ + emm_stop, /* Halt all faults/invalidate_starts */ + emm_start, /* Restart operations */ }; struct emm_notifier { @@ -126,13 +128,15 @@ static inline int emm_notify(struct mm_s /* * Register a notifier with an mm struct. Release occurs when the process - * terminates by calling the notifier function with emm_release. + * terminates by calling the notifier function with emm_release or when + * emm_notifier_unregister is called. * * Must hold the mmap_sem for write. 
*/ extern void emm_notifier_register(struct emm_notifier *e, struct mm_struct *mm); - +extern void emm_notifier_unregister(struct emm_notifier *e, + struct mm_struct *mm); /* * Called from mm/vmscan.c to handle paging out Index: linux-2.6/mm/rmap.c =================================================================== --- linux-2.6.orig/mm/rmap.c 2008-04-02 18:16:09.378057062 -0700 +++ linux-2.6/mm/rmap.c 2008-04-02 18:16:10.710079201 -0700 @@ -289,16 +289,46 @@ void emm_notifier_release(struct mm_stru /* Register a notifier */ void emm_notifier_register(struct emm_notifier *e, struct mm_struct *mm) { + /* Bring all other notifiers into a quiescent state */ + emm_notify(mm, emm_stop, 0, TASK_SIZE); + e->next = mm->emm_notifier; + /* * The update to emm_notifier (e->next) must be visible * before the pointer becomes visible. * rcu_assign_pointer() does exactly what we need. */ rcu_assign_pointer(mm->emm_notifier, e); + + /* Continue notifiers */ + emm_notify(mm, emm_start, 0, TASK_SIZE); } EXPORT_SYMBOL_GPL(emm_notifier_register); +/* Unregister a notifier */ +void emm_notifier_unregister(struct emm_notifier *e, struct mm_struct *mm) +{ + struct emm_notifier *p; + + emm_notify(mm, emm_stop, 0, TASK_SIZE); + + p = mm->emm_notifier; + if (e == p) + mm->emm_notifier = e->next; + else { + while (p->next != e) + p = p->next; + + p->next = e->next; + } + e->next = mm->emm_notifier; + + emm_notify(mm, emm_start, 0, TASK_SIZE); + e->callback(e, mm, emm_release, 0, TASK_SIZE); +} +EXPORT_SYMBOL_GPL(emm_notifier_unregister); + /* * Perform a callback * From jbernstein at penguincomputing.com Wed Apr 2 19:58:44 2008 From: jbernstein at penguincomputing.com (Joshua Bernstein) Date: Wed, 2 Apr 2008 19:58:44 -0700 Subject: [ofa-general] how do I use uDAPL with iWARP? 
In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> Message-ID: <5747B828-9F2E-47E4-9658-52A232147C37@penguincomputing.com> Have you checked to make sure the right user space end points are available in /sys? Does using strace give you any hints? -Josh On Apr 2, 2008, at 4:28 PM, Scott Weitzenkamp (sweitzen) wrote: >>> I tried that, and it didn't work: >>> >>> [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf >>> OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 >>> "eth2 0" >>> "" >>> [root at svbu-qa2950-1 ~]# dtest >>> 10194 Running as server - OpenIB-cma >>> 10194 Error dat_ep_create: DAT_INVALID_HANDLE >>> 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP >> >> Ah, it is using the correct device then. Do you have the rdma_ucm >> modules loaded? > > Yes, I do: > > [root at svbu-qa2950-1 ~]# lsmod | grep cm > rdma_ucm 47232 0 > ib_uverbs 75568 1 rdma_ucm > rdma_cm 67348 2 rdma_ucm,ib_sdp > ib_cm 67496 2 ib_ipoib,rdma_cm > iw_cm 43656 1 rdma_cm > ib_sa 74632 3 ib_ipoib,rdma_cm,ib_cm > ib_mad 70948 5 ib_umad,mlx4_ib,ib_mthca,ib_cm,ib_sa > ib_core 97664 13 > rdma_ucm,ib_sdp,ib_ipoib,ib_uverbs,ib_umad,iw_c > xgb3,mlx4_ib,ib_mthca,rdma_cm,ib_cm,iw_cm,ib_sa,ib_mad > ib_addr 41992 1 rdma_cm From a.p.zijlstra at chello.nl Thu Apr 3 03:40:46 2008 From: a.p.zijlstra at chello.nl (Peter Zijlstra) Date: Thu, 03 Apr 2008 12:40:46 +0200 Subject: [ofa-general] Re: EMM: Fixup return value handling of emm_notify() In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402212515.GS19189@duo.random> Message-ID: <1207219246.8514.817.camel@twins> On Wed, 2008-04-02 at 14:33 -0700, Christoph Lameter wrote: > On Wed, 2 Apr 2008, Andrea Arcangeli wrote: > > > but anyway it's silly to be hardwired to such an interface that worst > > of all requires switch statements instead of proper pointer to > > functions and a fixed set of parameters and 
retval semantics for all > > methods. > > The EMM API with a single callback is the simplest approach at this point. > A common callback for all operations allows the driver to implement common > entry and exit code as seen in XPMem. It seems to me that common code can be shared using functions? No need to stuff everything into a single function. We have method vectors all over the kernel; we could do a_ops as a single callback too, but we don't. FWIW I prefer separate methods. > I guess we can complicate this more by switching to a different API or > adding additional emm_xxx() callbacks if need be, but I really want to have > a strong case for why this would be needed. There is the danger of > adding frills with special callbacks in this and that situation that could > make the notifier complicated and specific to a certain usage scenario. > > Having this generic, simple interface will hopefully avoid such things. > > From a.p.zijlstra at chello.nl Thu Apr 3 03:40:48 2008 From: a.p.zijlstra at chello.nl (Peter Zijlstra) Date: Thu, 03 Apr 2008 12:40:48 +0200 Subject: [ofa-general] Re: EMM: disable other notifiers before register and unregister In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> Message-ID: <1207219248.8514.819.camel@twins> On Wed, 2008-04-02 at 18:24 -0700, Christoph Lameter wrote: > OK, let's forget about the single-threaded thing to solve the registration > races. As Andrea pointed out, this still has issues with other subscribed > subsystems (and also try_to_unmap). We could do something like what > stop_machine_run does: first disable all running subsystems before > registering a new one. > > Maybe this is a possible solution. 
> > > Subject: EMM: disable other notifiers before register and unregister > > As Andrea has pointed out: There are races during registration if other > subsystem notifiers are active while we register a callback. > > Solve that issue by adding two new notifiers: > > emm_stop > Stops the notifier operations. Notifier must block on > invalidate_start and emm_referenced from this point on. > If an invalidate_start has not been completed by a call > to invalidate_end then the driver must wait until the > operation is complete before returning. > > emm_start > Restart notifier operations. Please use pause and resume or something like that. stop-start is an unnatural order; we usually start before we stop, whereas we pause first and resume later. > Before registration all other subscribed subsystems are stopped. > Then the new subsystem is subscribed and things can get running > without consistency issues. > > Subsystems are restarted after the lists have been updated. > > This also works for unregistering. If we can get all subsystems > to stop then we can also reliably unregister a subsystem. So > provide that callback. 
> > Signed-off-by: Christoph Lameter > > --- > include/linux/rmap.h | 10 +++++++--- > mm/rmap.c | 30 ++++++++++++++++++++++++++++++ > 2 files changed, 37 insertions(+), 3 deletions(-) > > Index: linux-2.6/include/linux/rmap.h > =================================================================== > --- linux-2.6.orig/include/linux/rmap.h 2008-04-02 18:16:07.906032549 -0700 > +++ linux-2.6/include/linux/rmap.h 2008-04-02 18:17:10.291070009 -0700 > @@ -94,7 +94,9 @@ enum emm_operation { > emm_release, /* Process exiting, */ > emm_invalidate_start, /* Before the VM unmaps pages */ > emm_invalidate_end, /* After the VM unmapped pages */ > - emm_referenced /* Check if a range was referenced */ > + emm_referenced, /* Check if a range was referenced */ > + emm_stop, /* Halt all faults/invalidate_starts */ > + emm_start, /* Restart operations */ > }; > > struct emm_notifier { > @@ -126,13 +128,15 @@ static inline int emm_notify(struct mm_s > > /* > * Register a notifier with an mm struct. Release occurs when the process > - * terminates by calling the notifier function with emm_release. > + * terminates by calling the notifier function with emm_release or when > + * emm_notifier_unregister is called. > * > * Must hold the mmap_sem for write. 
> */ > extern void emm_notifier_register(struct emm_notifier *e, > struct mm_struct *mm); > - > +extern void emm_notifier_unregister(struct emm_notifier *e, > + struct mm_struct *mm); > > /* > * Called from mm/vmscan.c to handle paging out > Index: linux-2.6/mm/rmap.c > =================================================================== > --- linux-2.6.orig/mm/rmap.c 2008-04-02 18:16:09.378057062 -0700 > +++ linux-2.6/mm/rmap.c 2008-04-02 18:16:10.710079201 -0700 > @@ -289,16 +289,46 @@ void emm_notifier_release(struct mm_stru > /* Register a notifier */ > void emm_notifier_register(struct emm_notifier *e, struct mm_struct *mm) > { > + /* Bring all other notifiers into a quiescent state */ > + emm_notify(mm, emm_stop, 0, TASK_SIZE); > + > e->next = mm->emm_notifier; > + > /* > * The update to emm_notifier (e->next) must be visible > * before the pointer becomes visible. > * rcu_assign_pointer() does exactly what we need. > */ > rcu_assign_pointer(mm->emm_notifier, e); > + > + /* Continue notifiers */ > + emm_notify(mm, emm_start, 0, TASK_SIZE); > } > EXPORT_SYMBOL_GPL(emm_notifier_register); > > +/* Unregister a notifier */ > +void emm_notifier_unregister(struct emm_notifier *e, struct mm_struct *mm) > +{ > + struct emm_notifier *p; > + > + emm_notify(mm, emm_stop, 0, TASK_SIZE); > + > + p = mm->emm_notifier; > + if (e == p) > + mm->emm_notifier = e->next; > + else { > + while (p->next != e) > + p = p->next; > + > + p->next = e->next; > + } > + e->next = mm->emm_notifier; > + > + emm_notify(mm, emm_start, 0, TASK_SIZE); > + e->callback(e, mm, emm_release, 0, TASK_SIZE); > +} > +EXPORT_SYMBOL_GPL(emm_notifier_unregister); > + > /* > * Perform a callback > * > From tziporet at dev.mellanox.co.il Thu Apr 3 04:40:10 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Thu, 03 Apr 2008 14:40:10 +0300 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: <47F37CA4.8000109@mellanox.co.il> 
Message-ID: <47F4C21A.6090402@mellanox.co.il> Roland Dreier wrote: > Send with invalidate should be OK. Let's see about the masked atomics > stuff -- we have a ton of new verbs and I think we might want to slow > down and make sure it all makes sense. > OK - will send and then we will see what comes out. > > What about the split CQ for UD mode? It's improved the IPoIB > > performance for small messages significantly. > > Oh yeah... I'll try to get that in too. > thanks > > mlx4 - we plan to send patches for the low-level driver only, to enable > > mlx4_en. These only affect our low-level driver. > > No problem in principle, let's see the actual patches. > Sure > > I think we should try to push for XRC in 2.6.26 since there are > > already MPI implementations that use it, and this ties them to use OFED > > only. > > Also, this feature is stable and now being defined in the IBTA. > > Not taking it causes divergence between OFED and the kernel and your > > libibverbs, and we wish to avoid such gaps. > > Is there anything we can do to help make it into 2.6.26? > > I don't have a good feeling that the user-kernel interface is well > thought out, so I want to consider XRC + ehca LL stuff + new iWARP verbs > and make sure we have something that makes sense for the future. > > I see - but can't we figure this all out for the 2.6.26 window? Tziporet From erezz at voltaire.com Thu Apr 3 06:50:59 2008 From: erezz at voltaire.com (Erez Zilber) Date: Thu, 03 Apr 2008 16:50:59 +0300 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> Message-ID: <47F4E0C3.2030100@voltaire.com> > > *OFED 1.4:* > 1. Kernel base: since we target 1.4 release to Sep we target the > kernel base to be 2.6.27 > This is a good target, but we may need to stay with 2.6.26 if the > kernel progress will not be aligned. > > 2. Suggestions for new features: > > * NFS-RDMA > * Verbs: Reliable Multicast (to be presented at Sonoma) > * SDP - Zero copy (There was a question on IPv6 support - seems no > one interested for now) > * IPoIB - continue with performance enhancements > * Xsigo new virtual NIC > * New vendor HW support - non was reported so far (IBM and Chelsio > - do you have something?) > * OpenSM: > o Incremental routing > o Temporary SA DB - to answer queries and a heavy sweep is done > o APM - disjoint paths (?) > o MKey manager (?) 
> o Sasha to send more management features > * MPI: > o Open MPI 1.3 > o APM support in MPI > o mvapich ??? > * uDAPl > o Extensions for new APIs (like XRC) - ? > o uDAPL provider for interop between Windows & Linux > o 1.2 and 2.0 will stay > As I wrote in an earlier discussion (~2 months ago), we plan to add tgt (SCSI target) with iSCSI over iSER (and TCP of course) support. The git tree for tgt already exists on the ofa server. Erez From changquing.tang at hp.com Thu Apr 3 07:27:27 2008 From: changquing.tang at hp.com (Tang, Changqing) Date: Thu, 3 Apr 2008 14:27:27 +0000 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <47F4E0C3.2030100@voltaire.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> Message-ID: Can we address multiple-fabrics (physically separated) support ? --CQ Tang > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > Erez Zilber > Sent: Thursday, April 03, 2008 8:51 AM > To: Tziporet Koren > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting > summary on OFED 1.4 plans > > > > > *OFED 1.4:* > > 1. Kernel base: since we target 1.4 release to Sep we target the > > kernel base to be 2.6.27 > > This is a good target, but we may need to stay with > 2.6.26 if the > > kernel progress will not be aligned. > > > > 2. Suggestions for new features: > > > > * NFS-RDMA > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > * SDP - Zero copy (There was a question on IPv6 support > - seems no > > one interested for now) > > * IPoIB - continue with performance enhancements > > * Xsigo new virtual NIC > > * New vendor HW support - non was reported so far (IBM > and Chelsio > > - do you have something?) 
> > * OpenSM: > > o Incremental routing > > o Temporary SA DB - to answer queries and a heavy > sweep is done > > o APM - disjoint paths (?) > > o MKey manager (?) > > o Sasha to send more management features > > * MPI: > > o Open MPI 1.3 > > o APM support in MPI > > o mvapich ??? > > * uDAPl > > o Extensions for new APIs (like XRC) - ? > > o uDAPL provider for interop between Windows & Linux > > o 1.2 and 2.0 will stay > > > > As I wrote in an earlier discussion (~2 months ago), we plan > to add tgt (SCSI target) with iSCSI over iSER (and TCP of > course) support. The git tree for tgt already exists on the > ofa server. > > Erez > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From hrosenstock at xsigo.com Thu Apr 3 07:32:01 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 03 Apr 2008 07:32:01 -0700 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> Message-ID: <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> CQ, On Thu, 2008-04-03 at 14:27 +0000, Tang, Changqing wrote: > Can we address multiple-fabrics (physically separated) support ? Can you elaborate on what you mean by "physically separated" ? -- Hal > > > --CQ Tang > > > -----Original Message----- > > From: general-bounces at lists.openfabrics.org > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > > Erez Zilber > > Sent: Thursday, April 03, 2008 8:51 AM > > To: Tziporet Koren > > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting > > summary on OFED 1.4 plans > > > > > > > > *OFED 1.4:* > > > 1. 
Kernel base: since we target 1.4 release to Sep we target the > > > kernel base to be 2.6.27 > > > This is a good target, but we may need to stay with > > 2.6.26 if the > > > kernel progress will not be aligned. > > > > > > 2. Suggestions for new features: > > > > > > * NFS-RDMA > > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > > * SDP - Zero copy (There was a question on IPv6 support > > - seems no > > > one interested for now) > > > * IPoIB - continue with performance enhancements > > > * Xsigo new virtual NIC > > > * New vendor HW support - non was reported so far (IBM > > and Chelsio > > > - do you have something?) > > > * OpenSM: > > > o Incremental routing > > > o Temporary SA DB - to answer queries and a heavy > > sweep is done > > > o APM - disjoint paths (?) > > > o MKey manager (?) > > > o Sasha to send more management features > > > * MPI: > > > o Open MPI 1.3 > > > o APM support in MPI > > > o mvapich ??? > > > * uDAPl > > > o Extensions for new APIs (like XRC) - ? > > > o uDAPL provider for interop between Windows & Linux > > > o 1.2 and 2.0 will stay > > > > > > > As I wrote in an earlier discussion (~2 months ago), we plan > > to add tgt (SCSI target) with iSCSI over iSER (and TCP of > > course) support. The git tree for tgt already exists on the > > ofa server. 
> > > > Erez > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From changquing.tang at hp.com Thu Apr 3 07:40:25 2008 From: changquing.tang at hp.com (Tang, Changqing) Date: Thu, 3 Apr 2008 14:40:25 +0000 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> Message-ID: You have a system, all HCAs have two ports, all port 1 are connected to the first switch, all port 2 are connected to the second switch, there is NO link between the two switches. We call this system has two physically separated fabrics. If you have a bridge link between the two switches, then it becomes a single fabric. The same thing for multiple HCAs on nodes. The problem is, from MPI side, (and by default), we don't know which port is on which fabric, since the subnet prefix is the same. We rely on system admin to config two different subnet prefixes for HP-MPI to work. No vendor has claimed to support this. 
--CQ > -----Original Message----- > From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > Sent: Thursday, April 03, 2008 9:32 AM > To: Tang, Changqing > Cc: Erez Zilber; Tziporet Koren; ewg at lists.openfabrics.org; > general at lists.openfabrics.org > Subject: RE: [ofa-general] Re: [ewg] OFED March 24 meeting > summary on OFED 1.4 plans > > CQ, > > On Thu, 2008-04-03 at 14:27 +0000, Tang, Changqing wrote: > > Can we address multiple-fabrics (physically separated) support ? > > Can you elaborate on what you mean by "physically separated" ? > > -- Hal > > > > > > > --CQ Tang > > > > > -----Original Message----- > > > From: general-bounces at lists.openfabrics.org > > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Erez > > > Zilber > > > Sent: Thursday, April 03, 2008 8:51 AM > > > To: Tziporet Koren > > > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > > > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on > > > OFED 1.4 plans > > > > > > > > > > > *OFED 1.4:* > > > > 1. Kernel base: since we target 1.4 release to Sep we > target the > > > > kernel base to be 2.6.27 > > > > This is a good target, but we may need to stay with > > > 2.6.26 if the > > > > kernel progress will not be aligned. > > > > > > > > 2. Suggestions for new features: > > > > > > > > * NFS-RDMA > > > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > > > * SDP - Zero copy (There was a question on IPv6 support > > > - seems no > > > > one interested for now) > > > > * IPoIB - continue with performance enhancements > > > > * Xsigo new virtual NIC > > > > * New vendor HW support - non was reported so far (IBM > > > and Chelsio > > > > - do you have something?) > > > > * OpenSM: > > > > o Incremental routing > > > > o Temporary SA DB - to answer queries and a heavy > > > sweep is done > > > > o APM - disjoint paths (?) > > > > o MKey manager (?) 
> > > > o Sasha to send more management features > > > > * MPI: > > > > o Open MPI 1.3 > > > > o APM support in MPI > > > > o mvapich ??? > > > > * uDAPl > > > > o Extensions for new APIs (like XRC) - ? > > > > o uDAPL provider for interop between Windows & Linux > > > > o 1.2 and 2.0 will stay > > > > > > > > > > As I wrote in an earlier discussion (~2 months ago), we > plan to add > > > tgt (SCSI target) with iSCSI over iSER (and TCP of > > > course) support. The git tree for tgt already exists on the ofa > > > server. > > > > > > Erez > > > > > > _______________________________________________ > > > general mailing list > > > general at lists.openfabrics.org > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > From jsquyres at cisco.com Thu Apr 3 07:47:52 2008 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 3 Apr 2008 10:47:52 -0400 Subject: [ofa-general] physically separate subnets (was: OFED March 24 meeting summary on OFED 1.4 plans) In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> Message-ID: <32469DBF-3E6F-4072-826D-A52EC29F7A46@cisco.com> In Open MPI, we require physically different ("air gapped") subnets to have different subnet ID's so that we can compute reachability correctly. I don't know how to do it otherwise. 
On Apr 3, 2008, at 10:40 AM, Tang, Changqing wrote: > > You have a system, all HCAs have two ports, all port 1 are connected > to the first switch, > all port 2 are connected to the second switch, there is NO link > between the two switches. > We call this system has two physically separated fabrics. If you > have a bridge link > between the two switches, then it becomes a single fabric. > > The same thing for multiple HCAs on nodes. > > The problem is, from MPI side, (and by default), we don't know which > port is on which > fabric, since the subnet prefix is the same. We rely on system admin > to config two > different subnet prefixes for HP-MPI to work. > > No vendor has claimed to support this. > > --CQ > >> -----Original Message----- >> From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] >> Sent: Thursday, April 03, 2008 9:32 AM >> To: Tang, Changqing >> Cc: Erez Zilber; Tziporet Koren; ewg at lists.openfabrics.org; >> general at lists.openfabrics.org >> Subject: RE: [ofa-general] Re: [ewg] OFED March 24 meeting >> summary on OFED 1.4 plans >> >> CQ, >> >> On Thu, 2008-04-03 at 14:27 +0000, Tang, Changqing wrote: >>> Can we address multiple-fabrics (physically separated) support ? >> >> Can you elaborate on what you mean by "physically separated" ? >> >> -- Hal >> >>> >>> >>> --CQ Tang >>> >>>> -----Original Message----- >>>> From: general-bounces at lists.openfabrics.org >>>> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Erez >>>> Zilber >>>> Sent: Thursday, April 03, 2008 8:51 AM >>>> To: Tziporet Koren >>>> Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org >>>> Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on >>>> OFED 1.4 plans >>>> >>>>> >>>>> *OFED 1.4:* >>>>> 1. Kernel base: since we target 1.4 release to Sep we >> target the >>>>> kernel base to be 2.6.27 >>>>> This is a good target, but we may need to stay with >>>> 2.6.26 if the >>>>> kernel progress will not be aligned. >>>>> >>>>> 2. 
Suggestions for new features: >>>>> >>>>> * NFS-RDMA >>>>> * Verbs: Reliable Multicast (to be presented at Sonoma) >>>>> * SDP - Zero copy (There was a question on IPv6 support >>>> - seems no >>>>> one interested for now) >>>>> * IPoIB - continue with performance enhancements >>>>> * Xsigo new virtual NIC >>>>> * New vendor HW support - non was reported so far (IBM >>>> and Chelsio >>>>> - do you have something?) >>>>> * OpenSM: >>>>> o Incremental routing >>>>> o Temporary SA DB - to answer queries and a heavy >>>> sweep is done >>>>> o APM - disjoint paths (?) >>>>> o MKey manager (?) >>>>> o Sasha to send more management features >>>>> * MPI: >>>>> o Open MPI 1.3 >>>>> o APM support in MPI >>>>> o mvapich ??? >>>>> * uDAPl >>>>> o Extensions for new APIs (like XRC) - ? >>>>> o uDAPL provider for interop between Windows & Linux >>>>> o 1.2 and 2.0 will stay >>>>> >>>> >>>> As I wrote in an earlier discussion (~2 months ago), we >> plan to add >>>> tgt (SCSI target) with iSCSI over iSER (and TCP of >>>> course) support. The git tree for tgt already exists on the ofa >>>> server. 
>>>> >>>> Erez >>>> >>>> _______________________________________________ >>>> general mailing list >>>> general at lists.openfabrics.org >>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >>>> >>>> To unsubscribe, please visit >>>> http://openib.org/mailman/listinfo/openib-general >>>> >>> _______________________________________________ >>> general mailing list >>> general at lists.openfabrics.org >>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >>> >>> To unsubscribe, please visit >>> http://openib.org/mailman/listinfo/openib-general >> >> > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg -- Jeff Squyres Cisco Systems From hrosenstock at xsigo.com Thu Apr 3 07:49:02 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 03 Apr 2008 07:49:02 -0700 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> Message-ID: <1207234143.29024.416.camel@hrosenstock-ws.xsigo.com> On Thu, 2008-04-03 at 14:40 +0000, Tang, Changqing wrote: > You have a system, all HCAs have two ports, all port 1 are connected to the first switch, > all port 2 are connected to the second switch, there is NO link between the two switches. > We call this system has two physically separated fabrics. If you have a bridge link > between the two switches, then it becomes a single fabric. > > The same thing for multiple HCAs on nodes. > > The problem is, from MPI side, (and by default), we don't know which port is on which > fabric, since the subnet prefix is the same. We rely on system admin to config two > different subnet prefixes for HP-MPI to work. Yes, these two IB subnets need two different subnet prefixes. 
(I think it's more than just HP MPI which needs this). -- Hal > No vendor has claimed to support this. > > --CQ > > > -----Original Message----- > > From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > > Sent: Thursday, April 03, 2008 9:32 AM > > To: Tang, Changqing > > Cc: Erez Zilber; Tziporet Koren; ewg at lists.openfabrics.org; > > general at lists.openfabrics.org > > Subject: RE: [ofa-general] Re: [ewg] OFED March 24 meeting > > summary on OFED 1.4 plans > > > > CQ, > > > > On Thu, 2008-04-03 at 14:27 +0000, Tang, Changqing wrote: > > > Can we address multiple-fabrics (physically separated) support ? > > > > Can you elaborate on what you mean by "physically separated" ? > > > > -- Hal > > > > > > > > > > > --CQ Tang > > > > > > > -----Original Message----- > > > > From: general-bounces at lists.openfabrics.org > > > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Erez > > > > Zilber > > > > Sent: Thursday, April 03, 2008 8:51 AM > > > > To: Tziporet Koren > > > > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > > > > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on > > > > OFED 1.4 plans > > > > > > > > > > > > > > *OFED 1.4:* > > > > > 1. Kernel base: since we target 1.4 release to Sep we > > target the > > > > > kernel base to be 2.6.27 > > > > > This is a good target, but we may need to stay with > > > > 2.6.26 if the > > > > > kernel progress will not be aligned. > > > > > > > > > > 2. Suggestions for new features: > > > > > > > > > > * NFS-RDMA > > > > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > > > > * SDP - Zero copy (There was a question on IPv6 support > > > > - seems no > > > > > one interested for now) > > > > > * IPoIB - continue with performance enhancements > > > > > * Xsigo new virtual NIC > > > > > * New vendor HW support - non was reported so far (IBM > > > > and Chelsio > > > > > - do you have something?) 
> > > > > * OpenSM: > > > > > o Incremental routing > > > > > o Temporary SA DB - to answer queries and a heavy > > > > sweep is done > > > > > o APM - disjoint paths (?) > > > > > o MKey manager (?) > > > > > o Sasha to send more management features > > > > > * MPI: > > > > > o Open MPI 1.3 > > > > > o APM support in MPI > > > > > o mvapich ??? > > > > > * uDAPl > > > > > o Extensions for new APIs (like XRC) - ? > > > > > o uDAPL provider for interop between Windows & Linux > > > > > o 1.2 and 2.0 will stay > > > > > > > > > > > > > As I wrote in an earlier discussion (~2 months ago), we > > plan to add > > > > tgt (SCSI target) with iSCSI over iSER (and TCP of > > > > course) support. The git tree for tgt already exists on the ofa > > > > server. > > > > > > > > Erez > > > > > > > > _______________________________________________ > > > > general mailing list > > > > general at lists.openfabrics.org > > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > > > To unsubscribe, please visit > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > _______________________________________________ > > > general mailing list > > > general at lists.openfabrics.org > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hrosenstock at xsigo.com Thu Apr 3 07:52:55 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 03 Apr 2008 07:52:55 -0700 Subject: [ofa-general] Re: [ewg] physically separate subnets (was: OFED March 24 meeting summary on OFED 1.4 plans) In-Reply-To: <32469DBF-3E6F-4072-826D-A52EC29F7A46@cisco.com> References: 
<6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> <32469DBF-3E6F-4072-826D-A52EC29F7A46@cisco.com> Message-ID: <1207234376.29024.419.camel@hrosenstock-ws.xsigo.com> On Thu, 2008-04-03 at 10:47 -0400, Jeff Squyres wrote: > In Open MPI, we require physically different ("air gapped") subnets to > have different subnet ID's so that we can compute reachability > correctly. Don't understand what the "air gapped" reference means. > I don't know how to do it otherwise. Me neither. -- Hal > > > On Apr 3, 2008, at 10:40 AM, Tang, Changqing wrote: > > > > You have a system, all HCAs have two ports, all port 1 are connected > > to the first switch, > > all port 2 are connected to the second switch, there is NO link > > between the two switches. > > We call this system has two physically separated fabrics. If you > > have a bridge link > > between the two switches, then it becomes a single fabric. > > > > The same thing for multiple HCAs on nodes. > > > > The problem is, from MPI side, (and by default), we don't know which > > port is on which > > fabric, since the subnet prefix is the same. We rely on system admin > > to config two > > different subnet prefixes for HP-MPI to work. > > > > No vendor has claimed to support this. > > > > --CQ > > > >> -----Original Message----- > >> From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > >> Sent: Thursday, April 03, 2008 9:32 AM > >> To: Tang, Changqing > >> Cc: Erez Zilber; Tziporet Koren; ewg at lists.openfabrics.org; > >> general at lists.openfabrics.org > >> Subject: RE: [ofa-general] Re: [ewg] OFED March 24 meeting > >> summary on OFED 1.4 plans > >> > >> CQ, > >> > >> On Thu, 2008-04-03 at 14:27 +0000, Tang, Changqing wrote: > >>> Can we address multiple-fabrics (physically separated) support ? > >> > >> Can you elaborate on what you mean by "physically separated" ? 
> >> > >> -- Hal > >> > >>> > >>> > >>> --CQ Tang > >>> > >>>> -----Original Message----- > >>>> From: general-bounces at lists.openfabrics.org > >>>> [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Erez > >>>> Zilber > >>>> Sent: Thursday, April 03, 2008 8:51 AM > >>>> To: Tziporet Koren > >>>> Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > >>>> Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on > >>>> OFED 1.4 plans > >>>> > >>>>> > >>>>> *OFED 1.4:* > >>>>> 1. Kernel base: since we target 1.4 release to Sep we > >> target the > >>>>> kernel base to be 2.6.27 > >>>>> This is a good target, but we may need to stay with > >>>> 2.6.26 if the > >>>>> kernel progress will not be aligned. > >>>>> > >>>>> 2. Suggestions for new features: > >>>>> > >>>>> * NFS-RDMA > >>>>> * Verbs: Reliable Multicast (to be presented at Sonoma) > >>>>> * SDP - Zero copy (There was a question on IPv6 support > >>>> - seems no > >>>>> one interested for now) > >>>>> * IPoIB - continue with performance enhancements > >>>>> * Xsigo new virtual NIC > >>>>> * New vendor HW support - non was reported so far (IBM > >>>> and Chelsio > >>>>> - do you have something?) > >>>>> * OpenSM: > >>>>> o Incremental routing > >>>>> o Temporary SA DB - to answer queries and a heavy > >>>> sweep is done > >>>>> o APM - disjoint paths (?) > >>>>> o MKey manager (?) > >>>>> o Sasha to send more management features > >>>>> * MPI: > >>>>> o Open MPI 1.3 > >>>>> o APM support in MPI > >>>>> o mvapich ??? > >>>>> * uDAPl > >>>>> o Extensions for new APIs (like XRC) - ? > >>>>> o uDAPL provider for interop between Windows & Linux > >>>>> o 1.2 and 2.0 will stay > >>>>> > >>>> > >>>> As I wrote in an earlier discussion (~2 months ago), we > >> plan to add > >>>> tgt (SCSI target) with iSCSI over iSER (and TCP of > >>>> course) support. The git tree for tgt already exists on the ofa > >>>> server. 
> >>>> > >>>> Erez > >>>> > >>>> _______________________________________________ > >>>> general mailing list > >>>> general at lists.openfabrics.org > >>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > >>>> > >>>> To unsubscribe, please visit > >>>> http://openib.org/mailman/listinfo/openib-general > >>>> > >>> _______________________________________________ > >>> general mailing list > >>> general at lists.openfabrics.org > >>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > >>> > >>> To unsubscribe, please visit > >>> http://openib.org/mailman/listinfo/openib-general > >> > >> > > _______________________________________________ > > ewg mailing list > > ewg at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > > From changquing.tang at hp.com Thu Apr 3 07:53:20 2008 From: changquing.tang at hp.com (Tang, Changqing) Date: Thu, 3 Apr 2008 14:53:20 +0000 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <47F4E0C3.2030100@voltaire.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> Message-ID: One other thing I hope to talk is some fabric query functionalities for normal user, not only just for root. This is at IB verbs level, not rdma_cm level. for example, in MPI, process A know the HCA guid on another node. After running for some time, the switch is restarted for some reason, and the whole fabric is re-configured. Now process A wants to know if the port lid on another node has changed or not, it knows the HCA guid, is there any function to query this ? I know as root, we can use the mad/umad library to do this kind of query, I want to do such query in MPI, which is a normal user. 
--CQ Tang, HP-MPI > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > Erez Zilber > Sent: Thursday, April 03, 2008 8:51 AM > To: Tziporet Koren > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting > summary on OFED 1.4 plans > > > > > *OFED 1.4:* > > 1. Kernel base: since we target 1.4 release to Sep we target the > > kernel base to be 2.6.27 > > This is a good target, but we may need to stay with > 2.6.26 if the > > kernel progress will not be aligned. > > > > 2. Suggestions for new features: > > > > * NFS-RDMA > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > * SDP - Zero copy (There was a question on IPv6 support > - seems no > > one interested for now) > > * IPoIB - continue with performance enhancements > > * Xsigo new virtual NIC > > * New vendor HW support - non was reported so far (IBM > and Chelsio > > - do you have something?) > > * OpenSM: > > o Incremental routing > > o Temporary SA DB - to answer queries and a heavy > sweep is done > > o APM - disjoint paths (?) > > o MKey manager (?) > > o Sasha to send more management features > > * MPI: > > o Open MPI 1.3 > > o APM support in MPI > > o mvapich ??? > > * uDAPl > > o Extensions for new APIs (like XRC) - ? > > o uDAPL provider for interop between Windows & Linux > > o 1.2 and 2.0 will stay > > > > As I wrote in an earlier discussion (~2 months ago), we plan > to add tgt (SCSI target) with iSCSI over iSER (and TCP of > course) support. The git tree for tgt already exists on the > ofa server. 
> > Erez > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From jsquyres at cisco.com Thu Apr 3 07:54:39 2008 From: jsquyres at cisco.com (Jeff Squyres) Date: Thu, 3 Apr 2008 10:54:39 -0400 Subject: [ofa-general] Re: [ewg] physically separate subnets (was: OFED March 24 meeting summary on OFED 1.4 plans) In-Reply-To: <1207234376.29024.419.camel@hrosenstock-ws.xsigo.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> <32469DBF-3E6F-4072-826D-A52EC29F7A46@cisco.com> <1207234376.29024.419.camel@hrosenstock-ws.xsigo.com> Message-ID: <0FE92DA6-F7C1-4BE8-BFCA-A7A5089FB0B4@cisco.com> On Apr 3, 2008, at 10:52 AM, Hal Rosenstock wrote: > On Thu, 2008-04-03 at 10:47 -0400, Jeff Squyres wrote: >> In Open MPI, we require physically different ("air gapped") subnets >> to >> have different subnet ID's so that we can compute reachability >> correctly. > > Don't understand what the "air gapped" reference means. There's no physical connection between the two -- there's an "air gap" between the networks (maybe it's a military term :-) ). 
-- Jeff Squyres Cisco Systems From andrea at qumranet.com Thu Apr 3 08:00:48 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 3 Apr 2008 17:00:48 +0200 Subject: [ofa-general] Re: EMM: Fixup return value handling of emm_notify() In-Reply-To: <1207219246.8514.817.camel@twins> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402212515.GS19189@duo.random> <1207219246.8514.817.camel@twins> Message-ID: <20080403143341.GA9603@duo.random> On Thu, Apr 03, 2008 at 12:40:46PM +0200, Peter Zijlstra wrote: > It seems to me that common code can be shared using functions? No need > FWIW I prefer separate methods. kvm patch using mmu notifiers shares 99% of the code too between the two different methods implemented indeed. Code sharing is the same and if something pointer to functions will be faster if gcc isn't smart or can't create a compile time hash to jump into the right address without having to check every case: . From hrosenstock at xsigo.com Thu Apr 3 08:02:10 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 03 Apr 2008 08:02:10 -0700 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> Message-ID: <1207234931.29024.425.camel@hrosenstock-ws.xsigo.com> On Thu, 2008-04-03 at 14:53 +0000, Tang, Changqing wrote: > One other thing I hope to talk is some fabric query functionalities for normal user, > not only just for root. This is at IB verbs level, not rdma_cm level. > > for example, in MPI, process A know the HCA guid on another node. After running for > some time, the switch is restarted for some reason, and the whole fabric is re-configured. > > Now process A wants to know if the port lid on another node has changed or not, it knows > the HCA guid, is there any function to query this ? 
> I know as root, we can use the mad/umad library to do this kind of query, I want to do > such query in MPI, which is a normal user. In the IB arch, there are SA registrations and queries for the specific example you used. However, these are not directly exposed to Linux user space directly (for the normal user as opposed to MAD user (note there are some difficulties in making this available to the normal user)) (at least not yet AFAIK). While these are not (direct) fabric query (really SA query), they serve the same function in a different way. -- Hal > --CQ Tang, HP-MPI > > > > > -----Original Message----- > > From: general-bounces at lists.openfabrics.org > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of > > Erez Zilber > > Sent: Thursday, April 03, 2008 8:51 AM > > To: Tziporet Koren > > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting > > summary on OFED 1.4 plans > > > > > > > > *OFED 1.4:* > > > 1. Kernel base: since we target 1.4 release to Sep we target the > > > kernel base to be 2.6.27 > > > This is a good target, but we may need to stay with > > 2.6.26 if the > > > kernel progress will not be aligned. > > > > > > 2. Suggestions for new features: > > > > > > * NFS-RDMA > > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > > * SDP - Zero copy (There was a question on IPv6 support > > - seems no > > > one interested for now) > > > * IPoIB - continue with performance enhancements > > > * Xsigo new virtual NIC > > > * New vendor HW support - non was reported so far (IBM > > and Chelsio > > > - do you have something?) > > > * OpenSM: > > > o Incremental routing > > > o Temporary SA DB - to answer queries and a heavy > > sweep is done > > > o APM - disjoint paths (?) > > > o MKey manager (?) > > > o Sasha to send more management features > > > * MPI: > > > o Open MPI 1.3 > > > o APM support in MPI > > > o mvapich ??? 
> > > * uDAPl > > > o Extensions for new APIs (like XRC) - ? > > > o uDAPL provider for interop between Windows & Linux > > > o 1.2 and 2.0 will stay > > > > > > > As I wrote in an earlier discussion (~2 months ago), we plan > > to add tgt (SCSI target) with iSCSI over iSER (and TCP of > > course) support. The git tree for tgt already exists on the > > ofa server. > > > > Erez > > > > _______________________________________________ > > general mailing list > > general at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From changquing.tang at hp.com Thu Apr 3 08:11:10 2008 From: changquing.tang at hp.com (Tang, Changqing) Date: Thu, 3 Apr 2008 15:11:10 +0000 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <1207234931.29024.425.camel@hrosenstock-ws.xsigo.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207234931.29024.425.camel@hrosenstock-ws.xsigo.com> Message-ID: Thanks. When can we have the SA features, very soon, long time, or never ? --CQ > -----Original Message----- > From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > Sent: Thursday, April 03, 2008 10:02 AM > To: Tang, Changqing > Cc: Erez Zilber; Tziporet Koren; ewg at lists.openfabrics.org; > general at lists.openfabrics.org > Subject: RE: [ofa-general] Re: [ewg] OFED March 24 meeting > summary on OFED 1.4 plans > > On Thu, 2008-04-03 at 14:53 +0000, Tang, Changqing wrote: > > One other thing I hope to talk is some fabric query functionalities > > for normal user, not only just for root. This is at IB > verbs level, not rdma_cm level. > > > > for example, in MPI, process A know the HCA guid on another node. 
> > After running for some time, the switch is restarted for > some reason, and the whole fabric is re-configured. > > > > Now process A wants to know if the port lid on another node has > > changed or not, it knows the HCA guid, is there any > function to query this ? > > > I know as root, we can use the mad/umad library to do this kind of > > query, I want to do such query in MPI, which is a normal user. > > In the IB arch, there are SA registrations and queries for > the specific example you used. However, these are not > directly exposed to Linux user space directly (for the normal > user as opposed to MAD user (note there are some difficulties > in making this available to the normal user)) (at least not > yet AFAIK). While these are not (direct) fabric query (really > SA query), they serve the same function in a different way. > > -- Hal > > > --CQ Tang, HP-MPI > > > > > > > > > -----Original Message----- > > > From: general-bounces at lists.openfabrics.org > > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Erez > > > Zilber > > > Sent: Thursday, April 03, 2008 8:51 AM > > > To: Tziporet Koren > > > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > > > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on > > > OFED 1.4 plans > > > > > > > > > > > *OFED 1.4:* > > > > 1. Kernel base: since we target 1.4 release to Sep we > target the > > > > kernel base to be 2.6.27 > > > > This is a good target, but we may need to stay with > > > 2.6.26 if the > > > > kernel progress will not be aligned. > > > > > > > > 2. 
Suggestions for new features: > > > > > > > > * NFS-RDMA > > > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > > > * SDP - Zero copy (There was a question on IPv6 support > > > - seems no > > > > one interested for now) > > > > * IPoIB - continue with performance enhancements > > > > * Xsigo new virtual NIC > > > > * New vendor HW support - non was reported so far (IBM > > > and Chelsio > > > > - do you have something?) > > > > * OpenSM: > > > > o Incremental routing > > > > o Temporary SA DB - to answer queries and a heavy > > > sweep is done > > > > o APM - disjoint paths (?) > > > > o MKey manager (?) > > > > o Sasha to send more management features > > > > * MPI: > > > > o Open MPI 1.3 > > > > o APM support in MPI > > > > o mvapich ??? > > > > * uDAPl > > > > o Extensions for new APIs (like XRC) - ? > > > > o uDAPL provider for interop between Windows & Linux > > > > o 1.2 and 2.0 will stay > > > > > > > > > > As I wrote in an earlier discussion (~2 months ago), we > plan to add > > > tgt (SCSI target) with iSCSI over iSER (and TCP of > > > course) support. The git tree for tgt already exists on the ofa > > > server. > > > > > > Erez > > > > > > _______________________________________________ > > > general mailing list > > > general at lists.openfabrics.org > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > _______________________________________________ > > ewg mailing list > > ewg at lists.openfabrics.org > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > > From swise at opengridcomputing.com Thu Apr 3 08:17:58 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 03 Apr 2008 10:17:58 -0500 Subject: [ofa-general] RE: [rds-devel] Has anyone tried running RDS over 10GE / IWARP NICs ? 
In-Reply-To: References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> Message-ID: <47F4F526.3060709@opengridcomputing.com> I think RDS might be getting confused because the 10GbE rnic shows up as a dumb NIC hooked into the native TCP stack -and- an rdma device. Jon Mason will be working to enable RDS soon on the chelsio device. He'll feed back the changes needed, if any, to RDS. Stay tuned. However, Scott if you want to debug this further, we can support you. Steve. Scott Weitzenkamp (sweitzen) wrote: > Yes, it's an iWARP NIC, and the OFED 1.3 perftest ib_rdma_lat program is > working. > > Scott > > >> -----Original Message----- >> From: Richard Frank [mailto:richard.frank at oracle.com] >> Sent: Wednesday, April 02, 2008 11:04 AM >> To: Scott Weitzenkamp (sweitzen) >> Cc: rds-devel at oss.oracle.com; [ofa_general] >> Subject: Re: [rds-devel] Has anyone tried running RDS over >> 10GE / IWARP NICs ? >> >> RDS does not run over regular 10G NICs - that appear as simple NICS - >> this was disabled in 1.3. >> >> For now we are interested in RDS over IWARP NICS - configured as >> accessible via the verbs interfaces. 
>> >> Richard Frank wrote: >>> is the rds driver loaded (modprobe rds) >>> >>> Scott Weitzenkamp (sweitzen) wrote: >>> >>>> Does't appear to work with Chelsio and OFED 1.3: >>>> >>>> [root at svbu-qa2950-1 counters]# ethtool -i eth2 >>>> driver: cxgb3 >>>> version: 1.0-ofed >>>> firmware-version: T 5.0.0 TP 1.1.0 >>>> bus-info: 0000:0b:00.0 >>>> [root at svbu-qa2950-1 counters]# ifconfig eth2 >>>> eth2 Link encap:Ethernet HWaddr 00:07:43:05:43:9F >>>> inet addr:192.168.0.198 Bcast:192.168.0.255 >>>> Mask:255.255.255.0 >>>> inet6 addr: fe80::207:43ff:fe05:439f/64 Scope:Link >>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 >>>> RX packets:144770 errors:0 dropped:0 overruns:0 frame:0 >>>> TX packets:144781 errors:0 dropped:0 overruns:0 carrier:0 >>>> collisions:0 txqueuelen:1000 >>>> RX bytes:207891512 (198.2 MiB) TX bytes:9348152 >> (8.9 MiB) >>>> Interrupt:169 Memory:fceff000-fcefffff >>>> >>>> [root at svbu-qa2950-1 counters]# rds-sink -s 192.168.0.198:22222 -i 1 >>>> rds-sink: Unable to bind socket: Cannot assign requested address >>>> >>>> Scott Weitzenkamp >>>> SQA and Release Manager >>>> Data Center Access Engineering >>>> Cisco Systems >>>> >>>> >>>> >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: rds-devel-bounces at oss.oracle.com >>>>> [mailto:rds-devel-bounces at oss.oracle.com] On Behalf Of >> Richard Frank >>>>> Sent: Wednesday, April 02, 2008 10:31 AM >>>>> To: rds-devel at oss.oracle.com; [ofa_general] >>>>> Subject: [rds-devel] Has anyone tried running RDS over 10GE / >>>>> IWARP NICs ? >>>>> >>>>> We'd appreciate some feed back on your experience and would >>>>> like to sort >>>>> out any issues ASAP. 
>>>>> >>>>> Rick >>>>> >>>>> _______________________________________________ >>>>> rds-devel mailing list >>>>> rds-devel at oss.oracle.com >>>>> http://oss.oracle.com/mailman/listinfo/rds-devel >>>>> >>>>> >>>>> >>> _______________________________________________ >>> rds-devel mailing list >>> rds-devel at oss.oracle.com >>> http://oss.oracle.com/mailman/listinfo/rds-devel >>> > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hrosenstock at xsigo.com Thu Apr 3 08:20:33 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 03 Apr 2008 08:20:33 -0700 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207234931.29024.425.camel@hrosenstock-ws.xsigo.com> Message-ID: <1207236033.29024.430.camel@hrosenstock-ws.xsigo.com> On Thu, 2008-04-03 at 15:11 +0000, Tang, Changqing wrote: > Thanks. When can we have the SA features, very soon, long time, or never ? I'm unaware of any current plans to implement these but my knowledge is far from complete... -- Hal > --CQ > > > -----Original Message----- > > From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > > Sent: Thursday, April 03, 2008 10:02 AM > > To: Tang, Changqing > > Cc: Erez Zilber; Tziporet Koren; ewg at lists.openfabrics.org; > > general at lists.openfabrics.org > > Subject: RE: [ofa-general] Re: [ewg] OFED March 24 meeting > > summary on OFED 1.4 plans > > > > On Thu, 2008-04-03 at 14:53 +0000, Tang, Changqing wrote: > > > One other thing I hope to talk is some fabric query functionalities > > > for normal user, not only just for root. This is at IB > > verbs level, not rdma_cm level. 
> > > > > > for example, in MPI, process A know the HCA guid on another node. > > > After running for some time, the switch is restarted for > > some reason, and the whole fabric is re-configured. > > > > > > Now process A wants to know if the port lid on another node has > > > changed or not, it knows the HCA guid, is there any > > function to query this ? > > > > > I know as root, we can use the mad/umad library to do this kind of > > > query, I want to do such query in MPI, which is a normal user. > > > > In the IB arch, there are SA registrations and queries for > > the specific example you used. However, these are not > > directly exposed to Linux user space directly (for the normal > > user as opposed to MAD user (note there are some difficulties > > in making this available to the normal user)) (at least not > > yet AFAIK). While these are not (direct) fabric query (really > > SA query), they serve the same function in a different way. > > > > -- Hal > > > > > --CQ Tang, HP-MPI > > > > > > > > > > > > > -----Original Message----- > > > > From: general-bounces at lists.openfabrics.org > > > > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Erez > > > > Zilber > > > > Sent: Thursday, April 03, 2008 8:51 AM > > > > To: Tziporet Koren > > > > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org > > > > Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on > > > > OFED 1.4 plans > > > > > > > > > > > > > > *OFED 1.4:* > > > > > 1. Kernel base: since we target 1.4 release to Sep we > > target the > > > > > kernel base to be 2.6.27 > > > > > This is a good target, but we may need to stay with > > > > 2.6.26 if the > > > > > kernel progress will not be aligned. > > > > > > > > > > 2. 
Suggestions for new features: > > > > > > > > > > * NFS-RDMA > > > > > * Verbs: Reliable Multicast (to be presented at Sonoma) > > > > > * SDP - Zero copy (There was a question on IPv6 support > > > > - seems no > > > > > one interested for now) > > > > > * IPoIB - continue with performance enhancements > > > > > * Xsigo new virtual NIC > > > > > * New vendor HW support - non was reported so far (IBM > > > > and Chelsio > > > > > - do you have something?) > > > > > * OpenSM: > > > > > o Incremental routing > > > > > o Temporary SA DB - to answer queries and a heavy > > > > sweep is done > > > > > o APM - disjoint paths (?) > > > > > o MKey manager (?) > > > > > o Sasha to send more management features > > > > > * MPI: > > > > > o Open MPI 1.3 > > > > > o APM support in MPI > > > > > o mvapich ??? > > > > > * uDAPl > > > > > o Extensions for new APIs (like XRC) - ? > > > > > o uDAPL provider for interop between Windows & Linux > > > > > o 1.2 and 2.0 will stay > > > > > > > > > > > > > As I wrote in an earlier discussion (~2 months ago), we > > plan to add > > > > tgt (SCSI target) with iSCSI over iSER (and TCP of > > > > course) support. The git tree for tgt already exists on the ofa > > > > server. 
> > > > > > > > Erez > > > > > > > > _______________________________________________ > > > > general mailing list > > > > general at lists.openfabrics.org > > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > > > > > > > To unsubscribe, please visit > > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > _______________________________________________ > > > ewg mailing list > > > ewg at lists.openfabrics.org > > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > > > > From sashak at voltaire.com Thu Apr 3 11:25:09 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 3 Apr 2008 18:25:09 +0000 Subject: [ofa-general] Re: [Infiniband-Diags] [PATCH] saquery exit with non-zero code on bad input In-Reply-To: <1207074579.15637.153.camel@cardanus.llnl.gov> References: <1207074579.15637.153.camel@cardanus.llnl.gov> Message-ID: <20080403182509.GE5982@sashak.voltaire.com> Hi Al, On 11:29 Tue 01 Apr , Al Chu wrote: > > If an input into saquery isn't found, saquery still exits with '0' > status, so it poses a problem in scripting. > > This patch exits w/ non-zero if the input isn't found by saquery. I guess by input you mean "SA records". Right? > The actual status code I selected to return can be revised. I just sort > of picked one. This patch cares only about print_node_records()? What about other queries? > Signed-off-by: Albert L. 
Chu > --- > infiniband-diags/src/saquery.c | 13 +++++++++++++ > 1 files changed, 13 insertions(+), 0 deletions(-) > > diff --git a/infiniband-diags/src/saquery.c b/infiniband-diags/src/saquery.c > index ed61721..f801385 100644 > --- a/infiniband-diags/src/saquery.c > +++ b/infiniband-diags/src/saquery.c > @@ -839,6 +839,7 @@ print_node_records(osm_bind_handle_t bind_handle) > ib_node_record_t *node_record = NULL; > ib_net16_t attr_offset = ib_get_attr_offset(sizeof(*node_record)); > ib_api_status_t status; > + unsigned int output_count = 0; > > status = get_all_records(bind_handle, IB_MAD_ATTR_NODE_RECORD, attr_offset, 0); > if (status != IB_SUCCESS) > @@ -855,12 +856,14 @@ print_node_records(osm_bind_handle_t bind_handle) > } else if (node_print_desc == NAME_OF_LID) { > if (requested_lid == cl_ntoh16(node_record->lid)) { > print_node_record(node_record); > + output_count++; > } > } else if (node_print_desc == NAME_OF_GUID) { > ib_node_info_t *p_ni = &(node_record->node_info); > > if (requested_guid == cl_ntoh64(p_ni->port_guid)) { > print_node_record(node_record); > + output_count++; > } > } else { > if (!requested_name || > @@ -868,6 +871,7 @@ print_node_records(osm_bind_handle_t bind_handle) > (char *)node_record->node_desc.description, > sizeof(node_record->node_desc.description)) == 0)) { > print_node_record(node_record); > + output_count++; > if (node_print_desc == UNIQUE_LID_ONLY) { > return_mad(); > exit(0); > @@ -876,6 +880,15 @@ print_node_records(osm_bind_handle_t bind_handle) > } > } > return_mad(); > + if ((requested_lid_flag > + || requested_guid_flag > + || requested_name) > + && !output_count) { > + /* need non-zero error code to indicate input not matched. > + * this seems as good as any other status error code. > + */ > + status = IB_NOT_FOUND; > + } > return (status); What about just to 'return result.status' here? 
Sasha From andrea at qumranet.com Thu Apr 3 08:29:36 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 3 Apr 2008 17:29:36 +0200 Subject: [ofa-general] Re: EMM: disable other notifiers before register and unregister In-Reply-To: References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> Message-ID: <20080403151908.GB9603@duo.random> On Wed, Apr 02, 2008 at 06:24:15PM -0700, Christoph Lameter wrote: > Ok lets forget about the single theaded thing to solve the registration > races. As Andrea pointed out this still has ssues with other subscribed > subsystems (and also try_to_unmap). We could do something like what > stop_machine_run does: First disable all running subsystems before > registering a new one. > > Maybe this is a possible solution. It still doesn't solve this kernel crash. CPU0 CPU1 range_start (mmu notifier chain is empty) range_start returns mmu_notifier_register kvm_emm_stop (how kvm can ever know the other cpu is in the middle of the critical section?) kvm page fault (kvm thinks mmu_notifier_register serialized) zap ptes free_page mapped by spte/GRU and not pinned -> crash There's no way the lowlevel can stop mmu_notifier_register and if mmu_notifier_register returns, then sptes will be instantiated and it'll corrupt memory the same way. The seqlock was fine, what is wrong is the assumption that we can let the lowlevel driver handle a range_end happening without range_begin before it. The problem is that by design the lowlevel can't handle a range_end happening without a range_begin before it. This is the core kernel crashing problem we have (it's a kernel crashing issue only for drivers that don't pin the pages, so XPMEM wouldn't crash but still it would leak memory, which is a more graceful failure than random mm corruption). 
The basic trouble is that sometimes range_begin/end critical sections run outside the mmap_sem (see try_to_unmap_cluster in #v10 or even try_to_unmap_one only in EMM-V2). My attempt to fix this once and for all is to walk all vmas of the "mm" inside mmu_notifier_register and take all anon_vma locks and i_mmap_locks in virtual address order in a row. It's ok to take those inside the mmap_sem. Presumably anybody who ever takes two of these locks already takes them in order too. Then I can dump all the other locking and remove the seqlock, and the driver is guaranteed there will be a single call of range_begin followed by a single call of range_end the whole time and no race could ever happen, and there won't be repeated calls of range_begin that would screw up recursive semaphore locking. The patch won't be pretty; I guess I'll vmalloc an array of pointers to locks to reorder them. It doesn't need to be fast. Also the locks can't go away from under us while we hold the down_write(mmap_sem) because the vmas can be altered only with down_write(mmap_sem) (modulo vm_start/vm_end that can be modified with only down_read(mmap_sem) + page_table_lock like in growsdown page faults). So it should be ok to take all those locks inside the mmap_sem and implement a lock_vm(mm) unlock_vm(mm). I'll think more about this hammer approach while I try to implement it... From swise at opengridcomputing.com Thu Apr 3 09:18:32 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 03 Apr 2008 11:18:32 -0500 Subject: [ofa-general] Re: [ewg] how do I use uDAPL with iWARP? In-Reply-To: References: Message-ID: <47F50358.1010000@opengridcomputing.com> Scott Weitzenkamp (sweitzen) wrote: > I have OFED 1.3 and a Chelsio S310E-SR+ iWARP 10GE NIC. I have > ib_rdma_lat working, so I know IB verbs are working. > > How do I use uDAPL, though? All the default /etc/dat.conf entries have > IPoIB or bonding interfaces in them. 
> > Add an entry like this: cxgb u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ethx 0" "" Where ethx is the ethernet interface for the chelsio device. Also, last time I ran it you needed this in your env: export DAPL_MAX_INLINE=64 Steve. From swise at opengridcomputing.com Thu Apr 3 09:19:07 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 03 Apr 2008 11:19:07 -0500 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> Message-ID: <47F5037B.3020501@opengridcomputing.com> Scott Weitzenkamp (sweitzen) wrote: > I tried that, and it didn't work: > > [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf > OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "eth2 0" > "" > [root at svbu-qa2950-1 ~]# dtest > 10194 Running as server - OpenIB-cma > 10194 Error dat_ep_create: DAT_INVALID_HANDLE > 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP > try setting DAPL_MAX_INLINE=64 From sweitzen at cisco.com Thu Apr 3 09:27:58 2008 From: sweitzen at cisco.com (Scott Weitzenkamp (sweitzen)) Date: Thu, 3 Apr 2008 09:27:58 -0700 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: <47F5037B.3020501@opengridcomputing.com> References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> <47F5037B.3020501@opengridcomputing.com> Message-ID: Steve, Thanks, that gets further, but dtest still fails. Client side: [releng at svbu-qa2950-2 ~]$ DAPL_MAX_INLINE=64 dtest -h 192.168.0.198 13926 Running as client - OpenIB-cma 13926 Server Name: 192.168.0.198 13926 Server Net Address: 192.168.0.198 13926 Waiting for connect response 13926 Error unexpected conn event : DAT_CONNECTION_EVENT_UNREACHABLE 13926 Error connect_ep: DAT_ABORT 13926: DAPL Test Complete. 
13926: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 usec, pc= 0 13926: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 usec, pc= 0 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 13926: open: 36619.19 usec 13926: close: 32500.98 usec 13926: PZ create: 7.87 usec 13926: PZ free: 4.05 usec 13926: LMR create: 58.89 usec 13926: LMR free: 11.92 usec 13926: EVD create: 9.78 usec 13926: EVD free: 14.07 usec 13926: EP create: 78.92 usec 13926: EP free: 26.23 usec 13926: TOTAL: 199.79 usec Server side: [releng at svbu-qa2950-1 ~]$ DAPL_MAX_INLINE=64 dtest 11461 Running as server - OpenIB-cma 11461 Server waiting for connect request.. 11461 Waiting for connect response 11461 CONNECTED! 11461 Send RMR to remote: snd_msg: r_key_ctx=bff,pad=0,va=146db580,len=0x40 11461 Waiting for remote to send RMR data 11461 Error waiting on h_dto_rcv_evd: DAT_TIMEOUT_EXPIRED 11461 Error connect_ep: DAT_TIMEOUT_EXPIRED 11461: DAPL Test Complete. 
11461: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 usec, pc= 0 11461: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 usec, pc= 0 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 usec, pc =0 11461: open: 900676.01 usec 11461: close: 31543.97 usec 11461: PZ create: 7.87 usec 11461: PZ free: 5.01 usec 11461: LMR create: 51.98 usec 11461: LMR free: 12.16 usec 11461: EVD create: 10.97 usec 11461: EVD free: 12.87 usec 11461: EP create: 77.01 usec 11461: EP free: 30.04 usec 11461: TOTAL: 195.03 usec Scott > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Thursday, April 03, 2008 9:19 AM > To: Scott Weitzenkamp (sweitzen) > Cc: Joshua Bernstein; OpenFabrics EWG; [ofa_general] > Subject: Re: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? > > > > Scott Weitzenkamp (sweitzen) wrote: > > I tried that, and it didn't work: > > > > [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf > > OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 > dapl.1.2 "eth2 0" > > "" > > [root at svbu-qa2950-1 ~]# dtest > > 10194 Running as server - OpenIB-cma > > 10194 Error dat_ep_create: DAT_INVALID_HANDLE > > 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP > > > > try setting DAPL_MAX_INLINE=64 > > From swise at opengridcomputing.com Thu Apr 3 09:35:27 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 03 Apr 2008 11:35:27 -0500 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> <47F5037B.3020501@opengridcomputing.com> Message-ID: <47F5074F.9000202@opengridcomputing.com> What does your network interface config look like? Does rping work? 
Scott Weitzenkamp (sweitzen) wrote: > Steve, > > Thanks, that gets further, but dtest still fails. > > Client side: > > [releng at svbu-qa2950-2 ~]$ DAPL_MAX_INLINE=64 dtest -h 192.168.0.198 > 13926 Running as client - OpenIB-cma > 13926 Server Name: 192.168.0.198 > 13926 Server Net Address: 192.168.0.198 > 13926 Waiting for connect response > 13926 Error unexpected conn event : DAT_CONNECTION_EVENT_UNREACHABLE > 13926 Error connect_ep: DAT_ABORT > > 13926: DAPL Test Complete. > > 13926: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 13926: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: open: 36619.19 usec > 13926: close: 32500.98 usec > 13926: PZ create: 7.87 usec > 13926: PZ free: 4.05 usec > 13926: LMR create: 58.89 usec > 13926: LMR free: 11.92 usec > 13926: EVD create: 9.78 usec > 13926: EVD free: 14.07 usec > 13926: EP create: 78.92 usec > 13926: EP free: 26.23 usec > 13926: TOTAL: 199.79 usec > > Server side: > > [releng at svbu-qa2950-1 ~]$ DAPL_MAX_INLINE=64 dtest > 11461 Running as server - OpenIB-cma > 11461 Server waiting for connect request.. > 11461 Waiting for connect response > > 11461 CONNECTED! > > 11461 Send RMR to remote: snd_msg: > r_key_ctx=bff,pad=0,va=146db580,len=0x40 > 11461 Waiting for remote to send RMR data > 11461 Error waiting on h_dto_rcv_evd: DAT_TIMEOUT_EXPIRED > 11461 Error connect_ep: DAT_TIMEOUT_EXPIRED > > 11461: DAPL Test Complete. 
> > 11461: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 11461: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: open: 900676.01 usec > 11461: close: 31543.97 usec > 11461: PZ create: 7.87 usec > 11461: PZ free: 5.01 usec > 11461: LMR create: 51.98 usec > 11461: LMR free: 12.16 usec > 11461: EVD create: 10.97 usec > 11461: EVD free: 12.87 usec > 11461: EP create: 77.01 usec > 11461: EP free: 30.04 usec > 11461: TOTAL: 195.03 usec > > Scott > > > >> -----Original Message----- >> From: Steve Wise [mailto:swise at opengridcomputing.com] >> Sent: Thursday, April 03, 2008 9:19 AM >> To: Scott Weitzenkamp (sweitzen) >> Cc: Joshua Bernstein; OpenFabrics EWG; [ofa_general] >> Subject: Re: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? >> >> >> >> Scott Weitzenkamp (sweitzen) wrote: >>> I tried that, and it didn't work: >>> >>> [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf >>> OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 >> dapl.1.2 "eth2 0" >>> "" >>> [root at svbu-qa2950-1 ~]# dtest >>> 10194 Running as server - OpenIB-cma >>> 10194 Error dat_ep_create: DAT_INVALID_HANDLE >>> 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP >>> >> try setting DAPL_MAX_INLINE=64 >> >> From swise at opengridcomputing.com Thu Apr 3 09:57:12 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 03 Apr 2008 11:57:12 -0500 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> <47F5037B.3020501@opengridcomputing.com> Message-ID: <47F50C68.4000601@opengridcomputing.com> I can reproduce this. 
Lemme dig into it... Steve. Scott Weitzenkamp (sweitzen) wrote: > Steve, > > Thanks, that gets further, but dtest still fails. > > Client side: > > [releng at svbu-qa2950-2 ~]$ DAPL_MAX_INLINE=64 dtest -h 192.168.0.198 > 13926 Running as client - OpenIB-cma > 13926 Server Name: 192.168.0.198 > 13926 Server Net Address: 192.168.0.198 > 13926 Waiting for connect response > 13926 Error unexpected conn event : DAT_CONNECTION_EVENT_UNREACHABLE > 13926 Error connect_ep: DAT_ABORT > > 13926: DAPL Test Complete. > > 13926: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 13926: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 13926: open: 36619.19 usec > 13926: close: 32500.98 usec > 13926: PZ create: 7.87 usec > 13926: PZ free: 4.05 usec > 13926: LMR create: 58.89 usec > 13926: LMR free: 11.92 usec > 13926: EVD create: 9.78 usec > 13926: EVD free: 14.07 usec > 13926: EP create: 78.92 usec > 13926: EP free: 26.23 usec > 13926: TOTAL: 199.79 usec > > Server side: > > [releng at svbu-qa2950-1 ~]$ DAPL_MAX_INLINE=64 dtest > 11461 Running as server - OpenIB-cma > 11461 Server waiting for connect request.. > 11461 Waiting for connect response > > 11461 CONNECTED! > > 11461 Send RMR to remote: snd_msg: > r_key_ctx=bff,pad=0,va=146db580,len=0x40 > 11461 Waiting for remote to send RMR data > 11461 Error waiting on h_dto_rcv_evd: DAT_TIMEOUT_EXPIRED > 11461 Error connect_ep: DAT_TIMEOUT_EXPIRED > > 11461: DAPL Test Complete. 
> > 11461: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 11461: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 > usec, pc= > 0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 > usec, pc > =0 > 11461: open: 900676.01 usec > 11461: close: 31543.97 usec > 11461: PZ create: 7.87 usec > 11461: PZ free: 5.01 usec > 11461: LMR create: 51.98 usec > 11461: LMR free: 12.16 usec > 11461: EVD create: 10.97 usec > 11461: EVD free: 12.87 usec > 11461: EP create: 77.01 usec > 11461: EP free: 30.04 usec > 11461: TOTAL: 195.03 usec > > Scott > > > >> -----Original Message----- >> From: Steve Wise [mailto:swise at opengridcomputing.com] >> Sent: Thursday, April 03, 2008 9:19 AM >> To: Scott Weitzenkamp (sweitzen) >> Cc: Joshua Bernstein; OpenFabrics EWG; [ofa_general] >> Subject: Re: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? >> >> >> >> Scott Weitzenkamp (sweitzen) wrote: >>> I tried that, and it didn't work: >>> >>> [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf >>> OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 >> dapl.1.2 "eth2 0" >>> "" >>> [root at svbu-qa2950-1 ~]# dtest >>> 10194 Running as server - OpenIB-cma >>> 10194 Error dat_ep_create: DAT_INVALID_HANDLE >>> 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP >>> >> try setting DAPL_MAX_INLINE=64 >> >> From arlin.r.davis at intel.com Thu Apr 3 10:00:09 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Thu, 3 Apr 2008 10:00:09 -0700 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? 
In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com><47F5037B.3020501@opengridcomputing.com> Message-ID: >Client side: > >[releng at svbu-qa2950-2 ~]$ DAPL_MAX_INLINE=64 dtest -h 192.168.0.198 >13926 Running as client - OpenIB-cma >13926 Server Name: 192.168.0.198 >13926 Server Net Address: 192.168.0.198 >13926 Waiting for connect response >13926 Error unexpected conn event : DAT_CONNECTION_EVENT_UNREACHABLE >13926 Error connect_ep: DAT_ABORT > >Server side: > >[releng at svbu-qa2950-1 ~]$ DAPL_MAX_INLINE=64 dtest >11461 Running as server - OpenIB-cma >11461 Server waiting for connect request.. >11461 Waiting for connect response > >11461 CONNECTED! > >11461 Send RMR to remote: snd_msg: >r_key_ctx=bff,pad=0,va=146db580,len=0x40 >11461 Waiting for remote to send RMR data >11461 Error waiting on h_dto_rcv_evd: DAT_TIMEOUT_EXPIRED >11461 Error connect_ep: DAT_TIMEOUT_EXPIRED > Interesting that the server gets connected but client doesn't. If you build the dapl package with --enable-debug and set DAPL_DBG_TYPE=0xffff we can see what is going on with rdma_cm events on the client side. See http://www.openfabrics.org//downloads/dapl/documentation/uDAPL_ofed_test ing_bkm.pdf for debugging details. uDAPL uses rdma_cm to connect similar to rping and ib_rdma_lat -c so it would be helpful to see if you have any luck with either rping or ib_rdma_lat -c? BTW: the default OFED 1.3 setting for DAPL_MAX_ININE is 64 so you shouldn't have to adjust down from OFED 1.2.5 default of 128 anymore for the chelsio device. 
-arlin From chu11 at llnl.gov Thu Apr 3 10:01:16 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 03 Apr 2008 10:01:16 -0700 Subject: [ofa-general] Re: [Infiniband-Diags] [PATCH] saquery exit with non-zero code on bad input In-Reply-To: <20080403182509.GE5982@sashak.voltaire.com> References: <1207074579.15637.153.camel@cardanus.llnl.gov> <20080403182509.GE5982@sashak.voltaire.com> Message-ID: <1207242076.15637.282.camel@cardanus.llnl.gov> Hey Sasha, On Thu, 2008-04-03 at 18:25 +0000, Sasha Khapyorsky wrote: > Hi Al, > > On 11:29 Tue 01 Apr , Al Chu wrote: > > > > If an input into saquery isn't found, saquery still exits with '0' > > status, so it poses a problem in scripting. > > > > This patch exits w/ non-zero if the input isn't found by saquery. > > I guess by input you mean "SA records". Right? When the user inputs a nodename, lid, or guid, normally for a noderecord info query (-N). > > The actual status code I selected to return can be revised. I just sort > > of picked one. > > This patch cares only about print_node_records()? What about other > queries? As far as I can tell, most of the other queries do result in a non-zero exit code already when an input isn't found. wopri at root:./saquery --src-to-dst fake:fake; echo $? Failed to find lid for "fake" Failed to find lid for "fake" Path record for fake -> fake 50 A little more playing around suggests there are some queries that also have issues. wopri at root:./saquery -x fakename; echo $? Failed to find lid for "fakename" LinkRecord dump: FromLID....................17 FromPort...................1 ToPort.....................1 ToLID......................11 0 I suppose we should handle this one in a different patch. > > Signed-off-by: Albert L. 
Chu > > --- > > infiniband-diags/src/saquery.c | 13 +++++++++++++ > > 1 files changed, 13 insertions(+), 0 deletions(-) > > > > diff --git a/infiniband-diags/src/saquery.c b/infiniband-diags/src/saquery.c > > index ed61721..f801385 100644 > > --- a/infiniband-diags/src/saquery.c > > +++ b/infiniband-diags/src/saquery.c > > @@ -839,6 +839,7 @@ print_node_records(osm_bind_handle_t bind_handle) > > ib_node_record_t *node_record = NULL; > > ib_net16_t attr_offset = ib_get_attr_offset(sizeof(*node_record)); > > ib_api_status_t status; > > + unsigned int output_count = 0; > > > > status = get_all_records(bind_handle, IB_MAD_ATTR_NODE_RECORD, attr_offset, 0); > > if (status != IB_SUCCESS) > > @@ -855,12 +856,14 @@ print_node_records(osm_bind_handle_t bind_handle) > > } else if (node_print_desc == NAME_OF_LID) { > > if (requested_lid == cl_ntoh16(node_record->lid)) { > > print_node_record(node_record); > > + output_count++; > > } > > } else if (node_print_desc == NAME_OF_GUID) { > > ib_node_info_t *p_ni = &(node_record->node_info); > > > > if (requested_guid == cl_ntoh64(p_ni->port_guid)) { > > print_node_record(node_record); > > + output_count++; > > } > > } else { > > if (!requested_name || > > @@ -868,6 +871,7 @@ print_node_records(osm_bind_handle_t bind_handle) > > (char *)node_record->node_desc.description, > > sizeof(node_record->node_desc.description)) == 0)) { > > print_node_record(node_record); > > + output_count++; > > if (node_print_desc == UNIQUE_LID_ONLY) { > > return_mad(); > > exit(0); > > @@ -876,6 +880,15 @@ print_node_records(osm_bind_handle_t bind_handle) > > } > > } > > return_mad(); > > + if ((requested_lid_flag > > + || requested_guid_flag > > + || requested_name) > > + && !output_count) { > > + /* need non-zero error code to indicate input not matched. > > + * this seems as good as any other status error code. > > + */ > > + status = IB_NOT_FOUND; > > + } > > return (status); > > What about just to 'return result.status' here? 
If the user input a string/lid/guid that doesn't exist in the fabric, print_node_records() can still return 0 b/c the current status is based solely on the success of the call to get_all_records(), not on whether the user's input was found or not. Al > Sasha -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From swise at opengridcomputing.com Thu Apr 3 10:25:05 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 03 Apr 2008 12:25:05 -0500 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com><47F5037B.3020501@opengridcomputing.com> Message-ID: <47F512F1.1070508@opengridcomputing.com> Davis, Arlin R wrote: > >> Client side: >> >> [releng at svbu-qa2950-2 ~]$ DAPL_MAX_INLINE=64 dtest -h 192.168.0.198 >> 13926 Running as client - OpenIB-cma >> 13926 Server Name: 192.168.0.198 >> 13926 Server Net Address: 192.168.0.198 >> 13926 Waiting for connect response >> 13926 Error unexpected conn event : DAT_CONNECTION_EVENT_UNREACHABLE >> 13926 Error connect_ep: DAT_ABORT >> >> Server side: >> >> [releng at svbu-qa2950-1 ~]$ DAPL_MAX_INLINE=64 dtest >> 11461 Running as server - OpenIB-cma >> 11461 Server waiting for connect request.. >> 11461 Waiting for connect response >> >> 11461 CONNECTED! >> >> 11461 Send RMR to remote: snd_msg: >> r_key_ctx=bff,pad=0,va=146db580,len=0x40 >> 11461 Waiting for remote to send RMR data >> 11461 Error waiting on h_dto_rcv_evd: DAT_TIMEOUT_EXPIRED >> 11461 Error connect_ep: DAT_TIMEOUT_EXPIRED >> > > Interesting that the server gets connected but client doesn't. If you > build the dapl package with --enable-debug and set DAPL_DBG_TYPE=0xffff > we can see what is going on with rdma_cm events on the client side. > > See > http://www.openfabrics.org//downloads/dapl/documentation/uDAPL_ofed_test > ing_bkm.pdf for debugging details. 
> > uDAPL uses rdma_cm to connect similar to rping and ib_rdma_lat -c so it > would be helpful to see if you have any luck with either rping or > ib_rdma_lat -c? > > BTW: the default OFED 1.3 setting for DAPL_MAX_ININE is 64 so you > shouldn't have to adjust down from OFED 1.2.5 default of 128 anymore for > the chelsio device. > > -arlin > Hey Arlin, Seems like we still need DAPL_MAX_INLINE=64 for chelsio for some reason... From swise at opengridcomputing.com Thu Apr 3 10:48:11 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 03 Apr 2008 12:48:11 -0500 Subject: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? In-Reply-To: <47F50C68.4000601@opengridcomputing.com> References: <43A0DD58-EF1B-4068-849F-AF54E6FF3652@penguincomputing.com> <47F5037B.3020501@opengridcomputing.com> <47F50C68.4000601@opengridcomputing.com> Message-ID: <47F5185B.6070309@opengridcomputing.com> Guys, I think this is the same iWARP issue that has been biting me for a while: The client must send the first RDMA message. The dtest app is a peer-2-peer (p2p) application where both sides send immediately after setting up the connection. So dtest doesn't adhere to the iWARP specification (I know: the iWARP spec is broken :). News: I have some prototype FW from chelsio that supports p2p setup and with that FW and my associated iw_cxgb3 driver/library changes, then dtest seems to work fine. These changes will be published upstream soon in order to support Open MPI and other p2p applications for chelsio. For this initial release of p2p support over chelsio, the functionality will be 100% handled in the iw_cxgb3 driver and fw. This is similar to what iw_nes does today with its send_first module option to send a 0B write from the client and defer connection establishment on the server until the 0B write is received. 
Chelsio will have a similar module option called peer2peer (or I could make it the same option name: send_first) that will use a 0B read to force the client to send first (chelsio cannot use a 0B write for this). The chelsio FW will defer the ESTABLISHED event until the 0B read is received and responded to. The final proper device-independent solution to this will be done in the rdma-cma, the iwarp core and iwarp devices for upstream inclusion as well as for ofed-1.4. Its a much bigger change and will affect the ABI for the rdma_cm probably (app can request p2p behavior). There was a thread a while back driven by Arkady at NetApp with details on how we will implement this (using a small protocol in mpa start req/rep to negotiate this p2p mode). Stay tuned for more on this. Steve. Steve Wise wrote: > I can reproduce this. Lemme dig into it... > > Steve. > > > Scott Weitzenkamp (sweitzen) wrote: >> Steve, >> >> Thanks, that gets further, but dtest still fails. >> >> Client side: >> >> [releng at svbu-qa2950-2 ~]$ DAPL_MAX_INLINE=64 dtest -h 192.168.0.198 >> 13926 Running as client - OpenIB-cma >> 13926 Server Name: 192.168.0.198 >> 13926 Server Net Address: 192.168.0.198 >> 13926 Waiting for connect response >> 13926 Error unexpected conn event : DAT_CONNECTION_EVENT_UNREACHABLE >> 13926 Error connect_ep: DAT_ABORT >> >> 13926: DAPL Test Complete. 
>> >> 13926: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 >> usec, pc= >> 0 >> 13926: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 >> usec, pc= >> 0 >> 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 13926: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 13926: open: 36619.19 usec >> 13926: close: 32500.98 usec >> 13926: PZ create: 7.87 usec >> 13926: PZ free: 4.05 usec >> 13926: LMR create: 58.89 usec >> 13926: LMR free: 11.92 usec >> 13926: EVD create: 9.78 usec >> 13926: EVD free: 14.07 usec >> 13926: EP create: 78.92 usec >> 13926: EP free: 26.23 usec >> 13926: TOTAL: 199.79 usec >> >> Server side: >> >> [releng at svbu-qa2950-1 ~]$ DAPL_MAX_INLINE=64 dtest >> 11461 Running as server - OpenIB-cma >> 11461 Server waiting for connect request.. >> 11461 Waiting for connect response >> >> 11461 CONNECTED! >> >> 11461 Send RMR to remote: snd_msg: >> r_key_ctx=bff,pad=0,va=146db580,len=0x40 >> 11461 Waiting for remote to send RMR data >> 11461 Error waiting on h_dto_rcv_evd: DAT_TIMEOUT_EXPIRED >> 11461 Error connect_ep: DAT_TIMEOUT_EXPIRED >> >> 11461: DAPL Test Complete. 
>> >> 11461: Message RTT: Total= 0.00 usec, 10 bursts, itime= 0.00 >> usec, pc= >> 0 >> 11461: RDMA write: Total= 0.00 usec, 10 bursts, itime= 0.00 >> usec, pc= >> 0 >> 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 11461: RDMA read: Total= 0.00 usec, 4 bursts, itime= 0.00 >> usec, pc >> =0 >> 11461: open: 900676.01 usec >> 11461: close: 31543.97 usec >> 11461: PZ create: 7.87 usec >> 11461: PZ free: 5.01 usec >> 11461: LMR create: 51.98 usec >> 11461: LMR free: 12.16 usec >> 11461: EVD create: 10.97 usec >> 11461: EVD free: 12.87 usec >> 11461: EP create: 77.01 usec >> 11461: EP free: 30.04 usec >> 11461: TOTAL: 195.03 usec >> >> Scott >> >> >> >>> -----Original Message----- >>> From: Steve Wise [mailto:swise at opengridcomputing.com] Sent: Thursday, >>> April 03, 2008 9:19 AM >>> To: Scott Weitzenkamp (sweitzen) >>> Cc: Joshua Bernstein; OpenFabrics EWG; [ofa_general] >>> Subject: Re: [ewg] RE: [ofa-general] how do I use uDAPL with iWARP? 
>>> >>> >>> >>> Scott Weitzenkamp (sweitzen) wrote: >>>> I tried that, and it didn't work: >>>> >>>> [root at svbu-qa2950-1 ~]# grep eth /etc/dat.conf >>>> OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 >>> dapl.1.2 "eth2 0" >>>> "" >>>> [root at svbu-qa2950-1 ~]# dtest >>>> 10194 Running as server - OpenIB-cma >>>> 10194 Error dat_ep_create: DAT_INVALID_HANDLE >>>> 10194 Error freeing EP: DAT_INVALID_HANDLE DAT_INVALID_HANDLE_EP >>>> >>> try setting DAPL_MAX_INLINE=64 >>> >>> > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From sashak at voltaire.com Thu Apr 3 14:35:11 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 3 Apr 2008 21:35:11 +0000 Subject: [ofa-general] [ANNOUNCE] management tarballs release Message-ID: <20080403213511.GF5982@sashak.voltaire.com> Hi, There is a new release of the management (OpenSM and infiniband diagnostics) tarballs available in: http://www.openfabrics.org/downloads/management/ md5sum: b398ef1246a392338053c8e382b3e6ee libibcommon-1.1.0.tar.gz abce72fbb91530a97493eba7a28a0de6 libibumad-1.2.0.tar.gz fe7a6b80b28e56cf74ffbe09c8819c71 libibmad-1.2.0.tar.gz b0695f75cda10051c8846fd22b77491a opensm-3.2.1.tar.gz 73218ddc536acaaab240a9d51bcd133e infiniband-diags-1.4.0.tar.gz All component versions are from recent master branch. Full change log is below. Sasha Al Chu (6): note cbb means constant bisection bandwidth opensm: multi lid routing balancing for updn/minhop Opensm: minor code cleanup Opensm: switchbalance console option opensm: add lidbalance command to console opens: fix trivial ftree comments Albert Chu (2): check_lft_balance script opensm: enforce routing paths rebalancing on switch reconnection (part 2) Albert L. 
Chu (2):
      handle routers in switchbalance console command
      add router support to check_lft_balance.pl

Dotan Barak (1):
      management: Remove extraneous semicolon from several files

Hal Rosenstock (10):
      OpenSM: Set packet life time to subnet timeout option rather than default
      infiniband-diags: Fix install of IBswcountlimits.pm script
      opensm/osm_sw_info_rcv.c: Clarify LinearFDBTop correction log message
      OpenSM release notes: Clarify QoS firmware support
      OpenSM/osm_subnet.c: Cosmetic changes to options file
      OpenSM release notes: Add byacc as alternative to bison for qos parser
      opensm/doc/partition-config.txt: Update default file name
      OpenSM release notes: Add in new QLogic HCAs
      infiniband-diags/ibping.c: Remove extraneous semicolon
      infiniband-diags/vendstat.c: Fix port xmit wait handling

Ira Weiny (17):
      opensm/libvendor/osm_vendor_ibumad.c: Fix print of Transaction ID
      Fix 2 potential core dumps now that osm_node_get_physp_ptr can return NULL
      opensm/libvendor/osm_vendor_ibumad.c: add transaction ID printing to error messages
      Create script to automate perltidy command
      opensm/libvendor/osm_vendor_ibumad.c: Add environment variable control for OSM_UMAD_MAX_PENDING
      infiniband-diags/scripts/ibprintswitch.pl: fix printing of ports
      Fix bug which prevented some GUIDs from being found due to formating issues.
      infiniband-diags/scripts/ib[linkinfo][queryerrors].pl: report switch not found
      Update documentation for guid format
      Rename ib_gid_t in mad.h to mad_gid_t to prevent name collision with ib_types.h
      opensm/include/iba/ib_types.h: fix DataDetails definitions based on 1.2 and 1.2.1 specification
      opensm/include/iba/ib_types.h: update Notice DataDetails for Trap 144 to 1.2.1
      Ensure ownership of the /etc/opensm directory
      infiniband-diags/scripts/set_nodedesc.sh: enhance to be able to set names other than hostname and to provide feedback on the names assigned
      Add an optional test utility 'ibsendtrap'
      Add mcm_rereg_test to test-utils option.
      opensm/opensm/osm_trap_rcv.c: respond to new trap 144 node description update flag

Jeremy Brown (1):
      ibstatus - small script change

Sasha Khapyorsky (78):
      opensm: remove redundant moving_to_master flag
      opensm: kill drop_mgr, link_mgr and mcast_mgr SM sub-objects
      opensm: remove unused header files
      opensm: indentation fixes
      opensm/osm_sminfo_rcv.c: comments fixing
      opensm/osm_helper.c: make some static
      opensm/osm_sm_state_mgr: remove unused function
      opensm: indentation fixes
      opensm: label indentation fixes
      opensm/osm_console.c: indentation fixes
      opensm/osm_console.c: fix unused func warning
      opensm: drop unused parameter in OSM_LOG_ENTER macro
      opensm/osm_log: OSM_LOG() macro
      opensm: convert to OSM_LOG() macro
      opensm: Release Notes for 3.1.9
      opensm/doc: Remove list of ofed-1.2 bug fixes from OpenSM Release notes.
      opensm/osm_node: trivial code consolidation
      opensm/osm_sa_pkey_record: fix typo
      opensm: fix potential core dumps
      opensm: check p_physp for null before using
      opensm/osm_sa_slvl_record.c: fix typo in log print
      opensm/libvendor: use CL_HTON64() macro for constant conversion
      opensm/osm_vendor_ibumad: simplify put_madw() prototype
      opensm/osm_switch.c: comment typo fixing
      opensm: rename OpenSM startup script to opensmd
      opensm/scripts: rename all opensm scripts as *.in
      opensm/scripts: make configurable scripts
      opensm/doc: rename OpenSM Release notes to 3.1.10
      opensm: consolidate osm_sa_vendor_send() status check
      opensm: move osm_sa_send_error() to osm_sa.c file
      opensm: cosmetic code clean in SA area
      opensm/osm_sa_service_record.c: remove unneeded braces
      libvendor/osm_vendor_ibumad_sa.c: cosmetic
      opensm: consolidate SA response sending code over SA processors
      opensm: rename osm_sa_vendor_send() to osm_sa_send()
      opensm: set SA attribute offset to 0 when no records are returned
      opensm: enforce routing paths rebalancing on switch reconnection
      opensm/osm_sw_info_rcv.c: cosmetic formatting fix
      opensm: release notes update
      opensm/osm_ucast_mgr: make error code uniq
      opensm/osm_switch.h: use tab instead of space charaters
      opensm/osm_dump: dump fixes
      opensm/osm_ucast_updn.c: decrease noisy ranking debug prints
      opensm: in UP/DOWN algo compare GUID values in host byte order
      saquery: trivial: remove empty line
      infiniband-diags/ibsendtrap.c: add include files
      infiniband-diags/ibsendtrap.c: indentation fixes
      opensm: updn/connect_roots: preserve connectivity to root nodes
      opensm/osm_mcast_mgr: limit spanning tree creation recursion to max hops (64)
      opensm: minor memory leak fix
      opensm/osm_trap_rcv: remove unused variable
      opensm: trivial: fix in commented functions
      opensm: switch LFTs incremental update fix
      opensm: send trap 64 only after new ports are in ACTIVE state.
      opensm: osm_dump_qmap_to_file() function
      opensm/updn: dump used root nodes guid
      opensm: unify dumpers, use fprintf() every there
      opensm: remove not used osm_log_printf() function
      opensm: update copyright dates after recent changes
      complib/nodenamemap: add generic parse_node_map() function
      opensm/updn: use parse_node_map() for root node guids file processing
      opensm/updn: update root nodes at each run
      opensm/ftree: use parse_node_map() for guids file processing
      opensm: remove unused osm_ucast_mgr_read_guid_file()
      opensm/updn: --ids_guid_file - node guids to ids map
      libibmad/dump: support VLArb table size, fix printing
      infiniband-diags: pass valid VLArb table size to dump func
      libibumad: eliminate compile warning
      opensm: remove duplicated osm_subn_set_default_opt() prototype
      opensm/configure.in: fix typo
      opensm/scripts/opensmd.in: fix typo
      opensm: make formats of node map names and up/down guid ids files identical
      complib/nodenamemap: merge file parsers
      opensm/configure.in: improve readability of configured config files
      opensm/configure.in: replace CONF_DIR config var by OSM_CONFIG_DIR
      opensm/configure.in: make prefix routes config file configurable
      opensm/osm_base.h: use OPENSM_COFNIG_DIR in config files paths definitions
      management: bump all versions

Timothy A. Meier (2):
      opensm:osm_console cleanup, rename, reorg, no new functionality
      opensm: console split console into two modules

Yevgeny Kliteynik (11):
      opensm/scripts: Fixing location of generated opensm.init script
      opensm/doc: fixing version in release notes
      opensm/man: added -Y/--qos_policy_file option to OSM man
      opensm/osm_subnet.{c,h}: osm_get_port_by_guid takes guid in network order
      opensm/osm_qos_parser: fixed compilation on byacc
      opensm/configure.in: make lex/yacc presence mandatory
      infiniband-diags/Makefile.am: fix 'make install'
      infiniband-diags/saquery: print SL in MCast groups
      opensm/osm_partition.h: trivial - fixing pkey order in struct
      OpenSM release notes
      opensm/QoS: setting SL in the IPoIB MCast groups

From clameter at sgi.com Thu Apr 3 12:14:24 2008 From: clameter at sgi.com (Christoph Lameter) Date: Thu, 3 Apr 2008 12:14:24 -0700 (PDT) Subject: [ofa-general] Re: EMM: Fixup return value handling of emm_notify() In-Reply-To: <1207219246.8514.817.camel@twins> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402212515.GS19189@duo.random> <1207219246.8514.817.camel@twins> Message-ID: On Thu, 3 Apr 2008, Peter Zijlstra wrote: > It seems to me that common code can be shared using functions? No need > to stuff everything into a single function. We have method vectors all > over the kernel, we could do a_ops as a single callback too, but we > dont. > > FWIW I prefer separate methods. Ok. It seems that I already added some new methods which do not use all parameters. So let's switch back to the old scheme for the next release.
From clameter at sgi.com Thu Apr 3 12:20:41 2008 From: clameter at sgi.com (Christoph Lameter) Date: Thu, 3 Apr 2008 12:20:41 -0700 (PDT) Subject: [ofa-general] Re: EMM: disable other notifiers before register and unregister In-Reply-To: <20080403151908.GB9603@duo.random> References: <20080401205531.986291575@sgi.com> <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> <20080403151908.GB9603@duo.random> Message-ID: On Thu, 3 Apr 2008, Andrea Arcangeli wrote: > My attempt to fix this once and for all is to walk all vmas of the > "mm" inside mmu_notifier_register and take all anon_vma locks and > i_mmap_locks in virtual address order in a row. It's ok to take those > inside the mmap_sem. Supposedly if anybody will ever take a double > lock it'll do in order too. Then I can dump all the other locking and What about concurrent mmu_notifier registrations from two mm_structs that have shared mappings? Isn't there a potential deadlock situation? > faults). So it should be ok to take all those locks inside the > mmap_sem and implement a lock_vm(mm) unlock_vm(mm). I'll think more > about this hammer approach while I try to implement it... Well, good luck. Hopefully we will get to something that works. From sferris at acm.org Thu Apr 3 12:45:00 2008 From: sferris at acm.org (Scott M. Ferris) Date: Thu, 3 Apr 2008 14:45:00 -0500 Subject: [ofa-general] [ANNOUNCE] management tarballs release In-Reply-To: <20080403213511.GF5982@sashak.voltaire.com> References: <20080403213511.GF5982@sashak.voltaire.com> Message-ID: <20080403194500.GA33401@sferris.acm.org> On Thu, Apr 03, 2008 at 09:35:11PM +0000, Sasha Khapyorsky wrote: > Hi, > > There is a new release of the management (OpenSM and infiniband > diagnostics) tarballs available in: I get compile errors for opensm-3.2.1 because osm_console_io.h is missing. Does the make dist target need to be updated to put that file in the tarball?
In file included from main.c:61: ../include/opensm/osm_opensm.h:56:35: error: opensm/osm_console_io.h: No such file or directory If you're going to respin the package for that, could you also do a quick test of opensm with no IB cable attached to the HCA? I found that opensm 3.2.0 would spin and hog a CPU when there was no cable attached. It's a pathological case, but sometimes happens in my lab. -- Scott M. Ferris, sferris at acm.org From bs at q-leap.de Thu Apr 3 13:18:16 2008 From: bs at q-leap.de (Bernd Schubert) Date: Thu, 3 Apr 2008 22:18:16 +0200 Subject: [ofa-general] [ANNOUNCE] management tarballs release In-Reply-To: <20080403194500.GA33401@sferris.acm.org> References: <20080403213511.GF5982@sashak.voltaire.com> <20080403194500.GA33401@sferris.acm.org> Message-ID: <200804032218.17413.bs@q-leap.de> On Thursday 03 April 2008 21:45:00 Scott M. Ferris wrote: > On Thu, Apr 03, 2008 at 09:35:11PM +0000, Sasha Khapyorsky wrote: > > Hi, > > > > There is a new release of the management (OpenSM and infiniband > > diagnostics) tarballs available in: > > I get compile errors for opensm-3.2.1 because osm_console_io.h is > missing. Does the make dist target need to be updated to put that > file in the tarball? > > In file included from main.c:61: > ../include/opensm/osm_opensm.h:56:35: error: opensm/osm_console_io.h: No > such file or directory Same here, you can get the file from this link: http://www.openfabrics.org/git/?p=~sashak/management.git;a=tree;f=opensm/include/opensm;h=7dc361f88e573927627c9a394eab4bd95011ee8b;hb=HEAD Cheers, Bernd -- Bernd Schubert Q-Leap Networks GmbH From sashak at voltaire.com Thu Apr 3 17:01:50 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 4 Apr 2008 00:01:50 +0000 Subject: [ofa-general] [ANNOUNCE] management tarballs release In-Reply-To: <20080403194500.GA33401@sferris.acm.org> References: <20080403213511.GF5982@sashak.voltaire.com> <20080403194500.GA33401@sferris.acm.org> Message-ID: <20080404000150.GA8334@sashak.voltaire.com> On 14:45 Thu 03 Apr , Scott M. Ferris wrote: > > I get compile errors for opensm-3.2.1 because osm_console_io.h is > missing. Does the make dist target need to be updated to put that > file in the tarball? Sure, it should be. I will re upload fixed tarball. > If you're going to respin the package for that, could you also do a > quick test of opensm with no IB cable attached to the HCA?
Unfortunately I cannot do it now - don't have any equipment available. > I found > that opensm 3.2.0 would spin and hog a CPU when there was no cable > attached. It's a pathological case, but sometimes happens in my lab. Thanks for reporting (although it would be better to have this report right after 3.2.0). I will look at this after Sonoma. Sasha From sashak at voltaire.com Thu Apr 3 17:10:49 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 4 Apr 2008 00:10:49 +0000 Subject: [ofa-general] Re: [ANNOUNCE] management tarballs release In-Reply-To: <20080403213511.GF5982@sashak.voltaire.com> References: <20080403213511.GF5982@sashak.voltaire.com> Message-ID: <20080404001049.GB8334@sashak.voltaire.com> On 21:35 Thu 03 Apr , Sasha Khapyorsky wrote: > Hi, > > There is a new release of the management (OpenSM and infiniband > diagnostics) tarballs available in: > > http://www.openfabrics.org/downloads/management/ > > md5sum: > > b398ef1246a392338053c8e382b3e6ee libibcommon-1.1.0.tar.gz > abce72fbb91530a97493eba7a28a0de6 libibumad-1.2.0.tar.gz > fe7a6b80b28e56cf74ffbe09c8819c71 libibmad-1.2.0.tar.gz > b0695f75cda10051c8846fd22b77491a opensm-3.2.1.tar.gz OpenSM tarball was replaced by: 997d10f81896a0d70e0f21f0e78eca92 opensm-3.2.1.tar.gz (due to compilation issue). Sorry about inconsistency. Sasha From sashak at voltaire.com Thu Apr 3 17:11:38 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 4 Apr 2008 00:11:38 +0000 Subject: [ofa-general] [ANNOUNCE] management tarballs release In-Reply-To: <200804032218.17413.bs@q-leap.de> References: <20080403213511.GF5982@sashak.voltaire.com> <20080403194500.GA33401@sferris.acm.org> <200804032218.17413.bs@q-leap.de> Message-ID: <20080404001138.GC8334@sashak.voltaire.com> On 22:18 Thu 03 Apr , Bernd Schubert wrote: > > > > In file included from main.c:61: > > ../include/opensm/osm_opensm.h:56:35: error: opensm/osm_console_io.h: No > > such file or directory > > Same here, Should be fixed now. 
Sasha From rdreier at cisco.com Thu Apr 3 14:24:03 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 03 Apr 2008 14:24:03 -0700 Subject: [ofa-general] [PATCH/RFC 1/2] IB/core: Add support for "send with invalidate" work requests In-Reply-To: (Roland Dreier's message of "Tue, 01 Apr 2008 20:41:57 -0700") References: Message-ID: OK here's an updated series of the kernel side, with the invalidate stuff moved to a new opcode. I also decided after thinking about it that I liked Eli's suggestion of putting the invalidate rkey in a union with imm_data. This won't work for libibverbs where we have to preserve the API but I guess we can burn that bridge when we come to it... Any further suggestions? Thanks! --- Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a "send with invalidate" work request as defined in the iWARP verbs and the InfiniBand base memory management extensions. Also put "imm_data" and a new "invalidate_rkey" member in a new "ex" union in struct ib_send_wr. The invalidate_rkey member can be used to pass in an R_Key/STag to be invalidated. Add this new union to struct ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in ib_uverbs_post_send(). Fix up low-level drivers to deal with the change to struct ib_send_wr, and just remove the imm_data initialization from net/sunrpc/xprtrdma/, since that code never does any send with immediate operations. Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since the iWARP drivers currently in the tree set the bit. The amso1100 driver at least will silently fail to honor the IB_SEND_INVALIDATE bit if passed in as part of userspace send requests (since it does not implement kernel bypass work request queueing). Remove the flag from all existing drivers that set it until we know which ones are OK. 
The values chosen for the new flag is not consecutive to avoid clashing with flags defined in the XRC patches, which are not merged yet but which are already in use and are likely to be merged soon. This resurrects a patch sent long ago by Mikkel Hagen . Signed-off-by: Roland Dreier --- drivers/infiniband/core/uverbs_cmd.c | 13 +++++++++++-- drivers/infiniband/hw/amso1100/c2_rnic.c | 2 +- drivers/infiniband/hw/cxgb3/iwch_provider.c | 3 +-- drivers/infiniband/hw/cxgb3/iwch_qp.c | 4 ++-- drivers/infiniband/hw/ipath/ipath_rc.c | 8 ++++---- drivers/infiniband/hw/ipath/ipath_ruc.c | 4 ++-- drivers/infiniband/hw/ipath/ipath_uc.c | 8 ++++---- drivers/infiniband/hw/ipath/ipath_ud.c | 4 ++-- drivers/infiniband/hw/mlx4/qp.c | 4 ++-- drivers/infiniband/hw/mthca/mthca_qp.c | 6 +++--- drivers/infiniband/hw/nes/nes_hw.c | 2 +- include/rdma/ib_user_verbs.h | 5 ++++- include/rdma/ib_verbs.h | 11 ++++++++--- net/sunrpc/xprtrdma/verbs.c | 1 - 14 files changed, 45 insertions(+), 30 deletions(-) diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 9e98cec..2c3bff5 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -1463,7 +1463,6 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file, next->num_sge = user_wr->num_sge; next->opcode = user_wr->opcode; next->send_flags = user_wr->send_flags; - next->imm_data = (__be32 __force) user_wr->imm_data; if (is_ud) { next->wr.ud.ah = idr_read_ah(user_wr->wr.ud.ah, @@ -1476,14 +1475,24 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file, next->wr.ud.remote_qkey = user_wr->wr.ud.remote_qkey; } else { switch (next->opcode) { - case IB_WR_RDMA_WRITE: case IB_WR_RDMA_WRITE_WITH_IMM: + next->ex.imm_data = + (__be32 __force) user_wr->ex.imm_data; + case IB_WR_RDMA_WRITE: case IB_WR_RDMA_READ: next->wr.rdma.remote_addr = user_wr->wr.rdma.remote_addr; next->wr.rdma.rkey = user_wr->wr.rdma.rkey; break; + case IB_WR_SEND_WITH_IMM: + next->ex.imm_data = + 
(__be32 __force) user_wr->ex.imm_data; + break; + case IB_WR_SEND_WITH_INV: + next->ex.invalidate_rkey = + user_wr->ex.invalidate_rkey; + break; case IB_WR_ATOMIC_CMP_AND_SWP: case IB_WR_ATOMIC_FETCH_AND_ADD: next->wr.atomic.remote_addr = diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c index 7a62552..b1441ae 100644 --- a/drivers/infiniband/hw/amso1100/c2_rnic.c +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c @@ -455,7 +455,7 @@ int __devinit c2_rnic_init(struct c2_dev *c2dev) IB_DEVICE_CURR_QP_STATE_MOD | IB_DEVICE_SYS_IMAGE_GUID | IB_DEVICE_ZERO_STAG | - IB_DEVICE_SEND_W_INV | IB_DEVICE_MEM_WINDOW); + IB_DEVICE_MEM_WINDOW); /* Allocate the qptr_array */ c2dev->qptr_array = vmalloc(C2_MAX_CQS * sizeof(void *)); diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index 50e1f2a..ca72654 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -1109,8 +1109,7 @@ int iwch_register_device(struct iwch_dev *dev) memcpy(&dev->ibdev.node_guid, dev->rdev.t3cdev_p->lldev->dev_addr, 6); dev->ibdev.owner = THIS_MODULE; dev->device_cap_flags = - (IB_DEVICE_ZERO_STAG | - IB_DEVICE_SEND_W_INV | IB_DEVICE_MEM_WINDOW); + (IB_DEVICE_ZERO_STAG | IB_DEVICE_MEM_WINDOW); dev->ibdev.uverbs_cmd_mask = (1ull << IB_USER_VERBS_CMD_GET_CONTEXT) | diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index bc5d9b0..8891c3b 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -72,7 +72,7 @@ static int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr, wqe->send.reserved[2] = 0; if (wr->opcode == IB_WR_SEND_WITH_IMM) { plen = 4; - wqe->send.sgl[0].stag = wr->imm_data; + wqe->send.sgl[0].stag = wr->ex.imm_data; wqe->send.sgl[0].len = __constant_cpu_to_be32(0); wqe->send.num_sgle = __constant_cpu_to_be32(0); *flit_cnt = 5; @@ -112,7 +112,7 @@ static int 
iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr, if (wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) { plen = 4; - wqe->write.sgl[0].stag = wr->imm_data; + wqe->write.sgl[0].stag = wr->ex.imm_data; wqe->write.sgl[0].len = __constant_cpu_to_be32(0); wqe->write.num_sgle = __constant_cpu_to_be32(0); *flit_cnt = 6; diff --git a/drivers/infiniband/hw/ipath/ipath_rc.c b/drivers/infiniband/hw/ipath/ipath_rc.c index f765d48..3ea1b31 100644 --- a/drivers/infiniband/hw/ipath/ipath_rc.c +++ b/drivers/infiniband/hw/ipath/ipath_rc.c @@ -308,7 +308,7 @@ int ipath_make_rc_req(struct ipath_qp *qp) else { qp->s_state = OP(SEND_ONLY_WITH_IMMEDIATE); /* Immediate data comes after the BTH */ - ohdr->u.imm_data = wqe->wr.imm_data; + ohdr->u.imm_data = wqe->wr.ex.imm_data; hwords += 1; } if (wqe->wr.send_flags & IB_SEND_SOLICITED) @@ -346,7 +346,7 @@ int ipath_make_rc_req(struct ipath_qp *qp) qp->s_state = OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE); /* Immediate data comes after RETH */ - ohdr->u.rc.imm_data = wqe->wr.imm_data; + ohdr->u.rc.imm_data = wqe->wr.ex.imm_data; hwords += 1; if (wqe->wr.send_flags & IB_SEND_SOLICITED) bth0 |= 1 << 23; @@ -490,7 +490,7 @@ int ipath_make_rc_req(struct ipath_qp *qp) else { qp->s_state = OP(SEND_LAST_WITH_IMMEDIATE); /* Immediate data comes after the BTH */ - ohdr->u.imm_data = wqe->wr.imm_data; + ohdr->u.imm_data = wqe->wr.ex.imm_data; hwords += 1; } if (wqe->wr.send_flags & IB_SEND_SOLICITED) @@ -526,7 +526,7 @@ int ipath_make_rc_req(struct ipath_qp *qp) else { qp->s_state = OP(RDMA_WRITE_LAST_WITH_IMMEDIATE); /* Immediate data comes after the BTH */ - ohdr->u.imm_data = wqe->wr.imm_data; + ohdr->u.imm_data = wqe->wr.ex.imm_data; hwords += 1; if (wqe->wr.send_flags & IB_SEND_SOLICITED) bth0 |= 1 << 23; diff --git a/drivers/infiniband/hw/ipath/ipath_ruc.c b/drivers/infiniband/hw/ipath/ipath_ruc.c index a59bdbd..d6f8833 100644 --- a/drivers/infiniband/hw/ipath/ipath_ruc.c +++ b/drivers/infiniband/hw/ipath/ipath_ruc.c @@ -310,7 +310,7 @@ again: 
switch (wqe->wr.opcode) { case IB_WR_SEND_WITH_IMM: wc.wc_flags = IB_WC_WITH_IMM; - wc.imm_data = wqe->wr.imm_data; + wc.imm_data = wqe->wr.ex.imm_data; /* FALLTHROUGH */ case IB_WR_SEND: if (!ipath_get_rwqe(qp, 0)) { @@ -339,7 +339,7 @@ again: goto err; } wc.wc_flags = IB_WC_WITH_IMM; - wc.imm_data = wqe->wr.imm_data; + wc.imm_data = wqe->wr.ex.imm_data; if (!ipath_get_rwqe(qp, 1)) goto rnr_nak; /* FALLTHROUGH */ diff --git a/drivers/infiniband/hw/ipath/ipath_uc.c b/drivers/infiniband/hw/ipath/ipath_uc.c index 2dd8de2..bfe8926 100644 --- a/drivers/infiniband/hw/ipath/ipath_uc.c +++ b/drivers/infiniband/hw/ipath/ipath_uc.c @@ -94,7 +94,7 @@ int ipath_make_uc_req(struct ipath_qp *qp) qp->s_state = OP(SEND_ONLY_WITH_IMMEDIATE); /* Immediate data comes after the BTH */ - ohdr->u.imm_data = wqe->wr.imm_data; + ohdr->u.imm_data = wqe->wr.ex.imm_data; hwords += 1; } if (wqe->wr.send_flags & IB_SEND_SOLICITED) @@ -123,7 +123,7 @@ int ipath_make_uc_req(struct ipath_qp *qp) qp->s_state = OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE); /* Immediate data comes after the RETH */ - ohdr->u.rc.imm_data = wqe->wr.imm_data; + ohdr->u.rc.imm_data = wqe->wr.ex.imm_data; hwords += 1; if (wqe->wr.send_flags & IB_SEND_SOLICITED) bth0 |= 1 << 23; @@ -152,7 +152,7 @@ int ipath_make_uc_req(struct ipath_qp *qp) else { qp->s_state = OP(SEND_LAST_WITH_IMMEDIATE); /* Immediate data comes after the BTH */ - ohdr->u.imm_data = wqe->wr.imm_data; + ohdr->u.imm_data = wqe->wr.ex.imm_data; hwords += 1; } if (wqe->wr.send_flags & IB_SEND_SOLICITED) @@ -177,7 +177,7 @@ int ipath_make_uc_req(struct ipath_qp *qp) qp->s_state = OP(RDMA_WRITE_LAST_WITH_IMMEDIATE); /* Immediate data comes after the BTH */ - ohdr->u.imm_data = wqe->wr.imm_data; + ohdr->u.imm_data = wqe->wr.ex.imm_data; hwords += 1; if (wqe->wr.send_flags & IB_SEND_SOLICITED) bth0 |= 1 << 23; diff --git a/drivers/infiniband/hw/ipath/ipath_ud.c b/drivers/infiniband/hw/ipath/ipath_ud.c index de67eed..be9ed78 100644 --- 
a/drivers/infiniband/hw/ipath/ipath_ud.c +++ b/drivers/infiniband/hw/ipath/ipath_ud.c @@ -95,7 +95,7 @@ static void ipath_ud_loopback(struct ipath_qp *sqp, struct ipath_swqe *swqe) if (swqe->wr.opcode == IB_WR_SEND_WITH_IMM) { wc.wc_flags = IB_WC_WITH_IMM; - wc.imm_data = swqe->wr.imm_data; + wc.imm_data = swqe->wr.ex.imm_data; } else { wc.wc_flags = 0; wc.imm_data = 0; @@ -326,7 +326,7 @@ int ipath_make_ud_req(struct ipath_qp *qp) } if (wqe->wr.opcode == IB_WR_SEND_WITH_IMM) { qp->s_hdrwords++; - ohdr->u.ud.imm_data = wqe->wr.imm_data; + ohdr->u.ud.imm_data = wqe->wr.ex.imm_data; bth0 = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE << 24; } else bth0 = IB_OPCODE_UD_SEND_ONLY << 24; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index f5210c1..38e651a 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -1249,7 +1249,7 @@ static int build_mlx_header(struct mlx4_ib_sqp *sqp, struct ib_send_wr *wr, case IB_WR_SEND_WITH_IMM: sqp->ud_header.bth.opcode = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE; sqp->ud_header.immediate_present = 1; - sqp->ud_header.immediate_data = wr->imm_data; + sqp->ud_header.immediate_data = wr->ex.imm_data; break; default: return -EINVAL; @@ -1492,7 +1492,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, if (wr->opcode == IB_WR_SEND_WITH_IMM || wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) - ctrl->imm = wr->imm_data; + ctrl->imm = wr->ex.imm_data; else ctrl->imm = 0; diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c index 8433897..b3fd6b0 100644 --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -1532,7 +1532,7 @@ static int build_mlx_header(struct mthca_dev *dev, struct mthca_sqp *sqp, case IB_WR_SEND_WITH_IMM: sqp->ud_header.bth.opcode = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE; sqp->ud_header.immediate_present = 1; - sqp->ud_header.immediate_data = wr->imm_data; + sqp->ud_header.immediate_data = 
wr->ex.imm_data; break; default: return -EINVAL; @@ -1679,7 +1679,7 @@ int mthca_tavor_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, cpu_to_be32(1); if (wr->opcode == IB_WR_SEND_WITH_IMM || wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) - ((struct mthca_next_seg *) wqe)->imm = wr->imm_data; + ((struct mthca_next_seg *) wqe)->imm = wr->ex.imm_data; wqe += sizeof (struct mthca_next_seg); size = sizeof (struct mthca_next_seg) / 16; @@ -2020,7 +2020,7 @@ int mthca_arbel_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, cpu_to_be32(1); if (wr->opcode == IB_WR_SEND_WITH_IMM || wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) - ((struct mthca_next_seg *) wqe)->imm = wr->imm_data; + ((struct mthca_next_seg *) wqe)->imm = wr->ex.imm_data; wqe += sizeof (struct mthca_next_seg); size = sizeof (struct mthca_next_seg) / 16; diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 134189d..aa53aab 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -393,7 +393,7 @@ struct nes_adapter *nes_init_adapter(struct nes_device *nesdev, u8 hw_rev) { nesadapter->base_pd = 1; nesadapter->device_cap_flags = - IB_DEVICE_ZERO_STAG | IB_DEVICE_SEND_W_INV | IB_DEVICE_MEM_WINDOW; + IB_DEVICE_ZERO_STAG | IB_DEVICE_MEM_WINDOW; nesadapter->allocated_qps = (unsigned long *)&(((unsigned char *)nesadapter) [(sizeof(struct nes_adapter)+(sizeof(unsigned long)-1))&(~(sizeof(unsigned long)-1))]); diff --git a/include/rdma/ib_user_verbs.h b/include/rdma/ib_user_verbs.h index 64a721f..8d65bf0 100644 --- a/include/rdma/ib_user_verbs.h +++ b/include/rdma/ib_user_verbs.h @@ -533,7 +533,10 @@ struct ib_uverbs_send_wr { __u32 num_sge; __u32 opcode; __u32 send_flags; - __u32 imm_data; + union { + __u32 imm_data; + __u32 invalidate_rkey; + } ex; union { struct { __u64 remote_addr; diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 66928e9..c48f6af 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -94,7 +94,7 
@@ enum ib_device_cap_flags { IB_DEVICE_SRQ_RESIZE = (1<<13), IB_DEVICE_N_NOTIFY_CQ = (1<<14), IB_DEVICE_ZERO_STAG = (1<<15), - IB_DEVICE_SEND_W_INV = (1<<16), + IB_DEVICE_RESERVED = (1<<16), /* old SEND_W_INV */ IB_DEVICE_MEM_WINDOW = (1<<17), /* * Devices should set IB_DEVICE_UD_IP_SUM if they support @@ -105,6 +105,7 @@ enum ib_device_cap_flags { */ IB_DEVICE_UD_IP_CSUM = (1<<18), IB_DEVICE_UD_TSO = (1<<19), + IB_DEVICE_SEND_W_INV = (1<<21), }; enum ib_atomic_cap { @@ -625,7 +626,8 @@ enum ib_wr_opcode { IB_WR_RDMA_READ, IB_WR_ATOMIC_CMP_AND_SWP, IB_WR_ATOMIC_FETCH_AND_ADD, - IB_WR_LSO + IB_WR_LSO, + IB_WR_SEND_WITH_INV, }; enum ib_send_flags { @@ -649,7 +651,10 @@ struct ib_send_wr { int num_sge; enum ib_wr_opcode opcode; int send_flags; - __be32 imm_data; + union { + __be32 imm_data; + u32 invalidate_rkey; + } ex; union { struct { u64 remote_addr; diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index ffbf22a..8ea283e 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -1573,7 +1573,6 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia, send_wr.sg_list = req->rl_send_iov; send_wr.num_sge = req->rl_niovs; send_wr.opcode = IB_WR_SEND; - send_wr.imm_data = 0; if (send_wr.num_sge == 4) /* no need to sync any pad (constant) */ ib_dma_sync_single_for_device(ia->ri_id->device, req->rl_send_iov[3].addr, req->rl_send_iov[3].length, -- 1.5.4.5 From weiny2 at llnl.gov Thu Apr 3 14:30:54 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 3 Apr 2008 14:30:54 -0700 Subject: [ofa-general] [PATCH] opensm/opensm/osm_perfmgr.c: change log level of counter overflow message Message-ID: <20080403143054.5abc9554.weiny2@llnl.gov> >From 821619569eea5bb116bc30d32ff18491d6953eb2 Mon Sep 17 00:00:00 2001 From: Ira K. Weiny Date: Thu, 3 Apr 2008 14:25:52 -0700 Subject: [PATCH] opensm/opensm/osm_perfmgr.c: change log level of counter overflow message Signed-off-by: Ira K. 
Weiny --- opensm/opensm/osm_perfmgr.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/opensm/opensm/osm_perfmgr.c b/opensm/opensm/osm_perfmgr.c index cc95bee..5c53c24 100644 --- a/opensm/opensm/osm_perfmgr.c +++ b/opensm/opensm/osm_perfmgr.c @@ -984,7 +984,7 @@ osm_perfmgr_check_overflow(osm_perfmgr_t * pm, __monitored_node_t *mon_node, osm_node_t *p_node = NULL; ib_net16_t lid = 0; - osm_log(pm->log, OSM_LOG_INFO, + osm_log(pm->log, OSM_LOG_VERBOSE, "PerfMgr: Counter overflow: %s (0x%" PRIx64 ") port %d; clearing counters\n", mon_node->name, mon_node->guid, port); -- 1.5.1 From rdreier at cisco.com Thu Apr 3 14:40:09 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 03 Apr 2008 14:40:09 -0700 Subject: [ofa-general] [PATCH/RFC 2/2] RDMA/amso1100: Add support for "send with invalidate" work requests In-Reply-To: (Roland Dreier's message of "Thu, 03 Apr 2008 14:24:03 -0700") References: Message-ID: Handle IB_WR_SEND_WITH_INV work requests. This resurrects a patch sent long ago by Mikkel Hagen . 
Signed-off-by: Roland Dreier --- drivers/infiniband/hw/amso1100/c2_qp.c | 22 +++++++++++++++------- drivers/infiniband/hw/amso1100/c2_rnic.c | 3 ++- 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_qp.c b/drivers/infiniband/hw/amso1100/c2_qp.c index 9190bd5..a6d8944 100644 --- a/drivers/infiniband/hw/amso1100/c2_qp.c +++ b/drivers/infiniband/hw/amso1100/c2_qp.c @@ -811,16 +811,24 @@ int c2_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr, switch (ib_wr->opcode) { case IB_WR_SEND: - if (ib_wr->send_flags & IB_SEND_SOLICITED) { - c2_wr_set_id(&wr, C2_WR_TYPE_SEND_SE); - msg_size = sizeof(struct c2wr_send_req); + case IB_WR_SEND_WITH_INV: + if (ib_wr->opcode == IB_WR_SEND) { + if (ib_wr->send_flags & IB_SEND_SOLICITED) + c2_wr_set_id(&wr, C2_WR_TYPE_SEND_SE); + else + c2_wr_set_id(&wr, C2_WR_TYPE_SEND); + wr.sqwr.send.remote_stag = 0; } else { - c2_wr_set_id(&wr, C2_WR_TYPE_SEND); - msg_size = sizeof(struct c2wr_send_req); + if (ib_wr->send_flags & IB_SEND_SOLICITED) + c2_wr_set_id(&wr, C2_WR_TYPE_SEND_SE_INV); + else + c2_wr_set_id(&wr, C2_WR_TYPE_SEND_INV); + wr.sqwr.send.remote_stag = + cpu_to_be32(ib_wr->ex.invalidate_rkey); } - wr.sqwr.send.remote_stag = 0; - msg_size += sizeof(struct c2_data_addr) * ib_wr->num_sge; + msg_size = sizeof(struct c2wr_send_req) + + sizeof(struct c2_data_addr) * ib_wr->num_sge; if (ib_wr->num_sge > qp->send_sgl_depth) { err = -EINVAL; break; diff --git a/drivers/infiniband/hw/amso1100/c2_rnic.c b/drivers/infiniband/hw/amso1100/c2_rnic.c index b1441ae..9a054c6 100644 --- a/drivers/infiniband/hw/amso1100/c2_rnic.c +++ b/drivers/infiniband/hw/amso1100/c2_rnic.c @@ -455,7 +455,8 @@ int __devinit c2_rnic_init(struct c2_dev *c2dev) IB_DEVICE_CURR_QP_STATE_MOD | IB_DEVICE_SYS_IMAGE_GUID | IB_DEVICE_ZERO_STAG | - IB_DEVICE_MEM_WINDOW); + IB_DEVICE_MEM_WINDOW | + IB_DEVICE_SEND_W_INV); /* Allocate the qptr_array */ c2dev->qptr_array = vmalloc(C2_MAX_CQS * sizeof(void *)); -- 1.5.4.5 From 
sashak at voltaire.com Thu Apr 3 18:10:04 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Fri, 4 Apr 2008 01:10:04 +0000 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_perfmgr.c: change log level of counter overflow message In-Reply-To: <20080403143054.5abc9554.weiny2@llnl.gov> References: <20080403143054.5abc9554.weiny2@llnl.gov> Message-ID: <20080404011004.GA8521@sashak.voltaire.com> On 14:30 Thu 03 Apr , Ira Weiny wrote: > From 821619569eea5bb116bc30d32ff18491d6953eb2 Mon Sep 17 00:00:00 2001 > From: Ira K. Weiny > Date: Thu, 3 Apr 2008 14:25:52 -0700 > Subject: [PATCH] opensm/opensm/osm_perfmgr.c: change log level of counter overflow message > > > Signed-off-by: Ira K. Weiny Applied. Thanks. Sasha From rdreier at cisco.com Thu Apr 3 16:06:10 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 03 Apr 2008 16:06:10 -0700 Subject: [ofa-general] [PATCH/RFC 2/2] RDMA/amso1100: Add support for "send with invalidate" work requests In-Reply-To: (Roland Dreier's message of "Thu, 03 Apr 2008 14:40:09 -0700") References: Message-ID: Thinking about all this send w/ invalidate stuff... Is it worth merging just send w/invalidate for 2.6.26, or should we wait and get all the iWARP verbs/IB base memory management extensions/etc stuff straight and target 2.6.27? - R. From mhagen at iol.unh.edu Thu Apr 3 16:30:38 2008 From: mhagen at iol.unh.edu (Mikkel Hagen) Date: Thu, 03 Apr 2008 19:30:38 -0400 Subject: [ofa-general] [PATCH/RFC 2/2] RDMA/amso1100: Add support for "send with invalidate" work requests In-Reply-To: References: Message-ID: <47F5689E.90101@iol.unh.edu> There are people waiting for SendINV functionality, if we are comfortable with the state of the patches, I vote we merge sooner than later. 
Mikkel Hagen Project Assistant - Fibre Channel/SAS/SATA Consortiums Research and Development Engineer - iWARP Consortium FC/SAS/SATA:1-603-862-0701 iWARP:1-603-862-5083 Fax:1-603-862-4181 UNH-IOL 121 Technology Drive, Suite 2 Durham, NH 03824 Roland Dreier wrote: > Thinking about all this send w/ invalidate stuff... > > Is it worth merging just send w/invalidate for 2.6.26, or should we wait > and get all the iWARP verbs/IB base memory management extensions/etc > stuff straight and target 2.6.27? > > - R. > From rdreier at cisco.com Thu Apr 3 16:40:17 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 03 Apr 2008 16:40:17 -0700 Subject: [ofa-general] [PATCH/RFC 2/2] RDMA/amso1100: Add support for "send with invalidate" work requests In-Reply-To: <47F5689E.90101@iol.unh.edu> (Mikkel Hagen's message of "Thu, 03 Apr 2008 19:30:38 -0400") References: <47F5689E.90101@iol.unh.edu> Message-ID: > There are people waiting for SendINV functionality, if we are > comfortable with the state of the patches, I vote we merge sooner than > later. Who is waiting and how are they going to use it? We don't have any "allocate L_Key" verb implemented now, so it could only possibly work with memory windows. Do any drivers have working memory window support? - R. 
From rdreier at cisco.com Thu Apr 3 16:57:37 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 03 Apr 2008 16:57:37 -0700 Subject: [ofa-general] [PATCH 0/20] IB/ipath -- DDR HCA patches in for-roland for 2.6.26 In-Reply-To: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> (Ralph Campbell's message of "Wed, 02 Apr 2008 15:49:01 -0700") References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> Message-ID: thanks, applied all 20 From mhagen at iol.unh.edu Thu Apr 3 17:30:42 2008 From: mhagen at iol.unh.edu (Mikkel Hagen) Date: Thu, 03 Apr 2008 20:30:42 -0400 Subject: [ofa-general] [PATCH/RFC 2/2] RDMA/amso1100: Add support for "send with invalidate" work requests In-Reply-To: References: <47F5689E.90101@iol.unh.edu> Message-ID: <47F576B2.300@iol.unh.edu> We've got an iSER implementation that was hoping to utilize SendINV, and our conformance and interoperability tools have been waiting for support to test. Mikkel Hagen Project Assistant - Fibre Channel/SAS/SATA Consortiums Research and Development Engineer - iWARP Consortium FC/SAS/SATA:1-603-862-0701 iWARP:1-603-862-5083 Fax:1-603-862-4181 UNH-IOL 121 Technology Drive, Suite 2 Durham, NH 03824 Roland Dreier wrote: > > There are people waiting for SendINV functionality, if we are > > comfortable with the state of the patches, I vote we merge sooner than > > later. > > Who is waiting and how are they going to use it? We don't have any > "allocate L_Key" verb implemented now, so it could only possibly work > with memory windows. Do any drivers have working memory window support? > > - R. 
> From rdreier at cisco.com Thu Apr 3 17:52:00 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 03 Apr 2008 17:52:00 -0700 Subject: [ofa-general] [PATCH/RFC 2/2] RDMA/amso1100: Add support for "send with invalidate" work requests In-Reply-To: <47F576B2.300@iol.unh.edu> (Mikkel Hagen's message of "Thu, 03 Apr 2008 20:30:42 -0400") References: <47F5689E.90101@iol.unh.edu> <47F576B2.300@iol.unh.edu> Message-ID: > We've got an iSER implementation that was hoping to utilize SendINV, > and our conformance and interoperability tools have been waiting for > support to test. OK, that's good. But does this code start working if we add the two patches I posted? I don't understand how you could do anything useful with the current state of things plus send w/inval for amso1100. I hope I'm not being too difficult here -- but I really would like to understand how the patches that I have are useful as they stand, without some further support for new verbs and/or MW implementations. - R. From rdreier at cisco.com Thu Apr 3 17:56:18 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 03 Apr 2008 17:56:18 -0700 Subject: [ofa-general] [PATCH/RFC] Add support for "send with invalidate" to libibverbs In-Reply-To: (Thomas Talpey's message of "Wed, 02 Apr 2008 13:21:39 -0400") References: <47F33837.60701@dev.mellanox.co.il> Message-ID: > drivers/infiniband/hw/ehca/ehca_hca.c 376: > props->max_mw = min_t(unsigned, rblock->max_mw, INT_MAX); > Note, ehca may set it to huge negative values, I think the code is OK as it stands... it takes the minimum (as unsigned int values) of rblock->max_mw and INT_MAX, and returns that. This should be working OK, at least since 76dea3bc ("IB/ehca: Fix clipping of device limits to INT_MAX"). > drivers/infiniband/hw/nes/nes_verbs.c 3915: > props->max_mw = nesibdev->max_mr; > nes puts the wrong value in the attribute field! (typo?) 
I'm not positive but it's plausible that the nes limit on the number of memory windows is the same as its limit on MRs. And nes has an implementation of bind_mw, so it is at least possible that it works. Actually now that I think of it, I have a nes setup where I could test your MW code... what is the sysctl to set? - R. From sfr at canb.auug.org.au Thu Apr 3 19:32:04 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Fri, 4 Apr 2008 13:32:04 +1100 Subject: [ofa-general] linux-next: infiniband build failure Message-ID: <20080404133204.3edc0470.sfr@canb.auug.org.au> Hi Roland, Today's build of linux-next (powerpc ppc64_defconfig) produced this: drivers/infiniband/hw/ehca/ehca_reqs.c: In function 'ehca_write_swqe': drivers/infiniband/hw/ehca/ehca_reqs.c:191: error: 'const struct ib_send_wr' has no member named 'imm_data' Caused by commit 0f2031b6374e693474f01020efeee6e9a00fa918 ("IB/core: Add support for "send with invalidate" work requests"). I applied the patch below but it would be good if it could be merged back into the above commit. 
-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ >From cb95023d10e3a6327b6b761f49f3e2e855882e57 Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Fri, 4 Apr 2008 13:26:45 +1100 Subject: [PATCH] infiniband-fix-1 Signed-off-by: Stephen Rothwell --- drivers/infiniband/hw/ehca/ehca_reqs.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c index 2ce8cff..784461d 100644 --- a/drivers/infiniband/hw/ehca/ehca_reqs.c +++ b/drivers/infiniband/hw/ehca/ehca_reqs.c @@ -188,7 +188,7 @@ static inline int ehca_write_swqe(struct ehca_qp *qp, if (send_wr->opcode == IB_WR_SEND_WITH_IMM || send_wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) { /* this might not work as long as HW does not support it */ - wqe_p->immediate_data = be32_to_cpu(send_wr->imm_data); + wqe_p->immediate_data = be32_to_cpu(send_wr->ex.imm_data); wqe_p->wr_flag |= WQE_WRFLAG_IMM_DATA_PRESENT; } -- 1.5.4.5 From sfr at canb.auug.org.au Thu Apr 3 19:55:32 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Fri, 4 Apr 2008 13:55:32 +1100 Subject: [ofa-general] linux-next: infiniband build failure Message-ID: <20080404135532.70c46480.sfr@canb.auug.org.au> Hi All, Today's build of linux-next (x86_64 allmodconfig) produced this: drivers/infiniband/hw/ipath/ipath_verbs.c: In function 'ipath_register_ib_device': drivers/infiniband/hw/ipath/ipath_verbs.c:2070: error: 'struct ib_device' has no member named 'class_dev' This is caused by the driver-core patch "ib-convert-struct-class_device-to-struct-device.patch" which changes the class_dev member of struct ib_device to "dev" and infiniband commit 63fe2f55dcd6d227bb9dc0aedec4431a9a7a8f92 ("IB/ipath: add calls to new 7220 code and enable in build") which adds another reference to class_dev. I applied the following patch (because reverting the above driver-core patch was too hard). 
I am not sure if it is the correct patch, but it does build. This needs to be sorted out. Greg, could the driver-core patch be delivered through the infiniband tree? (This would, of course cause problems for driver-core-remove-no-longer-used-struct-class_device.patch.) -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ >From 70bb8a344acb62afb33e8c5f96d568aa1382210e Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Fri, 4 Apr 2008 13:43:49 +1100 Subject: [PATCH] infiniband-fix-2 Signed-off-by: Stephen Rothwell --- drivers/infiniband/hw/ipath/ipath_verbs.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c index 466f3fb..6ac0c5c 100644 --- a/drivers/infiniband/hw/ipath/ipath_verbs.c +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c @@ -2067,7 +2067,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd) dev->phys_port_cnt = 1; dev->num_comp_vectors = 1; dev->dma_device = &dd->pcidev->dev; - dev->class_dev.dev = dev->dma_device; + dev->dev.parent = dev->dma_device; dev->query_device = ipath_query_device; dev->modify_device = ipath_modify_device; dev->query_port = ipath_query_port; -- 1.5.4.5 From greg at kroah.com Thu Apr 3 20:10:20 2008 From: greg at kroah.com (Greg KH) Date: Thu, 3 Apr 2008 20:10:20 -0700 Subject: [ofa-general] Re: linux-next: infiniband build failure In-Reply-To: <20080404135532.70c46480.sfr@canb.auug.org.au> References: <20080404135532.70c46480.sfr@canb.auug.org.au> Message-ID: <20080404031020.GB24743@kroah.com> On Fri, Apr 04, 2008 at 01:55:32PM +1100, Stephen Rothwell wrote: > Hi All, > > Today's build of linux-next (x86_64 allmodconfig) produced this: > > drivers/infiniband/hw/ipath/ipath_verbs.c: In function 'ipath_register_ib_device': > drivers/infiniband/hw/ipath/ipath_verbs.c:2070: error: 'struct ib_device' has no member named 'class_dev' > > This is caused by the driver-core patch > 
"ib-convert-struct-class_device-to-struct-device.patch" which changes the > class_dev member of struct ib_device to "dev" and infiniband commit > 63fe2f55dcd6d227bb9dc0aedec4431a9a7a8f92 ("IB/ipath: add calls to new > 7220 code and enable in build") which adds another reference to class_dev. > > I applied the following patch (because reverting the above driver-core > patch was too hard). I am not sure if it is the correct patch, but it > does build This needs to be sorted out. Greg, could the driver-core > patch be delivered through the infiniband tree? (This would, of course > cause problems for > driver-core-remove-no-longer-used-struct-class_device.patch.) Your patch looks correct to me. Roland wanted the ib patch to go through my tree, and I figure we will work out these issues during the 2 week merge window. thanks, greg k-h From or.gerlitz at gmail.com Thu Apr 3 21:17:54 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 4 Apr 2008 07:17:54 +0300 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> Message-ID: <15ddcffd0804032117o21e6d62br9def3e46d4d513c4@mail.gmail.com> On Thu, Apr 3, 2008 at 5:40 PM, Tang, Changqing wrote: > The problem is, from MPI side, (and by default), we don't know which port is on which > fabric, since the subnet prefix is the same. We rely on system admin to config two > different subnet prefixes for HP-MPI to work. > > No vendor has claimed to support this. CQ, not supporting a different subnet prefix per IB subnet is against IB nature; I don't think there should be any problem configuring a different prefix at each open SM instance, and the Linux host stack would work perfectly under this config. If you are aware of any problem in the opensm and/or the host stack, please let the community know and the maintainers will fix it. 
Or. From or.gerlitz at gmail.com Thu Apr 3 21:22:35 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 4 Apr 2008 07:22:35 +0300 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> Message-ID: <15ddcffd0804032122i2993bd00x84d9d38d2b7f34ba@mail.gmail.com> On Thu, Apr 3, 2008 at 5:53 PM, Tang, Changqing wrote: > for example, in MPI, process A know the HCA guid on another node. After running for > some time, the switch is restarted for some reason, and the whole fabric is re-configured. CQ, if by "the whole fabric is re-configured" you refer to a case where a subnet prefix changes while a job runs and a process is detached/reattached to the job, then adapting your design to handle it is over-engineering; why would you want to do that? Or. From or.gerlitz at gmail.com Thu Apr 3 21:47:40 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 4 Apr 2008 07:47:40 +0300 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F4F526.3060709@opengridcomputing.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> Message-ID: <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> On Thu, Apr 3, 2008 at 6:17 PM, Steve Wise wrote: > I think RDS might be getting confused because the 10GbE rnic shows up as a > dumb NIC hooked into the native TCP stack -and- an rdma device. > Jon Mason will be working to enable RDS soon on the chelsio device. He'll > feed back the changes needed, if any, to RDS. Stay tuned. Steve, I understand that similar work has been done, at least to some extent, with open MPI, and I will be very happy to hear the lessons learned. Did you manage to have the same (say point to point) open mpi "transport" design/code work over rdma-cm over both IB and iWARP? 
Can someone from OGC or Chelsio drive a BOF on that in Sonoma? If not, can some notes be sent to the list? I say lets learn from what you did so far... Or. From richard.frank at oracle.com Thu Apr 3 22:52:01 2008 From: richard.frank at oracle.com (Richard Frank) Date: Fri, 04 Apr 2008 00:52:01 -0500 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> Message-ID: <47F5C201.6080305@oracle.com> having a BOF at Sonoma - and or circulating a cheat sheet of what to watch out for would be very handy - indeed :) Or Gerlitz wrote: > On Thu, Apr 3, 2008 at 6:17 PM, Steve Wise wrote: > >> I think RDS might be getting confused because the 10GbE rnic shows up as a >> dumb NIC hooked into the native TCP stack -and- an rdma device. >> > > >> Jon Mason will be working to enable RDS soon on the chelsio device. He'll >> feed back the changes needed, if any, to RDS. Stay tuned. >> > > Steve, > > I understand that a similar work has been done at least to some extent > with open MPI, and I will be > very happy to hear the lessons learned. Did you manage to have the > same (say point to point) > open mpi "transport" design/code work over rdma-cm over both IB and iWARP? > > Can someone from OGC or Chelsio drive a BOF on that in Sonoma? > > If not, can some notes be sent to the list? I say lets learn from what > you did so far... > > Or. 
> From or.gerlitz at gmail.com Thu Apr 3 22:54:29 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 4 Apr 2008 08:54:29 +0300 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: <47F37CA4.8000109@mellanox.co.il> References: <47F37CA4.8000109@mellanox.co.il> Message-ID: <15ddcffd0804032254t4533d41br671edf335c6daabb@mail.gmail.com> On Wed, Apr 2, 2008 at 3:31 PM, Tziporet Koren wrote: > We want to add send with invalidate > Eli will be able to send the patches next week and since they are small I think they can be in for 2.6.26 Does send with invalidate apply to rkeys generated through the proprietary FMR API? If not, what usage do you envision for the new verb on today's IB devices? Or. From bs at q-leap.de Fri Apr 4 02:23:59 2008 From: bs at q-leap.de (Bernd Schubert) Date: Fri, 4 Apr 2008 11:23:59 +0200 Subject: [ofa-general] [PATCH] parse_node_map: print parse errors Message-ID: <200804041124.00004.bs@q-leap.de> Hello, could you please add the patch below; without it I probably never would have realized why my node name map was not accepted. Btw, I'm a bit surprised there don't seem to be any default wrappers for fopen(), fclose(), malloc(), fprintf(), etc. 
diff -rup opensm-3.2.1.old/complib/cl_nodenamemap.c opensm-3.2.1/complib/cl_nodenamemap.c --- opensm-3.2.1.old/complib/cl_nodenamemap.c 2008-04-03 13:17:35.000000000 +0200 +++ opensm-3.2.1/complib/cl_nodenamemap.c 2008-04-04 11:09:42.000000000 +0200 @@ -55,8 +55,11 @@ static int map_name(void *cxt, uint64_t return 0; item = malloc(sizeof(*item)); - if (!item) + if (!item) { + fprintf(stderr, "Malloc failed, sizeof(*item) = %zu.\n", sizeof(*item)); return -1; + } + item->guid = guid; item->name = strdup(p); cl_qmap_insert(map, item->guid, (cl_map_item_t *)item); @@ -169,6 +172,8 @@ int parse_node_map(const char *file_name guid = strtoull(p, &e, 0); if (e == p || (!isspace(*e) && *e != '#' && *e != '\0')) { fclose(f); + fprintf (stderr, "%s: Parse error in line: %s\n", + __func__, line); return -1; } Thanks, Bernd -- Bernd Schubert Q-Leap Networks GmbH From bs at q-leap.de Fri Apr 4 02:47:27 2008 From: bs at q-leap.de (Bernd Schubert) Date: Fri, 4 Apr 2008 11:47:27 +0200 Subject: [ofa-general] ERR 0108: Unknown remote side Message-ID: <200804041147.27565.bs@q-leap.de> Hello, opensm-3.2.1 logs some error messages like this: Apr 04 00:00:08 325114 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for node 0x000b8cffff002ba2(SW_pfs1_leaf4) port 13. Adding to light sweep sampling list Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path Dump of 3 hop path: Path = 0,1,14,13 From ibnetdiscover output I see port 13 of this switch is a switch-interconnect (sorry, I don't know what the correct name/identifier is for switches within switches): [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid 263 4xSDR Apr 04 00:00:08 325219 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for node 0x000b8cffff002bf9(SW_pfs1_inter6) port 9. 
Adding to light sweep sampling list Apr 04 00:00:08 325234 [4580A960] 0x01 -> Directed Path Dump of 2 hop path: Path = 0,1,18 This is again an interconnection: [9] "S-000b8cffff002b9e"[15] # "SW_pfs1_leaf1" lid 177 4xDDR Apr 04 00:00:08 325288 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for node 0x000b8cffff002bfa(SW_pfs1_inter7) port 13. Adding to light sweep sampling list Apr 04 00:00:08 325301 [4580A960] 0x01 -> Directed Path Dump of 2 hop path: Path = 0,1,14 And again an interconnection: [13] "S-000b8cffff002ba2"[13] # "SW_pfs1_leaf4" lid 182 4xDDR All the other interconnections seem to be fine. Thanks, Bernd -- Bernd Schubert Q-Leap Networks GmbH From andrea at qumranet.com Fri Apr 4 05:30:40 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Fri, 4 Apr 2008 14:30:40 +0200 Subject: [ofa-general] Re: EMM: disable other notifiers before register and unregister In-Reply-To: References: <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> <20080403151908.GB9603@duo.random> Message-ID: <20080404123040.GC10185@duo.random> On Thu, Apr 03, 2008 at 12:20:41PM -0700, Christoph Lameter wrote: > On Thu, 3 Apr 2008, Andrea Arcangeli wrote: > > > My attempt to fix this once and for all is to walk all vmas of the > > "mm" inside mmu_notifier_register and take all anon_vma locks and > > i_mmap_locks in virtual address order in a row. It's ok to take those > > inside the mmap_sem. Supposedly if anybody will ever take a double > > lock it'll do in order too. Then I can dump all the other locking and > > What about concurrent mmu_notifier registrations from two mm_structs > that have shared mappings? Isnt there a potential deadlock situation? No, the ordering of the lock avoids that. Here is a snippet. /* * This operation locks against the VM for all pte/vma/mm related * operations that could ever happen on a certain mm. 
This includes * vmtruncate, try_to_unmap, and all page faults. The holder * must not hold any mm related lock. A single task can't take more * than one mm lock in a row or it would deadlock. */ So you can't do: mm_lock(mm1); mm_lock(mm2); But if two different tasks run the mm_lock everything is ok. Each task in the system can lock at most 1 mm at a time. > Well good luck. Hopefully we will get to something that works. Looks good so far but I didn't finish it yet. From Brian.Murrell at Sun.COM Fri Apr 4 07:36:59 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Fri, 04 Apr 2008 10:36:59 -0400 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? Message-ID: <1207319819.1750.72.camel@pc.ilinx> I'm trying to get a few nodes here connected with IPoIB. On the first node I have tried with, after ifconfig'ing the interface into the network with other IPoIB nodes I cannot seem to ping any other nodes. 
I ran ibdiagnet and got a /tmp/ibdiagnet.pkey file with the following contents: sata14:/ # cat /tmp/ibdiagnet.pkey GROUP PKey:0x7fff Hosts:4 Full sata15/P2 lid=0x0004 guid=0x00066a01a0000363 dev=23108 Full sata14/P2 lid=0x0006 guid=0x00066a01a00002bf dev=23108 Full sata23/P2 lid=0x0008 guid=0x00066a01a00002fe dev=23108 Full sata16/P2 lid=0x0007 guid=0x00066a01a00002c1 dev=23108 When I run an "ibdiagpath -l 0x0004" I get the following: -W- Topology file is not specified. Reports regarding cluster links will use direct routes. -I- Using port 2 as the local port. -I--------------------------------------------------- -I- Traversing the path from local to destination -I--------------------------------------------------- -I- From: lid=0x0006 guid=0x00066a01a00002bf dev=23108 sata14/P2 -I- To: lid=0x0001 guid=0x00066a00c8000180 dev=5 Port=1 -I- From: lid=0x0001 guid=0x00066a00c8000180 dev=5 Port=2 -I- To: lid=0x0004 guid=0x00066a01a0000363 dev=23108 sata15/P2 -I--------------------------------------------------- -I- PM Counters Info -I--------------------------------------------------- -I- No illegal PM counters values were found -I--------------------------------------------------- -I- Path Partitions Report -I--------------------------------------------------- -I- Source sata14/P2 lid=0x0006 guid=0x00066a01a00002bf dev=23108 Port 2 PKeys:0xffff -I- Destination sata15 lid=0x0004 guid=0x00066a01a0000363 dev=23108 PKeys:0xffff -I- Path shared PKeys: 0xffff -I--------------------------------------------------- -I- IPoIB Path Check -I--------------------------------------------------- -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000000 MTU:2048Byte rate:10Gbps SL:0x00 -W- Port sata14/P2 lid=0x0006 guid=0x00066a01a00002bf dev=23108 can not join due to rate:2.5Gbps < group:10Gbps -W- Port sata15/P2 lid=0x0004 guid=0x00066a01a0000363 dev=23108 can not join due to rate:2.5Gbps < group:10Gbps -E- No IPoIB Subnets found on Path! Nodes can not communicate via IPoIB! 
-I--------------------------------------------------- -I- QoS on Path Check -I--------------------------------------------------- -W- Blocked VLs:4 5 at node:sata14 lid=0x0006 guid=0x00066a01a00002bf dev=23108 port:2 -W- Blocked VLs:4 5 at node: lid=0x0001 guid=0x00066a00c8000180 dev=5 port:2 -I- The following SLs can be used:0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 -I- Done. Run time was 0 seconds. That IPoIB Path Check looks a bit alarming. Anyone have any suggestions? b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From swise at opengridcomputing.com Fri Apr 4 07:41:55 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 04 Apr 2008 09:41:55 -0500 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> Message-ID: <47F63E33.5080709@opengridcomputing.com> Or Gerlitz wrote: > On Thu, Apr 3, 2008 at 6:17 PM, Steve Wise wrote: >> I think RDS might be getting confused because the 10GbE rnic shows up as a >> dumb NIC hooked into the native TCP stack -and- an rdma device. > >> Jon Mason will be working to enable RDS soon on the chelsio device. He'll >> feed back the changes needed, if any, to RDS. Stay tuned. > > Steve, > > I understand that a similar work has been done at least to some extent > with open MPI, and I will be > very happy to hear the lessons learned. Did you manage to have the > same (say point to point) > open mpi "transport" design/code work over rdma-cm over both IB and iWARP? > Definitely. We're running over rdma-cm over mthca and cxgb3 on 2 nodes today. 8 nodes over cxgb3. 
We're working out the details now. > Can someone from OGC or Chelsio drive a BOF on that in Sonoma? > > If not, can some notes be sent to the list? I say lets learn from what > you did so far... We won't be in Sonoma, but perhaps Jon can email some info to the list on what we've done to-date for open mpi. Steve. From hrosenstock at xsigo.com Fri Apr 4 07:55:58 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 04 Apr 2008 07:55:58 -0700 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <1207319819.1750.72.camel@pc.ilinx> References: <1207319819.1750.72.camel@pc.ilinx> Message-ID: <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-04 at 10:36 -0400, Brian J. Murrell wrote: > I'm trying to get a few nodes here connected with IPoIB. On the first > node I have tried with, after ifconfig'ing the interface into the > network with other IPoIB nodes I cannot seem to ping any other nodes. I > ran ibdiagnet and got a /tmp/ibdiagnet.pkey file with the following > contents: > > sata14:/ # cat /tmp/ibdiagnet.pkey > GROUP PKey:0x7fff Hosts:4 > Full sata15/P2 lid=0x0004 guid=0x00066a01a0000363 dev=23108 > Full sata14/P2 lid=0x0006 guid=0x00066a01a00002bf dev=23108 > Full sata23/P2 lid=0x0008 guid=0x00066a01a00002fe dev=23108 > Full sata16/P2 lid=0x0007 guid=0x00066a01a00002c1 dev=23108 > > When I run an "ibdiagpath -l 0x0004" I get the following: > > -W- Topology file is not specified. > Reports regarding cluster links will use direct routes. > -I- Using port 2 as the local port. 
> > -I--------------------------------------------------- > -I- Traversing the path from local to destination > -I--------------------------------------------------- > -I- From: lid=0x0006 guid=0x00066a01a00002bf dev=23108 sata14/P2 > -I- To: lid=0x0001 guid=0x00066a00c8000180 dev=5 Port=1 > > -I- From: lid=0x0001 guid=0x00066a00c8000180 dev=5 Port=2 > -I- To: lid=0x0004 guid=0x00066a01a0000363 dev=23108 sata15/P2 > > > -I--------------------------------------------------- > -I- PM Counters Info > -I--------------------------------------------------- > -I- No illegal PM counters values were found > > -I--------------------------------------------------- > -I- Path Partitions Report > -I--------------------------------------------------- > -I- Source sata14/P2 lid=0x0006 guid=0x00066a01a00002bf dev=23108 Port 2 > PKeys:0xffff > -I- Destination sata15 lid=0x0004 guid=0x00066a01a0000363 dev=23108 PKeys:0xffff > -I- Path shared PKeys: 0xffff > > -I--------------------------------------------------- > -I- IPoIB Path Check > -I--------------------------------------------------- > -I- Subnet: IPv4 PKey:0x7fff QKey:0x00000000 MTU:2048Byte rate:10Gbps SL:0x00 > -W- Port sata14/P2 lid=0x0006 guid=0x00066a01a00002bf dev=23108 can not join due > to rate:2.5Gbps < group:10Gbps > -W- Port sata15/P2 lid=0x0004 guid=0x00066a01a0000363 dev=23108 can not join due > to rate:2.5Gbps < group:10Gbps > -E- No IPoIB Subnets found on Path! Nodes can not communicate via IPoIB! > > -I--------------------------------------------------- > -I- QoS on Path Check > -I--------------------------------------------------- > -W- Blocked VLs:4 5 at node:sata14 lid=0x0006 guid=0x00066a01a00002bf dev=23108 > port:2 > -W- Blocked VLs:4 5 at node: lid=0x0001 guid=0x00066a00c8000180 dev=5 port:2 > -I- The following SLs can be used:0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 > > -I- Done. Run time was 0 seconds. > > That IPoIB Path Check looks a bit alarming. > > Anyone have any suggestions? 
Looks like you have a mixed rate set of ports so you need to configure the group to 2.5 Gbps. What SM are you using ? -- Hal > b. > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From Brian.Murrell at Sun.COM Fri Apr 4 08:05:36 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Fri, 04 Apr 2008 11:05:36 -0400 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> References: <1207319819.1750.72.camel@pc.ilinx> <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> Message-ID: <1207321536.1750.80.camel@pc.ilinx> On Fri, 2008-04-04 at 07:55 -0700, Hal Rosenstock wrote: > > Looks like you have a mixed rate set of ports so you need to configure > the group to 2.5 Gbps. I'm a bit green with I/B, so please bear with me if you can. I do understand that there can be mixed rates depending on hardware. But the "hardware guys" assure me the cards in these machines should be able to do 10Gbps. Maybe they are wrong. The card is listing as: 06:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) > What SM are you using ? That's a good question. I suspect it's running on the switch. I don't know any details on the switch (yet) though. I will need to engage the hardware folks to determine this. I did get an error when I ran ibdiagnet about more than 1 master SM running when I started opensmd on one of the nodes and none of the other nodes are running an SM so that only leaves the switch. In my limited exposure to IB, running the SM on the switch has always yielded bad results. I will see if I can get them to disable it. b. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From hrosenstock at xsigo.com Fri Apr 4 08:08:23 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 04 Apr 2008 08:08:23 -0700 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <1207321536.1750.80.camel@pc.ilinx> References: <1207319819.1750.72.camel@pc.ilinx> <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> <1207321536.1750.80.camel@pc.ilinx> Message-ID: <1207321703.15625.51.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-04 at 11:05 -0400, Brian J. Murrell wrote: > On Fri, 2008-04-04 at 07:55 -0700, Hal Rosenstock wrote: > > > > Looks like you have a mixed rate set of ports so you need to configure > > the group to 2.5 Gbps. > > I'm a bit green with I/B, so please bear with me if you can. I do > understand that there can be mixed rates depending on hardware. But the > "hardware guys" assure me the cards in these machines should be able to > do 10Gbps. Maybe they are wrong. The card is listing as: > > 06:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) Yes, but the multicast groups (which IPoIB uses) need to be a homogeneous rate so either it needs to be lowest common denominator or some nodes will not be able to participate. > > What SM are you using ? > > That's a good question. I suspect it's running on the switch. I don't > know any details on the switch (yet) though. I will need to engage the > hardware folks to determine this. I did get an error when when ran > ibdiagnet about more than 1 master SM running when I started opensmd on > one of the nodes and none of the other nodes are running an SM so that > only leaves the switch. > > In my limited exposure to IB, running the SM on the switch has always > yielded bad results. I will see if I can get them to disable it. That's one choice. The other is to contact your SM (switch) vendor as to how to configure the SM for this. 
Most SMs have some configuration to deal with the situation you are describing. -- Hal > b. > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From changquing.tang at hp.com Fri Apr 4 08:08:33 2008 From: changquing.tang at hp.com (Tang, Changqing) Date: Fri, 4 Apr 2008 15:08:33 +0000 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <15ddcffd0804032117o21e6d62br9def3e46d4d513c4@mail.gmail.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> <15ddcffd0804032117o21e6d62br9def3e46d4d513c4@mail.gmail.com> Message-ID: What I mean by "claim to support" is to have more people test with this config. --CQ > -----Original Message----- > From: Or Gerlitz [mailto:or.gerlitz at gmail.com] > Sent: Thursday, April 03, 2008 11:18 PM > To: Tang, Changqing > Cc: general at lists.openfabrics.org; ewg at lists.openfabrics.org > Subject: Re: [ofa-general] Re: [ewg] OFED March 24 meeting > summary on OFED 1.4 plans > > On Thu, Apr 3, 2008 at 5:40 PM, Tang, Changqing > wrote: > > > The problem is, from MPI side, (and by default), we don't > know which > > port is on which fabric, since the subnet prefix is the > same. We rely > > on system admin to config two different subnet prefixes > for HP-MPI to work. > > > > No vendor has claimed to support this. > > CQ, not supporting a different subnet prefix per IB subnet is > against IB's nature; I don't think there should be any problem > configuring a different prefix at each OpenSM instance, and > the Linux host stack would work perfectly under this config. 
> If you are aware of any problem in the opensm and/or the > host stack please let the community know and the maintainers > will fix it. > > Or. > From todd.rimmer at qlogic.com Fri Apr 4 08:14:14 2008 From: todd.rimmer at qlogic.com (Todd Rimmer) Date: Fri, 4 Apr 2008 10:14:14 -0500 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <1207321703.15625.51.camel@hrosenstock-ws.xsigo.com> References: <1207319819.1750.72.camel@pc.ilinx><1207320958.15625.47.camel@hrosenstock-ws.xsigo.com><1207321536.1750.80.camel@pc.ilinx> <1207321703.15625.51.camel@hrosenstock-ws.xsigo.com> Message-ID: <4FB1BCCAE6CAED44A1DC005B1DE06119428F53@EPEXCH2.qlogic.org> > From: Hal Rosenstock > Sent: Friday, April 04, 2008 11:08 AM > To: Brian J. Murrell > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] can not join due to rate:2.5Gbps < > group:10Gbps? > > On Fri, 2008-04-04 at 11:05 -0400, Brian J. Murrell wrote: > > On Fri, 2008-04-04 at 07:55 -0700, Hal Rosenstock wrote: > > > > > > Looks like you have a mixed rate set of ports so you need to configure > > > the group to 2.5 Gbps. > > > > I'm a bit green with I/B, so please bear with me if you can. I do > > understand that there can be mixed rates depending on hardware. But the > > "hardware guys" assure me the cards in these machines should be able to > > do 10Gbps. Maybe they are wrong. The card is listing as: > > > > 06:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) I would not recommend reconfiguring your SM for this situation. Instead, you most likely have a bad cable or possibly a bad HCA or switch port. All IB products shipped within the last 6 years support 10g, so the fact your system has negotiated to 2.5g indicates a problem with the link. Bad or poorly connected cables are the typical cause. 
Todd Rimmer Chief Architect QLogic System Interconnect Group Voice: 610-233-4852 Fax: 610-233-4777 Todd.Rimmer at QLogic.com www.QLogic.com From hrosenstock at xsigo.com Fri Apr 4 08:19:08 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 04 Apr 2008 08:19:08 -0700 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <4FB1BCCAE6CAED44A1DC005B1DE06119428F53@EPEXCH2.qlogic.org> References: <1207319819.1750.72.camel@pc.ilinx> <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> <1207321536.1750.80.camel@pc.ilinx> <1207321703.15625.51.camel@hrosenstock-ws.xsigo.com> <4FB1BCCAE6CAED44A1DC005B1DE06119428F53@EPEXCH2.qlogic.org> Message-ID: <1207322348.15625.54.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-04 at 10:14 -0500, Todd Rimmer wrote: > > From: Hal Rosenstock > > Sent: Friday, April 04, 2008 11:08 AM > > To: Brian J. Murrell > > Cc: general at lists.openfabrics.org > > Subject: Re: [ofa-general] can not join due to rate:2.5Gbps < > > group:10Gbps? > > > > On Fri, 2008-04-04 at 11:05 -0400, Brian J. Murrell wrote: > > > On Fri, 2008-04-04 at 07:55 -0700, Hal Rosenstock wrote: > > > > > > > > Looks like you have a mixed rate set of ports so you need to > configure > > > > the group to 2.5 Gbps. > > > > > > I'm a bit green with I/B, so please bear with me if you can. I do > > > understand that there can be mixed rates depending on hardware. But > the > > > "hardware guys" assure me the cards in these machines should be able > to > > > do 10Gbps. Maybe they are wrong. The card is listing as: > > > > > > 06:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev > a1) > I would not recommend reconfiguring your SM for this situation. > Instead, you most likely have a bad cable or possibly a bad HCA or > switch port. All IB products shipped within the last 6 years support > 10g, so the fact your system has negotiated to 2.5g indicates a problem > with the link. 
> > Bad or poorly connected cables are the typical cause. Yes, this seems right; I misread this as the DDR/SDR issue. I would doubt he has any 1x hardware. -- Hal > Todd Rimmer > Chief Architect > QLogic System Interconnect Group > Voice: 610-233-4852 Fax: 610-233-4777 > Todd.Rimmer at QLogic.com www.QLogic.com > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From Brian.Murrell at Sun.COM Fri Apr 4 08:25:56 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Fri, 04 Apr 2008 11:25:56 -0400 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <4FB1BCCAE6CAED44A1DC005B1DE06119428F53@EPEXCH2.qlogic.org> References: <1207319819.1750.72.camel@pc.ilinx> <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> <1207321536.1750.80.camel@pc.ilinx> <1207321703.15625.51.camel@hrosenstock-ws.xsigo.com> <4FB1BCCAE6CAED44A1DC005B1DE06119428F53@EPEXCH2.qlogic.org> Message-ID: <1207322756.1750.86.camel@pc.ilinx> On Fri, 2008-04-04 at 10:14 -0500, Todd Rimmer wrote: > I would not recommend reconfiguring your SM for this situation. Indeed, if what you say below pans out, I'd rather not. > Instead, you most likely have a bad cable or possibly a bad HCA or > switch port. All IB products shipped within the last 6 years support > 10g, so the fact your system has negotiated to 2.5g indicates a problem > with the link. OK. I will investigate this. Is there any more direct method of determining what rate an HCA has negotiated than using the "ibdiagpath -l $nid" mechanism that I have been using? It seems like a kind of round-about method of getting that information. > Bad or poorly connected cables are the typical cause. I will have the hardware guys take another look at that. Thanx for all the pointers! b. 
-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From changquing.tang at hp.com Fri Apr 4 08:26:50 2008 From: changquing.tang at hp.com (Tang, Changqing) Date: Fri, 4 Apr 2008 15:26:50 +0000 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: <15ddcffd0804032122i2993bd00x84d9d38d2b7f34ba@mail.gmail.com> References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <15ddcffd0804032122i2993bd00x84d9d38d2b7f34ba@mail.gmail.com> Message-ID: > > for example, in MPI, process A know the HCA guid on another node. > > After running for some time, the switch is restarted for > some reason, and the whole fabric is re-configured. > > > CQ, > > If by "the whole fabric is re-configured" you refer to a case > where a subnet prefix changes while a job runs and a process > is detached/reattached to the job so now you want to adopt > your design to handle it, is over engineering, why you want > to do that? > I am concerned about the port LID change. It is always best if a process can figure out the info it needs by itself; an SA query is the right way and is in the IB spec. While it is possible to let processes exchange information (port LID) again, there are difficulties: in the middle of a long job run, it is hard to get two processes to coordinate such an information exchange, and it requires a second channel to do so. If the second channel is IPoIB, it is broken as well, and we need to re-establish it again. I just ask for the SA functionality. If it is not possible, we have to use a very complicated way to let HP-MPI survive a network failure. --CQ > Or. 
> From hrosenstock at xsigo.com Fri Apr 4 08:29:34 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 04 Apr 2008 08:29:34 -0700 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <1207322756.1750.86.camel@pc.ilinx> References: <1207319819.1750.72.camel@pc.ilinx> <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> <1207321536.1750.80.camel@pc.ilinx> <1207321703.15625.51.camel@hrosenstock-ws.xsigo.com> <4FB1BCCAE6CAED44A1DC005B1DE06119428F53@EPEXCH2.qlogic.org> <1207322756.1750.86.camel@pc.ilinx> Message-ID: <1207322974.15625.57.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-04 at 11:25 -0400, Brian J. Murrell wrote: > On Fri, 2008-04-04 at 10:14 -0500, Todd Rimmer wrote: > > I would not recommend reconfiguring your SM for this situation. > > Indeed, if what you say below pans out, I'd rather not. > > > Instead, you most likely have a bad cable or possibly a bad HCA or > > switch port. All IB products shipped within the last 6 years support > > 10g, so the fact your system has negotiated to 2.5g indicates a problem > > with the link. > > OK. I will investigate this. Is there any more direct method of > determining what rate an HCA has negotiated than using the "ibdiagpath > -l $nid" mechanism that I have been using? It seems like a kind of > round-about method of getting that information. Try ibcheckwidth for this particular problem > > Bad or poorly connected cables are the typical cause. > > I will have the hardware guys take another look at that. > > Thanx for all the pointers! > > b. 
> > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Fri Apr 4 08:47:29 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 08:47:29 -0700 Subject: [ofa-general] linux-next: infiniband build failure In-Reply-To: <20080404133204.3edc0470.sfr@canb.auug.org.au> (Stephen Rothwell's message of "Fri, 4 Apr 2008 13:32:04 +1100") References: <20080404133204.3edc0470.sfr@canb.auug.org.au> Message-ID: > drivers/infiniband/hw/ehca/ehca_reqs.c: In function 'ehca_write_swqe': > drivers/infiniband/hw/ehca/ehca_reqs.c:191: error: 'const struct ib_send_wr' has no member named 'imm_data' Oops, thanks, I forgot to run my cross-compile (and ehca is ppc only). Anyway, your fix is correct and I rolled it into my patch. Thanks! From Thomas.Talpey at netapp.com Fri Apr 4 08:56:23 2008 From: Thomas.Talpey at netapp.com (Talpey, Thomas) Date: Fri, 04 Apr 2008 11:56:23 -0400 Subject: [ofa-general] [PATCH/RFC 2/2] RDMA/amso1100: Add support for "send with invalidate" work requests In-Reply-To: References: <47F5689E.90101@iol.unh.edu> <47F576B2.300@iol.unh.edu> Message-ID: At 08:52 PM 4/3/2008, Roland Dreier wrote: >But does this code start working if we add the two patches I posted? I >don't understand how you could do anything useful with the current state >of things plus send w/inval for amso1100. Does send w/inv actually work end-to-end on the Ammasso? Who's testing it? Just wondering. Tom. From rdreier at cisco.com Fri Apr 4 09:06:42 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 09:06:42 -0700 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? 
In-Reply-To: <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> (Or Gerlitz's message of "Fri, 4 Apr 2008 07:47:40 +0300") References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> Message-ID: > If not, can some notes be sent to the list? I say lets learn from what > you did so far... In my experience, getting code to work over both IB and iWARP isn't that hard. The main points are: - Use the RDMA CM for connection establishment (duh) - Memory regions used to receive RDMA read responses must have "remote write" permission (since in the iWARP protocol, RDMA read responses are basically the same as incoming RDMA write requests) - Active side of the connection must do the first operation - Don't use IB-specific features (atomics, immediate data) - R. From rdreier at cisco.com Fri Apr 4 09:10:22 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 09:10:22 -0700 Subject: [ofa-general] Re: linux-next: infiniband build failure In-Reply-To: <20080404031020.GB24743@kroah.com> (Greg KH's message of "Thu, 3 Apr 2008 20:10:20 -0700") References: <20080404135532.70c46480.sfr@canb.auug.org.au> <20080404031020.GB24743@kroah.com> Message-ID: > Roland wanted the ib patch to go through my tree, and I figure we will > work out these issues during the 2 week merge window. Actually I said I was fine with whatever you wanted to do :) Given that the new device support for ipath seems to cause problems for ib-convert-struct-class_device-to-struct-device.patch, it seems it might be simpler for me to carry that in my tree. If someone sends me the latest patch I'll be happy to merge it in (and do the fixups for the ipath changes). Then the final struct class_device removal just needs to be merged late -- I'll send my tree to Linus to pull in the first day or two of the merge window so I shouldn't be a problem. 
Stephen, Greg, I really have the simplest job here managing my tree, compared to you two guys, so as before just let me know how you want to handle this ;) - R. From Brian.Murrell at Sun.COM Fri Apr 4 10:37:10 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Fri, 04 Apr 2008 13:37:10 -0400 Subject: [ofa-general] can not join due to rate:2.5Gbps < group:10Gbps? In-Reply-To: <1207322974.15625.57.camel@hrosenstock-ws.xsigo.com> References: <1207319819.1750.72.camel@pc.ilinx> <1207320958.15625.47.camel@hrosenstock-ws.xsigo.com> <1207321536.1750.80.camel@pc.ilinx> <1207321703.15625.51.camel@hrosenstock-ws.xsigo.com> <4FB1BCCAE6CAED44A1DC005B1DE06119428F53@EPEXCH2.qlogic.org> <1207322756.1750.86.camel@pc.ilinx> <1207322974.15625.57.camel@hrosenstock-ws.xsigo.com> Message-ID: <1207330630.1750.108.camel@pc.ilinx> On Fri, 2008-04-04 at 08:29 -0700, Hal Rosenstock wrote: > > Try ibcheckwidth for this particular problem Well, seems I solved the problem after finding the ibstatus command. Seems the hardware guys plugged port 2 into the switch because port 1 of one of the HCAs in one of the machines is broken. Thanx for all of the help! b. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From hrosenstock at xsigo.com Fri Apr 4 10:55:21 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 04 Apr 2008 10:55:21 -0700 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <200804041147.27565.bs@q-leap.de> References: <200804041147.27565.bs@q-leap.de> Message-ID: <1207331721.15625.76.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-04 at 11:47 +0200, Bernd Schubert wrote: > Hello, > > opensm-3.2.1 logs some error messages like this: > > Apr 04 00:00:08 325114 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: > ERR 0108: Unknown remote side for node 0 > x000b8cffff002ba2(SW_pfs1_leaf4) port 13. Adding to light sweep sampling list > Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path Dump of 3 hop path: > Path = 0,1,14,13 > > > From ibnetdiscover output I see port13 of this switch is a switch-interconnect > (sorry, I don't know what the correct name/identifier for switches within > switches): > > [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid 263 > 4xSDR > > > Apr 04 00:00:08 325219 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: > ERR 0108: Unknown remote side for node 0 > x000b8cffff002bf9(SW_pfs1_inter6) port 9. Adding to light sweep sampling list > Apr 04 00:00:08 325234 [4580A960] 0x01 -> Directed Path Dump of 2 hop path: > Path = 0,1,18 > > This is again an interconnection: > > [9] "S-000b8cffff002b9e"[15] # "SW_pfs1_leaf1" lid 177 > 4xDDR > > > Apr 04 00:00:08 325288 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: > ERR 0108: Unknown remote side for node 0 > x000b8cffff002bfa(SW_pfs1_inter7) port 13. Adding to light sweep sampling list > Apr 04 00:00:08 325301 [4580A960] 0x01 -> Directed Path Dump of 2 hop path: > Path = 0,1,14 > > > And again an interconnection: > > [13] "S-000b8cffff002ba2"[13] # "SW_pfs1_leaf4" lid 182 > 4xDDR > > > All the other interconnections seem to be fine. 
Any idea if OpenSM 3.1.10 has the same issue as 3.2.1 ? Is this some large Flextronics switch ? -- Hal > Thanks, > Bernd > > From rdreier at cisco.com Fri Apr 4 11:04:06 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 11:04:06 -0700 Subject: [ofa-general] Re: [PATCH V2] mlx4_core: increase max number of qp's to 128K In-Reply-To: <200711281008.10521.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Wed, 28 Nov 2007 10:08:10 +0200") References: <200711281008.10521.jackm@dev.mellanox.co.il> Message-ID: thanks, applied at long last. From tom at opengridcomputing.com Fri Apr 4 12:10:40 2008 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 04 Apr 2008 14:10:40 -0500 Subject: [ofa-general] [PATCH] AMSO1100: Add check for NULL reply_msg in c2_intr Message-ID: <1207336240.1363.20.camel@trinity.ogc.int> AMSO1100: Add check for NULL reply_msg in c2_intr This is a checker-found bug posted to bugzilla.kernel.org (7478). Upon inspection I also found a place where we could attempt to kmem_cache_free a null pointer. Signed-off-by: Tom Tucker --- Roland, I don't think anyone has ever hit this bug, so it is a low priority in my view. I also noticed that if we refactored vq_wait_for_reply that we could combine a common if (!reply) { err = -ENOMEM; goto bail; } construct by guaranteeing that reply is non-null if vq_wait_for_reply returns without an error. This patch, however, is much smaller. What do you think? 
drivers/infiniband/hw/amso1100/c2_cq.c | 4 ++-- drivers/infiniband/hw/amso1100/c2_intr.c | 6 +++++- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c b/drivers/infiniband/hw/amso1100/c2_cq.c index d2b3366..bb17cce 100644 --- a/drivers/infiniband/hw/amso1100/c2_cq.c +++ b/drivers/infiniband/hw/amso1100/c2_cq.c @@ -422,8 +422,8 @@ void c2_free_cq(struct c2_dev *c2dev, struct c2_cq *cq) goto bail1; reply = (struct c2wr_cq_destroy_rep *) (unsigned long) (vq_req->reply_msg); - - vq_repbuf_free(c2dev, reply); + if (reply) + vq_repbuf_free(c2dev, reply); bail1: vq_req_free(c2dev, vq_req); bail0: diff --git a/drivers/infiniband/hw/amso1100/c2_intr.c b/drivers/infiniband/hw/amso1100/c2_intr.c index 0d0bc33..3b50954 100644 --- a/drivers/infiniband/hw/amso1100/c2_intr.c +++ b/drivers/infiniband/hw/amso1100/c2_intr.c @@ -174,7 +174,11 @@ static void handle_vq(struct c2_dev *c2dev, u32 mq_index) return; } - err = c2_errno(reply_msg); + if (reply_msg) + err = c2_errno(reply_msg); + else + err = -ENOMEM; + if (!err) switch (req->event) { case IW_CM_EVENT_ESTABLISHED: c2_set_qp_state(req->qp, From rdreier at cisco.com Fri Apr 4 12:20:14 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 12:20:14 -0700 Subject: [ofa-general] error with ibv_poll_cq() call In-Reply-To: (Roland Dreier's message of "Fri, 28 Mar 2008 22:35:19 -0700") References: <200803260901.25918.jackm@dev.mellanox.co.il> Message-ID: OK, I committed my change to libmlx4 and the equivalent thing for libmthca. - R. 
From rdreier at cisco.com Fri Apr 4 12:22:06 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 12:22:06 -0700 Subject: [ofa-general] Re: [PATCH] AMSO1100: Add check for NULL reply_msg in c2_intr In-Reply-To: <1207336240.1363.20.camel@trinity.ogc.int> (Tom Tucker's message of "Fri, 04 Apr 2008 14:10:40 -0500") References: <1207336240.1363.20.camel@trinity.ogc.int> Message-ID: > I don't think anyone has ever hit this bug, so it is a low priority in my view. I also noticed that > if we refactored vq_wait_for_reply that we could combine a common > > if (!reply) { > err = -ENOMEM; > goto bail; > } > > construct by guaranteeing that reply is non-null if vq_wait_for_reply returns without > an error. This patch, however, is much smaller. What do you think? Well, now is a good time to merge either version of the fix. Would be nice to kill off one of the Coverity issues so I'm happy to take this. It's up to you how much effort you want to spend on this... the refactoring sounds nice but I think we're OK without it. - R. From Brian.Murrell at Sun.COM Fri Apr 4 12:24:28 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Fri, 04 Apr 2008 15:24:28 -0400 Subject: [ofa-general] where to report bugs? Message-ID: <1207337068.1750.114.camel@pc.ilinx> I'm wondering what the official mechanism is to report bugs? Just about anything I'm going to find is likely to be limited to build and installation bugs, like this one... In infiniband-diags-1.3.6/Makefile.am we have the line: INCLUDES = -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband This is assuming that other OFED packages have been installed in the general system $PREFIX, usually /usr as $includedir should be /usr/include. But in particular, I have installed the opensm{,-devel} in an alternate location (i.e. PREFIX) and the infiniband-diags build fails with: if gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./include -I/usr/include -I/usr/include/infiniband -I/home/brian/ofed_1.3_integration/tree/usr/include -Wall -I/home/brian/ofed_1.3_integration/tree/usr/include -O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2 -MT src_ibnetdiscover-ibnetdiscover.o -MD -MP -MF ".deps/src_ibnetdiscover-ibnetdiscover.Tpo" -c -o src_ibnetdiscover-ibnetdiscover.o `test -f 'src/ibnetdiscover.c' || echo './'`src/ibnetdiscover.c; \ then mv -f ".deps/src_ibnetdiscover-ibnetdiscover.Tpo" ".deps/src_ibnetdiscover-ibnetdiscover.Po"; else rm -f ".deps/src_ibnetdiscover-ibnetdiscover.Tpo"; exit 1; fi In file included from src/ibnetdiscover.c:53: /home/brian/ofed_1.3_integration/tree/usr/include/infiniband/complib/cl_nodenamemap.h:39:29: error: complib/cl_qmap.h: No such file or directory In file included from src/ibnetdiscover.c:53: /home/brian/ofed_1.3_integration/tree/usr/include/infiniband/complib/cl_nodenamemap.h:45: error: expected specifier-qualifier-list before ‘cl_map_item_t’ /home/brian/ofed_1.3_integration/tree/usr/include/infiniband/complib/cl_nodenamemap.h:51: error: expected specifier-qualifier-list before ‘cl_qmap_t’ make[1]: *** [src_ibnetdiscover-ibnetdiscover.o] Error 1 make[1]: Leaving directory `/home/brian/rpm/BUILD/infiniband-diags-1.3.6' On my system, with opensm-devel (and all other OFED RPMs) installed in an alternate PREFIX, the above list of include paths should be s#/usr/include/infiniband#PREFIX/include/infiniband#. It seems probably infiniband-diags needs to have the same "--with-osm" switch that ibutils has. b. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From richard.frank at oracle.com Fri Apr 4 13:26:04 2008 From: richard.frank at oracle.com (Richard Frank) Date: Fri, 04 Apr 2008 15:26:04 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: <47F37CA4.8000109@mellanox.co.il> Message-ID: <47F68EDC.4050107@oracle.com> > We want to add send with invalidate & mask compare and swap. > Eli will be able to send the patches next week and since they are > small I think they can be in for 2.6.26 We are very interested in these new operations and are moving in the direction of tightly integrating RDMA along with atomics (if available) into Oracle. We plan on testing some early prototypes of these in the next few months. Send with invalidate is an exact match for our current RDS V3 rdma driver - and should be more efficient than the current background syncing of the tpt to ensure keys are invalidated. We intend to expose the atomics via the RDS driver along with simple low level rdma operations to Oracle's internal clients. If Oracle is running over a transport which exports atomics and rdma, Oracle will see a dramatic performance boost for several database operations. Roland Dreier wrote: > > We want to add send with invalidate & mask compare and swap. > > Eli will be able to send the patches next week and since they are > > small I think they can be in for 2.6.26 > > Send with invalidate should be OK. Let's see about the masked atomics > stuff -- we have a ton of new verbs and I think we might want to slow > down and make sure it all makes sense. > > > What about the split CQ for UD mode? It's improved the IPoIB > > performance for small messages significantly. > > Oh yeah... I'll try to get that in too. > > > mlx4- we plan to send patches for the low level driver only to enable > > mlx4_en. These only affect our low level driver. 
> > No problem in principle, let's see the actual patches. > > > I think we should try to push for XEC in 2.6.26 since there are > > already MPI implementation that use it and this ties them to use OFED > > only. > > Also this feature is stable and now being defined in IBTA > > Not taking it causing changes between OFED and the kernel and your > > libibverbs and we wish to avoid such gaps. > > Is there any thing we can do to help and make it into 2.6.26? > > I don't have a good feeling that the user-kernel interface is well > > thought out, so I want to consider XRC + ehca LL stuff + new iWARP verbs > > and make sure we have something that makes sense for the future. > > > > - R. > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general 
From tom at opengridcomputing.com Fri Apr 4 12:32:43 2008 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 04 Apr 2008 14:32:43 -0500 Subject: [ofa-general] Re: [PATCH] AMSO1100: Add check for NULL reply_msg in c2_intr In-Reply-To: References: <1207336240.1363.20.camel@trinity.ogc.int> Message-ID: <1207337563.1363.22.camel@trinity.ogc.int> On Fri, 2008-04-04 at 12:22 -0700, Roland Dreier wrote: > > I don't think anyone has ever hit this bug, so it is a low priority in my view. I also noticed that > > if we refactored vq_wait_for_reply that we could combine a common > > > > if (!reply) { > > err = -ENOMEM; > > goto bail; > > } > > > > construct by guaranteeing that reply is non-null if vq_wait_for_reply returns without > > an error. This patch, however, is much smaller. What do you think? 
> > Well, now is a good time to merge either version of the fix. Would be > nice to kill off one of the Coverity issues so I'm happy to take this. > > It's up to you how much effort you want to spend on this... the > refactoring sounds nice but I think we're OK without it. > I'm up to my eyeballs right now. If it's ok with you I'd say defer the refactoring. > - R. From rdreier at cisco.com Fri Apr 4 12:34:52 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 12:34:52 -0700 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: <47F68EDC.4050107@oracle.com> (Richard Frank's message of "Fri, 04 Apr 2008 15:26:04 -0500") References: <47F37CA4.8000109@mellanox.co.il> <47F68EDC.4050107@oracle.com> Message-ID: > We are very interested in these new operations and are moving in the > direction of tightly integrating RDMA along with atomics (if > available) into Oracle. We plan on testing some early prototypes of > the these in the few months. And you need the ConnectX-only masked atomics? Or do the standard IB atomic operations work for you? Of course using atomics at all means that things don't work on iWARP. > Send with invalidate is an exact match for our current RDS V3 rdma > driver - and should be more efficient than the current background > syncing of the tpt to ensure keys are invalidated. How does send with invalidate interact with the current IB FMR stuff? Seems that you would run into trouble keeping the state of the FMR straight if the remote side is invalidating them. Also I would think that send-with-invalidate would be much more expensive than the current FMR method of batching up the invalidates, since you don't get to amortize the cost of syncing up all the internal HCA state. - R. 
From rdreier at cisco.com Fri Apr 4 12:35:39 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 12:35:39 -0700 Subject: [ofa-general] Re: [PATCH] AMSO1100: Add check for NULL reply_msg in c2_intr In-Reply-To: <1207337563.1363.22.camel@trinity.ogc.int> (Tom Tucker's message of "Fri, 04 Apr 2008 14:32:43 -0500") References: <1207336240.1363.20.camel@trinity.ogc.int> <1207337563.1363.22.camel@trinity.ogc.int> Message-ID: > I'm up to my eyeballs right now. If it's ok with you I'd say defer the > refactoring. No problem, I'll queue this up and if you ever get time to work on amso1100 you can send the refactoring. But are you working on a pmtu fix? - R. From rdreier at cisco.com Fri Apr 4 12:38:15 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 12:38:15 -0700 Subject: [ofa-general] Re: [PATCH 7/10] IB/ipoib: Add ethtool support In-Reply-To: <1205767448.25950.142.camel@mtls03> (Eli Cohen's message of "Mon, 17 Mar 2008 17:24:08 +0200") References: <1205767448.25950.142.camel@mtls03> Message-ID: thanks, applied From hrosenstock at xsigo.com Fri Apr 4 12:56:00 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 04 Apr 2008 12:56:00 -0700 Subject: [ofa-general] where to report bugs? In-Reply-To: <1207337068.1750.114.camel@pc.ilinx> References: <1207337068.1750.114.camel@pc.ilinx> Message-ID: <1207338960.15625.147.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-04 at 15:24 -0400, Brian J. Murrell wrote: > I'm wondering what the official mechanism is to report bugs? http://www.openfabrics.org/bugzilla but that's usually used when email is insufficient and some issue needs tracking but it's up to you. 
-- Hal From rdreier at cisco.com Fri Apr 4 12:58:16 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 12:58:16 -0700 Subject: [ofa-general] Re: [PATCH 10/10] IB/mlx4: add support for modifying CQ parameters In-Reply-To: <1205767465.25950.144.camel@mtls03> (Eli Cohen's message of "Mon, 17 Mar 2008 17:24:25 +0200") References: <1205767465.25950.144.camel@mtls03> Message-ID: thanks, I applied 8/10 and 9/10, and changed this one around a bit before applying it... it seemed cleaner to me not to expose the CQ context to the mlx4_ib driver. For CQ resize we can just add a new mlx4_cq_resize() function in mlx4_core, since the context parameters that matter there are completely different. (And there's no need for mlx4_ib to worry about either the modify moderation or resize cases) >From a1f375e52ce0b39bebaa27adc6d3724816f7e963 Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Mon, 17 Mar 2008 17:24:25 +0200 Subject: [PATCH] IB/mlx4: Add support for modifying CQ moderation parameters Signed-off-by: Eli Cohen Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mlx4/cq.c | 8 ++++++++ drivers/infiniband/hw/mlx4/main.c | 1 + drivers/infiniband/hw/mlx4/mlx4_ib.h | 1 + drivers/net/mlx4/cq.c | 31 +++++++++++++++++++++++++++++++ include/linux/mlx4/cmd.h | 2 +- include/linux/mlx4/cq.h | 3 +++ 6 files changed, 45 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 7d70af7..e4fb64b 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -85,6 +85,14 @@ static struct mlx4_cqe *next_cqe_sw(struct mlx4_ib_cq *cq) return get_sw_cqe(cq, cq->mcq.cons_index); } +int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period) +{ + struct mlx4_ib_cq *mcq = to_mcq(cq); + struct mlx4_ib_dev *dev = to_mdev(cq->device); + + return mlx4_cq_modify(dev->dev, &mcq->mcq, cq_count, cq_period); +} + struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector, struct 
ib_ucontext *context, struct ib_udata *udata) diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index e9330a0..76dd45c 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -609,6 +609,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) ibdev->ib_dev.post_send = mlx4_ib_post_send; ibdev->ib_dev.post_recv = mlx4_ib_post_recv; ibdev->ib_dev.create_cq = mlx4_ib_create_cq; + ibdev->ib_dev.modify_cq = mlx4_ib_modify_cq; ibdev->ib_dev.destroy_cq = mlx4_ib_destroy_cq; ibdev->ib_dev.poll_cq = mlx4_ib_poll_cq; ibdev->ib_dev.req_notify_cq = mlx4_ib_arm_cq; diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 3f8bd0a..ef8ad96 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -254,6 +254,7 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, struct ib_udata *udata); int mlx4_ib_dereg_mr(struct ib_mr *mr); +int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period); struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector, struct ib_ucontext *context, struct ib_udata *udata); diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index d4441fe..00a270b 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -121,6 +121,13 @@ static int mlx4_SW2HW_CQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox, MLX4_CMD_TIME_CLASS_A); } +static int mlx4_MODIFY_CQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox, + int cq_num, u32 opmod) +{ + return mlx4_cmd(dev, mailbox->dma, cq_num, opmod, MLX4_CMD_MODIFY_CQ, + MLX4_CMD_TIME_CLASS_A); +} + static int mlx4_HW2SW_CQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox, int cq_num) { @@ -129,6 +136,30 @@ static int mlx4_HW2SW_CQ(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox, MLX4_CMD_TIME_CLASS_A); } +int mlx4_cq_modify(struct mlx4_dev *dev, struct mlx4_cq *cq, + u16 count, u16 period) +{ + 
struct mlx4_cmd_mailbox *mailbox; + struct mlx4_cq_context *cq_context; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + cq_context = mailbox->buf; + memset(cq_context, 0, sizeof *cq_context); + + cq_context->cq_max_count = cpu_to_be16(count); + cq_context->cq_period = cpu_to_be16(period); + + err = mlx4_MODIFY_CQ(dev, mailbox, cq->cqn, 1); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_cq_modify); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq) { diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h index 7d1eaa9..77323a7 100644 --- a/include/linux/mlx4/cmd.h +++ b/include/linux/mlx4/cmd.h @@ -81,7 +81,7 @@ enum { MLX4_CMD_SW2HW_CQ = 0x16, MLX4_CMD_HW2SW_CQ = 0x17, MLX4_CMD_QUERY_CQ = 0x18, - MLX4_CMD_RESIZE_CQ = 0x2c, + MLX4_CMD_MODIFY_CQ = 0x2c, /* SRQ commands */ MLX4_CMD_SW2HW_SRQ = 0x35, diff --git a/include/linux/mlx4/cq.h b/include/linux/mlx4/cq.h index 1243eba..f7c3511 100644 --- a/include/linux/mlx4/cq.h +++ b/include/linux/mlx4/cq.h @@ -130,4 +130,7 @@ enum { MLX4_CQ_DB_REQ_NOT = 2 << 24 }; +int mlx4_cq_modify(struct mlx4_dev *dev, struct mlx4_cq *cq, + u16 count, u16 period); + #endif /* MLX4_CQ_H */ -- 1.5.4.5 From michael.heinz at qlogic.com Fri Apr 4 13:08:18 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Fri, 4 Apr 2008 15:08:18 -0500 Subject: [ofa-general] MVAPICH2 crashes on mixed fabric Message-ID: Hey, all, I'm not sure if this is a known bug or some sort of limitation I'm unaware of, but I've been building and testing with the OFED 1.3 GA release on a small fabric that has a mix of Arbel-based and newer Connect-X HCAs. What I've discovered is that mvapich and openmpi work fine across the entire fabric, but mvapich2 crashes when I use a mix of Arbels and Connect-X. 
The errors vary depending on the test program but here's an example:

[mheinz at compute-0-0 IMB-3.0]$ mpirun -n 5 ./IMB-MPI1
. . . (output snipped) . . .
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
# ( 3 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]  Mbytes/sec
            0         1000         3.51         3.51         3.51        0.00
            1         1000         3.63         3.63         3.63        0.52
            2         1000         3.67         3.67         3.67        1.04
            4         1000         3.64         3.64         3.64        2.09
            8         1000         3.67         3.67         3.67        4.16
           16         1000         3.67         3.67         3.67        8.31
           32         1000         3.74         3.74         3.74       16.32
           64         1000         3.90         3.90         3.90       31.28
          128         1000         4.75         4.75         4.75       51.39
          256         1000         5.21         5.21         5.21       93.79
          512         1000         5.96         5.96         5.96      163.77
         1024         1000         7.88         7.89         7.89      247.54
         2048         1000        11.42        11.42        11.42      342.00
         4096         1000        15.33        15.33        15.33      509.49
         8192         1000        22.19        22.20        22.20      703.83
        16384         1000        34.57        34.57        34.57      903.88
        32768         1000        51.32        51.32        51.32     1217.94
        65536          640        85.80        85.81        85.80     1456.74
       131072          320       155.23       155.24       155.24     1610.40
       262144          160       301.84       301.86       301.85     1656.39
       524288           80       598.62       598.69       598.66     1670.31
      1048576           40      1175.22      1175.30      1175.26     1701.69
      2097152           20      2309.05      2309.05      2309.05     1732.32
      4194304           10      4548.72      4548.98      4548.85     1758.64
[0] Abort: Got FATAL event 3 at line 796 in file ibv_channel_manager.c
rank 0 in job 1 compute-0-0.local_36049 caused collective abort of all ranks
exit status of rank 0: killed by signal 9

If, however, I define my mpdring to contain only Connect-X systems OR only Arbel systems, IMB-MPI1 runs to completion. Can anyone suggest a workaround, or is this a real bug with mvapich2? -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed...
URL: From andrea at qumranet.com Fri Apr 4 13:20:56 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Fri, 4 Apr 2008 22:20:56 +0200 Subject: [ofa-general] [PATCH] mmu notifier #v11 In-Reply-To: References: <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> <20080403151908.GB9603@duo.random> Message-ID: <20080404202055.GA14784@duo.random> This should guarantee that nobody can register when any of the mmu notifiers is running avoiding all the races including guaranteeing range_start not to be missed. I'll adapt the other patches to provide the sleeping-feature on top of this (only needed by XPMEM) soon. KVM seems to run fine on top of this one. Andrew can you apply this to -mm? Signed-off-by: Andrea Arcangeli Signed-off-by: Nick Piggin Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1050,6 +1050,9 @@ unsigned long addr, unsigned long len, unsigned long flags, struct page **pages); +extern void mm_lock(struct mm_struct *mm); +extern void mm_unlock(struct mm_struct *mm); + extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -225,6 +225,9 @@ #ifdef CONFIG_CGROUP_MEM_RES_CTLR struct mem_cgroup *mem_cgroup; #endif +#ifdef CONFIG_MMU_NOTIFIER + struct hlist_head mmu_notifier_list; +#endif }; #endif /* _LINUX_MM_TYPES_H */ diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h new file mode 100644 --- /dev/null +++ b/include/linux/mmu_notifier.h @@ -0,0 +1,175 @@ +#ifndef _LINUX_MMU_NOTIFIER_H +#define _LINUX_MMU_NOTIFIER_H + +#include +#include +#include + +struct mmu_notifier; +struct mmu_notifier_ops; 
+ +#ifdef CONFIG_MMU_NOTIFIER + +struct mmu_notifier_ops { + /* + * Called when nobody can register any more notifier in the mm + * and after the "mn" notifier has been disarmed already. + */ + void (*release)(struct mmu_notifier *mn, + struct mm_struct *mm); + + /* + * clear_flush_young is called after the VM is + * test-and-clearing the young/accessed bitflag in the + * pte. This way the VM will provide proper aging to the + * accesses to the page through the secondary MMUs and not + * only to the ones through the Linux pte. + */ + int (*clear_flush_young)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * Before this is invoked any secondary MMU is still ok to + * read/write to the page previously pointed by the Linux pte + * because the old page hasn't been freed yet. If required + * set_page_dirty has to be called internally to this method. + */ + void (*invalidate_page)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * invalidate_range_start() and invalidate_range_end() must be + * paired. Multiple invalidate_range_start/ends may be nested + * or called concurrently. 
+ */ + void (*invalidate_range_start)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); + void (*invalidate_range_end)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); +}; + +struct mmu_notifier { + struct hlist_node hlist; + const struct mmu_notifier_ops *ops; +}; + +static inline int mm_has_notifiers(struct mm_struct *mm) +{ + return unlikely(!hlist_empty(&mm->mmu_notifier_list)); +} + +extern void mmu_notifier_register(struct mmu_notifier *mn, + struct mm_struct *mm); +extern void __mmu_notifier_release(struct mm_struct *mm); +extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end); +extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end); + + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_release(mm); +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + return __mmu_notifier_clear_flush_young(mm, address); + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_page(mm, address); +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_start(mm, start, end); +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_end(mm, start, end); +} + +static inline void 
mmu_notifier_mm_init(struct mm_struct *mm) +{ + INIT_HLIST_HEAD(&mm->mmu_notifier_list); +} + +#define ptep_clear_flush_notify(__vma, __address, __ptep) \ +({ \ + pte_t __pte; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __pte = ptep_clear_flush(___vma, ___address, __ptep); \ + mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ + __pte; \ +}) + +#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ +({ \ + int __young; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ + __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ + ___address); \ + __young; \ +}) + +#else /* CONFIG_MMU_NOTIFIER */ + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ +} + +#define ptep_clear_flush_young_notify ptep_clear_flush_young +#define ptep_clear_flush_notify ptep_clear_flush + +#endif /* CONFIG_MMU_NOTIFIER */ + +#endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include @@ -362,6 +363,7 @@ if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; + mmu_notifier_mm_init(mm); return mm; } diff --git a/mm/Kconfig b/mm/Kconfig --- a/mm/Kconfig +++ b/mm/Kconfig @@ -193,3 +193,7 @@ config VIRT_TO_BUS def_bool y depends on !ARCH_NO_VIRT_TO_BUS + 
+config MMU_NOTIFIER + def_bool y + bool "MMU notifier, for paging KVM/RDMA" diff --git a/mm/Makefile b/mm/Makefile --- a/mm/Makefile +++ b/mm/Makefile @@ -33,4 +33,5 @@ obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o +obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,7 +194,7 @@ if (pte) { /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -214,7 +215,9 @@ spin_unlock(&mapping->i_mmap_lock); } + mmu_notifier_invalidate_range_start(mm, start, start + size); err = populate_range(mm, vma, start, size, pgoff); + mmu_notifier_invalidate_range_end(mm, start, start + size); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include @@ -799,6 +800,7 @@ BUG_ON(start & ~HPAGE_MASK); BUG_ON(end & ~HPAGE_MASK); + mmu_notifier_invalidate_range_start(mm, start, end); spin_lock(&mm->page_table_lock); for (address = start; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -819,6 +821,7 @@ } spin_unlock(&mm->page_table_lock); flush_tlb_range(vma, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); list_for_each_entry_safe(page, tmp, &page_list, lru) { list_del(&page->lru); put_page(page); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -51,6 +51,7 @@ #include #include #include 
+#include #include #include @@ -611,6 +612,9 @@ if (is_vm_hugetlb_page(vma)) return copy_hugetlb_page_range(dst_mm, src_mm, vma); + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_start(src_mm, addr, end); + dst_pgd = pgd_offset(dst_mm, addr); src_pgd = pgd_offset(src_mm, addr); do { @@ -621,6 +625,11 @@ vma, addr, next)) return -ENOMEM; } while (dst_pgd++, src_pgd++, addr = next, addr != end); + + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_end(src_mm, + vma->vm_start, end); + return 0; } @@ -897,7 +906,9 @@ lru_add_drain(); tlb = tlb_gather_mmu(mm, 0); update_hiwater_rss(mm); + mmu_notifier_invalidate_range_start(mm, address, end); end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details); + mmu_notifier_invalidate_range_end(mm, address, end); if (tlb) tlb_finish_mmu(tlb, address, end); return end; @@ -1463,10 +1474,11 @@ { pgd_t *pgd; unsigned long next; - unsigned long end = addr + size; + unsigned long start = addr, end = addr + size; int err; BUG_ON(addr >= end); + mmu_notifier_invalidate_range_start(mm, start, end); pgd = pgd_offset(mm, addr); do { next = pgd_addr_end(addr, end); @@ -1474,6 +1486,7 @@ if (err) break; } while (pgd++, addr = next, addr != end); + mmu_notifier_invalidate_range_end(mm, start, end); return err; } EXPORT_SYMBOL_GPL(apply_to_page_range); @@ -1675,7 +1688,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. 
*/ - ptep_clear_flush(vma, address, page_table); + ptep_clear_flush_notify(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -1747,11 +1748,13 @@ lru_add_drain(); tlb = tlb_gather_mmu(mm, 0); update_hiwater_rss(mm); + mmu_notifier_invalidate_range_start(mm, start, end); unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); tlb_finish_mmu(tlb, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); } /* @@ -2037,6 +2040,7 @@ unsigned long end; /* mm's last user has gone, and its about to be pulled down */ + mmu_notifier_release(mm); arch_exit_mmap(mm); lru_add_drain(); @@ -2242,3 +2246,69 @@ return 0; } + +static void mm_lock_unlock(struct mm_struct *mm, int lock) +{ + struct vm_area_struct *vma; + spinlock_t *i_mmap_lock_last, *anon_vma_lock_last; + + i_mmap_lock_last = NULL; + for (;;) { + spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; + for (vma = mm->mmap; vma; vma = vma->vm_next) + if (vma->vm_file && vma->vm_file->f_mapping && + (unsigned long) i_mmap_lock > + (unsigned long) + &vma->vm_file->f_mapping->i_mmap_lock && + (unsigned long) + &vma->vm_file->f_mapping->i_mmap_lock > + (unsigned long) i_mmap_lock_last) + i_mmap_lock = + &vma->vm_file->f_mapping->i_mmap_lock; + if (i_mmap_lock == (spinlock_t *) -1UL) + break; + i_mmap_lock_last = i_mmap_lock; + if (lock) + spin_lock(i_mmap_lock); + else + spin_unlock(i_mmap_lock); + } + + anon_vma_lock_last = NULL; + for (;;) { + spinlock_t *anon_vma_lock = (spinlock_t *) -1UL; + for (vma = mm->mmap; vma; vma = vma->vm_next) + if (vma->anon_vma && + (unsigned long) anon_vma_lock > + (unsigned long) &vma->anon_vma->lock && + (unsigned long) 
&vma->anon_vma->lock > + (unsigned long) anon_vma_lock_last) + anon_vma_lock = &vma->anon_vma->lock; + if (anon_vma_lock == (spinlock_t *) -1UL) + break; + anon_vma_lock_last = anon_vma_lock; + if (lock) + spin_lock(anon_vma_lock); + else + spin_unlock(anon_vma_lock); + } +} + +/* + * This operation locks against the VM for all pte/vma/mm related + * operations that could ever happen on a certain mm. This includes + * vmtruncate, try_to_unmap, and all page faults. The holder + * must not hold any mm related lock. A single task can't take more + * than one mm lock in a row or it would deadlock. + */ +void mm_lock(struct mm_struct * mm) +{ + down_write(&mm->mmap_sem); + mm_lock_unlock(mm, 1); +} + +void mm_unlock(struct mm_struct *mm) +{ + mm_lock_unlock(mm, 0); + up_write(&mm->mmap_sem); +} diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c new file mode 100644 --- /dev/null +++ b/mm/mmu_notifier.c @@ -0,0 +1,100 @@ +/* + * linux/mm/mmu_notifier.c + * + * Copyright (C) 2008 Qumranet, Inc. + * Copyright (C) 2008 SGI + * Christoph Lameter + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include +#include +#include + +/* + * No synchronization. This function can only be called when only a single + * process remains that performs teardown. + */ +void __mmu_notifier_release(struct mm_struct *mm) +{ + struct mmu_notifier *mn; + + while (unlikely(!hlist_empty(&mm->mmu_notifier_list))) { + mn = hlist_entry(mm->mmu_notifier_list.first, + struct mmu_notifier, + hlist); + hlist_del(&mn->hlist); + if (mn->ops->release) + mn->ops->release(mn, mm); + } +} + +/* + * If no young bitflag is supported by the hardware, ->clear_flush_young can + * unmap the address and return 1 or 0 depending if the mapping previously + * existed or not. 
+ */ +int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int young = 0; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->clear_flush_young) + young |= mn->ops->clear_flush_young(mn, mm, address); + } + + return young; +} + +void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_page) + mn->ops->invalidate_page(mn, mm, address); + } +} + +void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_start) + mn->ops->invalidate_range_start(mn, mm, start, end); + } +} + +void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_end) + mn->ops->invalidate_range_end(mn, mm, start, end); + } +} + +/* + * Must not hold mmap_sem nor any other VM related lock when calling + * this registration function. 
+ */ +void mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm) +{ + mm_lock(mm); + hlist_add_head(&mn->hlist, &mm->mmu_notifier_list); + mm_unlock(mm); +} +EXPORT_SYMBOL_GPL(mmu_notifier_register); diff --git a/mm/mprotect.c b/mm/mprotect.c --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -198,10 +199,12 @@ dirty_accountable = 1; } + mmu_notifier_invalidate_range_start(mm, start, end); if (is_vm_hugetlb_page(vma)) hugetlb_change_protection(vma, start, end, vma->vm_page_prot); else change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable); + mmu_notifier_invalidate_range_end(mm, start, end); vm_stat_account(mm, oldflags, vma->vm_file, -nrpages); vm_stat_account(mm, newflags, vma->vm_file, nrpages); return 0; diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -74,7 +75,11 @@ struct mm_struct *mm = vma->vm_mm; pte_t *old_pte, *new_pte, pte; spinlock_t *old_ptl, *new_ptl; + unsigned long old_start; + old_start = old_addr; + mmu_notifier_invalidate_range_start(vma->vm_mm, + old_start, old_end); if (vma->vm_file) { /* * Subtle point from Rajesh Venkatasubramanian: before @@ -116,6 +121,7 @@ pte_unmap_unlock(old_pte - 1, old_ptl); if (mapping) spin_unlock(&mapping->i_mmap_lock); + mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } #define LATENCY_LIMIT (64 * PAGE_SIZE) diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -49,6 +49,7 @@ #include #include #include +#include #include @@ -287,7 +288,7 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young(vma, address, pte)) + } else if (ptep_clear_flush_young_notify(vma, address, pte)) referenced++; /* Pretend the page is referenced if the task has the @@ -456,7 +457,7 @@ pte_t entry; flush_cache_page(vma, 
address, pte_pfn(*pte)); - entry = ptep_clear_flush(vma, address, pte); + entry = ptep_clear_flush_notify(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -717,14 +718,14 @@ * skipped over this mm) then we should reactivate it. */ if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young(vma, address, pte)))) { + (ptep_clear_flush_young_notify(vma, address, pte)))) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. */ if (pte_dirty(pteval)) @@ -849,12 +850,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young(vma, address, pte)) + if (ptep_clear_flush_young_notify(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* If nonlinear, store the file page offset in the pte. */ if (page->index != linear_page_index(vma, address)) From or.gerlitz at gmail.com Fri Apr 4 13:23:18 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 4 Apr 2008 23:23:18 +0300 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> Message-ID: <15ddcffd0804041323v480b4e3fi7061526184ab26b5@mail.gmail.com> On Fri, Apr 4, 2008 at 7:06 PM, Roland Dreier wrote: > - Don't use IB-specific features (atomics, immediate data) and don't use RNRs as a means for HW based "flow control" mechanism. 
The current RDS implementation does not have a SW based flow control but rather does some sort of back pressure through SW based congestion management. I think that to some extent it relies on RNRs which don't exist under iWARP. Or. From or.gerlitz at gmail.com Fri Apr 4 13:25:32 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 4 Apr 2008 23:25:32 +0300 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F63E33.5080709@opengridcomputing.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <47F63E33.5080709@opengridcomputing.com> Message-ID: <15ddcffd0804041325i17e8f620xaa1ec9ec823afd60@mail.gmail.com> On Fri, Apr 4, 2008 at 5:41 PM, Steve Wise wrote: > We won't be in Sonoma, but perhaps Jon can email some info to the list on > what we've done to-date for open mpi. This would be very much helpful, best if done before Monday so we can discuss there the RDS port with the maintainer. Jon - any chance you will be able to send something (even raw, sketch)? Or. From richard.frank at oracle.com Fri Apr 4 14:27:52 2008 From: richard.frank at oracle.com (Richard Frank) Date: Fri, 04 Apr 2008 16:27:52 -0500 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <15ddcffd0804041323v480b4e3fi7061526184ab26b5@mail.gmail.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <15ddcffd0804041323v480b4e3fi7061526184ab26b5@mail.gmail.com> Message-ID: <47F69D58.6040800@oracle.com> Hmmm - so what happens with IWARP NIC when no buffer is posted on recv q and a message arrives ? 
Or Gerlitz wrote: > On Fri, Apr 4, 2008 at 7:06 PM, Roland Dreier wrote: > >> - Don't use IB-specific features (atomics, immediate data) >> > > and don't use RNRs as a means for HW based "flow control" mechanism. > The current RDS implementation > does not have a SW based flow control but rather does some sort of > back pressure through SW based congestion > management. I think that to some extent it relies on RNRs which don't > exist under iWARP. > > Or. > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From richard.frank at oracle.com Fri Apr 4 14:28:38 2008 From: richard.frank at oracle.com (Richard Frank) Date: Fri, 04 Apr 2008 16:28:38 -0500 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <15ddcffd0804041325i17e8f620xaa1ec9ec823afd60@mail.gmail.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <47F63E33.5080709@opengridcomputing.com> <15ddcffd0804041325i17e8f620xaa1ec9ec823afd60@mail.gmail.com> Message-ID: <47F69D86.9040407@oracle.com> How about a pointer to an IWARP spec - so we can sort out all the details.../ implications...to RDS. Or Gerlitz wrote: > On Fri, Apr 4, 2008 at 5:41 PM, Steve Wise wrote: > >> We won't be in Sonoma, but perhaps Jon can email some info to the list on >> what we've done to-date for open mpi. >> > > This would be very much helpful, best if done before Monday so we can > discuss there the RDS port with the maintainer. > Jon - any chance you will be able to send something (even raw, sketch)? > > Or. 
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From or.gerlitz at gmail.com Fri Apr 4 13:30:51 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 4 Apr 2008 23:30:51 +0300 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F69D58.6040800@oracle.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <15ddcffd0804041323v480b4e3fi7061526184ab26b5@mail.gmail.com> <47F69D58.6040800@oracle.com> Message-ID: <15ddcffd0804041330h3df8497tc81776ebfd106a19@mail.gmail.com> On Sat, Apr 5, 2008 at 12:27 AM, Richard Frank wrote: > Hmmm - so what happens with IWARP NIC when no buffer is posted on recv q and > a message arrives ? I am quite sure the L2 ethernet HW just drops it, but you better verify this with an iWARP HW provider. Or. From weiny2 at llnl.gov Fri Apr 4 13:31:37 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 4 Apr 2008 13:31:37 -0700 Subject: [ofa-general] where to report bugs? In-Reply-To: <1207337068.1750.114.camel@pc.ilinx> References: <1207337068.1750.114.camel@pc.ilinx> Message-ID: <20080404133137.083027ae.weiny2@llnl.gov> On Fri, 04 Apr 2008 15:24:28 -0400 "Brian J. Murrell" wrote: > I'm wondering what the official mechanism is to report bugs? Just about > anything I'm going to find is likely to be limited to build and > installation bugs, like this one... > > In infiniband-diags-1.3.6/Makefile.am we have the line: > > INCLUDES = -I$(srcdir)/include -I$(includedir) -I$(includedir)/infiniband > > This is assuming that other OFED packages have been installed in the > general system $PREFIX, usually /usr as $includedir should > be /usr/include. 
> > But in particular, I have installed the opensm{,-devel} in an alternate > location (i.e. PREFIX) and the infiniband-diags build fails with: Are you specifying --prefix on the infiniband-diags configure? I think that should work. Ira > > if gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I/usr/include -I/usr/include/infiniband -I/home/brian/ofed_1.3_integration/tree/usr/include -Wall -I/home/brian/ofed_1.3_integration/tree/usr/include -O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2 -MT src_ibnetdiscover-ibnetdiscover.o -MD -MP -MF ".deps/src_ibnetdiscover-ibnetdiscover.Tpo" -c -o src_ibnetdiscover-ibnetdiscover.o `test -f 'src/ibnetdiscover.c' || echo './'`src/ibnetdiscover.c; \ > then mv -f ".deps/src_ibnetdiscover-ibnetdiscover.Tpo" ".deps/src_ibnetdiscover-ibnetdiscover.Po"; else rm -f ".deps/src_ibnetdiscover-ibnetdiscover.Tpo"; exit 1; fi > In file included from src/ibnetdiscover.c:53: > /home/brian/ofed_1.3_integration/tree/usr/include/infiniband/complib/cl_nodenamemap.h:39:29: error: complib/cl_qmap.h: No such file or directory > In file included from src/ibnetdiscover.c:53: > /home/brian/ofed_1.3_integration/tree/usr/include/infiniband/complib/cl_nodenamemap.h:45: error: expected specifier-qualifier-list before ‘cl_map_item_t’ > /home/brian/ofed_1.3_integration/tree/usr/include/infiniband/complib/cl_nodenamemap.h:51: error: expected specifier-qualifier-list before ‘cl_qmap_t’ > make[1]: *** [src_ibnetdiscover-ibnetdiscover.o] Error 1 > make[1]: Leaving directory `/home/brian/rpm/BUILD/infiniband-diags-1.3.6' > > On my system, with opensm-devel (and all other OFED RPMs) installed in > an alternate PREFIX, the above list of include paths should be > s#/usr/include/infiniband#PREFIX/include/infiniband#. > > It seems probably infiniband-diags needs to have the same "--with-osm" > switch that ibutils has. > > b. > > From Brian.Murrell at Sun.COM Fri Apr 4 13:43:07 2008 From: Brian.Murrell at Sun.COM (Brian J. 
Murrell) Date: Fri, 04 Apr 2008 16:43:07 -0400 Subject: [ofa-general] where to report bugs? In-Reply-To: <20080404133137.083027ae.weiny2@llnl.gov> References: <1207337068.1750.114.camel@pc.ilinx> <20080404133137.083027ae.weiny2@llnl.gov> Message-ID: <1207341787.1750.123.camel@pc.ilinx> On Fri, 2008-04-04 at 13:31 -0700, Ira Weiny wrote: > > Are you specifying --prefix on the infiniband-diags configure? Ahhh. That would have the undesired effect of relocating my infiniband-diags wherever I specify --prefix. This is not quite what I want. The ugly details are about to come out. The problem is that I am not setting a --prefix when I build any of the prerequisite packages (i.e. opensm, the libraries it depends on, etc.) as I want everything to actually have a /usr prefix, however for the purposes of building this stack from the downloadable package of what's basically SRPMs, I install the prerequisites into a temporary path. So I have a dir "./tree/" in which I use rpm2cpio < $rpm | cpio -id to roll the packages into and then point the various configure scripts to using various --with-* options. This method has worked so far for: SRPMS/libibcommon-1.0.8-1.ofed1.3 SRPMS/libibumad-1.1.7-1.ofed1.3 SRPMS/opensm-3.1.10-1.ofed1.3 SRPMS/ibutils-1.2-1.ofed1.3 SRPMS/libibmad-1.1.6-1.ofed1.3 The overall problem is that I cannot taint my pristine build environment by going along the normal process of "build rpm, install it, build next rpm, install it, etc.", so I have to install prerequisite RPMs into a sandbox and point subsequent users (in the build process) of it into the sandbox. b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From akepner at sgi.com Fri Apr 4 13:47:58 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Fri, 4 Apr 2008 13:47:58 -0700 Subject: [ofa-general] ofed works on kernels with 64Kbyte pages? 
Message-ID: <20080404204758.GU29410@sgi.com> I know it's a long shot, but has anyone tried using OFED on a kernel with 64Kbyte pages? SGI would like to support that, but I've gotten reports that something is not working (e.g., "ib_rdma_bw" doesn't work on an ia64 kernel with 64Kb pages). This is with the mthca driver, fwiw. Unfortunately a conspiracy of h/w prevents me from reproducing this right now, so I don't have more details. But I'd be very curious to know if anyone can verify that OFED does/doesn't work with 64Kbyte pages. -- Arthur From rdreier at cisco.com Fri Apr 4 13:55:11 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 13:55:11 -0700 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: <47F69D86.9040407@oracle.com> (Richard Frank's message of "Fri, 04 Apr 2008 16:28:38 -0500") References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <47F63E33.5080709@opengridcomputing.com> <15ddcffd0804041325i17e8f620xaa1ec9ec823afd60@mail.gmail.com> <47F69D86.9040407@oracle.com> Message-ID: > How about a pointer to an IWARP spec - so we can sort out all the > details.../ implications...to RDS. www.rdmaconsortium.org has most of it... the verbs are at: http://www.rdmaconsortium.org/home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf the iWARP RDMA protocol is RFC 5040 et al: http://www.ietf.org/rfc/rfc5040.txt (the next few RFCs have lower-level details) From rdreier at cisco.com Fri Apr 4 14:02:03 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 14:02:03 -0700 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? 
In-Reply-To: <15ddcffd0804041330h3df8497tc81776ebfd106a19@mail.gmail.com> (Or Gerlitz's message of "Fri, 4 Apr 2008 23:30:51 +0300") References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <15ddcffd0804041323v480b4e3fi7061526184ab26b5@mail.gmail.com> <47F69D58.6040800@oracle.com> <15ddcffd0804041330h3df8497tc81776ebfd106a19@mail.gmail.com> Message-ID: > > Hmmm - so what happens with IWARP NIC when no buffer is posted on recv q and > > a message arrives ? > > I am quite sure the L2 ethernet HW just drops it, but you better > verify this with an iWARP HW provider. Why would it be dropped at L2? What I believe will happen is that it will generate an error at the DDP layer that will probably result in the connection being closed. Section 7.1 of RFC 5041 says: For non-zero-length Untagged DDP Segments, the DDP Segment MUST be validated before Placement by verifying: ["untagged DDP segments" are incoming send data, as vs. "tagged" RDMA operations] 2. The QN and MSN have an associated buffer that allows Placement of the payload. Implementers' note: DDP implementations SHOULD consider lack of an associated buffer as a system fault. DDP implementations MAY try to recover from the system fault using LLP means in a ULP- transparent way. DDP implementations SHOULD NOT permit system faults to occur repeatedly or frequently. If there is not an associated buffer, DDP implementations MAY choose to disable the stream for the reception and report an error to the ULP at the Data Sink. From rdreier at cisco.com Fri Apr 4 14:03:55 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 14:03:55 -0700 Subject: [ofa-general] ofed works on kernels with 64Kbyte pages? 
In-Reply-To: <20080404204758.GU29410@sgi.com> (akepner@sgi.com's message of "Fri, 4 Apr 2008 13:47:58 -0700") References: <20080404204758.GU29410@sgi.com> Message-ID: > I know it's a long shot, but has anyone tried using OFED on > a kernel with 64Kbyte pages? > > SGI would like to support that, but I've gotten reports that > something is not working (e.g., "ib_rdma_bw" doesn't work on > an ia64 kernel with 64Kb pages). This is with the mthca driver, > fwiw. > > Unfortunately a conspiracy of h/w prevents me from reproducing > this right now, so I don't have more details. But I'd be very > curious to know if anyone can verify that OFED does/doesn't > work with 64Kbyte pages. I don't know about OFED, but I've tried various things on 64KB PAGE_SIZE systems and it seems to work. It wouldn't surprise me if there are issues since the drivers and firmware gets a lot less testing in such situations but it "should work" -- I'd be happy to help debug if anyone has concrete problems. - R. From weiny2 at llnl.gov Fri Apr 4 14:06:46 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 4 Apr 2008 14:06:46 -0700 Subject: [ofa-general] where to report bugs? In-Reply-To: <1207341787.1750.123.camel@pc.ilinx> References: <1207337068.1750.114.camel@pc.ilinx> <20080404133137.083027ae.weiny2@llnl.gov> <1207341787.1750.123.camel@pc.ilinx> Message-ID: <20080404140646.05387839.weiny2@llnl.gov> On Fri, 04 Apr 2008 16:43:07 -0400 "Brian J. Murrell" wrote: > On Fri, 2008-04-04 at 13:31 -0700, Ira Weiny wrote: > > > > Are you specifying --prefix on the infiniband-diags configure? > > Ahhh. That would have the undesired effect of relocating my > infiniband-diags wherever I specify --prefix. This is not quite what I > want. > > The ugly details are about to come out. > > The problem is that I am not setting a --prefix when I build any of the > prerequisite packages (i.e. opensm, the libraries it depends on, etc.) 
> as I want everything to actually have a /usr prefix, however for the > purposes of building this stack from the downloadable package of what's > basically SRPMs, I install the prerequisites into a temporary path. > > So I have a dir "./tree/" in which I use rpm2cpio < $rpm | cpio -id to > roll the packages into and then point the various configure scripts to > using various --with-* options. This method has worked so far for: > > SRPMS/libibcommon-1.0.8-1.ofed1.3 > SRPMS/libibumad-1.1.7-1.ofed1.3 > SRPMS/opensm-3.1.10-1.ofed1.3 > SRPMS/ibutils-1.2-1.ofed1.3 > SRPMS/libibmad-1.1.6-1.ofed1.3 > > The overall problem is that I cannot taint my pristine build environment > by going along the normal process of "build rpm, install it, build next > rpm, install it, etc.", so I have to install prerequisite RPMs into a > sandbox and point subsequent users (in the build process) of it into the > sandbox. > So I guess you want something like: export CPPFLAGS="-I/include" Before you do the configure and build? Ira From rdreier at cisco.com Fri Apr 4 14:12:11 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 14:12:11 -0700 Subject: [ofa-general] Re: [PATCH 17/20] IB/ipath - user mode send DMA In-Reply-To: <20080402225028.28598.648.stgit@eng-46.mv.qlogic.com> (Ralph Campbell's message of "Wed, 02 Apr 2008 15:50:28 -0700") References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> <20080402225028.28598.648.stgit@eng-46.mv.qlogic.com> Message-ID: By the way... > +int ipath_user_sdma_pkt_sent(const struct ipath_user_sdma_queue *pq, > + u32 counter) > +{ > + const u32 scounter = ipath_user_sdma_complete_counter(pq); > + const s32 dcounter = scounter - counter; > + > + return dcounter >= 0; > +} I don't see this called anywhere... should I just delete it? From Brian.Murrell at Sun.COM Fri Apr 4 14:13:42 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Fri, 04 Apr 2008 17:13:42 -0400 Subject: [ofa-general] where to report bugs? 
In-Reply-To: <20080404140646.05387839.weiny2@llnl.gov> References: <1207337068.1750.114.camel@pc.ilinx> <20080404133137.083027ae.weiny2@llnl.gov> <1207341787.1750.123.camel@pc.ilinx> <20080404140646.05387839.weiny2@llnl.gov> Message-ID: <1207343622.1750.128.camel@pc.ilinx> On Fri, 2008-04-04 at 14:06 -0700, Ira Weiny wrote: > So I guess you want something like: > > export CPPFLAGS="-I/include" CPPFLAGS or CFLAGS? I could see it being the former but I used the latter. > > Before you do the configure and build? That is in effect exactly what I did to deal with this issue. I just didn't find it very elegant. But if that is how the package is meant to operate, that is fine. If it were CFLAGS you were promoting the setting of I would be a bit more sticky because RPM wants to have the CFLAGS for its own use: $ rpm --eval="%configure" CFLAGS="${CFLAGS:--O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2}" ; export CFLAGS ; CXXFLAGS="${CXXFLAGS:--O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2}" ; export CXXFLAGS ; FFLAGS="${FFLAGS:--O2 -g -fmessage-length=0 -D_FORTIFY_SOURCE=2}" ; export FFLAGS ; ./configure --host=x86_64-suse-linux --build=x86_64-suse-linux \ --target=x86_64-suse-linux \ --program-prefix= \ ... And while, yes, you can override CFLAGS and the %configure macro will use it, I'd rather defer the CFLAGS to whatever the vendor has put into the RPM macros file(s). b. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From rdreier at cisco.com Fri Apr 4 14:15:01 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 14:15:01 -0700 Subject: [ofa-general] Re: [PATCH 19/20] IB/ipath - add calls to new 7220 code and enable in build In-Reply-To: <20080402225038.28598.43308.stgit@eng-46.mv.qlogic.com> (Ralph Campbell's message of "Wed, 02 Apr 2008 15:50:38 -0700") References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> <20080402225038.28598.43308.stgit@eng-46.mv.qlogic.com> Message-ID: > +enum ib_rate ipath_mult_to_ib_rate(unsigned mult) > +{ > + switch (mult) { > + case 8: return IB_RATE_2_5_GBPS; > + case 4: return IB_RATE_5_GBPS; > + case 2: return IB_RATE_10_GBPS; > + case 1: return IB_RATE_20_GBPS; > + default: return IB_RATE_PORT_CURRENT; > + } > +} Looks suspiciously like a copy of the existing mult_to_ib_rate() except it handles fewer cases... is there a reason to copy this? - R. From rdreier at cisco.com Fri Apr 4 14:16:14 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 14:16:14 -0700 Subject: [ofa-general] Re: [PATCH 17/20] IB/ipath - user mode send DMA In-Reply-To: <20080402225028.28598.648.stgit@eng-46.mv.qlogic.com> (Ralph Campbell's message of "Wed, 02 Apr 2008 15:50:28 -0700") References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> <20080402225028.28598.648.stgit@eng-46.mv.qlogic.com> Message-ID: > +void ipath_user_sdma_set_complete_counter(struct ipath_user_sdma_queue *pq, > + u32 c) > +{ > + pq->sent_counter = c; > +} This is only used in one file... OK to make it static? From rdreier at cisco.com Fri Apr 4 14:21:30 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 14:21:30 -0700 Subject: [ofa-general] Re: [PATCH 1/1 v1] MLX4: Added resize_cq capability. 
In-Reply-To: <47F0A5A5.2010208@dev.mellanox.co.il> (Vladimir Sokolovsky's message of "Mon, 31 Mar 2008 11:49:41 +0300") References: <47E923CA.90804@dev.mellanox.co.il> <47F0A5A5.2010208@dev.mellanox.co.il> Message-ID: Thanks, I applied this with a lot of changes. Some comments: > entries = roundup_pow_of_two(entries + 1); your patch was corrupted in a very strange way... the context lines had two spaces instead of one at the beginning. I just deleted the extra space by hand. > + err = mlx4_alloc_cq_buf(dev, &cq->resize_buf->buf, entries); > + if (err) { > + spin_lock_irq(&cq->lock); > + kfree(cq->resize_buf); > + cq->resize_buf = NULL; > + spin_unlock_irq(&cq->lock); > + goto out; > + } > +err_buf: > + if (cq->resize_buf) { > + if (!ibcq->uobject) > + mlx4_free_cq_buf(dev, &cq->resize_buf->buf, > + cq->resize_buf->cqe); > + > + spin_lock_irq(&cq->lock); > + kfree(cq->resize_buf); > + cq->resize_buf = NULL; > + spin_unlock_irq(&cq->lock); > + } Why do we need the spinlock in these places? There's no way for this to race with mlx4_ib_poll_one() is there, since that should never see the RESIZE CQE? (If there is such a race, then we're in trouble even with the lock, since we're aborting the resize, and the poll code shouldn't swap the buffers) Also I got rid of the duplicated code to allocate buffers and get userspace buffers, so that the allocate and resize paths use the same code. And I cleaned up some other stuff. So please review/test my work to make sure I didn't break your patch... 
--- drivers/infiniband/hw/mlx4/cq.c | 292 ++++++++++++++++++++++++++++++---- drivers/infiniband/hw/mlx4/main.c | 2 + drivers/infiniband/hw/mlx4/mlx4_ib.h | 9 + drivers/net/mlx4/cq.c | 28 ++++ include/linux/mlx4/cq.h | 2 + 5 files changed, 300 insertions(+), 33 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index e4fb64b..3557e7e 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -93,6 +93,74 @@ int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period) return mlx4_cq_modify(dev->dev, &mcq->mcq, cq_count, cq_period); } +static int mlx4_ib_alloc_cq_buf(struct mlx4_ib_dev *dev, struct mlx4_ib_cq_buf *buf, int nent) +{ + int err; + + err = mlx4_buf_alloc(dev->dev, nent * sizeof(struct mlx4_cqe), + PAGE_SIZE * 2, &buf->buf); + + if (err) + goto out; + + err = mlx4_mtt_init(dev->dev, buf->buf.npages, buf->buf.page_shift, + &buf->mtt); + if (err) + goto err_buf; + + err = mlx4_buf_write_mtt(dev->dev, &buf->mtt, &buf->buf); + if (err) + goto err_mtt; + + return 0; + +err_mtt: + mlx4_mtt_cleanup(dev->dev, &buf->mtt); + +err_buf: + mlx4_buf_free(dev->dev, nent * sizeof(struct mlx4_cqe), + &buf->buf); + +out: + return err; +} + +static void mlx4_ib_free_cq_buf(struct mlx4_ib_dev *dev, struct mlx4_ib_cq_buf *buf, int cqe) +{ + mlx4_buf_free(dev->dev, (cqe + 1) * sizeof(struct mlx4_cqe), &buf->buf); +} + +static int mlx4_ib_get_cq_umem(struct mlx4_ib_dev *dev, struct ib_ucontext *context, + struct mlx4_ib_cq_buf *buf, struct ib_umem **umem, + u64 buf_addr, int cqe) +{ + int err; + + *umem = ib_umem_get(context, buf_addr, cqe * sizeof (struct mlx4_cqe), + IB_ACCESS_LOCAL_WRITE); + if (IS_ERR(*umem)) + return PTR_ERR(*umem); + + err = mlx4_mtt_init(dev->dev, ib_umem_page_count(*umem), + ilog2((*umem)->page_size), &buf->mtt); + if (err) + goto err_buf; + + err = mlx4_ib_umem_write_mtt(dev, &buf->mtt, *umem); + if (err) + goto err_mtt; + + return 0; + +err_mtt: + mlx4_mtt_cleanup(dev->dev, 
&buf->mtt); + +err_buf: + ib_umem_release(*umem); + + return err; +} + struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector, struct ib_ucontext *context, struct ib_udata *udata) @@ -100,7 +168,6 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector struct mlx4_ib_dev *dev = to_mdev(ibdev); struct mlx4_ib_cq *cq; struct mlx4_uar *uar; - int buf_size; int err; if (entries < 1 || entries > dev->dev->caps.max_cqes) @@ -112,8 +179,10 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector entries = roundup_pow_of_two(entries + 1); cq->ibcq.cqe = entries - 1; - buf_size = entries * sizeof (struct mlx4_cqe); + mutex_init(&cq->resize_mutex); spin_lock_init(&cq->lock); + cq->resize_buf = NULL; + cq->resize_umem = NULL; if (context) { struct mlx4_ib_create_cq ucmd; @@ -123,21 +192,10 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector goto err_cq; } - cq->umem = ib_umem_get(context, ucmd.buf_addr, buf_size, - IB_ACCESS_LOCAL_WRITE); - if (IS_ERR(cq->umem)) { - err = PTR_ERR(cq->umem); - goto err_cq; - } - - err = mlx4_mtt_init(dev->dev, ib_umem_page_count(cq->umem), - ilog2(cq->umem->page_size), &cq->buf.mtt); + err = mlx4_ib_get_cq_umem(dev, context, &cq->buf, &cq->umem, + ucmd.buf_addr, entries); if (err) - goto err_buf; - - err = mlx4_ib_umem_write_mtt(dev, &cq->buf.mtt, cq->umem); - if (err) - goto err_mtt; + goto err_cq; err = mlx4_ib_db_map_user(to_mucontext(context), ucmd.db_addr, &cq->db); @@ -155,19 +213,9 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector *cq->mcq.set_ci_db = 0; *cq->mcq.arm_db = 0; - if (mlx4_buf_alloc(dev->dev, buf_size, PAGE_SIZE * 2, &cq->buf.buf)) { - err = -ENOMEM; - goto err_db; - } - - err = mlx4_mtt_init(dev->dev, cq->buf.buf.npages, cq->buf.buf.page_shift, - &cq->buf.mtt); + err = mlx4_ib_alloc_cq_buf(dev, &cq->buf, entries); if (err) - goto err_buf; - - err = mlx4_buf_write_mtt(dev->dev, 
&cq->buf.mtt, &cq->buf.buf); - if (err) - goto err_mtt; + goto err_db; uar = &dev->priv_uar; } @@ -195,12 +243,10 @@ err_dbmap: err_mtt: mlx4_mtt_cleanup(dev->dev, &cq->buf.mtt); -err_buf: if (context) ib_umem_release(cq->umem); else - mlx4_buf_free(dev->dev, entries * sizeof (struct mlx4_cqe), - &cq->buf.buf); + mlx4_ib_free_cq_buf(dev, &cq->buf, entries); err_db: if (!context) @@ -212,6 +258,170 @@ err_cq: return ERR_PTR(err); } +static int mlx4_alloc_resize_buf(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq, + int entries) +{ + int err; + + if (cq->resize_buf) + return -EBUSY; + + cq->resize_buf = kmalloc(sizeof *cq->resize_buf, GFP_ATOMIC); + if (!cq->resize_buf) + return -ENOMEM; + + err = mlx4_ib_alloc_cq_buf(dev, &cq->resize_buf->buf, entries); + if (err) { + kfree(cq->resize_buf); + cq->resize_buf = NULL; + return err; + } + + cq->resize_buf->cqe = entries - 1; + + return 0; +} + +static int mlx4_alloc_resize_umem(struct mlx4_ib_dev *dev, struct mlx4_ib_cq *cq, + int entries, struct ib_udata *udata) +{ + struct mlx4_ib_resize_cq ucmd; + int err; + + if (cq->resize_umem) + return -EBUSY; + + if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd)) + return -EFAULT; + + cq->resize_buf = kmalloc(sizeof *cq->resize_buf, GFP_ATOMIC); + if (!cq->resize_buf) + return -ENOMEM; + + err = mlx4_ib_get_cq_umem(dev, cq->umem->context, &cq->resize_buf->buf, + &cq->resize_umem, ucmd.buf_addr, entries); + if (err) { + kfree(cq->resize_buf); + cq->resize_buf = NULL; + return err; + } + + cq->resize_buf->cqe = entries - 1; + + return 0; +} + +static int mlx4_ib_get_outstanding_cqes(struct mlx4_ib_cq *cq) +{ + u32 i; + + i = cq->mcq.cons_index; + while (get_sw_cqe(cq, i & cq->ibcq.cqe)) + ++i; + + return i - cq->mcq.cons_index; +} + +static void mlx4_ib_cq_resize_copy_cqes(struct mlx4_ib_cq *cq) +{ + struct mlx4_cqe *cqe; + int i; + + i = cq->mcq.cons_index; + cqe = get_cqe(cq, i & cq->ibcq.cqe); + while ((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) != MLX4_CQE_OPCODE_RESIZE) { + 
memcpy(get_cqe_from_buf(&cq->resize_buf->buf, + (i + 1) & cq->resize_buf->cqe), + get_cqe(cq, i & cq->ibcq.cqe), sizeof(struct mlx4_cqe)); + cqe = get_cqe(cq, ++i & cq->ibcq.cqe); + } + ++cq->mcq.cons_index; +} + +int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata) +{ + struct mlx4_ib_dev *dev = to_mdev(ibcq->device); + struct mlx4_ib_cq *cq = to_mcq(ibcq); + int outst_cqe; + int err; + + mutex_lock(&cq->resize_mutex); + + if (entries < 1 || entries > dev->dev->caps.max_cqes) { + err = -EINVAL; + goto out; + } + + entries = roundup_pow_of_two(entries + 1); + if (entries == ibcq->cqe + 1) { + err = 0; + goto out; + } + + if (ibcq->uobject) { + err = mlx4_alloc_resize_umem(dev, cq, entries, udata); + if (err) + goto out; + } else { + /* Can't be smaller then the number of outstanding CQEs */ + outst_cqe = mlx4_ib_get_outstanding_cqes(cq); + if (entries < outst_cqe + 1) { + err = 0; + goto out; + } + + err = mlx4_alloc_resize_buf(dev, cq, entries); + if (err) + goto out; + } + + err = mlx4_cq_resize(dev->dev, &cq->mcq, entries, &cq->resize_buf->buf.mtt); + if (err) + goto err_buf; + + if (ibcq->uobject) { + cq->buf = cq->resize_buf->buf; + cq->ibcq.cqe = cq->resize_buf->cqe; + ib_umem_release(cq->umem); + cq->umem = cq->resize_umem; + + kfree(cq->resize_buf); + cq->resize_buf = NULL; + cq->resize_umem = NULL; + } else { + spin_lock_irq(&cq->lock); + if (cq->resize_buf) { + mlx4_ib_cq_resize_copy_cqes(cq); + mlx4_ib_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); + cq->buf = cq->resize_buf->buf; + cq->ibcq.cqe = cq->resize_buf->cqe; + + kfree(cq->resize_buf); + cq->resize_buf = NULL; + } + spin_unlock_irq(&cq->lock); + } + + goto out; + +err_buf: + if (!ibcq->uobject) + mlx4_ib_free_cq_buf(dev, &cq->resize_buf->buf, + cq->resize_buf->cqe); + + kfree(cq->resize_buf); + cq->resize_buf = NULL; + + if (cq->resize_umem) { + ib_umem_release(cq->resize_umem); + cq->resize_umem = NULL; + } + +out: + mutex_unlock(&cq->resize_mutex); + return err; +} + int 
mlx4_ib_destroy_cq(struct ib_cq *cq) { struct mlx4_ib_dev *dev = to_mdev(cq->device); @@ -224,8 +434,7 @@ int mlx4_ib_destroy_cq(struct ib_cq *cq) mlx4_ib_db_unmap_user(to_mucontext(cq->uobject->context), &mcq->db); ib_umem_release(mcq->umem); } else { - mlx4_buf_free(dev->dev, (cq->cqe + 1) * sizeof (struct mlx4_cqe), - &mcq->buf.buf); + mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe + 1); mlx4_ib_db_free(dev, &mcq->db); } @@ -332,6 +541,7 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq, u32 g_mlpath_rqpn; u16 wqe_ctr; +repoll: cqe = next_cqe_sw(cq); if (!cqe) return -EAGAIN; @@ -354,6 +564,22 @@ static int mlx4_ib_poll_one(struct mlx4_ib_cq *cq, return -EINVAL; } + /* Resize CQ in progress */ + if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) == MLX4_CQE_OPCODE_RESIZE)) { + if (cq->resize_buf) { + struct mlx4_ib_dev *dev = to_mdev(cq->ibcq.device); + + mlx4_ib_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); + cq->buf = cq->resize_buf->buf; + cq->ibcq.cqe = cq->resize_buf->cqe; + + kfree(cq->resize_buf); + cq->resize_buf = NULL; + } + + goto repoll; + } + if (!*cur_qp || (be32_to_cpu(cqe->my_qpn) & 0xffffff) != (*cur_qp)->mqp.qpn) { /* diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 76dd45c..57885cd 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -571,6 +571,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) (1ull << IB_USER_VERBS_CMD_DEREG_MR) | (1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) | (1ull << IB_USER_VERBS_CMD_CREATE_CQ) | + (1ull << IB_USER_VERBS_CMD_RESIZE_CQ) | (1ull << IB_USER_VERBS_CMD_DESTROY_CQ) | (1ull << IB_USER_VERBS_CMD_CREATE_QP) | (1ull << IB_USER_VERBS_CMD_MODIFY_QP) | @@ -610,6 +611,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) ibdev->ib_dev.post_recv = mlx4_ib_post_recv; ibdev->ib_dev.create_cq = mlx4_ib_create_cq; ibdev->ib_dev.modify_cq = mlx4_ib_modify_cq; + ibdev->ib_dev.resize_cq = mlx4_ib_resize_cq; ibdev->ib_dev.destroy_cq = 
mlx4_ib_destroy_cq; ibdev->ib_dev.poll_cq = mlx4_ib_poll_cq; ibdev->ib_dev.req_notify_cq = mlx4_ib_arm_cq; diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index ef8ad96..9e63732 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -78,13 +78,21 @@ struct mlx4_ib_cq_buf { struct mlx4_mtt mtt; }; +struct mlx4_ib_cq_resize { + struct mlx4_ib_cq_buf buf; + int cqe; +}; + struct mlx4_ib_cq { struct ib_cq ibcq; struct mlx4_cq mcq; struct mlx4_ib_cq_buf buf; + struct mlx4_ib_cq_resize *resize_buf; struct mlx4_ib_db db; spinlock_t lock; + struct mutex resize_mutex; struct ib_umem *umem; + struct ib_umem *resize_umem; }; struct mlx4_ib_mr { @@ -255,6 +263,7 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, int mlx4_ib_dereg_mr(struct ib_mr *mr); int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period); +int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata); struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector, struct ib_ucontext *context, struct ib_udata *udata); diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index 8c31434..caa5bcf 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -159,6 +159,34 @@ int mlx4_cq_modify(struct mlx4_dev *dev, struct mlx4_cq *cq, } EXPORT_SYMBOL_GPL(mlx4_cq_modify); +int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq, + int entries, struct mlx4_mtt *mtt) +{ + struct mlx4_cmd_mailbox *mailbox; + struct mlx4_cq_context *cq_context; + u64 mtt_addr; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + cq_context = mailbox->buf; + memset(cq_context, 0, sizeof *cq_context); + + cq_context->logsize_usrpage = cpu_to_be32(ilog2(entries) << 24); + cq_context->log_page_size = mtt->page_shift - 12; + mtt_addr = mlx4_mtt_addr(dev, mtt); + cq_context->mtt_base_addr_h = mtt_addr >> 32; + 
cq_context->mtt_base_addr_l = cpu_to_be32(mtt_addr & 0xffffffff); + + err = mlx4_MODIFY_CQ(dev, mailbox, cq->cqn, 1); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_cq_resize); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq) { diff --git a/include/linux/mlx4/cq.h b/include/linux/mlx4/cq.h index f7c3511..071cf96 100644 --- a/include/linux/mlx4/cq.h +++ b/include/linux/mlx4/cq.h @@ -132,5 +132,7 @@ enum { int mlx4_cq_modify(struct mlx4_dev *dev, struct mlx4_cq *cq, u16 count, u16 period); +int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq, + int entries, struct mlx4_mtt *mtt); #endif /* MLX4_CQ_H */ -- 1.5.4.5 From richard.frank at oracle.com Fri Apr 4 15:21:59 2008 From: richard.frank at oracle.com (Richard Frank) Date: Fri, 04 Apr 2008 17:21:59 -0500 Subject: [ofa-general] InfiniBand/iWARP/RDMA merge plans for 2.6.26 (what's in infiniband.git) In-Reply-To: References: <47F37CA4.8000109@mellanox.co.il> <47F68EDC.4050107@oracle.com> Message-ID: <47F6AA07.70706@oracle.com> Roland Dreier wrote: > > We are very interested in these new operations and are moving in the > > direction of tightly integrating RDMA along with atomics (if > > available) into Oracle. We plan on testing some early prototypes of > > the these in the few months. > > And you need the ConnectX-only masked atomics? Or do the standard IB > atomic operations work for you? Of course using atomics at all means > that things don't work on iWARP. > > We specifically asked for the masked operations. Yes, this means Oracle will not get the performance boost of atomics on IWARP - but we still get rdma - and that's a real win / benefit for Oracle today - and more so over the next few months. > > Send with invalidate is an exact match for our current RDS V3 rdma > > driver - and should be more efficient than the current background > > syncing of the tpt to ensure keys are invalidated. 
> > How does send with invalidate interact with the current IB FMR stuff? > Seems that you would run into trouble keeping the state of the FMR > straight if the remote side is invalidating them. > > The model we implement is based on "use once" keys - we issue the key to the rdma server and want to toss it as soon as the rdma is complete. Today, we explicitly free the key after the rdma completes and we get a message from the rdma server - saying rdma is complete. If the key is auto invalidated by the recv'ing HCA then we do not need to do it in the driver... which also means we do not need to issue the sync tpts to force the HCA to update its cache. At least this is how I think it works - Olaf is the divine source here. > Also I would think that send-with-invalidate would be much more > expensive than the current FMR method of batching up the invalidates, > since you don't get to amortize the cost of syncing up all the internal > HCA state. > > This is the one piece we do not know - our plans are to test this and see where the trade offs are. We will keep the current design / implementation to run over NICs that do not support send-with-invalidate. > - R. > From ralph.campbell at qlogic.com Fri Apr 4 14:30:51 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 04 Apr 2008 14:30:51 -0700 Subject: [ofa-general] Re: [PATCH 17/20] IB/ipath - user mode send DMA In-Reply-To: References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> <20080402225028.28598.648.stgit@eng-46.mv.qlogic.com> Message-ID: <1207344651.8715.14.camel@brick.pathscale.com> On Fri, 2008-04-04 at 14:12 -0700, Roland Dreier wrote: > By the way... > > > +int ipath_user_sdma_pkt_sent(const struct ipath_user_sdma_queue *pq, > > + u32 counter) > > +{ > > + const u32 scounter = ipath_user_sdma_complete_counter(pq); > > + const s32 dcounter = scounter - counter; > > + > > + return dcounter >= 0; > > +} > > I don't see this called anywhere... should I just delete it? Yes. 
You can remove it. From ralph.campbell at qlogic.com Fri Apr 4 14:42:24 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 04 Apr 2008 14:42:24 -0700 Subject: [ofa-general] Re: [PATCH 19/20] IB/ipath - add calls to new 7220 code and enable in build In-Reply-To: References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> <20080402225038.28598.43308.stgit@eng-46.mv.qlogic.com> Message-ID: <1207345344.8715.21.camel@brick.pathscale.com> On Fri, 2008-04-04 at 14:15 -0700, Roland Dreier wrote: > > +enum ib_rate ipath_mult_to_ib_rate(unsigned mult) > > +{ > > + switch (mult) { > > + case 8: return IB_RATE_2_5_GBPS; > > + case 4: return IB_RATE_5_GBPS; > > + case 2: return IB_RATE_10_GBPS; > > + case 1: return IB_RATE_20_GBPS; > > + default: return IB_RATE_PORT_CURRENT; > > + } > > +} > > Looks suspiciously like a copy of the existing mult_to_ib_rate() except > it handles fewer cases... is there a reason to copy this? > > - R. It looks similar but the values are reversed. This is converting the ib_rate enum to a multiplier of the DDR clock rate which is used as a counter to delay packets. So IB_RATE_2_5_GBPS is 8 times slower than IB_RATE_20_GBPS. The standard functions map the enum to a multiplier of the slowest rate so IB_RATE_2_5_GBPS is one. If I used the standard functions, I would still need a lookup table to map 8->1, 1->8, etc. 
From ralph.campbell at qlogic.com Fri Apr 4 14:44:03 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 04 Apr 2008 14:44:03 -0700 Subject: [ofa-general] Re: [PATCH 17/20] IB/ipath - user mode send DMA In-Reply-To: References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> <20080402225028.28598.648.stgit@eng-46.mv.qlogic.com> Message-ID: <1207345443.8715.23.camel@brick.pathscale.com> On Fri, 2008-04-04 at 14:16 -0700, Roland Dreier wrote: > > +void ipath_user_sdma_set_complete_counter(struct ipath_user_sdma_queue *pq, > > + u32 c) > > +{ > > + pq->sent_counter = c; > > +} > > This is only used in one file... OK to make it static? Yes, thanks. From bs at q-leap.de Fri Apr 4 14:45:54 2008 From: bs at q-leap.de (Bernd Schubert) Date: Fri, 4 Apr 2008 23:45:54 +0200 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <1207331721.15625.76.camel@hrosenstock-ws.xsigo.com> References: <200804041147.27565.bs@q-leap.de> <1207331721.15625.76.camel@hrosenstock-ws.xsigo.com> Message-ID: <20080404214553.GA15927@lanczos.q-leap.de> On Fri, Apr 04, 2008 at 10:55:21AM -0700, Hal Rosenstock wrote: > On Fri, 2008-04-04 at 11:47 +0200, Bernd Schubert wrote: > > Hello, > > > > opensm-3.2.1 logs some error messages like this: > > > > Apr 04 00:00:08 325114 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: > > ERR 0108: Unknown remote side for node 0 > > x000b8cffff002ba2(SW_pfs1_leaf4) port 13. 
Adding to light sweep sampling list > > Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path Dump of 3 hop path: > > Path = 0,1,14,13 > > > > > > From ibnetdiscover output I see port13 of this switch is a switch-interconnect > > (sorry, I don't know what the correct name/identifier for switches within > > switches): > > > > [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid 263 > > 4xSDR > > > > > > Apr 04 00:00:08 325219 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: > > ERR 0108: Unknown remote side for node 0 > > x000b8cffff002bf9(SW_pfs1_inter6) port 9. Adding to light sweep sampling list > > Apr 04 00:00:08 325234 [4580A960] 0x01 -> Directed Path Dump of 2 hop path: > > Path = 0,1,18 > > > > This is again an interconnection: > > > > [9] "S-000b8cffff002b9e"[15] # "SW_pfs1_leaf1" lid 177 > > 4xDDR > > > > > > Apr 04 00:00:08 325288 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: > > ERR 0108: Unknown remote side for node 0 > > x000b8cffff002bfa(SW_pfs1_inter7) port 13. Adding to light sweep sampling list > > Apr 04 00:00:08 325301 [4580A960] 0x01 -> Directed Path Dump of 2 hop path: > > Path = 0,1,14 > > > > > > And again an interconnection: > > > > [13] "S-000b8cffff002ba2"[13] # "SW_pfs1_leaf4" lid 182 > > 4xDDR > > > > > > All the other interconnections seem to be fine. > > Any idea if OpenSM 3.1.10 has the same issue as 3.2.1 ? Yes, from the log file I see these messages also did happen with opensm-3.1.10. > > Is this some large Flextronics switch ? Again you are right, this is a Flextronics F-X430075, presently with 144 ports. 
Thanks, Bernd From rdreier at cisco.com Fri Apr 4 14:47:17 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 14:47:17 -0700 Subject: [ofa-general] Re: [PATCH 19/20] IB/ipath - add calls to new 7220 code and enable in build In-Reply-To: <1207345344.8715.21.camel@brick.pathscale.com> (Ralph Campbell's message of "Fri, 04 Apr 2008 14:42:24 -0700") References: <20080402224901.28598.97004.stgit@eng-46.mv.qlogic.com> <20080402225038.28598.43308.stgit@eng-46.mv.qlogic.com> <1207345344.8715.21.camel@brick.pathscale.com> Message-ID: > It looks similar but the values are reversed. This is converting > the ib_rate enum to a multiplier of the DDR clock rate which is > used as a counter to delay packets. So IB_RATE_2_5_GBPS is 8 > times slower than IB_RATE_20_GBPS. The standard functions map > the enum to a multiplier of the slowest rate so > IB_RATE_2_5_GBPS is one. If I used the standard functions, I would > still need a lookup table to map 8->1, 1->8, etc. OK, got it thanks From sfr at canb.auug.org.au Fri Apr 4 14:48:32 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Sat, 5 Apr 2008 08:48:32 +1100 Subject: [ofa-general] linux-next: infiniband build failure In-Reply-To: References: <20080404133204.3edc0470.sfr@canb.auug.org.au> Message-ID: <20080405084832.5e4a0c53.sfr@canb.auug.org.au> Hi Roland, On Fri, 04 Apr 2008 08:47:29 -0700 Roland Dreier wrote: > > > drivers/infiniband/hw/ehca/ehca_reqs.c: In function 'ehca_write_swqe': > > drivers/infiniband/hw/ehca/ehca_reqs.c:191: error: 'const struct ib_send_wr' has no member named 'imm_data' > > Oops, thanks, I forgot to run my cross-compile (and ehca is ppc only). > > Anyway, your fix is correct and I rolled it into my patch. Thanks. -- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From rdreier at cisco.com Fri Apr 4 15:02:15 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 15:02:15 -0700 Subject: [ofa-general] Re: [PATCH] mthca: update QP state after query QP In-Reply-To: <200803271636.00414.dotanb@dev.mellanox.co.il> (Dotan Barak's message of "Thu, 27 Mar 2008 16:36:00 +0200") References: <200803271636.00414.dotanb@dev.mellanox.co.il> Message-ID: thanks, applied From rdreier at cisco.com Fri Apr 4 15:04:13 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 15:04:13 -0700 Subject: [ofa-general] Re: [PATCH] mlx4: update QP state after query QP In-Reply-To: <200803271708.41638.dotanb@dev.mellanox.co.il> (Dotan Barak's message of "Thu, 27 Mar 2008 17:08:41 +0200") References: <200803271708.41638.dotanb@dev.mellanox.co.il> Message-ID: thanks, applied From clameter at sgi.com Fri Apr 4 15:06:18 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 4 Apr 2008 15:06:18 -0700 (PDT) Subject: [ofa-general] Re: [PATCH] mmu notifier #v11 In-Reply-To: <20080404202055.GA14784@duo.random> References: <20080401205635.793766935@sgi.com> <20080402064952.GF19189@duo.random> <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> <20080403151908.GB9603@duo.random> <20080404202055.GA14784@duo.random> Message-ID: I am always the guy doing the cleanup after Andrea it seems. Sigh. Here is the mm_lock/mm_unlock logic separated out for easier review. Adds some comments. Still objectionable is the multiple ways of invalidating pages in #v11. Callout now has similar locking to emm. From: Christoph Lameter Subject: mm_lock: Lock a process against reclaim Provide a way to lock an mm_struct against reclaim (try_to_unmap etc). This is necessary for the invalidate notifier approaches so that they can reliably add and remove a notifier. 
Signed-off-by: Andrea Arcangeli Signed-off-by: Christoph Lameter --- include/linux/mm.h | 10 ++++++++ mm/mmap.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 76 insertions(+) Index: linux-2.6/include/linux/mm.h =================================================================== --- linux-2.6.orig/include/linux/mm.h 2008-04-02 11:41:47.741678873 -0700 +++ linux-2.6/include/linux/mm.h 2008-04-04 15:02:17.660504756 -0700 @@ -1050,6 +1050,16 @@ extern int install_special_mapping(struc unsigned long addr, unsigned long len, unsigned long flags, struct page **pages); +/* + * Locking and unlocking an mm against reclaim. + * + * mm_lock will take mmap_sem writably (to prevent additional vmas from being + * added) and then take all mapping locks of the existing vmas. With that + * reclaim is effectively stopped. + */ +extern void mm_lock(struct mm_struct *mm); +extern void mm_unlock(struct mm_struct *mm); + extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, Index: linux-2.6/mm/mmap.c =================================================================== --- linux-2.6.orig/mm/mmap.c 2008-04-04 14:55:03.477593980 -0700 +++ linux-2.6/mm/mmap.c 2008-04-04 14:59:05.505395402 -0700 @@ -2242,3 +2242,69 @@ int install_special_mapping(struct mm_st return 0; } + +static void mm_lock_unlock(struct mm_struct *mm, int lock) +{ + struct vm_area_struct *vma; + spinlock_t *i_mmap_lock_last, *anon_vma_lock_last; + + i_mmap_lock_last = NULL; + for (;;) { + spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; + for (vma = mm->mmap; vma; vma = vma->vm_next) + if (vma->vm_file && vma->vm_file->f_mapping && + (unsigned long) i_mmap_lock > + (unsigned long) + &vma->vm_file->f_mapping->i_mmap_lock && + (unsigned long) + &vma->vm_file->f_mapping->i_mmap_lock > + (unsigned long) i_mmap_lock_last) + i_mmap_lock = + 
&vma->vm_file->f_mapping->i_mmap_lock; + if (i_mmap_lock == (spinlock_t *) -1UL) + break; + i_mmap_lock_last = i_mmap_lock; + if (lock) + spin_lock(i_mmap_lock); + else + spin_unlock(i_mmap_lock); + } + + anon_vma_lock_last = NULL; + for (;;) { + spinlock_t *anon_vma_lock = (spinlock_t *) -1UL; + for (vma = mm->mmap; vma; vma = vma->vm_next) + if (vma->anon_vma && + (unsigned long) anon_vma_lock > + (unsigned long) &vma->anon_vma->lock && + (unsigned long) &vma->anon_vma->lock > + (unsigned long) anon_vma_lock_last) + anon_vma_lock = &vma->anon_vma->lock; + if (anon_vma_lock == (spinlock_t *) -1UL) + break; + anon_vma_lock_last = anon_vma_lock; + if (lock) + spin_lock(anon_vma_lock); + else + spin_unlock(anon_vma_lock); + } +} + +/* + * This operation locks against the VM for all pte/vma/mm related + * operations that could ever happen on a certain mm. This includes + * vmtruncate, try_to_unmap, and all page faults. The holder + * must not hold any mm related lock. A single task can't take more + * than one mm lock in a row or it would deadlock. 
+ */ +void mm_lock(struct mm_struct * mm) +{ + down_write(&mm->mmap_sem); + mm_lock_unlock(mm, 1); +} + +void mm_unlock(struct mm_struct *mm) +{ + mm_lock_unlock(mm, 0); + up_write(&mm->mmap_sem); +} From rdreier at cisco.com Fri Apr 4 15:07:25 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 15:07:25 -0700 Subject: [ofa-general] [PATCH 2 of 2] mlx4: update module version and release date (for 2.6.25) In-Reply-To: <200802271620.53952.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Wed, 27 Feb 2008 16:20:53 +0200") References: <200802271620.53952.jackm@dev.mellanox.co.il> Message-ID: thanks, applied both this and mthca equivalent From bs at q-leap.de Fri Apr 4 15:12:39 2008 From: bs at q-leap.de (Bernd Schubert) Date: Sat, 5 Apr 2008 00:12:39 +0200 Subject: [ofa-general] XmtDiscards Message-ID: <200804050012.39893.bs@q-leap.de> Hello, after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten much better there, at least no further RcvSwRelayErrors, even when the cluster is in idle state and so far also no SymbolErrors, which we also have seen before. However, after I just started a lustre stress test on 50 clients (to a lustre storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about 9000 XmtDiscards within 30 minutes. Searching for this error I find "This is a symptom of congestion and may require tweaking either HOQ or switch lifetime values". Well, I have to admit I neither know what HOQ is, nor do I know how to tweak it. I also have no idea how to set switch lifetime values. I guess this isn't related to the opensm timeout option, is it? Hmm, I just found a Cisco PDF describing how to set the lifetime on these switches, but is this also possible on Flextronics switches? 
Thanks for any help, Bernd -- Bernd Schubert Q-Leap Networks GmbH From rdreier at cisco.com Fri Apr 4 15:27:06 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 04 Apr 2008 15:27:06 -0700 Subject: [ofa-general] Re: [PATCH] mlx4: make firmware diagnostic counters available via sysfs In-Reply-To: <200804021615.44982.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Wed, 2 Apr 2008 16:15:44 +0300") References: <200804021615.44982.jackm@dev.mellanox.co.il> Message-ID: > +int mlx4_query_diag_counters(struct mlx4_dev *dev, int array_length, > + int in_modifier, unsigned int in_offset[], > + u32 counter_out[]) > +{ > + struct mlx4_cmd_mailbox *mailbox; > + u32 *outbox; > + u32 op_modifer = (u32)in_modifier; This coding style looks strange to me... you have an int parameter in_modifier that is not used for anything except to assign it to a u32 op_modifer [sic] variable with a (u32) cast that doesn't do anything. Why not just have op_modifier be the parameter in the first place? Also the array_length stuff looks kind of funny since you only ever pass in a value of 1... why not just pass in int offset and u32 *counter? > + /* clear counters file, can't read it */ > + if(offset < 0) > + return sprintf(buf,"This file is write only\n"); Why not just set the permissions on the file so it can't be opened for reading? This just looks like a recipe for making userspace code go crazy on unexpected input. Also watch out for the space in "if (" And if I'm understanding correctly, you use a magic offset of -1 for the clear_diag attribute that makes mlx4_query_diag_counters() read before the beginning of the output mailbox. > +err_diag: > + ib_unregister_device(&ibdev->ib_dev); > + > err_reg: > ib_unregister_device(&ibdev->ib_dev); This doesn't look like a good idea. - R. 
From boris at mellanox.com Fri Apr 4 15:28:46 2008 From: boris at mellanox.com (Boris Shpolyansky) Date: Fri, 4 Apr 2008 15:28:46 -0700 Subject: [ofa-general] XmtDiscards In-Reply-To: <200804050012.39893.bs@q-leap.de> Message-ID: <1E3DCD1C63492545881FACB6063A57C1023F6AE8@mtiexch01.mti.com> Hi Bernd, You can configure the HOQ (Head-Of-Queue-Lifetime) value programmed in any switch in the fabric managed by OpenSM following these simple steps: 1. Stop the SM /etc/init.d/opensmd stop 2. Run the SM manually with the "-c" option (to dump its default configuration to a file) opensm -c 3. Kill the SM with ^C 4. The configuration is saved in /var/cache/opensm/opensm.opts. Open the file and look for head_of_queue_lifetime. Change the value and save the file. 5. Restart the SM /etc/init.d/opensmd start P.S. You might find 'opensm -h' and 'man opensm' useful. Hope this helps, Boris Shpolyansky Sr. Member of Technical Staff Applications Mellanox Technologies Inc. 2900 Stender Way Santa Clara, CA 95054 Tel.: (408) 916 0014 Fax: (408) 970 3403 Cell: (408) 834 9365 www.mellanox.com -----Original Message----- From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Bernd Schubert Sent: Friday, April 04, 2008 3:13 PM To: OpenIB Subject: [ofa-general] XmtDiscards Hello, after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten much better there, at least no further RcvSwRelayErrors, even when the cluster is in idle state and so far also no SymbolErrors, which we also have seens before. However, after I just started a lustre stress test on 50 clients (to a lustre storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about 9000 XmtDiscards within 30 minutes. Searching for this error I find "This is a symptom of congestion and may require tweaking either HOQ or switch lifetime values". Well, I have to admit I neither know what HOQ is, nor do I know how to tweak it. 
I also do not have an idea to set switch lifetime values. I guess this isn't related to the opensm timeout option, is it? Hmm, I just found a cisci pdf describing how to set the lifetime on these switches, but is this also possible on Flextronics switches? Thanks for any help, Bernd -- Bernd Schubert Q-Leap Networks GmbH _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From weiny2 at llnl.gov Fri Apr 4 15:29:32 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 4 Apr 2008 15:29:32 -0700 Subject: [ofa-general] XmtDiscards In-Reply-To: <200804050012.39893.bs@q-leap.de> References: <200804050012.39893.bs@q-leap.de> Message-ID: <20080404152932.5e294e47.weiny2@llnl.gov> On Sat, 5 Apr 2008 00:12:39 +0200 Bernd Schubert wrote: > Hello, > > after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten > much better there, at least no further RcvSwRelayErrors, even when the > cluster is in idle state and so far also no SymbolErrors, which we also have > seens before. > > However, after I just started a lustre stress test on 50 clients (to a lustre > storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about > 9000 XmtDiscards within 30 minutes. Yea, those are bad. > > Searching for this error I find "This is a symptom of congestion and may > require tweaking either HOQ or switch lifetime values". > Well, I have to admit I neither know what HOQ is, nor do I know how to tweak > it. I also do not have an idea to set switch lifetime values. I guess this > isn't related to the opensm timeout option, is it? Yes you should adjust these values. > > Hmm, I just found a cisci pdf describing how to set the lifetime on these > switches, but is this also possible on Flextronics switches? 
> I don't know about the Vendor SMs but in opensm look for the following options in the opensm.opts file (Default path is: /var/cache/opensm): # The code of maximal time a packet can wait at the head of # transmission queue. # The actual time is 4.096usec * 2^ # The value 0x14 disables this mechanism head_of_queue_lifetime 0x12 # The maximal time a packet can wait at the head of queue on # switch port connected to a CA or router port leaf_head_of_queue_lifetime 0x0c Ira From clameter at sgi.com Fri Apr 4 15:30:48 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:48 -0700 Subject: [ofa-general] [patch 00/10] [RFC] EMM Notifier V3 Message-ID: <20080404223048.374852899@sgi.com> V2->V3: - Fix rcu issues - Fix emm_referenced handling - Use Andrea's mm_lock/unlock to prevent registration races. - Keep simple API since there does not seem to be a need to add additional callbacks (mm_lock does not require callbacks like emm_start/stop that I envisioned). - Reduce CC list (the volume we are producing here must be annoying...). V1->V2: - Additional optimizations in the VM - Convert vm spinlocks to rw sems. - Add XPMEM driver (requires sleeping in callbacks) - Add XPMEM example This patch implements a simple callback for device drivers that establish their own references to pages (KVM, GRU, XPmem, RDMA/Infiniband, DMA engines etc). These references are unknown to the VM (therefore external). With these callbacks it is possible for the device driver to release external references when the VM requests it. This enables swapping, page migration and allows support of remapping, permission changes etc etc for the externally mapped memory. With this functionality it becomes also possible to avoid pinning or mlocking pages (commonly done to stop the VM from unmapping device mapped pages). 
A device driver must subscribe to a process using emm_register_notifier(struct emm_notifier *, struct mm_struct *) The VM will then perform callbacks for operations that unmap or change permissions of pages in that address space. When the process terminates the callback function is called with emm_release. Callbacks are performed before and after the unmapping action of the VM. emm_invalidate_start before emm_invalidate_end after The device driver must hold off establishing new references to pages in the range specified between a callback with emm_invalidate_start and the subsequent call with emm_invalidate_end set. This allows the VM to ensure that no concurrent driver actions are performed on an address range while performing remapping or unmapping operations. This patchset contains additional modifications needed to ensure that the callbacks can sleep. For that purpose two key locks in the vm need to be converted to rw_sems. These patches are brand new, invasive and need extensive discussion and evaluation. The first patch alone may be applied if callbacks in atomic context are sufficient for a device driver (likely the case for KVM and GRU and simple DMA drivers). Following the VM modifications is the XPMEM device driver that allows sharing of memory between processes running on different instances of Linux. This is also a prototype. It is known to run trivial sample programs included as the last patch. -- From clameter at sgi.com Fri Apr 4 15:30:49 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:49 -0700 Subject: [ofa-general] [patch 01/10] emm: mm_lock: Lock a process against reclaim References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223131.271668133@sgi.com> An embedded and charset-unspecified text was scrubbed... 
Name: mm_lock_unlock URL: From clameter at sgi.com Fri Apr 4 15:30:54 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:54 -0700 Subject: [ofa-general] [patch 06/10] emm: Convert anon_vma lock to rw_sem and refcount References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223132.477298248@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: emm_anon_vma_sem URL: From clameter at sgi.com Fri Apr 4 15:30:52 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:52 -0700 Subject: [ofa-general] [patch 04/10] emm: Convert i_mmap_lock to i_mmap_sem References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223131.999993077@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: emm_immap_sem URL: From clameter at sgi.com Fri Apr 4 15:30:58 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:58 -0700 Subject: [ofa-general] [patch 10/10] xpmem: Simple example References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223133.463091757@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: xpmem_test URL: From clameter at sgi.com Fri Apr 4 15:30:50 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:50 -0700 Subject: [ofa-general] [patch 02/10] emm: notifier logic References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223131.469710551@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: emm_notifier URL: From clameter at sgi.com Fri Apr 4 15:30:56 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:56 -0700 Subject: [ofa-general] [patch 08/10] xpmem: Locking rules for taking multiple mmap_sem locks. References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223132.971442620@sgi.com> An embedded and charset-unspecified text was scrubbed... 
Name: xpmem_v003_lock-rule URL: From clameter at sgi.com Fri Apr 4 15:30:51 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:51 -0700 Subject: [ofa-general] [patch 03/10] emm: Move tlb flushing into free_pgtables References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223131.727813758@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: move_tlb_flush URL: From clameter at sgi.com Fri Apr 4 15:30:57 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:57 -0700 Subject: [ofa-general] [patch 09/10] xpmem: The device driver References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223133.216189171@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: xpmem_v003_emm_SSI_v3 URL: From clameter at sgi.com Fri Apr 4 15:30:53 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:53 -0700 Subject: [ofa-general] [patch 05/10] emm: Remove tlb pointer from the parameters of unmap vmas References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223132.259410373@sgi.com> An embedded and charset-unspecified text was scrubbed... Name: cleanup_unmap_vmas URL: From clameter at sgi.com Fri Apr 4 15:30:55 2008 From: clameter at sgi.com (Christoph Lameter) Date: Fri, 04 Apr 2008 15:30:55 -0700 Subject: [ofa-general] [patch 07/10] xpmem: This patch exports zap_page_range as it is needed by XPMEM. References: <20080404223048.374852899@sgi.com> Message-ID: <20080404223132.734091146@sgi.com> An embedded and charset-unspecified text was scrubbed... 
Name: xpmem_v003_export-zap_page_range URL: From jeremy at goop.org Fri Apr 4 16:12:42 2008 From: jeremy at goop.org (Jeremy Fitzhardinge) Date: Fri, 04 Apr 2008 16:12:42 -0700 Subject: [ofa-general] Re: [patch 01/10] emm: mm_lock: Lock a process against reclaim In-Reply-To: <20080404223131.271668133@sgi.com> References: <20080404223048.374852899@sgi.com> <20080404223131.271668133@sgi.com> Message-ID: <47F6B5EA.6060106@goop.org> Christoph Lameter wrote: > Provide a way to lock an mm_struct against reclaim (try_to_unmap > etc). This is necessary for the invalidate notifier approaches so > that they can reliably add and remove a notifier. > > Signed-off-by: Andrea Arcangeli > Signed-off-by: Christoph Lameter > > --- > include/linux/mm.h | 10 ++++++++ > mm/mmap.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 76 insertions(+) > > Index: linux-2.6/include/linux/mm.h > =================================================================== > --- linux-2.6.orig/include/linux/mm.h 2008-04-02 11:41:47.741678873 -0700 > +++ linux-2.6/include/linux/mm.h 2008-04-04 15:02:17.660504756 -0700 > @@ -1050,6 +1050,16 @@ extern int install_special_mapping(struc > unsigned long addr, unsigned long len, > unsigned long flags, struct page **pages); > > +/* > + * Locking and unlocking an mm against reclaim. > + * > + * mm_lock will take mmap_sem writably (to prevent additional vmas from being > + * added) and then take all mapping locks of the existing vmas. With that > + * reclaim is effectively stopped. 
> + */ > +extern void mm_lock(struct mm_struct *mm); > +extern void mm_unlock(struct mm_struct *mm); > + > extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); > > extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, > Index: linux-2.6/mm/mmap.c > =================================================================== > --- linux-2.6.orig/mm/mmap.c 2008-04-04 14:55:03.477593980 -0700 > +++ linux-2.6/mm/mmap.c 2008-04-04 14:59:05.505395402 -0700 > @@ -2242,3 +2242,69 @@ int install_special_mapping(struct mm_st > > return 0; > } > + > +static void mm_lock_unlock(struct mm_struct *mm, int lock) > +{ > + struct vm_area_struct *vma; > + spinlock_t *i_mmap_lock_last, *anon_vma_lock_last; > + > + i_mmap_lock_last = NULL; > + for (;;) { > + spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; > + for (vma = mm->mmap; vma; vma = vma->vm_next) > + if (vma->vm_file && vma->vm_file->f_mapping && > I think you can break this if() down a bit: if (!(vma->vm_file && vma->vm_file->f_mapping)) continue; > + (unsigned long) i_mmap_lock > > + (unsigned long) > + &vma->vm_file->f_mapping->i_mmap_lock && > + (unsigned long) > + &vma->vm_file->f_mapping->i_mmap_lock > > + (unsigned long) i_mmap_lock_last) > + i_mmap_lock = > + &vma->vm_file->f_mapping->i_mmap_lock; > So this is an O(n^2) algorithm to take the i_mmap_locks from low to high order? A comment would be nice. And O(n^2)? Ouch. How often is it called? And is it necessary to mush lock and unlock together? Unlock ordering doesn't matter, so you should just be able to have a much simpler loop, no? 
> + if (i_mmap_lock == (spinlock_t *) -1UL) > + break; > + i_mmap_lock_last = i_mmap_lock; > + if (lock) > + spin_lock(i_mmap_lock); > + else > + spin_unlock(i_mmap_lock); > + } > + > + anon_vma_lock_last = NULL; > + for (;;) { > + spinlock_t *anon_vma_lock = (spinlock_t *) -1UL; > + for (vma = mm->mmap; vma; vma = vma->vm_next) > + if (vma->anon_vma && > + (unsigned long) anon_vma_lock > > + (unsigned long) &vma->anon_vma->lock && > + (unsigned long) &vma->anon_vma->lock > > + (unsigned long) anon_vma_lock_last) > + anon_vma_lock = &vma->anon_vma->lock; > + if (anon_vma_lock == (spinlock_t *) -1UL) > + break; > + anon_vma_lock_last = anon_vma_lock; > + if (lock) > + spin_lock(anon_vma_lock); > + else > + spin_unlock(anon_vma_lock); > + } > +} > > + > +/* > + * This operation locks against the VM for all pte/vma/mm related > + * operations that could ever happen on a certain mm. This includes > + * vmtruncate, try_to_unmap, and all page faults. The holder > + * must not hold any mm related lock. A single task can't take more > + * than one mm lock in a row or it would deadlock. > + */ > +void mm_lock(struct mm_struct * mm) > +{ > + down_write(&mm->mmap_sem); > + mm_lock_unlock(mm, 1); > +} > + > +void mm_unlock(struct mm_struct *mm) > +{ > + mm_lock_unlock(mm, 0); > + up_write(&mm->mmap_sem); > +} > > From bs at q-leap.de Fri Apr 4 16:21:11 2008 From: bs at q-leap.de (Bernd Schubert) Date: Sat, 5 Apr 2008 01:21:11 +0200 Subject: [ofa-general] XmtDiscards In-Reply-To: <1E3DCD1C63492545881FACB6063A57C1023F6AE8@mtiexch01.mti.com> References: <200804050012.39893.bs@q-leap.de> <1E3DCD1C63492545881FACB6063A57C1023F6AE8@mtiexch01.mti.com> Message-ID: <20080404232111.GA17576@lanczos.q-leap.de> Hello Boris, On Fri, Apr 04, 2008 at 03:28:46PM -0700, Boris Shpolyansky wrote: > Hi Bernd, > > You can configure the HOQ (Head-Of-Queue-Lifetime) value programmed in > any switch in the fabric managed by OpenSM following these simple steps: > > 1. 
Stop the SM > /etc/init.d/opensmd stop > > 2. Run the SM manually with the "-c" option (to dump its default > configuration to a file) > opensm -c > > 3. Kill the SM with ^C > > 4. The configuration is saved in /var/cache/opensm/opensm.opts. Open the > file and look for head_of_queue_lifetime. Change the value and save the > file. > > 5. Restart the SM > /etc/init.d/opensmd start thanks a lot for your help. This did help quite a lot. > > P.S. You might find 'opensm -h' and 'man opensm' useful. Sorry about my dumb question, I did read the man page of opensm quite often already, but "--cache-options" and "OSM_CACHE_DIR" did activate my brain-internal filter to entirely skip this part of the man page ;) Somehow I associated "cache" with "opensm-performance", but not at all with options... Thanks again, Bernd From arlin.r.davis at intel.com Fri Apr 4 16:40:43 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Fri, 4 Apr 2008 16:40:43 -0700 Subject: [ofa-general] [PATCH 2/4][v2] dapl: add support for logging errors in non-debug build. Message-ID: Add debug logging (stdout, syslog) for error cases during device open, cm, async, and dto operations. Default settings are ERR for DAPL_DBG_TYPE, and stdout for DAPL_DBG_DEST. Change default configuration to build non-debug. 
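For anyone wanting to exercise the new logging at runtime: dapl_init() in this series reads two environment variables. The variable names below come from the diff; the numeric values are assumptions to be checked against dapl_debug.h, and the application name is a placeholder.

```shell
# Hypothetical usage of the runtime logging knobs from this patch:
export DAPL_DBG_TYPE=0xffff   # bitmask of message types; default is ERR only
export DAPL_DBG_DEST=0x2      # 0x2 = DAPL_DBG_DEST_SYSLOG; stdout is the default
./your_dapl_application
```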
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- configure.in | 4 +- dapl/common/dapl_debug.c | 2 - dapl/common/dapl_evd_util.c | 8 +- dapl/include/dapl_debug.h | 10 ++- dapl/openib_cma/dapl_ib_cm.c | 196 +++++++++++++++++++++++----------------- dapl/openib_cma/dapl_ib_util.c | 87 +++++++++--------- dapl/udapl/dapl_init.c | 16 +++- dapl/udapl/linux/dapl_osd.h | 2 +- 8 files changed, 179 insertions(+), 146 deletions(-) diff --git a/configure.in b/configure.in index eaf597b..d1c2664 100644 --- a/configure.in +++ b/configure.in @@ -42,12 +42,12 @@ AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test "$ac_cv_version_script" = "yes") dnl Support debug mode build - if enable-debug provided the DEBUG variable is set AC_ARG_ENABLE(debug, -[ --enable-debug Turn on debug mode, default=on], +[ --enable-debug Turn on debug mode, default=off], [case "${enableval}" in yes) debug=true ;; no) debug=false ;; *) AC_MSG_ERROR(bad value ${enableval} for --enable-debug) ;; -esac],[debug=true]) +esac],[debug=false]) AM_CONDITIONAL(DEBUG, test x$debug = xtrue) dnl Support ib_extension build - if enable-ext-type == ib diff --git a/dapl/common/dapl_debug.c b/dapl/common/dapl_debug.c index 7ddce52..cbc356c 100644 --- a/dapl/common/dapl_debug.c +++ b/dapl/common/dapl_debug.c @@ -32,7 +32,6 @@ #include #endif /* __KDAPL__ */ -#ifdef DAPL_DBG DAPL_DBG_TYPE g_dapl_dbg_type; /* initialized in dapl_init.c */ DAPL_DBG_DEST g_dapl_dbg_dest; /* initialized in dapl_init.c */ @@ -117,5 +116,4 @@ void dapl_dump_cntr( int cntr ) } #endif /* DAPL_COUNTERS */ -#endif diff --git a/dapl/common/dapl_evd_util.c b/dapl/common/dapl_evd_util.c index a993b02..2ae1b59 100755 --- a/dapl/common/dapl_evd_util.c +++ b/dapl/common/dapl_evd_util.c @@ -1209,10 +1209,10 @@ dapli_evd_cqe_to_event ( dapl_os_unlock ( &ep_ptr->header.lock ); } - dapl_dbg_log (DAPL_DBG_TYPE_DTO_COMP_ERR, - " DTO completion ERROR: %d: op %#x (ep disconnected)\n", - DAPL_GET_CQE_STATUS (cqe_ptr), - DAPL_GET_CQE_OPTYPE (cqe_ptr)); + 
dapl_log(DAPL_DBG_TYPE_ERR, + "DTO completion ERR: status %d, opcode %s \n", + DAPL_GET_CQE_STATUS(cqe_ptr), + DAPL_GET_CQE_OP_STR(cqe_ptr)); } } diff --git a/dapl/include/dapl_debug.h b/dapl/include/dapl_debug.h index 76db8fd..f0de7c8 100644 --- a/dapl/include/dapl_debug.h +++ b/dapl/include/dapl_debug.h @@ -75,14 +75,16 @@ typedef enum DAPL_DBG_DEST_SYSLOG = 0x0002, } DAPL_DBG_DEST; - -#if defined(DAPL_DBG) - extern DAPL_DBG_TYPE g_dapl_dbg_type; extern DAPL_DBG_DEST g_dapl_dbg_dest; +extern void dapl_internal_dbg_log(DAPL_DBG_TYPE type, const char *fmt, ...); + +#define dapl_log g_dapl_dbg_type==0 ? (void) 1 : dapl_internal_dbg_log + +#if defined(DAPL_DBG) + #define dapl_dbg_log g_dapl_dbg_type==0 ? (void) 1 : dapl_internal_dbg_log -extern void dapl_internal_dbg_log ( DAPL_DBG_TYPE type, const char *fmt, ...); #else /* !DAPL_DBG */ diff --git a/dapl/openib_cma/dapl_ib_cm.c b/dapl/openib_cma/dapl_ib_cm.c index a040ffb..33f299d 100755 --- a/dapl/openib_cma/dapl_ib_cm.c +++ b/dapl/openib_cma/dapl_ib_cm.c @@ -95,9 +95,9 @@ static void dapli_addr_resolve(struct dapl_cm_id *conn) ret = rdma_resolve_route(conn->cm_id, conn->route_timeout); if (ret) { - dapl_dbg_log(DAPL_DBG_TYPE_ERR, - " rdma_connect failed: %s\n",strerror(errno)); - + dapl_log(DAPL_DBG_TYPE_ERR, + " dapl_cma_connect: rdma_resolve_route ERR %d %s\n", + ret, strerror(errno)); dapl_evd_connection_callback(conn, IB_CME_LOCAL_FAILURE, NULL, conn->ep); @@ -146,8 +146,9 @@ static void dapli_route_resolve(struct dapl_cm_id *conn) ret = rdma_connect(conn->cm_id, &conn->params); if (ret) { - dapl_dbg_log(DAPL_DBG_TYPE_ERR, " rdma_connect failed: %s\n", - strerror(errno)); + dapl_log(DAPL_DBG_TYPE_ERR, + " dapl_cma_connect: rdma_connect ERR %d %s\n", + ret, strerror(errno)); goto bail; } return; @@ -310,12 +311,15 @@ static void dapli_cm_active_cb(struct dapl_cm_id *conn, case RDMA_CM_EVENT_UNREACHABLE: case RDMA_CM_EVENT_CONNECT_ERROR: { - dapl_dbg_log( - DAPL_DBG_TYPE_WARN, - " dapli_cm_active_handler: 
CONN_ERR " - " event=0x%x status=%d %s\n", + dapl_log(DAPL_DBG_TYPE_WARN, + "dapl_cma_active: CONN_ERR event=0x%x" + " status=%d %s DST %s, %d\n", event->event, event->status, - (event->status == -ETIMEDOUT)?"TIMEOUT":"" ); + (event->status == -ETIMEDOUT)?"TIMEOUT":"", + inet_ntoa(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_addr), + ntohs(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_port)); /* per DAT SPEC provider always returns UNREACHABLE */ dapl_evd_connection_callback(conn, @@ -327,36 +331,47 @@ static void dapli_cm_active_cb(struct dapl_cm_id *conn, { ib_cm_events_t cm_event; - /* no device type specified so assume IB for now */ - if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */ - cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; - else - cm_event = IB_CME_DESTINATION_REJECT; - dapl_dbg_log( DAPL_DBG_TYPE_CM, " dapli_cm_active_handler: REJECTED reason=%d\n", event->status); - + + /* valid REJ from consumer will always contain private data */ + if (event->status == 28 && + event->param.conn.private_data_len) + cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; + else { + cm_event = IB_CME_DESTINATION_REJECT; + dapl_log(DAPL_DBG_TYPE_WARN, + "dapl_cma_active: non-consumer REJ," + " reason=%d, DST %s, %d\n", + event->status, + inet_ntoa(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_addr), + ntohs(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_port)); + } dapl_evd_connection_callback(conn, cm_event, NULL, conn->ep); break; } case RDMA_CM_EVENT_ESTABLISHED: - dapl_dbg_log(DAPL_DBG_TYPE_CM, - " active_cb: cm_id %d PORT %d CONNECTED to 0x%x!\n", + " active_cb: cm_id %d PORT %d CONNECTED to %s!\n", conn->cm_id, ntohs(((struct sockaddr_in *) &conn->cm_id->route.addr.dst_addr)->sin_port), - ntohl(((struct sockaddr_in *) - &conn->cm_id->route.addr.dst_addr)->sin_addr.s_addr)); + inet_ntoa(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_addr)); /* setup local and 
remote ports for ep query */ - conn->ep->param.remote_port_qual = PORT_TO_SID(rdma_get_dst_port(conn->cm_id)); - conn->ep->param.local_port_qual = PORT_TO_SID(rdma_get_src_port(conn->cm_id)); + conn->ep->param.remote_port_qual = + PORT_TO_SID(rdma_get_dst_port(conn->cm_id)); + conn->ep->param.local_port_qual = + PORT_TO_SID(rdma_get_src_port(conn->cm_id)); dapl_evd_connection_callback(conn, IB_CME_CONNECTED, - event->param.conn.private_data, conn->ep); + event->param.conn.private_data, + conn->ep); break; case RDMA_CM_EVENT_DISCONNECTED: @@ -383,9 +398,6 @@ static void dapli_cm_passive_cb(struct dapl_cm_id *conn, struct rdma_cm_event *event) { struct dapl_cm_id *new_conn; -#ifdef DAPL_DBG - struct rdma_addr *ipaddr = &conn->cm_id->route.addr; -#endif dapl_dbg_log(DAPL_DBG_TYPE_CM, " passive_cb: conn %p id %d event %d\n", @@ -410,57 +422,43 @@ static void dapli_cm_passive_cb(struct dapl_cm_id *conn, break; case RDMA_CM_EVENT_UNREACHABLE: case RDMA_CM_EVENT_CONNECT_ERROR: - - dapl_dbg_log( - DAPL_DBG_TYPE_WARN, - " dapli_cm_passive: CONN_ERR " - " event=0x%x status=%d %s" - " on SRC 0x%x,0x%x DST 0x%x,0x%x\n", + dapl_log(DAPL_DBG_TYPE_WARN, + "dapl_cm_passive: CONN_ERR event=0x%x status=%d %s," + " DST %s,%d\n", event->event, event->status, - (event->status == -110)?"TIMEOUT":"", - ntohl(((struct sockaddr_in *) - &ipaddr->src_addr)->sin_addr.s_addr), - ntohs(((struct sockaddr_in *) - &ipaddr->src_addr)->sin_port), - ntohl(((struct sockaddr_in *) - &ipaddr->dst_addr)->sin_addr.s_addr), - ntohs(((struct sockaddr_in *) - &ipaddr->dst_addr)->sin_port)); + (event->status == -ETIMEDOUT)?"TIMEOUT":"", + inet_ntoa(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_addr), + ntohs(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_port)); dapls_cr_callback(conn, IB_CME_DESTINATION_UNREACHABLE, - NULL, conn->sp); + NULL, conn->sp); break; case RDMA_CM_EVENT_REJECTED: { ib_cm_events_t cm_event; - /* no device type specified so assume IB for now */ - 
if (event->status == 28) /* IB_CM_REJ_CONSUMER_DEFINED */ + /* valid REJ from consumer will always contain private data */ + if (event->status == 28 && + event->param.conn.private_data_len) cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; - else + else { cm_event = IB_CME_DESTINATION_REJECT; - - dapl_dbg_log( - DAPL_DBG_TYPE_WARN, - " dapli_cm_passive: REJECTED reason=%d" - " on SRC 0x%x,0x%x DST 0x%x,0x%x\n", - event->status, - ntohl(((struct sockaddr_in *) - &ipaddr->src_addr)->sin_addr.s_addr), - ntohs(((struct sockaddr_in *) - &ipaddr->src_addr)->sin_port), - ntohl(((struct sockaddr_in *) - &ipaddr->dst_addr)->sin_addr.s_addr), - ntohs(((struct sockaddr_in *) - &ipaddr->dst_addr)->sin_port)); - + dapl_log(DAPL_DBG_TYPE_WARN, + "dapl_cm_active: non-consumer REJ, reason=%d," + " DST %s, %d\n", + event->status, + inet_ntoa(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_addr), + ntohs(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_port)); + } dapls_cr_callback(conn, cm_event, NULL, conn->sp); - break; } case RDMA_CM_EVENT_ESTABLISHED: - dapl_dbg_log(DAPL_DBG_TYPE_CM, " passive_cb: cm_id %p PORT %d CONNECTED from 0x%x!\n", conn->cm_id, @@ -559,9 +557,12 @@ DAT_RETURN dapls_ib_connect(IN DAT_EP_HANDLE ep_handle, if (rdma_resolve_addr(conn->cm_id, NULL, (struct sockaddr *)&conn->r_addr, - conn->arp_timeout)) + conn->arp_timeout)) { + dapl_log(DAPL_DBG_TYPE_ERR, + " dapl_cma_connect: rdma_resolve_addr ERR %s\n", + strerror(errno)); return dapl_convert_errno(errno,"ib_connect"); - + } dapl_dbg_log(DAPL_DBG_TYPE_CM, " connect: resolve_addr: cm_id %p -> %s port %d\n", conn->cm_id, @@ -815,9 +816,9 @@ dapls_ib_accept_connection(IN DAT_CR_HANDLE cr_handle, */ dat_status = dapls_ib_qp_alloc(ia_ptr, ep_ptr, NULL); if (dat_status != DAT_SUCCESS) { - dapl_dbg_log(DAPL_DBG_TYPE_ERR, - " accept: ib_qp_alloc failed: %d\n", - dat_status); + dapl_log(DAPL_DBG_TYPE_ERR, + " dapl_cma_accept: qp_alloc ERR %d\n", + dat_status); goto bail; } } @@ 
-835,11 +836,12 @@ dapls_ib_accept_connection(IN DAT_CR_HANDLE cr_handle, ep_ptr->qp_handle->cm_id->qp = NULL; dapli_destroy_conn(ep_ptr->qp_handle); } else { - dapl_dbg_log(DAPL_DBG_TYPE_ERR, - " accept: ERR dev(%p!=%p) or port mismatch(%d!=%d)\n", + dapl_log(DAPL_DBG_TYPE_ERR, + " dapl_cma_accept: ERR dev(%p!=%p) or" + " port mismatch(%d!=%d)\n", ep_ptr->qp_handle->cm_id->verbs,cr_conn->cm_id->verbs, - ep_ptr->qp_handle->cm_id->port_num, - cr_conn->cm_id->port_num ); + ntohs(ep_ptr->qp_handle->cm_id->port_num), + ntohs(cr_conn->cm_id->port_num)); dat_status = DAT_INTERNAL_ERROR; goto bail; } @@ -850,7 +852,8 @@ dapls_ib_accept_connection(IN DAT_CR_HANDLE cr_handle, ret = rdma_accept(cr_conn->cm_id, &cr_conn->params); if (ret) { - dapl_dbg_log(DAPL_DBG_TYPE_ERR," accept: ERROR %d\n", ret); + dapl_log(DAPL_DBG_TYPE_ERR," dapl_cma_accept: ERR %d %s\n", + ret, strerror(errno)); dat_status = dapl_convert_errno(ret, "accept"); goto bail; } @@ -909,6 +912,10 @@ dapls_ib_reject_connection( return DAT_SUCCESS; } + /* + * Private data is needed so peer can determine real application + * reject from an abnormal application termination + */ ret = rdma_reject(cm_handle->cm_id, NULL, 0); dapli_destroy_conn(cm_handle); @@ -1163,11 +1170,12 @@ void dapli_cma_event_cb(void) break; case RDMA_CM_EVENT_ADDR_ERROR: - dapl_dbg_log(DAPL_DBG_TYPE_WARN, - " CM ADDR ERROR: -> %s retry (%d)..\n", - inet_ntoa(((struct sockaddr_in *) + dapl_log(DAPL_DBG_TYPE_WARN, + "dapl_cma_active: CM ADDR ERROR: ->" + " DST %s retry (%d)..\n", + inet_ntoa(((struct sockaddr_in *) &conn->r_addr)->sin_addr), - conn->arp_retries); + conn->arp_retries); /* retry address resolution */ if ((--conn->arp_retries) && @@ -1188,27 +1196,47 @@ void dapli_cma_event_cb(void) } } /* retries exhausted or resolve_addr failed */ + dapl_log(DAPL_DBG_TYPE_ERR, + "dapl_cma_active: ARP_ERR, retries(%d)" + " exhausted -> DST %s,%d\n", + IB_ARP_RETRY_COUNT, + inet_ntoa(((struct sockaddr_in *) + 
&conn->cm_id->route.addr.dst_addr)->sin_addr), + ntohs(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_port)); + dapl_evd_connection_callback( conn, IB_CME_DESTINATION_UNREACHABLE, NULL, conn->ep); break; - case RDMA_CM_EVENT_ROUTE_ERROR: - dapl_dbg_log(DAPL_DBG_TYPE_WARN, - " CM ROUTE ERROR: -> %s retry (%d)..\n", - inet_ntoa(((struct sockaddr_in *) + dapl_log(DAPL_DBG_TYPE_WARN, + "dapl_cma_active: CM ROUTE ERROR: ->" + " DST %s retry (%d)..\n", + inet_ntoa(((struct sockaddr_in *) &conn->r_addr)->sin_addr), - conn->route_retries ); + conn->route_retries ); /* retry route resolution */ if ((--conn->route_retries) && (event->status == -ETIMEDOUT)) dapli_addr_resolve(conn); - else - dapl_evd_connection_callback( conn, + else { + dapl_log(DAPL_DBG_TYPE_ERR, + "dapl_cma_active: PATH_RECORD_ERR," + " retries(%d) exhausted, DST %s,%d\n", + IB_ROUTE_RETRY_COUNT, + inet_ntoa(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_addr), + ntohs(((struct sockaddr_in *) + &conn->cm_id->route.addr.dst_addr)->sin_port)); + + dapl_evd_connection_callback( + conn, IB_CME_DESTINATION_UNREACHABLE, NULL, conn->ep); + } break; case RDMA_CM_EVENT_DEVICE_REMOVAL: diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index e900b59..fcd8163 100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -113,9 +113,10 @@ static int getipaddr(char *name, char *addr, int len) /* retry using network device name */ ret = getipaddr_netdev(name,addr,len); if (ret) { - dapl_dbg_log(DAPL_DBG_TYPE_WARN, - " getipaddr: invalid name, addr, or netdev(%s)\n", - name); + dapl_log(DAPL_DBG_TYPE_ERR, + " open_hca: getaddr_netdev ERROR:" + " %s. 
Is %s configured?\n", + strerror(errno), name); return ret; } } else { @@ -238,18 +239,19 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_NAME hca_name, IN DAPL_HCA *hca_ptr) /* cm_id will bind local device/GID based on IP address */ if (rdma_create_id(g_cm_events, &cm_id, (void*)hca_ptr, RDMA_PS_TCP)) { - dapl_dbg_log (DAPL_DBG_TYPE_ERR, - " open_hca: ERR with RDMA channel: %s\n", - strerror(errno)); + dapl_log(DAPL_DBG_TYPE_ERR, + " open_hca: rdma_create_id ERR %s\n", + strerror(errno)); return DAT_INTERNAL_ERROR; } ret = rdma_bind_addr(cm_id, (struct sockaddr *)&hca_ptr->hca_address); if ((ret) || (cm_id->verbs == NULL)) { rdma_destroy_id(cm_id); - dapl_dbg_log(DAPL_DBG_TYPE_UTIL, - " open_hca: ERR bind (%d) %s \n", - ret, strerror(-ret)); + dapl_log(DAPL_DBG_TYPE_ERR, + " open_hca: rdma_bind ERR %s." + " Is %s configured?\n", + strerror(errno),hca_name); return DAT_INVALID_ADDRESS; } @@ -282,9 +284,9 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_NAME hca_name, IN DAPL_HCA *hca_ptr) hca_ptr->ib_trans.ib_cq = ibv_create_comp_channel(hca_ptr->ib_hca_handle); if (hca_ptr->ib_trans.ib_cq == NULL) { - dapl_dbg_log (DAPL_DBG_TYPE_ERR, - " open_hca: ERR with CQ channel: %s\n", - strerror(errno)); + dapl_log(DAPL_DBG_TYPE_ERR, + " open_hca: ibv_create_comp_channel ERR %s\n", + strerror(errno)); goto bail; } dapl_dbg_log (DAPL_DBG_TYPE_UTIL, @@ -294,9 +296,10 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_NAME hca_name, IN DAPL_HCA *hca_ptr) opts = fcntl(hca_ptr->ib_trans.ib_cq->fd, F_GETFL); /* uCQ */ if (opts < 0 || fcntl(hca_ptr->ib_trans.ib_cq->fd, F_SETFL, opts | O_NONBLOCK) < 0) { - dapl_dbg_log (DAPL_DBG_TYPE_ERR, - " open_hca: ERR with CQ FD (%d)\n", - hca_ptr->ib_trans.ib_cq->fd); + dapl_log(DAPL_DBG_TYPE_ERR, + " open_hca: fcntl on ib_cq->fd %d ERR %d %s\n", + hca_ptr->ib_trans.ib_cq->fd, opts, + strerror(errno)); goto bail; } @@ -453,19 +456,13 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA *hca_ptr, ia_attr->ia_address_ptr = (DAT_IA_ADDRESS_PTR)&hca_ptr->hca_address; - 
dapl_dbg_log(DAPL_DBG_TYPE_UTIL, - " query_hca: %s %s %d.%d.%d.%d\n", hca_ptr->name, + dapl_log(DAPL_DBG_TYPE_UTIL, + "dapl_query_hca: %s %s %s\n", hca_ptr->name, ((struct sockaddr_in *) ia_attr->ia_address_ptr)->sin_family == AF_INET ? "AF_INET":"AF_INET6", - ((struct sockaddr_in *) - ia_attr->ia_address_ptr)->sin_addr.s_addr >> 0 & 0xff, - ((struct sockaddr_in *) - ia_attr->ia_address_ptr)->sin_addr.s_addr >> 8 & 0xff, - ((struct sockaddr_in *) - ia_attr->ia_address_ptr)->sin_addr.s_addr >> 16 & 0xff, - ((struct sockaddr_in *) - ia_attr->ia_address_ptr)->sin_addr.s_addr >> 24 & 0xff); + inet_ntoa(((struct sockaddr_in *) + ia_attr->ia_address_ptr)->sin_addr)); ia_attr->hardware_version_major = dev_attr.hw_ver; ia_attr->max_eps = dev_attr.max_qp; @@ -500,14 +497,15 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA *hca_ptr, ia_attr->extension_supported = DAT_EXTENSION_IB; ia_attr->extension_version = DAT_IB_EXTENSION_VERSION; #endif - dapl_dbg_log(DAPL_DBG_TYPE_UTIL, - " query_hca: (ver=%x) ep %d ep_q %d evd %d evd_q %d\n", + dapl_log(DAPL_DBG_TYPE_UTIL, + "dapl_query_hca: (ver=%x) ep's %d ep_q %d" + " evd's %d evd_q %d\n", ia_attr->hardware_version_major, ia_attr->max_eps, ia_attr->max_dto_per_ep, ia_attr->max_evds, ia_attr->max_evd_qlen ); - dapl_dbg_log(DAPL_DBG_TYPE_UTIL, - " query_hca: msg %llu rdma %llu iov %d lmr %d rmr %d" - " rd_io %d inline=%d\n", + dapl_log(DAPL_DBG_TYPE_UTIL, + "dapl_query_hca: msg %llu rdma %llu iov's %d" + " lmr %d rmr %d rd_io %d inline=%d\n", ia_attr->max_mtu_size, ia_attr->max_rdma_size, ia_attr->max_iov_segments_per_dto, ia_attr->max_lmrs, ia_attr->max_rmrs, ia_attr->max_rdma_read_per_ep_in, @@ -526,8 +524,9 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA *hca_ptr, ep_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom; ep_attr->max_rdma_read_iov= dev_attr.max_sge; ep_attr->max_rdma_write_iov= dev_attr.max_sge; - dapl_dbg_log(DAPL_DBG_TYPE_UTIL, - " query_hca: MAX msg %llu dto %d iov %d rdma i%d,o%d\n", + dapl_log(DAPL_DBG_TYPE_UTIL, + 
"dapl_query_hca: MAX msg %llu dto %d iov %d" + " rdma i%d,o%d\n", ep_attr->max_mtu_size, ep_attr->max_recv_dtos, ep_attr->max_recv_iov, ep_attr->max_rdma_read_in, ep_attr->max_rdma_read_out); @@ -708,9 +707,9 @@ void dapli_async_event_cb(struct _ib_hca_transport *hca) struct dapl_ep *evd_ptr = event.element.cq->cq_context; - dapl_dbg_log( - DAPL_DBG_TYPE_WARN, - " async_event CQ (%p) ERR %d\n", + dapl_log( + DAPL_DBG_TYPE_ERR, + "dapl async_event CQ (%p) ERR %d\n", evd_ptr, event.event_type); /* report up if async callback still setup */ @@ -724,7 +723,7 @@ void dapli_async_event_cb(struct _ib_hca_transport *hca) case IBV_EVENT_COMM_EST: { /* Received msgs on connected QP before RTU */ - dapl_dbg_log( + dapl_log( DAPL_DBG_TYPE_UTIL, " async_event COMM_EST(%p) rdata beat RTU\n", event.element.qp); @@ -742,9 +741,9 @@ void dapli_async_event_cb(struct _ib_hca_transport *hca) struct dapl_ep *ep_ptr = event.element.qp->qp_context; - dapl_dbg_log( - DAPL_DBG_TYPE_WARN, - " async_event QP (%p) ERR %d\n", + dapl_log( + DAPL_DBG_TYPE_ERR, + "dapl async_event QP (%p) ERR %d\n", ep_ptr, event.event_type); /* report up if async callback still setup */ @@ -764,8 +763,8 @@ void dapli_async_event_cb(struct _ib_hca_transport *hca) case IBV_EVENT_PKEY_CHANGE: case IBV_EVENT_SM_CHANGE: { - dapl_dbg_log(DAPL_DBG_TYPE_WARN, - " async_event: DEV ERR %d\n", + dapl_log(DAPL_DBG_TYPE_WARN, + "dapl async_event: DEV ERR %d\n", event.event_type); /* report up if async callback still setup */ @@ -778,13 +777,13 @@ void dapli_async_event_cb(struct _ib_hca_transport *hca) } case IBV_EVENT_CLIENT_REREGISTER: /* no need to report this event this time */ - dapl_dbg_log (DAPL_DBG_TYPE_WARN, + dapl_log (DAPL_DBG_TYPE_UTIL, " async_event: IBV_EVENT_CLIENT_REREGISTER\n"); break; default: - dapl_dbg_log (DAPL_DBG_TYPE_WARN, - " async_event: %d UNKNOWN\n", + dapl_log (DAPL_DBG_TYPE_WARN, + "dapl async_event: %d UNKNOWN\n", event.event_type); break; diff --git a/dapl/udapl/dapl_init.c 
b/dapl/udapl/dapl_init.c index ce92f9f..a4afba5 100644 --- a/dapl/udapl/dapl_init.c +++ b/dapl/udapl/dapl_init.c @@ -70,16 +70,19 @@ void dapl_init ( void ) { DAT_RETURN dat_status; -#if defined(DAPL_DBG) - dapl_dbg_log (DAPL_DBG_TYPE_UTIL, "DAPL: (dapl_init)\n"); - /* set up debug type */ g_dapl_dbg_type = dapl_os_get_env_val ( "DAPL_DBG_TYPE", - DAPL_DBG_TYPE_ERR | DAPL_DBG_TYPE_WARN); + DAPL_DBG_TYPE_ERR ); /* set up debug destination */ g_dapl_dbg_dest = dapl_os_get_env_val ( "DAPL_DBG_DEST", DAPL_DBG_DEST_STDOUT ); -#endif /* DAPL_DBG */ + + /* open log file on first logging call if necessary */ + if (g_dapl_dbg_dest & DAPL_DBG_DEST_SYSLOG) + openlog("libdapl", LOG_ODELAY|LOG_PID|LOG_CONS, LOG_USER); + + dapl_log (DAPL_DBG_TYPE_UTIL, "dapl_init: dbg_type=0x%x,dbg_dest=0x%x\n", + g_dapl_dbg_type, g_dapl_dbg_dest); /* See if the user is on a loopback setup */ g_dapl_loopback_connection = dapl_os_get_env_bool ( "DAPL_LOOPBACK" ); @@ -156,6 +159,9 @@ void dapl_fini ( void ) dapl_dbg_log (DAPL_DBG_TYPE_UTIL, "DAPL: Exit (dapl_fini)\n"); + if (g_dapl_dbg_dest & DAPL_DBG_DEST_SYSLOG) + closelog(); + return; } diff --git a/dapl/udapl/linux/dapl_osd.h b/dapl/udapl/linux/dapl_osd.h index caf971f..42ced41 100644 --- a/dapl/udapl/linux/dapl_osd.h +++ b/dapl/udapl/linux/dapl_osd.h @@ -541,7 +541,7 @@ dapl_os_strtol(const char *nptr, char **endptr, int base) #define dapl_os_assert(expression) assert(expression) #define dapl_os_printf(...) 
printf(__VA_ARGS__) #define dapl_os_vprintf(fmt,args) vprintf(fmt,args) -#define dapl_os_syslog(fmt,args) vsyslog (LOG_USER | LOG_DEBUG,fmt,args) +#define dapl_os_syslog(fmt,args) vsyslog(LOG_USER|LOG_WARNING,fmt,args) -- 1.5.2.5 From arlin.r.davis at intel.com Fri Apr 4 16:41:17 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Fri, 4 Apr 2008 16:41:17 -0700 Subject: [ofa-general] [PATCH 4/4][v2] dapl: update vendor information for OFA v2 provider Message-ID: Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/include/dapl_vendor.h | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dapl/include/dapl_vendor.h b/dapl/include/dapl_vendor.h index e87467a..f6d3cc0 100644 --- a/dapl/include/dapl_vendor.h +++ b/dapl/include/dapl_vendor.h @@ -52,14 +52,14 @@ * Product name of the adapter. * Returned in DAT_IA_ATTR.adapter_name */ -#define VN_ADAPTER_NAME "Generic InfiniBand HCA" +#define VN_ADAPTER_NAME "Generic OpenFabrics HCA" /* * Vendor name * Returned in DAT_IA_ATTR.vendor_name */ -#define VN_VENDOR_NAME "DAPL Reference Implementation" +#define VN_VENDOR_NAME "DAPL OpenFabrics Implementation" /********************************************************************** @@ -78,7 +78,7 @@ * DAT_PROVIDER_ATTR.provider_version_minor */ -#define VN_PROVIDER_MAJOR 1 +#define VN_PROVIDER_MAJOR 2 #define VN_PROVIDER_MINOR 0 /* -- 1.5.2.5 From arlin.r.davis at intel.com Fri Apr 4 16:41:05 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Fri, 4 Apr 2008 16:41:05 -0700 Subject: [ofa-general] [PATCH 3/4][v2] dapl: add provider vendor revision data in private data with reject Message-ID: Add 1 byte header containing provider/vendor major revision to distinguish between consumer and non-consumer rejects. Validate size of consumer reject privated data. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_cm.c | 39 ++++++++++++++++++++++++++++++++------- dapl/openib_cma/dapl_ib_util.h | 2 +- 2 files changed, 33 insertions(+), 8 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_cm.c b/dapl/openib_cma/dapl_ib_cm.c index 33f299d..dcdcc5b 100755 --- a/dapl/openib_cma/dapl_ib_cm.c +++ b/dapl/openib_cma/dapl_ib_cm.c @@ -45,6 +45,7 @@ #include "dapl_cr_util.h" #include "dapl_name_service.h" #include "dapl_ib_util.h" +#include "dapl_vendor.h" #include #include #include @@ -79,6 +80,14 @@ static inline uint64_t cpu_to_be64(uint64_t x) { return x; } #define PORT_TO_SID(p) ntohs(p) +/* private data header to validate consumer rejects versus abnormal events */ +struct dapl_pdata_hdr { + uint8_t version; +}; +static struct dapl_pdata_hdr pdata_hdr = { + .version = VN_PROVIDER_MAJOR +}; + static void dapli_addr_resolve(struct dapl_cm_id *conn) { int ret; @@ -900,6 +909,7 @@ dapls_ib_reject_connection( IN const DAT_PVOID private_data) { int ret; + int offset = sizeof(struct dapl_pdata_hdr); dapl_dbg_log(DAPL_DBG_TYPE_CM, " reject(cm_handle %p reason %x)\n", @@ -909,14 +919,29 @@ dapls_ib_reject_connection( dapl_dbg_log(DAPL_DBG_TYPE_ERR, " reject: invalid handle: reason %d\n", reason); - return DAT_SUCCESS; + return DAT_ERROR (DAT_INVALID_HANDLE,DAT_INVALID_HANDLE_CR); } - + + if (private_data_size > + dapls_ib_private_data_size( + NULL, IB_MAX_REJ_PDATA_SIZE, cm_handle->hca)) + return DAT_ERROR(DAT_INVALID_PARAMETER, DAT_INVALID_ARG3); + + /* setup pdata_hdr and users data, in CR pdata buffer */ + dapl_os_memcpy(cm_handle->p_data, &pdata_hdr, offset); + if (private_data_size) + dapl_os_memcpy(cm_handle->p_data+offset, + private_data, + private_data_size); + /* - * Private data is needed so peer can determine real application - * reject from an abnormal application termination + * Always some private data with reject so active peer can + * determine real application reject from an abnormal + * 
application termination */ - ret = rdma_reject(cm_handle->cm_id, NULL, 0); + ret = rdma_reject(cm_handle->cm_id, + cm_handle->p_data, + offset+private_data_size); dapli_destroy_conn(cm_handle); return dapl_convert_errno(ret, "reject"); @@ -1005,7 +1030,7 @@ int dapls_ib_private_data_size( IN DAPL_PRIVATE *prd_ptr, if (hca_ptr->ib_hca_handle->device->transport_type == IBV_TRANSPORT_IWARP) - return(IWARP_MAX_PDATA_SIZE); + return(IWARP_MAX_PDATA_SIZE-sizeof(struct dapl_pdata_hdr)); switch(conn_op) { @@ -1016,7 +1041,7 @@ int dapls_ib_private_data_size( IN DAPL_PRIVATE *prd_ptr, size = IB_MAX_REP_PDATA_SIZE; break; case DAPL_PDATA_CONN_REJ: - size = IB_MAX_REJ_PDATA_SIZE; + size = IB_MAX_REJ_PDATA_SIZE-sizeof(struct dapl_pdata_hdr); break; case DAPL_PDATA_CONN_DREQ: size = IB_MAX_DREQ_PDATA_SIZE; diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h index f35cb9d..370f3b1 100755 --- a/dapl/openib_cma/dapl_ib_util.h +++ b/dapl/openib_cma/dapl_ib_util.h @@ -181,7 +181,7 @@ struct dapl_cm_id { struct rdma_conn_param params; DAT_SOCK_ADDR6 r_addr; int p_len; - unsigned char p_data[IB_MAX_DREP_PDATA_SIZE]; + unsigned char p_data[256]; /* dapl max private data size */ }; typedef struct dapl_cm_id *dp_ib_cm_handle_t; -- 1.5.2.5 From bs at q-leap.de Fri Apr 4 16:45:47 2008 From: bs at q-leap.de (Bernd Schubert) Date: Sat, 5 Apr 2008 01:45:47 +0200 Subject: [ofa-general] XmtDiscards In-Reply-To: <20080404152932.5e294e47.weiny2@llnl.gov> References: <200804050012.39893.bs@q-leap.de> <20080404152932.5e294e47.weiny2@llnl.gov> Message-ID: <20080404234547.GA17618@lanczos.q-leap.de> On Fri, Apr 04, 2008 at 03:29:32PM -0700, Ira Weiny wrote: > On Sat, 5 Apr 2008 00:12:39 +0200 > Bernd Schubert wrote: > > > Hello, > > > > after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten > > much better there, at least no further RcvSwRelayErrors, even when the > > cluster is in idle state and so far also no SymbolErrors, which we also have > > seens 
before. > > > > However, after I just started a lustre stress test on 50 clients (to a lustre > > storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about > > 9000 XmtDiscards within 30 minutes. > > Yea, those are bad. > > > > > > Searching for this error I find "This is a symptom of congestion and may > > require tweaking either HOQ or switch lifetime values". > > Well, I have to admit I neither know what HOQ is, nor do I know how to tweak > > it. I also have no idea how to set switch lifetime values. I guess this > > isn't related to the opensm timeout option, is it? > > Yes you should adjust these values. > > > > > > Hmm, I just found a Cisco PDF describing how to set the lifetime on these > > switches, but is this also possible on Flextronics switches? > > > > I don't know about the Vendor SMs but in opensm look for the following options > in the opensm.opts file (Default path is: /var/cache/opensm): > > # The code of maximal time a packet can wait at the head of > # transmission queue. > # The actual time is 4.096usec * 2^ > # The value 0x14 disables this mechanism > head_of_queue_lifetime 0x12 > > # The maximal time a packet can wait at the head of queue on > # switch port connected to a CA or router port > leaf_head_of_queue_lifetime 0x0c Hmm, I first increased head_of_queue_lifetime to 0x13 and leaf_head_of_queue_lifetime to 0x20, but this didn't make the error go away. So I increased head_of_queue_lifetime to 0x15 and leaf_head_of_queue_lifetime to 0x50, but this made the entire fabric crash.
On the node of the master opensm I got an endless number of messages like these: Apr 5 01:35:03 pfs1n2 kernel: [705448.344542] NETDEV WATCHDOG: ib0: transmit timed out Apr 5 01:35:03 pfs1n2 kernel: [705448.349814] ib0: transmit timeout: latency 411908 msecs Apr 5 01:35:03 pfs1n2 kernel: [705448.355364] ib0: queue stopped 1, tx_head 441, tx_tail 377 Apr 5 01:35:04 pfs1n2 kernel: [705449.343495] NETDEV WATCHDOG: ib0: transmit timed out The slave opensm also went into D-state and is not killable anymore :( Seems I have to be very careful with these settings... Thanks for your help, Bernd From arlin.r.davis at intel.com Fri Apr 4 16:40:10 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Fri, 4 Apr 2008 16:40:10 -0700 Subject: [ofa-general] [PATCH 1/4][v2] dapl: add support for private data in CR reject. Message-ID: <000001c896ad$3b2d6b00$14fd070a@amr.corp.intel.com> Private data support via dat_cr_reject was added to the v2 DAT specification but dapl was never extended to support it at the provider level. Add support in OFA uDAPL provider.
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_adapter_util.h | 6 ++++-- dapl/common/dapl_cr_callback.c | 9 ++++++--- dapl/common/dapl_cr_reject.c | 3 ++- dapl/ibal-scm/dapl_ibal-scm_cm.c | 4 +++- dapl/ibal/dapl_ibal_cm.c | 4 +++- dapl/openib/dapl_ib_cm.c | 4 +++- dapl/openib_cma/dapl_ib_cm.c | 6 +++++- dapl/openib_scm/dapl_ib_cm.c | 4 +++- 8 files changed, 29 insertions(+), 11 deletions(-) diff --git a/dapl/common/dapl_adapter_util.h b/dapl/common/dapl_adapter_util.h index d664bf6..43175a9 100755 --- a/dapl/common/dapl_adapter_util.h +++ b/dapl/common/dapl_adapter_util.h @@ -112,8 +112,10 @@ DAT_RETURN dapls_ib_accept_connection ( IN const DAT_PVOID private_data); DAT_RETURN dapls_ib_reject_connection ( - IN dp_ib_cm_handle_t cm_handle, - IN int reject_reason); + IN dp_ib_cm_handle_t cm_handle, + IN int reject_reason, + IN DAT_COUNT private_data_size, + IN const DAT_PVOID private_data); DAT_RETURN dapls_ib_setup_async_callback ( IN DAPL_IA *ia_ptr, diff --git a/dapl/common/dapl_cr_callback.c b/dapl/common/dapl_cr_callback.c index 46d2b4c..aafdbfb 100644 --- a/dapl/common/dapl_cr_callback.c +++ b/dapl/common/dapl_cr_callback.c @@ -173,7 +173,8 @@ dapls_cr_callback ( dapl_dbg_log (DAPL_DBG_TYPE_CM, "---> dapls_cr_callback: conn event on down SP\n"); (void)dapls_ib_reject_connection (ib_cm_handle, - DAT_CONNECTION_EVENT_UNREACHABLE ); + DAT_CONNECTION_EVENT_UNREACHABLE, + 0, NULL); return; } @@ -300,7 +301,8 @@ dapls_cr_callback ( { /* The event post failed; take appropriate action. 
*/ (void)dapls_ib_reject_connection ( ib_cm_handle, - DAT_CONNECTION_EVENT_BROKEN); + DAT_CONNECTION_EVENT_BROKEN, + 0, NULL); return; } @@ -456,7 +458,8 @@ dapli_connection_request ( { dapls_cr_free (cr_ptr); (void)dapls_ib_reject_connection (ib_cm_handle, - DAT_CONNECTION_EVENT_BROKEN); + DAT_CONNECTION_EVENT_BROKEN, + 0, NULL); /* Take the CR off the list, we can't use it */ dapl_os_lock (&sp_ptr->header.lock); diff --git a/dapl/common/dapl_cr_reject.c b/dapl/common/dapl_cr_reject.c index d6842b3..029cdfa 100755 --- a/dapl/common/dapl_cr_reject.c +++ b/dapl/common/dapl_cr_reject.c @@ -97,7 +97,8 @@ dapl_cr_reject ( } dat_status = dapls_ib_reject_connection ( cr_ptr->ib_cm_handle, - IB_CM_REJ_REASON_CONSUMER_REJ ); + IB_CM_REJ_REASON_CONSUMER_REJ, + pdata_size, pdata ); if ( dat_status != DAT_SUCCESS) { diff --git a/dapl/ibal-scm/dapl_ibal-scm_cm.c b/dapl/ibal-scm/dapl_ibal-scm_cm.c index fcf5215..df83008 100644 --- a/dapl/ibal-scm/dapl_ibal-scm_cm.c +++ b/dapl/ibal-scm/dapl_ibal-scm_cm.c @@ -951,7 +951,9 @@ dapls_ib_accept_connection ( DAT_RETURN dapls_ib_reject_connection ( IN dp_ib_cm_handle_t ib_cm_handle, - IN int reject_reason ) + IN int reject_reason, + IN DAT_COUNT private_data_size, + IN const DAT_PVOID private_data) { ib_cm_srvc_handle_t cm_ptr = ib_cm_handle; diff --git a/dapl/ibal/dapl_ibal_cm.c b/dapl/ibal/dapl_ibal_cm.c index 6cd652f..a986430 100644 --- a/dapl/ibal/dapl_ibal_cm.c +++ b/dapl/ibal/dapl_ibal_cm.c @@ -1228,7 +1228,9 @@ dapls_ib_remove_conn_listener ( */ DAT_RETURN dapls_ib_reject_connection ( IN dp_ib_cm_handle_t ib_cm_handle, - IN int reject_reason ) + IN int reject_reason, + IN DAT_COUNT private_data_size, + IN const DAT_PVOID private_data) { ib_api_status_t ib_status; ib_cm_rej_t cm_rej; diff --git a/dapl/openib/dapl_ib_cm.c b/dapl/openib/dapl_ib_cm.c index 76d5968..c887a0b 100644 --- a/dapl/openib/dapl_ib_cm.c +++ b/dapl/openib/dapl_ib_cm.c @@ -971,7 +971,9 @@ reject: DAT_RETURN dapls_ib_reject_connection ( IN ib_cm_handle_t 
cm_handle, - IN int reject_reason ) + IN int reject_reason, + IN DAT_COUNT private_data_size, + IN const DAT_PVOID private_data) { int status; diff --git a/dapl/openib_cma/dapl_ib_cm.c b/dapl/openib_cma/dapl_ib_cm.c index cf79142..a040ffb 100755 --- a/dapl/openib_cma/dapl_ib_cm.c +++ b/dapl/openib_cma/dapl_ib_cm.c @@ -890,7 +890,11 @@ bail: * */ DAT_RETURN -dapls_ib_reject_connection(IN dp_ib_cm_handle_t cm_handle, IN int reason) +dapls_ib_reject_connection( + IN dp_ib_cm_handle_t cm_handle, + IN int reason, + IN DAT_COUNT private_data_size, + IN const DAT_PVOID private_data) { int ret; diff --git a/dapl/openib_scm/dapl_ib_cm.c b/dapl/openib_scm/dapl_ib_cm.c index 485ab9b..94e3959 100644 --- a/dapl/openib_scm/dapl_ib_cm.c +++ b/dapl/openib_scm/dapl_ib_cm.c @@ -759,7 +759,9 @@ dapls_ib_accept_connection ( DAT_RETURN dapls_ib_reject_connection ( IN ib_cm_handle_t ib_cm_handle, - IN int reject_reason ) + IN int reject_reason, + IN DAT_COUNT private_data_size, + IN const DAT_PVOID private_data) { ib_cm_srvc_handle_t cm_ptr = ib_cm_handle; -- 1.5.2.5 From sean.hefty at intel.com Fri Apr 4 16:53:09 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 4 Apr 2008 16:53:09 -0700 Subject: [ofa-general] [PATCH 3/4][v2] dapl: add provider vendor revisiondata in private data with reject In-Reply-To: References: Message-ID: <001301c896af$09bacf20$3c98070a@amr.corp.intel.com> >Add 1 byte header containing provider/vendor major revision >to distinguish between consumer and non-consumer rejects. >Validate size of consumer reject privated data. Not saying this is a bad idea, but doesn't it break the protocol with existing DAPL? It also shifts all of the existing private data off by a byte, which could result in odd data alignment. 
- Sean

From andrea at qumranet.com Fri Apr 4 17:23:30 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Sat, 5 Apr 2008 02:23:30 +0200
Subject: [ofa-general] Re: [PATCH] mmu notifier #v11
In-Reply-To:
References: <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> <20080403151908.GB9603@duo.random> <20080404202055.GA14784@duo.random>
Message-ID: <20080405002330.GF14784@duo.random>

On Fri, Apr 04, 2008 at 03:06:18PM -0700, Christoph Lameter wrote:
> Adds some comments. Still objectionable is the multiple ways of
> invalidating pages in #v11. Callout now has similar locking to emm.

range_begin exists because range_end is called after the page has already been freed. invalidate_page is called _before_ the page is freed but _after_ the pte has been zapped.

In short, when working with single pages it's a waste to block the secondary-mmu page fault, because it's zero cost to invalidate_page before put_page. Not even GRU needs to do that. Instead, for the multiple-pte-zapping case we have to call range_end _after_ the pages are already freed, so that there is a single range_end call for a huge amount of address space. That is why we need a range_begin for the subsystems not using page pinning, for example.

When working with single pages (try_to_unmap_one, do_wp_page), invalidate_page avoids blocking the secondary mmu page fault and is in turn faster. Besides avoiding the need to serialize the secondary mmu page fault, invalidate_page also reduces the overhead when the mmu notifiers are disarmed (i.e. kvm not running).
From andrea at qumranet.com Fri Apr 4 17:41:27 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Sat, 5 Apr 2008 02:41:27 +0200
Subject: [ofa-general] Re: [patch 01/10] emm: mm_lock: Lock a process against reclaim
In-Reply-To: <47F6B5EA.6060106@goop.org>
References: <20080404223048.374852899@sgi.com> <20080404223131.271668133@sgi.com> <47F6B5EA.6060106@goop.org>
Message-ID: <20080405004127.GG14784@duo.random>

On Fri, Apr 04, 2008 at 04:12:42PM -0700, Jeremy Fitzhardinge wrote:
> I think you can break this if() down a bit:
>
> if (!(vma->vm_file && vma->vm_file->f_mapping))
>     continue;

It makes no difference at runtime; coding style preferences are quite subjective.

> So this is an O(n^2) algorithm to take the i_mmap_locks from low to high
> order? A comment would be nice. And O(n^2)? Ouch. How often is it
> called?

It's called a single time, when the mmu notifier is registered. It's a very slow path of course. Any other approach to reduce the complexity would require memory allocations and would force mmu_notifier_register to return -ENOMEM failure. It didn't seem worth it.

> And is it necessary to mush lock and unlock together? Unlock ordering
> doesn't matter, so you should just be able to have a much simpler loop, no?

That avoids duplicating .text. Originally they were separated. unlock can't be a simpler loop because I didn't reserve vm_flags bitflags to do a single O(N) loop for unlock. If you do malloc+fork+munmap, two vmas will point to the same anon-vma lock; that's why the unlock isn't simpler unless I mark what I locked with a vm_flags bitflag.

From boris at mellanox.com Fri Apr 4 17:48:18 2008
From: boris at mellanox.com (Boris Shpolyansky)
Date: Fri, 4 Apr 2008 17:48:18 -0700
Subject: [ofa-general] XmtDiscards
In-Reply-To: <20080404234547.GA17618@lanczos.q-leap.de>
Message-ID: <1E3DCD1C63492545881FACB6063A57C1023F6B30@mtiexch01.mti.com>

Bernd,

0x14 is the maximal value for HOQ lifetime, which effectively disables the mechanism.
I think you shouldn't exceed this value.

Boris

-----Original Message-----
From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Bernd Schubert
Sent: Friday, April 04, 2008 4:46 PM
To: Ira Weiny
Cc: general at lists.openfabrics.org
Subject: Re: [ofa-general] XmtDiscards

On Fri, Apr 04, 2008 at 03:29:32PM -0700, Ira Weiny wrote:
> On Sat, 5 Apr 2008 00:12:39 +0200
> Bernd Schubert wrote:
>
> > Hello,
> >
> > after I upgraded one of our clusters to opensm-3.2.1 it seems to
> > have gotten much better there, at least no further RcvSwRelayErrors,
> > even when the cluster is in idle state and so far also no
> > SymbolErrors, which we also have seen before.
> >
> > However, after I just started a lustre stress test on 50 clients (to
> > a lustre storage system with 20 OSS servers and 60 OSTs),
> > ibcheckerrors reports about 9000 XmtDiscards within 30 minutes.
>
> Yea, those are bad.
>
> > Searching for this error I find "This is a symptom of congestion and
> > may require tweaking either HOQ or switch lifetime values".
> > Well, I have to admit I neither know what HOQ is, nor do I know how
> > to tweak it. I also do not have an idea how to set switch lifetime
> > values. I guess this isn't related to the opensm timeout option, is it?
>
> Yes, you should adjust these values.
>
> > Hmm, I just found a Cisco PDF describing how to set the lifetime on
> > these switches, but is this also possible on Flextronics switches?
>
> I don't know about the Vendor SMs but in opensm look for the following
> options in the opensm.opts file (Default path is: /var/cache/opensm):
>
> # The code of maximal time a packet can wait at the head of
> # transmission queue.
> # The actual time is 4.096usec * 2^
> # The value 0x14 disables this mechanism
> head_of_queue_lifetime 0x12
>
> # The maximal time a packet can wait at the head of queue on
> # switch port connected to a CA or router port
> leaf_head_of_queue_lifetime 0x0c

Hmm, I first increased head_of_queue_lifetime to 0x13 and leaf_head_of_queue_lifetime to 0x20, but this didn't make the error go away. So I increased head_of_queue_lifetime to 0x15 and leaf_head_of_queue_lifetime to 0x50, but this made the fabric crash entirely. On the node of the master opensm I got an endless number of messages like these:

Apr 5 01:35:03 pfs1n2 kernel: [705448.344542] NETDEV WATCHDOG: ib0: transmit timed out
Apr 5 01:35:03 pfs1n2 kernel: [705448.349814] ib0: transmit timeout: latency 411908 msecs
Apr 5 01:35:03 pfs1n2 kernel: [705448.355364] ib0: queue stopped 1, tx_head 441, tx_tail 377
Apr 5 01:35:04 pfs1n2 kernel: [705449.343495] NETDEV WATCHDOG: ib0: transmit timed out

The slave opensm also went into D-state and is not killable anymore :(

Seems I have to be very careful with these settings...

Thanks for your help,
Bernd
_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From andrea at qumranet.com Fri Apr 4 17:57:59 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Sat, 5 Apr 2008 02:57:59 +0200
Subject: [ofa-general] Re: [patch 02/10] emm: notifier logic
In-Reply-To: <20080404223131.469710551@sgi.com>
References: <20080404223048.374852899@sgi.com> <20080404223131.469710551@sgi.com>
Message-ID: <20080405005759.GH14784@duo.random>

On Fri, Apr 04, 2008 at 03:30:50PM -0700, Christoph Lameter wrote:
> + mm_lock(mm);
> + e->next = mm->emm_notifier;
> + /*
> + * The update to emm_notifier (e->next) must be visible
> + * before the pointer becomes visible.
> + * rcu_assign_pointer() does exactly what we need.
> + */
> + rcu_assign_pointer(mm->emm_notifier, e);
> + mm_unlock(mm);

My mm_lock solution makes all rcu serialization an unnecessary overhead, so you should remove it like I already did in #v11. If that weren't the case, then mm_lock wouldn't be a definitive fix for the race.

> + e = rcu_dereference(e->next);

Same here.

From arlin.r.davis at intel.com Fri Apr 4 23:52:04 2008
From: arlin.r.davis at intel.com (Davis, Arlin R)
Date: Fri, 4 Apr 2008 23:52:04 -0700
Subject: [ofa-general] [PATCH 3/4][v2] dapl: add provider vendor revisiondata in private data with reject
In-Reply-To: <001301c896af$09bacf20$3c98070a@amr.corp.intel.com>
References: <001301c896af$09bacf20$3c98070a@amr.corp.intel.com>
Message-ID:

>>Add 1 byte header containing provider/vendor major revision
>>to distinguish between consumer and non-consumer rejects.
>>Validate size of consumer reject private data.
>
>Not saying this is a bad idea, but doesn't it break the
>protocol with existing DAPL? It also shifts all of the
>existing private data off by a byte, which could result
>in odd data alignment.

If the cma/cm could guarantee that IB_CM_REJ_CONSUMER_DEFINED is always an indication of a true consumer-called reject versus abnormal termination, then I would not need to add the provider header in reject private data. Anyway, private data delivery in rejects is new for DAT v2 and is exposed for the first time with this patch set. There is no compatibility issue with existing DAPL because reject private data has been ignored up until this point. I will adjust for odd data alignment.

Thanks for the feedback,
-arlin
From hrosenstock at xsigo.com Sat Apr 5 06:17:59 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Sat, 05 Apr 2008 06:17:59 -0700 Subject: [ofa-general] XmtDiscards In-Reply-To: <1E3DCD1C63492545881FACB6063A57C1023F6B30@mtiexch01.mti.com> References: <1E3DCD1C63492545881FACB6063A57C1023F6B30@mtiexch01.mti.com> Message-ID: <1207401479.15625.221.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-04 at 17:48 -0700, Boris Shpolyansky wrote: > Bernd, > > 0x14 is the maximal value for HOQ lifetime, which effectively disables > the mechanism. I think you shouldn't exceed this value. True about the maximal value but any 5 bit value > 19 (up through 31) should effectively be the same thing according to the spec. I also think that OpenSM could do a better job validating and setting this and other similar optional parameters. -- Hal > Boris > > -----Original Message----- > From: general-bounces at lists.openfabrics.org > [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Bernd > Schubert > Sent: Friday, April 04, 2008 4:46 PM > To: Ira Weiny > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] XmtDiscards > > On Fri, Apr 04, 2008 at 03:29:32PM -0700, Ira Weiny wrote: > > On Sat, 5 Apr 2008 00:12:39 +0200 > > Bernd Schubert wrote: > > > > > Hello, > > > > > > after I upgraded one of our clusters to opensm-3.2.1 it seems to > > > have gotten much better there, at least no further RcvSwRelayErrors, > > > > even when the cluster is in idle state and so far also no > > > SymbolErrors, which we also have seens before. > > > > > > However, after I just started a lustre stress test on 50 clients (to > > > > a lustre storage system with 20 OSS servers and 60 OSTs), > > > ibcheckerrors reports about 9000 XmtDiscards within 30 minutes. > > > > Yea, those are bad. > > > > > > > > Searching for this error I find "This is a symptom of congestion and > > > > may require tweaking either HOQ or switch lifetime values". 
> > > Well, I have to admit I neither know what HOQ is, nor do I know how > > > to tweak it. I also do not have an idea to set switch lifetime > > > values. I guess this isn't related to the opensm timeout option, is > it? > > > > Yes you should adjust these values. > > > > > > > > Hmm, I just found a cisci pdf describing how to set the lifetime on > > > these switches, but is this also possible on Flextronics switches? > > > > > > > I don't know about the Vendor SMs but in opensm look for the following > > > options in the opensm.opts file (Default path is: /var/cache/opensm): > > > > # The code of maximal time a packet can wait at the head of > > # transmission queue. > > # The actual time is 4.096usec * 2^ > > # The value 0x14 disables this mechanism > > head_of_queue_lifetime 0x12 > > > > # The maximal time a packet can wait at the head of queue on > > # switch port connected to a CA or router port > > leaf_head_of_queue_lifetime 0x0c > > Hmm, I first increased head_of_queue_lifetime to 0x13 and > leaf_head_of_queue_lifetime to 0x20, but this didn't make the error go > away. So I increased head_of_queue_lifetime to 0x15 and > leaf_head_of_queue_lifetime to 0x50, but this made the fabric to > entirely crash. On the node of the master opensm I got an endless number > of messages like these: > > Apr 5 01:35:03 pfs1n2 kernel: [705448.344542] NETDEV WATCHDOG: ib0: > transmit timed out Apr 5 01:35:03 pfs1n2 kernel: [705448.349814] ib0: > transmit timeout: latency 411908 msecs Apr 5 01:35:03 pfs1n2 kernel: > [705448.355364] ib0: queue stopped 1, tx_head 441, tx_tail 377 Apr 5 > 01:35:04 pfs1n2 kernel: [705449.343495] NETDEV WATCHDOG: ib0: transmit > timed out > > The slave opensm also went into D-state and is not killable anymore :( > > Seems I have to be very careful with these settings... 
> > > Thanks for your help, > Bernd > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hrosenstock at xsigo.com Sat Apr 5 06:19:43 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Sat, 05 Apr 2008 06:19:43 -0700 Subject: [ofa-general] XmtDiscards In-Reply-To: <200804050012.39893.bs@q-leap.de> References: <200804050012.39893.bs@q-leap.de> Message-ID: <1207401583.15625.224.camel@hrosenstock-ws.xsigo.com> Hi Bernd, On Sat, 2008-04-05 at 00:12 +0200, Bernd Schubert wrote: > Hello, > > after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten > much better there, at least no further RcvSwRelayErrors, even when the > cluster is in idle state and so far also no SymbolErrors, which we also have > seens before. > > However, after I just started a lustre stress test on 50 clients (to a lustre > storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about > 9000 XmtDiscards within 30 minutes. > > Searching for this error I find "This is a symptom of congestion and may > require tweaking either HOQ or switch lifetime values". > Well, I have to admit I neither know what HOQ is, nor do I know how to tweak > it. I also do not have an idea to set switch lifetime values. I guess this > isn't related to the opensm timeout option, is it? > > Hmm, I just found a cisci pdf describing how to set the lifetime on these > switches, but is this also possible on Flextronics switches? What routing algorithm are you using ? 
Rather than play with those switch values, if you are not using up/down, could you try that to see if it helps with the congestion you are seeing ? -- Hal > Thanks for any help, > Bernd From hrosenstock at xsigo.com Sat Apr 5 06:23:52 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Sat, 05 Apr 2008 06:23:52 -0700 Subject: [ofa-general] XmtDiscards In-Reply-To: <20080404234547.GA17618@lanczos.q-leap.de> References: <200804050012.39893.bs@q-leap.de> <20080404152932.5e294e47.weiny2@llnl.gov> <20080404234547.GA17618@lanczos.q-leap.de> Message-ID: <1207401832.15625.229.camel@hrosenstock-ws.xsigo.com> On Sat, 2008-04-05 at 01:45 +0200, Bernd Schubert wrote: > On Fri, Apr 04, 2008 at 03:29:32PM -0700, Ira Weiny wrote: > > On Sat, 5 Apr 2008 00:12:39 +0200 > > Bernd Schubert wrote: > > > > > Hello, > > > > > > after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten > > > much better there, at least no further RcvSwRelayErrors, even when the > > > cluster is in idle state and so far also no SymbolErrors, which we also have > > > seens before. > > > > > > However, after I just started a lustre stress test on 50 clients (to a lustre > > > storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about > > > 9000 XmtDiscards within 30 minutes. > > > > Yea, those are bad. > > > > > > > > Searching for this error I find "This is a symptom of congestion and may > > > require tweaking either HOQ or switch lifetime values". > > > Well, I have to admit I neither know what HOQ is, nor do I know how to tweak > > > it. I also do not have an idea to set switch lifetime values. I guess this > > > isn't related to the opensm timeout option, is it? > > > > Yes you should adjust these values. > > > > > > > > Hmm, I just found a cisci pdf describing how to set the lifetime on these > > > switches, but is this also possible on Flextronics switches? 
> > > > > > > I don't know about the Vendor SMs but in opensm look for the following options > > in the opensm.opts file (Default path is: /var/cache/opensm): > > > > # The code of maximal time a packet can wait at the head of > > # transmission queue. > > # The actual time is 4.096usec * 2^ > > # The value 0x14 disables this mechanism > > head_of_queue_lifetime 0x12 > > > > # The maximal time a packet can wait at the head of queue on > > # switch port connected to a CA or router port > > leaf_head_of_queue_lifetime 0x0c > > Hmm, I first increased head_of_queue_lifetime to 0x13 and > leaf_head_of_queue_lifetime to 0x20, but this didn't make the error > go away. So I increased head_of_queue_lifetime to 0x15 and > leaf_head_of_queue_lifetime to 0x50, but this made the fabric to entirely > crash. On the node of the master opensm I got an endless number of messages > like these: > > Apr 5 01:35:03 pfs1n2 kernel: [705448.344542] NETDEV WATCHDOG: ib0: transmit timed out > Apr 5 01:35:03 pfs1n2 kernel: [705448.349814] ib0: transmit timeout: latency 411908 msecs > Apr 5 01:35:03 pfs1n2 kernel: [705448.355364] ib0: queue stopped 1, tx_head 441, tx_tail 377 > Apr 5 01:35:04 pfs1n2 kernel: [705449.343495] NETDEV WATCHDOG: ib0: transmit timed out > > The slave opensm also went into D-state and is not killable anymore :( > > Seems I have to be very careful with these settings... Yes, those settings are not for the faint of heart and one needs to really understand what changes to those parameters really mean. As far as the slave opensm behavior, this is worth understanding more IMO. 
-- Hal

> Thanks for your help,
> Bernd
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

From Brian.Murrell at Sun.COM Sat Apr 5 09:39:29 2008
From: Brian.Murrell at Sun.COM (Brian J.
Murrell)
Date: Sat, 05 Apr 2008 12:39:29 -0400
Subject: [ofa-general] inconsistent use of --with-backport[-patches]
Message-ID: <1207413569.1750.135.camel@pc.ilinx>

I'm trying to help the 1.3 ofa_kernel package along with figuring out which backport patches to use for my kernel source (because the kernel version does not work nicely with ofed_patch.sh's get_backport_dir() function), and there seems to be an inconsistent use of --with-backport-patches between configure and ofed_patch.sh.

ofed_patch.sh takes the following arguments:

    --with-backport-patches)
        WITH_BACKPORT_PATCHES="yes"
        WITH_PATCH="yes"
        ;;
    --without-backport-patches)
        WITH_BACKPORT_PATCHES="no"
        ;;
    --with-backport)
        shift
        BACKPORT_DIR=$1
        ;;
    --with-backport=*)
        BACKPORT_DIR=`expr "x$1" : 'x[^=]*=\(.*\)'`
        ;;

and configure takes the following backport patches arguments:

    --with-backport-patches)
        ofed_patch_params="$ofed_patch_params $1"
        ;;
    --without-backport-patches)
        ofed_patch_params="$ofed_patch_params $1"
        ;;

As you can see, configure accepts the "--with[out]-backport-patches" arguments and simply passes them on to ofed_patch.sh; however, it does not accept the "--with-backport" argument to actually specify a set to use.

b.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL:

From swise at opengridcomputing.com Sat Apr 5 14:43:56 2008
From: swise at opengridcomputing.com (Steve Wise)
Date: Sat, 05 Apr 2008 16:43:56 -0500
Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ?
In-Reply-To: <47F69D86.9040407@oracle.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <47F63E33.5080709@opengridcomputing.com> <15ddcffd0804041325i17e8f620xaa1ec9ec823afd60@mail.gmail.com> <47F69D86.9040407@oracle.com> Message-ID: <47F7F29C.3040102@opengridcomputing.com> iWARP RFCs: > 5040 A Remote Direct Memory Access Protocol Specification. R. Recio, > B. Metzler, P. Culley, J. Hilland, D. Garcia. October 2007. (Format: > TXT=142247 bytes) (Status: PROPOSED STANDARD) > > 5041 Direct Data Placement over Reliable Transports. H. Shah, J. > Pinkerton, R. Recio, P. Culley. October 2007. (Format: TXT=84642 > bytes) (Status: PROPOSED STANDARD) > > 5042 Direct Data Placement Protocol (DDP) / Remote Direct Memory > Access Protocol (RDMAP) Security. J. Pinkerton, E. Deleganes. October > 2007. (Format: TXT=127453 bytes) (Status: PROPOSED STANDARD) > > 5043 Stream Control Transmission Protocol (SCTP) Direct Data Placement > (DDP) Adaptation. C. Bestler, Ed., R. Stewart, Ed.. October 2007. > (Format: TXT=38740 bytes) (Status: PROPOSED STANDARD) > > 5044 Marker PDU Aligned Framing for TCP Specification. P. Culley, U. > Elzur, R. Recio, S. Bailey, J. Carrier. October 2007. (Format: > TXT=168918 bytes) (Status: PROPOSED STANDARD) > > 5045 Applicability of Remote Direct Memory Access Protocol (RDMA) and > Direct Data Placement (DDP). C. Bestler, Ed., L. Coene. October 2007. > (Format: TXT=51749 bytes) (Status: INFORMATIONAL) For RDMA over TCP, refer to 5040, 5041, and 5044. iWARP Verbs: http://www.rdmaconsortium.org/home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf Steve. From swise at opengridcomputing.com Sat Apr 5 14:55:33 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Sat, 05 Apr 2008 16:55:33 -0500 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? 
In-Reply-To: <47F69D58.6040800@oracle.com>
References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <15ddcffd0804041323v480b4e3fi7061526184ab26b5@mail.gmail.com> <47F69D58.6040800@oracle.com>
Message-ID: <47F7F555.2070208@opengridcomputing.com>

Richard Frank wrote:
> Hmmm - so what happens with IWARP NIC when no buffer is posted on recv q
> and a message arrives ?
>
> The spec sez the implementation can terminate the connection.

That is exactly what ammasso and chelsio's rnics do. The spec doesn't mandate this behavior however. So an incoming SEND could be dropped and not ack'd at the TCP level, forcing the client to retransmit. But I don't know of an rnic that does this.

FYI: The reason the rnic implementation might terminate in this case is due to the protocol stack layering. If the rdma layers (mpa, ddp, rdmap) sitting on top of TCP don't tell TCP when to ack something, then the incoming SEND might be acked by TCP before the RDMA layers process the packet. Then the SEND cannot be dropped since it's already acked. So the message either must be buffered until the RECV is posted, or the connection terminated.

Steve.

From richard.frank at oracle.com Sat Apr 5 15:55:40 2008
From: richard.frank at oracle.com (Richard Frank)
Date: Sat, 05 Apr 2008 17:55:40 -0500
Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ?
In-Reply-To: <47F7F29C.3040102@opengridcomputing.com> References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> <47F63E33.5080709@opengridcomputing.com> <15ddcffd0804041325i17e8f620xaa1ec9ec823afd60@mail.gmail.com> <47F69D86.9040407@oracle.com> <47F7F29C.3040102@opengridcomputing.com> Message-ID: <47F8036C.1010701@oracle.com> This is all goodness - I'm looking for something specific to what does and does not work with IWARP NICs in OFED today - perhaps a matrix comparing functionality of IB vs iWARP - so we know what not to do and or what to work around when running over IWARP NICS. For now - we could probably just treat them as simple NICs and run RDS over TCP - that ought to at least work.. Steve Wise wrote: > iWARP RFCs: > >> 5040 A Remote Direct Memory Access Protocol Specification. R. Recio, >> B. Metzler, P. Culley, J. Hilland, D. Garcia. October 2007. >> (Format: >> TXT=142247 bytes) (Status: PROPOSED STANDARD) >> >> 5041 Direct Data Placement over Reliable Transports. H. Shah, J. >> Pinkerton, R. Recio, P. Culley. October 2007. (Format: TXT=84642 >> bytes) (Status: PROPOSED STANDARD) >> >> 5042 Direct Data Placement Protocol (DDP) / Remote Direct Memory >> Access Protocol (RDMAP) Security. J. Pinkerton, E. Deleganes. >> October >> 2007. (Format: TXT=127453 bytes) (Status: PROPOSED STANDARD) >> >> 5043 Stream Control Transmission Protocol (SCTP) Direct Data Placement >> (DDP) Adaptation. C. Bestler, Ed., R. Stewart, Ed.. October 2007. >> (Format: TXT=38740 bytes) (Status: PROPOSED STANDARD) >> >> 5044 Marker PDU Aligned Framing for TCP Specification. P. Culley, U. >> Elzur, R. Recio, S. Bailey, J. Carrier. October 2007. (Format: >> TXT=168918 bytes) (Status: PROPOSED STANDARD) >> >> 5045 Applicability of Remote Direct Memory Access Protocol (RDMA) and >> Direct Data Placement (DDP). C. Bestler, Ed., L. Coene. 
October >> 2007. >> (Format: TXT=51749 bytes) (Status: INFORMATIONAL) > > For RDMA over TCP, refer to 5040, 5041, and 5044. > > iWARP Verbs: > > http://www.rdmaconsortium.org/home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf > > > > Steve. From sashak at voltaire.com Sat Apr 5 23:53:14 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 6 Apr 2008 06:53:14 +0000 Subject: [ofa-general] XmtDiscards In-Reply-To: <20080404234547.GA17618@lanczos.q-leap.de> References: <200804050012.39893.bs@q-leap.de> <20080404152932.5e294e47.weiny2@llnl.gov> <20080404234547.GA17618@lanczos.q-leap.de> Message-ID: <20080406065314.GA13374@sashak.voltaire.com> On 01:45 Sat 05 Apr , Bernd Schubert wrote: > > Hmm, I first increased head_of_queue_lifetime to 0x13 and > leaf_head_of_queue_lifetime to 0x20, but this didn't make the error > go away. So I increased head_of_queue_lifetime to 0x15 and > leaf_head_of_queue_lifetime to 0x50, but this made the fabric to entirely > crash. Are you using default (min hops) routing? I think it could be deadlock due to unlimited head_of_queue_lifetime values. > On the node of the master opensm I got an endless number of messages > like these: > > Apr 5 01:35:03 pfs1n2 kernel: [705448.344542] NETDEV WATCHDOG: ib0: transmit timed out > Apr 5 01:35:03 pfs1n2 kernel: [705448.349814] ib0: transmit timeout: latency 411908 msecs > Apr 5 01:35:03 pfs1n2 kernel: [705448.355364] ib0: queue stopped 1, tx_head 441, tx_tail 377 > Apr 5 01:35:04 pfs1n2 kernel: [705449.343495] NETDEV WATCHDOG: ib0: transmit timed out > > The slave opensm also went into D-state and is not killable anymore :( Interesting... Any more details about this? 
Sasha From rdreier at cisco.com Sat Apr 5 21:41:02 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 05 Apr 2008 21:41:02 -0700 Subject: [ofa-general] Directions for verbs API extensions Message-ID: Here is a little document I wrote trying to summarize all the things that we might want to add to the verbs API to support device capabilities that aren't exposed yet. There are a number of issues to resolve, and answers to the questions I ask below would help us make progress towards actually supporting all this. There are a number of verbs that are common to the iWARP/RDMA consortium verbs and the InfiniBand base memory management extensions (IB-BMME). We would probably add one device capability bit for "BMME" (and all iWARP devices could set it) to show support for everything here: - Allocate L_Key/STag. This allocates MR resources without actually registering memory; the MR can then be registered or invalidated as described below. - "Fast register" memory through send queue. This allows a work request to be posted to a send queue to register memory using an L_Key/STag that is in the invalid state. - Local invalidate send work requests, which can be used to invalidate an MR or MW. One subtle point here is that local invalidate operations have very loose ordering, in the sense that they can be executed before earlier requests, but support for fencing local invalidate operations is mandatory in iWARP and only optional in IB. But is there any IB device that currently exists that supports BMME but doesn't support local invalidate fencing? I really hope we can ignore this possibility. - Memory windows associated to a single QP and bound using send work requests posted with the normal post send verb rather than a separate MW verb. (See below for more) In addition there are things that are optional in both specs: - Block-list physical buffer lists; this allows memory regions to be registered with arbitrary size/alignment blocks instead of just page-aligned chunks. 
Yet another capability bit if we want to expose this. There are a few discrepancies between the iWARP and IB verbs that we need to decide on how we want to handle: - In IB-BMME, L_Keys and R_Keys are split up so that there is an 8-bit "key" that is owned by the consumer. As far as I know, there is no analogous concept defined for iWARP STags; is there any point in supporting this IB-only feature (which is optional even in the IB spec)? - Along similar lines, IB defines two types of memory windows, "type 1" and "type 2" and in fact type 2 is split into "2A" and "2B" (the difference is basically whether the MW is associated with just a QP, or with a QP and a PD). iWARP memory windows are always what the IB spec would call type 2B. All the IB devices that I know of with IB-BMME support can handle type 2B memory windows. Is there any point in having our API worry about the distinction between 2A or 2B, or should we just decree that we only handle type 2B? (Does anyone who hasn't just been reading specs even understand the distinction between type 2A and 2B?) - Further, the MW API that we have now, with a separate bind MW verb, corresponds to type 1 MWs. Type 2 MWs are bound by posting a work request using the standard "post send" verb. Given that no IB device drivers have implemented the bind MW verb yet, does it make sense to deprecate the API for type 1 MWs and say that everyone should use type 2[B] MWs only? - iWARP supports "RDMA read with invalidate" send work requests, while IB has no such operation. This makes sense because iWARP requires the buffer used to receive RDMA read responses to have remote write permission, while IB has no such requirement. I don't see a really clean way to handle this except to say that apps have to have "if (IB) do_this(); else /* iWARP */ do_that();" code to use this in a portable way. - Zero-based virtual addresses for memory regions. This is mandatory for iWARP and optional for IB (and is not required even for BMME). 
I think the simplest thing to do is just to have yet another capability bit to say whether a device supports ZBVA or not; all iWARP devices can set it.

Finally, there are proprietary verbs extensions that are only supported by a single device at the moment, which we have to decide if and how to support. It is a tradeoff between making useful features available versus making the already overly complex verbs API even more impossible to fathom, although it seems all of these have users asking for them:

 - ConnectX has XRC, masked atomic operations, and the "block loopback" flag for UD QPs at least.

 - eHCA has "low-latency" QPs.

From pasha at dev.mellanox.co.il Sun Apr 6 08:04:14 2008
From: pasha at dev.mellanox.co.il (Pavel Shamis (Pasha))
Date: Sun, 06 Apr 2008 18:04:14 +0300
Subject: [ofa-general] MVAPICH2 crashes on mixed fabric
In-Reply-To: 
References: 
Message-ID: <47F8E66E.6060505@dev.mellanox.co.il>

MVAPICH(1) and OMPI have HCA auto-detection, and both of them work well on heterogeneous clusters. I'm not sure about mvapich2, but I think the mvapich-discussion list will be a better place for this kind of question, so I'm forwarding this mail to the mvapich list.

Pasha.

Mike Heinz wrote:
> Hey, all, I'm not sure if this is a known bug or some sort of
> limitation I'm unaware of, but I've been building and testing with the
> OFED 1.3 GA release on a small fabric that has a mix of Arbel-based
> and newer Connect-X HCAs.
>
> What I've discovered is that mvapich and openmpi work fine across the
> entire fabric, but mvapich2 crashes when I use a mix of Arbels and
> Connect-X. The errors vary depending on the test program but here's an
> example:
>
> [mheinz at compute-0-0 IMB-3.0]$ mpirun -n 5 ./IMB-MPI1
> .
> .
> .
> (output snipped)
> .
> .
> .
> #-----------------------------------------------------------------------------
> # Benchmarking Sendrecv
> # #processes = 2
> # ( 3 additional processes waiting in MPI_Barrier)
> #-----------------------------------------------------------------------------
>    #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
>         0         1000         3.51         3.51         3.51         0.00
>         1         1000         3.63         3.63         3.63         0.52
>         2         1000         3.67         3.67         3.67         1.04
>         4         1000         3.64         3.64         3.64         2.09
>         8         1000         3.67         3.67         3.67         4.16
>        16         1000         3.67         3.67         3.67         8.31
>        32         1000         3.74         3.74         3.74        16.32
>        64         1000         3.90         3.90         3.90        31.28
>       128         1000         4.75         4.75         4.75        51.39
>       256         1000         5.21         5.21         5.21        93.79
>       512         1000         5.96         5.96         5.96       163.77
>      1024         1000         7.88         7.89         7.89       247.54
>      2048         1000        11.42        11.42        11.42       342.00
>      4096         1000        15.33        15.33        15.33       509.49
>      8192         1000        22.19        22.20        22.20       703.83
>     16384         1000        34.57        34.57        34.57       903.88
>     32768         1000        51.32        51.32        51.32      1217.94
>     65536          640        85.80        85.81        85.80      1456.74
>    131072          320       155.23       155.24       155.24      1610.40
>    262144          160       301.84       301.86       301.85      1656.39
>    524288           80       598.62       598.69       598.66      1670.31
>   1048576           40      1175.22      1175.30      1175.26      1701.69
>   2097152           20      2309.05      2309.05      2309.05      1732.32
>   4194304           10      4548.72      4548.98      4548.85      1758.64
> [0] Abort: Got FATAL event 3
> at line 796 in file ibv_channel_manager.c
> rank 0 in job 1 compute-0-0.local_36049 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
>
> If, however, I define my mpdring to contain only Connect-X systems OR
> only Arbel systems, IMB-MPI1 runs to completion.
>
> Can anyone suggest a workaround, or is this a real bug with mvapich2?
> > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > ------------------------------------------------------------------------ > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- Pavel Shamis (Pasha) Mellanox Technologies From bs at q-leap.de Sun Apr 6 09:05:54 2008 From: bs at q-leap.de (Bernd Schubert) Date: Sun, 6 Apr 2008 18:05:54 +0200 Subject: [ofa-general] XmtDiscards In-Reply-To: <1207401583.15625.224.camel@hrosenstock-ws.xsigo.com> References: <200804050012.39893.bs@q-leap.de> <1207401583.15625.224.camel@hrosenstock-ws.xsigo.com> Message-ID: <20080406160554.GA28695@lanczos.q-leap.de> Hello Hal, On Sat, Apr 05, 2008 at 06:19:43AM -0700, Hal Rosenstock wrote: > Hi Bernd, > > On Sat, 2008-04-05 at 00:12 +0200, Bernd Schubert wrote: > > Hello, > > > > after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten > > much better there, at least no further RcvSwRelayErrors, even when the > > cluster is in idle state and so far also no SymbolErrors, which we also have > > seens before. > > > > However, after I just started a lustre stress test on 50 clients (to a lustre > > storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about > > 9000 XmtDiscards within 30 minutes. > > > > Searching for this error I find "This is a symptom of congestion and may > > require tweaking either HOQ or switch lifetime values". > > Well, I have to admit I neither know what HOQ is, nor do I know how to tweak > > it. I also do not have an idea to set switch lifetime values. I guess this > > isn't related to the opensm timeout option, is it? > > > > Hmm, I just found a cisci pdf describing how to set the lifetime on these > > switches, but is this also possible on Flextronics switches? 
> What routing algorithm are you using ? Rather than play with those
> switch values, if you are not using up/down, could you try that to see
> if it helps with the congestion you are seeing ?

I now configured up/down, but still got XmtDiscards, though only on one port.

Error check on lid 205 (SW_pfs1_leaf2) port all:  FAILED
#warn: counter XmtDiscards = 6213 (threshold 100) lid 205 port 1
Error check on lid 205 (SW_pfs1_leaf2) port 1:  FAILED
#warn: counter RcvSwRelayErrors = 1431 (threshold 100) lid 205 port 13
Error check on lid 205 (SW_pfs1_leaf2) port 13:  FAILED

I'm also not sure if up/down is the optimal algorithm for a fabric with only two switches. Since describing the connections in words is a bit difficult, I just uploaded a drawing here:

http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/ib/Interswitch-cabling.pdf

The root-guid for the up/down algorithm is leaf-5 of the small switch. But I'm still not sure about up/down at all. Doesn't one need at least 3 switches for up/down? Something like the ASCII graphic below?

        root-switch
        /         \
       /           \
    Sw-1 ------------ Sw-2

Thanks for your help,
Bernd

PS: These RcvSwRelayErrors are also back again. I think these occur on some Lustre operations. Even if these RcvSwRelayErrors are not critical, they are still a bit annoying, since they make it hard to find other errors in the output of ibcheckerrors. If we can really ignore these errors, I will write a patch to not display them by default.
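The suppression Bernd proposes for the noisy RcvSwRelayErrors counters could be sketched as a small filter over ibcheckerrors "#warn" lines. Note that ibcheckerrors itself is a shell script, so the function name and flag below are purely illustrative, not the real implementation:

```c
#include <assert.h>
#include <string.h>

/* Decide whether an ibcheckerrors-style "#warn" line should be printed.
 * Sketch only: a real patch would likely make the list of ignored
 * counters configurable rather than hard-coding one name. */
static int show_warn_line(const char *line, int hide_relay_errors)
{
	if (hide_relay_errors && strstr(line, "RcvSwRelayErrors") != NULL)
		return 0;	/* suppress the noisy counter */
	return 1;		/* show everything else */
}
```

With the flag set, XmtDiscards warnings still show while RcvSwRelayErrors lines are dropped, which is exactly the signal-to-noise improvement the PS asks for.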
From bs at q-leap.de Sun Apr 6 09:09:41 2008 From: bs at q-leap.de (Bernd Schubert) Date: Sun, 6 Apr 2008 18:09:41 +0200 Subject: [ofa-general] XmtDiscards In-Reply-To: <20080406065314.GA13374@sashak.voltaire.com> References: <200804050012.39893.bs@q-leap.de> <20080404152932.5e294e47.weiny2@llnl.gov> <20080404234547.GA17618@lanczos.q-leap.de> <20080406065314.GA13374@sashak.voltaire.com> Message-ID: <20080406160941.GA28798@lanczos.q-leap.de> Hello Sasha, On Sun, Apr 06, 2008 at 06:53:14AM +0000, Sasha Khapyorsky wrote: > On 01:45 Sat 05 Apr , Bernd Schubert wrote: > > > > Hmm, I first increased head_of_queue_lifetime to 0x13 and > > leaf_head_of_queue_lifetime to 0x20, but this didn't make the error > > go away. So I increased head_of_queue_lifetime to 0x15 and > > leaf_head_of_queue_lifetime to 0x50, but this made the fabric to entirely > > crash. > > Are you using default (min hops) routing? I think it could be deadlock > due to unlimited head_of_queue_lifetime values. > > > On the node of the master opensm I got an endless number of messages > > like these: > > > > Apr 5 01:35:03 pfs1n2 kernel: [705448.344542] NETDEV WATCHDOG: ib0: transmit timed out > > Apr 5 01:35:03 pfs1n2 kernel: [705448.349814] ib0: transmit timeout: latency 411908 msecs > > Apr 5 01:35:03 pfs1n2 kernel: [705448.355364] ib0: queue stopped 1, tx_head 441, tx_tail 377 > > Apr 5 01:35:04 pfs1n2 kernel: [705449.343495] NETDEV WATCHDOG: ib0: transmit timed out > > > > The slave opensm also went into D-state and is not killable anymore :( > > Interesting... Any more details about this? unfortunately not. 
As you may see, it was rather late already and I just wanted to get the entire system working, so I rebooted both nodes running the opensms :(

Thanks,
Bernd

From huanwei at cse.ohio-state.edu Sun Apr 6 17:57:59 2008
From: huanwei at cse.ohio-state.edu (wei huang)
Date: Sun, 6 Apr 2008 20:57:59 -0400 (EDT)
Subject: [ofa-general] MVAPICH2 crashes on mixed fabric
In-Reply-To: 
Message-ID: 

Hi Mike,

Currently mvapich2 detects different HCA types and thus selects different parameters for communication, which may cause the problem. We are working on this feature and it will be available in our next release. For now, if you want to run on this setup, please set a few environment variables, like:

mpiexec -n 2 -env MV2_USE_COALESCE 0 -env MV2_VBUF_TOTAL_SIZE 9216 ./a.out

Please let us know if this works. Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501

On Fri, 4 Apr 2008, Mike Heinz wrote:

> Hey, all, I'm not sure if this is a known bug or some sort of limitation
> I'm unaware of, but I've been building and testing with the OFED 1.3 GA
> release on a small fabric that has a mix of Arbel-based and newer
> Connect-X HCAs.
>
> What I've discovered is that mvapich and openmpi work fine across the
> entire fabric, but mvapich2 crashes when I use a mix of Arbels and
> Connect-X. The errors vary depending on the test program but here's an
> example:
>
> [mheinz at compute-0-0 IMB-3.0]$ mpirun -n 5 ./IMB-MPI1
> .
> .
> .
> (output snipped)
> .
> .
> .
> #-----------------------------------------------------------------------------
> # Benchmarking Sendrecv
> # #processes = 2
> # ( 3 additional processes waiting in MPI_Barrier)
> #-----------------------------------------------------------------------------
>    #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
>         0         1000         3.51         3.51         3.51         0.00
>         1         1000         3.63         3.63         3.63         0.52
>         2         1000         3.67         3.67         3.67         1.04
>         4         1000         3.64         3.64         3.64         2.09
>         8         1000         3.67         3.67         3.67         4.16
>        16         1000         3.67         3.67         3.67         8.31
>        32         1000         3.74         3.74         3.74        16.32
>        64         1000         3.90         3.90         3.90        31.28
>       128         1000         4.75         4.75         4.75        51.39
>       256         1000         5.21         5.21         5.21        93.79
>       512         1000         5.96         5.96         5.96       163.77
>      1024         1000         7.88         7.89         7.89       247.54
>      2048         1000        11.42        11.42        11.42       342.00
>      4096         1000        15.33        15.33        15.33       509.49
>      8192         1000        22.19        22.20        22.20       703.83
>     16384         1000        34.57        34.57        34.57       903.88
>     32768         1000        51.32        51.32        51.32      1217.94
>     65536          640        85.80        85.81        85.80      1456.74
>    131072          320       155.23       155.24       155.24      1610.40
>    262144          160       301.84       301.86       301.85      1656.39
>    524288           80       598.62       598.69       598.66      1670.31
>   1048576           40      1175.22      1175.30      1175.26      1701.69
>   2097152           20      2309.05      2309.05      2309.05      1732.32
>   4194304           10      4548.72      4548.98      4548.85      1758.64
> [0] Abort: Got FATAL event 3
> at line 796 in file ibv_channel_manager.c
> rank 0 in job 1 compute-0-0.local_36049 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
>
> If, however, I define my mpdring to contain only Connect-X systems OR
> only Arbel systems, IMB-MPI1 runs to completion.
>
> Can anyone suggest a workaround, or is this a real bug with mvapich2?
> > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > From balaji at mcs.anl.gov Sun Apr 6 17:16:11 2008 From: balaji at mcs.anl.gov (Pavan Balaji) Date: Sun, 06 Apr 2008 19:16:11 -0500 Subject: [ofa-general] [p2s2-announce] Deadline Extension: International Workshop on Parallel Programming Models and Systems Software (P2S2) Message-ID: <47F967CB.7020905@mcs.anl.gov> Due to several requests, we have extended the deadline for the P2S2 workshop to April 25th. Please find the detailed CFP below. ---------------------------------------------------------------------- CALL FOR PAPERS =============== First International Workshop on Parallel Programming Models and Systems Software for High-end Computing (P2S2) (http://www.mcs.anl.gov/events/workshops/p2s2) Sep. 8th, 2008 To be held in conjunction with ICPP-08: The 27th International Conference on Parallel Processing Sep. 8-12, 2008 Portland, Oregon, USA SCOPE ----- The goal of this workshop is to bring together researchers and practitioners in parallel programming models and systems software for high-end computing systems. Please join us in a discussion of new ideas, experiences, and the latest trends in these areas at the workshop. 
TOPICS OF INTEREST ------------------ The focus areas for this workshop include, but are not limited to: * Programming models and their high-performance implementations o MPI, Sockets, OpenMP, Global Arrays, X10, UPC, Chapel o Other Hybrid Programming Models * Systems software for scientific and enterprise computing o Communication sub-subsystems for high-end computing o High-performance File and storage systems o Fault-tolerance techniques and implementations o Efficient and high-performance virtualization and other management mechanisms * Tools for Management, Maintenance, Coordination and Synchronization o Software for Enterprise Data-centers using Modern Architectures o Job scheduling libraries o Management libraries for large-scale system o Toolkits for process and task coordination on modern platforms * Performance evaluation, analysis and modeling of emerging computing platforms PROCEEDINGS ----------- Proceedings of this workshop will be published by the IEEE Computer Society (together with the ICPP conference proceedings) in CD format only and will be available at the conference. SUBMISSION INSTRUCTIONS ----------------------- Submissions should be in PDF format in U.S. Letter size paper. They should not exceed 8 pages (all inclusive). Submissions will be judged based on relevance, significance, originality, correctness and clarity. DATES AND DEADLINES ------------------- Paper Submission: Extended to April 25th, 2008 Author Notification: May 20th, 2008 Camera Ready: June 2nd, 2008 PROGRAM CHAIRS -------------- * Pavan Balaji (Argonne National Laboratory) * Sayantan Sur (IBM Research) STEERING COMMITTEE ------------------ * William D. Gropp (University of Illinois Urbana-Champaign) * Dhabaleswar K. 
Panda (Ohio State University) * Vijay Saraswat (IBM Research) PROGRAM COMMITTEE ----------------- * David Bernholdt (Oak Ridge National Laboratory) * Ron Brightwell (Sandia National Laboratory) * Wu-chun Feng (Virginia Tech) * Richard Graham (Oak Ridge National Laboratory) * Hyun-wook Jin (Konkuk University, South Korea) * Sameer Kumar (IBM Research) * Doug Lea (State University of New York at Oswego) * Jarek Nieplocha (Pacific Northwest National Laboratory) * Scott Pakin (Los Alamos National Laboratory) * Vivek Sarkar (Rice University) * Rajeev Thakur (Argonne National Laboratory) * Pete Wyckoff (Ohio Supercomputing Center) If you have any questions, please contact us at p2s2-chairs at mcs.anl.gov ======================================================================== If you do not want to receive any more announcements regarding the P2S2 workshop, please send an email to majordomo at mcs.anl.gov with the email body (not email subject) as "unsubscribe p2s2-announce". ======================================================================== -- Pavan Balaji http://www.mcs.anl.gov/~balaji From clameter at sgi.com Sun Apr 6 22:45:41 2008 From: clameter at sgi.com (Christoph Lameter) Date: Sun, 6 Apr 2008 22:45:41 -0700 (PDT) Subject: [ofa-general] Re: [PATCH] mmu notifier #v11 In-Reply-To: <20080405002330.GF14784@duo.random> References: <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> <20080403151908.GB9603@duo.random> <20080404202055.GA14784@duo.random> <20080405002330.GF14784@duo.random> Message-ID: On Sat, 5 Apr 2008, Andrea Arcangeli wrote: > In short when working with single pages it's a waste to block the > secondary-mmu page fault, because it's zero cost to invalidate_page > before put_page. Not even GRU need to do that. That depends on what the notifier is being used for. Some serialization with the external mappings has to be done anyways. And its cleaner to have one API that does a lock/unlock scheme. 
Atomic operations can easily lead to races. From clameter at sgi.com Sun Apr 6 22:48:56 2008 From: clameter at sgi.com (Christoph Lameter) Date: Sun, 6 Apr 2008 22:48:56 -0700 (PDT) Subject: [ofa-general] Re: [patch 02/10] emm: notifier logic In-Reply-To: <20080405005759.GH14784@duo.random> References: <20080404223048.374852899@sgi.com> <20080404223131.469710551@sgi.com> <20080405005759.GH14784@duo.random> Message-ID: On Sat, 5 Apr 2008, Andrea Arcangeli wrote: > > + rcu_assign_pointer(mm->emm_notifier, e); > > + mm_unlock(mm); > > My mm_lock solution makes all rcu serialization an unnecessary > overhead so you should remove it like I already did in #v11. If it > wasn't the case, then mm_lock wouldn't be a definitive fix for the > race. There still could be junk in the cache of one cpu. If you just read the new pointer but use the earlier content pointed to then you have a problem. So a memory fence / barrier is needed to guarantee that the contents pointed to are fetched after the pointer. From andrea at qumranet.com Sun Apr 6 23:02:34 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Mon, 7 Apr 2008 08:02:34 +0200 Subject: [ofa-general] Re: [PATCH] mmu notifier #v11 In-Reply-To: References: <20080402220148.GV19189@duo.random> <20080402221716.GY19189@duo.random> <20080403151908.GB9603@duo.random> <20080404202055.GA14784@duo.random> <20080405002330.GF14784@duo.random> Message-ID: <20080407060234.GD9309@duo.random> On Sun, Apr 06, 2008 at 10:45:41PM -0700, Christoph Lameter wrote: > That depends on what the notifier is being used for. Some serialization > with the external mappings has to be done anyways. And its cleaner to have As far as I can tell no, you don't need to serialize against the secondary mmu page fault in invalidate_page, like you instead have to do in range_begin if you don't unpin the pages in range_end. > one API that does a lock/unlock scheme. Atomic operations can easily lead > to races. What races? 
Note that if you don't want to optimize, XPMEM and GRU can feel free to implement their own invalidate_page as this:

	invalidate_page(mm, addr)
	{
		range_begin(mm, addr, addr+PAGE_SIZE)
		range_end(mm, addr, addr+PAGE_SIZE)
	}

There's zero risk of adding races if they do this, but I doubt they want to run as slow as with EMM, so I guess they'll exploit the optimization by going lock-free vs the spte page fault in invalidate_page.

From andrea at qumranet.com Sun Apr 6 23:06:02 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Mon, 7 Apr 2008 08:06:02 +0200
Subject: [ofa-general] Re: [patch 02/10] emm: notifier logic
In-Reply-To: 
References: <20080404223048.374852899@sgi.com> <20080404223131.469710551@sgi.com> <20080405005759.GH14784@duo.random>
Message-ID: <20080407060602.GE9309@duo.random>

On Sun, Apr 06, 2008 at 10:48:56PM -0700, Christoph Lameter wrote:
> On Sat, 5 Apr 2008, Andrea Arcangeli wrote:
>
> > > + rcu_assign_pointer(mm->emm_notifier, e);
> > > + mm_unlock(mm);
> >
> > My mm_lock solution makes all rcu serialization an unnecessary
> > overhead so you should remove it like I already did in #v11. If it
> > wasn't the case, then mm_lock wouldn't be a definitive fix for the
> > race.
>
> There still could be junk in the cache of one cpu. If you just read the
> new pointer but use the earlier content pointed to then you have a
> problem.

There can't be junk: spinlocks provide proper memory barrier semantics, just like rcu, so it's entirely superfluous. There could be junk only if any of the mmu_notifier_* methods were invoked _outside_ the i_mmap_lock and _outside_ the anon_vma lock and outside the mmap_sem, which is never the case of course.

> So a memory fence / barrier is needed to guarantee that the contents
> pointed to are fetched after the pointer.

It's not needed... if you were right, we could never possibly run a list_for_each inside any spinlock-protected critical section and we'd always need to use the _rcu version instead.
The _rcu version is needed only when the list walk happens outside the spinlock critical section, of course (rcu = no spinlock cacheline-exclusive write operation on the read side; here the read side takes the spinlock big time).

From clameter at sgi.com Sun Apr 6 23:20:08 2008
From: clameter at sgi.com (Christoph Lameter)
Date: Sun, 6 Apr 2008 23:20:08 -0700 (PDT)
Subject: [ofa-general] Re: [patch 02/10] emm: notifier logic
In-Reply-To: <20080407060602.GE9309@duo.random>
References: <20080404223048.374852899@sgi.com> <20080404223131.469710551@sgi.com> <20080405005759.GH14784@duo.random> <20080407060602.GE9309@duo.random>
Message-ID: 

On Mon, 7 Apr 2008, Andrea Arcangeli wrote:

> > > My mm_lock solution makes all rcu serialization an unnecessary
> > > overhead so you should remove it like I already did in #v11. If it
> > > wasn't the case, then mm_lock wouldn't be a definitive fix for the
> > > race.
> >
> > There still could be junk in the cache of one cpu. If you just read the
> > new pointer but use the earlier content pointed to then you have a
> > problem.
>
> There can't be junk, spinlocks provides semantics of proper memory
> barriers, just like rcu, so it's entirely superflous.
>
> There could be junk only if any of the mmu_notifier_* methods would be
> invoked _outside_ the i_mmap_lock and _outside_ the anon_vma and
> outside the mmap_sem, that is never the case of course.

So we use other locks to perform serialization on the list chains? Basically the list chains are protected by either mmap_sem or an rmap lock? We need to document that.

In that case we could also add an unregister function.
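The locking argument in this exchange can be modeled in a few lines of userspace C: if every walk of the notifier chain happens under the same lock that registration takes (standing in for mmap_sem / i_mmap_lock here), a plain linked list and plain loads are enough, and no _rcu list variants or extra memory barriers are required. All names below are illustrative; this is a sketch, not the kernel mmu-notifier API.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Illustrative notifier node; the callback pointer stands in for
 * methods like invalidate_page. */
struct notifier {
	struct notifier *next;
	void (*invalidate_page)(unsigned long addr);
};

static pthread_mutex_t mm_lock = PTHREAD_MUTEX_INITIALIZER;
static struct notifier *chain;

static void notifier_register(struct notifier *n)
{
	pthread_mutex_lock(&mm_lock);	/* writers serialize on mm_lock */
	n->next = chain;
	chain = n;
	pthread_mutex_unlock(&mm_lock);
}

/* Readers take the same lock, so they can never observe a half-built
 * node; returns the number of callbacks fired. */
static int notify_invalidate_page(unsigned long addr)
{
	struct notifier *n;
	int fired = 0;

	pthread_mutex_lock(&mm_lock);
	for (n = chain; n != NULL; n = n->next) {
		if (n->invalidate_page)
			n->invalidate_page(addr);
		fired++;
	}
	pthread_mutex_unlock(&mm_lock);
	return fired;
}

/* Minimal self-check: register two notifiers, walk the chain once. */
static int demo(void)
{
	static struct notifier n1, n2;

	notifier_register(&n1);
	notifier_register(&n2);
	return notify_invalidate_page(0x1000);
}
```

The design point being debated is exactly this: once reader and writer share one lock, rcu_assign_pointer and list_for_each_rcu add nothing, because the mutex's acquire/release ordering already guarantees the reader sees fully initialized nodes.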
From andrea at qumranet.com Mon Apr 7 00:13:30 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Mon, 7 Apr 2008 09:13:30 +0200
Subject: [ofa-general] Re: [patch 02/10] emm: notifier logic
In-Reply-To: 
References: <20080404223048.374852899@sgi.com> <20080404223131.469710551@sgi.com> <20080405005759.GH14784@duo.random> <20080407060602.GE9309@duo.random>
Message-ID: <20080407071330.GH9309@duo.random>

On Sun, Apr 06, 2008 at 11:20:08PM -0700, Christoph Lameter wrote:
> On Mon, 7 Apr 2008, Andrea Arcangeli wrote:
>
> > > > My mm_lock solution makes all rcu serialization an unnecessary
> > > > overhead so you should remove it like I already did in #v11. If it
> > > > wasn't the case, then mm_lock wouldn't be a definitive fix for the
> > > > race.
> > >
> > > There still could be junk in the cache of one cpu. If you just read the
> > > new pointer but use the earlier content pointed to then you have a
> > > problem.
> >
> > There can't be junk, spinlocks provides semantics of proper memory
> > barriers, just like rcu, so it's entirely superflous.
> >
> > There could be junk only if any of the mmu_notifier_* methods would be
> > invoked _outside_ the i_mmap_lock and _outside_ the anon_vma and
> > outside the mmap_sem, that is never the case of course.
>
> So we use other locks to perform serialization on the list chains?
> Basically the list chains are protected by either mmap_sem or an rmap
> lock? We need to document that.

I thought it was obvious: if it wasn't the case, how could mm_lock fix any range_begin/range_end race? Also, to document it you just have to remove _rcu. The only confusion could arise from reading your patch; mine couldn't raise any doubt that rcu isn't needed and that regular spinlocks/semaphores serialize all methods.

> In that case we could also add an unregister function.

Indeed, but it still can't run after mm_users == 0. So for unregister to work one has to boost the mm_users first.
exit_mmap doesn't take any lock when destroying the mm because it assumes nobody is messing with the mm at that time. So that requirement doesn't change, but now one can unregister before mm_users is dropped to 0. Also I wonder if I should make a new version of the mm_lock/unlock so that they will guarantee SIGKILL handling in O(N) anywhere inside mm_lock or mm_unlock, where N is the number of vmas, that will either require a VM_MM_LOCK_I/VM_MM_LOCK_A bitflag, or a vmalloc of two bitflag arrays inside the mmap_sem critical section returned by mm_lock as a cookie and passed as param to mm_unlock. The SIGKILL check is mostly worthless in spin_lock context (especially on UP or low-smp) but given the later patches switches all relevant VM locks to mutexes (this should happen under a config option to avoid hurting server performance), it might be worth it. That will require mmu_notifier_register to return both -EINTR and -ENOMEM if using the vmalloc trick to avoid registering two more vm_flags bitflags. Alternatively we can have mm_lock fail with -EPERM if there aren't enough capabilities and the number of vmas is bigger than a certain number. This is more or less like the requirement to attach during startup. This is preferable IMHO because it's effective even without preempt-rt and in turn with all locks being spinlocks for maximum performance, so I'll likely release #v12 with this change. In any case the mmu_notifier_register will need to return error (an unregister as well for that matter). But those are very minor issues, #v11 can go in -mm now to ensure mmu notifiers will be shipped with 2.6.26rc. From 2rsvn at longbeachgardenhotel.com Mon Apr 7 02:20:04 2008 From: 2rsvn at longbeachgardenhotel.com (auberon william) Date: Mon, 07 Apr 2008 09:20:04 +0000 Subject: [ofa-general] genuine artifacts Message-ID: <000601c8989f$01be5f5b$76becfa8@xuqpm> Detailed quality replicas of the most wanted designer watches are here! 
Genuine solid stainless steel, and 99.9% accurate markings, finish and weight. - The worlds largest online retailer of luxury products, including: Rolex Sports Models Rolex Datejusts Breitling Cartier Porsche Design Dolce & Gabbana Dior Gucci Hermes Watches Patek Philippe Visit - www.spoooke.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ossrosch at linux.vnet.ibm.com Mon Apr 7 04:31:39 2008 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Mon, 7 Apr 2008 12:31:39 +0100 Subject: [ofa-general] Plan for OFED-1.3.1? Message-ID: <200804071331.42031.ossrosch@linux.vnet.ibm.com> Hi, is there any schedule for the OFED-1.3.1 release? When should we start to send some minor bugfixes for ehca? Would the kernel-base be the same 2.6.24 or will it change to 2.6.25? regards Stefan From ossrosch at linux.vnet.ibm.com Mon Apr 7 05:57:33 2008 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Mon, 7 Apr 2008 13:57:33 +0100 Subject: [ofa-general] [PATCH] IB/ehca: extend query_device() and query_port() to support all values for ibv_devinfo Message-ID: <200804071457.36248.ossrosch@linux.vnet.ibm.com> Also, introduce a few inline helper functions to make the code more readable. 
Signed-off-by: Stefan Roscher
---
 drivers/infiniband/hw/ehca/ehca_hca.c |  128 ++++++++++++++++++++------------
 1 files changed, 80 insertions(+), 48 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
index 8832123..f89c5f8 100644
--- a/drivers/infiniband/hw/ehca/ehca_hca.c
+++ b/drivers/infiniband/hw/ehca/ehca_hca.c
@@ -43,6 +43,11 @@
 #include "ehca_iverbs.h"
 #include "hcp_if.h"
 
+static inline unsigned int limit_uint(unsigned int value)
+{
+	return min_t(unsigned int, value, INT_MAX);
+}
+
 int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
 {
 	int i, ret = 0;
@@ -83,37 +88,40 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props)
 	props->vendor_id = rblock->vendor_id >> 8;
 	props->vendor_part_id = rblock->vendor_part_id >> 16;
 	props->hw_ver = rblock->hw_ver;
-	props->max_qp = min_t(unsigned, rblock->max_qp, INT_MAX);
-	props->max_qp_wr = min_t(unsigned, rblock->max_wqes_wq, INT_MAX);
-	props->max_sge = min_t(unsigned, rblock->max_sge, INT_MAX);
-	props->max_sge_rd = min_t(unsigned, rblock->max_sge_rd, INT_MAX);
-	props->max_cq = min_t(unsigned, rblock->max_cq, INT_MAX);
-	props->max_cqe = min_t(unsigned, rblock->max_cqe, INT_MAX);
-	props->max_mr = min_t(unsigned, rblock->max_mr, INT_MAX);
-	props->max_mw = min_t(unsigned, rblock->max_mw, INT_MAX);
-	props->max_pd = min_t(unsigned, rblock->max_pd, INT_MAX);
-	props->max_ah = min_t(unsigned, rblock->max_ah, INT_MAX);
-	props->max_fmr = min_t(unsigned, rblock->max_mr, INT_MAX);
+	props->max_qp = limit_uint(rblock->max_qp);
+	props->max_qp_wr = limit_uint(rblock->max_wqes_wq);
+	props->max_sge = limit_uint(rblock->max_sge);
+	props->max_sge_rd = limit_uint(rblock->max_sge_rd);
+	props->max_cq = limit_uint(rblock->max_cq);
+	props->max_cqe = limit_uint(rblock->max_cqe);
+	props->max_mr = limit_uint(rblock->max_mr);
+	props->max_mw = limit_uint(rblock->max_mw);
+	props->max_pd = limit_uint(rblock->max_pd);
+	props->max_ah = limit_uint(rblock->max_ah);
+	props->max_ee = limit_uint(rblock->max_rd_ee_context);
+	props->max_rdd = limit_uint(rblock->max_rd_domain);
+	props->max_fmr = limit_uint(rblock->max_mr);
+	props->local_ca_ack_delay = limit_uint(rblock->local_ca_ack_delay);
+	props->max_qp_rd_atom = limit_uint(rblock->max_rr_qp);
+	props->max_ee_rd_atom = limit_uint(rblock->max_rr_ee_context);
+	props->max_res_rd_atom = limit_uint(rblock->max_rr_hca);
+	props->max_qp_init_rd_atom = limit_uint(rblock->max_act_wqs_qp);
+	props->max_ee_init_rd_atom = limit_uint(rblock->max_act_wqs_ee_context);
 
 	if (EHCA_BMASK_GET(HCA_CAP_SRQ, shca->hca_cap)) {
-		props->max_srq = props->max_qp;
-		props->max_srq_wr = props->max_qp_wr;
+		props->max_srq = limit_uint(props->max_qp);
+		props->max_srq_wr = limit_uint(props->max_qp_wr);
 		props->max_srq_sge = 3;
 	}
 
-	props->max_pkeys = 16;
-	props->local_ca_ack_delay
-		= rblock->local_ca_ack_delay;
-	props->max_raw_ipv6_qp
-		= min_t(unsigned, rblock->max_raw_ipv6_qp, INT_MAX);
-	props->max_raw_ethy_qp
-		= min_t(unsigned, rblock->max_raw_ethy_qp, INT_MAX);
-	props->max_mcast_grp
-		= min_t(unsigned, rblock->max_mcast_grp, INT_MAX);
-	props->max_mcast_qp_attach
-		= min_t(unsigned, rblock->max_mcast_qp_attach, INT_MAX);
+	props->max_pkeys = 16;
+	props->local_ca_ack_delay = limit_uint(rblock->local_ca_ack_delay);
+	props->max_raw_ipv6_qp = limit_uint(rblock->max_raw_ipv6_qp);
+	props->max_raw_ethy_qp = limit_uint(rblock->max_raw_ethy_qp);
+	props->max_mcast_grp = limit_uint(rblock->max_mcast_grp);
+	props->max_mcast_qp_attach = limit_uint(rblock->max_mcast_qp_attach);
 	props->max_total_mcast_qp_attach
-		= min_t(unsigned, rblock->max_total_mcast_qp_attach, INT_MAX);
+		= limit_uint(rblock->max_total_mcast_qp_attach);
 
 	/* translate device capabilities */
 	props->device_cap_flags = IB_DEVICE_SYS_IMAGE_GUID |
@@ -128,6 +136,46 @@ query_device1:
 	return ret;
 }
 
+static inline int map_mtu(struct ehca_shca *shca, u32 fw_mtu)
+{
+	switch (fw_mtu) {
+	case 0x1:
+		return IB_MTU_256;
+	case 0x2:
+		return IB_MTU_512;
+	case 0x3:
+		return IB_MTU_1024;
+	case 0x4:
+		return IB_MTU_2048;
+	case 0x5:
+		return IB_MTU_4096;
+	default:
+		ehca_err(&shca->ib_device, "Unknown MTU size: %x.",
+			 fw_mtu);
+		return 0;
+	}
+}
+
+static inline int map_number_of_vls(struct ehca_shca *shca, u32 vl_cap)
+{
+	switch (vl_cap) {
+	case 0x1:
+		return 1;
+	case 0x2:
+		return 2;
+	case 0x3:
+		return 4;
+	case 0x4:
+		return 8;
+	case 0x5:
+		return 15;
+	default:
+		ehca_err(&shca->ib_device, "invalid Vl Capability: %x.",
+			 vl_cap);
+		return 0;
+	}
+}
+
 int ehca_query_port(struct ib_device *ibdev,
 		    u8 port, struct ib_port_attr *props)
 {
@@ -152,31 +200,14 @@ int ehca_query_port(struct ib_device *ibdev,
 
 	memset(props, 0, sizeof(struct ib_port_attr));
 
-	switch (rblock->max_mtu) {
-	case 0x1:
-		props->active_mtu = props->max_mtu = IB_MTU_256;
-		break;
-	case 0x2:
-		props->active_mtu = props->max_mtu = IB_MTU_512;
-		break;
-	case 0x3:
-		props->active_mtu = props->max_mtu = IB_MTU_1024;
-		break;
-	case 0x4:
-		props->active_mtu = props->max_mtu = IB_MTU_2048;
-		break;
-	case 0x5:
-		props->active_mtu = props->max_mtu = IB_MTU_4096;
-		break;
-	default:
-		ehca_err(&shca->ib_device, "Unknown MTU size: %x.",
-			 rblock->max_mtu);
-		break;
-	}
-
+	props->active_mtu = props->max_mtu = map_mtu(shca, rblock->max_mtu);
 	props->port_cap_flags = rblock->capability_mask;
 	props->gid_tbl_len = rblock->gid_tbl_len;
-	props->max_msg_sz = rblock->max_msg_sz;
+	if (rblock->max_msg_sz) {
+		props->max_msg_sz = rblock->max_msg_sz;
+	} else {
+		props->max_msg_sz = 0x1 << 31;
+	}
 	props->bad_pkey_cntr = rblock->bad_pkey_cntr;
 	props->qkey_viol_cntr = rblock->qkey_viol_cntr;
 	props->pkey_tbl_len = rblock->pkey_tbl_len;
@@ -186,6 +217,7 @@ int ehca_query_port(struct ib_device *ibdev,
 	props->sm_sl = rblock->sm_sl;
 	props->subnet_timeout = rblock->subnet_timeout;
 	props->init_type_reply = rblock->init_type_reply;
+	props->max_vl_num = map_number_of_vls(shca, rblock->vl_cap);
 
 	if (rblock->state && rblock->phys_width) {
 		props->phys_state = rblock->phys_pstate;
-- 
1.5.2

From michael.heinz at qlogic.com  Mon Apr  7 06:20:48 2008
From: michael.heinz at qlogic.com (Mike Heinz)
Date: Mon, 7 Apr 2008 08:20:48 -0500
Subject: [ofa-general] MVAPICH2 crashes on mixed fabric
In-Reply-To:
References:
Message-ID:

Wei,

Thanks so much for the tip - I'll give it a try.

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

-----Original Message-----
From: wei huang [mailto:huanwei at cse.ohio-state.edu]
Sent: Sunday, April 06, 2008 8:58 PM
To: Mike Heinz
Cc: general at lists.openfabrics.org
Subject: Re: [ofa-general] MVAPICH2 crashes on mixed fabric

Hi Mike,

Currently mvapich2 will detect different HCA types and thus select
different parameters for communication, which may cause the problem. We
are working on this feature and it will be available in our next release.
For now, if you want to run on this setup, please set a few environment
variables like:

mpiexec -n 2 -env MV2_USE_COALESCE 0 -env MV2_VBUF_TOTAL_SIZE 9216 ./a.out

Please let us know if this works. Thanks.

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501

On Fri, 4 Apr 2008, Mike Heinz wrote:

> Hey, all, I'm not sure if this is a known bug or some sort of
> limitation I'm unaware of, but I've been building and testing with the
> OFED 1.3 GA release on a small fabric that has a mix of Arbel-based
> and newer Connect-X HCAs.
>
> What I've discovered is that mvapich and openmpi work fine across the
> entire fabric, but mvapich2 crashes when I use a mix of Arbels and
> Connect-X. The errors vary depending on the test program but here's an
> example:
>
> [mheinz at compute-0-0 IMB-3.0]$ mpirun -n 5 ./IMB-MPI1
> .
> .
> .
> (output snipped)
> .
> .
> .
>
> #-----------------------------------------------------------------------------
> # Benchmarking Sendrecv
> # #processes = 2
> # ( 3 additional processes waiting in MPI_Barrier)
> #-----------------------------------------------------------------------------
>        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
>             0         1000         3.51         3.51         3.51         0.00
>             1         1000         3.63         3.63         3.63         0.52
>             2         1000         3.67         3.67         3.67         1.04
>             4         1000         3.64         3.64         3.64         2.09
>             8         1000         3.67         3.67         3.67         4.16
>            16         1000         3.67         3.67         3.67         8.31
>            32         1000         3.74         3.74         3.74        16.32
>            64         1000         3.90         3.90         3.90        31.28
>           128         1000         4.75         4.75         4.75        51.39
>           256         1000         5.21         5.21         5.21        93.79
>           512         1000         5.96         5.96         5.96       163.77
>          1024         1000         7.88         7.89         7.89       247.54
>          2048         1000        11.42        11.42        11.42       342.00
>          4096         1000        15.33        15.33        15.33       509.49
>          8192         1000        22.19        22.20        22.20       703.83
>         16384         1000        34.57        34.57        34.57       903.88
>         32768         1000        51.32        51.32        51.32      1217.94
>         65536          640        85.80        85.81        85.80      1456.74
>        131072          320       155.23       155.24       155.24      1610.40
>        262144          160       301.84       301.86       301.85      1656.39
>        524288           80       598.62       598.69       598.66      1670.31
>       1048576           40      1175.22      1175.30      1175.26      1701.69
>       2097152           20      2309.05      2309.05      2309.05      1732.32
>       4194304           10      4548.72      4548.98      4548.85      1758.64
> [0] Abort: Got FATAL event 3
> at line 796 in file ibv_channel_manager.c
> rank 0 in job 1  compute-0-0.local_36049   caused collective abort of all ranks
>   exit status of rank 0: killed by signal 9
>
> If, however, I define my mpdring to contain only Connect-X systems OR
> only Arbel systems, IMB-MPI1 runs to completion.
>
> Can anyone suggest a workaround or is this a real bug with mvapich2?
> > -- > Michael Heinz > Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania > > From hrosenstock at xsigo.com Mon Apr 7 06:35:10 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Mon, 07 Apr 2008 06:35:10 -0700 Subject: [ofa-general] XmtDiscards In-Reply-To: <20080406160554.GA28695@lanczos.q-leap.de> References: <200804050012.39893.bs@q-leap.de> <1207401583.15625.224.camel@hrosenstock-ws.xsigo.com> <20080406160554.GA28695@lanczos.q-leap.de> Message-ID: <1207575310.15625.258.camel@hrosenstock-ws.xsigo.com> Hi Bernd, On Sun, 2008-04-06 at 18:05 +0200, Bernd Schubert wrote: > Hello Hal, > > On Sat, Apr 05, 2008 at 06:19:43AM -0700, Hal Rosenstock wrote: > > Hi Bernd, > > > > On Sat, 2008-04-05 at 00:12 +0200, Bernd Schubert wrote: > > > Hello, > > > > > > after I upgraded one of our clusters to opensm-3.2.1 it seems to have gotten > > > much better there, at least no further RcvSwRelayErrors, even when the > > > cluster is in idle state and so far also no SymbolErrors, which we also have > > > seens before. > > > > > > However, after I just started a lustre stress test on 50 clients (to a lustre > > > storage system with 20 OSS servers and 60 OSTs), ibcheckerrors reports about > > > 9000 XmtDiscards within 30 minutes. > > > > > > Searching for this error I find "This is a symptom of congestion and may > > > require tweaking either HOQ or switch lifetime values". > > > Well, I have to admit I neither know what HOQ is, nor do I know how to tweak > > > it. I also do not have an idea to set switch lifetime values. I guess this > > > isn't related to the opensm timeout option, is it? > > > > > > Hmm, I just found a cisci pdf describing how to set the lifetime on these > > > switches, but is this also possible on Flextronics switches? > > > > What routing algorithm are you using ? Rather than play with those > > switch values, if you are not using up/down, could you try that to see > > if it helps with the congestion you are seeing ? 
> I now configured up/down, but still got XmtDiscards, though, only on one port.
>
> Error check on lid 205 (SW_pfs1_leaf2) port all:  FAILED
> #warn: counter XmtDiscards = 6213 (threshold 100) lid 205 port 1
> Error check on lid 205 (SW_pfs1_leaf2) port 1:  FAILED
> #warn: counter RcvSwRelayErrors = 1431 (threshold 100) lid 205 port 13
> Error check on lid 205 (SW_pfs1_leaf2) port 13:  FAILED

Are you running IPoIB ? If so, SwRelayErrors are not necessarily
indicative of a "real" issue, because multicasts reflected on the same
port are mistakenly counted.

> I'm also not sure whether up/down is the optimal algorithm for a fabric
> with only two switches.
>
> Since describing the connections in words is a bit difficult, I just
> uploaded a drawing here:
>
> http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/ib/Interswitch-cabling.pdf
>
> The root-guid for the up/down algorithm is leaf-5 of the small switch. But
> I'm still not sure about up/down at all. Doesn't one need at least
> 3 switches for up/down? Something like the ASCII graphic below?
>
>        root-switch
>       /           \
>      /             \
>    Sw-1 ------------ Sw-2

Doesn't your chassis switch have many switches in it ? You did say it
was 144 ports, so it's made up of a number of switches.

You may need to choose a "better" root than up/down automatically
determines.

-- Hal

> Thanks for your help,
> Bernd
>
> PS: These RcvSwRelayErrors are also back again. I think these occur on some
> operations of Lustre. Even if these RcvSwRelayErrors are not critical, they
> are still a bit annoying, since they make it hard to find other errors in
> the output of ibcheckerrors.
> If we can really ignore these errors, I will write a patch to not display these
> by default.
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From bs at q-leap.de Mon Apr 7 06:53:47 2008 From: bs at q-leap.de (Bernd Schubert) Date: Mon, 7 Apr 2008 15:53:47 +0200 Subject: [ofa-general] XmtDiscards In-Reply-To: <1207575310.15625.258.camel@hrosenstock-ws.xsigo.com> References: <200804050012.39893.bs@q-leap.de> <20080406160554.GA28695@lanczos.q-leap.de> <1207575310.15625.258.camel@hrosenstock-ws.xsigo.com> Message-ID: <200804071553.47457.bs@q-leap.de> Hello Hal, On Monday 07 April 2008 15:35:10 Hal Rosenstock wrote: > Hi Bernd, > > On Sun, 2008-04-06 at 18:05 +0200, Bernd Schubert wrote: > > Hello Hal, > > > > > > Searching for this error I find "This is a symptom of congestion and > > > > may require tweaking either HOQ or switch lifetime values". > > > > Well, I have to admit I neither know what HOQ is, nor do I know how > > > > to tweak it. I also do not have an idea to set switch lifetime > > > > values. I guess this isn't related to the opensm timeout option, is > > > > it? > > > > > > > > Hmm, I just found a cisci pdf describing how to set the lifetime on > > > > these switches, but is this also possible on Flextronics switches? > > > > > > What routing algorithm are you using ? Rather than play with those > > > switch values, if you are not using up/down, could you try that to see > > > if it helps with the congestion you are seeing ? > > > > I now configured up/down, but still got XmtDiscards, though, only on one > > port. 
> > > > Error check on lid 205 (SW_pfs1_leaf2) port all: FAILED > > #warn: counter XmtDiscards = 6213 (threshold 100) lid 205 port 1 > > Error check on lid 205 (SW_pfs1_leaf2) port 1: FAILED > > #warn: counter RcvSwRelayErrors = 1431 (threshold 100) lid 205 port 13 > > Error check on lid 205 (SW_pfs1_leaf2) port 13: FAILED > > Are you running IPoIB ? If so, SwRelayErrors are not necessarily > indicative of a "real" issue due to the fact that multicasts reflected > on the same port are mistakenly counted. so far only Lustre did IPoIB for network initialization. Once it finds a working connection it does RDMA. But I'm not sure about what it does in case of problems, e.g. server reboot, I guess it then does again IPoIB. Is there a way to find out if these RcvSwRelayErrors are due to multicast or due to real problems? > > > I'm also not sure if up/down is the optimal algorithm for a fabric with > > only two switches. > > > > Since describing the connections in words is a bit difficult, I just > > upload a drawing here: > > > > http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/ib/Interswitch-ca > >bling.pdf > > > > The root-guid for the up/down algorithm is leaf-5 of of the small switch. > > But I'm still not sure about up/down at all. Doesn't one need for up/down > > at least 3 switches? Something like this ascii graphic below? > > > > > > root-switch > > / \ > > / \ > > Sw-1 ------------ Sw-2 > > Doesn't your chassis switch have many switches in it ? You did say it > was 144 ports so it's made up of a number of switches. Yes, it's made up of a number of switches. > > You may need to choose a "better" root than up/down automatically > determines. > Opensm isn't able to detect a root itself at all. As said above I first configured leaf-5 of the small switch (see the pdf file above), but now switched it to leaf-6 guid. I have no idea which would be optimal for our switches - I guess I have to create a drawing from the ibnetdiscover output to figure this out. 
I will also later on try to check with ibutils if it detects errors.

Thanks,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH

From a.p.zijlstra at chello.nl  Mon Apr  7 06:55:48 2008
From: a.p.zijlstra at chello.nl (Peter Zijlstra)
Date: Mon, 07 Apr 2008 15:55:48 +0200
Subject: [ofa-general] Re: [patch 01/10] emm: mm_lock: Lock a process against reclaim
In-Reply-To: <20080405004127.GG14784@duo.random>
References: <20080404223048.374852899@sgi.com> <20080404223131.271668133@sgi.com> <47F6B5EA.6060106@goop.org> <20080405004127.GG14784@duo.random>
Message-ID: <1207576548.15579.43.camel@twins>

On Sat, 2008-04-05 at 02:41 +0200, Andrea Arcangeli wrote:
> On Fri, Apr 04, 2008 at 04:12:42PM -0700, Jeremy Fitzhardinge wrote:
> > I think you can break this if() down a bit:
> >
> > 	if (!(vma->vm_file && vma->vm_file->f_mapping))
> > 		continue;
>
> It makes no difference at runtime, coding style preferences are quite
> subjective.

I'll have to concur with Jeremy here; please break that monstrous if
statement down. It might not matter to the compiler, but it sure as hell
helps anyone trying to understand/maintain the thing.

From erezz at Voltaire.COM  Mon Apr  7 07:35:39 2008
From: erezz at Voltaire.COM (Erez Zilber)
Date: Mon, 07 Apr 2008 17:35:39 +0300
Subject: [ofa-general] About RDMA_CM_EVENT_DEVICE_REMOVAL
Message-ID: <47FA313B.20809@Voltaire.COM>

Sean,

I'm trying to add a better implementation to this event in iSER (better
than the current BUG() call that we have). I have 2 questions:

1. Is this event raised for each connection?
2. After the event is raised, I guess that I need to release all IB
   resources for that connection, right? If you take a look at
   iser_free_ib_conn_res() (in ulp/iser/iser_verbs.c), you can see
   that we call rdma_destroy_id. This call never returns. Should I
   call rdma_destroy_id while handling RDMA_CM_EVENT_DEVICE_REMOVAL?
Thanks, Erez From swise at opengridcomputing.com Mon Apr 7 07:37:55 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 07 Apr 2008 09:37:55 -0500 Subject: [ofa-general] Re: Has anyone tried running RDS over 10GE / IWARP NICs ? In-Reply-To: References: <47F3C2EF.6010304@oracle.com> <47F3C5D1.5000003@oracle.com> <47F3CA89.9080406@oracle.com> <47F4F526.3060709@opengridcomputing.com> <15ddcffd0804032147s439c001r95148d1305d47ac4@mail.gmail.com> Message-ID: <47FA31C3.5090307@opengridcomputing.com> Roland Dreier wrote: > > If not, can some notes be sent to the list? I say lets learn from what > > you did so far... > > In my experience, getting code to work over both IB and iWARP isn't that > hard. The main points are: > > - Use the RDMA CM for connection establishment (duh) > - Memory regions used to receive RDMA read responses must have "remote > write" permission (since in the iWARP protocol, RDMA read responses > are basically the same as incoming RDMA write requests) > - Active side of the connection must do the first operation > - Don't use IB-specific features (atomics, immediate data) > > Dunno the exact semantics for IB, but: write and send completions for iWARP only indicate the buffer for the IO operation can be reused. It does not indicate the data has been placed in the peers memory. Steve. From erezz at Voltaire.COM Mon Apr 7 07:53:32 2008 From: erezz at Voltaire.COM (Erez Zilber) Date: Mon, 07 Apr 2008 17:53:32 +0300 Subject: [ofa-general] About RDMA_CM_EVENT_DEVICE_REMOVAL In-Reply-To: <47FA313B.20809@Voltaire.COM> References: <47FA313B.20809@Voltaire.COM> Message-ID: <47FA356C.9080209@Voltaire.COM> Erez Zilber wrote: > Sean, > > I'm trying to add a better implementation to this event in iSER (better > than the current BUG() call that we have). I have 2 questions: > > 1. Is this event raised for each connection? > 2. After the event is raised, I guess that I need to release all IB > resources for that connection, right? 
If you take a look at > iser_free_ib_conn_res() (in ulp/iser/iser_verbs.c), you can see > that we call rdma_destroy_id. This call never returns. Should I > call rdma_destroy_id while handling RDMA_CM_EVENT_DEVICE_REMOVAL? > > I read some of the cma code, and I see that cma_process_remove calls rdma_destroy_id itself if iser_cma_handler returns a non-zero value. Why? Currently, iser_cma_handler returns 0 (success), so rdma_destroy_id is never called... Erez From poornima.kamath at qlogic.com Mon Apr 7 07:56:39 2008 From: poornima.kamath at qlogic.com (Poornima Kamath (Contractor - )) Date: Mon, 7 Apr 2008 07:56:39 -0700 Subject: [ofa-general] Running sdpnetstat on removing ib_sdp module causes kernel panic Message-ID: Hi, I am getting a kernel panic on running sdpnetstat when ib_sdp module is unloaded. Has anyone seen this? I am running OFED-1.3. I have opened a bug in OFED-bugzilla for the same. https://bugs.openfabrics.org/show_bug.cgi?id=996. Regards, Poornima -------------- next part -------------- An HTML attachment was scrubbed... URL: From swise at opengridcomputing.com Mon Apr 7 08:27:28 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 07 Apr 2008 10:27:28 -0500 Subject: [ofa-general] Directions for verbs API extensions In-Reply-To: References: Message-ID: <47FA3D60.3020905@opengridcomputing.com> Hey roland. Nice write-up. Comments in-line below: Roland Dreier wrote: > Here is a little document I wrote trying to summarize all the things > that we might want to add to the verbs API to support device > capabilities that aren't exposed yet. There are a number of issues to > resolve, and answers to the questions I ask below would help us make > progress towards actually supporting all this. > > There are a number of verbs that are common to the iWARP/RDMA > consortium verbs and the InfiniBand base memory management extensions > (IB-BMME). 
We would probably add one device capability bit for "BMME" > (and all iWARP devices could set it) to show support for everything here: > > - Allocate L_Key/STag. This allocates MR resources without actually > registering memory; the MR can then be registered or invalidated as > described below. > > - "Fast register" memory through send queue. This allows a work > request to be posted to a send queue to register memory using an > L_Key/STag that is in the invalid state. > > - Local invalidate send work requests, which can be used to > invalidate an MR or MW. One subtle point here is that local > invalidate operations have very loose ordering, in the sense that > they can be executed before earlier requests, but support for > fencing local invalidate operations is mandatory in iWARP and only > optional in IB. But is there any IB device that currently exists > that supports BMME but doesn't support local invalidate fencing? > I really hope we can ignore this possibility. > > - Memory windows associated to a single QP and bound using send work > requests posted with the normal post send verb rather than a > separate MW verb. (See below for more) > > In addition there are things that are optional in both specs: > > - Block-list physical buffer lists; this allows memory regions to be > registered with arbitrary size/alignment blocks instead of just > page-aligned chunks. Yet another capability bit if we want to > expose this. > > There are a few discrepancies between the iWARP and IB verbs that we > need to decide on how we want to handle: > > - In IB-BMME, L_Keys and R_Keys are split up so that there is an > 8-bit "key" that is owned by the consumer. As far as I know, there > is no analogous concept defined for iWARP STags; is there any point > in supporting this IB-only feature (which is optional even in the > IB spec)? > > In fact there is an 8b key for stags as well. 
The stag is composed of a 3B index allocated by the driver/hw, and a 1B key specified by the consumer. None of this is exposed in the linux rdma interface at this point and cxgb3 always sets the key to 0xff. > - Along similar lines, IB defines two types of memory windows, "type > 1" and "type 2" and in fact type 2 is split into "2A" and "2B" (the > difference is basically whether the MW is associated with just a > QP, or with a QP and a PD). iWARP memory windows are always what > the IB spec would call type 2B. All the IB devices that I know of > with IB-BMME support can handle type 2B memory windows. Is there > any point in having our API worry about the distinction between 2A > or 2B, or should we just decree that we only handle type 2B? (Does > anyone who hasn't just been reading specs even understand the > distinction between type 2A and 2B?) > > - Further, the MW API that we have now, with a separate bind MW verb, > corresponds to type 1 MWs. Type 2 MWs are bound by posting a work > request using the standard "post send" verb. Given that no IB > device drivers have implemented the bind MW verb yet, does it make > sense to deprecate the API for type 1 MWs and say that everyone > should use type 2[B] MWs only? > > The chelsio driver supports the iwarp bind_mw SQ WR via the current API. In fact the current API implies that this call is actually a SQ operation anyway: > /** > * ib_bind_mw - Posts a work request to the send queue of the specified > * QP, which binds the memory window to the given address range and > * remote access attributes. How is the current bind_mw API not valid or correct for iwarp MWs? Other than being a different call than ib_post_send()? > - iWARP supports "RDMA read with invalidate" send work requests, > while IB has no such operation. This makes sense because iWARP > requires the buffer used to receive RDMA read responses to have > remote write permission, while IB has no such requirement. 
I don't > see a really clean way to handle this except to say that apps have > to have "if (IB) do_this(); else /* iWARP */ do_that();" code to > use this in a portable way. > Or a transport independent app can always use 2 WRs, read + inv-local-stag/fenced instead of read-inv-local-stag. > - Zero-based virtual addresses for memory regions. This is mandatory > for iWARP and optional for IB (and is not required even for BMME). > I think the simplest thing to do is just to have yet another > capability bit to say whether a device supports ZBVA or not; all > iWARP devices can set it. > > Currently, nobody is using this nor the block mode feature. I don't think we should bother supporting them unless someone has an app in mind that will utilize them. > Finally, there are proprietary verbs extensions that are only > supported by a single device at the moment, which we have to decide if > and how to support. It is a tradeoff between making useful features > available versus making the already overly complex verbs API even more > impossible to fathom, although it seems all of these have users asking > for them: > > - ConnectX has XRC, masked atomic operations, and the "block > loopback" flag for UD QPs at least. > > - eHCA has "low-latency" QPs. 
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From hrosenstock at xsigo.com Mon Apr 7 08:29:00 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Mon, 07 Apr 2008 08:29:00 -0700 Subject: [ofa-general] XmtDiscards Message-ID: <1207582140.15625.284.camel@hrosenstock-ws.xsigo.com> Hi again Bernd, On Mon, 2008-04-07 at 15:53 +0200, Bernd Schubert wrote: > Hello Hal, > > On Monday 07 April 2008 15:35:10 Hal Rosenstock wrote: > > Hi Bernd, > > > > On Sun, 2008-04-06 at 18:05 +0200, Bernd Schubert wrote: > > > Hello Hal, > > > > > > > > Searching for this error I find "This is a symptom of congestion and > > > > > may require tweaking either HOQ or switch lifetime values". > > > > > Well, I have to admit I neither know what HOQ is, nor do I know how > > > > > to tweak it. I also do not have an idea to set switch lifetime > > > > > values. I guess this isn't related to the opensm timeout option, is > > > > > it? > > > > > > > > > > Hmm, I just found a cisci pdf describing how to set the lifetime on > > > > > these switches, but is this also possible on Flextronics switches? > > > > > > > > What routing algorithm are you using ? Rather than play with those > > > > switch values, if you are not using up/down, could you try that to see > > > > if it helps with the congestion you are seeing ? > > > > > > I now configured up/down, but still got XmtDiscards, though, only on one > > > port. > > > > > > Error check on lid 205 (SW_pfs1_leaf2) port all: FAILED > > > #warn: counter XmtDiscards = 6213 (threshold 100) lid 205 port 1 > > > Error check on lid 205 (SW_pfs1_leaf2) port 1: FAILED > > > #warn: counter RcvSwRelayErrors = 1431 (threshold 100) lid 205 port 13 > > > Error check on lid 205 (SW_pfs1_leaf2) port 13: FAILED > > > > Are you running IPoIB ? 
If so, SwRelayErrors are not necessarily > > indicative of a "real" issue due to the fact that multicasts reflected > > on the same port are mistakenly counted. > > so far only Lustre did IPoIB for network initialization. Once it finds a > working connection it does RDMA. But I'm not sure about what it does in case > of problems, e.g. server reboot, I guess it then does again IPoIB. > > Is there a way to find out if these RcvSwRelayErrors are due to multicast or > due to real problems? While there're no counters which break this down into the 3 buckets AFAIK, one can analyze that switch for the other 2 causes. That's the best I'm aware of that can be done. -- Hal > > > I'm also not sure if up/down is the optimal algorithm for a fabric with > > > only two switches. > > > > > > Since describing the connections in words is a bit difficult, I just > > > upload a drawing here: > > > > > > http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/ib/Interswitch-ca > > >bling.pdf > > > > > > The root-guid for the up/down algorithm is leaf-5 of of the small switch. > > > But I'm still not sure about up/down at all. Doesn't one need for up/down > > > at least 3 switches? Something like this ascii graphic below? > > > > > > > > > root-switch > > > / \ > > > / \ > > > Sw-1 ------------ Sw-2 > > > > Doesn't your chassis switch have many switches in it ? You did say it > > was 144 ports so it's made up of a number of switches. > > Yes, it's made up of a number of switches. > > > > > You may need to choose a "better" root than up/down automatically > > determines. > > > > Opensm isn't able to detect a root itself at all. As said above I first > configured leaf-5 of the small switch (see the pdf file above), but now > switched it to leaf-6 guid. I have no idea which would be optimal for our > switches - I guess I have to create a drawing from the ibnetdiscover output > to figure this out. Yes. > I will also later on try to check with ibutils if it detects errors. 
Sure; that would be good too. -- Hal > Thanks, > Bernd > > From weiny2 at llnl.gov Mon Apr 7 09:49:06 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 7 Apr 2008 09:49:06 -0700 Subject: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values (Was: Re: [ofa-general] XmtDiscards) In-Reply-To: <1207401479.15625.221.camel@hrosenstock-ws.xsigo.com> References: <1E3DCD1C63492545881FACB6063A57C1023F6B30@mtiexch01.mti.com> <1207401479.15625.221.camel@hrosenstock-ws.xsigo.com> Message-ID: <20080407094906.7165dc20.weiny2@llnl.gov> On Sat, 05 Apr 2008 06:17:59 -0700 Hal Rosenstock wrote: > On Fri, 2008-04-04 at 17:48 -0700, Boris Shpolyansky wrote: > > Bernd, > > > > 0x14 is the maximal value for HOQ lifetime, which effectively disables > > the mechanism. I think you shouldn't exceed this value. > > True about the maximal value but any 5 bit value > 19 (up through 31) > should effectively be the same thing according to the spec. > > I also think that OpenSM could do a better job validating and setting > this and other similar optional parameters. > As a start here is a patch which checks the HOQ life values. Ira >From 9e05f091a3c9173045f523aee245e98af1bf74f3 Mon Sep 17 00:00:00 2001 From: Ira K. Weiny Date: Mon, 7 Apr 2008 08:31:46 -0700 Subject: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values Signed-off-by: Ira K. 
Weiny --- opensm/opensm/osm_subnet.c | 22 ++++++++++++++++++++++ 1 files changed, 22 insertions(+), 0 deletions(-) diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 47d735f..29d7cdc 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -1045,6 +1045,28 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) p_opts->force_link_speed = IB_PORT_LINK_SPEED_ENABLED_MASK; } + if (0x14 < p_opts->head_of_queue_lifetime) { + sprintf(buff, + " Invalid Cached Option Value:head_of_queue_lifetime = %u:" + "Using Default:%u\n", p_opts->head_of_queue_lifetime, + OSM_DEFAULT_HEAD_OF_QUEUE_LIFE); + printf(buff); + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); + p_opts->head_of_queue_lifetime = + OSM_DEFAULT_HEAD_OF_QUEUE_LIFE; + } + + if (0x14 < p_opts->leaf_head_of_queue_lifetime) { + sprintf(buff, + " Invalid Cached Option Value:leaf_head_of_queue_lifetime = %u:" + "Using Default:%u\n", p_opts->leaf_head_of_queue_lifetime, + OSM_DEFAULT_LEAF_HEAD_OF_QUEUE_LIFE); + printf(buff); + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); + p_opts->leaf_head_of_queue_lifetime = + OSM_DEFAULT_LEAF_HEAD_OF_QUEUE_LIFE; + } + if (strcmp(p_opts->console, OSM_DISABLE_CONSOLE) && strcmp(p_opts->console, OSM_LOCAL_CONSOLE) #ifdef ENABLE_OSM_CONSOLE_SOCKET -- 1.5.1 From hrosenstock at xsigo.com Mon Apr 7 10:06:06 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Mon, 07 Apr 2008 10:06:06 -0700 Subject: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values (Was: Re: [ofa-general] XmtDiscards) In-Reply-To: <20080407094906.7165dc20.weiny2@llnl.gov> References: <1E3DCD1C63492545881FACB6063A57C1023F6B30@mtiexch01.mti.com> <1207401479.15625.221.camel@hrosenstock-ws.xsigo.com> <20080407094906.7165dc20.weiny2@llnl.gov> Message-ID: <1207587966.15625.317.camel@hrosenstock-ws.xsigo.com> On Mon, 2008-04-07 at 09:49 -0700, Ira Weiny wrote: > On Sat, 05 Apr 2008 06:17:59 -0700 > Hal Rosenstock wrote: > > > 
On Fri, 2008-04-04 at 17:48 -0700, Boris Shpolyansky wrote: > > > Bernd, > > > > > > 0x14 is the maximal value for HOQ lifetime, which effectively disables > > > the mechanism. I think you shouldn't exceed this value. > > > > True about the maximal value but any 5 bit value > 19 (up through 31) > > should effectively be the same thing according to the spec. > > > > I also think that OpenSM could do a better job validating and setting > > this and other similar optional parameters. > > > > As a start here is a patch which checks the HOQ life values. > > Ira > > From 9e05f091a3c9173045f523aee245e98af1bf74f3 Mon Sep 17 00:00:00 2001 > From: Ira K. Weiny > Date: Mon, 7 Apr 2008 08:31:46 -0700 > Subject: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values > > > Signed-off-by: Ira K. Weiny > --- > opensm/opensm/osm_subnet.c | 22 ++++++++++++++++++++++ > 1 files changed, 22 insertions(+), 0 deletions(-) > > diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c > index 47d735f..29d7cdc 100644 > --- a/opensm/opensm/osm_subnet.c > +++ b/opensm/opensm/osm_subnet.c > @@ -1045,6 +1045,28 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) > p_opts->force_link_speed = IB_PORT_LINK_SPEED_ENABLED_MASK; > } > > + if (0x14 < p_opts->head_of_queue_lifetime) { > + sprintf(buff, > + " Invalid Cached Option Value:head_of_queue_lifetime = %u:" > + "Using Default:%u\n", p_opts->head_of_queue_lifetime, > + OSM_DEFAULT_HEAD_OF_QUEUE_LIFE); > + printf(buff); > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > + p_opts->head_of_queue_lifetime = > + OSM_DEFAULT_HEAD_OF_QUEUE_LIFE; > + } > + > + if (0x14 < p_opts->leaf_head_of_queue_lifetime) { > + sprintf(buff, > + " Invalid Cached Option Value:leaf_head_of_queue_lifetime = %u:" > + "Using Default:%u\n", p_opts->leaf_head_of_queue_lifetime, > + OSM_DEFAULT_LEAF_HEAD_OF_QUEUE_LIFE); > + printf(buff); > + cl_log_event("OpenSM", CL_LOG_INFO, buff, NULL, 0); > + 
p_opts->leaf_head_of_queue_lifetime = > + OSM_DEFAULT_LEAF_HEAD_OF_QUEUE_LIFE; > + } > + Should these be set to max rather than default as it seems that that's what they're more likely trying to do? -- Hal > if (strcmp(p_opts->console, OSM_DISABLE_CONSOLE) > && strcmp(p_opts->console, OSM_LOCAL_CONSOLE) > #ifdef ENABLE_OSM_CONSOLE_SOCKET From or.gerlitz at gmail.com Mon Apr 7 10:39:00 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Mon, 7 Apr 2008 20:39:00 +0300 Subject: [ofa-general] About RDMA_CM_EVENT_DEVICE_REMOVAL In-Reply-To: <47FA313B.20809@Voltaire.COM> References: <47FA313B.20809@Voltaire.COM> Message-ID: <15ddcffd0804071039q48f55544ja89ff2f60ae5592b@mail.gmail.com> On Mon, Apr 7, 2008 at 5:35 PM, Erez Zilber wrote: > 1. Is this event raised for each connection? It is raised per rdma cm id, which is bound to a device; in the initiator case that means per connection. > 2. After the event is raised, I guess that I need to release all IB > resources for that connection, right? If you take a look at > iser_free_ib_conn_res() (in ulp/iser/iser_verbs.c), you can see > that we call rdma_destroy_id. This call never returns. Should I > call rdma_destroy_id while handling RDMA_CM_EVENT_DEVICE_REMOVAL? You are not allowed to call rdma_destroy_id from the context of your callback... this is documented in the rdma-cm .h file. Just return non-zero from the callback if you want the rdma cm to destroy the id for you... Or From tziporet at dev.mellanox.co.il Mon Apr 7 10:54:02 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 07 Apr 2008 10:54:02 -0700 Subject: [ofa-general] Re: Plan for OFED-1.3.1? In-Reply-To: <200804071331.42031.ossrosch@linux.vnet.ibm.com> References: <200804071331.42031.ossrosch@linux.vnet.ibm.com> Message-ID: <47FA5FBA.8030907@mellanox.co.il> Stefan Roscher wrote: > Hi, > > is there any schedule for the OFED-1.3.1 release?
Schedule is May 29 (I will present it as part of OFED 1.3 session today) > When should we start to send some minor bugfixes for ehca? You can start now > Would the kernel-base be the same 2.6.24 or will it change to 2.6.25? > Kernel base will not change Tziporet From tziporet at dev.mellanox.co.il Mon Apr 7 11:00:31 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 07 Apr 2008 11:00:31 -0700 Subject: [ofa-general] ofed works on kernels with 64Kbyte pages? In-Reply-To: References: <20080404204758.GU29410@sgi.com> Message-ID: <47FA613F.3070301@mellanox.co.il> Roland Dreier wrote: > > I know it's a long shot, but has anyone tried using OFED on > > a kernel with 64Kbyte pages? > > > > SGI would like to support that, but I've gotten reports that > > something is not working (e.g., "ib_rdma_bw" doesn't work on > > an ia64 kernel with 64Kb pages). This is with the mthca driver, > > fwiw. > > > > Unfortunately a conspiracy of h/w prevents me from reproducing > > this right now, so I don't have more details. But I'd be very > > curious to know if anyone can verify that OFED does/doesn't > > work with 64Kbyte pages. > > I don't know about OFED, but I've tried various things on 64KB PAGE_SIZE > systems and it seems to work. It wouldn't surprise me if there are > issues since the drivers and firmware gets a lot less testing in such > situations but it "should work" -- I'd be happy to help debug if anyone > has concrete problems. > OFED was tested on PPC64 with RHEL5.1 which works with 64K pages as a default.
This was tested with our ConnectX cards (mlx4 driver) I think IBM are using the same OS for their ehca cards too Tziporet From tziporet at dev.mellanox.co.il Mon Apr 7 11:32:06 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 07 Apr 2008 11:32:06 -0700 Subject: [ofa-general] [PATCH/RFC 1/2] IB/core: Add support for "send with invalidate" work requests In-Reply-To: References: Message-ID: <47FA68A6.8020109@mellanox.co.il> Roland Dreier wrote: > OK here's an updated series of the kernel side, with the invalidate > stuff moved to a new opcode. I also decided after thinking about it > that I liked Eli's suggestion of putting the invalidate rkey in a union > with imm_data. This won't work for libibverbs where we have to preserve > the API but I guess we can burn that bridge when we come to it... > I think send w/invalidate is for kernel keys only (at least in IB) so not clear we need it in libibverbs at all Tziporet From rajouri.jammu at gmail.com Mon Apr 7 11:51:34 2008 From: rajouri.jammu at gmail.com (Rajouri Jammu) Date: Mon, 7 Apr 2008 11:51:34 -0700 Subject: [ofa-general] OFED 1.3 user source rpm Message-ID: <3307cdf90804071151u7b47ad6csd57efaea13455cdb@mail.gmail.com> Hi, I could not find the ofa_user rpm in OFED 1.3. In older releases there was a way to create a separate rpm for the user src. OFED-1.2.5.4]# grep ofa_user * build_env.sh:OFA_USER_SRC_RPM=$(/bin/ls -1 ${SRPMS}/ofa_user*.src.rpm 2> $NULL) BUILD_ID:ofa_user-1.2.5.4: build.sh:# Create RPMs for selected packages from ofa_user and ofa_kernel I couldn't find anything like that in OFED 1.3. Is there a way for me to look at the OFED 1.3 user mode sources? thanks.
URL: From jeremy at goop.org Mon Apr 7 12:02:53 2008 From: jeremy at goop.org (Jeremy Fitzhardinge) Date: Mon, 07 Apr 2008 12:02:53 -0700 Subject: [ofa-general] Re: [patch 01/10] emm: mm_lock: Lock a process against reclaim In-Reply-To: <20080405004127.GG14784@duo.random> References: <20080404223048.374852899@sgi.com> <20080404223131.271668133@sgi.com> <47F6B5EA.6060106@goop.org> <20080405004127.GG14784@duo.random> Message-ID: <47FA6FDD.9060605@goop.org> Andrea Arcangeli wrote: > On Fri, Apr 04, 2008 at 04:12:42PM -0700, Jeremy Fitzhardinge wrote: > >> I think you can break this if() down a bit: >> >> if (!(vma->vm_file && vma->vm_file->f_mapping)) >> continue; >> > > It makes no difference at runtime, coding style preferences are quite > subjective. > Well, overall the formatting of that if statement is very hard to read. Separating out the logically distinct pieces in to different ifs at least shows the reader that they are distinct. Aside from that, doing some manual CSE to remove all the casts and expose the actual thing you're testing for would help a lot (are the casts even necessary?).
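Jeremy's suggestion above — splitting the compound condition into separate ifs and hoisting the repeated `vma->vm_file` dereference into a local — can be illustrated with a small userspace sketch. The structure definitions here are simplified stand-ins for the kernel's real `struct vm_area_struct` and friends, kept only to show the early-continue style:

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures under discussion. */
struct address_space { int dummy; };
struct file { struct address_space *f_mapping; };
struct vm_area_struct {
	struct file *vm_file;
	struct vm_area_struct *vm_next;
};

/* Count the VMAs whose i_mmap lock would need taking.  Instead of one
 * long compound if(), hoist the common subexpression into a local and
 * use an early continue so each test reads on its own. */
static int count_lockable_mappings(struct vm_area_struct *vma)
{
	int n = 0;

	for (; vma; vma = vma->vm_next) {
		struct file *file = vma->vm_file;	/* manual CSE */

		if (!file)
			continue;	/* anonymous mapping */
		if (!file->f_mapping)
			continue;	/* no address_space to lock */
		n++;
	}
	return n;
}
```

The behavior is identical to the single compound test; the point is only that each skip condition gets its own line and its own implicit comment.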
>> So this is an O(n^2) algorithm to take the i_mmap_locks from low to high >> order? A comment would be nice. And O(n^2)? Ouch. How often is it >> called? >> > > It's called a single time when the mmu notifier is registered. It's a > very slow path of course. Any other approach to reduce the complexity > would require memory allocations and it would require > mmu_notifier_register to return -ENOMEM failure. It didn't seem worth > it. > It's per-mm though. How many processes would need to have notifiers? >> And is it necessary to mush lock and unlock together? Unlock ordering >> doesn't matter, so you should just be able to have a much simpler loop, no? >> > > That avoids duplicating .text. Originally they were separated. unlock > can't be a simpler loop because I didn't reserve vm_flags bitflags to > do a single O(N) loop for unlock. If you do malloc+fork+munmap two > vmas will point to the same anon-vma lock, that's why the unlock isn't > simpler unless I mark what I locked with a vm_flags bitflag. Well, it's definitely going to need more comments then. I assumed it would end up locking everything, so unlocking everything would be sufficient. J From swise at opengridcomputing.com Mon Apr 7 12:28:50 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 07 Apr 2008 14:28:50 -0500 Subject: [ofa-general] [PATCH/RFC 1/2] IB/core: Add support for "send with invalidate" work requests In-Reply-To: <47FA68A6.8020109@mellanox.co.il> References: <47FA68A6.8020109@mellanox.co.il> Message-ID: <47FA75F2.3040907@opengridcomputing.com> Tziporet Koren wrote: > Roland Dreier wrote: >> OK here's an updated series of the kernel side, with the invalidate >> stuff moved to a new opcode. I also decided after thinking about it >> that I liked Eli's suggestion of putting the invalidate rkey in a union >> with imm_data. This won't work for libibverbs where we have to preserve >> the API but I guess we can burn that bridge when we come to it...
>> > > I think send w/invalidate is for kernel keys only (at least in IB) so > not clear we need it in libibverbs at all > For iWARP, it's needed for user mode as well... Steve. From andrea at qumranet.com Mon Apr 7 12:35:44 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Mon, 7 Apr 2008 21:35:44 +0200 Subject: [ofa-general] Re: [patch 01/10] emm: mm_lock: Lock a process against reclaim In-Reply-To: <47FA6FDD.9060605@goop.org> References: <20080404223048.374852899@sgi.com> <20080404223131.271668133@sgi.com> <47F6B5EA.6060106@goop.org> <20080405004127.GG14784@duo.random> <47FA6FDD.9060605@goop.org> Message-ID: <20080407193544.GH20587@duo.random> On Mon, Apr 07, 2008 at 12:02:53PM -0700, Jeremy Fitzhardinge wrote: > It's per-mm though. How many processes would need to have notifiers? There can be up to hundreds of VMs in a single system. Not sure I understand the point of the question though. > Well, it's definitely going to need more comments then. I assumed it would > end up locking everything, so unlocking everything would be sufficient. After your comments, I'm writing an alternate version that will guarantee an O(N) worst case to both sigkill and cond_resched but frankly this is low priority. Without mmu notifiers /dev/kvm can't be given to a normal luser without at least losing mlock ulimits, so the lack of mmu notifiers is a bigger issue than whatever complexity in mm_lock as far as /dev/kvm ownership is concerned.
Shouldn't userspace be able to do send with invalidate for memory windows? - R. From jimmott at austin.rr.com Mon Apr 7 14:11:34 2008 From: jimmott at austin.rr.com (Jim Mott) Date: Mon, 7 Apr 2008 16:11:34 -0500 Subject: [ofa-general] Running sdpnetstat on removing ib_sdp module causes kernel panic In-Reply-To: References: Message-ID: <000001c898f3$fd30fc60$f792f520$@rr.com> I have not seen it, but I am not sure I have tried this. I'll check it and report status (and fix) on your bug (bug 996). It will be this weekend before I can look though. From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Poornima Kamath (Contractor - ) Sent: Monday, April 07, 2008 9:57 AM To: general at lists.openfabrics.org Subject: [ofa-general] Running sdpnetstat on removing ib_sdp module causes kernel panic Hi, I am getting a kernel panic on running sdpnetstat when ib_sdp module is unloaded. Has anyone seen this? I am running OFED-1.3. I have opened a bug in OFED-bugzilla for the same. https://bugs.openfabrics.org/show_bug.cgi?id=996. Regards, Poornima -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Mon Apr 7 14:21:54 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 07 Apr 2008 14:21:54 -0700 Subject: [ofa-general] Directions for verbs API extensions In-Reply-To: <47FA3D60.3020905@opengridcomputing.com> (Steve Wise's message of "Mon, 07 Apr 2008 10:27:28 -0500") References: <47FA3D60.3020905@opengridcomputing.com> Message-ID: > > There are a few discrepancies between the iWARP and IB verbs that we > > need to decide on how we want to handle: > > > > - In IB-BMME, L_Keys and R_Keys are split up so that there is an > > 8-bit "key" that is owned by the consumer. As far as I know, there > > is no analogous concept defined for iWARP STags; is there any point > > in supporting this IB-only feature (which is optional even in the > > IB spec)?
> In fact there is an 8b key for stags as well. The stag is composed of > a 3B index allocated by the driver/hw, and a 1B key specified by the > consumer. None of this is exposed in the linux rdma interface at this > point and cxgb3 always sets the key to 0xff. Oops, I completely missed that in the iWARP verbs spec. Yes, the IB and iWARP verbs agree on the semantics here, so the only issue is that the "key" portion of L_Keys/R_Keys is only supported by IB devices that do BMME. So we can expose this in the API without too much trouble. > The chelsio driver supports the iwarp bind_mw SQ WR via the current > API. In fact the current API implies that this call is actually a SQ > operation anyway: > > /** > > * ib_bind_mw - Posts a work request to the send queue of the specified > > * QP, which binds the memory window to the given address range and > > * remote access attributes. > > How is the current bind_mw API not valid or correct for iwarp MWs? > Other than being a different call than ib_post_send()? That's the only issue. The main impact is that you can't submit an MW bind as part of a list of send WRs. I guess it's not too severe an issue. I don't have any strong feelings here, except that eliminating the separate bind_mw call might be a little cleaner. On the other hand it adds more conditional branches to post_send so maybe it's a net lose. > > - iWARP supports "RDMA read with invalidate" send work requests, > > while IB has no such operation. This makes sense because iWARP > > requires the buffer used to receive RDMA read responses to have > > remote write permission, while IB has no such requirement. I don't > > see a really clean way to handle this except to say that apps have > > to have "if (IB) do_this(); else /* iWARP */ do_that();" code to > > use this in a portable way. > Or a transport independent app can always use 2 WRs, read + > inv-local-stag/fenced instead of read-inv-local-stag. 
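One way to paper over the transport difference Roland describes is to hide the WR construction behind a small helper, so only one place in the application carries the "if iWARP, one WR; else read plus fenced local invalidate" decision. A minimal sketch follows; the types are simplified stand-ins for illustration (the real work-request structures live in <infiniband/verbs.h> and differ in detail):

```c
/* Simplified stand-ins for verbs work-request types; names here are
 * illustrative, not the real <infiniband/verbs.h> definitions. */
enum wr_opcode { WR_RDMA_READ, WR_LOCAL_INV, WR_RDMA_READ_WITH_INV };
enum { WR_SEND_FENCE = 1 };

struct wr {
	enum wr_opcode opcode;
	unsigned int send_flags;
	unsigned int invalidate_rkey;
};

/* Emit the WR list for "RDMA read, then invalidate the local stag".
 * On a transport with read-with-invalidate (iWARP) one WR suffices;
 * otherwise fall back to a read followed by a fenced local invalidate,
 * so the invalidate cannot overtake the read.  Returns the WR count. */
static int build_read_then_invalidate(int have_read_with_inv,
				      unsigned int rkey, struct wr out[2])
{
	if (have_read_with_inv) {
		out[0] = (struct wr){ WR_RDMA_READ_WITH_INV, 0, rkey };
		return 1;
	}
	out[0] = (struct wr){ WR_RDMA_READ, 0, 0 };
	out[1] = (struct wr){ WR_LOCAL_INV, WR_SEND_FENCE, rkey };
	return 2;
}
```

The caller then posts whatever list comes back, and the transport check is confined to this one helper rather than scattered through the data path. The fence on the fallback path matters: without it the local invalidate could complete before the read's data has landed.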
Except that fenced local invalidate is optional on IB ;) But as I said I think we can assume that IB devices that support local invalidate support fencing it. > > - Zero-based virtual addresses for memory regions. This is mandatory > > for iWARP and optional for IB (and is not required even for BMME). > > I think the simplest thing to do is just to have yet another > > capability bit to say whether a device supports ZBVA or not; all > > iWARP devices can set it. > Currently, nobody is using this nor the block mode feature. I don't > think we should bother supporting them unless someone has an app in > mind that will utilize them. I agree that block mode seems dubious. I believe that iSER on iWARP requires ZBVA though. - R. From sashak at voltaire.com Mon Apr 7 18:44:06 2008 From: sashak at voltaire.com (Sasha Copyist) Date: Tue, 8 Apr 2008 01:44:06 +0000 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <200804041147.27565.bs@q-leap.de> References: <200804041147.27565.bs@q-leap.de> Message-ID: <20080408014406.GA16864@sashak.voltaire.com> Hi Bernd, On 11:47 Fri 04 Apr , Bernd Schubert wrote: > > opensm-3.2.1 logs some error messages like this: > > Apr 04 00:00:08 325114 [4580A960] 0x01 -> __osm_state_mgr_light_sweep_start: > ERR 0108: Unknown remote side for node 0x000b8cffff002ba2(SW_pfs1_leaf4) port 13. Adding to light sweep sampling list > Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path Dump of 3 hop path: > Path = 0,1,14,13 > > > From ibnetdiscover output I see port13 of this switch is a switch-interconnect > (sorry, I don't know what the correct name/identifier is for switches within > switches): > > [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid 263 > 4xSDR It is possible that port was DOWN during the first subnet discovery. Eventually everything should be initialized after those messages. Isn't that the case here?
Sasha From swise at opengridcomputing.com Mon Apr 7 16:06:59 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 07 Apr 2008 18:06:59 -0500 Subject: [ofa-general] Directions for verbs API extensions In-Reply-To: References: <47FA3D60.3020905@opengridcomputing.com> Message-ID: <47FAA913.7090805@opengridcomputing.com> > > Currently, nobody is using this nor the block mode feature. I don't > > think we should bother supporting them unless someone has an app in > > mind that will utilize them. > > I agree that block mode seems dubious. I believe that iSER on iWARP > requires ZBVA though. > You're right. However, iSER as it's spec'd in the IETF cannot work in Linux due to the linux networking maintainer's insistence that RDMA connections not share the same port space. Specifically, the spec mandates (for TCP only, not IB) that the connection used for the iSCSI login be migrated into rdma mode. I.e., you cannot start a different connection for doing the data moving part... Steve.
We also have a reliable multicast feature we wish to add Tziporet From tziporet at dev.mellanox.co.il Mon Apr 7 17:51:12 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 07 Apr 2008 17:51:12 -0700 Subject: [ofa-general] [PATCH/RFC 1/2] IB/core: Add support for "send with invalidate" work requests In-Reply-To: References: <47FA68A6.8020109@mellanox.co.il> Message-ID: <47FAC180.3040303@mellanox.co.il> Roland Dreier wrote: > > I think send w/invalidate is for kernel keys only (at least in IB) so > > not clear we need it in libibverbs at all > > Really? Shouldn't userspace be able to do send with invalidate for > memory windows? > Yes but we actually have not implemented memory windows in IB either :-) Tziporet From grossmann at hlrs.de Tue Apr 8 01:13:52 2008 From: grossmann at hlrs.de (Thomas Großmann) Date: Tue, 8 Apr 2008 10:13:52 +0200 Subject: [ofa-general] kernel ib build (OFED 1.3) fails on SLES 10 Message-ID: <200804081013.52983.grossmann@hlrs.de> Hi, the kernel ib build (OFED 1.3) fails on SLES 10. You'll find the output attached.
Best regards, Thomas -- Thomas Großmann                  High Performance Computing Center Stuttgart (HLRS)                                         Allmandring 30                                                  70550 Stuttgart, Germany    E-Mail: grossmann at hlrs.de                                                                Phone: ++49-711-685-65529  Fax  : ++49-711-685-65832 -------------- next part -------------- warning: user vlad does not exist - using root warning: group vlad does not exist - using root warning: user vlad does not exist - using root warning: group vlad does not exist - using root Installing /root/OFED-1.3/SRPMS/ofa_kernel-1.3-ofed1.3.src.rpm Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.52212 + umask 022 + cd /var/tmp/OFED_topdir/BUILD + cd /var/tmp/OFED_topdir/BUILD + rm -rf ofa_kernel-1.3 + /usr/bin/gzip -dc /var/tmp/OFED_topdir/SOURCES/ofa_kernel-1.3.tgz + tar -xvvf - drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:54 ofa_kernel-1.3/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:54 ofa_kernel-1.3/.git/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:54 ofa_kernel-1.3/.git/refs/heads/ -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/heads/ofed_kernel -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/heads/ofed_kernel_2_6_24_rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/heads/ofed_kernel_2_6_23 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:54 ofa_kernel-1.3/.git/refs/heads/master drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/ -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/sdp_ofed_1_1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.12 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.12-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.12-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 
ofa_kernel-1.3/.git/refs/tags/v2.6.12-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.12-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.12-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.13-rc7 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.14 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.14-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.14-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.14-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.14-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.14-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.15 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.15-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.15-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.15-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.15-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.15-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 
ofa_kernel-1.3/.git/refs/tags/v2.6.15-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.15-rc7 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.16 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.16-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.16-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.16-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.16-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.16-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.16-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.17 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.17-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.17-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.17-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.17-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.17-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.17-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.18 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.18-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.18-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.18-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.18-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.18-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.18-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 
ofa_kernel-1.3/.git/refs/tags/v2.6.18-rc7 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.19 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.19-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.19-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.19-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.19-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.19-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.19-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.20-rc7 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.21 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.21-rc1 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.21-rc2 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.21-rc3 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.21-rc4 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.21-rc5 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.21-rc6 -rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 
ofa_kernel-1.3/.git/refs/tags/v2.6.21-rc7
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22-rc1
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22-rc2
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22-rc3
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22-rc4
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22-rc5
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22-rc6
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.22-rc7
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc1
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc2
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc3
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc4
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc5
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc6
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc7
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc8
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.23-rc9
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.24
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.24-rc2
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.24-rc3
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/v2.6.24-rc5
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2-rc1
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2-rc2
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2-rc3
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2-rc4
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2-rc5
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2-rc6
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2.5
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2.c-10
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2.c-11
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.2.c-9
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.3-beta2
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.3-rc1
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.3-rc2
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.3-rc3
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.3-rc4
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.3-rc5
-rw-r--r-- vlad/vlad 41 2008-02-28 09:59:49 ofa_kernel-1.3/.git/refs/tags/vofed-1.3-rc6
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:49 ofa_kernel-1.3/.git/branches/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:49 ofa_kernel-1.3/.git/info/
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:49 ofa_kernel-1.3/.git/info/exclude
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/
-rw-r--r-- vlad/vlad 441 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/applypatch-msg
-rw-r--r-- vlad/vlad 781 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/commit-msg
-rw-r--r-- vlad/vlad 152 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/post-commit
-rw-r--r-- vlad/vlad 511 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/post-receive
-rw-r--r-- vlad/vlad 207 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/post-update
-rw-r--r-- vlad/vlad 388 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/pre-applypatch
-rw-r--r-- vlad/vlad 1696 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/pre-commit
-rw-r--r-- vlad/vlad 4262 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/pre-rebase
-rw-r--r-- vlad/vlad 1949 2008-02-28 09:59:49 ofa_kernel-1.3/.git/hooks/update
-rw-r--r-- vlad/vlad 58 2008-02-28 09:59:49 ofa_kernel-1.3/.git/description
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:54 ofa_kernel-1.3/.git/objects/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:49 ofa_kernel-1.3/.git/objects/info/
-rw-r--r-- vlad/vlad 39 2008-02-28 09:59:49 ofa_kernel-1.3/.git/objects/info/alternates
-rw-r--r-- vlad/vlad 23 2008-02-28 09:59:49 ofa_kernel-1.3/.git/HEAD
-rw-r--r-- vlad/vlad 92 2008-02-28 09:59:49 ofa_kernel-1.3/.git/config
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:54 ofa_kernel-1.3/.git/logs/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:49 ofa_kernel-1.3/.git/logs/refs/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:54 ofa_kernel-1.3/.git/logs/refs/heads/
-rw-r--r-- vlad/vlad 222 2008-02-28 09:59:49 ofa_kernel-1.3/.git/logs/refs/heads/ofed_kernel
-rw-r--r-- vlad/vlad 222 2008-02-28 09:59:49 ofa_kernel-1.3/.git/logs/refs/heads/ofed_kernel_2_6_23
-rw-r--r-- vlad/vlad 222 2008-02-28 09:59:49 ofa_kernel-1.3/.git/logs/refs/heads/ofed_kernel_2_6_24_rc1
-rw-r--r-- vlad/vlad 161 2008-02-28 09:59:54 ofa_kernel-1.3/.git/logs/refs/heads/master
-rw-r--r-- vlad/vlad 161 2008-02-28 09:59:54 ofa_kernel-1.3/.git/logs/HEAD
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/
-rwxr-xr-x vlad/vlad 1334 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/ofed_checkout.sh
-rw-r--r-- vlad/vlad 331 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/90-ib.rules
-rw-r--r-- vlad/vlad 616 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/Makefile
-rwxr-xr-x vlad/vlad 38197 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/configure
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/iscsi_scsi_makefile
-rw-r--r-- vlad/vlad 15698 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/makefile
-rwxr-xr-x vlad/vlad 26219 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/ofa_kernel.spec
-rwxr-xr-x vlad/vlad 2921 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/ofed_makedist.sh
-rwxr-xr-x vlad/vlad 13027 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/ofed_patch.sh
-rw-r--r-- vlad/vlad 40 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/openib.conf
-rwxr-xr-x vlad/vlad 43734 2008-02-28 09:59:51 ofa_kernel-1.3/ofed_scripts/openibd
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/Documentation/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/Documentation/infiniband/
-rw-r--r-- vlad/vlad 4081 2008-02-28 09:59:50 ofa_kernel-1.3/Documentation/infiniband/core_locking.txt
-rw-r--r-- vlad/vlad 2289 2008-02-28 09:59:50 ofa_kernel-1.3/Documentation/infiniband/ipoib.txt
-rw-r--r-- vlad/vlad 2236 2008-02-28 09:59:50 ofa_kernel-1.3/Documentation/infiniband/sysfs.txt
-rw-r--r-- vlad/vlad 4939 2008-02-28 09:59:50 ofa_kernel-1.3/Documentation/infiniband/user_mad.txt
-rw-r--r-- vlad/vlad 2981 2008-02-28 09:59:50 ofa_kernel-1.3/Documentation/infiniband/user_verbs.txt
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/
-rw-r--r-- vlad/vlad 1904 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/Kconfig
-rw-r--r-- vlad/vlad 660 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/Makefile
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/
-rw-r--r-- vlad/vlad 790 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/Makefile
-rw-r--r-- vlad/vlad 9613 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/addr.c
-rw-r--r-- vlad/vlad 6214 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/agent.c
-rw-r--r-- vlad/vlad 2160 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/agent.h
-rw-r--r-- vlad/vlad 10355 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/cache.c
-rw-r--r-- vlad/vlad 100297 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/cm.c
-rw-r--r-- vlad/vlad 21553 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/cm_msgs.h
-rw-r--r-- vlad/vlad 71133 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/cma.c
-rw-r--r-- vlad/vlad 1875 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/core_priv.h
-rw-r--r-- vlad/vlad 20057 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/device.c
-rw-r--r-- vlad/vlad 14148 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/fmr_pool.c
-rw-r--r-- vlad/vlad 28919 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/iwcm.c
-rw-r--r-- vlad/vlad 2343 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/iwcm.h
-rw-r--r-- vlad/vlad 85825 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/mad.c
-rw-r--r-- vlad/vlad 6126 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/mad_priv.h
-rw-r--r-- vlad/vlad 27404 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/mad_rmpp.c
-rw-r--r-- vlad/vlad 2175 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/mad_rmpp.h
-rw-r--r-- vlad/vlad 21515 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/multicast.c
-rw-r--r-- vlad/vlad 6506 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/packer.c
-rw-r--r-- vlad/vlad 2326 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/sa.h
-rw-r--r-- vlad/vlad 29093 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/sa_query.c
-rw-r--r-- vlad/vlad 7423 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/smi.c
-rw-r--r-- vlad/vlad 2874 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/smi.h
-rw-r--r-- vlad/vlad 20260 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/sysfs.c
-rw-r--r-- vlad/vlad 33526 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/ucm.c
-rw-r--r-- vlad/vlad 26314 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/ucma.c
-rw-r--r-- vlad/vlad 10141 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/ud_header.c
-rw-r--r-- vlad/vlad 7807 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/umem.c
-rw-r--r-- vlad/vlad 31016 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/user_mad.c
-rw-r--r-- vlad/vlad 6719 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/uverbs.h
-rw-r--r-- vlad/vlad 53964 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/uverbs_cmd.c
-rw-r--r-- vlad/vlad 24216 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/uverbs_main.c
-rw-r--r-- vlad/vlad 5119 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/uverbs_marshall.c
-rw-r--r-- vlad/vlad 19891 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/core/verbs.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/debug/
-rw-r--r-- vlad/vlad 90 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/debug/Makefile
-rw-r--r-- vlad/vlad 21953 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/debug/memtrack.c
-rw-r--r-- vlad/vlad 1734 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/debug/memtrack.h
-rw-r--r-- vlad/vlad 4505 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/debug/mtrack.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/
-rw-r--r-- vlad/vlad 244 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/Kbuild
-rw-r--r-- vlad/vlad 469 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/Kconfig
-rw-r--r-- vlad/vlad 33363 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2.c
-rw-r--r-- vlad/vlad 13966 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2.h
-rw-r--r-- vlad/vlad 9195 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_ae.c
-rw-r--r-- vlad/vlad 3338 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_ae.h
-rw-r--r-- vlad/vlad 4066 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_alloc.c
-rw-r--r-- vlad/vlad 9984 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_cm.c
-rw-r--r-- vlad/vlad 10552 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_cq.c
-rw-r--r-- vlad/vlad 5589 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_intr.c
-rw-r--r-- vlad/vlad 8859 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_mm.c
-rw-r--r-- vlad/vlad 4594 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_mq.c
-rw-r--r-- vlad/vlad 3235 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_mq.h
-rw-r--r-- vlad/vlad 3012 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_pd.c
-rw-r--r-- vlad/vlad 21809 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_provider.c
-rw-r--r-- vlad/vlad 4101 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_provider.h
-rw-r--r-- vlad/vlad 24792 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_qp.c
-rw-r--r-- vlad/vlad 16626 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_rnic.c
-rw-r--r-- vlad/vlad 5103 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_status.h
-rw-r--r-- vlad/vlad 2415 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_user.h
-rw-r--r-- vlad/vlad 7745 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_vq.c
-rw-r--r-- vlad/vlad 2530 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_vq.h
-rw-r--r-- vlad/vlad 35159 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/amso1100/c2_wr.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/
-rw-r--r-- vlad/vlad 864 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/Kconfig
-rw-r--r-- vlad/vlad 335 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/Makefile
-rw-r--r-- vlad/vlad 5162 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/cxio_dbg.c
-rw-r--r-- vlad/vlad 36525 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/cxio_hal.c
-rw-r--r-- vlad/vlad 6913 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/cxio_hal.h
-rw-r--r-- vlad/vlad 8670 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/cxio_resource.c
-rw-r--r-- vlad/vlad 3116 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/cxio_resource.h
-rw-r--r-- vlad/vlad 18781 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/cxio_wr.h
-rw-r--r-- vlad/vlad 5500 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch.c
-rw-r--r-- vlad/vlad 4547 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch.h
-rw-r--r-- vlad/vlad 55171 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_cm.c
-rw-r--r-- vlad/vlad 5739 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_cm.h
-rw-r--r-- vlad/vlad 5576 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_cq.c
-rw-r--r-- vlad/vlad 7100 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_ev.c
-rw-r--r-- vlad/vlad 4709 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_mem.c
-rw-r--r-- vlad/vlad 32654 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_provider.c
-rw-r--r-- vlad/vlad 9450 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_provider.h
-rw-r--r-- vlad/vlad 27217 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_qp.c
-rw-r--r-- vlad/vlad 2128 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_user.h
-rw-r--r-- vlad/vlad 20279 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/tcb.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/Kconfig
-rw-r--r-- vlad/vlad 545 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/Makefile
-rw-r--r-- vlad/vlad 8960 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_av.c
-rw-r--r-- vlad/vlad 10590 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_classes.h
-rw-r--r-- vlad/vlad 10433 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_classes_pSeries.h
-rw-r--r-- vlad/vlad 11830 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_cq.c
-rw-r--r-- vlad/vlad 5122 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_eq.c
-rw-r--r-- vlad/vlad 11393 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_hca.c
-rw-r--r-- vlad/vlad 23029 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_irq.c
-rw-r--r-- vlad/vlad 2515 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_irq.h
-rw-r--r-- vlad/vlad 6746 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_iverbs.h
-rw-r--r-- vlad/vlad 27987 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_main.c
-rw-r--r-- vlad/vlad 4587 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_mcast.c
-rw-r--r-- vlad/vlad 63998 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_mrmw.c
-rw-r--r-- vlad/vlad 3545 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_mrmw.h
-rw-r--r-- vlad/vlad 3656 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_pd.c
-rw-r--r-- vlad/vlad 6177 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_qes.h
-rw-r--r-- vlad/vlad 53087 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_qp.c
-rw-r--r-- vlad/vlad 19274 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_reqs.c
-rw-r--r-- vlad/vlad 3356 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_sqp.c
-rw-r--r-- vlad/vlad 5321 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_tools.h
-rw-r--r-- vlad/vlad 8730 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ehca_uverbs.c
-rw-r--r-- vlad/vlad 27968 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/hcp_if.c
-rw-r--r-- vlad/vlad 9357 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/hcp_if.h
-rw-r--r-- vlad/vlad 2456 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/hcp_phyp.c
-rw-r--r-- vlad/vlad 2927 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/hcp_phyp.h
-rw-r--r-- vlad/vlad 2411 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/hipz_fns.h
-rw-r--r-- vlad/vlad 3412 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/hipz_fns_core.h
-rw-r--r-- vlad/vlad 8966 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/hipz_hw.h
-rw-r--r-- vlad/vlad 7476 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ipz_pt_fn.c
-rw-r--r-- vlad/vlad 8733 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ehca/ipz_pt_fn.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/
-rw-r--r-- vlad/vlad 425 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/Kconfig
-rw-r--r-- vlad/vlad 723 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/Makefile
-rw-r--r-- vlad/vlad 24608 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_common.h
-rw-r--r-- vlad/vlad 12017 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_cq.c
-rw-r--r-- vlad/vlad 4209 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_debug.h
-rw-r--r-- vlad/vlad 15027 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_diag.c
-rw-r--r-- vlad/vlad 4906 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_dma.c
-rw-r--r-- vlad/vlad 67267 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_driver.c
-rw-r--r-- vlad/vlad 23428 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_eeprom.c
-rw-r--r-- vlad/vlad 70677 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_file_ops.c
-rw-r--r-- vlad/vlad 9387 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_fs.c
-rw-r--r-- vlad/vlad 54076 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_iba6110.c
-rw-r--r-- vlad/vlad 49945 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_iba6120.c
-rw-r--r-- vlad/vlad 31654 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_init_chip.c
-rw-r--r-- vlad/vlad 37435 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_intr.c
-rw-r--r-- vlad/vlad 33139 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_kernel.h
-rw-r--r-- vlad/vlad 6501 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_keys.c
-rw-r--r-- vlad/vlad 44054 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_mad.c
-rw-r--r-- vlad/vlad 4926 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_mmap.c
-rw-r--r-- vlad/vlad 10334 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_mr.c
-rw-r--r-- vlad/vlad 26505 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_qp.c
-rw-r--r-- vlad/vlad 52463 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_rc.c
-rw-r--r-- vlad/vlad 19080 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_registers.h
-rw-r--r-- vlad/vlad 17517 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_ruc.c
-rw-r--r-- vlad/vlad 9238 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_srq.c
-rw-r--r-- vlad/vlad 10904 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_stats.c
-rw-r--r-- vlad/vlad 20246 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_sysfs.c
-rw-r--r-- vlad/vlad 13730 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_uc.c
-rw-r--r-- vlad/vlad 15626 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_ud.c
-rw-r--r-- vlad/vlad 6025 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_user_pages.c
-rw-r--r-- vlad/vlad 50825 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_verbs.c
-rw-r--r-- vlad/vlad 25960 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_verbs.h
-rw-r--r-- vlad/vlad 8559 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
-rw-r--r-- vlad/vlad 2170 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_wc_ppc64.c
-rw-r--r-- vlad/vlad 6209 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_wc_x86_64.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/
-rw-r--r-- vlad/vlad 312 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/Kconfig
-rw-r--r-- vlad/vlad 107 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/Makefile
-rw-r--r-- vlad/vlad 3493 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/ah.c
-rw-r--r-- vlad/vlad 13929 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/cq.c
-rw-r--r-- vlad/vlad 5388 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/doorbell.c
-rw-r--r-- vlad/vlad 9730 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/mad.c
-rw-r--r-- vlad/vlad 20985 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/main.c
-rw-r--r-- vlad/vlad 8852 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/mlx4_ib.h
-rw-r--r-- vlad/vlad 6663 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/mr.c
-rw-r--r-- vlad/vlad 46124 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/qp.c
-rw-r--r-- vlad/vlad 8788 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/srq.c
-rw-r--r-- vlad/vlad 2595 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/infiniband/hw/mlx4/user.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/
-rw-r--r-- vlad/vlad 604 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/Kconfig
-rw-r--r-- vlad/vlad 310 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/Makefile
-rw-r--r-- vlad/vlad 7704 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_allocator.c
-rw-r--r-- vlad/vlad 10152 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_av.c
-rw-r--r-- vlad/vlad 5804 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_catas.c
-rw-r--r-- vlad/vlad 58256 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_cmd.c
-rw-r--r-- vlad/vlad 10839 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_cmd.h
-rw-r--r-- vlad/vlad 2099 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_config_reg.h
-rw-r--r-- vlad/vlad 26337 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_cq.c
-rw-r--r-- vlad/vlad 19050 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_dev.h
-rw-r--r-- vlad/vlad 3550 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_doorbell.h
-rw-r--r-- vlad/vlad 26633 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_eq.c
-rw-r--r-- vlad/vlad 9714 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mad.c
-rw-r--r-- vlad/vlad 38467 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_main.c
-rw-r--r-- vlad/vlad 9659 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mcg.c
-rw-r--r-- vlad/vlad 18187 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_memfree.c
-rw-r--r-- vlad/vlad 5773 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_memfree.h
-rw-r--r-- vlad/vlad 24295 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_mr.c
-rw-r--r-- vlad/vlad 2654 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_pd.c
-rw-r--r-- vlad/vlad 9420 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_profile.c
-rw-r--r-- vlad/vlad 2067 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_profile.h
-rw-r--r-- vlad/vlad 36121 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_provider.c
-rw-r--r-- vlad/vlad 9102 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_provider.h
-rw-r--r-- vlad/vlad 63097 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_qp.c
-rw-r--r-- vlad/vlad 7799 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_reset.c
-rw-r--r-- vlad/vlad 18070 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_srq.c
-rw-r--r-- vlad/vlad 2435 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_uar.c
-rw-r--r-- vlad/vlad 2707 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_user.h
-rw-r--r-- vlad/vlad 3417 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/mthca/mthca_wqe.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/
-rw-r--r-- vlad/vlad 488 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/Kconfig
-rw-r--r-- vlad/vlad 115 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/Makefile
-rw-r--r-- vlad/vlad 32042 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes.c
-rw-r--r-- vlad/vlad 18448 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes.h
-rw-r--r-- vlad/vlad 90884 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_cm.c
-rw-r--r-- vlad/vlad 12173 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_cm.h
-rw-r--r-- vlad/vlad 6964 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_context.h
-rw-r--r-- vlad/vlad 105780 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_hw.c
-rw-r--r-- vlad/vlad 35775 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_hw.h
-rw-r--r-- vlad/vlad 58106 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_nic.c
-rw-r--r-- vlad/vlad 3306 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_user.h
-rw-r--r-- vlad/vlad 30864 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_utils.c
-rw-r--r-- vlad/vlad 123589 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_verbs.c
-rw-r--r-- vlad/vlad 5039 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/hw/nes/nes_verbs.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/
-rw-r--r-- vlad/vlad 1858 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/Kconfig
-rw-r--r-- vlad/vlad 290 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/Makefile
-rw-r--r-- vlad/vlad 19322 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib.h
-rw-r--r-- vlad/vlad 37961 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_cm.c
-rw-r--r-- vlad/vlad 6877 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_fs.c
-rw-r--r-- vlad/vlad 21934 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_ib.c
-rw-r--r-- vlad/vlad 32744 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_main.c
-rw-r--r-- vlad/vlad 25265 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
-rw-r--r-- vlad/vlad 7326 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_verbs.c
-rw-r--r-- vlad/vlad 4517 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/
-rw-r--r-- vlad/vlad 500 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/Kconfig
-rw-r--r-- vlad/vlad 125 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/Makefile
-rw-r--r-- vlad/vlad 18305 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/iscsi_iser.c
-rw-r--r-- vlad/vlad 11909 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/iscsi_iser.h
-rw-r--r-- vlad/vlad 20298 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/iser_initiator.c
-rw-r--r-- vlad/vlad 14624 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/iser_memory.c
-rw-r--r-- vlad/vlad 22173 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/iser/iser_verbs.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/
-rw-r--r-- vlad/vlad 1030 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/Kconfig
-rw-r--r-- vlad/vlad 315 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/Makefile
-rw-r--r-- vlad/vlad 12598 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_config.c
-rw-r--r-- vlad/vlad 5484 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_config.h
-rw-r--r-- vlad/vlad 64068 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_control.c
-rw-r--r-- vlad/vlad 6351 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_control.h
-rw-r--r-- vlad/vlad 8431 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_control_pkt.h
-rw-r--r-- vlad/vlad 32535 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_data.c
-rw-r--r-- vlad/vlad 5572 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_data.h
-rw-r--r-- vlad/vlad 17797 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c
-rw-r--r-- vlad/vlad 4651 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h
-rw-r--r-- vlad/vlad 26695 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_main.c
-rw-r--r-- vlad/vlad 4000 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_main.h
-rw-r--r-- vlad/vlad 3526 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c
-rw-r--r-- vlad/vlad 2561 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h
-rw-r--r-- vlad/vlad 7195 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c
-rw-r--r-- vlad/vlad 11063 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h
-rw-r--r-- vlad/vlad 20373 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c
-rw-r--r-- vlad/vlad 2096 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h
-rw-r--r-- vlad/vlad 3107 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h
-rw-r--r-- vlad/vlad 6900 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_util.h
-rw-r--r-- vlad/vlad 29143 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c
-rw-r--r-- vlad/vlad 4876 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/
-rw-r--r-- vlad/vlad 1186 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/Kconfig
-rw-r--r-- vlad/vlad 158 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/Makefile
-rw-r--r-- vlad/vlad 7192 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/sdp.h
-rw-r--r-- vlad/vlad 21016 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/sdp_bcopy.c
-rw-r--r-- vlad/vlad 14446 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/sdp_cma.c
-rw-r--r-- vlad/vlad 59665 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/sdp_main.c
-rw-r--r-- vlad/vlad 278 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/sdp/sdp_socket.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srp/
-rw-r--r-- vlad/vlad 43 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srp/Kbuild
-rw-r--r-- vlad/vlad 355 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srp/Kconfig
-rw-r--r-- vlad/vlad 54715 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srp/ib_srp.c
-rw-r--r-- vlad/vlad 4099 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srp/ib_srp.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/
-rw-r--r-- vlad/vlad 461 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/Kconfig
-rw-r--r-- vlad/vlad 134 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/Makefile
-rw-r--r-- vlad/vlad 2738 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/ib_dm_mad.h
-rw-r--r-- vlad/vlad 63198 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/ib_srpt.c
-rw-r--r-- vlad/vlad 4650 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/ib_srpt.h
-rw-r--r-- vlad/vlad 74834 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/scsi_tgt.h
-rw-r--r-- vlad/vlad 8508 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/ulp/srpt/scst_const.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/util/
-rw-r--r-- vlad/vlad 163 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/util/Kconfig
-rw-r--r-- vlad/vlad 72 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/util/Makefile
-rw-r--r-- vlad/vlad 16068 2008-02-28 09:59:50 ofa_kernel-1.3/drivers/infiniband/util/madeye.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/base/
-rw-r--r-- vlad/vlad 12299 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/base/attribute_container.c
-rw-r--r-- vlad/vlad 9582 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/base/transport_class.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/
-rw-r--r-- vlad/vlad 168 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/Makefile
-rw-r--r-- vlad/vlad 9694 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/adapter.h
-rw-r--r-- vlad/vlad 7172 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/ael1002.c
-rw-r--r-- vlad/vlad 24850 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/common.h
-rw-r--r-- vlad/vlad 4741 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_ctl_defs.h
-rw-r--r-- vlad/vlad 3489 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_defs.h
-rw-r--r-- vlad/vlad 3799 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_ioctl.h
-rw-r--r-- vlad/vlad 67785 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_main.c
-rw-r--r-- vlad/vlad 34174 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_offload.c
-rw-r--r-- vlad/vlad 6018 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_offload.h
-rw-r--r-- vlad/vlad 5887 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/firmware_exports.h
-rw-r--r-- vlad/vlad 12681 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/l2t.c
-rw-r--r-- vlad/vlad 4851 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/l2t.h
-rw-r--r-- vlad/vlad 13874 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/mc5.c
-rw-r--r-- vlad/vlad 57153 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/regs.h
-rw-r--r-- vlad/vlad 82893 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/sge.c
-rw-r--r-- vlad/vlad 7942 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/sge_defs.h
-rw-r--r-- vlad/vlad 34389 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/t3_cpl.h
-rw-r--r-- vlad/vlad 107520 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/t3_hw.c
-rw-r--r-- vlad/vlad 2496 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/t3cdev.h
-rw-r--r-- vlad/vlad 1822 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/version.h
-rw-r--r-- vlad/vlad 6723 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/vsc8211.c
-rw-r--r-- vlad/vlad 19733 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/cxgb3/xgmac.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/
-rw-r--r-- vlad/vlad 162 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/Makefile
-rw-r--r-- vlad/vlad 4873 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/alloc.c
-rw-r--r-- vlad/vlad 4420 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/catas.c
-rw-r--r-- vlad/vlad 11872 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/cmd.c
-rw-r--r-- vlad/vlad 7024 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/cq.c
-rw-r--r-- vlad/vlad 17231 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/eq.c
-rw-r--r-- vlad/vlad 29620 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/fw.c
-rw-r--r-- vlad/vlad 4436 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/fw.h
-rw-r--r-- vlad/vlad 11317 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/icm.c
-rw-r--r-- vlad/vlad 4606 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/icm.h
-rw-r--r-- vlad/vlad 4320 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/intf.c
-rw-r--r-- vlad/vlad 25226 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/main.c
-rw-r--r-- vlad/vlad 9386 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/mcg.c
-rw-r--r-- vlad/vlad 8893 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/mlx4.h
-rw-r--r-- vlad/vlad 15069 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/mr.c
-rw-r--r-- vlad/vlad 3058 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/pd.c
-rw-r--r-- vlad/vlad 7702 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/profile.c
-rw-r--r-- vlad/vlad 8785 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/qp.c
-rw-r--r-- vlad/vlad 4996 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/reset.c
-rw-r--r-- vlad/vlad 7035 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/net/mlx4/srq.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:54 ofa_kernel-1.3/drivers/scsi/
-rw-r--r-- vlad/vlad 63387 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/scsi/iscsi_tcp.c
-rw-r--r-- vlad/vlad 5183 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/scsi/iscsi_tcp.h
-rw-r--r-- vlad/vlad 58063 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/scsi/libiscsi.c
-rw-r--r-- vlad/vlad 43019 2008-02-28 09:59:53 ofa_kernel-1.3/drivers/scsi/scsi_transport_iscsi.c
lrwxrwxrwx vlad/vlad 0 2008-02-28 09:59:56 ofa_kernel-1.3/drivers/scsi/Makefile -> ../../ofed_scripts/iscsi_scsi_makefile
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/
-rw-r--r-- vlad/vlad 26 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/Kbuild
-rw-r--r-- vlad/vlad 4920 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_addr.h
-rw-r--r-- vlad/vlad 4399 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_cache.h
-rw-r--r-- vlad/vlad 18741 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_cm.h
-rw-r--r-- vlad/vlad 3503 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_fmr_pool.h
-rw-r--r-- vlad/vlad 22723 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_mad.h
-rw-r--r-- vlad/vlad 2025 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_marshall.h
-rw-r--r-- vlad/vlad 7794 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_pack.h
-rw-r--r-- vlad/vlad 14310 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_sa.h
-rw-r--r-- vlad/vlad 4519 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_smi.h
-rw-r--r-- vlad/vlad 2664 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_umem.h
-rw-r--r-- vlad/vlad 6564 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_user_cm.h
-rw-r--r-- vlad/vlad 7190 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_user_mad.h
-rw-r--r-- vlad/vlad 1894 2008-02-28
09:59:50 ofa_kernel-1.3/include/rdma/ib_user_sa.h -rw-r--r-- vlad/vlad 13677 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_user_verbs.h -rw-r--r-- vlad/vlad 54235 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/ib_verbs.h -rw-r--r-- vlad/vlad 8777 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/iw_cm.h -rw-r--r-- vlad/vlad 10998 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/rdma_cm.h -rw-r--r-- vlad/vlad 1783 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/rdma_cm_ib.h -rw-r--r-- vlad/vlad 4772 2008-02-28 09:59:50 ofa_kernel-1.3/include/rdma/rdma_user_cm.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/include/scsi/ -rw-r--r-- vlad/vlad 14941 2008-02-28 09:59:50 ofa_kernel-1.3/include/scsi/iscsi_proto.h -rw-r--r-- vlad/vlad 5621 2008-02-28 09:59:50 ofa_kernel-1.3/include/scsi/srp.h -rw-r--r-- vlad/vlad 10388 2008-02-28 09:59:53 ofa_kernel-1.3/include/scsi/iscsi_if.h -rw-r--r-- vlad/vlad 10226 2008-02-28 09:59:53 ofa_kernel-1.3/include/scsi/libiscsi.h -rw-r--r-- vlad/vlad 8718 2008-02-28 09:59:53 ofa_kernel-1.3/include/scsi/scsi_transport_iscsi.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/ -rw-r--r-- vlad/vlad 5294 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/cmd.h -rw-r--r-- vlad/vlad 3436 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/cq.h -rw-r--r-- vlad/vlad 9523 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/device.h -rw-r--r-- vlad/vlad 2894 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/doorbell.h -rw-r--r-- vlad/vlad 2082 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/driver.h -rw-r--r-- vlad/vlad 6662 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/qp.h -rw-r--r-- vlad/vlad 1560 2008-02-28 09:59:53 ofa_kernel-1.3/include/linux/mlx4/srq.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/ 
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm/ -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm/atomic.h -rw-r--r-- vlad/vlad 199 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm/dma-mapping.h -rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm/msr.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/ -rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/bitops.h -rw-r--r-- vlad/vlad 335 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/cache.h -rw-r--r-- vlad/vlad 294 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/compiler.h -rw-r--r-- vlad/vlad 394 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/cpumask.h -rw-r--r-- vlad/vlad 2517 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/debugfs.h -rw-r--r-- vlad/vlad 3801 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/device.h -rw-r--r-- vlad/vlad 376 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 185 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/err.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 201 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/ethtool.h -rw-r--r-- vlad/vlad 648 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/genalloc.h -rw-r--r-- vlad/vlad 123 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/hardirq.h -rw-r--r-- vlad/vlad 961 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/idr.h -rw-r--r-- vlad/vlad 1145 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/if_infiniband.h -rw-r--r-- vlad/vlad 575 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 534 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/interrupt.h -rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/io.h -rw-r--r-- vlad/vlad 267 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/ioctl32.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/ip.h -rw-r--r-- vlad/vlad 312 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/kernel.h -rw-r--r-- vlad/vlad 4171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/kfifo.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/log2.h -rw-r--r-- vlad/vlad 690 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/mii.h -rw-r--r-- vlad/vlad 872 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/mm.h -rw-r--r-- vlad/vlad 159 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/module.h -rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/net.h -rw-r--r-- vlad/vlad 845 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/netdevice.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/notifier.h -rw-r--r-- vlad/vlad 2251 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/pci.h -rw-r--r-- vlad/vlad 509 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/pci_ids.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/random.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/rtnetlink.h 
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/rwsem.h -rw-r--r-- vlad/vlad 863 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 1263 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/sched.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/signal.h -rw-r--r-- vlad/vlad 3089 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1386 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/slab.h -rw-r--r-- vlad/vlad 642 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/spinlock.h -rw-r--r-- vlad/vlad 527 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/sysfs.h -rw-r--r-- vlad/vlad 240 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/tcp.h -rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/timer.h -rw-r--r-- vlad/vlad 366 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/net/ -rw-r--r-- vlad/vlad 298 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/net/dst.h -rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/net/inet_hashtables.h -rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/net/inet_sock.h -rw-r--r-- vlad/vlad 192 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/net/netevent.h -rw-r--r-- vlad/vlad 9187 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/net/sock.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/scsi/ -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/scsi/scsi_cmnd.h -rw-r--r-- vlad/vlad 13 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/scsi/scsi_dbg.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/src/genalloc.c -rw-r--r-- vlad/vlad 5845 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/src/ib_idr.c -rw-r--r-- vlad/vlad 3349 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/src/netevent.c -rw-r--r-- vlad/vlad 10764 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.5_sles9_sp3/include/src/stream.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm/ -rw-r--r-- vlad/vlad 581 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm/atomic.h -rw-r--r-- vlad/vlad 590 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm/bitops.h -rw-r--r-- vlad/vlad 153 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm/io.h -rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm/msr.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/ -rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/bitops.h -rw-r--r-- vlad/vlad 335 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/cache.h -rw-r--r-- vlad/vlad 294 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/compiler.h -rw-r--r-- vlad/vlad 194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/cpumask.h -rw-r--r-- vlad/vlad 2517 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/debugfs.h -rw-r--r-- vlad/vlad 3801 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/device.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/dma-mapping.h -rw-r--r-- 
vlad/vlad 185 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/err.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 201 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/ethtool.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/genalloc.h -rw-r--r-- vlad/vlad 210 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/idr.h -rw-r--r-- vlad/vlad 1145 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/if_infiniband.h -rw-r--r-- vlad/vlad 575 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 534 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/interrupt.h -rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/io.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/ip.h -rw-r--r-- vlad/vlad 312 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/jiffies.h -rw-r--r-- vlad/vlad 312 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/kernel.h -rw-r--r-- vlad/vlad 4171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/kfifo.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/log2.h -rw-r--r-- vlad/vlad 574 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/mii.h -rw-r--r-- vlad/vlad 872 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/mm.h -rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/net.h -rw-r--r-- vlad/vlad 817 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/netdevice.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/notifier.h -rw-r--r-- vlad/vlad 1704 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/pci.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/rwsem.h -rw-r--r-- vlad/vlad 1027 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 759 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/sched.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/signal.h -rw-r--r-- vlad/vlad 2990 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1566 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/slab.h -rw-r--r-- vlad/vlad 349 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/spinlock.h -rw-r--r-- vlad/vlad 240 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/tcp.h -rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/timer.h -rw-r--r-- vlad/vlad 366 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/types.h -rw-r--r-- vlad/vlad 155 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/types.h.orig -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/ -rw-r--r-- vlad/vlad 272 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/dst.h -rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/inet_hashtables.h -rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/inet_sock.h -rw-r--r-- vlad/vlad 905 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/netevent.h -rw-r--r-- vlad/vlad 3494 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/sock.h -rw-r--r-- vlad/vlad 80 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/net/tcp_states.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/scsi/ -rw-r--r-- vlad/vlad 124 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/scsi/scsi.h -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/src/genalloc.c -rw-r--r-- vlad/vlad 3323 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U2/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm/ -rw-r--r-- vlad/vlad 581 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm/atomic.h -rw-r--r-- vlad/vlad 590 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm/bitops.h -rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm/msr.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/ -rw-r--r-- vlad/vlad 2595 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/attribute_container.h -rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/bitops.h -rw-r--r-- vlad/vlad 335 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/cache.h -rw-r--r-- vlad/vlad 294 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/compiler.h -rw-r--r-- vlad/vlad 194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/cpumask.h -rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/crypto.h -rw-r--r-- vlad/vlad 2517 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/debugfs.h -rw-r--r-- vlad/vlad 3801 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/device.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 185 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/err.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 414 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/ethtool.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/genalloc.h -rw-r--r-- vlad/vlad 210 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/idr.h -rw-r--r-- vlad/vlad 215 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/if_ether.h -rw-r--r-- vlad/vlad 1145 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/if_infiniband.h -rw-r--r-- vlad/vlad 412 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/if_vlan.h -rw-r--r-- vlad/vlad 575 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 534 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/interrupt.h -rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/io.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/ip.h -rw-r--r-- vlad/vlad 312 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/jiffies.h -rw-r--r-- vlad/vlad 406 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/kernel.h -rw-r--r-- vlad/vlad 4194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/kfifo.h -rw-r--r-- vlad/vlad 1473 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/klist.h -rw-r--r-- vlad/vlad 546 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/kref.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/log2.h -rw-r--r-- vlad/vlad 872 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/mm.h -rw-r--r-- vlad/vlad 719 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/moduleparam.h -rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 
ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/net.h
-rw-r--r-- vlad/vlad 844 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 476 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/netlink.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/notifier.h
-rw-r--r-- vlad/vlad 1704 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/pci.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/random.h
-rw-r--r-- vlad/vlad 424 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/rtnetlink.h
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/rwsem.h
-rw-r--r-- vlad/vlad 1027 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/signal.h
-rw-r--r-- vlad/vlad 2990 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 1208 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/slab.h
-rw-r--r-- vlad/vlad 400 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/spinlock.h
-rw-r--r-- vlad/vlad 349 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/tcp.h
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/timer.h
-rw-r--r-- vlad/vlad 2537 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/transport_class.h
-rw-r--r-- vlad/vlad 312 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/utsname.h
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/vmalloc.h
-rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/
-rw-r--r-- vlad/vlad 272 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/dst.h
-rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/inet_hashtables.h
-rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/inet_sock.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/neighbour.h
-rw-r--r-- vlad/vlad 784 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/netevent.h
-rw-r--r-- vlad/vlad 3494 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/sock.h
-rw-r--r-- vlad/vlad 80 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/net/tcp_states.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/scsi/
-rw-r--r-- vlad/vlad 124 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/scsi/scsi.h
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/scsi/scsi_cmnd.h
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/scsi/scsi_device.h
-rw-r--r-- vlad/vlad 170 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/scsi/scsi_host.h
-rw-r--r-- vlad/vlad 201 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/scsi/scsi_transport.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/
-rw-r--r-- vlad/vlad 43 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/base.h
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/genalloc.c
-rw-r--r-- vlad/vlad 292 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/init.c
-rw-r--r-- vlad/vlad 3323 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/netevent.c
-rw-r--r-- vlad/vlad 1422 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/scsi.c
-rw-r--r-- vlad/vlad 4579 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/scsi_lib.c
-rw-r--r-- vlad/vlad 1445 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/attic/backport/2.6.9_U3/include/src/scsi_scan.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm-generic/
-rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm-generic/bug.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/
-rw-r--r-- vlad/vlad 179 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/atomic.h
-rw-r--r-- vlad/vlad 590 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/bitops.h
-rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/msr.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/
-rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/bitops.h
-rw-r--r-- vlad/vlad 335 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/cache.h
-rw-r--r-- vlad/vlad 294 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/compiler.h
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/cpumask.h
-rw-r--r-- vlad/vlad 3588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/device.h
-rw-r--r-- vlad/vlad 327 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 185 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/err.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 201 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/ethtool.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 210 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/idr.h
-rw-r--r-- vlad/vlad 575 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/inetdevice.h
-rw-r--r-- vlad/vlad 534 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/io.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/ip.h
-rw-r--r-- vlad/vlad 440 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/kernel.h
-rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/lockdep.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/log2.h
-rw-r--r-- vlad/vlad 482 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/mm.h
-rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/net.h
-rw-r--r-- vlad/vlad 816 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/notifier.h
-rw-r--r-- vlad/vlad 1370 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/pci.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/random.h
-rw-r--r-- vlad/vlad 424 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/rtnetlink.h
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/rwsem.h
-rw-r--r-- vlad/vlad 928 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/signal.h
-rw-r--r-- vlad/vlad 3274 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 1533 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/slab.h
-rw-r--r-- vlad/vlad 167 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/spinlock.h
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/tcp.h
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/timer.h
-rw-r--r-- vlad/vlad 366 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/utsname.h
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/vmalloc.h
-rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/
-rw-r--r-- vlad/vlad 272 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/dst.h
-rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/inet_hashtables.h
-rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/inet_sock.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/neighbour.h
-rw-r--r-- vlad/vlad 784 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/netevent.h
-rw-r--r-- vlad/vlad 3494 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/sock.h
-rw-r--r-- vlad/vlad 80 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/net/tcp_states.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/scsi/
-rw-r--r-- vlad/vlad 124 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/scsi/scsi.h
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/scsi/scsi_cmnd.h
-rw-r--r-- vlad/vlad 216 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/scsi/scsi_host.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/src/
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/src/genalloc.c
-rw-r--r-- vlad/vlad 3323 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11/include/src/netevent.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm-generic/
-rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm-generic/bug.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/
-rw-r--r-- vlad/vlad 179 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/atomic.h
-rw-r--r-- vlad/vlad 590 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/bitops.h
-rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/msr.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/
-rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/bitops.h
-rw-r--r-- vlad/vlad 335 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/cache.h
-rw-r--r-- vlad/vlad 294 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/compiler.h
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/cpumask.h
-rw-r--r-- vlad/vlad 3582 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/device.h
-rw-r--r-- vlad/vlad 327 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 185 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/err.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/fs.h
-rw-r--r-- vlad/vlad 210 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/idr.h
-rw-r--r-- vlad/vlad 575 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/inetdevice.h
-rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/io.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/ip.h
-rw-r--r-- vlad/vlad 440 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/kernel.h
-rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/lockdep.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/log2.h
-rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/net.h
-rw-r--r-- vlad/vlad 558 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/notifier.h
-rw-r--r-- vlad/vlad 1370 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/pci.h
-rw-r--r-- vlad/vlad 424 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/rtnetlink.h
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/rwsem.h
-rw-r--r-- vlad/vlad 928 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/signal.h
-rw-r--r-- vlad/vlad 2791 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 1547 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/slab.h
-rw-r--r-- vlad/vlad 167 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/spinlock.h
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/tcp.h
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/timer.h
-rw-r--r-- vlad/vlad 366 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/utsname.h
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/vmalloc.h
-rw-r--r-- vlad/vlad 1486 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/net/
-rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/net/inet_hashtables.h
-rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/net/inet_sock.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/net/neighbour.h
-rw-r--r-- vlad/vlad 2418 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/net/sock.h
-rw-r--r-- vlad/vlad 80 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/net/tcp_states.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/scsi/
-rw-r--r-- vlad/vlad 124 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/scsi/scsi.h
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/scsi/scsi_cmnd.h
-rw-r--r-- vlad/vlad 216 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.11_FC4/include/scsi/scsi_host.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm-generic/
-rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm-generic/bug.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm-x86_64/
-rw-r--r-- vlad/vlad 634 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm-x86_64/dma-mapping.h
-rw-r--r-- vlad/vlad 377 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm-x86_64/swiotlb.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/
-rw-r--r-- vlad/vlad 179 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/atomic.h
-rw-r--r-- vlad/vlad 192 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/dma-mapping.h
-rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/msr.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/scatterlist.h
-rw-r--r-- vlad/vlad 176 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/asm/swiotlb.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/
-rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/bitops.h
-rw-r--r-- vlad/vlad 335 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/cache.h
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/compiler.h
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/cpumask.h
-rw-r--r-- vlad/vlad 4273 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/device.h
-rw-r--r-- vlad/vlad 327 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 358 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 201 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/ethtool.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 210 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/idr.h
-rw-r--r-- vlad/vlad 215 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/if_ether.h
-rw-r--r-- vlad/vlad 412 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/if_vlan.h
-rw-r--r-- vlad/vlad 575 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/inetdevice.h
-rw-r--r-- vlad/vlad 586 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/io.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/ip.h
-rw-r--r-- vlad/vlad 440 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/kernel.h
-rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/lockdep.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/log2.h
-rw-r--r-- vlad/vlad 475 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/mm.h
-rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/net.h
-rw-r--r-- vlad/vlad 840 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/notifier.h
-rw-r--r-- vlad/vlad 1370 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/pci.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/random.h
-rw-r--r-- vlad/vlad 424 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/rtnetlink.h
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/rwsem.h
-rw-r--r-- vlad/vlad 928 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/signal.h
-rw-r--r-- vlad/vlad 2954 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 1553 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/slab.h
-rw-r--r-- vlad/vlad 167 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/spinlock.h
-rw-r--r-- vlad/vlad 349 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/tcp.h
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/timer.h
-rw-r--r-- vlad/vlad 366 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/utsname.h
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/vmalloc.h
-rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/
-rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/inet_hashtables.h
-rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/inet_sock.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/neighbour.h
-rw-r--r-- vlad/vlad 784 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/netevent.h
-rw-r--r-- vlad/vlad 2418 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/sock.h
-rw-r--r-- vlad/vlad 80 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/net/tcp_states.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/scsi/
-rw-r--r-- vlad/vlad 124 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/scsi/scsi.h
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/scsi/scsi_cmnd.h
-rw-r--r-- vlad/vlad 216 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/scsi/scsi_host.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/src/
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/src/genalloc.c
-rw-r--r-- vlad/vlad 3323 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.12/include/src/netevent.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm-generic/
-rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm-generic/bug.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm/
-rw-r--r-- vlad/vlad 179 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm/atomic.h
-rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm/msr.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/
-rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/bitops.h
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/compiler.h
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/cpumask.h
-rw-r--r-- vlad/vlad 295 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/device.h
-rw-r--r-- vlad/vlad 327 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 201 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/ethtool.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 210 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/idr.h
-rw-r--r-- vlad/vlad 521 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/inetdevice.h
-rw-r--r-- vlad/vlad 586 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/io.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/ip.h
-rw-r--r-- vlad/vlad 440 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/kernel.h
-rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/lockdep.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/log2.h
-rw-r--r-- vlad/vlad 475 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/mm.h
-rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/net.h
-rw-r--r-- vlad/vlad 816 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/notifier.h
-rw-r--r-- vlad/vlad 1370 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/pci.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/random.h
-rw-r--r-- vlad/vlad 424 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/rtnetlink.h
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/rwsem.h
-rw-r--r-- vlad/vlad 928 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/signal.h
-rw-r--r-- vlad/vlad 2961 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 1373 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/slab.h
-rw-r--r-- vlad/vlad 167 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/spinlock.h
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/tcp.h
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/timer.h
-rw-r--r-- vlad/vlad 366 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/utsname.h
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/vmalloc.h
-rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/
-rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/inet_hashtables.h
-rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/inet_sock.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/neighbour.h
-rw-r--r-- vlad/vlad 784 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/netevent.h
-rw-r--r-- vlad/vlad 2418 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/sock.h
-rw-r--r-- vlad/vlad 80 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/net/tcp_states.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/scsi/
-rw-r--r-- vlad/vlad 124 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/scsi/scsi.h
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/scsi/scsi_cmnd.h
-rw-r--r-- vlad/vlad 216 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/scsi/scsi_host.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/src/
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/src/genalloc.c
-rw-r--r-- vlad/vlad 3323 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13/include/src/netevent.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm-generic/
-rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm-generic/bug.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm/
-rw-r--r-- vlad/vlad 179 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm/atomic.h
-rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm/msr.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/
-rw-r--r-- vlad/vlad 409 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/bitops.h
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/compiler.h
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/cpumask.h
-rw-r--r-- vlad/vlad 295 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/device.h
-rw-r--r-- vlad/vlad 327 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/fs.h
-rw-r--r-- vlad/vlad 521 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/inetdevice.h
-rw-r--r-- vlad/vlad 20 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/io.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/ip.h
-rw-r--r-- vlad/vlad 440 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/kernel.h
-rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/lockdep.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/log2.h
-rw-r--r-- vlad/vlad 718 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/net.h
-rw-r--r-- vlad/vlad 558 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/notifier.h
-rw-r--r-- vlad/vlad 1370 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/pci.h
-rw-r--r-- vlad/vlad 424 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 183 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/rtnetlink.h
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/rwsem.h
-rw-r--r-- vlad/vlad 928 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 146 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/signal.h
-rw-r--r-- vlad/vlad 2791 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 1367 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/slab.h
-rw-r--r-- vlad/vlad 167 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/spinlock.h
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/tcp.h
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/timer.h
-rw-r--r-- vlad/vlad 366 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/utsname.h
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/vmalloc.h
-rw-r--r-- vlad/vlad 1486 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/net/
-rw-r--r-- vlad/vlad 405 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/net/inet_hashtables.h
-rw-r--r-- vlad/vlad 79 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/net/inet_sock.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/net/neighbour.h
-rw-r--r-- vlad/vlad 2418 2008-02-28 09:59:50
ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/net/sock.h -rw-r--r-- vlad/vlad 80 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/net/tcp_states.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/scsi/ -rw-r--r-- vlad/vlad 124 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/scsi/scsi.h -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.13_suse10_0_u/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.14/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm/ -rw-r--r-- vlad/vlad 179 2008-02-28 09:59:50 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm/atomic.h -rw-r--r-- vlad/vlad 4315 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm/msr.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/ -rw-r--r-- vlad/vlad 409 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/bitops.h -rw-r--r-- vlad/vlad 166 
2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/compiler.h -rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/cpumask.h -rw-r--r-- vlad/vlad 295 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/device.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/genalloc.h -rw-r--r-- vlad/vlad 521 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/interrupt.h -rw-r--r-- vlad/vlad 20 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/io.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/ip.h -rw-r--r-- vlad/vlad 440 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/log2.h -rw-r--r-- vlad/vlad 475 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/mm.h -rw-r--r-- vlad/vlad 718 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/net.h -rw-r--r-- vlad/vlad 816 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/netdevice.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/notifier.h -rw-r--r-- vlad/vlad 1307 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/rwsem.h -rw-r--r-- vlad/vlad 928 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/signal.h -rw-r--r-- vlad/vlad 2961 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1113 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/spinlock.h -rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/tcp.h -rw-r--r-- vlad/vlad 596 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/timer.h -rw-r--r-- vlad/vlad 328 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/vmalloc.h 
-rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/net/ -rw-r--r-- vlad/vlad 79 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/net/inet_sock.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/net/netevent.h -rw-r--r-- vlad/vlad 150 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/net/sock.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/scsi/ -rw-r--r-- vlad/vlad 124 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/scsi/scsi.h -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/scsi/scsi_cmnd.h -rw-r--r-- vlad/vlad 216 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/scsi/scsi_host.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/src/genalloc.c -rw-r--r-- vlad/vlad 3348 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.14/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/asm/ -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/ -rw-r--r-- vlad/vlad 409 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/bitops.h -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/compiler.h -rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/cpumask.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/device.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/genalloc.h -rw-r--r-- vlad/vlad 521 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/interrupt.h -rw-r--r-- vlad/vlad 20 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/io.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/ip.h -rw-r--r-- 
vlad/vlad 440 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/log2.h -rw-r--r-- vlad/vlad 475 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/mm.h -rw-r--r-- vlad/vlad 718 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/net.h -rw-r--r-- vlad/vlad 809 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/netdevice.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/notifier.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/rwsem.h -rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/signal.h -rw-r--r-- vlad/vlad 2634 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1091 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/spinlock.h -rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/tcp.h -rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/timer.h -rw-r--r-- vlad/vlad 308 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/net/ -rw-r--r-- vlad/vlad 79 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/net/inet_sock.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/net/netevent.h -rw-r--r-- vlad/vlad 150 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/net/sock.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/scsi/ -rw-r--r-- vlad/vlad 124 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/scsi/scsi.h -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/src/genalloc.c -rw-r--r-- vlad/vlad 3348 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm/ -rw-r--r-- vlad/vlad 432 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm/bitops.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/ -rw-r--r-- vlad/vlad 409 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/bitops.h -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/compiler.h -rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/cpumask.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/device.h -rw-r--r-- vlad/vlad 327 
2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/genalloc.h -rw-r--r-- vlad/vlad 521 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/interrupt.h -rw-r--r-- vlad/vlad 20 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/io.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/ip.h -rw-r--r-- vlad/vlad 440 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/log2.h -rw-r--r-- vlad/vlad 475 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/mm.h -rw-r--r-- vlad/vlad 718 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/net.h -rw-r--r-- vlad/vlad 674 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/netdevice.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/notifier.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/rwsem.h -rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/signal.h -rw-r--r-- vlad/vlad 2790 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1091 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/spinlock.h -rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/tcp.h -rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/timer.h -rw-r--r-- vlad/vlad 308 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/net/ -rw-r--r-- vlad/vlad 79 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/net/inet_sock.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/net/netevent.h -rw-r--r-- vlad/vlad 150 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/net/sock.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/scsi/ -rw-r--r-- vlad/vlad 124 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/scsi/scsi.h -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/src/genalloc.c -rw-r--r-- vlad/vlad 3348 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.15_ubuntu606/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm/ -rw-r--r-- vlad/vlad 4322 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm/hvcall.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/ -rw-r--r-- vlad/vlad 223 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/bitops.h -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/compiler.h -rw-r--r-- vlad/vlad 247 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/cpu.h -rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/cpumask.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 205 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/genalloc.h -rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/if_ether.h -rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/if_vlan.h -rw-r--r-- vlad/vlad 521 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/inetdevice.h -rw-r--r-- 
vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/interrupt.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/ip.h -rw-r--r-- vlad/vlad 440 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/log2.h -rw-r--r-- vlad/vlad 186 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/net.h -rw-r--r-- vlad/vlad 698 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/notifier.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/rwsem.h -rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/signal.h -rw-r--r-- vlad/vlad 2627 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1091 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/spinlock.h -rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/tcp.h -rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/timer.h -rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/net/ -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/net/netevent.h -rw-r--r-- vlad/vlad 150 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/net/sock.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/scsi/ -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/src/genalloc.c -rw-r--r-- vlad/vlad 3348 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm/ -rw-r--r-- vlad/vlad 4322 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm/hvcall.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/ -rw-r--r-- vlad/vlad 223 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/bitops.h -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/compiler.h -rw-r--r-- vlad/vlad 247 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/cpu.h -rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/crypto.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 205 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/genalloc.h -rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/if_ether.h -rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/if_vlan.h -rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/interrupt.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/ip.h -rw-r--r-- vlad/vlad 440 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/log2.h -rw-r--r-- vlad/vlad 186 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/net.h -rw-r--r-- vlad/vlad 701 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/netdevice.h -rw-r--r-- vlad/vlad 344 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/netlink.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/notifier.h -rw-r--r-- vlad/vlad 56 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/parser.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/rwsem.h -rw-r--r-- vlad/vlad 736 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/signal.h -rw-r--r-- vlad/vlad 2634 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1091 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/spinlock.h -rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/tcp.h -rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/timer.h -rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/net/ -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/net/netevent.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/scsi/ -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/src/genalloc.c -rw-r--r-- vlad/vlad 3348 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm/ -rw-r--r-- vlad/vlad 4155 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm/hvcall.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/ -rw-r--r-- vlad/vlad 223 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/bitops.h -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/compiler.h -rw-r--r-- vlad/vlad 247 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/cpu.h -rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/crypto.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 205 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/genalloc.h -rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/if_ether.h -rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/if_vlan.h -rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/interrupt.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/ip.h -rw-r--r-- vlad/vlad 440 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/log2.h -rw-r--r-- vlad/vlad 186 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/net.h -rw-r--r-- vlad/vlad 443 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/netdevice.h -rw-r--r-- vlad/vlad 505 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/netdevice.h.orig -rw-r--r-- vlad/vlad 344 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/netlink.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/notifier.h -rw-r--r-- vlad/vlad 56 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/parser.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/rwsem.h -rw-r--r-- vlad/vlad 736 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/signal.h -rw-r--r-- vlad/vlad 2331 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1397 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/spinlock.h -rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/tcp.h -rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/timer.h -rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/net/ -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/net/netevent.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/scsi/ -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/src/genalloc.c -rw-r--r-- vlad/vlad 3348 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp1/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm-generic/ -rw-r--r-- vlad/vlad 899 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm/ -rw-r--r-- vlad/vlad 4155 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm/hvcall.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/ -rw-r--r-- vlad/vlad 223 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/bitops.h -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/compiler.h 
-rw-r--r-- vlad/vlad 247 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/cpu.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 205 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/genalloc.h -rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/if_ether.h -rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/if_vlan.h -rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/interrupt.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/ip.h -rw-r--r-- vlad/vlad 440 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/log2.h -rw-r--r-- vlad/vlad 186 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/net.h -rw-r--r-- vlad/vlad 443 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/netdevice.h -rw-r--r-- vlad/vlad 344 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/netlink.h 
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/notifier.h -rw-r--r-- vlad/vlad 56 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/parser.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/rwsem.h -rw-r--r-- vlad/vlad 636 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/signal.h -rw-r--r-- vlad/vlad 2331 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1397 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/spinlock.h -rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/tcp.h -rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/timer.h -rw-r--r-- vlad/vlad 138 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/types.h -rw-r--r-- 
vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/net/ -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/net/netevent.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/scsi/ -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/src/genalloc.c -rw-r--r-- vlad/vlad 3348 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.16_sles10_sp2/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm-generic/ -rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm/ -rw-r--r-- vlad/vlad 1289 2008-02-28 
09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm/hvcall.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/ -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/compiler.h -rw-r--r-- vlad/vlad 247 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/cpu.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 205 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/genalloc.h -rw-r--r-- vlad/vlad 521 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/interrupt.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/ip.h -rw-r--r-- vlad/vlad 440 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/kernel.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/log2.h -rw-r--r-- vlad/vlad 186 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/net.h -rw-r--r-- vlad/vlad 674 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/netdevice.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/notifier.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/pci.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/pci_regs.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/rbtree.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/rwsem.h -rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/signal.h -rw-r--r-- vlad/vlad 2627 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/skbuff.h -rw-r--r-- vlad/vlad 764 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/slab.h -rw-r--r-- vlad/vlad 167 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/spinlock.h -rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/tcp.h -rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/timer.h -rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/types.h -rw-r--r-- 
vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/net/ -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/net/netevent.h -rw-r--r-- vlad/vlad 150 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/net/sock.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/scsi/ -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/src/ -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/src/genalloc.c -rw-r--r-- vlad/vlad 3367 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.17/include/src/netevent.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/asm/ -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/asm/scatterlist.h drwxr-xr-x vlad/vlad 0 2008-02-28 
09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/ -rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/compiler.h -rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/crypto.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/genalloc.h -rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/if_ether.h -rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/if_vlan.h -rw-r--r-- vlad/vlad 604 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/interrupt.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/ip.h -rw-r--r-- vlad/vlad 258 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/kernel.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/log2.h -rw-r--r-- vlad/vlad 121 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/net.h -rw-r--r-- vlad/vlad 413 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/netdevice.h -rw-r--r-- vlad/vlad 344 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/netlink.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/notifier.h -rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/pci.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/random.h 
-rw-r--r-- vlad/vlad 230 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/rbtree.h -rw-r--r-- vlad/vlad 736 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 1982 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/skbuff.h -rw-r--r-- vlad/vlad 489 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/slab.h -rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/tcp.h -rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/types.h -rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/net/ -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/net/neighbour.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/scsi/ -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/scsi/scsi_cmnd.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/src/ -rw-r--r-- vlad/vlad 5439 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/src/genalloc.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/asm/ -rw-r--r-- vlad/vlad 1289 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/asm/hvcall.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/compiler.h
-rw-r--r-- vlad/vlad 439 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 205 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/if_ether.h
-rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/if_vlan.h
-rw-r--r-- vlad/vlad 550 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/ip.h
-rw-r--r-- vlad/vlad 309 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/kernel.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/log2.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/net.h
-rw-r--r-- vlad/vlad 413 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/notifier.h
-rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/pci.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/pci_regs.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/random.h
-rw-r--r-- vlad/vlad 230 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 2332 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 764 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/slab.h
-rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/tcp.h
-rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/timer.h
-rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/utsname.h
-rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/net/neighbour.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/scsi/
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/src/
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18/include/src/genalloc.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/asm/
-rw-r--r-- vlad/vlad 1289 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/asm/hvcall.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/compiler.h
-rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/crypto.h
-rw-r--r-- vlad/vlad 439 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 173 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/if_ether.h
-rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/if_vlan.h
-rw-r--r-- vlad/vlad 550 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/ip.h
-rw-r--r-- vlad/vlad 258 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/kernel.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/log2.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/net.h
-rw-r--r-- vlad/vlad 413 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 344 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/netlink.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/notifier.h
-rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/pci.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/pci_regs.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/random.h
-rw-r--r-- vlad/vlad 230 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 736 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 2333 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 764 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/slab.h
-rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/tcp.h
-rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/timer.h
-rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/utsname.h
-rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/net/neighbour.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/scsi/
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/src/
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_FC6/include/src/genalloc.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/asm/
-rw-r--r-- vlad/vlad 1289 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/asm/hvcall.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/compiler.h
-rw-r--r-- vlad/vlad 439 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 205 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 498 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/ip.h
-rw-r--r-- vlad/vlad 309 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/kernel.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/log2.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/net.h
-rw-r--r-- vlad/vlad 413 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 339 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/netlink.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/notifier.h
-rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/pci.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/pci_regs.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/random.h
-rw-r--r-- vlad/vlad 230 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 2331 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 764 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/slab.h
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/tcp.h
-rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/timer.h
-rw-r--r-- vlad/vlad 161 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/utsname.h
-rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/net/neighbour.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/scsi/
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/src/
-rw-r--r-- vlad/vlad 5439 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.18_suse10_2/include/src/genalloc.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/asm/
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/compiler.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/ip.h
-rw-r--r-- vlad/vlad 258 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/kernel.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/log2.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/notifier.h
-rw-r--r-- vlad/vlad 624 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/pci.h
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 2149 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 459 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/slab.h
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/tcp.h
-rw-r--r-- vlad/vlad 321 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/timer.h
-rw-r--r-- vlad/vlad 141 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/types.h
-rw-r--r-- vlad/vlad 1748 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/net/neighbour.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/scsi/
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/src/
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.19/include/src/genalloc.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/asm/
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/compiler.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/if_ether.h
-rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/if_vlan.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/ip.h
-rw-r--r-- vlad/vlad 159 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/kernel.h
-rw-r--r-- vlad/vlad 368 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/log2.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/notifier.h
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 2149 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 459 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/slab.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/linux/tcp.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/net/neighbour.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/scsi/
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/src/
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.20/include/src/genalloc.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/asm/
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/asm/prom.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/compiler.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/ip.h
-rw-r--r-- vlad/vlad 159 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/kernel.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/notifier.h
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 2149 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 459 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/slab.h
-rw-r--r-- vlad/vlad 240 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/linux/tcp.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/net/ip.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/scsi/
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.21/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/linux/compiler.h
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 459 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/linux/slab.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/net/ip.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/scsi/
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/linux/
-rw-r--r-- vlad/vlad 166 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/linux/compiler.h
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 459 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/linux/slab.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/net/ip.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/scsi/
-rw-r--r-- vlad/vlad 413 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.22_suse10_3/include/scsi/scsi_cmnd.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.23/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.23/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.23/include/linux/
-rw-r--r-- vlad/vlad 610 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.23/include/linux/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.23/include/net/
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.23/include/net/ip.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm-generic/
-rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm-generic/bug.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm/
-rw-r--r-- vlad/vlad 581 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm/atomic.h
-rw-r--r-- vlad/vlad 590 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm/bitops.h
-rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm/msr.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/asm/scatterlist.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/
-rw-r--r-- vlad/vlad 2595 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/attribute_container.h
-rw-r--r-- vlad/vlad 409 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/bitops.h
-rw-r--r-- vlad/vlad 335 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/cache.h
-rw-r--r-- vlad/vlad 294 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/compiler.h
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/cpumask.h
-rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/crypto.h
-rw-r--r-- vlad/vlad 2517 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/debugfs.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/device.h
-rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 185 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/err.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 201 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/ethtool.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 210 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/idr.h
-rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/if_ether.h
-rw-r--r-- vlad/vlad 1145 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/if_infiniband.h
-rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/if_vlan.h
-rw-r--r-- vlad/vlad 575 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/inetdevice.h
-rw-r--r-- vlad/vlad 586 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 20 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/io.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/ip.h
-rw-r--r-- vlad/vlad 312 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/jiffies.h
-rw-r--r-- vlad/vlad 534 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/kernel.h
-rw-r--r-- vlad/vlad 4194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/kfifo.h
-rw-r--r-- vlad/vlad 1473 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/klist.h
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/kref.h
-rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/lockdep.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/log2.h
-rw-r--r-- vlad/vlad 872 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/mm.h
-rw-r--r-- vlad/vlad 719 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/moduleparam.h
-rw-r--r-- vlad/vlad 718 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/net.h
-rw-r--r-- vlad/vlad 844 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/netdevice.h
-rw-r--r-- vlad/vlad 476 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/netlink.h
-rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/notifier.h
-rw-r--r-- vlad/vlad 1128 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/pci.h
-rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/random.h
-rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/rbtree.h
-rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/rtnetlink.h
-rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/rwsem.h
-rw-r--r-- vlad/vlad 1253 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/scatterlist.h
-rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/signal.h
-rw-r--r-- vlad/vlad 3275 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/skbuff.h
-rw-r--r-- vlad/vlad 1208 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/slab.h
-rw-r--r-- vlad/vlad 294 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/spinlock.h
-rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/tcp.h
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/timer.h
-rw-r--r-- vlad/vlad 2537 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/transport_class.h
-rw-r--r-- vlad/vlad 312 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/types.h
-rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/utsname.h
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/vmalloc.h
-rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/linux/workqueue.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/
-rw-r--r-- vlad/vlad 272 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/dst.h
-rw-r--r-- vlad/vlad 405 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/inet_hashtables.h
-rw-r--r-- vlad/vlad 79 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/inet_sock.h
-rw-r--r-- vlad/vlad 1048 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/ip.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/neighbour.h
-rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/netevent.h
-rw-r--r-- vlad/vlad 3494 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/sock.h
-rw-r--r-- vlad/vlad 80 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/net/tcp_states.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/scsi/
-rw-r--r-- vlad/vlad 124 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/scsi/scsi.h
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/scsi/scsi_cmnd.h
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/scsi/scsi_device.h
-rw-r--r-- vlad/vlad 274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/scsi/scsi_host.h
-rw-r--r-- vlad/vlad 201 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/scsi/scsi_transport.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/
-rw-r--r-- vlad/vlad 43 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/base.h
-rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/genalloc.c
-rw-r--r-- vlad/vlad 292 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/init.c
-rw-r--r-- vlad/vlad 3349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/netevent.c
-rw-r--r-- vlad/vlad 1422 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/scsi.c
-rw-r--r-- vlad/vlad 4579 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/scsi_lib.c
-rw-r--r-- vlad/vlad 1445 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U4/include/src/scsi_scan.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm-generic/
-rw-r--r-- vlad/vlad 1430 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm-generic/bug.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm-powerpc/
-rw-r--r-- vlad/vlad 30 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm-powerpc/system.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/
-rw-r--r-- vlad/vlad 581 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/atomic.h
-rw-r--r-- vlad/vlad 590 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/bitops.h
-rw-r--r-- vlad/vlad 9134 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/hvcall.h
-rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/msr.h
-rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/pgtable-4k.h
-rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/pgtable-64k.h
-rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/prom.h
-rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/scatterlist.h
-rw-r--r-- vlad/vlad 158 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/asm/smp.h
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/
-rw-r--r-- vlad/vlad 2595 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/attribute_container.h
-rw-r--r-- vlad/vlad 409 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/bitops.h
-rw-r--r-- vlad/vlad 335 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/cache.h
-rw-r--r-- vlad/vlad 294 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/compiler.h
-rw-r--r-- vlad/vlad 247 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/cpu.h
-rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/cpumask.h
-rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/crypto.h
-rw-r--r-- vlad/vlad 2517 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/debugfs.h
-rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/device.h
-rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/dma-mapping.h
-rw-r--r-- vlad/vlad 185 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/err.h
-rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/etherdevice.h
-rw-r--r-- vlad/vlad 201 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/ethtool.h
-rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/fs.h
-rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/genalloc.h
-rw-r--r-- vlad/vlad 210 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/idr.h
-rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/if_ether.h
-rw-r--r-- vlad/vlad 1145 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/if_infiniband.h
-rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/if_vlan.h
-rw-r--r-- vlad/vlad 575 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/inetdevice.h
-rw-r--r-- vlad/vlad 641 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/interrupt.h
-rw-r--r-- vlad/vlad 20 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/io.h
-rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/ip.h
-rw-r--r-- vlad/vlad 312 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/jiffies.h
-rw-r--r-- vlad/vlad 534 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/kernel.h
-rw-r--r-- vlad/vlad 4194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/kfifo.h
-rw-r--r-- vlad/vlad 1473 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/klist.h
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/kref.h
-rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/lockdep.h
-rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/log2.h
-rw-r--r-- vlad/vlad 872 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/mm.h
-rw-r--r-- vlad/vlad 719 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/moduleparam.h
-rw-r--r-- vlad/vlad 718 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/mutex.h
-rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/net.h
-rw-r--r-- vlad/vlad 844 2008-02-28 09:59:51
ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/netdevice.h -rw-r--r-- vlad/vlad 476 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/netlink.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/notifier.h -rw-r--r-- vlad/vlad 1128 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/pci.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/rwsem.h -rw-r--r-- vlad/vlad 1253 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/signal.h -rw-r--r-- vlad/vlad 3275 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1208 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/slab.h -rw-r--r-- vlad/vlad 294 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/spinlock.h -rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/tcp.h -rw-r--r-- vlad/vlad 596 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/timer.h -rw-r--r-- vlad/vlad 2537 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/transport_class.h -rw-r--r-- vlad/vlad 312 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/ -rw-r--r-- vlad/vlad 272 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/dst.h -rw-r--r-- vlad/vlad 405 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/inet_hashtables.h -rw-r--r-- vlad/vlad 79 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/inet_sock.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/netevent.h -rw-r--r-- vlad/vlad 3494 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/sock.h -rw-r--r-- vlad/vlad 80 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/net/tcp_states.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/scsi/ -rw-r--r-- vlad/vlad 124 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/scsi/scsi.h -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/scsi/scsi_cmnd.h -rw-r--r-- vlad/vlad 578 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/scsi/scsi_device.h -rw-r--r-- vlad/vlad 170 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/scsi/scsi_host.h -rw-r--r-- vlad/vlad 201 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/scsi/scsi_transport.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/ -rw-r--r-- vlad/vlad 43 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/base.h -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/genalloc.c -rw-r--r-- vlad/vlad 292 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/init.c -rw-r--r-- vlad/vlad 3349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/netevent.c -rw-r--r-- vlad/vlad 1422 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/scsi.c -rw-r--r-- vlad/vlad 4579 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/scsi_lib.c -rw-r--r-- vlad/vlad 1445 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U5/include/src/scsi_scan.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm-generic/ -rw-r--r-- vlad/vlad 899 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm-generic/bug.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm-powerpc/ -rw-r--r-- vlad/vlad 30 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm-powerpc/system.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/ -rw-r--r-- vlad/vlad 581 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/atomic.h -rw-r--r-- vlad/vlad 590 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/bitops.h -rw-r--r-- vlad/vlad 9134 2008-02-28 
09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/hvcall.h -rw-r--r-- vlad/vlad 4244 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/msr.h -rw-r--r-- vlad/vlad 255 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/pgtable-4k.h -rw-r--r-- vlad/vlad 342 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/pgtable-64k.h -rw-r--r-- vlad/vlad 174 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/prom.h -rw-r--r-- vlad/vlad 109 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/scatterlist.h -rw-r--r-- vlad/vlad 158 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/asm/smp.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/ -rw-r--r-- vlad/vlad 2595 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/attribute_container.h -rw-r--r-- vlad/vlad 409 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/bitops.h -rw-r--r-- vlad/vlad 335 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/cache.h -rw-r--r-- vlad/vlad 294 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/compiler.h -rw-r--r-- vlad/vlad 247 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/cpu.h -rw-r--r-- vlad/vlad 194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/cpumask.h -rw-r--r-- vlad/vlad 1274 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/crypto.h -rw-r--r-- vlad/vlad 2517 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/debugfs.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/device.h -rw-r--r-- vlad/vlad 327 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/dma-mapping.h -rw-r--r-- vlad/vlad 185 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/err.h -rw-r--r-- vlad/vlad 329 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/etherdevice.h -rw-r--r-- vlad/vlad 201 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/ethtool.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/fs.h -rw-r--r-- vlad/vlad 1436 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/genalloc.h -rw-r--r-- vlad/vlad 210 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/idr.h -rw-r--r-- vlad/vlad 215 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/if_ether.h -rw-r--r-- vlad/vlad 1145 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/if_infiniband.h -rw-r--r-- vlad/vlad 412 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/if_vlan.h -rw-r--r-- vlad/vlad 575 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/inetdevice.h -rw-r--r-- vlad/vlad 641 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/interrupt.h -rw-r--r-- vlad/vlad 20 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/io.h -rw-r--r-- vlad/vlad 232 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/ip.h -rw-r--r-- vlad/vlad 312 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/jiffies.h -rw-r--r-- vlad/vlad 534 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/kernel.h -rw-r--r-- vlad/vlad 4194 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/kfifo.h -rw-r--r-- vlad/vlad 1473 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/klist.h -rw-r--r-- vlad/vlad 546 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/kref.h -rw-r--r-- vlad/vlad 10211 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/lockdep.h -rw-r--r-- vlad/vlad 4502 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/log2.h -rw-r--r-- vlad/vlad 872 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/mm.h -rw-r--r-- vlad/vlad 719 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/moduleparam.h -rw-r--r-- vlad/vlad 718 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/mutex.h -rw-r--r-- vlad/vlad 433 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/net.h -rw-r--r-- vlad/vlad 844 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/netdevice.h -rw-r--r-- vlad/vlad 476 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/netlink.h -rw-r--r-- vlad/vlad 692 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/notifier.h -rw-r--r-- vlad/vlad 1128 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/pci.h -rw-r--r-- vlad/vlad 180 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/random.h -rw-r--r-- vlad/vlad 424 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/rbtree.h -rw-r--r-- vlad/vlad 183 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/rtnetlink.h -rw-r--r-- vlad/vlad 175 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/rwsem.h -rw-r--r-- vlad/vlad 1253 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/scatterlist.h -rw-r--r-- vlad/vlad 146 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/signal.h -rw-r--r-- vlad/vlad 3275 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/skbuff.h -rw-r--r-- vlad/vlad 1208 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/slab.h -rw-r--r-- vlad/vlad 233 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/spinlock.h -rw-r--r-- vlad/vlad 349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/tcp.h -rw-r--r-- vlad/vlad 596 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/timer.h -rw-r--r-- vlad/vlad 2537 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/transport_class.h -rw-r--r-- vlad/vlad 312 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/types.h -rw-r--r-- vlad/vlad 213 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/utsname.h -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/vmalloc.h -rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/linux/workqueue.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/ -rw-r--r-- vlad/vlad 272 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/dst.h -rw-r--r-- vlad/vlad 405 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/inet_hashtables.h -rw-r--r-- vlad/vlad 79 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/inet_sock.h -rw-r--r-- vlad/vlad 227 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/ip.h -rw-r--r-- vlad/vlad 171 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/neighbour.h -rw-r--r-- vlad/vlad 784 2008-02-28 09:59:51 
ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/netevent.h -rw-r--r-- vlad/vlad 3494 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/sock.h -rw-r--r-- vlad/vlad 80 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/net/tcp_states.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/scsi/ -rw-r--r-- vlad/vlad 124 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/scsi/scsi.h -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/scsi/scsi_cmnd.h -rw-r--r-- vlad/vlad 578 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/scsi/scsi_device.h -rw-r--r-- vlad/vlad 170 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/scsi/scsi_host.h -rw-r--r-- vlad/vlad 201 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/scsi/scsi_transport.h drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/ -rw-r--r-- vlad/vlad 43 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/base.h -rw-r--r-- vlad/vlad 5437 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/genalloc.c -rw-r--r-- vlad/vlad 292 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/init.c -rw-r--r-- vlad/vlad 3349 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/netevent.c -rw-r--r-- vlad/vlad 1422 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/scsi.c -rw-r--r-- vlad/vlad 4579 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/scsi_lib.c -rw-r--r-- vlad/vlad 1445 2008-02-28 09:59:51 ofa_kernel-1.3/kernel_addons/backport/2.6.9_U6/include/src/scsi_scan.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel/ -rw-r--r-- vlad/vlad 5225 2008-02-28 
09:59:53 ofa_kernel-1.3/kernel/kfifo.c drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/ drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ -rw-r--r-- vlad/vlad 2671 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1565 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/2_misc_device_to_2_6_9.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/addr_3926_to_2_6_13.patch -rw-r--r-- vlad/vlad 709 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/addr_6720_to_2_6_9.patch -rw-r--r-- vlad/vlad 732 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/addr_8802_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 471 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/cm_8802_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 5770 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/core_4807_to_2_6_9.patch -rw-r--r-- vlad/vlad 8340 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 715 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/cxgb3_t3_hw_to_2.6.5_sles9_sp3.patch -rw-r--r-- vlad/vlad 4218 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/cxio_hal_to_2.6.14.patch -rw-r--r-- vlad/vlad 2708 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-01-header.patch -rw-r--r-- vlad/vlad 1665 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-02-dont-leak-info-to-userspace.patch -rw-r--r-- vlad/vlad 3535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-03-iowrite32_copy.patch -rw-r--r-- vlad/vlad 4200 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-05-page-hacks-2.6.14.patch -rw-r--r-- vlad/vlad 1437 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-06-page-hacks-2.6.9.patch -rw-r--r-- vlad/vlad 912 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-07-iounmap-2.6.9.patch -rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-08-fs-get_sb-2.6.17.patch -rw-r--r-- vlad/vlad 4543 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-09-sysfs-show-2.6.12.patch -rw-r--r-- vlad/vlad 558 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-10-rlimit-2.6.9.patch -rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-13-class-2.6.9.patch -rw-r--r-- vlad/vlad 3115 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-15-kref-2.6.5.patch -rw-r--r-- vlad/vlad 12734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-16-htirq-2.6.18.patch -rw-r--r-- vlad/vlad 1096 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath-17-ipath_intr-2.6.18.patch -rw-r--r-- vlad/vlad 1221 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipath_rev_for_2_6_22.patch -rw-r--r-- vlad/vlad 2984 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipoib_8111_to_2_6_16.patch -rw-r--r-- vlad/vlad 1983 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipoib_8802_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 6183 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipoib_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1939 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ipoib_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 1412 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/iwch_cm_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 1505 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/linux_stuff_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 466 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/mlx4_makefile_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 469 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/mthca_catas_reset_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/mthca_dev_3465_to_2_6_11.patch -rw-r--r-- vlad/vlad 908 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/rds_to_2_6_9.patch -rw-r--r-- vlad/vlad 1056 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/sa_query_8802_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 2239 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/sdp_bcopy_8802_to_2_6_5-7.244.patch -rw-r--r-- vlad/vlad 465 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/sdp_cma_8111_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 8459 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/sdp_main_to_2_6_5-7.244.patch -rw-r--r-- vlad/vlad 2632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/srp_7312_to_2_6_11.patch -rw-r--r-- vlad/vlad 319 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/srp_Makefile_8802_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/srp_cmd_to_2_6_22.patch -rw-r--r-- vlad/vlad 948 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/srp_scsi_scan_target_7242_to_2_6_11.patch -rw-r--r-- vlad/vlad 1443 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/t3_hw_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 473 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/top_8109_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 628 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/ucm_5245_to_2_6_9.patch -rw-r--r-- vlad/vlad 2378 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/user_mad_4603_to_2_6_9.patch -rw-r--r-- vlad/vlad 1633 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/user_mad_8802_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 3910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/uverbs_8802_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 1706 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/uverbs_main_3935_to_2_6_9.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.5_sles9_sp3/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ -rw-r--r-- vlad/vlad 2671 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1565 
2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/2_misc_device_to_2_6_9.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/addr_3926_to_2_6_13.patch -rw-r--r-- vlad/vlad 686 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/addr_4670_to_2_6_9.patch -rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/core_1sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 5770 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/core_4807_to_2_6_9.patch -rw-r--r-- vlad/vlad 8340 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 4218 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/cxio_hal_to_2.6.14.patch -rw-r--r-- vlad/vlad 2708 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-01-header.patch -rw-r--r-- vlad/vlad 1665 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-02-dont-leak-info-to-userspace.patch -rw-r--r-- vlad/vlad 3535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-03-iowrite32_copy.patch -rw-r--r-- vlad/vlad 4200 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-05-page-hacks-2.6.14.patch -rw-r--r-- vlad/vlad 1437 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-06-page-hacks-2.6.9.patch -rw-r--r-- vlad/vlad 912 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-07-iounmap-2.6.9.patch -rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-08-fs-get_sb-2.6.17.patch -rw-r--r-- vlad/vlad 4543 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-09-sysfs-show-2.6.12.patch -rw-r--r-- vlad/vlad 558 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-10-rlimit-2.6.9.patch -rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-13-class-2.6.9.patch -rw-r--r-- vlad/vlad 12734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-16-htirq-2.6.18.patch -rw-r--r-- vlad/vlad 1096 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath-17-ipath_intr-2.6.18.patch -rw-r--r-- vlad/vlad 1221 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipath_rev_for_2_6_22.patch -rw-r--r-- vlad/vlad 13882 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 6508 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1935 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 2825 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ipoib_to_2_6_16.patch -rw-r--r-- vlad/vlad 401 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/iwch_cm_to_2_6_9_U2.patch -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/iwch_provider_to_2.6.9_U4.patch -rw-r--r-- vlad/vlad 1015 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 336 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/makefile_to_2_6_9.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/mthca_dev_3465_to_2_6_11.patch
-rw-r--r-- vlad/vlad 908 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/rds_to_2_6_9.patch
-rw-r--r-- vlad/vlad 2103 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/sdp_7277_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4557 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/srp_7312_to_2_6_11.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 948 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/srp_scsi_scan_target_7242_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1443 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/t3_hw_to_2_6_5-7_244.patch
-rw-r--r-- vlad/vlad 628 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/ucm_5245_to_2_6_9.patch
-rw-r--r-- vlad/vlad 2378 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/user_mad_4603_to_2_6_9.patch
-rw-r--r-- vlad/vlad 1706 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/uverbs_main_3935_to_2_6_9.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U2/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/
-rw-r--r-- vlad/vlad 2671 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1565 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/2_misc_device_to_2_6_9.patch
-rw-r--r-- vlad/vlad 3195 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/add_iscsi_session_wq.patch
-rw-r--r-- vlad/vlad 6569 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/add_open_iscsi.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/addr_3926_to_2_6_13.patch
-rw-r--r-- vlad/vlad 686 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/addr_4670_to_2_6_9.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/core_1sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 5770 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/core_4807_to_2_6_9.patch
-rw-r--r-- vlad/vlad 8340 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 4218 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/cxio_hal_to_2.6.14.patch
-rw-r--r-- vlad/vlad 511 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/fix_inclusion_order_iscsi_iser.patch
-rw-r--r-- vlad/vlad 2708 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-01-header.patch
-rw-r--r-- vlad/vlad 1665 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-02-dont-leak-info-to-userspace.patch
-rw-r--r-- vlad/vlad 3535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-03-iowrite32_copy.patch
-rw-r--r-- vlad/vlad 4200 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-05-page-hacks-2.6.14.patch
-rw-r--r-- vlad/vlad 1437 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-06-page-hacks-2.6.9.patch
-rw-r--r-- vlad/vlad 912 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-07-iounmap-2.6.9.patch
-rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-08-fs-get_sb-2.6.17.patch
-rw-r--r-- vlad/vlad 4543 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-09-sysfs-show-2.6.12.patch
-rw-r--r-- vlad/vlad 558 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-10-rlimit-2.6.9.patch
-rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-13-class-2.6.9.patch
-rw-r--r-- vlad/vlad 12734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-16-htirq-2.6.18.patch
-rw-r--r-- vlad/vlad 1096 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath-17-ipath_intr-2.6.18.patch
-rw-r--r-- vlad/vlad 1221 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipath_rev_for_2_6_22.patch
-rw-r--r-- vlad/vlad 13882 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 6508 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1935 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 2825 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 2390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/iscsi_scsi_addons.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/iser_handle_non_sg_data.patch
-rw-r--r-- vlad/vlad 401 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/iwch_cm_to_2_6_9_U3.patch
-rw-r--r-- vlad/vlad 588 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/iwch_provider_to_2.6.9_U4.patch
-rw-r--r-- vlad/vlad 1015 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/linux_stuff_to_2_6_17.patch
-rw-r--r-- vlad/vlad 336 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/makefile_to_2_6_9.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/mthca_dev_3465_to_2_6_11.patch
-rw-r--r-- vlad/vlad 9567 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/qlgc_vnic_sysfs_nested_class_dev.patch
-rw-r--r-- vlad/vlad 908 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/rds_to_2_6_9.patch
-rw-r--r-- vlad/vlad 1705 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/release_host_lock_before_eh.patch
-rw-r--r-- vlad/vlad 2103 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/sdp_7277_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4557 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/srp_7312_to_2_6_11.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 948 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/srp_scsi_scan_target_7242_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1443 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/t3_hw_to_2_6_5-7_244.patch
-rw-r--r-- vlad/vlad 628 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/ucm_5245_to_2_6_9.patch
-rw-r--r-- vlad/vlad 2378 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/user_mad_4603_to_2_6_9.patch
-rw-r--r-- vlad/vlad 1706 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/uverbs_main_3935_to_2_6_9.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/backport/2.6.9_U3/uverbs_to_2_6_17.patch
-rw-r--r-- vlad/vlad 1367 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/ipoib_crash_wa.patch
-rw-r--r-- vlad/vlad 5576 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/ipoib_napi_optional.patch
-rw-r--r-- vlad/vlad 779 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0010_ipoib_high_dma.patch
-rw-r--r-- vlad/vlad 10526 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0017_ipoib_sg.patch
-rw-r--r-- vlad/vlad 6726 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0019_hw_csum.patch
-rw-r--r-- vlad/vlad 4735 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0050_ipoib_checksum_offload.patch
-rw-r--r-- vlad/vlad 694 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0060_ipoib_qp_init_attr.patch
-rw-r--r-- vlad/vlad 8374 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0110_ipoib_lso.patch
-rw-r--r-- vlad/vlad 4438 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0120_ipoib_ethtool.patch
-rw-r--r-- vlad/vlad 17108 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0130_ipoib_lro.patch
-rw-r--r-- vlad/vlad 3223 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0160_ipoib_modify_cq.patch
-rw-r--r-- vlad/vlad 1268 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0220_control_lro.patch
-rw-r--r-- vlad/vlad 1465 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/attic/t_0240_cq_coal_ipoib_cm.path
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/addr_3926_to_2_6_13.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ipoib_0400_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1409 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/iw_nes_200_to_2_6_13.patch
-rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/linux_stuff_to_2_6_17.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/mthca_dev_3465_to_2_6_11.patch
-rw-r--r-- vlad/vlad 428 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/rds_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 2103 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/sdp_7277_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/srp_7312_to_2_6_11.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/srp_cmd_to_2_6_22.patch
-rwxr-xr-x vlad/vlad 976 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/srp_scsi_scan_target_7242_to_2_6_11.patch
-rw-r--r-- vlad/vlad 506 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/t3_hw_to_2_6_13.patch
-rw-r--r-- vlad/vlad 628 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ucm_5245_to_2_6_9.patch
-rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ucm_to_2_6_16.patch
-rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/ucma_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/user_mad_3935_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/user_mad_to_2_6_16.patch
-rw-r--r-- vlad/vlad 839 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/uverbs_main_3935_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/uverbs_to_2_6_16.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/addr_3926_to_2_6_13.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ipoib_0400_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 743 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/mthca_dev_3465_to_2_6_11.patch
-rw-r--r-- vlad/vlad 649 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/mthca_provider_3465_to_2_6_11.patch
-rw-r--r-- vlad/vlad 428 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/rds_to_2_6_11.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1972 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/sdp_7277_to_2_6_13.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ucm_to_2_6_16.patch
-rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/ucma_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/user_mad_3935_to_2_6_11_FC4.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/user_mad_to_2_6_16.patch
-rw-r--r-- vlad/vlad 839 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/uverbs_main_3935_to_2_6_11_FC4.patch
-rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/uverbs_to_2_6_16.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.11_FC4/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/addr_3926_to_2_6_13.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ipoib_0400_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1409 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/iw_nes_200_to_2_6_13.patch
-rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/linux_stuff_to_2_6_17.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1972 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/sdp_7277_to_2_6_13.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 506 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/t3_hw_to_2_6_13.patch
-rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ucm_to_2_6_16.patch
-rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/ucma_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/user_mad_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/uverbs_to_2_6_16.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.12/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/addr_3926_to_2_6_13.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ipoib_0400_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1409 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/iw_nes_200_to_2_6_13.patch
-rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/linux_stuff_to_2_6_17.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1972 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/sdp_7277_to_2_6_13.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 506 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/t3_hw_to_2_6_13.patch
-rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ucm_to_2_6_16.patch
-rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/ucma_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/user_mad_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/uverbs_to_2_6_16.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/addr_3926_to_2_6_13.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ipoib_0400_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1972 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/sdp_7277_to_2_6_13.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ucm_to_2_6_16.patch
-rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/ucma_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/user_mad_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/uverbs_to_2_6_16.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.13_suse10_0_u/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ipoib_0400_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/linux_stuff_to_2_6_17.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ucm_to_2_6_16.patch
-rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/ucma_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/user_mad_to_2_6_16.patch
-rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/uverbs_to_2_6_16.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.14/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ipoib_0400_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/linux_stuff_to_2_6_17.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.15/srp_cmd_to_2_6_22.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/2_misc_device_to_2_6_19.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/core_sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/cxgb3_0100_napi.patch -rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/cxgb3_0200_sset.patch -rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/cxgb3_0300_sysfs.patch -rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/cxgb3_main_to_2_6_22.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/cxgb3_remove_eeh.patch -rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ipath-04-aio_write.patch -rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ipoib_0110_restore_get_stats.patch -rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ipoib_0400_skb_to_2_6_20.patch -rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ipoib_to_2_6_16.patch -rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/iw_cxgb3_0100_namespace.patch -rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/srp_cmd_to_2_6_22.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.15_ubuntu606/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/2_misc_device_to_2_6_19.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/core_sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_0100_napi.patch -rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_0200_sset.patch -rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_0300_sysfs.patch -rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_main_to_2_6_22.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ehca_01_ibmebus_loc_code.patch -rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipath-04-aio_write.patch -rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0110_restore_get_stats.patch -rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0400_skb_to_2_6_20.patch -rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_to_2_6_16.patch -rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/iw_cxgb3_0100_namespace.patch -rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.16/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/srp_cmd_to_2_6_22.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/2_misc_device_to_2_6_19.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/addr_1_netevents_revert_to_2_6_17.patch 
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/core_sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/cxgb3_0100_napi.patch -rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/cxgb3_0200_sset.patch -rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/cxgb3_0300_sysfs.patch -rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/cxgb3_main_to_2_6_22.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ehca_01_ibmebus_loc_code.patch -rw-r--r-- vlad/vlad 1693 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipath-02-dont-leak-info-to-userspace.patch -rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipath-04-aio_write.patch -rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipath-08-fs-get_sb-2.6.17.patch -rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipath-16-htirq-2.6.18.patch -rw-r--r-- vlad/vlad 1354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipath-21-warnings.patch -rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipoib_0110_restore_get_stats.patch -rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipoib_0400_skb_to_2_6_20.patch -rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ipoib_to_2_6_16.patch -rw-r--r-- vlad/vlad 49962 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/iscsi_01_sync_kernel_code_with_release_2.0-865.15.patch -rw-r--r-- vlad/vlad 593 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/iscsi_02_865_to_2_6_9-19.patch -rw-r--r-- vlad/vlad 768 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/iser_sync_with_open_iscsi_2.0-865.13.patch -rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/iw_cxgb3_0100_namespace.patch -rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/srp_cmd_to_2_6_22.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/2_misc_device_to_2_6_19.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/core_sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/cxgb3_0100_napi.patch -rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/cxgb3_0200_sset.patch -rw-r--r-- 
vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/cxgb3_0300_sysfs.patch -rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/cxgb3_main_to_2_6_22.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ehca_01_ibmebus_loc_code.patch -rw-r--r-- vlad/vlad 1693 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipath-02-dont-leak-info-to-userspace.patch -rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipath-04-aio_write.patch -rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipath-08-fs-get_sb-2.6.17.patch -rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipath-16-htirq-2.6.18.patch -rw-r--r-- vlad/vlad 1354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipath-21-warnings.patch -rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipoib_0110_restore_get_stats.patch -rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipoib_0400_skb_to_2_6_20.patch -rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ipoib_to_2_6_16.patch 
-rw-r--r-- vlad/vlad 49962 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/iscsi_01_sync_kernel_code_with_release_2.0-865.15.patch -rw-r--r-- vlad/vlad 593 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/iscsi_02_865_to_2_6_9-19.patch -rw-r--r-- vlad/vlad 768 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/iser_sync_with_open_iscsi_2.0-865.13.patch -rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/iw_cxgb3_0100_namespace.patch -rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/srp_cmd_to_2_6_22.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1480 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp1/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/2_misc_device_to_2_6_19.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/core_sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/cxgb3_0100_napi.patch -rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/cxgb3_0200_sset.patch -rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/cxgb3_0300_sysfs.patch -rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/cxgb3_main_to_2_6_22.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- 
vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ehca_01_ibmebus_loc_code.patch -rw-r--r-- vlad/vlad 1693 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipath-02-dont-leak-info-to-userspace.patch -rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipath-04-aio_write.patch -rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipath-08-fs-get_sb-2.6.17.patch -rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipath-16-htirq-2.6.18.patch -rw-r--r-- vlad/vlad 1354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipath-21-warnings.patch -rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipoib_0110_restore_get_stats.patch -rw-r--r-- vlad/vlad 6508 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1935 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipoib_0400_skb_to_2_6_20.patch -rw-r--r-- vlad/vlad 2825 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/ipoib_to_2_6_16.patch -rw-r--r-- vlad/vlad 49962 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/iscsi_01_sync_kernel_code_with_release_2.0-865.15.patch -rw-r--r-- vlad/vlad 593 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/iscsi_02_865_to_2_6_9-19.patch -rw-r--r-- vlad/vlad 768 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/iser_sync_with_open_iscsi_2.0-865.13.patch -rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/iw_cxgb3_0100_namespace.patch -rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/srp_cmd_to_2_6_22.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.16_sles10_sp2/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.17/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 2708 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipath-01-header.patch
-rw-r--r-- vlad/vlad 1693 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipath-02-dont-leak-info-to-userspace.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipath-08-fs-get_sb-2.6.17.patch
-rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipath-16-htirq-2.6.18.patch
-rw-r--r-- vlad/vlad 1354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipath-21-warnings.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipoib_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipoib_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/ipoib_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 782 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/linux_stuff_to_2_6_17.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/srp_cmd_to_2_6_22.patch
-rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.17/uverbs_to_2_6_17.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipath-16-htirq-2.6.18.patch
-rw-r--r-- vlad/vlad 4785 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipath-20-vmalloc_user-2.6.18.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipoib_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipoib_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/ipoib_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 49962 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/iscsi_01_sync_kernel_code_with_release_2.0-865.15.patch
-rw-r--r-- vlad/vlad 593 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/iscsi_02_865_to_2_6_9-19.patch
-rw-r--r-- vlad/vlad 768 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/iser_sync_with_open_iscsi_2.0-865.13.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/linux_genalloc_to_2_6_20.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18-EL5.1/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipath-16-htirq-2.6.18.patch
-rw-r--r-- vlad/vlad 4785 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipath-20-vmalloc_user-2.6.18.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipoib_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipoib_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/ipoib_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/linux_genalloc_to_2_6_20.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipath-16-htirq-2.6.18.patch
-rw-r--r-- vlad/vlad 4785 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipath-20-vmalloc_user-2.6.18.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipoib_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipoib_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/ipoib_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 49962 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/iscsi_01_sync_kernel_code_with_release_2.0-865.15.patch
-rw-r--r-- vlad/vlad 593 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/iscsi_02_865_to_2_6_9-19.patch
-rw-r--r-- vlad/vlad 768 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/iser_sync_with_open_iscsi_2.0-865.13.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/iw_cxgb3_0200_states.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/linux_genalloc_to_2_6_20.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_FC6/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipath-16-htirq-2.6.18.patch
-rw-r--r-- vlad/vlad 4785 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipath-20-vmalloc_user-2.6.18.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipoib_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipoib_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/ipoib_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/linux_genalloc_to_2_6_20.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.18_suse10_2/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1820 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/2_misc_device_to_2_6_19.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/ipoib_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/ipoib_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/ipoib_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/linux_genalloc_to_2_6_20.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.19/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/ipoib_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/ipoib_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 512 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/ipoib_skb_to_2_6_20.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 546 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/linux_genalloc_to_2_6_20.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/rds_to_2_6_20.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.20/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 523 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 524 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/ipoib_csum_offload_to_2.6.21.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1019 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/sdp_ia64.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.21/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 523 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 5367 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/ipoib_to_2.6.23.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1019 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/sdp_ia64.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 822 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 523 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/cxgb3_main_to_2_6_22.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 5367 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/ipoib_to_2.6.23.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/mthca_0001_pcix_to_2_6_22.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1019 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/sdp_ia64.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/srp_0200_revert_srp_transport_to_2.6.23.patch
-rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.22_suse10_3/srp_cmd_to_2_6_22.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/core_sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 523 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/ehca_01_ibmebus_loc_code.patch
-rw-r--r-- vlad/vlad 5367 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/ipoib_to_2.6.23.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/iw_nes_100_to_2_6_23.patch
-rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/mlx4_0050_wc.patch
-rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/sdp_0100_revert_to_2_6_23.patch
-rw-r--r-- vlad/vlad 1019 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/sdp_ia64.patch
-rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/srp_0100_revert_role_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.23/srp_0200_revert_srp_transport_to_2.6.23.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/
-rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/1_struct_path_revert_to_2_6_19.patch
-rw-r--r-- vlad/vlad 1565 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/2_misc_device_to_2_6_9.patch
-rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/addr_1_netevents_revert_to_2_6_17.patch
-rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/addr_3926_to_2_6_13.patch
-rw-r--r-- vlad/vlad 686 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/addr_4670_to_2_6_9.patch
-rw-r--r-- vlad/vlad 344 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/amso1100_makefile_to_2_6_9.patch
-rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/core_1sysfs_to_2_6_23.patch
-rw-r--r-- vlad/vlad 4827 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/core_4807_to_2_6_9U4.patch
-rwxr-xr-x vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/core_ib_verbs_to_2_6_9.patch
-rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/cxg3_to_2_6_20.patch
-rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/cxgb3_0100_napi.patch
-rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/cxgb3_0200_sset.patch
-rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/cxgb3_0300_sysfs.patch
-rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/cxgb3_makefile_to_2_6_19.patch
-rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/cxgb3_remove_eeh.patch
-rw-r--r-- vlad/vlad 4218 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/cxio_hal_to_2.6.14.patch
-rw-r--r-- vlad/vlad 2708 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-01-header.patch
-rw-r--r-- vlad/vlad 1693 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-02-dont-leak-info-to-userspace.patch
-rw-r--r-- vlad/vlad 3535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-03-iowrite32_copy.patch
-rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-04-aio_write.patch
-rw-r--r-- vlad/vlad 4258 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-05-page-hacks-2.6.14.patch
-rw-r--r-- vlad/vlad 1420 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-06-page-hacks-2.6.9.patch
-rw-r--r-- vlad/vlad 912 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-07-iounmap-2.6.9.patch
-rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-08-fs-get_sb-2.6.17.patch
-rw-r--r-- vlad/vlad 4543 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-09-sysfs-show-2.6.12.patch
-rw-r--r-- vlad/vlad 558 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-10-rlimit-2.6.9.patch
-rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-13-class-2.6.9.patch
-rw-r--r-- vlad/vlad 574 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-14-class-2.6.9_U4.patch
-rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-16-htirq-2.6.18.patch
-rw-r--r-- vlad/vlad 5818 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-19-remove-struct-device_attribute-attr-args.patch
-rw-r--r-- vlad/vlad 1354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipath-21-warnings.patch
-rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipoib_0100_to_2.6.21.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipoib_0110_restore_get_stats.patch
-rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipoib_0200_class_device_to_2_6_20.patch
-rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipoib_0300_class_device_to_2_6_20_umcast.patch
-rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ipoib_to_2_6_16.patch
-rw-r--r-- vlad/vlad 104942 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iscsi_01_sync_kernel_code_with_ofed_1_2_5.patch
-rw-r--r-- vlad/vlad 6054 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iscsi_02_add_to_2_6_9.patch
-rw-r--r-- vlad/vlad 2440 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iscsi_03_add_session_wq.patch
-rw-r--r-- vlad/vlad 444 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iscsi_04_inet_sock_to_opt.patch
-rw-r--r-- vlad/vlad 1622 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iscsi_05_release_host_lock_before_eh.patch
-rw-r--r-- vlad/vlad 2362 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iscsi_06_scsi_addons.patch
-rw-r--r-- vlad/vlad 1702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_01_revert_da9c0c770e775e655e3f77c96d91ee557b117adb.patch
-rw-r--r-- vlad/vlad 585 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_02_revert_d8196ed2181b4595eaf464a5bcbddb6c28649a39.patch
-rw-r--r-- vlad/vlad 3202 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_03_revert_1548271ece9e9312fd5feb41fd58773b56a71d39.patch
-rw-r--r-- vlad/vlad 1297 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_04_revert_77a23c21aaa723f6b0ffc4a701be8c8e5a32346d.patch
-rw-r--r-- vlad/vlad 683 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_05_revert_b2c6416736b847b91950bd43cc5153e11a1f83ee.patch
-rw-r--r-- vlad/vlad 670 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_06_revert_857ae0bdb72999936a28ce621e38e2e288c485da.patch
-rw-r--r-- vlad/vlad 637 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_07_revert_8ad5781ae9702a8f95cfdf30967752e4297613ee.patch
-rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_08_revert_0801c242a33426fddc005c2f559a3d2fa6fca7eb.patch
-rw-r--r-- vlad/vlad 511 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iser_09_fix_inclusion_order.patch
-rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iw_cxgb3_0100_namespace.patch
-rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 1409 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iw_nes_200_to_2_6_13.patch -rw-r--r-- vlad/vlad 2001 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iw_nes_300_to_2_6_9.patch -rw-r--r-- vlad/vlad 401 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iwch_cm_to_2_6_9_U4.patch -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/iwch_provider_to_2.6.9_U4.patch -rw-r--r-- vlad/vlad 1015 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 336 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/makefile_to_2_6_9.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 812 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/mlx4_compiler_warning.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/mthca_dev_3465_to_2_6_11.patch -rw-r--r-- vlad/vlad 9567 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/qlgc_vnic_sysfs_nested_class_dev.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 881 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/rds_to_2_6_9.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 2103 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/sdp_7277_to_2_6_11.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 783 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/srp_0300_include_linux_scatterlist_h.patch -rw-r--r-- vlad/vlad 2632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/srp_7312_to_2_6_11.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/srp_cmd_to_2_6_22.patch -rwxr-xr-x vlad/vlad 976 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/srp_scsi_scan_target_7242_to_2_6_11.patch -rw-r--r-- vlad/vlad 1443 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/t3_hw_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 628 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ucm_5245_to_2_6_9.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 3036 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/user_mad_4603_to_2_6_9U4.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1875 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/uverbs_main_3935_to_2_6_9U4.patch -rw-r--r-- vlad/vlad 1525 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U4/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1565 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/2_misc_device_to_2_6_9.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/addr_3926_to_2_6_13.patch -rw-r--r-- vlad/vlad 686 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/addr_4670_to_2_6_9.patch -rw-r--r-- vlad/vlad 344 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/amso1100_makefile_to_2_6_9.patch -rw-r--r-- vlad/vlad 3624 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/backport_ehca_1_2.6.9.patch -rw-r--r-- vlad/vlad 25896 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/backport_ehca_2_rhel45_umap.patch -rw-r--r-- vlad/vlad 7924 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/backport_ehca_3_rhel45_dma.patch -rw-r--r-- vlad/vlad 942 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/backport_ehca_4_rhel45_dma_fix.patch -rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/core_1sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 4827 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/core_4807_to_2_6_9U4.patch -rwxr-xr-x vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/core_ib_verbs_to_2_6_9.patch -rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxgb3_0100_napi.patch -rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxgb3_0200_sset.patch -rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxgb3_0300_sysfs.patch -rw-r--r-- vlad/vlad 511 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxgb3_0500_is_valid_ether_addr.patch -rw-r--r-- vlad/vlad 1493 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxgb3_0600_simple_strtoul.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxgb3_remove_eeh.patch -rw-r--r-- vlad/vlad 4218 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/cxio_hal_to_2.6.14.patch -rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ehca_01_ibmebus_loc_code.patch -rw-r--r-- vlad/vlad 2708 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-01-header.patch -rw-r--r-- vlad/vlad 1693 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-02-dont-leak-info-to-userspace.patch -rw-r--r-- vlad/vlad 3535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-03-iowrite32_copy.patch -rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-04-aio_write.patch -rw-r--r-- vlad/vlad 4258 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-05-page-hacks-2.6.14.patch -rw-r--r-- vlad/vlad 1420 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-06-page-hacks-2.6.9.patch -rw-r--r-- vlad/vlad 912 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-07-iounmap-2.6.9.patch -rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-08-fs-get_sb-2.6.17.patch -rw-r--r-- vlad/vlad 4543 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-09-sysfs-show-2.6.12.patch -rw-r--r-- vlad/vlad 558 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-10-rlimit-2.6.9.patch -rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-13-class-2.6.9.patch -rw-r--r-- vlad/vlad 574 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-14-class-2.6.9_U4.patch -rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-16-htirq-2.6.18.patch -rw-r--r-- vlad/vlad 5818 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-19-remove-struct-device_attribute-attr-args.patch -rw-r--r-- vlad/vlad 1354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipath-21-warnings.patch -rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipoib_0110_restore_get_stats.patch -rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ipoib_to_2_6_16.patch -rw-r--r-- vlad/vlad 104942 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iscsi_01_sync_kernel_code_with_ofed_1_2_5.patch -rw-r--r-- vlad/vlad 6054 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iscsi_02_add_to_2_6_9.patch -rw-r--r-- vlad/vlad 2440 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iscsi_03_add_session_wq.patch -rw-r--r-- vlad/vlad 444 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iscsi_04_inet_sock_to_opt.patch 
-rw-r--r-- vlad/vlad 1622 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iscsi_05_release_host_lock_before_eh.patch -rw-r--r-- vlad/vlad 2362 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iscsi_06_scsi_addons.patch -rw-r--r-- vlad/vlad 1702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_01_revert_da9c0c770e775e655e3f77c96d91ee557b117adb.patch -rw-r--r-- vlad/vlad 585 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_02_revert_d8196ed2181b4595eaf464a5bcbddb6c28649a39.patch -rw-r--r-- vlad/vlad 3202 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_03_revert_1548271ece9e9312fd5feb41fd58773b56a71d39.patch -rw-r--r-- vlad/vlad 1297 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_04_revert_77a23c21aaa723f6b0ffc4a701be8c8e5a32346d.patch -rw-r--r-- vlad/vlad 683 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_05_revert_b2c6416736b847b91950bd43cc5153e11a1f83ee.patch -rw-r--r-- vlad/vlad 670 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_06_revert_857ae0bdb72999936a28ce621e38e2e288c485da.patch -rw-r--r-- vlad/vlad 637 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_07_revert_8ad5781ae9702a8f95cfdf30967752e4297613ee.patch -rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_08_revert_0801c242a33426fddc005c2f559a3d2fa6fca7eb.patch -rw-r--r-- vlad/vlad 511 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iser_09_fix_inclusion_order.patch -rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iw_cxgb3_0100_namespace.patch -rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 545 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iw_cxgb3_0300_idr.patch -rw-r--r-- 
vlad/vlad 4390 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 1409 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iw_nes_200_to_2_6_13.patch -rw-r--r-- vlad/vlad 2001 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iw_nes_300_to_2_6_9.patch -rw-r--r-- vlad/vlad 401 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iwch_cm_to_2_6_9_U4.patch -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/iwch_provider_to_2.6.9_U4.patch -rw-r--r-- vlad/vlad 1015 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 336 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/makefile_to_2_6_9.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 812 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/mlx4_compiler_warning.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/mthca_dev_3465_to_2_6_11.patch -rw-r--r-- vlad/vlad 9567 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/qlgc_vnic_sysfs_nested_class_dev.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 881 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/rds_to_2_6_9.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 2103 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/sdp_7277_to_2_6_11.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 783 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/srp_0300_include_linux_scatterlist_h.patch -rw-r--r-- vlad/vlad 2632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/srp_7312_to_2_6_11.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/srp_cmd_to_2_6_22.patch -rwxr-xr-x vlad/vlad 976 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/srp_scsi_scan_target_7242_to_2_6_11.patch -rw-r--r-- vlad/vlad 1443 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/t3_hw_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 628 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ucm_5245_to_2_6_9.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 3036 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/user_mad_4603_to_2_6_9U4.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1875 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/uverbs_main_3935_to_2_6_9U4.patch -rw-r--r-- vlad/vlad 1525 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U5/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ -rw-r--r-- vlad/vlad 2729 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/1_struct_path_revert_to_2_6_19.patch -rw-r--r-- vlad/vlad 1565 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/2_misc_device_to_2_6_9.patch -rw-r--r-- vlad/vlad 1636 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/addr_1_netevents_revert_to_2_6_17.patch -rw-r--r-- vlad/vlad 578 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/addr_3926_to_2_6_13.patch -rw-r--r-- vlad/vlad 686 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/addr_4670_to_2_6_9.patch -rw-r--r-- vlad/vlad 344 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/amso1100_makefile_to_2_6_9.patch -rw-r--r-- vlad/vlad 3624 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/backport_ehca_1_2.6.9.patch -rw-r--r-- vlad/vlad 25896 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/backport_ehca_2_rhel45_umap.patch -rw-r--r-- vlad/vlad 7924 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/backport_ehca_3_rhel45_dma.patch -rw-r--r-- vlad/vlad 942 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/backport_ehca_4_rhel45_dma_fix.patch -rw-r--r-- vlad/vlad 950 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/core_1sysfs_to_2_6_23.patch -rw-r--r-- vlad/vlad 4827 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/core_4807_to_2_6_9U4.patch -rwxr-xr-x vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/core_ib_verbs_to_2_6_9.patch -rw-r--r-- vlad/vlad 8169 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxg3_to_2_6_20.patch -rw-r--r-- vlad/vlad 18918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxgb3_0100_napi.patch -rw-r--r-- vlad/vlad 897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxgb3_0200_sset.patch -rw-r--r-- vlad/vlad 535 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxgb3_0300_sysfs.patch -rw-r--r-- vlad/vlad 511 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxgb3_0500_is_valid_ether_addr.patch -rw-r--r-- vlad/vlad 1493 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxgb3_0600_simple_strtoul.patch -rw-r--r-- vlad/vlad 301 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxgb3_makefile_to_2_6_19.patch -rw-r--r-- vlad/vlad 3074 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxgb3_remove_eeh.patch -rw-r--r-- vlad/vlad 4218 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/cxio_hal_to_2.6.14.patch -rw-r--r-- vlad/vlad 5166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ehca_01_ibmebus_loc_code.patch -rw-r--r-- vlad/vlad 2708 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-01-header.patch -rw-r--r-- vlad/vlad 1693 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-02-dont-leak-info-to-userspace.patch -rw-r--r-- vlad/vlad 3535 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-03-iowrite32_copy.patch -rw-r--r-- vlad/vlad 1538 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-04-aio_write.patch -rw-r--r-- vlad/vlad 4258 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-05-page-hacks-2.6.14.patch -rw-r--r-- vlad/vlad 1420 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-06-page-hacks-2.6.9.patch -rw-r--r-- vlad/vlad 912 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-07-iounmap-2.6.9.patch -rw-r--r-- vlad/vlad 910 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-08-fs-get_sb-2.6.17.patch -rw-r--r-- vlad/vlad 4543 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-09-sysfs-show-2.6.12.patch -rw-r--r-- vlad/vlad 558 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-10-rlimit-2.6.9.patch -rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-13-class-2.6.9.patch -rw-r--r-- vlad/vlad 574 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-14-class-2.6.9_U4.patch -rw-r--r-- vlad/vlad 14453 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-16-htirq-2.6.18.patch -rw-r--r-- vlad/vlad 5818 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-19-remove-struct-device_attribute-attr-args.patch -rw-r--r-- vlad/vlad 1354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipath-21-warnings.patch -rw-r--r-- vlad/vlad 14083 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipoib_0100_to_2.6.21.patch -rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipoib_0110_restore_get_stats.patch -rw-r--r-- vlad/vlad 6603 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipoib_0200_class_device_to_2_6_20.patch -rw-r--r-- vlad/vlad 1941 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipoib_0300_class_device_to_2_6_20_umcast.patch -rw-r--r-- vlad/vlad 2869 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ipoib_to_2_6_16.patch -rw-r--r-- vlad/vlad 104942 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iscsi_01_sync_kernel_code_with_ofed_1_2_5.patch -rw-r--r-- vlad/vlad 6054 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iscsi_02_add_to_2_6_9.patch -rw-r--r-- vlad/vlad 2440 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iscsi_03_add_session_wq.patch -rw-r--r-- vlad/vlad 444 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iscsi_04_inet_sock_to_opt.patch -rw-r--r-- vlad/vlad 1622 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iscsi_05_release_host_lock_before_eh.patch -rw-r--r-- vlad/vlad 2362 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iscsi_06_scsi_addons.patch -rw-r--r-- vlad/vlad 1702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_01_revert_da9c0c770e775e655e3f77c96d91ee557b117adb.patch -rw-r--r-- vlad/vlad 585 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_02_revert_d8196ed2181b4595eaf464a5bcbddb6c28649a39.patch -rw-r--r-- vlad/vlad 3202 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_03_revert_1548271ece9e9312fd5feb41fd58773b56a71d39.patch -rw-r--r-- vlad/vlad 1297 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_04_revert_77a23c21aaa723f6b0ffc4a701be8c8e5a32346d.patch -rw-r--r-- vlad/vlad 683 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_05_revert_b2c6416736b847b91950bd43cc5153e11a1f83ee.patch -rw-r--r-- vlad/vlad 670 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_06_revert_857ae0bdb72999936a28ce621e38e2e288c485da.patch -rw-r--r-- vlad/vlad 637 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_07_revert_8ad5781ae9702a8f95cfdf30967752e4297613ee.patch -rw-r--r-- vlad/vlad 980 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_08_revert_0801c242a33426fddc005c2f559a3d2fa6fca7eb.patch -rw-r--r-- vlad/vlad 511 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iser_09_fix_inclusion_order.patch -rw-r--r-- vlad/vlad 699 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iw_cxgb3_0100_namespace.patch -rw-r--r-- vlad/vlad 495 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iw_cxgb3_0200_states.patch -rw-r--r-- vlad/vlad 545 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iw_cxgb3_0300_idr.patch -rw-r--r-- vlad/vlad 4390 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iw_nes_100_to_2_6_23.patch -rw-r--r-- vlad/vlad 1409 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iw_nes_200_to_2_6_13.patch -rw-r--r-- vlad/vlad 2001 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iw_nes_300_to_2_6_9.patch -rw-r--r-- vlad/vlad 401 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iwch_cm_to_2_6_9_U4.patch -rw-r--r-- vlad/vlad 588 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/iwch_provider_to_2.6.9_U4.patch -rw-r--r-- vlad/vlad 1015 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/linux_stuff_to_2_6_17.patch -rw-r--r-- vlad/vlad 336 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/makefile_to_2_6_9.patch -rw-r--r-- vlad/vlad 341 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/mlx4_0050_wc.patch -rw-r--r-- vlad/vlad 812 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/mlx4_compiler_warning.patch -rw-r--r-- vlad/vlad 1928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/mthca_0001_pcix_to_2_6_22.patch -rw-r--r-- vlad/vlad 734 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/mthca_dev_3465_to_2_6_11.patch -rw-r--r-- vlad/vlad 9567 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/qlgc_vnic_sysfs_nested_class_dev.patch -rw-r--r-- vlad/vlad 1166 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/rds_to_2_6_20.patch -rw-r--r-- vlad/vlad 881 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/rds_to_2_6_9.patch -rw-r--r-- vlad/vlad 2049 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/sdp_0100_revert_to_2_6_23.patch -rw-r--r-- vlad/vlad 2103 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/sdp_7277_to_2_6_11.patch -rw-r--r-- vlad/vlad 1210 2008-02-28 09:59:53 
ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/srp_0100_revert_role_to_2_6_23.patch -rw-r--r-- vlad/vlad 4858 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/srp_0200_revert_srp_transport_to_2.6.23.patch -rw-r--r-- vlad/vlad 783 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/srp_0300_include_linux_scatterlist_h.patch -rw-r--r-- vlad/vlad 2632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/srp_7312_to_2_6_11.patch -rw-r--r-- vlad/vlad 2380 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/srp_cmd_to_2_6_22.patch -rwxr-xr-x vlad/vlad 976 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/srp_scsi_scan_target_7242_to_2_6_11.patch -rw-r--r-- vlad/vlad 1443 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/t3_hw_to_2_6_5-7_244.patch -rw-r--r-- vlad/vlad 628 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ucm_5245_to_2_6_9.patch -rw-r--r-- vlad/vlad 723 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ucm_to_2_6_16.patch -rw-r--r-- vlad/vlad 702 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/ucma_to_2_6_16.patch -rw-r--r-- vlad/vlad 3036 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/user_mad_4603_to_2_6_9U4.patch -rw-r--r-- vlad/vlad 1092 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/user_mad_to_2_6_16.patch -rw-r--r-- vlad/vlad 1875 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/uverbs_main_3935_to_2_6_9U4.patch -rw-r--r-- vlad/vlad 1525 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/uverbs_to_2_6_16.patch -rw-r--r-- vlad/vlad 842 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/backport/2.6.9_U6/uverbs_to_2_6_17.patch drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ -rwxr-xr-x vlad/vlad 1865 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cma_0010_response_timeout.patch 
-rw-r--r-- vlad/vlad 1591 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cma_0020__iwcm_ordird.patch
-rw-r--r-- vlad/vlad 1329 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cma_0030_tavor_quirk.patch
-rw-r--r-- vlad/vlad 1268 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cma_0040_re-enable-device-removal.patch
-rw-r--r-- vlad/vlad 2303 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cma_0050_rcma_cma_mra.patch
-rw-r--r-- vlad/vlad 2818 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cma_established1.patch
-rw-r--r-- vlad/vlad 2609 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0010_dma_map_sg.patch
-rw-r--r-- vlad/vlad 1462 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0020_csum.patch
-rw-r--r-- vlad/vlad 1711 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0025_qp_create_flags.patch
-rw-r--r-- vlad/vlad 1731 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0030_lso.patch
-rw-r--r-- vlad/vlad 2050 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0040_modify_cq.patch
-rw-r--r-- vlad/vlad 23821 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0050_xrc.patch
-rw-r--r-- vlad/vlad 14035 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0060_xrc_file_desc.patch
-rw-r--r-- vlad/vlad 5473 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0080_kernel_xrc.patch
-rw-r--r-- vlad/vlad 1171 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0090_core_delete_redundant_check_for_DR_SMP.patch
-rw-r--r-- vlad/vlad 1685 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0100_core_Dont_modify_outgoing_DR_SMP_if_first_pa.patch
-rw-r--r-- vlad/vlad 20581 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/core_0110_xrc_rcv.patch
-rw-r--r-- vlad/vlad 753 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0010_MSI-X_failure_path.patch
-rw-r--r-- vlad/vlad 1611 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0020_Use_wild_card_for_PCI_subdevice_ID_match.patch
-rw-r--r-- vlad/vlad 397 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_00300_add_ofed_version_tag.patch
-rw-r--r-- vlad/vlad 1223 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0030_Fix_resources_release.patch
-rw-r--r-- vlad/vlad 3642 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0040_Add_EEH_support.patch
-rw-r--r-- vlad/vlad 1679 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0050_FW_upgrade.patch
-rw-r--r-- vlad/vlad 7199 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0060_fix_interaction_with_pktgen.patch
-rw-r--r-- vlad/vlad 3612 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0070_sysfs_methods_clean_up.patch
-rw-r--r-- vlad/vlad 5234 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0080_HW_set_up_updates.patch
-rw-r--r-- vlad/vlad 1952 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0090_Fix_I-O_synchronization.patch
-rw-r--r-- vlad/vlad 8617 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0100_trim_trailing_whitespace.patch
-rw-r--r-- vlad/vlad 30508 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0210_Parity_initialization_for_T3C_adapters.patch
-rw-r--r-- vlad/vlad 2515 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0220_Fix_EEH_missing_softirq_blocking.patch
-rw-r--r-- vlad/vlad 1015 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0230_Handle_ARP_completions_that_mark_neighbors_stale.patch
-rw-r--r-- vlad/vlad 2354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0001_Add_missing_spaces_in_the_middle_of_format.patch
-rw-r--r-- vlad/vlad 2103 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0002_Forward_event_client_reregister_required.patch
-rw-r--r-- vlad/vlad 1063 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0003_Use_round_jiffies_for_EQ_polling_timer.patch
-rw-r--r-- vlad/vlad 1106 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0004_Remove_CQ_QP_link_before_destroying_QP.patch
-rw-r--r-- vlad/vlad 1908 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0005_Define_array_to_store_SMI_GSI_QPs.patch
-rw-r--r-- vlad/vlad 14839 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0006_Add_port_connection_autodetect_mode.patch
-rw-r--r-- vlad/vlad 9231 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0007_Prevent_RDMA_related_connection_failures.patch
-rw-r--r-- vlad/vlad 614 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0008_Prevent_sending_ud_packets_to_qp0.patch
-rw-r--r-- vlad/vlad 935 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0009_Update_sma_attr_also_in_case_of_disruptive.patch
-rw-r--r-- vlad/vlad 5963 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0010_Add_PMA_support.patch
-rw-r--r-- vlad/vlad 986 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0011_Alloc_firmware_context_with_GFP_ATOMIC.patch
-rw-r--r-- vlad/vlad 997 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ehca_0012_Change_version_number.patch
-rw-r--r-- vlad/vlad 3246 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath-22-memcpy_cachebypass.patch
-rw-r--r-- vlad/vlad 1324 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0030_improve_interrupt_handler_cache_footprin.patch
-rw-r--r-- vlad/vlad 4421 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0040_convert_the_semaphore_ipath_eep_s.patch
-rw-r--r-- vlad/vlad 2921 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0050_remove_dead_code_for_user_process_waiting.patch
-rw-r--r-- vlad/vlad 11481 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0060_fix_sendctrl_locking.patch
-rw-r--r-- vlad/vlad 1132 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0070_fix_return_error_number_for_ib_resize_cq.patch
-rw-r--r-- vlad/vlad 1129 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0080_fix_comments_for_ipath_create_srq.patch
-rw-r--r-- vlad/vlad 1353 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0090_better_comment_for_rmb_in_ipath_intr.patch
-rw-r--r-- vlad/vlad 1148 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0100_add_the_work_completion_error_code_to_the.patch
-rw-r--r-- vlad/vlad 1371 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0120_enable_loopback_of_DR_SMP_responses_from.patch
-rw-r--r-- vlad/vlad 2906 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0130_fix_RNR_NAK_handling.patch
-rw-r--r-- vlad/vlad 1482 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0140_cleanup_ipath_get_egrbuf.patch
-rw-r--r-- vlad/vlad 10304 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0150_kreceive_uses_portdata_rather_than_devdat.patch
-rw-r--r-- vlad/vlad 8155 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0160_generalize_some_macros_SHIFT.patch
-rw-r--r-- vlad/vlad 6446 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0170_changes_for_fields_moving_from_devdata_to.patch
-rw-r--r-- vlad/vlad 50685 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0180_header_file_changes_to_support_IBA7220.patch
-rw-r--r-- vlad/vlad 3290 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0190_isolate_7220_specific_content.patch
-rw-r--r-- vlad/vlad 86870 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0200_HCA_specific_code_to_support_IBA7220.patch
-rw-r--r-- vlad/vlad 33032 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0210_support_for_SerDes_portion_of_IBA7220.patch
-rw-r--r-- vlad/vlad 57778 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0220_add_IBA7220_specific_initialization_data.patch
-rw-r--r-- vlad/vlad 23073 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0230_add_code_for_IBA7220_send_DMA.patch
-rw-r--r-- vlad/vlad 3273 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0240_user_mode_send_DMA_header_file.patch
-rw-r--r-- vlad/vlad 22812 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0250_user_mode_send_DMA.patch
-rw-r--r-- vlad/vlad 86395 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0260_remaining_7220_changes_to_headers_and_af.patch
-rw-r--r-- vlad/vlad 3749 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0270_misc_changes_to_prepare_for_iba7220_intro.patch
-rw-r--r-- vlad/vlad 1718 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0280_cancel_send_DMA_buffers.patch
-rw-r--r-- vlad/vlad 39087 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0290_changes_to_IB_link_state_machine_handling.patch
-rw-r--r-- vlad/vlad 49741 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0300_error_handling_improvements_debuggabilit.patch
-rw-r--r-- vlad/vlad 15460 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0310_eeprom_support_for_7220_devices_robustne.patch
-rw-r--r-- vlad/vlad 8993 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0320_enable_use_of_4KB_MTU_via_module_paramate.patch
-rw-r--r-- vlad/vlad 7967 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0330_infrastructure_updates_for_sdma_support.patch
-rw-r--r-- vlad/vlad 6918 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0370_enable_sdma_for_user_programs.patch
-rw-r--r-- vlad/vlad 4725 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0340_changes_to_support_PIO_bandwidth_check_on.patch
-rw-r--r-- vlad/vlad 44086 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0350_add_remaining_small_pieces_of_7220_suppor.patch
-rw-r--r-- vlad/vlad 41192 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0360_misc_changes_related_to_the_iba7220.patch
-rw-r--r-- vlad/vlad 3632 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0380_set_ipath_lbus_info_where_bus_parameters.patch
-rw-r--r-- vlad/vlad 5111 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0390_fix_IB_compliance_problems_with_link_stat.patch
-rw-r--r-- vlad/vlad 1863 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0400_set_static_rate_and_VL15_flags_for_IBA722.patch
-rw-r--r-- vlad/vlad 62922 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0410_update.patch
-rw-r--r-- vlad/vlad 7759 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0420_ipoib_4k_mtu.patch
-rw-r--r-- vlad/vlad 871 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipath_0430_dapl_rdma_read.patch
-rw-r--r-- vlad/vlad 973 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0010_Add-high-dma-support-to-ipoib.patch
-rw-r--r-- vlad/vlad 9243 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0020_Add-s-g-support-for-IPOIB.patch
-rw-r--r-- vlad/vlad 4649 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0040_checksum-offload.patch
-rw-r--r-- vlad/vlad 10194 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0050_Add-LSO-support.patch
-rw-r--r-- vlad/vlad 4567 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0060_ethtool-support.patch
-rw-r--r-- vlad/vlad 3494 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0070_modiy_cq_params.patch
-rw-r--r-- vlad/vlad 1295 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0110_set_default_cq_patams.patch
-rw-r--r-- vlad/vlad 1826 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0120_check_grat_arp_with_cm.patch
-rw-r--r-- vlad/vlad 10825 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0180_split_cq.patch
-rw-r--r-- vlad/vlad 13253 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0190_unsig_udqp.patch
-rw-r--r-- vlad/vlad 20425 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0200_non_srq.patch
-rw-r--r-- vlad/vlad 2882 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0210_draft_wr.patch
-rw-r--r-- vlad/vlad 4898 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0220_ud_post_list.patch
-rw-r--r-- vlad/vlad 6181 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0230_srq_post_n.patch
-rw-r--r-- vlad/vlad 14279 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0240_4kmtu.patch
-rw-r--r-- vlad/vlad 1290 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0250_non_srq_param.patch
-rw-r--r-- vlad/vlad 970 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0260_pkey_change.patch
-rw-r--r-- vlad/vlad 596 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0270_remove_alloc.patch
-rw-r--r-- vlad/vlad 7229 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0280_vmap.patch
-rw-r--r-- vlad/vlad 3001 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0290_reduce_cm_tx.patch
-rw-r--r-- vlad/vlad 2061 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0300_reap.patch
-rw-r--r-- vlad/vlad 1011 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0310_def_ring_sizes.patch
-rw-r--r-- vlad/vlad 1894 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0320_small_skb_copy.patch
-rw-r--r-- vlad/vlad 839 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_0330_child_mtu.patch
-rw-r--r-- vlad/vlad 2299 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/ipoib_selector_updated.patch
-rw-r--r-- vlad/vlad 1030 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/iser_01_Print_information_about_unhandled_RDMA_CM_events.patch
-rw-r--r-- vlad/vlad 1570 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0020_Hold_rtnl_lock_around_ethtool_get_drvinfo_call.patch
-rw-r--r-- vlad/vlad 1996 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0030_Support_version_5.0_firmware.patch
-rw-r--r-- vlad/vlad 1968 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0040_Flush_the_RQ_when_closing.patch
-rw-r--r-- vlad/vlad 1284 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0050_fix_page_shift_calculation.patch
-rw-r--r-- vlad/vlad 1385 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0060_Mark_qp_as_privileged.patch
-rw-r--r-- vlad/vlad 2550 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0070_Fix_the_T3A_workaround_checks.patch
-rw-r--r-- vlad/vlad 3372 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mad_0010_enable_loopback_of_DR_SMP_responses_from_use.patch
-rw-r--r-- vlad/vlad 8923 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0010_add_wc.patch
-rw-r--r-- vlad/vlad 1131 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0015_set_cacheline_sz.patch
-rw-r--r-- vlad/vlad 1066 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0020_cmd_tout.patch
-rw-r--r-- vlad/vlad 5101 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0030_checksum_offload.patch
-rw-r--r-- vlad/vlad 724 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0040_qp_max_msg.patch
-rw-r--r-- vlad/vlad 2354 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0045_qp_flags.patch
-rw-r--r-- vlad/vlad 9394 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0050_lso.patch
-rw-r--r-- vlad/vlad 6385 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0060_modify_cq.patch
-rw-r--r-- vlad/vlad 24057 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0070_xrc.patch
-rw-r--r-- vlad/vlad 3191 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0080_profile_parm.patch
-rw-r--r-- vlad/vlad 5511 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0090_fix_sq_wrs.patch
-rw-r--r-- vlad/vlad 9449 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0120_xrc_kernel.patch
-rw-r--r-- vlad/vlad 1866 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0125_xrc_kernel_missed.patch
-rw-r--r-- vlad/vlad 793 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0150_increase_default_qp.patch
-rw-r--r-- vlad/vlad 17818 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0170_shrinking_wqe.patch
-rw-r--r-- vlad/vlad 903 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0180_max_eqs.patch
-rw-r--r-- vlad/vlad 1042 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0190_bogus_qp_event.patch
-rw-r--r-- vlad/vlad 15928 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0210_xrc_rcv.patch
-rw-r--r-- vlad/vlad 1432 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0220_enable_qos.patch
-rw-r--r-- vlad/vlad 2956 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0230_hw_id.patch
-rw-r--r-- vlad/vlad 1570 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0240_optimize_poll.patch
-rw-r--r-- vlad/vlad 14258 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0250_debug_output.patch
-rw-r--r-- vlad/vlad 1571 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0260_optimze_stamping.patch
-rw-r--r-- vlad/vlad 2156 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0270_fmr_enable.patch
-rw-r--r-- vlad/vlad 8901 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0280_diag_counters_sysfs.patch
-rw-r--r-- vlad/vlad 3447 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0290_mcast_loopback.patch
-rw-r--r-- vlad/vlad 971 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0300_bogus_qp.patch
-rw-r--r-- vlad/vlad 1245 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mlx4_0310_date_version.patch
-rw-r--r-- vlad/vlad 599 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0001_catas_wqueue_namelen.patch
-rw-r--r-- vlad/vlad 2838 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0002_wrid_swap.patch
-rw-r--r-- vlad/vlad 5646 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0003_checksum_offload.patch
-rw-r--r-- vlad/vlad 4721 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0004_prelink_wqes.patch
-rw-r--r-- vlad/vlad 3037 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0005_hw_ver.patch
-rw-r--r-- vlad/vlad 3043 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0006_page_size_calc.patch
-rw-r--r-- vlad/vlad 1450 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0007_fmr_alloc_error.patch
-rw-r--r-- vlad/vlad 709 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0008_roland_fmr_alloc_fix.patch
-rw-r--r-- vlad/vlad 971 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0009_sg_init_table.patch
-rw-r--r-- vlad/vlad 920 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0010_bogus_qp.patch
-rw-r--r-- vlad/vlad 705 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/mthca_0011_date_version.patch
-rw-r--r-- vlad/vlad 1484 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/sean_cm_flush_workqueue.patch
-rw-r--r-- vlad/vlad 5666 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/sean_cm_limit_mra_timeout.patch
-rw-r--r-- vlad/vlad 39321 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_1_notifications.patch
-rw-r--r-- vlad/vlad 41248 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_2_cache.patch
-rw-r--r-- vlad/vlad 773 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_3_disable.patch
-rw-r--r-- vlad/vlad 1211 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_4_fix_hang.patch
-rw-r--r-- vlad/vlad 2640 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/srp_1_recreate_at_reconnect.patch
-rwxr-xr-x vlad/vlad 883 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/srp_2_disconnect_without_wait.patch
-rwxr-xr-x vlad/vlad 2506 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/srp_3_qp_err_timer_reconnect_target.patch
-rw-r--r-- vlad/vlad 2897 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/srp_4_respect_target_credit_limit.patch
-rw-r--r-- vlad/vlad 12540 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/srp_5_add_info_to_log_messages.patch
-rw-r--r-- vlad/vlad 4419 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/srp_6_retry_stale_connections.patch
-rw-r--r-- vlad/vlad 972 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/fixes/uverbs_warning.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/hpage_patches/
-rw-r--r-- vlad/vlad 5418 2008-02-28 09:59:53 ofa_kernel-1.3/kernel_patches/hpage_patches/hpages.patch
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/lib/
-rw-r--r-- vlad/vlad 7010 2008-02-28 09:59:53 ofa_kernel-1.3/lib/klist.c
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/net/
drwxr-xr-x vlad/vlad 0 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/
-rw-r--r-- vlad/vlad 311 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/Kconfig
-rw-r--r-- vlad/vlad 569 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/Makefile
-rw-r--r-- vlad/vlad 15905 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/af_rds.c
-rw-r--r-- vlad/vlad 4934 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/bind.c
-rw-r--r-- vlad/vlad 11842 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/cong.c
-rw-r--r-- vlad/vlad 12563 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/connection.c
-rw-r--r-- vlad/vlad 6593 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib.c
-rw-r--r-- vlad/vlad 7355 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib.h
-rw-r--r-- vlad/vlad 19290 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_cm.c
-rw-r--r-- vlad/vlad 13630 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_rdma.c
-rw-r--r-- vlad/vlad 6879 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_rds.h
-rw-r--r-- vlad/vlad 27878 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_recv.c
-rw-r--r-- vlad/vlad 4913 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_ring.c
-rw-r--r-- vlad/vlad 21049 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_send.c
-rw-r--r-- vlad/vlad 2749 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_stats.c
-rw-r--r-- vlad/vlad 4952 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/ib_sysctl.c
-rw-r--r-- vlad/vlad 6596 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/info.c
-rw-r--r-- vlad/vlad 1308 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/info.h
-rw-r--r-- vlad/vlad 4418 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/loop.c
-rw-r--r-- vlad/vlad 111 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/loop.h
-rw-r--r-- vlad/vlad 9446 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/message.c
-rw-r--r-- vlad/vlad 5915 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/page.c
-rw-r--r-- vlad/vlad 16448 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/rdma.c
-rw-r--r-- vlad/vlad 1907 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/rdma.h
-rw-r--r-- vlad/vlad 22289 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/rds.h
-rw-r--r-- vlad/vlad 14718 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/recv.c
-rw-r--r-- vlad/vlad 23194 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/send.c
-rw-r--r-- vlad/vlad 4223 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/stats.c
-rw-r--r-- vlad/vlad 4491 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/sysctl.c
-rw-r--r-- vlad/vlad 8073 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/tcp.c
-rw-r--r-- vlad/vlad 2801 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/tcp.h
-rw-r--r-- vlad/vlad 4243 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/tcp_connect.c
-rw-r--r-- vlad/vlad 5405 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/tcp_listen.c
-rw-r--r-- vlad/vlad 9580 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/tcp_recv.c
-rw-r--r-- vlad/vlad 7626 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/tcp_send.c
-rw-r--r-- vlad/vlad 2425 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/tcp_stats.c
-rw-r--r-- vlad/vlad 8234 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/threads.c
-rw-r--r-- vlad/vlad 4410 2008-02-28 09:59:53 ofa_kernel-1.3/net/rds/transport.c
lrwxrwxrwx vlad/vlad 0 2008-02-28 09:59:56 ofa_kernel-1.3/configure -> ofed_scripts/configure
lrwxrwxrwx vlad/vlad 0 2008-02-28 09:59:56 ofa_kernel-1.3/Makefile -> ofed_scripts/Makefile
lrwxrwxrwx vlad/vlad 0 2008-02-28 09:59:56 ofa_kernel-1.3/makefile -> ofed_scripts/makefile
-rw-r--r-- vlad/vlad 114 2008-02-28 09:59:54 ofa_kernel-1.3/BUILD_ID
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd ofa_kernel-1.3
++ /usr/bin/id -u
+ '[' 0 = 0 ']'
+ /bin/chown -Rhf root .
++ /usr/bin/id -u
+ '[' 0 = 0 ']'
+ /bin/chgrp -Rhf root .
+ /bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.52212
+ umask 022
+ cd /var/tmp/OFED_topdir/BUILD
+ /bin/rm -rf /var/tmp/OFED
++ dirname /var/tmp/OFED
+ /bin/mkdir -p /var/tmp
+ /bin/mkdir /var/tmp/OFED
+ cd ofa_kernel-1.3
+ rm -rf /var/tmp/OFED
+ cd /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3
+ mkdir -p /var/tmp/OFED//usr/local/ofed-1.3/src
+ cp -a /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3 /var/tmp/OFED//usr/local/ofed-1.3/src
+ ./configure --prefix=/usr/local/ofed-1.3 --kernel-version 2.6.16-54-0.2.5_lustre.1.6.4.3smp --kernel-sources /lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/build --modules-dir /lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/updates --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mthca-mod --with-mlx4-mod --with-cxgb3-mod --with-nes-mod --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-srp-target-mod --with-rds-mod --with-qlgc_vnic-mod
ofed_patch.mk does not exist. running ofed_patch.sh
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/ofed_scripts/ofed_patch.sh --kernel-version 2.6.16-54-0.2.5_lustre.1.6.4.3smp
Quilt does not exist... Going to use patch.
mkdir -p /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/patches
touch /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/patches/quiltrc
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cma_0010_response_timeout.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 54 with fuzz 2 (offset -4 lines).
Hunk #2 succeeded at 2179 (offset 18 lines).
Hunk #3 succeeded at 2238 (offset 18 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cma_0020__iwcm_ordird.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 1266 with fuzz 1 (offset 129 lines).
Hunk #2 succeeded at 1316 (offset 126 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cma_0030_tavor_quirk.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 50 with fuzz 1 (offset 2 lines).
Hunk #2 succeeded at 1562 with fuzz 2 (offset 435 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cma_0040_re-enable-device-removal.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 1130 (offset 8 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cma_0050_rcma_cma_mra.patch
patching file drivers/infiniband/core/cma.c
Hunk #1 succeeded at 1108 (offset 1 line).
Hunk #2 succeeded at 1130 (offset 1 line).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cma_established1.patch
patching file drivers/infiniband/ulp/sdp/sdp.h
Hunk #1 succeeded at 152 with fuzz 1 (offset 24 lines).
patching file drivers/infiniband/ulp/sdp/sdp_bcopy.c
Hunk #1 succeeded at 764 (offset 265 lines).
patching file drivers/infiniband/ulp/sdp/sdp_cma.c
Hunk #1 succeeded at 162 (offset 5 lines).
Hunk #2 succeeded at 294 with fuzz 2 (offset 24 lines).
patching file drivers/infiniband/ulp/sdp/sdp_main.c
Hunk #1 succeeded at 759 (offset 196 lines).
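The "Hunk #N succeeded at L with fuzz F (offset M lines)" messages in this log come from patch(1)'s approximate matching: when the target tree has drifted from the tree the fix was generated against, patch searches nearby lines (the offset) and can tolerate some mismatched context (the fuzz). A minimal self-contained sketch of the offset case, using made-up files in a throwaway directory rather than anything from this build:

```shell
set -e
work=$(mktemp -d)
mkdir -p "$work/old/src" "$work/new/src" "$work/target"
printf 'one\ntwo\nthree\n' > "$work/old/src/f.c"
printf 'one\nTWO\nthree\n' > "$work/new/src/f.c"
# Record the change as a unified diff (diff exits 1 when files differ).
( cd "$work" && diff -u old/src/f.c new/src/f.c > fix.patch ) || true
# Prepend two lines so the hunk's recorded line numbers no longer match.
printf 'x\ny\none\ntwo\nthree\n' > "$work/target/f.c"
# -p2 strips the "old/src/" prefix from the diff header, analogous to the
# strip level implied by the "patching file drivers/..." lines above.
patch -d "$work/target" -p2 < "$work/fix.patch"
```

The two prepended lines make patch report the hunk as succeeding at an offset, just like the messages in this log; a fuzz report would appear instead if the context lines themselves had changed.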
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0010_dma_map_sg.patch patching file drivers/infiniband/core/device.c patching file drivers/infiniband/core/umem.c Hunk #1 succeeded at 46 with fuzz 2 (offset 6 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0020_csum.patch patching file include/rdma/ib_verbs.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0025_qp_create_flags.patch patching file drivers/infiniband/core/uverbs_cmd.c patching file include/rdma/ib_verbs.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0030_lso.patch patching file include/rdma/ib_verbs.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0040_modify_cq.patch patching file include/rdma/ib_verbs.h Hunk #1 succeeded at 984 (offset 10 lines). Hunk #2 succeeded at 1391 (offset 10 lines). patching file drivers/infiniband/core/verbs.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0050_xrc.patch patching file drivers/infiniband/core/uverbs_main.c patching file include/rdma/ib_user_verbs.h patching file drivers/infiniband/core/uverbs_cmd.c patching file drivers/infiniband/core/verbs.c patching file include/rdma/ib_verbs.h patching file drivers/infiniband/core/uverbs.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0060_xrc_file_desc.patch patching file drivers/infiniband/core/uverbs_cmd.c Hunk #5 succeeded at 1151 (offset 1 line). Hunk #6 succeeded at 1179 (offset 1 line). Hunk #7 succeeded at 2083 (offset 1 line). Hunk #8 succeeded at 2115 (offset 1 line). Hunk #9 succeeded at 2166 (offset 1 line). Hunk #10 succeeded at 2187 (offset 1 line). Hunk #11 succeeded at 2319 (offset 1 line). Hunk #12 succeeded at 2438 (offset 1 line). Hunk #13 succeeded at 2449 (offset 1 line). Hunk #14 succeeded at 2506 (offset 1 line). Hunk #15 succeeded at 2530 (offset 1 line). Hunk #16 succeeded at 2562 (offset 1 line). Hunk #17 succeeded at 2588 (offset 1 line). 
Hunk #18 succeeded at 2602 (offset 1 line). patching file include/rdma/ib_verbs.h Hunk #2 succeeded at 769 (offset 8 lines). Hunk #3 succeeded at 1092 (offset 8 lines). patching file drivers/infiniband/core/device.c patching file drivers/infiniband/core/uverbs_main.c patching file drivers/infiniband/core/uverbs.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0080_kernel_xrc.patch patching file include/rdma/ib_verbs.h Hunk #1 succeeded at 677 (offset 9 lines). Hunk #2 succeeded at 803 (offset 9 lines). Hunk #3 succeeded at 1256 (offset 9 lines). Hunk #4 succeeded at 1918 (offset 9 lines). patching file drivers/infiniband/core/verbs.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0090_core_delete_redundant_check_for_DR_SMP.patch patching file drivers/infiniband/core/mad.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0100_core_Dont_modify_outgoing_DR_SMP_if_first_pa.patch patching file drivers/infiniband/core/mad.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/core_0110_xrc_rcv.patch patching file include/rdma/ib_verbs.h patching file drivers/infiniband/core/uverbs_main.c patching file drivers/infiniband/core/uverbs_cmd.c patching file include/rdma/ib_user_verbs.h patching file drivers/infiniband/core/uverbs.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0010_MSI-X_failure_path.patch patching file drivers/net/cxgb3/cxgb3_main.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0020_Use_wild_card_for_PCI_subdevice_ID_match.patch patching file drivers/net/cxgb3/cxgb3_main.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_00300_add_ofed_version_tag.patch patching file drivers/net/cxgb3/version.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0030_Fix_resources_release.patch patching file drivers/net/cxgb3/cxgb3_main.c 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0040_Add_EEH_support.patch patching file drivers/net/cxgb3/cxgb3_main.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0050_FW_upgrade.patch patching file drivers/net/cxgb3/t3_hw.c patching file drivers/net/cxgb3/version.h Hunk #1 succeeded at 38 with fuzz 1. /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0060_fix_interaction_with_pktgen.patch patching file drivers/net/cxgb3/sge.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0070_sysfs_methods_clean_up.patch patching file drivers/net/cxgb3/cxgb3_main.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0080_HW_set_up_updates.patch patching file drivers/net/cxgb3/cxgb3_main.c patching file drivers/net/cxgb3/regs.h patching file drivers/net/cxgb3/t3_hw.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0090_Fix_I-O_synchronization.patch patching file drivers/net/cxgb3/sge.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0100_trim_trailing_whitespace.patch patching file drivers/net/cxgb3/cxgb3_main.c patching file drivers/net/cxgb3/cxgb3_offload.c patching file drivers/net/cxgb3/firmware_exports.h patching file drivers/net/cxgb3/t3_hw.c patching file drivers/net/cxgb3/xgmac.c Hunk #1 succeeded at 153 (offset 5 lines). Hunk #2 succeeded at 187 (offset 5 lines). Hunk #3 succeeded at 336 (offset 6 lines). Hunk #4 succeeded at 449 (offset 14 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0210_Parity_initialization_for_T3C_adapters.patch patching file drivers/net/cxgb3/adapter.h patching file drivers/net/cxgb3/cxgb3_main.c patching file drivers/net/cxgb3/cxgb3_offload.c patching file drivers/net/cxgb3/regs.h patching file drivers/net/cxgb3/sge.c patching file drivers/net/cxgb3/t3_hw.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0220_Fix_EEH_missing_softirq_blocking.patch patching file drivers/net/cxgb3/cxgb3_main.c patching file drivers/net/cxgb3/sge.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/cxgb3_0230_Handle_ARP_completions_that_mark_neighbors_stale.patch patching file drivers/net/cxgb3/l2t.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0001_Add_missing_spaces_in_the_middle_of_format.patch patching file drivers/infiniband/hw/ehca/ehca_cq.c patching file drivers/infiniband/hw/ehca/ehca_qp.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0002_Forward_event_client_reregister_required.patch patching file drivers/infiniband/hw/ehca/ehca_irq.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0003_Use_round_jiffies_for_EQ_polling_timer.patch patching file drivers/infiniband/hw/ehca/ehca_main.c Hunk #1 succeeded at 926 (offset 13 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0004_Remove_CQ_QP_link_before_destroying_QP.patch patching file drivers/infiniband/hw/ehca/ehca_qp.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0005_Define_array_to_store_SMI_GSI_QPs.patch patching file drivers/infiniband/hw/ehca/ehca_classes.h patching file drivers/infiniband/hw/ehca/ehca_main.c Hunk #1 succeeded at 511 (offset 13 lines). Hunk #2 succeeded at 537 (offset 13 lines). Hunk #3 succeeded at 550 (offset 13 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0006_Add_port_connection_autodetect_mode.patch
patching file drivers/infiniband/hw/ehca/ehca_classes.h
patching file drivers/infiniband/hw/ehca/ehca_irq.c
patching file drivers/infiniband/hw/ehca/ehca_iverbs.h
patching file drivers/infiniband/hw/ehca/ehca_main.c
patching file drivers/infiniband/hw/ehca/ehca_qp.c
patching file drivers/infiniband/hw/ehca/ehca_sqp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0007_Prevent_RDMA_related_connection_failures.patch
patching file drivers/infiniband/hw/ehca/ehca_classes.h
patching file drivers/infiniband/hw/ehca/ehca_qp.c
patching file drivers/infiniband/hw/ehca/ehca_reqs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0008_Prevent_sending_ud_packets_to_qp0.patch
patching file drivers/infiniband/hw/ehca/ehca_reqs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0009_Update_sma_attr_also_in_case_of_disruptive.patch
patching file drivers/infiniband/hw/ehca/ehca_irq.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0010_Add_PMA_support.patch
patching file drivers/infiniband/hw/ehca/ehca_classes.h
patching file drivers/infiniband/hw/ehca/ehca_iverbs.h
patching file drivers/infiniband/hw/ehca/ehca_main.c
patching file drivers/infiniband/hw/ehca/ehca_sqp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0011_Alloc_firmware_context_with_GFP_ATOMIC.patch
patching file drivers/infiniband/hw/ehca/ehca_hca.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ehca_0012_Change_version_number.patch
patching file drivers/infiniband/hw/ehca/ehca_main.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0030_improve_interrupt_handler_cache_footprin.patch
patching file drivers/infiniband/hw/ipath/ipath_intr.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0040_convert_the_semaphore_ipath_eep_s.patch
patching file drivers/infiniband/hw/ipath/ipath_eeprom.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0050_remove_dead_code_for_user_process_waiting.patch
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0060_fix_sendctrl_locking.patch
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_ruc.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0070_fix_return_error_number_for_ib_resize_cq.patch
patching file drivers/infiniband/hw/ipath/ipath_cq.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0080_fix_comments_for_ipath_create_srq.patch
patching file drivers/infiniband/hw/ipath/ipath_srq.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0090_better_comment_for_rmb_in_ipath_intr.patch
patching file drivers/infiniband/hw/ipath/ipath_intr.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0100_add_the_work_completion_error_code_to_the.patch
patching file drivers/infiniband/hw/ipath/ipath_qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0120_enable_loopback_of_DR_SMP_responses_from.patch
patching file drivers/infiniband/hw/ipath/ipath_mad.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0130_fix_RNR_NAK_handling.patch
patching file drivers/infiniband/hw/ipath/ipath_rc.c
patching file drivers/infiniband/hw/ipath/ipath_ruc.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0140_cleanup_ipath_get_egrbuf.patch
patching file drivers/infiniband/hw/ipath/ipath_driver.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0150_kreceive_uses_portdata_rather_than_devdat.patch
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_stats.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0160_generalize_some_macros_SHIFT.patch
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_registers.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0170_changes_for_fields_moving_from_devdata_to.patch
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_stats.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0180_header_file_changes_to_support_IBA7220.patch
patching file drivers/infiniband/hw/ipath/Makefile
patching file drivers/infiniband/hw/ipath/ipath_common.h
patching file drivers/infiniband/hw/ipath/ipath_debug.h
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_registers.h
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0190_isolate_7220_specific_content.patch
patching file drivers/infiniband/hw/ipath/ipath_7220.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0200_HCA_specific_code_to_support_IBA7220.patch
patching file drivers/infiniband/hw/ipath/ipath_iba7220.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0210_support_for_SerDes_portion_of_IBA7220.patch
patching file drivers/infiniband/hw/ipath/ipath_sd7220.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0220_add_IBA7220_specific_initialization_data.patch
patching file drivers/infiniband/hw/ipath/ipath_sd7220_img.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0230_add_code_for_IBA7220_send_DMA.patch
patching file drivers/infiniband/hw/ipath/ipath_sdma.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0240_user_mode_send_DMA_header_file.patch
patching file drivers/infiniband/hw/ipath/ipath_user_sdma.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0250_user_mode_send_DMA.patch
patching file drivers/infiniband/hw/ipath/ipath_user_sdma.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0260_remaining_7220_changes_to_headers_and_af.patch
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_registers.h
patching file drivers/infiniband/hw/ipath/ipath_stats.c
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0270_misc_changes_to_prepare_for_iba7220_intro.patch
patching file drivers/infiniband/hw/ipath/ipath_ruc.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0280_cancel_send_DMA_buffers.patch
patching file drivers/infiniband/hw/ipath/ipath_driver.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0290_changes_to_IB_link_state_machine_handling.patch
patching file drivers/infiniband/hw/ipath/ipath_diag.c
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_mad.c
patching file drivers/infiniband/hw/ipath/ipath_registers.h
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0300_error_handling_improvements_debuggabilit.patch
patching file drivers/infiniband/hw/ipath/ipath_common.h
patching file drivers/infiniband/hw/ipath/ipath_diag.c
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_registers.h
patching file drivers/infiniband/hw/ipath/ipath_stats.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0310_eeprom_support_for_7220_devices_robustne.patch
patching file drivers/infiniband/hw/ipath/ipath_eeprom.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0320_enable_use_of_4KB_MTU_via_module_paramate.patch
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_mad.c
patching file drivers/infiniband/hw/ipath/ipath_qp.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0330_infrastructure_updates_for_sdma_support.patch
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_rc.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0340_changes_to_support_PIO_bandwidth_check_on.patch
patching file drivers/infiniband/hw/ipath/ipath_common.h
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0350_add_remaining_small_pieces_of_7220_suppor.patch
patching file drivers/infiniband/hw/ipath/Makefile
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_registers.h
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0360_misc_changes_related_to_the_iba7220.patch
patching file drivers/infiniband/hw/ipath/ipath_diag.c
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_fs.c
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_keys.c
patching file drivers/infiniband/hw/ipath/ipath_mad.c
patching file drivers/infiniband/hw/ipath/ipath_qp.c
patching file drivers/infiniband/hw/ipath/ipath_stats.c
patching file drivers/infiniband/hw/ipath/ipath_ud.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0370_enable_sdma_for_user_programs.patch
patching file drivers/infiniband/hw/ipath/Makefile
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0380_set_ipath_lbus_info_where_bus_parameters.patch
patching file drivers/infiniband/hw/ipath/ipath_iba6110.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_iba7220.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0390_fix_IB_compliance_problems_with_link_stat.patch
patching file drivers/infiniband/hw/ipath/ipath_common.h
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_mad.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0400_set_static_rate_and_VL15_flags_for_IBA722.patch
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0410_update.patch
patching file drivers/infiniband/hw/ipath/ipath_common.h
patching file drivers/infiniband/hw/ipath/ipath_diag.c
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_eeprom.c
patching file drivers/infiniband/hw/ipath/ipath_file_ops.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_iba7220.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_intr.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_qp.c
patching file drivers/infiniband/hw/ipath/ipath_rc.c
patching file drivers/infiniband/hw/ipath/ipath_sdma.c
patching file drivers/infiniband/hw/ipath/ipath_srq.c
patching file drivers/infiniband/hw/ipath/ipath_sysfs.c
patching file drivers/infiniband/hw/ipath/ipath_ud.c
patching file drivers/infiniband/hw/ipath/ipath_user_sdma.c
patching file drivers/infiniband/hw/ipath/ipath_user_sdma.h
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
Hunk #1 succeeded at 703 (offset -6 lines).
Hunk #2 succeeded at 1094 (offset -6 lines).
Hunk #3 succeeded at 1396 (offset -6 lines).
Hunk #4 succeeded at 1413 (offset -6 lines).
patching file drivers/infiniband/hw/ipath/ipath_verbs.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0420_ipoib_4k_mtu.patch
patching file drivers/infiniband/hw/ipath/ipath_diag.c
patching file drivers/infiniband/hw/ipath/ipath_driver.c
patching file drivers/infiniband/hw/ipath/ipath_iba6120.c
patching file drivers/infiniband/hw/ipath/ipath_iba7220.c
patching file drivers/infiniband/hw/ipath/ipath_init_chip.c
patching file drivers/infiniband/hw/ipath/ipath_kernel.h
patching file drivers/infiniband/hw/ipath/ipath_rc.c
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
Hunk #1 succeeded at 169 (offset -6 lines).
Hunk #2 succeeded at 1180 (offset -6 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath_0430_dapl_rdma_read.patch
patching file drivers/infiniband/hw/ipath/ipath_rc.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipath-22-memcpy_cachebypass.patch
patching file drivers/infiniband/hw/ipath/Makefile
Hunk #1 succeeded at 36 with fuzz 1 (offset 4 lines).
patching file drivers/infiniband/hw/ipath/ipath_verbs.c
patching file drivers/infiniband/hw/ipath/memcpy_cachebypass_x86_64.S
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0010_Add-high-dma-support-to-ipoib.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
Hunk #1 succeeded at 1120 (offset 2 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0020_Add-s-g-support-for-IPOIB.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #2 succeeded at 344 (offset 2 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0040_checksum-offload.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0050_Add-LSO-support.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0060_ethtool-support.patch
patching file drivers/infiniband/ulp/ipoib/Makefile
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 522 with fuzz 2 (offset -5 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib_etool.c
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
Hunk #1 succeeded at 963 (offset -8 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0070_modiy_cq_params.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 305 with fuzz 1 (offset -3 lines).
Hunk #2 succeeded at 387 (offset -3 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib_etool.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0110_set_default_cq_patams.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0120_check_grat_arp_with_cm.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
Hunk #1 succeeded at 689 (offset -27 lines).
Hunk #2 succeeded at 710 (offset -27 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0180_split_cq.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
patching file drivers/infiniband/ulp/ipoib/ipoib.h
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
patching file drivers/infiniband/ulp/ipoib/ipoib_etool.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0190_unsig_udqp.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0200_non_srq.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #2 succeeded at 259 (offset 1 line).
Hunk #3 succeeded at 308 (offset 1 line).
Hunk #4 succeeded at 552 (offset 1 line).
Hunk #5 succeeded at 590 (offset 1 line).
Hunk #6 succeeded at 613 (offset 1 line).
Hunk #7 succeeded at 645 (offset 1 line).
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0210_draft_wr.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 328 (offset 1 line).
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
Hunk #1 succeeded at 222 with fuzz 1 (offset 5 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0220_ud_post_list.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 98 (offset 1 line).
Hunk #2 succeeded at 328 (offset 1 line).
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
Hunk #2 succeeded at 790 (offset -24 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
Hunk #1 succeeded at 222 (offset -4 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0230_srq_post_n.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 99 (offset 1 line).
Hunk #2 succeeded at 290 (offset 1 line).
Hunk #3 succeeded at 318 (offset 1 line).
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0240_4kmtu.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #2 succeeded at 142 (offset 1 line).
Hunk #3 succeeded at 340 (offset 1 line).
Hunk #4 succeeded at 381 (offset 1 line).
Hunk #5 succeeded at 415 (offset 1 line).
Hunk #6 succeeded at 456 (offset 1 line).
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
Hunk #9 succeeded at 848 (offset 7 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c
patching file drivers/infiniband/ulp/ipoib/ipoib_verbs.c
patching file drivers/infiniband/ulp/ipoib/ipoib_vlan.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0250_non_srq_param.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
Hunk #2 succeeded at 1453 (offset 6 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0260_pkey_change.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c
Hunk #1 succeeded at 959 (offset 7 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0270_remove_alloc.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 282 (offset 1 line).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0280_vmap.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 270 (offset 1 line).
Hunk #2 succeeded at 283 (offset 1 line).
Hunk #3 succeeded at 303 (offset 1 line).
Hunk #4 succeeded at 327 (offset 1 line).
Hunk #5 succeeded at 389 (offset 1 line).
Hunk #6 succeeded at 584 (offset 1 line).
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0290_reduce_cm_tx.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 111 (offset 1 line).
Hunk #2 succeeded at 289 (offset 7 lines).
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0300_reap.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0310_def_ring_sizes.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0320_small_skb_copy.patch
patching file drivers/infiniband/ulp/ipoib/ipoib.h
Hunk #1 succeeded at 100 (offset 1 line).
patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_0330_child_mtu.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_vlan.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/ipoib_selector_updated.patch
patching file drivers/infiniband/ulp/ipoib/ipoib_main.c
Hunk #1 succeeded at 201 (offset 19 lines).
Hunk #2 succeeded at 491 (offset 37 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/iser_01_Print_information_about_unhandled_RDMA_CM_events.patch
patching file drivers/infiniband/ulp/iser/iser_verbs.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0020_Hold_rtnl_lock_around_ethtool_get_drvinfo_call.patch
patching file drivers/infiniband/hw/cxgb3/iwch_provider.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0030_Support_version_5.0_firmware.patch
patching file drivers/infiniband/hw/cxgb3/iwch_qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0040_Flush_the_RQ_when_closing.patch
patching file drivers/infiniband/hw/cxgb3/iwch_qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0050_fix_page_shift_calculation.patch
patching file drivers/infiniband/hw/cxgb3/iwch_mem.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0060_Mark_qp_as_privileged.patch
patching file drivers/infiniband/hw/cxgb3/cxio_wr.h
patching file drivers/infiniband/hw/cxgb3/iwch_qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/iw_cxgb3_0070_Fix_the_T3A_workaround_checks.patch
patching file drivers/infiniband/hw/cxgb3/cxio_hal.c
patching file drivers/infiniband/hw/cxgb3/iwch_cm.c
patching file drivers/infiniband/hw/cxgb3/iwch_provider.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mad_0010_enable_loopback_of_DR_SMP_responses_from_use.patch
patching file drivers/infiniband/core/mad.c
patching file drivers/infiniband/core/smi.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0010_add_wc.patch
patching file drivers/infiniband/hw/mlx4/Makefile
patching file drivers/infiniband/hw/mlx4/main.c
Hunk #2 succeeded at 383 (offset 7 lines).
Hunk #3 succeeded at 701 (offset 89 lines).
patching file drivers/infiniband/hw/mlx4/wc.c
patching file drivers/infiniband/hw/mlx4/wc.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0015_set_cacheline_sz.patch
patching file drivers/net/mlx4/fw.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0020_cmd_tout.patch
patching file drivers/net/mlx4/cmd.c
Hunk #1 succeeded at 278 (offset 6 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0030_checksum_offload.patch
patching file drivers/infiniband/hw/mlx4/cq.c
patching file drivers/infiniband/hw/mlx4/main.c
patching file drivers/infiniband/hw/mlx4/qp.c
patching file drivers/net/mlx4/fw.c
patching file include/linux/mlx4/cq.h
patching file include/linux/mlx4/qp.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0040_qp_max_msg.patch
patching file drivers/infiniband/hw/mlx4/qp.c
Hunk #1 succeeded at 758 (offset -125 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0045_qp_flags.patch
patching file drivers/infiniband/hw/mlx4/mlx4_ib.h
patching file drivers/infiniband/hw/mlx4/qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0050_lso.patch
patching file drivers/infiniband/hw/mlx4/cq.c
patching file drivers/infiniband/hw/mlx4/main.c
patching file drivers/infiniband/hw/mlx4/qp.c
patching file drivers/net/mlx4/fw.c
patching file drivers/net/mlx4/fw.h
patching file drivers/net/mlx4/main.c
patching file include/linux/mlx4/device.h
patching file include/linux/mlx4/qp.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0060_modify_cq.patch
patching file drivers/infiniband/hw/mlx4/main.c
Hunk #1 succeeded at 602 (offset -13 lines).
patching file drivers/infiniband/hw/mlx4/cq.c
patching file drivers/infiniband/hw/mlx4/mlx4_ib.h
Hunk #1 succeeded at 252 (offset 3 lines).
patching file drivers/net/mlx4/cq.c
patching file include/linux/mlx4/cq.h
patching file include/linux/mlx4/cmd.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0070_xrc.patch
patching file include/linux/mlx4/device.h
patching file drivers/infiniband/hw/mlx4/main.c
Hunk #1 succeeded at 104 (offset 1 line).
Hunk #2 succeeded at 449 (offset 1 line).
Hunk #3 succeeded at 659 (offset 1 line).
patching file drivers/infiniband/hw/mlx4/mlx4_ib.h
Hunk #2 succeeded at 136 (offset 4 lines).
Hunk #3 succeeded at 200 (offset 5 lines).
Hunk #4 succeeded at 280 (offset 5 lines).
patching file drivers/net/mlx4/xrcd.c
patching file drivers/net/mlx4/mlx4.h
patching file drivers/net/mlx4/main.c
patching file drivers/net/mlx4/srq.c
patching file drivers/net/mlx4/fw.c
patching file drivers/net/mlx4/fw.h
patching file drivers/infiniband/hw/mlx4/qp.c
Hunk #2 succeeded at 341 with fuzz 2 (offset 12 lines).
Hunk #3 succeeded at 376 (offset 12 lines).
Hunk #4 succeeded at 389 (offset 12 lines).
Hunk #5 succeeded at 424 (offset 12 lines).
Hunk #6 succeeded at 445 (offset 12 lines).
Hunk #7 succeeded at 463 (offset 12 lines).
Hunk #8 succeeded at 541 (offset 12 lines).
Hunk #9 succeeded at 549 (offset 12 lines).
Hunk #10 succeeded at 564 (offset 12 lines).
Hunk #11 succeeded at 581 (offset 12 lines).
Hunk #12 succeeded at 650 (offset 12 lines).
Hunk #13 succeeded at 798 (offset 12 lines).
Hunk #14 succeeded at 914 (offset 12 lines).
Hunk #15 succeeded at 1002 (offset 12 lines).
patching file drivers/infiniband/hw/mlx4/srq.c
patching file include/linux/mlx4/qp.h
patching file drivers/net/mlx4/Makefile
patching file drivers/net/mlx4/qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0080_profile_parm.patch
patching file drivers/net/mlx4/main.c
Hunk #2 succeeded at 562 (offset -2 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0090_fix_sq_wrs.patch
patching file drivers/infiniband/hw/mlx4/qp.c
Hunk #4 succeeded at 284 (offset -2 lines).
Hunk #5 succeeded at 292 (offset -2 lines).
Hunk #6 succeeded at 305 (offset -2 lines).
patching file drivers/infiniband/hw/mlx4/mlx4_ib.h
patching file drivers/infiniband/hw/mlx4/main.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0120_xrc_kernel.patch
patching file drivers/infiniband/hw/mlx4/cq.c
Hunk #2 succeeded at 331 with fuzz 2 (offset -8 lines).
Hunk #3 succeeded at 358 (offset -3 lines).
Hunk #4 succeeded at 392 (offset -3 lines).
Hunk #5 succeeded at 400 (offset -3 lines).
Hunk #6 succeeded at 534 (offset -1 lines).
Hunk #7 succeeded at 556 (offset -1 lines).
patching file drivers/net/mlx4/mlx4.h
patching file drivers/net/mlx4/srq.c
patching file include/linux/mlx4/device.h
patching file include/linux/mlx4/srq.h
patching file drivers/infiniband/hw/mlx4/srq.c
patching file drivers/infiniband/hw/mlx4/qp.c
Hunk #1 succeeded at 1395 (offset 19 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0125_xrc_kernel_missed.patch
patching file drivers/infiniband/hw/mlx4/qp.c
Hunk #2 succeeded at 1031 (offset 14 lines).
Hunk #3 succeeded at 1041 (offset 14 lines).
Hunk #4 succeeded at 1708 (offset 16 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0150_increase_default_qp.patch
patching file drivers/net/mlx4/main.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0170_shrinking_wqe.patch
patching file drivers/infiniband/hw/mlx4/cq.c
Hunk #1 succeeded at 358 (offset 5 lines).
Hunk #2 succeeded at 402 (offset 5 lines).
patching file drivers/infiniband/hw/mlx4/mlx4_ib.h
patching file drivers/infiniband/hw/mlx4/qp.c
Hunk #5 succeeded at 353 with fuzz 2 (offset -2 lines).
Hunk #6 succeeded at 428 (offset -2 lines).
Hunk #7 succeeded at 476 with fuzz 2 (offset -4 lines).
Hunk #8 succeeded at 579 (offset -1 lines).
Hunk #9 succeeded at 1089 (offset -1 lines).
Hunk #10 succeeded at 1142 (offset -1 lines).
Hunk #11 succeeded at 1477 (offset -1 lines).
Hunk #12 succeeded at 1500 (offset -1 lines).
Hunk #13 succeeded at 1633 (offset -1 lines).
Hunk #14 succeeded at 1671 (offset -1 lines).
patching file drivers/net/mlx4/alloc.c
patching file include/linux/mlx4/device.h
patching file include/linux/mlx4/qp.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0180_max_eqs.patch
patching file drivers/net/mlx4/fw.c
Hunk #1 succeeded at 205 (offset 3 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0190_bogus_qp_event.patch
patching file drivers/net/mlx4/qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0210_xrc_rcv.patch
patching file drivers/infiniband/hw/mlx4/mlx4_ib.h
patching file drivers/infiniband/hw/mlx4/qp.c
patching file drivers/infiniband/hw/mlx4/main.c
patching file drivers/infiniband/hw/mlx4/cq.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0220_enable_qos.patch
patching file drivers/net/mlx4/fw.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0230_hw_id.patch
patching file drivers/net/mlx4/fw.h
patching file drivers/net/mlx4/main.c
patching file drivers/infiniband/hw/mlx4/main.c
patching file drivers/net/mlx4/fw.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0240_optimize_poll.patch
patching file drivers/infiniband/hw/mlx4/cq.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0250_debug_output.patch
patching file drivers/infiniband/hw/mlx4/main.c
patching file drivers/infiniband/hw/mlx4/mlx4_ib.h
patching file drivers/infiniband/hw/mlx4/qp.c
patching file drivers/infiniband/hw/mlx4/cq.c
patching file drivers/infiniband/hw/mlx4/srq.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0260_optimze_stamping.patch
patching file drivers/infiniband/hw/mlx4/qp.c
Hunk #3 succeeded at 1153 (offset 41 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0270_fmr_enable.patch
patching file drivers/infiniband/hw/mlx4/mr.c
patching file drivers/net/mlx4/mr.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0280_diag_counters_sysfs.patch
patching file drivers/net/mlx4/fw.c
patching file include/linux/mlx4/device.h
patching file drivers/infiniband/hw/mlx4/main.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0290_mcast_loopback.patch
patching file drivers/net/mlx4/mcg.c
patching file drivers/net/mlx4/main.c
patching file drivers/net/mlx4/mlx4.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0300_bogus_qp.patch
patching file drivers/net/mlx4/qp.c
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mlx4_0310_date_version.patch
patching file drivers/infiniband/hw/mlx4/main.c
patching file drivers/net/mlx4/mlx4.h
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0001_catas_wqueue_namelen.patch
patching file drivers/infiniband/hw/mthca/mthca_catas.c
Hunk #1 succeeded at 205 with fuzz 2.
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0002_wrid_swap.patch
patching file drivers/infiniband/hw/mthca/mthca_cq.c
Hunk #1 succeeded at 538 (offset 1 line).
Hunk #2 succeeded at 558 (offset 1 line).
patching file drivers/infiniband/hw/mthca/mthca_qp.c
Hunk #1 succeeded at 1766 (offset 76 lines).
Hunk #2 succeeded at 1883 (offset 73 lines).
Hunk #3 succeeded at 2109 (offset 41 lines).
Hunk #4 succeeded at 2222 with fuzz 2 (offset 30 lines).
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0003_checksum_offload.patch
patching file drivers/infiniband/hw/mthca/mthca_cmd.c
patching file drivers/infiniband/hw/mthca/mthca_cmd.h
patching file drivers/infiniband/hw/mthca/mthca_cq.c
Hunk #3 succeeded at 636 (offset -1 lines).
patching file drivers/infiniband/hw/mthca/mthca_main.c patching file drivers/infiniband/hw/mthca/mthca_qp.c patching file drivers/infiniband/hw/mthca/mthca_wqe.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0004_prelink_wqes.patch patching file drivers/infiniband/hw/mthca/mthca_qp.c patching file drivers/infiniband/hw/mthca/mthca_srq.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0005_hw_ver.patch patching file drivers/infiniband/hw/mthca/mthca_cmd.c patching file drivers/infiniband/hw/mthca/mthca_main.c patching file drivers/infiniband/hw/mthca/mthca_provider.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0006_page_size_calc.patch patching file drivers/infiniband/hw/mthca/mthca_provider.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0007_fmr_alloc_error.patch patching file drivers/infiniband/hw/mthca/mthca_mr.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0008_roland_fmr_alloc_fix.patch patching file drivers/infiniband/hw/mthca/mthca_mr.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0009_sg_init_table.patch patching file drivers/infiniband/hw/mthca/mthca_memfree.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0010_bogus_qp.patch patching file drivers/infiniband/hw/mthca/mthca_qp.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/mthca_0011_date_version.patch patching file drivers/infiniband/hw/mthca/mthca_dev.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/sean_cm_flush_workqueue.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 3466 (offset -47 lines). Hunk #2 succeeded at 3512 (offset -47 lines). Hunk #3 succeeded at 3520 (offset -47 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/sean_cm_limit_mra_timeout.patch patching file drivers/infiniband/core/cm.c Hunk #1 succeeded at 53 (offset -1 lines). 
Hunk #2 succeeded at 917 (offset 18 lines). Hunk #3 succeeded at 1045 (offset 20 lines). Hunk #4 succeeded at 1449 (offset 20 lines). Hunk #5 succeeded at 2353 (offset 14 lines). Hunk #6 succeeded at 2764 (offset 16 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_1_notifications.patch patching file drivers/infiniband/core/Makefile patching file drivers/infiniband/core/notice.c patching file drivers/infiniband/core/sa.h patching file drivers/infiniband/core/sa_query.c patching file include/rdma/ib_sa.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_2_cache.patch patching file drivers/infiniband/core/Makefile patching file drivers/infiniband/core/local_sa.c patching file drivers/infiniband/core/multicast.c patching file drivers/infiniband/core/sa.h patching file drivers/infiniband/core/sa_query.c Hunk #1 succeeded at 461 (offset -3 lines). Hunk #2 succeeded at 780 (offset 22 lines). Hunk #3 succeeded at 846 (offset 18 lines). Hunk #4 succeeded at 1415 (offset 6 lines). Hunk #5 succeeded at 1434 (offset 6 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_3_disable.patch patching file drivers/infiniband/core/local_sa.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/sean_local_sa_4_fix_hang.patch patching file drivers/infiniband/core/local_sa.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/srp_1_recreate_at_reconnect.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 504 (offset 9 lines). Hunk #2 succeeded at 531 (offset 9 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/srp_2_disconnect_without_wait.patch (Stripping trailing CRs from patch.) patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 403 (offset 3 lines). Hunk #2 succeeded at 1273 (offset -21 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/srp_3_qp_err_timer_reconnect_target.patch (Stripping trailing CRs from patch.) patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 862 (offset -22 lines). Hunk #2 succeeded at 893 (offset -22 lines). Hunk #3 succeeded at 1010 (offset -22 lines). (Stripping trailing CRs from patch.) patching file drivers/infiniband/ulp/srp/ib_srp.h Hunk #1 succeeded at 155 (offset -5 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/srp_4_respect_target_credit_limit.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 958 (offset 28 lines). Hunk #2 succeeded at 1027 (offset 29 lines). Hunk #3 succeeded at 1214 (offset 29 lines). Hunk #4 succeeded at 1313 (offset 28 lines). patching file drivers/infiniband/ulp/srp/ib_srp.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/srp_5_add_info_to_log_messages.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 272 (offset 3 lines). Hunk #2 succeeded at 304 (offset 3 lines). Hunk #3 succeeded at 381 (offset 3 lines). Hunk #4 succeeded at 403 (offset 3 lines). Hunk #5 succeeded at 571 (offset 4 lines). Hunk #6 succeeded at 687 (offset 4 lines). Hunk #7 succeeded at 791 (offset 4 lines). Hunk #8 succeeded at 837 (offset 4 lines). Hunk #9 succeeded at 859 (offset 4 lines). Hunk #10 succeeded at 901 (offset 4 lines). Hunk #11 succeeded at 1067 (offset 4 lines). Hunk #12 succeeded at 1081 (offset 4 lines). Hunk #13 succeeded at 1136 (offset 4 lines). Hunk #14 succeeded at 1162 (offset 4 lines). Hunk #15 succeeded at 1188 (offset 4 lines). Hunk #16 succeeded at 1217 (offset 4 lines). Hunk #17 succeeded at 1233 (offset 4 lines). Hunk #18 succeeded at 1280 (offset 4 lines). Hunk #19 succeeded at 1307 (offset 4 lines). Hunk #20 succeeded at 1385 (offset 4 lines). Hunk #21 succeeded at 1415 (offset 4 lines). Hunk #22 succeeded at 1442 (offset 4 lines). 
Hunk #23 succeeded at 1867 (offset 17 lines). Hunk #24 succeeded at 1896 (offset 17 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/srp_6_retry_stale_connections.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 204 (offset 3 lines). Hunk #2 succeeded at 451 (offset 4 lines). Hunk #3 succeeded at 484 (offset 4 lines). Hunk #4 succeeded at 538 (offset 4 lines). Hunk #5 succeeded at 556 (offset 4 lines). Hunk #6 succeeded at 1226 (offset 4 lines). Hunk #7 succeeded at 1918 (offset 17 lines). patching file drivers/infiniband/ulp/srp/ib_srp.h /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/fixes/uverbs_warning.patch patching file drivers/infiniband/core/uverbs_cmd.c Applying patches for 2.6.16 kernel: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/1_struct_path_revert_to_2_6_19.patch patching file drivers/infiniband/core/uverbs_main.c Hunk #1 succeeded at 564 (offset 30 lines). patching file drivers/infiniband/hw/ipath/ipath_file_ops.c Hunk #1 succeeded at 1890 with fuzz 1 (offset 146 lines). patching file drivers/infiniband/hw/ipath/ipath_fs.c Hunk #1 succeeded at 114 (offset -3 lines). Hunk #2 succeeded at 154 (offset -5 lines). Hunk #3 succeeded at 207 (offset -5 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/2_misc_device_to_2_6_19.patch patching file drivers/infiniband/core/ucma.c Hunk #1 succeeded at 1109 (offset 262 lines). Hunk #2 succeeded at 1123 (offset 262 lines). Hunk #3 succeeded at 1137 (offset 262 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/addr_1_netevents_revert_to_2_6_17.patch patching file drivers/infiniband/core/addr.c Hunk #2 succeeded at 351 (offset -2 lines). Hunk #3 succeeded at 378 (offset -2 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/core_sysfs_to_2_6_23.patch patching file drivers/infiniband/core/sysfs.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxg3_to_2_6_20.patch patching file drivers/net/cxgb3/cxgb3_main.c Hunk #1 succeeded at 76 with fuzz 2. Hunk #2 succeeded at 483 (offset 36 lines). Hunk #3 succeeded at 494 (offset 35 lines). Hunk #4 succeeded at 525 (offset 35 lines). Hunk #5 succeeded at 547 (offset 35 lines). Hunk #6 succeeded at 567 (offset 35 lines). Hunk #7 succeeded at 619 (offset 35 lines). Hunk #8 succeeded at 644 (offset 35 lines). Hunk #9 succeeded at 664 (offset 35 lines). Hunk #10 succeeded at 1012 (offset 49 lines). Hunk #11 succeeded at 1037 (offset 49 lines). Hunk #12 succeeded at 2729 (offset 155 lines). Hunk #13 succeeded at 2760 with fuzz 1 (offset 155 lines). patching file drivers/net/cxgb3/cxgb3_offload.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_0100_napi.patch patching file drivers/net/cxgb3/adapter.h patching file drivers/net/cxgb3/cxgb3_main.c Hunk #9 succeeded at 2713 (offset -6 lines). Hunk #10 succeeded at 2814 (offset -6 lines). patching file drivers/net/cxgb3/sge.c Hunk #5 succeeded at 1649 (offset 5 lines). Hunk #6 succeeded at 1686 (offset 5 lines). Hunk #7 succeeded at 1735 (offset 5 lines). Hunk #8 succeeded at 2082 (offset 5 lines). Hunk #9 succeeded at 2208 (offset 5 lines). Hunk #10 succeeded at 2220 (offset 5 lines). Hunk #11 succeeded at 2240 (offset 5 lines). Hunk #12 succeeded at 2289 (offset 5 lines). Hunk #13 succeeded at 2314 (offset 5 lines). Hunk #14 succeeded at 2420 (offset 5 lines). Hunk #15 succeeded at 2435 (offset 5 lines). Hunk #16 succeeded at 2545 (offset 5 lines). Hunk #17 succeeded at 2557 (offset 5 lines). Hunk #18 succeeded at 2593 (offset 5 lines). Hunk #19 succeeded at 2618 (offset 5 lines). Hunk #20 succeeded at 2739 (offset 5 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_0200_sset.patch patching file drivers/net/cxgb3/cxgb3_main.c Hunk #1 succeeded at 1246 (offset 115 lines). Hunk #2 succeeded at 1755 (offset 115 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_0300_sysfs.patch patching file drivers/net/cxgb3/cxgb3_main.c Hunk #1 succeeded at 1050 (offset 83 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_main_to_2_6_22.patch patching file drivers/net/cxgb3/cxgb3_main.c Hunk #1 succeeded at 1761 with fuzz 2 (offset 178 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/cxgb3_makefile_to_2_6_19.patch patching file drivers/net/cxgb3/Makefile /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ehca_01_ibmebus_loc_code.patch (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ehca/ehca_classes.h Hunk #1 succeeded at 111 (offset 4 lines). (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ehca/ehca_eq.c (Stripping trailing CRs from patch.) patching file drivers/infiniband/hw/ehca/ehca_main.c Hunk #1 succeeded at 429 (offset 11 lines). Hunk #2 succeeded at 683 (offset 11 lines). Hunk #3 succeeded at 691 with fuzz 2 (offset 11 lines). Hunk #4 succeeded at 713 with fuzz 2 (offset 13 lines). Hunk #5 succeeded at 791 (offset 13 lines). Hunk #6 succeeded at 841 (offset 13 lines). Hunk #7 succeeded at 897 (offset 13 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipath-04-aio_write.patch patching file drivers/infiniband/hw/ipath/ipath_file_ops.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0100_to_2.6.21.patch patching file drivers/infiniband/ulp/ipoib/ipoib.h Hunk #1 succeeded at 358 (offset 18 lines). Hunk #2 succeeded at 414 (offset 20 lines). Hunk #3 succeeded at 509 (offset 20 lines). 
patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c Hunk #1 succeeded at 270 (offset 1 line). Hunk #2 succeeded at 286 (offset 1 line). Hunk #3 succeeded at 331 (offset 1 line). Hunk #4 succeeded at 377 (offset 1 line). Hunk #5 succeeded at 399 (offset 1 line). Hunk #6 succeeded at 412 (offset 1 line). Hunk #11 succeeded at 824 with fuzz 2 (offset -29 lines). Hunk #12 succeeded at 904 (offset -28 lines). patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #1 succeeded at 99 (offset 1 line). Hunk #2 succeeded at 141 (offset 1 line). Hunk #3 succeeded at 544 (offset 1 line). Hunk #4 succeeded at 609 (offset 1 line). Hunk #5 succeeded at 658 (offset 1 line). Hunk #6 succeeded at 677 (offset 1 line). Hunk #7 succeeded at 748 (offset 1 line). Hunk #8 succeeded at 774 (offset 1 line). Hunk #9 succeeded at 803 (offset 1 line). Hunk #10 succeeded at 891 (offset 1 line). Hunk #11 succeeded at 1013 (offset 45 lines). Hunk #12 succeeded at 1022 (offset 45 lines). patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c Hunk #1 succeeded at 583 (offset 12 lines). Hunk #2 succeeded at 633 (offset 26 lines). Hunk #3 succeeded at 651 with fuzz 2 (offset 27 lines). Hunk #4 succeeded at 697 (offset 27 lines). Hunk #5 succeeded at 717 (offset 27 lines). Hunk #6 succeeded at 727 (offset 27 lines). Hunk #7 succeeded at 764 with fuzz 1 (offset 27 lines). patching file drivers/infiniband/ulp/ipoib/ipoib_multicast.c Hunk #3 succeeded at 691 (offset 39 lines). Hunk #4 succeeded at 706 (offset 39 lines). Hunk #5 succeeded at 721 (offset 39 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0110_restore_get_stats.patch patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #1 succeeded at 788 (offset -2 lines). Hunk #2 succeeded at 1028 (offset 6 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0200_class_device_to_2_6_20.patch patching file drivers/infiniband/ulp/ipoib/ipoib_cm.c Hunk #1 succeeded at 51 with fuzz 2 (offset 12 lines). Hunk #2 succeeded at 1401 (offset 178 lines). Hunk #3 succeeded at 1411 (offset 178 lines). Hunk #4 succeeded at 1449 with fuzz 1 (offset 175 lines). patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #1 succeeded at 93 (offset -2 lines). Hunk #2 succeeded at 1091 (offset 29 lines). Hunk #3 succeeded at 1130 with fuzz 1 (offset 29 lines). Hunk #4 succeeded at 1152 (offset 29 lines). Hunk #5 succeeded at 1171 with fuzz 1 (offset 29 lines). Hunk #6 succeeded at 1282 (offset 31 lines). patching file drivers/infiniband/ulp/ipoib/ipoib_vlan.c Hunk #2 succeeded at 127 (offset 4 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0300_class_device_to_2_6_20_umcast.patch patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #1 succeeded at 1099 (offset 37 lines). Hunk #2 succeeded at 1121 with fuzz 1 (offset 37 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_0400_skb_to_2_6_20.patch patching file drivers/infiniband/ulp/ipoib/ipoib_ib.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ipoib_to_2_6_16.patch patching file drivers/infiniband/ulp/ipoib/ipoib.h Hunk #1 succeeded at 470 (offset 42 lines). patching file drivers/infiniband/ulp/ipoib/ipoib_main.c Hunk #1 succeeded at 84 (offset -2 lines). Hunk #2 succeeded at 861 (offset -2 lines). Hunk #3 succeeded at 909 (offset 3 lines). Hunk #4 succeeded at 921 (offset 3 lines). Hunk #5 succeeded at 944 (offset 3 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/iw_cxgb3_0100_namespace.patch patching file drivers/infiniband/hw/cxgb3/cxio_hal.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/iw_cxgb3_0200_states.patch patching file drivers/infiniband/hw/cxgb3/iwch_cm.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/iw_nes_100_to_2_6_23.patch patching file drivers/infiniband/hw/nes/nes_hw.c patching file drivers/infiniband/hw/nes/nes_hw.h patching file drivers/infiniband/hw/nes/nes_nic.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/linux_stuff_to_2_6_17.patch patching file drivers/infiniband/core/genalloc.c patching file drivers/infiniband/core/netevent.c patching file drivers/infiniband/core/Makefile Hunk #1 succeeded at 31 (offset 1 line). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/mlx4_0050_wc.patch patching file drivers/infiniband/hw/mlx4/wc.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/mthca_0001_pcix_to_2_6_22.patch patching file drivers/infiniband/hw/mthca/mthca_main.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/rds_to_2_6_20.patch patching file net/rds/sysctl.c Hunk #1 succeeded at 146 (offset 19 lines). patching file net/rds/ib_sysctl.c Hunk #1 succeeded at 151 (offset 23 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/sdp_0100_revert_to_2_6_23.patch patching file drivers/infiniband/ulp/sdp/sdp_main.c Hunk #1 succeeded at 2144 (offset 22 lines). Hunk #2 succeeded at 2162 (offset 22 lines). Hunk #3 succeeded at 2346 (offset 22 lines). Hunk #4 succeeded at 2360 (offset 22 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/srp_0100_revert_role_to_2_6_23.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 1643 (offset 84 lines). 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/srp_0200_revert_srp_transport_to_2.6.23.patch patching file drivers/infiniband/ulp/srp/Kconfig patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #3 succeeded at 439 (offset 19 lines). Hunk #4 succeeded at 1628 (offset 84 lines). Hunk #5 succeeded at 1859 (offset 84 lines). Hunk #6 succeeded at 2120 (offset 84 lines). Hunk #7 succeeded at 2138 (offset 84 lines). Hunk #8 succeeded at 2150 (offset 84 lines). Hunk #9 succeeded at 2158 (offset 84 lines). Hunk #10 succeeded at 2171 (offset 84 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/srp_cmd_to_2_6_22.patch patching file drivers/infiniband/ulp/srp/ib_srp.c Hunk #1 succeeded at 505 (offset 50 lines). Hunk #2 succeeded at 518 (offset 50 lines). Hunk #3 succeeded at 730 (offset 46 lines). patching file drivers/infiniband/ulp/srp/ib_srp.h Hunk #1 succeeded at 112 (offset 6 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ucma_to_2_6_16.patch patching file drivers/infiniband/core/ucma.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/ucm_to_2_6_16.patch patching file drivers/infiniband/core/ucm.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/user_mad_to_2_6_16.patch patching file drivers/infiniband/core/user_mad.c Hunk #1 succeeded at 849 (offset 18 lines). Hunk #2 succeeded at 926 (offset 21 lines). /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/uverbs_to_2_6_16.patch patching file drivers/infiniband/core/uverbs_main.c /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_patches/backport/2.6.16/uverbs_to_2_6_17.patch patching file drivers/infiniband/core/uverbs_main.c Hunk #1 succeeded at 851 with fuzz 1 (offset 36 lines). 
Created ofed_patch.mk: BACKPORT_INCLUDES=-I${CWD}/kernel_addons/backport/2.6.16/include/ Created configure.mk.kernel: # Current working directory CWD=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3 # Kernel level KVERSION=2.6.16-54-0.2.5_lustre.1.6.4.3smp ARCH=x86_64 MODULES_DIR=/lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/updates KSRC=/lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/build AUTOCONF_H=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/include/linux/autoconf.h WITH_MAKE_PARAMS= CONFIG_MEMTRACK= CONFIG_DEBUG_INFO=y CONFIG_INFINIBAND=m CONFIG_INFINIBAND_IPOIB=m CONFIG_INFINIBAND_IPOIB_CM=y CONFIG_INFINIBAND_SDP=m CONFIG_INFINIBAND_SRP=m CONFIG_INFINIBAND_SRPT=m CONFIG_INFINIBAND_USER_MAD=m CONFIG_INFINIBAND_USER_ACCESS=m CONFIG_INFINIBAND_ADDR_TRANS=y CONFIG_INFINIBAND_USER_MEM=y CONFIG_INFINIBAND_MTHCA=m CONFIG_MLX4_CORE=m CONFIG_MLX4_INFINIBAND=m CONFIG_MLX4_DEBUG=y CONFIG_INFINIBAND_IPOIB_DEBUG=y CONFIG_INFINIBAND_ISER= CONFIG_SCSI_ISCSI_ATTRS= CONFIG_ISCSI_TCP= CONFIG_INFINIBAND_EHCA= CONFIG_INFINIBAND_EHCA_SCALING= CONFIG_RDS=m CONFIG_RDS_IB=m CONFIG_RDS_TCP=m CONFIG_RDS_DEBUG= CONFIG_INFINIBAND_MADEYE= CONFIG_INFINIBAND_QLGC_VNIC=m CONFIG_INFINIBAND_CXGB3=m CONFIG_CHELSIO_T3=m CONFIG_INFINIBAND_NES=m CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= CONFIG_INFINIBAND_SDP_SEND_ZCOPY= CONFIG_INFINIBAND_SDP_RECV_ZCOPY= CONFIG_INFINIBAND_SDP_DEBUG=y CONFIG_INFINIBAND_SDP_DEBUG_DATA= CONFIG_INFINIBAND_IPATH= CONFIG_INFINIBAND_MTHCA_DEBUG=y CONFIG_INFINIBAND_QLGC_VNIC_DEBUG= CONFIG_INFINIBAND_QLGC_VNIC_STATS= CONFIG_INFINIBAND_CXGB3_DEBUG= CONFIG_INFINIBAND_NES_DEBUG= CONFIG_INFINIBAND_AMSO1100= Created /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/include/linux/autoconf.h: #undef CONFIG_MEMTRACK #undef CONFIG_DEBUG_INFO #undef CONFIG_INFINIBAND #undef CONFIG_INFINIBAND_IPOIB #undef CONFIG_INFINIBAND_IPOIB_CM #undef CONFIG_INFINIBAND_SDP #undef CONFIG_INFINIBAND_SRP #undef CONFIG_INFINIBAND_SRPT #undef CONFIG_INFINIBAND_USER_MAD #undef CONFIG_INFINIBAND_USER_ACCESS #undef 
CONFIG_INFINIBAND_ADDR_TRANS #undef CONFIG_INFINIBAND_USER_MEM #undef CONFIG_INFINIBAND_MTHCA #undef CONFIG_MLX4_CORE #undef CONFIG_MLX4_DEBUG #undef CONFIG_MLX4_INFINIBAND #undef CONFIG_INFINIBAND_IPOIB_DEBUG #undef CONFIG_INFINIBAND_ISER #undef CONFIG_INFINIBAND_EHCA #undef CONFIG_INFINIBAND_EHCA_SCALING #undef CONFIG_RDS #undef CONFIG_RDS_IB #undef CONFIG_RDS_TCP #undef CONFIG_RDS_DEBUG #undef CONFIG_INFINIBAND_MADEYE #undef CONFIG_INFINIBAND_QLGC_VNIC #undef CONFIG_INFINIBAND_QLGC_VNIC_DEBUG #undef CONFIG_INFINIBAND_QLGC_VNIC_STATS #undef CONFIG_INFINIBAND_CXGB3 #undef CONFIG_INFINIBAND_CXGB3_DEBUG #undef CONFIG_CHELSIO_T3 #undef CONFIG_INFINIBAND_NES #undef CONFIG_INFINIBAND_NES_DEBUG #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY #undef CONFIG_INFINIBAND_SDP_DEBUG #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA #undef CONFIG_INFINIBAND_IPATH #undef CONFIG_INFINIBAND_MTHCA_DEBUG #undef CONFIG_INFINIBAND_AMSO1100 #define CONFIG_INFINIBAND 1 #define CONFIG_INFINIBAND_IPOIB 1 #define CONFIG_INFINIBAND_IPOIB_CM 1 #define CONFIG_INFINIBAND_SDP 1 #define CONFIG_INFINIBAND_SRP 1 #define CONFIG_INFINIBAND_SRPT 1 #define CONFIG_INFINIBAND_USER_MAD 1 #define CONFIG_INFINIBAND_USER_ACCESS 1 #define CONFIG_INFINIBAND_ADDR_TRANS 1 #define CONFIG_INFINIBAND_USER_MEM 1 #define CONFIG_INFINIBAND_MTHCA 1 #define CONFIG_INFINIBAND_QLGC_VNIC 1 #define CONFIG_INFINIBAND_CXGB3 1 #define CONFIG_CHELSIO_T3 1 #define CONFIG_INFINIBAND_NES 1 #define CONFIG_INFINIBAND_IPOIB_DEBUG 1 #undef CONFIG_INFINIBAND_ISER #undef CONFIG_SCSI_ISCSI_ATTRS #undef CONFIG_ISCSI_TCP #undef CONFIG_INFINIBAND_EHCA #define CONFIG_RDS 1 #define CONFIG_RDS_IB 1 #define CONFIG_RDS_TCP 1 #undef CONFIG_RDS_DEBUG #undef CONFIG_INFINIBAND_QLGC_VNIC_DEBUG #undef CONFIG_INFINIBAND_QLGC_VNIC_STATS #undef CONFIG_INFINIBAND_CXGB3_DEBUG #undef CONFIG_INFINIBAND_NES_DEBUG #define CONFIG_MLX4_CORE 1 #define CONFIG_MLX4_INFINIBAND 1 #define 
CONFIG_MLX4_DEBUG 1 #undef CONFIG_INFINIBAND_IPOIB_DEBUG_DATA #undef CONFIG_INFINIBAND_SDP_SEND_ZCOPY #undef CONFIG_INFINIBAND_SDP_RECV_ZCOPY #define CONFIG_INFINIBAND_SDP_DEBUG 1 #undef CONFIG_INFINIBAND_SDP_DEBUG_DATA #undef CONFIG_INFINIBAND_IPATH #define CONFIG_INFINIBAND_MTHCA_DEBUG 1 #undef CONFIG_INFINIBAND_MADEYE #undef CONFIG_INFINIBAND_AMSO1100 + install -d /var/tmp/OFED//usr/local/ofed-1.3/src/ofa_kernel + cp -a /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/include/ /var/tmp/OFED//usr/local/ofed-1.3/src/ofa_kernel + cp -a /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/configure.mk.kernel /var/tmp/OFED//usr/local/ofed-1.3/src/ofa_kernel + cd /var/tmp/OFED//usr/local/ofed-1.3/src/ + ln -s ofa_kernel openib + cd - /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3 + make kernel Building kernel modules Kernel version: 2.6.16-54-0.2.5_lustre.1.6.4.3smp Modules directory: //lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/updates Kernel sources: /lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/build env CWD=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3 BACKPORT_INCLUDES=-I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/ \ make -C /lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/build SUBDIRS="/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3" \ V=1 \ CONFIG_MEMTRACK= \ CONFIG_DEBUG_INFO=y \ CONFIG_INFINIBAND=m \ CONFIG_INFINIBAND_IPOIB=m \ CONFIG_INFINIBAND_IPOIB_CM=y \ CONFIG_INFINIBAND_SDP=m \ CONFIG_INFINIBAND_SRP=m \ CONFIG_INFINIBAND_SRPT=m \ CONFIG_INFINIBAND_USER_MAD=m \ CONFIG_INFINIBAND_USER_ACCESS=m \ CONFIG_INFINIBAND_USER_MEM=y \ CONFIG_INFINIBAND_ADDR_TRANS=y \ CONFIG_INFINIBAND_MTHCA=m \ CONFIG_INFINIBAND_IPOIB_DEBUG=y \ CONFIG_INFINIBAND_ISER= \ CONFIG_SCSI_ISCSI_ATTRS= \ CONFIG_ISCSI_TCP= \ CONFIG_INFINIBAND_EHCA= \ CONFIG_INFINIBAND_EHCA_SCALING= \ CONFIG_RDS=m \ CONFIG_RDS_IB=m \ CONFIG_RDS_TCP=m \ CONFIG_RDS_DEBUG= \ CONFIG_INFINIBAND_IPOIB_DEBUG_DATA= \ CONFIG_INFINIBAND_SDP_SEND_ZCOPY= \ CONFIG_INFINIBAND_SDP_RECV_ZCOPY= \ 
CONFIG_INFINIBAND_SDP_DEBUG=y \ CONFIG_INFINIBAND_SDP_DEBUG_DATA= \ CONFIG_INFINIBAND_IPATH= \ CONFIG_INFINIBAND_MTHCA_DEBUG=y \ CONFIG_INFINIBAND_MADEYE= \ CONFIG_INFINIBAND_QLGC_VNIC=m \ CONFIG_INFINIBAND_QLGC_VNIC_DEBUG= \ CONFIG_INFINIBAND_QLGC_VNIC_STATS= \ CONFIG_CHELSIO_T3=m \ CONFIG_INFINIBAND_CXGB3=m \ CONFIG_INFINIBAND_CXGB3_DEBUG= \ CONFIG_INFINIBAND_NES=m \ CONFIG_INFINIBAND_NES_DEBUG= \ CONFIG_MLX4_CORE=m \ CONFIG_MLX4_INFINIBAND=m \ CONFIG_MLX4_ETHERNET= \ CONFIG_MLX4_DEBUG=y \ CONFIG_INFINIBAND_AMSO1100= \ LINUXINCLUDE=' \ -include include/linux/autoconf.h \ -include /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/include/linux/autoconf.h \ -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/ \ \ \ -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/include \ -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/debug \ -I/usr/local/include/scst \ -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/ulp/srpt \ -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/net/cxgb3 \ -Iinclude \ $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) \ ' \ modules make[1]: Entering directory `/usr/src/linux-2.6.16-54-0.2.5_lustre.1.6.4.3' rm -rf /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/.tmp_versions mkdir -p /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/.tmp_versions make -f scripts/Makefile.build obj=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3 make -f scripts/Makefile.build obj=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband make -f scripts/Makefile.build obj=/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core gcc -Wp,-MD,/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/.addr.o.d -nostdinc -isystem /usr/lib64/gcc/x86_64-suse-linux/4.1.0/include -D__KERNEL__ -include include/linux/autoconf.h -include /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/include/linux/autoconf.h -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/ 
-I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/include -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/debug -I/usr/local/include/scst -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/ulp/srpt -I/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/net/cxgb3 -Iinclude -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -Werror-implicit-function-declaration -fno-strict-aliasing -fno-common -ffreestanding -Os -mtune=generic -m64 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fomit-frame-pointer -g -fno-stack-protector -Wdeclaration-after-statement -Wno-pointer-sign -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(addr)" -D"KBUILD_MODNAME=KBUILD_STR(ib_addr)" -c -o /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/.tmp_addr.o /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/addr.c In file included from include/asm/processor.h:23, from include/linux/prefetch.h:14, from include/linux/list.h:7, from include/linux/mutex.h:13, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/mutex.h:5, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/addr.c:31: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/cpumask.h:6:1: warning: "for_each_possible_cpu" redefined In file included from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/cpumask.h:4, from include/asm/processor.h:23, from include/linux/prefetch.h:14, from include/linux/list.h:7, from include/linux/mutex.h:13, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/mutex.h:5, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/addr.c:31: include/linux/cpumask.h:411:1: warning: this is the location of the previous definition In file included from 
include/linux/if_ether.h:111, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/if_ether.h:4, from include/linux/netdevice.h:29, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h:4, from include/linux/inetdevice.h:7, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/inetdevice.h:4, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/addr.c:32: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/skbuff.h: In function ‘backport_skb_linearize_to_2_6_17’: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/skbuff.h:13: error: too many arguments to function ‘skb_linearize’ /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/skbuff.h: At top level: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/skbuff.h:101: error: redefinition of ‘skb_is_gso’ include/linux/skbuff.h:1424: error: previous definition of ‘skb_is_gso’ was here /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/skbuff.h: In function ‘skb_is_gso’: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/skbuff.h:102: error: ‘struct skb_shared_info’ has no member named ‘tso_size’ In file included from include/linux/inetdevice.h:7, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/inetdevice.h:4, from /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/addr.c:32: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h: At top level: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h:7: error: redefinition of ‘netif_tx_lock’ include/linux/netdevice.h:925: error: previous definition of ‘netif_tx_lock’ was here 
/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h: In function ‘netif_tx_lock’: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h:8: error: ‘struct net_device’ has no member named ‘xmit_lock’ /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h: At top level: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h:13: error: redefinition of ‘netif_tx_unlock’ include/linux/netdevice.h:945: error: previous definition of ‘netif_tx_unlock’ was here /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h: In function ‘netif_tx_unlock’: /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.16/include/linux/netdevice.h:15: error: ‘struct net_device’ has no member named ‘xmit_lock’ make[4]: *** [/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core/addr.o] Error 1 make[3]: *** [/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband/core] Error 2 make[2]: *** [/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/drivers/infiniband] Error 2 make[1]: *** [_module_/var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3] Error 2 make[1]: Leaving directory `/usr/src/linux-2.6.16-54-0.2.5_lustre.1.6.4.3' make: *** [kernel] Error 2 error: Bad exit status from /var/tmp/rpm-tmp.52212 (%build) RPM build errors: user vlad does not exist - using root group vlad does not exist - using root user vlad does not exist - using root group vlad does not exist - using root Bad exit status from /var/tmp/rpm-tmp.52212 (%build) From bs at q-leap.de Tue Apr 8 02:35:35 2008 From: bs at q-leap.de (Bernd Schubert) Date: Tue, 8 Apr 2008 11:35:35 +0200 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <20080408014406.GA16864@sashak.voltaire.com> References: <200804041147.27565.bs@q-leap.de> <20080408014406.GA16864@sashak.voltaire.com> Message-ID: 
<200804081135.35846.bs@q-leap.de> Hello Sasha, On Tuesday 08 April 2008 03:44:06 Sasha Copyist wrote: > Hi Bernd, > > On 11:47 Fri 04 Apr , Bernd Schubert wrote: > > opensm-3.2.1 logs some error messages like this: > > > > Apr 04 00:00:08 325114 [4580A960] 0x01 -> > > __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for node > > 0 > > x000b8cffff002ba2(SW_pfs1_leaf4) port 13. Adding to light sweep sampling > > list Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path Dump of 3 > > hop path: Path = 0,1,14,13 > > > > > > From ibnetdiscover output I see port 13 of this switch is a > > switch-interconnect (sorry, I don't know the correct name/identifier > > for switches within switches): > > > > [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid > > 263 4xSDR > > It is possible that port was DOWN during first subnet discovery. Finally > everything should be initialized after those messages. Isn't that the case > here? I think everything is initialized, but I don't think the port was down during first subnet discovery, since the port is on a spine board (I called it 'inter') to another switch system. We also never added any leaves to the switches. Thanks, Bernd -- Bernd Schubert Q-Leap Networks GmbH From erezz at voltaire.com Tue Apr 8 03:27:40 2008 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 08 Apr 2008 13:27:40 +0300 Subject: [ofa-general] [PATCH] IB/iSER: Release connection resources when receiving a RDMA_CM_EVENT_DEVICE_REMOVAL event Message-ID: <47FB489C.6030507@voltaire.com> When a RDMA_CM_EVENT_DEVICE_REMOVAL event is raised, iSER should release the connection resources except for the rdma cm id (which will be released by the cma itself). This behavior is necessary if IB modules are unloaded while open-iscsi is still running. Currently, iSER just initiates a BUG() call.
Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iscsi_iser.h | 2 ++ drivers/infiniband/ulp/iser/iser_verbs.c | 18 ++++++++++++++---- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index 1ee867b..9fe0b3f 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -249,6 +249,8 @@ struct iser_conn { struct iser_page_vec *page_vec; /* represents SG to fmr maps* * maps serialized as tx is*/ struct list_head conn_list; /* entry in ig conn list */ + wait_queue_head_t rem_wait; + int dev_removed; }; struct iscsi_iser_conn { diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index 993f0a8..9beddb9 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -219,7 +219,8 @@ static int iser_free_ib_conn_res(struct iser_conn *ib_conn) if (ib_conn->qp != NULL) rdma_destroy_qp(ib_conn->cma_id); - if (ib_conn->cma_id != NULL) + /* if the device was removed, the cma will call rdma_destroy_id itself */ + if (ib_conn->cma_id != NULL && !ib_conn->dev_removed) rdma_destroy_id(ib_conn->cma_id); ib_conn->fmr_pool = NULL; @@ -325,7 +326,10 @@ static void iser_conn_release(struct iser_conn *ib_conn) iser_device_try_release(device); if (ib_conn->iser_conn) ib_conn->iser_conn->ib_conn = NULL; - kfree(ib_conn); + if (ib_conn->dev_removed) + wake_up_interruptible(&ib_conn->rem_wait); + else + kfree(ib_conn); } /** @@ -451,6 +455,7 @@ static void iser_disconnected_handler(struct rdma_cm_id *cma_id) static int iser_cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *event) { int ret = 0; + struct iser_conn *ib_conn; iser_err("event %d conn %p id %p\n",event->event,cma_id->context,cma_id); @@ -476,8 +481,12 @@ static int iser_cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *eve iser_disconnected_handler(cma_id); break; case 
RDMA_CM_EVENT_DEVICE_REMOVAL: - iser_err("Device removal is currently unsupported\n"); - BUG(); + ib_conn = (struct iser_conn *)cma_id->context; + ib_conn->dev_removed = 1; + iser_disconnected_handler(cma_id); + wait_event_interruptible(ib_conn->rem_wait, ib_conn->state == ISER_CONN_DOWN); + kfree(ib_conn); + ret = 1; break; default: iser_err("Unexpected RDMA CM event (%d)\n", event->event); @@ -497,6 +506,7 @@ int iser_conn_init(struct iser_conn **ibconn) } ib_conn->state = ISER_CONN_INIT; init_waitqueue_head(&ib_conn->wait); + init_waitqueue_head(&ib_conn->rem_wait); atomic_set(&ib_conn->post_recv_buf_count, 0); atomic_set(&ib_conn->post_send_buf_count, 0); INIT_LIST_HEAD(&ib_conn->conn_list); -- 1.5.3.6 Roland, This patch was built against your 2.6.26 branch. Can you add it to your list? Thanks, Erez
From moshek at voltaire.com Tue Apr 8 05:56:20 2008 From: moshek at voltaire.com (Moshe Kazir) Date: Tue, 8 Apr 2008 15:56:20 +0300 Subject: [ofa-general] ofed-1.3 uninstall.sh does not remove all the InfiniBand stack components properly on a full RHEL 4 U5 or RHEL 4 U6 installation. In-Reply-To: <47FAA913.7090805@opengridcomputing.com> References: <47FA3D60.3020905@opengridcomputing.com> <47FAA913.7090805@opengridcomputing.com> Message-ID: <39C75744D164D948A170E9792AF8E7CAC5AEED@exil.voltaire.com> Some rpm's (openmpi-libs, libmthca-devel, etc.) are not removed and cause dependency problems. The attached patch solves the problem. Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com -------------- next part -------------- A non-text attachment was scrubbed... Name: ofed_1.3_uninstall_sh.patch Type: application/octet-stream Size: 4249 bytes Desc: ofed_1.3_uninstall_sh.patch URL: From michael.heinz at qlogic.com Tue Apr 8 07:27:55 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue, 8 Apr 2008 09:27:55 -0500 Subject: [ofa-general] MVAPICH2 crashes on mixed fabric In-Reply-To: References: Message-ID: Wei, No joy.
The following command: + /usr/mpi/pgi/mvapich2-1.0.2/bin/mpiexec -1 -machinefile /home/mheinz/mvapich2-pgi/mpi_hosts -n 4 -env MV2_USE_COALESCE 0 -env MV2_VBUF_TOTAL_SIZE 9216 PMB2.2.1/SRC_PMB/PMB-MPI1 Produced the following error: [0] Abort: Got FATAL event 3 at line 796 in file ibv_channel_manager.c rank 0 in job 48 compute-0-3.local_33082 caused collective abort of all ranks exit status of rank 0: killed by signal 9 + set +x Note that compute-0-3 has a Connect-X HCA. If I restrict the ring to only nodes with Connect-X, the problem does not occur. This isn't a huge problem for me; this 4-node cluster is actually for testing the creation of Rocks Rolls and I can simply record it as a known limitation when using mvapich2 - but it could impact users in the field if a cluster gets extended with newer HCAs. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -----Original Message----- From: wei huang [mailto:huanwei at cse.ohio-state.edu] Sent: Sunday, April 06, 2008 8:58 PM To: Mike Heinz Cc: general at lists.openfabrics.org Subject: Re: [ofa-general] MVAPICH2 crashes on mixed fabric Hi Mike, Currently mvapich2 will detect different HCA types and thus select different parameters for communication, which may cause the problem. We are working on this feature and it will be available in our next release. For now, if you want to run on this setup, please set a few environment variables, like: mpiexec -n 2 -env MV2_USE_COALESCE 0 -env MV2_VBUF_TOTAL_SIZE 9216 ./a.out Please let us know if this works. Thanks. Regards, Wei Huang 774 Dreese Lab, 2015 Neil Ave, Dept. of Computer Science and Engineering Ohio State University OH 43210 Tel: (614)292-8501 On Fri, 4 Apr 2008, Mike Heinz wrote: > Hey, all, I'm not sure if this is a known bug or some sort of > limitation I'm unaware of, but I've been building and testing with the > OFED 1.3 GA release on a small fabric that has a mix of Arbel-based > and newer Connect-X HCAs.
> What I've discovered is that mvapich and openmpi work fine across the
> entire fabric, but mvapich2 crashes when I use a mix of Arbels and
> Connect-X. The errors vary depending on the test program but here's an
> example:
>
> [mheinz at compute-0-0 IMB-3.0]$ mpirun -n 5 ./IMB-MPI1
> .
> .
> .
> (output snipped)
> .
> .
> .
>
> #-----------------------------------------------------------------------------
> # Benchmarking Sendrecv
> # #processes = 2
> # ( 3 additional processes waiting in MPI_Barrier)
> #-----------------------------------------------------------------------------
>    #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
>         0         1000         3.51         3.51         3.51         0.00
>         1         1000         3.63         3.63         3.63         0.52
>         2         1000         3.67         3.67         3.67         1.04
>         4         1000         3.64         3.64         3.64         2.09
>         8         1000         3.67         3.67         3.67         4.16
>        16         1000         3.67         3.67         3.67         8.31
>        32         1000         3.74         3.74         3.74        16.32
>        64         1000         3.90         3.90         3.90        31.28
>       128         1000         4.75         4.75         4.75        51.39
>       256         1000         5.21         5.21         5.21        93.79
>       512         1000         5.96         5.96         5.96       163.77
>      1024         1000         7.88         7.89         7.89       247.54
>      2048         1000        11.42        11.42        11.42       342.00
>      4096         1000        15.33        15.33        15.33       509.49
>      8192         1000        22.19        22.20        22.20       703.83
>     16384         1000        34.57        34.57        34.57       903.88
>     32768         1000        51.32        51.32        51.32      1217.94
>     65536          640        85.80        85.81        85.80      1456.74
>    131072          320       155.23       155.24       155.24      1610.40
>    262144          160       301.84       301.86       301.85      1656.39
>    524288           80       598.62       598.69       598.66      1670.31
>   1048576           40      1175.22      1175.30      1175.26      1701.69
>   2097152           20      2309.05      2309.05      2309.05      1732.32
>   4194304           10      4548.72      4548.98      4548.85      1758.64
>
> [0] Abort: Got FATAL event 3
> at line 796 in file ibv_channel_manager.c
> rank 0 in job 1 compute-0-0.local_36049 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
>
> If, however, I define my mpdring to contain only Connect-X systems OR
> only Arbel systems, IMB-MPI1 runs to completion.
>
> Can anyone suggest a workaround or is this a real bug with mvapich2?
> > -- > Michael Heinz > Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania > > From Brian.Murrell at Sun.COM Tue Apr 8 07:43:09 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Tue, 08 Apr 2008 10:43:09 -0400 Subject: [ofa-general] kernel ib build (OFED 1.3) fails on SLES 10 In-Reply-To: <200804081013.52983.grossmann@hlrs.de> References: <200804081013.52983.grossmann@hlrs.de> Message-ID: <1207665789.13415.20.camel@pc.ilinx> On Tue, 2008-04-08 at 10:13 +0200, Thomas Großmann wrote: > Hi, Hi > kernel ib build (OFED 1.3) fails on SLES 10. To be fair, it fails on Sun's version of the SLES 10 kernel for Lustre, and here is why: > Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.52212 > + umask 022 > + cd /var/tmp/OFED_topdir/BUILD > + /bin/rm -rf /var/tmp/OFED > ++ dirname /var/tmp/OFED > + /bin/mkdir -p /var/tmp > + /bin/mkdir /var/tmp/OFED > + cd ofa_kernel-1.3 > + rm -rf /var/tmp/OFED > + cd /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3 > + mkdir -p /var/tmp/OFED//usr/local/ofed-1.3/src > + cp -a /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3 /var/tmp/OFED//usr/local/ofed-1.3/src > + ./configure --prefix=/usr/local/ofed-1.3 --kernel-version 2.6.16-54-0.2.5_lustre.1.6.4.3smp --kernel-sources /lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/build --modules-dir /lib/modules/2.6.16-54-0.2.5_lustre.1.6.4.3smp/updates --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mthca-mod --with-mlx4-mod --with-cxgb3-mod --with-nes-mod --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-srp-target-mod --with-rds-mod --with-qlgc_vnic-mod > ofed_patch.mk does not exist. 
running ofed_patch.sh
> /var/tmp/OFED_topdir/BUILD/ofa_kernel-1.3/ofed_scripts/ofed_patch.sh --kernel-version 2.6.16-54-0.2.5_lustre.1.6.4.3smp
--------------------------------------------------------------------------------------------------------------------^

This kernel version does not match what ofed_patch.sh thinks is a SLES 10 kernel, because it is not of the form "2.6.16.*-*-*". Here's the code in ofed_patch.sh which detects SLES 10 kernels and assigns the right patch series for it:

2.6.16.*-*-*)
    minor=$(echo $KVERSION | cut -d"." -f4 | cut -d"-" -f1)
    if [ $minor -lt 37 ]; then
        echo 2.6.16_sles10
    elif [ $minor -lt 60 ]; then
        echo 2.6.16_sles10_sp1
    else
        echo 2.6.16_sles10_sp2
    fi
    ;;

The lustre kernel version for SLES 10 is "2.6.16-54-0.2.5_lustre.1.6.4.3smp". In order for it to match the above code it needs to have a "-" put before the "smp" at the end. I am working on the Lustre build process to do exactly this right at this moment, as well as build our released RPMs with OFED 1.3 support right in them. My work is being done in Lustre bugzilla ticket 15316. When I have something working, I will post an attachment there with a patch for our current b1_6 that should apply to 1.6.4.3. In theory you should be able to use the "--with-backport*" configure options to override this detection when building the RPMs; however, see my message to this list (inconsistent use of --with-backport[-patches]) last Saturday about how this seems to be broken currently. Cheers, b. -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL:
From sashak at voltaire.com Tue Apr 8 11:31:13 2008 From: sashak at voltaire.com (Sasha Copyist) Date: Tue, 8 Apr 2008 18:31:13 +0000 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <200804081135.35846.bs@q-leap.de> References: <200804041147.27565.bs@q-leap.de> <20080408014406.GA16864@sashak.voltaire.com> <200804081135.35846.bs@q-leap.de> Message-ID: <20080408183113.GA18308@sashak.voltaire.com> Hi Bernd, [adding Yevgeny..]
On 11:35 Tue 08 Apr , Bernd Schubert wrote: > On Tuesday 08 April 2008 03:44:06 Sasha Copyist wrote: > > Hi Bernd, > > > > On 11:47 Fri 04 Apr , Bernd Schubert wrote: > > > opensm-3.2.1 logs some error messages like this: > > > > > > Apr 04 00:00:08 325114 [4580A960] 0x01 -> > > > __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for node > > > 0 > > > x000b8cffff002ba2(SW_pfs1_leaf4) port 13. Adding to light sweep sampling > > > list Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path Dump of 3 > > > hop path: Path = 0,1,14,13 > > > > > > > > > From ibnetdiscover output I see port 13 of this switch is a > > > switch-interconnect (sorry, I don't know the correct name/identifier > > > for switches within switches): > > > > > > [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid > > > 263 4xSDR > > > > It is possible that port was DOWN during first subnet discovery. Finally > > everything should be initialized after those messages. Isn't that the case > > here? > > I think everything is initialized, but I don't think the port was down during > first subnet discovery, since the port is on a spine board (I called > it 'inter') to another switch system. We also never added any leaves to the > switches. That is an interesting phenomenon then. Yevgeny, are you aware of such an issue with Flextronics switches? Sasha From andrea at qumranet.com Tue Apr 8 08:44:07 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:07 +0200 Subject: [ofa-general] [PATCH 4 of 9] Move the tlb flushing into free_pgtables. The conversion of the locks In-Reply-To: Message-ID: <2c2ed514f294dbbfc661.1207669447@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1207666463 -7200 # Node ID 2c2ed514f294dbbfc66157f771bc900789ac6005 # Parent 33de2e17d0f5670515833bf8d3d2ea19e2a85b09 Move the tlb flushing into free_pgtables. The conversion of the locks taken for reverse map scanning would require taking sleeping locks in free_pgtables().
Moving the tlb flushing into free_pgtables allows sleeping in parts of free_pgtables(). This means that we do a tlb_finish_mmu() before freeing the page tables. Strictly speaking there may not be the need to do another tlb flush after freeing the tables. But its the only way to free a series of page table pages from the tlb list. And we do not want to call into the page allocator for performance reasons. Aim9 numbers look okay after this patch. Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -751,8 +751,8 @@ void *private); void free_pgd_range(struct mmu_gather **tlb, unsigned long addr, unsigned long end, unsigned long floor, unsigned long ceiling); -void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *start_vma, - unsigned long floor, unsigned long ceiling); +void free_pgtables(struct vm_area_struct *start_vma, unsigned long floor, + unsigned long ceiling); int copy_page_range(struct mm_struct *dst, struct mm_struct *src, struct vm_area_struct *vma); void unmap_mapping_range(struct address_space *mapping, diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -272,9 +272,11 @@ } while (pgd++, addr = next, addr != end); } -void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma, - unsigned long floor, unsigned long ceiling) +void free_pgtables(struct vm_area_struct *vma, unsigned long floor, + unsigned long ceiling) { + struct mmu_gather *tlb; + while (vma) { struct vm_area_struct *next = vma->vm_next; unsigned long addr = vma->vm_start; @@ -286,8 +288,10 @@ unlink_file_vma(vma); if (is_vm_hugetlb_page(vma)) { - hugetlb_free_pgd_range(tlb, addr, vma->vm_end, + tlb = tlb_gather_mmu(vma->vm_mm, 0); + hugetlb_free_pgd_range(&tlb, addr, vma->vm_end, floor, next? 
next->vm_start: ceiling); + tlb_finish_mmu(tlb, addr, vma->vm_end); } else { /* * Optimization: gather nearby vmas into one call down @@ -299,8 +303,10 @@ anon_vma_unlink(vma); unlink_file_vma(vma); } - free_pgd_range(tlb, addr, vma->vm_end, + tlb = tlb_gather_mmu(vma->vm_mm, 0); + free_pgd_range(&tlb, addr, vma->vm_end, floor, next? next->vm_start: ceiling); + tlb_finish_mmu(tlb, addr, vma->vm_end); } vma = next; } diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1752,9 +1752,9 @@ mmu_notifier_invalidate_range_start(mm, start, end); unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS, + tlb_finish_mmu(tlb, start, end); + free_pgtables(vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); - tlb_finish_mmu(tlb, start, end); mmu_notifier_invalidate_range_end(mm, start, end); } @@ -2051,8 +2051,8 @@ /* Use -1 here to ensure all VMAs in the mm are unmapped */ end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0); tlb_finish_mmu(tlb, 0, end); + free_pgtables(vma, FIRST_USER_ADDRESS, 0); /* * Walk the list again, actually closing and freeing it, From andrea at qumranet.com Tue Apr 8 08:44:06 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:06 +0200 Subject: [ofa-general] [PATCH 3 of 9] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: Message-ID: <33de2e17d0f567051583.1207669446@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1207666463 -7200 # Node ID 33de2e17d0f5670515833bf8d3d2ea19e2a85b09 # Parent baceb322b45ed43280654dac6c964c9d3d8a936f Moves all mmu notifier methods outside the PT lock (first and not last step to make them sleep capable). 
Signed-off-by: Andrea Arcangeli diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -117,27 +117,6 @@ INIT_HLIST_HEAD(&mm->mmu_notifier_list); } -#define ptep_clear_flush_notify(__vma, __address, __ptep) \ -({ \ - pte_t __pte; \ - struct vm_area_struct *___vma = __vma; \ - unsigned long ___address = __address; \ - __pte = ptep_clear_flush(___vma, ___address, __ptep); \ - mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ - __pte; \ -}) - -#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ -({ \ - int __young; \ - struct vm_area_struct *___vma = __vma; \ - unsigned long ___address = __address; \ - __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ - __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ - ___address); \ - __young; \ -}) - #else /* CONFIG_MMU_NOTIFIER */ static inline void mmu_notifier_release(struct mm_struct *mm) @@ -169,9 +148,6 @@ { } -#define ptep_clear_flush_young_notify ptep_clear_flush_young -#define ptep_clear_flush_notify ptep_clear_flush - #endif /* CONFIG_MMU_NOTIFIER */ #endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,11 +194,13 @@ if (pte) { /* Nuke the page table entry. 
*/ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); pte_unmap_unlock(pte, ptl); + /* must invalidate_page _before_ freeing the page */ + mmu_notifier_invalidate_page(mm, address); page_cache_release(page); } } diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -1626,9 +1626,10 @@ */ page_table = pte_offset_map_lock(mm, pmd, address, &ptl); - page_cache_release(old_page); + new_page = NULL; if (!pte_same(*page_table, orig_pte)) goto unlock; + page_cache_release(old_page); page_mkwrite = 1; } @@ -1644,6 +1645,7 @@ if (ptep_set_access_flags(vma, address, page_table, entry,1)) update_mmu_cache(vma, address, entry); ret |= VM_FAULT_WRITE; + old_page = new_page = NULL; goto unlock; } @@ -1688,7 +1690,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. 
*/ - ptep_clear_flush_notify(vma, address, page_table); + ptep_clear_flush(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); @@ -1700,12 +1702,18 @@ } else mem_cgroup_uncharge_page(new_page); - if (new_page) +unlock: + pte_unmap_unlock(page_table, ptl); + + if (new_page) { + if (new_page == old_page) + /* cow happened, notify before releasing old_page */ + mmu_notifier_invalidate_page(mm, address); page_cache_release(new_page); + } if (old_page) page_cache_release(old_page); -unlock: - pte_unmap_unlock(page_table, ptl); + if (dirty_page) { if (vma->vm_file) file_update_time(vma->vm_file); diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -275,7 +275,7 @@ unsigned long address; pte_t *pte; spinlock_t *ptl; - int referenced = 0; + int referenced = 0, clear_flush_young = 0; address = vma_address(page, vma); if (address == -EFAULT) @@ -288,8 +288,11 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young_notify(vma, address, pte)) - referenced++; + } else { + clear_flush_young = 1; + if (ptep_clear_flush_young(vma, address, pte)) + referenced++; + } /* Pretend the page is referenced if the task has the swap token and is in the middle of a page fault. 
*/ @@ -299,6 +302,10 @@ (*mapcount)--; pte_unmap_unlock(pte, ptl); + + if (clear_flush_young) + referenced += mmu_notifier_clear_flush_young(mm, address); + out: return referenced; } @@ -457,7 +464,7 @@ pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); - entry = ptep_clear_flush_notify(vma, address, pte); + entry = ptep_clear_flush(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -465,6 +472,10 @@ } pte_unmap_unlock(pte, ptl); + + if (ret) + mmu_notifier_invalidate_page(mm, address); + out: return ret; } @@ -717,15 +728,14 @@ * If it's recently referenced (perhaps page_referenced * skipped over this mm) then we should reactivate it. */ - if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young_notify(vma, address, pte)))) { + if (!migration && (vma->vm_flags & VM_LOCKED)) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. 
*/ if (pte_dirty(pteval)) @@ -780,6 +790,8 @@ out_unmap: pte_unmap_unlock(pte, ptl); + if (ret != SWAP_FAIL) + mmu_notifier_invalidate_page(mm, address); out: return ret; } @@ -818,7 +830,7 @@ spinlock_t *ptl; struct page *page; unsigned long address; - unsigned long end; + unsigned long start, end; address = (vma->vm_start + cursor) & CLUSTER_MASK; end = address + CLUSTER_SIZE; @@ -839,6 +851,8 @@ if (!pmd_present(*pmd)) return; + start = address; + mmu_notifier_invalidate_range_start(mm, start, end); pte = pte_offset_map_lock(mm, pmd, address, &ptl); /* Update high watermark before we lower rss */ @@ -850,12 +864,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young_notify(vma, address, pte)) + if (ptep_clear_flush_young(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); /* If nonlinear, store the file page offset in the pte. */ if (page->index != linear_page_index(vma, address)) @@ -871,6 +885,7 @@ (*mapcount)--; } pte_unmap_unlock(pte - 1, ptl); + mmu_notifier_invalidate_range_end(mm, start, end); } static int try_to_unmap_anon(struct page *page, int migration) From andrea at qumranet.com Tue Apr 8 08:44:03 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:03 +0200 Subject: [ofa-general] [PATCH 0 of 9] mmu notifier #v12 Message-ID: The difference with #v11 is a different implementation of mm_lock that guarantees handling signals in O(N). It's also more lowlatency friendly. Note that mmu_notifier_unregister may also fail with -EINTR if there are signal pending or the system runs out of vmalloc space or physical memory, only exit_mmap guarantees that any kernel module can be unloaded in presence of an oom condition. 
Either #v11 or the first three #v12 patches (1, 2, 3) are suitable for inclusion in -mm; pick what you prefer after looking at the mmu_notifier_register retval and mm_lock retval difference; I implemented and slightly tested both. GRU and KVM only need 1, 2, 3; XPMEM needs the rest of the patchset too (4, ...), but all patches from 4 to the end can be deferred to a second merge window. From andrea at qumranet.com Tue Apr 8 08:44:04 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:04 +0200 Subject: [ofa-general] [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207666462 -7200 # Node ID ec6d8f91b299cf26cce5c3d49bb25d35ee33c137 # Parent d4c25404de6376297ed34fada14cd6b894410eb0 Lock the entire mm to prevent any mmu related operation to happen. Signed-off-by: Andrea Arcangeli diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1050,6 +1050,15 @@ unsigned long addr, unsigned long len, unsigned long flags, struct page **pages); +struct mm_lock_data { + spinlock_t **i_mmap_locks; + spinlock_t **anon_vma_locks; + unsigned long nr_i_mmap_locks; + unsigned long nr_anon_vma_locks; +}; +extern struct mm_lock_data *mm_lock(struct mm_struct * mm); +extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data); + extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -2242,3 +2243,140 @@ return 0; } + +/* + * This operation locks against the VM for all pte/vma/mm related + * operations that could ever happen on a certain mm. This includes + * vmtruncate, try_to_unmap, and all page faults.
The holder + * must not hold any mm related lock. A single task can't take more + * than one mm lock in a row or it would deadlock. + */ +struct mm_lock_data *mm_lock(struct mm_struct * mm) +{ + struct vm_area_struct *vma; + spinlock_t *i_mmap_lock_last, *anon_vma_lock_last; + unsigned long nr_i_mmap_locks, nr_anon_vma_locks, i; + struct mm_lock_data *data; + int err; + + down_write(&mm->mmap_sem); + + err = -EINTR; + nr_i_mmap_locks = nr_anon_vma_locks = 0; + for (vma = mm->mmap; vma; vma = vma->vm_next) { + cond_resched(); + if (unlikely(signal_pending(current))) + goto out; + + if (vma->vm_file && vma->vm_file->f_mapping) + nr_i_mmap_locks++; + if (vma->anon_vma) + nr_anon_vma_locks++; + } + + err = -ENOMEM; + data = kmalloc(sizeof(struct mm_lock_data), GFP_KERNEL); + if (!data) + goto out; + + if (nr_i_mmap_locks) { + data->i_mmap_locks = vmalloc(nr_i_mmap_locks * + sizeof(spinlock_t)); + if (!data->i_mmap_locks) + goto out_kfree; + } else + data->i_mmap_locks = NULL; + + if (nr_anon_vma_locks) { + data->anon_vma_locks = vmalloc(nr_anon_vma_locks * + sizeof(spinlock_t)); + if (!data->anon_vma_locks) + goto out_vfree; + } else + data->anon_vma_locks = NULL; + + err = -EINTR; + i_mmap_lock_last = NULL; + nr_i_mmap_locks = 0; + for (;;) { + spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; + for (vma = mm->mmap; vma; vma = vma->vm_next) { + cond_resched(); + if (unlikely(signal_pending(current))) + goto out_vfree_both; + + if (!vma->vm_file || !vma->vm_file->f_mapping) + continue; + if ((unsigned long) i_mmap_lock > + (unsigned long) + &vma->vm_file->f_mapping->i_mmap_lock && + (unsigned long) + &vma->vm_file->f_mapping->i_mmap_lock > + (unsigned long) i_mmap_lock_last) + i_mmap_lock = + &vma->vm_file->f_mapping->i_mmap_lock; + } + if (i_mmap_lock == (spinlock_t *) -1UL) + break; + i_mmap_lock_last = i_mmap_lock; + data->i_mmap_locks[nr_i_mmap_locks++] = i_mmap_lock; + } + data->nr_i_mmap_locks = nr_i_mmap_locks; + + anon_vma_lock_last = NULL; + nr_anon_vma_locks = 
0; + for (;;) { + spinlock_t *anon_vma_lock = (spinlock_t *) -1UL; + for (vma = mm->mmap; vma; vma = vma->vm_next) { + cond_resched(); + if (unlikely(signal_pending(current))) + goto out_vfree_both; + + if (!vma->anon_vma) + continue; + if ((unsigned long) anon_vma_lock > + (unsigned long) &vma->anon_vma->lock && + (unsigned long) &vma->anon_vma->lock > + (unsigned long) anon_vma_lock_last) + anon_vma_lock = &vma->anon_vma->lock; + } + if (anon_vma_lock == (spinlock_t *) -1UL) + break; + anon_vma_lock_last = anon_vma_lock; + data->anon_vma_locks[nr_anon_vma_locks++] = anon_vma_lock; + } + data->nr_anon_vma_locks = nr_anon_vma_locks; + + for (i = 0; i < nr_i_mmap_locks; i++) + spin_lock(data->i_mmap_locks[i]); + for (i = 0; i < nr_anon_vma_locks; i++) + spin_lock(data->anon_vma_locks[i]); + + return data; + +out_vfree_both: + vfree(data->anon_vma_locks); +out_vfree: + vfree(data->i_mmap_locks); +out_kfree: + kfree(data); +out: + up_write(&mm->mmap_sem); + return ERR_PTR(err); +} + +void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data) +{ + unsigned long i; + + for (i = 0; i < data->nr_i_mmap_locks; i++) + spin_unlock(data->i_mmap_locks[i]); + for (i = 0; i < data->nr_anon_vma_locks; i++) + spin_unlock(data->anon_vma_locks[i]); + + up_write(&mm->mmap_sem); + + vfree(data->i_mmap_locks); + vfree(data->anon_vma_locks); + kfree(data); +} From andrea at qumranet.com Tue Apr 8 08:44:05 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:05 +0200 Subject: [ofa-general] [PATCH 2 of 9] Core of mmu notifiers In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207666462 -7200 # Node ID baceb322b45ed43280654dac6c964c9d3d8a936f # Parent ec6d8f91b299cf26cce5c3d49bb25d35ee33c137 Core of mmu notifiers. 
Signed-off-by: Andrea Arcangeli Signed-off-by: Nick Piggin Signed-off-by: Christoph Lameter diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -225,6 +225,9 @@ #ifdef CONFIG_CGROUP_MEM_RES_CTLR struct mem_cgroup *mem_cgroup; #endif +#ifdef CONFIG_MMU_NOTIFIER + struct hlist_head mmu_notifier_list; +#endif }; #endif /* _LINUX_MM_TYPES_H */ diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h new file mode 100644 --- /dev/null +++ b/include/linux/mmu_notifier.h @@ -0,0 +1,177 @@ +#ifndef _LINUX_MMU_NOTIFIER_H +#define _LINUX_MMU_NOTIFIER_H + +#include +#include +#include + +struct mmu_notifier; +struct mmu_notifier_ops; + +#ifdef CONFIG_MMU_NOTIFIER + +struct mmu_notifier_ops { + /* + * Called when nobody can register any more notifier in the mm + * and after the "mn" notifier has been disarmed already. + */ + void (*release)(struct mmu_notifier *mn, + struct mm_struct *mm); + + /* + * clear_flush_young is called after the VM is + * test-and-clearing the young/accessed bitflag in the + * pte. This way the VM will provide proper aging to the + * accesses to the page through the secondary MMUs and not + * only to the ones through the Linux pte. + */ + int (*clear_flush_young)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * Before this is invoked any secondary MMU is still ok to + * read/write to the page previously pointed by the Linux pte + * because the old page hasn't been freed yet. If required + * set_page_dirty has to be called internally to this method. + */ + void (*invalidate_page)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * invalidate_range_start() and invalidate_range_end() must be + * paired. Multiple invalidate_range_start/ends may be nested + * or called concurrently. 
+ */ + void (*invalidate_range_start)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); + void (*invalidate_range_end)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); +}; + +struct mmu_notifier { + struct hlist_node hlist; + const struct mmu_notifier_ops *ops; +}; + +static inline int mm_has_notifiers(struct mm_struct *mm) +{ + return unlikely(!hlist_empty(&mm->mmu_notifier_list)); +} + +extern int mmu_notifier_register(struct mmu_notifier *mn, + struct mm_struct *mm); +extern int mmu_notifier_unregister(struct mmu_notifier *mn, + struct mm_struct *mm); +extern void __mmu_notifier_release(struct mm_struct *mm); +extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end); +extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end); + + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_release(mm); +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + return __mmu_notifier_clear_flush_young(mm, address); + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_page(mm, address); +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_start(mm, start, end); +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + 
__mmu_notifier_invalidate_range_end(mm, start, end); +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ + INIT_HLIST_HEAD(&mm->mmu_notifier_list); +} + +#define ptep_clear_flush_notify(__vma, __address, __ptep) \ +({ \ + pte_t __pte; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __pte = ptep_clear_flush(___vma, ___address, __ptep); \ + mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ + __pte; \ +}) + +#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ +({ \ + int __young; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ + __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ + ___address); \ + __young; \ +}) + +#else /* CONFIG_MMU_NOTIFIER */ + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ +} + +#define ptep_clear_flush_young_notify ptep_clear_flush_young +#define ptep_clear_flush_notify ptep_clear_flush + +#endif /* CONFIG_MMU_NOTIFIER */ + +#endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include @@ -362,6 +363,7 @@ if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; + mmu_notifier_mm_init(mm); return mm; } diff --git a/mm/Kconfig b/mm/Kconfig --- a/mm/Kconfig +++ b/mm/Kconfig @@ 
-193,3 +193,7 @@ config VIRT_TO_BUS def_bool y depends on !ARCH_NO_VIRT_TO_BUS + +config MMU_NOTIFIER + def_bool y + bool "MMU notifier, for paging KVM/RDMA" diff --git a/mm/Makefile b/mm/Makefile --- a/mm/Makefile +++ b/mm/Makefile @@ -33,4 +33,5 @@ obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o +obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,7 +194,7 @@ if (pte) { /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -214,7 +215,9 @@ spin_unlock(&mapping->i_mmap_lock); } + mmu_notifier_invalidate_range_start(mm, start, start + size); err = populate_range(mm, vma, start, size, pgoff); + mmu_notifier_invalidate_range_end(mm, start, start + size); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include @@ -799,6 +800,7 @@ BUG_ON(start & ~HPAGE_MASK); BUG_ON(end & ~HPAGE_MASK); + mmu_notifier_invalidate_range_start(mm, start, end); spin_lock(&mm->page_table_lock); for (address = start; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -819,6 +821,7 @@ } spin_unlock(&mm->page_table_lock); flush_tlb_range(vma, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); list_for_each_entry_safe(page, tmp, &page_list, lru) { list_del(&page->lru); put_page(page); diff --git a/mm/memory.c b/mm/memory.c --- 
a/mm/memory.c +++ b/mm/memory.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -611,6 +612,9 @@ if (is_vm_hugetlb_page(vma)) return copy_hugetlb_page_range(dst_mm, src_mm, vma); + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_start(src_mm, addr, end); + dst_pgd = pgd_offset(dst_mm, addr); src_pgd = pgd_offset(src_mm, addr); do { @@ -621,6 +625,11 @@ vma, addr, next)) return -ENOMEM; } while (dst_pgd++, src_pgd++, addr = next, addr != end); + + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_end(src_mm, + vma->vm_start, end); + return 0; } @@ -897,7 +906,9 @@ lru_add_drain(); tlb = tlb_gather_mmu(mm, 0); update_hiwater_rss(mm); + mmu_notifier_invalidate_range_start(mm, address, end); end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details); + mmu_notifier_invalidate_range_end(mm, address, end); if (tlb) tlb_finish_mmu(tlb, address, end); return end; @@ -1463,10 +1474,11 @@ { pgd_t *pgd; unsigned long next; - unsigned long end = addr + size; + unsigned long start = addr, end = addr + size; int err; BUG_ON(addr >= end); + mmu_notifier_invalidate_range_start(mm, start, end); pgd = pgd_offset(mm, addr); do { next = pgd_addr_end(addr, end); @@ -1474,6 +1486,7 @@ if (err) break; } while (pgd++, addr = next, addr != end); + mmu_notifier_invalidate_range_end(mm, start, end); return err; } EXPORT_SYMBOL_GPL(apply_to_page_range); @@ -1675,7 +1688,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. 
*/ - ptep_clear_flush(vma, address, page_table); + ptep_clear_flush_notify(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include @@ -1748,11 +1749,13 @@ lru_add_drain(); tlb = tlb_gather_mmu(mm, 0); update_hiwater_rss(mm); + mmu_notifier_invalidate_range_start(mm, start, end); unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); tlb_finish_mmu(tlb, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); } /* @@ -2038,6 +2041,7 @@ unsigned long end; /* mm's last user has gone, and its about to be pulled down */ + mmu_notifier_release(mm); arch_exit_mmap(mm); lru_add_drain(); diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c new file mode 100644 --- /dev/null +++ b/mm/mmu_notifier.c @@ -0,0 +1,126 @@ +/* + * linux/mm/mmu_notifier.c + * + * Copyright (C) 2008 Qumranet, Inc. + * Copyright (C) 2008 SGI + * Christoph Lameter + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include +#include +#include +#include + +/* + * No synchronization. This function can only be called when only a single + * process remains that performs teardown. + */ +void __mmu_notifier_release(struct mm_struct *mm) +{ + struct mmu_notifier *mn; + + while (unlikely(!hlist_empty(&mm->mmu_notifier_list))) { + mn = hlist_entry(mm->mmu_notifier_list.first, + struct mmu_notifier, + hlist); + hlist_del(&mn->hlist); + if (mn->ops->release) + mn->ops->release(mn, mm); + } +} + +/* + * If no young bitflag is supported by the hardware, ->clear_flush_young can + * unmap the address and return 1 or 0 depending if the mapping previously + * existed or not. 
+ */ +int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int young = 0; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->clear_flush_young) + young |= mn->ops->clear_flush_young(mn, mm, address); + } + + return young; +} + +void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_page) + mn->ops->invalidate_page(mn, mm, address); + } +} + +void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_start) + mn->ops->invalidate_range_start(mn, mm, start, end); + } +} + +void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_end) + mn->ops->invalidate_range_end(mn, mm, start, end); + } +} + +/* + * Must not hold mmap_sem nor any other VM related lock when calling + * this registration function. + */ +int mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm) +{ + struct mm_lock_data *data; + + data = mm_lock(mm); + if (unlikely(IS_ERR(data))) + return PTR_ERR(data); + hlist_add_head(&mn->hlist, &mm->mmu_notifier_list); + mm_unlock(mm, data); + return 0; +} +EXPORT_SYMBOL_GPL(mmu_notifier_register); + +/* + * mm_users can't go down to zero while mmu_notifier_unregister() + * runs or it can race with ->release. So a mm_users pin must + * be taken by the caller (if mm can be different from current->mm). 
+ */ +int mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) +{ + struct mm_lock_data *data; + + BUG_ON(!atomic_read(&mm->mm_users)); + + data = mm_lock(mm); + if (unlikely(IS_ERR(data))) + return PTR_ERR(data); + hlist_del(&mn->hlist); + mm_unlock(mm, data); + return 0; +} +EXPORT_SYMBOL_GPL(mmu_notifier_unregister); diff --git a/mm/mprotect.c b/mm/mprotect.c --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -198,10 +199,12 @@ dirty_accountable = 1; } + mmu_notifier_invalidate_range_start(mm, start, end); if (is_vm_hugetlb_page(vma)) hugetlb_change_protection(vma, start, end, vma->vm_page_prot); else change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable); + mmu_notifier_invalidate_range_end(mm, start, end); vm_stat_account(mm, oldflags, vma->vm_file, -nrpages); vm_stat_account(mm, newflags, vma->vm_file, nrpages); return 0; diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -74,7 +75,11 @@ struct mm_struct *mm = vma->vm_mm; pte_t *old_pte, *new_pte, pte; spinlock_t *old_ptl, *new_ptl; + unsigned long old_start; + old_start = old_addr; + mmu_notifier_invalidate_range_start(vma->vm_mm, + old_start, old_end); if (vma->vm_file) { /* * Subtle point from Rajesh Venkatasubramanian: before @@ -116,6 +121,7 @@ pte_unmap_unlock(old_pte - 1, old_ptl); if (mapping) spin_unlock(&mapping->i_mmap_lock); + mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } #define LATENCY_LIMIT (64 * PAGE_SIZE) diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -49,6 +49,7 @@ #include #include #include +#include #include @@ -287,7 +288,7 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young(vma, address, pte)) + } else if (ptep_clear_flush_young_notify(vma, address, pte)) 
referenced++; /* Pretend the page is referenced if the task has the @@ -456,7 +457,7 @@ pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); - entry = ptep_clear_flush(vma, address, pte); + entry = ptep_clear_flush_notify(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -717,14 +718,14 @@ * skipped over this mm) then we should reactivate it. */ if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young(vma, address, pte)))) { + (ptep_clear_flush_young_notify(vma, address, pte)))) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. */ if (pte_dirty(pteval)) @@ -849,12 +850,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young(vma, address, pte)) + if (ptep_clear_flush_young_notify(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* If nonlinear, store the file page offset in the pte. */ if (page->index != linear_page_index(vma, address)) From andrea at qumranet.com Tue Apr 8 08:44:09 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:09 +0200 Subject: [ofa-general] [PATCH 6 of 9] We no longer abort unmapping in unmap vmas because we can reschedule while In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207666893 -7200 # Node ID b0cb674314534b9cc4759603f123474d38427b2d # Parent 20e829e35dfeceeb55a816ef495afda10cd50b98 We no longer abort unmapping in unmap vmas because we can reschedule while unmapping since we are holding a semaphore. 
This would allow moving more of the tlb flusing into unmap_vmas reducing code in various places. Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -723,8 +723,7 @@ struct page *vm_normal_page(struct vm_area_struct *, unsigned long, pte_t); unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, struct zap_details *); -unsigned long unmap_vmas(struct mmu_gather **tlb, - struct vm_area_struct *start_vma, unsigned long start_addr, +unsigned long unmap_vmas(struct vm_area_struct *start_vma, unsigned long start_addr, unsigned long end_addr, unsigned long *nr_accounted, struct zap_details *); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -805,7 +805,6 @@ /** * unmap_vmas - unmap a range of memory covered by a list of vma's - * @tlbp: address of the caller's struct mmu_gather * @vma: the starting vma * @start_addr: virtual address at which to start unmapping * @end_addr: virtual address at which to end unmapping @@ -817,20 +816,13 @@ * Unmap all pages in the vma list. * * We aim to not hold locks for too long (for scheduling latency reasons). - * So zap pages in ZAP_BLOCK_SIZE bytecounts. This means we need to - * return the ending mmu_gather to the caller. + * So zap pages in ZAP_BLOCK_SIZE bytecounts. * * Only addresses between `start' and `end' will be unmapped. * * The VMA list must be sorted in ascending virtual address order. - * - * unmap_vmas() assumes that the caller will flush the whole unmapped address - * range after unmap_vmas() returns. So the only responsibility here is to - * ensure that any thus-far unmapped pages are flushed before unmap_vmas() - * drops the lock and schedules. 
*/ -unsigned long unmap_vmas(struct mmu_gather **tlbp, - struct vm_area_struct *vma, unsigned long start_addr, +unsigned long unmap_vmas(struct vm_area_struct *vma, unsigned long start_addr, unsigned long end_addr, unsigned long *nr_accounted, struct zap_details *details) { @@ -838,7 +830,15 @@ unsigned long tlb_start = 0; /* For tlb_finish_mmu */ int tlb_start_valid = 0; unsigned long start = start_addr; - int fullmm = (*tlbp)->fullmm; + int fullmm; + struct mmu_gather *tlb; + struct mm_struct *mm = vma->vm_mm; + + mmu_notifier_invalidate_range_start(mm, start_addr, end_addr); + lru_add_drain(); + tlb = tlb_gather_mmu(mm, 0); + update_hiwater_rss(mm); + fullmm = tlb->fullmm; for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) { unsigned long end; @@ -865,7 +865,7 @@ (HPAGE_SIZE / PAGE_SIZE); start = end; } else - start = unmap_page_range(*tlbp, vma, + start = unmap_page_range(tlb, vma, start, end, &zap_work, details); if (zap_work > 0) { @@ -873,13 +873,15 @@ break; } - tlb_finish_mmu(*tlbp, tlb_start, start); + tlb_finish_mmu(tlb, tlb_start, start); cond_resched(); - *tlbp = tlb_gather_mmu(vma->vm_mm, fullmm); + tlb = tlb_gather_mmu(vma->vm_mm, fullmm); tlb_start_valid = 0; zap_work = ZAP_BLOCK_SIZE; } } + tlb_finish_mmu(tlb, start_addr, end_addr); + mmu_notifier_invalidate_range_end(mm, start_addr, end_addr); return start; /* which is now the end (or restart) address */ } @@ -893,20 +895,10 @@ unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, struct zap_details *details) { - struct mm_struct *mm = vma->vm_mm; - struct mmu_gather *tlb; unsigned long end = address + size; unsigned long nr_accounted = 0; - lru_add_drain(); - tlb = tlb_gather_mmu(mm, 0); - update_hiwater_rss(mm); - mmu_notifier_invalidate_range_start(mm, address, end); - end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details); - mmu_notifier_invalidate_range_end(mm, address, end); - if (tlb) - tlb_finish_mmu(tlb, address, end); - 
return end; + return unmap_vmas(vma, address, end, &nr_accounted, details); } /* diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1743,19 +1743,12 @@ unsigned long start, unsigned long end) { struct vm_area_struct *next = prev? prev->vm_next: mm->mmap; - struct mmu_gather *tlb; unsigned long nr_accounted = 0; - lru_add_drain(); - tlb = tlb_gather_mmu(mm, 0); - update_hiwater_rss(mm); - mmu_notifier_invalidate_range_start(mm, start, end); - unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); + unmap_vmas(vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - tlb_finish_mmu(tlb, start, end); free_pgtables(vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); - mmu_notifier_invalidate_range_end(mm, start, end); } /* @@ -2035,7 +2028,6 @@ /* Release all mmaps. */ void exit_mmap(struct mm_struct *mm) { - struct mmu_gather *tlb; struct vm_area_struct *vma = mm->mmap; unsigned long nr_accounted = 0; unsigned long end; @@ -2046,12 +2038,9 @@ lru_add_drain(); flush_cache_mm(mm); - tlb = tlb_gather_mmu(mm, 1); - /* Don't update_hiwater_rss(mm) here, do_exit already did */ - /* Use -1 here to ensure all VMAs in the mm are unmapped */ - end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL); + + end = unmap_vmas(vma, 0, -1, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - tlb_finish_mmu(tlb, 0, end); free_pgtables(vma, FIRST_USER_ADDRESS, 0); /* From andrea at qumranet.com Tue Apr 8 08:44:08 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:08 +0200 Subject: [ofa-general] [PATCH 5 of 9] The conversion to a rwsem allows callbacks during rmap traversal In-Reply-To: Message-ID: <20e829e35dfeceeb55a8.1207669448@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1207666463 -7200 # Node ID 20e829e35dfeceeb55a816ef495afda10cd50b98 # Parent 2c2ed514f294dbbfc66157f771bc900789ac6005 The conversion to a rwsem allows callbacks during rmap traversal for files in a non atomic 
context. A rw style lock also allows concurrent walking of the reverse map. This is fairly straightforward if one removes pieces of the resched checking. [Restarting unmapping is an issue to be discussed]. This slightly increases Aim9 performance results on an 8p. Signed-off-by: Andrea Arcangeli Signed-off-by: Christoph Lameter diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -69,7 +69,7 @@ if (!vma_shareable(vma, addr)) return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(svma, &iter, &mapping->i_mmap, idx, idx) { if (svma == vma) continue; @@ -94,7 +94,7 @@ put_page(virt_to_page(spte)); spin_unlock(&mm->page_table_lock); out: - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -454,10 +454,10 @@ pgoff = offset >> PAGE_SHIFT; i_size_write(inode, offset); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); if (!prio_tree_empty(&mapping->i_mmap)) hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff); - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); truncate_hugepages(inode, offset); return 0; } diff --git a/fs/inode.c b/fs/inode.c --- a/fs/inode.c +++ b/fs/inode.c @@ -210,7 +210,7 @@ INIT_LIST_HEAD(&inode->i_devices); INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC); rwlock_init(&inode->i_data.tree_lock); - spin_lock_init(&inode->i_data.i_mmap_lock); + init_rwsem(&inode->i_data.i_mmap_sem); INIT_LIST_HEAD(&inode->i_data.private_list); spin_lock_init(&inode->i_data.private_lock); INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap); diff --git a/include/linux/fs.h b/include/linux/fs.h --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -503,7 +503,7 @@ unsigned int i_mmap_writable;/* count VM_SHARED mappings */ struct prio_tree_root i_mmap; /* tree of private and 
shared mappings */ struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */ - spinlock_t i_mmap_lock; /* protect tree, count, list */ + struct rw_semaphore i_mmap_sem; /* protect tree, count, list */ unsigned int truncate_count; /* Cover race condition with truncate */ unsigned long nrpages; /* number of total pages */ pgoff_t writeback_index;/* writeback starts here */ diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -716,7 +716,7 @@ struct address_space *check_mapping; /* Check page->mapping if set */ pgoff_t first_index; /* Lowest page->index to unmap */ pgoff_t last_index; /* Highest page->index to unmap */ - spinlock_t *i_mmap_lock; /* For unmap_mapping_range: */ + struct rw_semaphore *i_mmap_sem; /* For unmap_mapping_range: */ unsigned long truncate_count; /* Compare vm_truncate_count */ }; @@ -1051,9 +1051,9 @@ unsigned long flags, struct page **pages); struct mm_lock_data { - spinlock_t **i_mmap_locks; + struct rw_semaphore **i_mmap_sems; spinlock_t **anon_vma_locks; - unsigned long nr_i_mmap_locks; + unsigned long nr_i_mmap_sems; unsigned long nr_anon_vma_locks; }; extern struct mm_lock_data *mm_lock(struct mm_struct * mm); diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -274,12 +274,12 @@ atomic_dec(&inode->i_writecount); /* insert tmp into the share list, just after mpnt */ - spin_lock(&file->f_mapping->i_mmap_lock); + down_write(&file->f_mapping->i_mmap_sem); tmp->vm_truncate_count = mpnt->vm_truncate_count; flush_dcache_mmap_lock(file->f_mapping); vma_prio_tree_add(tmp, mpnt); flush_dcache_mmap_unlock(file->f_mapping); - spin_unlock(&file->f_mapping->i_mmap_lock); + up_write(&file->f_mapping->i_mmap_sem); } /* diff --git a/mm/filemap.c b/mm/filemap.c --- a/mm/filemap.c +++ b/mm/filemap.c @@ -61,16 +61,16 @@ /* * Lock ordering: * - * ->i_mmap_lock (vmtruncate) + * ->i_mmap_sem (vmtruncate) * ->private_lock (__free_pte->__set_page_dirty_buffers) * 
->swap_lock (exclusive_swap_page, others) * ->mapping->tree_lock * * ->i_mutex - * ->i_mmap_lock (truncate->unmap_mapping_range) + * ->i_mmap_sem (truncate->unmap_mapping_range) * * ->mmap_sem - * ->i_mmap_lock + * ->i_mmap_sem * ->page_table_lock or pte_lock (various, mainly in memory.c) * ->mapping->tree_lock (arch-dependent flush_dcache_mmap_lock) * @@ -87,7 +87,7 @@ * ->sb_lock (fs/fs-writeback.c) * ->mapping->tree_lock (__sync_single_inode) * - * ->i_mmap_lock + * ->i_mmap_sem * ->anon_vma.lock (vma_adjust) * * ->anon_vma.lock diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -184,7 +184,7 @@ if (!page) return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { mm = vma->vm_mm; address = vma->vm_start + @@ -204,7 +204,7 @@ page_cache_release(page); } } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -206,13 +206,13 @@ } goto out; } - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); flush_dcache_mmap_lock(mapping); vma->vm_flags |= VM_NONLINEAR; vma_prio_tree_remove(vma, &mapping->i_mmap); vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear); flush_dcache_mmap_unlock(mapping); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } mmu_notifier_invalidate_range_start(mm, start, start + size); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -790,7 +790,7 @@ struct page *page; struct page *tmp; /* - * A page gathering list, protected by per file i_mmap_lock. The + * A page gathering list, protected by per file i_mmap_sem. The * lock is used to avoid list corruption from multiple unmapping * of the same page since we are using page->lru. */ @@ -840,9 +840,9 @@ * do nothing in this case. 
*/ if (vma->vm_file) { - spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); + down_write(&vma->vm_file->f_mapping->i_mmap_sem); __unmap_hugepage_range(vma, start, end); - spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); + up_write(&vma->vm_file->f_mapping->i_mmap_sem); } } @@ -1085,7 +1085,7 @@ BUG_ON(address >= end); flush_cache_range(vma, address, end); - spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); + down_write(&vma->vm_file->f_mapping->i_mmap_sem); spin_lock(&mm->page_table_lock); for (; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -1100,7 +1100,7 @@ } } spin_unlock(&mm->page_table_lock); - spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); + up_write(&vma->vm_file->f_mapping->i_mmap_sem); flush_tlb_range(vma, start, end); } diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -838,7 +838,6 @@ unsigned long tlb_start = 0; /* For tlb_finish_mmu */ int tlb_start_valid = 0; unsigned long start = start_addr; - spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL; int fullmm = (*tlbp)->fullmm; for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) { @@ -875,22 +874,12 @@ } tlb_finish_mmu(*tlbp, tlb_start, start); - - if (need_resched() || - (i_mmap_lock && spin_needbreak(i_mmap_lock))) { - if (i_mmap_lock) { - *tlbp = NULL; - goto out; - } - cond_resched(); - } - + cond_resched(); *tlbp = tlb_gather_mmu(vma->vm_mm, fullmm); tlb_start_valid = 0; zap_work = ZAP_BLOCK_SIZE; } } -out: return start; /* which is now the end (or restart) address */ } @@ -1752,7 +1741,7 @@ /* * Helper functions for unmap_mapping_range(). 
* - * __ Notes on dropping i_mmap_lock to reduce latency while unmapping __ + * __ Notes on dropping i_mmap_sem to reduce latency while unmapping __ * * We have to restart searching the prio_tree whenever we drop the lock, * since the iterator is only valid while the lock is held, and anyway @@ -1771,7 +1760,7 @@ * can't efficiently keep all vmas in step with mapping->truncate_count: * so instead reset them all whenever it wraps back to 0 (then go to 1). * mapping->truncate_count and vma->vm_truncate_count are protected by - * i_mmap_lock. + * i_mmap_sem. * * In order to make forward progress despite repeatedly restarting some * large vma, note the restart_addr from unmap_vmas when it breaks out: @@ -1821,7 +1810,7 @@ restart_addr = zap_page_range(vma, start_addr, end_addr - start_addr, details); - need_break = need_resched() || spin_needbreak(details->i_mmap_lock); + need_break = need_resched(); if (restart_addr >= end_addr) { /* We have now completed this vma: mark it so */ @@ -1835,9 +1824,9 @@ goto again; } - spin_unlock(details->i_mmap_lock); + up_write(details->i_mmap_sem); cond_resched(); - spin_lock(details->i_mmap_lock); + down_write(details->i_mmap_sem); return -EINTR; } @@ -1931,9 +1920,9 @@ details.last_index = hba + hlen - 1; if (details.last_index < details.first_index) details.last_index = ULONG_MAX; - details.i_mmap_lock = &mapping->i_mmap_lock; + details.i_mmap_sem = &mapping->i_mmap_sem; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); /* Protect against endless unmapping loops */ mapping->truncate_count++; @@ -1948,7 +1937,7 @@ unmap_mapping_range_tree(&mapping->i_mmap, &details); if (unlikely(!list_empty(&mapping->i_mmap_nonlinear))) unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } EXPORT_SYMBOL(unmap_mapping_range); diff --git a/mm/migrate.c b/mm/migrate.c --- a/mm/migrate.c +++ b/mm/migrate.c @@ -211,12 +211,12 @@ if (!mapping) 
return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) remove_migration_pte(vma, old, new); - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -188,7 +188,7 @@ } /* - * Requires inode->i_mapping->i_mmap_lock + * Requires inode->i_mapping->i_mmap_sem */ static void __remove_shared_vm_struct(struct vm_area_struct *vma, struct file *file, struct address_space *mapping) @@ -216,9 +216,9 @@ if (file) { struct address_space *mapping = file->f_mapping; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); __remove_shared_vm_struct(vma, file, mapping); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } } @@ -441,7 +441,7 @@ mapping = vma->vm_file->f_mapping; if (mapping) { - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); vma->vm_truncate_count = mapping->truncate_count; } anon_vma_lock(vma); @@ -451,7 +451,7 @@ anon_vma_unlock(vma); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); mm->map_count++; validate_mm(mm); @@ -538,7 +538,7 @@ mapping = file->f_mapping; if (!(vma->vm_flags & VM_NONLINEAR)) root = &mapping->i_mmap; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); if (importer && vma->vm_truncate_count != next->vm_truncate_count) { /* @@ -622,7 +622,7 @@ if (anon_vma) spin_unlock(&anon_vma->lock); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); if (remove_next) { if (file) @@ -2066,7 +2066,7 @@ /* Insert vm structure into process list sorted by address * and into the inode's i_mmap tree. If vm_file is non-NULL - * then i_mmap_lock is taken here. + * then i_mmap_sem is taken here. 
*/ int insert_vm_struct(struct mm_struct * mm, struct vm_area_struct * vma) { @@ -2258,22 +2258,23 @@ struct mm_lock_data *mm_lock(struct mm_struct * mm) { struct vm_area_struct *vma; - spinlock_t *i_mmap_lock_last, *anon_vma_lock_last; - unsigned long nr_i_mmap_locks, nr_anon_vma_locks, i; + struct rw_semaphore *i_mmap_sem_last; + spinlock_t *anon_vma_lock_last; + unsigned long nr_i_mmap_sems, nr_anon_vma_locks, i; struct mm_lock_data *data; int err; down_write(&mm->mmap_sem); err = -EINTR; - nr_i_mmap_locks = nr_anon_vma_locks = 0; + nr_i_mmap_sems = nr_anon_vma_locks = 0; for (vma = mm->mmap; vma; vma = vma->vm_next) { cond_resched(); if (unlikely(signal_pending(current))) goto out; if (vma->vm_file && vma->vm_file->f_mapping) - nr_i_mmap_locks++; + nr_i_mmap_sems++; if (vma->anon_vma) nr_anon_vma_locks++; } @@ -2283,13 +2284,13 @@ if (!data) goto out; - if (nr_i_mmap_locks) { - data->i_mmap_locks = vmalloc(nr_i_mmap_locks * - sizeof(spinlock_t)); - if (!data->i_mmap_locks) + if (nr_i_mmap_sems) { + data->i_mmap_sems = vmalloc(nr_i_mmap_sems * + sizeof(struct rw_semaphore)); + if (!data->i_mmap_sems) goto out_kfree; } else - data->i_mmap_locks = NULL; + data->i_mmap_sems = NULL; if (nr_anon_vma_locks) { data->anon_vma_locks = vmalloc(nr_anon_vma_locks * @@ -2300,10 +2301,11 @@ data->anon_vma_locks = NULL; err = -EINTR; - i_mmap_lock_last = NULL; - nr_i_mmap_locks = 0; + i_mmap_sem_last = NULL; + nr_i_mmap_sems = 0; for (;;) { - spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; + struct rw_semaphore *i_mmap_sem; + i_mmap_sem = (struct rw_semaphore *) -1UL; for (vma = mm->mmap; vma; vma = vma->vm_next) { cond_resched(); if (unlikely(signal_pending(current))) @@ -2311,21 +2313,21 @@ if (!vma->vm_file || !vma->vm_file->f_mapping) continue; - if ((unsigned long) i_mmap_lock > + if ((unsigned long) i_mmap_sem > (unsigned long) - &vma->vm_file->f_mapping->i_mmap_lock && + &vma->vm_file->f_mapping->i_mmap_sem && (unsigned long) - &vma->vm_file->f_mapping->i_mmap_lock > - 
(unsigned long) i_mmap_lock_last) - i_mmap_lock = - &vma->vm_file->f_mapping->i_mmap_lock; + &vma->vm_file->f_mapping->i_mmap_sem > + (unsigned long) i_mmap_sem_last) + i_mmap_sem = + &vma->vm_file->f_mapping->i_mmap_sem; } - if (i_mmap_lock == (spinlock_t *) -1UL) + if (i_mmap_sem == (struct rw_semaphore *) -1UL) break; - i_mmap_lock_last = i_mmap_lock; - data->i_mmap_locks[nr_i_mmap_locks++] = i_mmap_lock; + i_mmap_sem_last = i_mmap_sem; + data->i_mmap_sems[nr_i_mmap_sems++] = i_mmap_sem; } - data->nr_i_mmap_locks = nr_i_mmap_locks; + data->nr_i_mmap_sems = nr_i_mmap_sems; anon_vma_lock_last = NULL; nr_anon_vma_locks = 0; @@ -2351,8 +2353,8 @@ } data->nr_anon_vma_locks = nr_anon_vma_locks; - for (i = 0; i < nr_i_mmap_locks; i++) - spin_lock(data->i_mmap_locks[i]); + for (i = 0; i < nr_i_mmap_sems; i++) + down_write(data->i_mmap_sems[i]); for (i = 0; i < nr_anon_vma_locks; i++) spin_lock(data->anon_vma_locks[i]); @@ -2361,7 +2363,7 @@ out_vfree_both: vfree(data->anon_vma_locks); out_vfree: - vfree(data->i_mmap_locks); + vfree(data->i_mmap_sems); out_kfree: kfree(data); out: @@ -2373,14 +2375,14 @@ { unsigned long i; - for (i = 0; i < data->nr_i_mmap_locks; i++) - spin_unlock(data->i_mmap_locks[i]); + for (i = 0; i < data->nr_i_mmap_sems; i++) + up_write(data->i_mmap_sems[i]); for (i = 0; i < data->nr_anon_vma_locks; i++) spin_unlock(data->anon_vma_locks[i]); up_write(&mm->mmap_sem); - vfree(data->i_mmap_locks); + vfree(data->i_mmap_sems); vfree(data->anon_vma_locks); kfree(data); } diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -88,7 +88,7 @@ * and we propagate stale pages into the dst afterward. 
*/ mapping = vma->vm_file->f_mapping; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); if (new_vma->vm_truncate_count && new_vma->vm_truncate_count != vma->vm_truncate_count) new_vma->vm_truncate_count = 0; @@ -120,7 +120,7 @@ pte_unmap_nested(new_pte - 1); pte_unmap_unlock(old_pte - 1, old_ptl); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -24,7 +24,7 @@ * inode->i_alloc_sem (vmtruncate_range) * mm->mmap_sem * page->flags PG_locked (lock_page) - * mapping->i_mmap_lock + * mapping->i_mmap_sem * anon_vma->lock * mm->page_table_lock or pte_lock * zone->lru_lock (in mark_page_accessed, isolate_lru_page) @@ -373,14 +373,14 @@ * The page lock not only makes sure that page->mapping cannot * suddenly be NULLified by truncation, it makes sure that the * structure at mapping cannot be freed and reused yet, - * so we can safely take mapping->i_mmap_lock. + * so we can safely take mapping->i_mmap_sem. */ BUG_ON(!PageLocked(page)); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); /* - * i_mmap_lock does not stabilize mapcount at all, but mapcount + * i_mmap_sem does not stabilize mapcount at all, but mapcount * is more likely to be accurate if we note it after spinning. 
*/ mapcount = page_mapcount(page); @@ -403,7 +403,7 @@ break; } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); return referenced; } @@ -489,12 +489,12 @@ BUG_ON(PageAnon(page)); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { if (vma->vm_flags & VM_SHARED) ret += page_mkclean_one(page, vma); } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); return ret; } @@ -930,7 +930,7 @@ unsigned long max_nl_size = 0; unsigned int mapcount; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { ret = try_to_unmap_one(page, vma, migration); if (ret == SWAP_FAIL || !page_mapped(page)) @@ -967,7 +967,6 @@ mapcount = page_mapcount(page); if (!mapcount) goto out; - cond_resched_lock(&mapping->i_mmap_lock); max_nl_size = (max_nl_size + CLUSTER_SIZE - 1) & CLUSTER_MASK; if (max_nl_cursor == 0) @@ -989,7 +988,6 @@ } vma->vm_private_data = (void *) max_nl_cursor; } - cond_resched_lock(&mapping->i_mmap_lock); max_nl_cursor += CLUSTER_SIZE; } while (max_nl_cursor <= max_nl_size); @@ -1001,7 +999,7 @@ list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list) vma->vm_private_data = NULL; out: - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); return ret; } From andrea at qumranet.com Tue Apr 8 08:44:10 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:10 +0200 Subject: [ofa-general] [PATCH 7 of 9] Convert the anon_vma spinlock to a rw semaphore. This allows concurrent In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207666968 -7200 # Node ID a0c52e4b9b71e2627238b69c0a58905097973279 # Parent b0cb674314534b9cc4759603f123474d38427b2d Convert the anon_vma spinlock to a rw semaphore. This allows concurrent traversal of reverse maps for try_to_unmap and page_mkclean.
It also allows the calling of sleeping functions from reverse map traversal. An additional complication is that rcu is used in some contexts to guarantee the presence of the anon_vma while we acquire the lock. We cannot take a semaphore within an rcu critical section. Add a refcount to the anon_vma structure which allows us to give an existence guarantee for the anon_vma structure independent of the spinlock or the list contents. The refcount can then be taken within the RCU section. If it has been taken successfully then the refcount guarantees the existence of the anon_vma. The refcount in anon_vma also allows us to fix a nasty issue in page migration where we fudged by using rcu for a long code path to guarantee the existence of the anon_vma. The refcount in general allows a shortening of RCU critical sections since we can do an rcu_read_unlock after taking the refcount. This is particularly relevant if the anon_vma chains contain hundreds of entries. Issues: - Atomic overhead increases in situations where a new reference to the anon_vma has to be established or removed. Overhead also increases when a speculative reference is used (try_to_unmap, page_mkclean, page migration). There is also more frequent processor switching due to up_xxx letting waiting tasks run first. This results in, e.g., the Aim9 brk performance test going down by 10-15%.
Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1051,9 +1051,9 @@ struct mm_lock_data { struct rw_semaphore **i_mmap_sems; - spinlock_t **anon_vma_locks; + struct rw_semaphore **anon_vma_sems; unsigned long nr_i_mmap_sems; - unsigned long nr_anon_vma_locks; + unsigned long nr_anon_vma_sems; }; extern struct mm_lock_data *mm_lock(struct mm_struct * mm); extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data); diff --git a/include/linux/rmap.h b/include/linux/rmap.h --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -25,7 +25,8 @@ * pointing to this anon_vma once its vma list is empty. */ struct anon_vma { - spinlock_t lock; /* Serialize access to vma list */ + atomic_t refcount; /* vmas on the list */ + struct rw_semaphore sem;/* Serialize access to vma list */ struct list_head head; /* List of private "related" vmas */ }; @@ -43,18 +44,31 @@ kmem_cache_free(anon_vma_cachep, anon_vma); } +struct anon_vma *grab_anon_vma(struct page *page); + +static inline void get_anon_vma(struct anon_vma *anon_vma) +{ + atomic_inc(&anon_vma->refcount); +} + +static inline void put_anon_vma(struct anon_vma *anon_vma) +{ + if (atomic_dec_and_test(&anon_vma->refcount)) + anon_vma_free(anon_vma); +} + static inline void anon_vma_lock(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); } static inline void anon_vma_unlock(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); } /* diff --git a/mm/migrate.c b/mm/migrate.c --- a/mm/migrate.c +++ b/mm/migrate.c @@ -235,15 +235,16 @@ return; /* - * We hold the mmap_sem lock. So no need to call page_lock_anon_vma. + * We hold either the mmap_sem lock or a reference on the + * anon_vma. So no need to call page_lock_anon_vma. 
*/ anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON); - spin_lock(&anon_vma->lock); + down_read(&anon_vma->sem); list_for_each_entry(vma, &anon_vma->head, anon_vma_node) remove_migration_pte(vma, old, new); - spin_unlock(&anon_vma->lock); + up_read(&anon_vma->sem); } /* @@ -623,7 +624,7 @@ int rc = 0; int *result = NULL; struct page *newpage = get_new_page(page, private, &result); - int rcu_locked = 0; + struct anon_vma *anon_vma = NULL; int charge = 0; if (!newpage) @@ -647,16 +648,14 @@ } /* * By try_to_unmap(), page->mapcount goes down to 0 here. In this case, - * we cannot notice that anon_vma is freed while we migrates a page. + * we cannot notice that anon_vma is freed while we migrate a page. * This rcu_read_lock() delays freeing anon_vma pointer until the end * of migration. File cache pages are no problem because of page_lock() * File Caches may use write_page() or lock_page() in migration, then, * just care Anon page here. */ - if (PageAnon(page)) { - rcu_read_lock(); - rcu_locked = 1; - } + if (PageAnon(page)) + anon_vma = grab_anon_vma(page); /* * Corner case handling: @@ -674,10 +673,7 @@ if (!PageAnon(page) && PagePrivate(page)) { /* * Go direct to try_to_free_buffers() here because - * a) that's what try_to_release_page() would do anyway - * b) we may be under rcu_read_lock() here, so we can't - * use GFP_KERNEL which is what try_to_release_page() - * needs to be effective. 
+ * that's what try_to_release_page() would do anyway */ try_to_free_buffers(page); } @@ -698,8 +694,8 @@ } else if (charge) mem_cgroup_end_migration(newpage); rcu_unlock: - if (rcu_locked) - rcu_read_unlock(); + if (anon_vma) + put_anon_vma(anon_vma); unlock: diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -566,7 +566,7 @@ if (vma->anon_vma) anon_vma = vma->anon_vma; if (anon_vma) { - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); /* * Easily overlooked: when mprotect shifts the boundary, * make sure the expanding vma has anon_vma set if the @@ -620,7 +620,7 @@ } if (anon_vma) - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); if (mapping) up_write(&mapping->i_mmap_sem); @@ -2247,16 +2247,15 @@ struct mm_lock_data *mm_lock(struct mm_struct * mm) { struct vm_area_struct *vma; - struct rw_semaphore *i_mmap_sem_last; - spinlock_t *anon_vma_lock_last; - unsigned long nr_i_mmap_sems, nr_anon_vma_locks, i; + struct rw_semaphore *i_mmap_sem_last, *anon_vma_sem_last; + unsigned long nr_i_mmap_sems, nr_anon_vma_sems, i; struct mm_lock_data *data; int err; down_write(&mm->mmap_sem); err = -EINTR; - nr_i_mmap_sems = nr_anon_vma_locks = 0; + nr_i_mmap_sems = nr_anon_vma_sems = 0; for (vma = mm->mmap; vma; vma = vma->vm_next) { cond_resched(); if (unlikely(signal_pending(current))) @@ -2265,7 +2264,7 @@ if (vma->vm_file && vma->vm_file->f_mapping) nr_i_mmap_sems++; if (vma->anon_vma) - nr_anon_vma_locks++; + nr_anon_vma_sems++; } err = -ENOMEM; @@ -2281,13 +2280,13 @@ } else data->i_mmap_sems = NULL; - if (nr_anon_vma_locks) { - data->anon_vma_locks = vmalloc(nr_anon_vma_locks * - sizeof(spinlock_t)); - if (!data->anon_vma_locks) + if (nr_anon_vma_sems) { + data->anon_vma_sems = vmalloc(nr_anon_vma_sems * + sizeof(struct rw_semaphore)); + if (!data->anon_vma_sems) goto out_vfree; } else - data->anon_vma_locks = NULL; + data->anon_vma_sems = NULL; err = -EINTR; i_mmap_sem_last = NULL; @@ -2318,10 +2317,11 @@ } data->nr_i_mmap_sems = 
nr_i_mmap_sems; - anon_vma_lock_last = NULL; - nr_anon_vma_locks = 0; + anon_vma_sem_last = NULL; + nr_anon_vma_sems = 0; for (;;) { - spinlock_t *anon_vma_lock = (spinlock_t *) -1UL; + struct rw_semaphore *anon_vma_sem; + anon_vma_sem = (struct rw_semaphore *) -1UL; for (vma = mm->mmap; vma; vma = vma->vm_next) { cond_resched(); if (unlikely(signal_pending(current))) @@ -2329,28 +2329,28 @@ if (!vma->anon_vma) continue; - if ((unsigned long) anon_vma_lock > - (unsigned long) &vma->anon_vma->lock && - (unsigned long) &vma->anon_vma->lock > - (unsigned long) anon_vma_lock_last) - anon_vma_lock = &vma->anon_vma->lock; + if ((unsigned long) anon_vma_sem > + (unsigned long) &vma->anon_vma->sem && + (unsigned long) &vma->anon_vma->sem > + (unsigned long) anon_vma_sem_last) + anon_vma_sem = &vma->anon_vma->sem; } - if (anon_vma_lock == (spinlock_t *) -1UL) + if (anon_vma_sem == (struct rw_semaphore *) -1UL) break; - anon_vma_lock_last = anon_vma_lock; - data->anon_vma_locks[nr_anon_vma_locks++] = anon_vma_lock; + anon_vma_sem_last = anon_vma_sem; + data->anon_vma_sems[nr_anon_vma_sems++] = anon_vma_sem; } - data->nr_anon_vma_locks = nr_anon_vma_locks; + data->nr_anon_vma_sems = nr_anon_vma_sems; for (i = 0; i < nr_i_mmap_sems; i++) down_write(data->i_mmap_sems[i]); - for (i = 0; i < nr_anon_vma_locks; i++) - spin_lock(data->anon_vma_locks[i]); + for (i = 0; i < nr_anon_vma_sems; i++) + down_write(data->anon_vma_sems[i]); return data; out_vfree_both: - vfree(data->anon_vma_locks); + vfree(data->anon_vma_sems); out_vfree: vfree(data->i_mmap_sems); out_kfree: @@ -2366,12 +2366,12 @@ for (i = 0; i < data->nr_i_mmap_sems; i++) up_write(data->i_mmap_sems[i]); - for (i = 0; i < data->nr_anon_vma_locks; i++) - spin_unlock(data->anon_vma_locks[i]); + for (i = 0; i < data->nr_anon_vma_sems; i++) + up_write(data->anon_vma_sems[i]); up_write(&mm->mmap_sem); vfree(data->i_mmap_sems); - vfree(data->anon_vma_locks); + vfree(data->anon_vma_sems); kfree(data); } diff --git a/mm/rmap.c 
b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -69,7 +69,7 @@ if (anon_vma) { allocated = NULL; locked = anon_vma; - spin_lock(&locked->lock); + down_write(&locked->sem); } else { anon_vma = anon_vma_alloc(); if (unlikely(!anon_vma)) @@ -81,6 +81,7 @@ /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { + get_anon_vma(anon_vma); vma->anon_vma = anon_vma; list_add_tail(&vma->anon_vma_node, &anon_vma->head); allocated = NULL; @@ -88,7 +89,7 @@ spin_unlock(&mm->page_table_lock); if (locked) - spin_unlock(&locked->lock); + up_write(&locked->sem); if (unlikely(allocated)) anon_vma_free(allocated); } @@ -99,14 +100,17 @@ { BUG_ON(vma->anon_vma != next->anon_vma); list_del(&next->anon_vma_node); + put_anon_vma(vma->anon_vma); } void __anon_vma_link(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; - if (anon_vma) + if (anon_vma) { + get_anon_vma(anon_vma); list_add_tail(&vma->anon_vma_node, &anon_vma->head); + } } void anon_vma_link(struct vm_area_struct *vma) @@ -114,36 +118,32 @@ struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) { - spin_lock(&anon_vma->lock); + get_anon_vma(anon_vma); + down_write(&anon_vma->sem); list_add_tail(&vma->anon_vma_node, &anon_vma->head); - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); } } void anon_vma_unlink(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; - int empty; if (!anon_vma) return; - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); list_del(&vma->anon_vma_node); - - /* We must garbage collect the anon_vma if it's empty */ - empty = list_empty(&anon_vma->head); - spin_unlock(&anon_vma->lock); - - if (empty) - anon_vma_free(anon_vma); + up_write(&anon_vma->sem); + put_anon_vma(anon_vma); } static void anon_vma_ctor(struct kmem_cache *cachep, void *data) { struct anon_vma *anon_vma = data; - spin_lock_init(&anon_vma->lock); + init_rwsem(&anon_vma->sem); + atomic_set(&anon_vma->refcount, 0); 
INIT_LIST_HEAD(&anon_vma->head); } @@ -157,9 +157,9 @@ * Getting a lock on a stable anon_vma from a page off the LRU is * tricky: page_lock_anon_vma rely on RCU to guard against the races. */ -static struct anon_vma *page_lock_anon_vma(struct page *page) +struct anon_vma *grab_anon_vma(struct page *page) { - struct anon_vma *anon_vma; + struct anon_vma *anon_vma = NULL; unsigned long anon_mapping; rcu_read_lock(); @@ -170,17 +170,26 @@ goto out; anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); - spin_lock(&anon_vma->lock); - return anon_vma; + if (!atomic_inc_not_zero(&anon_vma->refcount)) + anon_vma = NULL; out: rcu_read_unlock(); - return NULL; + return anon_vma; +} + +static struct anon_vma *page_lock_anon_vma(struct page *page) +{ + struct anon_vma *anon_vma = grab_anon_vma(page); + + if (anon_vma) + down_read(&anon_vma->sem); + return anon_vma; } static void page_unlock_anon_vma(struct anon_vma *anon_vma) { - spin_unlock(&anon_vma->lock); - rcu_read_unlock(); + up_read(&anon_vma->sem); + put_anon_vma(anon_vma); } /* From andrea at qumranet.com Tue Apr 8 08:44:11 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:11 +0200 Subject: [ofa-general] [PATCH 8 of 9] XPMEM would have used sys_madvise() except that madvise_dontneed() In-Reply-To: Message-ID: <3b14e26a4e0491f00bb9.1207669451@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1207666972 -7200 # Node ID 3b14e26a4e0491f00bb989be04d8b7e0755ed2d7 # Parent a0c52e4b9b71e2627238b69c0a58905097973279 XPMEM would have used sys_madvise() except that madvise_dontneed() returns an -EINVAL if VM_PFNMAP is set, which is always true for the pages XPMEM imports from other partitions and is also true for uncached pages allocated locally via the mspec allocator. XPMEM needs zap_page_range() functionality for these types of pages as well as 'normal' pages. 
Signed-off-by: Dean Nelson diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -900,6 +900,7 @@ return unmap_vmas(vma, address, end, &nr_accounted, details); } +EXPORT_SYMBOL_GPL(zap_page_range); /* * Do a quick page-table lookup for a single page. From andrea at qumranet.com Tue Apr 8 08:44:12 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 08 Apr 2008 17:44:12 +0200 Subject: [ofa-general] [PATCH 9 of 9] This patch adds a lock ordering rule to avoid a potential deadlock when In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1207666972 -7200 # Node ID bd55023b22769ecb14b26c2347947f7d6d63bcea # Parent 3b14e26a4e0491f00bb989be04d8b7e0755ed2d7 This patch adds a lock ordering rule to avoid a potential deadlock when multiple mmap_sems need to be locked. Signed-off-by: Dean Nelson diff --git a/mm/filemap.c b/mm/filemap.c --- a/mm/filemap.c +++ b/mm/filemap.c @@ -79,6 +79,9 @@ * * ->i_mutex (generic_file_buffered_write) * ->mmap_sem (fault_in_pages_readable->do_page_fault) + * + * When taking multiple mmap_sems, one should lock the lowest-addressed + * one first proceeding on up to the highest-addressed one. * * ->i_mutex * ->i_alloc_sem (various) From holt at sgi.com Tue Apr 8 09:26:19 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 8 Apr 2008 11:26:19 -0500 Subject: [ofa-general] Re: [PATCH 2 of 9] Core of mmu notifiers In-Reply-To: References: Message-ID: <20080408162619.GP11364@sgi.com> This one does not build on ia64. 
I get the following: [holt at attica mmu_v12_xpmem_v003_v1]$ make compressed CHK include/linux/version.h CHK include/linux/utsrelease.h CALL scripts/checksyscalls.sh CHK include/linux/compile.h CC mm/mmu_notifier.o In file included from include/linux/mmu_notifier.h:6, from mm/mmu_notifier.c:12: include/linux/mm_types.h:200: error: expected specifier-qualifier-list before ‘cpumask_t’ In file included from mm/mmu_notifier.c:12: include/linux/mmu_notifier.h: In function ‘mm_has_notifiers’: include/linux/mmu_notifier.h:62: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ include/linux/mmu_notifier.h: In function ‘mmu_notifier_mm_init’: include/linux/mmu_notifier.h:117: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ In file included from include/asm/pgtable.h:155, from include/linux/mm.h:39, from mm/mmu_notifier.c:14: include/asm/mmu_context.h: In function ‘get_mmu_context’: include/asm/mmu_context.h:81: error: ‘struct mm_struct’ has no member named ‘context’ include/asm/mmu_context.h:88: error: ‘struct mm_struct’ has no member named ‘context’ include/asm/mmu_context.h:90: error: ‘struct mm_struct’ has no member named ‘cpu_vm_mask’ include/asm/mmu_context.h:99: error: ‘struct mm_struct’ has no member named ‘context’ include/asm/mmu_context.h: In function ‘init_new_context’: include/asm/mmu_context.h:120: error: ‘struct mm_struct’ has no member named ‘context’ include/asm/mmu_context.h: In function ‘activate_context’: include/asm/mmu_context.h:173: error: ‘struct mm_struct’ has no member named ‘cpu_vm_mask’ include/asm/mmu_context.h:174: error: ‘struct mm_struct’ has no member named ‘cpu_vm_mask’ include/asm/mmu_context.h:180: error: ‘struct mm_struct’ has no member named ‘context’ mm/mmu_notifier.c: In function ‘__mmu_notifier_release’: mm/mmu_notifier.c:25: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ mm/mmu_notifier.c:26: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ mm/mmu_notifier.c: In 
function ‘__mmu_notifier_clear_flush_young’: mm/mmu_notifier.c:47: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ mm/mmu_notifier.c: In function ‘__mmu_notifier_invalidate_page’: mm/mmu_notifier.c:61: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ mm/mmu_notifier.c: In function ‘__mmu_notifier_invalidate_range_start’: mm/mmu_notifier.c:73: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ mm/mmu_notifier.c: In function ‘__mmu_notifier_invalidate_range_end’: mm/mmu_notifier.c:85: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ mm/mmu_notifier.c: In function ‘mmu_notifier_register’: mm/mmu_notifier.c:102: error: ‘struct mm_struct’ has no member named ‘mmu_notifier_list’ make[1]: *** [mm/mmu_notifier.o] Error 1 make: *** [mm] Error 2 From andrea at qumranet.com Tue Apr 8 10:05:25 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 8 Apr 2008 19:05:25 +0200 Subject: [ofa-general] Re: [PATCH 2 of 9] Core of mmu notifiers In-Reply-To: <20080408162619.GP11364@sgi.com> References: <20080408162619.GP11364@sgi.com> Message-ID: <20080408170525.GN10133@duo.random> On Tue, Apr 08, 2008 at 11:26:19AM -0500, Robin Holt wrote: > This one does not build on ia64. I get the following: I think it's a common code compilation bug not related to my patch. Can you test this? 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -10,6 +10,7 @@ #include #include #include +#include <linux/cpumask.h> #include #include From kliteyn at mellanox.co.il Tue Apr 8 13:22:38 2008 From: kliteyn at mellanox.co.il (Yevgeny Kliteynik) Date: Tue, 08 Apr 2008 23:22:38 +0300 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <20080408183113.GA18308@sashak.voltaire.com> References: <200804041147.27565.bs@q-leap.de> <20080408014406.GA16864@sashak.voltaire.com> <200804081135.35846.bs@q-leap.de> <20080408183113.GA18308@sashak.voltaire.com> Message-ID: <47FBD40E.70407@mellanox.co.il> Sasha Khapyorsky wrote: > Hi Bernd, > > [adding Yevgeny..] > > On 11:35 Tue 08 Apr , Bernd Schubert wrote: > >> On Tuesday 08 April 2008 03:44:06 Sasha Khapyorsky wrote: >> >>> Hi Bernd, >>> >>> On 11:47 Fri 04 Apr , Bernd Schubert wrote: >>> >>>> opensm-3.2.1 logs some error messages like this: >>>> >>>> Apr 04 00:00:08 325114 [4580A960] 0x01 -> >>>> __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for node >>>> 0 >>>> x000b8cffff002ba2(SW_pfs1_leaf4) port 13. Adding to light sweep sampling >>>> list Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path Dump of 3 >>>> hop path: Path = 0,1,14,13 >>>> >>>> >>>> From ibnetdiscover output I see port13 of this switch is a >>>> switch-interconnect (sorry, I don't know what the correct name/identifier >>>> for switches within switches): >>>> >>>> [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid >>>> 263 4xSDR >>>> >>> It is possible that port was DOWN during first subnet discovery. Finally >>> everything should be initialized after those messages. Isn't it the case >>> here? >>> >> I think everything is initialized, but I don't think the port was down during >> first subnet discovery, since the port is on a spine board (I called >> it 'inter') to another switch system. We also never added any leaves to the >> switches.
>> > > It is an interesting phenomenon then. > > Yevgeny, are you aware of such an issue with Flextronics switches? > > I've seen it before. It means that during discovery some switch has answered the NodeInfo query, but then when OpenSM started to query for PortInfo for each port of this switch, the switch didn't answer for some (or all) ports. I think that this might happen if a switch has just been "plugged in", and internal switches are doing autonegotiation - they are bringing ports up and down when determining whether a link is SDR or DDR. In any case, this "phenomenon" should disappear after a couple dozen seconds, once the autonegotiation phase is over. Bernd, am I close? -- Yevgeny > Sasha > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From clameter at sgi.com Tue Apr 8 13:23:33 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 8 Apr 2008 13:23:33 -0700 (PDT) Subject: [ofa-general] Re: [patch 02/10] emm: notifier logic In-Reply-To: <20080407071330.GH9309@duo.random> References: <20080404223048.374852899@sgi.com> <20080404223131.469710551@sgi.com> <20080405005759.GH14784@duo.random> <20080407060602.GE9309@duo.random> <20080407071330.GH9309@duo.random> Message-ID: It may also be useful to allow invalidate_start() to fail in some contexts (try_to_unmap f.e., maybe if a certain flag is passed). This may allow the device to get out of tight situations (pending I/O f.e., or timing out when there is no response from network communications). But then that complicates the API.
From Frank.Leers at Sun.COM Tue Apr 8 13:58:21 2008 From: Frank.Leers at Sun.COM (Frank Leers) Date: Tue, 08 Apr 2008 13:58:21 -0700 Subject: [ofa-general] install.sh question Message-ID: <1207688301.1661.86.camel@localhost> Hi all, I'd like to be able to use the provided install.sh from cluster nodes to install from a build which is shared over NFS, while utilizing an ofed_net.conf. The Install Guide talks about this, but I must be missing something in the details. Is there a way to skip the check of whether a build needs to be (re)done and simply install the RPMs that were created during the original build, then create the ifcfg-ib? devices based on the template file passed in with -net ? I prefer not to have kernel sources, compiler, etc. on these compute nodes, nor should I have to recompile for each homogeneous node. thanks, -frank From avi at qumranet.com Tue Apr 8 14:46:49 2008 From: avi at qumranet.com (Avi Kivity) Date: Wed, 09 Apr 2008 00:46:49 +0300 Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: References: Message-ID: <47FBE7C9.9000701@qumranet.com> Andrea Arcangeli wrote: > Note that mmu_notifier_unregister may also fail with -EINTR if there are > signals pending or the system runs out of vmalloc space or physical memory; > only exit_mmap guarantees that any kernel module can be unloaded in the presence > of an OOM condition. > > That's unusual. What happens to the notifier? Suppose I destroy a vm without exiting the process, what happens if it fires? -- Any sufficiently difficult bug is indistinguishable from a feature. From andrea at qumranet.com Tue Apr 8 15:06:27 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 9 Apr 2008 00:06:27 +0200 Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: <47FBE7C9.9000701@qumranet.com> References: <47FBE7C9.9000701@qumranet.com> Message-ID: <20080408220627.GP10133@duo.random> On Wed, Apr 09, 2008 at 12:46:49AM +0300, Avi Kivity wrote: > That's unusual.
What happens to the notifier? Suppose I destroy a vm Yes, it's quite unusual. > without exiting the process, what happens if it fires? The mmu notifier ops should stop doing anything (if there are no memslots they will be no-ops), or the ops can be replaced atomically with null pointers. The important thing is that the module can't go away until ->release is invoked or until mmu_notifier_unregister has returned 0. Previously there was no mmu_notifier_unregister, so adding it can't be a regression compared to #v11, even if it can fail and you may have to retry later after returning to userland. Retrying from userland is always safe in OOM-kill terms; only looping inside the kernel isn't safe, as do_exit has no chance to run. From sashak at voltaire.com Tue Apr 8 18:10:21 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 01:10:21 +0000 Subject: [ofa-general] [RFC][PATCH 0/4] opensm: using conventional config file Message-ID: <1207703425-19039-1-git-send-email-sashak@voltaire.com> Hi, This is an attempt to bring some order to OpenSM configuration. OpenSM will now use a conventional config file ($sysconfdir/opensm/opensm.conf), similar to other programs that have configuration, instead of the option cache file. The config file used by some startup scripts should go away. Option '-c' is preserved - it can be useful for config file template generation, but OpenSM will not try to read the option cache file. This is still an RFC. In addition, we will need to update the scripts and man pages. Any feedback? Thoughts?
Sasha From sashak at voltaire.com Tue Apr 8 18:10:22 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 01:10:22 +0000 Subject: [ofa-general] [PATCH 1/4] opensm: pass file name as parameter to config parser funcs In-Reply-To: <1207703425-19039-1-git-send-email-sashak@voltaire.com> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> Message-ID: <1207703425-19039-2-git-send-email-sashak@voltaire.com> Functions osm_subn_parse_conf_file() and osm_subn_write_conf_file() will get config file name as parameter. Also it is stored as part of config options and used by osm_subn_rescan_conf_files(). Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_subnet.h | 10 +++++- opensm/opensm/main.c | 13 +++++++- opensm/opensm/osm_subnet.c | 53 ++++++++--------------------------- 3 files changed, 31 insertions(+), 45 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index b1dd659..98afbd4 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -205,6 +205,7 @@ typedef struct _osm_qos_options_t { * SYNOPSIS */ typedef struct _osm_subn_opt { + char *config_file; ib_net64_t guid; ib_net64_t m_key; ib_net64_t sm_key; @@ -289,6 +290,9 @@ typedef struct _osm_subn_opt { /* * FIELDS * +* config_file +* The name of the config file. +* * guid * The port guid that the SM is binding to. 
* @@ -1057,7 +1061,8 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt); * * SYNOPSIS */ -ib_api_status_t osm_subn_parse_conf_file(IN osm_subn_opt_t * const p_opt); +ib_api_status_t osm_subn_parse_conf_file(char *conf_file, + IN osm_subn_opt_t * const p_opt); /* * PARAMETERS * @@ -1109,7 +1114,8 @@ ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn); * * SYNOPSIS */ -ib_api_status_t osm_subn_write_conf_file(IN osm_subn_opt_t * const p_opt); +ib_api_status_t osm_subn_write_conf_file(char *file_name, + IN osm_subn_opt_t * const p_opt); /* * PARAMETERS * diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index fb41d50..91ee143 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -589,6 +589,8 @@ int main(int argc, char *argv[]) { osm_opensm_t osm; osm_subn_opt_t opt; + char conf_file[256]; + char *cache_dir; ib_net64_t sm_key = 0; ib_api_status_t status; uint32_t temp, dbg_lvl; @@ -674,7 +676,14 @@ int main(int argc, char *argv[]) printf("%s\n", OSM_VERSION); osm_subn_set_default_opt(&opt); - if (osm_subn_parse_conf_file(&opt) != IB_SUCCESS) + + /* try to open the options file from the cache dir */ + cache_dir = getenv("OSM_CACHE_DIR"); + if (!cache_dir || !(*cache_dir)) + cache_dir = OSM_DEFAULT_CACHE_DIR; + snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); + + if (osm_subn_parse_conf_file(conf_file, &opt) != IB_SUCCESS) printf("\nosm_subn_parse_conf_file failed!\n"); printf("Command Line Arguments:\n"); @@ -1013,7 +1022,7 @@ int main(int argc, char *argv[]) opt.guid = get_port_guid(&osm, opt.guid); if (cache_options == TRUE - && osm_subn_write_conf_file(&opt) != IB_SUCCESS) + && osm_subn_write_conf_file(conf_file, &opt) != IB_SUCCESS) printf("\nosm_subn_write_conf_file failed!\n"); status = osm_opensm_bind(&osm, opt.guid); diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index 47d735f..f3f4c52 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -71,14 
+71,6 @@ #include #include -#if defined(PATH_MAX) -#define OSM_PATH_MAX (PATH_MAX + 1) -#elif defined (_POSIX_PATH_MAX) -#define OSM_PATH_MAX (_POSIX_PATH_MAX + 1) -#else -#define OSM_PATH_MAX 256 -#endif - /********************************************************************** **********************************************************************/ void osm_subn_construct(IN osm_subn_t * const p_subn) @@ -787,26 +779,20 @@ osm_parse_prefix_routes_file(IN osm_subn_t * const p_subn) **********************************************************************/ ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) { - char *p_cache_dir = getenv("OSM_CACHE_DIR"); - char file_name[OSM_PATH_MAX]; FILE *opts_file; char line[1024]; char *p_key, *p_val, *p_last; - /* try to open the options file from the cache dir */ - if (!p_cache_dir || !(*p_cache_dir)) - p_cache_dir = OSM_DEFAULT_CACHE_DIR; + if (!p_subn->opt.config_file) + return 0; - strcpy(file_name, p_cache_dir); - strcat(file_name, "/opensm.opts"); - - opts_file = fopen(file_name, "r"); + opts_file = fopen(p_subn->opt.config_file, "r"); if (!opts_file) { if (errno == ENOENT) return IB_SUCCESS; OSM_LOG(&p_subn->p_osm->log, OSM_LOG_ERROR, "cannot open file \'%s\': %s\n", - file_name, strerror(errno)); + p_subn->opt.config_file, strerror(errno)); return IB_ERROR; } @@ -1142,21 +1128,13 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) /********************************************************************** **********************************************************************/ -ib_api_status_t osm_subn_parse_conf_file(IN osm_subn_opt_t * const p_opts) +ib_api_status_t osm_subn_parse_conf_file(char *file_name, + IN osm_subn_opt_t * const p_opts) { - char *p_cache_dir = getenv("OSM_CACHE_DIR"); - char file_name[OSM_PATH_MAX]; - FILE *opts_file; char line[1024]; + FILE *opts_file; char *p_key, *p_val, *p_last; - /* try to open the options file from the cache dir */ - if (!p_cache_dir || 
!(*p_cache_dir)) - p_cache_dir = OSM_DEFAULT_CACHE_DIR; - - strcpy(file_name, p_cache_dir); - strcat(file_name, "/opensm.opts"); - opts_file = fopen(file_name, "r"); if (!opts_file) { if (errno == ENOENT) @@ -1166,10 +1144,11 @@ ib_api_status_t osm_subn_parse_conf_file(IN osm_subn_opt_t * const p_opts) return IB_ERROR; } - sprintf(line, " Reading Cached Option File: %s\n", file_name); - printf(line); + printf(" Reading Cached Option File: %s\n", file_name); cl_log_event("OpenSM", CL_LOG_INFO, line, NULL, 0); + p_opts->config_file = file_name; + while (fgets(line, 1023, opts_file) != NULL) { /* get the first token */ p_key = strtok_r(line, " \t\n", &p_last); @@ -1405,19 +1384,11 @@ ib_api_status_t osm_subn_parse_conf_file(IN osm_subn_opt_t * const p_opts) /********************************************************************** **********************************************************************/ -ib_api_status_t osm_subn_write_conf_file(IN osm_subn_opt_t * const p_opts) +ib_api_status_t osm_subn_write_conf_file(char *file_name, + IN osm_subn_opt_t * const p_opts) { - char *p_cache_dir = getenv("OSM_CACHE_DIR"); - char file_name[OSM_PATH_MAX]; FILE *opts_file; - /* try to open the options file from the cache dir */ - if (!p_cache_dir || !(*p_cache_dir)) - p_cache_dir = OSM_DEFAULT_CACHE_DIR; - - strcpy(file_name, p_cache_dir); - strcat(file_name, "/opensm.opts"); - opts_file = fopen(file_name, "w"); if (!opts_file) { printf("cannot open file \'%s\' for writing: %s\n", -- 1.5.4.1.122.gaa8d From sashak at voltaire.com Tue Apr 8 18:10:23 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 01:10:23 +0000 Subject: [ofa-general] [PATCH 2/4] opensm: config file functions return int In-Reply-To: <1207703425-19039-1-git-send-email-sashak@voltaire.com> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> Message-ID: <1207703425-19039-3-git-send-email-sashak@voltaire.com> config file handling functions (parse and write) will return integer 
values instead of ib_api_status_t (it does nothing with 'ib') - when a failure is not existing config file a positive value will be returned to a caller, other errors will be indicated by a negative return value. Signed-off-by: Sasha Khapyorsky --- opensm/include/opensm/osm_subnet.h | 35 +++++++++++------------------------ opensm/opensm/main.c | 4 ++-- opensm/opensm/osm_state_mgr.c | 3 +-- opensm/opensm/osm_subnet.c | 24 +++++++++++------------- 4 files changed, 25 insertions(+), 41 deletions(-) diff --git a/opensm/include/opensm/osm_subnet.h b/opensm/include/opensm/osm_subnet.h index 98afbd4..5b6cef0 100644 --- a/opensm/include/opensm/osm_subnet.h +++ b/opensm/include/opensm/osm_subnet.h @@ -1061,8 +1061,7 @@ void osm_subn_set_default_opt(IN osm_subn_opt_t * const p_opt); * * SYNOPSIS */ -ib_api_status_t osm_subn_parse_conf_file(char *conf_file, - IN osm_subn_opt_t * const p_opt); +int osm_subn_parse_conf_file(char *conf_file, osm_subn_opt_t * const p_opt); /* * PARAMETERS * @@ -1070,14 +1069,8 @@ ib_api_status_t osm_subn_parse_conf_file(char *conf_file, * [in] Pointer to the subnet options structure. * * RETURN VALUES -* IB_SUCCESS, IB_ERROR -* -* NOTES -* Assumes the conf file is part of the cache dir which defaults to -* OSM_DEFAULT_CACHE_DIR or OSM_CACHE_DIR the name is opensm.opts -* -* SEE ALSO -* Subnet object, osm_subn_construct, osm_subn_destroy +* 0 on success, positive value if file doesn't exist, +* negative value otherwise *********/ /****f* OpenSM: Subnet/osm_subn_rescan_conf_files @@ -1090,7 +1083,7 @@ ib_api_status_t osm_subn_parse_conf_file(char *conf_file, * * SYNOPSIS */ -ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn); +int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn); /* * PARAMETERS * @@ -1098,10 +1091,8 @@ ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn); * [in] Pointer to the subnet structure. 
* * RETURN VALUES -* IB_SUCCESS, IB_ERROR -* -* NOTES -* This uses the same file as osm_subn_parse_conf_files() +* 0 on success, positive value if file doesn't exist, +* negative value otherwise * *********/ @@ -1110,12 +1101,11 @@ ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn); * osm_subn_write_conf_file * * DESCRIPTION -* Write the configuration file into the cache +* Write the configuration file into the cache * * SYNOPSIS */ -ib_api_status_t osm_subn_write_conf_file(char *file_name, - IN osm_subn_opt_t * const p_opt); +int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t * const p_opt); /* * PARAMETERS * @@ -1123,14 +1113,11 @@ ib_api_status_t osm_subn_write_conf_file(char *file_name, * [in] Pointer to the subnet options structure. * * RETURN VALUES -* IB_SUCCESS, IB_ERROR +* 0 on success, negative value otherwise * * NOTES -* Assumes the conf file is part of the cache dir which defaults to -* OSM_DEFAULT_CACHE_DIR or OSM_CACHE_DIR the name is opensm.opts -* -* SEE ALSO -* Subnet object, osm_subn_construct, osm_subn_destroy +* Assumes the conf file is part of the cache dir which defaults to +* OSM_DEFAULT_CACHE_DIR or OSM_CACHE_DIR the name is opensm.opts *********/ END_C_DECLS diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index 91ee143..da8047e 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -683,7 +683,7 @@ int main(int argc, char *argv[]) cache_dir = OSM_DEFAULT_CACHE_DIR; snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); - if (osm_subn_parse_conf_file(conf_file, &opt) != IB_SUCCESS) + if (osm_subn_parse_conf_file(conf_file, &opt) < 0) printf("\nosm_subn_parse_conf_file failed!\n"); printf("Command Line Arguments:\n"); @@ -1022,7 +1022,7 @@ int main(int argc, char *argv[]) opt.guid = get_port_guid(&osm, opt.guid); if (cache_options == TRUE - && osm_subn_write_conf_file(conf_file, &opt) != IB_SUCCESS) + && osm_subn_write_conf_file(conf_file, &opt)) 
printf("\nosm_subn_write_conf_file failed!\n"); status = osm_opensm_bind(&osm, opt.guid); diff --git a/opensm/opensm/osm_state_mgr.c b/opensm/opensm/osm_state_mgr.c index 9b03314..8f1e086 100644 --- a/opensm/opensm/osm_state_mgr.c +++ b/opensm/opensm/osm_state_mgr.c @@ -1039,8 +1039,7 @@ _repeat_discovery: sm->p_subn->subnet_initialization_error = FALSE; /* rescan configuration updates */ - status = osm_subn_rescan_conf_files(sm->p_subn); - if (status != IB_SUCCESS) + if (osm_subn_rescan_conf_files(sm->p_subn) < 0) OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 331A: " "osm_subn_rescan_conf_file failed\n"); diff --git a/opensm/opensm/osm_subnet.c b/opensm/opensm/osm_subnet.c index f3f4c52..29a247a 100644 --- a/opensm/opensm/osm_subnet.c +++ b/opensm/opensm/osm_subnet.c @@ -777,7 +777,7 @@ osm_parse_prefix_routes_file(IN osm_subn_t * const p_subn) /********************************************************************** **********************************************************************/ -ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) +int osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) { FILE *opts_file; char line[1024]; @@ -789,11 +789,11 @@ ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) opts_file = fopen(p_subn->opt.config_file, "r"); if (!opts_file) { if (errno == ENOENT) - return IB_SUCCESS; + return 1; OSM_LOG(&p_subn->p_osm->log, OSM_LOG_ERROR, "cannot open file \'%s\': %s\n", p_subn->opt.config_file, strerror(errno)); - return IB_ERROR; + return -1; } while (fgets(line, 1023, opts_file) != NULL) { @@ -828,7 +828,7 @@ ib_api_status_t osm_subn_rescan_conf_files(IN osm_subn_t * const p_subn) osm_parse_prefix_routes_file(p_subn); - return IB_SUCCESS; + return 0; } /********************************************************************** @@ -1128,8 +1128,7 @@ static void subn_verify_conf_file(IN osm_subn_opt_t * const p_opts) /********************************************************************** 
**********************************************************************/ -ib_api_status_t osm_subn_parse_conf_file(char *file_name, - IN osm_subn_opt_t * const p_opts) +int osm_subn_parse_conf_file(char *file_name, osm_subn_opt_t * const p_opts) { char line[1024]; FILE *opts_file; @@ -1138,10 +1137,10 @@ ib_api_status_t osm_subn_parse_conf_file(char *file_name, opts_file = fopen(file_name, "r"); if (!opts_file) { if (errno == ENOENT) - return IB_SUCCESS; + return 1; printf("cannot open file \'%s\': %s\n", file_name, strerror(errno)); - return IB_ERROR; + return -1; } printf(" Reading Cached Option File: %s\n", file_name); @@ -1379,13 +1378,12 @@ ib_api_status_t osm_subn_parse_conf_file(char *file_name, subn_verify_conf_file(p_opts); - return IB_SUCCESS; + return 0; } /********************************************************************** **********************************************************************/ -ib_api_status_t osm_subn_write_conf_file(char *file_name, - IN osm_subn_opt_t * const p_opts) +int osm_subn_write_conf_file(char *file_name, IN osm_subn_opt_t *const p_opts) { FILE *opts_file; @@ -1393,7 +1391,7 @@ ib_api_status_t osm_subn_write_conf_file(char *file_name, if (!opts_file) { printf("cannot open file \'%s\' for writing: %s\n", file_name, strerror(errno)); - return IB_ERROR; + return -1; } fprintf(opts_file, @@ -1715,5 +1713,5 @@ ib_api_status_t osm_subn_write_conf_file(char *file_name, fclose(opts_file); - return IB_SUCCESS; + return 0; } -- 1.5.4.1.122.gaa8d From sashak at voltaire.com Tue Apr 8 18:10:24 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 01:10:24 +0000 Subject: [ofa-general] [PATCH 3/4] opensm: option to specify config file In-Reply-To: <1207703425-19039-1-git-send-email-sashak@voltaire.com> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> Message-ID: <1207703425-19039-4-git-send-email-sashak@voltaire.com> There is a new command line option '--config ' (or '-F'). 
When specified OpenSM will read initial configuration from this file (the format is same as opensm.opts) and not from /var/cache/opensm/opensm.opts file. Signed-off-by: Sasha Khapyorsky --- opensm/opensm/main.c | 21 ++++++++++++++++++++- 1 files changed, 20 insertions(+), 1 deletions(-) diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index da8047e..e39037d 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -130,6 +130,11 @@ static void show_usage(void) printf("\n------- OpenSM - Usage and options ----------------------\n"); printf("Usage: opensm [options]\n"); printf("Options:\n"); + printf("-F , --config \n" + " The name of the OpenSM config file. It has a same format\n" + " as opensm.opts option cache file. When not specified\n" + " $OSM_CACHE_DIR/opensm.opts (or /var/cache/opensm/opensm.opts)\n" + " will be used (if exists).\n\n"); printf("-c\n" "--cache-options\n" " Cache the given command line options into the file\n" @@ -600,8 +605,9 @@ int main(int argc, char *argv[]) boolean_t cache_options = FALSE; char *ignore_guids_file_name = NULL; uint32_t val; + unsigned config_file_done = 0; const char *const short_option = - "i:f:ed:g:l:L:s:t:a:u:m:R:zM:U:S:P:Y:NBIQvVhorcyxp:n:q:k:C:"; + "F:i:f:ed:g:l:L:s:t:a:u:m:R:zM:U:S:P:Y:NBIQvVhorcyxp:n:q:k:C:"; /* In the array below, the 2nd parameter specifies the number @@ -611,6 +617,7 @@ int main(int argc, char *argv[]) 2: optional */ const struct option long_option[] = { + {"config", 1, NULL, 'F'}, {"debug", 1, NULL, 'd'}, {"guid", 1, NULL, 'g'}, {"ignore_guids", 1, NULL, 'i'}, @@ -691,6 +698,18 @@ int main(int argc, char *argv[]) next_option = getopt_long_only(argc, argv, short_option, long_option, NULL); switch (next_option) { + case 'F': + if (config_file_done) + break; + printf("Reloading config from `%s`:\n", optarg); + if (osm_subn_parse_conf_file(optarg, &opt)) { + printf("cannot parse config file.\n"); + exit(1); + } + printf("Rescaning command line:\n"); + config_file_done = 1; + optind = 0; 
+ break; case 'o': /* Run once option. -- 1.5.4.1.122.gaa8d From sashak at voltaire.com Tue Apr 8 18:10:25 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 01:10:25 +0000 Subject: [ofa-general] [PATCH 4/4] opensm: use OSM_DEFAULT_CONFIG_FILE as config file In-Reply-To: <1207703425-19039-1-git-send-email-sashak@voltaire.com> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> Message-ID: <1207703425-19039-5-git-send-email-sashak@voltaire.com> Use configurable OSM_DEFAULT_CONFIG_FILE as default (when '-F' option is not specified) OpenSM config file. Default value is $sysconfdir/opensm/opensm.conf. Signed-off-by: Sasha Khapyorsky --- opensm/configure.in | 20 ++++++++++++++++++++ opensm/include/opensm/osm_base.h | 21 +++++++++++++++++++++ opensm/opensm/main.c | 25 +++++++++++-------------- 3 files changed, 52 insertions(+), 14 deletions(-) diff --git a/opensm/configure.in b/opensm/configure.in index a527c91..858eb60 100644 --- a/opensm/configure.in +++ b/opensm/configure.in @@ -106,6 +106,26 @@ AC_DEFINE_UNQUOTED(OPENSM_CONFIG_DIR, [Define OpenSM config directory]) AC_SUBST(OPENSM_CONFIG_DIR) +dnl Check for a different default OpenSm config file +OPENSM_CONFIG_FILE=opensm.conf +AC_MSG_CHECKING(for --with-opensm-conf-file ) +AC_ARG_WITH(opensm-conf-file, + AC_HELP_STRING([--with-opensm-conf-file=file], + [define a default OpenSM config file (default opensm.conf)]), + [ case "$withval" in + no) + ;; + *) + OPENSM_CONFIG_FILE=$withval + ;; + esac ] +) +AC_MSG_RESULT(${OPENSM_CONFIG_FILE}) +AC_DEFINE_UNQUOTED(HAVE_DEFAULT_OPENSM_CONFIG_FILE, + ["$CONF_DIR/$OPENSM_CONFIG_FILE"], + [Define a default OpenSM config file]) +AC_SUBST(OPENSM_CONFIG_FILE) + dnl Check for a different default node name map file NODENAMEMAPFILE=ib-node-name-map AC_MSG_CHECKING(for --with-node-name-map ) diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h index 62d472e..1bd993e 100644 --- a/opensm/include/opensm/osm_base.h +++ 
b/opensm/include/opensm/osm_base.h @@ -213,6 +213,27 @@ BEGIN_C_DECLS #define OSM_DEFAULT_LOG_FILE "/var/log/opensm.log" #endif /***********/ + +/****d* OpenSM: Base/OSM_DEFAULT_CONFIG_FILE +* NAME +* OSM_DEFAULT_CONFIG_FILE +* +* DESCRIPTION +* Specifies the default OpenSM config file name +* +* SYNOPSIS +*/ +#ifdef __WIN__ +#define OSM_DEFAULT_CONFIG_FILE strcat(GetOsmCachePath(), "opensm.conf") +#elif defined(HAVE_DEFAULT_OPENSM_CONFIG_FILE) +#define OSM_DEFAULT_CONFIG_FILE HAVE_DEFAULT_OPENSM_CONFIG_FILE +#elif define (OPENSM_CONFIG_DIR) +#define OSM_DEFAULT_OPENSM_CONFIG_FILE OPENSM_COFNIG_DIR "/opensm.conf" +#else +#define OSM_DEFAULT_OPENSM_CONFIG_FILE "/etc/opensm/opensm.conf" +#endif /* __WIN__ */ +/***********/ + /****d* OpenSM: Base/OSM_DEFAULT_PARTITION_CONFIG_FILE * NAME * OSM_DEFAULT_PARTITION_CONFIG_FILE diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index e39037d..0576dcc 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -133,8 +133,7 @@ static void show_usage(void) printf("-F , --config \n" " The name of the OpenSM config file. It has a same format\n" " as opensm.opts option cache file. 
When not specified\n" - " $OSM_CACHE_DIR/opensm.opts (or /var/cache/opensm/opensm.opts)\n" - " will be used (if exists).\n\n"); + " " OSM_DEFAULT_CONFIG_FILE " will be used (if exists).\n\n"); printf("-c\n" "--cache-options\n" " Cache the given command line options into the file\n" @@ -594,8 +593,6 @@ int main(int argc, char *argv[]) { osm_opensm_t osm; osm_subn_opt_t opt; - char conf_file[256]; - char *cache_dir; ib_net64_t sm_key = 0; ib_api_status_t status; uint32_t temp, dbg_lvl; @@ -684,13 +681,7 @@ int main(int argc, char *argv[]) osm_subn_set_default_opt(&opt); - /* try to open the options file from the cache dir */ - cache_dir = getenv("OSM_CACHE_DIR"); - if (!cache_dir || !(*cache_dir)) - cache_dir = OSM_DEFAULT_CACHE_DIR; - snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); - - if (osm_subn_parse_conf_file(conf_file, &opt) < 0) + if (osm_subn_parse_conf_file(OSM_DEFAULT_CONFIG_FILE, &opt) < 0) printf("\nosm_subn_parse_conf_file failed!\n"); printf("Command Line Arguments:\n"); @@ -1040,9 +1031,15 @@ int main(int argc, char *argv[]) if (opt.guid == 0 || cl_hton64(opt.guid) == CL_HTON64(INVALID_GUID)) opt.guid = get_port_guid(&osm, opt.guid); - if (cache_options == TRUE - && osm_subn_write_conf_file(conf_file, &opt)) - printf("\nosm_subn_write_conf_file failed!\n"); + if (cache_options == TRUE) { + char conf_file[256]; + char *cache_dir = getenv("OSM_CACHE_DIR"); + if (!cache_dir || !(*cache_dir)) + cache_dir = OSM_DEFAULT_CACHE_DIR; + snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); + if (osm_subn_write_conf_file(conf_file, &opt)) + printf("\nosm_subn_write_conf_file failed!\n"); + } status = osm_opensm_bind(&osm, opt.guid); if (status != IB_SUCCESS) { -- 1.5.4.1.122.gaa8d From chu11 at llnl.gov Tue Apr 8 16:35:08 2008 From: chu11 at llnl.gov (Al Chu) Date: Tue, 08 Apr 2008 16:35:08 -0700 Subject: [ofa-general] Re: [PATCH 4/4] opensm: use OSM_DEFAULT_CONFIG_FILE as config file In-Reply-To: 
<1207703425-19039-5-git-send-email-sashak@voltaire.com> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> <1207703425-19039-5-git-send-email-sashak@voltaire.com> Message-ID: <1207697708.7695.47.camel@cardanus.llnl.gov> Hey Sasha, Just saw two typos, inlined below. Al On Wed, 2008-04-09 at 01:10 +0000, Sasha Khapyorsky wrote: > Use configurable OSM_DEFAULT_CONFIG_FILE as default (when '-F' option is > not specified) OpenSM config file. Default value is > $sysconfdir/opensm/opensm.conf. > > Signed-off-by: Sasha Khapyorsky > --- > opensm/configure.in | 20 ++++++++++++++++++++ > opensm/include/opensm/osm_base.h | 21 +++++++++++++++++++++ > opensm/opensm/main.c | 25 +++++++++++-------------- > 3 files changed, 52 insertions(+), 14 deletions(-) > > diff --git a/opensm/configure.in b/opensm/configure.in > index a527c91..858eb60 100644 > --- a/opensm/configure.in > +++ b/opensm/configure.in > @@ -106,6 +106,26 @@ AC_DEFINE_UNQUOTED(OPENSM_CONFIG_DIR, > [Define OpenSM config directory]) > AC_SUBST(OPENSM_CONFIG_DIR) > > +dnl Check for a different default OpenSm config file > +OPENSM_CONFIG_FILE=opensm.conf > +AC_MSG_CHECKING(for --with-opensm-conf-file ) > +AC_ARG_WITH(opensm-conf-file, > + AC_HELP_STRING([--with-opensm-conf-file=file], > + [define a default OpenSM config file (default opensm.conf)]), > + [ case "$withval" in > + no) > + ;; > + *) > + OPENSM_CONFIG_FILE=$withval > + ;; > + esac ] > +) > +AC_MSG_RESULT(${OPENSM_CONFIG_FILE}) > +AC_DEFINE_UNQUOTED(HAVE_DEFAULT_OPENSM_CONFIG_FILE, > + ["$CONF_DIR/$OPENSM_CONFIG_FILE"], > + [Define a default OpenSM config file]) > +AC_SUBST(OPENSM_CONFIG_FILE) > + > dnl Check for a different default node name map file > NODENAMEMAPFILE=ib-node-name-map > AC_MSG_CHECKING(for --with-node-name-map ) > diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h > index 62d472e..1bd993e 100644 > --- a/opensm/include/opensm/osm_base.h > +++ b/opensm/include/opensm/osm_base.h > @@ -213,6 +213,27 
@@ BEGIN_C_DECLS > #define OSM_DEFAULT_LOG_FILE "/var/log/opensm.log" > #endif > /***********/ > + > +/****d* OpenSM: Base/OSM_DEFAULT_CONFIG_FILE > +* NAME > +* OSM_DEFAULT_CONFIG_FILE > +* > +* DESCRIPTION > +* Specifies the default OpenSM config file name > +* > +* SYNOPSIS > +*/ > +#ifdef __WIN__ > +#define OSM_DEFAULT_CONFIG_FILE strcat(GetOsmCachePath(), "opensm.conf") > +#elif defined(HAVE_DEFAULT_OPENSM_CONFIG_FILE) > +#define OSM_DEFAULT_CONFIG_FILE HAVE_DEFAULT_OPENSM_CONFIG_FILE > +#elif define (OPENSM_CONFIG_DIR) "define" should be "defined"? (w/ 'd'). > +#define OSM_DEFAULT_OPENSM_CONFIG_FILE OPENSM_COFNIG_DIR "/opensm.conf" typo COFNIG -> CONFIG > +#else > +#define OSM_DEFAULT_OPENSM_CONFIG_FILE "/etc/opensm/opensm.conf" > +#endif /* __WIN__ */ > +/***********/ > + > /****d* OpenSM: Base/OSM_DEFAULT_PARTITION_CONFIG_FILE > * NAME > * OSM_DEFAULT_PARTITION_CONFIG_FILE > diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c > index e39037d..0576dcc 100644 > --- a/opensm/opensm/main.c > +++ b/opensm/opensm/main.c > @@ -133,8 +133,7 @@ static void show_usage(void) > printf("-F , --config \n" > " The name of the OpenSM config file. It has a same format\n" > " as opensm.opts option cache file. 
When not specified\n" > - " $OSM_CACHE_DIR/opensm.opts (or /var/cache/opensm/opensm.opts)\n" > - " will be used (if exists).\n\n"); > + " " OSM_DEFAULT_CONFIG_FILE " will be used (if exists).\n\n"); > printf("-c\n" > "--cache-options\n" > " Cache the given command line options into the file\n" > @@ -594,8 +593,6 @@ int main(int argc, char *argv[]) > { > osm_opensm_t osm; > osm_subn_opt_t opt; > - char conf_file[256]; > - char *cache_dir; > ib_net64_t sm_key = 0; > ib_api_status_t status; > uint32_t temp, dbg_lvl; > @@ -684,13 +681,7 @@ int main(int argc, char *argv[]) > > osm_subn_set_default_opt(&opt); > > - /* try to open the options file from the cache dir */ > - cache_dir = getenv("OSM_CACHE_DIR"); > - if (!cache_dir || !(*cache_dir)) > - cache_dir = OSM_DEFAULT_CACHE_DIR; > - snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); > - > - if (osm_subn_parse_conf_file(conf_file, &opt) < 0) > + if (osm_subn_parse_conf_file(OSM_DEFAULT_CONFIG_FILE, &opt) < 0) > printf("\nosm_subn_parse_conf_file failed!\n"); > > printf("Command Line Arguments:\n"); > @@ -1040,9 +1031,15 @@ int main(int argc, char *argv[]) > if (opt.guid == 0 || cl_hton64(opt.guid) == CL_HTON64(INVALID_GUID)) > opt.guid = get_port_guid(&osm, opt.guid); > > - if (cache_options == TRUE > - && osm_subn_write_conf_file(conf_file, &opt)) > - printf("\nosm_subn_write_conf_file failed!\n"); > + if (cache_options == TRUE) { > + char conf_file[256]; > + char *cache_dir = getenv("OSM_CACHE_DIR"); > + if (!cache_dir || !(*cache_dir)) > + cache_dir = OSM_DEFAULT_CACHE_DIR; > + snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); > + if (osm_subn_write_conf_file(conf_file, &opt)) > + printf("\nosm_subn_write_conf_file failed!\n"); > + } > > status = osm_opensm_bind(&osm, opt.guid); > if (status != IB_SUCCESS) { -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From chu11 at llnl.gov Tue Apr 8 
16:41:29 2008 From: chu11 at llnl.gov (Al Chu) Date: Tue, 08 Apr 2008 16:41:29 -0700 Subject: [ofa-general] Re: [PATCH 4/4] opensm: use OSM_DEFAULT_CONFIG_FILE as config file In-Reply-To: <1207697708.7695.47.camel@cardanus.llnl.gov> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> <1207703425-19039-5-git-send-email-sashak@voltaire.com> <1207697708.7695.47.camel@cardanus.llnl.gov> Message-ID: <1207698089.7695.49.camel@cardanus.llnl.gov> On Tue, 2008-04-08 at 16:35 -0700, Al Chu wrote: > Hey Sasha, > > Just saw two typos, inlined below. And noticed maybe one more below ... Al > Al > > On Wed, 2008-04-09 at 01:10 +0000, Sasha Khapyorsky wrote: > > Use configurable OSM_DEFAULT_CONFIG_FILE as default (when '-F' option is > > not specified) OpenSM config file. Default value is > > $sysconfdir/opensm/opensm.conf. > > > > Signed-off-by: Sasha Khapyorsky > > --- > > opensm/configure.in | 20 ++++++++++++++++++++ > > opensm/include/opensm/osm_base.h | 21 +++++++++++++++++++++ > > opensm/opensm/main.c | 25 +++++++++++-------------- > > 3 files changed, 52 insertions(+), 14 deletions(-) > > > > diff --git a/opensm/configure.in b/opensm/configure.in > > index a527c91..858eb60 100644 > > --- a/opensm/configure.in > > +++ b/opensm/configure.in > > @@ -106,6 +106,26 @@ AC_DEFINE_UNQUOTED(OPENSM_CONFIG_DIR, > > [Define OpenSM config directory]) > > AC_SUBST(OPENSM_CONFIG_DIR) > > > > +dnl Check for a different default OpenSm config file > > +OPENSM_CONFIG_FILE=opensm.conf > > +AC_MSG_CHECKING(for --with-opensm-conf-file ) > > +AC_ARG_WITH(opensm-conf-file, > > + AC_HELP_STRING([--with-opensm-conf-file=file], > > + [define a default OpenSM config file (default opensm.conf)]), > > + [ case "$withval" in > > + no) > > + ;; > > + *) > > + OPENSM_CONFIG_FILE=$withval > > + ;; > > + esac ] > > +) > > +AC_MSG_RESULT(${OPENSM_CONFIG_FILE}) > > +AC_DEFINE_UNQUOTED(HAVE_DEFAULT_OPENSM_CONFIG_FILE, > > + ["$CONF_DIR/$OPENSM_CONFIG_FILE"], > > + [Define a default OpenSM 
config file]) > > +AC_SUBST(OPENSM_CONFIG_FILE) > > + > > dnl Check for a different default node name map file > > NODENAMEMAPFILE=ib-node-name-map > > AC_MSG_CHECKING(for --with-node-name-map ) > > diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h > > index 62d472e..1bd993e 100644 > > --- a/opensm/include/opensm/osm_base.h > > +++ b/opensm/include/opensm/osm_base.h > > @@ -213,6 +213,27 @@ BEGIN_C_DECLS > > #define OSM_DEFAULT_LOG_FILE "/var/log/opensm.log" > > #endif > > /***********/ > > + > > +/****d* OpenSM: Base/OSM_DEFAULT_CONFIG_FILE > > +* NAME > > +* OSM_DEFAULT_CONFIG_FILE > > +* > > +* DESCRIPTION > > +* Specifies the default OpenSM config file name > > +* > > +* SYNOPSIS > > +*/ > > +#ifdef __WIN__ > > +#define OSM_DEFAULT_CONFIG_FILE strcat(GetOsmCachePath(), "opensm.conf") > > +#elif defined(HAVE_DEFAULT_OPENSM_CONFIG_FILE) > > +#define OSM_DEFAULT_CONFIG_FILE HAVE_DEFAULT_OPENSM_CONFIG_FILE > > +#elif define (OPENSM_CONFIG_DIR) > > "define" should be "defined"? (w/ 'd'). > > > +#define OSM_DEFAULT_OPENSM_CONFIG_FILE OPENSM_COFNIG_DIR "/opensm.conf" > > typo COFNIG -> CONFIG > > > +#else > > +#define OSM_DEFAULT_OPENSM_CONFIG_FILE "/etc/opensm/opensm.conf" OSM_DEFAULT_OPENSM_CONFIG_FILE should be OSM_DEFAULT_CONFIG_FILE? > > +#endif /* __WIN__ */ > > +/***********/ > > + > > /****d* OpenSM: Base/OSM_DEFAULT_PARTITION_CONFIG_FILE > > * NAME > > * OSM_DEFAULT_PARTITION_CONFIG_FILE > > diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c > > index e39037d..0576dcc 100644 > > --- a/opensm/opensm/main.c > > +++ b/opensm/opensm/main.c > > @@ -133,8 +133,7 @@ static void show_usage(void) > > printf("-F , --config \n" > > " The name of the OpenSM config file. It has a same format\n" > > " as opensm.opts option cache file. 
When not specified\n" > > - " $OSM_CACHE_DIR/opensm.opts (or /var/cache/opensm/opensm.opts)\n" > > - " will be used (if exists).\n\n"); > > + " " OSM_DEFAULT_CONFIG_FILE " will be used (if exists).\n\n"); > > printf("-c\n" > > "--cache-options\n" > > " Cache the given command line options into the file\n" > > @@ -594,8 +593,6 @@ int main(int argc, char *argv[]) > > { > > osm_opensm_t osm; > > osm_subn_opt_t opt; > > - char conf_file[256]; > > - char *cache_dir; > > ib_net64_t sm_key = 0; > > ib_api_status_t status; > > uint32_t temp, dbg_lvl; > > @@ -684,13 +681,7 @@ int main(int argc, char *argv[]) > > > > osm_subn_set_default_opt(&opt); > > > > - /* try to open the options file from the cache dir */ > > - cache_dir = getenv("OSM_CACHE_DIR"); > > - if (!cache_dir || !(*cache_dir)) > > - cache_dir = OSM_DEFAULT_CACHE_DIR; > > - snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); > > - > > - if (osm_subn_parse_conf_file(conf_file, &opt) < 0) > > + if (osm_subn_parse_conf_file(OSM_DEFAULT_CONFIG_FILE, &opt) < 0) > > printf("\nosm_subn_parse_conf_file failed!\n"); > > > > printf("Command Line Arguments:\n"); > > @@ -1040,9 +1031,15 @@ int main(int argc, char *argv[]) > > if (opt.guid == 0 || cl_hton64(opt.guid) == CL_HTON64(INVALID_GUID)) > > opt.guid = get_port_guid(&osm, opt.guid); > > > > - if (cache_options == TRUE > > - && osm_subn_write_conf_file(conf_file, &opt)) > > - printf("\nosm_subn_write_conf_file failed!\n"); > > + if (cache_options == TRUE) { > > + char conf_file[256]; > > + char *cache_dir = getenv("OSM_CACHE_DIR"); > > + if (!cache_dir || !(*cache_dir)) > > + cache_dir = OSM_DEFAULT_CACHE_DIR; > > + snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); > > + if (osm_subn_write_conf_file(conf_file, &opt)) > > + printf("\nosm_subn_write_conf_file failed!\n"); > > + } > > > > status = osm_opensm_bind(&osm, opt.guid); > > if (status != IB_SUCCESS) { -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High 
Performance Systems Division Lawrence Livermore National Laboratory From weiny2 at llnl.gov Tue Apr 8 16:48:44 2008 From: weiny2 at llnl.gov (weiny2 at llnl.gov) Date: Tue, 8 Apr 2008 16:48:44 -0700 (PDT) Subject: [ofa-general] [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values Message-ID: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> As per Hal's comments, change the alternate value for [leaf] HOQ to be "infinity" when the user specifies a value larger than "infinity". Ira -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-opensm-opensm-osm_subnet.c-add-checks-for-HOQ-and-L.patch Type: / Size: 2306 bytes Desc: not available URL: From arlin.r.davis at intel.com Tue Apr 8 16:51:34 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Tue, 8 Apr 2008 16:51:34 -0700 Subject: [ofa-general] [PATCH][v2] dtest: add private data validation with connect and accept. Message-ID: Adding private data validation with connect and accept. Also provide code, under a build option, to validate private data with consumer reject.
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- test/dtest/dtest.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 81 insertions(+), 7 deletions(-) diff --git a/test/dtest/dtest.c b/test/dtest/dtest.c index fa3b9a8..9c8ec71 100755 --- a/test/dtest/dtest.c +++ b/test/dtest/dtest.c @@ -559,8 +559,7 @@ complete: /* close the device */ LOGPRINTF("%d Closing Interface Adaptor\n",getpid()); start = get_time(); - //ret = dat_ia_close( h_ia, DAT_CLOSE_ABRUPT_FLAG ); - ret = dat_ia_close( h_ia, DAT_CLOSE_GRACEFUL_FLAG ); + ret = dat_ia_close( h_ia, DAT_CLOSE_ABRUPT_FLAG ); stop = get_time(); time.close += ((stop - start)*1.0e6); if(ret != DAT_SUCCESS) { @@ -730,7 +729,6 @@ send_msg( void *data, return DAT_SUCCESS; } - DAT_RETURN connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) { @@ -743,6 +741,9 @@ connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) DAT_RMR_TRIPLET r_iov; DAT_DTO_COOKIE cookie; int i; + unsigned char *buf; + DAT_CR_PARAM cr_param = { 0 }; + unsigned char pdata[48] = { 0 }; /* Register send message buffer */ LOGPRINTF("%d Registering send Message Buffer %p, len %d\n", @@ -867,17 +868,45 @@ connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) getpid(),DT_EventToSTr(event.event_number)); return( DAT_ABORT ); } - + /* use to test rdma_cma timeout logic */ #if defined(_WIN32) || defined(_WIN64) if (delay) Sleep(delay*1000); #else if (delay) sleep(delay); #endif + /* accept connect request from client */ h_cr = event.event_data.cr_arrival_event_data.cr_handle; LOGPRINTF("%d Accepting connect request from client\n",getpid()); - ret = dat_cr_accept( h_cr, h_ep, 0, (DAT_PVOID)0 ); + + /* private data - check and send it back */ + dat_cr_query( h_cr, DAT_CSP_FIELD_ALL, &cr_param); + + buf = (unsigned char*)cr_param.private_data; + LOGPRINTF("%d CONN REQUEST Private Data %p[0]=%d [47]=%d\n", + getpid(),buf,buf[0],buf[47]); + for (i=0;i<48;i++) { + if (buf[i] != i+1) { + fprintf(stderr, "%d Error with CONNECT REQUEST" + " private 
data: %p[%d]=%d s/be %d\n", + getpid(), buf, i, buf[i], i+1); + dat_cr_reject(h_cr, 0, NULL); + return(DAT_ABORT); + } + buf[i]++; /* change for trip back */ + } + +#ifdef TEST_REJECT_WITH_PRIVATE_DATA + printf("%d REJECT request with 48 bytes of private data\n", getpid()); + ret = dat_cr_reject(h_cr, 48, cr_param.private_data); + printf("\n%d: DAPL Test Complete. %s\n\n", + getpid(), ret?"FAILED":"PASSED"); + exit(0); +#endif + + ret = dat_cr_accept(h_cr, h_ep, 48, cr_param.private_data); + if(ret != DAT_SUCCESS) { fprintf(stderr, "%d Error dat_cr_accept: %s\n", getpid(),DT_RetToString(ret)); @@ -911,13 +940,16 @@ connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) remote_addr = *((DAT_IA_ADDRESS_PTR)target->ai_addr); freeaddrinfo(target); + for (i=0;i<48;i++) /* simple pattern in private data */ + pdata[i]=i+1; + LOGPRINTF("%d Connecting to server\n",getpid()); ret = dat_ep_connect( h_ep, &remote_addr, conn_id, CONN_TIMEOUT, - 0, - (DAT_PVOID)0, + 48, + (DAT_PVOID)pdata, 0, DAT_CONNECT_DEFAULT_FLAG ); if(ret != DAT_SUCCESS) { @@ -940,11 +972,53 @@ connect_ep( char *hostname, DAT_CONN_QUAL conn_id ) else LOGPRINTF("%d dat_evd_wait for h_conn_evd completed\n", getpid()); +#ifdef TEST_REJECT_WITH_PRIVATE_DATA + if (event.event_number != DAT_CONNECTION_EVENT_PEER_REJECTED) { + fprintf(stderr, "%d expected conn reject event : %s\n", + getpid(),DT_EventToSTr(event.event_number)); + return( DAT_ABORT ); + } + /* get the reject private data and validate */ + buf = (unsigned char*)event.event_data.connect_event_data.private_data; + printf("%d Received REJECT with private data %p[0]=%d [47]=%d\n", + getpid(),buf,buf[0],buf[47]); + for (i=0;i<48;i++) { + if (buf[i] != i+2) { + fprintf(stderr, "%d client: Error with REJECT event" + " private data: %p[%d]=%d s/be %d\n", + getpid(), buf, i, buf[i], i+2); + dat_ep_disconnect( h_ep, DAT_CLOSE_ABRUPT_FLAG); + return(DAT_ABORT); + } + } + printf("\n%d: DAPL Test Complete. 
PASSED\n\n", getpid()); + exit(0); +#endif + if ( event.event_number != DAT_CONNECTION_EVENT_ESTABLISHED ) { fprintf(stderr, "%d Error unexpected conn event : %s\n", getpid(),DT_EventToSTr(event.event_number)); return( DAT_ABORT ); } + + /* check private data back from server */ + if (!server) { + buf = (unsigned char*)event.event_data.connect_event_data.private_data; + LOGPRINTF("%d CONN Private Data %p[0]=%d [47]=%d\n", + getpid(),buf,buf[0],buf[47]); + for (i=0;i<48;i++) { + if (buf[i] != i+2) { + fprintf(stderr, "%d Error with CONNECT event" + " private data: %p[%d]=%d s/be %d\n", + getpid(), buf, i, buf[i], i+2); + dat_ep_disconnect(h_ep, DAT_CLOSE_ABRUPT_FLAG); + LOGPRINTF("%d waiting for disconnect event...\n", getpid()); + dat_evd_wait(h_conn_evd, DAT_TIMEOUT_INFINITE, 1, &event, &nmore); + return(DAT_ABORT); + } + } + } + printf("\n%d CONNECTED!\n\n",getpid()); connected = 1; -- 1.5.2.5 From arlin.r.davis at intel.com Tue Apr 8 16:51:27 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Tue, 8 Apr 2008 16:51:27 -0700 Subject: [ofa-general] [PATCH][v2] dapl: add hooks in evd connection callback code to deliver private data with consumer reject. Message-ID: <001301c899d3$76305f90$14fd070a@amr.corp.intel.com> PEER rejects can include private data. The common code didn't support delivery via the connect event data structure. Add the necessary hooks in dapl_evd_connection_callback function and include checks in openib_cma provider to check and deliver properly. Also, fix the private data size check in dapls_ib_reject_connection function. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_evd_connection_callb.c | 22 ++++++++++++++++++++-- dapl/openib_cma/dapl_ib_cm.c | 16 ++++++++++------ 2 files changed, 30 insertions(+), 8 deletions(-) diff --git a/dapl/common/dapl_evd_connection_callb.c b/dapl/common/dapl_evd_connection_callb.c index d3a39a6..7f994b0 100644 --- a/dapl/common/dapl_evd_connection_callb.c +++ b/dapl/common/dapl_evd_connection_callb.c @@ -164,8 +164,26 @@ dapl_evd_connection_callback ( break; } - case DAT_CONNECTION_EVENT_DISCONNECTED: case DAT_CONNECTION_EVENT_PEER_REJECTED: + { + /* peer reject may include private data */ + if (prd_ptr != NULL) + private_data_size = + dapls_ib_private_data_size( + prd_ptr, DAPL_PDATA_CONN_REJ, + ep_ptr->header.owner_ia->hca_ptr); + + if (private_data_size > 0) + dapl_os_memcpy (ep_ptr->private.private_data, + prd_ptr->private_data, + DAPL_MIN (private_data_size, + DAPL_MAX_PRIVATE_DATA_SIZE)); + + dapl_dbg_log(DAPL_DBG_TYPE_CM | DAPL_DBG_TYPE_CALLBACK, + "dapl_evd_connection_callback PEER REJ pd=%p sz=%d\n", + prd_ptr, private_data_size); + } + case DAT_CONNECTION_EVENT_DISCONNECTED: case DAT_CONNECTION_EVENT_UNREACHABLE: case DAT_CONNECTION_EVENT_NON_PEER_REJECTED: { @@ -205,7 +223,7 @@ dapl_evd_connection_callback ( evd_ptr, dat_event_num, (DAT_HANDLE) ep_ptr, - private_data_size, /* 0 except for CONNECTED */ + private_data_size, /* CONNECTED or REJECT */ ep_ptr->private.private_data ); if (dat_status != DAT_SUCCESS && diff --git a/dapl/openib_cma/dapl_ib_cm.c b/dapl/openib_cma/dapl_ib_cm.c index 9b2062b..d3835b3 100755 --- a/dapl/openib_cma/dapl_ib_cm.c +++ b/dapl/openib_cma/dapl_ib_cm.c @@ -336,6 +336,7 @@ static void dapli_cm_active_cb(struct dapl_cm_id *conn, case RDMA_CM_EVENT_REJECTED: { ib_cm_events_t cm_event; + unsigned char *pdata = NULL; dapl_dbg_log( DAPL_DBG_TYPE_CM, @@ -344,9 +345,11 @@ static void dapli_cm_active_cb(struct dapl_cm_id *conn, /* valid REJ from consumer will always contain private data */ if 
(event->status == 28 && - event->param.conn.private_data_len) + event->param.conn.private_data_len) { cm_event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; - else { + pdata = (unsigned char*)event->param.conn.private_data + + sizeof(struct dapl_pdata_hdr); + } else { cm_event = IB_CME_DESTINATION_REJECT; dapl_log(DAPL_DBG_TYPE_WARN, "dapl_cma_active: non-consumer REJ," @@ -357,7 +360,7 @@ static void dapli_cm_active_cb(struct dapl_cm_id *conn, ntohs(((struct sockaddr_in *) &conn->cm_id->route.addr.dst_addr)->sin_port)); } - dapl_evd_connection_callback(conn, cm_event, NULL, conn->ep); + dapl_evd_connection_callback(conn, cm_event, pdata, conn->ep); break; } case RDMA_CM_EVENT_ESTABLISHED: @@ -910,8 +913,9 @@ dapls_ib_reject_connection( }; dapl_dbg_log(DAPL_DBG_TYPE_CM, - " reject: cm_handle %p reason %x, ver=0x%x \n", - cm_handle, reason, ntohl(pdata_hdr.version)); + " reject: handle %p reason %x, ver=%x, data %p, sz=%d\n", + cm_handle, reason, ntohl(pdata_hdr.version), + private_data, private_data_size); if (cm_handle == IB_INVALID_HANDLE) { dapl_dbg_log(DAPL_DBG_TYPE_ERR, @@ -922,7 +926,7 @@ dapls_ib_reject_connection( if (private_data_size > dapls_ib_private_data_size( - NULL, IB_MAX_REJ_PDATA_SIZE, cm_handle->hca)) + NULL, DAPL_PDATA_CONN_REJ, cm_handle->hca)) return DAT_ERROR(DAT_INVALID_PARAMETER, DAT_INVALID_ARG3); /* setup pdata_hdr and users data, in CR pdata buffer */ -- 1.5.2.5 From tom at opengridcomputing.com Tue Apr 8 19:11:03 2008 From: tom at opengridcomputing.com (Tom Tucker) Date: Tue, 08 Apr 2008 21:11:03 -0500 Subject: [ofa-general] Re: [PATCH] AMSO1100: Add check for NULL reply_msg in c2_intr In-Reply-To: References: <1207336240.1363.20.camel@trinity.ogc.int> <1207337563.1363.22.camel@trinity.ogc.int> Message-ID: <1207707063.9447.59.camel@trinity.ogc.int> On Fri, 2008-04-04 at 12:35 -0700, Roland Dreier wrote: > > I'm up to my eyeballs right now. If it's ok with you I'd say defer the > > refactoring. > > No problem, I'll queue this up and if you ever get time to work on > amso1100 you can send the refactoring. > > But are you working on a pmtu fix? Steve and I will noodle on what to do here and post something. > - R. From krkumar2 at in.ibm.com Tue Apr 8 22:32:01 2008 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Wed, 9 Apr 2008 11:02:01 +0530 Subject: [ofa-general] Test programs supporting RNIC's. In-Reply-To: <47FA31C3.5090307@opengridcomputing.com> Message-ID: Hi, I am testing Chelsio cxgb3 RNICS on RHEL5.2 (beta). The following list of applications are installed on the system as part of OFED install option: ib_clock_test ib_read_lat ib_write_bw_postlist ib_rdma_bw ib_send_bw ib_write_lat ib_rdma_lat ib_send_lat ib_read_bw Out of this, only ib_rdma_bw seems to be CMA enabled. Is this the only program that supports RNIC's?
Thanks, - KK From sashak at voltaire.com Wed Apr 9 03:01:08 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 10:01:08 +0000 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> Message-ID: <20080409100108.GB19834@sashak.voltaire.com> Hi Ira, On 16:48 Tue 08 Apr , weiny2 at llnl.gov wrote: > As per Hal's comments change the alternate value for [leaf] HOQ to be > "infinity" when the user specifies a value larger than "infinity". Actually I would prefer the original version of the patch. The main reason is that an infinite packet lifetime is a really dangerous thing - when a fabric is routed with credit loops (a very common case with the default min-hops routing) it leads to a total fabric lockup, not just to some performance degradation. So I think it is safer to reject an invalid value and to set the default (log an error, etc.), as was done in the original version of the patch. Hal, do you agree? Sasha From sashak at voltaire.com Wed Apr 9 03:09:31 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 10:09:31 +0000 Subject: [ofa-general] Re: [PATCH 4/4] opensm: use OSM_DEFAULT_CONFIG_FILE as config file In-Reply-To: <1207697708.7695.47.camel@cardanus.llnl.gov> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> <1207703425-19039-5-git-send-email-sashak@voltaire.com> <1207697708.7695.47.camel@cardanus.llnl.gov> Message-ID: <20080409100931.GD19834@sashak.voltaire.com> On 16:35 Tue 08 Apr , Al Chu wrote: > Hey Sasha, > > Just saw two typos, inlined below. Thanks for catching this! I'm going to fix them.
Sasha From sashak at voltaire.com Wed Apr 9 03:15:26 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 10:15:26 +0000 Subject: [ofa-general] Re: [PATCH 4/4] opensm: use OSM_DEFAULT_CONFIG_FILE as config file In-Reply-To: <1207698089.7695.49.camel@cardanus.llnl.gov> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> <1207703425-19039-5-git-send-email-sashak@voltaire.com> <1207697708.7695.47.camel@cardanus.llnl.gov> <1207698089.7695.49.camel@cardanus.llnl.gov> Message-ID: <20080409101526.GE19834@sashak.voltaire.com> On 16:41 Tue 08 Apr , Al Chu wrote: > On Tue, 2008-04-08 at 16:35 -0700, Al Chu wrote: > > Hey Sasha, > > > > Just saw two typos, inlined below. > > And noticed maybe one more below ... Sure, it is another one. Thanks for catching this! Sasha From dotanb at dev.mellanox.co.il Wed Apr 9 00:20:17 2008 From: dotanb at dev.mellanox.co.il (Dotan Barak) Date: Wed, 09 Apr 2008 10:20:17 +0300 Subject: [ofa-general] Test programs supporting RNIC's. In-Reply-To: References: Message-ID: <47FC6E31.8060208@dev.mellanox.co.il> Krishna Kumar2 wrote: > Hi, > > I am testing Chelsio cxgb3 RNICS on RHEL5.2 (beta). The following > list of applications are installed on the system as part of OFED > install option: > > ib_clock_test ib_read_lat ib_write_bw_postlist > ib_rdma_bw ib_send_bw ib_write_lat > ib_rdma_lat ib_send_lat > ib_read_bw > > Out of this, only ib_rdma_bw seems to be CMA enabled. Is this the > only program that supports RNIC's? > Yes. I know that there are plans to add CMA support to all of the ib_* applications as well (but today, only ib_rdma_bw and ib_rdma_lat support it).
Dotan From sashak at voltaire.com Wed Apr 9 03:20:06 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 10:20:06 +0000 Subject: [ofa-general] [PATCH 4/4 v2] opensm: use OSM_DEFAULT_CONFIG_FILE as config file In-Reply-To: <1207703425-19039-5-git-send-email-sashak@voltaire.com> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> <1207703425-19039-5-git-send-email-sashak@voltaire.com> Message-ID: <20080409102006.GF19834@sashak.voltaire.com> Use configurable OSM_DEFAULT_CONFIG_FILE as default (when '-F' option is not specified) OpenSM config file. Default value is $sysconfdir/opensm/opensm.conf. Signed-off-by: Sasha Khapyorsky --- opensm/configure.in | 20 ++++++++++++++++++++ opensm/include/opensm/osm_base.h | 21 +++++++++++++++++++++ opensm/opensm/main.c | 25 +++++++++++-------------- 3 files changed, 52 insertions(+), 14 deletions(-) diff --git a/opensm/configure.in b/opensm/configure.in index a527c91..858eb60 100644 --- a/opensm/configure.in +++ b/opensm/configure.in @@ -106,6 +106,26 @@ AC_DEFINE_UNQUOTED(OPENSM_CONFIG_DIR, [Define OpenSM config directory]) AC_SUBST(OPENSM_CONFIG_DIR) +dnl Check for a different default OpenSm config file +OPENSM_CONFIG_FILE=opensm.conf +AC_MSG_CHECKING(for --with-opensm-conf-file ) +AC_ARG_WITH(opensm-conf-file, + AC_HELP_STRING([--with-opensm-conf-file=file], + [define a default OpenSM config file (default opensm.conf)]), + [ case "$withval" in + no) + ;; + *) + OPENSM_CONFIG_FILE=$withval + ;; + esac ] +) +AC_MSG_RESULT(${OPENSM_CONFIG_FILE}) +AC_DEFINE_UNQUOTED(HAVE_DEFAULT_OPENSM_CONFIG_FILE, + ["$CONF_DIR/$OPENSM_CONFIG_FILE"], + [Define a default OpenSM config file]) +AC_SUBST(OPENSM_CONFIG_FILE) + dnl Check for a different default node name map file NODENAMEMAPFILE=ib-node-name-map AC_MSG_CHECKING(for --with-node-name-map ) diff --git a/opensm/include/opensm/osm_base.h b/opensm/include/opensm/osm_base.h index 62d472e..289e49e 100644 --- a/opensm/include/opensm/osm_base.h +++ 
b/opensm/include/opensm/osm_base.h @@ -213,6 +213,27 @@ BEGIN_C_DECLS #define OSM_DEFAULT_LOG_FILE "/var/log/opensm.log" #endif /***********/ + +/****d* OpenSM: Base/OSM_DEFAULT_CONFIG_FILE +* NAME +* OSM_DEFAULT_CONFIG_FILE +* +* DESCRIPTION +* Specifies the default OpenSM config file name +* +* SYNOPSIS +*/ +#ifdef __WIN__ +#define OSM_DEFAULT_CONFIG_FILE strcat(GetOsmCachePath(), "opensm.conf") +#elif defined(HAVE_DEFAULT_OPENSM_CONFIG_FILE) +#define OSM_DEFAULT_CONFIG_FILE HAVE_DEFAULT_OPENSM_CONFIG_FILE +#elif defined (OPENSM_CONFIG_DIR) +#define OSM_DEFAULT_CONFIG_FILE OPENSM_CONFIG_DIR "/opensm.conf" +#else +#define OSM_DEFAULT_CONFIG_FILE "/etc/opensm/opensm.conf" +#endif /* __WIN__ */ +/***********/ + /****d* OpenSM: Base/OSM_DEFAULT_PARTITION_CONFIG_FILE * NAME * OSM_DEFAULT_PARTITION_CONFIG_FILE diff --git a/opensm/opensm/main.c b/opensm/opensm/main.c index e39037d..0576dcc 100644 --- a/opensm/opensm/main.c +++ b/opensm/opensm/main.c @@ -133,8 +133,7 @@ static void show_usage(void) printf("-F , --config \n" " The name of the OpenSM config file. It has a same format\n" " as opensm.opts option cache file. 
When not specified\n" - " $OSM_CACHE_DIR/opensm.opts (or /var/cache/opensm/opensm.opts)\n" - " will be used (if exists).\n\n"); + " " OSM_DEFAULT_CONFIG_FILE " will be used (if exists).\n\n"); printf("-c\n" "--cache-options\n" " Cache the given command line options into the file\n" @@ -594,8 +593,6 @@ int main(int argc, char *argv[]) { osm_opensm_t osm; osm_subn_opt_t opt; - char conf_file[256]; - char *cache_dir; ib_net64_t sm_key = 0; ib_api_status_t status; uint32_t temp, dbg_lvl; @@ -684,13 +681,7 @@ int main(int argc, char *argv[]) osm_subn_set_default_opt(&opt); - /* try to open the options file from the cache dir */ - cache_dir = getenv("OSM_CACHE_DIR"); - if (!cache_dir || !(*cache_dir)) - cache_dir = OSM_DEFAULT_CACHE_DIR; - snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); - - if (osm_subn_parse_conf_file(conf_file, &opt) < 0) + if (osm_subn_parse_conf_file(OSM_DEFAULT_CONFIG_FILE, &opt) < 0) printf("\nosm_subn_parse_conf_file failed!\n"); printf("Command Line Arguments:\n"); @@ -1040,9 +1031,15 @@ int main(int argc, char *argv[]) if (opt.guid == 0 || cl_hton64(opt.guid) == CL_HTON64(INVALID_GUID)) opt.guid = get_port_guid(&osm, opt.guid); - if (cache_options == TRUE - && osm_subn_write_conf_file(conf_file, &opt)) - printf("\nosm_subn_write_conf_file failed!\n"); + if (cache_options == TRUE) { + char conf_file[256]; + char *cache_dir = getenv("OSM_CACHE_DIR"); + if (!cache_dir || !(*cache_dir)) + cache_dir = OSM_DEFAULT_CACHE_DIR; + snprintf(conf_file, sizeof(conf_file), "%s/opensm.opts", cache_dir); + if (osm_subn_write_conf_file(conf_file, &opt)) + printf("\nosm_subn_write_conf_file failed!\n"); + } status = osm_opensm_bind(&osm, opt.guid); if (status != IB_SUCCESS) { -- 1.5.4.1.122.gaa8d From krkumar2 at in.ibm.com Wed Apr 9 00:39:54 2008 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Wed, 9 Apr 2008 13:09:54 +0530 Subject: [ofa-general] Test programs supporting RNIC's. 
In-Reply-To: <47FC6E31.8060208@dev.mellanox.co.il> Message-ID: Hi Dotan, > > ib_clock_test ib_read_lat ib_write_bw_postlist > > ib_rdma_bw ib_send_bw ib_write_lat > > ib_rdma_lat ib_send_lat > > ib_read_bw > > > > Out of this, only ib_rdma_bw seems to be CMA enabled. Is this the > > only program that supports RNIC's? > > > Yes. > > I know that there are planing to add the CMA support to all of the ib_* > applications as well > (but today, only ib_rdma_bw and ib_rdma_lat support it). Yes, I had forgotten ib_rdma_lat on the list of CMA enabled apps. But somehow it didn't work for me. I need to reboot to the distro OS and locate the error, will post it later. Thanks, - KK From holt at sgi.com Wed Apr 9 06:17:09 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 9 Apr 2008 08:17:09 -0500 Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: References: Message-ID: <20080409131709.GR11364@sgi.com> I applied this patch set with the xpmem version I am working up for submission and the basic level-1 and level-2 tests passed. The full mpi regression test still tends to hang, but that failure appears to be common to both the emm and mmu notifier methods, and therefore I am certain it is a problem in my code. Please note this is not an endorsement of one method over the other, merely that under conditions where we would expect xpmem to pass the regression tests, it does pass those tests. Thanks, Robin On Tue, Apr 08, 2008 at 05:44:03PM +0200, Andrea Arcangeli wrote: > The difference with #v11 is a different implementation of mm_lock that > guarantees handling signals in O(N). It's also more lowlatency friendly. > > Note that mmu_notifier_unregister may also fail with -EINTR if there are > signal pending or the system runs out of vmalloc space or physical memory, > only exit_mmap guarantees that any kernel module can be unloaded in presence > of an oom condition.
> > Either #v11 or the first three #v12 1,2,3 patches are suitable for inclusion > in -mm, pick what you prefer looking at the mmu_notifier_register retval and > mm_lock retval difference, I implemented and slighty tested both. GRU and KVM > only needs 1,2,3, XPMEM needs the rest of the patchset too (4, ...) but all > patches from 4 to the end can be deffered to a second merge window. From hrosenstock at xsigo.com Wed Apr 9 06:27:09 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 06:27:09 -0700 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <20080409100108.GB19834@sashak.voltaire.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> Message-ID: <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 10:01 +0000, Sasha Khapyorsky wrote: > Hi Ira, > > On 16:48 Tue 08 Apr , weiny2 at llnl.gov wrote: > > As per Hal's comments change the alternate value for [leaf] HOQ to be > > "infinity" when the user specifies a value larger than "infinity". > > Actually I would prefer original version of the patch. The main reason > is that infinite packet life time is really dangerous thing - in case > when a fabric is routed with credit loops (very common case with default > min-hops routing) it leads to total fabric stuck and not just to some > performance degradation. > > So I think it is safer to reject invalid value and to set the default > (log an error, etc.i). As it was done in the original version of the > patch. > > Hal, do you agree? Safer yes but I think it is less to the intent of the admin who just doesn't understand the max value for this and that's why I proposed this change. My preference is to max it out but it comes down to a judgment call. There's a downside either way. -- Hal > Sasha > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From swise at opengridcomputing.com Wed Apr 9 07:03:35 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 09 Apr 2008 09:03:35 -0500 Subject: [ofa-general] Test programs supporting RNIC's.
In-Reply-To: References: Message-ID: <47FCCCB7.2080407@opengridcomputing.com> Krishna Kumar2 wrote: > Hi Dotan, > > >>> ib_clock_test ib_read_lat ib_write_bw_postlist >>> ib_rdma_bw ib_send_bw ib_write_lat >>> ib_rdma_lat ib_send_lat >>> ib_read_bw >>> >>> Out of this, only ib_rdma_bw seems to be CMA enabled. Is this the >>> only program that supports RNIC's? >>> >>> >> Yes. >> >> I know that there are planing to add the CMA support to all of the ib_* >> applications as well >> (but today, only ib_rdma_bw and ib_rdma_lat support it). >> > > Yes, I had forgotten ib_rdma_lat on the list of CMA enabled apps. But > somehow it didn't work for me. I need to reboot to the distro OS and > locate the error, will post it later. > > Krishna, if you are interested, you could add cma support to the rest of these. I can help by answering questions and/or testing things... Steve. From Brian.Murrell at Sun.COM Wed Apr 9 07:07:46 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Wed, 09 Apr 2008 10:07:46 -0400 Subject: [ofa-general] ipath_kernel.h:1115: error: implicit declaration of function 'writeq' on rhel5 Message-ID: <1207750066.3303.28.camel@pc.ilinx> I'm trying to build OFED 1.3's kernel-ib for RHEL5 and getting: gcc -m32 -Wp,-MD,/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/.ipath_cq.o.d -nostdinc -isystem /usr/lib/gcc/i386-redhat-linux/4.1.1/include -D__KERNEL__ \ -include include/linux/autoconf.h \ -include /cache/build/BUILD/ofa_kernel-1.3/include/linux/autoconf.h \ -I/cache/build/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ \ \ \ -I/cache/build/BUILD/ofa_kernel-1.3/include \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/debug \ -I/usr/local/include/scst \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/ulp/srpt \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3 \ -Iinclude \ \ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Wstrict-prototypes -Wundef 
-Werror-implicit-function-declaration -Os -pipe -msoft-float -fno-builtin-sprintf -fno-builtin-log2 -fno-builtin-puts -mpreferred-stack-boundary=2 -march=i686 -mtune=generic -mtune=generic -mregparm=3 -ffreestanding -Iinclude/asm-i386/mach-generic -Iinclude/asm-i386/mach-default -fomit-frame-pointer -g -fno-stack-protector -Wdeclaration-after-statement -Wno-pointer-sign -DIPATH_IDSTR='"QLogic kernel.org driver"' -DIPATH_KERN_TYPE=0 -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(ipath_cq)" -D"KBUILD_MODNAME=KBUILD_STR(ib_ipath)" -c -o /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/.tmp_ipath_cq.o /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_cq.c In file included from /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_verbs.h:45, from /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_cq.c:37: /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_kernel.h: In function 'ipath_write_ureg': /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_kernel.h:1115: error: implicit declaration of function 'writeq' /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_kernel.h: In function 'ipath_read_kreg64': /cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_kernel.h:1132: error: implicit declaration of function 'readq' make[4]: *** [/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath/ipath_cq.o] Error 1 make[3]: *** [/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/hw/ipath] Error 2 make[2]: *** [/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband] Error 2 make[1]: *** [_module_/cache/build/BUILD/ofa_kernel-1.3] Error 2 The "make kernel" starts out with: + make kernel Building kernel modules Kernel version: 2.6.18-53.1.14.el5_lustre.1.6.4.55.20080409120349smp Modules directory: //lib/modules/2.6.18-53.1.14.el5_lustre.1.6.4.55.20080409120349smp Kernel sources: /cache/build/BUILD/lustre-kernel-2.6.18/lustre/linux env 
CWD=/cache/build/BUILD/ofa_kernel-1.3 BACKPORT_INCLUDES=-I/cache/build/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ \ make -C /cache/build/BUILD/lustre-kernel-2.6.18/lustre/linux SUBDIRS="/cache/build/BUILD/ofa_kernel-1.3" \ V=1 \ CONFIG_MEMTRACK= \ CONFIG_DEBUG_INFO=y \ CONFIG_INFINIBAND=m \ So it seems to be correctly identifying the kernel version "2.6.18-53.1.14.el5" as RHEL5 kernel as it does set BACKPORT_INCLUDES=-I/cache/build/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ Any ideas why there is no readq/writeq being found? Funny enough this same build on x86_64 is successful. b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From andrea at qumranet.com Wed Apr 9 07:29:45 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 9 Apr 2008 16:29:45 +0200 Subject: [ofa-general] Re: [patch 02/10] emm: notifier logic In-Reply-To: References: <20080404223048.374852899@sgi.com> <20080404223131.469710551@sgi.com> <20080405005759.GH14784@duo.random> <20080407060602.GE9309@duo.random> <20080407071330.GH9309@duo.random> Message-ID: <20080409142945.GS10133@duo.random> On Tue, Apr 08, 2008 at 01:23:33PM -0700, Christoph Lameter wrote: > It may also be useful to allow invalidate_start() to fail in some contexts > (try_to_unmap f.e., maybe if a certain flag is passed). This may allow the > device to get out of tight situations (pending I/O f.e. or time out if > there is no response for network communications). But then that > complicates the API. That also complicates the fact that there can't be a spte mapped and a pte not mapped or the spte would leak unswappable memory, so a failure should re-establish the pte and undo the ptep_clear_flush or equivalent... I think we can change the API later if needed. 
This is an internal-only API invisible to userland so it can change and break anytime to make the whole kernel faster and better (ask Greg for kernel internal APIs). One important detail is that because the secondary mmu page fault can happen concurrently against invalidate_page (there wasn't a range_begin to block it), the secondary mmu page fault must ensure that the pte is still established, before establishing the spte (with proper locking that will block a concurrent invalidate_page). Having a range_begin before the ptep_clear_flush effectively makes life a bit easier but it's not needed as those are locking issues that the driver can solve (unlike range_begin being missed, now fixed by mm_lock) and this allows for higher performance both when the lock is armed and disarmed. I'm going to solve all the locking for kvm with spinlocks and/or seqlocks to avoid any dependency on the patches that make the mmu notifier sleep capable. From andrea at qumranet.com Wed Apr 9 07:44:01 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 9 Apr 2008 16:44:01 +0200 Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: <20080409131709.GR11364@sgi.com> References: <20080409131709.GR11364@sgi.com> Message-ID: <20080409144401.GT10133@duo.random> On Wed, Apr 09, 2008 at 08:17:09AM -0500, Robin Holt wrote: > I applied this patch set with the xpmem version I am working up for > submission and the basic level-1 and level-2 tests passed. The full mpi > regression test still tends to hang, but that appears to be a common > problem failure affecting either emm or mmu notifiers and therefore, I > am certain is a problem in my code. > > Please note this is not an endorsement of one method over the other, > merely that under conditions where we would expect xpmem to pass the > regression tests, it does pass those tests. Thanks a lot for testing! #v12 works great with KVM too.
(I'm now in the process of changing the KVM patch to drop the page pinning.) BTW, how did you implement invalidate_page? As this?

    invalidate_page() {
        invalidate_range_begin()
        invalidate_range_end()
    }

If yes, I prefer to remind you that normally invalidate_range_begin is always called before zapping the pte. In the invalidate_page case instead, invalidate_range_begin is called _after_ the pte has been zapped already. Now there's no problem if the pte is established and the spte isn't established. But it must never happen that the spte is established and the pte isn't established (with page-pinning that means an unswappable memlock leak; without page-pinning it would mean memory corruption). So the range_begin must serialize against the secondary mmu page fault so that it can't establish the spte on a pte that was zapped by the rmap code after get_user_pages/follow_page returned. I think your range_begin already does that so you should be ok, but I wanted to remind you about this slight difference in implementing invalidate_page as I suggested above in the previous email, just to be sure ;). This is the race you must guard against in invalidate_page:

    CPU0                              CPU1
    try_to_unmap on page
                                      secondary mmu page fault
                                      get_user_pages()/follow_page found a page
    ptep_clear_flush
    invalidate_page()
      invalidate_range_begin()
      invalidate_range_end()
    return from invalidate_page
                                      establish spte on page
                                      return from secondary mmu page fault

If your range_begin already serializes in a hard way against the secondary mmu page fault, my previously "trivial" suggested implementation for invalidate_page should work just fine, and this saves 1 branch for each try_to_unmap_one compared to the emm implementation. The branch check is inlined and it checks against the mmu_notifier_head that is the hot cacheline; no new cacheline is checked, just one branch is saved, so it's worth it IMHO even if it doesn't provide any other advantage if you implement it the way above.
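[Editor's note: to make the serialization requirement above concrete, here is a minimal, self-contained sketch in plain C with pthreads. It is a userspace model, not the actual kernel mmu-notifier API or the KVM/xpmem code: `struct mm_state`, `invalidate_page()` and `secondary_mmu_fault()` are invented names. It models invalidate_page() as the suggested range_begin/range_end pair (a lock/unlock), and shows why the secondary-MMU fault handler must re-check the pte under the same lock before establishing the spte.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct mm_state {
	pthread_mutex_t lock;	/* stands in for the driver's invalidate lock */
	bool pte_present;	/* primary pte */
	bool spte_present;	/* secondary (shadow) pte */
};

/*
 * invalidate_page() modeled as the suggested range_begin/range_end
 * pair: by the time it runs, the caller (try_to_unmap) has already
 * zapped the pte with ptep_clear_flush.
 */
static void invalidate_page(struct mm_state *mm)
{
	pthread_mutex_lock(&mm->lock);		/* range_begin */
	mm->spte_present = false;		/* tear down shadow mapping */
	pthread_mutex_unlock(&mm->lock);	/* range_end */
}

/*
 * Secondary MMU page fault: may only establish the spte if the pte is
 * still present when re-checked under the same lock, otherwise the
 * "spte established / pte zapped" state could leak or corrupt memory.
 */
static bool secondary_mmu_fault(struct mm_state *mm)
{
	bool established = false;

	pthread_mutex_lock(&mm->lock);
	if (mm->pte_present) {
		mm->spte_present = true;
		established = true;
	}
	pthread_mutex_unlock(&mm->lock);
	return established;
}
```

Without the re-check under the lock, a fault that looked up the page via get_user_pages()/follow_page before the zap could map the spte after invalidate_page() returned, which is exactly the race in the diagram above.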
From huanwei at cse.ohio-state.edu Wed Apr 9 08:18:19 2008 From: huanwei at cse.ohio-state.edu (wei huang) Date: Wed, 9 Apr 2008 11:18:19 -0400 (EDT) Subject: [ofa-general] MVAPICH2 crashes on mixed fabric In-Reply-To: Message-ID: Hi Mike, Is the arbel based DDR cards? If so, try put: -env MV2_DEFAULT_MTU IBV_MTU_2048 in addition to the environmental variables you are using. Thanks. Regards, Wei Huang 774 Dreese Lab, 2015 Neil Ave, Dept. of Computer Science and Engineering Ohio State University OH 43210 Tel: (614)292-8501 On Tue, 8 Apr 2008, Mike Heinz wrote: > Wei, > > No joy. The following command: > > + /usr/mpi/pgi/mvapich2-1.0.2/bin/mpiexec -1 -machinefile > /home/mheinz/mvapich2-pgi/mpi_hosts -n 4 -env MV2_USE_COALESCE 0 -env > MV2_VBUF_TOTAL_SIZE 9216 PMB2.2.1/SRC_PMB/PMB-MPI1 > > Produced the following error: > > [0] Abort: Got FATAL event 3 > at line 796 in file ibv_channel_manager.c > rank 0 in job 48 compute-0-3.local_33082 caused collective abort of > all ranks > exit status of rank 0: killed by signal 9 > + set +x > > Note that compute-0-3 has a connect-x HCA. > > If I restrict the ring to only nodes with connect-x the problem does not > occur. > > This isn't a huge problem for me; this 4-node cluster is actually for > testing the creation of Rocks Rolls and I can simply record it as a > known limitation when using mvapich2 - but it could impact users in the > field if a cluster gets extended with newer HCAs. > > > -- > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > > -----Original Message----- > From: wei huang [mailto:huanwei at cse.ohio-state.edu] > Sent: Sunday, April 06, 2008 8:58 PM > To: Mike Heinz > Cc: general at lists.openfabrics.org > Subject: Re: [ofa-general] MVAPICH2 crashes on mixed fabric > > Hi Mike, > > Currently mvapich2 will detect different HCA type and thus select > different parameters for communication, which may cause the problem. 
We > are working on this feature and it will be available in our next > release. > For now, if you want to run on this setup, please set few environmental > variables like: > > mpiexec -n 2 -env MV2_USE_COALESCE 0 -env MV2_VBUF_TOTAL_SIZE 9216 > ./a.out > > Please let us know if this works. Thanks. > > Regards, > Wei Huang > > 774 Dreese Lab, 2015 Neil Ave, > Dept. of Computer Science and Engineering Ohio State University OH 43210 > Tel: (614)292-8501 > > > On Fri, 4 Apr 2008, Mike Heinz wrote: > > > Hey, all, I'm not sure if this is a known bug or some sort of > > limitation I'm unaware of, but I've been building and testing with the > > > OFED 1.3 GA release on a small fabric that has a mix of Arbel-based > > and newer Connect-X HCAs. > > > > What I've discovered is that mvapich and openmpi work fine across the > > entire fabric, but mvapich2 crashes when I use a mix of Arbels and > > Connect-X. The errors vary depending on the test program but here's an > > example: > > > > [mheinz at compute-0-0 IMB-3.0]$ mpirun -n 5 ./IMB-MPI1 . > > . > > . > > (output snipped) > > . > > . > > . 
> >
> > #-----------------------------------------------------------------------------
> > # Benchmarking Sendrecv
> > # #processes = 2
> > # ( 3 additional processes waiting in MPI_Barrier)
> > #-----------------------------------------------------------------------------
> >        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
> >             0         1000         3.51         3.51         3.51         0.00
> >             1         1000         3.63         3.63         3.63         0.52
> >             2         1000         3.67         3.67         3.67         1.04
> >             4         1000         3.64         3.64         3.64         2.09
> >             8         1000         3.67         3.67         3.67         4.16
> >            16         1000         3.67         3.67         3.67         8.31
> >            32         1000         3.74         3.74         3.74        16.32
> >            64         1000         3.90         3.90         3.90        31.28
> >           128         1000         4.75         4.75         4.75        51.39
> >           256         1000         5.21         5.21         5.21        93.79
> >           512         1000         5.96         5.96         5.96       163.77
> >          1024         1000         7.88         7.89         7.89       247.54
> >          2048         1000        11.42        11.42        11.42       342.00
> >          4096         1000        15.33        15.33        15.33       509.49
> >          8192         1000        22.19        22.20        22.20       703.83
> >         16384         1000        34.57        34.57        34.57       903.88
> >         32768         1000        51.32        51.32        51.32      1217.94
> >         65536          640        85.80        85.81        85.80      1456.74
> >        131072          320       155.23       155.24       155.24      1610.40
> >        262144          160       301.84       301.86       301.85      1656.39
> >        524288           80       598.62       598.69       598.66      1670.31
> >       1048576           40      1175.22      1175.30      1175.26      1701.69
> >       2097152           20      2309.05      2309.05      2309.05      1732.32
> >       4194304           10      4548.72      4548.98      4548.85      1758.64
> > [0] Abort: Got FATAL event 3
> > at line 796 in file ibv_channel_manager.c
> > rank 0 in job 1 compute-0-0.local_36049 caused collective abort of all ranks
> > exit status of rank 0: killed by signal 9
> >
> > If, however, I define my mpdring to contain only Connect-X systems OR
> > only Arbel systems, IMB-MPI1 runs to completion.
> >
> > Can any suggest a workaround or is this a real bug with mvapich2?
> > > > -- > > Michael Heinz > > Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania > > > > > > > From weiny2 at llnl.gov Wed Apr 9 08:38:39 2008 From: weiny2 at llnl.gov (weiny2 at llnl.gov) Date: Wed, 9 Apr 2008 08:38:39 -0700 (PDT) Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> Message-ID: <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> > On Wed, 2008-04-09 at 10:01 +0000, Sasha Khapyorsky wrote: >> Hi Ira, >> >> On 16:48 Tue 08 Apr , weiny2 at llnl.gov wrote: >> > As per Hal's comments change the alternate value for [leaf] HOQ to be >> > "infinity" when the user specifies a value larger than "infinity". >> >> Actually I would prefer original version of the patch. The main reason >> is that infinite packet life time is really dangerous thing - in case >> when a fabric is routed with credit loops (very common case with default >> min-hops routing) it leads to total fabric stuck and not just to some >> performance degradation. >> >> So I think it is safer to reject invalid value and to set the default >> (log an error, etc.i). As it was done in the original version of the >> patch. >> >> Hal, do you agree? > > Safer yes but I think it is less to the intent of the admin who just > doesn't understand the max value for this and that's why I proposed this > change. My preference is to max it out but it comes down to a judgment > call. There's a downside either way. What if we set it to 0x13? This would be the maximum value that will not "lock" up the fabric. We could also add to the error message that the admin needs to specify 0x14 if they specifically want "infinity" to be set? 
Ira From rdreier at cisco.com Wed Apr 9 09:16:22 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 09 Apr 2008 09:16:22 -0700 Subject: [ofa-general] ipath_kernel.h:1115: error: implicit declaration of function 'writeq' on rhel5 In-Reply-To: <1207750066.3303.28.camel@pc.ilinx> (Brian J. Murrell's message of "Wed, 09 Apr 2008 10:07:46 -0400") References: <1207750066.3303.28.camel@pc.ilinx> Message-ID: ipath doesn't work on any 32-bit architecture. The kernel Kconfig file has

    config INFINIBAND_IPATH
            tristate "QLogic InfiniPath Driver"
            depends on (PCI_MSI || HT_IRQ) && 64BIT && NET

but I guess the OFED build system doesn't enforce that. - R. From rdreier at cisco.com Wed Apr 9 09:18:37 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 09 Apr 2008 09:18:37 -0700 Subject: [ofa-general] ipath can work without MSI now? Message-ID: Given the commit below, does it make sense to change the Kconfig stuff

    config INFINIBAND_IPATH
            tristate "QLogic InfiniPath Driver"
            depends on (PCI_MSI || HT_IRQ) && 64BIT && NET

to remove the (PCI_MSI || HT_IRQ), since it seems your new HCA would still work on a non-MSI-enabled kernel?

    commit 9c7b278d87088350aaf9dfe0ad50afa15722dbf6
    Author: Dave Olson
    Date:   Tue Jan 8 11:50:18 2008 -0800

        IB/ipath: Fix check for no interrupts to reliably fallback to INTx

        Newer HCAs support MSI interrupts and also INTx interrupts. Fix
        the code so that INTx can be reliably enabled if MSI interrupts
        are not working.

        Signed-off-by: Dave Olson
        Signed-off-by: Roland Dreier

From dave.olson at qlogic.com Wed Apr 9 09:22:13 2008 From: dave.olson at qlogic.com (Dave Olson) Date: Wed, 9 Apr 2008 09:22:13 -0700 (PDT) Subject: [ofa-general] Re: ipath can work without MSI now?
In-Reply-To: References: Message-ID: On Wed, 9 Apr 2008, Roland Dreier wrote: | Given the commit below, does it make sense to change the Kconfig stuff | | config INFINIBAND_IPATH | tristate "QLogic InfiniPath Driver" | depends on (PCI_MSI || HT_IRQ) && 64BIT && NET | | to remove the (PCI_MSI || HT_IRQ), since it seems your new HCA would | still work on a non-MSI-enabled kernel? Not really, because it means the 6120 chips would not work, and the new 7220 still works better with MSI. I don't think the config mechanism can handle that as it stands unless we created a different driver. Or have I missed something that would cover that issue? The number of systems without MSI keeps dropping, so I think it's best to leave it as is. Dave Olson dave.olson at qlogic.com From hrosenstock at xsigo.com Wed Apr 9 09:30:32 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 09:30:32 -0700 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> Message-ID: <1207758632.15625.498.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 08:38 -0700, weiny2 at llnl.gov wrote: > > On Wed, 2008-04-09 at 10:01 +0000, Sasha Khapyorsky wrote: > >> Hi Ira, > >> > >> On 16:48 Tue 08 Apr , weiny2 at llnl.gov wrote: > >> > As per Hal's comments change the alternate value for [leaf] HOQ to be > >> > "infinity" when the user specifies a value larger than "infinity". > >> > >> Actually I would prefer original version of the patch. 
The main reason > >> is that infinite packet life time is really dangerous thing - in case > >> when a fabric is routed with credit loops (very common case with default > >> min-hops routing) it leads to total fabric stuck and not just to some > >> performance degradation. > >> > >> So I think it is safer to reject invalid value and to set the default > >> (log an error, etc.i). As it was done in the original version of the > >> patch. > >> > >> Hal, do you agree? > > > > Safer yes but I think it is less to the intent of the admin who just > > doesn't understand the max value for this and that's why I proposed this > > change. My preference is to max it out but it comes down to a judgment > > call. There's a downside either way. > > What if we set it to 0x13? This would be the maximum value that will not > "lock" up the fabric. We could also add to the error message that the > admin needs to specify 0x14 if they specifically want "infinity" to be > set? So disallow the setting to infinity ? 
-- Hal > Ira > > > From weiny2 at llnl.gov Wed Apr 9 09:36:09 2008 From: weiny2 at llnl.gov (weiny2 at llnl.gov) Date: Wed, 9 Apr 2008 09:36:09 -0700 (PDT) Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <1207758632.15625.498.camel@hrosenstock-ws.xsigo.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> <1207758632.15625.498.camel@hrosenstock-ws.xsigo.com> Message-ID: <50692.128.15.244.131.1207758969.squirrel@127.0.0.1> > On Wed, 2008-04-09 at 08:38 -0700, weiny2 at llnl.gov wrote: >> > On Wed, 2008-04-09 at 10:01 +0000, Sasha Khapyorsky wrote: >> >> Hi Ira, >> >> >> >> On 16:48 Tue 08 Apr , weiny2 at llnl.gov wrote: >> >> > As per Hal's comments change the alternate value for [leaf] HOQ to >> be >> >> > "infinity" when the user specifies a value larger than "infinity". >> >> >> >> Actually I would prefer original version of the patch. The main >> reason >> >> is that infinite packet life time is really dangerous thing - in case >> >> when a fabric is routed with credit loops (very common case with >> default >> >> min-hops routing) it leads to total fabric stuck and not just to some >> >> performance degradation. >> >> >> >> So I think it is safer to reject invalid value and to set the default >> >> (log an error, etc.i). As it was done in the original version of the >> >> patch. >> >> >> >> Hal, do you agree? >> > >> > Safer yes but I think it is less to the intent of the admin who just >> > doesn't understand the max value for this and that's why I proposed >> this >> > change. My preference is to max it out but it comes down to a judgment >> > call. There's a downside either way. >> >> What if we set it to 0x13? This would be the maximum value that will >> not >> "lock" up the fabric. 
We could also add to the error message that the >> admin needs to specify 0x14 if they specifically want "infinity" to be >> set? > > So disallow the setting to infinity ? > No, if you want infinity you have to specify 0x14 (19) in the opensm.opts file. For example, specifying 100 will set the value to 0x13 and warn the user that if they want infinity they will have to specify it explicitly; ie head_of_queue_lifetime = 0x14 Ira From hrosenstock at xsigo.com Wed Apr 9 09:40:28 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 09:40:28 -0700 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <50692.128.15.244.131.1207758969.squirrel@127.0.0.1> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> <1207758632.15625.498.camel@hrosenstock-ws.xsigo.com> <50692.128.15.244.131.1207758969.squirrel@127.0.0.1> Message-ID: <1207759228.15625.502.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 09:36 -0700, weiny2 at llnl.gov wrote: > > On Wed, 2008-04-09 at 08:38 -0700, weiny2 at llnl.gov wrote: > >> > On Wed, 2008-04-09 at 10:01 +0000, Sasha Khapyorsky wrote: > >> >> Hi Ira, > >> >> > >> >> On 16:48 Tue 08 Apr , weiny2 at llnl.gov wrote: > >> >> > As per Hal's comments change the alternate value for [leaf] HOQ to > >> be > >> >> > "infinity" when the user specifies a value larger than "infinity". > >> >> > >> >> Actually I would prefer original version of the patch. The main > >> reason > >> >> is that infinite packet life time is really dangerous thing - in case > >> >> when a fabric is routed with credit loops (very common case with > >> default > >> >> min-hops routing) it leads to total fabric stuck and not just to some > >> >> performance degradation. 
> >> >> > >> >> So I think it is safer to reject invalid value and to set the default > >> >> (log an error, etc.i). As it was done in the original version of the > >> >> patch. > >> >> > >> >> Hal, do you agree? > >> > > >> > Safer yes but I think it is less to the intent of the admin who just > >> > doesn't understand the max value for this and that's why I proposed > >> this > >> > change. My preference is to max it out but it comes down to a judgment > >> > call. There's a downside either way. > >> > >> What if we set it to 0x13? This would be the maximum value that will > >> not > >> "lock" up the fabric. We could also add to the error message that the > >> admin needs to specify 0x14 if they specifically want "infinity" to be > >> set? > > > > So disallow the setting to infinity ? > > > > No, if you want infinity you have to specify 0x14 (19) in the opensm.opts > file. For example, specifying 100 will set the value to 0x13 and warn the > user that if they want infinity they will have to specify it explicitly; > ie head_of_queue_lifetime = 0x14 That's another choice but seems a little weird to me that 20 is infinite and >20 is set less than that but this is a judgment call. Not sure what others think. At this point, I have nothing more to add on this. -- Hal > Ira > > > From Brian.Murrell at Sun.COM Wed Apr 9 09:42:24 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Wed, 09 Apr 2008 12:42:24 -0400 Subject: [ofa-general] ipath_kernel.h:1115: error: implicit declaration of function 'writeq' on rhel5 In-Reply-To: References: <1207750066.3303.28.camel@pc.ilinx> Message-ID: <1207759344.3303.46.camel@pc.ilinx> On Wed, 2008-04-09 at 09:16 -0700, Roland Dreier wrote: > ipath doesn't work on any 32-bit architecture. Indeed, this is what I had discovered. Using {read,write}q is just not kosher on < 64bit. 
> The kernel Kconfig file has > > config INFINIBAND_IPATH > tristate "QLogic InfiniPath Driver" > depends on (PCI_MSI || HT_IRQ) && 64BIT && NET Indeed, I saw that too. > but I guess the OFED build system doesn't enforce that. That's my uninformed conclusion thus far too. b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From sashak at voltaire.com Wed Apr 9 13:46:03 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Wed, 9 Apr 2008 20:46:03 +0000 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> Message-ID: <20080409204603.GB20833@sashak.voltaire.com> On 08:38 Wed 09 Apr , weiny2 at llnl.gov wrote: > > What if we set it to 0x13? This would be the maximum value that will not > "lock" up the fabric. We could also add to the error message that the > admin needs to specify 0x14 if they specifically want "infinity" to be > set? I think in the case when parameter value provided by user is wrong it is not easy to guess correctly what original wishes was. Probably we just need to add something like: ## valid values are <= 0x14 in config file template and reject any invalid values (I mean set to defaults)? 
Sasha From hrosenstock at xsigo.com Wed Apr 9 10:53:42 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 10:53:42 -0700 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <20080409204603.GB20833@sashak.voltaire.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> Message-ID: <1207763622.15625.506.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 20:46 +0000, Sasha Khapyorsky wrote: > On 08:38 Wed 09 Apr , weiny2 at llnl.gov wrote: > > > > What if we set it to 0x13? This would be the maximum value that will not > > "lock" up the fabric. We could also add to the error message that the > > admin needs to specify 0x14 if they specifically want "infinity" to be > > set? > > I think in the case when parameter value provided by user is wrong it > is not easy to guess correctly what original wishes was. Probably we > just need to add something like: > > ## valid values are <= 0x14 > > in config file template and reject any invalid values (I mean set to > defaults)? That's consistent with other invalid settings so maybe also add info on valid values into opensm.opts to try to reduce the chance of this occurring. 
-- Hal > Sasha > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From bs at q-leap.de Wed Apr 9 10:56:21 2008 From: bs at q-leap.de (Bernd Schubert) Date: Wed, 9 Apr 2008 19:56:21 +0200 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <47FBD40E.70407@mellanox.co.il> References: <200804041147.27565.bs@q-leap.de> <20080408183113.GA18308@sashak.voltaire.com> <47FBD40E.70407@mellanox.co.il> Message-ID: <200804091956.21840.bs@q-leap.de> Hello Yevgeny! On Tuesday 08 April 2008 22:22:38 Yevgeny Kliteynik wrote: > Sasha Khapyorsky wrote: > > Hi Bernd, > > > > [adding Yevgeny..] > > > > On 11:35 Tue 08 Apr , Bernd Schubert wrote: > >> On Tuesday 08 April 2008 03:44:06 Sasha Khapyorsky wrote: > >>> Hi Bernd, > >>> > >>> On 11:47 Fri 04 Apr , Bernd Schubert wrote: > >>>> opensm-3.2.1 logs some error messages like this: > >>>> > >>>> Apr 04 00:00:08 325114 [4580A960] 0x01 -> > >>>> __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for > >>>> node 0x000b8cffff002ba2(SW_pfs1_leaf4) port 13. Adding to light sweep > >>>> sampling list Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path > >>>> Dump of 3 hop path: Path = 0,1,14,13 > >>>> > >>>> > >>>> From ibnetdiscover output I see port13 of this switch is a > >>>> switch-interconnect (sorry, I don't know what the correct > >>>> name/identifier for switches within switches): > >>>> > >>>> [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid > >>>> 263 4xSDR > >>> > >>> It is possible that port was DOWN during first subnet discovery. > >>> Finally everything should be initialized after those messages. Isn't it > >>> the case here?
> >> I think everything is initialized, but I don't think the port was down > >> during first subnet discovery, since the port is on a spine board (I > >> called it 'inter') to another switch system. We also never added any > >> leaves to the switches. > > > > It is an interesting phenomenon then. > > > > Yevgeny, are you aware of such an issue with Flextronics switches? > > I've seen it before. It means that during discovery some switch has > answered the NodeInfo query, but then when OpenSM started to query for > PortInfo for each port of this switch, the switch didn't answer for some > (or all) ports. > > I think that this might happen if a switch has just been "plugged in", > and internal switches are doing autonegotiation - they are bringing > ports up and down when determining whether a link is SDR or DDR. > > In any case, this "phenomenon" should disappear after a couple of > dozen seconds, once the autonegotiation phase is over. > > Bernd, am I close? > We never plugged in additional switches, and the messages appear on each opensm startup. However, the messages appear only once after opensm is started, and then never again. Would the switches do SDR/DDR negotiation on opensm startup? And since we are at SDR/DDR, it also might be related. Hal and I are also discussing an odd SDR/DDR ibnetdiscover problem: ibnetdiscover thinks some ports are at SDR, while ibstatus and perfquery report these ports are at DDR.
Thanks, Bernd -- Bernd Schubert Q-Leap Networks GmbH From weiny2 at llnl.gov Wed Apr 9 11:01:36 2008 From: weiny2 at llnl.gov (weiny2 at llnl.gov) Date: Wed, 9 Apr 2008 11:01:36 -0700 (PDT) Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <20080409204603.GB20833@sashak.voltaire.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> Message-ID: <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> > On 08:38 Wed 09 Apr , weiny2 at llnl.gov wrote: >> >> What if we set it to 0x13? This would be the maximum value that will >> not >> "lock" up the fabric. We could also add to the error message that the >> admin needs to specify 0x14 if they specifically want "infinity" to be >> set? > > I think in the case when parameter value provided by user is wrong it > is not easy to guess correctly what original wishes was. Probably we > just need to add something like: > > ## valid values are <= 0x14 > > in config file template and reject any invalid values (I mean set to > defaults)? The config file comments already mention this:

    "# The code of maximal time a packet can wait at the head of\n"
    "# transmission queue.\n"
    "# The actual time is 4.096usec * 2^\n"
    "# The value 0x14 disables this mechanism\n"
    "head_of_queue_lifetime 0x%02x\n\n"

But I guess "disables" should be "infinity" to make this more clear. I will leave it up to you as to which patch you want. As Hal said, I can see either side. Both patches warn the user that the value they submitted was not valid, and subsequently what value OpenSM is using instead. Whatever you want to do Sasha...
:-) Ira From swise at opengridcomputing.com Wed Apr 9 11:06:58 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 09 Apr 2008 13:06:58 -0500 Subject: [ofa-general] Directions for verbs API extensions In-Reply-To: References: <47FA3D60.3020905@opengridcomputing.com> Message-ID: <47FD05C2.7090001@opengridcomputing.com> Roland Dreier wrote: > > > There are a few discrepancies between the iWARP and IB verbs that we > > > need to decide on how we want to handle: > > > > > > - In IB-BMME, L_Keys and R_Keys are split up so that there is an > > > 8-bit "key" that is owned by the consumer. As far as I know, there > > > is no analogous concept defined for iWARP STags; is there any point > > > in supporting this IB-only feature (which is optional even in the > > > IB spec)? > > > In fact there is an 8b key for stags as well. The stag is composed of > > a 3B index allocated by the driver/hw, and a 1B key specified by the > > consumer. None of this is exposed in the linux rdma interface at this > > point and cxgb3 always sets the key to 0xff. > > Oops, I completely missed that in the iWARP verbs spec. Yes, the IB and > iWARP verbs agree on the semantics here, so the only issue is that the > "key" portion of L_Keys/R_Keys is only supported by IB devices that do > BMME. So we can expose this in the API without too much trouble. > > > The chelsio driver supports the iwarp bind_mw SQ WR via the current > > API. In fact the current API implies that this call is actually a SQ > > operation anyway: > > > /** > > > * ib_bind_mw - Posts a work request to the send queue of the specified > > > * QP, which binds the memory window to the given address range and > > > * remote access attributes. > > > > How is the current bind_mw API not valid or correct for iwarp MWs? > > Other than being a different call than ib_post_send()? > > That's the only issue. The main impact is that you can't submit an MW > bind as part of a list of send WRs. I guess it's not too severe an > issue. 
I don't have any strong feelings here, except that eliminating > the separate bind_mw call might be a little cleaner. On the other hand > it adds more conditional branches to post_send so maybe it's a net loss. > BTW: looks like /usr/include/infiniband/verbs.h is missing an ibv_bind_mw() function. The struct and context ops are there, but no API func. This means there is no bind_mw support for user mode at this point. So we don't have to worry about backwards compatibility... Steve. From bs at q-leap.de Wed Apr 9 11:11:16 2008 From: bs at q-leap.de (Bernd Schubert) Date: Wed, 9 Apr 2008 20:11:16 +0200 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409100108.GB19834@sashak.voltaire.com> <1207747629.15625.460.camel@hrosenstock-ws.xsigo.com> <50410.128.15.244.131.1207755519.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> Message-ID: <200804092011.17361.bs@q-leap.de> On Wednesday 09 April 2008 20:01:36 weiny2 at llnl.gov wrote: > > On 08:38 Wed 09 Apr , weiny2 at llnl.gov wrote: > >> What if we set it to 0x13? This would be the maximum value that will > >> not > >> "lock" up the fabric. We could also add to the error message that the > >> admin needs to specify 0x14 if they specifically want "infinity" to be > >> set? > > > > I think in the case when parameter value provided by user is wrong it > > is not easy to guess correctly what original wishes was. Probably we > > just need to add something like: > > > > ## valid values are <= 0x14 > > > > in config file template and reject any invalid values (I mean set to > > defaults)?
> > The config file comments already mention this: > > "# The code of maximal time a packet can wait at the head of\n" > "# transmission queue.\n" > "# The actual time is 4.096usec * 2^\n" > "# The value 0x14 disables this mechanism\n" > "head_of_queue_lifetime 0x%02x\n\n" > > But I guess "disables" should be "infinity" to make this more clear. When I first read this, and when increasing the value from 0x12 to 0x13 didn't help, I thought: fine, if 0x14 disables it, I'll just set it to 0x15. What about "# The maximum is 0x14, which will disable this mechanism.\n" Thanks, Bernd -- Bernd Schubert Q-Leap Networks GmbH From hrosenstock at xsigo.com Wed Apr 9 11:19:07 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 11:19:07 -0700 Subject: [ofa-general] ERR 0108: Unknown remote side In-Reply-To: <200804091956.21840.bs@q-leap.de> References: <200804041147.27565.bs@q-leap.de> <20080408183113.GA18308@sashak.voltaire.com> <47FBD40E.70407@mellanox.co.il> <200804091956.21840.bs@q-leap.de> Message-ID: <1207765147.15625.510.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 19:56 +0200, Bernd Schubert wrote: > Hello Yevgeny! > > On Tuesday 08 April 2008 22:22:38 Yevgeny Kliteynik wrote: > > Sasha Khapyorsky wrote: > > > Hi Bernd, > > > > > > [adding Yevgeny..] > > > > > > On 11:35 Tue 08 Apr , Bernd Schubert wrote: > > >> On Tuesday 08 April 2008 03:44:06 Sasha Khapyorsky wrote: > > >>> Hi Bernd, > > >>> > > >>> On 11:47 Fri 04 Apr , Bernd Schubert wrote: > > >>>> opensm-3.2.1 logs some error messages like this: > > >>>> > > >>>> Apr 04 00:00:08 325114 [4580A960] 0x01 -> > > >>>> __osm_state_mgr_light_sweep_start: ERR 0108: Unknown remote side for > > >>>> node 0 > > >>>> x000b8cffff002ba2(SW_pfs1_leaf4) port 13.
Adding to light sweep > > >>> sampling list Apr 04 00:00:08 325126 [4580A960] 0x01 -> Directed Path > > >>> Dump of 3 hop path: Path = 0,1,14,13 > > >>> > > >>> > > >>> From ibnetdiscover output I see port13 of this switch is a > > >>> switch-interconnect (sorry, I don't know what the correct > > >>> name/identifier for switches within switches): > > >>> > > >>> [13] "S-000b8cffff002bfa"[13] # "SW_pfs1_inter7" lid > > >>> 263 4xSDR > > >>> > > >>> It is possible that port was DOWN during first subnet discovery. > > >>> Finally everything should be initialized after those messages. Isn't it > > >>> the case here? > > >> > > >> I think everything is initialized, but I don't think the port was down > > >> during first subnet discovery, since the port is on a spine board (I > > >> called it 'inter') to another switch system. We also never added any > > >> leaves to the switches. > > > > > > It is an interesting phenomenon then. > > > > > > Yevgeny, are you aware of such an issue with Flextronics switches? > > > > I've seen it before. It means that during discovery some switch has > > answered the NodeInfo query, but then when OpenSM started to query for > > PortInfo for each port of this switch, the switch didn't answer for some > > (or all) ports. > > > > I think that this might happen if a switch has just been "plugged in", > > and internal switches are doing autonegotiation - they are bringing > > ports up and down when determining whether a link is SDR or DDR. > > > > In any case, this "phenomenon" should disappear after a couple of > > dozen seconds, once the autonegotiation phase is over. > > > > Bernd, am I close? > > > > We never plugged in additional switches, and the messages appear on each opensm > startup. However, the messages appear only once after opensm was started, but > then never again. Would the switches do an SDR/DDR negotiation on opensm > startup? Links perform physical negotiation independently of the SM.
> And since we are at SDR/DDR, it also might be related. Hal and I are also > discussing an odd SDR/DDR ibnetdiscover problem. Ibnetdiscover just thinks > some ports are at SDR, while ibstatus and perfquery do tell these ports are > at DDR. I'm not sure the link speed is "stable". -- Hal > Thanks, > Bernd > > From hrosenstock at xsigo.com Wed Apr 9 11:20:19 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 11:20:19 -0700 Subject: [ofa-general] Re: [PATCH] opensm/opensm/osm_subnet.c: add checks for HOQ and Leaf HOQ input values In-Reply-To: <200804092011.17361.bs@q-leap.de> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> Message-ID: <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 20:11 +0200, Bernd Schubert wrote: > On Wednesday 09 April 2008 20:01:36 weiny2 at llnl.gov wrote: > > > On 08:38 Wed 09 Apr , weiny2 at llnl.gov wrote: > > >> What if we set it to 0x13? This would be the maximum value that will > > >> not > > >> "lock" up the fabric. We could also add to the error message that the > > >> admin needs to specify 0x14 if they specifically want "infinity" to be > > >> set? > > > > > > I think in the case when parameter value provided by user is wrong it > > > is not easy to guess correctly what original wishes was. Probably we > > > just need to add something like: > > > > > > ## valid values are <= 0x14 > > > > > > in config file template and reject any invalid values (I mean set to > > > defaults)? > > > > The config file comments already mention this: > > > > "# The code of maximal time a packet can wait at the head of\n" > > "# transmission queue.\n" > > "# The actual time is 4.096usec * 2^\n" > > "# The value 0x14 disables this mechanism\n" > > "head_of_queue_lifetime 0x%02x\n\n" > > > > But I guess "disables" should be "infinity" to make this more clear. 
> > When I first read this and when increasing the value from 0x12 to 0x13 didn't > help, I thought fine, if 0x14 disables it I just set it to 0x15. > What about > > "# The maximum is 0x14, which will disable this mechanism.\n" Yes, that's what I was trying to suggest. -- Hal > > > Thanks, > Bernd > From pwatkins at sicortex.com Wed Apr 9 11:27:16 2008 From: pwatkins at sicortex.com (Peter Watkins) Date: Wed, 09 Apr 2008 14:27:16 -0400 Subject: [ofa-general] ofed works on kernels with 64Kbyte pages? Message-ID: <47FD0A84.8020404@sicortex.com> > I know it's a long shot, but has anyone tried using OFED on > a kernel with 64Kbyte pages? We have 64K pages on our MIPS machines, and OFED 1.2.5 is used to connect to a disk array. Haven't tested lots of configurations, nor used other OFED paths, but it works. 01:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev a0) Subsystem: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) Flags: bus master, fast devsel, latency 0, IRQ 23 Memory at 818000000 (64-bit, non-prefetchable) [size=1M] Memory at 810000000 (64-bit, prefetchable) [size=8M] Memory at 800000000 (64-bit, prefetchable) [size=256M] Capabilities: [40] Power Management version 2 Capabilities: [48] Vital Product Data Capabilities: [90] Message Signalled Interrupts: Mask- 64bit+ Queue=0/5 Enable- Capabilities: [84] MSI-X: Enable- Mask- TabSize=32 Capabilities: [60] Express Endpoint, MSI 00 Kernel driver in use: ib_mthca Kernel modules: ib_mthca From hrosenstock at xsigo.com Wed Apr 9 11:37:58 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 11:37:58 -0700 Subject: [ofa-general] Re: running opensm 3.0.3 on 4000+ node system In-Reply-To: <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> 
<200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> Message-ID: <1207766278.15625.523.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 12:26 -0600, Maestas, Christopher Daniel wrote: > I'm trying to run opensm on a 4000+ node system, Which version ? Do you mean 3.0.3 (or 3.0.13) ? > and seem to be having difficulties in keeping the opensm around. > When I attach to the process w/ strace it does: > --- > # strace -p 5921 > Process 5921 attached - interrupt to quit restart_syscall(<... resuming interrupted call ...>) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > ... > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, NULL) = 0 > nanosleep({10, 0}, > +++ killed by SIGSEGV +++ > --- > > I have ofed 1.1 and 1.2 drivers loaded on the system. I've done this in the past using opensm 3.0.0 svn tag 10188 from ofed 1.0 clients and had no issues before. Here's how opensm is running: > --- > 6079 pts/0 Sl 0:08 /usr/sbin/opensm -d 3 -maxsmps 0 -s 300 -t 1000 -f /var/log/osm.log -V -g 0 > --- > > I have lots of data in the osm.log as you can imagine ... I don't know offhand what I should be looking at/for. What's towards the end of the log ? 
-- Hal > Thanks, > -cdm > From holt at sgi.com Wed Apr 9 11:55:00 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 9 Apr 2008 13:55:00 -0500 Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: <20080409144401.GT10133@duo.random> References: <20080409131709.GR11364@sgi.com> <20080409144401.GT10133@duo.random> Message-ID: <20080409185500.GT11364@sgi.com> On Wed, Apr 09, 2008 at 04:44:01PM +0200, Andrea Arcangeli wrote: > BTW, how did you implement invalidate_page? As this? > > invalidate_page() { > invalidate_range_begin() > invalidate_range_end() > } Essentially, I did the work of each step without releasing and reacquiring locks. > If yes, I prefer to remind you that normally invalidate_range_begin is > always called before zapping the pte. In the invalidate_page case > instead, invalidate_range_begin is called _after_ the pte has been > zapped already. > > Now there's no problem if the pte is established and the spte isn't > established. But it must never happen that the spte is established and > the pte isn't established (with page-pinning that means unswappable > memlock leak, without page-pinning it would mean memory corruption). I am not sure I follow what you are saying. Here is a very terse breakdown of how PFNs flow through xpmem's structures. We have a PFN table associated with our structure describing a grant. We use get_user_pages() to acquire information for that table and we fill the table in under a mutex. Remote hosts (on the same numa network so they have direct access to the users memory) have a PROXY version of that structure. It is filled out in a similar fashion to the local table. PTEs are created for the other processes while holding the mutex for this table (either local or remote). During the process of faulting, we have a simple linked list of ongoing faults that is maintained whenever the mutex is going to be released. Our version of a zap_page_range is called recall_PFNs. 
The recall process grabs the mutex, scans the faulters list for any that cover the range, and marks them as needing a retry. It then calls zap_page_range for any processes that have attached the granted memory to clear out their page tables. Finally, we release the mutex and proceed. The locking is more complex than this, but that is the essential idea. What that means for mmu_notifiers is we have a single reference on the page for all the remote processes using it. When the callout to invalidate_page() is made, we will still have processes with that PTE in their page tables and potentially TLB entries. When we return from the invalidate_page() callout, we will have removed all those page table entries, we will have no in-progress page table or tlb insertions that will complete, and we will have released all our references to the page. Does that meet your expectations? Thanks, Robin > > So the range_begin must serialize against the secondary mmu page fault > so that it can't establish the spte on a pte that was zapped by the > rmap code after get_user_pages/follow_page returned. I think your > range_begin already does that so you should be ok, but I wanted to > remind you about this slight difference in implementing invalidate_page as > I suggested above in the previous email, just to be sure ;). > > This is the race you must guard against in invalidate_page: > > > CPU0 CPU1 > try_to_unmap on page > secondary mmu page fault > get_user_pages()/follow_page found a page > ptep_clear_flush > invalidate_page() > invalidate_range_begin() > invalidate_range_end() > return from invalidate_page > establish spte on page > return from secondary mmu page fault > > If your range_begin already serializes in a hard way against the > secondary mmu page fault, my previously suggested "trivial" > implementation for invalidate_page should work just fine, and this > saves 1 branch for each try_to_unmap_one compared to the emm > implementation.
The branch check is inlined and it checks against the > mmu_notifier_head that is the hot cacheline; no new cacheline is > checked, just one branch is saved, and so it's worth it IMHO even if it > doesn't provide any other advantage if you implement it the way above. From hrosenstock at xsigo.com Wed Apr 9 13:35:53 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 13:35:53 -0700 Subject: [ofa-general] Re: running opensm 3.0.3 on 4000+ node system In-Reply-To: <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> Message-ID: <1207773353.15625.542.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 12:26 -0600, Maestas, Christopher Daniel wrote: > I have ofed 1.1 and 1.2 drivers loaded on the system. I've done this in the past using opensm 3.0.0 svn tag 10188 from ofed 1.0 clients and had no issues before. Here's how opensm is running: Which OpenSM was run before ? Also, which kernel is being used and what is meant by both ofed 1.1 and 1.2 drivers ? > 6079 pts/0 Sl 0:08 /usr/sbin/opensm -d 3 -maxsmps 0 -s 300 -t 1000 -f /var/log/osm.log -V -g 0 > --- Can you try without infinite SMPs ? Is this how it was run before ?
-- Hal > -cdm > From hrosenstock at xsigo.com Wed Apr 9 13:39:28 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 13:39:28 -0700 Subject: [ofa-general] RE: running opensm 3.0.3 on 4000+ node system In-Reply-To: <01388EFD6F94FE4787C7CB970014DF670C406E12B2@ES01SNLNT.srn.sandia.gov> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> <1207766278.15625.523.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E12B2@ES01SNLNT.srn.sandia.gov> Message-ID: <1207773568.15625.547.camel@hrosenstock-ws.xsigo.com> Hi Christopher, On Wed, 2008-04-09 at 13:14 -0600, Maestas, Christopher Daniel wrote: > Hello Hal, > > -----Original Message----- > From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > Sent: Wednesday, April 09, 2008 12:38 PM > To: Maestas, Christopher Daniel > Cc: general at lists.openfabrics.org > Subject: Re: running opensm 3.0.3 on 4000+ node system > > On Wed, 2008-04-09 at 12:26 -0600, Maestas, Christopher Daniel wrote: > > I'm trying to run opensm on a 4000+ node system, > > Which version ? Do you mean 3.0.3 (or 3.0.13) ? > > cdm> Version 3.0.13 ... you're right on that > # rpm -q opensm > opensm-3.0.3-6.el5_1.1 > --- > Apr 9 12:49:53 HOST OpenSM[3295]: /var/log/osm.log log file opened > Apr 9 12:49:53 HOST OpenSM[3295]: OpenSM Rev:openib-3.0.13 > Apr 9 12:49:53 HOST kernel: user_mad: process opensm did not enable P_Key index support. > Apr 9 12:49:53 HOST kernel: user_mad: Documentation/infiniband/user_mad.txt has info on the new ABI. 
> Apr 9 12:49:59 HOST OpenSM[3295]: Entering MASTER state > Apr 9 12:50:02 HOST OpenSM[3295]: Errors during initialization Your subnet has errors :-( > Apr 9 12:50:16 HOST OpenSM[3295]: SUBNET UP > Apr 9 12:50:22 HOST kernel: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready > Apr 9 12:50:30 HOST OpenSM[3295]: Errors during initialization > Apr 9 12:51:05 HOST last message repeated 2 times > Apr 9 12:52:17 HOST last message repeated 3 times > Apr 9 12:53:27 HOST last message repeated 3 times > ... > > > and seem to be having difficulties in keeping the opensm around. > > When I attach to the process w/ strace it does: > > --- > > # strace -p 5921 > > Process 5921 attached - interrupt to quit restart_syscall(<... resuming interrupted call ...>) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > ... > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, NULL) = 0 > > nanosleep({10, 0}, > > +++ killed by SIGSEGV +++ > > --- > > > > I have ofed 1.1 and 1.2 drivers loaded on the system. I've done this in the past using opensm 3.0.0 svn tag 10188 from ofed 1.0 clients and had no issues before. Here's how opensm is running: > > --- > > 6079 pts/0 Sl 0:08 /usr/sbin/opensm -d 3 -maxsmps 0 -s 300 -t 1000 -f /var/log/osm.log -V -g 0 > > --- > > > > I have lots of data in the osm.log as you can imagine ... I don't know offhand what I should be looking at/for. > > What's towards the end of the log ? > > cdm> > I rebooted the node ... 
then brought ib0, then restarted opensmd ... It died when file got this big: > # ls -l osm.log -h > -rw-r--r-- 1 root root 3.2G Apr 9 13:12 osm.log > # tail osm.log > Apr 09 13:12:31 439877 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0089 Port 12 TID:0x00000000000032d3 > Apr 09 13:12:31 440370 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x00D0 Port 3 TID:0x0000000000007480 > Apr 09 13:12:31 440669 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x00B3 Port 7 TID:0x00000000000058dd > Apr 09 13:12:31 440987 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0082 Port 21 TID:0x000000000000285a > Apr 09 13:12:31 441228 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x00E8 Port 10 TID:0x00000000000095a2 > Apr 09 13:12:31 441579 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x004A Port 1 TID:0x0000000000010d29 > Apr 09 13:12:31 441847 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0063 Port 24 TID:0x000000000000e40c > Apr 09 13:12:31 442130 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x000A Port 23 TID:0x000000000006fca2 > Apr 09 13:12:31 442469 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0009 Port 18 TID:0x0000000000059fc4 > Apr 09 13:12:31 442710 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0009 Port 17 TID:0x0000000000059fc5 Those are flow control watchdog errors. 
Any special opensm options set in the option file or are you running with the defaults ? -- Hal From hrosenstock at xsigo.com Wed Apr 9 13:41:30 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 13:41:30 -0700 Subject: [ofa-general] Re: running opensm 3.0.3 on 4000+ node system In-Reply-To: <1207773353.15625.542.camel@hrosenstock-ws.xsigo.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> <1207773353.15625.542.camel@hrosenstock-ws.xsigo.com> Message-ID: <1207773690.15625.548.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 13:35 -0700, Hal Rosenstock wrote: > On Wed, 2008-04-09 at 12:26 -0600, Maestas, Christopher Daniel wrote: > > > I have ofed 1.1 and 1.2 drivers loaded on the system. I've done this in the past using opensm 3.0.0 svn tag 10188 from ofed 1.0 clients and had no issues before. Here's how opensm is running: > > Which OpenSM was run before ? I just saw your response on this. Sorry I missed it. -- Hal > Also, which kernel is being used and what > is meant by both ofed 1.1 and 1.2 drivers ? > > > 6079 pts/0 Sl 0:08 /usr/sbin/opensm -d 3 -maxsmps 0 -s 300 -t 1000 -f /var/log/osm.log -V -g 0 > > --- > > Can you try without infinite SMPs ? Is this how it was run before ? 
> > -- Hal > > > -cdm > > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hrosenstock at xsigo.com Wed Apr 9 13:56:04 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 13:56:04 -0700 Subject: [ofa-general] RE: running opensm 3.0.3 on 4000+ node system In-Reply-To: <1207773568.15625.547.camel@hrosenstock-ws.xsigo.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> <1207766278.15625.523.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E12B2@ES01SNLNT.srn.sandia.gov> <1207773568.15625.547.camel@hrosenstock-ws.xsigo.com> Message-ID: <1207774564.15625.552.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 13:39 -0700, Hal Rosenstock wrote: > Hi Christopher, > > On Wed, 2008-04-09 at 13:14 -0600, Maestas, Christopher Daniel wrote: > > Hello Hal, > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > > Sent: Wednesday, April 09, 2008 12:38 PM > > To: Maestas, Christopher Daniel > > Cc: general at lists.openfabrics.org > > Subject: Re: running opensm 3.0.3 on 4000+ node system > > > > On Wed, 2008-04-09 at 12:26 -0600, Maestas, Christopher Daniel wrote: > > > I'm trying to run opensm on a 4000+ node system, > > > > Which version ? Do you mean 3.0.3 (or 3.0.13) ? > > > > cdm> Version 3.0.13 ... 
you're right on that > > # rpm -q opensm > > opensm-3.0.3-6.el5_1.1 > > --- > > Apr 9 12:49:53 HOST OpenSM[3295]: /var/log/osm.log log file opened > > Apr 9 12:49:53 HOST OpenSM[3295]: OpenSM Rev:openib-3.0.13 > > Apr 9 12:49:53 HOST kernel: user_mad: process opensm did not enable P_Key index support. > > Apr 9 12:49:53 HOST kernel: user_mad: Documentation/infiniband/user_mad.txt has info on the new ABI. > > Apr 9 12:49:59 HOST OpenSM[3295]: Entering MASTER state > > Apr 9 12:50:02 HOST OpenSM[3295]: Errors during initialization > > Your subnet has errors :-( > > > Apr 9 12:50:16 HOST OpenSM[3295]: SUBNET UP > > Apr 9 12:50:22 HOST kernel: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready > > Apr 9 12:50:30 HOST OpenSM[3295]: Errors during initialization > > Apr 9 12:51:05 HOST last message repeated 2 times > > Apr 9 12:52:17 HOST last message repeated 3 times > > Apr 9 12:53:27 HOST last message repeated 3 times > > ... > > > > > and seem to be having difficulties in keeping the opensm around. > > > When I attach to the process w/ strace it does: > > > --- > > > # strace -p 5921 > > > Process 5921 attached - interrupt to quit restart_syscall(<... resuming interrupted call ...>) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > ... 
> > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, NULL) = 0 > > > nanosleep({10, 0}, > > > +++ killed by SIGSEGV +++ > > > --- > > > > > > I have ofed 1.1 and 1.2 drivers loaded on the system. I've done this in the past using opensm 3.0.0 svn tag 10188 from ofed 1.0 clients and had no issues before. Here's how opensm is running: > > > --- > > > 6079 pts/0 Sl 0:08 /usr/sbin/opensm -d 3 -maxsmps 0 -s 300 -t 1000 -f /var/log/osm.log -V -g 0 > > > --- > > > > > > I have lots of data in the osm.log as you can imagine ... I don't know offhand what I should be looking at/for. > > > > What's towards the end of the log ? > > > > cdm> > > I rebooted the node ... then brought ib0, then restarted opensmd ... 
It died when file got this big: > > # ls -l osm.log -h > > -rw-r--r-- 1 root root 3.2G Apr 9 13:12 osm.log > > # tail osm.log > > Apr 09 13:12:31 439877 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0089 Port 12 TID:0x00000000000032d3 > > Apr 09 13:12:31 440370 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x00D0 Port 3 TID:0x0000000000007480 > > Apr 09 13:12:31 440669 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x00B3 Port 7 TID:0x00000000000058dd > > Apr 09 13:12:31 440987 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0082 Port 21 TID:0x000000000000285a > > Apr 09 13:12:31 441228 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x00E8 Port 10 TID:0x00000000000095a2 > > Apr 09 13:12:31 441579 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x004A Port 1 TID:0x0000000000010d29 > > Apr 09 13:12:31 441847 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0063 Port 24 TID:0x000000000000e40c > > Apr 09 13:12:31 442130 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x000A Port 23 TID:0x000000000006fca2 > > Apr 09 13:12:31 442469 [43204940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0009 Port 18 TID:0x0000000000059fc4 > > Apr 09 13:12:31 442710 [41E02940] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:131 Producer:2 from LID:0x0009 Port 17 TID:0x0000000000059fc5 > > Those are flow control watchdog errors. 
One possible explanation for this: the SM could be (mis)configuring mismatched OperVLs at the two ends of these links. Not sure why. -- Hal > Any special opensm options set > in the option file or are you running with the defaults ? > > -- Hal From hrosenstock at xsigo.com Wed Apr 9 14:17:50 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 14:17:50 -0700 Subject: [ofa-general] RE: running opensm 3.0.3 on 4000+ node system In-Reply-To: <01388EFD6F94FE4787C7CB970014DF670C406E1301@ES01SNLNT.srn.sandia.gov> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> <1207773353.15625.542.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E1301@ES01SNLNT.srn.sandia.gov> Message-ID: <1207775870.15625.567.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-09 at 15:13 -0600, Maestas, Christopher Daniel wrote: > I think we may have fixed it: --- > 3998 pts/0 Sl 1:47 /usr/sbin/opensm -maxsmps 15 -t 200 -f /var/log/osm.log -g 0 > -- > > I changed maxsmps to 15 (from default of 0 => unlimited) and it seems to be working now. > That is the same value we use for the cisco host based sm. Yes, an infinite value could overrun the un-flow-controlled VL15 buffers in the switches. This should probably be noted somewhere in the documentation/man pages. > --- > Apr 9 14:43:17 HOST OpenSM[3998]: /var/log/osm.log log file opened > Apr 9 14:43:17 HOST OpenSM[3998]: OpenSM Rev:openib-3.0.13 > Apr 9 14:43:17 HOST kernel: user_mad: process opensm did not enable P_Key index support. > Apr 9 14:43:17 HOST kernel: user_mad: Documentation/infiniband/user_mad.txt has info on the new ABI.
> Apr 9 14:43:30 HOST OpenSM[3998]: Entering MASTER state > Apr 9 14:43:54 HOST OpenSM[3998]: SUBNET UP > --- > > The log file is not growing like crazy anymore ... So it is the SM which caused this by mismatching peer port OpVLs. -- Hal > I did forget to mention we are running a new mellanox firmware on the HCA too and switches ... been about 2 years since we last tested. :) > I'm looking for the previous method in which it was run, and I don't recall making this change before. It could be due to all the other changes since then. But now I know how to get it going and my work is hopefully archived in this mailing list. ;) > > Thanks, > -cdm > > -----Original Message----- > From: Hal Rosenstock [mailto:hrosenstock at xsigo.com] > Sent: Wednesday, April 09, 2008 2:36 PM > To: Maestas, Christopher Daniel > Cc: general at lists.openfabrics.org > Subject: Re: running opensm 3.0.3 on 4000+ node system > > On Wed, 2008-04-09 at 12:26 -0600, Maestas, Christopher Daniel wrote: > > > I have ofed 1.1 and 1.2 drivers loaded on the system. I've done this in the past using opensm 3.0.0 svn tag 10188 from ofed 1.0 clients and had no issues before. Here's how opensm is running: > > Which OpenSM was run before ? Also, which kernel is being used and what is meant by both ofed 1.1 and 1.2 drivers ? > > > 6079 pts/0 Sl 0:08 /usr/sbin/opensm -d 3 -maxsmps 0 -s 300 -t 1000 -f /var/log/osm.log -V -g 0 > > --- > > Can you try without infinite SMPs ? Is this how it was run before ? > > -- Hal > > > -cdm > > > > > From Brian.Murrell at Sun.COM Wed Apr 9 14:51:51 2008 From: Brian.Murrell at Sun.COM (Brian J. 
Murrell) Date: Wed, 09 Apr 2008 17:51:51 -0400 Subject: [ofa-general] no kernel_patches/backport/2.6.5_sles9_sp3 Message-ID: <1207777911.3303.88.camel@pc.ilinx> The OFED 1.3 release I downloaded identifies a SLES 9 SP3 kernel and assigns a backport patchset for it: 2.6.5-7.*) echo 2.6.5_sles9_sp3 ;; But I don't seem to have that patchset in my release: $ ls ~/rpm/BUILD/ofa_kernel-1.3/kernel_patches/backport 2.6.11 2.6.15_ubuntu606 2.6.18-EL5.1 2.6.22_suse10_3 2.6.11_FC4 2.6.16 2.6.18_FC6 2.6.23 2.6.12 2.6.16_sles10 2.6.18_suse10_2 2.6.9_U4 2.6.13 2.6.16_sles10_sp1 2.6.19 2.6.9_U5 2.6.13_suse10_0_u 2.6.16_sles10_sp2 2.6.20 2.6.9_U6 2.6.14 2.6.17 2.6.21 2.6.15 2.6.18 2.6.22 There seem to be other identified releases missing such as 2.6.9_U{2,3} (not that I care about those particular releases). Is this release incomplete? b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From sashak at voltaire.com Wed Apr 9 17:50:10 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Thu, 10 Apr 2008 00:50:10 +0000 Subject: [ofa-general] RE: running opensm 3.0.3 on 4000+ node system In-Reply-To: <1207775870.15625.567.camel@hrosenstock-ws.xsigo.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> <1207773353.15625.542.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E1301@ES01SNLNT.srn.sandia.gov> <1207775870.15625.567.camel@hrosenstock-ws.xsigo.com> Message-ID: <20080410005010.GD21190@sashak.voltaire.com> On 14:17 Wed 09 Apr , Hal Rosenstock wrote: > On Wed, 2008-04-09 at 15:13 -0600, Maestas, Christopher Daniel wrote: > > I think we may 
have fixed it: > > --- > > 3998 pts/0 Sl 1:47 /usr/sbin/opensm -maxsmps 15 -t 200 -f /var/log/osm.log -g 0 > > -- > > > > I changed maxsmps to 15 (from default of 0 => unlimited) and it seems to be working now. > > That is the same value we use for the cisco host based sm. > > Yes, an infinite value could overrun the unflow controlled VL15 buffers > in the switches. Even if not - it overflows mad response matching table in vendor layer (there are 4k+ nodes and only 1k entries in the table). In recent version (master) this table size can be redefined with OSM_UMAD_MAX_PENDING environment variable. Sasha From hrosenstock at xsigo.com Wed Apr 9 14:54:11 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 09 Apr 2008 14:54:11 -0700 Subject: [ofa-general] RE: running opensm 3.0.3 on 4000+ node system In-Reply-To: <20080410005010.GD21190@sashak.voltaire.com> References: <49947.128.15.244.160.1207698524.squirrel@127.0.0.1> <20080409204603.GB20833@sashak.voltaire.com> <50823.128.15.244.169.1207764096.squirrel@127.0.0.1> <200804092011.17361.bs@q-leap.de> <1207765219.15625.511.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E129B@ES01SNLNT.srn.sandia.gov> <1207773353.15625.542.camel@hrosenstock-ws.xsigo.com> <01388EFD6F94FE4787C7CB970014DF670C406E1301@ES01SNLNT.srn.sandia.gov> <1207775870.15625.567.camel@hrosenstock-ws.xsigo.com> <20080410005010.GD21190@sashak.voltaire.com> Message-ID: <1207778051.15625.580.camel@hrosenstock-ws.xsigo.com> On Thu, 2008-04-10 at 00:50 +0000, Sasha Khapyorsky wrote: > On 14:17 Wed 09 Apr , Hal Rosenstock wrote: > > On Wed, 2008-04-09 at 15:13 -0600, Maestas, Christopher Daniel wrote: > > > I think we may have fixed it: > > > --- > > > 3998 pts/0 Sl 1:47 /usr/sbin/opensm -maxsmps 15 -t 200 -f /var/log/osm.log -g 0 > > > -- > > > > > > I changed maxsmps to 15 (from default of 0 => unlimited) and it seems to be working now. > > > That is the same value we use for the cisco host based sm. 
> > > > Yes, an infinite value could overrun the unflow controlled VL15 buffers > > in the switches. > > Even if not - it overflows mad response matching table in vendor layer > (there are 4k+ nodes and only 1k entries in the table). In recent > version (master) this table size can be redefined with > OSM_UMAD_MAX_PENDING environment variable. Right; I forgot about that but not sure why that wouldn't have happened on his earlier use of OpenSM though. -- Hal > > Sasha > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From tziporet at dev.mellanox.co.il Wed Apr 9 15:51:59 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Wed, 09 Apr 2008 15:51:59 -0700 Subject: [ofa-general] no kernel_patches/backport/2.6.5_sles9_sp3 In-Reply-To: <1207777911.3303.88.camel@pc.ilinx> References: <1207777911.3303.88.camel@pc.ilinx> Message-ID: <47FD488F.3000405@mellanox.co.il> Brian J. Murrell wrote: > The OFED 1.3 release I downloaded identifies a SLES 9 SP3 kernel and > OFED 1.3 does not support SLES9. If you need this OS you can use OFED 1.2.5.5. Tziporet
From krkumar2 at in.ibm.com Wed Apr 9 21:06:57 2008 From: krkumar2 at in.ibm.com (Krishna Kumar2) Date: Thu, 10 Apr 2008 09:36:57 +0530 Subject: [ofa-general] Test programs supporting RNIC's. In-Reply-To: <47FCCCB7.2080407@opengridcomputing.com> Message-ID: Hi Steve, Steve Wise wrote on 04/09/2008 07:33:35 PM: > Krishna, if you are interested, you could add cma support to the rest of > these. I can help by answering questions and/or testing things... If no one else is already doing this, I can start doing this in the background. Will follow up with you if I need any help. Thanks, - KK
From PHF at zurich.ibm.com Thu Apr 10 04:32:57 2008 From: PHF at zurich.ibm.com (Philip Frey1) Date: Thu, 10 Apr 2008 13:32:57 +0200 Subject: [ofa-general] librdmacm.a for 2.6.24 missing Message-ID: Hi, I have installed OFED 1.3 for my 2.6.24 kernel. Before I was running a different kernel (2.6.24.3). Before the change I had a static rdma cm library in /usr/lib64/librdmacm.a. Now this library is missing. Can anybody help me get that static library back? Many thanks, Philip -------------- next part -------------- An HTML attachment was scrubbed... URL: From taylor at hpc.ufl.edu Thu Apr 10 05:49:04 2008 From: taylor at hpc.ufl.edu (Charles Taylor) Date: Thu, 10 Apr 2008 08:49:04 -0400 Subject: [ofa-general] Test programs supporting RNIC's. In-Reply-To: References: Message-ID: We might be interested in helping with this as well. Charlie Taylor UF HPC Center On Apr 10, 2008, at 12:06 AM, Krishna Kumar2 wrote: > Hi Steve, > > Steve Wise wrote on 04/09/2008 > 07:33:35 PM: > >> Krishna, if you are interested, you could add cma support to the >> rest of >> these. I can help by answering questions and/or testing things... > > If no one else is already doing this, I can start doing this in the > background. Will follow up with you if I need any help.
> > Thanks, > > - KK From Brian.Murrell at Sun.COM Thu Apr 10 06:38:06 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Thu, 10 Apr 2008 09:38:06 -0400 Subject: [ofa-general] no kernel_patches/backport/2.6.5_sles9_sp3 In-Reply-To: <47FD488F.3000405@mellanox.co.il> References: <1207777911.3303.88.camel@pc.ilinx> <47FD488F.3000405@mellanox.co.il> Message-ID: <1207834686.3303.117.camel@pc.ilinx> On Wed, 2008-04-09 at 15:51 -0700, Tziporet Koren wrote: > > OFED 1.3 does not support SLES9 > If you need this OS you can use OFED 1.2.5.5 That's fair enough. But why not have the configuration process actually stop and announce it when it detects that it's operating on an unsupported platform? b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From shibata at lampreynetworks.com Thu Apr 10 07:27:15 2008 From: shibata at lampreynetworks.com (Joel Shibata) Date: Thu, 10 Apr 2008 14:27:15 GMT Subject: [ofa-general] madrpc_init and resetting performance counters Message-ID: <200804101027456.SM08116@[66.94.32.4]> I'm attempting to query the performance counters on each IB device/port and then reset these counters. To do so I'm using madrpc_init to initialize each port on every poll. Doing so produces the following warning/panic: ibwarn: [19949] umad_init: can't read ABI version from /sys/class/infiniband_mad/abi_version (Too many open files): is ib_umad module loaded? ibpanic: [19949] madrpc_init: can't init UMAD library: (Too many open files) I've verified that libibumad rpms are installed.
Calling madrpc_init only at the front end of my polling allows me to reset only the port that was initialized last. Does anyone have some insight into how I gather/reset each port without having to call madrpc_init each time I poll that port? Joel Shibata Software Developer Lamprey Networks -------------- next part -------------- An HTML attachment was scrubbed... URL: From hrosenstock at xsigo.com Thu Apr 10 07:32:50 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 10 Apr 2008 07:32:50 -0700 Subject: [ofa-general] madrpc_init and resetting performance counters In-Reply-To: <200804101027456.SM08116@[66.94.32.4]> References: <200804101027456.SM08116@[66.94.32.4]> Message-ID: <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> Joel, On Thu, 2008-04-10 at 14:27 +0000, Joel Shibata wrote: > I'm attempting to query the performance counters on each IB > device/port and then reset these counters. To do so I'm using > madrpc_init to initialize each port on every poll. Doing so produces > the following warning/panic: > > ibwarn: [19949] umad_init: can't read ABI version > from /sys/class/infiniband_mad/abi_version (Too many open files): is > ib_umad module loaded? > ibpanic: [19949] madrpc_init: can't init UMAD library: (Too many open > files) > > I've verified that libibumad rpms are installed. Calling madrpc_init > only at the front end of my polling allows me to reset only the > port that was initialized last. Does anyone have some insight into > how I gather/reset each port without having to call madrpc_init each > time I poll that port? There's already a tool which does what you are describing at a high level: perfquery -R and also scripts for the entire subnet: ibclearcounters or ibclearerrors (if you just want to clear the error counters).
-- Hal > Joel Shibata > Software Developer > Lamprey Networks From tziporet at dev.mellanox.co.il Thu Apr 10 08:21:18 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Thu, 10 Apr 2008 08:21:18 -0700 Subject: [ofa-general] no kernel_patches/backport/2.6.5_sles9_sp3 In-Reply-To: <1207834686.3303.117.camel@pc.ilinx> References: <1207777911.3303.88.camel@pc.ilinx> <47FD488F.3000405@mellanox.co.il> <1207834686.3303.117.camel@pc.ilinx> Message-ID: <47FE306E.5010003@mellanox.co.il> Brian J. Murrell wrote: > On Wed, 2008-04-09 at 15:51 -0700, Tziporet Koren wrote: > >> OFED 1.3 does not support SLES9 >> If you need this OS you can use OFED 1.2.5.5 >> > > That's fair enough. But why not have the configuration process actually > stop and announce it when it detects that it's operating on an unsupported > platform? > > You are right, Vlad - can you fix it Tziporet
From chu11 at llnl.gov Thu Apr 10 11:17:07 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 10 Apr 2008 11:17:07 -0700 Subject: [ofa-general] Re: [RFC][PATCH 0/4] opensm: using conventional config file In-Reply-To: <1207703425-19039-1-git-send-email-sashak@voltaire.com> References: <1207703425-19039-1-git-send-email-sashak@voltaire.com> Message-ID: <1207851427.7695.123.camel@cardanus.llnl.gov> Hey Sasha, I suddenly thought about this. If the /var/cache/opensm/opensm.opts file is no longer readable (and presumably people will not know about it b/c it is not documented anywhere), how will users know how to write the opensm.conf? Will opensm distribute a "template" .conf file with all values initially commented out? (I think this is the best idea). Al On Wed, 2008-04-09 at 01:10 +0000, Sasha Khapyorsky wrote: > Hi, > > This is attempt to make some order with OpenSM configuration. Now it > will use conventional (similar to another programs which may have > configuration) config ($sysconfig/etc/opensm/opensm.conf) file instead > of option cache file. Config file for some startup scripts should go > away. Option '-c' is preserved - it can be useful for config file > template generation, but OpenSM will not try to read option cache file. > > This is RFC yet. In addition to this we will need to update scripts and > man pages. > > Any feedback? Thoughts? > > Sasha -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From chu11 at llnl.gov Thu Apr 10 14:10:15 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 10 Apr 2008 14:10:15 -0700 Subject: [ofa-general] [OpenSM] [PATCH 0/3] New "port-offsetting" option to updn/minhop routing Message-ID: <1207861815.7695.160.camel@cardanus.llnl.gov> Hey Sasha, I was going to submit this after I had a chance to test on one of our big clusters to see if it worked 100% right.
But my final testing has been delayed (for a month now!). Ira said some folks from Sonoma were interested in this, so I'll go ahead and post it. This is a patch for something I call "port_offsetting" (name/description of the option is open to suggestion). Basically, we want to move to using lmc > 0 on our clusters b/c some of the newer MPI implementations take advantage of multiple lids and have shown faster performance when lmc > 0. The problem is that those users that do not use the newer MPI implementations, or do not run their code in a way that can take advantage of multiple lids, suffer great performance degradation in their code. We determined that the primary issue is what we started calling "base lid alignment". Here's a simple example. Assume LMC = 2 and we are trying to route the lids of 4 ports (A,B,C,D). Those lids are: port A - 1,2,3,4 port B - 5,6,7,8 port C - 9,10,11,12 port D - 13,14,15,16 Suppose forwarding of these lids goes through 4 switch ports. If we cycle through the ports like updn/minhop currently do, we would see something like this. switch port 1: 1, 5, 9, 13 switch port 2: 2, 6, 10, 14 switch port 3: 3, 7, 11, 15 switch port 4: 4, 8, 12, 16 Note that the base lid of each port (lids 1, 5, 9, 13) goes through only 1 port of the switch. Thus a user that uses only the base lid is using only 1 port out of the 4 ports they could be using. Leading to terrible performance. We want to get this instead. switch port 1: 1, 8, 11, 14 switch port 2: 2, 5, 12, 15 switch port 3: 3, 6, 9, 16 switch port 4: 4, 7, 10, 13 where base lids are distributed in a more even manner. In order to do this, we (effectively) iterate through all ports like before, but we iterate starting at a different index depending on the number of paths we have routed thus far. On one of our clusters, some testing has shown when we run w/ LMC=1 and 1 task per node, mpibench (AlltoAll tests) range from 10-30% worse than when LMC=0 is used. 
With LMC=2, mpibench tends to be 50-70% worse in performance than with LMC=0. With the port offsetting option, the performance degradation ranges from 1-5% worse than LMC=0. I am currently at a loss why I cannot get it to be equal to LMC=0, but 1-5% is small enough to not make users mad :-) The part I haven't been able to test yet is whether newer MPIs that do take advantage of LMC > 0 run equally when my port_offsetting is turned off and on. That's the part I still haven't been able to test. Thanks, look forward to your comments, Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory From chu11 at llnl.gov Thu Apr 10 14:11:03 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 10 Apr 2008 14:11:03 -0700 Subject: [ofa-general] [OpenSM] [PATCH 1/3] add p_log pointer to osm_switch_t Message-ID: <1207861863.7695.162.camel@cardanus.llnl.gov> Nothing too fancy in this patch. I wanted to output some debug stuff into the log, and needed to get the p_log pointer passed into osm_switch_recommend_path(). Adding it into osm_switch_t seemed the easiest/best way. If you think we should get it into osm_switch_recommend_path() a different way, PLMK. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-add-log-pointer-to-osm_switch_t.patch Type: text/x-patch Size: 3847 bytes Desc: not available URL: From chu11 at llnl.gov Thu Apr 10 14:11:37 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 10 Apr 2008 14:11:37 -0700 Subject: [ofa-general] [OpenSM] [PATCH 2/3] add port_offsetting option Message-ID: <1207861897.7695.163.camel@cardanus.llnl.gov> Nothing too fancy in this patch. Just added the port_offsetting option, config file option, manpage documentation, etc. Again, I welcome comments on the text + the option name.
"port_offsetting" was the best name I could come up with :-) Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0002-add-port_offsetting-option.patch Type: text/x-patch Size: 6706 bytes Desc: not available URL: From chu11 at llnl.gov Thu Apr 10 14:12:09 2008 From: chu11 at llnl.gov (Al Chu) Date: Thu, 10 Apr 2008 14:12:09 -0700 Subject: [ofa-general] [OpenSM] [PATCH 3/3] implement port_offsetting option Message-ID: <1207861929.7695.165.camel@cardanus.llnl.gov> This is the primary patch that fiddles with the path recommendation code. A few notes: 1) b/c I want to keep track of how many remote destinations there can be, the 'remote_guids' array now stores all remote destinations, not just the ones we have already forwarded to. 2) b/c I may need to free memory, I now "goto Exit" instead of just calling 'return' many times. 3) Although the option is called 'port_offsetting', I actually "offset" both the remote destination I send to and the port pointing towards that remote destination. Al -- Albert Chu chu11 at llnl.gov 925-422-5311 Computer Scientist High Performance Systems Division Lawrence Livermore National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: 0003-implement-port_offsetting.patch Type: text/x-patch Size: 15048 bytes Desc: not available URL: From xptveowrp at bobghiotohomes.com Thu Apr 10 21:10:34 2008 From: xptveowrp at bobghiotohomes.com (Felicia Koehler) Date: Fri, 11 Apr 2008 13:10:34 +0900 Subject: [ofa-general] Re: Re: Hi Message-ID: <01c89bd5$6d0d8100$612bac79@xptveowrp> Forget about s~xual and ED problems! Zillions of men all over the world use our cure - Ciagra and Vialis! Buy it in our online store NOW! FOR SITE LINK VIEW ATTACHED DETAILS Friendly customer support and worldwide shipping! Choose Our Cure! 
From diego.guella at sircomtech.com Thu Apr 10 23:38:19 2008 From: diego.guella at sircomtech.com (Diego Guella) Date: Fri, 11 Apr 2008 08:38:19 +0200 Subject: [ofa-general] no kernel_patches/backport/2.6.5_sles9_sp3 References: <1207777911.3303.88.camel@pc.ilinx> <47FD488F.3000405@mellanox.co.il> <1207834686.3303.117.camel@pc.ilinx> <47FE306E.5010003@mellanox.co.il> Message-ID: <003a01c89b9e$a277d700$05c8a8c0@DIEGO> ----- Original Message ----- From: "Tziporet Koren" > Brian J. Murrell wrote: >> On Wed, 2008-04-09 at 15:51 -0700, Tziporet Koren wrote: >> >>> OFED 1.3 does not support SLES9 >>> If you need this OS you can use OFED 1.2.5.5 >>> >> >> That's fair enough. But why not have the configuration process actually >> stop and announce it when it detects that it's operating on an unsupported >> platform? >> >> > You are right, > Vlad - can you fix it > > Tziporet I think it would be better to print a warning, and ask the user if the process should continue or not. In the past I installed OFED 1.0 on Suse Linux 9.3 Professional (an unsupported operating system), and the only change I made was to the installation script, to make it recognize SL 9.3Pro as SLES. OFED 1.0 (opensm, ipoib, SDP, verbs) then ran without problems. Actually, it would be much better if the config process stopped, printed a warning, printed a list of supported operating systems, and then let the user choose which operating system OFED should be compiled for. Diego
From Brian.Murrell at Sun.COM Fri Apr 11 05:18:40 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Fri, 11 Apr 2008 08:18:40 -0400 Subject: [ofa-general] no kernel_patches/backport/2.6.5_sles9_sp3 In-Reply-To: <003a01c89b9e$a277d700$05c8a8c0@DIEGO> References: <1207777911.3303.88.camel@pc.ilinx> <47FD488F.3000405@mellanox.co.il> <1207834686.3303.117.camel@pc.ilinx> <47FE306E.5010003@mellanox.co.il> <003a01c89b9e$a277d700$05c8a8c0@DIEGO> Message-ID: <1207916320.3303.196.camel@pc.ilinx> On Fri, 2008-04-11 at 08:38 +0200, Diego Guella wrote: > > I think it would be better to print a warning, and ask the user if the process should continue or not. Why, when the build is going to fail ultimately with some kind of compiler error?
> In the past I installed OFED 1.0 on Suse Linux 9.3 Professional (an unsupported operating system), and the only change I done was to > the installation script, to make it recognize SL 9.3Pro as SLES. That's different. The non-support didn't result in a build failure, complete with compiler errors and all. > Actually, it would be much better if the config process stops, prints a warning, print a list of supported operating systems, and > then let the user choose which operating system should OFED be compiled for. Why? When the kernel I am trying to compile for is SLES9 and recognized as such and it is known to result in a complete build failure? What could I possibly answer to the prompt to make it succeed? This is not a case of a mis-detection. It correctly detects the kernel source as SLES9. It's a simple matter that there is no support in OFED 1.3 for SLES9 and the result is a completely broken build. Now, if you had patches that make it work, send them upstream and then the supported status of OFED 1.3 could change. But lacking that, no amount of pausing and prompting is going to fix the basic issue here. b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From swise at opengridcomputing.com Fri Apr 11 07:47:17 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 11 Apr 2008 09:47:17 -0500 Subject: [ofa-general] [ANNOUNCE] libcxgb3-1.1.5 released Message-ID: <47FF79F5.9000407@opengridcomputing.com> All, I've released version 1.1.5 of libcxgb3. The changes include 2 minor fixes, and some house-keeping to make the release easily integrate into distros. Thanks Roland for helping me see the light. :) Steve. From Brian.Murrell at Sun.COM Fri Apr 11 08:59:22 2008 From: Brian.Murrell at Sun.COM (Brian J. 
Murrell) Date: Fri, 11 Apr 2008 11:59:22 -0400 Subject: [ofa-general] iw_cxgb3.ko needs unknown symbol dev2t3cdev Message-ID: <1207929562.3303.222.camel@pc.ilinx> When I run a depmod -ae -F on my resulting installation of OFED 1.3 I get the following error: WARNING: /lib/modules/2.6.18-53.1.14.el5_lustre.1.6.4.55.20080411125046smp/kernel/drivers/infiniband/hw/cxgb3/iw_cxgb3.ko needs unknown symbol dev2t3cdev What I can't seem to figure out is why this symbol is not being exported by kernel/drivers/net/cxgb3/cxgb3.ko. The source shows it being defined and exported in drivers/net/cxgb3/cxgb3_offload.c: /* Get the t3cdev associated with a net_device */ struct t3cdev *dev2t3cdev(struct net_device *dev) { const struct port_info *pi = netdev_priv(dev); return (struct t3cdev *)pi->adapter; } EXPORT_SYMBOL(dev2t3cdev); However the resulting cxgb3.ko clearly does not have it defined: # nm /lib/modules/2.6.18-53.1.14.el5_lustre.1.6.4.55.20080411125046smp/kernel/drivers/net/cxgb3/cxgb3.ko | grep dev2t3cdev # My build output shows the build of cxgb3_offload.c and the linking of it into cxgb3.o: gcc -Wp,-MD,/cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/.cxgb3_offload.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.1.1/include \ -include include/linux/autoconf.h \ -include /cache/build/BUILD/ofa_kernel-1.3/include/linux/autoconf.h \ -I/cache/build/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ \ \ \ -I/cache/build/BUILD/ofa_kernel-1.3/include \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/debug \ -I/usr/local/include/scst \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/ulp/srpt \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3 \ -Iinclude \ \ -D__KERNEL__ \ -include include/linux/autoconf.h \ -include /cache/build/BUILD/ofa_kernel-1.3/include/linux/autoconf.h \ -I/cache/build/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ \ \ \ -I/cache/build/BUILD/ofa_kernel-1.3/include \ 
-I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/debug \ -I/usr/local/include/scst \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/ulp/srpt \ -I/cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3 \ -Iinclude \ \ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Wstrict-prototypes -Wundef -Werror-implicit-function-declaration -Os -mtune=generic -m64 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fomit-frame-pointer -g -fno-stack-protector -Wdeclaration-after-statement -Wno-pointer-sign -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(cxgb3_offload)" -D"KBUILD_MODNAME=KBUILD_STR(cxgb3)" -c -o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/.tmp_cxgb3_offload.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_offload.c ld -m elf_x86_64 -r -o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_main.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/ael1002.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/vsc8211.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/t3_hw.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/mc5.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/xgmac.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/sge.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/l2t.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_offload.o Any ideas what the problem could be? b. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From swise at opengridcomputing.com Fri Apr 11 09:12:56 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 11 Apr 2008 11:12:56 -0500 Subject: [ofa-general] iw_cxgb3.ko needs unknown symbol dev2t3cdev In-Reply-To: <1207929562.3303.222.camel@pc.ilinx> References: <1207929562.3303.222.camel@pc.ilinx> Message-ID: <47FF8E08.9080704@opengridcomputing.com> Brian J. Murrell wrote: > When I run a depmod -ae -F on my resulting installation of > OFED 1.3 I get the following error: > > WARNING: /lib/modules/2.6.18-53.1.14.el5_lustre.1.6.4.55.20080411125046smp/kernel/drivers/infiniband/hw/cxgb3/iw_cxgb3.ko needs unknown symbol dev2t3cdev > > What I can't seem to figure out is why this symbol is not being exported > by kernel/drivers/net/cxgb3/cxgb3.ko. The source shows it being defined > and exported in drivers/net/cxgb3/cxgb3_offload.c: > > /* Get the t3cdev associated with a net_device */ > struct t3cdev *dev2t3cdev(struct net_device *dev) > { > const struct port_info *pi = netdev_priv(dev); > > return (struct t3cdev *)pi->adapter; > } > > EXPORT_SYMBOL(dev2t3cdev); > > However the resulting cxgb3.ko clearly does not have it defined: > > # nm /lib/modules/2.6.18-53.1.14.el5_lustre.1.6.4.55.20080411125046smp/kernel/drivers/net/cxgb3/cxgb3.ko | grep dev2t3cdev > # > > My build output shows the build of cxgb3_offload.c and the linking of it > into cxgb3.o: > > gcc -Wp,-MD,/cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/.cxgb3_offload.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.1.1/include \ > -include include/linux/autoconf.h \ > -include /cache/build/BUILD/ofa_kernel-1.3/include/linux/autoconf.h \ > -I/cache/build/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ \ > \ > \ > -I/cache/build/BUILD/ofa_kernel-1.3/include \ > -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/debug \ > -I/usr/local/include/scst 
\ > -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/ulp/srpt \ > -I/cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3 \ > -Iinclude \ > \ > -D__KERNEL__ \ > -include include/linux/autoconf.h \ > -include /cache/build/BUILD/ofa_kernel-1.3/include/linux/autoconf.h \ > -I/cache/build/BUILD/ofa_kernel-1.3/kernel_addons/backport/2.6.18-EL5.1/include/ \ > \ > \ > -I/cache/build/BUILD/ofa_kernel-1.3/include \ > -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/debug \ > -I/usr/local/include/scst \ > -I/cache/build/BUILD/ofa_kernel-1.3/drivers/infiniband/ulp/srpt \ > -I/cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3 \ > -Iinclude \ > \ > -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Wstrict-prototypes -Wundef -Werror-implicit-function-declaration -Os -mtune=generic -m64 -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -fomit-frame-pointer -g -fno-stack-protector -Wdeclaration-after-statement -Wno-pointer-sign -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(cxgb3_offload)" -D"KBUILD_MODNAME=KBUILD_STR(cxgb3)" -c -o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/.tmp_cxgb3_offload.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_offload.c > ld -m elf_x86_64 -r -o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_main.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/ael1002.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/vsc8211.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/t3_hw.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/mc5.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/xgmac.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/sge.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/l2t.o /cache/build/BUILD/ofa_kernel-1.3/drivers/net/cxgb3/cxgb3_offload.o > > Any ideas what 
the problem could be? > > I believe the cxgb3 module you are looking at in /lib/modules/`uname -r`/kernel/* isn't the one you are building. ofed installs its modules in /lib/modules/`uname -r`/updates/*. The cxgb3 module in /lib/modules/`uname -r`/kernel/* is from your kernel tree I think. Are you doing a 'make install' from the ofed tree? Are you using only the ofed tree or are you also using chelsio's TOE kit? Thanks, Steve.
From dpn at isomerica.net Fri Apr 11 13:50:30 2008 From: dpn at isomerica.net (Dan Noe) Date: Fri, 11 Apr 2008 16:50:30 -0400 Subject: [ofa-general] madrpc_init and reseting performance counters In-Reply-To: <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> References: <200804101027456.SM08116@[66.94.32.4]> <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> Message-ID: <47FFCF16.6020302@isomerica.net> On 4/10/2008 10:32, Hal Rosenstock wrote: >> I've verified that libibumad rpms are installed. Only calling >> madrpc_init at the front end of my polling only allows me to reset the >> port that was initialized last. Does anyone have some insight into >> how I gather/reset each port without having to call madrpc_init each >> time I poll that port? > > There's already a tool which does what you are describing at a high > level: perfquery -R and also scripts for the entire subnet: > ibclearcounters or ibclearerrors (if you just want to clear the error > counters).
Our software is trying to get around the limitation of 32-bit IB counters - unfortunately the counters get "stuck" at 0xFFFFFFFF instead of wrapping so to avoid data loss it is necessary to poll them periodically, keep a running total (in a 64 bit counter :) and reset the counters. We're trying to avoid fork()/exec() since the resets need to happen fairly frequently. So calling out to perfquery to reset the counter is suboptimal. The solution Joel had mentioned was to use madrpc_init() and then call port_performance_reset() to reset the port. But madrpc_init keeps a static file descriptor (mad_portid) that is used for subsequent calls (such as is eventually used when port_performance_reset() is called). And, there does not seem to be any method to close this file descriptor. So, it is impossible to extend this method to multiple devices (or even multiple ports). With a single call to madrpc_init one can perpetually reset the performance counters in the polling loop but this approach doesn't work with multiple devices. If madrpc_init is called more than once, it leaks a file descriptor. There is a reference in the man page for umad_init (which is called) to calling umad_done but this doesn't seem to work: int umad_done(void) { TRACE("umad_done"); /* FIXME - verify that all ports are closed */ return 0; } I did notice there is a way to access the static file descriptor using madrpc_portid(). I assume this could be used to close the file descriptor opened by madrpc_init but it isn't clear if there are other resources that need cleanup. We're going to take this approach and see where it gets us. Any further insight is greatly appreciated. Cheers, Dan -- Dan Noe (dpn at lampreynetworks.com) Software Engineer Lamprey Networks, Inc.
From ralph.campbell at qlogic.com Fri Apr 11 14:08:22 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Fri, 11 Apr 2008 14:08:22 -0700 Subject: [ofa-general] madrpc_init and reseting performance counters In-Reply-To: <47FFCF16.6020302@isomerica.net> References: <200804101027456.SM08116@[66.94.32.4]> <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> <47FFCF16.6020302@isomerica.net> Message-ID: <1207948102.8715.86.camel@brick.pathscale.com> Also, be aware that opensm now tries to poll the performance counters and keep a total. If you have more than one thing in the system trying to keep track of the total, they will conflict and each only see part of the total counts. On Fri, 2008-04-11 at 16:50 -0400, Dan Noe wrote: > On 4/10/2008 10:32, Hal Rosenstock wrote: > >> I've verified that libibumad rpms are installed. Only calling > >> madrpc_init at the front end of my polling only allows me to reset the > >> port that was initialized last. Does anyone have some insight into > >> how I gather/reset each port without having to call madrpc_init each > >> time I poll that port? > > > > There's already a tool which does what you are describing at a high > > level: perfquery -R and also scripts for the entire subnet: > > ibclearcounters or ibclearerrors (if you just want to clear the error > > counters). > > Our software is trying to get around the limitation of 32-bit IB > counters - unfortunately the counters get "stuck" at 0xFFFFFFFF instead > of wrapping so to avoid data loss it is neccessary to poll them > periodically, keep a running total (in a 64 bit counter :) and reset the > counters. > > We're trying to avoid fork()/exec() since the resets need to happen > fairly frequently. So calling out to perfquery to reset the counter is > suboptimal. > > The solution Joel had mentioned was to use madrpc_init() and then call > port_performance_reset() to reset the port. 
But madrpc_init keeps a > static file descriptor (mad_portid) that is used for subsequent calls > (such as is eventually used when port_performance_reset() is called). > And, there does not seem to be any method to close this file descriptor. > > So, it is impossible to extend this method to multiple devices (or even > multiple ports). With a single call to madrpc_init one can perpetually > reset the performance counters in the polling loop but this approach > doesn't work with multiple devices. If madrpc_init is called more than > once, it leaks a file descriptor. > > There is a reference in the man page for umad_init (which is called) to > calling umad_done but this doesn't seem to work: > > int > umad_done(void) > { > TRACE("umad_done"); > /* FIXME - verify that all ports are closed */ > return 0; > } > > I did notice there is a way to access the static file descriptor using > madrpc_portid(). I assume this could be used to close the file > descriptor opened by madrpc_init but it isn't clear if there are other > resources that need cleanup. We're going to take this approach and see > where it gets us. > > Any further insight is greatly appreciated. 
> > Cheers, > Dan > From arlin.r.davis at intel.com Fri Apr 11 14:07:52 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Fri, 11 Apr 2008 14:07:52 -0700 Subject: [ofa-general] [PATCH][v2] dapl openib_cma: fix hca query to use correct max_rd_atom values Message-ID: <001301c89c18$1b074ab0$bb258686@amr.corp.intel.com> Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_util.c | 13 +++++++------ 1 files changed, 7 insertions(+), 6 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index fcd8163..a7ba3d6 100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -467,10 +467,10 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA *hca_ptr, ia_attr->hardware_version_major = dev_attr.hw_ver; ia_attr->max_eps = dev_attr.max_qp; ia_attr->max_dto_per_ep = dev_attr.max_qp_wr; - ia_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom; - ia_attr->max_rdma_read_out = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_in = dev_attr.max_res_rd_atom; + ia_attr->max_rdma_read_out = dev_attr.max_qp_init_rd_atom; ia_attr->max_rdma_read_per_ep_in = dev_attr.max_qp_rd_atom; - ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_rd_atom; + ia_attr->max_rdma_read_per_ep_out = dev_attr.max_qp_init_rd_atom; ia_attr->max_rdma_read_per_ep_in_guaranteed = DAT_TRUE; ia_attr->max_rdma_read_per_ep_out_guaranteed = DAT_TRUE; ia_attr->max_evds = dev_attr.max_cq; @@ -492,7 +492,7 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA *hca_ptr, ia_attr->max_iov_segments_per_rdma_write = dev_attr.max_sge; /* save rd_atom for peer validation during connect requests */ hca_ptr->ib_trans.max_rdma_rd_in = dev_attr.max_qp_rd_atom; - hca_ptr->ib_trans.max_rdma_rd_out = dev_attr.max_qp_rd_atom; + hca_ptr->ib_trans.max_rdma_rd_out = dev_attr.max_qp_init_rd_atom; #ifdef DAT_EXTENSIONS ia_attr->extension_supported = DAT_EXTENSION_IB; ia_attr->extension_version = DAT_IB_EXTENSION_VERSION; @@ -505,10 +505,11 @@ DAT_RETURN 
dapls_ib_query_hca(IN DAPL_HCA *hca_ptr, ia_attr->max_evds, ia_attr->max_evd_qlen ); dapl_log(DAPL_DBG_TYPE_UTIL, "dapl_query_hca: msg %llu rdma %llu iov's %d" - " lmr %d rmr %d rd_io %d inline=%d\n", + " lmr %d rmr %d rd_in,out %d,%d inline=%d\n", ia_attr->max_mtu_size, ia_attr->max_rdma_size, ia_attr->max_iov_segments_per_dto, ia_attr->max_lmrs, ia_attr->max_rmrs, ia_attr->max_rdma_read_per_ep_in, + ia_attr->max_rdma_read_per_ep_out, hca_ptr->ib_trans.max_inline_send); } @@ -521,7 +522,7 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA *hca_ptr, ep_attr->max_recv_iov = dev_attr.max_sge; ep_attr->max_request_iov = dev_attr.max_sge; ep_attr->max_rdma_read_in = dev_attr.max_qp_rd_atom; - ep_attr->max_rdma_read_out= dev_attr.max_qp_rd_atom; + ep_attr->max_rdma_read_out= dev_attr.max_qp_init_rd_atom; ep_attr->max_rdma_read_iov= dev_attr.max_sge; ep_attr->max_rdma_write_iov= dev_attr.max_sge; dapl_log(DAPL_DBG_TYPE_UTIL, -- 1.5.2.5 From dpn at isomerica.net Fri Apr 11 14:14:25 2008 From: dpn at isomerica.net (Dan Noe) Date: Fri, 11 Apr 2008 17:14:25 -0400 Subject: [ofa-general] madrpc_init and reseting performance counters In-Reply-To: <1207948102.8715.86.camel@brick.pathscale.com> References: <200804101027456.SM08116@[66.94.32.4]> <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> <47FFCF16.6020302@isomerica.net> <1207948102.8715.86.camel@brick.pathscale.com> Message-ID: <47FFD4B1.4020707@isomerica.net> On 4/11/2008 17:08, Ralph Campbell wrote: > Also, be aware that opensm now tries to poll the performance > counters and keep a total. If you have more than one thing > in the system trying to keep track of the total, they will > conflict and each only see part of the total counts. Yeah, this has been noted as a caveat. The need to reset the counters is a real pain. Is there a way to access the counters maintained by OpenSM without some fork/exec/parse mess? Cheers, Dan -- Dan Noe (dpn at lampreynetworks.com) Software Engineer Lamprey Networks, Inc. 
From hrosenstock at xsigo.com Fri Apr 11 14:30:09 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 11 Apr 2008 14:30:09 -0700 Subject: [ofa-general] madrpc_init and reseting performance counters In-Reply-To: <47FFD4B1.4020707@isomerica.net> References: <200804101027456.SM08116@[66.94.32.4]> <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> <47FFCF16.6020302@isomerica.net> <1207948102.8715.86.camel@brick.pathscale.com> <47FFD4B1.4020707@isomerica.net> Message-ID: <1207949409.15625.936.camel@hrosenstock-ws.xsigo.com> On Fri, 2008-04-11 at 17:14 -0400, Dan Noe wrote: > On 4/11/2008 17:08, Ralph Campbell wrote: > > Also, be aware that opensm now tries to poll the performance > > counters and keep a total. If you have more than one thing > > in the system trying to keep track of the total, they will > > conflict and each only see part of the total counts. > > Yeah, this has been noted as a caveat. The need to reset the counters > is a real pain. > > Is there a way to access the counters maintained by OpenSM without some > fork/exec/parse mess? Yes; Ira's the best one to speak to the options here as he did this work. -- Hal > Cheers, > Dan > From rdreier at cisco.com Fri Apr 11 15:03:45 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 11 Apr 2008 15:03:45 -0700 Subject: [ofa-general] [ANNOUNCE] libcxgb3-1.1.5 released In-Reply-To: <47FF79F5.9000407@opengridcomputing.com> (Steve Wise's message of "Fri, 11 Apr 2008 09:47:17 -0500") References: <47FF79F5.9000407@opengridcomputing.com> Message-ID: I've uploaded libipathverbs and libcxgb3 packages to my Ubuntu PPA: deb http://ppa.launchpad.net/roland.dreier/ubuntu hardy main deb-src http://ppa.launchpad.net/roland.dreier/ubuntu hardy main (and similar for gutsy), and started the process of getting those packages into the Debian archive (so they should automatically become a part of Ubuntu 8.10). 
If I have some spare time I'll work on getting packages into Fedora, but I would be happy to let someone else do that too...
From weiny2 at llnl.gov Fri Apr 11 15:32:03 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 11 Apr 2008 15:32:03 -0700 Subject: [ofa-general] madrpc_init and reseting performance counters In-Reply-To: <47FFD4B1.4020707@isomerica.net> References: <200804101027456.SM08116@[66.94.32.4]> <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> <47FFCF16.6020302@isomerica.net> <1207948102.8715.86.camel@brick.pathscale.com> <47FFD4B1.4020707@isomerica.net> Message-ID: <20080411153203.7c387452.weiny2@llnl.gov> On Fri, 11 Apr 2008 17:14:25 -0400 Dan Noe wrote: > On 4/11/2008 17:08, Ralph Campbell wrote: > > Also, be aware that opensm now tries to poll the performance > > counters and keep a total. If you have more than one thing > > in the system trying to keep track of the total, they will > > conflict and each only see part of the total counts. > > Yeah, this has been noted as a caveat. The need to reset the counters > is a real pain. > > Is there a way to access the counters maintained by OpenSM without some > fork/exec/parse mess? > Yes, assuming you have the perfmgr enabled; OpenSM has 2 ways of getting the counters out of the Performance Manager. a) use the console to dump the data to a file. b) write your own "plugin" to OpenSM and every time the perfmgr gets new data it will call your plugin. What you do from there is entirely up to you. Method A ======== Specify a dump file in the opensm.opts config file. # # Event DB Options # # Dump file to dump the events to event_db_dump_file /var/log/opensm_port_counters.log Log into the console and use the "perfmgr dump_counters" command: OpenSM $ perfmgr dump_counters Your data will be in "/var/log/opensm_port_counters.log". This file will be overwritten each time you run dump_counters. Method B ======== Look in the header opensm/osm_event_plugin.h for details on the interface.
Once you have a plugin compiled it can be loaded by the event_plugin_name opensm.opts option: # # Event Plugin Options # event_plugin_name opensmskummeeplugin The interface will be called each time there is new data available. We are using a plugin called opensmskummeeplugin[*] which puts all the data into a MySQL DB ready for the cluster monitoring tool Skummee[#] to put it on a web page for our operators. Also to get you started there is a sample plugin in OpenSM "osmeventplugin". Hope this helps, Ira [*] I hope to get this on a web page very soon. It has been approved for opensource by the lab... ;-) I don't know if it is appropriate to put in OFED due to its dependence on MySQL and Skummee. [#] https://sourceforge.net/project/screenshots.php?group_id=162032
From jgunthorpe at obsidianresearch.com Sat Apr 12 23:26:25 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Sun, 13 Apr 2008 00:26:25 -0600 Subject: [ofa-general] More responder_resources problems Message-ID: <20080413062625.GF23483@obsidianresearch.com> Hey Sean, I was just looking at tuning the responder_resources and I'm not quite sure what the intent of your implementation is regarding this.. Right now I'm mostly looking at userspace through libibcm, but in-kernel and librdmacm seem to have similar issues. I suppose this is related to your recent changesets 3eb99a28f41392f8555977aa12a345d251d218b3 (librdmacm) and 5851bb893e5bb87150817c180ccddcf4e78db1b6 (kernel).. Basically, it seems to me that the negotiation protocol for responder_resources/initiator_depth that is envisioned in IBA is not implemented.. So my expectation on how the spec outlines this should work is that the requesting side does essentially: ibv_query_device(verbs,&devAttr); req.responder_resources = devAttr.max_qp_rd_atom; req.initiator_depth = devAttr.max_qp_init_rd_atom; When making the req (assuming it wants the maximum). The passive side should then take req.initiator_depth, limit it to its devAttr.max_qp_rd_atom (and layer a client limit on top of that) and assign it to max_dest_rd_atomic on its QP, and also assign it to rep.responder_resources. Next, the passive side should take req.responder_resources, limit it to devAttr.max_qp_init_rd_atom (and again layer a client limit on top of that), and assign it to max_rd_atomic on its QP, and return it in rep.initiator_depth. The active side should, generally, use the form above and use the values in the rep to program its max_rd_atomic and max_dest_rd_atomic.
I can't find any of this in any of the cm libraries - and this is the sort of thing I was expecting to find in kernel cm.c, since other than letting the client on the passive side specify lower limits there really isn't much latitude here. The particular change you introduced to support DAPL strikes me as just strange; overriding the incoming initiator_depth with the passive side's responder_resources choice and then not returning that change in the rep makes no sense to me at all and could cause a slowdown since the two ends are now mismatched. (Assuming that max_dest_rd_atomic corresponds to responder resources and that max_rd_atomic corresponds to initiator depth as discussed in 11.2.4.3, Dotan: It would be nice if ibv_modify_qp(3) used the terms from IBA to describe them ..) What do you think? Thanks, -- Jason Gunthorpe (780)4406067x832 Chief Technology Officer, Obsidian Research Corp Edmonton, Canada From eli at dev.mellanox.co.il Sun Apr 13 00:22:01 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Sun, 13 Apr 2008 10:22:01 +0300 Subject: [ofa-general] [PATCH] IB/mlx4: Fix race when detaching a QP from a MCG Message-ID: <1208071321.9534.2.camel@mtls03> >From 9c725ff918d026e2765e053e5f09c51ee82e0282 Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Thu, 10 Apr 2008 11:47:54 +0300 Subject: [PATCH] IB/mlx4: Fix race when detaching a QP from a MCG When detaching the last QP from an MCG entry, we need to make sure that at no time is an entry with zero QPs linked into the list of MCGs for the corresponding hash index. This patch also removes an unnecessary MCG read when attaching a QP requires allocation of a new entry in the AMGM.
Signed-off-by: Eli Cohen Found by: Mellanox regression team --- drivers/net/mlx4/mcg.c | 12 +++--------- 1 files changed, 3 insertions(+), 9 deletions(-) diff --git a/drivers/net/mlx4/mcg.c b/drivers/net/mlx4/mcg.c index a99e772..57f7f1f 100644 --- a/drivers/net/mlx4/mcg.c +++ b/drivers/net/mlx4/mcg.c @@ -190,10 +190,6 @@ int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]) } index += dev->caps.num_mgms; - err = mlx4_READ_MCG(dev, index, mailbox); - if (err) - goto out; - memset(mgm, 0, sizeof *mgm); memcpy(mgm->gid, gid, 16); } @@ -301,12 +297,10 @@ int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]) mgm->qp[loc] = mgm->qp[i - 1]; mgm->qp[i - 1] = 0; - err = mlx4_WRITE_MCG(dev, index, mailbox); - if (err) - goto out; - - if (i != 1) + if (i != 1) { + err = mlx4_WRITE_MCG(dev, index, mailbox); goto out; + } if (prev == -1) { /* Remove entry from MGM */ -- 1.5.5 From eli at dev.mellanox.co.il Sun Apr 13 00:23:15 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Sun, 13 Apr 2008 10:23:15 +0300 Subject: [ofa-general] IB/mlx4: fix code comment Message-ID: <1208071395.9534.4.camel@mtls03> >From 35dce8d2ebd3f525fe9ef92e3d8e803adde6170d Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Thu, 10 Apr 2008 16:18:04 +0300 Subject: [PATCH] IB/mlx4: fix code comment mlx4 hardware does not support external DDR memory. Moreover, UAR area (BAR 2) can change depending on FW version. Signed-off-by: Eli Cohen --- drivers/net/mlx4/main.c | 3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 7cfbe75..f2fe14a 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -736,8 +736,7 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) } /* - * Check for BARs. We expect 0: 1MB, 2: 8MB, 4: DDR (may not - * be present) + * Check for BARs. 
We expect 0: 1MB */ if (!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM) || pci_resource_len(pdev, 0) != 1 << 20) { -- 1.5.5
From erezz at voltaire.com Mon Apr 14 03:01:51 2008 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 14 Apr 2008 13:01:51 +0300 Subject: [ofa-general] [PATCH v2] IB/iSER: Release connection resources when receiving a RDMA_CM_EVENT_DEVICE_REMOVAL event In-Reply-To: <47FB489C.6030507@voltaire.com> References: <47FB489C.6030507@voltaire.com> Message-ID: <48032B8F.1030504@voltaire.com> When a RDMA_CM_EVENT_DEVICE_REMOVAL event is raised, iSER should release the connection resources. This is necessary when the IB HCA module is unloaded while open-iscsi is still running.
Currently, iSER just initiates a BUG() call. Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iser_verbs.c | 5 +---- 1 files changed, 1 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index 993f0a8..d19cfe6 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -473,11 +473,8 @@ static int iser_cma_handler(struct rdma_cm_id *cma_id, struct rdma_cm_event *eve iser_connect_error(cma_id); break; case RDMA_CM_EVENT_DISCONNECTED: - iser_disconnected_handler(cma_id); - break; case RDMA_CM_EVENT_DEVICE_REMOVAL: - iser_err("Device removal is currently unsupported\n"); - BUG(); + iser_disconnected_handler(cma_id); break; default: iser_err("Unexpected RDMA CM event (%d)\n", event->event); -- 1.5.3.6
From diego.guella at sircomtech.com Fri Apr 11 06:01:03 2008 From: diego.guella at sircomtech.com (Diego Guella) Date: Fri, 11 Apr 2008 15:01:03 +0200 Subject: [ofa-general] no kernel_patches/backport/2.6.5_sles9_sp3 References: <1207777911.3303.88.camel@pc.ilinx> <47FD488F.3000405@mellanox.co.il> <1207834686.3303.117.camel@pc.ilinx> <47FE306E.5010003@mellanox.co.il> <003a01c89b9e$a277d700$05c8a8c0@DIEGO> <1207916320.3303.196.camel@pc.ilinx> Message-ID: <000d01c89e1c$cda64d00$05c8a8c0@DIEGO> ----- Original Message ----- >From: "Brian J. Murrell" >On Fri, 2008-04-11 at 08:38 +0200, Diego Guella wrote: >> >> I think it would be better to print a warning, and ask the user if the process should continue or not. > >Why, when the build is going to fail ultimately with some kind of >compiler error? > >> In the past I installed OFED 1.0 on Suse Linux 9.3 Professional (an unsupported operating system), and the only change I made was >> to >> the installation script, to make it recognize SL 9.3Pro as SLES. > >That's different. The non-support didn't result in a build failure, >complete with compiler errors and all. > >> Actually, it would be much better if the config process stops, prints a warning, prints a list of supported operating systems, and >> then lets the user choose which operating system OFED should be compiled for. > >Why? When the kernel I am trying to compile for is SLES9 and recognized >as such and it is known to result in a complete build failure? What >could I possibly answer to the prompt to make it succeed? > >This is not a case of a mis-detection. It correctly detects the kernel >source as SLES9. It's a simple matter that there is no support in OFED >1.3 for SLES9 and the result is a completely broken build. You're right. This is a different scenario. In this case the build is known to fail. In my case the config script prevented me from building, but the build was possible.
>Now, if you had patches that make it work, send them upstream and then >the supported status of OFED 1.3 could change. But lacking that, no >amount of pausing and prompting is going to fix the basic issue here. You're right. Sorry for the noise. From Brian.Murrell at Sun.COM Mon Apr 14 05:41:17 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Mon, 14 Apr 2008 08:41:17 -0400 Subject: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI Message-ID: <1208176877.22671.54.camel@pc.ilinx> I have run into a conflict trying to build a matching kernel and kernel-ib pair for OFED 1.3 and RHEL5's 2.6.18 kernel (although I suspect this will apply to generally any kernel of the same vintage). The problem is that OFED 1.3 appears to include/provide some iSCSI support, such as drivers/scsi/scsi_transport_iscsi.c for one example which is the "SCSI_ISCSI_ATTRS" kernel attribute. The 2.6.18 RHEL5 kernel can provide the same capability if one decides to configure it into the kernel build. So the question arises, do I want to have the kernel provide it or have kernel-ib provide it. It's not quite that easy though. If I disable it in the kernel, I also disable dependent drivers such as SCSI_QLA_ISCSI (QLogic ISP4XXX host adapter family support). In order to disable it in the kernel-ib build I need to disable "iser" support with "--without-iser-mod", which seems a bit like throwing the baby out with the bathwater. The reason I need to disable this in one place or another is that if my kernel RPM is providing scsi_transport_iscsi.ko and so is kernel-ib, I get an RPM conflict as the two files are in the same location in the /lib/modules/$(uname -r) tree. So how to resolve? I don't think it can be resolved easily currently. I think the ofa_kernel build system needs to be more intelligent about what's selected in the kernel and not providing duplicate capabilities. 
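A configure-time check along the lines proposed here could be sketched as below. This is only a sketch: KSRC, kconfig, and BUILD_ISCSI_ATTRS are hypothetical names, not ofa_kernel's actual build variables.

```shell
# Decide whether to build scsi_transport_iscsi.ko ourselves or whether
# the target kernel is already configured to provide SCSI_ISCSI_ATTRS.
KSRC=${KSRC:-/lib/modules/$(uname -r)/build}
kconfig="$KSRC/.config"

kernel_provides_iscsi_attrs() {
    # CONFIG_SCSI_ISCSI_ATTRS=y or =m means the kernel builds
    # scsi_transport_iscsi itself; treat a missing .config as "no".
    [ -r "$kconfig" ] && grep -Eq '^CONFIG_SCSI_ISCSI_ATTRS=(y|m)' "$kconfig"
}

if kernel_provides_iscsi_attrs; then
    BUILD_ISCSI_ATTRS=no    # skip the duplicate module, avoid the RPM conflict
else
    BUILD_ISCSI_ATTRS=yes
fi
echo "BUILD_ISCSI_ATTRS=$BUILD_ISCSI_ATTRS"
```

The same pattern would apply to any other capability the kernel and ofa_kernel can both provide: probe the kernel's .config once at configure time and suppress the duplicate module.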
IOW, I should be able to select CONFIG_SCSI_ISCSI_ATTRS=m in my kernel .config and CONFIG_INFINIBAND_ISER=m in my ofa_kernel configuration and ofa_kernel should figure out if it needs to provide SCSI_ISCSI_ATTRS (i.e. build scsi_transport_iscsi.ko) or whether the kernel is configured to and will be providing it. Thots? b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From erezz at voltaire.com Mon Apr 14 05:50:18 2008 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 14 Apr 2008 15:50:18 +0300 Subject: [ofa-general] [PATCH] do not change itt endianness In-Reply-To: <47E14B45.9040509@cs.wisc.edu> References: <47E14B45.9040509@cs.wisc.edu> Message-ID: <4803530A.3010408@voltaire.com> The itt field in struct iscsi_data is not defined with any particular endianness. open-iscsi should use it as-is without changing its endianness. Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iser_initiator.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 83247f1..d904070 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -416,7 +416,7 @@ int iser_send_data_out(struct iscsi_conn *conn, if (iser_check_xmit(conn, ctask)) return -ENOBUFS; - itt = ntohl(hdr->itt); + itt = hdr->itt; data_seg_len = ntoh24(hdr->dlength); buf_offset = ntohl(hdr->offset); -- 1.5.3.6 From erezz at voltaire.com Mon Apr 14 06:05:58 2008 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 14 Apr 2008 16:05:58 +0300 Subject: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI In-Reply-To: <1208176877.22671.54.camel@pc.ilinx> References: <1208176877.22671.54.camel@pc.ilinx> Message-ID: <480356B6.3040403@voltaire.com> Brian J. 
Murrell wrote: > I have run into a conflict trying to build a matching kernel and > kernel-ib pair for OFED 1.3 and RHEL5's 2.6.18 kernel (although I > suspect this will apply to generally any kernel of the same vintage). > General comment - in the future, I suggest that you send OFED related e-mails also to the EWG list and to me (I maintain iSER in OFED & kernel.org). > The problem is that OFED 1.3 appears to include/provide some iSCSI > support, such as drivers/scsi/scsi_transport_iscsi.c for one example > which is the "SCSI_ISCSI_ATTRS" kernel attribute. The 2.6.18 RHEL5 > kernel can provide the same capability if one decides to configure it > into the kernel build. > OFED 1.3 provides open-iscsi 2.0-865.15 (userspace & kernel). This version is newer than the version that is shipped with RHEL5. It also has full iSER support. > So the question arises, do I want to have the kernel provide it or have > kernel-ib provide it. It's not quite that easy though. > > If I disable it in the kernel, I also disable dependent drivers such as > SCSI_QLA_ISCSI (QLogic ISP4XXX host adapter family support). In order > to disable it in the kernel-ib build I need to disable "iser" support > with "--without-iser-mod", which seems a bit like throwing the baby out > with the bathwater. > Yeah, it is an open-iscsi transport, so you must have open-iscsi in order to use this driver. With OFED 1.3, qla4xxx is not included. We only included the TCP & iSER transports. > The reason I need to disable this in one place or another is that if my > kernel RPM is providing scsi_transport_iscsi.ko and so is kernel-ib, I > get an RPM conflict as the two files are in the same location in > the /lib/modules/$(uname -r) tree. > Of course. You can't have open-iscsi modules twice. > So how to resolve? I don't think it can be resolved easily currently. > I think the ofa_kernel build system needs to be more intelligent about > what's selected in the kernel and not providing duplicate capabilities. 
> IOW, I should be able to select CONFIG_SCSI_ISCSI_ATTRS=m in my > kernel .config and CONFIG_INFINIBAND_ISER=m in my ofa_kernel > configuration and ofa_kernel should figure out if it needs to provide > SCSI_ISCSI_ATTRS (i.e. build scsi_transport_iscsi.ko) or whether the > kernel is configured to and will be providing it. > OFED is shipped with its own version of open-iscsi because I don't want to support multiple versions of open-iscsi (each distro has its own version of open-iscsi). Also, having a newer version of open-iscsi (which we have in OFED) fixes many bugs and adds new features (which is good). Is qla4xxx the only problem that you have with open-iscsi in OFED? Erez From Brian.Murrell at Sun.COM Mon Apr 14 06:41:22 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Mon, 14 Apr 2008 09:41:22 -0400 Subject: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI In-Reply-To: <480356B6.3040403@voltaire.com> References: <1208176877.22671.54.camel@pc.ilinx> <480356B6.3040403@voltaire.com> Message-ID: <1208180482.22671.67.camel@pc.ilinx> On Mon, 2008-04-14 at 16:05 +0300, Erez Zilber wrote: > > General comment - in the future, I suggest that you send OFED related > e-mails also to the EWG list and to me (I maintain iSER in OFED & > kernel.org). I will probably need to subscribe first. :-( > OFED 1.3 provides open-iscsi 2.0-865.15 (userspace & kernel). This > version is newer than the version that is shipped with RHEL5. It also > has full iSER support. Yeah, I had a feeling that what I really wanted was to use the ofa_kernel one. > Yeah, it is an open-iscsi transport, so you must have open-iscsi in > order to use this driver. With OFED 1.3, qla4xxx is not included. We > only included the TCP & iSER transports. Indeed. > Of course. You can't have open-iscsi modules twice. Exactly, which is why I want to disable them in the kernel if I can. 
> OFED is shipped with its own version of open-iscsi because I don't want > to support multiple versions of open-iscsi (each distro has its own > version of open-iscsi). That's certainly fair enough. > Also, having a newer version of open-iscsi > (which we have in OFED) fixes many bugs and adds new features (which is > good). Indeed. All the more reason to use the OFED supplied one. However... > Is qla4xxx the only problem that you have with open-iscsi in OFED? Looking through the kernel Kconfig files, it does appear that SCSI_QLA_ISCSI is the only driver needing SCSI_ISCSI_ATTRS that isn't in the OFED 1.3 release. b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From erezz at voltaire.com Mon Apr 14 06:50:10 2008 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 14 Apr 2008 16:50:10 +0300 Subject: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI In-Reply-To: <1208180482.22671.67.camel@pc.ilinx> References: <1208176877.22671.54.camel@pc.ilinx> <480356B6.3040403@voltaire.com> <1208180482.22671.67.camel@pc.ilinx> Message-ID: <48036112.6070505@voltaire.com> >> Is qla4xxx the only problem that you have with open-iscsi in OFED? >> > > Looking through the kernel Kconfig files, it does appear that > SCSI_QLA_ISCSI is the only driver needing SCSI_ISCSI_ATTRS that isn't in > the OFED 1.3 release. > I'm not sure if there's a real demand for this transport for OFED users, is there? Adding qla4xxx will require backport patches for all supported distros, and we don't have the HW to test it. Therefore, unless it's really important for enough OFED users, I don't think that we should add it. BTW - I don't mind if other people add the required code to OFED 1.4 for qla4xxx support. Erez From Brian.Murrell at Sun.COM Mon Apr 14 07:36:50 2008 From: Brian.Murrell at Sun.COM (Brian J. 
Murrell) Date: Mon, 14 Apr 2008 10:36:50 -0400 Subject: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI In-Reply-To: <48036112.6070505@voltaire.com> References: <1208176877.22671.54.camel@pc.ilinx> <480356B6.3040403@voltaire.com> <1208180482.22671.67.camel@pc.ilinx> <48036112.6070505@voltaire.com> Message-ID: <1208183810.22671.90.camel@pc.ilinx> On Mon, 2008-04-14 at 16:50 +0300, Erez Zilber wrote: > > I'm not sure if there's a real demand for this transport for OFED users, > is there? Maybe I'm not seeing the bigger picture but it seems pretty orthogonal to me. Does using OFED 1.3 preclude using a qla4xxx host adapter? IOW, is there anything inherent in using OFED 1.3 as the networking fabric on a (say) storage server that uses a QLogic ISP4XXX adapter to access it's storage? > Adding qla4xxx will require backport patches for all supported > distros, and we don't have the HW to test it. Yeah, the old conundrum. > Therefore, unless it's > really important for enough OFED users, I don't think that we should add it. Well, given the alternative that it's completely unbuildable in the kernel when you choose OFED's iscsi options, is including the qla4xxx in the OFED distribution, even untested so bad? > BTW - I don't mind if other people add the required code to OFED 1.4 for > qla4xxx support. ~sigh~ Yeah. I wonder how many (if any) of our userbase we are going to upset if we cease providing the qla4xxx driver in our kernels. On the other hand, I wonder how many we'd upset by not providing iSER and the newer open-iscsi modules. b. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From erezz at voltaire.com Mon Apr 14 07:56:03 2008 From: erezz at voltaire.com (Erez Zilber) Date: Mon, 14 Apr 2008 17:56:03 +0300 Subject: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI In-Reply-To: <1208183810.22671.90.camel@pc.ilinx> References: <1208176877.22671.54.camel@pc.ilinx> <480356B6.3040403@voltaire.com> <1208180482.22671.67.camel@pc.ilinx> <48036112.6070505@voltaire.com> <1208183810.22671.90.camel@pc.ilinx> Message-ID: <48037083.6000209@voltaire.com> Brian J. Murrell wrote: > On Mon, 2008-04-14 at 16:50 +0300, Erez Zilber wrote: > >> I'm not sure if there's a real demand for this transport for OFED users, >> is there? >> > > Maybe I'm not seeing the bigger picture but it seems pretty orthogonal > to me. Does using OFED 1.3 preclude using a qla4xxx host adapter? IOW, > is there anything inherent in using OFED 1.3 as the networking fabric on > a (say) storage server that uses a QLogic ISP4XXX adapter to access it's > storage? > In theory, there's no reason we couldn't add qla4xxx to open-iscsi in OFED 1.3. The only problem is that someone actually has to do that. BTW - you can't use open-iscsi from OFED 1.3 with qla4xxx from the distro kernel because they may not work together. > >> Adding qla4xxx will require backport patches for all supported >> distros, and we don't have the HW to test it. >> > > Yeah, the old conundrum. > > >> Therefore, unless it's >> really important for enough OFED users, I don't think that we should add it. >> > > Well, given the alternative that it's completely unbuildable in the > kernel when you choose OFED's iscsi options, is including the qla4xxx in > the OFED distribution, even untested, so bad? > I don't mind, but I'm not sure if Voltaire will do that. We need to make a decision on that. > >> BTW - I don't mind if other people add the required code to OFED 1.4 for >> qla4xxx support.
>> > > ~sigh~ Yeah. > > I wonder how many (if any) of our userbase we are going to upset if we > cease providing the qla4xxx driver in our kernels. On the other hand, I > wonder how many we'd upset by not providing iSER and the newer > open-iscsi modules. > > Yeah, I understand. Let me get back to you on this issue. Erez From rdreier at cisco.com Mon Apr 14 09:01:47 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 14 Apr 2008 09:01:47 -0700 Subject: [ofa-general] Pending libibverbs patches? Message-ID: I would like to make a 1.1.2 release of libibverbs as a sort of checkpoint before working on possibly destabilizing stuff such as merging XRC or other verbs extensions. But I would like to know what pending work people have sent me (that I've probably lost track of), especially small safe stuff that could go into 1.1.2. Thanks, Roland
From clameter at sgi.com Mon Apr 14 12:57:00 2008 From: clameter at sgi.com (Christoph Lameter) Date: Mon, 14 Apr 2008 12:57:00 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 2 of 9] Core of mmu notifiers In-Reply-To: References: Message-ID: On Tue, 8 Apr 2008, Andrea Arcangeli wrote: > + /* > + * Called when nobody can register any more notifier in the mm > + * and after the "mn" notifier has been disarmed already. > + */ > + void (*release)(struct mmu_notifier *mn, > + struct mm_struct *mm); Hmmm... The unregister function does not call this. Guess driver calls unregister function and does release like stuff on its own. > + /* > + * invalidate_range_start() and invalidate_range_end() must be > + * paired. Multiple invalidate_range_start/ends may be nested > + * or called concurrently. > + */ How could they be nested or called concurrently? > +/* > + * mm_users can't go down to zero while mmu_notifier_unregister() > + * runs or it can race with ->release. So a mm_users pin must > + * be taken by the caller (if mm can be different from current->mm).
> + */ > +int mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) > +{ > + struct mm_lock_data *data; > + > + BUG_ON(!atomic_read(&mm->mm_users)); > + > + data = mm_lock(mm); > + if (unlikely(IS_ERR(data))) > + return PTR_ERR(data); > + hlist_del(&mn->hlist); > + mm_unlock(mm, data); > + return 0; Hmmm.. Ok, the user of the notifier does not get notified that it was unregistered. From clameter at sgi.com Mon Apr 14 12:57:56 2008 From: clameter at sgi.com (Christoph Lameter) Date: Mon, 14 Apr 2008 12:57:56 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 3 of 9] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: <33de2e17d0f567051583.1207669446@duo.random> References: <33de2e17d0f567051583.1207669446@duo.random> Message-ID: Not sure why this patch is not merged into 2 of 9. Same comment as last round. From clameter at sgi.com Mon Apr 14 12:59:54 2008 From: clameter at sgi.com (Christoph Lameter) Date: Mon, 14 Apr 2008 12:59:54 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 2 of 9] Core of mmu notifiers In-Reply-To: References: Message-ID: Where is the documentation on locking that you wanted to provide? From arlin.r.davis at intel.com Mon Apr 14 15:38:02 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Mon, 14 Apr 2008 15:38:02 -0700 Subject: [ofa-general] [PATCH][RFC] dapl v1.2: change packaging to modify OFA provider contents of dat.conf instead of file replacement. Message-ID: Change the packaging to update only the OFA provider contents in dat.conf. This allows other dapl providers, other than OFA, to co-exist and configure properly. Add a man page explaining the syntax of the static configuration file since there will no longer be comments in dat.conf.
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- Makefile.am | 23 +++++++++++++++++--- dapl.spec.in | 25 +++++++++++++++++++--- doc/dat.conf | 26 ----------------------- man/dat.conf.5 | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 102 insertions(+), 34 deletions(-) delete mode 100644 doc/dat.conf create mode 100644 man/dat.conf.5 diff --git a/Makefile.am b/Makefile.am index 5621768..079ad7f 100644 --- a/Makefile.am +++ b/Makefile.am @@ -17,8 +17,6 @@ else DBGFLAGS = -g endif -sysconf_DATA = doc/dat.conf - datlibdir = $(libdir) dapllibcmadir = $(libdir) @@ -183,7 +181,7 @@ libdatinclude_HEADERS = dat/include/dat/dat.h \ dat/include/dat/udat_redirection.h \ dat/include/dat/udat_vendor_specific.h -man_MANS = man/dtest.1 man/dapltest.1 +man_MANS = man/dtest.1 man/dapltest.1 man/dat.conf.5 EXTRA_DIST = dat/common/dat_dictionary.h \ dat/common/dat_dr.h \ @@ -231,7 +229,6 @@ EXTRA_DIST = dat/common/dat_dictionary.h \ dapl/openib_scm/dapl_ib_dto.h \ dapl/openib_scm/dapl_ib_util.h \ dat/udat/libdat.map \ - doc/dat.conf \ dapl/udapl/libdaplcma.map \ dapl.spec.in \ $(man_MANS) \ @@ -265,5 +262,23 @@ EXTRA_DIST = dat/common/dat_dictionary.h \ dist-hook: dapl.spec cp dapl.spec $(distdir) + +install-exec-hook: + if test -e $(sysconfdir)/dat.conf; then \ + echo "exec-hook"; \ + sed -e '/OpenIB-.* u1/d' < $(sysconfdir)/dat.conf > /tmp/$$$$OpenIBdapl; \ + cp /tmp/$$$$OpenIBdapl $(sysconfdir)/dat.conf; \ + fi; \ + echo OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib0 0" ""' >> $(sysconfdir)/dat.conf; \ + echo OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib1 0" ""' >> $(sysconfdir)/dat.conf; \ + echo OpenIB-cma-2 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib2 0" ""' >> $(sysconfdir)/dat.conf; \ + echo OpenIB-cma-3 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib3 0" ""' >> $(sysconfdir)/dat.conf; \ + echo OpenIB-bond u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"bond0 0" 
""' >> $(sysconfdir)/dat.conf; + +uninstall-hook: + if test -e $(sysconfdir)/dat.conf; then \ + sed -e '/OpenIB-.* u1/d' < $(sysconfdir)/dat.conf > /tmp/$$$$OpenIBdapl; \ + cp /tmp/$$$$OpenIBdapl $(sysconfdir)/dat.conf; \ + fi; SUBDIRS = . test/dtest test/dapltest diff --git a/dapl.spec.in b/dapl.spec.in index e3875a1..239e285 100644 --- a/dapl.spec.in +++ b/dapl.spec.in @@ -87,13 +87,29 @@ rm -f %{buildroot}%{_libdir}/*.la %clean rm -rf %{buildroot} -%post -p /sbin/ldconfig -%postun -p /sbin/ldconfig +%post +/sbin/ldconfig +if [ -e %{_sysconfdir}/dat.conf ]; then + sed -e '/OpenIB-.* u1/d' < %{_sysconfdir}/dat.conf > /tmp/$$ofadapl + mv /tmp/$$ofadapl %{_sysconfdir}/dat.conf +fi +echo OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib0 0" ""' >> %{_sysconfdir}/dat.conf +echo OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib1 0" ""' >> %{_sysconfdir}/dat.conf +echo OpenIB-cma-2 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib2 0" ""' >> %{_sysconfdir}/dat.conf +echo OpenIB-cma-3 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"ib3 0" ""' >> %{_sysconfdir}/dat.conf +echo OpenIB-bond u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 '"bond0 0" ""' >> %{_sysconfdir}/dat.conf + + +%postun +/sbin/ldconfig +if [ -e %{_sysconfdir}/dat.conf ]; then + sed -e '/OpenIB-.* u1/d' < %{_sysconfdir}/dat.conf > /tmp/$$OpenIBdapl + mv /tmp/$$OpenIBdapl %{_sysconfdir}/dat.conf +fi %files %defattr(-,root,root,-) %{_libdir}/libda*.so.* -%config(noreplace) %{_sysconfdir}/dat.conf %doc AUTHORS README ChangeLog %files devel @@ -109,7 +125,8 @@ rm -rf %{buildroot} %files utils %defattr(-,root,root,-) %{_bindir}/* -%{_mandir}/man1/* +%{_mandir}/man1/*.1* +%{_mandir}/man5/*.5* %changelog * Thu Feb 14 2008 Arlin Davis - 1.2.5 diff --git a/doc/dat.conf b/doc/dat.conf deleted file mode 100644 index 06142f8..0000000 --- a/doc/dat.conf +++ /dev/null @@ -1,26 +0,0 @@ -# -# DAT 1.2 and 2.0 configuration file -# -# Each entry should have the 
following fields: -# -# \ -# -# -# For the uDAPL cma provder, specify as one of the following: -# network address, network hostname, or netdev name and 0 for port -# -# Simple (OpenIB-cma) default with netdev name provided first on list -# to enable use of same dat.conf version on all nodes -# -# 1.2 and 2.0 examples for multiple interfaces, IPoIB HA failover, bonding: -# -OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" "" -OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" "" -OpenIB-cma-2 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib2 0" "" -OpenIB-cma-3 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib3 0" "" -OpenIB-bond u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "bond0 0" "" -ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" "" -ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" "" -ofa-v2-ib2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib2 0" "" -ofa-v2-ib3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib3 0" "" -ofa-v2-bond u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "bond0 0" "" diff --git a/man/dat.conf.5 b/man/dat.conf.5 new file mode 100644 index 0000000..6dee668 --- /dev/null +++ b/man/dat.conf.5 @@ -0,0 +1,62 @@ +.TH "DAT.CONF" "5" "25 March 2008" "" "" +.SH NAME +dat.conf \- configuration file for static registration of user-level DAT rdma providers +.SH "DESCRIPTION" +.PP +The DAT (direct access transport) architecture supports the use of +multiple DAT providers within a single consumer application. +Consumers implicitly select a provider using the Interface Adapter +name parameter passed to dat_ia_open(). +.PP +The subsystem that maps Interface Adapter names to provider +implementations is known as the DAT registry. When a consumer calls +dat_ia_open(), the appropriate provider is found and notified of the +consumer's request to access the IA. 
After this point, all DAT API +calls acting on DAT objects are automatically directed to the +appropriate provider entry points. +.PP +A persistent, administratively configurable database is used to store +mappings from IA names to provider information. This provider +information includes: the file system path to the provider library +object, version information, and thread safety information. The +location and format of the registry is platform dependent. This +database is known as the Static Registry (SR) and is provided via +entries in the \fIdat.conf\fR file. The process of adding a provider +entry is termed Static Registration. +.PP +.SH "Registry File Format" +\br + * All characters after # on a line are ignored (comments). + * Lines on which there are no characters other than whitespace + and comments are considered blank lines and are ignored. + * Non-blank lines must have seven whitespace separated fields. + These fields may contain whitespace if the field is quoted + with double quotes. Within fields quoted with double quotes, + the backslash and quote are valid escape sequences. + * Each non-blank line will contain the following fields: + - The IA Name. + - The API version of the library: + [k|u]major.minor where "major" and "minor" are both integers + in decimal format. User-level examples: "u1.2", and "u2.0". + - Whether the library is thread-safe: [threadsafe|nonthreadsafe] + - Whether this is the default section: [default|nondefault] + - The library image, version included, to be loaded.
+ - The vendor id and version of DAPL provider: id.major.minor + - ia params, IA specific parameters - device name and port + - platform params, (not used) +.PP +.SH Example netdev entries for OpenFabrics rdma_cm providers, both v1.2 and v2.0 +\br + OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" "" + ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" "" + + NOTE: The OpenFabrics providers specify the device with one of the following: + network address, network hostname, or netdev name; along with port number. + + The OpenIB- and ofa-v2- IA names are unique mappings reserved for OpenFabrics providers. +.PP +The default location for this configuration file is /etc/dat.conf. +The file location may be overridden with the environment variable DAT_OVERRIDE=/your_own_directory/your_dat.conf. +.PP +.SH "SEE ALSO" +.PP -- 1.5.2.5 From arlin.r.davis at intel.com Mon Apr 14 15:38:06 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 14 Apr 2008 15:38:06 -0700 Subject: [ofa-general] [PATCH][RFC] dapl v2.0: change packaging to modify OFA provider contents of dat.conf instead of file replacement. Message-ID: <000001c89e80$35985f80$9f97070a@amr.corp.intel.com> Change the packaging to update only the OFA provider contents in dat.conf. This allows other dapl providers, other than OFA, to co-exist and configure properly. Adding a man page to explain the syntax of this static configuration file since there will no longer be comments in dat.conf.
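Both the v1.2 and v2.0 packaging patches rely on the same edit-in-place idiom: delete any previous OFA entries from dat.conf with sed, then append a fresh set, so repeated installs stay idempotent and entries from other vendors survive. A minimal standalone sketch of that idiom follows; the "vendor-x" entry and temp-file handling are illustrative, not taken from the patch.

```shell
# Standalone rendition of the %post/%postun idiom above: strip stale
# OFA v2 entries, keep everything else, append the current entries.
conf=$(mktemp)

# Simulate an existing dat.conf holding a foreign provider entry plus a
# stale OFA v2 entry that the upgrade must replace.
cat > "$conf" <<'EOF'
vendor-x u2.0 nonthreadsafe default libdaplx.so.2 dapl.2.0 "eth0 0" ""
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.1 dapl.2.0 "ib0 0" ""
EOF

# Step 1 (as in %post): drop every previous OFA v2 entry, keep the rest.
tmp=$(mktemp)
sed -e '/ofa-v2-.* u2/d' "$conf" > "$tmp"
mv "$tmp" "$conf"

# Step 2: append the refreshed OFA entries (one shown of the five).
echo 'ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""' >> "$conf"

cat "$conf"   # vendor-x survives; the OFA entry is now the new one
rm -f "$conf"
```

The v1.2 packaging uses the pattern 'OpenIB-.* u1' in exactly the same way.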
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- Makefile.am | 25 ++++++++++++++++++---- dapl.spec.in | 24 ++++++++++++++++++--- doc/dat.conf | 26 ----------------------- man/dat.conf.5 | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 102 insertions(+), 35 deletions(-) delete mode 100755 doc/dat.conf create mode 100644 man/dat.conf.5 diff --git a/Makefile.am b/Makefile.am index 60b3db6..bb75dea 100755 --- a/Makefile.am +++ b/Makefile.am @@ -25,8 +25,6 @@ else DBGFLAGS = -g endif -sysconf_DATA = doc/dat.conf - datlibdir = $(libdir) dapllibofadir = $(libdir) @@ -195,7 +193,7 @@ libdatinclude_HEADERS = dat/include/dat2/dat.h \ dat/include/dat2/udat_vendor_specific.h \ dat/include/dat2/dat_ib_extensions.h -man_MANS = man/dtest.1 man/dapltest.1 +man_MANS = man/dtest.1 man/dapltest.1 man/dat.conf.5 EXTRA_DIST = dat/common/dat_dictionary.h \ dat/common/dat_dr.h \ @@ -241,7 +239,6 @@ EXTRA_DIST = dat/common/dat_dictionary.h \ dapl/openib_cma/dapl_ib_dto.h \ dapl/openib_cma/dapl_ib_util.h \ dat/udat/libdat2.map \ - doc/dat.conf \ dapl/udapl/libdaplofa.map \ dapl.spec.in \ $(man_MANS) \ @@ -275,5 +272,23 @@ EXTRA_DIST = dat/common/dat_dictionary.h \ dist-hook: dapl.spec cp dapl.spec $(distdir) - + +install-exec-hook: + if test -e $(sysconfdir)/dat.conf; then \ + sed -e '/ofa-v2-.* u2/d' < $(sysconfdir)/dat.conf > /tmp/$$$$ofadapl; \ + cp /tmp/$$$$ofadapl $(sysconfdir)/dat.conf; \ + fi; \ + echo ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib0 0" ""' >> $(sysconfdir)/dat.conf; \ + echo ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib1 0" ""' >> $(sysconfdir)/dat.conf; \ + echo ofa-v2-ib2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib2 0" ""' >> $(sysconfdir)/dat.conf; \ + echo ofa-v2-ib3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib3 0" ""' >> $(sysconfdir)/dat.conf; \ + echo ofa-v2-bond u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"bond0 0" ""' >> 
$(sysconfdir)/dat.conf; + +uninstall-hook: + if test -e $(sysconfdir)/dat.conf; then \ + sed -e '/ofa-v2-.* u2/d' < $(sysconfdir)/dat.conf > /tmp/$$$$ofadapl; \ + cp /tmp/$$$$ofadapl $(sysconfdir)/dat.conf; \ + fi; + SUBDIRS = . test/dtest test/dapltest + diff --git a/dapl.spec.in b/dapl.spec.in index 945ec78..1c656ca 100644 --- a/dapl.spec.in +++ b/dapl.spec.in @@ -87,13 +87,28 @@ rm -f %{buildroot}%{_libdir}/*.la %clean rm -rf %{buildroot} -%post -p /sbin/ldconfig -%postun -p /sbin/ldconfig +%post +/sbin/ldconfig +if [ -e %{_sysconfdir}/dat.conf ]; then + sed -e '/ofa-v2-.* u2/d' < %{_sysconfdir}/dat.conf > /tmp/$$ofadapl + mv /tmp/$$ofadapl %{_sysconfdir}/dat.conf +fi +echo ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib0 0" ""' >> %{_sysconfdir}/dat.conf +echo ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib1 0" ""' >> %{_sysconfdir}/dat.conf +echo ofa-v2-ib2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib2 0" ""' >> %{_sysconfdir}/dat.conf +echo ofa-v2-ib3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"ib3 0" ""' >> %{_sysconfdir}/dat.conf +echo ofa-v2-bond u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 '"bond0 0" ""' >> %{_sysconfdir}/dat.conf + +%postun +/sbin/ldconfig +if [ -e %{_sysconfdir}/dat.conf ]; then + sed -e '/ofa-v2-.* u2/d' < %{_sysconfdir}/dat.conf > /tmp/$$ofadapl + mv /tmp/$$ofadapl %{_sysconfdir}/dat.conf +fi %files %defattr(-,root,root,-) %{_libdir}/libda*.so.* -%config(noreplace) %{_sysconfdir}/dat.conf %doc AUTHORS README ChangeLog %files devel @@ -109,7 +124,8 @@ rm -rf %{buildroot} %files utils %defattr(-,root,root,-) %{_bindir}/* -%{_mandir}/man1/* +%{_mandir}/man1/*.1* +%{_mandir}/man5/*.5* %changelog * Thu Feb 14 2008 Arlin Davis - 2.0.7 diff --git a/doc/dat.conf b/doc/dat.conf deleted file mode 100755 index 06142f8..0000000 --- a/doc/dat.conf +++ /dev/null @@ -1,26 +0,0 @@ -# -# DAT 1.2 and 2.0 configuration file -# -# Each entry should have the following fields: -# -# \ -# 
-# -# For the uDAPL cma provder, specify as one of the following: -# network address, network hostname, or netdev name and 0 for port -# -# Simple (OpenIB-cma) default with netdev name provided first on list -# to enable use of same dat.conf version on all nodes -# -# 1.2 and 2.0 examples for multiple interfaces, IPoIB HA failover, bonding: -# -OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" "" -OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" "" -OpenIB-cma-2 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib2 0" "" -OpenIB-cma-3 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib3 0" "" -OpenIB-bond u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "bond0 0" "" -ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" "" -ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" "" -ofa-v2-ib2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib2 0" "" -ofa-v2-ib3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib3 0" "" -ofa-v2-bond u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "bond0 0" "" diff --git a/man/dat.conf.5 b/man/dat.conf.5 new file mode 100644 index 0000000..6dee668 --- /dev/null +++ b/man/dat.conf.5 @@ -0,0 +1,62 @@ +.TH "DAT.CONF" "5" "25 March 2008" "" "" +.SH NAME +dat.conf \- configuration file for static registration of user-level DAT rdma providers +.SH "DESCRIPTION" +.PP +The DAT (direct access transport) architecture supports the use of +multiple DAT providers within a single consumer application. +Consumers implicitly select a provider using the Interface Adapter +name parameter passed to dat_ia_open(). +.PP +The subsystem that maps Interface Adapter names to provider +implementations is known as the DAT registry. When a consumer calls +dat_ia_open(), the appropriate provider is found and notified of the +consumer's request to access the IA. 
After this point, all DAT API +calls acting on DAT objects are automatically directed to the +appropriate provider entry points. +.PP +A persistent, administratively configurable database is used to store +mappings from IA names to provider information. This provider +information includes: the file system path to the provider library +object, version information, and thread safety information. The +location and format of the registry is platform dependent. This +database is known as the Static Registry (SR) and is provided via +entries in the \fIdat.conf\fR file. The process of adding a provider +entry is termed Static Registration. +.PP +.SH "Registry File Format" +\br + * All characters after # on a line are ignored (comments). + * Lines on which there are no characters other than whitespace + and comments are considered blank lines and are ignored. + * Non-blank lines must have seven whitespace separated fields. + These fields may contain whitespace if the field is quoted + with double quotes. Within fields quoted with double quotes, + the backslash and quote are valid escape sequences. + * Each non-blank line will contain the following fields: + - The IA Name. + - The API version of the library: + [k|u]major.minor where "major" and "minor" are both integers + in decimal format. User-level examples: "u1.2", and "u2.0". + - Whether the library is thread-safe: [threadsafe|nonthreadsafe] + - Whether this is the default section: [default|nondefault] + - The library image, version included, to be loaded.
+ - The vendor id and version of DAPL provider: id.major.minor + - ia params, IA specific parameters - device name and port + - platform params, (not used) +.PP +.SH Example netdev entries for OpenFabrics rdma_cm providers, both v1.2 and v2.0 +\br + OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" "" + ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" "" + + NOTE: The OpenFabrics providers specify the device with one of the following: + network address, network hostname, or netdev name; along with port number. + + The OpenIB- and ofa-v2- IA names are unique mappings reserved for OpenFabrics providers. +.PP +The default location for this configuration file is /etc/dat.conf. +The file location may be overridden with the environment variable DAT_OVERRIDE=/your_own_directory/your_dat.conf. +.PP +.SH "SEE ALSO" +.PP -- 1.5.2.5 From clameter at sgi.com Mon Apr 14 16:09:26 2008 From: clameter at sgi.com (Christoph Lameter) Date: Mon, 14 Apr 2008 16:09:26 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: References: Message-ID: On Tue, 8 Apr 2008, Andrea Arcangeli wrote: > The difference with #v11 is a different implementation of mm_lock that > guarantees handling signals in O(N). It's also more lowlatency friendly. Ok. So the rest of the issues remain unaddressed? I am glad that we finally settled on the locking. But now I will have to clean this up, address the remaining issues, sequence the patches right, provide docs, handle the merging issue, etc.? I have seen no detailed review of my patches that you include here. We are going down the same road as we had to go down with the OOM patches, where David Rientjes and I had to deal with the issues you raised?
From rdreier at cisco.com Mon Apr 14 21:01:24 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 14 Apr 2008 21:01:24 -0700 Subject: [ofa-general] Re: [PATCH] IB/ehca: extend query_device() and query_port() to support all values for ibv_devinfo In-Reply-To: <200804071457.36248.ossrosch@linux.vnet.ibm.com> (Stefan Roscher's message of "Mon, 7 Apr 2008 13:57:33 +0100") References: <200804071457.36248.ossrosch@linux.vnet.ibm.com> Message-ID: thanks, applied From rdreier at cisco.com Mon Apr 14 21:03:20 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 14 Apr 2008 21:03:20 -0700 Subject: [ofa-general] Re: [PATCH] IB/mlx4: Fix race when detaching a QP from a MCG In-Reply-To: <1208071321.9534.2.camel@mtls03> (Eli Cohen's message of "Sun, 13 Apr 2008 10:22:01 +0300") References: <1208071321.9534.2.camel@mtls03> Message-ID: thanks, applied. From rdreier at cisco.com Mon Apr 14 21:04:22 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 14 Apr 2008 21:04:22 -0700 Subject: [ofa-general] Re: IB/mlx4: fix code comment In-Reply-To: <1208071395.9534.4.camel@mtls03> (Eli Cohen's message of "Sun, 13 Apr 2008 10:23:15 +0300") References: <1208071395.9534.4.camel@mtls03> Message-ID: thanks, applied From rdreier at cisco.com Mon Apr 14 21:05:55 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 14 Apr 2008 21:05:55 -0700 Subject: [ofa-general] Re: [PATCH v2] IB/iSER: Release connection resources when receiving a RDMA_CM_EVENT_DEVICE_REMOVAL event In-Reply-To: <48032B8F.1030504@voltaire.com> (Erez Zilber's message of "Mon, 14 Apr 2008 13:01:51 +0300") References: <47FB489C.6030507@voltaire.com> <48032B8F.1030504@voltaire.com> Message-ID: thanks, applied... I assume this much simpler patch replaces the earlier one completely? 
From rdreier at cisco.com Mon Apr 14 21:09:56 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 14 Apr 2008 21:09:56 -0700 Subject: [ofa-general] Re: [PATCH] do not change itt endianness In-Reply-To: <4803530A.3010408@voltaire.com> (Erez Zilber's message of "Mon, 14 Apr 2008 15:50:18 +0300") References: <47E14B45.9040509@cs.wisc.edu> <4803530A.3010408@voltaire.com> Message-ID: > - itt = ntohl(hdr->itt); > + itt = hdr->itt; This still gives the sparse warning drivers/infiniband/ulp/iser/iser_initiator.c:419:6: warning: incorrect type in assignment (different base types) drivers/infiniband/ulp/iser/iser_initiator.c:419:6: expected unsigned int [unsigned] itt drivers/infiniband/ulp/iser/iser_initiator.c:419:6: got restricted unsigned int [usertype] itt I guess the two possibilities are to use get_itt() or use a __force cast if you don't want the masking that get_itt() does. Which is correct? - R. From erezz at voltaire.com Mon Apr 14 22:50:18 2008 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 15 Apr 2008 08:50:18 +0300 Subject: [ofa-general] Re: [PATCH v2] IB/iSER: Release connection resources when receiving a RDMA_CM_EVENT_DEVICE_REMOVAL event In-Reply-To: References: <47FB489C.6030507@voltaire.com> <48032B8F.1030504@voltaire.com> Message-ID: <4804421A.2030208@voltaire.com> Roland Dreier wrote: > thanks, applied... I assume this much simpler patch replaces the earlier > one completely? > Yes (that's why I added "v2" in the subject). 
Thanks, Erez From rdreier at cisco.com Mon Apr 14 22:55:05 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 14 Apr 2008 22:55:05 -0700 Subject: [ofa-general] [PATCH/RFC] IPoIB: Handle case when P_Key is deleted and re-added at same index Message-ID: If a P_Key is deleted and then re-added at the same index, then IPoIB gets confused because __ipoib_ib_dev_flush() only checks whether the index is the same without checking whether the P_Key was present, so the interface is stopped when the P_Key is deleted, but the event when the P_Key is re-added gets ignored and the interface never gets restarted. Also, switch to using ib_find_pkey() instead of ib_find_cached_pkey() everywhere in IPoIB, since none of the places that look for P_Keys are in a fast path or in non-sleeping context, and in general we want to kill off the whole caching infrastructure eventually. This also fixes consistency problems caused because some IPoIB queries were cached and some were uncached during the window where the cache was not updated. Thanks to Venkata Subramonyam for debugging this problem and testing this fix. 
Signed-off-by: Roland Dreier --- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 4 ++-- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 10 +++++----- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 9d411f2..9db7b0b 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -1007,9 +1007,9 @@ static int ipoib_cm_modify_tx_init(struct net_device *dev, struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_attr qp_attr; int qp_attr_mask, ret; - ret = ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &qp_attr.pkey_index); + ret = ib_find_pkey(priv->ca, priv->port, priv->pkey, &qp_attr.pkey_index); if (ret) { - ipoib_warn(priv, "pkey 0x%x not in cache: %d\n", priv->pkey, ret); + ipoib_warn(priv, "pkey 0x%x not found: %d\n", priv->pkey, ret); return ret; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 8b4ff69..0205eb7 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -594,7 +594,7 @@ static void ipoib_pkey_dev_check_presence(struct net_device *dev) struct ipoib_dev_priv *priv = netdev_priv(dev); u16 pkey_index = 0; - if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) + if (ib_find_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); else set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); @@ -835,13 +835,13 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, int pkey_event) clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); ipoib_ib_dev_down(dev, 0); ipoib_ib_dev_stop(dev, 0); - ipoib_pkey_dev_delay_open(dev); - return; + if (ipoib_pkey_dev_delay_open(dev)) + return; } - set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); /* restart QP only if P_Key index is changed */ - if (new_index == priv->pkey_index) { + if (test_and_set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags) && 
+ new_index == priv->pkey_index) { ipoib_dbg(priv, "Not flushing - P_Key index not changed.\n"); return; } -- 1.5.5 From erezz at Voltaire.COM Mon Apr 14 23:25:51 2008 From: erezz at Voltaire.COM (Erez Zilber) Date: Tue, 15 Apr 2008 09:25:51 +0300 Subject: [ewg] Re: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI In-Reply-To: <48037083.6000209@voltaire.com> References: <1208176877.22671.54.camel@pc.ilinx><480356B6.3040403@voltaire.com><1208180482.22671.67.camel@pc.ilinx><48036112.6070505@voltaire.com><1208183810.22671.90.camel@pc.ilinx> <48037083.6000209@voltaire.com> Message-ID: <48044A6F.8040107@Voltaire.COM> > > > >> BTW - I don't mind if other people add the required code to OFED > 1.4 for > >> qla4xxx support. > >> > > > > ~sigh~ Yeah. > > > > I wonder how many (if any) of our userbase we are going to upset if we > > cease providing the qla4xxx driver in our kernels. On the other hand, I > > wonder how many we'd upset by not providing iSER and the newer > > open-iscsi modules. > > > > > > Yeah, I understand. Let me get back to you on this issue. > Brian, Voltaire will not be able to add qla4xxx support to open-iscsi in OFED 1.4. I understand that this may be important for some people, so if you (or anyone else) wants to add it, we can help with some info about open-iscsi and its backports & scripts in OFED (but we can't do the backports and testing ourselves). 
Erez From erezz at Voltaire.COM Mon Apr 14 23:33:29 2008 From: erezz at Voltaire.COM (Erez Zilber) Date: Tue, 15 Apr 2008 09:33:29 +0300 Subject: [ofa-general] Re: [PATCH] do not change itt endianness In-Reply-To: References: <47E14B45.9040509@cs.wisc.edu> <4803530A.3010408@voltaire.com> Message-ID: <48044C39.7090403@Voltaire.COM> Roland Dreier wrote: > > - itt = ntohl(hdr->itt); > > + itt = hdr->itt; > > This still gives the sparse warning > > drivers/infiniband/ulp/iser/iser_initiator.c:419:6: warning: incorrect type in assignment (different base types) > drivers/infiniband/ulp/iser/iser_initiator.c:419:6: expected unsigned int [unsigned] itt > drivers/infiniband/ulp/iser/iser_initiator.c:419:6: got restricted unsigned int [usertype] itt > > I guess the two possibilities are to use get_itt() or use a __force cast > if you don't want the masking that get_itt() does. Which is correct? > > - R. > Roland, If I just run 'make', I don't see the warning. What should I do in order to get the same warning that you get? Thanks, Erez From ogerlitz at voltaire.com Tue Apr 15 00:40:35 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 15 Apr 2008 10:40:35 +0300 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: References: Message-ID: <48045BF3.8040305@voltaire.com> Roland Dreier wrote: > I would like to make a 1.1.2 release of libibverbs as a sort of > checkpoint before working on possibly destabilizing stuff such as > merging XRC or other verbs extensions. But I would like to know what > pending work people have sent me (that I've probably lost track of), > especially small safe stuff that could go into 1.1.2. > There's the verbs.7 man page which was submitted on February (http://www.mail-archive.com/general at lists.openfabrics.org/msg11871.html) and following the discussion was fixed to reflect the feedback from the list (http://lists.openfabrics.org/pipermail/ewg/2008-April/006340.html). Or. 
From hrosenstock at xsigo.com Tue Apr 15 07:21:59 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Tue, 15 Apr 2008 07:21:59 -0700 Subject: [ofa-general] Re: [PATCH] do not change itt endianness In-Reply-To: <48044C39.7090403@Voltaire.COM> References: <47E14B45.9040509@cs.wisc.edu> <4803530A.3010408@voltaire.com> <48044C39.7090403@Voltaire.COM> Message-ID: <1208269319.1056.103.camel@hrosenstock-ws.xsigo.com> Erez, On Tue, 2008-04-15 at 09:33 +0300, Erez Zilber wrote: > Roland Dreier wrote: > > > - itt = ntohl(hdr->itt); > > > + itt = hdr->itt; > > > > This still gives the sparse warning > > > > drivers/infiniband/ulp/iser/iser_initiator.c:419:6: warning: incorrect type in assignment (different base types) > > drivers/infiniband/ulp/iser/iser_initiator.c:419:6: expected unsigned int [unsigned] itt > > drivers/infiniband/ulp/iser/iser_initiator.c:419:6: got restricted unsigned int [usertype] itt > > > > I guess the two possibilities are to use get_itt() or use a __force cast > > if you don't want the masking that get_itt() does. Which is correct? > > > > - R. > > > > Roland, > > If I just run 'make', I don't see the warning. What should I do in order > to get the same warning that you get? Try: make C=1 Look at Documentation/sparse.txt -- Hal > Thanks, > Erez > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From changquing.tang at hp.com Tue Apr 15 07:32:37 2008 From: changquing.tang at hp.com (Tang, Changqing) Date: Tue, 15 Apr 2008 14:32:37 +0000 Subject: [ofa-general] Sonoma Conference Presentation Slides Message-ID: Are all the slides ready for public access ? Can anyone tell ? Thanks. 
--CQ From Arkady.Kanevsky at netapp.com Tue Apr 15 08:50:33 2008 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Tue, 15 Apr 2008 11:50:33 -0400 Subject: [ofa-general] FW: [Interop-wg] next Interop event dates Message-ID: Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 ________________________________ From: Kanevsky, Arkady Sent: Tuesday, April 15, 2008 11:47 AM To: interop-wg at lists.openfabrics.org; openib-general at openib.org Subject: [Interop-wg] next Interop event dates Next Interop Event * IBTA Plugfest - September 22nd - 26th * iWARP Plugfest - September 22nd - 26th * OFA Interop Event - September 29th - October 3rd If you plan to participate, please, let IWG ( interop-wg at lists.openfabrics.org) know. Thanks, Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 From rdreier at cisco.com Tue Apr 15 09:21:26 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 15 Apr 2008 09:21:26 -0700 Subject: [ofa-general] Re: [PATCH] do not change itt endianness In-Reply-To: <48044C39.7090403@Voltaire.COM> (Erez Zilber's message of "Tue, 15 Apr 2008 09:33:29 +0300") References: <47E14B45.9040509@cs.wisc.edu> <4803530A.3010408@voltaire.com> <48044C39.7090403@Voltaire.COM> Message-ID: > If I just run 'make', I don't see the warning. What should I do in order > to get the same warning that you get? You need to use sparse -- install sparse, and then add 'C=2 CF=-D__CHECK_ENDIAN__' to your make command line. - R.
From Jeffrey.C.Becker at nasa.gov Tue Apr 15 09:44:20 2008 From: Jeffrey.C.Becker at nasa.gov (Jeff Becker) Date: Tue, 15 Apr 2008 09:44:20 -0700 Subject: [ofa-general] Sonoma Conference Presentation Slides In-Reply-To: References: Message-ID: <4804DB64.6000506@nasa.gov> I've received most of the slides, and put the presentations on the server. Jeff Scott and his team are preparing the conference web page that will allow public access. I'll add presentations as I get them. Thanks. -jeff Tang, Changqing wrote: > Are all the slides ready for public access ? Can anyone tell ? > > Thanks. > > --CQ > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From weiny2 at llnl.gov Tue Apr 15 09:47:50 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 15 Apr 2008 09:47:50 -0700 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: References: Message-ID: <20080415094750.35afc0e5.weiny2@llnl.gov> On Mon, 14 Apr 2008 09:01:47 -0700 Roland Dreier wrote: > I would like to make a 1.1.2 release of libibverbs as a sort of > checkpoint before working on possibly destabilizing stuff such as > merging XRC or other verbs extensions. But I would like to know what > pending work people have sent me (that I've probably lost track of), > especially small safe stuff that could go into 1.1.2. > Roland, I wonder if you would take a small patch to map enums to strings. I thought I submitted this before but I do not find it in the list archive so I must have forgotten about it. Thanks, Ira Weiny weiny2 at llnl.gov >From ccb34b2de8ecbad9e59036ba7c21cf3ac4179120 Mon Sep 17 00:00:00 2001 From: Ira K. Weiny Date: Wed, 5 Sep 2007 17:10:11 -0700 Subject: [PATCH] Add enum strings and *_str functions for enums Signed-off-by: Ira K. 
Weiny --- Makefile.am | 3 +- examples/devinfo.c | 13 +----- examples/rc_pingpong.c | 3 +- examples/srq_pingpong.c | 3 +- examples/uc_pingpong.c | 3 +- examples/ud_pingpong.c | 3 +- include/infiniband/verbs.h | 28 ++++++++++++ src/enum_strs.c | 100 ++++++++++++++++++++++++++++++++++++++++++++ src/libibverbs.map | 5 ++ 9 files changed, 144 insertions(+), 17 deletions(-) create mode 100644 src/enum_strs.c diff --git a/Makefile.am b/Makefile.am index 705b184..46e2354 100644 --- a/Makefile.am +++ b/Makefile.am @@ -9,7 +9,8 @@ src_libibverbs_la_CFLAGS = $(AM_CFLAGS) -DIBV_CONFIG_DIR=\"$(sysconfdir)/libibve libibverbs_version_script = @LIBIBVERBS_VERSION_SCRIPT@ src_libibverbs_la_SOURCES = src/cmd.c src/compat-1_0.c src/device.c src/init.c \ - src/marshall.c src/memory.c src/sysfs.c src/verbs.c + src/marshall.c src/memory.c src/sysfs.c src/verbs.c \ + src/enum_strs.c src_libibverbs_la_LDFLAGS = -version-info 1 -export-dynamic \ $(libibverbs_version_script) src_libibverbs_la_DEPENDENCIES = $(srcdir)/src/libibverbs.map diff --git a/examples/devinfo.c b/examples/devinfo.c index 4e4316a..1fadc80 100644 --- a/examples/devinfo.c +++ b/examples/devinfo.c @@ -67,17 +67,6 @@ static const char *guid_str(uint64_t node_guid, char *str) return str; } -static const char *port_state_str(enum ibv_port_state pstate) -{ - switch (pstate) { - case IBV_PORT_DOWN: return "PORT_DOWN"; - case IBV_PORT_INIT: return "PORT_INIT"; - case IBV_PORT_ARMED: return "PORT_ARMED"; - case IBV_PORT_ACTIVE: return "PORT_ACTIVE"; - default: return "invalid state"; - } -} - static const char *port_phy_state_str(uint8_t phys_state) { switch (phys_state) { @@ -266,7 +255,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port) } printf("\t\tport:\t%d\n", port); printf("\t\t\tstate:\t\t\t%s (%d)\n", - port_state_str(port_attr.state), port_attr.state); + ibv_port_state_str(port_attr.state), port_attr.state); printf("\t\t\tmax_mtu:\t\t%s (%d)\n", mtu_str(port_attr.max_mtu), port_attr.max_mtu); 
printf("\t\t\tactive_mtu:\t\t%s (%d)\n", diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c index 7181914..26fa45c 100644 --- a/examples/rc_pingpong.c +++ b/examples/rc_pingpong.c @@ -709,7 +709,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c index bc869c9..95bebf4 100644 --- a/examples/srq_pingpong.c +++ b/examples/srq_pingpong.c @@ -805,7 +805,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/uc_pingpong.c b/examples/uc_pingpong.c index 6135030..c09c8c1 100644 --- a/examples/uc_pingpong.c +++ b/examples/uc_pingpong.c @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/ud_pingpong.c b/examples/ud_pingpong.c index aaee26c..8f3d50b 100644 --- a/examples/ud_pingpong.c +++ b/examples/ud_pingpong.c @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h index a51bb9d..5facbf6 100644 --- 
a/include/infiniband/verbs.h +++ b/include/infiniband/verbs.h @@ -70,6 +70,13 @@ enum ibv_node_type { IBV_NODE_ROUTER, IBV_NODE_RNIC }; +extern const char *const __ibv_node_type_str[]; +static inline const char *ibv_node_type_str(enum ibv_node_type node_type) +{ + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) + node_type = 0; + return (__ibv_node_type_str[node_type]); +} enum ibv_transport_type { IBV_TRANSPORT_UNKNOWN = -1, @@ -160,6 +167,13 @@ enum ibv_port_state { IBV_PORT_ACTIVE = 4, IBV_PORT_ACTIVE_DEFER = 5 }; +extern const char *const __ibv_port_state_str[]; +static inline const char *ibv_port_state_str(enum ibv_port_state port_state) +{ + if (port_state < IBV_PORT_NOP || port_state > IBV_PORT_ACTIVE_DEFER) + port_state = IBV_PORT_ACTIVE_DEFER + 1; + return (__ibv_port_state_str[port_state]); +} struct ibv_port_attr { enum ibv_port_state state; @@ -203,6 +217,13 @@ enum ibv_event_type { IBV_EVENT_QP_LAST_WQE_REACHED, IBV_EVENT_CLIENT_REREGISTER }; +extern const char *const __ibv_event_type_str[]; +static inline const char *ibv_event_type_str(enum ibv_event_type event) +{ + if (event < IBV_EVENT_CQ_ERR || event > IBV_EVENT_CLIENT_REREGISTER) + event = (IBV_EVENT_CLIENT_REREGISTER+1); + return (__ibv_event_type_str[event]); +} struct ibv_async_event { union { @@ -238,6 +259,13 @@ enum ibv_wc_status { IBV_WC_RESP_TIMEOUT_ERR, IBV_WC_GENERAL_ERR }; +extern const char *const __ibv_wc_status_str[]; +static inline const char *ibv_wc_status_str(enum ibv_wc_status status) +{ + if (status < IBV_WC_SUCCESS || status > IBV_WC_GENERAL_ERR) + status = IBV_WC_GENERAL_ERR; + return (__ibv_wc_status_str[status]); +} enum ibv_wc_opcode { IBV_WC_SEND, diff --git a/src/enum_strs.c b/src/enum_strs.c new file mode 100644 index 0000000..d6dee4f --- /dev/null +++ b/src/enum_strs.c @@ -0,0 +1,100 @@ +/* + * Copyright (c) 2008 Lawrence Livermore National Laboratory + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#include <infiniband/verbs.h> + +const char *const __ibv_node_type_str[] = { + "UNKNOWN", + "Channel Adapter", + "Switch", + "Router", + "RNIC" +}; + +const char *const __ibv_port_state_str[] = { + "No State Change (NOP)", + "DOWN", + "INIT", + "ARMED", + "ACTIVE", + "ACTDEFER", + "UNKNOWN" +}; + +const char *const __ibv_event_type_str[] = { + "CQ Error", + "QP Fatal", + "QP Request Error", + "QP Access Error", + "Communication Established", + "SQ Drained", + "Path Migrated", + "Path Migration Request Error", + "Device Fatal", + "Port Active", + "Port Error", + "LID Change", + "PKey Change", + "SM Change", + "SRQ Error", + "SRQ Limit Reached", + "QP Last WQE Reached", + "Client Reregistration", + "UNKNOWN" +}; + +const char *const __ibv_wc_status_str[] = { + "Success", + "Local Length Error", + "Local QP Operation Error", + "Local EE Context Operation Error", + "Local Protection Error", + "Work Request Flushed Error", + "Memory Management Operation Error", + "Bad Response Error", + "Local Access Error", + "Remote Invalid Request Error", + "Remote Access Error", + "Remote Operation Error", + "Transport Retry Counter Exceeded", + "RNR Retry Counter Exceeded", + "Local RDD Violation Error", + "Remote Invalid RD Request", + "Aborted Error", + "Invalid EE Context Number", + "Invalid EE Context State", + "Fatal Error", + "Response Timeout Error", + "General Error" +}; + diff --git a/src/libibverbs.map b/src/libibverbs.map index 3a346ed..2bcf360 100644 --- a/src/libibverbs.map +++ b/src/libibverbs.map @@ -91,4 +91,9 @@ IBVERBS_1.1 { ibv_dontfork_range; ibv_dofork_range; ibv_register_driver; + + __ibv_node_type_str; + __ibv_port_state_str; + __ibv_event_type_str; + __ibv_wc_status_str; } IBVERBS_1.0; -- 1.5.1 From erezz at voltaire.com Tue Apr 15 09:53:18 2008 From: erezz at voltaire.com (Erez Zilber) Date: Tue, 15 Apr 2008 19:53:18 +0300 Subject: [ofa-general] [PATCH v2] do not change itt endianness In-Reply-To: References: <47E14B45.9040509@cs.wisc.edu>
<4803530A.3010408@voltaire.com> Message-ID: <4804DD7E.3030501@voltaire.com> The itt field in struct iscsi_data is not defined with any particular endianness. open-iscsi should use it as-is without changing its endianness. Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iser_initiator.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 83247f1..08dc81c 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -405,7 +405,7 @@ int iser_send_data_out(struct iscsi_conn *conn, struct iser_dto *send_dto = NULL; unsigned long buf_offset; unsigned long data_seg_len; - unsigned int itt; + uint32_t itt; int err = 0; if (!iser_conn_state_comp(iser_conn->ib_conn, ISER_CONN_UP)) { @@ -416,7 +416,7 @@ int iser_send_data_out(struct iscsi_conn *conn, if (iser_check_xmit(conn, ctask)) return -ENOBUFS; - itt = ntohl(hdr->itt); + itt = (__force uint32_t)hdr->itt; data_seg_len = ntoh24(hdr->dlength); buf_offset = ntohl(hdr->offset); -- 1.5.3.6 Roland, I hope it's ok now, and thanks for the explanation about sparse. Erez From rdreier at cisco.com Tue Apr 15 09:53:22 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 15 Apr 2008 09:53:22 -0700 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: <20080415094750.35afc0e5.weiny2@llnl.gov> (Ira Weiny's message of "Tue, 15 Apr 2008 09:47:50 -0700") References: <20080415094750.35afc0e5.weiny2@llnl.gov> Message-ID: > I wonder if you would take a small patch to map enums to strings. I thought I > submitted this before but I do not find it in the list archive so I must have > forgotten about it. Yes, that is a useful addition (although it's not that small a patch ;). 
However > +++ b/src/libibverbs.map > @@ -91,4 +91,9 @@ IBVERBS_1.1 { > ibv_dontfork_range; > ibv_dofork_range; > ibv_register_driver; > + > + __ibv_node_type_str; > + __ibv_port_state_str; > + __ibv_event_type_str; > + __ibv_wc_status_str; I don't think we want to export the array of strings as the ABI, since that would prevent us from doing localization or anything like that in the future, and compiling the inline functions into the application just seems less flexible. So I would rather see > +static inline const char *ibv_node_type_str(enum ibv_node_type node_type) > +{ > + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) > + node_type = 0; > + return (__ibv_node_type_str[node_type]); > +} the API should be ibv_node_type_str() and it should be a non-inline exported string function. - R. From Brian.Murrell at Sun.COM Tue Apr 15 11:30:23 2008 From: Brian.Murrell at Sun.COM (Brian J.
Murrell) Date: Tue, 15 Apr 2008 14:30:23 -0400 Subject: [ewg] Re: [ofa-general] resolve conflict between OFED 1.3 and 2.6.18 with ISCSI In-Reply-To: <48044A6F.8040107@Voltaire.COM> References: <1208176877.22671.54.camel@pc.ilinx> <480356B6.3040403@voltaire.com> <1208180482.22671.67.camel@pc.ilinx> <48036112.6070505@voltaire.com> <1208183810.22671.90.camel@pc.ilinx> <48037083.6000209@voltaire.com> <48044A6F.8040107@Voltaire.COM> Message-ID: <1208284223.22671.151.camel@pc.ilinx> On Tue, 2008-04-15 at 09:25 +0300, Erez Zilber wrote: > Voltaire will not be able to add qla4xxx support to open-iscsi in OFED > 1.4. I understand that this may be important for some people, so if you > (or anyone else) wants to add it, we can help with some info about > open-iscsi and its backports & scripts in OFED (but we can't do the > backports and testing ourselves). Thanx for the update Erez. I understand. b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From rdreier at cisco.com Tue Apr 15 12:54:20 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 15 Apr 2008 12:54:20 -0700 Subject: [ofa-general] [PATCH v2] do not change itt endianness In-Reply-To: <4804DD7E.3030501@voltaire.com> (Erez Zilber's message of "Tue, 15 Apr 2008 19:53:18 +0300") References: <47E14B45.9040509@cs.wisc.edu> <4803530A.3010408@voltaire.com> <4804DD7E.3030501@voltaire.com> Message-ID: thanks, applied. From rdreier at cisco.com Tue Apr 15 12:56:07 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 15 Apr 2008 12:56:07 -0700 Subject: [ofa-general] Pending libibverbs patches? 
In-Reply-To: <48045BF3.8040305@voltaire.com> (Or Gerlitz's message of "Tue, 15 Apr 2008 10:40:35 +0300") References: <48045BF3.8040305@voltaire.com> Message-ID: > There's the verbs.7 man page which was submitted in February > (http://www.mail-archive.com/general at lists.openfabrics.org/msg11871.html) > and following the discussion was fixed to reflect the feedback from > the list > (http://lists.openfabrics.org/pipermail/ewg/2008-April/006340.html). to be honest I don't think verbs.7 is ready to merge yet. I haven't had a chance to review in detail but I think it really is focusing on the wrong things right now. For example, a list of the contents of <infiniband/verbs.h> really is not useful, since we already have verbs.h; on the other hand, more detail on semantic issues such as thread-safety, IB/iWARP differences, etc. would be useful. - R. From olga.shern at gmail.com Tue Apr 15 13:13:01 2008 From: olga.shern at gmail.com (Olga Shern) Date: Tue, 15 Apr 2008 23:13:01 +0300 Subject: [ofa-general] ofed works on kernels with 64Kbyte pages? In-Reply-To: <47FA613F.3070301@mellanox.co.il> References: <20080404204758.GU29410@sgi.com> <47FA613F.3070301@mellanox.co.il> Message-ID: Hi, We also tested OFED 1.3 on PPC64 with SLES10 SP1 UP1 with connectX and Arbel HCAs Olga S (Voltaire) On Mon, Apr 7, 2008 at 9:00 PM, Tziporet Koren wrote: > Roland Dreier wrote: > > > > I know it's a long shot, but has anyone tried using OFED on > > > a kernel with 64Kbyte pages? > > > > SGI would like to support that, but I've gotten reports that > > > something is not working (e.g., "ib_rdma_bw" doesn't work on > an > > ia64 kernel with 64Kb pages). This is with the mthca driver, > fwiw. > > > > Unfortunately a conspiracy of h/w prevents me from reproducing > > > this right now, so I don't have more details. But I'd be very > > > curious to know if anyone can verify that OFED does/doesn't > > > work with 64Kbyte pages.
> > > > I don't know about OFED, but I've tried various things on 64KB PAGE_SIZE > > systems and it seems to work. It wouldn't surprise me if there are > > issues since the drivers and firmware gets a lot less testing in such > > situations but it "should work" -- I'd be happy to help debug if anyone > > has concrete problems. > > > > > OFED was tested on PPC64 with RHEL5.1 which works with 64K pages as a > default. > This was tested with our ConnectX cards (mlx4 driver) > I think IBM are using the same OS for their ehca cards too > > Tziporet > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > -------------- next part -------------- An HTML attachment was scrubbed... URL: From akepner at sgi.com Tue Apr 15 13:16:29 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Tue, 15 Apr 2008 13:16:29 -0700 Subject: [ofa-general] ofed works on kernels with 64Kbyte pages? In-Reply-To: References: <20080404204758.GU29410@sgi.com> <47FA613F.3070301@mellanox.co.il> Message-ID: <20080415201629.GS8593@sgi.com> On Tue, Apr 15, 2008 at 11:13:01PM +0300, Olga Shern wrote: > .. > We also tested OFED 1.3 on PPC64 with SLES10 SP1 UP1 with connectX and > Arbel HCAs Thanks Olga, Tziporet, and Roland for your responses. We found the problem - it was one of our own making, and it's been fixed. So everything looks to be working fine now with OFED on ia64 kernels with 64Kbyte pages. 
-- Arthur From olga.shern at gmail.com Tue Apr 15 13:33:02 2008 From: olga.shern at gmail.com (Olga Shern) Date: Tue, 15 Apr 2008 23:33:02 +0300 Subject: [ofa-general] OFED 1.3 user source rpm In-Reply-To: <3307cdf90804071151u7b47ad6csd57efaea13455cdb@mail.gmail.com> References: <3307cdf90804071151u7b47ad6csd57efaea13455cdb@mail.gmail.com> Message-ID: Hi, OFED 1.3 has a separate rpm for each user library; all the rpms are located in SRPMS, and you can open the one you need. Olga On Mon, Apr 7, 2008 at 9:51 PM, Rajouri Jammu wrote: > Hi, > > I could not find the ofa_user rpm in OFED 1.3. In older releases there was > a way to create a separate rpm for the user src. > > OFED-1.2.5.4]# grep ofa_user * > build_env.sh:OFA_USER_SRC_RPM=$(/bin/ls -1 ${SRPMS}/ofa_user*.src.rpm 2> > $NULL) > BUILD_ID:ofa_user-1.2.5.4: > build.sh:# Create RPMs for selected packages from ofa_user and ofa_kernel > > > I couldn't find anything like that in OFED 1.3. > > > Is there a way for me to look at the OFED 1.3 user mode sources? > > > thanks. > > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > -------------- next part -------------- An HTML attachment was scrubbed... URL: From weiny2 at llnl.gov Tue Apr 15 13:35:48 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 15 Apr 2008 13:35:48 -0700 Subject: [PATCH v2] Add enum strings and *_str functions for enums (Was: Re: [ofa-general] Pending libibverbs patches?) In-Reply-To: References: <20080415094750.35afc0e5.weiny2@llnl.gov> Message-ID: <20080415133548.414aeaea.weiny2@llnl.gov> On Tue, 15 Apr 2008 09:53:22 -0700 Roland Dreier wrote: > > I wonder if you would take a small patch to map enums to strings. I thought I > > submitted this before but I do not find it in the list archive so I must have > > forgotten about it.
> > Yes, that is a useful addition (although it's not that small a patch ;). > > However > > > +++ b/src/libibverbs.map > > @@ -91,4 +91,9 @@ IBVERBS_1.1 { > > ibv_dontfork_range; > > ibv_dofork_range; > > ibv_register_driver; > > + > > + __ibv_node_type_str; > > + __ibv_port_state_str; > > + __ibv_event_type_str; > > + __ibv_wc_status_str; > > I don't think we want to export the array of strings as the ABI, since > that would prevent us from doing localization or anything like that in > the future, and compiling the inline functions into the application just > seems less flexible. > Good point. > > So I would rather see > > > +static inline const char *ibv_node_type_str(enum ibv_node_type node_type) > > +{ > > + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) > > + node_type = 0; > > + return (__ibv_node_type_str[node_type]); > > +} > > the API should be ibv_node_type_str() and it should be a non-inline > exported string function. > Done, here is v2 of the patch, Ira >From 82edbb7d63dcef42bdf20b0ee819dea5794c0c03 Mon Sep 17 00:00:00 2001 From: Ira K. Weiny Date: Wed, 5 Sep 2007 17:10:11 -0700 Subject: [PATCH] Add enum strings and *_str functions for enums Signed-off-by: Ira K. 
Weiny --- Makefile.am | 3 +- examples/devinfo.c | 13 +---- examples/rc_pingpong.c | 3 +- examples/srq_pingpong.c | 3 +- examples/uc_pingpong.c | 3 +- examples/ud_pingpong.c | 3 +- include/infiniband/verbs.h | 4 ++ src/enum_strs.c | 125 ++++++++++++++++++++++++++++++++++++++++++++ src/libibverbs.map | 5 ++ 9 files changed, 145 insertions(+), 17 deletions(-) create mode 100644 src/enum_strs.c diff --git a/Makefile.am b/Makefile.am index 705b184..46e2354 100644 --- a/Makefile.am +++ b/Makefile.am @@ -9,7 +9,8 @@ src_libibverbs_la_CFLAGS = $(AM_CFLAGS) -DIBV_CONFIG_DIR=\"$(sysconfdir)/libibve libibverbs_version_script = @LIBIBVERBS_VERSION_SCRIPT@ src_libibverbs_la_SOURCES = src/cmd.c src/compat-1_0.c src/device.c src/init.c \ - src/marshall.c src/memory.c src/sysfs.c src/verbs.c + src/marshall.c src/memory.c src/sysfs.c src/verbs.c \ + src/enum_strs.c src_libibverbs_la_LDFLAGS = -version-info 1 -export-dynamic \ $(libibverbs_version_script) src_libibverbs_la_DEPENDENCIES = $(srcdir)/src/libibverbs.map diff --git a/examples/devinfo.c b/examples/devinfo.c index 4e4316a..1fadc80 100644 --- a/examples/devinfo.c +++ b/examples/devinfo.c @@ -67,17 +67,6 @@ static const char *guid_str(uint64_t node_guid, char *str) return str; } -static const char *port_state_str(enum ibv_port_state pstate) -{ - switch (pstate) { - case IBV_PORT_DOWN: return "PORT_DOWN"; - case IBV_PORT_INIT: return "PORT_INIT"; - case IBV_PORT_ARMED: return "PORT_ARMED"; - case IBV_PORT_ACTIVE: return "PORT_ACTIVE"; - default: return "invalid state"; - } -} - static const char *port_phy_state_str(uint8_t phys_state) { switch (phys_state) { @@ -266,7 +255,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port) } printf("\t\tport:\t%d\n", port); printf("\t\t\tstate:\t\t\t%s (%d)\n", - port_state_str(port_attr.state), port_attr.state); + ibv_port_state_str(port_attr.state), port_attr.state); printf("\t\t\tmax_mtu:\t\t%s (%d)\n", mtu_str(port_attr.max_mtu), port_attr.max_mtu); 
printf("\t\t\tactive_mtu:\t\t%s (%d)\n", diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c index 7181914..26fa45c 100644 --- a/examples/rc_pingpong.c +++ b/examples/rc_pingpong.c @@ -709,7 +709,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c index bc869c9..95bebf4 100644 --- a/examples/srq_pingpong.c +++ b/examples/srq_pingpong.c @@ -805,7 +805,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/uc_pingpong.c b/examples/uc_pingpong.c index 6135030..c09c8c1 100644 --- a/examples/uc_pingpong.c +++ b/examples/uc_pingpong.c @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/ud_pingpong.c b/examples/ud_pingpong.c index aaee26c..8f3d50b 100644 --- a/examples/ud_pingpong.c +++ b/examples/ud_pingpong.c @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h index a51bb9d..ccabb52 100644 --- 
a/include/infiniband/verbs.h +++ b/include/infiniband/verbs.h @@ -70,6 +70,7 @@ enum ibv_node_type { IBV_NODE_ROUTER, IBV_NODE_RNIC }; +const char *ibv_node_type_str(enum ibv_node_type node_type); enum ibv_transport_type { IBV_TRANSPORT_UNKNOWN = -1, @@ -160,6 +161,7 @@ enum ibv_port_state { IBV_PORT_ACTIVE = 4, IBV_PORT_ACTIVE_DEFER = 5 }; +const char *ibv_port_state_str(enum ibv_port_state port_state); struct ibv_port_attr { enum ibv_port_state state; @@ -203,6 +205,7 @@ enum ibv_event_type { IBV_EVENT_QP_LAST_WQE_REACHED, IBV_EVENT_CLIENT_REREGISTER }; +const char *ibv_event_type_str(enum ibv_event_type event); struct ibv_async_event { union { @@ -238,6 +241,7 @@ enum ibv_wc_status { IBV_WC_RESP_TIMEOUT_ERR, IBV_WC_GENERAL_ERR }; +const char *ibv_wc_status_str(enum ibv_wc_status status); enum ibv_wc_opcode { IBV_WC_SEND, diff --git a/src/enum_strs.c b/src/enum_strs.c new file mode 100644 index 0000000..7056f8a --- /dev/null +++ b/src/enum_strs.c @@ -0,0 +1,125 @@ +/* + * Copyright (c) 2008 Lawrence Livermore National Laboratory + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#include <infiniband/verbs.h> + +static const char *const __ibv_node_type_str[] = { + "UNKNOWN", + "Channel Adapter", + "Switch", + "Router", + "RNIC" +}; +const char *ibv_node_type_str(enum ibv_node_type node_type) +{ + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) + node_type = 0; + return (__ibv_node_type_str[node_type]); +} + +static const char *const __ibv_port_state_str[] = { + "No State Change (NOP)", + "DOWN", + "INIT", + "ARMED", + "ACTIVE", + "ACTDEFER", + "UNKNOWN" +}; +const char *ibv_port_state_str(enum ibv_port_state port_state) +{ + if (port_state < IBV_PORT_NOP || port_state > IBV_PORT_ACTIVE_DEFER) + port_state = IBV_PORT_ACTIVE_DEFER + 1; + return (__ibv_port_state_str[port_state]); +} + + +static const char *const __ibv_event_type_str[] = { + "CQ Error", + "QP Fatal", + "QP Request Error", + "QP Access Error", + "Communication Established", + "SQ Drained", + "Path Migrated", + "Path Migration Request Error", + "Device Fatal", + "Port Active", + "Port Error", + "LID Change", + "PKey Change", + "SM Change", + "SRQ Error", + "SRQ Limit Reached", + "QP Last WQE Reached", + "Client Reregistration", + "UNKNOWN" +}; +const char *ibv_event_type_str(enum ibv_event_type event) +{ + if (event < IBV_EVENT_CQ_ERR || event > IBV_EVENT_CLIENT_REREGISTER) + event = (IBV_EVENT_CLIENT_REREGISTER+1); + return (__ibv_event_type_str[event]); +} + +static const char *const __ibv_wc_status_str[] = { + "Success", + "Local Length Error", + "Local QP Operation Error", + "Local EE
Context Operation Error", + "Local Protection Error", + "Work Request Flushed Error", + "Memory Management Operation Error", + "Bad Response Error", + "Local Access Error", + "Remote Invalid Request Error", + "Remote Access Error", + "Remote Operation Error", + "Transport Retry Counter Exceeded", + "RNR Retry Counter Exceeded", + "Local RDD Violation Error", + "Remote Invalid RD Request", + "Aborted Error", + "Invalid EE Context Number", + "Invalid EE Context State", + "Fatal Error", + "Response Timeout Error", + "General Error" +}; +const char *ibv_wc_status_str(enum ibv_wc_status status) +{ + if (status < IBV_WC_SUCCESS || status > IBV_WC_GENERAL_ERR) + status = IBV_WC_GENERAL_ERR; + return (__ibv_wc_status_str[status]); +} + diff --git a/src/libibverbs.map b/src/libibverbs.map index 3a346ed..1827da0 100644 --- a/src/libibverbs.map +++ b/src/libibverbs.map @@ -91,4 +91,9 @@ IBVERBS_1.1 { ibv_dontfork_range; ibv_dofork_range; ibv_register_driver; + + ibv_node_type_str; + ibv_port_state_str; + ibv_event_type_str; + ibv_wc_status_str; } IBVERBS_1.0; -- 1.5.1 From ogerlitz at voltaire.com Wed Apr 16 00:32:32 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 16 Apr 2008 10:32:32 +0300 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: References: <48045BF3.8040305@voltaire.com> Message-ID: <4805AB90.6060702@voltaire.com> Roland Dreier wrote: > to be honest I don't think verbs.7 is ready to merge yet. I haven't had > a chance to review in detail but I think it really is focusing on the > wrong things right now. For example, a list of the contents of > <infiniband/verbs.h> really is not useful, since we already have > verbs.h; on the other hand more detail on semantic issues such as > thread-safety, IB/iWARP differences, etc. would be useful. If the section listing the different functions does not seem useful, it can be removed; I will be happy to hear what other people think. Anyway, this section is not what this man page is focusing on.
I agree that more has to be said on issues such as IB/iWARP differences, thread-safety, fork, etc., so if you prefer to see this "more" come out before merging anything, let it be. But please note that it is really hard for newcomers to start programming to IB/iWARP without any man page that gives a general notion of what libibverbs is. In that respect, maybe you can merge the first portion of the page without the function listing, and later we can add more info on the various issues? Or. From yevgenyp at mellanox.co.il Wed Apr 16 00:59:02 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 16 Apr 2008 10:59:02 +0300 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support Message-ID: <4805B1C6.80004@mellanox.co.il> Multi Protocol supplies the user with the ability to run Infiniband and Ethernet protocols on the same HCA (separately or at the same time). Main changes to mlx4_core: 1. The Mlx4 device now holds the actual protocol for each port. The port types are determined through module parameters or through the sysfs interface. The requested types are verified against firmware capabilities in order to determine the actual port protocol. 2. The driver now manages the Mac and Vlan tables used by customers of the low level driver. Corresponding commands were added. 3. Completion eq's are created per cpu. Created cq's are attached to an eq by a "Round Robin" algorithm, unless a specific eq was requested. 4. Support for creating a collapsed cq was added. 5. Additional reserved qp ranges were added. There is a range for the customers of the low level driver (IB, Ethernet, FCoE). 6. The qp allocation process changed. First a qp range should be reserved, then qps can be allocated from that range. This is to support the ability to allocate consecutive qps. Appropriate changes were made in the allocation mechanism. 7. Common actions for all HW resource management (Doorbell allocation, Buffer allocation, Mtt write) were moved to the low level driver.
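[Editor's note: the consecutive-qp reservation described in point 6 boils down to finding a run of `cnt` free bits in an allocation bitmap whose start index is a multiple of `align` (the patch below adds find_next_zero_string_aligned() for this). The following is a minimal user-space sketch of that search only; the names (toy_bitmap, alloc_range) are illustrative and not part of the driver, and the kernel version additionally takes bitmap->lock and wraps around bitmap->last, which the sketch omits.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BITS 64

/* Toy 64-object allocation bitmap: bit i set => object i is in use. */
struct toy_bitmap {
	uint64_t word;
};

static int toy_test_bit(const struct toy_bitmap *bm, unsigned int i)
{
	return (int)((bm->word >> i) & 1);
}

static void toy_set_bits(struct toy_bitmap *bm, unsigned int start,
			 unsigned int cnt)
{
	unsigned int i;

	for (i = 0; i < cnt; i++)
		bm->word |= 1ULL << (start + i);
}

/*
 * Reserve a run of `cnt` consecutive free objects whose first index is
 * a multiple of `align`, scanning from bit 0.  Returns the start index
 * of the reserved run, or -1 if no aligned free run exists.
 */
static int alloc_range(struct toy_bitmap *bm, unsigned int cnt,
		       unsigned int align)
{
	unsigned int start, i;

	for (start = 0; start + cnt <= TOY_BITS; start += align) {
		for (i = 0; i < cnt; i++)
			if (toy_test_bit(bm, start + i))
				break;
		if (i == cnt) {		/* whole run is free: claim it */
			toy_set_bits(bm, start, cnt);
			return (int)start;
		}
	}
	return -1;
}
```

With this scheme a consumer first reserves a range (e.g. 4 qps aligned to 4), then hands out individual qp numbers from inside it, which is what guarantees the consecutive numbering the changelog mentions.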
Signed-off-by: Yevgeny Petrilin Signed-off-by: Oren Duer Reviewed-by: Eli Cohen --- drivers/net/mlx4/Makefile | 2 +- drivers/net/mlx4/alloc.c | 258 ++++++++++++++++++++++++++++++++++- drivers/net/mlx4/cq.c | 26 +++- drivers/net/mlx4/eq.c | 41 ++++-- drivers/net/mlx4/fw.c | 18 ++- drivers/net/mlx4/fw.h | 7 +- drivers/net/mlx4/main.c | 315 +++++++++++++++++++++++++++++++++++++++++-- drivers/net/mlx4/mlx4.h | 50 +++++++- drivers/net/mlx4/mr.c | 157 ++++++++++++++++++++-- drivers/net/mlx4/port.c | 282 ++++++++++++++++++++++++++++++++++++++ drivers/net/mlx4/qp.c | 133 ++++++++++++++++--- include/linux/mlx4/cmd.h | 9 ++ include/linux/mlx4/device.h | 118 ++++++++++++++++- include/linux/mlx4/qp.h | 19 +++- 14 files changed, 1354 insertions(+), 81 deletions(-) create mode 100644 drivers/net/mlx4/port.c diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile index 0952a65..f4932d8 100644 --- a/drivers/net/mlx4/Makefile +++ b/drivers/net/mlx4/Makefile @@ -1,4 +1,4 @@ obj-$(CONFIG_MLX4_CORE) += mlx4_core.o mlx4_core-y := alloc.o catas.o cmd.o cq.o eq.o fw.o icm.o intf.o main.o mcg.o \ - mr.o pd.o profile.o qp.o reset.o srq.o + mr.o pd.o profile.o qp.o reset.o srq.o port.o diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 75ef9d0..044614f 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -44,15 +44,19 @@ u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap) spin_lock(&bitmap->lock); - obj = find_next_zero_bit(bitmap->table, bitmap->max, bitmap->last); - if (obj >= bitmap->max) { + obj = find_next_zero_bit(bitmap->table, + bitmap->effective_max, + bitmap->last); + if (obj >= bitmap->effective_max) { bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; - obj = find_first_zero_bit(bitmap->table, bitmap->max); + obj = find_first_zero_bit(bitmap->table, bitmap->effective_max); } - if (obj < bitmap->max) { + if (obj < bitmap->effective_max) { set_bit(obj, bitmap->table); - bitmap->last = (obj + 1) & (bitmap->max - 1); + 
bitmap->last = (obj + 1); + if (bitmap->last == bitmap->effective_max) + bitmap->last = 0; obj |= bitmap->top; } else obj = -1; @@ -73,7 +77,84 @@ void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj) spin_unlock(&bitmap->lock); } -int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved) +static unsigned long find_next_zero_string_aligned(unsigned long *bitmap, + u32 start, u32 nbits, + int len, int align) +{ + unsigned long end, i; + +again: + start = ALIGN(start, align); + while ((start < nbits) && test_bit(start, bitmap)) + start += align; + if (start >= nbits) + return -1; + + end = start+len; + if (end > nbits) + return -1; + for (i = start+1; i < end; i++) { + if (test_bit(i, bitmap)) { + start = i+1; + goto again; + } + } + return start; +} + +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align) +{ + u32 obj, i; + + if (likely(cnt == 1 && align == 1)) + return mlx4_bitmap_alloc(bitmap); + + spin_lock(&bitmap->lock); + + obj = find_next_zero_string_aligned(bitmap->table, bitmap->last, + bitmap->effective_max, cnt, align); + if (obj >= bitmap->effective_max) { + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + obj = find_next_zero_string_aligned(bitmap->table, 0, + bitmap->effective_max, + cnt, align); + } + + if (obj < bitmap->effective_max) { + for (i = 0; i < cnt; i++) + set_bit(obj+i, bitmap->table); + if (obj == bitmap->last) { + bitmap->last = (obj + cnt); + if (bitmap->last >= bitmap->effective_max) + bitmap->last = 0; + } + obj |= bitmap->top; + } else + obj = -1; + + spin_unlock(&bitmap->lock); + + + return obj; +} + +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt) +{ + u32 i; + + obj &= bitmap->max - 1; + + spin_lock(&bitmap->lock); + for (i = 0; i < cnt; i++) + clear_bit(obj+i, bitmap->table); + bitmap->last = min(bitmap->last, obj); + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + spin_unlock(&bitmap->lock); +} + +int 
mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max) { int i; @@ -85,6 +166,7 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved bitmap->top = 0; bitmap->max = num; bitmap->mask = mask; + bitmap->effective_max = effective_max; spin_lock_init(&bitmap->lock); bitmap->table = kzalloc(BITS_TO_LONGS(num) * sizeof (long), GFP_KERNEL); if (!bitmap->table) @@ -96,6 +178,13 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved return 0; } +int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved) +{ + return mlx4_bitmap_init_with_effective_max(bitmap, num, mask, + reserved, num); +} + void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap) { kfree(bitmap->table); @@ -196,3 +285,160 @@ void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf) } } EXPORT_SYMBOL_GPL(mlx4_buf_free); + + +static struct mlx4_db_pgdir *mlx4_alloc_db_pgdir(struct device *dma_device) +{ + struct mlx4_db_pgdir *pgdir; + + pgdir = kzalloc(sizeof *pgdir, GFP_KERNEL); + if (!pgdir) + return NULL; + + bitmap_fill(pgdir->order1, MLX4_DB_PER_PAGE / 2); + pgdir->bits[0] = pgdir->order0; + pgdir->bits[1] = pgdir->order1; + pgdir->db_page = dma_alloc_coherent(dma_device, PAGE_SIZE, + &pgdir->db_dma, GFP_KERNEL); + if (!pgdir->db_page) { + kfree(pgdir); + return NULL; + } + + return pgdir; +} + +static int mlx4_alloc_db_from_pgdir(struct mlx4_db_pgdir *pgdir, + struct mlx4_db *db, int order) +{ + int o; + int i; + + for (o = order; o <= 1; ++o) { + i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); + if (i < MLX4_DB_PER_PAGE >> o) + goto found; + } + + return -ENOMEM; + +found: + clear_bit(i, pgdir->bits[o]); + + i <<= o; + + if (o > order) + set_bit(i ^ 1, pgdir->bits[order]); + + db->pgdir = pgdir; + db->index = i; + db->db = pgdir->db_page + db->index; + db->dma = pgdir->db_dma + db->index * 4; + db->order = order; + + return 0; +} + 
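The find_next_zero_string_aligned() helper added above scans for a run of `len` clear bits whose start is a multiple of `align`, restarting the search past any set bit that breaks a candidate run; mlx4_bitmap_alloc_range() builds on it to hand out aligned blocks of IDs. A minimal userspace sketch of the same search loop, using a plain byte-per-bit array in place of the kernel bitmap primitives (function and macro names here are illustrative, not from the patch):

```c
#include <assert.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

/* Find the first run of 'len' zero entries starting at a multiple of
 * 'align', searching bits[start..nbits).  Returns the start index,
 * or -1 if no such run exists. */
static long find_zero_run_aligned(const unsigned char *bits, long start,
				  long nbits, int len, int align)
{
	long i, end;

again:
	start = ALIGN_UP(start, align);
	while (start < nbits && bits[start])
		start += align;		/* skip occupied aligned slots */
	if (start >= nbits)
		return -1;

	end = start + len;
	if (end > nbits)
		return -1;
	for (i = start + 1; i < end; i++) {
		if (bits[i]) {		/* run broken: restart past the set bit */
			start = i + 1;
			goto again;
		}
	}
	return start;
}
```

As in the patch, the restart path (`start = i + 1; goto again`) re-aligns before probing again, so the scan never revisits a broken run and the cost stays linear in the bitmap size.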
+static int mlx4_db_alloc(struct mlx4_dev *dev, struct device *dma_device, + struct mlx4_db *db, int order) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_db_pgdir *pgdir; + int ret = 0; + + mutex_lock(&priv->pgdir_mutex); + + list_for_each_entry(pgdir, &priv->pgdir_list, list) + if (!mlx4_alloc_db_from_pgdir(pgdir, db, order)) + goto out; + + pgdir = mlx4_alloc_db_pgdir(dma_device); + if (!pgdir) { + ret = -ENOMEM; + goto out; + } + + list_add(&pgdir->list, &priv->pgdir_list); + + /* This should never fail -- we just allocated an empty page: */ + WARN_ON(mlx4_alloc_db_from_pgdir(pgdir, db, order)); + +out: + mutex_unlock(&priv->pgdir_mutex); + + return ret; +} + +static void mlx4_db_free(struct mlx4_dev *dev, struct device *dma_device, + struct mlx4_db *db) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + int o; + int i; + + mutex_lock(&priv->pgdir_mutex); + + o = db->order; + i = db->index; + + if (db->order == 0 && test_bit(i ^ 1, db->pgdir->order0)) { + clear_bit(i ^ 1, db->pgdir->order0); + ++o; + } + + i >>= o; + set_bit(i, db->pgdir->bits[o]); + + if (bitmap_full(db->pgdir->order1, MLX4_DB_PER_PAGE / 2)) { + dma_free_coherent(dma_device, PAGE_SIZE, + db->pgdir->db_page, db->pgdir->db_dma); + list_del(&db->pgdir->list); + kfree(db->pgdir); + } + + mutex_unlock(&priv->pgdir_mutex); +} + +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size, int max_direct) +{ + int err; + + err = mlx4_db_alloc(dev, dma_device, &wqres->db, 1); + if (err) + return err; + *wqres->db.db = 0; + + if (mlx4_buf_alloc(dev, size, max_direct, &wqres->buf)) { + err = -ENOMEM; + goto err_db; + } + + err = mlx4_mtt_init(dev, wqres->buf.npages, wqres->buf.page_shift, + &wqres->mtt); + if (err) + goto err_buf; + err = mlx4_buf_write_mtt(dev, &wqres->mtt, &wqres->buf); + if (err) + goto err_mtt; + + return 0; + +err_mtt: + mlx4_mtt_cleanup(dev, &wqres->mtt); +err_buf: + mlx4_buf_free(dev, size, &wqres->buf); +err_db: 
+ mlx4_db_free(dev, dma_device, &wqres->db); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_alloc_hwq_res); + +void mlx4_free_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size) +{ + mlx4_mtt_cleanup(dev, &wqres->mtt); + mlx4_buf_free(dev, size, &wqres->buf); + mlx4_db_free(dev, dma_device, &wqres->db); +} +EXPORT_SYMBOL_GPL(mlx4_free_hwq_res); diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index caa5bcf..e905e61 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -188,7 +188,8 @@ int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq, EXPORT_SYMBOL_GPL(mlx4_cq_resize); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, - struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq) + struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, + unsigned vector, int collapsed) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_cq_table *cq_table = &priv->cq_table; @@ -197,6 +198,9 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, u64 mtt_addr; int err; +#define COLLAPSED_SHIFT 18 +#define ENTRIES_SHIFT 24 + cq->cqn = mlx4_bitmap_alloc(&cq_table->bitmap); if (cq->cqn == -1) return -ENOMEM; @@ -224,8 +228,22 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, cq_context = mailbox->buf; memset(cq_context, 0, sizeof *cq_context); - cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index); - cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP].eqn; + cq_context->flags = cpu_to_be32(!!collapsed << COLLAPSED_SHIFT); + cq_context->logsize_usrpage = cpu_to_be32( + (ilog2(nent) << ENTRIES_SHIFT) | uar->index); + if(vector > priv->eq_table.num_comp_eqs) { + err = -EINVAL; + goto err_radix; + } + + if (vector == 0) { + vector = priv->eq_table.last_comp_eq % + priv->eq_table.num_comp_eqs + 1; + priv->eq_table.last_comp_eq = vector; + } + cq->comp_eq_idx = MLX4_EQ_COMP_CPU0 + vector - 1; + cq_context->comp_eqn = 
priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + + vector - 1].eqn; cq_context->log_page_size = mtt->page_shift - MLX4_ICM_PAGE_SHIFT; mtt_addr = mlx4_mtt_addr(dev, mtt); @@ -274,7 +292,7 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq) if (err) mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn); - synchronize_irq(priv->eq_table.eq[MLX4_EQ_COMP].irq); + synchronize_irq(priv->eq_table.eq[cq->comp_eq_idx].irq); spin_lock_irq(&cq_table->lock); radix_tree_delete(&cq_table->tree, cq->cqn); diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c index e141a15..67af1b1 100644 --- a/drivers/net/mlx4/eq.c +++ b/drivers/net/mlx4/eq.c @@ -265,7 +265,7 @@ static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr) writel(priv->eq_table.clr_mask, priv->eq_table.clr_int); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) work |= mlx4_eq_int(dev, &priv->eq_table.eq[i]); return IRQ_RETVAL(work); @@ -482,7 +482,7 @@ static void mlx4_free_irqs(struct mlx4_dev *dev) if (eq_table->have_irq) free_irq(dev->pdev->irq, dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + eq_table->num_comp_eqs; ++i) if (eq_table->eq[i].have_irq) free_irq(eq_table->eq[i].irq, eq_table->eq + i); } @@ -555,6 +555,7 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) struct mlx4_priv *priv = mlx4_priv(dev); int err; int i; + int req_eqs; err = mlx4_bitmap_init(&priv->eq_table.bitmap, dev->caps.num_eqs, dev->caps.num_eqs - 1, dev->caps.reserved_eqs); @@ -573,11 +574,21 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) priv->eq_table.clr_int = priv->clr_base + (priv->eq_table.inta_pin < 32 ? 4 : 0); - err = mlx4_create_eq(dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, - (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_COMP : 0, - &priv->eq_table.eq[MLX4_EQ_COMP]); - if (err) - goto err_out_unmap; + priv->eq_table.num_comp_eqs = 0; + req_eqs = (dev->flags & MLX4_FLAG_MSI_X) ? 
num_online_cpus() : 1; + while (req_eqs) { + err = mlx4_create_eq( + dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, + (dev->flags & MLX4_FLAG_MSI_X) ? + (MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs) : 0, + &priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + + priv->eq_table.num_comp_eqs]); + if (err) + goto err_out_comp; + priv->eq_table.num_comp_eqs++; + req_eqs--; + } + priv->eq_table.last_comp_eq = 0; err = mlx4_create_eq(dev, MLX4_NUM_ASYNC_EQE + MLX4_NUM_SPARE_EQE, (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_ASYNC : 0, @@ -587,11 +598,11 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) if (dev->flags & MLX4_FLAG_MSI_X) { static const char *eq_name[] = { - [MLX4_EQ_COMP] = DRV_NAME " (comp)", - [MLX4_EQ_ASYNC] = DRV_NAME " (async)" + [MLX4_EQ_ASYNC] = DRV_NAME "(async)", + [MLX4_EQ_COMP_CPU0...MLX4_NUM_EQ] = "eth" DRV_NAME, }; - - for (i = 0; i < MLX4_NUM_EQ; ++i) { + for (i = 0; + i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) { err = request_irq(priv->eq_table.eq[i].irq, mlx4_msi_x_interrupt, 0, eq_name[i], priv->eq_table.eq + i); @@ -616,7 +627,7 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) mlx4_warn(dev, "MAP_EQ for async EQ %d failed (%d)\n", priv->eq_table.eq[MLX4_EQ_ASYNC].eqn, err); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) eq_set_ci(&priv->eq_table.eq[i], 1); return 0; @@ -625,9 +636,9 @@ err_out_async: mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_ASYNC]); err_out_comp: - mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP]); + for (i = 0; i < priv->eq_table.num_comp_eqs; ++i) + mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + i]); -err_out_unmap: mlx4_unmap_clr_int(dev); mlx4_free_irqs(dev); @@ -646,7 +657,7 @@ void mlx4_cleanup_eq_table(struct mlx4_dev *dev) mlx4_free_irqs(dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) mlx4_free_eq(dev, &priv->eq_table.eq[i]); mlx4_unmap_clr_int(dev); diff --git 
a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index d82f275..fe0f6b3 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -314,7 +314,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) MLX4_GET(field, outbox, QUERY_DEV_CAP_VL_PORT_OFFSET); dev_cap->max_vl[i] = field >> 4; MLX4_GET(field, outbox, QUERY_DEV_CAP_MTU_WIDTH_OFFSET); - dev_cap->max_mtu[i] = field >> 4; + dev_cap->ib_mtu[i] = field >> 4; dev_cap->max_port_width[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GID_OFFSET); dev_cap->max_gids[i] = 1 << (field & 0xf); @@ -322,9 +322,11 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_pkeys[i] = 1 << (field & 0xf); } } else { +#define QUERY_PORT_SUPPORTED_TYPE_OFFSET 0x00 #define QUERY_PORT_MTU_OFFSET 0x01 #define QUERY_PORT_WIDTH_OFFSET 0x06 #define QUERY_PORT_MAX_GID_PKEY_OFFSET 0x07 +#define QUERY_PORT_MAX_MACVLAN_OFFSET 0x0a #define QUERY_PORT_MAX_VL_OFFSET 0x0b for (i = 1; i <= dev_cap->num_ports; ++i) { @@ -334,7 +336,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) goto out; MLX4_GET(field, outbox, QUERY_PORT_MTU_OFFSET); - dev_cap->max_mtu[i] = field & 0xf; + dev_cap->ib_mtu[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_PORT_WIDTH_OFFSET); dev_cap->max_port_width[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_PORT_MAX_GID_PKEY_OFFSET); @@ -342,6 +344,14 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_pkeys[i] = 1 << (field & 0xf); MLX4_GET(field, outbox, QUERY_PORT_MAX_VL_OFFSET); dev_cap->max_vl[i] = field & 0xf; + MLX4_GET(field, outbox, + QUERY_PORT_SUPPORTED_TYPE_OFFSET); + dev_cap->supported_port_types[i] = field & 3; + MLX4_GET(field, outbox, QUERY_PORT_MAX_MACVLAN_OFFSET); + dev_cap->log_max_macs[i] = field & 0xf; + dev_cap->log_max_vlans[i] = field >> 4; + dev_cap->eth_mtu[i] = be16_to_cpu(((u16 *) outbox)[1]); + dev_cap->def_mac[i] = be64_to_cpu(((u64 *) outbox)[2]); } } 
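The QUERY_PORT parsing above pulls several packed fields out of the firmware mailbox: the IB MTU code lives in a nibble, the GID/P_Key table lengths are log2-encoded, and the default MAC and Ethernet MTU sit at fixed byte offsets. A hedged sketch of that unpacking style against a stand-in buffer (the offsets follow the patch's #defines, but the struct and helper names are invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define QUERY_PORT_MTU_OFFSET          0x01
#define QUERY_PORT_WIDTH_OFFSET        0x06
#define QUERY_PORT_MAX_GID_PKEY_OFFSET 0x07

struct port_caps {
	int ib_mtu;	/* IB MTU code (e.g. 5 => 4096 bytes) */
	int max_width;	/* link width capability */
	int max_gids;	/* GID table length, log2-encoded high nibble */
	int max_pkeys;	/* P_Key table length, log2-encoded low nibble */
};

static void parse_query_port(const uint8_t *outbox, struct port_caps *caps)
{
	uint8_t field;

	field = outbox[QUERY_PORT_MTU_OFFSET];
	caps->ib_mtu = field & 0xf;

	field = outbox[QUERY_PORT_WIDTH_OFFSET];
	caps->max_width = field & 0xf;

	field = outbox[QUERY_PORT_MAX_GID_PKEY_OFFSET];
	caps->max_gids  = 1 << (field >> 4);
	caps->max_pkeys = 1 << (field & 0xf);
}
```

The `1 << (field & 0xf)` idiom mirrors how the patch turns a 4-bit log2 value into a table length; the split of one byte into two log2 nibbles for GIDs and P_Keys is an assumption based on the combined MAX_GID_PKEY offset.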
@@ -379,7 +389,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) mlx4_dbg(dev, "Max CQEs: %d, max WQEs: %d, max SRQ WQEs: %d\n", dev_cap->max_cq_sz, dev_cap->max_qp_sz, dev_cap->max_srq_sz); mlx4_dbg(dev, "Local CA ACK delay: %d, max MTU: %d, port width cap: %d\n", - dev_cap->local_ca_ack_delay, 128 << dev_cap->max_mtu[1], + dev_cap->local_ca_ack_delay, 128 << dev_cap->ib_mtu[1], dev_cap->max_port_width[1]); mlx4_dbg(dev, "Max SQ desc size: %d, max SQ S/G: %d\n", dev_cap->max_sq_desc_sz, dev_cap->max_sq_sg); @@ -787,7 +797,7 @@ int mlx4_INIT_PORT(struct mlx4_dev *dev, int port) flags |= (dev->caps.port_width_cap[port] & 0xf) << INIT_PORT_PORT_WIDTH_SHIFT; MLX4_PUT(inbox, flags, INIT_PORT_FLAGS_OFFSET); - field = 128 << dev->caps.mtu_cap[port]; + field = 128 << dev->caps.ib_mtu_cap[port]; MLX4_PUT(inbox, field, INIT_PORT_MTU_OFFSET); field = dev->caps.gid_table_len[port]; MLX4_PUT(inbox, field, INIT_PORT_MAX_GID_OFFSET); diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index 306cb9b..ef964d5 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -61,11 +61,13 @@ struct mlx4_dev_cap { int local_ca_ack_delay; int num_ports; u32 max_msg_sz; - int max_mtu[MLX4_MAX_PORTS + 1]; + int ib_mtu[MLX4_MAX_PORTS + 1]; int max_port_width[MLX4_MAX_PORTS + 1]; int max_vl[MLX4_MAX_PORTS + 1]; int max_gids[MLX4_MAX_PORTS + 1]; int max_pkeys[MLX4_MAX_PORTS + 1]; + u64 def_mac[MLX4_MAX_PORTS + 1]; + int eth_mtu[MLX4_MAX_PORTS + 1]; u16 stat_rate_support; u32 flags; int reserved_uars; @@ -97,6 +99,9 @@ struct mlx4_dev_cap { u32 reserved_lkey; u64 max_icm_sz; int max_gso_sz; + u8 supported_port_types[MLX4_MAX_PORTS + 1]; + u8 log_max_macs[MLX4_MAX_PORTS + 1]; + u8 log_max_vlans[MLX4_MAX_PORTS + 1]; }; struct mlx4_adapter { diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 49a4aca..50b5eb7 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -38,6 +38,8 @@ #include #include #include +#include +#include 
#include #include @@ -81,14 +83,83 @@ static struct mlx4_profile default_profile = { .rdmarc_per_qp = 1 << 4, .num_cq = 1 << 16, .num_mcg = 1 << 13, - .num_mpt = 1 << 17, + .num_mpt = 1 << 18, .num_mtt = 1 << 20, }; +static int mod_param_num_mac = 1; +module_param_named(num_mac, mod_param_num_mac, int, 0444); +MODULE_PARM_DESC(num_mac, "Maximum number of MACs per ETH port " + "(1-127, default 1)"); + +static int mod_param_num_vlan; +module_param_named(num_vlan, mod_param_num_vlan, int, 0444); +MODULE_PARM_DESC(num_vlan, "Maximum number of VLANs per ETH port " + "(0-126, default 0)"); + +static int mod_param_use_prio; +module_param_named(use_prio, mod_param_use_prio, bool, 0444); +MODULE_PARM_DESC(use_prio, "Enable steering by VLAN priority on ETH ports " + "(0/1, default 0)"); + +static int mod_param_if_eth = 1; +module_param_named(if_eth, mod_param_if_eth, bool, 0444); +MODULE_PARM_DESC(if_eth, "Enable ETH interface be loaded (0/1, default 1)"); + +static int mod_param_if_fc = 1; +module_param_named(if_fc, mod_param_if_fc, bool, 0444); +MODULE_PARM_DESC(if_fc, "Enable FC interface be loaded (0/1, default 1)"); + +static char *mod_param_port_type[MLX4_MAX_PORTS] = + { [0 ... (MLX4_MAX_PORTS-1)] = "ib"}; +module_param_array_named(port_type, mod_param_port_type, charp, NULL, 0444); +MODULE_PARM_DESC(port_type, "Ports L2 type (ib/eth/auto, entry per port, " + "comma seperated, default ib for all)"); + +static int mod_param_port_mtu[MLX4_MAX_PORTS] = + { [0 ... 
(MLX4_MAX_PORTS-1)] = 9600}; +module_param_array_named(port_mtu, mod_param_port_mtu, int, NULL, 0444); +MODULE_PARM_DESC(port_mtu, "Ports max mtu in Bytes, entry per port, " + "comma seperated, default 9600 for all"); + +static int mlx4_check_port_params(struct mlx4_dev *dev, + enum mlx4_port_type *port_type) +{ + if (port_type[0] != port_type[1] && + !(dev->caps.flags & MLX4_DEV_CAP_FLAG_DPDP)) { + mlx4_err(dev, "Only same port types supported " + "on this HCA, aborting.\n"); + return -EINVAL; + } + if ((port_type[0] == MLX4_PORT_TYPE_ETH) && + (port_type[1] == MLX4_PORT_TYPE_IB)) { + mlx4_err(dev, "eth-ib configuration is not supported.\n"); + return -EINVAL; + } + return 0; +} + +static void mlx4_str2port_type(char **port_str, + enum mlx4_port_type *port_type) +{ + int i; + + for (i = 0; i < MLX4_MAX_PORTS; i++) { + if (!strcmp(port_str[i], "eth")) + port_type[i] = MLX4_PORT_TYPE_ETH; + else + port_type[i] = MLX4_PORT_TYPE_IB; + } +} + static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) { int err; int i; + int num_eth_ports = 0; + enum mlx4_port_type port_type[MLX4_MAX_PORTS]; + + mlx4_str2port_type(mod_param_port_type, port_type); err = mlx4_QUERY_DEV_CAP(dev, dev_cap); if (err) { @@ -120,10 +191,12 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.num_ports = dev_cap->num_ports; for (i = 1; i <= dev->caps.num_ports; ++i) { dev->caps.vl_cap[i] = dev_cap->max_vl[i]; - dev->caps.mtu_cap[i] = dev_cap->max_mtu[i]; + dev->caps.ib_mtu_cap[i] = dev_cap->ib_mtu[i]; dev->caps.gid_table_len[i] = dev_cap->max_gids[i]; dev->caps.pkey_table_len[i] = dev_cap->max_pkeys[i]; dev->caps.port_width_cap[i] = dev_cap->max_port_width[i]; + dev->caps.eth_mtu_cap[i] = dev_cap->eth_mtu[i]; + dev->caps.def_mac[i] = dev_cap->def_mac[i]; } dev->caps.num_uars = dev_cap->uar_size / PAGE_SIZE; @@ -134,7 +207,6 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.max_rq_sg = dev_cap->max_rq_sg; 
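The mlx4_check_port_params() function above encodes two constraints: mixed port types on the two ports are only allowed when the DPDP capability flag is set, and the (eth, ib) ordering is rejected outright. A compact sketch of those rules with the capability passed in as a plain flag (names invented for illustration):

```c
#include <assert.h>

enum port_type { PORT_TYPE_IB, PORT_TYPE_ETH };

/* Returns 0 if the two-port configuration is acceptable, -1 otherwise. */
static int check_port_params(const enum port_type t[2], int have_dpdp)
{
	/* different types on the two ports require DPDP support */
	if (t[0] != t[1] && !have_dpdp)
		return -1;
	/* the eth-then-ib ordering is not supported at all */
	if (t[0] == PORT_TYPE_ETH && t[1] == PORT_TYPE_IB)
		return -1;
	return 0;
}
```

So on a DPDP-capable HCA the (ib, eth) configuration passes while (eth, ib) still fails, matching the two error paths in the patch.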
dev->caps.max_wqes = dev_cap->max_qp_sz; dev->caps.max_qp_init_rdma = dev_cap->max_requester_per_qp; - dev->caps.reserved_qps = dev_cap->reserved_qps; dev->caps.max_srq_wqes = dev_cap->max_srq_sz; dev->caps.max_srq_sge = dev_cap->max_rq_sg - 1; dev->caps.reserved_srqs = dev_cap->reserved_srqs; @@ -161,9 +233,155 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.stat_rate_support = dev_cap->stat_rate_support; dev->caps.max_gso_sz = dev_cap->max_gso_sz; + dev->caps.log_num_macs = ilog2(roundup_pow_of_two + (mod_param_num_mac + 1)); + dev->caps.log_num_vlans = ilog2(roundup_pow_of_two + (mod_param_num_vlan + 2)); + dev->caps.log_num_prios = mod_param_use_prio ? 3: 0; + + err = mlx4_check_port_params(dev, port_type); + if (err) + return err; + + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (!dev_cap->supported_port_types[i]) { + mlx4_warn(dev, "FW doesn't support Multi Protocol, " + "loading IB only\n"); + dev->caps.port_type[i] = MLX4_PORT_TYPE_IB; + continue; + } + if (port_type[i-1] & dev_cap->supported_port_types[i]) + dev->caps.port_type[i] = port_type[i-1]; + else { + mlx4_err(dev, "Requested port type for port %d " + "not supported by HW\n", i); + return -ENODEV; + } + if (mod_param_port_mtu[i-1] <= dev->caps.eth_mtu_cap[i]) + dev->caps.eth_mtu_cap[i] = mod_param_port_mtu[i-1]; + else + mlx4_warn(dev, "Requested mtu for port %d is larger " + "then supported, reducing to %d\n", + i, dev->caps.eth_mtu_cap[i]); + if (dev->caps.log_num_macs > dev_cap->log_max_macs[i]) { + dev->caps.log_num_macs = dev_cap->log_max_macs[i]; + mlx4_warn(dev, "Requested number of MACs is too much " + "for port %d, reducing to %d.\n", + i, 1 << dev->caps.log_num_macs); + } + if (dev->caps.log_num_vlans > dev_cap->log_max_vlans[i]) { + dev->caps.log_num_vlans = dev_cap->log_max_vlans[i]; + mlx4_warn(dev, "Requested number of VLANs is too much " + "for port %d, reducing to %d.\n", + i, 1 << dev->caps.log_num_vlans); + } + if 
(dev->caps.port_type[i] == MLX4_PORT_TYPE_ETH) + ++num_eth_ports; + } + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW] = dev_cap->reserved_qps; + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_ETH_ADDR] = + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_ADDR] = + (1 << dev->caps.log_num_macs)* + (1 << dev->caps.log_num_vlans)* + (1 << dev->caps.log_num_prios)* + num_eth_ports; + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH] = MLX4_NUM_FEXCH; + return 0; } +static int mlx4_change_port_types(struct mlx4_dev *dev, + enum mlx4_port_type *port_types) +{ + int i; + int err = 0; + int change = 0; + int port; + + for (i = 0; i < MLX4_MAX_PORTS; i++) { + if (port_types[i] != dev->caps.port_type[i + 1]) { + change = 1; + dev->caps.port_type[i + 1] = port_types[i]; + } + } + if (change) { + mlx4_unregister_device(dev); + for (port = 1; port <= dev->caps.num_ports; port++) { + mlx4_CLOSE_PORT(dev, port); + err = mlx4_SET_PORT(dev, port); + if (err) { + mlx4_err(dev, "Failed to set port %d, " + "aborting\n", port); + return err; + } + } + err = mlx4_register_device(dev); + } + return err; +} + +static ssize_t show_port_type(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct mlx4_dev *mdev = pci_get_drvdata(pdev); + int i; + + sprintf(buf, "Current port types:\n"); + for (i = 1; i <= MLX4_MAX_PORTS; i++) { + sprintf(buf, "%sPort%d: %s\n", buf, i, + (mdev->caps.port_type[i] == MLX4_PORT_TYPE_IB)? 
+ "ib": "eth"); + } + return strlen(buf); +} + + +static ssize_t set_port_type(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct mlx4_dev *mdev = pci_get_drvdata(pdev); + char *type; + enum mlx4_port_type port_types[MLX4_MAX_PORTS]; + char *loc_buf; + char *ptr; + int i; + int err = 0; + + loc_buf = kmalloc(count + 1, GFP_KERNEL); + if (!loc_buf) + return -ENOMEM; + + ptr = loc_buf; + memcpy(loc_buf, buf, count + 1); + for (i = 0; i < MLX4_MAX_PORTS; i++) { + type = strsep(&loc_buf, ","); + if (!strcmp(type, "ib")) + port_types[i] = MLX4_PORT_TYPE_IB; + else if (!strcmp(type, "eth")) + port_types[i] = MLX4_PORT_TYPE_ETH; + else { + dev_warn(dev, "%s is not acceptable port type " + "(use 'eth' or 'ib' only)\n", type); + err = -EINVAL; + goto out; + } + } + err = mlx4_check_port_params(mdev, port_types); + if (err) + goto out; + + err = mlx4_change_port_types(mdev, port_types); +out: + kfree(ptr); + return err ? 
err: count; +} +static DEVICE_ATTR(mlx4_port_type, S_IWUGO | S_IRUGO, show_port_type, set_port_type); + static int mlx4_load_fw(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -209,7 +427,8 @@ static int mlx4_init_cmpt_table(struct mlx4_dev *dev, u64 cmpt_base, ((u64) (MLX4_CMPT_TYPE_QP * cmpt_entry_sz) << MLX4_CMPT_SHIFT), cmpt_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) goto err; @@ -334,7 +553,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->qpc_base, dev_cap->qpc_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map QP context memory, aborting.\n"); goto err_unmap_dmpt; @@ -344,7 +564,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->auxc_base, dev_cap->aux_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map AUXC context memory, aborting.\n"); goto err_unmap_qp; @@ -354,7 +575,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->altc_base, dev_cap->altc_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map ALTC context memory, aborting.\n"); goto err_unmap_auxc; @@ -364,7 +586,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->rdmarc_base, dev_cap->rdmarc_entry_sz << priv->qp_table.rdmarc_shift, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map RDMARC context memory, aborting\n"); goto err_unmap_altc; @@ -556,6 +779,7 @@ static int mlx4_setup_hca(struct mlx4_dev *dev) { struct 
mlx4_priv *priv = mlx4_priv(dev); int err; + int port; err = mlx4_init_uar_table(dev); if (err) { @@ -654,8 +878,25 @@ static int mlx4_setup_hca(struct mlx4_dev *dev) goto err_qp_table_free; } + for (port = 1; port <= dev->caps.num_ports; port++) { + err = mlx4_SET_PORT(dev, port); + if (err) { + mlx4_err(dev, "Failed to set port %d, aborting\n", + port); + goto err_mcg_table_free; + } + } + + for (port = 0; port < dev->caps.num_ports; port++) { + mlx4_init_mac_table(dev, port); + mlx4_init_vlan_table(dev, port); + } + return 0; +err_mcg_table_free: + mlx4_cleanup_mcg_table(dev); + err_qp_table_free: mlx4_cleanup_qp_table(dev); @@ -692,22 +933,25 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); struct msix_entry entries[MLX4_NUM_EQ]; + int needed_vectors = MLX4_EQ_COMP_CPU0 + num_online_cpus(); int err; int i; if (msi_x) { - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) entries[i].entry = i; - err = pci_enable_msix(dev->pdev, entries, ARRAY_SIZE(entries)); + err = pci_enable_msix(dev->pdev, entries, needed_vectors); if (err) { if (err > 0) - mlx4_info(dev, "Only %d MSI-X vectors available, " - "not using MSI-X\n", err); + mlx4_info(dev, "Only %d MSI-X vectors " + "available, need %d. 
" + "Not using MSI-X\n", + err, needed_vectors); goto no_msi; } - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) priv->eq_table.eq[i].irq = entries[i].vector; dev->flags |= MLX4_FLAG_MSI_X; @@ -715,7 +959,7 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev) } no_msi: - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) priv->eq_table.eq[i].irq = dev->pdev->irq; } @@ -798,6 +1042,9 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) INIT_LIST_HEAD(&priv->ctx_list); spin_lock_init(&priv->ctx_lock); + INIT_LIST_HEAD(&priv->pgdir_list); + mutex_init(&priv->pgdir_mutex); + /* * Now reset the HCA before we touch the PCI capabilities or * attempt a firmware command, since a boot ROM may have left @@ -836,8 +1083,14 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) pci_set_drvdata(pdev, dev); + if (device_create_file(&pdev->dev, &dev_attr_mlx4_port_type)) + goto sysfs_failed; + return 0; +sysfs_failed: + mlx4_unregister_device(dev); + err_cleanup: mlx4_cleanup_mcg_table(dev); mlx4_cleanup_qp_table(dev); @@ -893,6 +1146,7 @@ static void mlx4_remove_one(struct pci_dev *pdev) int p; if (dev) { + device_remove_file(&pdev->dev, &dev_attr_mlx4_port_type); mlx4_unregister_device(dev); for (p = 1; p <= dev->caps.num_ports; ++p) @@ -948,10 +1202,43 @@ static struct pci_driver mlx4_driver = { .remove = __devexit_p(mlx4_remove_one) }; +static int __init mlx4_verify_params(void) +{ + int i; + + for (i = 0; i < MLX4_MAX_PORTS; ++i) { + if (strcmp(mod_param_port_type[i], "eth") && + strcmp(mod_param_port_type[i], "ib")) { + printk(KERN_WARNING "mlx4_core: bad port_type for " + "port %d: %s\n", + i, mod_param_port_type[i]); + return -1; + } + } + if ((mod_param_num_mac < 1) || + (mod_param_num_mac > 127)) { + printk(KERN_WARNING "mlx4_core: bad num_mac: %d\n", + mod_param_num_mac); + return -1; + } + + if ((mod_param_num_vlan < 0) || + (mod_param_num_vlan > 126)) { + 
printk(KERN_WARNING "mlx4_core: bad num_vlan: %d\n", + mod_param_num_vlan); + return -1; + } + + return 0; +} + static int __init mlx4_init(void) { int ret; + if (mlx4_verify_params()) + return -EINVAL; + ret = mlx4_catas_init(); if (ret) return ret; diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 7333681..2af3d07 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -64,8 +64,8 @@ enum { enum { MLX4_EQ_ASYNC, - MLX4_EQ_COMP, - MLX4_NUM_EQ + MLX4_EQ_COMP_CPU0, + MLX4_NUM_EQ = MLX4_EQ_COMP_CPU0 + NR_CPUS, }; enum { @@ -111,6 +111,7 @@ struct mlx4_bitmap { u32 last; u32 top; u32 max; + u32 effective_max; u32 mask; spinlock_t lock; unsigned long *table; @@ -210,6 +211,8 @@ struct mlx4_eq_table { void __iomem *uar_map[(MLX4_NUM_EQ + 6) / 4]; u32 clr_mask; struct mlx4_eq eq[MLX4_NUM_EQ]; + int num_comp_eqs; + int last_comp_eq; u64 icm_virt; struct page *icm_page; dma_addr_t icm_dma; @@ -250,6 +253,35 @@ struct mlx4_catas_err { struct list_head list; }; +struct mlx4_mac_table { +#define MLX4_MAX_MAC_NUM 128 +#define MLX4_MAC_MASK 0xffffffffffff +#define MLX4_MAC_VALID_SHIFT 63 +#define MLX4_MAC_TABLE_SIZE MLX4_MAX_MAC_NUM << 3 + __be64 entries[MLX4_MAX_MAC_NUM]; + int refs[MLX4_MAX_MAC_NUM]; + struct semaphore mac_sem; + int total; + int max; +}; + +struct mlx4_vlan_table { +#define MLX4_MAX_VLAN_NUM 126 +#define MLX4_VLAN_MASK 0xfff +#define MLX4_VLAN_VALID 1 << 31 +#define MLX4_VLAN_TABLE_SIZE MLX4_MAX_VLAN_NUM << 2 + __be32 entries[MLX4_MAX_VLAN_NUM]; + int refs[MLX4_MAX_VLAN_NUM]; + struct semaphore vlan_sem; + int total; + int max; +}; + +struct mlx4_port_info { + struct mlx4_mac_table mac_table; + struct mlx4_vlan_table vlan_table; +}; + struct mlx4_priv { struct mlx4_dev dev; @@ -257,6 +289,9 @@ struct mlx4_priv { struct list_head ctx_list; spinlock_t ctx_lock; + struct list_head pgdir_list; + struct mutex pgdir_mutex; + struct mlx4_fw fw; struct mlx4_cmd cmd; @@ -275,6 +310,7 @@ struct mlx4_priv { struct mlx4_uar driver_uar; 
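The mlx4_verify_params() check above rejects bad module parameters before mlx4_init() touches the hardware: each port_type entry must be "ib" or "eth", num_mac must be 1-127, and num_vlan 0-126. The same bounds checking, sketched as a plain function (names illustrative, MAX_PORTS fixed at 2 for the sketch):

```c
#include <assert.h>
#include <string.h>

#define MAX_PORTS 2

/* Mirror of the patch's parameter checks; returns 0 if valid, -1 if not. */
static int verify_params(const char *port_type[MAX_PORTS],
			 int num_mac, int num_vlan)
{
	int i;

	for (i = 0; i < MAX_PORTS; ++i)
		if (strcmp(port_type[i], "eth") && strcmp(port_type[i], "ib"))
			return -1;	/* unknown port type string */
	if (num_mac < 1 || num_mac > 127)
		return -1;		/* MACs per port: 1..127 */
	if (num_vlan < 0 || num_vlan > 126)
		return -1;		/* VLANs per port: 0..126 */
	return 0;
}
```

Failing fast here is what lets mlx4_init() return -EINVAL without registering the PCI driver at all.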
void __iomem *kar; + struct mlx4_port_info port[MLX4_MAX_PORTS]; }; static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev) @@ -284,7 +320,12 @@ static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev) u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap); void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj); +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align); +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt); int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved); +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max); void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); int mlx4_reset(struct mlx4_dev *dev); @@ -336,10 +377,15 @@ void mlx4_cmd_use_polling(struct mlx4_dev *dev); void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn); void mlx4_cq_event(struct mlx4_dev *dev, u32 cqn, int event_type); +void mlx4_init_mac_table(struct mlx4_dev *dev, u8 port); +void mlx4_init_vlan_table(struct mlx4_dev *dev, u8 port); + void mlx4_qp_event(struct mlx4_dev *dev, u32 qpn, int event_type); void mlx4_srq_event(struct mlx4_dev *dev, u32 srqn, int event_type); void mlx4_handle_catas_err(struct mlx4_dev *dev); +int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port); + #endif /* MLX4_H */ diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c index 79b317b..2fbf6a3 100644 --- a/drivers/net/mlx4/mr.c +++ b/drivers/net/mlx4/mr.c @@ -52,7 +52,9 @@ struct mlx4_mpt_entry { __be64 length; __be32 lkey; __be32 win_cnt; - u8 reserved1[3]; + u8 reserved1; + u8 flags2; + u8 reserved2; u8 mtt_rep; __be64 mtt_seg; __be32 mtt_sz; @@ -68,6 +70,8 @@ struct mlx4_mpt_entry { #define MLX4_MTT_FLAG_PRESENT 1 +#define MLX4_MPT_FLAG2_FBO_EN (1 << 7) + #define MLX4_MPT_STATUS_SW 0xF0 #define MLX4_MPT_STATUS_HW 0x00 @@ -250,6 +254,21 @@ static int mlx4_HW2SW_MPT(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox !mailbox, 
MLX4_CMD_HW2SW_MPT, MLX4_CMD_TIME_CLASS_B); } +int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, + u64 iova, u64 size, u32 access, int npages, + int page_shift, struct mlx4_mr *mr) +{ + mr->iova = iova; + mr->size = size; + mr->pd = pd; + mr->access = access; + mr->enabled = 0; + mr->key = hw_index_to_key(mridx); + + return mlx4_mtt_init(dev, npages, page_shift, &mr->mtt); +} +EXPORT_SYMBOL_GPL(mlx4_mr_alloc_reserved); + int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, int npages, int page_shift, struct mlx4_mr *mr) { @@ -261,14 +280,8 @@ int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, if (index == -1) return -ENOMEM; - mr->iova = iova; - mr->size = size; - mr->pd = pd; - mr->access = access; - mr->enabled = 0; - mr->key = hw_index_to_key(index); - - err = mlx4_mtt_init(dev, npages, page_shift, &mr->mtt); + err = mlx4_mr_alloc_reserved(dev, index, pd, iova, size, + access, npages, page_shift, mr); if (err) mlx4_bitmap_free(&priv->mr_table.mpt_bitmap, index); @@ -276,9 +289,8 @@ int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, } EXPORT_SYMBOL_GPL(mlx4_mr_alloc); -void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr) +void mlx4_mr_free_reserved(struct mlx4_dev *dev, struct mlx4_mr *mr) { - struct mlx4_priv *priv = mlx4_priv(dev); int err; if (mr->enabled) { @@ -290,6 +302,13 @@ void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr) } mlx4_mtt_cleanup(dev, &mr->mtt); +} +EXPORT_SYMBOL_GPL(mlx4_mr_free_reserved); + +void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + mlx4_mr_free_reserved(dev, mr); mlx4_bitmap_free(&priv->mr_table.mpt_bitmap, key_to_hw_index(mr->key)); } EXPORT_SYMBOL_GPL(mlx4_mr_free); @@ -435,8 +454,15 @@ int mlx4_init_mr_table(struct mlx4_dev *dev) struct mlx4_mr_table *mr_table = &mlx4_priv(dev)->mr_table; int err; - err = mlx4_bitmap_init(&mr_table->mpt_bitmap, 
dev->caps.num_mpts, - ~0, dev->caps.reserved_mrws); + if (!is_power_of_2(dev->caps.num_mpts)) + return -EINVAL; + + dev->caps.reserved_fexch_mpts_base = dev->caps.num_mpts - + (2 * dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH]); + err = mlx4_bitmap_init_with_effective_max(&mr_table->mpt_bitmap, + dev->caps.num_mpts, + ~0, dev->caps.reserved_mrws, + dev->caps.reserved_fexch_mpts_base); if (err) return err; @@ -544,6 +570,56 @@ int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list } EXPORT_SYMBOL_GPL(mlx4_map_phys_fmr); +int mlx4_map_phys_fmr_fbo(struct mlx4_dev *dev, + struct mlx4_fmr *fmr, + u64 *page_list, int npages, + u64 iova, u32 fbo, u32 len, + u32 *lkey, u32 *rkey) +{ + u32 key; + int i, err; + + err = mlx4_check_fmr(fmr, page_list, npages, iova); + if (err) + return err; + + ++fmr->maps; + + key = key_to_hw_index(fmr->mr.key); + + *lkey = *rkey = fmr->mr.key = hw_index_to_key(key); + + *(u8 *) fmr->mpt = MLX4_MPT_STATUS_SW; + + /* Make sure MPT status is visible before writing MTT entries */ + wmb(); + + for (i = 0; i < npages; ++i) + fmr->mtts[i] = cpu_to_be64(page_list[i] | + MLX4_MTT_FLAG_PRESENT); + + dma_sync_single(&dev->pdev->dev, fmr->dma_handle, + npages * sizeof(u64), DMA_TO_DEVICE); + + fmr->mpt->key = cpu_to_be32(key); + fmr->mpt->lkey = cpu_to_be32(key); + fmr->mpt->length = cpu_to_be64(len); + fmr->mpt->start = cpu_to_be64(iova); + fmr->mpt->first_byte_offset = cpu_to_be32(fbo & 0x001fffff); + fmr->mpt->flags2 = (fbo ? 
MLX4_MPT_FLAG2_FBO_EN : 0); + + /* Make sure MTT entries are visible before setting MPT status */ + wmb(); + + *(u8 *) fmr->mpt = MLX4_MPT_STATUS_HW; + + /* Make sure MPT status is visible before consumer can use FMR */ + wmb(); + + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_map_phys_fmr_fbo); + int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages, int max_maps, u8 page_shift, struct mlx4_fmr *fmr) { @@ -586,6 +662,49 @@ err_free: } EXPORT_SYMBOL_GPL(mlx4_fmr_alloc); +int mlx4_fmr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, + u32 pd, u32 access, int max_pages, + int max_maps, u8 page_shift, struct mlx4_fmr *fmr) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + u64 mtt_seg; + int err = -ENOMEM; + + if (page_shift < 12 || page_shift >= 32) + return -EINVAL; + + /* All MTTs must fit in the same page */ + if (max_pages * sizeof *fmr->mtts > PAGE_SIZE) + return -EINVAL; + + fmr->page_shift = page_shift; + fmr->max_pages = max_pages; + fmr->max_maps = max_maps; + fmr->maps = 0; + + err = mlx4_mr_alloc_reserved(dev, mridx, pd, 0, 0, access, max_pages, + page_shift, &fmr->mr); + if (err) + return err; + + mtt_seg = fmr->mr.mtt.first_seg * dev->caps.mtt_entry_sz; + + fmr->mtts = mlx4_table_find(&priv->mr_table.mtt_table, + fmr->mr.mtt.first_seg, + &fmr->dma_handle); + if (!fmr->mtts) { + err = -ENOMEM; + goto err_free; + } + + return 0; + +err_free: + mlx4_mr_free_reserved(dev, &fmr->mr); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_fmr_alloc_reserved); + int mlx4_fmr_enable(struct mlx4_dev *dev, struct mlx4_fmr *fmr) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -634,6 +753,18 @@ int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr) } EXPORT_SYMBOL_GPL(mlx4_fmr_free); +int mlx4_fmr_free_reserved(struct mlx4_dev *dev, struct mlx4_fmr *fmr) +{ + if (fmr->maps) + return -EBUSY; + + fmr->mr.enabled = 0; + mlx4_mr_free_reserved(dev, &fmr->mr); + + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_fmr_free_reserved); + int mlx4_SYNC_TPT(struct mlx4_dev *dev) { return
mlx4_cmd(dev, 0, 0, 0, MLX4_CMD_SYNC_TPT, 1000); diff --git a/drivers/net/mlx4/port.c b/drivers/net/mlx4/port.c new file mode 100644 index 0000000..5e685ca --- /dev/null +++ b/drivers/net/mlx4/port.c @@ -0,0 +1,282 @@ +/* + * Copyright (c) 2007 Mellanox Technologies. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#include <linux/errno.h> +#include <linux/if_ether.h> + +#include <linux/mlx4/cmd.h> + +#include "mlx4.h" + +void mlx4_init_mac_table(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_mac_table *table = &mlx4_priv(dev)->port[port].mac_table; + int i; + + sema_init(&table->mac_sem, 1); + for (i = 0; i < MLX4_MAX_MAC_NUM; i++) { + table->entries[i] = 0; + table->refs[i] = 0; + } + table->max = 1 << dev->caps.log_num_macs; + table->total = 0; +} + +void mlx4_init_vlan_table(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port].vlan_table; + int i; + + sema_init(&table->vlan_sem, 1); + for (i = 0; i < MLX4_MAX_VLAN_NUM; i++) { + table->entries[i] = 0; + table->refs[i] = 0; + } + table->max = 1 << dev->caps.log_num_vlans; + table->total = 0; +} + +static int mlx4_SET_PORT_mac_table(struct mlx4_dev *dev, u8 port, + __be64 *entries) +{ + struct mlx4_cmd_mailbox *mailbox; + u32 in_mod; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memcpy(mailbox->buf, entries, MLX4_MAC_TABLE_SIZE); + + in_mod = MLX4_SET_PORT_MAC_TABLE << 8 | port; + err = mlx4_cmd(dev, mailbox->dma, in_mod, 1, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} + +int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *index) +{ + struct mlx4_mac_table *table = + &mlx4_priv(dev)->port[port - 1].mac_table; + int i, err = 0; + int free = -1; + u64 valid = 1; + + mlx4_dbg(dev, "Registering mac : 0x%llx\n", mac); + down(&table->mac_sem); + for (i = 0; i < MLX4_MAX_MAC_NUM - 1; i++) { + if (free < 0 && !table->refs[i]) { + free = i; + continue; + } + + if (mac == (MLX4_MAC_MASK & be64_to_cpu(table->entries[i]))) { + /* Mac already registered, increase reference count */ + *index = i; + ++table->refs[i]; + goto out; + } + } + mlx4_dbg(dev, "Free mac index is %d\n", free); + + if (table->total == table->max) { + /* No free mac entries */ + err = -ENOSPC; + goto out; + } + + /* Register new MAC 
*/ + table->refs[free] = 1; + table->entries[free] = cpu_to_be64(mac | valid << MLX4_MAC_VALID_SHIFT); + + err = mlx4_SET_PORT_mac_table(dev, port, table->entries); + if (unlikely(err)) { + mlx4_err(dev, "Failed adding mac: 0x%llx\n", mac); + table->refs[free] = 0; + table->entries[free] = 0; + goto out; + } + + *index = free; + ++table->total; +out: + up(&table->mac_sem); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_register_mac); + +void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index) +{ + struct mlx4_mac_table *table = + &mlx4_priv(dev)->port[port - 1].mac_table; + + down(&table->mac_sem); + if (!table->refs[index]) { + mlx4_warn(dev, "No mac entry for index %d\n", index); + goto out; + } + if (--table->refs[index]) { + mlx4_warn(dev, "Have more references for index %d, " + "no need to modify mac table\n", index); + goto out; + } + table->entries[index] = 0; + mlx4_SET_PORT_mac_table(dev, port, table->entries); + --table->total; +out: + up(&table->mac_sem); +} +EXPORT_SYMBOL_GPL(mlx4_unregister_mac); + +static int mlx4_SET_PORT_vlan_table(struct mlx4_dev *dev, u8 port, + __be32 *entries) +{ + struct mlx4_cmd_mailbox *mailbox; + u32 in_mod; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memcpy(mailbox->buf, entries, MLX4_VLAN_TABLE_SIZE); + in_mod = MLX4_SET_PORT_VLAN_TABLE << 8 | port; + err = mlx4_cmd(dev, mailbox->dma, in_mod, 1, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + + return err; +} + +int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index) +{ + struct mlx4_vlan_table *table = + &mlx4_priv(dev)->port[port - 1].vlan_table; + int i, err = 0; + int free = -1; + + down(&table->vlan_sem); + for (i = 0; i < MLX4_MAX_VLAN_NUM; i++) { + if (free < 0 && (table->refs[i] == 0)) { + free = i; + continue; + } + + if (table->refs[i] && + (vlan == (MLX4_VLAN_MASK & + be32_to_cpu(table->entries[i])))) { + /* Vlan already registered, 
increase reference count */ + *index = i; + ++table->refs[i]; + goto out; + } + } + + if (table->total == table->max) { + /* No free vlan entries */ + err = -ENOSPC; + goto out; + } + + /* Register new VLAN */ + table->refs[free] = 1; + table->entries[free] = cpu_to_be32(vlan | MLX4_VLAN_VALID); + + err = mlx4_SET_PORT_vlan_table(dev, port, table->entries); + if (unlikely(err)) { + mlx4_warn(dev, "Failed adding vlan: %u\n", vlan); + table->refs[free] = 0; + table->entries[free] = 0; + goto out; + } + + *index = free; + ++table->total; +out: + up(&table->vlan_sem); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_register_vlan); + +void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index) +{ + struct mlx4_vlan_table *table = + &mlx4_priv(dev)->port[port - 1].vlan_table; + + down(&table->vlan_sem); + if (!table->refs[index]) { + mlx4_warn(dev, "No vlan entry for index %d\n", index); + goto out; + } + if (--table->refs[index]) { + mlx4_dbg(dev, "Have more references for index %d, " + "no need to modify vlan table\n", index); + goto out; + } + table->entries[index] = 0; + mlx4_SET_PORT_vlan_table(dev, port, table->entries); + --table->total; +out: + up(&table->vlan_sem); +} +EXPORT_SYMBOL_GPL(mlx4_unregister_vlan); + +int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_cmd_mailbox *mailbox; + int err; + u8 is_eth = (dev->caps.port_type[port] == MLX4_PORT_TYPE_ETH) ? 
1 : 0; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memset(mailbox->buf, 0, 256); + if (is_eth) { + ((u8 *) mailbox->buf)[3] = 7; + ((__be16 *) mailbox->buf)[3] = + cpu_to_be16(dev->caps.eth_mtu_cap[port] + + ETH_HLEN + ETH_FCS_LEN); + ((__be16 *) mailbox->buf)[4] = cpu_to_be16(1 << 15); + ((__be16 *) mailbox->buf)[6] = cpu_to_be16(1 << 15); + } + err = mlx4_cmd(dev, mailbox->dma, port, is_eth, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c index fa24e65..1b2b7c4 100644 --- a/drivers/net/mlx4/qp.c +++ b/drivers/net/mlx4/qp.c @@ -147,19 +147,42 @@ int mlx4_qp_modify(struct mlx4_dev *dev, struct mlx4_mtt *mtt, } EXPORT_SYMBOL_GPL(mlx4_qp_modify); -int mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp) +int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + int qpn; + + qpn = mlx4_bitmap_alloc_range(&qp_table->bitmap, cnt, align); + if (qpn == -1) + return -ENOMEM; + + *base = qpn; + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_reserve_range); + +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + if (base_qpn < dev->caps.sqp_start + 8) + return; + + mlx4_bitmap_free_range(&qp_table->bitmap, base_qpn, cnt); +} +EXPORT_SYMBOL_GPL(mlx4_qp_release_range); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_qp_table *qp_table = &priv->qp_table; int err; - if (sqpn) - qp->qpn = sqpn; - else { - qp->qpn = mlx4_bitmap_alloc(&qp_table->bitmap); - if (qp->qpn == -1) - return -ENOMEM; - } + if (!qpn) + return -EINVAL; + + qp->qpn = qpn; err = mlx4_table_get(dev, &qp_table->qp_table, 
qp->qpn); if (err) @@ -208,9 +231,6 @@ err_put_qp: mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); err_out: - if (!sqpn) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); - return err; } EXPORT_SYMBOL_GPL(mlx4_qp_alloc); @@ -240,8 +260,6 @@ void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp) mlx4_table_put(dev, &qp_table->auxc_table, qp->qpn); mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); - if (qp->qpn >= dev->caps.sqp_start + 8) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); } EXPORT_SYMBOL_GPL(mlx4_qp_free); @@ -255,6 +273,7 @@ int mlx4_init_qp_table(struct mlx4_dev *dev) { struct mlx4_qp_table *qp_table = &mlx4_priv(dev)->qp_table; int err; + int reserved_from_top = 0; spin_lock_init(&qp_table->lock); INIT_RADIX_TREE(&dev->qp_table_tree, GFP_ATOMIC); @@ -264,9 +283,45 @@ int mlx4_init_qp_table(struct mlx4_dev *dev) * block of special QPs must be aligned to a multiple of 8, so * round up. */ - dev->caps.sqp_start = ALIGN(dev->caps.reserved_qps, 8); - err = mlx4_bitmap_init(&qp_table->bitmap, dev->caps.num_qps, - (1 << 24) - 1, dev->caps.sqp_start + 8); + dev->caps.sqp_start = + ALIGN(dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], 8); + + { + int sort[MLX4_QP_REGION_COUNT]; + int i, j, tmp; + int last_base = dev->caps.num_qps; + + for (i = 1; i < MLX4_QP_REGION_COUNT; ++i) + sort[i] = i; + + for (i = MLX4_QP_REGION_COUNT; i > 0; --i) { + for (j = 2; j < i; ++j) { + if (dev->caps.reserved_qps_cnt[sort[j]] > + dev->caps.reserved_qps_cnt[sort[j - 1]]) { + tmp = sort[j]; + sort[j] = sort[j - 1]; + sort[j - 1] = tmp; + } + } + } + + for (i = 1; i < MLX4_QP_REGION_COUNT; ++i) { + last_base -= dev->caps.reserved_qps_cnt[sort[i]]; + dev->caps.reserved_qps_base[sort[i]] = last_base; + reserved_from_top += + dev->caps.reserved_qps_cnt[sort[i]]; + } + + } + + err = mlx4_bitmap_init_with_effective_max(&qp_table->bitmap, + dev->caps.num_qps, + (1 << 23) - 1, + dev->caps.sqp_start + 8, + dev->caps.num_qps - + reserved_from_top); + + if (err) return err; @@ -279,6 
+334,20 @@ void mlx4_cleanup_qp_table(struct mlx4_dev *dev) mlx4_bitmap_cleanup(&mlx4_priv(dev)->qp_table.bitmap); } +int mlx4_qp_get_region(struct mlx4_dev *dev, + enum qp_region region, + int *base_qpn, int *cnt) +{ + if ((region < 0) || (region >= MLX4_QP_REGION_COUNT)) + return -EINVAL; + + *base_qpn = dev->caps.reserved_qps_base[region]; + *cnt = dev->caps.reserved_qps_cnt[region]; + + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_get_region); + int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, struct mlx4_qp_context *context) { @@ -299,3 +368,35 @@ int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, } EXPORT_SYMBOL_GPL(mlx4_qp_query); +int mlx4_qp_to_ready(struct mlx4_dev *dev, + struct mlx4_mtt *mtt, + struct mlx4_qp_context *context, + struct mlx4_qp *qp, + enum mlx4_qp_state *qp_state) +{ +#define STATE_ARR_SIZE 4 + int err = 0; + int i; + enum mlx4_qp_state states[STATE_ARR_SIZE] = { + MLX4_QP_STATE_RST, + MLX4_QP_STATE_INIT, + MLX4_QP_STATE_RTR, + MLX4_QP_STATE_RTS + }; + + for (i = 0; i < STATE_ARR_SIZE - 1; i++) { + context->flags |= cpu_to_be32(states[i+1] << 28); + err = mlx4_qp_modify(dev, mtt, states[i], + states[i+1], context, 0, 0, qp); + if (err) { + mlx4_err(dev, "Failed to bring qp to state:" + "%d with error: %d\n", + states[i+1], err); + return err; + } + *qp_state = states[i+1]; + } + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_to_ready); + diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h index 77323a7..cf9c679 100644 --- a/include/linux/mlx4/cmd.h +++ b/include/linux/mlx4/cmd.h @@ -132,6 +132,15 @@ enum { MLX4_MAILBOX_SIZE = 4096 }; +enum { + /* set port opcode modifiers */ + MLX4_SET_PORT_GENERAL = 0x0, + MLX4_SET_PORT_RQP_CALC = 0x1, + MLX4_SET_PORT_MAC_TABLE = 0x2, + MLX4_SET_PORT_VLAN_TABLE = 0x3, + MLX4_SET_PORT_PRIO_MAP = 0x4, +}; + struct mlx4_dev; struct mlx4_cmd_mailbox { diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index ff7df1a..2d08c4f 100644 --- a/include/linux/mlx4/device.h 
+++ b/include/linux/mlx4/device.h @@ -60,6 +60,7 @@ enum { MLX4_DEV_CAP_FLAG_IPOIB_CSUM = 1 << 7, MLX4_DEV_CAP_FLAG_BAD_PKEY_CNTR = 1 << 8, MLX4_DEV_CAP_FLAG_BAD_QKEY_CNTR = 1 << 9, + MLX4_DEV_CAP_FLAG_DPDP = 1 << 12, MLX4_DEV_CAP_FLAG_MEM_WINDOW = 1 << 16, MLX4_DEV_CAP_FLAG_APM = 1 << 17, MLX4_DEV_CAP_FLAG_ATOMIC = 1 << 18, @@ -133,6 +134,23 @@ enum { MLX4_STAT_RATE_OFFSET = 5 }; +enum qp_region { + MLX4_QP_REGION_FW = 0, + MLX4_QP_REGION_ETH_ADDR, + MLX4_QP_REGION_FC_ADDR, + MLX4_QP_REGION_FC_EXCH, + MLX4_QP_REGION_COUNT /* Must be last */ +}; + +enum mlx4_port_type { + MLX4_PORT_TYPE_IB = 1 << 0, + MLX4_PORT_TYPE_ETH = 1 << 1, +}; + +enum { + MLX4_NUM_FEXCH = 64 * 1024, +}; + static inline u64 mlx4_fw_ver(u64 major, u64 minor, u64 subminor) { return (major << 32) | (minor << 16) | subminor; @@ -142,7 +160,9 @@ struct mlx4_caps { u64 fw_ver; int num_ports; int vl_cap[MLX4_MAX_PORTS + 1]; - int mtu_cap[MLX4_MAX_PORTS + 1]; + int ib_mtu_cap[MLX4_MAX_PORTS + 1]; + u64 def_mac[MLX4_MAX_PORTS + 1]; + int eth_mtu_cap[MLX4_MAX_PORTS + 1]; int gid_table_len[MLX4_MAX_PORTS + 1]; int pkey_table_len[MLX4_MAX_PORTS + 1]; int local_ca_ack_delay; @@ -157,7 +177,6 @@ struct mlx4_caps { int max_rq_desc_sz; int max_qp_init_rdma; int max_qp_dest_rdma; - int reserved_qps; int sqp_start; int num_srqs; int max_srq_wqes; @@ -187,6 +206,13 @@ struct mlx4_caps { u16 stat_rate_support; u8 port_width_cap[MLX4_MAX_PORTS + 1]; int max_gso_sz; + int reserved_qps_cnt[MLX4_QP_REGION_COUNT]; + int reserved_qps_base[MLX4_QP_REGION_COUNT]; + int log_num_macs; + int log_num_vlans; + int log_num_prios; + enum mlx4_port_type port_type[MLX4_MAX_PORTS + 1]; + int reserved_fexch_mpts_base; }; struct mlx4_buf_list { @@ -208,6 +234,34 @@ struct mlx4_mtt { int page_shift; }; +enum { + MLX4_DB_PER_PAGE = PAGE_SIZE / 4 +}; + +struct mlx4_db_pgdir { + struct list_head list; + DECLARE_BITMAP(order0, MLX4_DB_PER_PAGE); + DECLARE_BITMAP(order1, MLX4_DB_PER_PAGE / 2); + unsigned long *bits[2]; + __be32 *db_page; 
+ dma_addr_t db_dma; +}; + +struct mlx4_db { + __be32 *db; + struct mlx4_db_pgdir *pgdir; + dma_addr_t dma; + int index; + int order; +}; + + +struct mlx4_hwq_resources { + struct mlx4_db db; + struct mlx4_mtt mtt; + struct mlx4_buf buf; +}; + struct mlx4_mr { struct mlx4_mtt mtt; u64 iova; @@ -247,6 +301,7 @@ struct mlx4_cq { int arm_sn; int cqn; + int comp_eq_idx; atomic_t refcount; struct completion free; @@ -309,6 +364,36 @@ struct mlx4_init_port_param { u64 si_guid; }; +static inline void mlx4_query_steer_cap(struct mlx4_dev *dev, int *log_mac, + int *log_vlan, int *log_prio) +{ + *log_mac = dev->caps.log_num_macs; + *log_vlan = dev->caps.log_num_vlans; + *log_prio = dev->caps.log_num_prios; +} + +static inline u32 mlx4_get_ports_of_type(struct mlx4_dev *dev, + enum mlx4_port_type ptype) +{ + u32 ret = 0; + int i; + + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (dev->caps.port_type[i] == ptype) + ret |= 1 << (i-1); + } + return ret; +} + +#define foreach_port(port, bitmap) \ + for ((port) = 1; (port) <= MLX4_MAX_PORTS; ++(port)) \ + if (bitmap & 1 << ((port)-1)) + +static inline int mlx4_get_fexch_mpts_base(struct mlx4_dev *dev) +{ + return dev->caps.reserved_fexch_mpts_base; +} + int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct, struct mlx4_buf *buf); void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf); @@ -332,8 +417,12 @@ int mlx4_mtt_init(struct mlx4_dev *dev, int npages, int page_shift, void mlx4_mtt_cleanup(struct mlx4_dev *dev, struct mlx4_mtt *mtt); u64 mlx4_mtt_addr(struct mlx4_dev *dev, struct mlx4_mtt *mtt); +int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, + u64 iova, u64 size, u32 access, int npages, + int page_shift, struct mlx4_mr *mr); int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, int npages, int page_shift, struct mlx4_mr *mr); +void mlx4_mr_free_reserved(struct mlx4_dev *dev, struct mlx4_mr *mr); void mlx4_mr_free(struct mlx4_dev *dev, struct 
mlx4_mr *mr); int mlx4_mr_enable(struct mlx4_dev *dev, struct mlx4_mr *mr); int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, @@ -341,11 +430,20 @@ int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, struct mlx4_buf *buf); +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size, int max_direct); +void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, - struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); + struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, + unsigned vector, int collapsed); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -int mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp); +int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp); void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp); int mlx4_srq_alloc(struct mlx4_dev *dev, u32 pdn, struct mlx4_mtt *mtt, @@ -360,14 +458,26 @@ int mlx4_CLOSE_PORT(struct mlx4_dev *dev, int port); int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]); int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]); +int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *index); +void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index); +int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index); +void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index); + int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, int npages, u64 iova, u32 *lkey, u32 *rkey); +int mlx4_map_phys_fmr_fbo(struct mlx4_dev *dev, 
struct mlx4_fmr *fmr, + u64 *page_list, int npages, u64 iova, + u32 fbo, u32 len, u32 *lkey, u32 *rkey); int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages, int max_maps, u8 page_shift, struct mlx4_fmr *fmr); +int mlx4_fmr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, + u32 access, int max_pages, int max_maps, + u8 page_shift, struct mlx4_fmr *fmr); int mlx4_fmr_enable(struct mlx4_dev *dev, struct mlx4_fmr *fmr); void mlx4_fmr_unmap(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u32 *lkey, u32 *rkey); int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr); +int mlx4_fmr_free_reserved(struct mlx4_dev *dev, struct mlx4_fmr *fmr); int mlx4_SYNC_TPT(struct mlx4_dev *dev); #endif /* MLX4_DEVICE_H */ diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h index a5e43fe..5a02980 100644 --- a/include/linux/mlx4/qp.h +++ b/include/linux/mlx4/qp.h @@ -151,7 +151,16 @@ struct mlx4_qp_context { u8 reserved4[2]; u8 mtt_base_addr_h; __be32 mtt_base_addr_l; - u32 reserved5[10]; + u8 VE; + u8 reserved5; + __be16 VFT_id_prio; + u8 reserved6; + u8 exch_size; + __be16 exch_base; + u8 VFT_hop_cnt; + u8 my_fc_id_idx; + __be16 reserved7; + u32 reserved8[7]; }; /* Which firmware version adds support for NEC (NoErrorCompletion) bit */ @@ -296,6 +305,10 @@ int mlx4_qp_modify(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, struct mlx4_qp_context *context); +int mlx4_qp_to_ready(struct mlx4_dev *dev, struct mlx4_mtt *mtt, + struct mlx4_qp_context *context, + struct mlx4_qp *qp, enum mlx4_qp_state *qp_state); + static inline struct mlx4_qp *__mlx4_qp_lookup(struct mlx4_dev *dev, u32 qpn) { return radix_tree_lookup(&dev->qp_table_tree, qpn & (dev->caps.num_qps - 1)); @@ -303,4 +316,8 @@ static inline struct mlx4_qp *__mlx4_qp_lookup(struct mlx4_dev *dev, u32 qpn) void mlx4_qp_remove(struct mlx4_dev *dev, struct mlx4_qp *qp); +int mlx4_qp_get_region(struct mlx4_dev *dev, + enum qp_region region, 
+ int *base_qpn, int *cnt); + #endif /* MLX4_QP_H */ -- 1.5.4 _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From yevgenyp at mellanox.co.il Wed Apr 16 01:05:45 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 16 Apr 2008 11:05:45 +0300 Subject: [ofa-general][PATCH] mlx4_ib: Multi Protocol support Message-ID: <4805B359.2070906@mellanox.co.il> Multi Protocol support gives the user the ability to run InfiniBand and Ethernet protocols on the same HCA (separately or at the same time). Main changes to mlx4_ib: 1. The mlx4_ib driver queries the low-level driver for the number of IB ports. 2. QPs are reserved prior to being allocated. 3. The CQ allocation API has changed. Signed-off-by: Yevgeny Petrilin Reviewed-by: Eli Cohen --- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/infiniband/hw/mlx4/mad.c | 6 +++--- drivers/infiniband/hw/mlx4/main.c | 15 ++++++++++++--- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 ++ drivers/infiniband/hw/mlx4/qp.c | 9 +++++++++ 5 files changed, 27 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 3557e7e..912b35c 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -221,7 +221,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector } err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, - cq->db.dma, &cq->mcq); + cq->db.dma, &cq->mcq, vector, 0); if (err) goto err_dbmap; diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c index 4c1e72f..d91ba56 100644 --- a/drivers/infiniband/hw/mlx4/mad.c +++ b/drivers/infiniband/hw/mlx4/mad.c @@ -297,7 +297,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev) int p, q; int ret; - for (p = 0; p < dev->dev->caps.num_ports; ++p) + for (p = 0; p < 
dev->num_ports; ++p) for (q = 0; q <= 1; ++q) { agent = ib_register_mad_agent(&dev->ib_dev, p + 1, q ? IB_QPT_GSI : IB_QPT_SMI, @@ -313,7 +313,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev) return 0; err: - for (p = 0; p < dev->dev->caps.num_ports; ++p) + for (p = 0; p < dev->num_ports; ++p) for (q = 0; q <= 1; ++q) if (dev->send_agent[p][q]) ib_unregister_mad_agent(dev->send_agent[p][q]); @@ -326,7 +326,7 @@ void mlx4_ib_mad_cleanup(struct mlx4_ib_dev *dev) struct ib_mad_agent *agent; int p, q; - for (p = 0; p < dev->dev->caps.num_ports; ++p) { + for (p = 0; p < dev->num_ports; ++p) { for (q = 0; q <= 1; ++q) { agent = dev->send_agent[p][q]; dev->send_agent[p][q] = NULL; diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 136c76c..fd0b8c0 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -112,7 +112,8 @@ static int mlx4_ib_query_device(struct ib_device *ibdev, props->max_mr_size = ~0ull; props->page_size_cap = dev->dev->caps.page_size_cap; - props->max_qp = dev->dev->caps.num_qps - dev->dev->caps.reserved_qps; + props->max_qp = dev->dev->caps.num_qps - + dev->dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW]; props->max_qp_wr = dev->dev->caps.max_wqes; props->max_sge = min(dev->dev->caps.max_sq_sg, dev->dev->caps.max_rq_sg); @@ -552,11 +553,15 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) mutex_init(&ibdev->pgdir_mutex); ibdev->dev = dev; + ibdev->ports_map = mlx4_get_ports_of_type(dev, MLX4_PORT_TYPE_IB); strlcpy(ibdev->ib_dev.name, "mlx4_%d", IB_DEVICE_NAME_MAX); ibdev->ib_dev.owner = THIS_MODULE; ibdev->ib_dev.node_type = RDMA_NODE_IB_CA; - ibdev->ib_dev.phys_port_cnt = dev->caps.num_ports; + ibdev->num_ports = 0; + foreach_port(i, ibdev->ports_map) + ibdev->num_ports++; + ibdev->ib_dev.phys_port_cnt = ibdev->num_ports; ibdev->ib_dev.num_comp_vectors = 1; ibdev->ib_dev.dma_device = &dev->pdev->dev; @@ -670,7 +675,7 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void 
*ibdev_ptr) struct mlx4_ib_dev *ibdev = ibdev_ptr; int p; - for (p = 1; p <= dev->caps.num_ports; ++p) + for (p = 1; p <= ibdev->num_ports; ++p) mlx4_CLOSE_PORT(dev, p); mlx4_ib_mad_cleanup(ibdev); @@ -685,6 +690,10 @@ static void mlx4_ib_event(struct mlx4_dev *dev, void *ibdev_ptr, enum mlx4_dev_event event, int port) { struct ib_event ibev; + struct mlx4_ib_dev *ibdev = to_mdev((struct ib_device *) ibdev_ptr); + + if (port > ibdev->num_ports) + return; switch (event) { case MLX4_DEV_EVENT_PORT_UP: diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 9e63732..7a8111c 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -173,6 +173,8 @@ struct mlx4_ib_ah { struct mlx4_ib_dev { struct ib_device ib_dev; struct mlx4_dev *dev; + u32 ports_map; + int num_ports; void __iomem *uar_map; struct list_head pgdir_list; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index b75efae..59f7284 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -544,6 +544,11 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, } } + if (!sqpn) + err = mlx4_qp_reserve_range(dev->dev, 1, 1, &sqpn); + if (err) + goto err_wrid; + err = mlx4_qp_alloc(dev->dev, sqpn, &qp->mqp); if (err) goto err_wrid; @@ -654,6 +659,10 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, mlx4_ib_unlock_cqs(send_cq, recv_cq); mlx4_qp_free(dev->dev, &qp->mqp); + + if (!is_sqp(dev, qp)) + mlx4_qp_release_range(dev->dev, qp->mqp.qpn, 1); + mlx4_mtt_cleanup(dev->dev, &qp->mtt); if (is_user) { -- 1.5.4 From dorfman.eli at gmail.com Wed Apr 16 01:22:04 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Wed, 16 Apr 2008 11:22:04 +0300 Subject: [ofa-general] Re: [Ips] Calculating the VA in iSER header In-Reply-To: References: <4804B03C.6060507@voltaire.com> Message-ID: 
<694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com> According to Mike's explanation below it seems that we have a bug in the iSER initiator. Fixing this bug will require a fix in the stgt iSER code. The problem is that the initiator sends a VA which already includes an offset for the unsolicited data (which is wrong). In iser_initiator.c::iser_prepare_write_cmd the code looks like this: hdr->write_va = cpu_to_be64(regd_buf->reg.va + unsol_sz); we think that it should be modified to: hdr->write_va = cpu_to_be64(regd_buf->reg.va); Let's discuss this and verify we interpret the spec correctly. If agreed we will send a patch. Eli 2008/4/15 Mike Ko : > > VA is a concept introduced in an Infiniband annex to support iSER. It > appears in the expanded iSER header for Infiniband use only to support the > non-Zero Based Virtual Address (non-ZBVA) used in Infiniband vs the ZBVA > used in IETF. > > "The DataDescriptorOut describes the I/O buffer starting with the immediate > unsolicited data (if any), followed by the non-immediate unsolicited data > (if any) and solicited data." If non-ZBVA mode is used, then VA points to > the beginning of this buffer. So in your example, the VA field in the > expanded iSER header will be zero. Note that for IETF, ZBVA is assumed and > there is no provision to specify a different VA in the iSER header. > > Tagged offset (TO) refers to the offset within a tagged buffer in RDMA Write > and RDMA Read Request Messages. When sending non-immediate unsolicited > data, Send Message types are used and the TO field is not present. Instead, > the buffer offset is appropriately represented by the Buffer Offset field in > the SCSI Data-Out PDU. Note that Tagged Offset is not the same as write VA > and it does not appear in the iSER header. 
> > Mike > > > > Erez Zilber > Sent by: ips-bounces at ietf.org > > 04/15/2008 06:40 AM > > To ips at ietf.org > > cc > > Subject [Ips] Calculating the VA in iSER header > > > > > > > We're trying to understand what should be the write VA (tagged offset) > in the iSER header for WRITE commands. If unsolicited data is to be > sent, should the VA be the original VA or should it be original VA + > FirstBurstLength? > > > Example: > > > InitialR2T=No > > FirstBurstLength = 1000 > > > Base address of the registered buffer = 0 > > > Now, what should be the VA in the iSER header? 0 or 1000? > > > We read the following paragraph in the iSER spec, but didn't get an > answer from there: > > > * If there is solicited data to be transferred for the SCSI write or > bidirectional command, as indicated by the Expected Data Transfer > Length in the SCSI Command PDU exceeding the value of > UnsolicitedDataSize, the iSER layer at the initiator MUST do the > following: > > a. It MUST allocate a Write STag for the I/O Buffer defined by > the qualifier DataDescriptorOut. The DataDescriptorOut > describes the I/O buffer starting with the immediate > unsolicited data (if any), followed by the non-immediate > unsolicited data (if any) and solicited data. This means > that the BufferOffset for the SCSI Data-out for this > command is equal to the TO. This implies that a zero TO > for this STag points to the beginning of this I/O Buffer. 
> > > Thanks, > > -- > > ____________________________________________________________ > > Erez Zilber | 972-9-971-7689 > > Software Engineer, Storage Solutions > > Voltaire – _The Grid Backbone_ > > __ > > www.voltaire.com > > > > _______________________________________________ > Ips mailing list > Ips at ietf.org > https://www.ietf.org/mailman/listinfo/ips > > From vlad at dev.mellanox.co.il Wed Apr 16 05:52:34 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 16 Apr 2008 15:52:34 +0300 Subject: [ofa-general] ofed_kernel git tree for OFED-1.4 (based on 2.6.25-rc7) Message-ID: <4805F692.1040101@dev.mellanox.co.il> Hi Ralph, I prepared ofed_kernel git tree: git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel. This tree merged with 2.6.25-rc7. Currently ofed_scripts/ofed_makedist.sh fails on ipath_0180_header_file_changes_to_support_IBA7220.patch: > ./ofed_scripts/ofed_makedist.sh git clone -q -s -n /local/scm/ofed-1.4/linux-2.6 /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 Initialized empty Git repository in /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11/.git/ pushd /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 /local/scm/ofed-1.4/linux-2.6 /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_checkout.sh 3bb85a2f1c15d1e58cd8b0b2da0577a3ab98977a cdbdfc5cc29c4add1a2d6967b137a3347112a199 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.11 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log Failed executing /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.11 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log Hunk #7 FAILED at 565. Hunk #8 succeeded at 582 (offset 1 line). Hunk #9 succeeded at 595 (offset 1 line). 
Hunk #10 FAILED at 613. Hunk #11 succeeded at 719 (offset 2 lines). Hunk #12 FAILED at 857. 3 out of 12 hunks FAILED -- rejects in file drivers/infiniband/hw/ipath/ipath_verbs.h Patch ipath_0180_header_file_changes_to_support_IBA7220.patch does not apply (enforce with -f) Failed executing /usr/bin/quilt Build failed in /tmp/build-ofed_kernel-d23175 See log file /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log Should ipath patches be removed from the git tree (kernel_patches/fixes/ipath*)? Regards, Vladimir From amar.mudrankit at qlogic.com Wed Apr 16 07:25:23 2008 From: amar.mudrankit at qlogic.com (Amar Mudrankit (Contractor - )) Date: Wed, 16 Apr 2008 09:25:23 -0500 Subject: [ofa-general] Kernel Panic on Stress Testing of OFED-1.3 IPoIB Driver Message-ID: Hello, I observed a kernel panic while performing stress tests on the IPoIB driver over OFED-1.3. The stress test was run on a test setup consisting of one mthca machine and one connectX machine. 5 iperf (4 UDP and 1 TCP) streams were started over IPoIB interfaces on both machines. The details of the panic as well as test steps are captured in the bug: https://bugs.openfabrics.org//show_bug.cgi?id=1004 Thanks, Amar S Mudrankit -------------- next part -------------- An HTML attachment was scrubbed... URL: From pw at osc.edu Wed Apr 16 07:48:30 2008 From: pw at osc.edu (Pete Wyckoff) Date: Wed, 16 Apr 2008 10:48:30 -0400 Subject: [ofa-general] Re: [Ips] Calculating the VA in iSER header In-Reply-To: <694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com> References: <4804B03C.6060507@voltaire.com> <694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com> Message-ID: <20080416144830.GC23861@osc.edu> dorfman.eli at gmail.com wrote on Wed, 16 Apr 2008 11:22 +0300: > According to Mike's explanation below it seems that we have a bug in > iSER initiator. > Fixing this bug will require a fix in the stgt iSER code. 
> > The problem is that the initiator send a VA which already includes an > offset for the unsolicited data (which is wrong). > In iser_initiator.c::iser_prepare_write_cmd the code looks like this: > hdr->write_va = cpu_to_be64(regd_buf->reg.va + unsol_sz); > > we think that it should be modified to: > hdr->write_va = cpu_to_be64(regd_buf->reg.va); > > Let's discuss this and verify we interpret the spec correctly. > If agreed we will send a patch. Agree with the interpretation of the spec, and it's probably a bit clearer that way too. But we have working initiators and targets that do it the "wrong" way. The transition involved in fixing both sides will lead to problems. How does a target detect an unfixed initiator and vice versa? A mismatched pair will lead to data corruption. We could address this in a few ways: 1. Flag day: all initiators and targets change at the same time. Will see data corruption if someone unluckily runs one or the other using old non-fixed code. 2. Rewrite the IB Annex to codify what's done in practice, and don't "fix" any code. 3. Start using the Hello messages and extend them to specify if the VA marks the start of the buffer or the unsol offset. I really don't look forward to the bug reports we'll get from a flag day approach. Old Linux versions tend to hang around for a very long time, and people are often reluctant to upgrade. -- Pete > 2008/4/15 Mike Ko : > > > > VA is a concept introduced in an Infiniband annex to support iSER. It > > appears in the expanded iSER header for Infiniband use only to support the > > non-Zero Based Virtual Address (non-ZBVA) used in Infiniband vs the ZBVA > > used in IETF. > > > > "The DataDescriptorOut describes the I/O buffer starting with the immediate > > unsolicited data (if any), followed by the non-immediate unsolicited data > > (if any) and solicited data." If non-ZBVA mode is used, then VA points to > > the beginning of this buffer. 
So in your example, the VA field in the > > expanded iSER header will be zero. Note that for IETF, ZBVA is assumed and > > there is no provision to specify a different VA in the iSER header. > > > > Tagged offset (TO) refers to the offset within a tagged buffer in RDMA Write > > and RDMA Read Request Messages. When sending non-immediate unsolicited > > data, Send Message types are used and the TO field is not present. Instead, > > the buffer offset is appropriately represented by the Buffer Offset field in > > the SCSI Data-Out PDU. Note that Tagged Offset is not the same as write VA > > and it does not appear in the iSER header. > > > > Mike > > > > Erez Zilber > > Sent by: ips-bounces at ietf.org > > > > 04/15/2008 06:40 AM > > > > To ips at ietf.org > > > > cc > > > > Subject [Ips] Calculating the VA in iSER header > > > > We're trying to understand what should be the write VA (tagged offset) > > in the iSER header for WRITE commands. If unsolicited data is to be > > sent, should the VA be the original VA or should it be original VA + > > FirstBurstLength? > > > > > > Example: > > > > > > InitialR2T=No > > > > FirstBurstLength = 1000 > > > > > > Base address of the registered buffer = 0 > > > > > > Now, what should be the VA in the iSER header? 0 or 1000? > > > > > > We read the following paragraph in the iSER spec, but didn't get an > > answer from there: > > > > > > * If there is solicited data to be transferred for the SCSI write or > > bidirectional command, as indicated by the Expected Data Transfer > > Length in the SCSI Command PDU exceeding the value of > > UnsolicitedDataSize, the iSER layer at the initiator MUST do the > > following: > > > > a. It MUST allocate a Write STag for the I/O Buffer defined by > > the qualifier DataDescriptorOut. The DataDescriptorOut > > describes the I/O buffer starting with the immediate > > unsolicited data (if any), followed by the non-immediate > > unsolicited data (if any) and solicited data. 
This means > > that the BufferOffset for the SCSI Data-out for this > > command is equal to the TO. This implies that a zero TO > > for this STag points to the beginning of this I/O Buffer. From rdreier at cisco.com Wed Apr 16 08:34:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 08:34:23 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: Your email has > Content-Type: text/plain; charset=ISO-8859-1; format=flowed and the format=flowed means that the patch gets corrupted and won't apply. So when you resend, please fix. I don't think we can really apply this as one patch -- it does too many things at once and needs to be split up... I think pretty much each of these items is independent and could be a separate patch: > 1. Mlx4 device now holds the actual protocol for each port. > The port types are determined through module parameters of through sysfs > interface. The requested types are verified with firmware capabilities > in order to determine the actual port protocol. > 2. The driver now manages Mac and Vlan tables used by customers of the low > level driver. Corresponding commands were added. > 3. Completion eq's are created per cpu. Created cq's are attached to an eq by > "Round Robin" algorithm, unless a specific eq was requested. > 4. Creation of a collapsed cq support was added. > 5. Additional reserved qp ranges were added. There is a range for the customers > of the low level driver (IB, Ethernet, FCoE). > 6. Qp allocation process changed. > First a qp range should be reserved, then qps can be allocated from that > range. This is to support the ability to allocate consecutive qps. > Appropriate changes were made in the allocation mechanism. > 7. 
Common actions to all HW resource management (Doorbell allocation, > Buffer allocation, Mtt write) were moved to the low level driver. Also, on the other hand, the current two patches are too split up: if I apply this patch then mlx4_ib won't compile until the second patch goes in too. Which means someone trying to bisect an mlx4 bug gets into trouble. So please make sure that everything still compiles and works after each patch is applied. By the way, the multiple EQ stuff is a pretty major change in behavior... are we really ready for this? Round robin seems like it could easily lead to worst-case behavior for some plausible workloads. Finally, checkpatch.pl shows a few minor whitespace problems... please fix when you resend. - R. From rdreier at cisco.com Wed Apr 16 08:42:47 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 08:42:47 -0700 Subject: [ofa-general][PATCH] mlx4_ib: Multi Protocol support In-Reply-To: <4805B359.2070906@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 11:05:45 +0300") References: <4805B359.2070906@mellanox.co.il> Message-ID: > Main changes to mlx4_ib: > 1. Mlx4_ib driver queries the low level driver for number of IB ports. > 2. Qps are being reserved prior to being allocated. > 3. Cq allocation API change. As I said before, these mlx4_ib changes should be rolled into the mlx4_core patches that change these interfaces. Also, I don't understand exactly how you're handling which ports are IB and which aren't. Have you tested this code in the case where port 1 is non-IB and port 2 is IB? 
It seems that you have a bitmap of which ports are IB: > + foreach_port(i, ibdev->ports_map) > + ibdev->num_ports++; (By the way, foreach_port() is too generic a name to expose, since it could easily collide with some general API -- I would use mlx4_foreach_port() instead) But then you do stuff like: > - for (p = 1; p <= dev->caps.num_ports; ++p) > + for (p = 1; p <= ibdev->num_ports; ++p) > mlx4_CLOSE_PORT(dev, p); which doesn't seem to work if you only have one IB port but it isn't port 1. I think there are two sane ways to handle non-IB ports in mlx4_ib: - Have mlx4_ib report the number of IB ports as phys_port_cnt and have an indirection table that maps from IB port # to physical HCA port # (to handle the case where only port 2 is IB, so you need to map IB port 1 to HCA physical port 2). This leads to some confusion with the real-world labels on ports I guess, and also I guess you need some SMA trickery to report the right port # to the SM. - Report the number of physical HCA ports as phys_port_cnt and just have non-IB ports always say they're DOWN. This makes changing config on the fly easier, since a port going from DOWN to INIT is a pretty normal thing. I guess there is a little bit of hackery involved in handling requests to mlx4_ib that involve non-IB ports. However your changes seem to take a third way and I don't understand how it can work. Perhaps you can clarify? - R. From rdreier at cisco.com Wed Apr 16 08:46:25 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 08:46:25 -0700 Subject: [ofa-general] Re: [Ips] Calculating the VA in iSER header In-Reply-To: <20080416144830.GC23861@osc.edu> (Pete Wyckoff's message of "Wed, 16 Apr 2008 10:48:30 -0400") References: <4804B03C.6060507@voltaire.com> <694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com> <20080416144830.GC23861@osc.edu> Message-ID: > Agree with the interpretation of the spec, and it's probably a bit > clearer that way too. 
But we have working initiators and targets > that do it the "wrong" way. Yes... I guess the key question is whether there are any initiators that do things the "right" way. > 1. Flag day: all initiators and targets change at the same time. > Will see data corruption if someone unluckily runs one or the other > using old non-fixed code. Seems unacceptable to me... it doesn't make sense at all to break every setup in the world just to be "right" according to the spec. > 2. Rewrite the IB Annex to codify what's done in practice, and don't > "fix" any code. If existing practice is universally to do things "wrong" then this seems to me by far the best way to proceed. > 3. Start using the Hello messages and extend them to specify if the > VA marks the start of the buffer or the unsol offset. this seems like a pain for not much benefit... every initiator and target needs new code to handle the negotiation, and you don't get anything except the satisfaction of following the letter of the spec. From rdreier at cisco.com Wed Apr 16 08:47:12 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 08:47:12 -0700 Subject: [ofa-general] Kernel Panic on Stress Testing of OFED-1.3 IPoIB Driver In-Reply-To: (Amar Mudrankit's message of "Wed, 16 Apr 2008 09:25:23 -0500") References: Message-ID: > https://bugs.openfabrics.org//show_bug.cgi?id=1004 Has anyone tried this with an upstream kernel (rather than OFED-1.3)? 2.6.25-rc9 or my for-2.6.26 branch would both be useful. - R. From holt at sgi.com Wed Apr 16 09:33:37 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 16 Apr 2008 11:33:37 -0500 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: References: Message-ID: <20080416163337.GJ22493@sgi.com> I don't think this lock mechanism is completely working. I have gotten a few failures trying to dereference 0x100100 which appears to be LIST_POISON1. 
Thanks, Robin From jeff at splitrockpr.com Wed Apr 16 09:55:12 2008 From: jeff at splitrockpr.com (Jeffrey Scott) Date: Wed, 16 Apr 2008 09:55:12 -0700 Subject: [ofa-general] OpenFabrics Sonoma presentations now available Message-ID: <89260B536D004F29B5FD9E10996DEF13@Gaucho> Presentations from the Sonoma Workshop are now available for download on the OpenFabrics website. http://www.openfabrics.org/archives/april2008sonoma.htm ----------------------------------- Jeffrey Scott Split Rock Communications 408-884-4017 408-348-3651 Mobile 408-884-3900 Fax www.SplitRockPR.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From weiny2 at llnl.gov Wed Apr 16 10:47:29 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Wed, 16 Apr 2008 10:47:29 -0700 Subject: [ofa-general] OpenFabrics Sonoma presentations now available In-Reply-To: <89260B536D004F29B5FD9E10996DEF13@Gaucho> References: <89260B536D004F29B5FD9E10996DEF13@Gaucho> Message-ID: <20080416104729.6e203753.weiny2@llnl.gov> On Wed, 16 Apr 2008 09:55:12 -0700 "Jeffrey Scott" wrote: > Presentations from the Sonoma Workshop are now available for download on the > OpenFabrics website. > > > > http://www.openfabrics.org/archives/april2008sonoma.htm > Are you still waiting for slides from some participants? I don't see slides for Endance or "Experience with Ranger System". Thanks, Ira From ralph.campbell at qlogic.com Wed Apr 16 10:47:25 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 16 Apr 2008 10:47:25 -0700 Subject: [ofa-general] Re: ofed_kernel git tree for OFED-1.4 (based on 2.6.25-rc7) In-Reply-To: <4805F692.1040101@dev.mellanox.co.il> References: <4805F692.1040101@dev.mellanox.co.il> Message-ID: <1208368045.8715.187.camel@brick.pathscale.com> On Wed, 2008-04-16 at 15:52 +0300, Vladimir Sokolovsky wrote: > Hi Ralph, > I prepared ofed_kernel git tree: git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel. > This tree merged with 2.6.25-rc7. 
> Currently ofed_scripts/ofed_makedist.sh fails on ipath_0180_header_file_changes_to_support_IBA7220.patch: > > > ./ofed_scripts/ofed_makedist.sh > > git clone -q -s -n /local/scm/ofed-1.4/linux-2.6 /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 > Initialized empty Git repository in /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11/.git/ > pushd /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 > /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 /local/scm/ofed-1.4/linux-2.6 /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_checkout.sh 3bb85a2f1c15d1e58cd8b0b2da0577a3ab98977a > cdbdfc5cc29c4add1a2d6967b137a3347112a199 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.11 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > Failed executing /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.11 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > Hunk #7 FAILED at 565. > Hunk #8 succeeded at 582 (offset 1 line). > Hunk #9 succeeded at 595 (offset 1 line). > Hunk #10 FAILED at 613. > Hunk #11 succeeded at 719 (offset 2 lines). > Hunk #12 FAILED at 857. > 3 out of 12 hunks FAILED -- rejects in file drivers/infiniband/hw/ipath/ipath_verbs.h > Patch ipath_0180_header_file_changes_to_support_IBA7220.patch does not apply (enforce with -f) > > Failed executing /usr/bin/quiltBuild failed in /tmp/build-ofed_kernel-d23175 See log file /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > > Should ipath patches be removed from the git tree (kernel_patches/fixes/ipath*)? > > Regards, > Vladimir No, I will take a look and fix things. Once 2.6.26 opens we can probably delete kernel_patches/fixes/ipath*. 
From clameter at sgi.com Wed Apr 16 11:35:38 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 16 Apr 2008 11:35:38 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080416163337.GJ22493@sgi.com> References: <20080416163337.GJ22493@sgi.com> Message-ID: On Wed, 16 Apr 2008, Robin Holt wrote: > I don't think this lock mechanism is completely working. I have > gotten a few failures trying to dereference 0x100100 which appears to > be LIST_POISON1. How does xpmem unregistering of notifiers work? From Jeffrey.C.Becker at nasa.gov Wed Apr 16 11:38:34 2008 From: Jeffrey.C.Becker at nasa.gov (Jeff Becker) Date: Wed, 16 Apr 2008 11:38:34 -0700 Subject: [ofa-general] OpenFabrics Sonoma presentations now available In-Reply-To: <20080416104729.6e203753.weiny2@llnl.gov> References: <89260B536D004F29B5FD9E10996DEF13@Gaucho> <20080416104729.6e203753.weiny2@llnl.gov> Message-ID: <480647AA.60905@nasa.gov> Hi Ira. Ira Weiny wrote: > On Wed, 16 Apr 2008 09:55:12 -0700 > "Jeffrey Scott" wrote: > > >> Presentations from the Sonoma Workshop are now available for download on the >> OpenFabrics website. >> >> >> >> http://www.openfabrics.org/archives/april2008sonoma.htm >> >> > > Are you still waiting for slides from some participants? I don't see slides > for Endance or "Experience with Ranger System". > Yes, that is one of the few presentations I'm still waiting for. 
-jeff > Thanks, > Ira > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Wed Apr 16 11:49:22 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 11:49:22 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: You have > +static struct mlx4_db_pgdir *mlx4_alloc_db_pgdir(struct device *dma_device) > +{ > + struct mlx4_db_pgdir *pgdir; > + > + pgdir = kzalloc(sizeof *pgdir, GFP_KERNEL); > + if (!pgdir) > + return NULL; > + > + bitmap_fill(pgdir->order1, MLX4_DB_PER_PAGE / 2); and so on... If you're going to move the doorbell stuff from mlx4_ib to mlx4_core, that's fine, but really move it: you should remove the code from mlx4_ib and use the stuff in mlx4_core rather than having the same stuff duplicated in two places. Especially since as this patch stands now, there are *no* users for the doorbell code in mlx4_core. - R. From rdreier at cisco.com Wed Apr 16 11:52:27 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 11:52:27 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: > + if (vector == 0) { > + vector = priv->eq_table.last_comp_eq % > + priv->eq_table.num_comp_eqs + 1; > + priv->eq_table.last_comp_eq = vector; > + } The current IB code is written assuming that 0 is a normal completion vector I think. Making 0 be a special "round robin" value is a pretty big change of policy. 
Also there is no locking of last_comp_eq that I can see here, although maybe it doesn't matter. > + req_eqs = (dev->flags & MLX4_FLAG_MSI_X) ? num_online_cpus() : 1; I don't think num_online_cpus() is the right thing really... what if a CPU is hot-plugged later? num_possible_cpus() seems better to me. - R. From rdreier at cisco.com Wed Apr 16 11:53:12 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 11:53:12 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: > - .num_mpt = 1 << 17, > + .num_mpt = 1 << 18, Why this change? From rdreier at cisco.com Wed Apr 16 11:53:43 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 11:53:43 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: > +static int mod_param_num_mac = 1; > +module_param_named(num_mac, mod_param_num_mac, int, 0444); Why prefix these with "mod_param_"? Seems to make things a little harder to read. 
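The round-robin completion-vector selection Roland questions above can be modeled in a few lines of userspace C. The sketch below is hypothetical and simplified — the struct and function names are invented for illustration, not taken from the patch — but it makes both review points concrete: `last % num + 1` cycles through EQs 1..num and can never yield 0 (so 0 silently stops being an ordinary completion vector), and the update of `last_comp_eq` is an unlocked read-modify-write.

```c
#include <assert.h>

/* Hypothetical, simplified model of the quoted round-robin policy.
 * EQs are numbered 1..num_comp_eqs; a requested vector of 0 means
 * "pick one for me" (the behavior change being discussed). */
struct eq_table {
	int num_comp_eqs;
	int last_comp_eq;
};

static int pick_comp_vector(struct eq_table *t, int requested)
{
	if (requested != 0)
		return requested;	/* caller asked for a specific EQ */

	/* Unlocked read-modify-write, as in the quoted patch; with
	 * concurrent callers two QPs could be handed the same EQ. */
	t->last_comp_eq = t->last_comp_eq % t->num_comp_eqs + 1;
	return t->last_comp_eq;
}
```

With four EQs the picks go 1, 2, 3, 4, then wrap back to 1; vector 0 itself is never assigned, which is why code that assumed 0 is a normal completion vector would change meaning under this policy.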
From rdreier at cisco.com Wed Apr 16 11:56:21 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 11:56:21 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: > +static int mod_param_if_eth = 1; > +module_param_named(if_eth, mod_param_if_eth, bool, 0444); > +MODULE_PARM_DESC(if_eth, "Enable ETH interface be loaded (0/1, default 1)"); > + > +static int mod_param_if_fc = 1; > +module_param_named(if_fc, mod_param_if_fc, bool, 0444); > +MODULE_PARM_DESC(if_fc, "Enable FC interface be loaded (0/1, default 1)"); I don't see any place where these values are checked. And I don't quite know why they would be necessary anyway. Why would someone want to set one of these to 0? Couldn't they get the same effect by just not loading the module in question? - R. From rdreier at cisco.com Wed Apr 16 12:00:33 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 12:00:33 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: > int mlx4_qp_to_ready(struct mlx4_dev *dev, > struct mlx4_mtt *mtt, > struct mlx4_qp_context *context, > struct mlx4_qp *qp, > enum mlx4_qp_state *qp_state) I don't see any callers of this function? 
> > +#define STATE_ARR_SIZE 4 > + int err = 0; > + int i; > + enum mlx4_qp_state states[STATE_ARR_SIZE] = { > + MLX4_QP_STATE_RST, > + MLX4_QP_STATE_INIT, > + MLX4_QP_STATE_RTR, > + MLX4_QP_STATE_RTS > + }; > + > + for (i = 0; i < STATE_ARR_SIZE - 1; i++) { I think it's more idiomatic to write this as: enum mlx4_qp_state states[] = { MLX4_QP_STATE_RST, MLX4_QP_STATE_INIT, MLX4_QP_STATE_RTR, MLX4_QP_STATE_RTS }; for (i = 0; i < ARRAY_SIZE(states) - 1; i++) { > + context->flags |= cpu_to_be32(states[i+1] << 28); Do you really want the |= here? INIT == 1, RTR == 2, so on the transition from INIT to RTR the value will be 1|2, ie 3. From holt at sgi.com Wed Apr 16 12:02:13 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 16 Apr 2008 14:02:13 -0500 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: References: <20080416163337.GJ22493@sgi.com> Message-ID: <20080416190213.GK22493@sgi.com> On Wed, Apr 16, 2008 at 11:35:38AM -0700, Christoph Lameter wrote: > On Wed, 16 Apr 2008, Robin Holt wrote: > > > I don't think this lock mechanism is completely working. I have > > gotten a few failures trying to dereference 0x100100 which appears to > > be LIST_POISON1. > > How does xpmem unregistering of notifiers work? For the tests I have been running, we are waiting for the release callout as part of exit. Thanks, Robin From clameter at sgi.com Wed Apr 16 12:15:08 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 16 Apr 2008 12:15:08 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080416190213.GK22493@sgi.com> References: <20080416163337.GJ22493@sgi.com> <20080416190213.GK22493@sgi.com> Message-ID: On Wed, 16 Apr 2008, Robin Holt wrote: > On Wed, Apr 16, 2008 at 11:35:38AM -0700, Christoph Lameter wrote: > > On Wed, 16 Apr 2008, Robin Holt wrote: > > > > > I don't think this lock mechanism is completely working. I have > > > gotten a few failures trying to dereference 0x100100 which appears to > > > be LIST_POISON1. > > > > How does xpmem unregistering of notifiers work? > > For the tests I have been running, we are waiting for the release > callout as part of exit. Some more details on the failure may be useful. AFAICT list_del[_rcu] is the culprit here and that is only used on release or unregister. From rdreier at cisco.com Wed Apr 16 12:34:41 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 12:34:41 -0700 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: <4805B1C6.80004@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 10:59:02 +0300") References: <4805B1C6.80004@mellanox.co.il> Message-ID: > +int mlx4_fmr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, > + u32 pd, u32 access, int max_pages, > + int max_maps, u8 page_shift, struct mlx4_fmr *fmr) So reading this over in more detail, I now really think it has to be split up. There are too many new things added without any users for it to be possible to review. - r. 
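The `|=` question Roland raises above about the QP state-transition loop comes down to bit arithmetic in the 4-bit next-state field. A minimal userspace sketch — helper names are invented for illustration; only the encodings (INIT == 1, RTR == 2) and the bits-31:28 field position come from the mail — shows why OR-ing each new state in cannot work:

```c
#include <assert.h>
#include <stdint.h>

/* Next-state field lives in bits 31:28 of the context flags word. */
enum { QP_STATE_RST = 0, QP_STATE_INIT = 1, QP_STATE_RTR = 2, QP_STATE_RTS = 3 };

/* The pattern under review: OR each new state into the flags. */
static uint32_t set_state_or(uint32_t flags, uint32_t state)
{
	return flags | (state << 28);
}

/* What is presumably intended: clear the field before setting it. */
static uint32_t set_state_masked(uint32_t flags, uint32_t state)
{
	return (flags & ~(0xfu << 28)) | (state << 28);
}
```

Applying INIT and then RTR with the OR version leaves 1|2 == 3 in the field — which reads as RTS, not RTR — exactly the failure mode described in the review.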
From yevgenyp at mellanox.co.il Wed Apr 16 13:15:53 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 16 Apr 2008 23:15:53 +0300 Subject: [ofa-general][PATCH] mlx4_ib: Multi Protocol support In-Reply-To: References: <4805B359.2070906@mellanox.co.il> Message-ID: <6C2C79E72C305246B504CBA17B5500C903CE58A4@mtlexch01.mtl.com> The mlx4_core driver doesn't allow the configuration you described. The mlx4_ib module can always assume that if it has only one IB port, It would always be port number 1. Yevgeny Petrilin Mellanox Technologies phone: +972-4-9097200 (ext. 7677) cell: +972-54-7839222 mailto: yevgenyp at mellanox.co.il -----Original Message----- From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Wednesday, April 16, 2008 6:43 PM To: Yevgeny Petrilin Cc: general at lists.openfabrics.org Subject: Re: [ofa-general][PATCH] mlx4_ib: Multi Protocol support > Main changes to mlx4_ib: > 1. Mlx4_ib driver queries the low level driver for number of IB ports. > 2. Qps are being reserved prior to being allocated. > 3. Cq allocation API change. As I said before, these mlx4_ib changes should be rolled into the mlx4_core patches that change these interfaces. Also, I don't understand exactly how you're handling which ports are IB and which aren't. Have you tested this code in the case where port 1 is non-IB and port 2 is IB? It seems that you have a bitmap of which ports are IB: > + foreach_port(i, ibdev->ports_map) > + ibdev->num_ports++; (By the way, foreach_port() is too generic a name to expose, since it could easily collide with some general API -- I would use mlx4_foreach_port() instead) But then you do stuff like: > - for (p = 1; p <= dev->caps.num_ports; ++p) > + for (p = 1; p <= ibdev->num_ports; ++p) > mlx4_CLOSE_PORT(dev, p); which doesn't seem to work if you only have one IB port but it isn't port 1. 
I think there are two sane ways to handle non-IB ports in mlx4_ib: - Have mlx4_ib report the number of IB ports as phys_port_cnt and have an indirection table that maps from IB port # to physical HCA port # (to handle the case where only port 2 is IB, so you need to map IB port 1 to HCA physical port 2). This leads to some confusion with the real-world labels on ports I guess, and also I guess you need some SMA trickery to report the right port # to the SM. - Report the number of physical HCA ports as phys_port_cnt and just have non-IB ports always say they're DOWN. This makes changing config on the fly easier, since a port going from DOWN to INIT is a pretty normal thing. I guess there is a little bit of hackery involved in handling requests to mlx4_ib that involve non-IB ports. However your changes seem to take a third way and I don't understand how it can work. Perhaps you can clarify? - R. From rdreier at cisco.com Wed Apr 16 13:24:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 13:24:23 -0700 Subject: [ofa-general][PATCH] mlx4_ib: Multi Protocol support In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903CE58A4@mtlexch01.mtl.com> (Yevgeny Petrilin's message of "Wed, 16 Apr 2008 23:15:53 +0300") References: <4805B359.2070906@mellanox.co.il> <6C2C79E72C305246B504CBA17B5500C903CE58A4@mtlexch01.mtl.com> Message-ID: > The mlx4_core driver doesn't allow the configuration you described. > The mlx4_ib module can always assume that if it has only one IB port, > It would always be port number 1. Hmm... why this limitation? - R. From rdreier at cisco.com Wed Apr 16 14:05:16 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 14:05:16 -0700 Subject: [ofa-general] Pending libibverbs patches? 
In-Reply-To: <4805AB90.6060702@voltaire.com> (Or Gerlitz's message of "Wed, 16 Apr 2008 10:32:32 +0300") References: <48045BF3.8040305@voltaire.com> <4805AB90.6060702@voltaire.com> Message-ID: > If the section stating the different functions seems not useful it can > be removed, I will be happy to hear what other people think; anyway, > this section is not what this man page is focusing on. I agree that more > has to be said on issues such as IB/iWARP differences, thread-safety, > fork, etc, so in case you prefer to see this "more" coming out before > merging anything, let it be, but please note that it's really hard > for newcomers to start programming IB/iWARP without any man page > that gives some general notion of what libibverbs is. In that > respect, maybe you can merge the first portion of the page without the > function listing, and later we can add more info on the various > issues? OK, if you can send a verbs.7 page that includes what you see as the critical information then I can look at including it. By the way, I went through libibverbs and tried to make everything transport agnostic rather than talking only about InfiniBand. How does this diff look to people? diff --git a/README b/README index 0b1b114..848eb05 100644 --- a/README +++ b/README @@ -1,10 +1,11 @@ Introduction ============ -libibverbs is a library that allows programs to use InfiniBand "verbs" -for direct access to IB hardware from userspace. For more information -on verbs, see the InfiniBand Architecture Specification vol. 1, -especially chapter 11. +libibverbs is a library that allows programs to use RDMA "verbs" for +direct access to RDMA (currently InfiniBand and iWARP) hardware from +userspace. For more information on RDMA verbs, see the InfiniBand +Architecture Specification vol. 1, especially chapter 11, and the RDMA +Consortium's RDMA Protocol Verbs Specification. Using libibverbs ================ @@ -28,9 +29,9 @@ can be used. 
This will create device nodes named /dev/infiniband/uverbs0 -and so on. Since the InfiniBand userspace verbs should be safe for -use by non-privileged, you may want to add an appropriate MODE or -GROUP to your udev rule. +and so on. Since the RDMA userspace verbs should be safe for use by +non-privileged users, you may want to add an appropriate MODE or GROUP +to your udev rule. Permissions ----------- @@ -102,7 +103,7 @@ Bugs should be reported to the OpenFabrics mailing list * Information about your system: - Linux distribution and version - Linux kernel and version - - InfiniBand hardware and firmware version + - InfiniBand/iWARP hardware and firmware version - ... any other relevant information * How to reproduce the bug. Command line arguments for a libibverbs diff --git a/debian/changelog b/debian/changelog index 24582f0..982760d 100644 --- a/debian/changelog +++ b/debian/changelog @@ -4,9 +4,11 @@ libibverbs (1.1.1-2) unstable; urgency=low * Use DEB_DH_MAKESHLIBS_ARGS_ALL to pass appropriate -V option to dh_makeshlibs, since new symbols were added in libibverbs 1.1.0. (Closes: #465435) - * Add debian/watch file + * Add debian/watch file. + * Update control file to talk about generic RDMA and iWARP, not just + InfiniBand, since libibverbs works with both IB and iWARP. - -- Roland Dreier Wed, 12 Mar 2008 10:39:38 -0700 + -- Roland Dreier Wed, 16 Apr 2008 14:01:58 -0700 libibverbs (1.1.1-1) unstable; urgency=low diff --git a/debian/control.in b/debian/control.in index 62299fd..7cf933a 100644 --- a/debian/control.in +++ b/debian/control.in @@ -10,13 +10,14 @@ Package: libibverbs1 Section: libs Architecture: any Depends: ${shlibs:Depends}, ${misc:Depends}, adduser -Description: A library for direct userspace use of InfiniBand - libibverbs is a library that allows userspace processes to use - InfiniBand "verbs" as described in the InfiniBand Architecture - Specification. InfiniBand is a high-throughput, low-latency - networking technology. 
InfiniBand host channel adapters (HCAs) - commonly support direct hardware access from userspace (kernel - bypass), and libibverbs supports this when available. +Description: A library for direct userspace use of RDMA (InfiniBand/iWARP) + libibverbs is a library that allows userspace processes to use RDMA + "verbs" as described in the InfiniBand Architecture Specification and + the RDMA Protocol Verbs Specification. iWARP NICs support RDMA over + ethernet, while InfiniBand is a high-throughput, low-latency + networking technology. InfiniBand host channel adapters (HCAs) and + iWARP NICs commonly support direct hardware access from userspace + (kernel bypass), and libibverbs supports this when available. . For this library to be useful, a device-specific plug-in module should also be installed. @@ -28,12 +29,13 @@ Section: libdevel Architecture: any Depends: ${misc:Depends}, libibverbs1 (= ${binary:Version}) Description: Development files for the libibverbs library - libibverbs is a library that allows userspace processes to use - InfiniBand "verbs" as described in the InfiniBand Architecture - Specification. InfiniBand is a high-throughput, low-latency - networking technology. InfiniBand host channel adapters (HCAs) - commonly support direct hardware access from userspace (kernel - bypass), and libibverbs supports this when available. + libibverbs is a library that allows userspace processes to use RDMA + "verbs" as described in the InfiniBand Architecture Specification and + the RDMA Protocol Verbs Specification. iWARP NICs support RDMA over + ethernet, while InfiniBand is a high-throughput, low-latency + networking technology. InfiniBand host channel adapters (HCAs) and + iWARP NICs commonly support direct hardware access from userspace + (kernel bypass), and libibverbs supports this when available. . This package is needed to compile programs against libibverbs1. 
It contains the header files and static libraries (optionally) @@ -45,12 +47,13 @@ Priority: extra Architecture: any Depends: ${misc:Depends}, libibverbs1 (= ${binary:Version}) Description: Debugging symbols for the libibverbs library - libibverbs is a library that allows userspace processes to use - InfiniBand "verbs" as described in the InfiniBand Architecture - Specification. InfiniBand is a high-throughput, low-latency - networking technology. InfiniBand host channel adapters (HCAs) - commonly support direct hardware access from userspace (kernel - bypass), and libibverbs supports this when available. + libibverbs is a library that allows userspace processes to use RDMA + "verbs" as described in the InfiniBand Architecture Specification and + the RDMA Protocol Verbs Specification. iWARP NICs support RDMA over + ethernet, while InfiniBand is a high-throughput, low-latency + networking technology. InfiniBand host channel adapters (HCAs) and + iWARP NICs commonly support direct hardware access from userspace + (kernel bypass), and libibverbs supports this when available. . This package contains the debugging symbols associated with libibverbs1. They will automatically be used by gdb for debugging @@ -61,12 +64,13 @@ Section: net Architecture: any Depends: ${shlibs:Depends}, ${misc:Depends} Description: Examples for the libibverbs library - libibverbs is a library that allows userspace processes to use - InfiniBand "verbs" as described in the InfiniBand Architecture - Specification. InfiniBand is a high-throughput, low-latency - networking technology. InfiniBand host channel adapters (HCAs) - commonly support direct hardware access from userspace (kernel - bypass), and libibverbs supports this when available. + libibverbs is a library that allows userspace processes to use RDMA + "verbs" as described in the InfiniBand Architecture Specification and + the RDMA Protocol Verbs Specification. 
iWARP NICs support RDMA over + ethernet, while InfiniBand is a high-throughput, low-latency + networking technology. InfiniBand host channel adapters (HCAs) and + iWARP NICs commonly support direct hardware access from userspace + (kernel bypass), and libibverbs supports this when available. . This package contains useful libibverbs1 example programs such as ibv_devinfo, which displays information about InfiniBand devices. diff --git a/libibverbs.spec.in b/libibverbs.spec.in index ad57c61..f092b68 100644 --- a/libibverbs.spec.in +++ b/libibverbs.spec.in @@ -1,7 +1,7 @@ Name: libibverbs Version: 1.1.1 Release: 1%{?dist} -Summary: A library for direct userspace use of InfiniBand hardware +Summary: A library for direct userspace use of RDMA (InfiniBand/iWARP) hardware Group: System Environment/Libraries License: GPLv2 or BSD @@ -12,10 +12,10 @@ Requires(post): /sbin/ldconfig Requires(postun): /sbin/ldconfig %description -libibverbs is a library that allows userspace processes to use -InfiniBand "verbs" as described in the InfiniBand Architecture -Specification. This includes direct hardware access for fast path -operations. +libibverbs is a library that allows userspace processes to use RDMA +"verbs" as described in the InfiniBand Architecture Specification and +the RDMA Protocol Verbs Specification. This includes direct hardware +access for fast path operations. For this library to be useful, a device-specific plug-in module should also be installed. @@ -41,7 +41,7 @@ Requires: %{name} = %{version}-%{release} %description utils Useful libibverbs1 example programs such as ibv_devinfo, which -displays information about InfiniBand devices. +displays information about RDMA devices. 
%prep %setup -q -n %{name}- at VERSION@ diff --git a/man/ibv_alloc_pd.3 b/man/ibv_alloc_pd.3 index 017ab32..28b7953 100644 --- a/man/ibv_alloc_pd.3 +++ b/man/ibv_alloc_pd.3 @@ -13,7 +13,7 @@ ibv_alloc_pd, ibv_dealloc_pd \- allocate or deallocate a protection domain (PDs) .fi .SH "DESCRIPTION" .B ibv_alloc_pd() -allocates a PD for the InfiniBand device context +allocates a PD for the RDMA device context .I context\fR. .PP .B ibv_dealloc_pd() @@ -27,8 +27,8 @@ returns a pointer to the allocated PD, or NULL if the request fails. returns 0 on success, or the value of errno on failure (which indicates the failure reason). .SH "NOTES" .B ibv_dealloc_pd() -may fail if any other InfiniBand resource is still associated with the -PD being freed. +may fail if any other resource is still associated with the PD being +freed. .SH "SEE ALSO" .BR ibv_reg_mr (3), .BR ibv_create_srq (3), diff --git a/man/ibv_asyncwatch.1 b/man/ibv_asyncwatch.1 index aed316d..ece25f8 100644 --- a/man/ibv_asyncwatch.1 +++ b/man/ibv_asyncwatch.1 @@ -8,7 +8,7 @@ ibv_asyncwatch \- display asynchronous events .SH DESCRIPTION .PP -Display asynchronous events forwarded to userspace for an InfiniBand device. +Display asynchronous events forwarded to userspace for an RDMA device. 
.SH AUTHORS .TP diff --git a/man/ibv_create_ah_from_wc.3 b/man/ibv_create_ah_from_wc.3 index 487f053..bc5d135 100644 --- a/man/ibv_create_ah_from_wc.3 +++ b/man/ibv_create_ah_from_wc.3 @@ -21,7 +21,7 @@ address handle (AH) from a work completion .B ibv_init_ah_from_wc() initializes the address handle (AH) attribute structure .I ah_attr -for the InfiniBand device context +for the RDMA device context .I context using the port number .I port_num\fR, diff --git a/man/ibv_create_comp_channel.3 b/man/ibv_create_comp_channel.3 index e0e1e68..d8e17f1 100644 --- a/man/ibv_create_comp_channel.3 +++ b/man/ibv_create_comp_channel.3 @@ -15,7 +15,7 @@ destroy a completion event channel .fi .SH "DESCRIPTION" .B ibv_create_comp_channel() -creates a completion event channel for the InfiniBand device context +creates a completion event channel for the RDMA device context .I context\fR. .PP .B ibv_destroy_comp_channel() @@ -29,13 +29,14 @@ returns a pointer to the created completion event channel, or NULL if the reques returns 0 on success, or the value of errno on failure (which indicates the failure reason). .SH "NOTES" A "completion channel" is an abstraction introduced by libibverbs that -does not exist in the InfiniBand Architecture verbs specification. A -completion channel is essentially file descriptor that is used to -deliver completion notifications to a userspace process. When a -completion event is generated for a completion queue (CQ), the event -is delivered via the completion channel attached to that CQ. This may -be useful to steer completion events to different threads by using -multiple completion channels. +does not exist in the InfiniBand Architecture verbs specification or +RDMA Protocol Verbs Specification. A completion channel is +essentially file descriptor that is used to deliver completion +notifications to a userspace process. 
When a completion event is +generated for a completion queue (CQ), the event is delivered via the +completion channel attached to that CQ. This may be useful to steer +completion events to different threads by using multiple completion +channels. .PP .B ibv_destroy_comp_channel() fails if any CQs are still associated with the completion event diff --git a/man/ibv_create_cq.3 b/man/ibv_create_cq.3 index bb256d5..211feea 100644 --- a/man/ibv_create_cq.3 +++ b/man/ibv_create_cq.3 @@ -18,7 +18,7 @@ ibv_create_cq, ibv_destroy_cq \- create or destroy a completion queue (CQ) .B ibv_create_cq() creates a completion queue (CQ) with at least .I cqe -entries for the InfiniBand device context +entries for the RDMA device context .I context\fR. The pointer .I cq_context diff --git a/man/ibv_devices.1 b/man/ibv_devices.1 index 084d01a..99b27e5 100644 --- a/man/ibv_devices.1 +++ b/man/ibv_devices.1 @@ -1,14 +1,14 @@ .TH IBV_DEVICES 1 "August 30, 2005" "libibverbs" "USER COMMANDS" .SH NAME -ibv_devices \- list InfiniBand devices +ibv_devices \- list RDMA devices .SH SYNOPSIS .B ibv_devices .SH DESCRIPTION .PP -List InfiniBand devices available for use from userspace. +List RDMA devices available for use from userspace. .SH SEE ALSO .BR ibv_devinfo (1) diff --git a/man/ibv_devinfo.1 b/man/ibv_devinfo.1 index 5656e14..41878b2 100644 --- a/man/ibv_devinfo.1 +++ b/man/ibv_devinfo.1 @@ -1,7 +1,7 @@ .TH IBV_DEVINFO 1 "August 30, 2005" "libibverbs" "USER COMMANDS" .SH NAME -ibv_devinfo \- query InfiniBand devices +ibv_devinfo \- query RDMA devices .SH SYNOPSIS .B ibv_devinfo @@ -9,7 +9,7 @@ ibv_devinfo \- query InfiniBand devices .SH DESCRIPTION .PP -Print information about InfiniBand devices available for use from userspace. +Print information about RDMA devices available for use from userspace. 
.SH OPTIONS @@ -22,10 +22,10 @@ use IB device \fIDEVICE\fR (default first device found) query port \fIPORT\fR (default all ports) \fB\-l\fR, \fB\-\-list\fR -only list names of InfiniBand devices +only list names of RDMA devices \fB\-v\fR, \fB\-\-verbose\fR -print all available information about InfiniBand devices +print all available information about RDMA devices .SH SEE ALSO .BR ibv_devices (1) diff --git a/man/ibv_get_async_event.3 b/man/ibv_get_async_event.3 index 77e8be8..076f757 100644 --- a/man/ibv_get_async_event.3 +++ b/man/ibv_get_async_event.3 @@ -14,7 +14,7 @@ ibv_get_async_event, ibv_ack_async_event \- get or acknowledge asynchronous even .fi .SH "DESCRIPTION" .B ibv_get_async_event() -waits for the next async event of the InfiniBand device context +waits for the next async event of the RDMA device context .I context and returns it through the pointer .I event\fR, diff --git a/man/ibv_get_device_guid.3 b/man/ibv_get_device_guid.3 index 03f444a..98c0499 100644 --- a/man/ibv_get_device_guid.3 +++ b/man/ibv_get_device_guid.3 @@ -2,7 +2,7 @@ .\" .TH IBV_GET_DEVICE_GUID 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" .SH "NAME" -ibv_get_device_guid \- get an InfiniBand device's GUID +ibv_get_device_guid \- get an RDMA device's GUID .SH "SYNOPSIS" .nf .B #include @@ -11,7 +11,7 @@ ibv_get_device_guid \- get an InfiniBand device's GUID .fi .SH "DESCRIPTION" .B ibv_get_device_name() -returns the Global Unique IDentifier (GUID) of the InfiniBand device +returns the Global Unique IDentifier (GUID) of the RDMA device .I device\fR. 
.SH "RETURN VALUE" .B ibv_get_device_guid() diff --git a/man/ibv_get_device_list.3 b/man/ibv_get_device_list.3 index 4dd8180..104c137 100644 --- a/man/ibv_get_device_list.3 +++ b/man/ibv_get_device_list.3 @@ -2,7 +2,7 @@ .\" .TH IBV_GET_DEVICE_LIST 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" .SH "NAME" -ibv_get_device_list, ibv_free_device_list \- get and release list of available InfiniBand devices +ibv_get_device_list, ibv_free_device_list \- get and release list of available RDMA devices .SH "SYNOPSIS" .nf .B #include @@ -13,7 +13,7 @@ ibv_get_device_list, ibv_free_device_list \- get and release list of available I .fi .SH "DESCRIPTION" .B ibv_get_device_list() -returns a NULL-terminated array of InfiniBand devices currently available. +returns a NULL-terminated array of RDMA devices currently available. The argument .I num_devices is optional; if not NULL, it is set to the number of devices returned in the array. @@ -25,7 +25,7 @@ returned by .B ibv_get_device_list()\fR. .SH "RETURN VALUE" .B ibv_get_device_list() -returns the array of available InfiniBand devices, or NULL if the request fails. +returns the array of available RDMA devices, or NULL if the request fails. .PP .B ibv_free_device_list() returns no value. diff --git a/man/ibv_get_device_name.3 b/man/ibv_get_device_name.3 index c53f97d..284ea9f 100644 --- a/man/ibv_get_device_name.3 +++ b/man/ibv_get_device_name.3 @@ -2,7 +2,7 @@ .\" .TH IBV_GET_DEVICE_NAME 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" .SH "NAME" -ibv_get_device_name \- get an InfiniBand device's name +ibv_get_device_name \- get an RDMA device's name .SH "SYNOPSIS" .nf .B #include @@ -11,7 +11,7 @@ ibv_get_device_name \- get an InfiniBand device's name .fi .SH "DESCRIPTION" .B ibv_get_device_name() -returns a human-readable name associated with the InfiniBand device +returns a human-readable name associated with the RDMA device .I device\fR. 
.SH "RETURN VALUE" .B ibv_get_device_name() diff --git a/man/ibv_open_device.3 b/man/ibv_open_device.3 index 1858a42..61fa82b 100644 --- a/man/ibv_open_device.3 +++ b/man/ibv_open_device.3 @@ -2,7 +2,7 @@ .\" .TH IBV_OPEN_DEVICE 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" .SH "NAME" -ibv_open_device, ibv_close_device \- open and close an InfiniBand device context +ibv_open_device, ibv_close_device \- open and close an RDMA device context .SH "SYNOPSIS" .nf .B #include diff --git a/man/ibv_query_device.3 b/man/ibv_query_device.3 index f327769..3bf7511 100644 --- a/man/ibv_query_device.3 +++ b/man/ibv_query_device.3 @@ -2,7 +2,7 @@ .\" .TH IBV_QUERY_DEVICE 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" .SH "NAME" -ibv_query_device \- query an InfiniBand device's attributes +ibv_query_device \- query an RDMA device's attributes .SH "SYNOPSIS" .nf .B #include diff --git a/man/ibv_query_port.3 b/man/ibv_query_port.3 index fd61eb9..c6b3b63 100644 --- a/man/ibv_query_port.3 +++ b/man/ibv_query_port.3 @@ -2,7 +2,7 @@ .\" .TH IBV_QUERY_PORT 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" .SH "NAME" -ibv_query_port \- query an InfiniBand port's attributes +ibv_query_port \- query an RDMA port's attributes .SH "SYNOPSIS" .nf .B #include diff --git a/man/ibv_query_qp.3 b/man/ibv_query_qp.3 index fd1f41d..8da270e 100644 --- a/man/ibv_query_qp.3 +++ b/man/ibv_query_qp.3 @@ -68,7 +68,7 @@ returns 0 on success, or the value of errno on failure (which indicates the fail The argument .I attr_mask is a hint that specifies the minimum list of attributes to retrieve. -Some InfiniBand devices may return extra attributes not requested, for +Some RDMA devices may return extra attributes not requested, for example if the value can be returned cheaply. 
.PP Attribute values are valid if they have been set using From sean.hefty at intel.com Wed Apr 16 15:05:37 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 16 Apr 2008 15:05:37 -0700 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: References: <48045BF3.8040305@voltaire.com> <4805AB90.6060702@voltaire.com> Message-ID: <000201c8a00e$00f32460$7de0180a@amr.corp.intel.com> >+ the RDMA Protocol Verbs Specification. iWARP NICs support RDMA over >+ ethernet, while InfiniBand is a high-throughput, low-latency I'm not convinced this is really better for a high-level readme, but would saying "iWarp Ethernet NICs support RDMA over TCP" be clearer? I'm thinking about ConnectX, which provides RDMA over Ethernet, but doesn't use the iWarp protocol. And iWarp RDMA should be usable even if crossing non-Ethernet subnets. - Sean From eddiem at sgi.com Wed Apr 16 15:27:14 2008 From: eddiem at sgi.com (Edward Mascarenhas) Date: Wed, 16 Apr 2008 15:27:14 -0700 Subject: [ofa-general] Re: [ewg] OFED March 24 meeting summary on OFED 1.4 plans In-Reply-To: References: <6C2C79E72C305246B504CBA17B5500C90282E5BB@mtlexch01.mtl.com> <47F4E0C3.2030100@voltaire.com> <1207233121.29024.410.camel@hrosenstock-ws.xsigo.com> <15ddcffd0804032117o21e6d62br9def3e46d4d513c4@mail.gmail.com> Message-ID: <48067D42.3020303@sgi.com> The SGI Altix ICE cluster system supports 2 InfiniBand fabrics. http://www.sgi.com/products/servers/altix/ice/ Each compute node has 2 HCAs and each is connected to a separate fabric. We recommend that users use one fabric for storage traffic and the other for MPI, but there is no reason why both fabrics could not be used for MPI. OpenMPI requires setting a separate subnet prefix for each fabric to use both fabrics for MPI and OpenSM supports this setting of subnet prefix. Other MPIs do not require this. Edward on 04/04/2008 08:08 AM Tang, Changqing said the following: > What I mean "claim to support" is to have more people to test with this config. 
> > --CQ > >> -----Original Message----- >> From: Or Gerlitz [mailto:or.gerlitz at gmail.com] >> Sent: Thursday, April 03, 2008 11:18 PM >> To: Tang, Changqing >> Cc: general at lists.openfabrics.org; ewg at lists.openfabrics.org >> Subject: Re: [ofa-general] Re: [ewg] OFED March 24 meeting >> summary on OFED 1.4 plans >> >> On Thu, Apr 3, 2008 at 5:40 PM, Tang, Changqing >> wrote: >> >>> The problem is, from the MPI side (and by default), we don't >> know which >>> port is on which fabric, since the subnet prefix is the >> same. We rely >>> on the system admin to configure two different subnet prefixes >> for HP-MPI to work. >>> No vendor has claimed to support this. >> CQ, not supporting a different subnet prefix per IB subnet is >> against IB nature; I don't think there should be any problem >> configuring a different prefix at each OpenSM instance, and >> the Linux host stack would work perfectly under this config. >> If you are aware of any problem in opensm and/or the >> host stack, please let the community know and the maintainers >> will fix it. >> >> Or. >> > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From rdreier at cisco.com Wed Apr 16 15:54:13 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 15:54:13 -0700 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: <000201c8a00e$00f32460$7de0180a@amr.corp.intel.com> (Sean Hefty's message of "Wed, 16 Apr 2008 15:05:37 -0700") References: <48045BF3.8040305@voltaire.com> <4805AB90.6060702@voltaire.com> <000201c8a00e$00f32460$7de0180a@amr.corp.intel.com> Message-ID: > I'm not convinced this is really better for a high-level readme, but would > saying "iWarp Ethernet NICs support RDMA over TCP" be clearer? I'm thinking > about ConnectX, which provides RDMA over Ethernet, but doesn't use the iWarp > protocol. 
And iWarp RDMA should be usable even if crossing non-Ethernet > subnets. It's a good point... I wasn't sure how to phrase things in the best way. All current iWARP NICs do TCP, but there is an IETF RFC for iWARP over SCTP too. So one could say "RDMA over IP" but then again the ConnectX ethernet thing is really RDMA over IP too... - R. From rdreier at cisco.com Wed Apr 16 21:26:32 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 16 Apr 2008 21:26:32 -0700 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: <000201c8a00e$00f32460$7de0180a@amr.corp.intel.com> (Sean Hefty's message of "Wed, 16 Apr 2008 15:05:37 -0700") References: <48045BF3.8040305@voltaire.com> <4805AB90.6060702@voltaire.com> <000201c8a00e$00f32460$7de0180a@amr.corp.intel.com> Message-ID: How about "iWARP ethernet NICs support RDMA over hardware-offloaded TCP"? From dorfman.eli at gmail.com Thu Apr 17 04:13:01 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Thu, 17 Apr 2008 14:13:01 +0300 Subject: [ofa-general] Re: [Ips] Calculating the VA in iSER header In-Reply-To: References: <4804B03C.6060507@voltaire.com> <694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com> <20080416144830.GC23861@osc.edu> Message-ID: <694d48600804170413g4d54cd9g447abd345a1f6301@mail.gmail.com> On Wed, Apr 16, 2008 at 6:46 PM, Roland Dreier wrote: > > Agree with the interpretation of the spec, and it's probably a bit > > clearer that way too. But we have working initiators and targets > > that do it the "wrong" way. > > Yes... I guess the key question is whether there are any initiators that > do things the "right" way. > > > > 1. Flag day: all initiators and targets change at the same time. > > Will see data corruption if someone unluckily runs one or the other > > using old non-fixed code. > > Seems unacceptable to me... it doesn't make sense at all to break every > setup in the world just to be "right" according to the spec. 
This will break only when both initiator and target use InitialR2T=No, which means unsolicited data is allowed. As far as I know, STGT is not very common (and its version in RHEL5.1 is considered experimental). Its default is also InitialR2T=Yes. Voltaire's iSCSI over iSER target also uses the default InitialR2T=Yes. So it seems that nothing will break. > > > > 2. Rewrite the IB Annex to codify what's done in practice, and don't > > "fix" any code. > > If existing practice is universally to do things "wrong" then this seems > to me by far the best way to proceed. Assuming there aren't many iSER installations that currently work with unsolicited data, then it is the right time to do it right. Future implementations will rely on the spec, and unless you modify the spec this will lead to greater confusion. From holt at sgi.com Thu Apr 17 04:14:04 2008 From: holt at sgi.com (Robin Holt) Date: Thu, 17 Apr 2008 06:14:04 -0500 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: References: <20080416163337.GJ22493@sgi.com> <20080416190213.GK22493@sgi.com> Message-ID: <20080417111404.GL22493@sgi.com> On Wed, Apr 16, 2008 at 12:15:08PM -0700, Christoph Lameter wrote: > On Wed, 16 Apr 2008, Robin Holt wrote: > > > On Wed, Apr 16, 2008 at 11:35:38AM -0700, Christoph Lameter wrote: > > > On Wed, 16 Apr 2008, Robin Holt wrote: > > > > > > > I don't think this lock mechanism is completely working. I have > > > > gotten a few failures trying to dereference 0x100100 which appears to > > > > be LIST_POISON1. > > > > > > How does xpmem unregistering of notifiers work? > > > > For the tests I have been running, we are waiting for the release > > callout as part of exit. > > Some more details on the failure may be useful. AFAICT list_del[_rcu] is > the culprit here and that is only used on release or unregister. I think I have this understood now. 
It happens quite quickly (within 10 minutes) on a 128 rank job of small data set in a loop. In these failing jobs, all the ranks are nearly symmetric. There is a certain part of each ranks address space that has access granted. All the ranks have included all the other ranks including themselves in exactly the same layout at exactly the same virtual address. Rank 3 has hit _release and is beginning to clean up, but has not deleted the notifier from its list. Rank 9 calls the xpmem_invalidate_page() callout. That page was attached by rank 3 so we call zap_page_range on rank 3 which then calls back into xpmem's invalidate_range_start callout. The rank 3 _release callout begins and deletes its notifier from the list. Rank 9's call to rank 3's zap_page_range notifier returns and dereferences LIST_POISON1. I often confuse myself while trying to explain these so please kick me where the holes in the flow appear. The console output from the simple debugging stuff I put in is a bit overwhelming. I am trying to figure out now which locks we hold as part of the zap callout that should have prevented the _release callout. Thanks, Robin From liranl at mellanox.co.il Thu Apr 17 05:36:44 2008 From: liranl at mellanox.co.il (Liran Liss) Date: Thu, 17 Apr 2008 15:36:44 +0300 Subject: [ofa-general][PATCH] mlx4_ib: Multi Protocol support In-Reply-To: Message-ID: <40FA0A8088E8A441973D37502F00933E39FD@mtlexch01.mtl.com> > > I think there are two sane ways to handle non-IB ports in mlx4_ib: > > - Have mlx4_ib report the number of IB ports as phys_port_cnt and have > an indirection table that maps from IB port # to physical HCA port # > (to handle the case where only port 2 is IB, so you need to map IB > port 1 to HCA physical port 2). This leads to some confusion with > the real-world labels on ports I guess, and also I guess you need > some SMA trickery to report the right port # to the SM. 
> > - Report the number of physical HCA ports as phys_port_cnt and just > have non-IB ports always say they're DOWN. This makes changing > config on the fly easier, since a port going from DOWN to INIT is a > pretty normal thing. I guess there is a little bit of hackery > involved in handling requests to mlx4_ib that involve non-IB ports. > > However your changes seem to take a third way and I don't understand how > it can work. Perhaps you can clarify? > > - R. We intend to handle non-IB ports (Ethernet) just like IB ports, where all IB traffic that passes through Ethernet ports is IBoE. So basically, we will register ConnectX as a dual-ported HCA for all configurations. Many ULPs would run transparently on IB or IBoE, depending on the port type. In addition, port numbers always remain true to their physical ports. Until the IBoE implementation is completed, we temporarily disallow the configuration in which port 1 is eth and port 2 is ib. This allows us to register ConnectX as a single-port HCA with the ib core when port 2 is eth, without the aforementioned (and temporary) hacks. --Liran From liranl at mellanox.co.il Thu Apr 17 05:59:48 2008 From: liranl at mellanox.co.il (Liran Liss) Date: Thu, 17 Apr 2008 15:59:48 +0300 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: Message-ID: <40FA0A8088E8A441973D37502F00933E39FE@mtlexch01.mtl.com> > > + if (vector == 0) { > > + vector = priv->eq_table.last_comp_eq % > > + priv->eq_table.num_comp_eqs + 1; > > + priv->eq_table.last_comp_eq = vector; > > + } > > The current IB code is written assuming that 0 is a normal completion > vector I think. Making 0 be a special "round robin" value is a pretty > big change of policy. > This is a change in policy that was unknown and not configured anywhere... Generally, distributing the interrupt load (and the software interrupt handling associated with it) among all CPUs is a good thing, especially when the ULPs using these interrupts are unrelated. 
For example, distributing TCP flows among multiple cores is important for 10GE devices to sustain wire-speed with lots of connections. So, for applications that don't care how many vectors there are or which vector they use, we should support some VECTOR_ANY value that enables mlx4_core to optimize and load balance the interrupt load. A round-robin scheme seems like a good start. We could also initially make the VECTOR_ANY policy a module parameter (i.e., use either CPU0 or round-robin) until we obtain more experience with actual deployments. As for the VECTOR_ANY value, we can make it 0 (good for "porting" all existing ULPs and user-apps but doesn't match the CPU numbering, which is zero based) or some other designated value, e.g., 0xff (will require modifying all ULPs that don't use specific vectors). Any preferences? --Liran From yevgenyp at mellanox.co.il Thu Apr 17 06:03:30 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Thu, 17 Apr 2008 16:03:30 +0300 Subject: [ofa-general][PATCH] mlx4_core: Multi Protocol support In-Reply-To: References: <4805B1C6.80004@mellanox.co.il> Message-ID: <6C2C79E72C305246B504CBA17B5500C903D36D43@mtlexch01.mtl.com> Thank you for the comprehensive review. I will split the patches by topics and send them separately, along with the fixes to the other remarks you made. Yevgeny -----Original Message----- From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Wednesday, April 16, 2008 6:34 PM To: Yevgeny Petrilin Cc: general at lists.openfabrics.org Subject: Re: [ofa-general][PATCH] mlx4_core: Multi Protocol support Your email has > Content-Type: text/plain; charset=ISO-8859-1; format=flowed and the format=flowed means that the patch gets corrupted and won't apply. So when you resend, please fix. I don't think we can really apply this as one patch -- it does too many things at once and needs to be split up... I think pretty much each of these items is independent and could be a separate patch: > 1. 
Mlx4 device now holds the actual protocol for each port. > The port types are determined through module parameters or through sysfs > interface. The requested types are verified with firmware capabilities > in order to determine the actual port protocol. > 2. The driver now manages Mac and Vlan tables used by customers of the low > level driver. Corresponding commands were added. > 3. Completion eq's are created per cpu. Created cq's are attached to an eq by > "Round Robin" algorithm, unless a specific eq was requested. > 4. Creation of a collapsed cq support was added. > 5. Additional reserved qp ranges were added. There is a range for the customers > of the low level driver (IB, Ethernet, FCoE). > 6. Qp allocation process changed. > First a qp range should be reserved, then qps can be allocated from that > range. This is to support the ability to allocate consecutive qps. > Appropriate changes were made in the allocation mechanism. > 7. Common actions to all HW resource management (Doorbell allocation, > Buffer allocation, Mtt write) were moved to the low level driver. Also, on the other hand, the current two patches are too split up: if I apply this patch then mlx4_ib won't compile until the second patch goes in too. Which means someone trying to bisect an mlx4 bug gets into trouble. So please make sure that everything still compiles and works after each patch is applied. By the way, the multiple EQ stuff is a pretty major change in behavior... are we really ready for this? Round robin seems like it could easily lead to worst-case behavior for some plausible workloads. Finally, checkpatch.pl shows a few minor whitespace problems... please fix when you resend. - R. 
From amirv at mellanox.co.il Thu Apr 17 06:23:00 2008 From: amirv at mellanox.co.il (Amir Vadai) Date: Thu, 17 Apr 2008 16:23:00 +0300 Subject: [ofa-general] CM goes to timewait state without waiting for disconnect reply Message-ID: <6C2C79E72C305246B504CBA17B5500C903D36D79@mtlexch01.mtl.com> Sean Hi, I'm working on some SDP bugs in OFED 1.3. In the spec, the normal flow to close a connection at the client side is: State "Established" ---- send DREQ ---> State "DREQ sent" --- receive DREP ---> State "TimeWait" ---> State "Idle" According to the code and tests I did, it seems that ib_cm doesn't wait for DREP and goes directly from "DREQ sent" into "TimeWait". This is obviously not good, because the client might think the connection is closed while the CM on the server side isn't in listen/timewait mode. I think that this is a bug, am I right? --- Amir -------------- next part -------------- An HTML attachment was scrubbed... URL: From moshek at voltaire.com Thu Apr 17 06:47:35 2008 From: moshek at voltaire.com (Moshe Kazir) Date: Thu, 17 Apr 2008 16:47:35 +0300 Subject: [ofa-general] Starting openibd before the network service In-Reply-To: <4805F692.1040101@dev.mellanox.co.il> References: <4805F692.1040101@dev.mellanox.co.il> Message-ID: <39C75744D164D948A170E9792AF8E7CAC5AF06@exil.voltaire.com> From the bonding and ipoib point of view, it's better to have openibd started before the network service is started. In the openibd script we find that on SUSE the network service is started before openibd -> ### BEGIN INIT INFO # Provides: openibd # Required-Start: $local_fs $network Can someone explain why? Can we change it before OFED-1.3.1? 
Moshe ____________________________________________________________ Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) Voltaire - The Grid Backbone www.voltaire.com From hrosenstock at xsigo.com Thu Apr 17 07:30:08 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 17 Apr 2008 07:30:08 -0700 Subject: [ofa-general] mlx4_core internal error with OFED 1.2.5.4 Message-ID: <1208442608.26936.143.camel@hrosenstock-ws.xsigo.com> Hi, I'm running OFED 1.2.5.4 and got the following: mlx4_core 0000:01:00.0: Internal error detected: mlx4_core 0000:01:00.0: buf[00]: 00020000 mlx4_core 0000:01:00.0: buf[01]: c0010eb6 mlx4_core 0000:01:00.0: buf[02]: 20030000 mlx4_core 0000:01:00.0: buf[03]: 00000000 mlx4_core 0000:01:00.0: buf[04]: 00000000 mlx4_core 0000:01:00.0: buf[05]: 00000000 mlx4_core 0000:01:00.0: buf[06]: 00000000 mlx4_core 0000:01:00.0: buf[07]: 00000000 mlx4_core 0000:01:00.0: buf[08]: 00000000 mlx4_core 0000:01:00.0: buf[09]: 00000000 mlx4_core 0000:01:00.0: buf[0a]: 00000000 mlx4_core 0000:01:00.0: buf[0b]: 00000000 mlx4_core 0000:01:00.0: buf[0c]: 00000000 mlx4_core 0000:01:00.0: buf[0d]: 00000000 mlx4_core 0000:01:00.0: buf[0e]: 00000000 mlx4_core 0000:01:00.0: buf[0f]: 00000000 Is there any more information that can be provided by decoding this as to what the error was? Thanks. 
-- Hal From rdreier at cisco.com Thu Apr 17 07:53:33 2008 From: rdreier at cisco.com (Roland Dreier) Date: Thu, 17 Apr 2008 07:53:33 -0700 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get the first batch of things queued for 2.6.26: sparse cleanups, new HW support for the ipath driver, IPoIB updates, and miscellaneous fixes all over. Arthur Jones (7): IB/ipath: Fix sparse warning about pointer signedness IB/ipath: Misc sparse warning cleanup IB/ipath: Provide I/O bus speeds for diagnostic purposes IB/ipath: Fix link up LED display IB/ipath: User mode send DMA header file IB/ipath: User mode send DMA IB/ipath: Misc changes to prepare for IB7220 introduction Dave Olson (10): IB/ipath: Make some constants chip-specific, related cleanup IB/ipath: Shared context code needs to be sure device is usable IB/ipath: Enable 4KB MTU IB/ipath: HW workaround for case where chip can send but not receive IB/ipath: Make link state transition code ignore (transient) link recovery IB/ipath: Add support for IBTA 1.2 Heartbeat IB/ipath: Set LID filtering for HCAs that support it. IB/ipath: Enable reduced PIO update for HCAs that support it. 
IB/ipath: Fix check for no interrupts to reliably fallback to INTx IB/ipath: add calls to new 7220 code and enable in build David Dillow (1): IB/srp: Enforce protocol limit on srp_sg_tablesize Dotan Barak (3): IB/core: Check optional verbs before using them IB/mthca: Update QP state if query QP succeeds IB/mlx4: Update QP state if query QP succeeds Eli Cohen (13): IPoIB: Use checksum offload support if available IB/mlx4: Add IPoIB checksum offload support IB/mthca: Add IPoIB checksum offload support IB/core: Add creation flags to struct ib_qp_init_attr IB/core: Add IPoIB UD LSO support IPoIB: Add LSO support IB/mlx4: Add IPoIB LSO support IPoIB: Add basic ethtool support IB/core: Add support for modify CQ IPoIB: Support modifying IPoIB CQ event moderation IB/mlx4: Add support for modifying CQ moderation parameters IB/mlx4: Fix race when detaching a QP from a multicast group IB/mlx4: Fix incorrect comment Erez Zilber (2): IB/iser: Release connection resources on RDMA_CM_EVENT_DEVICE_REMOVAL event IB/iser: Don't change itt endianness Harvey Harrison (1): IB: Replace remaining __FUNCTION__ occurrences with __func__ Hoang-Nam Nguyen (1): IB/ehca: Remove tgid checking Jack Morgenstein (3): mlx4_core: Increase max number of QPs to 128K IB/mthca: Update module version and release date IB/mlx4: Update module version and release date John Gregor (2): IB/ipath: Head of Line blocking vs forward progress of user apps IB/ipath: Add code for IBA7220 send DMA Julia Lawall (1): RDMA/iwcm: Test rdma_create_id() for IS_ERR rather than 0 Michael Albaugh (5): IB/ipath: Prevent link-recovery code from negating admin disable IB/ipath: EEPROM support for 7220 devices, robustness improvements, cleanup IB/ipath: Allow old and new diagnostic packet formats IB/ipath: Isolate 7220-specific content IB/ipath: Support for SerDes portion of IBA7220 Ralph Campbell (18): IB/ipath: Fix byte order of pioavail in handle_errors() IB/ipath: Fix error recovery for send buffer status after chip freeze 
mode IB/ipath: Don't try to handle freeze mode HW errors if diagnostic mode IB/ipath: Make debug error message match the constraint that is checked for IB/ipath: Add code to support multiple link speeds and widths IB/ipath: Remove useless comments IB/ipath: Fix sanity checks on QP number of WRs and SGEs IB/ipath: Change the module author IB/ipath: Remove some useless (void) casts IB/ipath: Make send buffers available for kernel if not allocated to user IB/ipath: Use PIO buffer for RC ACKs IB/ipath: Fix some white space and code style issues IB/ipath: Add support for 7220 receive queue changes IB/ipath: Fix up error handling IB/ipath: Header file changes to support IBA7220 IB/ipath: HCA-specific code to support IBA7220 IB/ipath: Add IBA7220-specific SERDES initialization data IB/ipath: Update copyright dates for files changed in 2008 Robert P. J. Day (3): IB: Use shorter list_splice_init() for brevity RDMA/nes: Use more concise list_for_each_entry() IB/ipath: Fix time comparison to use time_after_eq() Roland Dreier (31): IB/mthca: Formatting cleanups IB/mlx4: Convert "if(foo)" to "if (foo)" mlx4_core: Move opening brace of function onto a new line RDMA/amso1100: Don't use 0UL as a NULL pointer RDMA/cxgb3: IDR IDs are signed IB: Make struct ib_uobject.id a signed int IB/ipath: Fix sparse warning about shadowed symbol IB/mlx4: Endianness annotations IB/cm: Endianness annotations RDMA/ucma: Endian annotation RDMA/nes: Trivial endianness annotations RDMA/nes: Delete unused variables RDMA/amso1100: Start of endianness annotation RDMA/amso1100: Endian annotate mqsq allocator mlx4_core: Fix confusion between mlx4_event and mlx4_dev_event enums IB/uverbs: Don't store struct file * for event files IB/uverbs: Use alloc_file() instead of get_empty_filp() RDMA/nes: Remove redundant NULL check in nes_unregister_ofa_device() RDMA/nes: Remove unused nes_netdev_exit() function RDMA/nes: Use proper format and cast to print dma_addr_t RDMA/nes: Make symbols used only in a single 
source file static IB/ehca: Make symbols used only in a single source file static IB/mthca: Avoid integer overflow when dealing with profile size IB/mthca: Avoid integer overflow when allocating huge ICM table IB/ipath: Fix PCI config write size used to clear linkctrl error bits RDMA/nes: Remove session_id from nes_cm stuff IB/mlx4: Micro-optimize mlx4_ib_post_send() IB/core: Add support for "send with invalidate" work requests RDMA/amso1100: Add support for "send with invalidate" work requests RDMA/nes: Free IRQ before killing tasklet IPoIB: Handle case when P_Key is deleted and re-added at same index Stefan Roscher (1): IB/ehca: Support all ibv_devinfo values in query_device() and query_port() Tom Tucker (1): RDMA/amso1100: Add check for NULL reply_msg in c2_intr() Vladimir Sokolovsky (1): IB/mlx4: Add support for resizing CQs drivers/infiniband/core/cm.c | 63 +- drivers/infiniband/core/cma.c | 2 +- drivers/infiniband/core/fmr_pool.c | 3 +- drivers/infiniband/core/ucma.c | 2 +- drivers/infiniband/core/uverbs.h | 4 +- drivers/infiniband/core/uverbs_cmd.c | 14 +- drivers/infiniband/core/uverbs_main.c | 28 +- drivers/infiniband/core/verbs.c | 14 +- drivers/infiniband/hw/amso1100/c2.c | 80 +- drivers/infiniband/hw/amso1100/c2.h | 16 +- drivers/infiniband/hw/amso1100/c2_ae.c | 10 +- drivers/infiniband/hw/amso1100/c2_alloc.c | 12 +- drivers/infiniband/hw/amso1100/c2_cq.c | 4 +- drivers/infiniband/hw/amso1100/c2_intr.c | 6 +- drivers/infiniband/hw/amso1100/c2_mm.c | 2 +- drivers/infiniband/hw/amso1100/c2_mq.c | 4 +- drivers/infiniband/hw/amso1100/c2_mq.h | 2 +- drivers/infiniband/hw/amso1100/c2_provider.c | 85 +- drivers/infiniband/hw/amso1100/c2_qp.c | 30 +- drivers/infiniband/hw/amso1100/c2_rnic.c | 31 +- drivers/infiniband/hw/amso1100/c2_vq.c | 2 +- drivers/infiniband/hw/amso1100/c2_wr.h | 212 +- drivers/infiniband/hw/cxgb3/cxio_dbg.c | 24 +- drivers/infiniband/hw/cxgb3/cxio_hal.c | 84 +- drivers/infiniband/hw/cxgb3/cxio_resource.c | 12 +- 
drivers/infiniband/hw/cxgb3/iwch.c | 6 +- drivers/infiniband/hw/cxgb3/iwch.h | 2 +- drivers/infiniband/hw/cxgb3/iwch_cm.c | 166 +- drivers/infiniband/hw/cxgb3/iwch_cm.h | 4 +- drivers/infiniband/hw/cxgb3/iwch_cq.c | 4 +- drivers/infiniband/hw/cxgb3/iwch_ev.c | 12 +- drivers/infiniband/hw/cxgb3/iwch_mem.c | 6 +- drivers/infiniband/hw/cxgb3/iwch_provider.c | 79 +- drivers/infiniband/hw/cxgb3/iwch_provider.h | 4 +- drivers/infiniband/hw/cxgb3/iwch_qp.c | 42 +- drivers/infiniband/hw/ehca/ehca_av.c | 31 - drivers/infiniband/hw/ehca/ehca_classes.h | 2 - drivers/infiniband/hw/ehca/ehca_cq.c | 19 - drivers/infiniband/hw/ehca/ehca_hca.c | 129 +- drivers/infiniband/hw/ehca/ehca_main.c | 19 +- drivers/infiniband/hw/ehca/ehca_mrmw.c | 42 +- drivers/infiniband/hw/ehca/ehca_pd.c | 11 - drivers/infiniband/hw/ehca/ehca_qp.c | 51 +- drivers/infiniband/hw/ehca/ehca_reqs.c | 2 +- drivers/infiniband/hw/ehca/ehca_tools.h | 16 +- drivers/infiniband/hw/ehca/ehca_uverbs.c | 19 - drivers/infiniband/hw/ipath/Makefile | 3 + drivers/infiniband/hw/ipath/ipath_7220.h | 57 + drivers/infiniband/hw/ipath/ipath_common.h | 54 +- drivers/infiniband/hw/ipath/ipath_debug.h | 2 + drivers/infiniband/hw/ipath/ipath_diag.c | 35 +- drivers/infiniband/hw/ipath/ipath_driver.c | 1041 +++++++--- drivers/infiniband/hw/ipath/ipath_eeprom.c | 428 ++++- drivers/infiniband/hw/ipath/ipath_file_ops.c | 176 ++- drivers/infiniband/hw/ipath/ipath_iba6110.c | 51 +- drivers/infiniband/hw/ipath/ipath_iba6120.c | 203 ++- drivers/infiniband/hw/ipath/ipath_iba7220.c | 2571 ++++++++++++++++++++++++ drivers/infiniband/hw/ipath/ipath_init_chip.c | 312 ++-- drivers/infiniband/hw/ipath/ipath_intr.c | 656 ++++--- drivers/infiniband/hw/ipath/ipath_kernel.h | 304 +++- drivers/infiniband/hw/ipath/ipath_mad.c | 110 +- drivers/infiniband/hw/ipath/ipath_qp.c | 59 +- drivers/infiniband/hw/ipath/ipath_rc.c | 67 +- drivers/infiniband/hw/ipath/ipath_registers.h | 168 +- drivers/infiniband/hw/ipath/ipath_ruc.c | 22 +- 
drivers/infiniband/hw/ipath/ipath_sd7220.c | 1462 ++++++++++++++ drivers/infiniband/hw/ipath/ipath_sd7220_img.c | 1082 ++++++++++ drivers/infiniband/hw/ipath/ipath_sdma.c | 790 ++++++++ drivers/infiniband/hw/ipath/ipath_srq.c | 5 +- drivers/infiniband/hw/ipath/ipath_stats.c | 33 +- drivers/infiniband/hw/ipath/ipath_sysfs.c | 104 +- drivers/infiniband/hw/ipath/ipath_uc.c | 8 +- drivers/infiniband/hw/ipath/ipath_ud.c | 7 +- drivers/infiniband/hw/ipath/ipath_user_sdma.c | 879 ++++++++ drivers/infiniband/hw/ipath/ipath_user_sdma.h | 54 + drivers/infiniband/hw/ipath/ipath_verbs.c | 413 ++++- drivers/infiniband/hw/ipath/ipath_verbs.h | 32 +- drivers/infiniband/hw/mlx4/cq.c | 319 +++- drivers/infiniband/hw/mlx4/mad.c | 2 +- drivers/infiniband/hw/mlx4/main.c | 25 +- drivers/infiniband/hw/mlx4/mlx4_ib.h | 15 + drivers/infiniband/hw/mlx4/qp.c | 117 +- drivers/infiniband/hw/mthca/mthca_cmd.c | 6 +- drivers/infiniband/hw/mthca/mthca_cmd.h | 1 + drivers/infiniband/hw/mthca/mthca_cq.c | 14 +- drivers/infiniband/hw/mthca/mthca_dev.h | 14 +- drivers/infiniband/hw/mthca/mthca_eq.c | 4 +- drivers/infiniband/hw/mthca/mthca_mad.c | 2 +- drivers/infiniband/hw/mthca/mthca_main.c | 15 +- drivers/infiniband/hw/mthca/mthca_memfree.c | 6 +- drivers/infiniband/hw/mthca/mthca_profile.c | 4 +- drivers/infiniband/hw/mthca/mthca_profile.h | 2 +- drivers/infiniband/hw/mthca/mthca_provider.c | 5 +- drivers/infiniband/hw/mthca/mthca_qp.c | 28 +- drivers/infiniband/hw/mthca/mthca_wqe.h | 16 +- drivers/infiniband/hw/nes/nes.c | 15 +- drivers/infiniband/hw/nes/nes.h | 32 +- drivers/infiniband/hw/nes/nes_cm.c | 131 +- drivers/infiniband/hw/nes/nes_cm.h | 35 - drivers/infiniband/hw/nes/nes_hw.c | 49 +- drivers/infiniband/hw/nes/nes_nic.c | 26 +- drivers/infiniband/hw/nes/nes_utils.c | 2 +- drivers/infiniband/hw/nes/nes_verbs.c | 29 +- drivers/infiniband/ulp/ipoib/Makefile | 3 +- drivers/infiniband/ulp/ipoib/ipoib.h | 10 + drivers/infiniband/ulp/ipoib/ipoib_cm.c | 15 +- 
drivers/infiniband/ulp/ipoib/ipoib_ethtool.c | 99 + drivers/infiniband/ulp/ipoib/ipoib_ib.c | 126 +- drivers/infiniband/ulp/ipoib/ipoib_main.c | 33 +- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 3 + drivers/infiniband/ulp/iser/iser_initiator.c | 4 +- drivers/infiniband/ulp/iser/iser_verbs.c | 5 +- drivers/infiniband/ulp/srp/ib_srp.c | 7 +- drivers/net/mlx4/catas.c | 2 +- drivers/net/mlx4/cmd.c | 3 +- drivers/net/mlx4/cq.c | 72 +- drivers/net/mlx4/eq.c | 5 +- drivers/net/mlx4/fw.c | 13 + drivers/net/mlx4/fw.h | 1 + drivers/net/mlx4/intf.c | 8 +- drivers/net/mlx4/main.c | 6 +- drivers/net/mlx4/mcg.c | 12 +- drivers/net/mlx4/mlx4.h | 4 +- include/linux/mlx4/cmd.h | 2 +- include/linux/mlx4/cq.h | 19 +- include/linux/mlx4/device.h | 1 + include/linux/mlx4/driver.h | 3 +- include/linux/mlx4/qp.h | 15 +- include/rdma/ib_user_verbs.h | 5 +- include/rdma/ib_verbs.h | 35 +- net/sunrpc/xprtrdma/verbs.c | 1 - 131 files changed, 11739 insertions(+), 2287 deletions(-) create mode 100644 drivers/infiniband/hw/ipath/ipath_7220.h create mode 100644 drivers/infiniband/hw/ipath/ipath_iba7220.c create mode 100644 drivers/infiniband/hw/ipath/ipath_sd7220.c create mode 100644 drivers/infiniband/hw/ipath/ipath_sd7220_img.c create mode 100644 drivers/infiniband/hw/ipath/ipath_sdma.c create mode 100644 drivers/infiniband/hw/ipath/ipath_user_sdma.c create mode 100644 drivers/infiniband/hw/ipath/ipath_user_sdma.h create mode 100644 drivers/infiniband/ulp/ipoib/ipoib_ethtool.c From andrea at qumranet.com Thu Apr 17 08:51:57 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 17 Apr 2008 17:51:57 +0200 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: References: <20080416163337.GJ22493@sgi.com> Message-ID: <20080417155157.GC17187@duo.random> On Wed, Apr 16, 2008 at 11:35:38AM -0700, Christoph Lameter wrote: > On Wed, 16 Apr 2008, Robin Holt wrote: > > > I don't think this lock mechanism is completely working. 
I have > > gotten a few failures trying to dereference 0x100100 which appears to > > be LIST_POISON1. > > How does xpmem unregistering of notifiers work? Especially are you using mmu_notifier_unregister? From sean.hefty at intel.com Thu Apr 17 08:58:36 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 17 Apr 2008 08:58:36 -0700 Subject: [ofa-general] Pending libibverbs patches? In-Reply-To: References: <48045BF3.8040305@voltaire.com> <4805AB90.6060702@voltaire.com><000201c8a00e$00f32460$7de0180a@amr.corp.intel.com> Message-ID: <000001c8a0a3$e5a7c530$9c98070a@amr.corp.intel.com> >How about "iWARP ethernet NICs support RDMA over hardware-offloaded TCP"? This is more descriptive and fine with me. From sean.hefty at intel.com Thu Apr 17 09:14:15 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 17 Apr 2008 09:14:15 -0700 Subject: [ofa-general] RE: CM goes to timewait state without waiting for disconnect reply In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903D36D79@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C903D36D79@mtlexch01.mtl.com> Message-ID: <000101c8a0a6$153306a0$9c98070a@amr.corp.intel.com> > In the spec, a normal flow to close a connection is > at the client side: State "Established" ---- send DREQ ---> > State "DREQ sent" --- receive DREP ---> State "TimeWait"  ---> > State "Idle" Yes - the CM kernel code follows this state machine.  > According to the code and tests I did, it seems that ib_cm doesn't > wait for DREP and goes directly from "DREQ sent" into "TimeWait". This can happen in specific situations, such as errors, if the user destroys the cm_id without waiting for the DREP (treated as a DREQ timeout), or if both sides initiate a DREQ. > I think that this is a bug, am I right? I don't see that the code follows the behavior that you're describing. In ib_send_cm_dreq(), the cm_id state changes to DREQ_SENT. 
In cm_drep_handler() (called when a DREP is received), the cm_id state is verified to be DREQ_SENT, then transitioned to TIMEWAIT. If you can describe the test details more, I can try to find the most likely code path that's being hit. It's possible that you're hitting one of the situations mentioned above. - Sean From holt at sgi.com Thu Apr 17 09:36:42 2008 From: holt at sgi.com (Robin Holt) Date: Thu, 17 Apr 2008 11:36:42 -0500 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080417155157.GC17187@duo.random> References: <20080416163337.GJ22493@sgi.com> <20080417155157.GC17187@duo.random> Message-ID: <20080417163642.GE11364@sgi.com> On Thu, Apr 17, 2008 at 05:51:57PM +0200, Andrea Arcangeli wrote: > On Wed, Apr 16, 2008 at 11:35:38AM -0700, Christoph Lameter wrote: > > On Wed, 16 Apr 2008, Robin Holt wrote: > > > > > I don't think this lock mechanism is completely working. I have > > > gotten a few failures trying to dereference 0x100100 which appears to > > > be LIST_POISON1. > > > > How does xpmem unregistering of notifiers work? > > Especially are you using mmu_notifier_unregister? In this case, we are not making the call to unregister, we are waiting for the _release callout which has already removed it from the list. In the event that the user has removed all the grants, we use unregister. That typically does not occur. We merely wait for exit processing to clean up the structures. 
Thanks, Robin From andrea at qumranet.com Thu Apr 17 10:14:43 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 17 Apr 2008 19:14:43 +0200 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080417163642.GE11364@sgi.com> References: <20080416163337.GJ22493@sgi.com> <20080417155157.GC17187@duo.random> <20080417163642.GE11364@sgi.com> Message-ID: <20080417171443.GM17187@duo.random> On Thu, Apr 17, 2008 at 11:36:42AM -0500, Robin Holt wrote: > In this case, we are not making the call to unregister, we are waiting > for the _release callout which has already removed it from the list. > > In the event that the user has removed all the grants, we use unregister. > That typically does not occur. We merely wait for exit processing to > clean up the structures. Then it's very strange. LIST_POISON1 is set in n->next. If it was a second hlist_del triggering the bug, in theory LIST_POISON2 should trigger first, so perhaps it's really a notifier running despite mm_lock being taken? Could you post a full stack trace so I can see who's running into LIST_POISON1? If it's really a notifier running outside of some mm_lock, that will be _immediately_ visible from the stack trace that triggered the LIST_POISON1! Also note, EMM isn't using the clean hlist_del, it's implementing lists by hand (with zero runtime gain), so all the debugging may not be present in EMM. So if it's really a mm_lock race, and it only triggers with mmu notifiers and not with EMM, it doesn't necessarily mean EMM is bug free. If you've a full stack trace it would greatly help to verify what is mangling the list when the oops triggers. Thanks! 
Andrea From holt at sgi.com Thu Apr 17 10:25:56 2008 From: holt at sgi.com (Robin Holt) Date: Thu, 17 Apr 2008 12:25:56 -0500 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080417171443.GM17187@duo.random> References: <20080416163337.GJ22493@sgi.com> <20080417155157.GC17187@duo.random> <20080417163642.GE11364@sgi.com> <20080417171443.GM17187@duo.random> Message-ID: <20080417172556.GF11364@sgi.com> On Thu, Apr 17, 2008 at 07:14:43PM +0200, Andrea Arcangeli wrote: > On Thu, Apr 17, 2008 at 11:36:42AM -0500, Robin Holt wrote: > > In this case, we are not making the call to unregister, we are waiting > > for the _release callout which has already removed it from the list. > > > > In the event that the user has removed all the grants, we use unregister. > > That typically does not occur. We merely wait for exit processing to > > clean up the structures. > > Then it's very strange. LIST_POISON1 is set in n->next. If it was a > second hlist_del triggering the bug in theory list_poison2 should > trigger first, so perhaps it's really a notifier running despite a > mm_lock is taken? Could you post a full stack trace so I can see who's > running into LIST_POISON1? If it's really a notifier running outside > of some mm_lock that will be _immediately_ visible from the stack > trace that triggered the LIST_POISON1! > > Also note, EMM isn't using the clean hlist_del, it's implementing list > by hand (with zero runtime gain) so all the debugging may not be > existent in EMM, so if it's really a mm_lock race, and it only > triggers with mmu notifiers and not with EMM, it doesn't necessarily > mean EMM is bug free. If you've a full stack trace it would greatly > help to verify what is mangling over the list when the oops triggers. The stack trace is below. I did not do this level of testing on emm so I can not compare the two in this area. This is for a different, but equivalent failure. 
I just reproduce the LIST_POISON1 failure without trying to reproduce the exact same failure as I had documented earlier (lost that stack trace, sorry). Thanks, Robin <1>Unable to handle kernel paging request at virtual address 0000000000100100 <4>mpi006.f.x[23403]: Oops 11012296146944 [1] <4>Modules linked in: nfs lockd sunrpc binfmt_misc thermal processor fan button loop md_mod dm_mod xpmem xp mspec sg <4> <4>Pid: 23403, CPU 114, comm: mpi006.f.x <4>psr : 0000121008526010 ifs : 800000000000038b ip : [] Not tainted (2.6.25-rc8) <4>ip is at __mmu_notifier_invalidate_range_start+0x81/0x120 <4>unat: 0000000000000000 pfs : 000000000000038b rsc : 0000000000000003 <4>rnat: a000000100149a00 bsps: a000000000010740 pr : 66555666a9599aa9 <4>ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f <4>csd : 0000000000000000 ssd : 0000000000000000 <4>b0 : a00000010015d670 b6 : a0000002101ddb40 b7 : a00000010000eb50 <4>f6 : 1003e2222222222222222 f7 : 000000000000000000000 <4>f8 : 000000000000000000000 f9 : 000000000000000000000 <4>f10 : 000000000000000000000 f11 : 000000000000000000000 <4>r1 : a000000100ef1190 r2 : e0000e6080cc1940 r3 : a0000002101edd10 <4>r8 : e0000e6080cc1970 r9 : 0000000000000000 r10 : e0000e6080cc19c8 <4>r11 : 20000003a6480000 r12 : e0000c60d31efb90 r13 : e0000c60d31e0000 <4>r14 : 000000000000004d r15 : e0000e6080cc1914 r16 : e0000e6080cc1970 <4>r17 : 20000003a6480000 r18 : 20000007bf900000 r19 : 0000000000040000 <4>r20 : e0000c60d31e0000 r21 : 0000000000000010 r22 : e0000e6080cc19a8 <4>r23 : e0000c60c55f1120 r24 : e0000c60d31efda0 r25 : e0000c60d31efd98 <4>r26 : e0000e60812166d0 r27 : e0000c60d31efdc0 r28 : e0000c60d31efdb8 <4>r29 : e0000c60d31e0b60 r30 : 0000000000000000 r31 : 0000000000000081 <4> <4>Call Trace: <4> [] show_stack+0x40/0xa0 <4> sp=e0000c60d31ef760 bsp=e0000c60d31e11f0 <4> [] show_regs+0x850/0x8a0 <4> sp=e0000c60d31ef930 bsp=e0000c60d31e1198 <4> [] die+0x1b0/0x2e0 <4> sp=e0000c60d31ef930 bsp=e0000c60d31e1150 <4> [] 
ia64_do_page_fault+0x8d0/0xa40 <4> sp=e0000c60d31ef930 bsp=e0000c60d31e1100 <4> [] ia64_leave_kernel+0x0/0x270 <4> sp=e0000c60d31ef9c0 bsp=e0000c60d31e1100 <4> [] __mmu_notifier_invalidate_range_start+0x80/0x120 <4> sp=e0000c60d31efb90 bsp=e0000c60d31e10a8 <4> [] unmap_vmas+0x70/0x14c0 <4> sp=e0000c60d31efb90 bsp=e0000c60d31e0fa8 <4> [] zap_page_range+0x40/0x60 <4> sp=e0000c60d31efda0 bsp=e0000c60d31e0f70 <4> [] xpmem_clear_PTEs+0x350/0x560 [xpmem] <4> sp=e0000c60d31efdb0 bsp=e0000c60d31e0ef0 <4> [] xpmem_remove_seg+0x3f0/0x700 [xpmem] <4> sp=e0000c60d31efde0 bsp=e0000c60d31e0ea8 <4> [] xpmem_remove_segs_of_tg+0x80/0x140 [xpmem] <4> sp=e0000c60d31efe10 bsp=e0000c60d31e0e78 <4> [] xpmem_mmu_notifier_release+0x40/0x80 [xpmem] <4> sp=e0000c60d31efe10 bsp=e0000c60d31e0e58 <4> [] __mmu_notifier_release+0xb0/0x100 <4> sp=e0000c60d31efe10 bsp=e0000c60d31e0e38 <4> [] exit_mmap+0x50/0x180 <4> sp=e0000c60d31efe10 bsp=e0000c60d31e0e10 <4> [] mmput+0x70/0x180 <4> sp=e0000c60d31efe20 bsp=e0000c60d31e0dd8 <4> [] exit_mm+0x1f0/0x220 <4> sp=e0000c60d31efe20 bsp=e0000c60d31e0da0 <4> [] do_exit+0x4e0/0xf40 <4> sp=e0000c60d31efe20 bsp=e0000c60d31e0d58 <4> [] do_group_exit+0x180/0x1c0 <4> sp=e0000c60d31efe30 bsp=e0000c60d31e0d20 <4> [] sys_exit_group+0x20/0x40 <4> sp=e0000c60d31efe30 bsp=e0000c60d31e0cc8 <4> [] ia64_ret_from_syscall+0x0/0x20 <4> sp=e0000c60d31efe30 bsp=e0000c60d31e0cc8 <4> [] __kernel_syscall_via_break+0x0/0x20 <4> sp=e0000c60d31f0000 bsp=e0000c60d31e0cc8 From terrywatson at live.com Thu Apr 17 11:21:52 2008 From: terrywatson at live.com (terry watson) Date: Thu, 17 Apr 2008 18:21:52 +0000 Subject: [ofa-general] Is IBIS only for querying OpenSM? Message-ID: Hi all, I will be performing some testing of partitioning used as a security control. Am I right in believing that IBIS will be able to set partition table values of the local compute node I am logged on to, even though they are not using OpenSM, but rather a SM on a switch? 
Could I then attempt to access a partition that I was originally excluded from accessing? I am new to Infiniband technology and would also appreciate a response from an expert who has views on the strength of the security that partitioning provides in separating two clusters that should have no interaction whatsoever. Thanks, Dave From amirv at mellanox.co.il Thu Apr 17 11:25:56 2008 From: amirv at mellanox.co.il (Amir Vadai) Date: Thu, 17 Apr 2008 21:25:56 +0300 Subject: [ofa-general] RE: CM goes to timewait state without waiting for disconnect reply In-Reply-To: <000101c8a0a6$153306a0$9c98070a@amr.corp.intel.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C903D36E8D@mtlexch01.mtl.com> When the client closes the connection it calls ib_destroy_cm_id(), which calls cm_destroy_id(). In my scenario this happens when the CM is in state "Established". In this state ib_send_cm_dreq() is called. This function sends a DREQ and changes state to "DREQ sent". After that the function returns and the switch is tried again, this time in state "DREQ sent". There the state is changed into "TimeWait". It means that when calling ib_destroy_cm_id(), the CM sends a DREQ and goes immediately to state "TimeWait" without waiting for DREP. This looks like the usual situation and not a special one. I'm looking at the code from the head of ofed git in openfabrics. 
- Amir -----Original Message----- From: Sean Hefty [mailto:sean.hefty at intel.com] Sent: Thursday, April 17, 2008 19:14 To: Amir Vadai Cc: general at lists.openfabrics.org Subject: RE: CM goes to timewait state without waiting for disconnect reply > In the spec, a normal flow to close a connection is at the client > side: State "Established" ---- send DREQ ---> State "DREQ sent" --- > receive DREP ---> State "TimeWait"  ---> State "Idle" Yes - the CM kernel code follows this state machine.  > According to the code and tests I did, it seems that ib_cm doesn't > wait for DREP and goes directly from "DREQ sent" into "TimeWait". This can happen in specific situations, such as errors, if the user destroys the cm_id without waiting for the DREP (treated as a DREQ timeout), or if both sides initiate a DREQ. > I think that this is a bug, am I right? I don't see that the code follows the behavior that you're describing. In ib_send_cm_dreq(), the cm_id state changes to DREQ_SENT. In cm_drep_handler() (called when a DREP is received), the cm_id state is verified to be DREQ_SENT, then transitioned to TIMEWAIT. If you can describe the test details more, I can try to find the most likely code path that's being hit. It's possible that you're hitting one of the situations mentioned above. - Sean From sean.hefty at intel.com Thu Apr 17 11:34:16 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 17 Apr 2008 11:34:16 -0700 Subject: [ofa-general] RE: CM goes to timewait state without waiting for disconnect reply In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903D36E8D@mtlexch01.mtl.com> References: <000101c8a0a6$153306a0$9c98070a@amr.corp.intel.com> <6C2C79E72C305246B504CBA17B5500C903D36E8D@mtlexch01.mtl.com> Message-ID: <000201c8a0b9$a4638a80$9c98070a@amr.corp.intel.com> >When the client closes the connection it calls ib_destroy_cm_id() who calls >cm_destroy_id(). >In my scenario it happen when the CM is in state "Established". In this state >ib_send_cm_dreq() is called.
>This function sends a DREQ and change state to "DREQ sent". >After that the function returns and the switch is tried again this time we're >in state "DREQ sent". >There the state is changed into "TimeWait". Yes - this will result in transitioning into timewait immediately after sending the DREQ. By destroying the cm_id, the user has indicated that they do not want to wait for a DREP, nor do they care about when timewait has exited. If a DREQ is received while the cm_id is in timewait, it will generate a DREP in response. DREP messages while in timewait are simply dropped. What exactly is the problem that you're seeing? - Sean From amirv at mellanox.co.il Thu Apr 17 11:41:03 2008 From: amirv at mellanox.co.il (Amir Vadai) Date: Thu, 17 Apr 2008 21:41:03 +0300 Subject: [ofa-general] RE: CM goes to timewait state without waiting for disconnect reply In-Reply-To: <000201c8a0b9$a4638a80$9c98070a@amr.corp.intel.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C903D36E90@mtlexch01.mtl.com> There are some problems that I hope are related to that. But the one I know for sure is: I have a very busy SDP server with lots of connections coming up and down, and a client with many threads that open and close connections. What I see is that a connection request comes from the client to the server, and the server replies with a reject; the reason for the reject is that a timewait structure already exists for this QPN. That is because the client thinks the connection is closed and reuses the QPN, but the server didn't finish cleaning up the connection. The bottom line: I get a reject on SDP socket open. - Amir -----Original Message----- From: Sean Hefty [mailto:sean.hefty at intel.com] Sent: Thursday, April 17, 2008 21:34 To: Amir Vadai Cc: general at lists.openfabrics.org; Oren Duer Subject: RE: CM goes to timewait state without waiting for disconnect reply >When the client closes the connection it calls ib_destroy_cm_id() who >calls cm_destroy_id().
>In my scenario it happen when the CM is in state "Established". In this >state >ib_send_cm_dreq() is called. >This function sends a DREQ and change state to "DREQ sent". >After that the function returns and the switch is tried again this time >we're in state "DREQ sent". >There the state is changed into "TimeWait". Yes - this will result in transitioning into timewait immediately after sending the DREQ. By destroying the cm_id, the user has indicated that they do not want to wait for a DREP, nor do they care about when timewait has exited. If a DREQ is received while the cm_id is in timewait, it will generate a DREP in response. DREP messages while in timewait are simply dropped. What exactly is the problem that you're seeing? - Sean From sean.hefty at intel.com Thu Apr 17 11:51:36 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 17 Apr 2008 11:51:36 -0700 Subject: [ofa-general] RE: CM goes to timewait state without waiting for disconnect reply In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903D36E90@mtlexch01.mtl.com> References: <000201c8a0b9$a4638a80$9c98070a@amr.corp.intel.com> <6C2C79E72C305246B504CBA17B5500C903D36E90@mtlexch01.mtl.com> Message-ID: <000301c8a0bc$1058a5c0$9c98070a@amr.corp.intel.com> >What I see is that a connection request is coming from the client to the server >And the server reply with reject - the reason for the reject is that a timewait >structure >already exists for this QPN. And that's because the client thinks that a >connection is closed and reuse the QPN but the server didn't finish cleaning up >the connection. This is an unavoidable situation. There's no coordination between the timewait states on different systems, so it's always possible for one to re-connect before the other system has exited timewait. However, in your case, the problem is that the client is trying to re-use the QPN outside of knowing when it has exited the local timewait state. 
Instead, have the client issue a DREQ, and then wait for the timewait state to exit before trying to re-use the QPN. This would then be the sequence:

client                        server
sends DREQ
enters timewait
                              sends DREP
                              enters timewait
exits timewait
destroy cm_id
new connection

The hope at this point is that the server exits timewait before the client does, which, while likely, is not guaranteed. - Sean From clameter at sgi.com Thu Apr 17 12:10:52 2008 From: clameter at sgi.com (Christoph Lameter) Date: Thu, 17 Apr 2008 12:10:52 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080417171443.GM17187@duo.random> References: <20080416163337.GJ22493@sgi.com> <20080417155157.GC17187@duo.random> <20080417163642.GE11364@sgi.com> <20080417171443.GM17187@duo.random> Message-ID: On Thu, 17 Apr 2008, Andrea Arcangeli wrote: > Also note, EMM isn't using the clean hlist_del, it's implementing list > by hand (with zero runtime gain) so all the debugging may not be > existent in EMM, so if it's really a mm_lock race, and it only > triggers with mmu notifiers and not with EMM, it doesn't necessarily > mean EMM is bug free. If you've a full stack trace it would greatly > help to verify what is mangling over the list when the oops triggers. EMM was/is using a singly linked list, which allows atomic updates. It looked cleaner to me since a doubly linked list must update two pointers. I have not seen docs on the locking, so I'm not sure why you use rcu operations here. Isn't the requirement to have either the rmap locks or mmap_sem held enough to guarantee the consistency of the doubly linked list?
From amirv at mellanox.co.il Thu Apr 17 12:49:41 2008 From: amirv at mellanox.co.il (Amir Vadai) Date: Thu, 17 Apr 2008 22:49:41 +0300 Subject: [ofa-general] RE: CM goes to timewait state without waiting for disconnect reply In-Reply-To: <000301c8a0bc$1058a5c0$9c98070a@amr.corp.intel.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C903D36EA3@mtlexch01.mtl.com> I understand - I'll make sure the flow you described will be used. Thanks a lot, - Amir. -----Original Message----- From: Sean Hefty [mailto:sean.hefty at intel.com] Sent: Thursday, April 17, 2008 21:52 To: Amir Vadai Cc: general at lists.openfabrics.org; Oren Duer Subject: RE: CM goes to timewait state without waiting for disconnect reply >What I see is that a connection request is coming from the client to >the server And the server reply with reject - the reason for the reject >is that a timewait structure already exists for this QPN. And that's >because the client thinks that a connection is closed and reuse the QPN >but the server didn't finish cleaning up the connection. This is an unavoidable situation. There's no coordination between the timewait states on different systems, so it's always possible for one to re-connect before the other system has exited timewait. However, in your case, the problem is that the client is trying to re-use the QPN outside of knowing when it has exited the local timewait state. Instead, have the client issue a DREQ, and then wait for the timewait state to exit before trying to re-use the QPN. This would then be the sequence: client server sends DREQ enters timewait sends DREP enters timewait exits timewait destroy cm_id new connection Your hope at this point is that the server exits timewait before the client will, while, likely, is not guaranteed.
- Sean From andrea at qumranet.com Thu Apr 17 15:16:55 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Fri, 18 Apr 2008 00:16:55 +0200 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: References: <20080416163337.GJ22493@sgi.com> <20080417155157.GC17187@duo.random> <20080417163642.GE11364@sgi.com> <20080417171443.GM17187@duo.random> Message-ID: <20080417221655.GA9287@duo.random> On Thu, Apr 17, 2008 at 12:10:52PM -0700, Christoph Lameter wrote: > EMM was/is using a single linked list which allows atomic updates. Looked > cleaner to me since doubly linked list must update two pointers. It would be cleaner if it provided an abstraction in list.h. What matters is the memory taken by the list head for this usage. > I have not seen docs on the locking so not sure why you use rcu > operations here? Isnt the requirement to have either rmap locks or > mmap_sem held enough to guarantee the consistency of the doubly linked list? Yes, exactly, I'm not using rcu anymore.
From harsha at zresearch.com Fri Apr 18 00:34:35 2008 From: harsha at zresearch.com (Harshavardhana) Date: Fri, 18 Apr 2008 13:04:35 +0530 (IST) Subject: [ofa-general] libibverbs-1.1: issue RLIMIT_MEMLOCK Message-ID: <12302.220.227.64.166.1208504075.squirrel@zresearch.com> Hi Openfabrics, A question, or probably a bug, after upgrading to the new OFED with the libibverbs-1.1 release. I experienced problems running the Fluent CFD application with HP-MPI 2.2.5.1: the libibverbs initialization failed because the library throws an error saying the maximum pinnable memory, i.e. memlock, is insufficient: "ibv_create_qp failed" "Unable to Initialize RDMA device". I didn't have this problem in earlier versions. I fixed it by changing the hard/soft limit to more than 32k, which was the default on my system. But I am wondering why the RDMA initialization fails at 32k, which didn't happen with libibverbs version 1.0.4. It should throw a warning according to the check_memlock function in the libibverbs source directory. But that is not happening, and in turn ibv_create_qp fails. Would it not be better for the library itself to set the rlimit using setrlimit()? This looks to be a change with the ConnectX IB 4th Gen InfiniBand hardware in place, with libibverbs requesting memlock to be more than 32k.
Regards & Thanks -- Harshavardhana "Software gets slower faster as Hardware gets faster" From philippe.gregoire at cea.fr Fri Apr 18 00:35:42 2008 From: philippe.gregoire at cea.fr (Philippe Gregoire) Date: Fri, 18 Apr 2008 09:35:42 +0200 Subject: [ofa-general] Is IBIS only for querying OpenSM? In-Reply-To: References: Message-ID: <48084F4E.3020705@cea.fr> terry watson wrote: > Hi all, > > I will be performing some testing of partitioning used as a security control. Am I right in believing that IBIS will be able to set partition table values of the local compute node I am logged on to, even though they are not using OpenSM, but rather a SM on a switch? Could I then attempt to access a partition that I was originally excluded from accessing? > > I am new to Infiniband technology and would also appreciate a response from an expert who has views on the strength of the security that partitioning provides in separating two clusters that should have no interaction whatsoever. > > Thanks, > Dave > _________________________________________________________________ > Discover the new Windows Vista > http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE_______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > Partitions are managed only by the subnet manager - either opensm running on a node in the fabric, or an embedded subnet manager on a switch. For opensm, partitions are defined in the configuration file /etc/opensm/partitions.conf; for an embedded subnet manager, you have to configure the partitions using the CLI or GUI provided by the switch. Defining a partition mainly means choosing a pkey and the node ports that belong to it, with their membership (limited or not). The subnet manager assigns the pkeys to the ports of the node when the ib kernel modules are loaded.
You can see the partitions the IB port belongs to (I mean those defined by the subnet manager) with:

# grep -v 0x0000 /sys/class/infiniband/mthca0/ports/1/pkeys/*
/sys/class/infiniband/mthca0/ports/1/pkeys/0:0xffff
/sys/class/infiniband/mthca0/ports/1/pkeys/1:0x8001
/sys/class/infiniband/mthca0/ports/1/pkeys/2:0x8002
/sys/class/infiniband/mthca0/ports/1/pkeys/3:0x8003
/sys/class/infiniband/mthca0/ports/1/pkeys/4:0x8010

A port may belong to many partitions. Nodes (ports) may have different partition configurations. The partition order for a port is not always the same (it may depend on the chronology of partition declarations in the subnet manager). Over these partitions, you can define new IP (IP over IB) interfaces by creating files like /etc/sysconfig/network-scripts/ifcfg-ib0.8002:

# cat /etc/sysconfig/network-scripts/ifcfg-ib0.8002
DEVICE=ib0.8002
BOOTPROTO=static
IPADDR=XXX.YYY.ZZZ.TTT
NETMASK=255.255.255.0
NETWORK=255.255.255.0
ONBOOT=yes

The openibd script creates the child interface and configures it at system startup using a special sysfs file: echo $pkey > /sys/class/net/ib0/create_child This command only creates a child interface on the node; communications on this interface will not work until you add the node port to the corresponding partition in the subnet manager configuration. Then you will see the pkey appear automatically in the files /sys/class/infiniband/mthca0/ports/1/pkeys/* on the node.
[root at cors118 ~]# echo 0x8009 > /sys/class/net/ib0/create_child
[root at cors118 ~]# dmesg | grep 8009
divert: not allocating divert_blk for non-ethernet device ib0.8009
[root at cors118 ~]# grep -v 0x0000 /sys/class/infiniband/mthca0/ports/1/pkeys/*
/sys/class/infiniband/mthca0/ports/1/pkeys/0:0xffff
/sys/class/infiniband/mthca0/ports/1/pkeys/1:0x8001
/sys/class/infiniband/mthca0/ports/1/pkeys/2:0x8002
/sys/class/infiniband/mthca0/ports/1/pkeys/3:0x8003
/sys/class/infiniband/mthca0/ports/1/pkeys/4:0x8010
[root at cors118 ~]# ifconfig -a | grep 8009
ib0.8009 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
[root at cors118 ~]# echo 0x8009 > /sys/class/net/ib0/delete_child
[root at cors118 ~]# dmesg | grep 8009
divert: not allocating divert_blk for non-ethernet device ib0.8009
divert: no divert_blk to free, ib0.8009 not ethernet

To use MPI with partitions, you also have to configure it (in the configuration file). For MVAPICH you must use VIADEV_DEFAULT_PKEY_IX or VIADEV_DEFAULT_PKEY in the config file /usr/mpi/gcc/mvapich-1.0.0/etc/mvapich.conf. At CEA, I'm using VIADEV_DEFAULT_PKEY (pkey value) as we have nodes with different partition configurations. Hoping this will help you. Regards Philippe Gregoire CEA/DAM -------------- next part -------------- An HTML attachment was scrubbed... URL: From yevgenyp at mellanox.co.il Fri Apr 18 05:16:39 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Fri, 18 Apr 2008 15:16:39 +0300 Subject: [ofa-general][PATCH] mlx4: Moving db management to mlx4_core (MP support, Patch 1) Message-ID: <48089127.2040905@mellanox.co.il> >From ca3eb5aef54025b11c1f0b4d0abe9eef8e349048 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Thu, 17 Apr 2008 15:38:17 +0300 Subject: [PATCH] mlx4: Moving db management to mlx4_core mlx4_ib is no longer the only customer of mlx4_core. Thus the doorbell allocation was moved to the low level driver (same as buffer allocation).
Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/cq.c | 6 +- drivers/infiniband/hw/mlx4/doorbell.c | 131 +--------------------------- drivers/infiniband/hw/mlx4/mlx4_ib.h | 30 +----- drivers/infiniband/hw/mlx4/qp.c | 7 +- drivers/infiniband/hw/mlx4/srq.c | 6 +- drivers/net/mlx4/alloc.c | 157 +++++++++++++++++++++++++++++++++ drivers/net/mlx4/main.c | 3 + drivers/net/mlx4/mlx4.h | 3 + include/linux/mlx4/device.h | 50 +++++++++++ 9 files changed, 231 insertions(+), 162 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 3557e7e..3e7e6fe 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -204,7 +204,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector uar = &to_mucontext(context)->uar; } else { - err = mlx4_ib_db_alloc(dev, &cq->db, 1); + err = mlx4_db_alloc(dev->dev, dev->ib_dev.dma_device, &cq->db, 1); if (err) goto err_cq; @@ -250,7 +250,7 @@ err_mtt: err_db: if (!context) - mlx4_ib_db_free(dev, &cq->db); + mlx4_db_free(dev->dev, dev->ib_dev.dma_device, &cq->db); err_cq: kfree(cq); @@ -435,7 +435,7 @@ int mlx4_ib_destroy_cq(struct ib_cq *cq) ib_umem_release(mcq->umem); } else { mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe + 1); - mlx4_ib_db_free(dev, &mcq->db); + mlx4_db_free(dev->dev, dev->ib_dev.dma_device, &mcq->db); } kfree(mcq); diff --git a/drivers/infiniband/hw/mlx4/doorbell.c b/drivers/infiniband/hw/mlx4/doorbell.c index 1c36087..d17b36b 100644 --- a/drivers/infiniband/hw/mlx4/doorbell.c +++ b/drivers/infiniband/hw/mlx4/doorbell.c @@ -34,135 +34,10 @@ #include "mlx4_ib.h" -struct mlx4_ib_db_pgdir { - struct list_head list; - DECLARE_BITMAP(order0, MLX4_IB_DB_PER_PAGE); - DECLARE_BITMAP(order1, MLX4_IB_DB_PER_PAGE / 2); - unsigned long *bits[2]; - __be32 *db_page; - dma_addr_t db_dma; -}; - -static struct mlx4_ib_db_pgdir *mlx4_ib_alloc_db_pgdir(struct mlx4_ib_dev *dev) -{ - struct mlx4_ib_db_pgdir *pgdir; - - pgdir = kzalloc(sizeof 
*pgdir, GFP_KERNEL); - if (!pgdir) - return NULL; - - bitmap_fill(pgdir->order1, MLX4_IB_DB_PER_PAGE / 2); - pgdir->bits[0] = pgdir->order0; - pgdir->bits[1] = pgdir->order1; - pgdir->db_page = dma_alloc_coherent(dev->ib_dev.dma_device, - PAGE_SIZE, &pgdir->db_dma, - GFP_KERNEL); - if (!pgdir->db_page) { - kfree(pgdir); - return NULL; - } - - return pgdir; -} - -static int mlx4_ib_alloc_db_from_pgdir(struct mlx4_ib_db_pgdir *pgdir, - struct mlx4_ib_db *db, int order) -{ - int o; - int i; - - for (o = order; o <= 1; ++o) { - i = find_first_bit(pgdir->bits[o], MLX4_IB_DB_PER_PAGE >> o); - if (i < MLX4_IB_DB_PER_PAGE >> o) - goto found; - } - - return -ENOMEM; - -found: - clear_bit(i, pgdir->bits[o]); - - i <<= o; - - if (o > order) - set_bit(i ^ 1, pgdir->bits[order]); - - db->u.pgdir = pgdir; - db->index = i; - db->db = pgdir->db_page + db->index; - db->dma = pgdir->db_dma + db->index * 4; - db->order = order; - - return 0; -} - -int mlx4_ib_db_alloc(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db, int order) -{ - struct mlx4_ib_db_pgdir *pgdir; - int ret = 0; - - mutex_lock(&dev->pgdir_mutex); - - list_for_each_entry(pgdir, &dev->pgdir_list, list) - if (!mlx4_ib_alloc_db_from_pgdir(pgdir, db, order)) - goto out; - - pgdir = mlx4_ib_alloc_db_pgdir(dev); - if (!pgdir) { - ret = -ENOMEM; - goto out; - } - - list_add(&pgdir->list, &dev->pgdir_list); - - /* This should never fail -- we just allocated an empty page: */ - WARN_ON(mlx4_ib_alloc_db_from_pgdir(pgdir, db, order)); - -out: - mutex_unlock(&dev->pgdir_mutex); - - return ret; -} - -void mlx4_ib_db_free(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db) -{ - int o; - int i; - - mutex_lock(&dev->pgdir_mutex); - - o = db->order; - i = db->index; - - if (db->order == 0 && test_bit(i ^ 1, db->u.pgdir->order0)) { - clear_bit(i ^ 1, db->u.pgdir->order0); - ++o; - } - - i >>= o; - set_bit(i, db->u.pgdir->bits[o]); - - if (bitmap_full(db->u.pgdir->order1, MLX4_IB_DB_PER_PAGE / 2)) { - dma_free_coherent(dev->ib_dev.dma_device, 
PAGE_SIZE, - db->u.pgdir->db_page, db->u.pgdir->db_dma); - list_del(&db->u.pgdir->list); - kfree(db->u.pgdir); - } - - mutex_unlock(&dev->pgdir_mutex); -} - -struct mlx4_ib_user_db_page { - struct list_head list; - struct ib_umem *umem; - unsigned long user_virt; - int refcnt; -}; - int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, unsigned long virt, - struct mlx4_ib_db *db) + struct mlx4_db *db) { - struct mlx4_ib_user_db_page *page; + struct mlx4_user_db_page *page; struct ib_umem_chunk *chunk; int err = 0; @@ -202,7 +77,7 @@ out: return err; } -void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_ib_db *db) +void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_db *db) { mutex_lock(&context->db_page_mutex); diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 9e63732..e7514e4 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -43,24 +43,6 @@ #include #include -enum { - MLX4_IB_DB_PER_PAGE = PAGE_SIZE / 4 -}; - -struct mlx4_ib_db_pgdir; -struct mlx4_ib_user_db_page; - -struct mlx4_ib_db { - __be32 *db; - union { - struct mlx4_ib_db_pgdir *pgdir; - struct mlx4_ib_user_db_page *user_page; - } u; - dma_addr_t dma; - int index; - int order; -}; - struct mlx4_ib_ucontext { struct ib_ucontext ibucontext; struct mlx4_uar uar; @@ -88,7 +70,7 @@ struct mlx4_ib_cq { struct mlx4_cq mcq; struct mlx4_ib_cq_buf buf; struct mlx4_ib_cq_resize *resize_buf; - struct mlx4_ib_db db; + struct mlx4_db db; spinlock_t lock; struct mutex resize_mutex; struct ib_umem *umem; @@ -127,7 +109,7 @@ struct mlx4_ib_qp { struct mlx4_qp mqp; struct mlx4_buf buf; - struct mlx4_ib_db db; + struct mlx4_db db; struct mlx4_ib_wq rq; u32 doorbell_qpn; @@ -154,7 +136,7 @@ struct mlx4_ib_srq { struct ib_srq ibsrq; struct mlx4_srq msrq; struct mlx4_buf buf; - struct mlx4_ib_db db; + struct mlx4_db db; u64 *wrid; spinlock_t lock; int head; @@ -248,11 +230,9 @@ static inline 
struct mlx4_ib_ah *to_mah(struct ib_ah *ibah) return container_of(ibah, struct mlx4_ib_ah, ibah); } -int mlx4_ib_db_alloc(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db, int order); -void mlx4_ib_db_free(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db); int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, unsigned long virt, - struct mlx4_ib_db *db); -void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_ib_db *db); + struct mlx4_db *db); +void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_db *db); struct ib_mr *mlx4_ib_get_dma_mr(struct ib_pd *pd, int acc); int mlx4_ib_umem_write_mtt(struct mlx4_ib_dev *dev, struct mlx4_mtt *mtt, diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index b75efae..e65b8e4 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -514,7 +514,8 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, goto err; if (!init_attr->srq) { - err = mlx4_ib_db_alloc(dev, &qp->db, 0); + err = mlx4_db_alloc(dev->dev, dev->ib_dev.dma_device, + &qp->db, 0); if (err) goto err; @@ -580,7 +581,7 @@ err_buf: err_db: if (!pd->uobject && !init_attr->srq) - mlx4_ib_db_free(dev, &qp->db); + mlx4_db_free(dev->dev, dev->ib_dev.dma_device, &qp->db); err: return err; @@ -666,7 +667,7 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, kfree(qp->rq.wrid); mlx4_buf_free(dev->dev, qp->buf_size, &qp->buf); if (!qp->ibqp.srq) - mlx4_ib_db_free(dev, &qp->db); + mlx4_db_free(dev->dev, dev->ib_dev.dma_device, &qp->db); } } diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c index beaa3b0..936dc88 100644 --- a/drivers/infiniband/hw/mlx4/srq.c +++ b/drivers/infiniband/hw/mlx4/srq.c @@ -129,7 +129,7 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd, if (err) goto err_mtt; } else { - err = mlx4_ib_db_alloc(dev, &srq->db, 0); + err = mlx4_db_alloc(dev->dev, dev->ib_dev.dma_device, &srq->db, 0); 
if (err) goto err_srq; @@ -200,7 +200,7 @@ err_buf: err_db: if (!pd->uobject) - mlx4_ib_db_free(dev, &srq->db); + mlx4_db_free(dev->dev, dev->ib_dev.dma_device, &srq->db); err_srq: kfree(srq); @@ -267,7 +267,7 @@ int mlx4_ib_destroy_srq(struct ib_srq *srq) kfree(msrq->wrid); mlx4_buf_free(dev->dev, msrq->msrq.max << msrq->msrq.wqe_shift, &msrq->buf); - mlx4_ib_db_free(dev, &msrq->db); + mlx4_db_free(dev->dev, dev->ib_dev.dma_device, &msrq->db); } kfree(msrq); diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 75ef9d0..b6b00eb 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -196,3 +196,160 @@ void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf) } } EXPORT_SYMBOL_GPL(mlx4_buf_free); + +static struct mlx4_db_pgdir *mlx4_alloc_db_pgdir(struct device *dma_device) +{ + struct mlx4_db_pgdir *pgdir; + + pgdir = kzalloc(sizeof *pgdir, GFP_KERNEL); + if (!pgdir) + return NULL; + + bitmap_fill(pgdir->order1, MLX4_DB_PER_PAGE / 2); + pgdir->bits[0] = pgdir->order0; + pgdir->bits[1] = pgdir->order1; + pgdir->db_page = dma_alloc_coherent(dma_device, PAGE_SIZE, + &pgdir->db_dma, GFP_KERNEL); + if (!pgdir->db_page) { + kfree(pgdir); + return NULL; + } + + return pgdir; +} + +static int mlx4_alloc_db_from_pgdir(struct mlx4_db_pgdir *pgdir, + struct mlx4_db *db, int order) +{ + int o; + int i; + + for (o = order; o <= 1; ++o) { + i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); + if (i < MLX4_DB_PER_PAGE >> o) + goto found; + } + + return -ENOMEM; + +found: + clear_bit(i, pgdir->bits[o]); + + i <<= o; + + if (o > order) + set_bit(i ^ 1, pgdir->bits[order]); + + db->u.pgdir = pgdir; + db->index = i; + db->db = pgdir->db_page + db->index; + db->dma = pgdir->db_dma + db->index * 4; + db->order = order; + + return 0; +} + +int mlx4_db_alloc(struct mlx4_dev *dev, struct device *dma_device, + struct mlx4_db *db, int order) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_db_pgdir *pgdir; + int ret = 0; + 
+ mutex_lock(&priv->pgdir_mutex); + + list_for_each_entry(pgdir, &priv->pgdir_list, list) + if (!mlx4_alloc_db_from_pgdir(pgdir, db, order)) + goto out; + + pgdir = mlx4_alloc_db_pgdir(dma_device); + if (!pgdir) { + ret = -ENOMEM; + goto out; + } + + list_add(&pgdir->list, &priv->pgdir_list); + + /* This should never fail -- we just allocated an empty page: */ + WARN_ON(mlx4_alloc_db_from_pgdir(pgdir, db, order)); + +out: + mutex_unlock(&priv->pgdir_mutex); + + return ret; +} +EXPORT_SYMBOL_GPL(mlx4_db_alloc); + +void mlx4_db_free(struct mlx4_dev *dev, struct device *dma_device, + struct mlx4_db *db) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + int o; + int i; + + mutex_lock(&priv->pgdir_mutex); + + o = db->order; + i = db->index; + + if (db->order == 0 && test_bit(i ^ 1, db->u.pgdir->order0)) { + clear_bit(i ^ 1, db->u.pgdir->order0); + ++o; + } + i >>= o; + set_bit(i, db->u.pgdir->bits[o]); + + if (bitmap_full(db->u.pgdir->order1, MLX4_DB_PER_PAGE / 2)) { + dma_free_coherent(dma_device, PAGE_SIZE, + db->u.pgdir->db_page, db->u.pgdir->db_dma); + list_del(&db->u.pgdir->list); + kfree(db->u.pgdir); + } + + mutex_unlock(&priv->pgdir_mutex); +} +EXPORT_SYMBOL_GPL(mlx4_db_free); + +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size, int max_direct) +{ + int err; + + err = mlx4_db_alloc(dev, dma_device, &wqres->db, 1); + if (err) + return err; + *wqres->db.db = 0; + + if (mlx4_buf_alloc(dev, size, max_direct, &wqres->buf)) { + err = -ENOMEM; + goto err_db; + } + + err = mlx4_mtt_init(dev, wqres->buf.npages, wqres->buf.page_shift, + &wqres->mtt); + if (err) + goto err_buf; + err = mlx4_buf_write_mtt(dev, &wqres->mtt, &wqres->buf); + if (err) + goto err_mtt; + + return 0; + +err_mtt: + mlx4_mtt_cleanup(dev, &wqres->mtt); +err_buf: + mlx4_buf_free(dev, size, &wqres->buf); +err_db: + mlx4_db_free(dev, dma_device, &wqres->db); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_alloc_hwq_res); + +void 
mlx4_free_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size) +{ + mlx4_mtt_cleanup(dev, &wqres->mtt); + mlx4_buf_free(dev, size, &wqres->buf); + mlx4_db_free(dev, dma_device, &wqres->db); +} +EXPORT_SYMBOL_GPL(mlx4_free_hwq_res); diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 49a4aca..3ab9034 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -798,6 +798,9 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) INIT_LIST_HEAD(&priv->ctx_list); spin_lock_init(&priv->ctx_lock); + INIT_LIST_HEAD(&priv->pgdir_list); + mutex_init(&priv->pgdir_mutex); + /* * Now reset the HCA before we touch the PCI capabilities or * attempt a firmware command, since a boot ROM may have left diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 7333681..a4023c2 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -257,6 +257,9 @@ struct mlx4_priv { struct list_head ctx_list; spinlock_t ctx_lock; + struct list_head pgdir_list; + struct mutex pgdir_mutex; + struct mlx4_fw fw; struct mlx4_cmd cmd; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index ff7df1a..0cb92ee 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -37,6 +37,8 @@ #include #include +#include + #include enum { @@ -208,6 +210,44 @@ struct mlx4_mtt { int page_shift; }; +enum { + MLX4_DB_PER_PAGE = PAGE_SIZE / 4 +}; + +struct mlx4_db_pgdir { + struct list_head list; + DECLARE_BITMAP(order0, MLX4_DB_PER_PAGE); + DECLARE_BITMAP(order1, MLX4_DB_PER_PAGE / 2); + unsigned long *bits[2]; + __be32 *db_page; + dma_addr_t db_dma; +}; + +struct mlx4_user_db_page { + struct list_head list; + struct ib_umem *umem; + unsigned long user_virt; + int refcnt; +}; + +struct mlx4_db { + __be32 *db; + union { + struct mlx4_db_pgdir *pgdir; + struct mlx4_user_db_page *user_page; + } u; + dma_addr_t dma; + int index; + int order; +}; + + +struct 
mlx4_hwq_resources { + struct mlx4_db db; + struct mlx4_mtt mtt; + struct mlx4_buf buf; +}; + struct mlx4_mr { struct mlx4_mtt mtt; u64 iova; @@ -341,6 +381,16 @@ int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, struct mlx4_buf *buf); +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size, int max_direct); +void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, + struct device *dma_device, int size); + +int mlx4_db_alloc(struct mlx4_dev *dev, struct device *dma_device, + struct mlx4_db *db, int order); +void mlx4_db_free(struct mlx4_dev *dev, struct device *dma_device, + struct mlx4_db *db); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -- 1.5.4 From yevgenyp at mellanox.co.il Fri Apr 18 05:20:07 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Fri, 18 Apr 2008 15:20:07 +0300 Subject: [ofa-general][PATCH] mlx4: Qp range reservation (MP support, Patch 2) Message-ID: <480891F7.8090807@mellanox.co.il> >From 82401698d675e97aca4d3430a0f8a0fea893c64f Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Thu, 17 Apr 2008 15:40:59 +0300 Subject: [PATCH] mlx4: Qp range reservation Prior to allocating a QP, one needs to reserve an aligned range of QPs. This change enables allocation of consecutive QPs. 
Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/qp.c | 9 ++++ drivers/net/mlx4/alloc.c | 99 ++++++++++++++++++++++++++++++++++++-- drivers/net/mlx4/mlx4.h | 6 ++ drivers/net/mlx4/qp.c | 44 ++++++++++++----- include/linux/mlx4/device.h | 5 ++- 5 files changed, 143 insertions(+), 20 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index e65b8e4..c21a9a3 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -545,6 +545,11 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, } } + if (!sqpn) + err = mlx4_qp_reserve_range(dev->dev, 1, 1, &sqpn); + if (err) + goto err_wrid; + err = mlx4_qp_alloc(dev->dev, sqpn, &qp->mqp); if (err) goto err_wrid; @@ -655,6 +660,10 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, mlx4_ib_unlock_cqs(send_cq, recv_cq); mlx4_qp_free(dev->dev, &qp->mqp); + + if (!is_sqp(dev, qp)) + mlx4_qp_release_range(dev->dev, qp->mqp.qpn, 1); + mlx4_mtt_cleanup(dev->dev, &qp->mtt); if (is_user) { diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index b6b00eb..52b4af3 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -44,15 +44,18 @@ u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap) spin_lock(&bitmap->lock); - obj = find_next_zero_bit(bitmap->table, bitmap->max, bitmap->last); - if (obj >= bitmap->max) { + obj = find_next_zero_bit(bitmap->table, bitmap->effective_max, + bitmap->last); + if (obj >= bitmap->effective_max) { bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; - obj = find_first_zero_bit(bitmap->table, bitmap->max); + obj = find_first_zero_bit(bitmap->table, bitmap->effective_max); } - if (obj < bitmap->max) { + if (obj < bitmap->effective_max) { set_bit(obj, bitmap->table); - bitmap->last = (obj + 1) & (bitmap->max - 1); + bitmap->last = (obj + 1); + if (bitmap->last == bitmap->effective_max) + bitmap->last = 0; obj |= bitmap->top; } else obj = -1; @@ -73,7 
+76,83 @@ void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj) spin_unlock(&bitmap->lock); } -int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved) +static unsigned long find_aligned_range(unsigned long *bitmap, + u32 start, u32 nbits, + int len, int align) +{ + unsigned long end, i; + +again: + start = ALIGN(start, align); + while ((start < nbits) && test_bit(start, bitmap)) + start += align; + if (start >= nbits) + return -1; + + end = start+len; + if (end > nbits) + return -1; + for (i = start+1; i < end; i++) { + if (test_bit(i, bitmap)) { + start = i+1; + goto again; + } + } + return start; +} + +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align) +{ + u32 obj, i; + + if (likely(cnt == 1 && align == 1)) + return mlx4_bitmap_alloc(bitmap); + + spin_lock(&bitmap->lock); + + obj = find_aligned_range(bitmap->table, bitmap->last, + bitmap->effective_max, cnt, align); + if (obj >= bitmap->effective_max) { + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + obj = find_aligned_range(bitmap->table, 0, + bitmap->effective_max, + cnt, align); + } + + if (obj < bitmap->effective_max) { + for (i = 0; i < cnt; i++) + set_bit(obj+i, bitmap->table); + if (obj == bitmap->last) { + bitmap->last = (obj + cnt); + if (bitmap->last >= bitmap->effective_max) + bitmap->last = 0; + } + obj |= bitmap->top; + } else + obj = -1; + + spin_unlock(&bitmap->lock); + + return obj; +} + +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt) +{ + u32 i; + + obj &= bitmap->max - 1; + + spin_lock(&bitmap->lock); + for (i = 0; i < cnt; i++) + clear_bit(obj+i, bitmap->table); + bitmap->last = min(bitmap->last, obj); + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + spin_unlock(&bitmap->lock); +} + +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max) { int i; @@ -85,6 +164,7 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 
num, u32 mask, u32 reserved bitmap->top = 0; bitmap->max = num; bitmap->mask = mask; + bitmap->effective_max = effective_max; spin_lock_init(&bitmap->lock); bitmap->table = kzalloc(BITS_TO_LONGS(num) * sizeof (long), GFP_KERNEL); if (!bitmap->table) @@ -96,6 +176,13 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved return 0; } +int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved) +{ + return mlx4_bitmap_init_with_effective_max(bitmap, num, mask, + reserved, num); +} + void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap) { kfree(bitmap->table); diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index a4023c2..2c69d46 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -111,6 +111,7 @@ struct mlx4_bitmap { u32 last; u32 top; u32 max; + u32 effective_max; u32 mask; spinlock_t lock; unsigned long *table; @@ -287,7 +288,12 @@ static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev) u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap); void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj); +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align); +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt); int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved); +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max); void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); int mlx4_reset(struct mlx4_dev *dev); diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c index fa24e65..dff8e66 100644 --- a/drivers/net/mlx4/qp.c +++ b/drivers/net/mlx4/qp.c @@ -147,19 +147,42 @@ int mlx4_qp_modify(struct mlx4_dev *dev, struct mlx4_mtt *mtt, } EXPORT_SYMBOL_GPL(mlx4_qp_modify); -int mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp) +int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base) +{ + struct mlx4_priv 
*priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + int qpn; + + qpn = mlx4_bitmap_alloc_range(&qp_table->bitmap, cnt, align); + if (qpn == -1) + return -ENOMEM; + + *base = qpn; + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_reserve_range); + +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + if (base_qpn < dev->caps.sqp_start + 8) + return; + + mlx4_bitmap_free_range(&qp_table->bitmap, base_qpn, cnt); +} +EXPORT_SYMBOL_GPL(mlx4_qp_release_range); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_qp_table *qp_table = &priv->qp_table; int err; - if (sqpn) - qp->qpn = sqpn; - else { - qp->qpn = mlx4_bitmap_alloc(&qp_table->bitmap); - if (qp->qpn == -1) - return -ENOMEM; - } + if (!qpn) + return -EINVAL; + + qp->qpn = qpn; err = mlx4_table_get(dev, &qp_table->qp_table, qp->qpn); if (err) @@ -208,9 +231,6 @@ err_put_qp: mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); err_out: - if (!sqpn) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); - return err; } EXPORT_SYMBOL_GPL(mlx4_qp_alloc); @@ -240,8 +260,6 @@ void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp) mlx4_table_put(dev, &qp_table->auxc_table, qp->qpn); mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); - if (qp->qpn >= dev->caps.sqp_start + 8) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); } EXPORT_SYMBOL_GPL(mlx4_qp_free); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 0cb92ee..a088c63 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -395,7 +395,10 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -int mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp); +int 
mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp); void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp); int mlx4_srq_alloc(struct mlx4_dev *dev, u32 pdn, struct mlx4_mtt *mtt, -- 1.5.4 From terrywatson at live.com Fri Apr 18 02:38:17 2008 From: terrywatson at live.com (terry watson) Date: Fri, 18 Apr 2008 09:38:17 +0000 Subject: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM? In-Reply-To: <48084F4E.3020705@cea.fr> References: <48084F4E.3020705@cea.fr> Message-ID: Thanks for the response. The environment I am testing has two clusters and one switch, with the subnet manager running from the switch. Half the nodes are in one partition and half in the other (ignoring 0xffff), call them partitions A and B. I have access to one node in partition A as root and would like to be able to reconfigure that node locally, and with no access to the switch subnet manager configuration, to be able to access nodes in partition B. After some reading I believe that IBIS from IBUtils should allow me to alter the local p_key table and therefore allow me to access nodes on partition B. I cannot test this until I am on-site and I am formulating a strategy before arrival. If it does not work this way it would be useful to know in advance. MPI is used rather than IPoIB. If my approach is flawed I would appreciate it if someone could point this out. ________________________________ > Date: Fri, 18 Apr 2008 09:35:42 +0200 > From: philippe.gregoire at cea.fr > To: terrywatson at live.com > CC: general at lists.openfabrics.org > Subject: Re: [ofa-general] Is IBIS only for querying OpenSM? > > terry watson a écrit : > > Hi all, > > I will be performing some testing of partitioning used as a security control. 
Am I right in believing that IBIS will be able to set partition table values of the local compute node I am logged on to, even though they are not using OpenSM, but rather a SM on a switch? Could I then attempt to access a partition that I was originally excluded from accessing? > > I am new to Infiniband technology and would also appreciate a response from an expert who has views on the strength of the security that partitioning provides in separating two clusters that should have no interaction whatsoever. > > Thanks, > Dave > _________________________________________________________________ > Discover the new Windows Vista > http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE_______________________________________________ > general mailing list > general at lists.openfabrics.org _________________________________________________________________ News, entertainment and everything you care about at Live.com. Get it now! http://www.live.com/getstarted.aspx From hrosenstock at xsigo.com Fri Apr 18 07:37:51 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 18 Apr 2008 07:37:51 -0700 Subject: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM? In-Reply-To: References: <48084F4E.3020705@cea.fr> Message-ID: <1208529471.26936.303.camel@hrosenstock-ws.xsigo.com> Terry, On Fri, 2008-04-18 at 09:38 +0000, terry watson wrote: > Thanks for the response. 
The environment I am testing has two clusters and one switch, > with the subnet manager running from the switch. Half the nodes are in one partition and > half in the other (ignoring 0xffff), call them partitions A and B. I have access to one > node in partition A as root and would like to be able to reconfigure that node locally, > and with no access to the switch subnet manager configuration, to be able to access nodes > in partition B. In general, this is not a good idea IMO. As Philippe wrote, the SM (is supposed to) own the writing of those tables (rather than some low-level diag utility). Even if you modify the local PKey table, it is possible for the SM to overwrite it. Also, there are several other ramifications of this depending on how the SM deals with partitions. Even if you change things locally, that may not be sufficient, as the peer switch port may do partition filtering, so you may need to change that too, and possibly more PKey tables elsewhere in the network, depending on what your SM does. Also, there are SA responses that depend on the SM having correct knowledge (like PathRecords and others), so the end node may not get any response on that partition for certain things. > After some reading I believe that IBIS from IBUtils should allow me to alter the > local p_key table and therefore allow me to access nodes on partition B. Yes, but it may take more than this for it to work, depending on your SM. > I cannot test this until I am on-site and I am formulating a strategy before arrival. > If it does not work this way it would be useful to know in advance. MPI is used rather than IPoIB. Some MPIs use out-of-band mechanisms to create connections, so the SA issues may not apply there; but I think the partition ones might, and they are SM dependent, so your mileage may vary... > If my approach is flawed I would appreciate it if someone could point this out. The proper way to do this is by reconfiguring your SM. 
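For reference, with OpenSM that reconfiguration is done through its partition configuration file; the switch-embedded SM in your setup will have its own equivalent mechanism. A hypothetical sketch in OpenSM partitions.conf syntax (the partition names and port GUIDs below are invented placeholders):

```
# /etc/opensm/partitions.conf (sketch; GUIDs are placeholders)
Default=0x7fff, ipoib : ALL, SELF=full;
PartA=0x8001 : 0x0002c90200001111=full, 0x0002c90200002222=full;
PartB=0x8002 : 0x0002c90200003333=full, 0x0002c90200004444=full;
```

Each line is PartitionName=PKey : port-membership-list; ports listed with =full are full members, and the SM programs the corresponding PKey tables from this file.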
-- Hal > ________________________________ > > Date: Fri, 18 Apr 2008 09:35:42 +0200 > > From: philippe.gregoire at cea.fr > > To: terrywatson at live.com > > CC: general at lists.openfabrics.org > > Subject: Re: [ofa-general] Is IBIS only for querying OpenSM? > > > > terry watson a écrit : > > > > Hi all, > > > > I will be performing some testing of partitioning used as a security control. Am I right in believing that IBIS will be able to set partition table values of the local compute node I am logged on to, even though they are not using OpenSM, but rather a SM on a switch? Could I then attempt to access a partition that I was originally excluded from accessing? > > > > I am new to Infiniband technology and would also appreciate a response from an expert who has views on the strength of the security that partitioning provides in separating two clusters that should have no interaction whatsoever. > > > > Thanks, > > Dave > > _________________________________________________________________ > > Discover the new Windows Vista > > http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE_______________________________________________ > > general mailing list > > general at lists.openfabrics.org > _________________________________________________________________ > News, entertainment and everything you care about at Live.com. Get it now! 
> http://www.live.com/getstarted.aspx_______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Fri Apr 18 09:19:35 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 18 Apr 2008 09:19:35 -0700 Subject: [ofa-general][PATCH] mlx4: Moving db management to mlx4_core (MP support, Patch 1) In-Reply-To: <48089127.2040905@mellanox.co.il> (Yevgeny Petrilin's message of "Fri, 18 Apr 2008 15:16:39 +0300") References: <48089127.2040905@mellanox.co.il> Message-ID: > + INIT_LIST_HEAD(&priv->pgdir_list); > + mutex_init(&priv->pgdir_mutex); Your patch adds pgdir_list to core but doesn't remove it from mlx4_ib. > - err = mlx4_ib_db_alloc(dev, &cq->db, 1); > + err = mlx4_db_alloc(dev->dev, dev->ib_dev.dma_device, &cq->db, 1); > +int mlx4_db_alloc(struct mlx4_dev *dev, struct device *dma_device, > + struct mlx4_db *db, int order) I must be missing something but why do you add the dma_device parameter here? When would a consumer ever want to pass something other than dev->pdev->dev? > +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, > + struct device *dma_device, int size, int max_direct) This is adding a separate API beyond just moving the doorbell stuff to mlx4_core. Please separate this still further into another patch. Can mlx4_ib use this interface too? - R. From terrywatson at live.com Fri Apr 18 08:25:31 2008 From: terrywatson at live.com (terry watson) Date: Fri, 18 Apr 2008 15:25:31 +0000 Subject: ***SPAM*** RE: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM? In-Reply-To: <1208529471.26936.303.camel@hrosenstock-ws.xsigo.com> References: <48084F4E.3020705@cea.fr> <1208529471.26936.303.camel@hrosenstock-ws.xsigo.com> Message-ID: Thanks Hal. 
I appreciate using the SM is the correct means of controlling partitioning; however, the testing I am performing is assessing security vulnerabilities. In this case, the two clusters are separated by partitioning only and I am seeking to assess the ability of a user to obtain unauthorised access to one cluster from the other. The requirement for the vendor building the two clusters was that they were isolated from each other. They have chosen to use one switch and I have to assess if this provides adequate isolation, as per the client's security requirements. At this stage of my investigation, I do not believe partitioning on a switch provides adequate separation / isolation to be used as a security control and two physical switches will need to be used to provide the complete isolation that is required. But my task is to prove this to justify the expense.... :) I value any comments or input on this topic. ---------------------------------------- > Subject: Re: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM? > From: hrosenstock at xsigo.com > To: terrywatson at live.com > CC: philippe.gregoire at cea.fr; general at lists.openfabrics.org > Date: Fri, 18 Apr 2008 07:37:51 -0700 > > Terry, > > On Fri, 2008-04-18 at 09:38 +0000, terry watson wrote: >> Thanks for the response. The environment I am testing has two clusters and one switch, >> with the subnet manager running from the switch. Half the nodes are in one partition and >> half in the other (ignoring 0xffff), call them partitions A and B. I have access to one >> node in partition A as root and would like to be able to reconfigure that node locally, >> and with no access to the switch subnet manager configuration, to be able to access nodes >> in partition B. > > In general, this is not a good idea IMO. As Philippe wrote, the SM (is > supposed to) own the writing of those tables (rather than some low level > diag utility). 
Even if you modify the local PKey table, it is possible > for the SM to overwrite this. Also, there are several other > ramifications of this depending on how the SM deals with partitions. > Even if you change things locally, that may not be sufficient as the > peer switch port may do partition filtering so that may need to change > that too and possible more PKey tables in the network depending on what > your SM does. Also, there are SA responses that depend on the SM having > correct knowledge (like PathRecords and others) so the end node may not > get any response on that partition for certain things. > >> After some reading I believe that IBIS from IBUtils should allow me to alter the >> local p_key table and therefore allow me to access nodes on partition B. > > Yes but it may take more than this for it to work depending on your SM. > >> I cannot test this until I am on-site and I am formulating a strategy before arrival. >> If it does not work this way it would be useful to know in advance. MPI is used rather than IPoIB. > > Some MPIs use out of band mechanisms to create connections so the SA > issues may not apply there; but I think the partition ones might and are > SM dependent so your mileage may vary... > >> If my approach is flawed I would appreciate it if someone could point this out. > > The proper way to do this is by reconfiguring your SM. > > -- Hal > >> ________________________________ >>> Date: Fri, 18 Apr 2008 09:35:42 +0200 >>> From: philippe.gregoire at cea.fr >>> To: terrywatson at live.com >>> CC: general at lists.openfabrics.org >>> Subject: Re: [ofa-general] Is IBIS only for querying OpenSM? >>> >>> terry watson a écrit : >>> >>> Hi all, >>> >>> I will be performing some testing of partitioning used as a security control. Am I right in believing that IBIS will be able to set partition table values of the local compute node I am logged on to, even though they are not using OpenSM, but rather a SM on a switch? 
Could I then attempt to access a partition that I was originally excluded from accessing? >>> >>> I am new to Infiniband technology and would also appreciate a response from an expert who has views on the strength of the security that partitioning provides in separating two clusters that should have no interaction whatsoever. >>> >>> Thanks, >>> Dave >>> _________________________________________________________________ >>> Discover the new Windows Vista >>> http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE_______________________________________________ >>> general mailing list >>> general at lists.openfabrics.org >> _________________________________________________________________ >> News, entertainment and everything you care about at Live.com. Get it now! >> http://www.live.com/getstarted.aspx_______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > _________________________________________________________________ Connect to the next generation of MSN Messenger  http://imagine-msn.com/messenger/launch80/default.aspx?locale=en-us&source=wlmailtagline From hrosenstock at xsigo.com Fri Apr 18 12:12:18 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Fri, 18 Apr 2008 12:12:18 -0700 Subject: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM? In-Reply-To: References: <48084F4E.3020705@cea.fr> <1208529471.26936.303.camel@hrosenstock-ws.xsigo.com> Message-ID: <1208545938.26936.365.camel@hrosenstock-ws.xsigo.com> Terry, On Fri, 2008-04-18 at 15:25 +0000, terry watson wrote: > Thanks Hal. I appreciate using the SM is the correct means of controlling partitioning; however, the testing I am performing is assessing security vulnerabilities. 
In this case, the two clusters are separated by partitioning only and I am seeking to assess the ability of a user to obtain unauthorised access to one cluster from the other. The requirement for the vendor building the two clusters was that they were isolated from each other. They have chosen to use one switch and I have to assess if this provides adequate isolation, as per the client's security requirements. > > At this stage of my investigation, I do not believe partitioning on a switch provides adequate separation / isolation to be used as a security control and two physical switches will need to be used to provide the complete isolation that is required. But my task is to prove this to justify the expense.... :) > > I value any comments or input on this topic. One pertinent thing here is whether a MKey manager is supported in the SM, and if so, what level of MKeying is used. Sufficient MKey protection with a sophisticated manager could make the updates of such PKey tables difficult but not impossible. Currently, OpenSM does not support an MKey manager but one is being proposed for the next OFED cycle. Currently, OpenSM supports a static configured MKey and MKey lease period which could make things marginally better if you are concerned with rogue updates like this. Not sure about the third party (vendor) SMs in this regard. Contact your vendor if this is of interest. -- Hal > ---------------------------------------- > > Subject: Re: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM? > > From: hrosenstock at xsigo.com > > To: terrywatson at live.com > > CC: philippe.gregoire at cea.fr; general at lists.openfabrics.org > > Date: Fri, 18 Apr 2008 07:37:51 -0700 > > > > Terry, > > > > On Fri, 2008-04-18 at 09:38 +0000, terry watson wrote: > >> Thanks for the response. The environment I am testing has two clusters and one switch, > >> with the subnet manager running from the switch. 
Half the nodes are in one partition and > >> half in the other (ignoring 0xffff), call them partitions A and B. I have access to one > >> node in partition A as root and would like to be able to reconfigure that node locally, > >> and with no access to the switch subnet manager configuration, to be able to access nodes > >> in partition B. > > > > In general, this is not a good idea IMO. As Philippe wrote, the SM (is > > supposed to) own the writing of those tables (rather than some low level > > diag utility). Even if you modify the local PKey table, it is possible > > for the SM to overwrite this. Also, there are several other > > ramifications of this depending on how the SM deals with partitions. > > Even if you change things locally, that may not be sufficient as the > > peer switch port may do partition filtering so that may need to change > > that too and possible more PKey tables in the network depending on what > > your SM does. Also, there are SA responses that depend on the SM having > > correct knowledge (like PathRecords and others) so the end node may not > > get any response on that partition for certain things. > > > >> After some reading I believe that IBIS from IBUtils should allow me to alter the > >> local p_key table and therefore allow me to access nodes on partition B. > > > > Yes but it may take more than this for it to work depending on your SM. > > > >> I cannot test this until I am on-site and I am formulating a strategy before arrival. > >> If it does not work this way it would be useful to know in advance. MPI is used rather than IPoIB. > > > > Some MPIs use out of band mechanisms to create connections so the SA > > issues may not apply there; but I think the partition ones might and are > > SM dependent so your mileage may vary... > > > >> If my approach is flawed I would appreciate it if someone could point this out. > > > > The proper way to do this is by reconfiguring your SM. 
> > > > -- Hal > > > >> ________________________________ > >>> Date: Fri, 18 Apr 2008 09:35:42 +0200 > >>> From: philippe.gregoire at cea.fr > >>> To: terrywatson at live.com > >>> CC: general at lists.openfabrics.org > >>> Subject: Re: [ofa-general] Is IBIS only for querying OpenSM? > >>> > >>> terry watson a écrit : > >>> > >>> Hi all, > >>> > >>> I will be performing some testing of partitioning used as a security control. Am I right in believing that IBIS will be able to set partition table values of the local compute node I am logged on to, even though they are not using OpenSM, but rather a SM on a switch? Could I then attempt to access a partition that I was originally excluded from accessing? > >>> > >>> I am new to Infiniband technology and would also appreciate a response from an expert who has views on the strength of the security that partitioning provides in separating two clusters that should have no interaction whatsoever. > >>> > >>> Thanks, > >>> Dave > >>> _________________________________________________________________ > >>> Discover the new Windows Vista > >>> http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE_______________________________________________ > >>> general mailing list > >>> general at lists.openfabrics.org > >> _________________________________________________________________ > >> News, entertainment and everything you care about at Live.com. Get it now! 
> >> http://www.live.com/getstarted.aspx_______________________________________________ > >> general mailing list > >> general at lists.openfabrics.org > >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > >> > >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > > _________________________________________________________________ > Connect to the next generation of MSN Messenger > http://imagine-msn.com/messenger/launch80/default.aspx?locale=en-us&source=wlmailtagline From rdreier at cisco.com Fri Apr 18 13:08:53 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 18 Apr 2008 13:08:53 -0700 Subject: [ofa-general] [PATCH/RFC] RDMA/nes: Remove unneeded function declarations Message-ID: Remove redundant static declarations of functions that are defined before they are used in the source. Signed-off-by: Roland Dreier --- diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index b00b0e3..b046262 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -96,12 +96,6 @@ static LIST_HEAD(nes_dev_list); atomic_t qps_destroyed; -static void nes_print_macaddr(struct net_device *netdev); -static irqreturn_t nes_interrupt(int, void *); -static int __devinit nes_probe(struct pci_dev *, const struct pci_device_id *); -static void __devexit nes_remove(struct pci_dev *); -static int __init nes_init_module(void); -static void __exit nes_exit_module(void); static unsigned int ee_flsh_adapter; static unsigned int sysfs_nonidx_addr; static unsigned int sysfs_idx_addr; diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 3416664..01cd0ef 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -92,15 +92,6 @@ static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN; static int debug = -1; - -static int nes_netdev_open(struct net_device *); 
-static int nes_netdev_stop(struct net_device *); -static int nes_netdev_start_xmit(struct sk_buff *, struct net_device *); -static struct net_device_stats *nes_netdev_get_stats(struct net_device *); -static void nes_netdev_tx_timeout(struct net_device *); -static int nes_netdev_set_mac_address(struct net_device *, void *); -static int nes_netdev_change_mtu(struct net_device *, int); - /** * nes_netdev_poll */ From rdreier at cisco.com Fri Apr 18 13:12:25 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 18 Apr 2008 13:12:25 -0700 Subject: [ofa-general][PATCH] mlx4: Qp range reservation (MP support, Patch 2) In-Reply-To: <480891F7.8090807@mellanox.co.il> (Yevgeny Petrilin's message of "Fri, 18 Apr 2008 15:20:07 +0300") References: <480891F7.8090807@mellanox.co.il> Message-ID: > +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, > + u32 num, u32 mask, u32 reserved, > + u32 effective_max) This patch adds effective_max stuff but I don't see how it's used anywhere?? - R. From gstreiff at NetEffect.com Fri Apr 18 14:42:51 2008 From: gstreiff at NetEffect.com (Glenn Streiff) Date: Fri, 18 Apr 2008 16:42:51 -0500 Subject: [ofa-general] RE: [PATCH/RFC] RDMA/nes: Remove unneeded function declarations In-Reply-To: Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC07950108@venom2> Acked-by: Glenn Streiff Thanks. > -----Original Message----- > From: Roland Dreier [mailto:rdreier at cisco.com] > Sent: Friday, April 18, 2008 3:09 PM > To: general at lists.openfabrics.org > Cc: Faisal Latif; Nishi Gupta; Glenn Streiff > Subject: [PATCH/RFC] RDMA/nes: Remove unneeded function declarations > > > Remove redundant static declarations of functions that are defined > before they are used in the source. 
> > Signed-off-by: Roland Dreier > --- > diff --git a/drivers/infiniband/hw/nes/nes.c > b/drivers/infiniband/hw/nes/nes.c > index b00b0e3..b046262 100644 > --- a/drivers/infiniband/hw/nes/nes.c > +++ b/drivers/infiniband/hw/nes/nes.c > @@ -96,12 +96,6 @@ static LIST_HEAD(nes_dev_list); > > atomic_t qps_destroyed; > > -static void nes_print_macaddr(struct net_device *netdev); > -static irqreturn_t nes_interrupt(int, void *); > -static int __devinit nes_probe(struct pci_dev *, const > struct pci_device_id *); > -static void __devexit nes_remove(struct pci_dev *); > -static int __init nes_init_module(void); > -static void __exit nes_exit_module(void); > static unsigned int ee_flsh_adapter; > static unsigned int sysfs_nonidx_addr; > static unsigned int sysfs_idx_addr; > diff --git a/drivers/infiniband/hw/nes/nes_nic.c > b/drivers/infiniband/hw/nes/nes_nic.c > index 3416664..01cd0ef 100644 > --- a/drivers/infiniband/hw/nes/nes_nic.c > +++ b/drivers/infiniband/hw/nes/nes_nic.c > @@ -92,15 +92,6 @@ static const u32 default_msg = > NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK > | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN; > static int debug = -1; > > - > -static int nes_netdev_open(struct net_device *); > -static int nes_netdev_stop(struct net_device *); > -static int nes_netdev_start_xmit(struct sk_buff *, struct > net_device *); > -static struct net_device_stats *nes_netdev_get_stats(struct > net_device *); > -static void nes_netdev_tx_timeout(struct net_device *); > -static int nes_netdev_set_mac_address(struct net_device *, void *); > -static int nes_netdev_change_mtu(struct net_device *, int); > - > /** > * nes_netdev_poll > */ > From rdreier at cisco.com Fri Apr 18 14:54:59 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 18 Apr 2008 14:54:59 -0700 Subject: [ofa-general] Re: [PATCH v2] Add enum strings and *_str functions for enums In-Reply-To: <20080415133548.414aeaea.weiny2@llnl.gov> (Ira Weiny's message of "Tue, 15 Apr 2008 13:35:48 -0700") References: 
<20080415094750.35afc0e5.weiny2@llnl.gov> <20080415133548.414aeaea.weiny2@llnl.gov> Message-ID: Thanks, I added a man page and changed things a little and committed the following: commit 1c0b7ac0a6bbbe4d246ef4cf50ae31bde4929ba3 Author: Ira Weiny Date: Tue Apr 15 13:35:48 2008 -0700 Add functions to convert enum values to strings Add ibv_xxx_str() functions to convert node type, port state, event type and wc status enum values to strings. Signed-off-by: Ira K. Weiny Signed-off-by: Roland Dreier diff --git a/Makefile.am b/Makefile.am index 705b184..9b05306 100644 --- a/Makefile.am +++ b/Makefile.am @@ -9,7 +9,8 @@ src_libibverbs_la_CFLAGS = $(AM_CFLAGS) -DIBV_CONFIG_DIR=\"$(sysconfdir)/libibve libibverbs_version_script = @LIBIBVERBS_VERSION_SCRIPT@ src_libibverbs_la_SOURCES = src/cmd.c src/compat-1_0.c src/device.c src/init.c \ - src/marshall.c src/memory.c src/sysfs.c src/verbs.c + src/marshall.c src/memory.c src/sysfs.c src/verbs.c \ + src/enum_strs.c src_libibverbs_la_LDFLAGS = -version-info 1 -export-dynamic \ $(libibverbs_version_script) src_libibverbs_la_DEPENDENCIES = $(srcdir)/src/libibverbs.map @@ -38,20 +39,20 @@ libibverbsinclude_HEADERS = include/infiniband/arch.h include/infiniband/driver. 
include/infiniband/kern-abi.h include/infiniband/opcode.h include/infiniband/verbs.h \ include/infiniband/sa-kern-abi.h include/infiniband/sa.h include/infiniband/marshall.h -man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 man/ibv_devinfo.1 \ - man/ibv_rc_pingpong.1 man/ibv_uc_pingpong.1 man/ibv_ud_pingpong.1 \ - man/ibv_srq_pingpong.1 \ - man/ibv_alloc_pd.3 man/ibv_attach_mcast.3 man/ibv_create_ah.3 \ - man/ibv_create_ah_from_wc.3 man/ibv_create_comp_channel.3 \ - man/ibv_create_cq.3 man/ibv_create_qp.3 man/ibv_create_srq.3 \ - man/ibv_fork_init.3 man/ibv_get_async_event.3 \ - man/ibv_get_cq_event.3 man/ibv_get_device_guid.3 \ - man/ibv_get_device_list.3 man/ibv_get_device_name.3 \ - man/ibv_modify_qp.3 man/ibv_modify_srq.3 man/ibv_open_device.3 \ - man/ibv_poll_cq.3 man/ibv_post_recv.3 man/ibv_post_send.3 \ - man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \ - man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3 \ - man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3 \ +man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 man/ibv_devinfo.1 \ + man/ibv_rc_pingpong.1 man/ibv_uc_pingpong.1 man/ibv_ud_pingpong.1 \ + man/ibv_srq_pingpong.1 man/ibv_alloc_pd.3 man/ibv_attach_mcast.3 \ + man/ibv_create_ah.3 man/ibv_create_ah_from_wc.3 \ + man/ibv_create_comp_channel.3 man/ibv_create_cq.3 \ + man/ibv_create_qp.3 man/ibv_create_srq.3 man/ibv_event_type_str.3 \ + man/ibv_fork_init.3 man/ibv_get_async_event.3 \ + man/ibv_get_cq_event.3 man/ibv_get_device_guid.3 \ + man/ibv_get_device_list.3 man/ibv_get_device_name.3 \ + man/ibv_modify_qp.3 man/ibv_modify_srq.3 man/ibv_open_device.3 \ + man/ibv_poll_cq.3 man/ibv_post_recv.3 man/ibv_post_send.3 \ + man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \ + man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3 \ + man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3 \ man/ibv_req_notify_cq.3 man/ibv_resize_cq.3 DEBIAN = debian/changelog debian/compat debian/control 
debian/copyright \ @@ -84,6 +85,8 @@ install-data-hook: $(RM) ibv_free_device_list.3 && \ $(RM) ibv_init_ah_from_wc.3 && \ $(RM) mult_to_ibv_rate.3 && \ + $(RM) ibv_node_type_str.3 && \ + $(RM) ibv_port_state_str.3 && \ $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \ $(LN_S) ibv_get_cq_event.3 ibv_ack_cq_events.3 && \ $(LN_S) ibv_open_device.3 ibv_close_device.3 && \ @@ -97,5 +100,6 @@ install-data-hook: $(LN_S) ibv_attach_mcast.3 ibv_detach_mcast.3 && \ $(LN_S) ibv_get_device_list.3 ibv_free_device_list.3 && \ $(LN_S) ibv_create_ah_from_wc.3 ibv_init_ah_from_wc.3 && \ - $(LN_S) ibv_rate_to_mult.3 mult_to_ibv_rate.3 - + $(LN_S) ibv_rate_to_mult.3 mult_to_ibv_rate.3 && \ + $(LN_S) ibv_event_type_str.3 ibv_node_type_str.3 && \ + $(LN_S) ibv_event_type_str.3 ibv_port_state_str.3 diff --git a/examples/devinfo.c b/examples/devinfo.c index 4e4316a..1fadc80 100644 --- a/examples/devinfo.c +++ b/examples/devinfo.c @@ -67,17 +67,6 @@ static const char *guid_str(uint64_t node_guid, char *str) return str; } -static const char *port_state_str(enum ibv_port_state pstate) -{ - switch (pstate) { - case IBV_PORT_DOWN: return "PORT_DOWN"; - case IBV_PORT_INIT: return "PORT_INIT"; - case IBV_PORT_ARMED: return "PORT_ARMED"; - case IBV_PORT_ACTIVE: return "PORT_ACTIVE"; - default: return "invalid state"; - } -} - static const char *port_phy_state_str(uint8_t phys_state) { switch (phys_state) { @@ -266,7 +255,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port) } printf("\t\tport:\t%d\n", port); printf("\t\t\tstate:\t\t\t%s (%d)\n", - port_state_str(port_attr.state), port_attr.state); + ibv_port_state_str(port_attr.state), port_attr.state); printf("\t\t\tmax_mtu:\t\t%s (%d)\n", mtu_str(port_attr.max_mtu), port_attr.max_mtu); printf("\t\t\tactive_mtu:\t\t%s (%d)\n", diff --git a/examples/rc_pingpong.c b/examples/rc_pingpong.c index 7181914..26fa45c 100644 --- a/examples/rc_pingpong.c +++ b/examples/rc_pingpong.c @@ -709,7 +709,8 @@ int main(int argc, char 
*argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c index bc869c9..95bebf4 100644 --- a/examples/srq_pingpong.c +++ b/examples/srq_pingpong.c @@ -805,7 +805,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/uc_pingpong.c b/examples/uc_pingpong.c index 6135030..c09c8c1 100644 --- a/examples/uc_pingpong.c +++ b/examples/uc_pingpong.c @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/examples/ud_pingpong.c b/examples/ud_pingpong.c index aaee26c..8f3d50b 100644 --- a/examples/ud_pingpong.c +++ b/examples/ud_pingpong.c @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) for (i = 0; i < ne; ++i) { if (wc[i].status != IBV_WC_SUCCESS) { - fprintf(stderr, "Failed status %d for wr_id %d\n", + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", + ibv_wc_status_str(wc[i].status), wc[i].status, (int) wc[i].wr_id); return 1; } diff --git a/include/infiniband/verbs.h b/include/infiniband/verbs.h index a51bb9d..a04cc62 100644 --- a/include/infiniband/verbs.h +++ b/include/infiniband/verbs.h @@ -238,6 +238,7 @@ enum ibv_wc_status { IBV_WC_RESP_TIMEOUT_ERR, IBV_WC_GENERAL_ERR }; +const char *ibv_wc_status_str(enum ibv_wc_status status); enum ibv_wc_opcode { IBV_WC_SEND, @@ 
-1077,6 +1078,21 @@ int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); */ int ibv_fork_init(void); +/** + * ibv_node_type_str - Return string describing node_type enum value + */ +const char *ibv_node_type_str(enum ibv_node_type node_type); + +/** + * ibv_port_state_str - Return string describing port_state enum value + */ +const char *ibv_port_state_str(enum ibv_port_state port_state); + +/** + * ibv_event_type_str - Return string describing event_type enum value + */ +const char *ibv_event_type_str(enum ibv_event_type event); + END_C_DECLS # undef __attribute_const diff --git a/man/ibv_event_type_str.3 b/man/ibv_event_type_str.3 new file mode 100644 index 0000000..0df8fcd --- /dev/null +++ b/man/ibv_event_type_str.3 @@ -0,0 +1,40 @@ +.\" -*- nroff -*- +.\" +.TH IBV_EVENT_TYPE_STR 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" +.SH "NAME" +.nf +ibv_event_type_str \- Return string describing event_type enum value +.nl +ibv_node_type_str \- Return string describing node_type enum value +.nl +ibv_port_state_str \- Return string describing port_state enum value +.SH "SYNOPSIS" +.nf +.B #include +.sp +.BI "const char *ibv_event_type_str(enum ibv_event_type " "event_type"); +.nl +.BI "const char *ibv_node_type_str(enum ibv_node_type " "node_type"); +.nl +.BI "const char *ibv_port_state_str(enum ibv_port_state " "port_state"); +.fi +.SH "DESCRIPTION" +.B ibv_node_type_str() +returns a string describing the node type enum value +.IR node_type . +.PP +.B ibv_port_state_str() +returns a string describing the port state enum value +.IR port_state . +.PP +.B ibv_event_type_str() +returns a string describing the event type enum value +.IR event_type . +.SH "RETURN VALUE" +These functions return a constant string that describes the enum value +passed as their argument. 
+.SH "AUTHOR" +.TP +Roland Dreier +.RI < rolandd at cisco.com > + diff --git a/src/enum_strs.c b/src/enum_strs.c new file mode 100644 index 0000000..c57feaa --- /dev/null +++ b/src/enum_strs.c @@ -0,0 +1,127 @@ +/* + * Copyright (c) 2008 Lawrence Livermore National Laboratory + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include + +const char *ibv_node_type_str(enum ibv_node_type node_type) +{ + static const char *const node_type_str[] = { + [IBV_NODE_CA] = "InfiniBand channel adapter", + [IBV_NODE_SWITCH] = "InfiniBand switch", + [IBV_NODE_ROUTER] = "InfiniBand router", + [IBV_NODE_RNIC] = "iWARP NIC" + }; + + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) + return "unknown"; + + return node_type_str[node_type]; +} + +const char *ibv_port_state_str(enum ibv_port_state port_state) +{ + static const char *const port_state_str[] = { + [IBV_PORT_NOP] = "no state change (NOP)", + [IBV_PORT_DOWN] = "down", + [IBV_PORT_INIT] = "init", + [IBV_PORT_ARMED] = "armed", + [IBV_PORT_ACTIVE] = "active", + [IBV_PORT_ACTIVE_DEFER] = "active defer" + }; + + if (port_state < IBV_PORT_NOP || port_state > IBV_PORT_ACTIVE_DEFER) + return "unknown"; + + return port_state_str[port_state]; +} + +const char *ibv_event_type_str(enum ibv_event_type event) +{ + static const char *const event_type_str[] = { + [IBV_EVENT_CQ_ERR] = "CQ error", + [IBV_EVENT_QP_FATAL] = "local work queue catastrophic error", + [IBV_EVENT_QP_REQ_ERR] = "invalid request local work queue error", + [IBV_EVENT_QP_ACCESS_ERR] = "local access violation work queue error", + [IBV_EVENT_COMM_EST] = "communication established", + [IBV_EVENT_SQ_DRAINED] = "send queue drained", + [IBV_EVENT_PATH_MIG] = "path migrated", + [IBV_EVENT_PATH_MIG_ERR] = "path migration request error", + [IBV_EVENT_DEVICE_FATAL] = "local catastrophic error", + [IBV_EVENT_PORT_ACTIVE] = "port active", + [IBV_EVENT_PORT_ERR] = "port error", + [IBV_EVENT_LID_CHANGE] = "LID change", + [IBV_EVENT_PKEY_CHANGE] = "P_Key change", + [IBV_EVENT_SM_CHANGE] = "SM change", + [IBV_EVENT_SRQ_ERR] = "SRQ catastrophic error", + [IBV_EVENT_SRQ_LIMIT_REACHED] = "SRQ limit reached", + [IBV_EVENT_QP_LAST_WQE_REACHED] = "last WQE reached", + [IBV_EVENT_CLIENT_REREGISTER] = "client reregistration", + }; + + if (event < IBV_EVENT_CQ_ERR || event > 
IBV_EVENT_CLIENT_REREGISTER) + return "unknown"; + + return event_type_str[event]; +} + +const char *ibv_wc_status_str(enum ibv_wc_status status) +{ + static const char *const wc_status_str[] = { + [IBV_WC_SUCCESS] = "success", + [IBV_WC_LOC_LEN_ERR] = "local length error", + [IBV_WC_LOC_QP_OP_ERR] = "local QP operation error", + [IBV_WC_LOC_EEC_OP_ERR] = "local EE context operation error", + [IBV_WC_LOC_PROT_ERR] = "local protection error", + [IBV_WC_WR_FLUSH_ERR] = "Work Request Flushed Error", + [IBV_WC_MW_BIND_ERR] = "memory management operation error", + [IBV_WC_BAD_RESP_ERR] = "bad response error", + [IBV_WC_LOC_ACCESS_ERR] = "local access error", + [IBV_WC_REM_INV_REQ_ERR] = "remote invalid request error", + [IBV_WC_REM_ACCESS_ERR] = "remote access error", + [IBV_WC_REM_OP_ERR] = "remote operation error", + [IBV_WC_RETRY_EXC_ERR] = "transport retry counter exceeded", + [IBV_WC_RNR_RETRY_EXC_ERR] = "RNR retry counter exceeded", + [IBV_WC_LOC_RDD_VIOL_ERR] = "local RDD violation error", + [IBV_WC_REM_INV_RD_REQ_ERR] = "remote invalid RD request", + [IBV_WC_REM_ABORT_ERR] = "aborted error", + [IBV_WC_INV_EECN_ERR] = "invalid EE context number", + [IBV_WC_INV_EEC_STATE_ERR] = "invalid EE context state", + [IBV_WC_FATAL_ERR] = "fatal error", + [IBV_WC_RESP_TIMEOUT_ERR] = "response timeout error", + [IBV_WC_GENERAL_ERR] = "general error" + }; + + if (status < IBV_WC_SUCCESS || status > IBV_WC_GENERAL_ERR) + return "unknown"; + + return wc_status_str[status]; +} diff --git a/src/libibverbs.map b/src/libibverbs.map index 3a346ed..1827da0 100644 --- a/src/libibverbs.map +++ b/src/libibverbs.map @@ -91,4 +91,9 @@ IBVERBS_1.1 { ibv_dontfork_range; ibv_dofork_range; ibv_register_driver; + + ibv_node_type_str; + ibv_port_state_str; + ibv_event_type_str; + ibv_wc_status_str; } IBVERBS_1.0; From weiny2 at llnl.gov Fri Apr 18 15:41:30 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Fri, 18 Apr 2008 15:41:30 -0700 Subject: [ofa-general] Re: [PATCH v2] Add enum strings and 
*_str functions for enums In-Reply-To: References: <20080415094750.35afc0e5.weiny2@llnl.gov> <20080415133548.414aeaea.weiny2@llnl.gov> Message-ID: <20080418154130.33a8917b.weiny2@llnl.gov> Thanks, Ira On Fri, 18 Apr 2008 14:54:59 -0700 Roland Dreier wrote: > Thanks, I added a man page and changed things a little and committed the > following: > > commit 1c0b7ac0a6bbbe4d246ef4cf50ae31bde4929ba3 > Author: Ira Weiny > Date: Tue Apr 15 13:35:48 2008 -0700 > > Add functions to convert enum values to strings > > Add ibv_xxx_str() functions to convert node type, port state, event > type and wc status enum values to strings. > > Signed-off-by: Ira K. Weiny > Signed-off-by: Roland Dreier > > diff --git a/Makefile.am b/Makefile.am > index 705b184..9b05306 100644 > --- a/Makefile.am > +++ b/Makefile.am > @@ -9,7 +9,8 @@ src_libibverbs_la_CFLAGS = $(AM_CFLAGS) -DIBV_CONFIG_DIR=\"$(sysconfdir)/libibve > libibverbs_version_script = @LIBIBVERBS_VERSION_SCRIPT@ > > src_libibverbs_la_SOURCES = src/cmd.c src/compat-1_0.c src/device.c src/init.c \ > - src/marshall.c src/memory.c src/sysfs.c src/verbs.c > + src/marshall.c src/memory.c src/sysfs.c src/verbs.c \ > + src/enum_strs.c > src_libibverbs_la_LDFLAGS = -version-info 1 -export-dynamic \ > $(libibverbs_version_script) > src_libibverbs_la_DEPENDENCIES = $(srcdir)/src/libibverbs.map > @@ -38,20 +39,20 @@ libibverbsinclude_HEADERS = include/infiniband/arch.h include/infiniband/driver. 
> include/infiniband/kern-abi.h include/infiniband/opcode.h include/infiniband/verbs.h \ > include/infiniband/sa-kern-abi.h include/infiniband/sa.h include/infiniband/marshall.h > > -man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 man/ibv_devinfo.1 \ > - man/ibv_rc_pingpong.1 man/ibv_uc_pingpong.1 man/ibv_ud_pingpong.1 \ > - man/ibv_srq_pingpong.1 \ > - man/ibv_alloc_pd.3 man/ibv_attach_mcast.3 man/ibv_create_ah.3 \ > - man/ibv_create_ah_from_wc.3 man/ibv_create_comp_channel.3 \ > - man/ibv_create_cq.3 man/ibv_create_qp.3 man/ibv_create_srq.3 \ > - man/ibv_fork_init.3 man/ibv_get_async_event.3 \ > - man/ibv_get_cq_event.3 man/ibv_get_device_guid.3 \ > - man/ibv_get_device_list.3 man/ibv_get_device_name.3 \ > - man/ibv_modify_qp.3 man/ibv_modify_srq.3 man/ibv_open_device.3 \ > - man/ibv_poll_cq.3 man/ibv_post_recv.3 man/ibv_post_send.3 \ > - man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \ > - man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3 \ > - man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3 \ > +man_MANS = man/ibv_asyncwatch.1 man/ibv_devices.1 man/ibv_devinfo.1 \ > + man/ibv_rc_pingpong.1 man/ibv_uc_pingpong.1 man/ibv_ud_pingpong.1 \ > + man/ibv_srq_pingpong.1 man/ibv_alloc_pd.3 man/ibv_attach_mcast.3 \ > + man/ibv_create_ah.3 man/ibv_create_ah_from_wc.3 \ > + man/ibv_create_comp_channel.3 man/ibv_create_cq.3 \ > + man/ibv_create_qp.3 man/ibv_create_srq.3 man/ibv_event_type_str.3 \ > + man/ibv_fork_init.3 man/ibv_get_async_event.3 \ > + man/ibv_get_cq_event.3 man/ibv_get_device_guid.3 \ > + man/ibv_get_device_list.3 man/ibv_get_device_name.3 \ > + man/ibv_modify_qp.3 man/ibv_modify_srq.3 man/ibv_open_device.3 \ > + man/ibv_poll_cq.3 man/ibv_post_recv.3 man/ibv_post_send.3 \ > + man/ibv_post_srq_recv.3 man/ibv_query_device.3 man/ibv_query_gid.3 \ > + man/ibv_query_pkey.3 man/ibv_query_port.3 man/ibv_query_qp.3 \ > + man/ibv_query_srq.3 man/ibv_rate_to_mult.3 man/ibv_reg_mr.3 \ > man/ibv_req_notify_cq.3 
man/ibv_resize_cq.3 > > DEBIAN = debian/changelog debian/compat debian/control debian/copyright \ > @@ -84,6 +85,8 @@ install-data-hook: > $(RM) ibv_free_device_list.3 && \ > $(RM) ibv_init_ah_from_wc.3 && \ > $(RM) mult_to_ibv_rate.3 && \ > + $(RM) ibv_node_type_str.3 && \ > + $(RM) ibv_port_state_str.3 && \ > $(LN_S) ibv_get_async_event.3 ibv_ack_async_event.3 && \ > $(LN_S) ibv_get_cq_event.3 ibv_ack_cq_events.3 && \ > $(LN_S) ibv_open_device.3 ibv_close_device.3 && \ > @@ -97,5 +100,6 @@ install-data-hook: > $(LN_S) ibv_attach_mcast.3 ibv_detach_mcast.3 && \ > $(LN_S) ibv_get_device_list.3 ibv_free_device_list.3 && \ > $(LN_S) ibv_create_ah_from_wc.3 ibv_init_ah_from_wc.3 && \ > - $(LN_S) ibv_rate_to_mult.3 mult_to_ibv_rate.3 > - > + $(LN_S) ibv_rate_to_mult.3 mult_to_ibv_rate.3 && \ > + $(LN_S) ibv_event_type_str.3 ibv_node_type_str.3 && \ > + $(LN_S) ibv_event_type_str.3 ibv_port_state_str.3 > diff --git a/examples/devinfo.c b/examples/devinfo.c > index 4e4316a..1fadc80 100644 > --- a/examples/devinfo.c > +++ b/examples/devinfo.c > @@ -67,17 +67,6 @@ static const char *guid_str(uint64_t node_guid, char *str) > return str; > } > > -static const char *port_state_str(enum ibv_port_state pstate) > -{ > - switch (pstate) { > - case IBV_PORT_DOWN: return "PORT_DOWN"; > - case IBV_PORT_INIT: return "PORT_INIT"; > - case IBV_PORT_ARMED: return "PORT_ARMED"; > - case IBV_PORT_ACTIVE: return "PORT_ACTIVE"; > - default: return "invalid state"; > - } > -} > - > static const char *port_phy_state_str(uint8_t phys_state) > { > switch (phys_state) { > @@ -266,7 +255,7 @@ static int print_hca_cap(struct ibv_device *ib_dev, uint8_t ib_port) > } > printf("\t\tport:\t%d\n", port); > printf("\t\t\tstate:\t\t\t%s (%d)\n", > - port_state_str(port_attr.state), port_attr.state); > + ibv_port_state_str(port_attr.state), port_attr.state); > printf("\t\t\tmax_mtu:\t\t%s (%d)\n", > mtu_str(port_attr.max_mtu), port_attr.max_mtu); > printf("\t\t\tactive_mtu:\t\t%s (%d)\n", > diff --git 
a/examples/rc_pingpong.c b/examples/rc_pingpong.c > index 7181914..26fa45c 100644 > --- a/examples/rc_pingpong.c > +++ b/examples/rc_pingpong.c > @@ -709,7 +709,8 @@ int main(int argc, char *argv[]) > > for (i = 0; i < ne; ++i) { > if (wc[i].status != IBV_WC_SUCCESS) { > - fprintf(stderr, "Failed status %d for wr_id %d\n", > + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", > + ibv_wc_status_str(wc[i].status), > wc[i].status, (int) wc[i].wr_id); > return 1; > } > diff --git a/examples/srq_pingpong.c b/examples/srq_pingpong.c > index bc869c9..95bebf4 100644 > --- a/examples/srq_pingpong.c > +++ b/examples/srq_pingpong.c > @@ -805,7 +805,8 @@ int main(int argc, char *argv[]) > > for (i = 0; i < ne; ++i) { > if (wc[i].status != IBV_WC_SUCCESS) { > - fprintf(stderr, "Failed status %d for wr_id %d\n", > + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", > + ibv_wc_status_str(wc[i].status), > wc[i].status, (int) wc[i].wr_id); > return 1; > } > diff --git a/examples/uc_pingpong.c b/examples/uc_pingpong.c > index 6135030..c09c8c1 100644 > --- a/examples/uc_pingpong.c > +++ b/examples/uc_pingpong.c > @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) > > for (i = 0; i < ne; ++i) { > if (wc[i].status != IBV_WC_SUCCESS) { > - fprintf(stderr, "Failed status %d for wr_id %d\n", > + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", > + ibv_wc_status_str(wc[i].status), > wc[i].status, (int) wc[i].wr_id); > return 1; > } > diff --git a/examples/ud_pingpong.c b/examples/ud_pingpong.c > index aaee26c..8f3d50b 100644 > --- a/examples/ud_pingpong.c > +++ b/examples/ud_pingpong.c > @@ -697,7 +697,8 @@ int main(int argc, char *argv[]) > > for (i = 0; i < ne; ++i) { > if (wc[i].status != IBV_WC_SUCCESS) { > - fprintf(stderr, "Failed status %d for wr_id %d\n", > + fprintf(stderr, "Failed status %s (%d) for wr_id %d\n", > + ibv_wc_status_str(wc[i].status), > wc[i].status, (int) wc[i].wr_id); > return 1; > } > diff --git a/include/infiniband/verbs.h 
b/include/infiniband/verbs.h > index a51bb9d..a04cc62 100644 > --- a/include/infiniband/verbs.h > +++ b/include/infiniband/verbs.h > @@ -238,6 +238,7 @@ enum ibv_wc_status { > IBV_WC_RESP_TIMEOUT_ERR, > IBV_WC_GENERAL_ERR > }; > +const char *ibv_wc_status_str(enum ibv_wc_status status); > > enum ibv_wc_opcode { > IBV_WC_SEND, > @@ -1077,6 +1078,21 @@ int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid); > */ > int ibv_fork_init(void); > > +/** > + * ibv_node_type_str - Return string describing node_type enum value > + */ > +const char *ibv_node_type_str(enum ibv_node_type node_type); > + > +/** > + * ibv_port_state_str - Return string describing port_state enum value > + */ > +const char *ibv_port_state_str(enum ibv_port_state port_state); > + > +/** > + * ibv_event_type_str - Return string describing event_type enum value > + */ > +const char *ibv_event_type_str(enum ibv_event_type event); > + > END_C_DECLS > > # undef __attribute_const > diff --git a/man/ibv_event_type_str.3 b/man/ibv_event_type_str.3 > new file mode 100644 > index 0000000..0df8fcd > --- /dev/null > +++ b/man/ibv_event_type_str.3 > @@ -0,0 +1,40 @@ > +.\" -*- nroff -*- > +.\" > +.TH IBV_EVENT_TYPE_STR 3 2006-10-31 libibverbs "Libibverbs Programmer's Manual" > +.SH "NAME" > +.nf > +ibv_event_type_str \- Return string describing event_type enum value > +.nl > +ibv_node_type_str \- Return string describing node_type enum value > +.nl > +ibv_port_state_str \- Return string describing port_state enum value > +.SH "SYNOPSIS" > +.nf > +.B #include > +.sp > +.BI "const char *ibv_event_type_str(enum ibv_event_type " "event_type"); > +.nl > +.BI "const char *ibv_node_type_str(enum ibv_node_type " "node_type"); > +.nl > +.BI "const char *ibv_port_state_str(enum ibv_port_state " "port_state"); > +.fi > +.SH "DESCRIPTION" > +.B ibv_node_type_str() > +returns a string describing the node type enum value > +.IR node_type . 
> +.PP > +.B ibv_port_state_str() > +returns a string describing the port state enum value > +.IR port_state . > +.PP > +.B ibv_event_type_str() > +returns a string describing the event type enum value > +.IR event_type . > +.SH "RETURN VALUE" > +These functions return a constant string that describes the enum value > +passed as their argument. > +.SH "AUTHOR" > +.TP > +Roland Dreier > +.RI < rolandd at cisco.com > > + > diff --git a/src/enum_strs.c b/src/enum_strs.c > new file mode 100644 > index 0000000..c57feaa > --- /dev/null > +++ b/src/enum_strs.c > @@ -0,0 +1,127 @@ > +/* > + * Copyright (c) 2008 Lawrence Livermore National Laboratory > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. > + */ > + > +#include > + > +const char *ibv_node_type_str(enum ibv_node_type node_type) > +{ > + static const char *const node_type_str[] = { > + [IBV_NODE_CA] = "InfiniBand channel adapter", > + [IBV_NODE_SWITCH] = "InfiniBand switch", > + [IBV_NODE_ROUTER] = "InfiniBand router", > + [IBV_NODE_RNIC] = "iWARP NIC" > + }; > + > + if (node_type < IBV_NODE_CA || node_type > IBV_NODE_RNIC) > + return "unknown"; > + > + return node_type_str[node_type]; > +} > + > +const char *ibv_port_state_str(enum ibv_port_state port_state) > +{ > + static const char *const port_state_str[] = { > + [IBV_PORT_NOP] = "no state change (NOP)", > + [IBV_PORT_DOWN] = "down", > + [IBV_PORT_INIT] = "init", > + [IBV_PORT_ARMED] = "armed", > + [IBV_PORT_ACTIVE] = "active", > + [IBV_PORT_ACTIVE_DEFER] = "active defer" > + }; > + > + if (port_state < IBV_PORT_NOP || port_state > IBV_PORT_ACTIVE_DEFER) > + return "unknown"; > + > + return port_state_str[port_state]; > +} > + > +const char *ibv_event_type_str(enum ibv_event_type event) > +{ > + static const char *const event_type_str[] = { > + [IBV_EVENT_CQ_ERR] = "CQ error", > + [IBV_EVENT_QP_FATAL] = "local work queue catastrophic error", > + [IBV_EVENT_QP_REQ_ERR] = "invalid request local work queue error", > + [IBV_EVENT_QP_ACCESS_ERR] = "local access violation work queue error", > + [IBV_EVENT_COMM_EST] = "communication established", > + [IBV_EVENT_SQ_DRAINED] = "send queue drained", > + [IBV_EVENT_PATH_MIG] = "path migrated", > + [IBV_EVENT_PATH_MIG_ERR] = "path migration request error", > + [IBV_EVENT_DEVICE_FATAL] = "local catastrophic error", > + [IBV_EVENT_PORT_ACTIVE] = "port active", > + [IBV_EVENT_PORT_ERR] = "port error", > + [IBV_EVENT_LID_CHANGE] = "LID 
change", > + [IBV_EVENT_PKEY_CHANGE] = "P_Key change", > + [IBV_EVENT_SM_CHANGE] = "SM change", > + [IBV_EVENT_SRQ_ERR] = "SRQ catastrophic error", > + [IBV_EVENT_SRQ_LIMIT_REACHED] = "SRQ limit reached", > + [IBV_EVENT_QP_LAST_WQE_REACHED] = "last WQE reached", > + [IBV_EVENT_CLIENT_REREGISTER] = "client reregistration", > + }; > + > + if (event < IBV_EVENT_CQ_ERR || event > IBV_EVENT_CLIENT_REREGISTER) > + return "unknown"; > + > + return event_type_str[event]; > +} > + > +const char *ibv_wc_status_str(enum ibv_wc_status status) > +{ > + static const char *const wc_status_str[] = { > + [IBV_WC_SUCCESS] = "success", > + [IBV_WC_LOC_LEN_ERR] = "local length error", > + [IBV_WC_LOC_QP_OP_ERR] = "local QP operation error", > + [IBV_WC_LOC_EEC_OP_ERR] = "local EE context operation error", > + [IBV_WC_LOC_PROT_ERR] = "local protection error", > + [IBV_WC_WR_FLUSH_ERR] = "Work Request Flushed Error", > + [IBV_WC_MW_BIND_ERR] = "memory management operation error", > + [IBV_WC_BAD_RESP_ERR] = "bad response error", > + [IBV_WC_LOC_ACCESS_ERR] = "local access error", > + [IBV_WC_REM_INV_REQ_ERR] = "remote invalid request error", > + [IBV_WC_REM_ACCESS_ERR] = "remote access error", > + [IBV_WC_REM_OP_ERR] = "remote operation error", > + [IBV_WC_RETRY_EXC_ERR] = "transport retry counter exceeded", > + [IBV_WC_RNR_RETRY_EXC_ERR] = "RNR retry counter exceeded", > + [IBV_WC_LOC_RDD_VIOL_ERR] = "local RDD violation error", > + [IBV_WC_REM_INV_RD_REQ_ERR] = "remote invalid RD request", > + [IBV_WC_REM_ABORT_ERR] = "aborted error", > + [IBV_WC_INV_EECN_ERR] = "invalid EE context number", > + [IBV_WC_INV_EEC_STATE_ERR] = "invalid EE context state", > + [IBV_WC_FATAL_ERR] = "fatal error", > + [IBV_WC_RESP_TIMEOUT_ERR] = "response timeout error", > + [IBV_WC_GENERAL_ERR] = "general error" > + }; > + > + if (status < IBV_WC_SUCCESS || status > IBV_WC_GENERAL_ERR) > + return "unknown"; > + > + return wc_status_str[status]; > +} > diff --git a/src/libibverbs.map b/src/libibverbs.map > 
index 3a346ed..1827da0 100644 > --- a/src/libibverbs.map > +++ b/src/libibverbs.map > @@ -91,4 +91,9 @@ IBVERBS_1.1 { > ibv_dontfork_range; > ibv_dofork_range; > ibv_register_driver; > + > + ibv_node_type_str; > + ibv_port_state_str; > + ibv_event_type_str; > + ibv_wc_status_str; > } IBVERBS_1.0; From rdreier at cisco.com Fri Apr 18 16:04:50 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 18 Apr 2008 16:04:50 -0700 Subject: [ofa-general] [ANNOUNCE] libibverbs-1.1.2 is released Message-ID: libibverbs is a library that allows programs to use RDMA "verbs" for direct access to RDMA (currently InfiniBand and iWARP) hardware from userspace. The new stable release, 1.1.2, is available from http://www.openfabrics.org//downloads/verbs/libibverbs-1.1.2.tar.gz with sha1sum 7d35b9a0ee45b2ec2e9da5c50565197155a94b5c libibverbs-1.1.2.tar.gz I also pushed the latest tree and tag out to kernel.org: git://git.kernel.org/pub/scm/libs/infiniband/libibverbs.git (the name of the tag is libibverbs-1.1.2). This release has various small fixes, including a lot of improvements to the Valgrind annotations, and also adds ibv_xxx_str() functions for printing enum values. 
The git shortlog since libibverbs 1.1.1 is: Dotan Barak (5): Initialize reserved attributes in modify QP command Fix several valgrind false positives Fix some issues in the examples Fixes for man pages Add command line parameter to set SL for pingpong examples Ira Weiny (1): Add functions to convert enum values to strings Or Gerlitz (1): Document IBV_SEND_INLINE buffer ownership Roland Dreier (16): Remove deprecated ${Source-Version} from debian/control Add to Clean up NVALGRIND comment in config.h.in Fix Valgrind annotations so they can actually be built Fix too-big madvise() call in ibv_madvise_range() Fix spec file License: tag Always return valid bad_wr on error from ibv_post_{send,recv,srq_recv} Update Debian policy version to 3.7.3 Use real Homepage: tag instead of pseudo-header inside description Convert hyphen to minus sign in ibv_query_pkey man page Put correct version information in Debian shlibs Add debian/watch file Fix download directory in RPM spec file Update various text to talk about general RDMA, not just InfiniBand Correct typo ibv_mult_to_rate -> mult_to_ibv_rate in man page Add RPM dependency on base package to -devel package Roll libibverbs 1.1.2 release Troy Benjegerdes (1): Fix valgrind false positive in ibv_create_comp_channel() swelch at systemfabricworks.com (1): Set ibv_device->node_type when allocating device
From mingo at elte.hu Sat Apr 19 01:16:14 2008 From: mingo at elte.hu (Ingo Molnar) Date: Sat, 19 Apr 2008 10:16:14 +0200 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: Message-ID: <20080419081614.GA2437@elte.hu> * Roland Dreier wrote: > IB/ipath: Misc changes to prepare for IB7220 introduction > IB/ipath: add calls to new 7220 code and enable in build x86.git auto-testing found that these changes broke the -git build, with this config: http://redhat.com/~mingo/misc/config-Sat_Apr_19_09_55_05_CEST_2008.bad the failure is a link failure: drivers/built-in.o: In function `ipath_init_one': ipath_driver.c:(.devinit.text+0x1e5bc): undefined reference to `ipath_init_iba7220_funcs' disabling CONFIG_INFINIBAND_IPATH=y works this around.
Ingo From rdreier at cisco.com Sat Apr 19 07:11:20 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 19 Apr 2008 07:11:20 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: <20080419081614.GA2437@elte.hu> (Ingo Molnar's message of "Sat, 19 Apr 2008 10:16:14 +0200") References: <20080419081614.GA2437@elte.hu> Message-ID: > x86.git auto-testing found that these changes broke the -git build, with > this config: > > http://redhat.com/~mingo/misc/config-Sat_Apr_19_09_55_05_CEST_2008.bad > > the failure is a link failure: > > drivers/built-in.o: In function `ipath_init_one': > ipath_driver.c:(.devinit.text+0x1e5bc): undefined reference to `ipath_init_iba7220_funcs' Thanks. The relevant parts of the config are # CONFIG_PCI_MSI is not set CONFIG_HT_IRQ=y CONFIG_INFINIBAND_IPATH=y The problem is that the iba7220 files don't get built in that case, but the main driver file tries to call ipath_init_iba7220 anyway. This is fixed by the patch below, which makes the iba7220 file build unconditionally. I also removed the dependency on HT_IRQ || PCI_MSI in the Kconfig, since the iba7220 support should work without it. I know we discussed this before, but looking closer at the code, the dependency seems pointless to me, since it's still possible to build a driver that doesn't work if a particular system needs, say HT_IRQ, and the user selects PCI_MSI. And since iba7220 doesn't need either, we might as well let people build that. If this is OK with everyone, I will merge this with a proper changelog. - R. 
diff --git a/drivers/infiniband/hw/ipath/Kconfig b/drivers/infiniband/hw/ipath/Kconfig index 044da58..3c7968f 100644 --- a/drivers/infiniband/hw/ipath/Kconfig +++ b/drivers/infiniband/hw/ipath/Kconfig @@ -1,6 +1,6 @@ config INFINIBAND_IPATH tristate "QLogic InfiniPath Driver" - depends on (PCI_MSI || HT_IRQ) && 64BIT && NET + depends on 64BIT && NET ---help--- This is a driver for QLogic InfiniPath host channel adapters, including InfiniBand verbs support. This driver allows these diff --git a/drivers/infiniband/hw/ipath/Makefile b/drivers/infiniband/hw/ipath/Makefile index 75a6c91..bf94500 100644 --- a/drivers/infiniband/hw/ipath/Makefile +++ b/drivers/infiniband/hw/ipath/Makefile @@ -29,11 +29,13 @@ ib_ipath-y := \ ipath_user_pages.o \ ipath_user_sdma.o \ ipath_verbs_mcast.o \ - ipath_verbs.o + ipath_verbs.o \ + ipath_iba7220.o \ + ipath_sd7220.o \ + ipath_sd7220_img.o ib_ipath-$(CONFIG_HT_IRQ) += ipath_iba6110.o ib_ipath-$(CONFIG_PCI_MSI) += ipath_iba6120.o -ib_ipath-$(CONFIG_PCI_MSI) += ipath_iba7220.o ipath_sd7220.o ipath_sd7220_img.o ib_ipath-$(CONFIG_X86_64) += ipath_wc_x86_64.o ib_ipath-$(CONFIG_PPC64) += ipath_wc_ppc64.o From rdreier at cisco.com Sat Apr 19 07:18:24 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 19 Apr 2008 07:18:24 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: (Roland Dreier's message of "Sat, 19 Apr 2008 07:11:20 -0700") References: <20080419081614.GA2437@elte.hu> Message-ID: By the way (only peripherally related), it seems all the #ifdef CONFIG_PCI_MSI tests in ipath_iba7220.c can be removed, since the code should work fine even if PCI_MSI is not set... - R. 
From dave.olson at qlogic.com Sat Apr 19 08:20:49 2008 From: dave.olson at qlogic.com (Dave Olson) Date: Sat, 19 Apr 2008 08:20:49 -0700 (PDT) Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20080419081614.GA2437@elte.hu> Message-ID: On Sat, 19 Apr 2008, Roland Dreier wrote: | > drivers/built-in.o: In function `ipath_init_one': | > ipath_driver.c:(.devinit.text+0x1e5bc): undefined reference to `ipath_init_iba7220_funcs' Yes, that issue should be fixed. Our preference was to not build if it wouldn't work. We'd have to add the conditional check at the function setup routines. | I also removed the dependency on HT_IRQ || PCI_MSI in the Kconfig, since | the iba7220 support should work without it. I know we discussed this | before, but looking closer at the code, the dependency seems pointless | to me, since it's still possible to build a driver that doesn't work if | a particular system needs, say HT_IRQ, and the user selects PCI_MSI. | And since iba7220 doesn't need either, we might as well let people build | that. | | If this is OK with everyone, I will merge this with a proper changelog. At this point, I guess I'd agree. We've added checks for "no interrupt" after the driver is loaded, so I guess that covers the issue well enough. Dave Olson dave.olson at qlogic.com From rdreier at cisco.com Sat Apr 19 09:12:06 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sat, 19 Apr 2008 09:12:06 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: (Dave Olson's message of "Sat, 19 Apr 2008 08:20:49 -0700 (PDT)") References: <20080419081614.GA2437@elte.hu> Message-ID: > | > drivers/built-in.o: In function `ipath_init_one': > | > ipath_driver.c:(.devinit.text+0x1e5bc): undefined reference to `ipath_init_iba7220_funcs' > > Yes, that issue should be fixed. Our preference was to not build > if it wouldn't work. We'd have to add the conditional check at > the function setup routines. 
Not sure I really follow this response... ipath_driver.c has case PCI_DEVICE_ID_INFINIPATH_7220: #ifndef CONFIG_PCI_MSI ipath_dbg("CONFIG_PCI_MSI is not enabled, " "using IntX for unit %u\n", dd->ipath_unit); #endif ipath_init_iba7220_funcs(dd); break; so clearly ipath_init_iba7220_funcs() was intended to be built and used even if CONFIG_PCI_MSI was not defined. From the code it looks like all should work fine if PCI_MSI is not set, so I don't know what you mean about conditional checks. (BTW since I'm looking at this code, "IntX" should probably be capitalized as "INTx" to match what the PCI specs say) - R.
From dave.olson at qlogic.com Sun Apr 20 07:47:56 2008 From: dave.olson at qlogic.com (Dave Olson) Date: Sun, 20 Apr 2008 07:47:56 -0700 (PDT) Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20080419081614.GA2437@elte.hu> Message-ID: On Sat, 19 Apr 2008, Roland Dreier wrote: | Not sure I really follow this response... ipath_driver.c has | | case PCI_DEVICE_ID_INFINIPATH_7220: | #ifndef CONFIG_PCI_MSI | ipath_dbg("CONFIG_PCI_MSI is not enabled, " | "using IntX for unit %u\n", dd->ipath_unit); | #endif | ipath_init_iba7220_funcs(dd); | break; | | so clearly ipath_init_iba7220_funcs() was intended to be built and used | even if CONFIG_PCI_MSI was not defined. From the code it looks like all | should work fine if PCI_MSI is not set, so I don't know what you mean | about conditional checks. Actually, it wasn't. It was a late cleanup for another problem, and we didn't worry about the other issue, and should have. | (BTW since I'm looking at this code, "IntX" should probably be | capitalized as "INTx" to match what the PCI specs say) True. Dave Olson dave.olson at qlogic.com From mashirle at us.ibm.com Sun Apr 20 01:52:31 2008 From: mashirle at us.ibm.com (Shirley Ma) Date: Sun, 20 Apr 2008 01:52:31 -0700 Subject: [ofa-general] [PATCH] IPoIB 4K MTU support Message-ID: <1208681551.5271.11.camel@localhost.localdomain> Hello Roland, I have recreated the IPoIB 4K MTU patch. The patch below is built against the 2.6.25 kernel, for submission into 2.6.26. Please review and integrate it, and let me know if there are any problems. Thanks Shirley This patch enables IPoIB 4K MTU support by using two S/G buffers when PAGE_SIZE is less than or equal to the HCA IB MTU size. The first buffer is for the IPoIB header + GRH header. The second buffer is the IPoIB payload, which is 4K-4.
Signed-off-by: Shirley Ma --- drivers/infiniband/ulp/ipoib/ipoib.h | 50 +++++++++++++- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 86 +++++++++++++---------- drivers/infiniband/ulp/ipoib/ipoib_main.c | 19 ++++-- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 3 +- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 15 ++++- drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 1 + 6 files changed, 125 insertions(+), 49 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 73b2b17..6a05ead 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -56,11 +56,11 @@ /* constants */ enum { - IPOIB_PACKET_SIZE = 2048, - IPOIB_BUF_SIZE = IPOIB_PACKET_SIZE + IB_GRH_BYTES, - IPOIB_ENCAP_LEN = 4, + IPOIB_UD_HEAD_SIZE = IB_GRH_BYTES + IPOIB_ENCAP_LEN, + IPOIB_UD_RX_SG = 2, /* max buffer needed for 4K mtu */ + IPOIB_CM_MTU = 0x10000 - 0x10, /* padding to align header to 16 */ IPOIB_CM_BUF_SIZE = IPOIB_CM_MTU + IPOIB_ENCAP_LEN, IPOIB_CM_HEAD_SIZE = IPOIB_CM_BUF_SIZE % PAGE_SIZE, @@ -139,7 +139,7 @@ struct ipoib_mcast { struct ipoib_rx_buf { struct sk_buff *skb; - u64 mapping; + u64 mapping[IPOIB_UD_RX_SG]; }; struct ipoib_tx_buf { @@ -294,6 +294,7 @@ struct ipoib_dev_priv { unsigned int admin_mtu; unsigned int mcast_mtu; + unsigned int max_ib_mtu; struct ipoib_rx_buf *rx_ring; @@ -305,6 +306,9 @@ struct ipoib_dev_priv { struct ib_send_wr tx_wr; unsigned tx_outstanding; + struct ib_recv_wr rx_wr; + struct ib_sge rx_sge[IPOIB_UD_RX_SG]; + struct ib_wc ibwc[IPOIB_NUM_WC]; struct list_head dead_ahs; @@ -366,6 +370,44 @@ struct ipoib_neigh { struct list_head list; }; +#define IPOIB_UD_MTU(ib_mtu) (ib_mtu - IPOIB_ENCAP_LEN) +#define IPOIB_UD_BUF_SIZE(ib_mtu) (ib_mtu + IB_GRH_BYTES) + +static inline int ipoib_ud_need_sg(unsigned int ib_mtu) +{ + return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 
1 : 0; +} + +static inline void ipoib_ud_dma_unmap_rx(struct ipoib_dev_priv *priv, + u64 mapping[IPOIB_UD_RX_SG]) +{ + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE, DMA_FROM_DEVICE); + ib_dma_unmap_page(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE); + } else + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_BUF_SIZE(priv->max_ib_mtu), DMA_FROM_DEVICE); +} + +static inline void ipoib_ud_skb_put_frags(struct ipoib_dev_priv *priv, + struct sk_buff *skb, + unsigned int length) +{ + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { + skb_frag_t *frag = &skb_shinfo(skb)->frags[0]; + /* + * There is only two buffers needed for max_payload = 4K, + * first buf size is IPOIB_UD_HEAD_SIZE + */ + skb->tail += IPOIB_UD_HEAD_SIZE; + frag->size = length - IPOIB_UD_HEAD_SIZE; + skb->data_len += frag->size; + skb->truesize += frag->size; + skb->len += length; + } else + skb_put(skb, length); + +} + /* * We stash a pointer to our private neighbour information after our * hardware address in neigh->ha. 
The ALIGN() expression here makes diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 0205eb7..8b3f1b2 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -92,25 +92,18 @@ void ipoib_free_ah(struct kref *kref) static int ipoib_ib_post_receive(struct net_device *dev, int id) { struct ipoib_dev_priv *priv = netdev_priv(dev); - struct ib_sge list; - struct ib_recv_wr param; struct ib_recv_wr *bad_wr; int ret; - list.addr = priv->rx_ring[id].mapping; - list.length = IPOIB_BUF_SIZE; - list.lkey = priv->mr->lkey; + priv->rx_wr.wr_id = id | IPOIB_OP_RECV; + priv->rx_sge[0].addr = priv->rx_ring[id].mapping[0]; + priv->rx_sge[1].addr = priv->rx_ring[id].mapping[1]; + - param.next = NULL; - param.wr_id = id | IPOIB_OP_RECV; - param.sg_list = &list; - param.num_sge = 1; - - ret = ib_post_recv(priv->qp, ¶m, &bad_wr); + ret = ib_post_recv(priv->qp, &priv->rx_wr, &bad_wr); if (unlikely(ret)) { ipoib_warn(priv, "receive failed for buf %d (%d)\n", id, ret); - ib_dma_unmap_single(priv->ca, priv->rx_ring[id].mapping, - IPOIB_BUF_SIZE, DMA_FROM_DEVICE); + ipoib_ud_dma_unmap_rx(priv, priv->rx_ring[id].mapping); dev_kfree_skb_any(priv->rx_ring[id].skb); priv->rx_ring[id].skb = NULL; } @@ -118,15 +111,22 @@ static int ipoib_ib_post_receive(struct net_device *dev, int id) return ret; } -static int ipoib_alloc_rx_skb(struct net_device *dev, int id) +static struct sk_buff *ipoib_alloc_rx_skb(struct net_device *dev, + int id, + u64 mapping[IPOIB_UD_RX_SG]) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct sk_buff *skb; - u64 addr; + int buf_size; - skb = dev_alloc_skb(IPOIB_BUF_SIZE + 4); - if (!skb) - return -ENOMEM; + if (ipoib_ud_need_sg(priv->max_ib_mtu)) + buf_size = IPOIB_UD_HEAD_SIZE; + else + buf_size = IPOIB_UD_BUF_SIZE(priv->max_ib_mtu); + + skb = dev_alloc_skb(buf_size + 4); + if (unlikely(!skb)) + return NULL; /* * IB will leave a 40 byte gap for a GRH and IPoIB adds a 4 
byte @@ -135,17 +135,31 @@ static int ipoib_alloc_rx_skb(struct net_device *dev, int id) */ skb_reserve(skb, 4); - addr = ib_dma_map_single(priv->ca, skb->data, IPOIB_BUF_SIZE, - DMA_FROM_DEVICE); - if (unlikely(ib_dma_mapping_error(priv->ca, addr))) { + mapping[0] = ib_dma_map_single(priv->ca, skb->data, buf_size, + DMA_FROM_DEVICE); + if (unlikely(ib_dma_mapping_error(priv->ca, mapping[0]))) { dev_kfree_skb_any(skb); - return -EIO; + return NULL; + } + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { + struct page *page = alloc_page(GFP_ATOMIC); + if (!page) + goto partial_error; + skb_fill_page_desc(skb, 0, page, 0, PAGE_SIZE); + mapping[1] = ib_dma_map_page(priv->ca, + skb_shinfo(skb)->frags[0].page, + 0, PAGE_SIZE, DMA_FROM_DEVICE); + if (unlikely(ib_dma_mapping_error(priv->ca, mapping[1]))) + goto partial_error; } - priv->rx_ring[id].skb = skb; - priv->rx_ring[id].mapping = addr; + priv->rx_ring[id].skb = skb; + return skb; - return 0; +partial_error: + ib_dma_unmap_single(priv->ca, mapping[0], buf_size, DMA_FROM_DEVICE); + dev_kfree_skb_any(skb); + return NULL; } static int ipoib_ib_post_receives(struct net_device *dev) @@ -154,7 +168,7 @@ static int ipoib_ib_post_receives(struct net_device *dev) int i; for (i = 0; i < ipoib_recvq_size; ++i) { - if (ipoib_alloc_rx_skb(dev, i)) { + if (!ipoib_alloc_rx_skb(dev, i, priv->rx_ring[i].mapping)) { ipoib_warn(priv, "failed to allocate receive buffer %d\n", i); return -ENOMEM; } @@ -172,7 +186,7 @@ static void ipoib_ib_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) struct ipoib_dev_priv *priv = netdev_priv(dev); unsigned int wr_id = wc->wr_id & ~IPOIB_OP_RECV; struct sk_buff *skb; - u64 addr; + u64 mapping[IPOIB_UD_RX_SG]; ipoib_dbg_data(priv, "recv completion: id %d, status: %d\n", wr_id, wc->status); @@ -184,15 +198,13 @@ static void ipoib_ib_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) } skb = priv->rx_ring[wr_id].skb; - addr = priv->rx_ring[wr_id].mapping; if (unlikely(wc->status != IB_WC_SUCCESS)) { 
if (wc->status != IB_WC_WR_FLUSH_ERR) ipoib_warn(priv, "failed recv event " "(status=%d, wrid=%d vend_err %x)\n", wc->status, wr_id, wc->vendor_err); - ib_dma_unmap_single(priv->ca, addr, - IPOIB_BUF_SIZE, DMA_FROM_DEVICE); + ipoib_ud_dma_unmap_rx(priv, priv->rx_ring[wr_id].mapping); dev_kfree_skb_any(skb); priv->rx_ring[wr_id].skb = NULL; return; @@ -209,7 +221,7 @@ static void ipoib_ib_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) * If we can't allocate a new RX buffer, dump * this packet and reuse the old buffer. */ - if (unlikely(ipoib_alloc_rx_skb(dev, wr_id))) { + if (unlikely(!ipoib_alloc_rx_skb(dev, wr_id, mapping))) { ++dev->stats.rx_dropped; goto repost; } @@ -217,9 +229,11 @@ static void ipoib_ib_handle_rx_wc(struct net_device *dev, struct ib_wc *wc) ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n", wc->byte_len, wc->slid); - ib_dma_unmap_single(priv->ca, addr, IPOIB_BUF_SIZE, DMA_FROM_DEVICE); + ipoib_ud_dma_unmap_rx(priv, priv->rx_ring[wr_id].mapping); + ipoib_ud_skb_put_frags(priv, skb, wc->byte_len); + memcpy(priv->rx_ring[wr_id].mapping, mapping, + IPOIB_UD_RX_SG * sizeof *mapping); - skb_put(skb, wc->byte_len); skb_pull(skb, IB_GRH_BYTES); skb->protocol = ((struct ipoib_header *) skb->data)->proto; @@ -733,10 +747,8 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush) rx_req = &priv->rx_ring[i]; if (!rx_req->skb) continue; - ib_dma_unmap_single(priv->ca, - rx_req->mapping, - IPOIB_BUF_SIZE, - DMA_FROM_DEVICE); + ipoib_ud_dma_unmap_rx(priv, + priv->rx_ring[i].mapping); dev_kfree_skb_any(rx_req->skb); rx_req->skb = NULL; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index bd07f02..ee4c45a 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -195,7 +195,7 @@ static int ipoib_change_mtu(struct net_device *dev, int new_mtu) return 0; } - if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN) + if (new_mtu > 
IPOIB_UD_MTU(priv->max_ib_mtu)) return -EINVAL; priv->admin_mtu = new_mtu; @@ -971,10 +971,6 @@ static void ipoib_setup(struct net_device *dev) NETIF_F_LLTX | NETIF_F_HIGHDMA); - /* MTU will be reset when mcast join happens */ - dev->mtu = IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN; - priv->mcast_mtu = priv->admin_mtu = dev->mtu; - memcpy(dev->broadcast, ipv4_bcast_addr, INFINIBAND_ALEN); netif_carrier_off(dev); @@ -1107,6 +1103,7 @@ static struct net_device *ipoib_add_port(const char *format, { struct ipoib_dev_priv *priv; struct ib_device_attr *device_attr; + struct ib_port_attr attr; int result = -ENOMEM; priv = ipoib_intf_alloc(format); @@ -1115,6 +1112,18 @@ static struct net_device *ipoib_add_port(const char *format, SET_NETDEV_DEV(priv->dev, hca->dma_device); + if (!ib_query_port(hca, port, &attr)) + priv->max_ib_mtu = ib_mtu_enum_to_int(attr.max_mtu); + else { + printk(KERN_WARNING "%s: ib_query_port %d failed\n", + hca->name, port); + goto device_init_failed; + } + + /* MTU will be reset when mcast join happens */ + priv->dev->mtu = IPOIB_UD_MTU(priv->max_ib_mtu); + priv->mcast_mtu = priv->admin_mtu = priv->dev->mtu; + result = ib_query_pkey(hca, port, 0, &priv->pkey); if (result) { printk(KERN_WARNING "%s: ib_query_pkey port %d failed (ret = %d)\n", diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index 31a53c5..b9faef2 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -567,8 +567,7 @@ void ipoib_mcast_join_task(struct work_struct *work) return; } - priv->mcast_mtu = ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu) - - IPOIB_ENCAP_LEN; + priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu)); if (!ipoib_cm_admin_enabled(dev)) dev->mtu = min(priv->mcast_mtu, priv->admin_mtu); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index 8a20e37..a7d4bcb 100644 --- 
a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -150,7 +150,7 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) .max_send_wr = ipoib_sendq_size, .max_recv_wr = ipoib_recvq_size, .max_send_sge = 1, - .max_recv_sge = 1 + .max_recv_sge = IPOIB_UD_RX_SG }, .sq_sig_type = IB_SIGNAL_ALL_WR, .qp_type = IB_QPT_UD @@ -215,6 +215,19 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) priv->tx_wr.sg_list = priv->tx_sge; priv->tx_wr.send_flags = IB_SEND_SIGNALED; + priv->rx_sge[0].lkey = priv->mr->lkey; + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { + priv->rx_sge[0].length = IPOIB_UD_HEAD_SIZE; + priv->rx_sge[1].length = PAGE_SIZE; + priv->rx_sge[1].lkey = priv->mr->lkey; + priv->rx_wr.num_sge = IPOIB_UD_RX_SG; + } else { + priv->rx_sge[0].length = IPOIB_UD_BUF_SIZE(priv->max_ib_mtu); + priv->rx_wr.num_sge = 1; + } + priv->rx_wr.next = NULL; + priv->rx_wr.sg_list = priv->rx_sge; + return 0; out_free_cq: diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 293f5b8..431fdea 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -89,6 +89,7 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey) goto err; } + priv->max_ib_mtu = ppriv->max_ib_mtu; set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags); priv->pkey = pkey; From rdreier at cisco.com Sun Apr 20 18:55:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 20 Apr 2008 18:55:23 -0700 Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: (Dave Olson's message of "Sun, 20 Apr 2008 07:47:56 -0700 (PDT)") References: <20080419081614.GA2437@elte.hu> Message-ID: > | Not sure I really follow this response... 
ipath_driver.c has > | > | case PCI_DEVICE_ID_INFINIPATH_7220: > | #ifndef CONFIG_PCI_MSI > | ipath_dbg("CONFIG_PCI_MSI is not enabled, " > | "using IntX for unit %u\n", dd->ipath_unit); > | #endif > | ipath_init_iba7220_funcs(dd); > | break; > | > | so clearly ipath_init_iba7220_funcs() was intended to be built and used > | even if CONFIG_PCI_MSI was not defined. From the code it looks like all > | should work fine if PCI_MSI is not set, so I don't know what you mean > | about conditional checks. > > Actually, it wasn't. It was a late cleanup for another problem, and > we didn't worry about the other issue, and should have. Sorry, I still don't follow. What is the antecedent of "it"? What was "the other issue"? I'm not sure I know the right fix for the build breakage. It seems there are two possibilities: - build the iba7220 support unconditionally (the patch I posted). - change the case statement I quoted above so that the ipath_init_iba7220_funcs() call is inside the #ifdef block (and add an error message if CONFIG_PCI_MSI is not defined, as for the 6120 block in the same case statement). Since it seems iba7220 works with INTx interrupts, the first choice makes the most sense to me. And since all the pci_msi functions have stubs that just fail unconditionally if CONFIG_PCI_MSI is not defined, it seems we can remove the #ifdef CONFIG_PCI_MSI from the iba7220 files. And given that at least some device support works even if neither PCI_MSI nor HT_IRQ is defined, then it makes sense to me to remove that Kconfig dependency. If I have something wrong, please let me know. - R. 
From dave.olson at qlogic.com Sun Apr 20 19:35:17 2008 From: dave.olson at qlogic.com (Dave Olson) Date: Sun, 20 Apr 2008 19:35:17 -0700 (PDT) Subject: [ofa-general] Re: [GIT PULL] please pull infiniband.git In-Reply-To: References: <20080419081614.GA2437@elte.hu> Message-ID: On Sun, 20 Apr 2008, Roland Dreier wrote: | > | so clearly ipath_init_iba7220_funcs() was intended to be built and used | > | even if CONFIG_PCI_MSI was not defined. From the code it looks like all | > | should work fine if PCI_MSI is not set, so I don't know what you mean | > | about conditional checks. | > | > Actually, it wasn't. It was a late cleanup for another problem, and | > we didn't worry about the other issue, and should have. | | Sorry, I still don't follow. What is the antecedent of "it"? What was | "the other issue"? The CONFIG_PCI_MSI check where init_iba7220 is called. | I'm not sure I know the right fix for the build breakage. It seems | there are two possibilities: | | - build the iba7220 support unconditionally (the patch I posted). Yep; I already said I was OK with that. It's simplest, let's go with it. | And given that at least some device support works even if neither | PCI_MSI nor HT_IRQ is defined, then it makes sense to me to remove that | Kconfig dependency. Go ahead. Dave Olson dave.olson at qlogic.com From yevgenyp at mellanox.co.il Sun Apr 20 23:25:32 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Mon, 21 Apr 2008 09:25:32 +0300 Subject: [ofa-general][PATCH] mlx4: Qp range reservation (MP support, Patch 2) In-Reply-To: References: <480891F7.8090807@mellanox.co.il> Message-ID: <480C335C.6090606@mellanox.co.il> Roland Dreier wrote: > > +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, > > + u32 num, u32 mask, u32 reserved, > > + u32 effective_max) > > This patch adds effective_max stuff but I don't see how it's used anywhere?? > > - R. 
We use effective_max when there is a reserved range not only at the beginning of the bitmap but also at the end. One example: we reserve QP ranges for the FCoE and Ethernet modules. Thanks, Yevgeny From fenkes at de.ibm.com Mon Apr 21 01:03:10 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Mon, 21 Apr 2008 09:03:10 +0100 Subject: [ofa-general] [PATCH 0/5] IB/ehca: IB compliance fix, tracing verbosity and module parameters Message-ID: <200804211003.10695.fenkes@de.ibm.com> [1/5] makes the driver reject SQ WRs if the QP is not in RTS [2/5] bumps a lot of tracing into higher debug_levels [3/5] removes the mr_largepage parameter [4/5] changes some bool-ish module parms into actual bools, also updates some descriptions [5/5] bumps the version number to 0026 Please review these patches and queue them for inclusion into 2.6.26 if you think they're okay. Thanks! Joachim -- Joachim Fenkes -- eHCA Linux Driver Developer and Hardware Tamer IBM Deutschland Entwicklung GmbH -- Dept. 3627 (I/O Firmware Dev. 2) Schoenaicher Strasse 220 -- 71032 Boeblingen -- Germany eMail: fenkes at de.ibm.com From fenkes at de.ibm.com Mon Apr 21 01:04:44 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Mon, 21 Apr 2008 09:04:44 +0100 Subject: [ofa-general] [PATCH 1/5] IB/ehca: Prevent posting of SQ WQEs if QP not in RTS In-Reply-To: <200804211003.10695.fenkes@de.ibm.com> References: <200804211003.10695.fenkes@de.ibm.com> Message-ID: <200804211004.44666.fenkes@de.ibm.com> ...as required by IB Spec, C10-29.
Signed-off-by: Joachim Fenkes --- drivers/infiniband/hw/ehca/ehca_classes.h | 1 + drivers/infiniband/hw/ehca/ehca_qp.c | 3 +++ drivers/infiniband/hw/ehca/ehca_reqs.c | 5 +++++ 3 files changed, 9 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index 0d13fe0..3d6d946 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -160,6 +160,7 @@ struct ehca_qp { }; u32 qp_type; enum ehca_ext_qp_type ext_type; + enum ib_qp_state state; struct ipz_queue ipz_squeue; struct ipz_queue ipz_rqueue; struct h_galpas galpas; diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c index 3eb14a5..5a653d7 100644 --- a/drivers/infiniband/hw/ehca/ehca_qp.c +++ b/drivers/infiniband/hw/ehca/ehca_qp.c @@ -550,6 +550,7 @@ static struct ehca_qp *internal_create_qp( spin_lock_init(&my_qp->spinlock_r); my_qp->qp_type = qp_type; my_qp->ext_type = parms.ext_type; + my_qp->state = IB_QPS_RESET; if (init_attr->recv_cq) my_qp->recv_cq = @@ -1508,6 +1509,8 @@ static int internal_modify_qp(struct ib_qp *ibqp, if (attr_mask & IB_QP_QKEY) my_qp->qkey = attr->qkey; + my_qp->state = qp_new_state; + modify_qp_exit2: if (squeue_locked) { /* this means: sqe -> rts */ spin_unlock_irqrestore(&my_qp->spinlock_s, flags); diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c index a20bbf4..0b2359e 100644 --- a/drivers/infiniband/hw/ehca/ehca_reqs.c +++ b/drivers/infiniband/hw/ehca/ehca_reqs.c @@ -421,6 +421,11 @@ int ehca_post_send(struct ib_qp *qp, int ret = 0; unsigned long flags; + if (unlikely(my_qp->state != IB_QPS_RTS)) { + ehca_err(qp->device, "QP not in RTS state qpn=%x", qp->qp_num); + return -EINVAL; + } + /* LOCK the QUEUE */ spin_lock_irqsave(&my_qp->spinlock_s, flags); -- 1.5.5 From fenkes at de.ibm.com Mon Apr 21 01:05:26 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Mon, 21 Apr 2008 09:05:26 
+0100 Subject: [ofa-general] [PATCH 2/5] IB/ehca: Move high-volume debug output to higher debug levels In-Reply-To: <200804211003.10695.fenkes@de.ibm.com> References: <200804211003.10695.fenkes@de.ibm.com> Message-ID: <200804211005.26567.fenkes@de.ibm.com> Signed-off-by: Joachim Fenkes --- drivers/infiniband/hw/ehca/ehca_irq.c | 2 +- drivers/infiniband/hw/ehca/ehca_main.c | 14 ++++++-- drivers/infiniband/hw/ehca/ehca_mrmw.c | 16 ++++++---- drivers/infiniband/hw/ehca/ehca_qp.c | 12 ++++---- drivers/infiniband/hw/ehca/ehca_reqs.c | 46 ++++++++++++++--------------- drivers/infiniband/hw/ehca/ehca_uverbs.c | 6 +-- drivers/infiniband/hw/ehca/hcp_if.c | 23 ++++++++------- 7 files changed, 63 insertions(+), 56 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_irq.c b/drivers/infiniband/hw/ehca/ehca_irq.c index b5ca94c..ca5eb0c 100644 --- a/drivers/infiniband/hw/ehca/ehca_irq.c +++ b/drivers/infiniband/hw/ehca/ehca_irq.c @@ -633,7 +633,7 @@ static inline int find_next_online_cpu(struct ehca_comp_pool *pool) unsigned long flags; WARN_ON_ONCE(!in_interrupt()); - if (ehca_debug_level) + if (ehca_debug_level >= 3) ehca_dmp(&cpu_online_map, sizeof(cpumask_t), ""); spin_lock_irqsave(&pool->last_cpu_lock, flags); diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 65b3362..4379bef 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -85,8 +85,8 @@ module_param_named(lock_hcalls, ehca_lock_hcalls, bool, S_IRUGO); MODULE_PARM_DESC(open_aqp1, "AQP1 on startup (0: no (default), 1: yes)"); MODULE_PARM_DESC(debug_level, - "debug level" - " (0: no debug traces (default), 1: with debug traces)"); + "Amount of debug output (0: none (default), 1: traces, " + "2: some dumps, 3: lots)"); MODULE_PARM_DESC(hw_level, "hardware level" " (0: autosensing (default), 1: v. 0.20, 2: v. 
0.21)"); @@ -275,6 +275,7 @@ static int ehca_sense_attributes(struct ehca_shca *shca) u64 h_ret; struct hipz_query_hca *rblock; struct hipz_query_port *port; + const char *loc_code; static const u32 pgsize_map[] = { HCA_CAP_MR_PGSIZE_4K, 0x1000, @@ -283,6 +284,12 @@ static int ehca_sense_attributes(struct ehca_shca *shca) HCA_CAP_MR_PGSIZE_16M, 0x1000000, }; + ehca_gen_dbg("Probing adapter %s...", + shca->ofdev->node->full_name); + loc_code = of_get_property(shca->ofdev->node, "ibm,loc-code", NULL); + if (loc_code) + ehca_gen_dbg(" ... location lode=%s", loc_code); + rblock = ehca_alloc_fw_ctrlblock(GFP_KERNEL); if (!rblock) { ehca_gen_err("Cannot allocate rblock memory."); @@ -567,8 +574,7 @@ static int ehca_destroy_aqp1(struct ehca_sport *sport) static ssize_t ehca_show_debug_level(struct device_driver *ddp, char *buf) { - return snprintf(buf, PAGE_SIZE, "%d\n", - ehca_debug_level); + return snprintf(buf, PAGE_SIZE, "%d\n", ehca_debug_level); } static ssize_t ehca_store_debug_level(struct device_driver *ddp, diff --git a/drivers/infiniband/hw/ehca/ehca_mrmw.c b/drivers/infiniband/hw/ehca/ehca_mrmw.c index f26997f..46ae4eb 100644 --- a/drivers/infiniband/hw/ehca/ehca_mrmw.c +++ b/drivers/infiniband/hw/ehca/ehca_mrmw.c @@ -1794,8 +1794,9 @@ static int ehca_check_kpages_per_ate(struct scatterlist *page_list, int t; for (t = start_idx; t <= end_idx; t++) { u64 pgaddr = page_to_pfn(sg_page(&page_list[t])) << PAGE_SHIFT; - ehca_gen_dbg("chunk_page=%lx value=%016lx", pgaddr, - *(u64 *)abs_to_virt(phys_to_abs(pgaddr))); + if (ehca_debug_level >= 3) + ehca_gen_dbg("chunk_page=%lx value=%016lx", pgaddr, + *(u64 *)abs_to_virt(phys_to_abs(pgaddr))); if (pgaddr - PAGE_SIZE != *prev_pgaddr) { ehca_gen_err("uncontiguous page found pgaddr=%lx " "prev_pgaddr=%lx page_list_i=%x", @@ -1862,10 +1863,13 @@ static int ehca_set_pagebuf_user2(struct ehca_mr_pginfo *pginfo, pgaddr & ~(pginfo->hwpage_size - 1)); } - ehca_gen_dbg("kpage=%lx chunk_page=%lx " - "value=%016lx", *kpage, 
pgaddr, - *(u64 *)abs_to_virt( - phys_to_abs(pgaddr))); + if (ehca_debug_level >= 3) { + u64 val = *(u64 *)abs_to_virt( + phys_to_abs(pgaddr)); + ehca_gen_dbg("kpage=%lx chunk_page=%lx " + "value=%016lx", + *kpage, pgaddr, val); + } prev_pgaddr = pgaddr; i++; pginfo->kpage_cnt++; diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c index 5a653d7..57bef11 100644 --- a/drivers/infiniband/hw/ehca/ehca_qp.c +++ b/drivers/infiniband/hw/ehca/ehca_qp.c @@ -966,7 +966,7 @@ static int prepare_sqe_rts(struct ehca_qp *my_qp, struct ehca_shca *shca, qp_num, bad_send_wqe_p); /* convert wqe pointer to vadr */ bad_send_wqe_v = abs_to_virt((u64)bad_send_wqe_p); - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(bad_send_wqe_v, 32, "qp_num=%x bad_wqe", qp_num); squeue = &my_qp->ipz_squeue; if (ipz_queue_abs_to_offset(squeue, (u64)bad_send_wqe_p, &q_ofs)) { @@ -979,7 +979,7 @@ static int prepare_sqe_rts(struct ehca_qp *my_qp, struct ehca_shca *shca, wqe = (struct ehca_wqe *)ipz_qeit_calc(squeue, q_ofs); *bad_wqe_cnt = 0; while (wqe->optype != 0xff && wqe->wqef != 0xff) { - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(wqe, 32, "qp_num=%x wqe", qp_num); wqe->nr_of_data_seg = 0; /* suppress data access */ wqe->wqef = WQEF_PURGE; /* WQE to be purged */ @@ -1451,7 +1451,7 @@ static int internal_modify_qp(struct ib_qp *ibqp, /* no support for max_send/recv_sge yet */ } - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(mqpcb, 4*70, "qp_num=%x", ibqp->qp_num); h_ret = hipz_h_modify_qp(shca->ipz_hca_handle, @@ -1766,7 +1766,7 @@ int ehca_query_qp(struct ib_qp *qp, if (qp_init_attr) *qp_init_attr = my_qp->init_attr; - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(qpcb, 4*70, "qp_num=%x", qp->qp_num); query_qp_exit1: @@ -1814,7 +1814,7 @@ int ehca_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr, goto modify_srq_exit0; } - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(mqpcb, 4*70, 
"qp_num=%x", my_qp->real_qp_num); h_ret = hipz_h_modify_qp(shca->ipz_hca_handle, my_qp->ipz_qp_handle, @@ -1867,7 +1867,7 @@ int ehca_query_srq(struct ib_srq *srq, struct ib_srq_attr *srq_attr) srq_attr->srq_limit = EHCA_BMASK_GET( MQPCB_CURR_SRQ_LIMIT, qpcb->curr_srq_limit); - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(qpcb, 4*70, "qp_num=%x", my_qp->real_qp_num); query_srq_exit1: diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c index 0b2359e..bbe0436 100644 --- a/drivers/infiniband/hw/ehca/ehca_reqs.c +++ b/drivers/infiniband/hw/ehca/ehca_reqs.c @@ -81,7 +81,7 @@ static inline int ehca_write_rwqe(struct ipz_queue *ipz_rqueue, recv_wr->sg_list[cnt_ds].length; } - if (ehca_debug_level) { + if (ehca_debug_level >= 3) { ehca_gen_dbg("RECEIVE WQE written into ipz_rqueue=%p", ipz_rqueue); ehca_dmp(wqe_p, 16*(6 + wqe_p->nr_of_data_seg), "recv wqe"); @@ -281,7 +281,7 @@ static inline int ehca_write_swqe(struct ehca_qp *qp, return -EINVAL; } - if (ehca_debug_level) { + if (ehca_debug_level >= 3) { ehca_gen_dbg("SEND WQE written into queue qp=%p ", qp); ehca_dmp( wqe_p, 16*(6 + wqe_p->nr_of_data_seg), "send wqe"); } @@ -459,13 +459,14 @@ int ehca_post_send(struct ib_qp *qp, goto post_send_exit0; } wqe_cnt++; - ehca_dbg(qp->device, "ehca_qp=%p qp_num=%x wqe_cnt=%d", - my_qp, qp->qp_num, wqe_cnt); } /* eof for cur_send_wr */ post_send_exit0: iosync(); /* serialize GAL register access */ hipz_update_sqa(my_qp, wqe_cnt); + if (unlikely(ret || ehca_debug_level >= 2)) + ehca_dbg(qp->device, "ehca_qp=%p qp_num=%x wqe_cnt=%d ret=%i", + my_qp, qp->qp_num, wqe_cnt, ret); my_qp->message_count += wqe_cnt; spin_unlock_irqrestore(&my_qp->spinlock_s, flags); return ret; @@ -525,13 +526,14 @@ static int internal_post_recv(struct ehca_qp *my_qp, goto post_recv_exit0; } wqe_cnt++; - ehca_dbg(dev, "ehca_qp=%p qp_num=%x wqe_cnt=%d", - my_qp, my_qp->real_qp_num, wqe_cnt); } /* eof for cur_recv_wr */ post_recv_exit0: iosync(); /* 
serialize GAL register access */ hipz_update_rqa(my_qp, wqe_cnt); + if (unlikely(ret || ehca_debug_level >= 2)) + ehca_dbg(dev, "ehca_qp=%p qp_num=%x wqe_cnt=%d ret=%i", + my_qp, my_qp->real_qp_num, wqe_cnt, ret); spin_unlock_irqrestore(&my_qp->spinlock_r, flags); return ret; } @@ -575,16 +577,17 @@ static inline int ehca_poll_cq_one(struct ib_cq *cq, struct ib_wc *wc) struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq); struct ehca_cqe *cqe; struct ehca_qp *my_qp; - int cqe_count = 0; + int cqe_count = 0, is_error; poll_cq_one_read_cqe: cqe = (struct ehca_cqe *) ipz_qeit_get_inc_valid(&my_cq->ipz_queue); if (!cqe) { ret = -EAGAIN; - ehca_dbg(cq->device, "Completion queue is empty ehca_cq=%p " - "cq_num=%x ret=%i", my_cq, my_cq->cq_number, ret); - goto poll_cq_one_exit0; + if (ehca_debug_level >= 3) + ehca_dbg(cq->device, "Completion queue is empty " + "my_cq=%p cq_num=%x", my_cq, my_cq->cq_number); + goto poll_cq_one_exit0; } /* prevents loads being reordered across this point */ @@ -614,7 +617,7 @@ poll_cq_one_read_cqe: ehca_dbg(cq->device, "Got CQE with purged bit qp_num=%x src_qp=%x", cqe->local_qp_number, cqe->remote_qp_number); - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(cqe, 64, "qp_num=%x src_qp=%x", cqe->local_qp_number, cqe->remote_qp_number); @@ -627,11 +630,13 @@ poll_cq_one_read_cqe: } } - /* tracing cqe */ - if (unlikely(ehca_debug_level)) { + is_error = cqe->status & WC_STATUS_ERROR_BIT; + + /* trace error CQEs if debug_level >= 1, trace all CQEs if >= 3 */ + if (unlikely(ehca_debug_level >= 3 || (ehca_debug_level && is_error))) { ehca_dbg(cq->device, - "Received COMPLETION ehca_cq=%p cq_num=%x -----", - my_cq, my_cq->cq_number); + "Received %sCOMPLETION ehca_cq=%p cq_num=%x -----", + is_error ? "ERROR " : "", my_cq, my_cq->cq_number); ehca_dmp(cqe, 64, "ehca_cq=%p cq_num=%x", my_cq, my_cq->cq_number); ehca_dbg(cq->device, @@ -654,8 +659,9 @@ poll_cq_one_read_cqe: /* update also queue adder to throw away this entry!!! 
*/ goto poll_cq_one_exit0; } + /* eval ib_wc_status */ - if (unlikely(cqe->status & WC_STATUS_ERROR_BIT)) { + if (unlikely(is_error)) { /* complete with errors */ map_ib_wc_status(cqe->status, &wc->status); wc->vendor_err = wc->status; @@ -676,14 +682,6 @@ poll_cq_one_read_cqe: wc->imm_data = cpu_to_be32(cqe->immediate_data); wc->sl = cqe->service_level; - if (unlikely(wc->status != IB_WC_SUCCESS)) - ehca_dbg(cq->device, - "ehca_cq=%p cq_num=%x WARNING unsuccessful cqe " - "OPType=%x status=%x qp_num=%x src_qp=%x wr_id=%lx " - "cqe=%p", my_cq, my_cq->cq_number, cqe->optype, - cqe->status, cqe->local_qp_number, - cqe->remote_qp_number, cqe->work_request_id, cqe); - poll_cq_one_exit0: if (cqe_count > 0) hipz_update_feca(my_cq, cqe_count); diff --git a/drivers/infiniband/hw/ehca/ehca_uverbs.c b/drivers/infiniband/hw/ehca/ehca_uverbs.c index 1b07f2b..e43ed8f 100644 --- a/drivers/infiniband/hw/ehca/ehca_uverbs.c +++ b/drivers/infiniband/hw/ehca/ehca_uverbs.c @@ -211,8 +211,7 @@ static int ehca_mmap_qp(struct vm_area_struct *vma, struct ehca_qp *qp, break; case 1: /* qp rqueue_addr */ - ehca_dbg(qp->ib_qp.device, "qp_num=%x rqueue", - qp->ib_qp.qp_num); + ehca_dbg(qp->ib_qp.device, "qp_num=%x rq", qp->ib_qp.qp_num); ret = ehca_mmap_queue(vma, &qp->ipz_rqueue, &qp->mm_count_rqueue); if (unlikely(ret)) { @@ -224,8 +223,7 @@ static int ehca_mmap_qp(struct vm_area_struct *vma, struct ehca_qp *qp, break; case 2: /* qp squeue_addr */ - ehca_dbg(qp->ib_qp.device, "qp_num=%x squeue", - qp->ib_qp.qp_num); + ehca_dbg(qp->ib_qp.device, "qp_num=%x sq", qp->ib_qp.qp_num); ret = ehca_mmap_queue(vma, &qp->ipz_squeue, &qp->mm_count_squeue); if (unlikely(ret)) { diff --git a/drivers/infiniband/hw/ehca/hcp_if.c b/drivers/infiniband/hw/ehca/hcp_if.c index 7029aa6..5245e13 100644 --- a/drivers/infiniband/hw/ehca/hcp_if.c +++ b/drivers/infiniband/hw/ehca/hcp_if.c @@ -123,8 +123,9 @@ static long ehca_plpar_hcall_norets(unsigned long opcode, int i, sleep_msecs; unsigned long flags = 0; - 
ehca_gen_dbg("opcode=%lx " HCALL7_REGS_FORMAT, - opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7); + if (unlikely(ehca_debug_level >= 2)) + ehca_gen_dbg("opcode=%lx " HCALL7_REGS_FORMAT, + opcode, arg1, arg2, arg3, arg4, arg5, arg6, arg7); for (i = 0; i < 5; i++) { /* serialize hCalls to work around firmware issue */ @@ -148,7 +149,8 @@ static long ehca_plpar_hcall_norets(unsigned long opcode, opcode, ret, arg1, arg2, arg3, arg4, arg5, arg6, arg7); else - ehca_gen_dbg("opcode=%lx ret=%li", opcode, ret); + if (unlikely(ehca_debug_level >= 2)) + ehca_gen_dbg("opcode=%lx ret=%li", opcode, ret); return ret; } @@ -172,8 +174,10 @@ static long ehca_plpar_hcall9(unsigned long opcode, int i, sleep_msecs; unsigned long flags = 0; - ehca_gen_dbg("INPUT -- opcode=%lx " HCALL9_REGS_FORMAT, opcode, - arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8, arg9); + if (unlikely(ehca_debug_level >= 2)) + ehca_gen_dbg("INPUT -- opcode=%lx " HCALL9_REGS_FORMAT, opcode, + arg1, arg2, arg3, arg4, arg5, + arg6, arg7, arg8, arg9); for (i = 0; i < 5; i++) { /* serialize hCalls to work around firmware issue */ @@ -201,7 +205,7 @@ static long ehca_plpar_hcall9(unsigned long opcode, ret, outs[0], outs[1], outs[2], outs[3], outs[4], outs[5], outs[6], outs[7], outs[8]); - } else + } else if (unlikely(ehca_debug_level >= 2)) ehca_gen_dbg("OUTPUT -- ret=%li " HCALL9_REGS_FORMAT, ret, outs[0], outs[1], outs[2], outs[3], outs[4], outs[5], outs[6], outs[7], @@ -381,7 +385,7 @@ u64 hipz_h_query_port(const struct ipz_adapter_handle adapter_handle, r_cb, /* r6 */ 0, 0, 0, 0); - if (ehca_debug_level) + if (ehca_debug_level >= 2) ehca_dmp(query_port_response_block, 64, "response_block"); return ret; @@ -731,9 +735,6 @@ u64 hipz_h_alloc_resource_mr(const struct ipz_adapter_handle adapter_handle, u64 ret; u64 outs[PLPAR_HCALL9_BUFSIZE]; - ehca_gen_dbg("kernel PAGE_SIZE=%x access_ctrl=%016x " - "vaddr=%lx length=%lx", - (u32)PAGE_SIZE, access_ctrl, vaddr, length); ret = ehca_plpar_hcall9(H_ALLOC_RESOURCE, outs, 
adapter_handle.handle, /* r4 */ 5, /* r5 */ @@ -758,7 +759,7 @@ u64 hipz_h_register_rpage_mr(const struct ipz_adapter_handle adapter_handle, { u64 ret; - if (unlikely(ehca_debug_level >= 2)) { + if (unlikely(ehca_debug_level >= 3)) { if (count > 1) { u64 *kpage; int i; -- 1.5.5 From fenkes at de.ibm.com Mon Apr 21 01:06:08 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Mon, 21 Apr 2008 09:06:08 +0100 Subject: [ofa-general] [PATCH 3/5] IB/ehca: Remove mr_largepage parameter In-Reply-To: <200804211003.10695.fenkes@de.ibm.com> References: <200804211003.10695.fenkes@de.ibm.com> Message-ID: <200804211006.08849.fenkes@de.ibm.com> Always enable large page support; didn't seem to cause problems for anyone. Signed-off-by: Joachim Fenkes --- drivers/infiniband/hw/ehca/ehca_main.c | 22 +++------------------- 1 files changed, 3 insertions(+), 19 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 4379bef..ab02ac8 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -60,7 +60,6 @@ MODULE_VERSION(HCAD_VERSION); static int ehca_open_aqp1 = 0; static int ehca_hw_level = 0; static int ehca_poll_all_eqs = 1; -static int ehca_mr_largepage = 1; int ehca_debug_level = 0; int ehca_nr_ports = 2; @@ -79,7 +78,6 @@ module_param_named(port_act_time, ehca_port_act_time, int, S_IRUGO); module_param_named(poll_all_eqs, ehca_poll_all_eqs, int, S_IRUGO); module_param_named(static_rate, ehca_static_rate, int, S_IRUGO); module_param_named(scaling_code, ehca_scaling_code, int, S_IRUGO); -module_param_named(mr_largepage, ehca_mr_largepage, int, S_IRUGO); module_param_named(lock_hcalls, ehca_lock_hcalls, bool, S_IRUGO); MODULE_PARM_DESC(open_aqp1, @@ -104,9 +102,6 @@ MODULE_PARM_DESC(static_rate, "set permanent static rate (default: disabled)"); MODULE_PARM_DESC(scaling_code, "set scaling code (0: disabled/default, 1: enabled)"); -MODULE_PARM_DESC(mr_largepage, - "use large page for MR 
(0: use PAGE_SIZE (default), " - "1: use large page depending on MR size"); MODULE_PARM_DESC(lock_hcalls, "serialize all hCalls made by the driver " "(default: autodetect)"); @@ -357,11 +352,9 @@ static int ehca_sense_attributes(struct ehca_shca *shca) /* translate supported MR page sizes; always support 4K */ shca->hca_cap_mr_pgsize = EHCA_PAGESIZE; - if (ehca_mr_largepage) { /* support extra sizes only if enabled */ - for (i = 0; i < ARRAY_SIZE(pgsize_map); i += 2) - if (rblock->memory_page_size_supported & pgsize_map[i]) - shca->hca_cap_mr_pgsize |= pgsize_map[i + 1]; - } + for (i = 0; i < ARRAY_SIZE(pgsize_map); i += 2) + if (rblock->memory_page_size_supported & pgsize_map[i]) + shca->hca_cap_mr_pgsize |= pgsize_map[i + 1]; /* query max MTU from first port -- it's the same for all ports */ port = (struct hipz_query_port *)rblock; @@ -663,14 +656,6 @@ static ssize_t ehca_show_adapter_handle(struct device *dev, } static DEVICE_ATTR(adapter_handle, S_IRUGO, ehca_show_adapter_handle, NULL); -static ssize_t ehca_show_mr_largepage(struct device *dev, - struct device_attribute *attr, - char *buf) -{ - return sprintf(buf, "%d\n", ehca_mr_largepage); -} -static DEVICE_ATTR(mr_largepage, S_IRUGO, ehca_show_mr_largepage, NULL); - static struct attribute *ehca_dev_attrs[] = { &dev_attr_adapter_handle.attr, &dev_attr_num_ports.attr, @@ -687,7 +672,6 @@ static struct attribute *ehca_dev_attrs[] = { &dev_attr_cur_mw.attr, &dev_attr_max_pd.attr, &dev_attr_max_ah.attr, - &dev_attr_mr_largepage.attr, NULL }; -- 1.5.5 From vlad at dev.mellanox.co.il Mon Apr 21 01:07:14 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 21 Apr 2008 11:07:14 +0300 Subject: [ofa-general] Re: ofed-1.3 uninstall.sh do not remove all the infiniband stack components properlly on RH 4 u 5 or rh 4 u 6 full instalation. 
In-Reply-To: <39C75744D164D948A170E9792AF8E7CAC5AEED@exil.voltaire.com> References: <47FA3D60.3020905@opengridcomputing.com> <47FAA913.7090805@opengridcomputing.com> <39C75744D164D948A170E9792AF8E7CAC5AEED@exil.voltaire.com> Message-ID: <480C4B32.8050706@dev.mellanox.co.il> Moshe Kazir wrote: > Some rpm's (openmpi-libs, libmthca-devel,etc.) are not removed and > cause dependency problems. > > The attaches patch solves the problem. > > Moshe > > ____________________________________________________________ > Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) > > Voltaire - The Grid Backbone > > www.voltaire.com > Applied. Thanks, Regards, Vladimir From fenkes at de.ibm.com Mon Apr 21 01:06:58 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Mon, 21 Apr 2008 09:06:58 +0100 Subject: [ofa-general] [PATCH 4/5] IB/ehca: Make some module parameters bool, update descriptions In-Reply-To: <200804211003.10695.fenkes@de.ibm.com> References: <200804211003.10695.fenkes@de.ibm.com> Message-ID: <200804211006.59197.fenkes@de.ibm.com> Signed-off-by: Joachim Fenkes --- drivers/infiniband/hw/ehca/ehca_main.c | 37 +++++++++++++++---------------- 1 files changed, 18 insertions(+), 19 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index ab02ac8..45fe35a 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -69,41 +69,40 @@ int ehca_static_rate = -1; int ehca_scaling_code = 0; int ehca_lock_hcalls = -1; -module_param_named(open_aqp1, ehca_open_aqp1, int, S_IRUGO); -module_param_named(debug_level, ehca_debug_level, int, S_IRUGO); -module_param_named(hw_level, ehca_hw_level, int, S_IRUGO); -module_param_named(nr_ports, ehca_nr_ports, int, S_IRUGO); -module_param_named(use_hp_mr, ehca_use_hp_mr, int, S_IRUGO); -module_param_named(port_act_time, ehca_port_act_time, int, S_IRUGO); -module_param_named(poll_all_eqs, ehca_poll_all_eqs, int, S_IRUGO); 
-module_param_named(static_rate, ehca_static_rate, int, S_IRUGO); -module_param_named(scaling_code, ehca_scaling_code, int, S_IRUGO); +module_param_named(open_aqp1, ehca_open_aqp1, bool, S_IRUGO); +module_param_named(debug_level, ehca_debug_level, int, S_IRUGO); +module_param_named(hw_level, ehca_hw_level, int, S_IRUGO); +module_param_named(nr_ports, ehca_nr_ports, int, S_IRUGO); +module_param_named(use_hp_mr, ehca_use_hp_mr, bool, S_IRUGO); +module_param_named(port_act_time, ehca_port_act_time, int, S_IRUGO); +module_param_named(poll_all_eqs, ehca_poll_all_eqs, bool, S_IRUGO); +module_param_named(static_rate, ehca_static_rate, int, S_IRUGO); +module_param_named(scaling_code, ehca_scaling_code, bool, S_IRUGO); module_param_named(lock_hcalls, ehca_lock_hcalls, bool, S_IRUGO); MODULE_PARM_DESC(open_aqp1, - "AQP1 on startup (0: no (default), 1: yes)"); + "Open AQP1 on startup (default: no)"); MODULE_PARM_DESC(debug_level, "Amount of debug output (0: none (default), 1: traces, " "2: some dumps, 3: lots)"); MODULE_PARM_DESC(hw_level, - "hardware level" - " (0: autosensing (default), 1: v. 0.20, 2: v. 
0.21)"); + "Hardware level (0: autosensing (default), " + "0x10..0x14: eHCA, 0x20..0x23: eHCA2)"); MODULE_PARM_DESC(nr_ports, "number of connected ports (-1: autodetect, 1: port one only, " "2: two ports (default)"); MODULE_PARM_DESC(use_hp_mr, - "high performance MRs (0: no (default), 1: yes)"); + "Use high performance MRs (default: no)"); MODULE_PARM_DESC(port_act_time, - "time to wait for port activation (default: 30 sec)"); + "Time to wait for port activation (default: 30 sec)"); MODULE_PARM_DESC(poll_all_eqs, - "polls all event queues periodically" - " (0: no, 1: yes (default))"); + "Poll all event queues periodically (default: yes)"); MODULE_PARM_DESC(static_rate, - "set permanent static rate (default: disabled)"); + "Set permanent static rate (default: no static rate)"); MODULE_PARM_DESC(scaling_code, - "set scaling code (0: disabled/default, 1: enabled)"); + "Enable scaling code (default: no)"); MODULE_PARM_DESC(lock_hcalls, - "serialize all hCalls made by the driver " + "Serialize all hCalls made by the driver " "(default: autodetect)"); DEFINE_RWLOCK(ehca_qp_idr_lock); -- 1.5.5 From fenkes at de.ibm.com Mon Apr 21 01:08:16 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Mon, 21 Apr 2008 09:08:16 +0100 Subject: [ofa-general] [PATCH 5/5] IB/ehca: Bump version number to 0026 In-Reply-To: <200804211003.10695.fenkes@de.ibm.com> References: <200804211003.10695.fenkes@de.ibm.com> Message-ID: <200804211008.17023.fenkes@de.ibm.com> Signed-off-by: Joachim Fenkes --- drivers/infiniband/hw/ehca/ehca_main.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 45fe35a..6504897 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -50,7 +50,7 @@ #include "ehca_tools.h" #include "hcp_if.h" -#define HCAD_VERSION "0025" +#define HCAD_VERSION "0026" MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Christoph Raisch "); -- 
1.5.5 From fenkes at de.ibm.com Mon Apr 21 01:45:25 2008 From: fenkes at de.ibm.com (Joachim Fenkes) Date: Mon, 21 Apr 2008 09:45:25 +0100 Subject: [ofa-general] Re: [PATCH 1/5] IB/ehca: Prevent posting of SQ WQEs if QP not in RTS In-Reply-To: <200804211004.44666.fenkes@de.ibm.com> References: <200804211003.10695.fenkes@de.ibm.com> <200804211004.44666.fenkes@de.ibm.com> Message-ID: <200804211045.26183.fenkes@de.ibm.com> On Monday 21 April 2008 10:04, Joachim Fenkes wrote: > + if (unlikely(my_qp->state != IB_QPS_RTS)) { > + ehca_err(qp->device, "QP not in RTS state qpn=%x", qp->qp_num); > + return -EINVAL; > + } Myself, I'm not very happy with using EINVAL, but I can't think of a more fitting return code. Also, this is what nes, amso and cxgb3 return in such a case; ipath posts an error CQE and mthca/mlx4 don't do this check at all (AFAICS). Better suggestions, anyone? Regards, Joachim From tziporet at dev.mellanox.co.il Mon Apr 21 04:45:34 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 21 Apr 2008 14:45:34 +0300 Subject: [ofa-general] Re: [ewg] mlx4_core internal error with OFED 1.2.5.4 In-Reply-To: <1208442608.26936.143.camel@hrosenstock-ws.xsigo.com> References: <1208442608.26936.143.camel@hrosenstock-ws.xsigo.com> Message-ID: <480C7E5E.8090703@mellanox.co.il> Hal Rosenstock wrote: > Hi, > > I'm running OFED 1.2.5.4 and got the following: > > Is there any more information that can be provided by decoding this as > to what the error was ? Thanks. > > Hi Hal, I will forward this info to our FW developers. Which FW version you are using? What have you run when this happened? 
Thanks, Tziporet From erezz at Voltaire.COM Mon Apr 21 06:51:52 2008 From: erezz at Voltaire.COM (Erez Zilber) Date: Mon, 21 Apr 2008 16:51:52 +0300 Subject: [ofa-general] Re: [PATCH 1/3] iscsi iser: remove DMA restrictions In-Reply-To: <20080213195912.GC7372@osc.edu> References: <20080212205252.GB13643@osc.edu> <20080212205403.GC13643@osc.edu> <1202850645.3137.132.camel@localhost.localdomain> <20080212214632.GA14397@osc.edu> <1202853468.3137.148.camel@localhost.localdomain> <20080213195912.GC7372@osc.edu> Message-ID: <480C9BF8.9050401@Voltaire.COM> Pete Wyckoff wrote: > James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 15:57 -0600: > >> On Tue, 2008-02-12 at 16:46 -0500, Pete Wyckoff wrote: >> >>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 15:10 -0600: >>> >>>> On Tue, 2008-02-12 at 15:54 -0500, Pete Wyckoff wrote: >>>> >>>>> iscsi_iser does not have any hardware DMA restrictions. Add a >>>>> slave_configure function to remove any DMA alignment restriction, >>>>> allowing the use of direct IO from arbitrary offsets within a page. >>>>> Also disable page bouncing; iser has no restrictions on which pages it >>>>> can address. >>>>> >>>>> Signed-off-by: Pete Wyckoff >>>>> --- >>>>> drivers/infiniband/ulp/iser/iscsi_iser.c | 8 ++++++++ >>>>> 1 files changed, 8 insertions(+), 0 deletions(-) >>>>> >>>>> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>> index be1b9fb..1b272a6 100644 >>>>> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>> @@ -543,6 +543,13 @@ iscsi_iser_ep_disconnect(__u64 ep_handle) >>>>> iser_conn_terminate(ib_conn); >>>>> } >>>>> >>>>> +static int iscsi_iser_slave_configure(struct scsi_device *sdev) >>>>> +{ >>>>> + blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY); >>>>> >>>> You really don't want to do this. 
That signals to the block layer that >>>> we have an iommu, although it's practically the same thing as a 64 bit >>>> DMA mask ... but I'd just leave it to the DMA mask to set this up >>>> correctly. Anything else is asking for a subtle bug to turn up years >>>> from now when something causes the mask and the limit to be mismatched. >>>> >>> Oh. I decided to add that line for symmetry with TCP, and was >>> convinced by the arguments here: >>> >>> commit b6d44fe9582b9d90a0b16f508ac08a90d899bf56 >>> Author: Mike Christie >>> Date: Thu Jul 26 12:46:47 2007 -0500 >>> >>> [SCSI] iscsi_tcp: Turn off bounce buffers >>> >>> It was found by LSI that on setups with large amounts of memory >>> we were bouncing buffers when we did not need to. If the iscsi tcp >>> code touches the data buffer (or a helper does), >>> it will kmap the buffer. iscsi_tcp also does not interact with hardware, >>> so it does not have any hw dma restrictions. This patch sets the bounce >>> buffer settings for our device queue so buffers should not be bounced >>> because of a driver limit. >>> >>> I don't see a convenient place to callback into particular iscsi >>> devices to set the DMA mask per-host. It has to go on the >>> shost_gendev, right?, but only for TCP and iSER, not qla4xxx, which >>> handles its DMA mask during device probe. >>> >> You should be taking your mask from the underlying infiniband device as >> part of the setup, shouldn't you? >> > > I think you're right about this. All the existing IB HW tries to > set a 64-bit dma mask, but that's no reason to disable the mechanism > entirely in iser. I'll remove that line that disables bouncing in > my patch. Perhaps Mike will know if the iscsi_tcp usage is still > appropriate. > > Let me make sure that I understand: you say that the IB HW driver (e.g. 
ib_mthca) tries to set a 64-bit dma mask: err = pci_set_dma_mask(pdev, DMA_64BIT_MASK); if (err) { dev_warn(&pdev->dev, "Warning: couldn't set 64-bit PCI DMA mask.\n"); err = pci_set_dma_mask(pdev, DMA_32BIT_MASK); if (err) { dev_err(&pdev->dev, "Can't set PCI DMA mask, aborting.\n"); goto err_free_res; } } So, in the example above, the driver will use a 64-bit mask or a 32-bit mask (or fail). According to that, iSER (and SRP) needs to call blk_queue_bounce_limit with the appropriate parameter, right? Thanks, Erez From bob.kossey at hp.com Mon Apr 21 06:58:03 2008 From: bob.kossey at hp.com (Kossey, Robert) Date: Mon, 21 Apr 2008 09:58:03 -0400 Subject: [ofa-general] Starting openibd before the network service In-Reply-To: <480C9AFB.4050801@hp.com> References: <480C9AFB.4050801@hp.com> Message-ID: <480C9D6B.2090906@hp.com> Hi Moshe, You may be aware that Voltaire OFED changed the start order of openibd to be before network to fix a problem that an IB bond device would not come up correctly after a reboot. I know I've seen that with Red Hat. I would like to see that fixed in OFED 1.3.1, as well as the panics I reported with IPoIB: https://bugs.openfabrics.org/show_bug.cgi?id=989 Bob >> /From bonding and ipoib point of view, it's better to have openibd > /started before the network service is started . > > In the openibd script we find that in SUSE network service is started > before openibd -> > > ### BEGIN INIT INFO > # Provides: openibd > # Required-Start: $local_fs $network > > > Can someone explain why ? > > Can we change it before OFED-1.3.1 ? > > Moshe > > From glebn at voltaire.com Mon Apr 21 07:14:41 2008 From: glebn at voltaire.com (Gleb Natapov) Date: Mon, 21 Apr 2008 17:14:41 +0300 Subject: [ofa-general] Problem with libibverbs and huge pages registration. Message-ID: <20080421141441.GF7771@minantech.com> Hi Roland, ibv_reg_mr() fails if I try to register a memory region backed by a huge page, but is not aligned to huge page boundary. 
Digging deeper I see that libibverbs aligns the memory region to a regular page size and calls madvise(), and the call fails. See the program below to reproduce. The program assumes that hugetlbfs is mounted on /huge and there is at least one huge page available. I am not sure it is possible to know whether a memory buffer is backed by a huge page, to solve the problem. Another issue with libibverbs is that after the first ibv_reg_mr() fails, a second registration attempt on the same buffer succeeds, since ibv_madvise_range() doesn't clean up after the madvise failure and thinks that the memory is already "madvised".

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <infiniband/verbs.h>

int main()
{
    int num_devs, fd;
    struct ibv_device **ib_devs;
    struct ibv_context *ctx;
    struct ibv_pd *pd;
    struct ibv_mr *mr;
    char *ptr;
    size_t len = 1024*1024;

    ibv_fork_init();
    ib_devs = ibv_get_device_list(&num_devs);
    ctx = ibv_open_device(ib_devs[0]);
    pd = ibv_alloc_pd(ctx);
    fd = open("/huge/test", O_CREAT | O_RDWR);
    remove("/huge/test");
    ptr = mmap(0, 2*len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    mr = ibv_reg_mr(pd, ptr, len, IBV_ACCESS_LOCAL_WRITE |
                    IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_READ);
    fprintf(stderr, "mr = %p\n", mr);
    return 0;
}

-- Gleb. From hrosenstock at xsigo.com Mon Apr 21 07:31:43 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Mon, 21 Apr 2008 07:31:43 -0700 Subject: [ofa-general] Re: [ewg] mlx4_core internal error with OFED 1.2.5.4 In-Reply-To: <480C7E5E.8090703@mellanox.co.il> References: <1208442608.26936.143.camel@hrosenstock-ws.xsigo.com> <480C7E5E.8090703@mellanox.co.il> Message-ID: <1208788303.18376.126.camel@hrosenstock-ws.xsigo.com> Hi Tziporet, On Mon, 2008-04-21 at 14:45 +0300, Tziporet Koren wrote:
Thanks. > Which FW version you are using? 2.3.0 > What have you run when this happened? I'm not sure it's reproducible but was wondering if there were any clues as to what the internal error was and what could cause it in "theory". -- Hal > Thanks, > Tziporet From vlad at dev.mellanox.co.il Mon Apr 21 07:53:03 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Mon, 21 Apr 2008 17:53:03 +0300 Subject: [ofa-general] Re: Starting openibd before the network service In-Reply-To: <39C75744D164D948A170E9792AF8E7CAC5AF06@exil.voltaire.com> References: <4805F692.1040101@dev.mellanox.co.il> <39C75744D164D948A170E9792AF8E7CAC5AF06@exil.voltaire.com> Message-ID: <480CAA4F.7040507@dev.mellanox.co.il> Moshe Kazir wrote: > >>From bonding and ipoib point of view, it's better to have openibd > started before the network service is started . > > In the openibd script we find that in SUSE network service is started > before openibd -> > > ### BEGIN INIT INFO > # Provides: openibd > # Required-Start: $local_fs $network > > > Can someone explain why ? > > Can we change it before OFED-1.3.1 ? > > Moshe > > ____________________________________________________________ > Moshe Katzir | +972-9971-8639 (o) | +972-52-860-6042 (m) > > Voltaire - The Grid Backbone > > www.voltaire.com > > > Fixed in the OFED-1.3.1. Please check the latest daily build under http://www.openfabrics.org/builds/ofed-1.3.1 Regards, Vladimir From tziporet at mellanox.co.il Mon Apr 21 08:06:07 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Mon, 21 Apr 2008 18:06:07 +0300 Subject: [ofa-general] Agenda for the OFED meeting today Message-ID: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> Hi, This is the agenda for the OFED meeting today: 1. 
OFED 1.3.1: 1.1 Planned changes: ULPs changes: IB-bonding - done SRP failover - on work SDP crashes - on work RDS fixes for RDMA API - already applied but not clear if these are all the changes librdmacm 1.0.7 - done Open MPI 1.2.6 - done Low level drivers: - each HW vendor should reply when the changes will be ready nes mlx4 cxgb3 Ipath ehca 1.2 Schedule: GA is planned for May-29 I suggest to have only two release candidates: - RC1 - May 6 - RC2 - May 20 Note: daily builds of 1.3.1 are already available at: http://www.openfabrics.org/builds/ofed-1.3.1 2. OFED 1.4: Release features were presented at Sonoma (presentation available at http://www.openfabrics.org/archives/april2008sonoma.htm) Kernel tree is under work at: git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel Now failing on ipath drivers - waiting for an update. We should try to get the kernel code to compile as soon as possible so everybody will be able to contribute code 3. Follow up from Sonoma - open discussion Tziporet -------------- next part -------------- An HTML attachment was scrubbed... URL: From gstreiff at NetEffect.com Mon Apr 21 10:10:08 2008 From: gstreiff at NetEffect.com (Glenn Streiff) Date: Mon, 21 Apr 2008 12:10:08 -0500 Subject: [ofa-general] RE: [ewg] Agenda for the OFED meeting today In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC0795010E@venom2> Hi Tziporet. Apologies for missing the conference call. > Hi, > > This is the agenda for the OFED meeting today: > > Low level drivers: - each HW vendor should reply when the changes will be ready > nes > I think first week of May is likely for my 1.3.1 commits. > 1.2 Schedule: > > GA is planned for May-29 > I suggest to have only two release candidates: > - RC1 - May 6 > - RC2 - May 20 This looks workable to me if this is still the plan. 
Glenn > > Tziporet From olaf.kirch at oracle.com Mon Apr 21 10:18:40 2008 From: olaf.kirch at oracle.com (Olaf Kirch) Date: Mon, 21 Apr 2008 19:18:40 +0200 Subject: [ofa-general] Re: [ewg] Agenda for the OFED meeting today In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> Message-ID: <200804211918.40890.olaf.kirch@oracle.com> Hi Tziporet, On Monday 21 April 2008 17:06:07 Tziporet Koren wrote: > RDS fixes for RDMA API - already applied but not > clear if these are all the changes These patches fixed the critical bugs I knew of. So far, this is all that's ready to go in, but if anything else shows up by the end of the first week of May, I'll pipe up. Olaf -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
From terrywatson at live.com Mon Apr 21 04:09:49 2008 From: terrywatson at live.com (terry watson) Date: Mon, 21 Apr 2008 11:09:49 +0000 Subject: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM? In-Reply-To: <1208545938.26936.365.camel@hrosenstock-ws.xsigo.com> References: <48084F4E.3020705@cea.fr> <1208529471.26936.303.camel@hrosenstock-ws.xsigo.com> <1208545938.26936.365.camel@hrosenstock-ws.xsigo.com> Message-ID: The test system I am looking at uses an Ethernet interconnect for the MPI control channel (i.e. mpirun via ssh/tcp, etc.) and uses the InfiniBand interconnect for the actual MPI communication. The Ethernet interconnect is VLAN'ed between clusters A and B, and therefore mpirun via ssh cannot be used to send the 'out of band' MPI control commands. There are a couple of attack paths focused on the InfiniBand interconnect that I can see (with my limited IB / MPI knowledge) to attempt to demonstrate that the partitioning can be bypassed and data from another partition could be seen or nodes accessed. 1) Attempt to *directly* communicate with another node via MPI (uDAPL?), bypassing the need for mpirun/ssh. 2) Attempt to 'sniff' or dump packets or data from the local HCA that has had its partition membership changed, in an effort to capture data being seen by the HCA. I haven't seen any evidence this is possible via IB. I started getting hopeful that it would be straightforward, as changing partition membership seemed viable. However, things are starting to get a little more complicated :) On the assumption that partition membership can be changed successfully using ibis, I suppose I am simply trying to access another node on the same partition, without any IP access (IPoIB, or TCP/IP for MPI control communication).
Thanks, Dave> Subject: RE: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM?> From: hrosenstock at xsigo.com> To: terrywatson at live.com> CC: philippe.gregoire at cea.fr; general at lists.openfabrics.org> Date: Fri, 18 Apr 2008 12:12:18 -0700> > Terry,> > On Fri, 2008-04-18 at 15:25 +0000, terry watson wrote:> > Thanks Hal. I appreciate using the SM is the correct means of controlling partitioning; however, the testing I am performing is assessing security vulnerabilities. In this case, the two clusters are separated by partitioning only and I am seeking to assess the ability of a user to obtain unauthorised access to one cluster from the other. The requirement for the vendor building the two clusters was that they were isolated from each other. They have chosen to use one switch and I have to assess if this provides adequate isolation, as per the client's security requirements.> > > > At this stage of my investigation, I do not believe partitioning on a switch provides adequate separation / isolation to be used as a security control and two physical switches will need to be used to provide the complete isolation that is required. But my task is to prove this to justify the expense.... :) > > > > I value any comments or input on this topic.> > One pertinent thing here is whether a MKey manager is supported in the> SM, and if so, what level of MKeying is used. Sufficient MKey protection> with a sophisticated manager could make the updates of such PKey tables> difficult but not impossible. Currently, OpenSM does not support an MKey> manager but one is being proposed for the next OFED cycle. Currently,> OpenSM supports a static configured MKey and MKey lease period which> could make things marginally better if you are concerned with rogue> updates like this. Not sure about the third party (vendor) SMs in this> regard. 
Contact your vendor if this is of interest.> > -- Hal> > > ----------------------------------------> > > Subject: Re: ***SPAM*** RE: [ofa-general] Is IBIS only for querying OpenSM?> > > From: hrosenstock at xsigo.com> > > To: terrywatson at live.com> > > CC: philippe.gregoire at cea.fr; general at lists.openfabrics.org> > > Date: Fri, 18 Apr 2008 07:37:51 -0700> > > > > > Terry,> > > > > > On Fri, 2008-04-18 at 09:38 +0000, terry watson wrote:> > >> Thanks for the response. The environment I am testing has two clusters and one switch, > > >> with the subnet manager running from the switch. Half the nodes are in one partition and > > >> half in the other (ignoring 0xffff), call them partitions A and B. I have access to one > > >> node in partition A as root and would like to be able to reconfigure that node locally, > > >> and with no access to the switch subnet manager configuration, to be able to access nodes > > >> in partition B.> > > > > > In general, this is not a good idea IMO. As Philippe wrote, the SM (is> > > supposed to) own the writing of those tables (rather than some low level> > > diag utility). Even if you modify the local PKey table, it is possible> > > for the SM to overwrite this. Also, there are several other> > > ramifications of this depending on how the SM deals with partitions.> > > Even if you change things locally, that may not be sufficient as the> > > peer switch port may do partition filtering so that may need to change> > > that too and possible more PKey tables in the network depending on what> > > your SM does. 
Also, there are SA responses that depend on the SM having> > > correct knowledge (like PathRecords and others) so the end node may not> > > get any response on that partition for certain things.> > > > > >> After some reading I believe that IBIS from IBUtils should allow me to alter the > > >> local p_key table and therefore allow me to access nodes on partition B.> > > > > > Yes but it may take more than this for it to work depending on your SM.> > > > > >> I cannot test this until I am on-site and I am formulating a strategy before arrival. > > >> If it does not work this way it would be useful to know in advance. MPI is used rather than IPoIB. > > > > > > Some MPIs use out of band mechanisms to create connections so the SA> > > issues may not apply there; but I think the partition ones might and are> > > SM dependent so your mileage may vary...> > > > > >> If my approach is flawed I would appreciate it if someone could point this out.> > > > > > The proper way to do this is by reconfiguring your SM.> > > > > > -- Hal> > > > > >> ________________________________> > >>> Date: Fri, 18 Apr 2008 09:35:42 +0200> > >>> From: philippe.gregoire at cea.fr> > >>> To: terrywatson at live.com> > >>> CC: general at lists.openfabrics.org> > >>> Subject: Re: [ofa-general] Is IBIS only for querying OpenSM?> > >>> > > >>> terry watson wrote:> > >>> > > >>> Hi all,> > >>> > > >>> I will be performing some testing of partitioning used as a security control. Am I right in believing that IBIS will be able to set partition table values of the local compute node I am logged on to, even though they are not using OpenSM, but rather a SM on a switch?
Could I then attempt to access a partition that I was originally excluded from accessing?> > >>> > > >>> I am new to Infiniband technology and would also appreciate a response from an expert who has views on the strength of the security that partitioning provides in separating two clusters that should have no interaction whatsoever.> > >>> > > >>> Thanks,> > >>> Dave> > >>> _________________________________________________________________> > >>> Discover the new Windows Vista> > >>> http://search.msn.com/results.aspx?q=windows+vista&mkt=en-US&form=QBRE_______________________________________________> > >>> general mailing list> > >>> general at lists.openfabrics.org> > >> _________________________________________________________________> > >> News, entertainment and everything you care about at Live.com. Get it now!> > >> http://www.live.com/getstarted.aspx_______________________________________________> > >> general mailing list> > >> general at lists.openfabrics.org> > >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general> > >> > > >> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general> > > > > > > _________________________________________________________________> > Connect to the next generation of MSN Messenger > > http://imagine-msn.com/messenger/launch80/default.aspx?locale=en-us&source=wlmailtagline> _________________________________________________________________ Connect to the next generation of MSN Messenger  http://imagine-msn.com/messenger/launch80/default.aspx?locale=en-us&source=wlmailtagline -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olaf.kirch at oracle.com Mon Apr 21 12:26:55 2008 From: olaf.kirch at oracle.com (Olaf Kirch) Date: Mon, 21 Apr 2008 21:26:55 +0200 Subject: [ofa-general] Oddities with RDMA CM private data Message-ID: <200804212126.55898.olaf.kirch@oracle.com> I looked into the private_data chunk being exchanged during rdma_cm connection setup today, and there's something odd. I'm sending 8 bytes of data, but in the event handlers I get sizes such as 56, and 196. I haven't tracked it down, but my first suspicion would be that the code in cma.c adds its own private data, but forgets to decrement the data_len fields prior to calling the ULP event handler. Am I misunderstanding the semantics of private_data_len? Olaf -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From sean.hefty at intel.com Mon Apr 21 12:34:14 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Mon, 21 Apr 2008 12:34:14 -0700 Subject: [ofa-general] Oddities with RDMA CM private data In-Reply-To: <200804212126.55898.olaf.kirch@oracle.com> References: <200804212126.55898.olaf.kirch@oracle.com> Message-ID: <001c01c8a3e6$ae92f010$9b37170a@amr.corp.intel.com> >I looked into the private_data chunk being exchanged during rdma_cm >connection setup today, and there's something odd. I'm sending 8 bytes >of data, but in the event handlers I get sizes such as 56, and 196. >I haven't tracked it down, but my first suspicion would be that the >code in cma.c adds its own private data, but forgets to decrement >the data_len fields prior to calling the ULP event handler. > >Am I misunderstanding the semantics of private_data_len? On the receive side of the rdma_cm, the length of the private data sent by the user is unknown. All that's known is the size of the data that was received. For IB, this includes padded space to make the underlying CM MAD 256 bytes long. 
From the rdma_get_cm_event man page: private_data_len The size of the private data buffer. Users should note that the size of the private data buffer may be larger than the amount of private data sent by the remote side. Any additional space in the buffer will be zeroed out. Basically, there isn't a data_len field that's carried in the connection message. Adding one would have required consuming some of the private data to carry it. - Sean From rdreier at cisco.com Mon Apr 21 14:53:51 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 21 Apr 2008 14:53:51 -0700 Subject: [ofa-general] Problem with libibverbs and huge pages registration. In-Reply-To: <20080421141441.GF7771@minantech.com> (Gleb Natapov's message of "Mon, 21 Apr 2008 17:14:41 +0300") References: <20080421141441.GF7771@minantech.com> Message-ID: > ibv_reg_mr() fails if I try to register a memory region backed by a > huge page, but is not aligned to huge page boundary. Digging deeper I > see that libibverbs aligns memory region to a regular page size and > calls madvise() and the call fails. See program below to reproduce.
> The program assumes that hugetlbfs is mounted on /huge and there is at > least one huge page available. I am not sure it is possible to know whether a > memory buffer is backed by a huge page, to solve the problem. Hmm, not sure off the top of my head how we should deal with this. > Another issue with libibverbs is that after the first ibv_reg_mr() fails, the > second registration attempt of the same buffer succeeds, since > ibv_madvise_range() doesn't clean up after the madvise failure and thinks > that memory is already "madvised". I guess we shouldn't change the refcnt until after we know if madvise has succeeded or not. Does the patch below help? I'm not sure if this is a good enough fix -- we might have split up a node and want to remerge it if the madvise fails... rolling back is a little tricky... I think this will take a little more thought. - R.

--- a/src/memory.c
+++ b/src/memory.c
@@ -506,8 +506,6 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
 			__mm_add(tmp);
 		}

-		node->refcnt += inc;
-
 		if ((inc == -1 && node->refcnt == 0) ||
 		    (inc == 1 && node->refcnt == 1)) {
 			/*
@@ -532,6 +530,8 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
 				goto out;
 			}

+		node->refcnt += inc;
+
 		node = __mm_next(node);
 	}

From ralph.campbell at qlogic.com Mon Apr 21 15:30:03 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Mon, 21 Apr 2008 15:30:03 -0700 Subject: [ofa-general] Re: ofed_kernel git tree for OFED-1.4 (based on 2.6.25-rc7) In-Reply-To: <4805F692.1040101@dev.mellanox.co.il> References: <4805F692.1040101@dev.mellanox.co.il> Message-ID: <1208817003.2232.16.camel@brick.pathscale.com> I have been busier than I thought. I guess the best thing to do is delete the ipath fixes and backport patches for now and then when you pull from 2.6.26, we can create new backport patches and fixes.
On Wed, 2008-04-16 at 15:52 +0300, Vladimir Sokolovsky wrote: > Hi Ralph, > I prepared ofed_kernel git tree: git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel. > This tree merged with 2.6.25-rc7. > Currently ofed_scripts/ofed_makedist.sh fails on ipath_0180_header_file_changes_to_support_IBA7220.patch: > > > ./ofed_scripts/ofed_makedist.sh > > git clone -q -s -n /local/scm/ofed-1.4/linux-2.6 /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 > Initialized empty Git repository in /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11/.git/ > pushd /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 > /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11 /local/scm/ofed-1.4/linux-2.6 /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_checkout.sh 3bb85a2f1c15d1e58cd8b0b2da0577a3ab98977a > cdbdfc5cc29c4add1a2d6967b137a3347112a199 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.11 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > Failed executing /local/scm/ofed-1.4/linux-2.6/ofed_scripts/ofed_patch.sh --with-backport=2.6.11 >> /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > Hunk #7 FAILED at 565. > Hunk #8 succeeded at 582 (offset 1 line). > Hunk #9 succeeded at 595 (offset 1 line). > Hunk #10 FAILED at 613. > Hunk #11 succeeded at 719 (offset 2 lines). > Hunk #12 FAILED at 857. > 3 out of 12 hunks FAILED -- rejects in file drivers/infiniband/hw/ipath/ipath_verbs.h > Patch ipath_0180_header_file_changes_to_support_IBA7220.patch does not apply (enforce with -f) > > Failed executing /usr/bin/quiltBuild failed in /tmp/build-ofed_kernel-d23175 See log file /tmp/build-ofed_kernel-d23175/ofed_kernel-2.6.11.log > > Should ipath patches be removed from the git tree (kernel_patches/fixes/ipath*)? 
> > Regards, > Vladimir > > From sfr at canb.auug.org.au Mon Apr 21 17:24:24 2008 From: sfr at canb.auug.org.au (Stephen Rothwell) Date: Tue, 22 Apr 2008 10:24:24 +1000 Subject: [ofa-general] [PATCH] infiniband: class_device fallout Message-ID: <20080422102424.51f94b85.sfr@canb.auug.org.au> Signed-off-by: Stephen Rothwell --- drivers/infiniband/hw/ipath/ipath_verbs.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) This patch has been needed in linux-next since April 4 to fix an interaction between the driver-core patches and the infiniband tree. All the parties knew this was necessary. Today, Linus' tree has this build bug. *exasperated sigh*

diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 466f3fb..6ac0c5c 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -2067,7 +2067,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd)
 	dev->phys_port_cnt = 1;
 	dev->num_comp_vectors = 1;
 	dev->dma_device = &dd->pcidev->dev;
-	dev->class_dev.dev = dev->dma_device;
+	dev->dev.parent = dev->dma_device;
 	dev->query_device = ipath_query_device;
 	dev->modify_device = ipath_modify_device;
 	dev->query_port = ipath_query_port;
--
1.5.4.5

-- Cheers, Stephen Rothwell sfr at canb.auug.org.au http://www.canb.auug.org.au/~sfr/ From rdreier at cisco.com Mon Apr 21 18:26:03 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 21 Apr 2008 18:26:03 -0700 Subject: [ofa-general] [PATCH] infiniband: class_device fallout In-Reply-To: <20080422102424.51f94b85.sfr@canb.auug.org.au> (Stephen Rothwell's message of "Tue, 22 Apr 2008 10:24:24 +1000") References: <20080422102424.51f94b85.sfr@canb.auug.org.au> Message-ID: > This patch has been needed in linux-next since April 4 to fix an > interaction between the driver-core patches and the infiniband tree. All > the parties knew this was necessary. Today, Linus' tree has this build > bug. > > *exasperated sigh* Really sorry...
I must have missed this when it went by, since I was actually unaware of the problem until Greg posted his patches for merging yesterday. But I tried to get this fixed before the patch was merged: http://lkml.org/lkml/2008/4/20/153 Anyway I'll ask Linus to pull my tree with the fix... - R. From rdreier at cisco.com Mon Apr 21 18:26:00 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 21 Apr 2008 18:26:00 -0700 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get a few fixes for various things, including one build fix for the ipath driver: Paul Bolle (1): IB/ipath: Fix module parameter description for disable_sma Roland Dreier (6): RDMA/nes: Remove unneeded function declarations IB/ipath: Remove reference to dev->class_dev IB/ipath: Build IBA7220 code unconditionally IB/ipath: Remove dependency on PCI_MSI || HT_IRQ IB/ipath: Remove tests of PCI_MSI in ipath_iba7220.c IB/ipath: Correct capitalization "IntX" -> "INTx" drivers/infiniband/hw/ipath/Kconfig | 2 +- drivers/infiniband/hw/ipath/Makefile | 6 ++++-- drivers/infiniband/hw/ipath/ipath_driver.c | 2 +- drivers/infiniband/hw/ipath/ipath_iba7220.c | 23 +++++++++-------------- drivers/infiniband/hw/ipath/ipath_verbs.c | 3 +-- drivers/infiniband/hw/nes/nes.c | 6 ------ drivers/infiniband/hw/nes/nes_nic.c | 9 --------- 7 files changed, 16 insertions(+), 35 deletions(-)
From swise at opengridcomputing.com Mon Apr 21 19:42:16 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Mon, 21 Apr 2008 21:42:16 -0500 Subject: [ofa-general] Re: [ewg] Agenda for the OFED meeting today In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> Message-ID: <480D5088.1020005@opengridcomputing.com> Hey Tziporet, Sorry I missed today's call. If possible, I'd like a few weeks to get the cxgb3 fixes tested and ready to go. That puts me around mid may. I'll try and pull that in to make a RC1 of May 6, but I'm thinking I might need another week or so. Steve. Tziporet Koren wrote: > Hi, > > This is the agenda for the OFED meeting today: > 1. OFED 1.3.1: > > 1.1 Planned changes: > > ULPs changes: > > IB-bonding - done > SRP failover - on work > SDP crashes - on work > RDS fixes for RDMA API - already applied but not clear > if these are all the changes > librdmacm 1.0.7 - done > Open MPI 1.2.6 - done > > Low level drivers: - each HW vendor should reply when the > changes will be ready > > nes > mlx4 > cxgb3 > Ipath > ehca > > 1.2 Schedule: > > GA is planned for May-29 > I suggest to have only two release candidates: > - RC1 - May 6 > - RC2 - May 20 > > Note: daily builds of 1.3.1 are already available at: > _http://www.openfabrics.org/builds/ofed-1.3.1_ > > > 2. OFED 1.4: > > Release features were presented at Sonoma (presentation available > at _http://www.openfabrics.org/archives/april2008sonoma.htm_) > > Kernel tree is under work at: > git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel > Now failing on ipath drivers - waiting for an update. > > We should try to get the kernel code to compile as soon as > possible so everybody will be able to contribute code > > 3.
Follow up from Sonoma - open discussion > > > Tziporet > > > ------------------------------------------------------------------------ > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From rusty at rustcorp.com.au Mon Apr 21 22:06:24 2008 From: rusty at rustcorp.com.au (Rusty Russell) Date: Tue, 22 Apr 2008 15:06:24 +1000 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: References: Message-ID: <200804221506.26226.rusty@rustcorp.com.au> On Wednesday 09 April 2008 01:44:04 Andrea Arcangeli wrote: > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1050,6 +1050,15 @@ > unsigned long addr, unsigned long len, > unsigned long flags, struct page **pages); > > +struct mm_lock_data { > + spinlock_t **i_mmap_locks; > + spinlock_t **anon_vma_locks; > + unsigned long nr_i_mmap_locks; > + unsigned long nr_anon_vma_locks; > +}; > +extern struct mm_lock_data *mm_lock(struct mm_struct * mm); > +extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data); As far as I can tell you don't actually need to expose this struct at all? > + data->i_mmap_locks = vmalloc(nr_i_mmap_locks * > + sizeof(spinlock_t)); This is why non-typesafe allocators suck. You want 'sizeof(spinlock_t *)' here. > + data->anon_vma_locks = vmalloc(nr_anon_vma_locks * > + sizeof(spinlock_t)); and here. > + err = -EINTR; > + i_mmap_lock_last = NULL; > + nr_i_mmap_locks = 0; > + for (;;) { > + spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; > + for (vma = mm->mmap; vma; vma = vma->vm_next) { ... > + data->i_mmap_locks[nr_i_mmap_locks++] = i_mmap_lock; > + } > + data->nr_i_mmap_locks = nr_i_mmap_locks; How about you track your running counter in data->nr_i_mmap_locks, leave nr_i_mmap_locks alone, and BUG_ON(data->nr_i_mmap_locks != nr_i_mmap_locks)? 
Even nicer would be to wrap this in a "get_sorted_mmap_locks()" function. Similarly for anon_vma locks. Unfortunately, I just don't think we can fail locking like this. In your next patch unregistering a notifier can fail because of it: that's not usable. I think it means you need to add a linked list element to the vma for the CONFIG_MMU_NOTIFIER case. Or track the max number of vmas for any mm, and keep a pool to handle mm_lock for this number (ie. if you can't enlarge the pool, fail the vma allocation). Both have their problems though... Rusty. From mashirle at us.ibm.com Mon Apr 21 14:19:23 2008 From: mashirle at us.ibm.com (Shirley Ma) Date: Mon, 21 Apr 2008 14:19:23 -0700 Subject: [ofa-general] arp or ip patch to build a neigh permanent entry for IPoIB Message-ID: <1208812763.22166.4.camel@localhost.localdomain> Hello, I am debugging an ipoib ping problem on a cluster. The arp and ip commands don't support using a 20-byte HW address to build a permanent entry manually. Can someone give me a pointer to the patch, if one exists? Thanks in advance! Shirley From olaf.kirch at oracle.com Mon Apr 21 23:03:12 2008 From: olaf.kirch at oracle.com (Olaf Kirch) Date: Tue, 22 Apr 2008 08:03:12 +0200 Subject: [ofa-general] Oddities with RDMA CM private data In-Reply-To: <001c01c8a3e6$ae92f010$9b37170a@amr.corp.intel.com> References: <200804212126.55898.olaf.kirch@oracle.com> <001c01c8a3e6$ae92f010$9b37170a@amr.corp.intel.com> Message-ID: <200804220803.13101.olaf.kirch@oracle.com> On Monday 21 April 2008 21:34:14 Sean Hefty wrote: > On the receive side of the rdma_cm, the length of the private data sent by the > user is unknown. All that's known is the size of the data that was received. > For IB, this includes padded space to make the underlying CM MAD 256 bytes long. > From the rdma_get_cm_event man page: Ah, thanks a lot for clarifying this!
Regards, Olaf -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From yevgenyp at mellanox.co.il Mon Apr 21 23:32:00 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 09:32:00 +0300 Subject: [ofa-general][PATCH] mlx4: Moving db management to mlx4_core (MP support, Patch 1) Message-ID: <480D8660.3060001@mellanox.co.il> >From d0d0ac877ab47f3a8a5f1564e5c48f53245583b9 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 10:10:01 +0300 Subject: [PATCH] mlx4: Moving db management to mlx4_core mlx4_ib is no longer the only customer of mlx4_core. Thus the doorbell allocation was moved to the low level driver (same as buffer allocation). Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/cq.c | 6 +- drivers/infiniband/hw/mlx4/doorbell.c | 131 +-------------------------------- drivers/infiniband/hw/mlx4/main.c | 3 - drivers/infiniband/hw/mlx4/mlx4_ib.h | 33 +------- drivers/infiniband/hw/mlx4/qp.c | 6 +- drivers/infiniband/hw/mlx4/srq.c | 6 +- drivers/net/mlx4/alloc.c | 111 ++++++++++++++++++++++++++++ drivers/net/mlx4/main.c | 3 + drivers/net/mlx4/mlx4.h | 3 + include/linux/mlx4/device.h | 41 ++++++++++ 10 files changed, 175 insertions(+), 168 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 3557e7e..5e570bb 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -204,7 +204,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector uar = &to_mucontext(context)->uar; } else { - err = mlx4_ib_db_alloc(dev, &cq->db, 1); + err = mlx4_db_alloc(dev->dev, &cq->db, 1); if (err) goto err_cq; @@ -250,7 +250,7 @@ err_mtt: err_db: if (!context) - mlx4_ib_db_free(dev, &cq->db); + mlx4_db_free(dev->dev, &cq->db); err_cq: kfree(cq); @@ -435,7 +435,7 @@ int mlx4_ib_destroy_cq(struct ib_cq *cq) ib_umem_release(mcq->umem); } else { 
mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe + 1); - mlx4_ib_db_free(dev, &mcq->db); + mlx4_db_free(dev->dev, &mcq->db); } kfree(mcq); diff --git a/drivers/infiniband/hw/mlx4/doorbell.c b/drivers/infiniband/hw/mlx4/doorbell.c index 1c36087..d17b36b 100644 --- a/drivers/infiniband/hw/mlx4/doorbell.c +++ b/drivers/infiniband/hw/mlx4/doorbell.c @@ -34,135 +34,10 @@ #include "mlx4_ib.h" -struct mlx4_ib_db_pgdir { - struct list_head list; - DECLARE_BITMAP(order0, MLX4_IB_DB_PER_PAGE); - DECLARE_BITMAP(order1, MLX4_IB_DB_PER_PAGE / 2); - unsigned long *bits[2]; - __be32 *db_page; - dma_addr_t db_dma; -}; - -static struct mlx4_ib_db_pgdir *mlx4_ib_alloc_db_pgdir(struct mlx4_ib_dev *dev) -{ - struct mlx4_ib_db_pgdir *pgdir; - - pgdir = kzalloc(sizeof *pgdir, GFP_KERNEL); - if (!pgdir) - return NULL; - - bitmap_fill(pgdir->order1, MLX4_IB_DB_PER_PAGE / 2); - pgdir->bits[0] = pgdir->order0; - pgdir->bits[1] = pgdir->order1; - pgdir->db_page = dma_alloc_coherent(dev->ib_dev.dma_device, - PAGE_SIZE, &pgdir->db_dma, - GFP_KERNEL); - if (!pgdir->db_page) { - kfree(pgdir); - return NULL; - } - - return pgdir; -} - -static int mlx4_ib_alloc_db_from_pgdir(struct mlx4_ib_db_pgdir *pgdir, - struct mlx4_ib_db *db, int order) -{ - int o; - int i; - - for (o = order; o <= 1; ++o) { - i = find_first_bit(pgdir->bits[o], MLX4_IB_DB_PER_PAGE >> o); - if (i < MLX4_IB_DB_PER_PAGE >> o) - goto found; - } - - return -ENOMEM; - -found: - clear_bit(i, pgdir->bits[o]); - - i <<= o; - - if (o > order) - set_bit(i ^ 1, pgdir->bits[order]); - - db->u.pgdir = pgdir; - db->index = i; - db->db = pgdir->db_page + db->index; - db->dma = pgdir->db_dma + db->index * 4; - db->order = order; - - return 0; -} - -int mlx4_ib_db_alloc(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db, int order) -{ - struct mlx4_ib_db_pgdir *pgdir; - int ret = 0; - - mutex_lock(&dev->pgdir_mutex); - - list_for_each_entry(pgdir, &dev->pgdir_list, list) - if (!mlx4_ib_alloc_db_from_pgdir(pgdir, db, order)) - goto out; - - pgdir = 
mlx4_ib_alloc_db_pgdir(dev); - if (!pgdir) { - ret = -ENOMEM; - goto out; - } - - list_add(&pgdir->list, &dev->pgdir_list); - - /* This should never fail -- we just allocated an empty page: */ - WARN_ON(mlx4_ib_alloc_db_from_pgdir(pgdir, db, order)); - -out: - mutex_unlock(&dev->pgdir_mutex); - - return ret; -} - -void mlx4_ib_db_free(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db) -{ - int o; - int i; - - mutex_lock(&dev->pgdir_mutex); - - o = db->order; - i = db->index; - - if (db->order == 0 && test_bit(i ^ 1, db->u.pgdir->order0)) { - clear_bit(i ^ 1, db->u.pgdir->order0); - ++o; - } - - i >>= o; - set_bit(i, db->u.pgdir->bits[o]); - - if (bitmap_full(db->u.pgdir->order1, MLX4_IB_DB_PER_PAGE / 2)) { - dma_free_coherent(dev->ib_dev.dma_device, PAGE_SIZE, - db->u.pgdir->db_page, db->u.pgdir->db_dma); - list_del(&db->u.pgdir->list); - kfree(db->u.pgdir); - } - - mutex_unlock(&dev->pgdir_mutex); -} - -struct mlx4_ib_user_db_page { - struct list_head list; - struct ib_umem *umem; - unsigned long user_virt; - int refcnt; -}; - int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, unsigned long virt, - struct mlx4_ib_db *db) + struct mlx4_db *db) { - struct mlx4_ib_user_db_page *page; + struct mlx4_user_db_page *page; struct ib_umem_chunk *chunk; int err = 0; @@ -202,7 +77,7 @@ out: return err; } -void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_ib_db *db) +void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_db *db) { mutex_lock(&context->db_page_mutex); diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 136c76c..3c7f938 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -548,9 +548,6 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) goto err_uar; MLX4_INIT_DOORBELL_LOCK(&ibdev->uar_lock); - INIT_LIST_HEAD(&ibdev->pgdir_list); - mutex_init(&ibdev->pgdir_mutex); - ibdev->dev = dev; strlcpy(ibdev->ib_dev.name, "mlx4_%d", IB_DEVICE_NAME_MAX); diff 
--git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 9e63732..5cf9947 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -43,24 +43,6 @@ #include #include -enum { - MLX4_IB_DB_PER_PAGE = PAGE_SIZE / 4 -}; - -struct mlx4_ib_db_pgdir; -struct mlx4_ib_user_db_page; - -struct mlx4_ib_db { - __be32 *db; - union { - struct mlx4_ib_db_pgdir *pgdir; - struct mlx4_ib_user_db_page *user_page; - } u; - dma_addr_t dma; - int index; - int order; -}; - struct mlx4_ib_ucontext { struct ib_ucontext ibucontext; struct mlx4_uar uar; @@ -88,7 +70,7 @@ struct mlx4_ib_cq { struct mlx4_cq mcq; struct mlx4_ib_cq_buf buf; struct mlx4_ib_cq_resize *resize_buf; - struct mlx4_ib_db db; + struct mlx4_db db; spinlock_t lock; struct mutex resize_mutex; struct ib_umem *umem; @@ -127,7 +109,7 @@ struct mlx4_ib_qp { struct mlx4_qp mqp; struct mlx4_buf buf; - struct mlx4_ib_db db; + struct mlx4_db db; struct mlx4_ib_wq rq; u32 doorbell_qpn; @@ -154,7 +136,7 @@ struct mlx4_ib_srq { struct ib_srq ibsrq; struct mlx4_srq msrq; struct mlx4_buf buf; - struct mlx4_ib_db db; + struct mlx4_db db; u64 *wrid; spinlock_t lock; int head; @@ -175,9 +157,6 @@ struct mlx4_ib_dev { struct mlx4_dev *dev; void __iomem *uar_map; - struct list_head pgdir_list; - struct mutex pgdir_mutex; - struct mlx4_uar priv_uar; u32 priv_pdn; MLX4_DECLARE_DOORBELL_LOCK(uar_lock); @@ -248,11 +227,9 @@ static inline struct mlx4_ib_ah *to_mah(struct ib_ah *ibah) return container_of(ibah, struct mlx4_ib_ah, ibah); } -int mlx4_ib_db_alloc(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db, int order); -void mlx4_ib_db_free(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db); int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, unsigned long virt, - struct mlx4_ib_db *db); -void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_ib_db *db); + struct mlx4_db *db); +void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_db *db); 
struct ib_mr *mlx4_ib_get_dma_mr(struct ib_pd *pd, int acc); int mlx4_ib_umem_write_mtt(struct mlx4_ib_dev *dev, struct mlx4_mtt *mtt, diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index b75efae..80ea8b9 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -514,7 +514,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, goto err; if (!init_attr->srq) { - err = mlx4_ib_db_alloc(dev, &qp->db, 0); + err = mlx4_db_alloc(dev->dev, &qp->db, 0); if (err) goto err; @@ -580,7 +580,7 @@ err_buf: err_db: if (!pd->uobject && !init_attr->srq) - mlx4_ib_db_free(dev, &qp->db); + mlx4_db_free(dev->dev, &qp->db); err: return err; @@ -666,7 +666,7 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, kfree(qp->rq.wrid); mlx4_buf_free(dev->dev, qp->buf_size, &qp->buf); if (!qp->ibqp.srq) - mlx4_ib_db_free(dev, &qp->db); + mlx4_db_free(dev->dev, &qp->db); } } diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c index beaa3b0..2046197 100644 --- a/drivers/infiniband/hw/mlx4/srq.c +++ b/drivers/infiniband/hw/mlx4/srq.c @@ -129,7 +129,7 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd, if (err) goto err_mtt; } else { - err = mlx4_ib_db_alloc(dev, &srq->db, 0); + err = mlx4_db_alloc(dev->dev, &srq->db, 0); if (err) goto err_srq; @@ -200,7 +200,7 @@ err_buf: err_db: if (!pd->uobject) - mlx4_ib_db_free(dev, &srq->db); + mlx4_db_free(dev->dev, &srq->db); err_srq: kfree(srq); @@ -267,7 +267,7 @@ int mlx4_ib_destroy_srq(struct ib_srq *srq) kfree(msrq->wrid); mlx4_buf_free(dev->dev, msrq->msrq.max << msrq->msrq.wqe_shift, &msrq->buf); - mlx4_ib_db_free(dev, &msrq->db); + mlx4_db_free(dev->dev, &msrq->db); } kfree(msrq); diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 75ef9d0..43c6d04 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -196,3 +196,114 @@ void mlx4_buf_free(struct mlx4_dev *dev, int size, 
struct mlx4_buf *buf) } } EXPORT_SYMBOL_GPL(mlx4_buf_free); + +static struct mlx4_db_pgdir *mlx4_alloc_db_pgdir(struct device *dma_device) +{ + struct mlx4_db_pgdir *pgdir; + + pgdir = kzalloc(sizeof *pgdir, GFP_KERNEL); + if (!pgdir) + return NULL; + + bitmap_fill(pgdir->order1, MLX4_DB_PER_PAGE / 2); + pgdir->bits[0] = pgdir->order0; + pgdir->bits[1] = pgdir->order1; + pgdir->db_page = dma_alloc_coherent(dma_device, PAGE_SIZE, + &pgdir->db_dma, GFP_KERNEL); + if (!pgdir->db_page) { + kfree(pgdir); + return NULL; + } + + return pgdir; +} + +static int mlx4_alloc_db_from_pgdir(struct mlx4_db_pgdir *pgdir, + struct mlx4_db *db, int order) +{ + int o; + int i; + + for (o = order; o <= 1; ++o) { + i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); + if (i < MLX4_DB_PER_PAGE >> o) + goto found; + } + + return -ENOMEM; + +found: + clear_bit(i, pgdir->bits[o]); + + i <<= o; + + if (o > order) + set_bit(i ^ 1, pgdir->bits[order]); + + db->u.pgdir = pgdir; + db->index = i; + db->db = pgdir->db_page + db->index; + db->dma = pgdir->db_dma + db->index * 4; + db->order = order; + + return 0; +} + +int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_db_pgdir *pgdir; + int ret = 0; + + mutex_lock(&priv->pgdir_mutex); + + list_for_each_entry(pgdir, &priv->pgdir_list, list) + if (!mlx4_alloc_db_from_pgdir(pgdir, db, order)) + goto out; + + pgdir = mlx4_alloc_db_pgdir(&(dev->pdev->dev)); + if (!pgdir) { + ret = -ENOMEM; + goto out; + } + + list_add(&pgdir->list, &priv->pgdir_list); + + /* This should never fail -- we just allocated an empty page: */ + WARN_ON(mlx4_alloc_db_from_pgdir(pgdir, db, order)); + +out: + mutex_unlock(&priv->pgdir_mutex); + + return ret; +} +EXPORT_SYMBOL_GPL(mlx4_db_alloc); + +void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + int o; + int i; + + mutex_lock(&priv->pgdir_mutex); + + o = db->order; + i = db->index; + 
+ if (db->order == 0 && test_bit(i ^ 1, db->u.pgdir->order0)) { + clear_bit(i ^ 1, db->u.pgdir->order0); + ++o; + } + i >>= o; + set_bit(i, db->u.pgdir->bits[o]); + + if (bitmap_full(db->u.pgdir->order1, MLX4_DB_PER_PAGE / 2)) { + dma_free_coherent(&(dev->pdev->dev), PAGE_SIZE, + db->u.pgdir->db_page, db->u.pgdir->db_dma); + list_del(&db->u.pgdir->list); + kfree(db->u.pgdir); + } + + mutex_unlock(&priv->pgdir_mutex); +} +EXPORT_SYMBOL_GPL(mlx4_db_free); diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 49a4aca..a6aa49f 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -798,6 +798,9 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) INIT_LIST_HEAD(&priv->ctx_list); spin_lock_init(&priv->ctx_lock); + INIT_LIST_HEAD(&priv->pgdir_list); + mutex_init(&priv->pgdir_mutex); + /* * Now reset the HCA before we touch the PCI capabilities or * attempt a firmware command, since a boot ROM may have left diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 7333681..a4023c2 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -257,6 +257,9 @@ struct mlx4_priv { struct list_head ctx_list; spinlock_t ctx_lock; + struct list_head pgdir_list; + struct mutex pgdir_mutex; + struct mlx4_fw fw; struct mlx4_cmd cmd; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index ff7df1a..9c87dd3 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -37,6 +37,8 @@ #include #include +#include + #include enum { @@ -208,6 +210,37 @@ struct mlx4_mtt { int page_shift; }; +enum { + MLX4_DB_PER_PAGE = PAGE_SIZE / 4 +}; + +struct mlx4_db_pgdir { + struct list_head list; + DECLARE_BITMAP(order0, MLX4_DB_PER_PAGE); + DECLARE_BITMAP(order1, MLX4_DB_PER_PAGE / 2); + unsigned long *bits[2]; + __be32 *db_page; + dma_addr_t db_dma; +}; + +struct mlx4_user_db_page { + struct list_head list; + struct ib_umem *umem; + unsigned long user_virt; + int refcnt; +}; + 
+struct mlx4_db { + __be32 *db; + union { + struct mlx4_db_pgdir *pgdir; + struct mlx4_user_db_page *user_page; + } u; + dma_addr_t dma; + int index; + int order; +}; + struct mlx4_mr { struct mlx4_mtt mtt; u64 iova; @@ -341,6 +374,9 @@ int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, struct mlx4_buf *buf); +int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order); +void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -- 1.5.4 From yevgenyp at mellanox.co.il Mon Apr 21 23:33:57 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 09:33:57 +0300 Subject: [ofa-general][PATCH] mlx4_core: HW queues resource management (MP support, Patch 2) Message-ID: <480D86D5.30504@mellanox.co.il> >From 3b15a6bba9cb79805198f64985433a33a3a096dc Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 11:06:41 +0300 Subject: [PATCH] mlx4_core: HW queues resource management Added HW queues management API. Wraps buffer and doorbell allocation and mtt write. 
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/alloc.c | 44 +++++++++++++++++++++++++++++++++++++++++++ include/linux/mlx4/device.h | 11 ++++++++++ 2 files changed, 55 insertions(+), 0 deletions(-) diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 43c6d04..f36d79e 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -307,3 +307,47 @@ void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db) mutex_unlock(&priv->pgdir_mutex); } EXPORT_SYMBOL_GPL(mlx4_db_free); + +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + int size, int max_direct) +{ + int err; + + err = mlx4_db_alloc(dev, &wqres->db, 1); + if (err) + return err; + *wqres->db.db = 0; + + if (mlx4_buf_alloc(dev, size, max_direct, &wqres->buf)) { + err = -ENOMEM; + goto err_db; + } + + err = mlx4_mtt_init(dev, wqres->buf.npages, wqres->buf.page_shift, + &wqres->mtt); + if (err) + goto err_buf; + err = mlx4_buf_write_mtt(dev, &wqres->mtt, &wqres->buf); + if (err) + goto err_mtt; + + return 0; + +err_mtt: + mlx4_mtt_cleanup(dev, &wqres->mtt); +err_buf: + mlx4_buf_free(dev, size, &wqres->buf); +err_db: + mlx4_db_free(dev, &wqres->db); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_alloc_hwq_res); + +void mlx4_free_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + int size) +{ + mlx4_mtt_cleanup(dev, &wqres->mtt); + mlx4_buf_free(dev, size, &wqres->buf); + mlx4_db_free(dev, &wqres->db); +} +EXPORT_SYMBOL_GPL(mlx4_free_hwq_res); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index d5fb774..0505732 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -241,6 +241,12 @@ struct mlx4_db { int order; }; +struct mlx4_hwq_resources { + struct mlx4_db db; + struct mlx4_mtt mtt; + struct mlx4_buf buf; +}; + struct mlx4_mr { struct mlx4_mtt mtt; u64 iova; @@ -377,6 +383,11 @@ int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_db_alloc(struct mlx4_dev *dev, struct 
mlx4_db *db, int order); void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db); +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + int size, int max_direct); +void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, + int size); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -- 1.5.4 From yevgenyp at mellanox.co.il Mon Apr 21 23:35:51 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 09:35:51 +0300 Subject: [ofa-general][PATCH] mlx4: Qp range reservation (MP support, Patch 3) Message-ID: <480D8747.1080108@mellanox.co.il> >From 3978a59af72fddb9b98156a7ecf9018b8bf5b076 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 13:26:14 +0300 Subject: [PATCH] mlx4: Qp range reservation Prior to allocating a qp, one needs to reserve an aligned range of qps. The change is made to enable allocation of consecutive qps.
Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/qp.c | 9 +++++ drivers/net/mlx4/alloc.c | 77 ++++++++++++++++++++++++++++++++++++++- drivers/net/mlx4/mlx4.h | 2 + drivers/net/mlx4/qp.c | 44 ++++++++++++++++------- include/linux/mlx4/device.h | 5 ++- 5 files changed, 122 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 80ea8b9..88aae1b 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -544,6 +544,11 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, } } + if (!sqpn) + err = mlx4_qp_reserve_range(dev->dev, 1, 1, &sqpn); + if (err) + goto err_wrid; + err = mlx4_qp_alloc(dev->dev, sqpn, &qp->mqp); if (err) goto err_wrid; @@ -654,6 +659,10 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, mlx4_ib_unlock_cqs(send_cq, recv_cq); mlx4_qp_free(dev->dev, &qp->mqp); + + if (!is_sqp(dev, qp)) + mlx4_qp_release_range(dev->dev, qp->mqp.qpn, 1); + mlx4_mtt_cleanup(dev->dev, &qp->mtt); if (is_user) { diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index f36d79e..4601506 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -73,7 +73,82 @@ void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj) spin_unlock(&bitmap->lock); } -int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved) +static unsigned long find_aligned_range(unsigned long *bitmap, + u32 start, u32 nbits, + int len, int align) +{ + unsigned long end, i; + +again: + start = ALIGN(start, align); + while ((start < nbits) && test_bit(start, bitmap)) + start += align; + if (start >= nbits) + return -1; + + end = start+len; + if (end > nbits) + return -1; + for (i = start+1; i < end; i++) { + if (test_bit(i, bitmap)) { + start = i+1; + goto again; + } + } + return start; +} + +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align) +{ + u32 obj, i; + + if (likely(cnt == 1 && 
align == 1)) + return mlx4_bitmap_alloc(bitmap); + + spin_lock(&bitmap->lock); + + obj = find_aligned_range(bitmap->table, bitmap->last, + bitmap->max, cnt, align); + if (obj >= bitmap->max) { + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + obj = find_aligned_range(bitmap->table, 0, + bitmap->max, + cnt, align); + } + + if (obj < bitmap->max) { + for (i = 0; i < cnt; i++) + set_bit(obj+i, bitmap->table); + if (obj == bitmap->last) { + bitmap->last = (obj + cnt); + if (bitmap->last >= bitmap->max) + bitmap->last = 0; + } + obj |= bitmap->top; + } else + obj = -1; + + spin_unlock(&bitmap->lock); + + return obj; +} + +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt) +{ + u32 i; + + obj &= bitmap->max - 1; + + spin_lock(&bitmap->lock); + for (i = 0; i < cnt; i++) + clear_bit(obj+i, bitmap->table); + bitmap->last = min(bitmap->last, obj); + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + spin_unlock(&bitmap->lock); +} + +int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved) { int i; diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index a4023c2..89d4ccc 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -287,6 +287,8 @@ static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev) u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap); void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj); +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align); +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt); int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved); void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c index fa24e65..dff8e66 100644 --- a/drivers/net/mlx4/qp.c +++ b/drivers/net/mlx4/qp.c @@ -147,19 +147,42 @@ int mlx4_qp_modify(struct mlx4_dev *dev, struct mlx4_mtt *mtt, } EXPORT_SYMBOL_GPL(mlx4_qp_modify); -int 
mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp) +int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + int qpn; + + qpn = mlx4_bitmap_alloc_range(&qp_table->bitmap, cnt, align); + if (qpn == -1) + return -ENOMEM; + + *base = qpn; + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_reserve_range); + +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + if (base_qpn < dev->caps.sqp_start + 8) + return; + + mlx4_bitmap_free_range(&qp_table->bitmap, base_qpn, cnt); +} +EXPORT_SYMBOL_GPL(mlx4_qp_release_range); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_qp_table *qp_table = &priv->qp_table; int err; - if (sqpn) - qp->qpn = sqpn; - else { - qp->qpn = mlx4_bitmap_alloc(&qp_table->bitmap); - if (qp->qpn == -1) - return -ENOMEM; - } + if (!qpn) + return -EINVAL; + + qp->qpn = qpn; err = mlx4_table_get(dev, &qp_table->qp_table, qp->qpn); if (err) @@ -208,9 +231,6 @@ err_put_qp: mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); err_out: - if (!sqpn) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); - return err; } EXPORT_SYMBOL_GPL(mlx4_qp_alloc); @@ -240,8 +260,6 @@ void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp) mlx4_table_put(dev, &qp_table->auxc_table, qp->qpn); mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); - if (qp->qpn >= dev->caps.sqp_start + 8) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); } EXPORT_SYMBOL_GPL(mlx4_qp_free); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 0505732..9c77bf3 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -392,7 +392,10 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct 
mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -int mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp); +int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp); void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp); int mlx4_srq_alloc(struct mlx4_dev *dev, u32 pdn, struct mlx4_mtt *mtt, -- 1.5.4 From yevgenyp at mellanox.co.il Mon Apr 21 23:38:59 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 09:38:59 +0300 Subject: [ofa-general][PATCH] mlx4: Prereserved Qp regions (MP support, Patch 4) Message-ID: <480D8803.1050404@mellanox.co.il> >From 2dd4f8abdedda736adca5818c98f7a67d339ba7e Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 14:39:27 +0300 Subject: [PATCH] mlx4: Prereserved Qp regions. We reserve Qp ranges to be used by other modules in case the ports come up as Ethernet ports. The qps are reserved at the end of the QP table. (This way we ensure that they are aligned to their size.) We need to consider these reserved ranges in bitmap creation: the effective_max parameter.
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/alloc.c | 38 ++++++++++++++++-------- drivers/net/mlx4/fw.c | 5 +++ drivers/net/mlx4/fw.h | 2 + drivers/net/mlx4/main.c | 65 +++++++++++++++++++++++++++++++++++++++---- drivers/net/mlx4/mlx4.h | 4 ++ drivers/net/mlx4/qp.c | 55 ++++++++++++++++++++++++++++++++++-- include/linux/mlx4/device.h | 19 ++++++++++++- 7 files changed, 165 insertions(+), 23 deletions(-) diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 4601506..4b6074d 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -44,15 +44,18 @@ u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap) spin_lock(&bitmap->lock); - obj = find_next_zero_bit(bitmap->table, bitmap->max, bitmap->last); - if (obj >= bitmap->max) { + obj = find_next_zero_bit(bitmap->table, bitmap->effective_max, + bitmap->last); + if (obj >= bitmap->effective_max) { bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; - obj = find_first_zero_bit(bitmap->table, bitmap->max); + obj = find_first_zero_bit(bitmap->table, bitmap->effective_max); } - if (obj < bitmap->max) { + if (obj < bitmap->effective_max) { set_bit(obj, bitmap->table); - bitmap->last = (obj + 1) & (bitmap->max - 1); + bitmap->last = (obj + 1); + if (bitmap->last == bitmap->effective_max) + bitmap->last = 0; obj |= bitmap->top; } else obj = -1; @@ -108,20 +111,20 @@ u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align) spin_lock(&bitmap->lock); obj = find_aligned_range(bitmap->table, bitmap->last, - bitmap->max, cnt, align); - if (obj >= bitmap->max) { + bitmap->effective_max, cnt, align); + if (obj >= bitmap->effective_max) { bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; obj = find_aligned_range(bitmap->table, 0, - bitmap->max, + bitmap->effective_max, cnt, align); } - if (obj < bitmap->max) { + if (obj < bitmap->effective_max) { for (i = 0; i < cnt; i++) set_bit(obj+i, bitmap->table); if (obj == bitmap->last) { bitmap->last = (obj + cnt); - if 
(bitmap->last >= bitmap->max) + if (bitmap->last >= bitmap->effective_max) bitmap->last = 0; } obj |= bitmap->top; @@ -147,8 +150,9 @@ void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt) spin_unlock(&bitmap->lock); } -int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, - u32 num, u32 mask, u32 reserved) +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max) { int i; @@ -160,6 +164,7 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, bitmap->top = 0; bitmap->max = num; bitmap->mask = mask; + bitmap->effective_max = effective_max; spin_lock_init(&bitmap->lock); bitmap->table = kzalloc(BITS_TO_LONGS(num) * sizeof (long), GFP_KERNEL); if (!bitmap->table) @@ -171,6 +176,13 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, return 0; } +int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved) +{ + return mlx4_bitmap_init_with_effective_max(bitmap, num, mask, + reserved, num); +} + void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap) { kfree(bitmap->table); diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index d82f275..b0ad0d1 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -325,6 +325,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) #define QUERY_PORT_MTU_OFFSET 0x01 #define QUERY_PORT_WIDTH_OFFSET 0x06 #define QUERY_PORT_MAX_GID_PKEY_OFFSET 0x07 +#define QUERY_PORT_MAX_MACVLAN_OFFSET 0x0a #define QUERY_PORT_MAX_VL_OFFSET 0x0b for (i = 1; i <= dev_cap->num_ports; ++i) { @@ -342,6 +343,10 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_pkeys[i] = 1 << (field & 0xf); MLX4_GET(field, outbox, QUERY_PORT_MAX_VL_OFFSET); dev_cap->max_vl[i] = field & 0xf; + MLX4_GET(field, outbox, QUERY_PORT_MAX_MACVLAN_OFFSET); + dev_cap->log_max_macs[i] = field & 0xf; + dev_cap->log_max_vlans[i] = field >> 4; + } } diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h 
index 306cb9b..a2e827c 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -97,6 +97,8 @@ struct mlx4_dev_cap { u32 reserved_lkey; u64 max_icm_sz; int max_gso_sz; + u8 log_max_macs[MLX4_MAX_PORTS + 1]; + u8 log_max_vlans[MLX4_MAX_PORTS + 1]; }; struct mlx4_adapter { diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index a6aa49f..f309532 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -85,6 +85,22 @@ static struct mlx4_profile default_profile = { .num_mtt = 1 << 20, }; +static int num_mac = 1; +module_param_named(num_mac, num_mac, int, 0444); +MODULE_PARM_DESC(num_mac, "Maximum number of MACs per ETH port " + "(1-127, default 1)"); + +static int num_vlan; +module_param_named(num_vlan, num_vlan, int, 0444); +MODULE_PARM_DESC(num_vlan, "Maximum number of VLANs per ETH port " + "(0-126, default 0)"); + +static int use_prio; +module_param_named(use_prio, use_prio, bool, 0444); +MODULE_PARM_DESC(use_prio, "Enable steering by VLAN priority on ETH ports " + "(0/1, default 0)"); + + static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) { int err; @@ -134,7 +150,6 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.max_rq_sg = dev_cap->max_rq_sg; dev->caps.max_wqes = dev_cap->max_qp_sz; dev->caps.max_qp_init_rdma = dev_cap->max_requester_per_qp; - dev->caps.reserved_qps = dev_cap->reserved_qps; dev->caps.max_srq_wqes = dev_cap->max_srq_sz; dev->caps.max_srq_sge = dev_cap->max_rq_sg - 1; dev->caps.reserved_srqs = dev_cap->reserved_srqs; @@ -161,6 +176,39 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.stat_rate_support = dev_cap->stat_rate_support; dev->caps.max_gso_sz = dev_cap->max_gso_sz; + dev->caps.log_num_macs = ilog2(roundup_pow_of_two(num_mac + 1)); + dev->caps.log_num_vlans = ilog2(roundup_pow_of_two(num_vlan + 2)); + dev->caps.log_num_prios = use_prio ? 
3: 0; + + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (dev->caps.log_num_macs > dev_cap->log_max_macs[i]) { + dev->caps.log_num_macs = dev_cap->log_max_macs[i]; + mlx4_warn(dev, "Requested number of MACs is too much " + "for port %d, reducing to %d.\n", + i, 1 << dev->caps.log_num_macs); + } + if (dev->caps.log_num_vlans > dev_cap->log_max_vlans[i]) { + dev->caps.log_num_vlans = dev_cap->log_max_vlans[i]; + mlx4_warn(dev, "Requested number of VLANs is too much " + "for port %d, reducing to %d.\n", + i, 1 << dev->caps.log_num_vlans); + } + } + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW] = dev_cap->reserved_qps; + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_ETH_ADDR] = + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_ADDR] = + (1 << dev->caps.log_num_macs)* + (1 << dev->caps.log_num_vlans)* + (1 << dev->caps.log_num_prios)* + dev->caps.num_ports; + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH] = MLX4_NUM_FEXCH; + + dev->caps.reserved_qps = dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW] + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_ETH_ADDR] + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH] + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH]; + return 0; } @@ -209,7 +257,8 @@ static int mlx4_init_cmpt_table(struct mlx4_dev *dev, u64 cmpt_base, ((u64) (MLX4_CMPT_TYPE_QP * cmpt_entry_sz) << MLX4_CMPT_SHIFT), cmpt_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) goto err; @@ -334,7 +383,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->qpc_base, dev_cap->qpc_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map QP context memory, aborting.\n"); goto err_unmap_dmpt; @@ -344,7 +394,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->auxc_base, dev_cap->aux_entry_sz, dev->caps.num_qps, - 
dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map AUXC context memory, aborting.\n"); goto err_unmap_qp; @@ -354,7 +405,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->altc_base, dev_cap->altc_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map ALTC context memory, aborting.\n"); goto err_unmap_auxc; @@ -364,7 +416,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->rdmarc_base, dev_cap->rdmarc_entry_sz << priv->qp_table.rdmarc_shift, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map RDMARC context memory, aborting\n"); goto err_unmap_altc; diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 89d4ccc..b74405a 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -111,6 +111,7 @@ struct mlx4_bitmap { u32 last; u32 top; u32 max; + u32 effective_max; u32 mask; spinlock_t lock; unsigned long *table; @@ -290,6 +291,9 @@ void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj); u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align); void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt); int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved); +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max); void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); int mlx4_reset(struct mlx4_dev *dev); diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c index dff8e66..2d5be15 100644 --- a/drivers/net/mlx4/qp.c +++ b/drivers/net/mlx4/qp.c @@ -273,6 +273,7 @@ int mlx4_init_qp_table(struct mlx4_dev *dev) { struct mlx4_qp_table *qp_table = 
&mlx4_priv(dev)->qp_table; int err; + int reserved_from_top = 0; spin_lock_init(&qp_table->lock); INIT_RADIX_TREE(&dev->qp_table_tree, GFP_ATOMIC); @@ -282,9 +283,43 @@ int mlx4_init_qp_table(struct mlx4_dev *dev) * block of special QPs must be aligned to a multiple of 8, so * round up. */ - dev->caps.sqp_start = ALIGN(dev->caps.reserved_qps, 8); - err = mlx4_bitmap_init(&qp_table->bitmap, dev->caps.num_qps, - (1 << 24) - 1, dev->caps.sqp_start + 8); + dev->caps.sqp_start = + ALIGN(dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], 8); + + { + int sort[MLX4_QP_REGION_COUNT]; + int i, j, tmp; + int last_base = dev->caps.num_qps; + + for (i = 1; i < MLX4_QP_REGION_COUNT; ++i) + sort[i] = i; + + for (i = MLX4_QP_REGION_COUNT; i > 0; --i) { + for (j = 2; j < i; ++j) { + if (dev->caps.reserved_qps_cnt[sort[j]] > + dev->caps.reserved_qps_cnt[sort[j - 1]]) { + tmp = sort[j]; + sort[j] = sort[j - 1]; + sort[j - 1] = tmp; + } + } + } + + for (i = 1; i < MLX4_QP_REGION_COUNT; ++i) { + last_base -= dev->caps.reserved_qps_cnt[sort[i]]; + dev->caps.reserved_qps_base[sort[i]] = last_base; + reserved_from_top += + dev->caps.reserved_qps_cnt[sort[i]]; + } + + } + + err = mlx4_bitmap_init_with_effective_max(&qp_table->bitmap, + dev->caps.num_qps, + (1 << 23) - 1, + dev->caps.sqp_start + 8, + dev->caps.num_qps - + reserved_from_top); if (err) return err; @@ -297,6 +332,20 @@ void mlx4_cleanup_qp_table(struct mlx4_dev *dev) mlx4_bitmap_cleanup(&mlx4_priv(dev)->qp_table.bitmap); } +int mlx4_qp_get_region(struct mlx4_dev *dev, + enum qp_region region, + int *base_qpn, int *cnt) +{ + if ((region < 0) || (region >= MLX4_QP_REGION_COUNT)) + return -EINVAL; + + *base_qpn = dev->caps.reserved_qps_base[region]; + *cnt = dev->caps.reserved_qps_cnt[region]; + + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_get_region); + int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, struct mlx4_qp_context *context) { diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 
9c77bf3..955eeca 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -135,6 +135,18 @@ enum { MLX4_STAT_RATE_OFFSET = 5 }; +enum qp_region { + MLX4_QP_REGION_FW = 0, + MLX4_QP_REGION_ETH_ADDR, + MLX4_QP_REGION_FC_ADDR, + MLX4_QP_REGION_FC_EXCH, + MLX4_QP_REGION_COUNT +}; + +enum { + MLX4_NUM_FEXCH = 64 * 1024, +}; + static inline u64 mlx4_fw_ver(u64 major, u64 minor, u64 subminor) { return (major << 32) | (minor << 16) | subminor; @@ -159,7 +171,6 @@ struct mlx4_caps { int max_rq_desc_sz; int max_qp_init_rdma; int max_qp_dest_rdma; - int reserved_qps; int sqp_start; int num_srqs; int max_srq_wqes; @@ -189,6 +200,12 @@ struct mlx4_caps { u16 stat_rate_support; u8 port_width_cap[MLX4_MAX_PORTS + 1]; int max_gso_sz; + int reserved_qps_cnt[MLX4_QP_REGION_COUNT]; + int reserved_qps; + int reserved_qps_base[MLX4_QP_REGION_COUNT]; + int log_num_macs; + int log_num_vlans; + int log_num_prios; }; struct mlx4_buf_list { -- 1.5.4 From yevgenyp at mellanox.co.il Mon Apr 21 23:49:26 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 09:49:26 +0300 Subject: [ofa-general][PATCH] mlx4: Different port type support (MP support, Patch 5) Message-ID: <480D8A76.10301@mellanox.co.il> >From 0d3da6ad682c4655cd909aefe5bc294c55f5f711 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 17:40:57 +0300 Subject: [PATCH] mlx4: Different port type support Multiprotocol supports different port types. The port types are delivered through module parameters, crossed with firmware capabilities. Each consumer of mlx4_core should query for supported port types; mlx4_ib can no longer assume that all physical ports belong to it.
Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/mad.c | 6 +- drivers/infiniband/hw/mlx4/main.c | 12 ++++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 + drivers/net/mlx4/fw.c | 4 ++ drivers/net/mlx4/fw.h | 1 + drivers/net/mlx4/main.c | 84 ++++++++++++++++++++++++++++++++++ include/linux/mlx4/device.h | 32 +++++++++++++ 7 files changed, 136 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c index 4c1e72f..d91ba56 100644 --- a/drivers/infiniband/hw/mlx4/mad.c +++ b/drivers/infiniband/hw/mlx4/mad.c @@ -297,7 +297,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev) int p, q; int ret; - for (p = 0; p < dev->dev->caps.num_ports; ++p) + for (p = 0; p < dev->num_ports; ++p) for (q = 0; q <= 1; ++q) { agent = ib_register_mad_agent(&dev->ib_dev, p + 1, q ? IB_QPT_GSI : IB_QPT_SMI, @@ -313,7 +313,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev) return 0; err: - for (p = 0; p < dev->dev->caps.num_ports; ++p) + for (p = 0; p < dev->num_ports; ++p) for (q = 0; q <= 1; ++q) if (dev->send_agent[p][q]) ib_unregister_mad_agent(dev->send_agent[p][q]); @@ -326,7 +326,7 @@ void mlx4_ib_mad_cleanup(struct mlx4_ib_dev *dev) struct ib_mad_agent *agent; int p, q; - for (p = 0; p < dev->dev->caps.num_ports; ++p) { + for (p = 0; p < dev->num_ports; ++p) { for (q = 0; q <= 1; ++q) { agent = dev->send_agent[p][q]; dev->send_agent[p][q] = NULL; diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 3c7f938..507dbe3 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -549,11 +549,15 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) MLX4_INIT_DOORBELL_LOCK(&ibdev->uar_lock); ibdev->dev = dev; + ibdev->ports_map = mlx4_get_ports_of_type(dev, MLX4_PORT_TYPE_IB); strlcpy(ibdev->ib_dev.name, "mlx4_%d", IB_DEVICE_NAME_MAX); ibdev->ib_dev.owner = THIS_MODULE; ibdev->ib_dev.node_type = RDMA_NODE_IB_CA; - ibdev->ib_dev.phys_port_cnt = dev->caps.num_ports; + 
ibdev->num_ports = 0; + mlx4_foreach_port(i, ibdev->ports_map) + ibdev->num_ports++; + ibdev->ib_dev.phys_port_cnt = ibdev->num_ports; ibdev->ib_dev.num_comp_vectors = 1; ibdev->ib_dev.dma_device = &dev->pdev->dev; @@ -667,7 +671,7 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void *ibdev_ptr) struct mlx4_ib_dev *ibdev = ibdev_ptr; int p; - for (p = 1; p <= dev->caps.num_ports; ++p) + for (p = 1; p <= ibdev->num_ports; ++p) mlx4_CLOSE_PORT(dev, p); mlx4_ib_mad_cleanup(ibdev); @@ -682,6 +686,10 @@ static void mlx4_ib_event(struct mlx4_dev *dev, void *ibdev_ptr, enum mlx4_dev_event event, int port) { struct ib_event ibev; + struct mlx4_ib_dev *ibdev = to_mdev((struct ib_device *) ibdev_ptr); + + if (port > ibdev->num_ports) + return; switch (event) { case MLX4_DEV_EVENT_PORT_UP: diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 5cf9947..9d4f7a7 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -155,6 +155,8 @@ struct mlx4_ib_ah { struct mlx4_ib_dev { struct ib_device ib_dev; struct mlx4_dev *dev; + u32 ports_map; + int num_ports; void __iomem *uar_map; struct mlx4_uar priv_uar; diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index b0ad0d1..e875b08 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -322,6 +322,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_pkeys[i] = 1 << (field & 0xf); } } else { +#define QUERY_PORT_SUPPORTED_TYPE_OFFSET 0x00 #define QUERY_PORT_MTU_OFFSET 0x01 #define QUERY_PORT_WIDTH_OFFSET 0x06 #define QUERY_PORT_MAX_GID_PKEY_OFFSET 0x07 @@ -334,6 +335,9 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) if (err) goto out; + MLX4_GET(field, outbox, + QUERY_PORT_SUPPORTED_TYPE_OFFSET); + dev_cap->supported_port_types[i] = field & 3; MLX4_GET(field, outbox, QUERY_PORT_MTU_OFFSET); dev_cap->max_mtu[i] = field & 0xf; MLX4_GET(field, outbox, 
QUERY_PORT_WIDTH_OFFSET); diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index a2e827c..50a6a7d 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -97,6 +97,7 @@ struct mlx4_dev_cap { u32 reserved_lkey; u64 max_icm_sz; int max_gso_sz; + u8 supported_port_types[MLX4_MAX_PORTS + 1]; u8 log_max_macs[MLX4_MAX_PORTS + 1]; u8 log_max_vlans[MLX4_MAX_PORTS + 1]; }; diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index f309532..1651d8e 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -100,11 +100,50 @@ module_param_named(use_prio, use_prio, bool, 0444); MODULE_PARM_DESC(use_prio, "Enable steering by VLAN priority on ETH ports " "(0/1, default 0)"); +static char *port_type_arr[MLX4_MAX_PORTS] = { [0 ... (MLX4_MAX_PORTS-1)] = "ib"}; +module_param_array_named(port_type, port_type_arr, charp, NULL, 0444); +MODULE_PARM_DESC(port_type, "Ports L2 type (ib/eth/auto, entry per port, " + "comma seperated, default ib for all)"); + +static int mlx4_check_port_params(struct mlx4_dev *dev, + enum mlx4_port_type *port_type) +{ + if (port_type[0] != port_type[1] && + !(dev->caps.flags & MLX4_DEV_CAP_FLAG_DPDP)) { + mlx4_err(dev, "Only same port types supported " + "on this HCA, aborting.\n"); + return -EINVAL; + } + if ((port_type[0] == MLX4_PORT_TYPE_ETH) && + (port_type[1] == MLX4_PORT_TYPE_IB)) { + mlx4_err(dev, "eth-ib configuration is not supported.\n"); + return -EINVAL; + } + return 0; +} + +static void mlx4_str2port_type(char **port_str, + enum mlx4_port_type *port_type) +{ + int i; + + for (i = 0; i < MLX4_MAX_PORTS; i++) { + if (!strcmp(port_str[i], "eth")) + port_type[i] = MLX4_PORT_TYPE_ETH; + else + port_type[i] = MLX4_PORT_TYPE_IB; + } +} + + static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) { int err; int i; + enum mlx4_port_type port_type[MLX4_MAX_PORTS]; + + mlx4_str2port_type(port_type_arr, port_type); err = mlx4_QUERY_DEV_CAP(dev, dev_cap); if (err) { @@ -180,7 +219,24 @@ static int 
mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.log_num_vlans = ilog2(roundup_pow_of_two(num_vlan + 2)); dev->caps.log_num_prios = use_prio ? 3: 0; + err = mlx4_check_port_params(dev, port_type); + if (err) + return err; + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (!dev_cap->supported_port_types[i]) { + mlx4_warn(dev, "FW doesn't support Multi Protocol, " + "loading IB only\n"); + dev->caps.port_type[i] = MLX4_PORT_TYPE_IB; + continue; + } + if (port_type[i-1] & dev_cap->supported_port_types[i]) + dev->caps.port_type[i] = port_type[i-1]; + else { + mlx4_err(dev, "Requested port type for port %d " + "not supported by HW\n", i); + return -ENODEV; + } if (dev->caps.log_num_macs > dev_cap->log_max_macs[i]) { dev->caps.log_num_macs = dev_cap->log_max_macs[i]; mlx4_warn(dev, "Requested number of MACs is too much " @@ -1004,10 +1060,38 @@ static struct pci_driver mlx4_driver = { .remove = __devexit_p(mlx4_remove_one) }; +static int __init mlx4_verify_params(void) +{ + int i; + + for (i = 0; i < MLX4_MAX_PORTS; ++i) { + if (strcmp(port_type_arr[i], "eth") && + strcmp(port_type_arr[i], "ib")) { + printk(KERN_WARNING "mlx4_core: bad port_type for " + "port %d: %s\n", i, port_type_arr[i]); + return -1; + } + } + if ((num_mac < 1) || (num_mac > 127)) { + printk(KERN_WARNING "mlx4_core: bad num_mac: %d\n", num_mac); + return -1; + } + + if ((num_vlan < 0) || (num_vlan > 126)) { + printk(KERN_WARNING "mlx4_core: bad num_vlan: %d\n", num_vlan); + return -1; + } + + return 0; +} + static int __init mlx4_init(void) { int ret; + if (mlx4_verify_params()) + return -EINVAL; + ret = mlx4_catas_init(); if (ret) return ret; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 955eeca..4279b2f 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -62,6 +62,7 @@ enum { MLX4_DEV_CAP_FLAG_IPOIB_CSUM = 1 << 7, MLX4_DEV_CAP_FLAG_BAD_PKEY_CNTR = 1 << 8, MLX4_DEV_CAP_FLAG_BAD_QKEY_CNTR = 1 << 9, + 
MLX4_DEV_CAP_FLAG_DPDP = 1 << 12, MLX4_DEV_CAP_FLAG_MEM_WINDOW = 1 << 16, MLX4_DEV_CAP_FLAG_APM = 1 << 17, MLX4_DEV_CAP_FLAG_ATOMIC = 1 << 18, @@ -143,6 +144,11 @@ enum qp_region { MLX4_QP_REGION_COUNT }; +enum mlx4_port_type { + MLX4_PORT_TYPE_IB = 1 << 0, + MLX4_PORT_TYPE_ETH = 1 << 1, +}; + enum { MLX4_NUM_FEXCH = 64 * 1024, }; @@ -206,6 +212,7 @@ struct mlx4_caps { int log_num_macs; int log_num_vlans; int log_num_prios; + enum mlx4_port_type port_type[MLX4_MAX_PORTS + 1]; }; struct mlx4_buf_list { @@ -365,6 +372,31 @@ struct mlx4_init_port_param { u64 si_guid; }; +static inline void mlx4_query_steer_cap(struct mlx4_dev *dev, int *log_mac, + int *log_vlan, int *log_prio) +{ + *log_mac = dev->caps.log_num_macs; + *log_vlan = dev->caps.log_num_vlans; + *log_prio = dev->caps.log_num_prios; +} + +static inline u32 mlx4_get_ports_of_type(struct mlx4_dev *dev, + enum mlx4_port_type ptype) +{ + u32 ret = 0; + int i; + + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (dev->caps.port_type[i] == ptype) + ret |= 1 << (i-1); + } + return ret; +} + +#define mlx4_foreach_port(port, bitmap) \ + for ((port) = 1; (port) <= MLX4_MAX_PORTS; (port)++) \ + if (bitmap & 1 << ((port)-1)) + int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct, struct mlx4_buf *buf); void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf); -- 1.5.4 From andrea at qumranet.com Tue Apr 22 00:20:26 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 09:20:26 +0200 Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: <20080409185500.GT11364@sgi.com> References: <20080409131709.GR11364@sgi.com> <20080409144401.GT10133@duo.random> <20080409185500.GT11364@sgi.com> Message-ID: <20080422072026.GM12709@duo.random> This is a followup of the locking of the mmu-notifier methods against the secondary-mmu page fault, each driver can implement differently but this is to show an example of what I planned for KVM, others may follow closely if 
they find this useful. I post this as pseudocode to hide 99% of kvm internal complexities and to focus only on the locking. The KVM locking scheme should be something along these lines:

invalidate_range_start {
    spin_lock(&kvm->mmu_lock);

    kvm->invalidate_range_count++;
    rmap-invalidate of sptes in range

    spin_unlock(&kvm->mmu_lock)
}

invalidate_range_end {
    spin_lock(&kvm->mmu_lock);

    kvm->invalidate_range_count--;

    spin_unlock(&kvm->mmu_lock)
}

invalidate_page {
    spin_lock(&kvm->mmu_lock);

    write_seqlock()
    rmap-invalidate of sptes of page
    write_sequnlock()

    spin_unlock(&kvm->mmu_lock)
}

kvm_page_fault {
    seq = read_seqlock()
    get_user_pages() (aka gfn_to_pfn() in kvm terms)
    spin_lock(&kvm->mmu_lock)
    if (seq_trylock(seq) || kvm->invalidate_range_count)
        goto out; /* replay page fault */
    map sptes and build rmap
out:
    spin_unlock(&kvm->mmu_lock)
}

This will allow us to remove the page pinning from KVM. I'd appreciate it if you, Robin and Christoph, could have a second look and pinpoint any potential issue in my plan.

invalidate_page, as you can notice, makes it possible to decrease the fixed cost overhead from all VM code that works with a single page, where freeing the page _after_ calling invalidate_page has zero runtime/tlb cost. We need invalidate_range_begin/end because when we work on multiple pages, we can reduce cpu utilization and avoid many tlb flushes by holding off the kvm page fault while we work on the range.

invalidate_page also decreases the window where the kvm page fault could possibly need to be replayed (the ptep_clear_flush <-> invalidate_page window is shorter than an invalidate_range_begin(PAGE_SIZE) <-> invalidate_range_end(PAGE_SIZE) window). So even if only as a micro-optimization, it is worth it to decrease the impact on the common VM code. The cost of having both a seqlock and a range_count is irrelevant in kvm terms, as they'll be in the same cacheline and checked at the same time by the page fault, and it won't require any additional blocking (or writing) lock.
Note that the kvm page fault can't happen unless the cpu switches to guest mode, and it can't switch to guest mode if we're in the begin/end critical section, so in theory I could loop inside the page fault too without risking deadlocking, but replaying it by restarting guest mode sounds nicer in sigkill/scheduling terms. Soon I'll release a new mmu notifier patchset with patch 1 being the mmu-notifier-core self-included and ready to go in -mm and mainline in time for 2.6.26. Then I'll be glad to help merge any further patch in the patchset to allow methods to sleep so XPMEM can run on mainline 2.6.27 the same way GRU/KVM/Quadrics will run fine on 2.6.26, in a fully backwards compatible way with 2.6.26 (and of course it doesn't really need to be backwards compatible because this is a kernel internal API only, ask Greg etc... ;). But that will likely require a new config option to avoid hurting AIM performance in fork because the anon_vma critical sections are so short in the fast path.

From ruimario at gmail.com Tue Apr 22 03:09:52 2008
From: ruimario at gmail.com (Rui Machado)
Date: Tue, 22 Apr 2008 12:09:52 +0200
Subject: [ofa-general] beginner resources
Message-ID: <6978b4af0804220309t1ae34185y83ba69f9bbfa309b@mail.gmail.com>

Hi list, is this the right list to ask totally beginner questions (even code snippets) or is there any other resource for this matter? Thank you all, Rui
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From glebn at voltaire.com Tue Apr 22 04:14:13 2008
From: glebn at voltaire.com (Gleb Natapov)
Date: Tue, 22 Apr 2008 14:14:13 +0300
Subject: [ofa-general] Problem with libibverbs and huge pages registration.
In-Reply-To: References: <20080421141441.GF7771@minantech.com>
Message-ID: <20080422111412.GH7771@minantech.com>

On Mon, Apr 21, 2008 at 02:53:51PM -0700, Roland Dreier wrote:
> > ibv_reg_mr() fails if I try to register a memory region backed by a
> > huge page, but is not aligned to huge page boundary.
Digging deeper I
> > see that libibverbs aligns the memory region to a regular page size and
> > calls madvise() and the call fails. See the program below to reproduce.
> > The program assumes that hugetlbfs is mounted on /huge and there is at
> > least one huge page available. I am not sure it is possible to know if a
> > memory buffer is backed by a huge page to solve the problem.
>
> Hmm, not sure off the top of my head how we should deal with this.

Me too :(

>
> > Another issue with libibverbs is that after the first ibv_reg_mr() fails, the
> > second registration attempt of the same buffer succeeds, since
> > ibv_madvise_range() doesn't clean up after a madvise failure and thinks
> > that memory is already "madvised".
>
> I guess we shouldn't change the refcnt until after we know if madvise
> has succeeded or not. Does the patch below help? I'm not sure if this
> is a good enough fix -- we might have split up a node and want to
> remerge it if the madvise fails... rolling back is a little tricky... I
> think this will take a little more thought.
>
> - R.
>
> --- a/src/memory.c
> +++ b/src/memory.c
> @@ -506,8 +506,6 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
> __mm_add(tmp);
> }
>
> - node->refcnt += inc;
> -
I suppose the "if" below depends on the updated refcnt, so the update can't be
moved down without changing the "if" statement.
> if ((inc == -1 && node->refcnt == 0) ||
> (inc == 1 && node->refcnt == 1)) {
> /*
> @@ -532,6 +530,8 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
> goto out;
> }
>
> + node->refcnt += inc;
> +
> node = __mm_next(node);
> }
--
Gleb.
From andrea at qumranet.com Tue Apr 22 05:00:56 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Tue, 22 Apr 2008 14:00:56 +0200
Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12
In-Reply-To: <20080422072026.GM12709@duo.random>
References: <20080409131709.GR11364@sgi.com> <20080409144401.GT10133@duo.random> <20080409185500.GT11364@sgi.com> <20080422072026.GM12709@duo.random>
Message-ID: <20080422120056.GR12709@duo.random>

On Tue, Apr 22, 2008 at 09:20:26AM +0200, Andrea Arcangeli wrote:
> invalidate_range_start {
> spin_lock(&kvm->mmu_lock);
>
> kvm->invalidate_range_count++;
> rmap-invalidate of sptes in range
> write_seqlock; write_sequnlock;
> spin_unlock(&kvm->mmu_lock)
> }
>
> invalidate_range_end {
> spin_lock(&kvm->mmu_lock);
>
> kvm->invalidate_range_count--; write_seqlock; write_sequnlock;
>
> spin_unlock(&kvm->mmu_lock)
> }

Robin correctly pointed out by PM there should be a seqlock in range_begin/end too like corrected above. I guess it's better to use an explicit sequence counter so we avoid an useless spinlock of the write_seqlock (mmu_lock is enough already in all places) and so we can increase it with a single op with +=2 in the range_begin/end. The above is a lower-perf version of the final locking but simpler for reading purposes.
From holt at sgi.com Tue Apr 22 06:01:20 2008
From: holt at sgi.com (Robin Holt)
Date: Tue, 22 Apr 2008 08:01:20 -0500
Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12
In-Reply-To: <20080422120056.GR12709@duo.random>
References: <20080409131709.GR11364@sgi.com> <20080409144401.GT10133@duo.random> <20080409185500.GT11364@sgi.com> <20080422072026.GM12709@duo.random> <20080422120056.GR12709@duo.random>
Message-ID: <20080422130120.GR22493@sgi.com>

On Tue, Apr 22, 2008 at 02:00:56PM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 22, 2008 at 09:20:26AM +0200, Andrea Arcangeli wrote:
> > invalidate_range_start {
> > spin_lock(&kvm->mmu_lock);
> >
> > kvm->invalidate_range_count++;
> > rmap-invalidate of sptes in range
> >
>
> write_seqlock; write_sequnlock;

I don't think you need it here since invalidate_range_count is already elevated which will accomplish the same effect.

Thanks,
Robin

From andrea at qumranet.com Tue Apr 22 06:21:43 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Tue, 22 Apr 2008 15:21:43 +0200
Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12
In-Reply-To: <20080422130120.GR22493@sgi.com>
References: <20080409131709.GR11364@sgi.com> <20080409144401.GT10133@duo.random> <20080409185500.GT11364@sgi.com> <20080422072026.GM12709@duo.random> <20080422120056.GR12709@duo.random> <20080422130120.GR22493@sgi.com>
Message-ID: <20080422132143.GS12709@duo.random>

On Tue, Apr 22, 2008 at 08:01:20AM -0500, Robin Holt wrote:
> On Tue, Apr 22, 2008 at 02:00:56PM +0200, Andrea Arcangeli wrote:
> > On Tue, Apr 22, 2008 at 09:20:26AM +0200, Andrea Arcangeli wrote:
> > > invalidate_range_start {
> > > spin_lock(&kvm->mmu_lock);
> > >
> > > kvm->invalidate_range_count++;
> > > rmap-invalidate of sptes in range
> > >
> >
> > write_seqlock; write_sequnlock;
>
> I don't think you need it here since invalidate_range_count is already
> elevated which will accomplish the same effect.

Agreed, seqlock only in range_end should be enough. BTW, the fact seqlock is needed regardless of invalidate_page existing or not, really makes invalidate_page a no brainer not just from the core VM point of view, but from the driver point of view too. The kvm_page_fault logic would be the same even if I remove invalidate_page from the mmu notifier patch but it'd run slower both when armed and disarmed.
From holt at sgi.com Tue Apr 22 06:36:04 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 22 Apr 2008 08:36:04 -0500 Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12 In-Reply-To: <20080422132143.GS12709@duo.random> References: <20080409131709.GR11364@sgi.com> <20080409144401.GT10133@duo.random> <20080409185500.GT11364@sgi.com> <20080422072026.GM12709@duo.random> <20080422120056.GR12709@duo.random> <20080422130120.GR22493@sgi.com> <20080422132143.GS12709@duo.random> Message-ID: <20080422133604.GN30298@sgi.com> On Tue, Apr 22, 2008 at 03:21:43PM +0200, Andrea Arcangeli wrote: > On Tue, Apr 22, 2008 at 08:01:20AM -0500, Robin Holt wrote: > > On Tue, Apr 22, 2008 at 02:00:56PM +0200, Andrea Arcangeli wrote: > > > On Tue, Apr 22, 2008 at 09:20:26AM +0200, Andrea Arcangeli wrote: > > > > invalidate_range_start { > > > > spin_lock(&kvm->mmu_lock); > > > > > > > > kvm->invalidate_range_count++; > > > > rmap-invalidate of sptes in range > > > > > > > > > > write_seqlock; write_sequnlock; > > > > I don't think you need it here since invalidate_range_count is already > > elevated which will accomplish the same effect. > > Agreed, seqlock only in range_end should be enough. BTW, the fact I am a little confused about the value of the seq_lock versus a simple atomic, but I assumed there is a reason and left it at that. > seqlock is needed regardless of invalidate_page existing or not, > really makes invalidate_page a no brainer not just from the core VM > point of view, but from the driver point of view too. The > kvm_page_fault logic would be the same even if I remove > invalidate_page from the mmu notifier patch but it'd run slower both > when armed and disarmed. I don't know what you mean by "it'd" run slower and what you mean by "armed and disarmed". For the sake of this discussion, I will assume "it'd" means the kernel in general and not KVM. 
With the two call sites for range_begin/range_end, I would agree we have more call sites, but the second is extremely likely to be cache hot.

By disarmed, I will assume you mean no notifiers registered for a particular mm. In that case, the cache will make the second call effectively free. So, for the disarmed case, I see no measurable difference.

For the case where there is a notifier registered, I certainly can see a difference. I am not certain how to quantify the difference as it depends on the callee. In the case of xpmem, our callout is always very expensive for the _start case. Our _end case is very light, but it is essentially the exact same steps we would perform for the _page callout.

When I was discussing this difference with Jack, he reminded me that the GRU, due to its hardware, does not have any race issues with the invalidate_page callout simply doing the tlb shootdown and not modifying any of its internal structures. He then put a caveat on the discussion that _either_ method was acceptable as far as he was concerned. The real issue is getting a patch in that satisfies all needs and not whether there is a separate invalidate_page callout.

Thanks,
Robin

From tziporet at dev.mellanox.co.il Tue Apr 22 06:44:53 2008
From: tziporet at dev.mellanox.co.il (Tziporet Koren)
Date: Tue, 22 Apr 2008 16:44:53 +0300
Subject: [ofa-general] Re: [ewg] Agenda for the OFED meeting today
In-Reply-To: <480D5088.1020005@opengridcomputing.com>
References: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> <480D5088.1020005@opengridcomputing.com>
Message-ID: <480DEBD5.3030209@mellanox.co.il>

Steve Wise wrote:
>
> Sorry I missed today's call. If possible, I'd like a few weeks to get
> the cxgb3 fixes tested and ready to go. That puts me around mid May.
> I'll try and pull that in to make an RC1 of May 6, but I'm thinking I
> might need another week or so.
>

Please try to make most of the code ready for May 6.
You can add more modifications for RC2 which is May 20.

Tziporet

From andrea at qumranet.com Tue Apr 22 06:48:47 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Tue, 22 Apr 2008 15:48:47 +0200
Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12
In-Reply-To: <20080422133604.GN30298@sgi.com>
References: <20080409131709.GR11364@sgi.com> <20080409144401.GT10133@duo.random> <20080409185500.GT11364@sgi.com> <20080422072026.GM12709@duo.random> <20080422120056.GR12709@duo.random> <20080422130120.GR22493@sgi.com> <20080422132143.GS12709@duo.random> <20080422133604.GN30298@sgi.com>
Message-ID: <20080422134847.GT12709@duo.random>

On Tue, Apr 22, 2008 at 08:36:04AM -0500, Robin Holt wrote:
> I am a little confused about the value of the seq_lock versus a simple
> atomic, but I assumed there is a reason and left it at that.

There's no value for anything but get_user_pages (get_user_pages takes its own lock internally though). I preferred to explain it as a seqlock because it was simpler for reading, but I totally agree in the final implementation it shouldn't be a seqlock. My code was meant to be pseudo-code only. It doesn't even need to be atomic ;).

> I don't know what you mean by "it'd" run slower and what you mean by
> "armed and disarmed".

1) when armed, the time-window where the kvm-page-fault would be blocked would be a bit larger without invalidate_page, for no good reason

2) if you were to remove invalidate_page, when disarmed the VM code would need two branches instead of one in various places

I don't want to waste cycles if not wasting them improves performance both when armed and disarmed.

> For the sake of this discussion, I will assume "it'd" means the kernel in
> general and not KVM.

With the two call sites for range_begin/range_end, I
So, for the disarmed case, I see no measurable > difference. For rmap is sure effective free, for do_wp_page it costs one branch for no good reason. > For the case where there is a notifier registered, I certainly can see > a difference. I am not certain how to quantify the difference as it Agreed. > When I was discussing this difference with Jack, he reminded me that > the GRU, due to its hardware, does not have any race issues with the > invalidate_page callout simply doing the tlb shootdown and not modifying > any of its internal structures. He then put a caveat on the discussion > that _either_ method was acceptable as far as he was concerned. The real > issue is getting a patch in that satisfies all needs and not whether > there is a seperate invalidate_page callout. Sure, we have that patch now, I'll send it out in a minute, I was just trying to explain why it makes sense to have an invalidate_page too (which remains the only difference by now), removing it would be a regression on all sides, even if a minor one. From tziporet at mellanox.co.il Tue Apr 22 06:59:20 2008 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Tue, 22 Apr 2008 16:59:20 +0300 Subject: [ofa-general] OFED April 21 meeting summary In-Reply-To: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> Message-ID: <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> OFED April 21 meeting summary about 1.3.1 plans and OFED 1.4 development: > 1. OFED 1.3.1: > 1.1 Planned changes: > ULPs changes: > IB-bonding - done > SRP failover - on work > SDP crashes - on work > RDS fixes for RDMA API - done > librdmacm 1.0.7 - done > Open MPI 1.2.6 - done uDAPL - on work > Low level drivers: - each HW vendor should reply when the > changes will be ready nes - will be ready on first week of May mlx4 - fixes are ready; changes to support Eth are under review of the submission to kernel so not clear if they will make it on time. cxgb3 - will be ready by middle of may. 
The majority of changes should be submitted for RC1.
ipath - wait for update from Betsy
ehca - wait for update from Christoph

> 1.2 Schedule: we agreed that 2 release candidates should be sufficient
> GA is planned for May-29
> - RC1 - May 6
> - RC2 - May 20
>
> Note: daily builds of 1.3.1 are already available at:
> http://www.openfabrics.org/builds/ofed-1.3.1
>
>
> 2. OFED 1.4:
> Release features were presented at Sonoma (presentation available at
> http://www.openfabrics.org/archives/april2008sonoma.htm)

IPv6: Woody is looking for resources to add IPv6 support to the CMA. Hal noted that it will require a change in opensm too.
Xsigo Vnic & Vhba - Not clear if they will make it
Kernel tree is under work at: git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel
We should try to get the kernel code to compile as soon as possible so everybody will be able to contribute code.

Schedule reminder:
==============
Release: Oct 06, 2008
Features freeze: Jun 25, 08 (kernel 2.6.26 based)
Alpha: Jul 9, 08
Beta: Jul 30, 08 kernel 2.6.27-rcX (assuming it will be available)
RC1: Aug 13, 08
RC2: Aug 27, 08
RC3-RC5/6 - every 5-10 days
Latest RC to be used in the OFA interop event
GA: Oct 06 08

> Tziporet
> -------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yevgenyp at mellanox.co.il Tue Apr 22 07:05:38 2008
From: yevgenyp at mellanox.co.il (Yevgeny Petrilin)
Date: Tue, 22 Apr 2008 17:05:38 +0300
Subject: [ofa-general][PATCH] mlx4: Port Ethernet mtu capabilities handle (MP support, Patch 6)
Message-ID: <480DF0B2.3020203@mellanox.co.il>

>From a37cec875c323ddebe4f0289e4bab774fd9ec0f4 Mon Sep 17 00:00:00 2001
From: Yevgeny Petrilin
Date: Tue, 22 Apr 2008 13:25:19 +0300
Subject: [PATCH] mlx4: Port Ethernet mtu capabilities handle

The Ethernet max MTU and the default MAC address are reported through the QUERY_DEV_CAP command. The reported MTU is cross-checked against the requested max MTU (given as a module parameter).
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/fw.c | 11 ++++++----- drivers/net/mlx4/fw.h | 4 +++- drivers/net/mlx4/main.c | 15 ++++++++++++++- include/linux/mlx4/device.h | 4 +++- 4 files changed, 26 insertions(+), 8 deletions(-) diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index e875b08..1cbc30f 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -314,7 +314,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) MLX4_GET(field, outbox, QUERY_DEV_CAP_VL_PORT_OFFSET); dev_cap->max_vl[i] = field >> 4; MLX4_GET(field, outbox, QUERY_DEV_CAP_MTU_WIDTH_OFFSET); - dev_cap->max_mtu[i] = field >> 4; + dev_cap->ib_mtu[i] = field >> 4; dev_cap->max_port_width[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GID_OFFSET); dev_cap->max_gids[i] = 1 << (field & 0xf); @@ -339,7 +339,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) QUERY_PORT_SUPPORTED_TYPE_OFFSET); dev_cap->supported_port_types[i] = field & 3; MLX4_GET(field, outbox, QUERY_PORT_MTU_OFFSET); - dev_cap->max_mtu[i] = field & 0xf; + dev_cap->ib_mtu[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_PORT_WIDTH_OFFSET); dev_cap->max_port_width[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_PORT_MAX_GID_PKEY_OFFSET); @@ -350,7 +350,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) MLX4_GET(field, outbox, QUERY_PORT_MAX_MACVLAN_OFFSET); dev_cap->log_max_macs[i] = field & 0xf; dev_cap->log_max_vlans[i] = field >> 4; - + dev_cap->eth_mtu[i] = be16_to_cpu(((u16 *) outbox)[1]); + dev_cap->def_mac[i] = be64_to_cpu(((u64 *) outbox)[2]); } } @@ -388,7 +389,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) mlx4_dbg(dev, "Max CQEs: %d, max WQEs: %d, max SRQ WQEs: %d\n", dev_cap->max_cq_sz, dev_cap->max_qp_sz, dev_cap->max_srq_sz); mlx4_dbg(dev, "Local CA ACK delay: %d, max MTU: %d, port width cap: %d\n", - dev_cap->local_ca_ack_delay, 128 << dev_cap->max_mtu[1], + 
dev_cap->local_ca_ack_delay, 128 << dev_cap->ib_mtu[1], dev_cap->max_port_width[1]); mlx4_dbg(dev, "Max SQ desc size: %d, max SQ S/G: %d\n", dev_cap->max_sq_desc_sz, dev_cap->max_sq_sg); @@ -796,7 +797,7 @@ int mlx4_INIT_PORT(struct mlx4_dev *dev, int port) flags |= (dev->caps.port_width_cap[port] & 0xf) << INIT_PORT_PORT_WIDTH_SHIFT; MLX4_PUT(inbox, flags, INIT_PORT_FLAGS_OFFSET); - field = 128 << dev->caps.mtu_cap[port]; + field = 128 << dev->caps.ib_mtu_cap[port]; MLX4_PUT(inbox, field, INIT_PORT_MTU_OFFSET); field = dev->caps.gid_table_len[port]; MLX4_PUT(inbox, field, INIT_PORT_MAX_GID_OFFSET); diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index 50a6a7d..ef964d5 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -61,11 +61,13 @@ struct mlx4_dev_cap { int local_ca_ack_delay; int num_ports; u32 max_msg_sz; - int max_mtu[MLX4_MAX_PORTS + 1]; + int ib_mtu[MLX4_MAX_PORTS + 1]; int max_port_width[MLX4_MAX_PORTS + 1]; int max_vl[MLX4_MAX_PORTS + 1]; int max_gids[MLX4_MAX_PORTS + 1]; int max_pkeys[MLX4_MAX_PORTS + 1]; + u64 def_mac[MLX4_MAX_PORTS + 1]; + int eth_mtu[MLX4_MAX_PORTS + 1]; u16 stat_rate_support; u32 flags; int reserved_uars; diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 1651d8e..754c07c 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -104,6 +104,11 @@ static struct mlx4_profile default_profile = { module_param_array_named(port_type, port_type_arr, charp, NULL, 0444); MODULE_PARM_DESC(port_type, "Ports L2 type (ib/eth/auto, entry per port, " "comma seperated, default ib for all)"); + +static int port_mtu[MLX4_MAX_PORTS] = { [0 ... 
(MLX4_MAX_PORTS-1)] = 9600}; +module_param_array_named(port_mtu, port_mtu, int, NULL, 0444); +MODULE_PARM_DESC(port_mtu, "Ports max mtu in Bytes, entry per port, " + "comma seperated, default 9600 for all"); static int mlx4_check_port_params(struct mlx4_dev *dev, enum mlx4_port_type *port_type) @@ -175,10 +180,12 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.num_ports = dev_cap->num_ports; for (i = 1; i <= dev->caps.num_ports; ++i) { dev->caps.vl_cap[i] = dev_cap->max_vl[i]; - dev->caps.mtu_cap[i] = dev_cap->max_mtu[i]; + dev->caps.ib_mtu_cap[i] = dev_cap->ib_mtu[i]; dev->caps.gid_table_len[i] = dev_cap->max_gids[i]; dev->caps.pkey_table_len[i] = dev_cap->max_pkeys[i]; dev->caps.port_width_cap[i] = dev_cap->max_port_width[i]; + dev->caps.eth_mtu_cap[i] = dev_cap->eth_mtu[i]; + dev->caps.def_mac[i] = dev_cap->def_mac[i]; } dev->caps.num_uars = dev_cap->uar_size / PAGE_SIZE; @@ -237,6 +244,12 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) "not supported by HW\n", i); return -ENODEV; } + if (port_mtu[i-1] <= dev->caps.eth_mtu_cap[i]) + dev->caps.eth_mtu_cap[i] = port_mtu[i-1]; + else + mlx4_warn(dev, "Requested mtu for port %d is larger " + "then supported, reducing to %d\n", + i, dev->caps.eth_mtu_cap[i]); if (dev->caps.log_num_macs > dev_cap->log_max_macs[i]) { dev->caps.log_num_macs = dev_cap->log_max_macs[i]; mlx4_warn(dev, "Requested number of MACs is too much " diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 4279b2f..b114ef3 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -162,7 +162,9 @@ struct mlx4_caps { u64 fw_ver; int num_ports; int vl_cap[MLX4_MAX_PORTS + 1]; - int mtu_cap[MLX4_MAX_PORTS + 1]; + int ib_mtu_cap[MLX4_MAX_PORTS + 1]; + u64 def_mac[MLX4_MAX_PORTS + 1]; + int eth_mtu_cap[MLX4_MAX_PORTS + 1]; int gid_table_len[MLX4_MAX_PORTS + 1]; int pkey_table_len[MLX4_MAX_PORTS + 1]; int local_ca_ack_delay; -- 1.5.4 From 
yevgenyp at mellanox.co.il Tue Apr 22 07:07:28 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 17:07:28 +0300 Subject: [ofa-general][PATCH] mlx4: Mac Vlan Management (MP support, Patch 7) Message-ID: <480DF120.3010006@mellanox.co.il> >From 93d41d72b8878bfd8d67b6a48b70c392f108fe58 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 14:28:36 +0300 Subject: [PATCH] mlx4: Mac Vlan Management mlx4_core is now responsible for managing Mac and Vlan filters for each port. It also notifies the FW which port type will be loaded, using the SET_PORT command. Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/Makefile | 2 +- drivers/net/mlx4/main.c | 18 +++ drivers/net/mlx4/mlx4.h | 35 ++++++ drivers/net/mlx4/port.c | 278 +++++++++++++++++++++++++++++++++++++++++++ include/linux/mlx4/cmd.h | 9 ++ include/linux/mlx4/device.h | 6 + 6 files changed, 347 insertions(+), 1 deletions(-) create mode 100644 drivers/net/mlx4/port.c diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile index 0952a65..f4932d8 100644 --- a/drivers/net/mlx4/Makefile +++ b/drivers/net/mlx4/Makefile @@ -1,4 +1,4 @@ obj-$(CONFIG_MLX4_CORE) += mlx4_core.o mlx4_core-y := alloc.o catas.o cmd.o cq.o eq.o fw.o icm.o intf.o main.o mcg.o \ - mr.o pd.o profile.o qp.o reset.o srq.o + mr.o pd.o profile.o qp.o reset.o srq.o port.o diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 754c07c..a528809 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -678,6 +678,7 @@ static int mlx4_setup_hca(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); int err; + int port; err = mlx4_init_uar_table(dev); if (err) { @@ -776,8 +777,25 @@ static int mlx4_setup_hca(struct mlx4_dev *dev) goto err_qp_table_free; } + for (port = 1; port <= dev->caps.num_ports; port++) { + err = mlx4_SET_PORT(dev, port); + if (err) { + mlx4_err(dev, "Failed to set port %d, aborting\n", + port); + goto err_mcg_table_free; + } + } + + for (port = 
0; port < dev->caps.num_ports; port++) { + mlx4_init_mac_table(dev, port); + mlx4_init_vlan_table(dev, port); + } + return 0; +err_mcg_table_free: + mlx4_cleanup_mcg_table(dev); + err_qp_table_free: mlx4_cleanup_qp_table(dev); diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index b74405a..eff1c5a 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -251,6 +251,35 @@ struct mlx4_catas_err { struct list_head list; }; +struct mlx4_mac_table { +#define MLX4_MAX_MAC_NUM 128 +#define MLX4_MAC_MASK 0xffffffffffff +#define MLX4_MAC_VALID_SHIFT 63 +#define MLX4_MAC_TABLE_SIZE MLX4_MAX_MAC_NUM << 3 + __be64 entries[MLX4_MAX_MAC_NUM]; + int refs[MLX4_MAX_MAC_NUM]; + struct semaphore mac_sem; + int total; + int max; +}; + +struct mlx4_vlan_table { +#define MLX4_MAX_VLAN_NUM 126 +#define MLX4_VLAN_MASK 0xfff +#define MLX4_VLAN_VALID 1 << 31 +#define MLX4_VLAN_TABLE_SIZE MLX4_MAX_VLAN_NUM << 2 + __be32 entries[MLX4_MAX_VLAN_NUM]; + int refs[MLX4_MAX_VLAN_NUM]; + struct semaphore vlan_sem; + int total; + int max; +}; + +struct mlx4_port_info { + struct mlx4_mac_table mac_table; + struct mlx4_vlan_table vlan_table; +}; + struct mlx4_priv { struct mlx4_dev dev; @@ -279,6 +308,7 @@ struct mlx4_priv { struct mlx4_uar driver_uar; void __iomem *kar; + struct mlx4_port_info port[MLX4_MAX_PORTS]; }; static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev) @@ -351,4 +381,9 @@ void mlx4_srq_event(struct mlx4_dev *dev, u32 srqn, int event_type); void mlx4_handle_catas_err(struct mlx4_dev *dev); +void mlx4_init_mac_table(struct mlx4_dev *dev, u8 port); +void mlx4_init_vlan_table(struct mlx4_dev *dev, u8 port); + +int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port); + #endif /* MLX4_H */ diff --git a/drivers/net/mlx4/port.c b/drivers/net/mlx4/port.c new file mode 100644 index 0000000..910fc35 --- /dev/null +++ b/drivers/net/mlx4/port.c @@ -0,0 +1,278 @@ +/* + * Copyright (c) 2007 Mellanox Technologies. All rights reserved. 
+ * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#include +#include + +#include + +#include "mlx4.h" + +void mlx4_init_mac_table(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_mac_table *table = &mlx4_priv(dev)->port[port].mac_table; + int i; + + sema_init(&table->mac_sem, 1); + for (i = 0; i < MLX4_MAX_MAC_NUM; i++) { + table->entries[i] = 0; + table->refs[i] = 0; + } + table->max = 1 << dev->caps.log_num_macs; + table->total = 0; +} + +void mlx4_init_vlan_table(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port].vlan_table; + int i; + + sema_init(&table->vlan_sem, 1); + for (i = 0; i < MLX4_MAX_MAC_NUM; i++) { + table->entries[i] = 0; + table->refs[i] = 0; + } + table->max = 1 << dev->caps.log_num_vlans; + table->total = 0; +} + +static int mlx4_SET_PORT_mac_table(struct mlx4_dev *dev, u8 port, + __be64 *entries) +{ + struct mlx4_cmd_mailbox *mailbox; + u32 in_mod; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memcpy(mailbox->buf, entries, MLX4_MAC_TABLE_SIZE); + + in_mod = MLX4_SET_PORT_MAC_TABLE << 8 | port; + err = mlx4_cmd(dev, mailbox->dma, in_mod, 1, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} + +int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *index) +{ + struct mlx4_mac_table *table = &mlx4_priv(dev)->port[port - 1].mac_table; + int i, err = 0; + int free = -1; + u64 valid = 1; + + mlx4_dbg(dev, "Registering mac : 0x%llx\n", mac); + down(&table->mac_sem); + for (i = 0; i < MLX4_MAX_MAC_NUM - 1; i++) { + if (free < 0 && !table->refs[i]) { + free = i; + continue; + } + + if (mac == (MLX4_MAC_MASK & be64_to_cpu(table->entries[i]))) { + /* Mac already registered, increase refernce count */ + *index = i; + ++table->refs[i]; + goto out; + } + } + mlx4_dbg(dev, "Free mac index is %d\n", free); + + if (table->total == table->max) { + /* No free mac entries */ + err = -ENOSPC; + goto out; + } + + /* Register new MAC */ 
+ table->refs[free] = 1; + table->entries[free] = cpu_to_be64(mac | valid << MLX4_MAC_VALID_SHIFT); + + err = mlx4_SET_PORT_mac_table(dev, port, table->entries); + if (unlikely(err)) { + mlx4_err(dev, "Failed adding mac: 0x%llx\n", mac); + table->refs[free] = 0; + table->entries[free] = 0; + goto out; + } + + *index = free; + ++table->total; +out: + up(&table->mac_sem); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_register_mac); + +void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index) +{ + struct mlx4_mac_table *table = &mlx4_priv(dev)->port[port - 1].mac_table; + + down(&table->mac_sem); + if (!table->refs[index]) { + mlx4_warn(dev, "No mac entry for index %d\n", index); + goto out; + } + if (--table->refs[index]) { + mlx4_warn(dev, "Have more references for index %d," + "no need to modify mac table\n", index); + goto out; + } + table->entries[index] = 0; + mlx4_SET_PORT_mac_table(dev, port, table->entries); + --table->total; +out: + up(&table->mac_sem); +} +EXPORT_SYMBOL_GPL(mlx4_unregister_mac); + +static int mlx4_SET_PORT_vlan_table(struct mlx4_dev *dev, u8 port, + __be32 *entries) +{ + struct mlx4_cmd_mailbox *mailbox; + u32 in_mod; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memcpy(mailbox->buf, entries, MLX4_VLAN_TABLE_SIZE); + in_mod = MLX4_SET_PORT_VLAN_TABLE << 8 | port; + err = mlx4_cmd(dev, mailbox->dma, in_mod, 1, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + + return err; +} + +int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index) +{ + struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port - 1].vlan_table; + int i, err = 0; + int free = -1; + + down(&table->vlan_sem); + for (i = 0; i < MLX4_MAX_VLAN_NUM; i++) { + if (free < 0 && (table->refs[i] == 0)) { + free = i; + continue; + } + + if (table->refs[i] && + (vlan == (MLX4_VLAN_MASK & + be32_to_cpu(table->entries[i])))) { + /* Vlan already registered, increase 
refernce count */ + *index = i; + ++table->refs[i]; + goto out; + } + } + + if (table->total == table->max) { + /* No free vlan entries */ + err = -ENOSPC; + goto out; + } + + /* Register new MAC */ + table->refs[free] = 1; + table->entries[free] = cpu_to_be32(vlan | MLX4_VLAN_VALID); + + err = mlx4_SET_PORT_vlan_table(dev, port, table->entries); + if (unlikely(err)) { + mlx4_warn(dev, "Failed adding vlan: %u\n", vlan); + table->refs[free] = 0; + table->entries[free] = 0; + goto out; + } + + *index = free; + ++table->total; +out: + up(&table->vlan_sem); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_register_vlan); + +void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index) +{ + struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port - 1].vlan_table; + + down(&table->vlan_sem); + if (!table->refs[index]) { + mlx4_warn(dev, "No vlan entry for index %d\n", index); + goto out; + } + if (--table->refs[index]) { + mlx4_dbg(dev, "Have more references for index %d," + "no need to modify vlan table\n", index); + goto out; + } + table->entries[index] = 0; + mlx4_SET_PORT_vlan_table(dev, port, table->entries); + --table->total; +out: + up(&table->vlan_sem); +} +EXPORT_SYMBOL_GPL(mlx4_unregister_vlan); + +int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_cmd_mailbox *mailbox; + int err; + u8 is_eth = (dev->caps.port_type[port] == MLX4_PORT_TYPE_ETH) ? 
1 : 0; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memset(mailbox->buf, 0, 256); + if (is_eth) { + ((u8 *) mailbox->buf)[3] = 7; + ((__be16 *) mailbox->buf)[3] = + cpu_to_be16(dev->caps.eth_mtu_cap[port] + + ETH_HLEN + ETH_FCS_LEN); + ((__be16 *) mailbox->buf)[4] = cpu_to_be16(1 << 15); + ((__be16 *) mailbox->buf)[6] = cpu_to_be16(1 << 15); + } + err = mlx4_cmd(dev, mailbox->dma, port, is_eth, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h index 77323a7..cf9c679 100644 --- a/include/linux/mlx4/cmd.h +++ b/include/linux/mlx4/cmd.h @@ -132,6 +132,15 @@ enum { MLX4_MAILBOX_SIZE = 4096 }; +enum { + /* set port opcode modifiers */ + MLX4_SET_PORT_GENERAL = 0x0, + MLX4_SET_PORT_RQP_CALC = 0x1, + MLX4_SET_PORT_MAC_TABLE = 0x2, + MLX4_SET_PORT_VLAN_TABLE = 0x3, + MLX4_SET_PORT_PRIO_MAP = 0x4, +}; + struct mlx4_dev; struct mlx4_cmd_mailbox { diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index b114ef3..4ca3a00 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -461,6 +461,12 @@ int mlx4_CLOSE_PORT(struct mlx4_dev *dev, int port); int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]); int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]); +int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *index); +void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index); + +int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index); +void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index); + int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, int npages, u64 iova, u32 *lkey, u32 *rkey); int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages, -- 1.5.4 From andrea at qumranet.com Tue Apr 22 06:51:16 2008 From: andrea 
at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:16 +0200 Subject: [ofa-general] [PATCH 00 of 12] mmu notifier #v13 Message-ID: Hello, This is the latest and greatest version of the mmu notifier patch, #v13. The changes are mainly in mm_lock, which now uses sort() as suggested by Christoph; this reduces the complexity from O(N**2) to O(N*log(N)). I folded the mm_lock functionality into the mmu-notifier-core 1/12 patch to make it self-contained. I recommend merging 1/12 into -mm/mainline ASAP. Lack of mmu notifiers is holding off KVM development. We are going to rework the way pages are mapped and unmapped to work with pure pfns for PCI passthrough without the use of page pinning, and we can't do that without mmu notifiers. This is not just a performance matter. KVM/GRU and AFAICT Quadrics are all covered by applying the single 1/12 patch, which shall be shipped with 2.6.26. The risk of breakage from applying 1/12 is zero, both when MMU_NOTIFIER=y and when it's =n, so it shouldn't be delayed further. XPMEM support comes with the later patches 2-12; the risk for those patches is >0, which is why the mmu-notifier-core is numbered 1/12 and not 12/12. Some are simple and can go in immediately, but not all are so simple. 2-12/12 are posted as usual for review by the VM developers, so Robin can keep testing them on XPMEM and they can be merged later without any downside (they're mostly orthogonal to 1/12). From andrea at qumranet.com Tue Apr 22 06:51:18 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:18 +0200 Subject: [ofa-general] [PATCH 02 of 12] Fix ia64 compilation failure because of common code include bug In-Reply-To: Message-ID: <3c804dca25b15017b220.1208872278@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1208872186 -7200 # Node ID 3c804dca25b15017b22008647783d6f5f3801fa9 # Parent ea87c15371b1bd49380c40c3f15f1c7ca4438af5 Fix ia64 compilation failure because of common code include bug.
Signed-off-by: Andrea Arcangeli diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -10,6 +10,7 @@ #include #include #include +#include #include #include From andrea at qumranet.com Tue Apr 22 06:51:19 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:19 +0200 Subject: [ofa-general] [PATCH 03 of 12] get_task_mm should not succeed if mmput() is running and has reduced In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208872186 -7200 # Node ID a6672bdeead0d41b2ebd6846f731d43a611645b7 # Parent 3c804dca25b15017b22008647783d6f5f3801fa9 get_task_mm should not succeed if mmput() is running and has reduced the mm_users count to zero. This can occur if a processor follows a task's pointer to an mm struct, because that pointer is only cleared after the mmput(). If get_task_mm() succeeds after mmput() has reduced mm_users to zero, then we have the lovely situation that one portion of the kernel is doing all the teardown work for an mm while another portion is happily using it. Signed-off-by: Christoph Lameter diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -442,7 +442,8 @@ if (task->flags & PF_BORROWED_MM) mm = NULL; else - atomic_inc(&mm->mm_users); + if (!atomic_inc_not_zero(&mm->mm_users)) + mm = NULL; } task_unlock(task); return mm; From andrea at qumranet.com Tue Apr 22 06:51:17 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:17 +0200 Subject: [ofa-general] [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208870142 -7200 # Node ID ea87c15371b1bd49380c40c3f15f1c7ca4438af5 # Parent fb3bc9942fb78629d096bd07564f435d51d86e5f Core of mmu notifiers.
Signed-off-by: Andrea Arcangeli Signed-off-by: Nick Piggin Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1050,6 +1050,27 @@ unsigned long addr, unsigned long len, unsigned long flags, struct page **pages); +/* + * mm_lock will take mmap_sem writably (to prevent all modifications + * and scanning of vmas) and then also takes the mapping locks for + * each of the vma to lockout any scans of pagetables of this address + * space. This can be used to effectively holding off reclaim from the + * address space. + * + * mm_lock can fail if there is not enough memory to store a pointer + * array to all vmas. + * + * mm_lock and mm_unlock are expensive operations that may take a long time. + */ +struct mm_lock_data { + spinlock_t **i_mmap_locks; + spinlock_t **anon_vma_locks; + size_t nr_i_mmap_locks; + size_t nr_anon_vma_locks; +}; +extern int mm_lock(struct mm_struct *mm, struct mm_lock_data *data); +extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data); + extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -225,6 +225,9 @@ #ifdef CONFIG_CGROUP_MEM_RES_CTLR struct mem_cgroup *mem_cgroup; #endif +#ifdef CONFIG_MMU_NOTIFIER + struct hlist_head mmu_notifier_list; +#endif }; #endif /* _LINUX_MM_TYPES_H */ diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h new file mode 100644 --- /dev/null +++ b/include/linux/mmu_notifier.h @@ -0,0 +1,229 @@ +#ifndef _LINUX_MMU_NOTIFIER_H +#define _LINUX_MMU_NOTIFIER_H + +#include +#include +#include + +struct mmu_notifier; +struct mmu_notifier_ops; + +#ifdef CONFIG_MMU_NOTIFIER + +struct mmu_notifier_ops { + /* + * Called after all other threads have 
terminated and the executing + * thread is the only remaining execution thread. There are no + * users of the mm_struct remaining. + */ + void (*release)(struct mmu_notifier *mn, + struct mm_struct *mm); + + /* + * clear_flush_young is called after the VM is + * test-and-clearing the young/accessed bitflag in the + * pte. This way the VM will provide proper aging to the + * accesses to the page through the secondary MMUs and not + * only to the ones through the Linux pte. + */ + int (*clear_flush_young)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * Before this is invoked any secondary MMU is still ok to + * read/write to the page previously pointed by the Linux pte + * because the old page hasn't been freed yet. If required + * set_page_dirty has to be called internally to this method. + */ + void (*invalidate_page)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * invalidate_range_start() and invalidate_range_end() must be + * paired and are called only when the mmap_sem is held and/or + * the semaphores protecting the reverse maps. Both functions + * may sleep. The subsystem must guarantee that no additional + * references to the pages in the range established between + * the call to invalidate_range_start() and the matching call + * to invalidate_range_end(). + * + * Invalidation of multiple concurrent ranges may be permitted + * by the driver or the driver may exclude other invalidation + * from proceeding by blocking on new invalidate_range_start() + * callback that overlap invalidates that are already in + * progress. Either way the establishment of sptes to the + * range can only be allowed if all invalidate_range_stop() + * function have been called. + * + * invalidate_range_start() is called when all pages in the + * range are still mapped and have at least a refcount of one. 
+ * + * invalidate_range_end() is called when all pages in the + * range have been unmapped and the pages have been freed by + * the VM. + * + * The VM will remove the page table entries and potentially + * the page between invalidate_range_start() and + * invalidate_range_end(). If the page must not be freed + * because of pending I/O or other circumstances then the + * invalidate_range_start() callback (or the initial mapping + * by the driver) must make sure that the refcount is kept + * elevated. + * + * If the driver increases the refcount when the pages are + * initially mapped into an address space then either + * invalidate_range_start() or invalidate_range_end() may + * decrease the refcount. If the refcount is decreased on + * invalidate_range_start() then the VM can free pages as page + * table entries are removed. If the refcount is only + * droppped on invalidate_range_end() then the driver itself + * will drop the last refcount but it must take care to flush + * any secondary tlb before doing the final free on the + * page. Pages will no longer be referenced by the linux + * address space but may still be referenced by sptes until + * the last refcount is dropped. + */ + void (*invalidate_range_start)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); + void (*invalidate_range_end)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); +}; + +/* + * The notifier chains are protected by mmap_sem and/or the reverse map + * semaphores. Notifier chains are only changed when all reverse maps and + * the mmap_sem locks are taken. + * + * Therefore notifier chains can only be traversed when either + * + * 1. mmap_sem is held. + * 2. One of the reverse map locks is held (i_mmap_sem or anon_vma->sem). + * 3. 
No other concurrent thread can access the list (release) + */ +struct mmu_notifier { + struct hlist_node hlist; + const struct mmu_notifier_ops *ops; +}; + +static inline int mm_has_notifiers(struct mm_struct *mm) +{ + return unlikely(!hlist_empty(&mm->mmu_notifier_list)); +} + +extern int mmu_notifier_register(struct mmu_notifier *mn, + struct mm_struct *mm); +extern int mmu_notifier_unregister(struct mmu_notifier *mn, + struct mm_struct *mm); +extern void __mmu_notifier_release(struct mm_struct *mm); +extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end); +extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end); + + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_release(mm); +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + return __mmu_notifier_clear_flush_young(mm, address); + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_page(mm, address); +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_start(mm, start, end); +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_end(mm, start, end); +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ + INIT_HLIST_HEAD(&mm->mmu_notifier_list); +} + +#define 
ptep_clear_flush_notify(__vma, __address, __ptep) \ +({ \ + pte_t __pte; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __pte = ptep_clear_flush(___vma, ___address, __ptep); \ + mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ + __pte; \ +}) + +#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ +({ \ + int __young; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ + __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ + ___address); \ + __young; \ +}) + +#else /* CONFIG_MMU_NOTIFIER */ + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ +} + +#define ptep_clear_flush_young_notify ptep_clear_flush_young +#define ptep_clear_flush_notify ptep_clear_flush + +#endif /* CONFIG_MMU_NOTIFIER */ + +#endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include @@ -362,6 +363,7 @@ if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; + mmu_notifier_mm_init(mm); return mm; } diff --git a/mm/Kconfig b/mm/Kconfig --- a/mm/Kconfig +++ b/mm/Kconfig @@ -193,3 +193,7 @@ config VIRT_TO_BUS def_bool y depends on !ARCH_NO_VIRT_TO_BUS + +config MMU_NOTIFIER + def_bool y + bool "MMU notifier, for paging KVM/RDMA" diff --git a/mm/Makefile 
b/mm/Makefile --- a/mm/Makefile +++ b/mm/Makefile @@ -33,4 +33,5 @@ obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o +obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,7 +194,7 @@ if (pte) { /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -214,7 +215,9 @@ spin_unlock(&mapping->i_mmap_lock); } + mmu_notifier_invalidate_range_start(mm, start, start + size); err = populate_range(mm, vma, start, size, pgoff); + mmu_notifier_invalidate_range_end(mm, start, start + size); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include @@ -799,6 +800,7 @@ BUG_ON(start & ~HPAGE_MASK); BUG_ON(end & ~HPAGE_MASK); + mmu_notifier_invalidate_range_start(mm, start, end); spin_lock(&mm->page_table_lock); for (address = start; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -819,6 +821,7 @@ } spin_unlock(&mm->page_table_lock); flush_tlb_range(vma, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); list_for_each_entry_safe(page, tmp, &page_list, lru) { list_del(&page->lru); put_page(page); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -611,6 +612,9 @@ if (is_vm_hugetlb_page(vma)) return 
copy_hugetlb_page_range(dst_mm, src_mm, vma); + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_start(src_mm, addr, end); + dst_pgd = pgd_offset(dst_mm, addr); src_pgd = pgd_offset(src_mm, addr); do { @@ -621,6 +625,11 @@ vma, addr, next)) return -ENOMEM; } while (dst_pgd++, src_pgd++, addr = next, addr != end); + + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_end(src_mm, + vma->vm_start, end); + return 0; } @@ -825,7 +834,9 @@ unsigned long start = start_addr; spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL; int fullmm = (*tlbp)->fullmm; + struct mm_struct *mm = vma->vm_mm; + mmu_notifier_invalidate_range_start(mm, start_addr, end_addr); for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) { unsigned long end; @@ -876,6 +887,7 @@ } } out: + mmu_notifier_invalidate_range_end(mm, start_addr, end_addr); return start; /* which is now the end (or restart) address */ } @@ -1463,10 +1475,11 @@ { pgd_t *pgd; unsigned long next; - unsigned long end = addr + size; + unsigned long start = addr, end = addr + size; int err; BUG_ON(addr >= end); + mmu_notifier_invalidate_range_start(mm, start, end); pgd = pgd_offset(mm, addr); do { next = pgd_addr_end(addr, end); @@ -1474,6 +1487,7 @@ if (err) break; } while (pgd++, addr = next, addr != end); + mmu_notifier_invalidate_range_end(mm, start, end); return err; } EXPORT_SYMBOL_GPL(apply_to_page_range); @@ -1675,7 +1689,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. 
*/ - ptep_clear_flush(vma, address, page_table); + ptep_clear_flush_notify(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -26,6 +26,9 @@ #include #include #include +#include +#include +#include #include #include @@ -2038,6 +2041,7 @@ /* mm's last user has gone, and its about to be pulled down */ arch_exit_mmap(mm); + mmu_notifier_release(mm); lru_add_drain(); flush_cache_mm(mm); @@ -2242,3 +2246,143 @@ return 0; } + +static int mm_lock_cmp(const void *a, const void *b) +{ + cond_resched(); + if ((unsigned long)*(spinlock_t **)a < + (unsigned long)*(spinlock_t **)b) + return -1; + else if (a == b) + return 0; + else + return 1; +} + +static unsigned long mm_lock_sort(struct mm_struct *mm, spinlock_t **locks, + int anon) +{ + struct vm_area_struct *vma; + size_t i = 0; + + for (vma = mm->mmap; vma; vma = vma->vm_next) { + if (anon) { + if (vma->anon_vma) + locks[i++] = &vma->anon_vma->lock; + } else { + if (vma->vm_file && vma->vm_file->f_mapping) + locks[i++] = &vma->vm_file->f_mapping->i_mmap_lock; + } + } + + if (!i) + goto out; + + sort(locks, i, sizeof(spinlock_t *), mm_lock_cmp, NULL); + +out: + return i; +} + +static inline unsigned long mm_lock_sort_anon_vma(struct mm_struct *mm, + spinlock_t **locks) +{ + return mm_lock_sort(mm, locks, 1); +} + +static inline unsigned long mm_lock_sort_i_mmap(struct mm_struct *mm, + spinlock_t **locks) +{ + return mm_lock_sort(mm, locks, 0); +} + +static void mm_lock_unlock(spinlock_t **locks, size_t nr, int lock) +{ + spinlock_t *last = NULL; + size_t i; + + for (i = 0; i < nr; i++) + /* Multiple vmas may use the same lock. 
*/ + if (locks[i] != last) { + BUG_ON((unsigned long) last > (unsigned long) locks[i]); + last = locks[i]; + if (lock) + spin_lock(last); + else + spin_unlock(last); + } +} + +static inline void __mm_lock(spinlock_t **locks, size_t nr) +{ + mm_lock_unlock(locks, nr, 1); +} + +static inline void __mm_unlock(spinlock_t **locks, size_t nr) +{ + mm_lock_unlock(locks, nr, 0); +} + +/* + * This operation locks against the VM for all pte/vma/mm related + * operations that could ever happen on a certain mm. This includes + * vmtruncate, try_to_unmap, and all page faults. The holder + * must not hold any mm related lock. A single task can't take more + * than one mm lock in a row or it would deadlock. + */ +int mm_lock(struct mm_struct *mm, struct mm_lock_data *data) +{ + spinlock_t **anon_vma_locks, **i_mmap_locks; + + down_write(&mm->mmap_sem); + if (mm->map_count) { + anon_vma_locks = vmalloc(sizeof(spinlock_t *) * mm->map_count); + if (unlikely(!anon_vma_locks)) { + up_write(&mm->mmap_sem); + return -ENOMEM; + } + + i_mmap_locks = vmalloc(sizeof(spinlock_t *) * mm->map_count); + if (unlikely(!i_mmap_locks)) { + up_write(&mm->mmap_sem); + vfree(anon_vma_locks); + return -ENOMEM; + } + + data->nr_anon_vma_locks = mm_lock_sort_anon_vma(mm, anon_vma_locks); + data->nr_i_mmap_locks = mm_lock_sort_i_mmap(mm, i_mmap_locks); + + if (data->nr_anon_vma_locks) { + __mm_lock(anon_vma_locks, data->nr_anon_vma_locks); + data->anon_vma_locks = anon_vma_locks; + } else + vfree(anon_vma_locks); + + if (data->nr_i_mmap_locks) { + __mm_lock(i_mmap_locks, data->nr_i_mmap_locks); + data->i_mmap_locks = i_mmap_locks; + } else + vfree(i_mmap_locks); + } + return 0; +} + +static void mm_unlock_vfree(spinlock_t **locks, size_t nr) +{ + __mm_unlock(locks, nr); + vfree(locks); +} + +/* avoid memory allocations for mm_unlock to prevent deadlock */ +void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data) +{ + if (mm->map_count) { + if (data->nr_anon_vma_locks) + 
mm_unlock_vfree(data->anon_vma_locks, + data->nr_anon_vma_locks); + if (data->i_mmap_locks) + mm_unlock_vfree(data->i_mmap_locks, + data->nr_i_mmap_locks); + } + up_write(&mm->mmap_sem); +} diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c new file mode 100644 --- /dev/null +++ b/mm/mmu_notifier.c @@ -0,0 +1,130 @@ +/* + * linux/mm/mmu_notifier.c + * + * Copyright (C) 2008 Qumranet, Inc. + * Copyright (C) 2008 SGI + * Christoph Lameter + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include +#include +#include +#include + +/* + * No synchronization. This function can only be called when only a single + * process remains that performs teardown. + */ +void __mmu_notifier_release(struct mm_struct *mm) +{ + struct mmu_notifier *mn; + + while (unlikely(!hlist_empty(&mm->mmu_notifier_list))) { + mn = hlist_entry(mm->mmu_notifier_list.first, + struct mmu_notifier, + hlist); + hlist_del(&mn->hlist); + if (mn->ops->release) + mn->ops->release(mn, mm); + } +} + +/* + * If no young bitflag is supported by the hardware, ->clear_flush_young can + * unmap the address and return 1 or 0 depending if the mapping previously + * existed or not. 
+ */ +int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int young = 0; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->clear_flush_young) + young |= mn->ops->clear_flush_young(mn, mm, address); + } + + return young; +} + +void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_page) + mn->ops->invalidate_page(mn, mm, address); + } +} + +void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_start) + mn->ops->invalidate_range_start(mn, mm, start, end); + } +} + +void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + + hlist_for_each_entry(mn, n, &mm->mmu_notifier_list, hlist) { + if (mn->ops->invalidate_range_end) + mn->ops->invalidate_range_end(mn, mm, start, end); + } +} + +/* + * Must not hold mmap_sem nor any other VM related lock when calling + * this registration function. + */ +int mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm) +{ + struct mm_lock_data data; + int ret; + + ret = mm_lock(mm, &data); + if (unlikely(ret)) + goto out; + hlist_add_head(&mn->hlist, &mm->mmu_notifier_list); + mm_unlock(mm, &data); +out: + return ret; +} +EXPORT_SYMBOL_GPL(mmu_notifier_register); + +/* + * mm_users can't go down to zero while mmu_notifier_unregister() + * runs or it can race with ->release. So a mm_users pin must + * be taken by the caller (if mm can be different from current->mm). 
+ */ +int mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) +{ + struct mm_lock_data data; + int ret; + + BUG_ON(!atomic_read(&mm->mm_users)); + + ret = mm_lock(mm, &data); + if (unlikely(ret)) + goto out; + hlist_del(&mn->hlist); + mm_unlock(mm, &data); +out: + return ret; +} +EXPORT_SYMBOL_GPL(mmu_notifier_unregister); diff --git a/mm/mprotect.c b/mm/mprotect.c --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -198,10 +199,12 @@ dirty_accountable = 1; } + mmu_notifier_invalidate_range_start(mm, start, end); if (is_vm_hugetlb_page(vma)) hugetlb_change_protection(vma, start, end, vma->vm_page_prot); else change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable); + mmu_notifier_invalidate_range_end(mm, start, end); vm_stat_account(mm, oldflags, vma->vm_file, -nrpages); vm_stat_account(mm, newflags, vma->vm_file, nrpages); return 0; diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -74,7 +75,11 @@ struct mm_struct *mm = vma->vm_mm; pte_t *old_pte, *new_pte, pte; spinlock_t *old_ptl, *new_ptl; + unsigned long old_start; + old_start = old_addr; + mmu_notifier_invalidate_range_start(vma->vm_mm, + old_start, old_end); if (vma->vm_file) { /* * Subtle point from Rajesh Venkatasubramanian: before @@ -116,6 +121,7 @@ pte_unmap_unlock(old_pte - 1, old_ptl); if (mapping) spin_unlock(&mapping->i_mmap_lock); + mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } #define LATENCY_LIMIT (64 * PAGE_SIZE) diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -49,6 +49,7 @@ #include #include #include +#include #include @@ -287,7 +288,7 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young(vma, address, pte)) + } else if (ptep_clear_flush_young_notify(vma, address, pte)) 
referenced++; /* Pretend the page is referenced if the task has the @@ -456,7 +457,7 @@ pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); - entry = ptep_clear_flush(vma, address, pte); + entry = ptep_clear_flush_notify(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -717,14 +718,14 @@ * skipped over this mm) then we should reactivate it. */ if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young(vma, address, pte)))) { + (ptep_clear_flush_young_notify(vma, address, pte)))) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. */ if (pte_dirty(pteval)) @@ -849,12 +850,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young(vma, address, pte)) + if (ptep_clear_flush_young_notify(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* If nonlinear, store the file page offset in the pte. */ if (page->index != linear_page_index(vma, address)) From andrea at qumranet.com Tue Apr 22 06:51:20 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:20 +0200 Subject: [ofa-general] [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208872186 -7200 # Node ID ac9bb1fb3de2aa5d27210a28edf24f6577094076 # Parent a6672bdeead0d41b2ebd6846f731d43a611645b7 Moves all mmu notifier methods outside the PT lock (first and not last step to make them sleep capable). 
Signed-off-by: Andrea Arcangeli diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -169,27 +169,6 @@ INIT_HLIST_HEAD(&mm->mmu_notifier_list); } -#define ptep_clear_flush_notify(__vma, __address, __ptep) \ -({ \ - pte_t __pte; \ - struct vm_area_struct *___vma = __vma; \ - unsigned long ___address = __address; \ - __pte = ptep_clear_flush(___vma, ___address, __ptep); \ - mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ - __pte; \ -}) - -#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ -({ \ - int __young; \ - struct vm_area_struct *___vma = __vma; \ - unsigned long ___address = __address; \ - __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ - __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ - ___address); \ - __young; \ -}) - #else /* CONFIG_MMU_NOTIFIER */ static inline void mmu_notifier_release(struct mm_struct *mm) @@ -221,9 +200,6 @@ { } -#define ptep_clear_flush_young_notify ptep_clear_flush_young -#define ptep_clear_flush_notify ptep_clear_flush - #endif /* CONFIG_MMU_NOTIFIER */ #endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,11 +194,13 @@ if (pte) { /* Nuke the page table entry. 
*/ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); pte_unmap_unlock(pte, ptl); + /* must invalidate_page _before_ freeing the page */ + mmu_notifier_invalidate_page(mm, address); page_cache_release(page); } } diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -1627,9 +1627,10 @@ */ page_table = pte_offset_map_lock(mm, pmd, address, &ptl); - page_cache_release(old_page); + new_page = NULL; if (!pte_same(*page_table, orig_pte)) goto unlock; + page_cache_release(old_page); page_mkwrite = 1; } @@ -1645,6 +1646,7 @@ if (ptep_set_access_flags(vma, address, page_table, entry,1)) update_mmu_cache(vma, address, entry); ret |= VM_FAULT_WRITE; + old_page = new_page = NULL; goto unlock; } @@ -1689,7 +1691,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. 
*/ - ptep_clear_flush_notify(vma, address, page_table); + ptep_clear_flush(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); @@ -1701,12 +1703,18 @@ } else mem_cgroup_uncharge_page(new_page); - if (new_page) +unlock: + pte_unmap_unlock(page_table, ptl); + + if (new_page) { + if (new_page == old_page) + /* cow happened, notify before releasing old_page */ + mmu_notifier_invalidate_page(mm, address); page_cache_release(new_page); + } if (old_page) page_cache_release(old_page); -unlock: - pte_unmap_unlock(page_table, ptl); + if (dirty_page) { if (vma->vm_file) file_update_time(vma->vm_file); diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -275,7 +275,7 @@ unsigned long address; pte_t *pte; spinlock_t *ptl; - int referenced = 0; + int referenced = 0, clear_flush_young = 0; address = vma_address(page, vma); if (address == -EFAULT) @@ -288,8 +288,11 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young_notify(vma, address, pte)) - referenced++; + } else { + clear_flush_young = 1; + if (ptep_clear_flush_young(vma, address, pte)) + referenced++; + } /* Pretend the page is referenced if the task has the swap token and is in the middle of a page fault. 
*/ @@ -299,6 +302,10 @@ (*mapcount)--; pte_unmap_unlock(pte, ptl); + + if (clear_flush_young) + referenced += mmu_notifier_clear_flush_young(mm, address); + out: return referenced; } @@ -457,7 +464,7 @@ pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); - entry = ptep_clear_flush_notify(vma, address, pte); + entry = ptep_clear_flush(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -465,6 +472,10 @@ } pte_unmap_unlock(pte, ptl); + + if (ret) + mmu_notifier_invalidate_page(mm, address); + out: return ret; } @@ -717,15 +728,14 @@ * If it's recently referenced (perhaps page_referenced * skipped over this mm) then we should reactivate it. */ - if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young_notify(vma, address, pte)))) { + if (!migration && (vma->vm_flags & VM_LOCKED)) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. 
*/ if (pte_dirty(pteval)) @@ -780,6 +790,8 @@ out_unmap: pte_unmap_unlock(pte, ptl); + if (ret != SWAP_FAIL) + mmu_notifier_invalidate_page(mm, address); out: return ret; } @@ -818,7 +830,7 @@ spinlock_t *ptl; struct page *page; unsigned long address; - unsigned long end; + unsigned long start, end; address = (vma->vm_start + cursor) & CLUSTER_MASK; end = address + CLUSTER_SIZE; @@ -839,6 +851,8 @@ if (!pmd_present(*pmd)) return; + start = address; + mmu_notifier_invalidate_range_start(mm, start, end); pte = pte_offset_map_lock(mm, pmd, address, &ptl); /* Update high watermark before we lower rss */ @@ -850,12 +864,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young_notify(vma, address, pte)) + if (ptep_clear_flush_young(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush_notify(vma, address, pte); + pteval = ptep_clear_flush(vma, address, pte); /* If nonlinear, store the file page offset in the pte. */ if (page->index != linear_page_index(vma, address)) @@ -871,6 +885,7 @@ (*mapcount)--; } pte_unmap_unlock(pte - 1, ptl); + mmu_notifier_invalidate_range_end(mm, start, end); } static int try_to_unmap_anon(struct page *page, int migration) From andrea at qumranet.com Tue Apr 22 06:51:21 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:21 +0200 Subject: [ofa-general] [PATCH 05 of 12] Move the tlb flushing into free_pgtables. The conversion of the locks In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208872186 -7200 # Node ID ee8c0644d5f67c1ef59142cce91b0bb6f34a53e0 # Parent ac9bb1fb3de2aa5d27210a28edf24f6577094076 Move the tlb flushing into free_pgtables. The conversion of the locks taken for reverse map scanning would require taking sleeping locks in free_pgtables() and we cannot sleep while gathering pages for a tlb flush. 
Move the tlb_gather/tlb_finish call to free_pgtables() to be done for each vma. This may add a number of tlb flushes depending on the number of vmas that cannot be coalesced into one. The first pointer argument to free_pgtables() can then be dropped. Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -751,8 +751,8 @@ void *private); void free_pgd_range(struct mmu_gather **tlb, unsigned long addr, unsigned long end, unsigned long floor, unsigned long ceiling); -void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *start_vma, - unsigned long floor, unsigned long ceiling); +void free_pgtables(struct vm_area_struct *start_vma, unsigned long floor, + unsigned long ceiling); int copy_page_range(struct mm_struct *dst, struct mm_struct *src, struct vm_area_struct *vma); void unmap_mapping_range(struct address_space *mapping, diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -272,9 +272,11 @@ } while (pgd++, addr = next, addr != end); } -void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma, - unsigned long floor, unsigned long ceiling) +void free_pgtables(struct vm_area_struct *vma, unsigned long floor, + unsigned long ceiling) { + struct mmu_gather *tlb; + while (vma) { struct vm_area_struct *next = vma->vm_next; unsigned long addr = vma->vm_start; @@ -286,7 +288,8 @@ unlink_file_vma(vma); if (is_vm_hugetlb_page(vma)) { - hugetlb_free_pgd_range(tlb, addr, vma->vm_end, + tlb = tlb_gather_mmu(vma->vm_mm, 0); + hugetlb_free_pgd_range(&tlb, addr, vma->vm_end, floor, next? next->vm_start: ceiling); } else { /* @@ -299,9 +302,11 @@ anon_vma_unlink(vma); unlink_file_vma(vma); } - free_pgd_range(tlb, addr, vma->vm_end, + tlb = tlb_gather_mmu(vma->vm_mm, 0); + free_pgd_range(&tlb, addr, vma->vm_end, floor, next? 
next->vm_start: ceiling); } + tlb_finish_mmu(tlb, addr, vma->vm_end); vma = next; } } diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1752,9 +1752,9 @@ update_hiwater_rss(mm); unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS, + tlb_finish_mmu(tlb, start, end); + free_pgtables(vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); - tlb_finish_mmu(tlb, start, end); } /* @@ -2050,8 +2050,8 @@ /* Use -1 here to ensure all VMAs in the mm are unmapped */ end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0); tlb_finish_mmu(tlb, 0, end); + free_pgtables(vma, FIRST_USER_ADDRESS, 0); /* * Walk the list again, actually closing and freeing it, From andrea at qumranet.com Tue Apr 22 06:51:22 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:22 +0200 Subject: [ofa-general] [PATCH 06 of 12] Move the tlb flushing inside of unmap vmas. This saves us from passing In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208872186 -7200 # Node ID fbce3fecb033eb3fba1d9c2398ac74401ce0ecb5 # Parent ee8c0644d5f67c1ef59142cce91b0bb6f34a53e0 Move the tlb flushing inside of unmap vmas. This saves us from passing a pointer to the TLB structure around and simplifies the callers. 
Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -723,8 +723,7 @@ struct page *vm_normal_page(struct vm_area_struct *, unsigned long, pte_t); unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, struct zap_details *); -unsigned long unmap_vmas(struct mmu_gather **tlb, - struct vm_area_struct *start_vma, unsigned long start_addr, +unsigned long unmap_vmas(struct vm_area_struct *start_vma, unsigned long start_addr, unsigned long end_addr, unsigned long *nr_accounted, struct zap_details *); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -804,7 +804,6 @@ /** * unmap_vmas - unmap a range of memory covered by a list of vma's - * @tlbp: address of the caller's struct mmu_gather * @vma: the starting vma * @start_addr: virtual address at which to start unmapping * @end_addr: virtual address at which to end unmapping @@ -816,20 +815,13 @@ * Unmap all pages in the vma list. * * We aim to not hold locks for too long (for scheduling latency reasons). - * So zap pages in ZAP_BLOCK_SIZE bytecounts. This means we need to - * return the ending mmu_gather to the caller. + * So zap pages in ZAP_BLOCK_SIZE bytecounts. * * Only addresses between `start' and `end' will be unmapped. * * The VMA list must be sorted in ascending virtual address order. - * - * unmap_vmas() assumes that the caller will flush the whole unmapped address - * range after unmap_vmas() returns. So the only responsibility here is to - * ensure that any thus-far unmapped pages are flushed before unmap_vmas() - * drops the lock and schedules. 
*/ -unsigned long unmap_vmas(struct mmu_gather **tlbp, - struct vm_area_struct *vma, unsigned long start_addr, +unsigned long unmap_vmas(struct vm_area_struct *vma, unsigned long start_addr, unsigned long end_addr, unsigned long *nr_accounted, struct zap_details *details) { @@ -838,9 +830,14 @@ int tlb_start_valid = 0; unsigned long start = start_addr; spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL; - int fullmm = (*tlbp)->fullmm; + int fullmm; + struct mmu_gather *tlb; struct mm_struct *mm = vma->vm_mm; + lru_add_drain(); + tlb = tlb_gather_mmu(mm, 0); + update_hiwater_rss(mm); + fullmm = tlb->fullmm; mmu_notifier_invalidate_range_start(mm, start_addr, end_addr); for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) { unsigned long end; @@ -867,7 +864,7 @@ (HPAGE_SIZE / PAGE_SIZE); start = end; } else - start = unmap_page_range(*tlbp, vma, + start = unmap_page_range(tlb, vma, start, end, &zap_work, details); if (zap_work > 0) { @@ -875,22 +872,23 @@ break; } - tlb_finish_mmu(*tlbp, tlb_start, start); + tlb_finish_mmu(tlb, tlb_start, start); if (need_resched() || (i_mmap_lock && spin_needbreak(i_mmap_lock))) { if (i_mmap_lock) { - *tlbp = NULL; + tlb = NULL; goto out; } cond_resched(); } - *tlbp = tlb_gather_mmu(vma->vm_mm, fullmm); + tlb = tlb_gather_mmu(vma->vm_mm, fullmm); tlb_start_valid = 0; zap_work = ZAP_BLOCK_SIZE; } } + tlb_finish_mmu(tlb, start_addr, end_addr); out: mmu_notifier_invalidate_range_end(mm, start_addr, end_addr); return start; /* which is now the end (or restart) address */ @@ -906,18 +904,10 @@ unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, struct zap_details *details) { - struct mm_struct *mm = vma->vm_mm; - struct mmu_gather *tlb; unsigned long end = address + size; unsigned long nr_accounted = 0; - lru_add_drain(); - tlb = tlb_gather_mmu(mm, 0); - update_hiwater_rss(mm); - end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details); - if (tlb) - 
tlb_finish_mmu(tlb, address, end); - return end; + return unmap_vmas(vma, address, end, &nr_accounted, details); } /* diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1744,15 +1744,10 @@ unsigned long start, unsigned long end) { struct vm_area_struct *next = prev? prev->vm_next: mm->mmap; - struct mmu_gather *tlb; unsigned long nr_accounted = 0; - lru_add_drain(); - tlb = tlb_gather_mmu(mm, 0); - update_hiwater_rss(mm); - unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL); + unmap_vmas(vma, start, end, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - tlb_finish_mmu(tlb, start, end); free_pgtables(vma, prev? prev->vm_end: FIRST_USER_ADDRESS, next? next->vm_start: 0); } @@ -2034,7 +2029,6 @@ /* Release all mmaps. */ void exit_mmap(struct mm_struct *mm) { - struct mmu_gather *tlb; struct vm_area_struct *vma = mm->mmap; unsigned long nr_accounted = 0; unsigned long end; @@ -2045,12 +2039,11 @@ lru_add_drain(); flush_cache_mm(mm); - tlb = tlb_gather_mmu(mm, 1); + /* Don't update_hiwater_rss(mm) here, do_exit already did */ /* Use -1 here to ensure all VMAs in the mm are unmapped */ - end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL); + end = unmap_vmas(vma, 0, -1, &nr_accounted, NULL); vm_unacct_memory(nr_accounted); - tlb_finish_mmu(tlb, 0, end); free_pgtables(vma, FIRST_USER_ADDRESS, 0); /* From andrea at qumranet.com Tue Apr 22 06:51:23 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:23 +0200 Subject: [ofa-general] [PATCH 07 of 12] Add a function to rw_semaphores to check if there are any processes In-Reply-To: Message-ID: <8965539f4d174c79bd37.1208872283@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1208872187 -7200 # Node ID 8965539f4d174c79bd37e58e8b037d5db906e219 # Parent fbce3fecb033eb3fba1d9c2398ac74401ce0ecb5 Add a function to rw_semaphores to check if there are any processes waiting for the semaphore. 
Add rwsem_needbreak to sched.h that works in the same way as spinlock_needbreak(). Signed-off-by: Christoph Lameter diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -59,6 +59,8 @@ */ extern void downgrade_write(struct rw_semaphore *sem); +extern int rwsem_is_contended(struct rw_semaphore *sem); + #ifdef CONFIG_DEBUG_LOCK_ALLOC /* * nested locking. NOTE: rwsems are not allowed to recurse diff --git a/include/linux/sched.h b/include/linux/sched.h --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1984,6 +1984,15 @@ #endif } +static inline int rwsem_needbreak(struct rw_semaphore *sem) +{ +#ifdef CONFIG_PREEMPT + return rwsem_is_contended(sem); +#else + return 0; +#endif +} + /* * Reevaluate whether the task has signals pending delivery. * Wake the task if so. diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c --- a/lib/rwsem-spinlock.c +++ b/lib/rwsem-spinlock.c @@ -305,6 +305,18 @@ spin_unlock_irqrestore(&sem->wait_lock, flags); } +int rwsem_is_contended(struct rw_semaphore *sem) +{ + /* + * Racy check for an empty list. False positives or negatives + * would be okay. False positive may cause a useless dropping of + * locks. False negatives may cause locks to be held a bit + * longer until the next check. + */ + return !list_empty(&sem->wait_list); +} + +EXPORT_SYMBOL(rwsem_is_contended); EXPORT_SYMBOL(__init_rwsem); EXPORT_SYMBOL(__down_read); EXPORT_SYMBOL(__down_read_trylock); diff --git a/lib/rwsem.c b/lib/rwsem.c --- a/lib/rwsem.c +++ b/lib/rwsem.c @@ -251,6 +251,18 @@ return sem; } +int rwsem_is_contended(struct rw_semaphore *sem) +{ + /* + * Racy check for an empty list. False positives or negatives + * would be okay. False positive may cause a useless dropping of + * locks. False negatives may cause locks to be held a bit + * longer until the next check. 
+ */ + return !list_empty(&sem->wait_list); +} + +EXPORT_SYMBOL(rwsem_is_contended); EXPORT_SYMBOL(rwsem_down_read_failed); EXPORT_SYMBOL(rwsem_down_write_failed); EXPORT_SYMBOL(rwsem_wake); From andrea at qumranet.com Tue Apr 22 06:51:24 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:24 +0200 Subject: [ofa-general] [PATCH 08 of 12] The conversion to a rwsem allows notifier callbacks during rmap traversal In-Reply-To: Message-ID: <6e04df1f4284689b1c46.1208872284@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1208872187 -7200 # Node ID 6e04df1f4284689b1c46e57a67559abe49ecf292 # Parent 8965539f4d174c79bd37e58e8b037d5db906e219 The conversion to a rwsem allows notifier callbacks during rmap traversal for files. A rw style lock also allows concurrent walking of the reverse map so that multiple processors can expire pages in the same memory area of the same process. So it increases the potential concurrency. Signed-off-by: Andrea Arcangeli Signed-off-by: Christoph Lameter diff --git a/Documentation/vm/locking b/Documentation/vm/locking --- a/Documentation/vm/locking +++ b/Documentation/vm/locking @@ -66,7 +66,7 @@ expand_stack(), it is hard to come up with a destructive scenario without having the vmlist protection in this case. -The page_table_lock nests with the inode i_mmap_lock and the kmem cache +The page_table_lock nests with the inode i_mmap_sem and the kmem cache c_spinlock spinlocks. This is okay, since the kmem code asks for pages after dropping c_spinlock. 
The page_table_lock also nests with pagecache_lock and pagemap_lru_lock spinlocks, and no code asks for memory with these locks diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c --- a/arch/x86/mm/hugetlbpage.c +++ b/arch/x86/mm/hugetlbpage.c @@ -69,7 +69,7 @@ if (!vma_shareable(vma, addr)) return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(svma, &iter, &mapping->i_mmap, idx, idx) { if (svma == vma) continue; @@ -94,7 +94,7 @@ put_page(virt_to_page(spte)); spin_unlock(&mm->page_table_lock); out: - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -454,10 +454,10 @@ pgoff = offset >> PAGE_SHIFT; i_size_write(inode, offset); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); if (!prio_tree_empty(&mapping->i_mmap)) hugetlb_vmtruncate_list(&mapping->i_mmap, pgoff); - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); truncate_hugepages(inode, offset); return 0; } diff --git a/fs/inode.c b/fs/inode.c --- a/fs/inode.c +++ b/fs/inode.c @@ -210,7 +210,7 @@ INIT_LIST_HEAD(&inode->i_devices); INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC); rwlock_init(&inode->i_data.tree_lock); - spin_lock_init(&inode->i_data.i_mmap_lock); + init_rwsem(&inode->i_data.i_mmap_sem); INIT_LIST_HEAD(&inode->i_data.private_list); spin_lock_init(&inode->i_data.private_lock); INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap); diff --git a/include/linux/fs.h b/include/linux/fs.h --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -503,7 +503,7 @@ unsigned int i_mmap_writable;/* count VM_SHARED mappings */ struct prio_tree_root i_mmap; /* tree of private and shared mappings */ struct list_head i_mmap_nonlinear;/*list VM_NONLINEAR mappings */ - spinlock_t i_mmap_lock; /* protect tree, count, list */ + struct rw_semaphore i_mmap_sem; /* protect tree, count, 
list */ unsigned int truncate_count; /* Cover race condition with truncate */ unsigned long nrpages; /* number of total pages */ pgoff_t writeback_index;/* writeback starts here */ diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -716,7 +716,7 @@ struct address_space *check_mapping; /* Check page->mapping if set */ pgoff_t first_index; /* Lowest page->index to unmap */ pgoff_t last_index; /* Highest page->index to unmap */ - spinlock_t *i_mmap_lock; /* For unmap_mapping_range: */ + struct rw_semaphore *i_mmap_sem; /* For unmap_mapping_range: */ unsigned long truncate_count; /* Compare vm_truncate_count */ }; diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -274,12 +274,12 @@ atomic_dec(&inode->i_writecount); /* insert tmp into the share list, just after mpnt */ - spin_lock(&file->f_mapping->i_mmap_lock); + down_write(&file->f_mapping->i_mmap_sem); tmp->vm_truncate_count = mpnt->vm_truncate_count; flush_dcache_mmap_lock(file->f_mapping); vma_prio_tree_add(tmp, mpnt); flush_dcache_mmap_unlock(file->f_mapping); - spin_unlock(&file->f_mapping->i_mmap_lock); + up_write(&file->f_mapping->i_mmap_sem); } /* diff --git a/mm/filemap.c b/mm/filemap.c --- a/mm/filemap.c +++ b/mm/filemap.c @@ -61,16 +61,16 @@ /* * Lock ordering: * - * ->i_mmap_lock (vmtruncate) + * ->i_mmap_sem (vmtruncate) * ->private_lock (__free_pte->__set_page_dirty_buffers) * ->swap_lock (exclusive_swap_page, others) * ->mapping->tree_lock * * ->i_mutex - * ->i_mmap_lock (truncate->unmap_mapping_range) + * ->i_mmap_sem (truncate->unmap_mapping_range) * * ->mmap_sem - * ->i_mmap_lock + * ->i_mmap_sem * ->page_table_lock or pte_lock (various, mainly in memory.c) * ->mapping->tree_lock (arch-dependent flush_dcache_mmap_lock) * @@ -87,7 +87,7 @@ * ->sb_lock (fs/fs-writeback.c) * ->mapping->tree_lock (__sync_single_inode) * - * ->i_mmap_lock + * ->i_mmap_sem * ->anon_vma.lock (vma_adjust) * * ->anon_vma.lock diff --git 
a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -184,7 +184,7 @@ if (!page) return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { mm = vma->vm_mm; address = vma->vm_start + @@ -204,7 +204,7 @@ page_cache_release(page); } } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -206,13 +206,13 @@ } goto out; } - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); flush_dcache_mmap_lock(mapping); vma->vm_flags |= VM_NONLINEAR; vma_prio_tree_remove(vma, &mapping->i_mmap); vma_nonlinear_insert(vma, &mapping->i_mmap_nonlinear); flush_dcache_mmap_unlock(mapping); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } mmu_notifier_invalidate_range_start(mm, start, start + size); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -790,7 +790,7 @@ struct page *page; struct page *tmp; /* - * A page gathering list, protected by per file i_mmap_lock. The + * A page gathering list, protected by per file i_mmap_sem. The * lock is used to avoid list corruption from multiple unmapping * of the same page since we are using page->lru. */ @@ -840,9 +840,9 @@ * do nothing in this case. 
*/ if (vma->vm_file) { - spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); + down_write(&vma->vm_file->f_mapping->i_mmap_sem); __unmap_hugepage_range(vma, start, end); - spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); + up_write(&vma->vm_file->f_mapping->i_mmap_sem); } } @@ -1085,7 +1085,7 @@ BUG_ON(address >= end); flush_cache_range(vma, address, end); - spin_lock(&vma->vm_file->f_mapping->i_mmap_lock); + down_write(&vma->vm_file->f_mapping->i_mmap_sem); spin_lock(&mm->page_table_lock); for (; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -1100,7 +1100,7 @@ } } spin_unlock(&mm->page_table_lock); - spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock); + up_write(&vma->vm_file->f_mapping->i_mmap_sem); flush_tlb_range(vma, start, end); } diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -829,7 +829,7 @@ unsigned long tlb_start = 0; /* For tlb_finish_mmu */ int tlb_start_valid = 0; unsigned long start = start_addr; - spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL; + struct rw_semaphore *i_mmap_sem = details? details->i_mmap_sem: NULL; int fullmm; struct mmu_gather *tlb; struct mm_struct *mm = vma->vm_mm; @@ -875,8 +875,8 @@ tlb_finish_mmu(tlb, tlb_start, start); if (need_resched() || - (i_mmap_lock && spin_needbreak(i_mmap_lock))) { - if (i_mmap_lock) { + (i_mmap_sem && rwsem_needbreak(i_mmap_sem))) { + if (i_mmap_sem) { tlb = NULL; goto out; } @@ -1742,7 +1742,7 @@ /* * Helper functions for unmap_mapping_range(). * - * __ Notes on dropping i_mmap_lock to reduce latency while unmapping __ + * __ Notes on dropping i_mmap_sem to reduce latency while unmapping __ * * We have to restart searching the prio_tree whenever we drop the lock, * since the iterator is only valid while the lock is held, and anyway @@ -1761,7 +1761,7 @@ * can't efficiently keep all vmas in step with mapping->truncate_count: * so instead reset them all whenever it wraps back to 0 (then go to 1). 
* mapping->truncate_count and vma->vm_truncate_count are protected by - * i_mmap_lock. + * i_mmap_sem. * * In order to make forward progress despite repeatedly restarting some * large vma, note the restart_addr from unmap_vmas when it breaks out: @@ -1811,7 +1811,7 @@ restart_addr = zap_page_range(vma, start_addr, end_addr - start_addr, details); - need_break = need_resched() || spin_needbreak(details->i_mmap_lock); + need_break = need_resched() || rwsem_needbreak(details->i_mmap_sem); if (restart_addr >= end_addr) { /* We have now completed this vma: mark it so */ @@ -1825,9 +1825,9 @@ goto again; } - spin_unlock(details->i_mmap_lock); + up_write(details->i_mmap_sem); cond_resched(); - spin_lock(details->i_mmap_lock); + down_write(details->i_mmap_sem); return -EINTR; } @@ -1921,9 +1921,9 @@ details.last_index = hba + hlen - 1; if (details.last_index < details.first_index) details.last_index = ULONG_MAX; - details.i_mmap_lock = &mapping->i_mmap_lock; + details.i_mmap_sem = &mapping->i_mmap_sem; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); /* Protect against endless unmapping loops */ mapping->truncate_count++; @@ -1938,7 +1938,7 @@ unmap_mapping_range_tree(&mapping->i_mmap, &details); if (unlikely(!list_empty(&mapping->i_mmap_nonlinear))) unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } EXPORT_SYMBOL(unmap_mapping_range); diff --git a/mm/migrate.c b/mm/migrate.c --- a/mm/migrate.c +++ b/mm/migrate.c @@ -211,12 +211,12 @@ if (!mapping) return; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) remove_migration_pte(vma, old, new); - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); } /* diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -189,7 +189,7 @@ } /* - * Requires inode->i_mapping->i_mmap_lock + * Requires 
inode->i_mapping->i_mmap_sem */ static void __remove_shared_vm_struct(struct vm_area_struct *vma, struct file *file, struct address_space *mapping) @@ -217,9 +217,9 @@ if (file) { struct address_space *mapping = file->f_mapping; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); __remove_shared_vm_struct(vma, file, mapping); - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); } } @@ -442,7 +442,7 @@ mapping = vma->vm_file->f_mapping; if (mapping) { - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); vma->vm_truncate_count = mapping->truncate_count; } anon_vma_lock(vma); @@ -452,7 +452,7 @@ anon_vma_unlock(vma); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); mm->map_count++; validate_mm(mm); @@ -539,7 +539,7 @@ mapping = file->f_mapping; if (!(vma->vm_flags & VM_NONLINEAR)) root = &mapping->i_mmap; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); if (importer && vma->vm_truncate_count != next->vm_truncate_count) { /* @@ -623,7 +623,7 @@ if (anon_vma) spin_unlock(&anon_vma->lock); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); if (remove_next) { if (file) @@ -2058,7 +2058,7 @@ /* Insert vm structure into process list sorted by address * and into the inode's i_mmap tree. If vm_file is non-NULL - * then i_mmap_lock is taken here. + * then i_mmap_sem is taken here. */ int insert_vm_struct(struct mm_struct * mm, struct vm_area_struct * vma) { diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -88,7 +88,7 @@ * and we propagate stale pages into the dst afterward. 
*/ mapping = vma->vm_file->f_mapping; - spin_lock(&mapping->i_mmap_lock); + down_write(&mapping->i_mmap_sem); if (new_vma->vm_truncate_count && new_vma->vm_truncate_count != vma->vm_truncate_count) new_vma->vm_truncate_count = 0; @@ -120,7 +120,7 @@ pte_unmap_nested(new_pte - 1); pte_unmap_unlock(old_pte - 1, old_ptl); if (mapping) - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -24,7 +24,7 @@ * inode->i_alloc_sem (vmtruncate_range) * mm->mmap_sem * page->flags PG_locked (lock_page) - * mapping->i_mmap_lock + * mapping->i_mmap_sem * anon_vma->lock * mm->page_table_lock or pte_lock * zone->lru_lock (in mark_page_accessed, isolate_lru_page) @@ -373,14 +373,14 @@ * The page lock not only makes sure that page->mapping cannot * suddenly be NULLified by truncation, it makes sure that the * structure at mapping cannot be freed and reused yet, - * so we can safely take mapping->i_mmap_lock. + * so we can safely take mapping->i_mmap_sem. */ BUG_ON(!PageLocked(page)); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); /* - * i_mmap_lock does not stabilize mapcount at all, but mapcount + * i_mmap_sem does not stabilize mapcount at all, but mapcount * is more likely to be accurate if we note it after spinning. 
*/ mapcount = page_mapcount(page); @@ -403,7 +403,7 @@ break; } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); return referenced; } @@ -489,12 +489,12 @@ BUG_ON(PageAnon(page)); - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { if (vma->vm_flags & VM_SHARED) ret += page_mkclean_one(page, vma); } - spin_unlock(&mapping->i_mmap_lock); + up_read(&mapping->i_mmap_sem); return ret; } @@ -930,7 +930,7 @@ unsigned long max_nl_size = 0; unsigned int mapcount; - spin_lock(&mapping->i_mmap_lock); + down_read(&mapping->i_mmap_sem); vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { ret = try_to_unmap_one(page, vma, migration); if (ret == SWAP_FAIL || !page_mapped(page)) @@ -967,7 +967,6 @@ mapcount = page_mapcount(page); if (!mapcount) goto out; - cond_resched_lock(&mapping->i_mmap_lock); max_nl_size = (max_nl_size + CLUSTER_SIZE - 1) & CLUSTER_MASK; if (max_nl_cursor == 0) @@ -989,7 +988,6 @@ } vma->vm_private_data = (void *) max_nl_cursor; } - cond_resched_lock(&mapping->i_mmap_lock); max_nl_cursor += CLUSTER_SIZE; } while (max_nl_cursor <= max_nl_size); @@ -1001,7 +999,7 @@ list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list) vma->vm_private_data = NULL; out: - spin_unlock(&mapping->i_mmap_lock); + up_write(&mapping->i_mmap_sem); return ret; } From andrea at qumranet.com Tue Apr 22 06:51:25 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:25 +0200 Subject: [ofa-general] [PATCH 09 of 12] Convert the anon_vma spinlock to a rw semaphore. This allows concurrent In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208872187 -7200 # Node ID bdb3d928a0ba91cdce2b61bd40a2f80bddbe4ff2 # Parent 6e04df1f4284689b1c46e57a67559abe49ecf292 Convert the anon_vma spinlock to a rw semaphore. This allows concurrent traversal of reverse maps for try_to_unmap() and page_mkclean(). 
It also allows the calling of sleeping functions from reverse map traversal as needed for the notifier callbacks. This introduces additional concurrency. RCU is used in some contexts to guarantee the presence of the anon_vma (try_to_unmap) while we acquire the anon_vma lock. We cannot take a semaphore within an RCU critical section. Add a refcount to the anon_vma structure which allows us to give an existence guarantee for the anon_vma structure independent of the spinlock or the list contents. The refcount can then be taken within the RCU section. If it has been taken successfully then the refcount guarantees the existence of the anon_vma. The refcount in anon_vma also allows us to fix a nasty issue in page migration where we fudged by using RCU for a long code path to guarantee the existence of the anon_vma. I think this is a bug because the anon_vma may become empty and get scheduled to be freed but then we increase the refcount again when the migration entries are removed. The refcount in general allows a shortening of RCU critical sections since we can call rcu_read_unlock() after taking the refcount. This is particularly relevant if the anon_vma chains contain hundreds of entries. However: - Atomic overhead increases in situations where a new reference to the anon_vma has to be established or removed. Overhead also increases when a speculative reference is used (try_to_unmap, page_mkclean, page migration). - There is the potential for more frequent processor changes due to up_xxx letting waiting tasks run first. This results in e.g. the Aim9 brk performance test going down by 10-15%. Signed-off-by: Christoph Lameter diff --git a/include/linux/rmap.h b/include/linux/rmap.h --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -25,7 +25,8 @@ * pointing to this anon_vma once its vma list is empty.
*/ struct anon_vma { - spinlock_t lock; /* Serialize access to vma list */ + atomic_t refcount; /* vmas on the list */ + struct rw_semaphore sem;/* Serialize access to vma list */ struct list_head head; /* List of private "related" vmas */ }; @@ -43,18 +44,31 @@ kmem_cache_free(anon_vma_cachep, anon_vma); } +struct anon_vma *grab_anon_vma(struct page *page); + +static inline void get_anon_vma(struct anon_vma *anon_vma) +{ + atomic_inc(&anon_vma->refcount); +} + +static inline void put_anon_vma(struct anon_vma *anon_vma) +{ + if (atomic_dec_and_test(&anon_vma->refcount)) + anon_vma_free(anon_vma); +} + static inline void anon_vma_lock(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); } static inline void anon_vma_unlock(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); } /* diff --git a/mm/migrate.c b/mm/migrate.c --- a/mm/migrate.c +++ b/mm/migrate.c @@ -235,15 +235,16 @@ return; /* - * We hold the mmap_sem lock. So no need to call page_lock_anon_vma. + * We hold either the mmap_sem lock or a reference on the + * anon_vma. So no need to call page_lock_anon_vma. */ anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON); - spin_lock(&anon_vma->lock); + down_read(&anon_vma->sem); list_for_each_entry(vma, &anon_vma->head, anon_vma_node) remove_migration_pte(vma, old, new); - spin_unlock(&anon_vma->lock); + up_read(&anon_vma->sem); } /* @@ -623,7 +624,7 @@ int rc = 0; int *result = NULL; struct page *newpage = get_new_page(page, private, &result); - int rcu_locked = 0; + struct anon_vma *anon_vma = NULL; int charge = 0; if (!newpage) @@ -647,16 +648,14 @@ } /* * By try_to_unmap(), page->mapcount goes down to 0 here. In this case, - * we cannot notice that anon_vma is freed while we migrates a page. + * we cannot notice that anon_vma is freed while we migrate a page. 
* This rcu_read_lock() delays freeing anon_vma pointer until the end * of migration. File cache pages are no problem because of page_lock() * File Caches may use write_page() or lock_page() in migration, then, * just care Anon page here. */ - if (PageAnon(page)) { - rcu_read_lock(); - rcu_locked = 1; - } + if (PageAnon(page)) + anon_vma = grab_anon_vma(page); /* * Corner case handling: @@ -674,10 +673,7 @@ if (!PageAnon(page) && PagePrivate(page)) { /* * Go direct to try_to_free_buffers() here because - * a) that's what try_to_release_page() would do anyway - * b) we may be under rcu_read_lock() here, so we can't - * use GFP_KERNEL which is what try_to_release_page() - * needs to be effective. + * that's what try_to_release_page() would do anyway */ try_to_free_buffers(page); } @@ -698,8 +694,8 @@ } else if (charge) mem_cgroup_end_migration(newpage); rcu_unlock: - if (rcu_locked) - rcu_read_unlock(); + if (anon_vma) + put_anon_vma(anon_vma); unlock: diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -567,7 +567,7 @@ if (vma->anon_vma) anon_vma = vma->anon_vma; if (anon_vma) { - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); /* * Easily overlooked: when mprotect shifts the boundary, * make sure the expanding vma has anon_vma set if the @@ -621,7 +621,7 @@ } if (anon_vma) - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); if (mapping) up_write(&mapping->i_mmap_sem); diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -69,7 +69,7 @@ if (anon_vma) { allocated = NULL; locked = anon_vma; - spin_lock(&locked->lock); + down_write(&locked->sem); } else { anon_vma = anon_vma_alloc(); if (unlikely(!anon_vma)) @@ -81,6 +81,7 @@ /* page_table_lock to protect against threads */ spin_lock(&mm->page_table_lock); if (likely(!vma->anon_vma)) { + get_anon_vma(anon_vma); vma->anon_vma = anon_vma; list_add_tail(&vma->anon_vma_node, &anon_vma->head); allocated = NULL; @@ -88,7 +89,7 @@ spin_unlock(&mm->page_table_lock); if 
(locked) - spin_unlock(&locked->lock); + up_write(&locked->sem); if (unlikely(allocated)) anon_vma_free(allocated); } @@ -99,14 +100,17 @@ { BUG_ON(vma->anon_vma != next->anon_vma); list_del(&next->anon_vma_node); + put_anon_vma(vma->anon_vma); } void __anon_vma_link(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; - if (anon_vma) + if (anon_vma) { + get_anon_vma(anon_vma); list_add_tail(&vma->anon_vma_node, &anon_vma->head); + } } void anon_vma_link(struct vm_area_struct *vma) @@ -114,36 +118,32 @@ struct anon_vma *anon_vma = vma->anon_vma; if (anon_vma) { - spin_lock(&anon_vma->lock); + get_anon_vma(anon_vma); + down_write(&anon_vma->sem); list_add_tail(&vma->anon_vma_node, &anon_vma->head); - spin_unlock(&anon_vma->lock); + up_write(&anon_vma->sem); } } void anon_vma_unlink(struct vm_area_struct *vma) { struct anon_vma *anon_vma = vma->anon_vma; - int empty; if (!anon_vma) return; - spin_lock(&anon_vma->lock); + down_write(&anon_vma->sem); list_del(&vma->anon_vma_node); - - /* We must garbage collect the anon_vma if it's empty */ - empty = list_empty(&anon_vma->head); - spin_unlock(&anon_vma->lock); - - if (empty) - anon_vma_free(anon_vma); + up_write(&anon_vma->sem); + put_anon_vma(anon_vma); } static void anon_vma_ctor(struct kmem_cache *cachep, void *data) { struct anon_vma *anon_vma = data; - spin_lock_init(&anon_vma->lock); + init_rwsem(&anon_vma->sem); + atomic_set(&anon_vma->refcount, 0); INIT_LIST_HEAD(&anon_vma->head); } @@ -157,9 +157,9 @@ * Getting a lock on a stable anon_vma from a page off the LRU is * tricky: page_lock_anon_vma rely on RCU to guard against the races. 
*/ -static struct anon_vma *page_lock_anon_vma(struct page *page) +struct anon_vma *grab_anon_vma(struct page *page) { - struct anon_vma *anon_vma; + struct anon_vma *anon_vma = NULL; unsigned long anon_mapping; rcu_read_lock(); @@ -170,17 +170,26 @@ goto out; anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); - spin_lock(&anon_vma->lock); - return anon_vma; + if (!atomic_inc_not_zero(&anon_vma->refcount)) + anon_vma = NULL; out: rcu_read_unlock(); - return NULL; + return anon_vma; +} + +static struct anon_vma *page_lock_anon_vma(struct page *page) +{ + struct anon_vma *anon_vma = grab_anon_vma(page); + + if (anon_vma) + down_read(&anon_vma->sem); + return anon_vma; } static void page_unlock_anon_vma(struct anon_vma *anon_vma) { - spin_unlock(&anon_vma->lock); - rcu_read_unlock(); + up_read(&anon_vma->sem); + put_anon_vma(anon_vma); } /* From andrea at qumranet.com Tue Apr 22 06:51:26 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:26 +0200 Subject: [ofa-general] [PATCH 10 of 12] Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208872187 -7200 # Node ID f8210c45f1c6f8b38d15e5dfebbc5f7c1f890c93 # Parent bdb3d928a0ba91cdce2b61bd40a2f80bddbe4ff2 Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock conversion. Signed-off-by: Andrea Arcangeli diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1062,10 +1062,10 @@ * mm_lock and mm_unlock are expensive operations that may take a long time. 
*/ struct mm_lock_data { - spinlock_t **i_mmap_locks; - spinlock_t **anon_vma_locks; - size_t nr_i_mmap_locks; - size_t nr_anon_vma_locks; + struct rw_semaphore **i_mmap_sems; + struct rw_semaphore **anon_vma_sems; + size_t nr_i_mmap_sems; + size_t nr_anon_vma_sems; }; extern int mm_lock(struct mm_struct *mm, struct mm_lock_data *data); extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data); diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2243,8 +2243,8 @@ static int mm_lock_cmp(const void *a, const void *b) { cond_resched(); - if ((unsigned long)*(spinlock_t **)a < - (unsigned long)*(spinlock_t **)b) + if ((unsigned long)*(struct rw_semaphore **)a < + (unsigned long)*(struct rw_semaphore **)b) return -1; else if (a == b) return 0; @@ -2252,7 +2252,7 @@ return 1; } -static unsigned long mm_lock_sort(struct mm_struct *mm, spinlock_t **locks, +static unsigned long mm_lock_sort(struct mm_struct *mm, struct rw_semaphore **sems, int anon) { struct vm_area_struct *vma; @@ -2261,59 +2261,59 @@ for (vma = mm->mmap; vma; vma = vma->vm_next) { if (anon) { if (vma->anon_vma) - locks[i++] = &vma->anon_vma->lock; + sems[i++] = &vma->anon_vma->sem; } else { if (vma->vm_file && vma->vm_file->f_mapping) - locks[i++] = &vma->vm_file->f_mapping->i_mmap_lock; + sems[i++] = &vma->vm_file->f_mapping->i_mmap_sem; } } if (!i) goto out; - sort(locks, i, sizeof(spinlock_t *), mm_lock_cmp, NULL); + sort(sems, i, sizeof(struct rw_semaphore *), mm_lock_cmp, NULL); out: return i; } static inline unsigned long mm_lock_sort_anon_vma(struct mm_struct *mm, - spinlock_t **locks) + struct rw_semaphore **sems) { - return mm_lock_sort(mm, locks, 1); + return mm_lock_sort(mm, sems, 1); } static inline unsigned long mm_lock_sort_i_mmap(struct mm_struct *mm, - spinlock_t **locks) + struct rw_semaphore **sems) { - return mm_lock_sort(mm, locks, 0); + return mm_lock_sort(mm, sems, 0); } -static void mm_lock_unlock(spinlock_t **locks, size_t nr, int lock) +static void 
mm_lock_unlock(struct rw_semaphore **sems, size_t nr, int lock) { - spinlock_t *last = NULL; + struct rw_semaphore *last = NULL; size_t i; for (i = 0; i < nr; i++) /* Multiple vmas may use the same lock. */ - if (locks[i] != last) { - BUG_ON((unsigned long) last > (unsigned long) locks[i]); - last = locks[i]; + if (sems[i] != last) { + BUG_ON((unsigned long) last > (unsigned long) sems[i]); + last = sems[i]; if (lock) - spin_lock(last); + down_write(last); else - spin_unlock(last); + up_write(last); } } -static inline void __mm_lock(spinlock_t **locks, size_t nr) +static inline void __mm_lock(struct rw_semaphore **sems, size_t nr) { - mm_lock_unlock(locks, nr, 1); + mm_lock_unlock(sems, nr, 1); } -static inline void __mm_unlock(spinlock_t **locks, size_t nr) +static inline void __mm_unlock(struct rw_semaphore **sems, size_t nr) { - mm_lock_unlock(locks, nr, 0); + mm_lock_unlock(sems, nr, 0); } /* @@ -2325,57 +2325,57 @@ */ int mm_lock(struct mm_struct *mm, struct mm_lock_data *data) { - spinlock_t **anon_vma_locks, **i_mmap_locks; + struct rw_semaphore **anon_vma_sems, **i_mmap_sems; down_write(&mm->mmap_sem); if (mm->map_count) { - anon_vma_locks = vmalloc(sizeof(spinlock_t *) * mm->map_count); - if (unlikely(!anon_vma_locks)) { + anon_vma_sems = vmalloc(sizeof(struct rw_semaphore *) * mm->map_count); + if (unlikely(!anon_vma_sems)) { up_write(&mm->mmap_sem); return -ENOMEM; } - i_mmap_locks = vmalloc(sizeof(spinlock_t *) * mm->map_count); - if (unlikely(!i_mmap_locks)) { + i_mmap_sems = vmalloc(sizeof(struct rw_semaphore *) * mm->map_count); + if (unlikely(!i_mmap_sems)) { up_write(&mm->mmap_sem); - vfree(anon_vma_locks); + vfree(anon_vma_sems); return -ENOMEM; } - data->nr_anon_vma_locks = mm_lock_sort_anon_vma(mm, anon_vma_locks); - data->nr_i_mmap_locks = mm_lock_sort_i_mmap(mm, i_mmap_locks); + data->nr_anon_vma_sems = mm_lock_sort_anon_vma(mm, anon_vma_sems); + data->nr_i_mmap_sems = mm_lock_sort_i_mmap(mm, i_mmap_sems); - if (data->nr_anon_vma_locks) { - 
__mm_lock(anon_vma_locks, data->nr_anon_vma_locks); - data->anon_vma_locks = anon_vma_locks; + if (data->nr_anon_vma_sems) { + __mm_lock(anon_vma_sems, data->nr_anon_vma_sems); + data->anon_vma_sems = anon_vma_sems; } else - vfree(anon_vma_locks); + vfree(anon_vma_sems); - if (data->nr_i_mmap_locks) { - __mm_lock(i_mmap_locks, data->nr_i_mmap_locks); - data->i_mmap_locks = i_mmap_locks; + if (data->nr_i_mmap_sems) { + __mm_lock(i_mmap_sems, data->nr_i_mmap_sems); + data->i_mmap_sems = i_mmap_sems; } else - vfree(i_mmap_locks); + vfree(i_mmap_sems); } return 0; } -static void mm_unlock_vfree(spinlock_t **locks, size_t nr) +static void mm_unlock_vfree(struct rw_semaphore **sems, size_t nr) { - __mm_unlock(locks, nr); - vfree(locks); + __mm_unlock(sems, nr); + vfree(sems); } /* avoid memory allocations for mm_unlock to prevent deadlock */ void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data) { if (mm->map_count) { - if (data->nr_anon_vma_locks) - mm_unlock_vfree(data->anon_vma_locks, - data->nr_anon_vma_locks); - if (data->i_mmap_locks) - mm_unlock_vfree(data->i_mmap_locks, - data->nr_i_mmap_locks); + if (data->nr_anon_vma_sems) + mm_unlock_vfree(data->anon_vma_sems, + data->nr_anon_vma_sems); + if (data->i_mmap_sems) + mm_unlock_vfree(data->i_mmap_sems, + data->nr_i_mmap_sems); } up_write(&mm->mmap_sem); } From andrea at qumranet.com Tue Apr 22 06:51:27 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:27 +0200 Subject: [ofa-general] [PATCH 11 of 12] XPMEM would have used sys_madvise() except that madvise_dontneed() In-Reply-To: Message-ID: <128d705f38c8a774ac11.1208872287@duo.random> # HG changeset patch # User Andrea Arcangeli # Date 1208872187 -7200 # Node ID 128d705f38c8a774ac11559db445787ce6e91c77 # Parent f8210c45f1c6f8b38d15e5dfebbc5f7c1f890c93 XPMEM would have used sys_madvise() except that madvise_dontneed() returns an -EINVAL if VM_PFNMAP is set, which is always true for the pages XPMEM imports from other partitions 
and is also true for uncached pages allocated locally via the mspec allocator. XPMEM needs zap_page_range() functionality for these types of pages as well as 'normal' pages. Signed-off-by: Dean Nelson diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -909,6 +909,7 @@ return unmap_vmas(vma, address, end, &nr_accounted, details); } +EXPORT_SYMBOL_GPL(zap_page_range); /* * Do a quick page-table lookup for a single page. From andrea at qumranet.com Tue Apr 22 06:51:28 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 15:51:28 +0200 Subject: [ofa-general] [PATCH 12 of 12] This patch adds a lock ordering rule to avoid a potential deadlock when In-Reply-To: Message-ID: # HG changeset patch # User Andrea Arcangeli # Date 1208872187 -7200 # Node ID e847039ee2e815088661933b7195584847dc7540 # Parent 128d705f38c8a774ac11559db445787ce6e91c77 This patch adds a lock ordering rule to avoid a potential deadlock when multiple mmap_sems need to be locked. Signed-off-by: Dean Nelson diff --git a/mm/filemap.c b/mm/filemap.c --- a/mm/filemap.c +++ b/mm/filemap.c @@ -79,6 +79,9 @@ * * ->i_mutex (generic_file_buffered_write) * ->mmap_sem (fault_in_pages_readable->do_page_fault) + * + * When taking multiple mmap_sems, one should lock the lowest-addressed + * one first proceeding on up to the highest-addressed one. * * ->i_mutex * ->i_alloc_sem (various) From yevgenyp at mellanox.co.il Tue Apr 22 07:10:07 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 17:10:07 +0300 Subject: [ofa-general][PATCH] mlx4: Dynamic port configuration (MP support, Patch 7) Message-ID: <480DF1BF.5000702@mellanox.co.il> >From e13bef843cb2c7cee5a0ba388d97e21188087424 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 15:14:30 +0300 Subject: [PATCH] mlx4: Dynamic port configuration Port type can be set using sysfs interface when the low level driver is up. 
The low level driver unregisters all its customers and then registers them again with the new port types (which they query for in add_one) Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/main.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 97 insertions(+), 0 deletions(-) diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index a528809..e3fd4e9 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -281,6 +281,96 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) return 0; } +static int mlx4_change_port_types(struct mlx4_dev *dev, + enum mlx4_port_type *port_types) +{ + int i; + int err = 0; + int change = 0; + int port; + + for (i = 0; i < MLX4_MAX_PORTS; i++) { + if (port_types[i] != dev->caps.port_type[i + 1]) { + change = 1; + dev->caps.port_type[i + 1] = port_types[i]; + } + } + if (change) { + mlx4_unregister_device(dev); + for (port = 1; port <= dev->caps.num_ports; port++) { + mlx4_CLOSE_PORT(dev, port); + err = mlx4_SET_PORT(dev, port); + if (err) { + mlx4_err(dev, "Failed to set port %d, " + "aborting\n", port); + return err; + } + } + err = mlx4_register_device(dev); + } + return err; +} + +static ssize_t show_port_type(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct mlx4_dev *mdev = pci_get_drvdata(pdev); + int i; + + sprintf(buf, "Current port types:\n"); + for (i = 1; i <= MLX4_MAX_PORTS; i++) { + sprintf(buf, "%sPort%d: %s\n", buf, i, + (mdev->caps.port_type[i] == MLX4_PORT_TYPE_IB)? 
+ "ib": "eth"); + } + return strlen(buf); +} + +static ssize_t set_port_type(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct mlx4_dev *mdev = pci_get_drvdata(pdev); + char *type; + enum mlx4_port_type port_types[MLX4_MAX_PORTS]; + char *loc_buf; + char *ptr; + int i; + int err = 0; + + loc_buf = kmalloc(count + 1, GFP_KERNEL); + if (!loc_buf) + return -ENOMEM; + + ptr = loc_buf; + memcpy(loc_buf, buf, count + 1); + for (i = 0; i < MLX4_MAX_PORTS; i++) { + type = strsep(&loc_buf, ","); + if (!strcmp(type, "ib")) + port_types[i] = MLX4_PORT_TYPE_IB; + else if (!strcmp(type, "eth")) + port_types[i] = MLX4_PORT_TYPE_ETH; + else { + dev_warn(dev, "%s is not acceptable port type " + "(use 'eth' or 'ib' only)\n", type); + err = -EINVAL; + goto out; + } + } + err = mlx4_check_port_params(mdev, port_types); + if (err) + goto out; + + err = mlx4_change_port_types(mdev, port_types); +out: + kfree(ptr); + return err ? 
err: count; +} +static DEVICE_ATTR(mlx4_port_type, S_IWUGO | S_IRUGO, show_port_type, set_port_type); + static int mlx4_load_fw(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -979,8 +1069,14 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) pci_set_drvdata(pdev, dev); + if (device_create_file(&pdev->dev, &dev_attr_mlx4_port_type)) + goto err_sysfs; + return 0; +err_sysfs: + mlx4_unregister_device(dev); + err_cleanup: mlx4_cleanup_mcg_table(dev); mlx4_cleanup_qp_table(dev); @@ -1036,6 +1132,7 @@ static void mlx4_remove_one(struct pci_dev *pdev) int p; if (dev) { + device_remove_file(&pdev->dev, &dev_attr_mlx4_port_type); mlx4_unregister_device(dev); for (p = 1; p <= dev->caps.num_ports; ++p) -- 1.5.4 From yevgenyp at mellanox.co.il Tue Apr 22 07:13:54 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 17:13:54 +0300 Subject: [ofa-general][PATCH] mlx4: Completion EQ per cpu (MP support, Patch 10) Message-ID: <480DF2A2.8030602@mellanox.co.il> >From 2a2d22208f6fdba4c0c2afdf0ed12ef07b93d661 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 16:39:47 +0300 Subject: [PATCH] mlx4: Completion EQ per cpu Completion eq's are created per cpu. Created cq's are attached to an eq by "Round Robin" algorithm, unless a specific eq was requested. 
Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/net/mlx4/cq.c | 19 ++++++++++++++++--- drivers/net/mlx4/eq.c | 39 ++++++++++++++++++++++++++------------- drivers/net/mlx4/main.c | 14 ++++++++------ drivers/net/mlx4/mlx4.h | 6 ++++-- include/linux/mlx4/device.h | 3 ++- 6 files changed, 57 insertions(+), 26 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 63daf52..732f812 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -221,7 +221,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector } err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, - cq->db.dma, &cq->mcq, 0); + cq->db.dma, &cq->mcq, vector, 0); if (err) goto err_dbmap; diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index d893cc1..bbb4c7b 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -189,7 +189,7 @@ EXPORT_SYMBOL_GPL(mlx4_cq_resize); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, - int collapsed) + unsigned vector, int collapsed) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_cq_table *cq_table = &priv->cq_table; @@ -227,7 +227,20 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, cq_context->flags = cpu_to_be32(!!collapsed << 18); cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index); - cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP].eqn; + + if (vector > priv->eq_table.num_comp_eqs) { + err = -EINVAL; + goto err_radix; + } + + if (vector == 0) { + vector = priv->eq_table.last_comp_eq % + priv->eq_table.num_comp_eqs + 1; + priv->eq_table.last_comp_eq = vector; + } + cq->comp_eq_idx = MLX4_EQ_COMP_CPU0 + vector - 1; + cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + + vector - 1].eqn; cq_context->log_page_size = mtt->page_shift - MLX4_ICM_PAGE_SHIFT; mtt_addr = 
mlx4_mtt_addr(dev, mtt); @@ -276,7 +289,7 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq) if (err) mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn); - synchronize_irq(priv->eq_table.eq[MLX4_EQ_COMP].irq); + synchronize_irq(priv->eq_table.eq[cq->comp_eq_idx].irq); spin_lock_irq(&cq_table->lock); radix_tree_delete(&cq_table->tree, cq->cqn); diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c index e141a15..b4676db 100644 --- a/drivers/net/mlx4/eq.c +++ b/drivers/net/mlx4/eq.c @@ -265,7 +265,7 @@ static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr) writel(priv->eq_table.clr_mask, priv->eq_table.clr_int); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) work |= mlx4_eq_int(dev, &priv->eq_table.eq[i]); return IRQ_RETVAL(work); @@ -482,7 +482,7 @@ static void mlx4_free_irqs(struct mlx4_dev *dev) if (eq_table->have_irq) free_irq(dev->pdev->irq, dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + eq_table->num_comp_eqs; ++i) if (eq_table->eq[i].have_irq) free_irq(eq_table->eq[i].irq, eq_table->eq + i); } @@ -553,6 +553,7 @@ void mlx4_unmap_eq_icm(struct mlx4_dev *dev) int mlx4_init_eq_table(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); + int req_eqs; int err; int i; @@ -573,11 +574,22 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) priv->eq_table.clr_int = priv->clr_base + (priv->eq_table.inta_pin < 32 ? 4 : 0); - err = mlx4_create_eq(dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, - (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_COMP : 0, - &priv->eq_table.eq[MLX4_EQ_COMP]); - if (err) - goto err_out_unmap; + priv->eq_table.num_comp_eqs = 0; + req_eqs = (dev->flags & MLX4_FLAG_MSI_X) ? num_online_cpus() : 1; + while (req_eqs) { + err = mlx4_create_eq( + dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, + (dev->flags & MLX4_FLAG_MSI_X) ? 
+ (MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs) : 0, + &priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + + priv->eq_table.num_comp_eqs]); + if (err) + goto err_out_comp; + + priv->eq_table.num_comp_eqs++; + req_eqs--; + } + priv->eq_table.last_comp_eq = 0; err = mlx4_create_eq(dev, MLX4_NUM_ASYNC_EQE + MLX4_NUM_SPARE_EQE, (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_ASYNC : 0, @@ -587,11 +599,12 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) if (dev->flags & MLX4_FLAG_MSI_X) { static const char *eq_name[] = { - [MLX4_EQ_COMP] = DRV_NAME " (comp)", + [MLX4_EQ_COMP_CPU0...MLX4_NUM_EQ] = "comp_" DRV_NAME, [MLX4_EQ_ASYNC] = DRV_NAME " (async)" }; - for (i = 0; i < MLX4_NUM_EQ; ++i) { + for (i = 0; i < MLX4_EQ_COMP_CPU0 + + priv->eq_table.num_comp_eqs; ++i) { err = request_irq(priv->eq_table.eq[i].irq, mlx4_msi_x_interrupt, 0, eq_name[i], priv->eq_table.eq + i); @@ -616,7 +629,7 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) mlx4_warn(dev, "MAP_EQ for async EQ %d failed (%d)\n", priv->eq_table.eq[MLX4_EQ_ASYNC].eqn, err); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) eq_set_ci(&priv->eq_table.eq[i], 1); return 0; @@ -625,9 +638,9 @@ err_out_async: mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_ASYNC]); err_out_comp: - mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP]); + for (i = 0; i < priv->eq_table.num_comp_eqs; ++i) + mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + i]); -err_out_unmap: mlx4_unmap_clr_int(dev); mlx4_free_irqs(dev); @@ -646,7 +659,7 @@ void mlx4_cleanup_eq_table(struct mlx4_dev *dev) mlx4_free_irqs(dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) mlx4_free_eq(dev, &priv->eq_table.eq[i]); mlx4_unmap_clr_int(dev); diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index e3fd4e9..aecb1f2 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -922,22 +922,24 @@ static void mlx4_enable_msi_x(struct 
mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); struct msix_entry entries[MLX4_NUM_EQ]; + int needed_vectors = MLX4_EQ_COMP_CPU0 + num_online_cpus(); int err; int i; if (msi_x) { - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) entries[i].entry = i; - err = pci_enable_msix(dev->pdev, entries, ARRAY_SIZE(entries)); + err = pci_enable_msix(dev->pdev, entries, needed_vectors); if (err) { if (err > 0) - mlx4_info(dev, "Only %d MSI-X vectors available, " - "not using MSI-X\n", err); + mlx4_info(dev, "Only %d MSI-X vectors " + "available, need %d. Not using MSI-X\n", + err, needed_vectors); goto no_msi; } - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) priv->eq_table.eq[i].irq = entries[i].vector; dev->flags |= MLX4_FLAG_MSI_X; @@ -945,7 +947,7 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev) } no_msi: - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) priv->eq_table.eq[i].irq = dev->pdev->irq; } diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index eff1c5a..2201a99 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -64,8 +64,8 @@ enum { enum { MLX4_EQ_ASYNC, - MLX4_EQ_COMP, - MLX4_NUM_EQ + MLX4_EQ_COMP_CPU0, + MLX4_NUM_EQ = MLX4_EQ_COMP_CPU0 + NR_CPUS }; enum { @@ -211,6 +211,8 @@ struct mlx4_eq_table { void __iomem *uar_map[(MLX4_NUM_EQ + 6) / 4]; u32 clr_mask; struct mlx4_eq eq[MLX4_NUM_EQ]; + int num_comp_eqs; + int last_comp_eq; u64 icm_virt; struct page *icm_page; dma_addr_t icm_dma; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 93c17aa..673462c 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -312,6 +312,7 @@ struct mlx4_cq { int arm_sn; int cqn; + int comp_eq_idx; atomic_t refcount; struct completion free; @@ -441,7 +442,7 @@ void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, 
struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, - int collapsed); + unsigned vector, int collapsed); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); -- 1.5.4 From yevgenyp at mellanox.co.il Tue Apr 22 07:12:10 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Tue, 22 Apr 2008 17:12:10 +0300 Subject: ***SPAM*** [ofa-general][PATCH] mlx4: Collapsed CQ support (MP support, Patch 9) Message-ID: <480DF23A.7090304@mellanox.co.il> >From 749a2b62acc505a9ab2437eddb4cdd45503183d0 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 15:50:51 +0300 Subject: [PATCH] mlx4: Collapsed CQ support Changed cq creation API to support the creation of collapsed cqs. Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/net/mlx4/cq.c | 4 +++- include/linux/mlx4/device.h | 3 ++- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 5e570bb..63daf52 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -221,7 +221,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector } err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, - cq->db.dma, &cq->mcq); + cq->db.dma, &cq->mcq, 0); if (err) goto err_dbmap; diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index caa5bcf..d893cc1 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -188,7 +188,8 @@ int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq, EXPORT_SYMBOL_GPL(mlx4_cq_resize); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, - struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq) + struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, + int collapsed) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_cq_table *cq_table = &priv->cq_table; @@ -224,6 +225,7 @@ int mlx4_cq_alloc(struct 
mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, cq_context = mailbox->buf; memset(cq_context, 0, sizeof *cq_context); + cq_context->flags = cpu_to_be32(!!collapsed << 18); cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index); cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP].eqn; cq_context->log_page_size = mtt->page_shift - MLX4_ICM_PAGE_SHIFT; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 4ca3a00..93c17aa 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -440,7 +440,8 @@ void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, int size); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, - struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); + struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, + int collapsed); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); -- 1.5.4 From hrosenstock at xsigo.com Tue Apr 22 07:17:22 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Tue, 22 Apr 2008 07:17:22 -0700 Subject: [ofa-general] ***SPAM*** Re: [ewg] OFED April 21 meeting summary In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: <1208873842.18376.297.camel@hrosenstock-ws.xsigo.com> Hi Tziporet, On Tue, 2008-04-22 at 16:59 +0300, Tziporet Koren wrote: > OFED April 21 meeting summary about 1.3.1 plans and OFED 1.4 > development: > 2. OFED 1.4: > > Release features were presented at Sonoma (presentation > available at > http://www.openfabrics.org/archives/april2008sonoma.htm) > > IPv6: Woody is looking for resources to add IPv6 support to > the CMA. Hal noted that it will require a change in opensm > too. 
> > Xsigo Vnic & Vhba - Not clear if they will make it
> > Kernel tree is under work at:
> git://git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel
> We should try to get the kernel code to compile as soon as
> possible so everybody will be able to contribute code.

My notes also had: Reliable multicast was thought not to be able to make OFED 1.4.

-- Hal

From ronniz at mellanox.co.il Tue Apr 22 07:17:57 2008
From: ronniz at mellanox.co.il (Ronni Zimmermann)
Date: Tue, 22 Apr 2008 17:17:57 +0300
Subject: [ofa-general] add device capability flag to indicate support for creation of UC QPs which are attached to an SRQ
Message-ID: <6C2C79E72C305246B504CBA17B5500C903DA9BD6@mtlexch01.mtl.com>

Hi,

According to the IB spec release 1.2.1 (section 11-7.2-1.1), an HCA can support attachment of UC QPs to an SRQ. Since it is possible for an HCA to support SRQs without supporting attachment of UC QPs to them, I believe we need a new device capability flag to indicate whether or not the device supports this operation.

Regards,
Ronni.

Ronni Zimmermann
SW Verification Group
Mellanox Technologies Ltd.

From dada1 at cosmosbay.com Tue Apr 22 07:56:10 2008
From: dada1 at cosmosbay.com (Eric Dumazet)
Date: Tue, 22 Apr 2008 16:56:10 +0200
Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers
Message-ID: <480DFC8A.8040105@cosmosbay.com>

Andrea Arcangeli wrote:
> +
> +static int mm_lock_cmp(const void *a, const void *b)
> +{
> +	cond_resched();
> +	if ((unsigned long)*(spinlock_t **)a <
> +	    (unsigned long)*(spinlock_t **)b)
> +		return -1;
> +	else if (a == b)
> +		return 0;
> +	else
> +		return 1;
> +}
> +

This compare function looks unusual... It should work, but sort() could be faster if the "if (a == b)" test had a chance to be true eventually:

static int mm_lock_cmp(const void *a, const void *b)
{
	unsigned long la = (unsigned long)*(spinlock_t **)a;
	unsigned long lb = (unsigned long)*(spinlock_t **)b;

	cond_resched();
	if (la < lb)
		return -1;
	if (la > lb)
		return 1;
	return 0;
}

From andrea at qumranet.com Tue Apr 22 08:15:30 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Tue, 22 Apr 2008 17:15:30 +0200
Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers
In-Reply-To: <480DFC8A.8040105@cosmosbay.com>
References: <480DFC8A.8040105@cosmosbay.com>
Message-ID: <20080422151529.GE24536@duo.random>

On Tue, Apr 22, 2008 at 04:56:10PM +0200, Eric Dumazet wrote:
> This compare function looks unusual...
> It should work, but sort() could be faster if the
> if (a == b) test had a chance to be true eventually...

Hmm, are you saying my mm_lock_cmp won't return 0 if a==b?

> static int mm_lock_cmp(const void *a, const void *b)
> {
> 	unsigned long la = (unsigned long)*(spinlock_t **)a;
> 	unsigned long lb = (unsigned long)*(spinlock_t **)b;
>
> 	cond_resched();
> 	if (la < lb)
> 		return -1;
> 	if (la > lb)
> 		return 1;
> 	return 0;
> }

If your intent is to use the assumption that there are going to be few equal entries, you should have used likely(la > lb) to signal that it is rarely going to return zero; otherwise gcc is free to do whatever it wants with the above. Overall that function is such a slow path that this is going to be lost in the noise. My suggestion would be to defer micro-optimizations like this until after 1/12 is applied to mainline.

Thanks!
From avi at qumranet.com Tue Apr 22 08:24:20 2008
From: avi at qumranet.com (Avi Kivity)
Date: Tue, 22 Apr 2008 18:24:20 +0300
Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers
In-Reply-To: <20080422151529.GE24536@duo.random>
References: <480DFC8A.8040105@cosmosbay.com> <20080422151529.GE24536@duo.random>
Message-ID: <480E0324.6050006@qumranet.com>

Andrea Arcangeli wrote:
> Hmm, are you saying my mm_lock_cmp won't return 0 if a==b?

You need to compare *a to *b (at least, that's what you're doing for the < case).
--
error compiling committee.c: too many arguments to function

From holt at sgi.com Tue Apr 22 08:26:00 2008
From: holt at sgi.com (Robin Holt)
Date: Tue, 22 Apr 2008 10:26:00 -0500
Subject: [ofa-general] Re: [PATCH 0 of 9] mmu notifier #v12
In-Reply-To: <20080422134847.GT12709@duo.random>
Message-ID: <20080422152600.GP30298@sgi.com>

Andrew,

Could we get direction/guidance from you as regards the invalidate_page() callout of Andrea's patch set versus the invalidate_range_start/invalidate_range_end callout pairs of Christoph's patch set? This is only in the context of the __xip_unmap, do_wp_page, page_mkclean_one, and try_to_unmap_one call sites.

On Tue, Apr 22, 2008 at 03:48:47PM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 22, 2008 at 08:36:04AM -0500, Robin Holt wrote:
> > I am a little confused about the value of the seq_lock versus a simple
> > atomic, but I assumed there is a reason and left it at that.
>
> There's no value for anything but get_user_pages (get_user_pages takes
> its own lock internally though). I preferred to explain it as a
> seqlock because it was simpler for reading, but I totally agree in the
> final implementation it shouldn't be a seqlock. My code was meant to
> be pseudo-code only. It doesn't even need to be atomic ;).

Unless there is additional locking in your fault path, I think it does need to be atomic.

> > I don't know what you mean by "it'd" run slower and what you mean by
> > "armed and disarmed".
>
> 1) when armed the time-window where the kvm-page-fault would be
> blocked would be a bit larger without invalidate_page for no good
> reason

But that is a distinction without a difference. In the _start/_end case, kvm's fault handler will not have any _DIRECT_ blocking, but get_user_pages() had certainly better block waiting for some other lock to prevent the process's pages being refaulted. I am no VM expert, but that seems like it is critical to having a consistent virtual address space. Effectively, you have a delay on the kvm fault handler beginning when either invalidate_page() is entered or invalidate_range_start() is entered until when the _CALLER_ of the invalidate* method has unlocked. That time will remain essentially identical for either case. I would argue you would be hard pressed to even measure the difference.

> 2) if you were to remove invalidate_page when disarmed the VM could
> would need two branches instead of one in various places

Those branches are conditional upon there being list entries. That check should be extremely cheap. The vast majority of cases will have no registered notifiers. The second check for the _end callout will be from cpu cache.

> I don't want to waste cycles if not wasting them improves performance
> both when armed and disarmed.

In summary, I think we have narrowed down the case of no registered notifiers to being infinitesimal, and the case of registered notifiers to being a distinction without a difference.

> > When I was discussing this difference with Jack, he reminded me that
> > the GRU, due to its hardware, does not have any race issues with the
> > invalidate_page callout simply doing the tlb shootdown and not modifying
> > any of its internal structures. He then put a caveat on the discussion
> > that _either_ method was acceptable as far as he was concerned. The real
> > issue is getting a patch in that satisfies all needs and not whether
> > there is a separate invalidate_page callout.
>
> Sure, we have that patch now, I'll send it out in a minute, I was just
> trying to explain why it makes sense to have an invalidate_page too
> (which remains the only difference by now), removing it would be a
> regression on all sides, even if a minor one.

I think GRU is the only compelling case I have heard for having the invalidate_page separate. In the case of the GRU, the hardware enforces a lifetime of the invalidate which covers all in-progress faults, including ones where the hardware is informed after the flush of a PTE. In all cases, once the GRU invalidate instruction is issued, all active requests are invalidated. Future faults will be blocked in get_user_pages(). Without that special feature of the hardware, I don't think any code simplification exists. I, of course, reserve the right to be wrong.

I believe the argument against a separate invalidate_page() callout was Christoph's interpretation of Andrew's comments. I am not certain Andrew was aware of these special aspects of the GRU hardware and whether they had been factored into the discussion at that point in time.

Thanks,
Robin

From xma at us.ibm.com Tue Apr 22 08:01:51 2008
From: xma at us.ibm.com (Shirley Ma)
Date: Tue, 22 Apr 2008 08:01:51 -0700
Subject: Re: [ofa-general][PATCH] mlx4: Completion EQ per cpu (MP support, Patch 10)
In-Reply-To: <480DF2A2.8030602@mellanox.co.il>

Hello Yevgeny,

Can you give more details of this patch? What's the relationship between CQ, EQ, port? I was thinking to implement it in upper layer.
Is it better to implement in upper layer protocol, rather than device layer? thanks Shirley -------------- next part -------------- An HTML attachment was scrubbed... URL: From dada1 at cosmosbay.com Tue Apr 22 08:37:38 2008 From: dada1 at cosmosbay.com (Eric Dumazet) Date: Tue, 22 Apr 2008 17:37:38 +0200 Subject: [ofa-general] ***SPAM*** Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080422151529.GE24536@duo.random> References: <480DFC8A.8040105@cosmosbay.com> <20080422151529.GE24536@duo.random> Message-ID: <480E0642.6080109@cosmosbay.com> Andrea Arcangeli a écrit : > On Tue, Apr 22, 2008 at 04:56:10PM +0200, Eric Dumazet wrote: > >> Andrea Arcangeli a écrit : >> >>> + >>> +static int mm_lock_cmp(const void *a, const void *b) >>> +{ >>> + cond_resched(); >>> + if ((unsigned long)*(spinlock_t **)a < >>> + (unsigned long)*(spinlock_t **)b) >>> + return -1; >>> + else if (a == b) >>> + return 0; >>> + else >>> + return 1; >>> +} >>> + >>> >> This compare function looks unusual... >> It should work, but sort() could be faster if the >> if (a == b) test had a chance to be true eventually... >> > > Hmm, are you saying my mm_lock_cmp won't return 0 if a==b? > I am saying your intent was probably to test else if ((unsigned long)*(spinlock_t **)a == (unsigned long)*(spinlock_t **)b) return 0; Because a and b are pointers to the data you want to compare. You need to dereference them. >> static int mm_lock_cmp(const void *a, const void *b) >> { >> unsigned long la = (unsigned long)*(spinlock_t **)a; >> unsigned long lb = (unsigned long)*(spinlock_t **)b; >> >> cond_resched(); >> if (la < lb) >> return -1; >> if (la > lb) >> return 1; >> return 0; >> } >> > > If your intent is to use the assumption that there are going to be few > equal entries, you should have used likely(la > lb) to signal it's > rarely going to return zero or gcc is likely free to do whatever it > wants with the above. 
> Overall that function is such a slow path that
> this is going to be lost in the noise. My suggestion would be to defer
> microoptimizations like this after 1/12 will be applied to mainline.
>
> Thanks!

Hum, it's not a micro-optimization, but a bug fix. :) Sorry if it was not clear.

From rdreier at cisco.com Tue Apr 22 08:45:26 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 22 Apr 2008 08:45:26 -0700
Subject: [ofa-general] Problem with libibverbs and huge pages registration.
In-Reply-To: <20080422111412.GH7771@minantech.com> (Gleb Natapov's message of "Tue, 22 Apr 2008 14:14:13 +0300")
References: <20080421141441.GF7771@minantech.com> <20080422111412.GH7771@minantech.com>

> I suppose "if" below depends on updated refcnt, so update can't be moved
> down without changing the "if" statement.

Yes, good point. And also I think we need to undo splitting/merging if we fail to do the operation. This all needs more care.

From andrea at qumranet.com Tue Apr 22 09:46:15 2008
From: andrea at qumranet.com (Andrea Arcangeli)
Date: Tue, 22 Apr 2008 18:46:15 +0200
Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers
In-Reply-To: <480E0642.6080109@cosmosbay.com>
Message-ID: <20080422164615.GG24536@duo.random>

On Tue, Apr 22, 2008 at 05:37:38PM +0200, Eric Dumazet wrote:
> I am saying your intent was probably to test
>
> 	else if ((unsigned long)*(spinlock_t **)a ==
> 		 (unsigned long)*(spinlock_t **)b)
> 		return 0;

Indeed...

> Hum, it's not a micro-optimization, but a bug fix. :)

The good thing is that even if this bug would lead to a system crash, it would still be zero risk for everybody that isn't using KVM/GRU actively with mmu notifiers. The important thing is that this patch has zero risk of introducing regressions into the kernel, both when enabled and disabled; it's like a new driver. I'll shortly resend 1/12 and likely 12/12 for theoretical correctness. For now you can go ahead testing with this patch, as it'll work fine despite the bug (if it wasn't the case I would have noticed already ;).

From PHF at zurich.ibm.com Tue Apr 22 10:02:17 2008
From: PHF at zurich.ibm.com (Philip Frey1)
Date: Tue, 22 Apr 2008 19:02:17 +0200
Subject: [ofa-general] CM ID

I have realised that the verbs of the rdma_cm_id are only valid after a call to rdma_resolve_addr(). How can I create a memory region before connecting to the remote host? In order to create an ibv_mr, I need a protection domain (PD). For creating a PD, I need an ibv_context, which I get from cm_id->verbs, but it is only valid after resolving the address.

So what would be the correct way to call ibv_alloc_pd() and ibv_reg_mr() before resolving the address, which I might not yet know (especially on the server side)?

Many thanks,
Philip

From rdreier at cisco.com Tue Apr 22 10:23:38 2008
From: rdreier at cisco.com (Roland Dreier)
Date: Tue, 22 Apr 2008 10:23:38 -0700
Subject: [ofa-general] CM ID
In-Reply-To: (Philip Frey1's message of "Tue, 22 Apr 2008 19:02:17 +0200")

> I have realised that the verbs of the rdma_cm_id are only valid after a
> call to rdma_resolve_addr().
>
> How can I create a memory region before connecting to the remote host?
> In order to create an ibv_mr, I need a protection domain (PD).
> For creating a PD, I need an ibv_context which I get from cm_id->verbs but > they are only valid after > resolving the address. > > So what would be the correct way to call ibv_alloc_pd() and ibv_reg_mr() > before resolving the address > which I might not yet know (especially on the server side). It doesn't really make sense to use any verbs before you have resolved the address, because you don't know which device will be used until the address is resolved. - R. From rdreier at cisco.com Tue Apr 22 10:24:37 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 22 Apr 2008 10:24:37 -0700 Subject: [ofa-general] Re: add device capabilities flag to indicate support in creation of UC QPs which are attached to a SRQ In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903DA9BD6@mtlexch01.mtl.com> (Ronni Zimmermann's message of "Tue, 22 Apr 2008 17:17:57 +0300") References: <6C2C79E72C305246B504CBA17B5500C903DA9BD6@mtlexch01.mtl.com> Message-ID: > According to the IB spec release 1.2.1 (section 11-7.2-1.1), an HCA can > support attachment of UC QPs to a SRQ. > Since it's possible for an HCA to support SRQs without supporting > attachment of UC QPs to them, I believe we need a new device > capabilities flag to indicate whether or not the device supports this > operation. OK I guess, although we seem to be using up device capability flags at an alarming rate. I guess in the not-too-distant future we'll have to extend the API to allow more capabilities. - R.
From sean.hefty at intel.com Tue Apr 22 11:17:11 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 11:17:11 -0700 Subject: [ofa-general] beginner resources In-Reply-To: <6978b4af0804220309t1ae34185y83ba69f9bbfa309b@mail.gmail.com> References: <6978b4af0804220309t1ae34185y83ba69f9bbfa309b@mail.gmail.com> Message-ID: <000101c8a4a5$16208c10$40fc070a@amr.corp.intel.com> > is this the right list to ask totally beginner questions > (even code snippets) or is there any other resource for this matter? Beginner questions are fine. But you may be directed to a spec, RFC, man page, etc. Code examples are available with the userspace libraries (libibverbs, librdmacm) that may help. The libraries also provide man pages for the various APIs. - Sean From holt at sgi.com Tue Apr 22 11:22:13 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 22 Apr 2008 13:22:13 -0500 Subject: [ofa-general] Re: [PATCH 00 of 12] mmu notifier #v13 In-Reply-To: References: Message-ID: <20080422182213.GS22493@sgi.com> I believe the differences between your patch set and Christoph's need to be understood and a compromise approach agreed upon. Those differences, as I understand them, are: 1) invalidate_page: You retain an invalidate_page() callout. I believe we have progressed that discussion to the point that it requires some direction for Andrew, Linus, or somebody in authority. The basics of the difference distill down to no expected significant performance difference between the two. The invalidate_page() callout potentially can simplify GRU code. It does provide a more complex api for the users of mmu_notifier which, IIRC, Christoph had interpreted from one of Andrew's earlier comments as being undesirable. I vaguely recall that sentiment as having been expressed.
2) Range callout names: Your range callouts are invalidate_range_start and invalidate_range_end whereas Christoph's are start and end. I do not believe this has been discussed in great detail. I know I have expressed a preference for your names. I admit to having failed to follow up on this issue. I certainly believe we could come to an agreement quickly if pressed. 3) The structure of the patch set: Christoph's upcoming release orders the patches so the prerequisite patches are separately reviewable and each file is only touched by a single patch. Additionally, that allows mmu_notifiers to be introduced as a single patch with sleeping functionality from its inception and an API which remains unchanged. Your patch set, however, introduces one API, then turns around and changes that API. Again, the desire to make it an unchanging API was expressed by, IIRC, Andrew. This does represent a risk to XPMEM as the non-sleeping API may become entrenched and make acceptance of the sleeping version less acceptable. Can we agree upon this list of issues? Thank you, Robin Holt From sean.hefty at intel.com Tue Apr 22 11:25:10 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 11:25:10 -0700 Subject: [ofa-general][PATCH] mlx4: Prereserved Qp regions (MP support, Patch4) In-Reply-To: <480D8803.1050404@mellanox.co.il> References: <480D8803.1050404@mellanox.co.il> Message-ID: <000201c8a4a6$334addd0$40fc070a@amr.corp.intel.com> >We reserve Qp ranges to be used by other modules in case >the ports come up as Ethernet ports. >The qps are reserved at the end of the QP table. >(This way we assure that they are alligned to their size) Can you explain this in more detail? What are the 'other modules'? Are you reserving specific QP numbers? Are the QPs only reserved when running over Ethernet? Why is this done/needed exactly? I don't really understand the alignment comment, but that's a separate issue for me.
- Sean From sean.hefty at intel.com Tue Apr 22 11:27:18 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 11:27:18 -0700 Subject: ***SPAM*** [ofa-general][PATCH] mlx4: Collapsed CQ support (MPsupport, Patch 9) In-Reply-To: <480DF23A.7090304@mellanox.co.il> References: <480DF23A.7090304@mellanox.co.il> Message-ID: <000301c8a4a6$7fb9d270$40fc070a@amr.corp.intel.com> >Changed cq creation API to support the creation of collapsed cqs. What is a 'collapsed cq'? (mayb you explained this in a different part of the patch series that I haven't looked at yet...) - Sean
From andrea at qumranet.com Tue Apr 22 11:43:35 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 22 Apr 2008 20:43:35 +0200 Subject: [ofa-general] Re: [PATCH 00 of 12] mmu notifier #v13 In-Reply-To: <20080422182213.GS22493@sgi.com> References: <20080422182213.GS22493@sgi.com> Message-ID: <20080422184335.GN24536@duo.random> On Tue, Apr 22, 2008 at 01:22:13PM -0500, Robin Holt wrote: > 1) invalidate_page: You retain an invalidate_page() callout. I believe > we have progressed that discussion to the point that it requires some > direction for Andrew, Linus, or somebody in authority. The basics > of the difference distill down to no expected significant performance > difference between the two. The invalidate_page() callout potentially > can simplify GRU code. It does provide a more complex api for the > users of mmu_notifier which, IIRC, Christoph had interpreted from one > of Andrew's earlier comments as being undesirable. I vaguely recall > that sentiment as having been expressed. invalidate_page as demonstrated in KVM pseudocode doesn't change the locking requirements, and it has the benefit of reducing the window of time the secondary page fault has to be masked and at the same time _halves_ the number of _hooks_ in the VM every time the VM deals with single pages (example: do_wp_page hot path). As long as we can't fully converge because of point 3, I'd rather keep invalidate_page as the better option. But that's by far not a priority to keep. > 2) Range callout names: Your range callouts are invalidate_range_start > and invalidate_range_end whereas Christoph's are start and end. I do not > believe this has been discussed in great detail. I know I have expressed > a preference for your names. I admit to having failed to follow up on > this issue. I certainly believe we could come to an agreement quickly > if pressed. I think using ->start ->end is a mistake; think what happens when we later add mprotect_range_start/end.
Here too I keep the better names only because we can't converge on point 3 (the API will eventually change, like every other kernel internal API; even core things like __free_page have been mostly obsoleted). > 3) The structure of the patch set: Christoph's upcoming release orders > the patches so the prerequisite patches are separately reviewable > and each file is only touched by a single patch. Additionally, that Each file touched by a single patch? I doubt... The split is about the same; the main difference is the merge ordering. I always had the zero-risk part at the head; he moved it to the tail when he incorporated #v12 into his patchset. > allows mmu_notifiers to be introduced as a single patch with sleeping > functionality from its inception and an API which remains unchanged. > Your patch set, however, introduces one API, then turns around and > changes that API. Again, the desire to make it an unchanging API was > expressed by, IIRC, Andrew. This does represent a risk to XPMEM as > the non-sleeping API may become entrenched and make acceptance of the > sleeping version less acceptable. > > Can we agree upon this list of issues? This is a kernel internal API, so it will definitely change over time. It's nothing close to a syscall. Also note: the API is obviously defined in mmu_notifier.h and none of the 2-12 patches touches mmu_notifier.h. So the extension of the method semantics is 100% backwards compatible. My patch order and API backward compatible extension over the patchset is done to allow 2.6.26 to fully support KVM/GRU and 2.6.27 to support XPMEM as well. KVM/GRU won't notice any difference once the support for XPMEM is added, but even if the API would completely change in 2.6.27, that's still better than no functionality at all in 2.6.26.
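Andrea's backwards-compatibility argument — extending the semantics of an ops structure without breaking existing registrations — can be sketched generically. The following is an illustrative mock, not the real mmu_notifier definitions: a new optional callback is added at the end of the ops struct, old consumers leave it NULL, and the core falls back to the existing range calls when it is absent.

```c
#include <stddef.h>

/* Illustrative mock of extending a notifier ops struct (NOT the real
 * mmu_notifier API): the new invalidate_page member is optional. */
struct demo_notifier_ops {
    void (*invalidate_range_start)(unsigned long start, unsigned long end);
    void (*invalidate_range_end)(unsigned long start, unsigned long end);
    void (*invalidate_page)(unsigned long addr); /* new, may be NULL */
};

/* Demo counters so the dispatch below is observable. */
static int demo_pages, demo_ranges;
static void demo_page(unsigned long addr) { (void)addr; demo_pages++; }
static void demo_start(unsigned long s, unsigned long e) { (void)s; (void)e; demo_ranges++; }
static void demo_end(unsigned long s, unsigned long e) { (void)s; (void)e; demo_ranges++; }

/* The core invokes the new hook only when a consumer provides it, so
 * consumers written against the old struct layout keep working. */
static void demo_notify_page(const struct demo_notifier_ops *ops,
                             unsigned long addr)
{
    if (ops->invalidate_page) {
        ops->invalidate_page(addr);
    } else {
        ops->invalidate_range_start(addr, addr + 4096);
        ops->invalidate_range_end(addr, addr + 4096);
    }
}
```

A consumer that never heard of invalidate_page simply registers { demo_start, demo_end, NULL } and sees the same range callouts as before.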
From tziporet at dev.mellanox.co.il Tue Apr 22 11:53:26 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 22 Apr 2008 21:53:26 +0300 Subject: ***SPAM*** [ofa-general][PATCH] mlx4: Collapsed CQ support (MPsupport, Patch 9) In-Reply-To: <000301c8a4a6$7fb9d270$40fc070a@amr.corp.intel.com> References: <480DF23A.7090304@mellanox.co.il> <000301c8a4a6$7fb9d270$40fc070a@amr.corp.intel.com> Message-ID: <480E3426.5060907@mellanox.co.il> Sean Hefty wrote: >> Changed cq creation API to support the creation of collapsed cqs. >> > > What is a 'collapsed cq'? (mayb you explained this in a different part of the > patch series that I haven't looked at yet...) > > Collapsed CQ is a HW feature of ConnectX. If you have ConnectX PRM you can read more details about it. Tziporet From rdreier at cisco.com Tue Apr 22 11:55:44 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 22 Apr 2008 11:55:44 -0700 Subject: [ofa-general] [PATCH/RFC] RDMA/nes: Use print_mac() to format ethernet addresses for printing Message-ID: Removing open-coded MAC formats shrinks the source and the generated code too, eg on x86-64: add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-103 (-103) function old new delta make_cm_node 932 912 -20 nes_netdev_set_mac_address 427 406 -21 nes_netdev_set_multicast_list 1148 1124 -24 nes_probe 2349 2311 -38 Signed-off-by: Roland Dreier --- drivers/infiniband/hw/nes/nes.c | 10 ++++------ drivers/infiniband/hw/nes/nes_cm.c | 8 +++----- drivers/infiniband/hw/nes/nes_nic.c | 18 ++++++++---------- 3 files changed, 15 insertions(+), 21 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index b046262..c0671ad 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -353,13 +353,11 @@ struct ib_qp *nes_get_qp(struct ib_device *device, int qpn) */ static void nes_print_macaddr(struct net_device *netdev) { - nes_debug(NES_DBG_INIT, "%s: MAC %02X:%02X:%02X:%02X:%02X:%02X, IRQ %u\n", - netdev->name, 
- netdev->dev_addr[0], netdev->dev_addr[1], netdev->dev_addr[2], - netdev->dev_addr[3], netdev->dev_addr[4], netdev->dev_addr[5], - netdev->irq); -} + DECLARE_MAC_BUF(mac); + nes_debug(NES_DBG_INIT, "%s: %s, IRQ %u\n", + netdev->name, print_mac(mac, netdev->dev_addr), netdev->irq); +} /** * nes_interrupt - handle interrupts diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index d073862..b53bceb 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -1054,6 +1054,7 @@ static struct nes_cm_node *make_cm_node(struct nes_cm_core *cm_core, int arpindex = 0; struct nes_device *nesdev; struct nes_adapter *nesadapter; + DECLARE_MAC_BUF(mac); /* create an hte and cm_node for this instance */ cm_node = kzalloc(sizeof(*cm_node), GFP_ATOMIC); @@ -1116,11 +1117,8 @@ static struct nes_cm_node *make_cm_node(struct nes_cm_core *cm_core, /* copy the mac addr to node context */ memcpy(cm_node->rem_mac, nesadapter->arp_table[arpindex].mac_addr, ETH_ALEN); - nes_debug(NES_DBG_CM, "Remote mac addr from arp table:%02x," - " %02x, %02x, %02x, %02x, %02x\n", - cm_node->rem_mac[0], cm_node->rem_mac[1], - cm_node->rem_mac[2], cm_node->rem_mac[3], - cm_node->rem_mac[4], cm_node->rem_mac[5]); + nes_debug(NES_DBG_CM, "Remote mac addr from arp table: %s\n", + print_mac(mac, cm_node->rem_mac)); add_hte_node(cm_core, cm_node); atomic_inc(&cm_nodes_created); diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 01cd0ef..e5366b0 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -787,16 +787,14 @@ static int nes_netdev_set_mac_address(struct net_device *netdev, void *p) int i; u32 macaddr_low; u16 macaddr_high; + DECLARE_MAC_BUF(mac); if (!is_valid_ether_addr(mac_addr->sa_data)) return -EADDRNOTAVAIL; memcpy(netdev->dev_addr, mac_addr->sa_data, netdev->addr_len); - printk(PFX "%s: Address length = %d, Address = %02X%02X%02X%02X%02X%02X..\n", - 
__func__, netdev->addr_len, - mac_addr->sa_data[0], mac_addr->sa_data[1], - mac_addr->sa_data[2], mac_addr->sa_data[3], - mac_addr->sa_data[4], mac_addr->sa_data[5]); + printk(PFX "%s: Address length = %d, Address = %s\n", + __func__, netdev->addr_len, print_mac(mac, mac_addr->sa_data)); macaddr_high = ((u16)netdev->dev_addr[0]) << 8; macaddr_high += (u16)netdev->dev_addr[1]; macaddr_low = ((u32)netdev->dev_addr[2]) << 24; @@ -878,11 +876,11 @@ static void nes_netdev_set_multicast_list(struct net_device *netdev) if (mc_nic_index < 0) mc_nic_index = nesvnic->nic_index; if (multicast_addr) { - nes_debug(NES_DBG_NIC_RX, "Assigning MC Address = %02X%02X%02X%02X%02X%02X to register 0x%04X nic_idx=%d\n", - multicast_addr->dmi_addr[0], multicast_addr->dmi_addr[1], - multicast_addr->dmi_addr[2], multicast_addr->dmi_addr[3], - multicast_addr->dmi_addr[4], multicast_addr->dmi_addr[5], - perfect_filter_register_address+(mc_index * 8), mc_nic_index); + DECLARE_MAC_BUF(mac); + nes_debug(NES_DBG_NIC_RX, "Assigning MC Address %s to register 0x%04X nic_idx=%d\n", + print_mac(mac, multicast_addr->dmi_addr), + perfect_filter_register_address+(mc_index * 8), + mc_nic_index); macaddr_high = ((u16)multicast_addr->dmi_addr[0]) << 8; macaddr_high += (u16)multicast_addr->dmi_addr[1]; macaddr_low = ((u32)multicast_addr->dmi_addr[2]) << 24; -- 1.5.5.1 From sean.hefty at intel.com Tue Apr 22 12:17:13 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 12:17:13 -0700 Subject: [Fwd: [ofa-general] More responder_resources problems] In-Reply-To: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> References: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> Message-ID: <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> >Just wanted to be sure you saw this posting from Jason :-) If you >haven't had time to get to it, that's fine but wanted to make sure it >didn't get lost in the email as I've seen messages dropped... Sorry for >the noise. Thanks - I never saw it. 
>So my expectation on how the spec outlines this should work is that >the requesting side does essentially: > ibv_query_device(verbs,&devAttr); > req.responder_resources = devAttr.max_qp_rd_atom; > req.initiator_depth = devAttr.max_qp_init_rd_atom; > >When making the req (assuming it wants the maximum). > >The passive side should then take req.initiator_depth, limit it to its >devAttr.max_qp_rd_atom (and layer a client limit on top of that) and >assign it to max_dest_rd_atomic on its QP, and also assign it to >rep.responder_resources. > >Next, the passive side should take req.responder_resources, limit it >to devAttr.max_qp_init_rd_atom (and again layer a client limit on top of >that), and assign it to max_rd_atomic on its QP, and return it in >rep.initiator_depth. > >The active side should, generally, use the form above and use the >values in the rep to program its max_rd_atomic and max_dest_rd_atomic. > >I can't find any of this in any of the cm libraries - and this is the >sort of thing I was expecting to find in kernel cm.c, since other than >letting the client on the passive side specify lower limits there >really isn't much latitude here. The initiator_depth and responder_resources are controlled by the CM ULP, and are specified when calling send_cm_req / send_cm_rep. The exchanged values are reported through the req_event/rep_event parameters. The behavior that you're describing is done by the kernel cm. Look in ib_send_cm_req / ib_send_cm_rep / cm_req_handler / cm_rep_handler. >The particular change you introduced to support DAPL strikes me as >just strange, overriding the incoming initiator_depth with the passive >side's responder_resources choice and then not returning that change in >the rep makes no sense to me at all and could cause a slow down since >the two ends are now mismatched. The active side initiator_depth and responder_resources are set by the active side when calling ib_send_cm_req.
The passive side initializes its values to the data carried in the REQ. When the passive side sends a REP, it is allowed to reduce the values. The CM adjusts both the passive and active side values based on the data in the REP. Mismatched ends end up with the connection being broken. >(Assuming that max_dest_rd_atomic corresponds to responder resources >and that max_rd_atomic corresponds to initiator depth as discussed in This is correct. - Sean From holt at sgi.com Tue Apr 22 12:42:23 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 22 Apr 2008 14:42:23 -0500 Subject: [ofa-general] Re: [PATCH 00 of 12] mmu notifier #v13 In-Reply-To: <20080422184335.GN24536@duo.random> References: <20080422182213.GS22493@sgi.com> <20080422184335.GN24536@duo.random> Message-ID: <20080422194223.GT22493@sgi.com> On Tue, Apr 22, 2008 at 08:43:35PM +0200, Andrea Arcangeli wrote: > On Tue, Apr 22, 2008 at 01:22:13PM -0500, Robin Holt wrote: > > 1) invalidate_page: You retain an invalidate_page() callout. I believe > > we have progressed that discussion to the point that it requires some > > direction for Andrew, Linus, or somebody in authority.
The basics > > of the difference distill down to no expected significant performance > > difference between the two. The invalidate_page() callout potentially > > can simplify GRU code. It does provide a more complex api for the > > users of mmu_notifier which, IIRC, Christoph had interpreted from one > > of Andrew's earlier comments as being undesirable. I vaguely recall > > that sentiment as having been expressed. > > invalidate_page as demonstrated in KVM pseudocode doesn't change the > locking requirements, and it has the benefit of reducing the window of > time the secondary page fault has to be masked and at the same time > _halves_ the number of _hooks_ in the VM every time the VM deals with > single pages (example: do_wp_page hot path). As long as we can't fully > converge because of point 3, I'd rather keep invalidate_page as the > better option. But that's by far not a priority to keep. Christoph, Jack and I just discussed invalidate_page(). I don't think the point Andrew was making is that compelling in this circumstance. The code has changed fairly remarkably. Would you have any objection to putting it back into your patch/agreeing to it remaining in Andrea's patch? If not, I think we can put this issue aside until Andrew gets out of the merge window and can decide it. Either way, the patches become much more similar with this in. Thanks, Robin From swise at opengridcomputing.com Tue Apr 22 13:00:00 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Tue, 22 Apr 2008 15:00:00 -0500 Subject: [ofa-general] Agenda for the OFED meeting today In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> References: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> Message-ID: <480E43C0.6080107@opengridcomputing.com> An HTML attachment was scrubbed...
URL: From clameter at sgi.com Tue Apr 22 13:19:29 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:19:29 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: Message-ID: Thanks for adding most of my enhancements. But 1. There is no real need for invalidate_page(). Can be done with invalidate_start/end. Needlessly complicates the API. One of the objections by Andrew was that there were multiple callbacks that perform similar functions. 2. The locks that are used are later changed to semaphores. This is f.e. true for mm_lock / mm_unlock. The diffs will be smaller if the lock conversion is done first and then mm_lock is introduced. The way the patches are structured means that reviewers cannot review the final version of mm_lock etc etc. The lock conversion needs to come first. 3. As noted by Eric and also contained in a private post from yesterday by me: The cmp function needs to retrieve the value before doing comparisons, which is not done for the == of a and b. From clameter at sgi.com Tue Apr 22 13:22:55 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:22:55 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 02 of 12] Fix ia64 compilation failure because of common code include bug In-Reply-To: <3c804dca25b15017b220.1208872278@duo.random> References: <3c804dca25b15017b220.1208872278@duo.random> Message-ID: Looks like this is not complete. There are numerous .h files missing, which means that various structs are undefined (fs.h and rmap.h are needed f.e.), which leads to surprises when dereferencing fields of these structs. It seems that mm_types.h is expected to be included only in certain contexts. Could you make sure to include all necessary .h files? Or add some docs to clarify the situation here.
From clameter at sgi.com Tue Apr 22 13:23:16 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:23:16 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 03 of 12] get_task_mm should not succeed if mmput() is running and has reduced In-Reply-To: References: Message-ID: Missing signoff by you. From clameter at sgi.com Tue Apr 22 13:24:21 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:24:21 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: References: Message-ID: Reverts a part of an earlier patch. Why isn't this merged into 1 of 12? From clameter at sgi.com Tue Apr 22 13:25:09 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:25:09 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 05 of 12] Move the tlb flushing into free_pgtables. The conversion of the locks In-Reply-To: References: Message-ID: Why are the subjects all screwed up? They are the first line of the description instead of the subject line of my patches. From clameter at sgi.com Tue Apr 22 13:26:13 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:26:13 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 10 of 12] Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock In-Reply-To: References: Message-ID: Doing the right patch ordering would have avoided this patch and allowed better review.
From clameter at sgi.com Tue Apr 22 13:28:25 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:28:25 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 00 of 12] mmu notifier #v13 In-Reply-To: <20080422184335.GN24536@duo.random> References: <20080422182213.GS22493@sgi.com> <20080422184335.GN24536@duo.random> Message-ID: On Tue, 22 Apr 2008, Andrea Arcangeli wrote: > My patch order and API backward compatible extension over the patchset > is done to allow 2.6.26 to fully support KVM/GRU and 2.6.27 to support > XPMEM as well. KVM/GRU won't notice any difference once the support > for XPMEM is added, but even if the API would completely change in > 2.6.27, that's still better than no functionality at all in 2.6.26. Please redo the patchset with the right order. To my knowledge there is no chance of this getting merged for 2.6.26. From clameter at sgi.com Tue Apr 22 13:30:53 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 13:30:53 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 00 of 12] mmu notifier #v13 In-Reply-To: <20080422194223.GT22493@sgi.com> References: <20080422182213.GS22493@sgi.com> <20080422184335.GN24536@duo.random> <20080422194223.GT22493@sgi.com> Message-ID: On Tue, 22 Apr 2008, Robin Holt wrote: > putting it back into your patch/agreeing to it remaining in Andrea's > patch? If not, I think we can put this issue aside until Andrew gets > out of the merge window and can decide it. Either way, the patches > become much more similar with this in. One solution would be to separate the invalidate_page() callout into a patch at the very end that can be omitted. AFAICT there is no compelling reason to have this callback, and it complicates the API for the device driver writers. Not having this callback makes the way that mmu notifiers are called from the VM uniform, which is a desirable goal.
From holt at sgi.com Tue Apr 22 13:31:14 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 22 Apr 2008 15:31:14 -0500 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: Message-ID: <20080422203114.GQ30298@sgi.com> On Tue, Apr 22, 2008 at 01:19:29PM -0700, Christoph Lameter wrote: > Thanks for adding most of my enhancements. But > > 1. There is no real need for invalidate_page(). Can be done with > invalidate_start/end. Needlessly complicates the API. One > of the objections by Andrew was that there were multiple > callbacks that perform similar functions. While I agree with that reading of Andrew's email about invalidate_page, I think the GRU hardware makes a strong enough case to justify the two separate callouts. Due to the GRU hardware, we can ensure that invalidate_page terminates all pending GRU faults (that includes faults that are just beginning) and can therefore be completed without needing any locking. The invalidate_page() callout gets turned into a GRU flush instruction and we return. Because the invalidate_range_start() leaves the page table information available, we cannot use a single page _start to mimic that functionality. Therefore, there is a documented case justifying the separate callouts. I agree the case is fairly weak, but it does exist. Given Andrea's unwillingness to move and Jack's documented case, it is my opinion the most likely compromise is to leave in the invalidate_page() callout. Thanks, Robin
From rdreier at cisco.com Tue Apr 22 13:46:38 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 22 Apr 2008 13:46:38 -0700 Subject: [ofa-general] Re: [PATCH] IPoIB 4K MTU support In-Reply-To: <1208681551.5271.11.camel@localhost.localdomain> (Shirley Ma's message of "Sun, 20 Apr 2008 01:52:31 -0700") References: <1208681551.5271.11.camel@localhost.localdomain> Message-ID: Thanks, applied with some cleanups as below. As an aside, in the case where we need to use a fragment in the receive skb, does it make sense to make the initial linear part bigger so the TCP and IP headers fit there (and the kernel doesn't have to look into the fragment list to handle the packet)? Also, is there any clean way where a kernel with PAGE_SIZE > 4096 can have ud_need_sg evaluate to 0 at compile time, so that all the unneeded code can be thrown out by the compiler? > + return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 1 : 0; I've never understood this style: it makes no sense to do return bool ?
1 : 0; instead of just return bool; > +static inline void ipoib_ud_dma_unmap_rx(struct ipoib_dev_priv *priv, > + u64 mapping[IPOIB_UD_RX_SG]) > +{ > + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { > + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE, DMA_FROM_DEVICE); > + ib_dma_unmap_page(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE); > + } else > + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_BUF_SIZE(priv->max_ib_mtu), DMA_FROM_DEVICE); > +} > + > +static inline void ipoib_ud_skb_put_frags(struct ipoib_dev_priv *priv, > + struct sk_buff *skb, > + unsigned int length) > +{ > + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { > + skb_frag_t *frag = &skb_shinfo(skb)->frags[0]; > + /* > + * There is only two buffers needed for max_payload = 4K, > + * first buf size is IPOIB_UD_HEAD_SIZE > + */ > + skb->tail += IPOIB_UD_HEAD_SIZE; > + frag->size = length - IPOIB_UD_HEAD_SIZE; > + skb->data_len += frag->size; > + skb->truesize += frag->size; > + skb->len += length; > + } else > + skb_put(skb, length); > + > +} These are pretty big to put in a header file as inlines... I moved them to the only .c file where they're used. - R. From rdreier at cisco.com Tue Apr 22 13:55:11 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 22 Apr 2008 13:55:11 -0700 Subject: [ofa-general] Re: [PATCH 5/5] IB/ehca: Bump version number to 0026 In-Reply-To: <200804211008.17023.fenkes@de.ibm.com> (Joachim Fenkes's message of "Mon, 21 Apr 2008 09:08:16 +0100") References: <200804211003.10695.fenkes@de.ibm.com> <200804211008.17023.fenkes@de.ibm.com> Message-ID: thanks, applied all 5. 
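Roland's style point above — a comparison already yields 0 or 1, so a trailing "? 1 : 0" is redundant — can be shown with a small standalone sketch. The buffer-size calculation and page size here are simplified stand-ins (an assumed 40-byte GRH overhead), not the exact definitions from the quoted patch:

```c
/* Simplified stand-ins; the real IPOIB_UD_BUF_SIZE() and PAGE_SIZE
 * live in the patch and kernel headers and may differ. */
#define DEMO_PAGE_SIZE 4096
#define DEMO_GRH_BYTES 40 /* assumed per-packet GRH overhead */
#define DEMO_UD_BUF_SIZE(ib_mtu) ((ib_mtu) + DEMO_GRH_BYTES)

/* Roland's suggested form: the > comparison itself evaluates to
 * 0 or 1, so the function can return it directly. */
static int demo_ud_need_sg(unsigned int ib_mtu)
{
    return DEMO_UD_BUF_SIZE(ib_mtu) > DEMO_PAGE_SIZE;
}
```

With a 2048-byte MTU the buffer fits in one page and the predicate is 0; with a 4096-byte MTU the GRH overhead pushes it past the page size and the predicate is 1.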
From jgunthorpe at obsidianresearch.com Tue Apr 22 14:00:49 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 22 Apr 2008 15:00:49 -0600 Subject: [Fwd: [ofa-general] More responder_resources problems] In-Reply-To: <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> References: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> Message-ID: <20080422210049.GA17925@obsidianresearch.com> On Tue, Apr 22, 2008 at 12:17:13PM -0700, Sean Hefty wrote: > >I can't find any of this in any of the cm libraries - and this is the > >sort of thing I was expecting to find in kernel cm.c, since other than > >letting the client on the passive side specify lower limits there > >really isn't much latitude here. > The initiator_depth and responder_resources are controlled by the CM > ULP, and are specified when calling send_cm_req / send_cm_rep. The > exchanged values are reported through the req_event/rep_event > parameters. Yes, but the actual programming of the values into the QP is done by cm_init_qp_rtr_attr/cm_init_qp_rts_attr (well, in many cases) - which takes the values from the rep/req directly, without modification. Look at for instance the entire stack: none of SRP, ISER or IPOIB touch max_*_rd_atomic; they all rely on cm_init_*_attr to set them properly. I guess these are not entirely good examples since they are generally not acting as the passive side (I don't have the target patches for SRP/ISER handy..) There is a bug here, it just isn't really obvious to me where the fixes should go to match the CM design. I was imagining that cm.c would adjust the REQ after reception, but there may be some downsides to that? > The behavior that you're describing is done by the kernel cm. Look in > ib_send_cm_req / ib_send_cm_rep / cm_req_handler / cm_rep_handler. All that I see in here is switching the REQ's responder_resources value into the REQ's initiator_depth value (and vice versa); it does not limit it. 
> The active side initiator_depth and responder_resources are set by > the active side when calling ib_send_cm_req. The passive side > initializes its values to the data carried in the REQ. When the > passive side sends a REP, it is allowed to reduce the values. The > CM adjusts both the passive and active side values based on the data > in the REP. Well, I see how the override gets into the REP, but how does the REQ get factored into the override? For instance, the rping example does this: memset(&conn_param, 0, sizeof conn_param); conn_param.responder_resources = 1; conn_param.initiator_depth = 1; ret = rdma_accept(cb->child_cm_id, &conn_param); And rdma_accept does: ret = ucma_valid_param(id_priv, conn_param); [^^ Only checks local device capabilities] ret = ucma_modify_qp_rtr(id, conn_param); [.. then on to ucma_modify_qp_rtr .. ] if (conn_param) qp_attr.max_dest_rd_atomic = conn_param->responder_resources; return ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask); Which just can't be entirely right. The client can specify values that are greater than those specified in the REQ. Since the client doesn't seem to have access to the REQ prior to calling rdma_accept the responsibility to limit the values must fall on librdmacm. Maybe something more like this in ucma_modify_qp_rtr: if (conn_param) { /* Note: at this point qp_attr.max_dest_rd_atomic is REQ.initiator_depth. */ conn_param->responder_resources = min(conn_param->responder_resources, qp_attr.max_dest_rd_atomic, id_priv->cma_dev->max_responder_resources); qp_attr.max_dest_rd_atomic = conn_param->responder_resources; /* Note: at this point qp_attr.max_rd_atomic is REQ.responder_resources. 
*/ conn_param->initiator_depth = min(conn_param->initiator_depth, qp_attr.max_rd_atomic, id_priv->cma_dev->max_initiator_depth); qp_attr.max_rd_atomic = conn_param->initiator_depth; } i.e., consider the REQ values as reported through rdma_init_qp_attr, and limit the user's requested values on the passive side to be no greater than what the remote can do. Also support user passive side control over initiator depth. A similar kind of problem exists in the normal CM. Thanks, Jason From weiny2 at llnl.gov Tue Apr 22 14:06:01 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Tue, 22 Apr 2008 14:06:01 -0700 Subject: [ofa-general] [PATCH] opensm/configure.in: Fix the QOS and prefix routes config file default locations Message-ID: <20080422140601.64764e18.weiny2@llnl.gov> >From ef37654c0917875129fa2bad2e8ee0dd0d3f8859 Mon Sep 17 00:00:00 2001 From: Ira K. Weiny Date: Fri, 18 Apr 2008 15:51:58 -0700 Subject: [PATCH] opensm/configure.in: Fix the QOS and prefix routes config file default locations Signed-off-by: Ira K. 
Weiny --- opensm/configure.in | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/opensm/configure.in b/opensm/configure.in index a527c91..d36d7be 100644 --- a/opensm/configure.in +++ b/opensm/configure.in @@ -162,7 +162,7 @@ AC_ARG_WITH(qos-policy-conf, ) AC_MSG_RESULT($QOS_POLICY_FILE) AC_DEFINE_UNQUOTED(HAVE_DEFAULT_QOS_POLICY_FILE, - ["$OPENSM_CONFIG/$QOS_POLICY_FILE"], + ["$OPENSM_CONFIG_DIR/$QOS_POLICY_FILE"], [Define a QOS policy config file]) AC_SUBST(QOS_POLICY_FILE) @@ -182,7 +182,7 @@ AC_ARG_WITH(prefix-routes-conf, ) AC_MSG_RESULT($PREFIX_ROUTES_FILE) AC_DEFINE_UNQUOTED(HAVE_DEFAULT_PREFIX_ROUTES_FILE, - ["$OPENSM_CONFIG/$PREFIX_ROUTES_FILE"], + ["$OPENSM_CONFIG_DIR/$PREFIX_ROUTES_FILE"], [Define a Prefix Routes config file]) AC_SUBST(PREFIX_ROUTES_FILE) -- 1.5.1 From or.gerlitz at gmail.com Tue Apr 22 14:01:37 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Wed, 23 Apr 2008 00:01:37 +0300 Subject: [ofa-general] arp or ip patch to build a neigh permanent entry for IPoIB In-Reply-To: <1208812763.22166.4.camel@localhost.localdomain> References: <1208812763.22166.4.camel@localhost.localdomain> Message-ID: <15ddcffd0804221401j3d23576eq25304328c72efa15@mail.gmail.com> On 4/22/08, Shirley Ma wrote: > > I am debugging an ipoib ping problem on a cluster. The arp and ip commands > don't support using a 20-byte HW address to build a permanent entry manually. > Can someone give me the pointer to the patch if any? > see http://lists.openfabrics.org/pipermail/general/2006-March/018487.html James, any news on this? Does something need to be patched into ip/arp to make this possible? Or. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdreier at cisco.com Tue Apr 22 14:20:25 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 22 Apr 2008 14:20:25 -0700 Subject: [ofa-general][PATCH] mlx4: Moving db management to mlx4_core (MP support, Patch 1) In-Reply-To: <480D8660.3060001@mellanox.co.il> (Yevgeny Petrilin's message of "Tue, 22 Apr 2008 09:32:00 +0300") References: <480D8660.3060001@mellanox.co.il> Message-ID: thanks, applied, except: > diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h > index ff7df1a..9c87dd3 100644 > --- a/include/linux/mlx4/device.h > +++ b/include/linux/mlx4/device.h > +#include > + > +struct mlx4_user_db_page { > + struct list_head list; > + struct ib_umem *umem; > + unsigned long user_virt; > + int refcnt; > +}; I didn't see any reason to move this into generic core code, so I left it where it was. From arlin.r.davis at intel.com Tue Apr 22 14:28:03 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Tue, 22 Apr 2008 14:28:03 -0700 Subject: [ofa-general] [PATCH 1/1][v1.2] dapl: evd_alloc doesn't check for ib_wait_object_create errors. Message-ID: Fix error check in dapls_ib_wait_object_create() and dat_evd_alloc. When attempting to create large number of evd's that exceed open files limit the error was not propagated up causing a segfault. Note: there are 3 FD's required for each EVD 2 for pipe, and one for ibv_comp_channel. Change the error reporting to indicate correct return code and log with non-debug builds. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_evd_util.c | 5 +++++ dapl/openib_cma/dapl_ib_cq.c | 4 ++-- dapl/openib_cma/dapl_ib_util.h | 4 +--- 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/dapl/common/dapl_evd_util.c b/dapl/common/dapl_evd_util.c index 39a8dd9..36b776c 100644 --- a/dapl/common/dapl_evd_util.c +++ b/dapl/common/dapl_evd_util.c @@ -243,6 +243,11 @@ dapls_evd_alloc ( ((evd_flags & ~ (DAT_EVD_DTO_FLAG|DAT_EVD_RMR_BIND_FLAG)) == 0 )) { dapls_ib_wait_object_create (evd_ptr, &evd_ptr->cq_wait_obj_handle); + if (evd_ptr->cq_wait_obj_handle == NULL) { + dapl_os_free(evd_ptr, sizeof (DAPL_EVD)); + evd_ptr = NULL; + goto bail; + } } #endif diff --git a/dapl/openib_cma/dapl_ib_cq.c b/dapl/openib_cma/dapl_ib_cq.c index ab4eafc..25b4551 100644 --- a/dapl/openib_cma/dapl_ib_cq.c +++ b/dapl/openib_cma/dapl_ib_cq.c @@ -250,7 +250,7 @@ dapls_ib_cq_alloc(IN DAPL_IA *ia_ptr, channel, 0); if (evd_ptr->ib_cq_handle == IB_INVALID_HANDLE) - return DAT_INSUFFICIENT_RESOURCES; + return(dapl_convert_errno(errno,"create_cq")); /* arm cq for events */ dapls_set_cq_notify(ia_ptr, evd_ptr); @@ -469,7 +469,7 @@ dapls_ib_wait_object_create(IN DAPL_EVD *evd_ptr, bail: dapl_os_free(*p_cq_wait_obj_handle, sizeof(struct _ib_wait_obj_handle)); - + *p_cq_wait_obj_handle = NULL; return(dapl_convert_errno(errno," wait_object_create")); } diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h index 457d26b..93f4fde 100755 --- a/dapl/openib_cma/dapl_ib_util.h +++ b/dapl/openib_cma/dapl_ib_util.h @@ -314,11 +314,9 @@ dapl_convert_errno( IN int err, IN const char *str ) { if (!err) return DAT_SUCCESS; -#if DAPL_DBG if ((err != EAGAIN) && (err != ETIME) && (err != ETIMEDOUT) && (err != EINTR)) - dapl_dbg_log (DAPL_DBG_TYPE_ERR," %s %s\n", str, strerror(err)); -#endif + dapl_log (DAPL_DBG_TYPE_ERR," %s %s\n", str, strerror(err)); switch( err ) { -- 1.5.2.5 From arlin.r.davis at intel.com Tue Apr 22 14:28:19 2008 From: 
arlin.r.davis at intel.com (Arlin Davis) Date: Tue, 22 Apr 2008 14:28:19 -0700 Subject: [ofa-general] [PATCH 1/1][v2.0] dapl: evd_alloc doesn't check for ib_wait_object_create errors. Message-ID: <001c01c8a4bf$c8f3d170$9f97070a@amr.corp.intel.com> Fix error check in dapls_ib_wait_object_create() and dat_evd_alloc. When attempting to create large number of evd's that exceed open files limit the error was not propagated up causing a segfault. Note: there are 3 FD's required for each EVD 2 for pipe, and one for ibv_comp_channel. Change the error reporting to indicate correct return code and log with non-debug builds. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_evd_util.c | 5 +++++ dapl/openib_cma/dapl_ib_cq.c | 4 ++-- dapl/openib_cma/dapl_ib_util.h | 4 +--- 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/dapl/common/dapl_evd_util.c b/dapl/common/dapl_evd_util.c index 2ae1b59..32fbaba 100755 --- a/dapl/common/dapl_evd_util.c +++ b/dapl/common/dapl_evd_util.c @@ -301,6 +301,11 @@ dapls_evd_alloc ( ((evd_flags & ~ (DAT_EVD_DTO_FLAG|DAT_EVD_RMR_BIND_FLAG)) == 0 )) { dapls_ib_wait_object_create (evd_ptr, &evd_ptr->cq_wait_obj_handle); + if (evd_ptr->cq_wait_obj_handle == NULL) { + dapl_os_free(evd_ptr, sizeof (DAPL_EVD)); + evd_ptr = NULL; + goto bail; + } } #endif diff --git a/dapl/openib_cma/dapl_ib_cq.c b/dapl/openib_cma/dapl_ib_cq.c index f63c9a7..d7b3309 100755 --- a/dapl/openib_cma/dapl_ib_cq.c +++ b/dapl/openib_cma/dapl_ib_cq.c @@ -239,7 +239,7 @@ dapls_ib_cq_alloc(IN DAPL_IA *ia_ptr, channel, 0); if (evd_ptr->ib_cq_handle == IB_INVALID_HANDLE) - return DAT_INSUFFICIENT_RESOURCES; + return(dapl_convert_errno(errno,"create_cq")); /* arm cq for events */ dapls_set_cq_notify(ia_ptr, evd_ptr); @@ -458,7 +458,7 @@ dapls_ib_wait_object_create(IN DAPL_EVD *evd_ptr, bail: dapl_os_free(*p_cq_wait_obj_handle, sizeof(struct _ib_wait_obj_handle)); - + *p_cq_wait_obj_handle = NULL; return(dapl_convert_errno(errno," 
wait_object_create")); } diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h index 370f3b1..71593fd 100755 --- a/dapl/openib_cma/dapl_ib_util.h +++ b/dapl/openib_cma/dapl_ib_util.h @@ -305,11 +305,9 @@ dapl_convert_errno( IN int err, IN const char *str ) { if (!err) return DAT_SUCCESS; -#if DAPL_DBG if ((err != EAGAIN) && (err != ETIME) && (err != ETIMEDOUT) && (err != EINTR)) - dapl_dbg_log (DAPL_DBG_TYPE_ERR," %s %s\n", str, strerror(err)); -#endif + dapl_log (DAPL_DBG_TYPE_ERR," %s %s\n", str, strerror(err)); switch( err ) { -- 1.5.2.5 From mashirle at us.ibm.com Tue Apr 22 07:17:28 2008 From: mashirle at us.ibm.com (Shirley Ma) Date: Tue, 22 Apr 2008 07:17:28 -0700 Subject: [ofa-general] Re: [PATCH] IPoIB 4K MTU support In-Reply-To: References: <1208681551.5271.11.camel@localhost.localdomain> Message-ID: <1208873848.14172.1.camel@localhost.localdomain> Hello Roland, On Tue, 2008-04-22 at 13:46 -0700, Roland Dreier wrote: > Thanks, applied with some cleanups as below. Thanks! > As an aside, in the case where we need to use a fragment in the receive > skb, does it make sense to make the initial linear part bigger so the > TCP and IP headers fit there (and the kernel doesn't have to look into > the fragment list to handle the packet)? We can improve this later. > Also, is there any clean way where a kernel with PAGE_SIZE > 4096 can > have ud_need_sg evaluate to 0 at compile time, so that all the unneeded > code can be thrown out by the compiler? > > > + return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 1 : 0; > > I've never understood this style: it makes no sense to do > > return bool ? 1 : 0; > > instead of just > > return bool; You are right. 
> > +static inline void ipoib_ud_dma_unmap_rx(struct ipoib_dev_priv *priv, > > + u64 mapping[IPOIB_UD_RX_SG]) > > +{ > > + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { > > + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE, DMA_FROM_DEVICE); > > + ib_dma_unmap_page(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE); > > + } else > > + ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_BUF_SIZE(priv->max_ib_mtu), DMA_FROM_DEVICE); > > +} > > + > > +static inline void ipoib_ud_skb_put_frags(struct ipoib_dev_priv *priv, > > + struct sk_buff *skb, > > + unsigned int length) > > +{ > > + if (ipoib_ud_need_sg(priv->max_ib_mtu)) { > > + skb_frag_t *frag = &skb_shinfo(skb)->frags[0]; > > + /* > > + * There is only two buffers needed for max_payload = 4K, > > + * first buf size is IPOIB_UD_HEAD_SIZE > > + */ > > + skb->tail += IPOIB_UD_HEAD_SIZE; > > + frag->size = length - IPOIB_UD_HEAD_SIZE; > > + skb->data_len += frag->size; > > + skb->truesize += frag->size; > > + skb->len += length; > > + } else > > + skb_put(skb, length); > > + > > +} > > These are pretty big to put in a header file as inlines... I moved them > to the only .c file where they're used. > > - R. Right. I should have moved it into .c file from Or's comment. I forgot. Thanks. Shirley From sean.hefty at intel.com Tue Apr 22 15:23:30 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 15:23:30 -0700 Subject: [Fwd: [ofa-general] More responder_resources problems] In-Reply-To: <20080422210049.GA17925@obsidianresearch.com> References: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> <20080422210049.GA17925@obsidianresearch.com> Message-ID: <000101c8a4c7$7ed16b40$94248686@amr.corp.intel.com> >Yes, but the actual programming of the values into the QP is done by >cm_init_qp_rtr_attr/cm_init_qp_rts_attr (well, in many cases) - which >takes the values from the rep/req directly, without modification. 
The values exchanged in the REP are saved to cm_id_priv. Those values are used. The passive side ULP is responsible for using the correct value. Either by returning what was sent in the REQ, or by adjusting the values down. Note that the active side will see the values in the REP and can reject the connection if they are set too large. >There is a bug here, it just isn't really obvious to me where the >fixes should go to match the CM design. I was imagining that cm.c >would adjust the REQ after reception, but there may be some downsides >to that? The CM does adjust the value in the cm_id_priv structure based on the REP. >All that I see in here is switching REQ's responder_resources value >into the REQ's initiator_depth value (and vice versa) it does not >limit it. The limits are left up to the ULP. Maybe the problem is that the ULPs are not validating the limits? >Well, I see how the override gets into the REP, but how does the REQ >get factored into the override? For instance, the rping example does >this: > > memset(&conn_param, 0, sizeof conn_param); > conn_param.responder_resources = 1; > conn_param.initiator_depth = 1; > ret = rdma_accept(cb->child_cm_id, &conn_param); > >And rdma_accept does: > > ret = ucma_valid_param(id_priv, conn_param); > [^^ Only checks local device capabilities] This is a sanity check only, intended to help catch errors sooner. Since it is also used on the active side before sending a REQ, it can only check against the local device capabilities. The sanity check could be expanded, but I don't see a strong reason to add it. The modify QP operations will fail later if the specified values are too large. > ret = ucma_modify_qp_rtr(id, conn_param); >[.. then on to ucma_modify_qp_rtr .. ] > if (conn_param) > qp_attr.max_dest_rd_atomic = conn_param->responder_resources; > return ibv_modify_qp(id->qp, &qp_attr, qp_attr_mask); > >Which just can't be entirely right. The client can specify values that >are greater than those specified in the REQ. 
Since the client doesn't >seem to have access to the REQ prior to calling rdma_accept the >responsibility to limit the values must fall on librdmacm. The rdma_conn_param structure reported as part of a connection event carries the initiator_depth and responder_resources fields in the REQ. Yes, the client can specify values that were greater than those in the REQ, but those values may technically still work. >Maybe something more like this in ucma_modify_qp_rtr: > >if (conn_param) { > /* Note: at this point qp_attr.max_dest_rd_atomic is > REQ.initiator_depth. */ > conn_param->responder_resources = min(conn_param->responder_resources, > qp_attr.max_dest_rd_atomic, > id_priv->cma_dev->max_responder_resources); > qp_attr.max_dest_rd_atomic = conn_param->responder_resources; > > /* Note: at this point qp_attr.max_rd_atomic is > REQ.responder_resources. */ > conn_param->initiator_depth = min(conn_param->initiator_depth, > qp_attr.max_rd_atomic, > id_priv->cma_dev->max_initiator_depth); > qp_attr.max_rd_atomic = conn_param->initiator_depth; >} > >ie, consider the REQ values as reported through rdma_init_qp_attr, >and limit the user's requested values on the passive side to be no >greater than what the remote can do. Also support user passive side control over initiator depth. This is there today. I think I'm missing whatever problem you're seeing. - Sean From rdreier at cisco.com Tue Apr 22 15:29:51 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 22 Apr 2008 15:29:51 -0700 Subject: [ofa-general] [PATCH/RFC] RDMA/nes: Print IPv4 addresses in a readable format Message-ID: Use NIPQUAD_FMT instead of printing raw 32-bit hex quantities in debugging output. 
Signed-off-by: Roland Dreier --- drivers/infiniband/hw/nes/nes.c | 5 +++-- drivers/infiniband/hw/nes/nes_cm.c | 13 +++++++------ drivers/infiniband/hw/nes/nes_utils.c | 4 +++- 3 files changed, 13 insertions(+), 9 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index c0671ad..a4e9269 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -139,8 +139,9 @@ static int nes_inetaddr_event(struct notifier_block *notifier, addr = ntohl(ifa->ifa_address); mask = ntohl(ifa->ifa_mask); - nes_debug(NES_DBG_NETDEV, "nes_inetaddr_event: ip address %08X, netmask %08X.\n", - addr, mask); + nes_debug(NES_DBG_NETDEV, "nes_inetaddr_event: ip address " NIPQUAD_FMT + ", netmask " NIPQUAD_FMT ".\n", + HIPQUAD(addr), HIPQUAD(mask)); list_for_each_entry(nesdev, &nes_dev_list, list) { nes_debug(NES_DBG_NETDEV, "Nesdev list entry = 0x%p. (%s)\n", nesdev, nesdev->netdev[0]->name); diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index b53bceb..38ea14c 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -852,8 +852,8 @@ static struct nes_cm_node *find_node(struct nes_cm_core *cm_core, /* get a handle on the hte */ hte = &cm_core->connected_nodes; - nes_debug(NES_DBG_CM, "Searching for an owner node:%x:%x from core %p->%p\n", - loc_addr, loc_port, cm_core, hte); + nes_debug(NES_DBG_CM, "Searching for an owner node: " NIPQUAD_FMT ":%x from core %p->%p\n", + HIPQUAD(loc_addr), loc_port, cm_core, hte); /* walk list and find cm_node associated with this session ID */ spin_lock_irqsave(&cm_core->ht_lock, flags); @@ -902,8 +902,8 @@ static struct nes_cm_listener *find_listener(struct nes_cm_core *cm_core, } spin_unlock_irqrestore(&cm_core->listen_list_lock, flags); - nes_debug(NES_DBG_CM, "Unable to find listener- %x:%x\n", - dst_addr, dst_port); + nes_debug(NES_DBG_CM, "Unable to find listener for " NIPQUAD_FMT ":%x\n", + HIPQUAD(dst_addr), 
dst_port); /* no listener */ return NULL; @@ -1067,8 +1067,9 @@ static struct nes_cm_node *make_cm_node(struct nes_cm_core *cm_core, cm_node->loc_port = cm_info->loc_port; cm_node->rem_port = cm_info->rem_port; cm_node->send_write0 = send_first; - nes_debug(NES_DBG_CM, "Make node addresses : loc = %x:%x, rem = %x:%x\n", - cm_node->loc_addr, cm_node->loc_port, cm_node->rem_addr, cm_node->rem_port); + nes_debug(NES_DBG_CM, "Make node addresses : loc = " NIPQUAD_FMT ":%x, rem = " NIPQUAD_FMT ":%x\n", + HIPQUAD(cm_node->loc_addr), cm_node->loc_port, + HIPQUAD(cm_node->rem_addr), cm_node->rem_port); cm_node->listener = listener; cm_node->netdev = nesvnic->netdev; cm_node->cm_id = cm_info->cm_id; diff --git a/drivers/infiniband/hw/nes/nes_utils.c b/drivers/infiniband/hw/nes/nes_utils.c index f9db07c..c6d5631 100644 --- a/drivers/infiniband/hw/nes/nes_utils.c +++ b/drivers/infiniband/hw/nes/nes_utils.c @@ -660,7 +660,9 @@ int nes_arp_table(struct nes_device *nesdev, u32 ip_addr, u8 *mac_addr, u32 acti /* DELETE or RESOLVE */ if (arp_index == nesadapter->arp_table_size) { - nes_debug(NES_DBG_NETDEV, "mac address not in ARP table - cannot delete or resolve\n"); + nes_debug(NES_DBG_NETDEV, "MAC for " NIPQUAD_FMT " not in ARP table - cannot %s\n", + HIPQUAD(ip_addr), + action == NES_ARP_RESOLVE ? "resolve" : "delete"); return -1; } -- 1.5.5.1 From andrea at qumranet.com Tue Apr 22 15:35:45 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 00:35:45 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: Message-ID: <20080422223545.GP24536@duo.random> On Tue, Apr 22, 2008 at 01:19:29PM -0700, Christoph Lameter wrote: > 3. As noted by Eric and also contained in private post from yesterday by > me: The cmp function needs to retrieve the value before > doing comparisons which is not done for the == of a and b. I retrieved the value, which is why mm_lock works perfectly on #v13 as well as #v12. 
It's not mandatory to ever return 0, so it won't produce any runtime error (there is a bugcheck for wrong sort ordering in my patch just in case it would generate any runtime error and it never did, or I would have noticed before submission), which is why I didn't need to release any hotfix yet and I'm waiting more time to get more comments before sending an update to clean up that bit. Mentioning this as the third and last point I guess shows how strong your arguments are against merging my mmu-notifier-core now, so in the end making that cosmetic error paid off somehow. I'll send an update in any case to Andrew way before Saturday so hopefully we'll finally get mmu-notifiers-core merged before next week. Also I'm not updating my mmu-notifier-core patch anymore except for strict bugfixes so don't worry about any more cosmetic bugs being introduced while optimizing the code like it happened this time. The only other change I did has been to move mmu_notifier_unregister at the end of the patchset after getting more questions about its reliability and I documented a bit the rmmod requirements for ->release. we'll think later if it makes sense to add it, nobody's using it anyway. From andrea at qumranet.com Tue Apr 22 15:37:27 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 00:37:27 +0200 Subject: [ofa-general] Re: [PATCH 03 of 12] get_task_mm should not succeed if mmput() is running and has reduced In-Reply-To: References: Message-ID: <20080422223727.GQ24536@duo.random> On Tue, Apr 22, 2008 at 01:23:16PM -0700, Christoph Lameter wrote: > Missing signoff by you. I thought I had to sign off if I contributed with anything that could resemble copyright? Given I only merged that patch, I can add an Acked-by if you like, but merging this in my patchset was already an implicit ack ;-). 
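The comparator issue in the mm_lock exchange above — fetch the pointed-to values before comparing, and decide deliberately what equal elements return — can be sketched as a qsort(3)-style comparator. This is a hypothetical userspace stand-in, not the actual patch code; names are illustrative.

```c
/* qsort(3)-style comparator for an array of pointers (e.g. lock
 * addresses, as mm_lock sorts them).  Hypothetical sketch, not the
 * code under review.
 */
static int ptr_cmp(const void *a, const void *b)
{
	/* Fetch the pointed-to values first: comparing a and b
	 * themselves would order the array slots, not their contents. */
	unsigned long va = (unsigned long)*(void *const *)a;
	unsigned long vb = (unsigned long)*(void *const *)b;

	/* Explicit three-way compare.  Returning (int)(va - vb) could
	 * overflow or truncate on 64-bit, inverting the sort order. */
	if (va < vb)
		return -1;
	if (va > vb)
		return 1;
	return 0;	/* duplicates compare equal; sort leaves them adjacent */
}
```

Usage would be `qsort(locks, n, sizeof(void *), ptr_cmp);`. As Andrea notes, never returning 0 is tolerable for correctness of the sort itself, but the explicit equal case costs nothing and keeps the comparator a valid total order.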
From andrea at qumranet.com Tue Apr 22 15:40:48 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 00:40:48 +0200 Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: References: Message-ID: <20080422224048.GR24536@duo.random> On Tue, Apr 22, 2008 at 01:24:21PM -0700, Christoph Lameter wrote: > Reverts a part of an earlier patch. Why isnt this merged into 1 of 12? To give zero regression risk to 1/12 when MMU_NOTIFIER=y or =n and the mmu notifiers aren't registered by GRU or KVM. Keep in mind that the whole point of my proposed patch ordering from day 0, is to keep as 1/N, the absolutely minimum change that fully satisfy GRU and KVM requirements. 4/12 isn't required by GRU/KVM so I keep it in a later patch. I now moved mmu_notifier_unregister in a later patch too for the same reason. From andrea at qumranet.com Tue Apr 22 15:43:52 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 00:43:52 +0200 Subject: [ofa-general] Re: [PATCH 02 of 12] Fix ia64 compilation failure because of common code include bug In-Reply-To: References: <3c804dca25b15017b220.1208872278@duo.random> Message-ID: <20080422224352.GS24536@duo.random> On Tue, Apr 22, 2008 at 01:22:55PM -0700, Christoph Lameter wrote: > Looks like this is not complete. There are numerous .h files missing which > means that various structs are undefined (fs.h and rmap.h are needed > f.e.) which leads to surprises when dereferencing fields of these struct. > > It seems that mm_types.h is expected to be included only in certain > contexts. Could you make sure to include all necessary .h files? Or add > some docs to clarify the situation here. Robin, what other changes did you need to compile? I only did that one because I didn't hear any more feedback from you after I sent that patch, so I assumed it was enough. 
From sean.hefty at intel.com Tue Apr 22 15:46:51 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 15:46:51 -0700 Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets Message-ID: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> I have a need to start looking at possible ways to map IP address to GIDs when crossing IP (and IB) subnets. This would be in addition to or replace the ARP use by the rdma_cm. Possibilities include: * Use some standard address mapping protocol that I'm not aware of. * Use global IB service resolution. * Define/extend an address resolution protocol that operates over IP. * Define/extend an address resolution protocol that operates over UDP. I'm hoping that someone has a wonderfully brilliant idea for this that would take about 1 day to implement. :) - Sean From andrea at qumranet.com Tue Apr 22 15:54:24 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 00:54:24 +0200 Subject: [ofa-general] Re: [PATCH 10 of 12] Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock In-Reply-To: References: Message-ID: <20080422225424.GT24536@duo.random> On Tue, Apr 22, 2008 at 01:26:13PM -0700, Christoph Lameter wrote: > Doing the right patch ordering would have avoided this patch and allow > better review. I didn't actually write this patch myself. This did it instead: s/anon_vma_lock/anon_vma_sem/ s/i_mmap_lock/i_mmap_sem/ s/locks/sems/ s/spinlock_t/struct rw_semaphore/ so it didn't look a big deal to redo it indefinitely. The right patch ordering isn't necessarily the one that reduces the total number of lines in the patchsets. The mmu-notifier-core is already converged and can go in. The rest isn't converged at all... nearly nobody commented on the other part (the few comments so far were negative), so there's no good reason to delay indefinitely what is already converged, given it's already feature complete for certain users of the code. 
My patch ordering looks more natural to me. What is finished goes in, the rest is orthogonal anyway. From holt at sgi.com Tue Apr 22 16:07:27 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 22 Apr 2008 18:07:27 -0500 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080422223545.GP24536@duo.random> References: <20080422223545.GP24536@duo.random> Message-ID: <20080422230727.GR30298@sgi.com> > The only other change I did has been to move mmu_notifier_unregister > at the end of the patchset after getting more questions about its > reliability and I documented a bit the rmmod requirements for > ->release. we'll think later if it makes sense to add it, nobody's > using it anyway. XPMEM is using it. GRU will be as well (probably already does). From holt at sgi.com Tue Apr 22 16:07:58 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 22 Apr 2008 18:07:58 -0500 Subject: [ofa-general] Re: [PATCH 02 of 12] Fix ia64 compilation failure because of common code include bug In-Reply-To: <20080422224352.GS24536@duo.random> References: <3c804dca25b15017b220.1208872278@duo.random> <20080422224352.GS24536@duo.random> Message-ID: <20080422230758.GS30298@sgi.com> On Wed, Apr 23, 2008 at 12:43:52AM +0200, Andrea Arcangeli wrote: > On Tue, Apr 22, 2008 at 01:22:55PM -0700, Christoph Lameter wrote: > > Looks like this is not complete. There are numerous .h files missing which > > means that various structs are undefined (fs.h and rmap.h are needed > > f.e.) which leads to surprises when dereferencing fields of these struct. > > > > It seems that mm_types.h is expected to be included only in certain > > contexts. Could you make sure to include all necessary .h files? Or add > > some docs to clarify the situation here. > > Robin, what other changes did you need to compile? I only did that one > because I didn't hear any more feedback from you after I sent that > patch, so I assumed it was enough. It was perfect. Nothing else was needed. 
Thanks, Robin From clameter at sgi.com Tue Apr 22 16:13:20 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 16:13:20 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 03 of 12] get_task_mm should not succeed if mmput() is running and has reduced In-Reply-To: <20080422223727.GQ24536@duo.random> References: <20080422223727.GQ24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > On Tue, Apr 22, 2008 at 01:23:16PM -0700, Christoph Lameter wrote: > > Missing signoff by you. > > I thought I had to signoff if I conributed with anything that could > resemble copyright? Given I only merged that patch, I can add an > Acked-by if you like, but merging this in my patchset was already an > implicit ack ;-). No you have to include a signoff if the patch goes through your custody chain. This one did. Also add a From: Christoph Lameter somewhere if you want to signify that the patch came from me. From clameter at sgi.com Tue Apr 22 16:14:26 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 16:14:26 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: <20080422224048.GR24536@duo.random> References: <20080422224048.GR24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > On Tue, Apr 22, 2008 at 01:24:21PM -0700, Christoph Lameter wrote: > > Reverts a part of an earlier patch. Why isnt this merged into 1 of 12? > > To give zero regression risk to 1/12 when MMU_NOTIFIER=y or =n and the > mmu notifiers aren't registered by GRU or KVM. Keep in mind that the > whole point of my proposed patch ordering from day 0, is to keep as > 1/N, the absolutely minimum change that fully satisfy GRU and KVM > requirements. 4/12 isn't required by GRU/KVM so I keep it in a later > patch. I now moved mmu_notifier_unregister in a later patch too for > the same reason. 
We want a full solution, and this kind of patching makes the patches difficult to review because later patches revert earlier ones. From clameter at sgi.com Tue Apr 22 16:19:06 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 16:19:06 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 10 of 12] Convert mm_lock to use semaphores after i_mmap_lock and anon_vma_lock In-Reply-To: <20080422225424.GT24536@duo.random> References: <20080422225424.GT24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > The right patch ordering isn't necessarily the one that reduces the > total number of lines in the patchsets. The mmu-notifier-core is > already converged and can go in. The rest isn't converged at > all... nearly nobody commented on the other part (the few comments so > far were negative), so there's no good reason to delay indefinitely > what is already converged, given it's already feature complete for > certain users of the code. My patch ordering looks more natural to > me. What is finished goes in, the rest is orthogonal anyway. I would not want to review code that is later reverted or essentially changed in later patches. I only review your patches because we have a high interest in the patch. I suspect that others will be more willing to review this material if it were done the right way. If you cannot produce an easily reviewable and properly formatted patchset that follows conventions, then I will have to do it, because we really need to get this merged.
From clameter at sgi.com Tue Apr 22 16:20:35 2008 From: clameter at sgi.com (Christoph Lameter) Date: Tue, 22 Apr 2008 16:20:35 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080422223545.GP24536@duo.random> References: <20080422223545.GP24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > I'll send an update in any case to Andrew way before Saturday so > hopefully we'll finally get mmu-notifiers-core merged before next > week. Also I'm not updating my mmu-notifier-core patch anymore except > for strict bugfixes so don't worry about any more cosmetical bugs > being introduced while optimizing the code like it happened this time. I guess I have to prepare another patchset then? From steiner at sgi.com Tue Apr 22 17:28:49 2008 From: steiner at sgi.com (Jack Steiner) Date: Tue, 22 Apr 2008 19:28:49 -0500 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080422230727.GR30298@sgi.com> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> Message-ID: <20080423002848.GA32618@sgi.com> On Tue, Apr 22, 2008 at 06:07:27PM -0500, Robin Holt wrote: > > The only other change I did has been to move mmu_notifier_unregister > > at the end of the patchset after getting more questions about its > > reliability and I documented a bit the rmmod requirements for > > ->release. we'll think later if it makes sense to add it, nobody's > > using it anyway. > > XPMEM is using it. GRU will be as well (probably already does). Yeppp. The GRU driver unregisters the notifier when all GRU mappings are unmapped. I could make it work either way - either with or without an unregister function. However, unregister is the most logical action to take when all mappings have been destroyed. 
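The register-on-first-mapping / unregister-on-last-unmapping pattern Jack describes can be modeled in plain C. This is only a toy userspace sketch of the bookkeeping (the names are hypothetical and it is not the kernel mmu-notifier API; the commented calls stand in for mmu_notifier_register/unregister):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a GRU-style driver that registers its notifier when the
 * first mapping is created and unregisters when the last is destroyed. */
struct toy_driver {
    int mappings;    /* number of live mappings */
    bool registered; /* is the (hypothetical) notifier registered? */
};

static void toy_create_mapping(struct toy_driver *d)
{
    if (d->mappings++ == 0)
        d->registered = true;  /* stand-in for mmu_notifier_register() */
}

static void toy_destroy_mapping(struct toy_driver *d)
{
    if (--d->mappings == 0)
        d->registered = false; /* stand-in for mmu_notifier_unregister() */
}
```

Without an unregister call, a driver in this position would instead have to wait for its ->release callback at process teardown, which is exactly the rmmod complication discussed above.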
--- jack From steiner at sgi.com Tue Apr 22 17:31:40 2008 From: steiner at sgi.com (Jack Steiner) Date: Tue, 22 Apr 2008 19:31:40 -0500 Subject: [ofa-general] Re: [PATCH 00 of 12] mmu notifier #v13 In-Reply-To: References: Message-ID: <20080423003140.GB32618@sgi.com> On Tue, Apr 22, 2008 at 03:51:16PM +0200, Andrea Arcangeli wrote: > Hello, > > This is the latest and greatest version of the mmu notifier patch #v13. > FWIW, I have updated the GRU driver to use this patch (plus the fixups). No problems. AFAICT, everything works. --- jack From jgunthorpe at obsidianresearch.com Tue Apr 22 17:47:39 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 22 Apr 2008 18:47:39 -0600 Subject: [Fwd: [ofa-general] More responder_resources problems] In-Reply-To: <000101c8a4c7$7ed16b40$94248686@amr.corp.intel.com> References: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> <20080422210049.GA17925@obsidianresearch.com> <000101c8a4c7$7ed16b40$94248686@amr.corp.intel.com> Message-ID: <20080423004739.GB17925@obsidianresearch.com> On Tue, Apr 22, 2008 at 03:23:30PM -0700, Sean Hefty wrote: > >Yes, but the actual programming of the values into the QP is done by > >cm_init_qp_rtr_attr/cm_init_qp_rts_attr (well, in many cases) - which > >takes the values from the rep/req directly, without modification. > > The values exchanged in the REP are saved to cm_id_priv. Those values are used. > The passive side ULP is responsible for using the correct value. Either by > returning what was sent in the REQ, or by adjusting the values down. Note that > the active side will see the values in the REP and can reject the connection if > they are set too large. Ok. Well, if the ULP is responsible, I have yet to see a ULP example, or in-kernel ULP that does it right. Every one ignores the REQ and/or does not limit the REQ's values to the device's capabilities.
The other view is that the CM should just handle this and the ULP should only have the option to further reduce the value. It is not a parameter that affects the operation of the ULP, so having it be lowered is not significant. The actual value can always be queried with ibv_query_qp. I guess that is really what it comes down to: which do you think should be primarily responsible for this, and what should the API be? I can't disagree with you that the ULP should be responsible given the CM API, but that doesn't make it less awkward and annoying.... > >There is a bug here, it just isn't really obvious to me where the > >fixes should go to match the CM design. I was imagining that cm.c > >would adjust the REQ after reception, but there may be some downsides > >to that? > > The CM does adjust the value in the cm_id_priv structure based on the REP. Right, but I'm talking about when the passive side generates the REP. The contents of the REP should exactly match what the passive side QP is set to (i.e. lower than the device capabilities), and always be lower than the values in the REQ. > >All that I see in here is switching REQ's responder_resources value > >into the REQ's initiator_depth value (and vice versa) it does not > >limit it. > > The limits are left up to the ULP. Maybe the problem is that the ULPs are not > validating the limits? That is definitely true. > This is a sanity check only, intended to help catch errors sooner. Since it is > also used on the active side before sending a REQ, it can only check against the > local device capabilities. The sanity check could be expanded, but I don't see > a strong reason to add it. The modify QP operations will fail later if the > specified values are too large. But the whole point of this process is to get a working connection - the responder resources are not a ULP visible item, they are just something that must be negotiated and configured into the QP.
In truth, I can think of no reason for a ULP to use any value other than the device maximum or 0 for these resources. Saying that if the passive side messes up it will just die when the QP is modified is, IMHO, not good enough. > Yes, the client can specify values that were greater than those in > the REQ, but those values may technically still work. I don't see how? The active side may be unable to program the QP to those values, and using an initiator_depth larger than the peer's responder_resources will cause operational problems. The way the spec is written it is pretty much mandatory to limit to the values in the REQ when generating the REP. It would be perfectly conformant (and a good idea) for the active side to refuse to use a REP with larger values than its REQ. > >ie, consider the REQ values as reported through rdma_init_qp_attr, > >and limit the user's requested values on the passive side to be no > >greater than what the remote can do. > > I don't like the idea of reducing the limits without the user's knowledge. I > would rather fail the connection, which is what happens today (either through > the ucma_valid_param() checks or when modifying the QP). That is not entirely true, since the passive side's change overrides the values in the REQ from the active side, which can reduce the value without the user's knowledge. The question really is if you expect the CM to control this for you, or if you expect the ULP to do everything manually. Right now there seems to be a bit of both going on. > >Also support user passive side control over initiator depth. > > This is there today. Where? cma.c never programs max_rd_atomic in the qp. > I think I'm missing seeing whatever problem you're seeing. Well, what I have been interested in (Hal - what is your interest here?) is to use the device maximum and get rid of the hard coded values for responder resources and initiator depth in the ULPs.
This would be to allow some devices to have higher responder resources, based on hardware capability. Limited responder resources cause huge performance problems on high-latency connections. In the process I have observed that the spec is not being followed and there are cases where things go wrong if the two sides are not requesting identical things. I've also observed that the examples of how to use CM and RDMACM do not include the correct behavior. -- Jason Gunthorpe (780)4406067x832 Chief Technology Officer, Obsidian Research Corp Edmonton, Canada From jgunthorpe at obsidianresearch.com Tue Apr 22 20:52:42 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 22 Apr 2008 21:52:42 -0600 Subject: [ofa-general] [PATCH] Fixup handling of responder_resources in cmpost.c example Message-ID: <20080423035242.GA24343@obsidianresearch.com> Sean, For better clarity, here is an example of what I am looking at. This modifies the cmpost example of libibcm to handle responder resources negotiation at the ULP level. So far, all of the in-kernel and example user space CM and RDMA CM consumers I have looked at appear to need a patch like this. This is why I am wondering if moving this whole common process into the kernel and sharing it with all ULPs might be more appropriate.
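Distilled, the passive-side rule the cmpost.c patch applies is just a pair of clamps against the local device limits. As a standalone sketch (the struct and function names here are illustrative; only max_qp_rd_atom and max_qp_init_rd_atom correspond to the ibv_query_device attributes used in the patch):

```c
#include <assert.h>

/* Sketch of passive-side REP value selection: limit what the REQ asks
 * for to the local device capabilities. The kernel has already swapped
 * the REQ fields, so req_responder_resources is what this (passive)
 * side must be able to support as the responder. */
struct rep_limits {
    int responder_resources;
    int initiator_depth;
};

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

static struct rep_limits clamp_rep(int req_responder_resources,
                                   int req_initiator_depth,
                                   int dev_max_qp_rd_atom,
                                   int dev_max_qp_init_rd_atom)
{
    struct rep_limits rep;

    rep.responder_resources = min_int(req_responder_resources,
                                      dev_max_qp_rd_atom);
    rep.initiator_depth = min_int(req_initiator_depth,
                                  dev_max_qp_init_rd_atom);
    return rep;
}
```

With the "initiator depth is 128 and responder resources are 4" HCAs mentioned earlier in the thread, an active side asking for the maximum gets clamped to (4, 128) on the passive side instead of failing the modify-QP later.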
---- Show a more realistic example using maximum responder resources and what is required for that to work: - Limit the responder resources to device capability while producing the REP - Use the device capability values in generating the REQ - Match the passive side QP configuration to the REQ - Notes on initiator_depth and responder_resources value selection Signed-off-by: Jason Gunthorpe --- examples/cmpost.c | 48 +++++++++++++++++++++++++++++++++++++++--------- 1 files changed, 39 insertions(+), 9 deletions(-) diff --git a/examples/cmpost.c b/examples/cmpost.c index a85264b..1d876dd 100644 --- a/examples/cmpost.c +++ b/examples/cmpost.c @@ -50,6 +50,7 @@ struct cmtest { struct ib_cm_device *cm_dev; struct ibv_context *verbs; struct ibv_pd *pd; + struct ibv_device_attr dev_attr; /* cm info */ struct ibv_sa_path_rec path_rec; @@ -106,7 +107,8 @@ static int post_recvs(struct cmtest_node *node) return ret; } -static int modify_to_rtr(struct cmtest_node *node) +static int modify_to_rtr(struct cmtest_node *node, + struct ib_cm_rep_param *rep) { struct ibv_qp_attr qp_attr; int qp_attr_mask, ret; @@ -129,6 +131,10 @@ static int modify_to_rtr(struct cmtest_node *node) return ret; } qp_attr.rq_psn = node->qp->qp_num; + if (rep != NULL) { + qp_attr.max_dest_rd_atomic = rep->responder_resources; + qp_attr.max_rd_atomic = rep->initiator_depth; + } ret = ibv_modify_qp(node->qp, &qp_attr, qp_attr_mask); if (ret) { printf("failed to modify QP to RTR: %d\n", ret); @@ -167,10 +173,27 @@ static void req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) goto error1; node = &test.nodes[test.conn_index++]; + req = &event->param.req_rcvd; + memset(&rep, 0, sizeof rep); + + /* Limit the responder resources requested by the remote to our + capabilities. Note that the kernel swaps req->responder_resources + and req->initiator_depth, so that req->responder_resources + is actually the active side's initiator_depth. 
*/ + rep.responder_resources = req->responder_resources; + if (rep.responder_resources > test.dev_attr.max_qp_rd_atom) + rep.responder_resources = test.dev_attr.max_qp_rd_atom; + + /* Note: If this side of the connection is never going to use + RDMA Read then initiator_depth can be set to 0 here. */ + rep.initiator_depth = req->initiator_depth; + if (rep.initiator_depth > test.dev_attr.max_qp_init_rd_atom) + rep.initiator_depth = test.dev_attr.max_qp_init_rd_atom; + node->cm_id = cm_id; cm_id->context = node; - ret = modify_to_rtr(node); + ret = modify_to_rtr(node,&rep); if (ret) goto error2; @@ -178,13 +201,9 @@ static void req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) if (ret) goto error2; - req = &event->param.req_rcvd; - memset(&rep, 0, sizeof rep); rep.qp_num = node->qp->qp_num; rep.srq = (node->qp->srq != NULL); rep.starting_psn = node->qp->qp_num; - rep.responder_resources = req->responder_resources; - rep.initiator_depth = req->initiator_depth; rep.target_ack_delay = 20; rep.flow_control = req->flow_control; rep.rnr_retry_count = req->rnr_retry_count; @@ -207,7 +226,7 @@ static void rep_handler(struct cmtest_node *node, struct ib_cm_event *event) { int ret; - ret = modify_to_rtr(node); + ret = modify_to_rtr(node,0); if (ret) goto error; @@ -428,6 +447,9 @@ static int init(void) if (!test.verbs) return -1; + if (ibv_query_device(test.verbs,&test.dev_attr) != 0) + return -1; + test.cm_dev = ib_cm_open_device(test.verbs); if (!test.cm_dev) return -1; @@ -671,8 +693,16 @@ static void run_client(char *dst) memset(&req, 0, sizeof req); req.primary_path = &test.path_rec; req.service_id = __cpu_to_be64(0x1000); - req.responder_resources = 1; - req.initiator_depth = 1; + + /* When choosing the responder resources for a ULP, it is usually best + to use the maximum value of the HCA. 
If the other side is not going + to use RDMA READ then it should zero out initator_depth in the REP + which will zero out the local responder_resources when we program + the QP. Generally, initiator_depth should be either set to 0 or + min(max_qp_rd_atom,max_send_wr). Use 0 if RDMA READ is never going + to be sent from this side. */ + req.responder_resources = test.dev_attr.max_qp_rd_atom; + req.initiator_depth = test.dev_attr.max_qp_init_rd_atom; req.remote_cm_response_timeout = 20; req.local_cm_response_timeout = 20; req.retry_count = 5; -- 1.5.4.2 From sean.hefty at intel.com Tue Apr 22 20:58:21 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 20:58:21 -0700 Subject: [Fwd: [ofa-general] More responder_resources problems] In-Reply-To: <20080423004739.GB17925@obsidianresearch.com> References: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> <20080422210049.GA17925@obsidianresearch.com> <000101c8a4c7$7ed16b40$94248686@amr.corp.intel.com> <20080423004739.GB17925@obsidianresearch.com> Message-ID: <000001c8a4f6$45cad550$92fd070a@amr.corp.intel.com> >Ok.. Well, if the ULP is responsible, I have yet to see a ULP example, >or in-kernel ULP that does it right. Every one ignores the REQ and/or does >not limit the REQ's values to the devices capabilities. I believe that DAPL does negotiate the values correctly. But see the end of this email for a way to simply things for the ULPs. >But the whole point of this process is to get a working connection - >the responder resources are not a ULP visible item, they are just >something that must be negotiated and configured into the QP. In >truth, I can think of no reason for a ULP to use any value other than >the device maximum or 0 for these resources. Saying that if the >passive side messes up it will just die when the QP is modified is, >IMHO, not good enough. For the IB CM, the policy controlling the use of those fields is given to the ULP. 
A check could be added to ib_send_cm_rep to fail if the ULP tries to use a value higher than that in the REQ. I would not have the CM automatically replace the user's values with its own. For the RDMA CM, there's no guarantee that the initiator_depth and responder_resources are available in the connection request. With iWarp, the values are not available unless embedded somewhere in the private data. >That is not entirely true, since the passive side's change overrides >the values in the REQ from the active side, which can reduce the value >without the user's knowledge. The question really is if you expect the >CM to control this for you, or if you expect the ULP to do everything >manually. Right now there seems to be a bit of both going on. The values in the REP are set by one user and given to the other. Just because the ULP ignores the value doesn't mean that it's hidden. The ULP really should control the policy on how to respond to a REQ or REP based on the values that are carried. >> >Also support user passive side control over initiator depth. >> >> This is there today. > >Where? cma.c never programs max_rd_atomic in the qp. rdma_accept() takes the responder_resources and initiator_depth as part of its input parameter. These are passed to the CM, which end up being used when getting the modify QP attributes. >Well, what I have been interested in (Hal - what is your interest >here?) is to use the device maximum and get rid of the hard coded >values for responder resources and initiator depth in the ULPs. This >would be to allow some devices to have higher responder resources, >based on hardware capability. Limited responder resources cause huge >performance problems on high-latency connections. To make it easier on the active side, we could allow the user to specify some 'MAX_RDMA' value that either the rdma cm or ib cm can key off of. The cm could then request initiator_depth and responder_resources based on the local HW maximums.
The passive side could also specify MAX_RDMA, which for IB would negotiate down to the values in the REQ and the local HW resources. This doesn't really work for iWarp, but then unless the data is exchanged as part of the private data, the best that the cm could do is guess based on the local HW maximums. In practice, this would probably work the majority of the time though. - Sean From mashirle at us.ibm.com Tue Apr 22 13:03:57 2008 From: mashirle at us.ibm.com (Shirley Ma) Date: Tue, 22 Apr 2008 13:03:57 -0700 Subject: [ofa-general] arp or ip patch to build a neigh permanent entry for IPoIB In-Reply-To: <15ddcffd0804221401j3d23576eq25304328c72efa15@mail.gmail.com> References: <1208812763.22166.4.camel@localhost.localdomain> <15ddcffd0804221401j3d23576eq25304328c72efa15@mail.gmail.com> Message-ID: <1208894637.14172.9.camel@localhost.localdomain> Thanks, Or. These kinds of patches should go upstream and be picked up by distros. Shirley From jgunthorpe at obsidianresearch.com Tue Apr 22 21:21:15 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Tue, 22 Apr 2008 22:21:15 -0600 Subject: [Fwd: [ofa-general] More responder_resources problems] In-Reply-To: <000001c8a4f6$45cad550$92fd070a@amr.corp.intel.com> References: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> <20080422210049.GA17925@obsidianresearch.com> <000101c8a4c7$7ed16b40$94248686@amr.corp.intel.com> <20080423004739.GB17925@obsidianresearch.com> <000001c8a4f6$45cad550$92fd070a@amr.corp.intel.com> Message-ID: <20080423042115.GC27470@obsidianresearch.com> On Tue, Apr 22, 2008 at 08:58:21PM -0700, Sean Hefty wrote: > >But the whole point of this process is to get a working connection - > >the responder resources are not a ULP visible item, they are just > >something that must be negotiated and configured into the QP. In > >truth, I can think of no reason for a ULP to use any value other than > >the device maximum or 0 for these resources.
Saying that if the > >passive side messes up it will just die when the QP is modified is, > >IMHO, not good enough. > > For the IB CM, the policy controlling the use of those fields is given to the > ULP. A check could be added to ib_send_cm_rep to fail if the ULP tries to use a > value higher than that in the REQ. I would not have the CM automatically > replace the user's values with its own. Well, what if we just made this simpler for the ULP? The kernel, when it receives a REQ, will modify the values as it swaps them so they do not exceed the device maximum. The ULP can then further modify them if it wants, but does not have to do anything more than copy them into the REP to get correct function. This seems to handle the ULPs I have looked at. > For the RDMA CM, there's no guarantee that the initiator_depth and > responder_resources are available in the connection request. With iWarp, the > values are not available unless embedded somewhere in the private data. I am told that iWarp does not have this concept. The iWarp protocol does not require a limit on the number of un-acked RDMA READs/Atomics in flight. Only IB does, so ignoring the values entirely on iWarp seems fine to me. > >Where? cma.c never programs max_rd_atomic in the qp. > > rdma_accept() takes the responder_resources and initiator_depth as part of its > input parameter. These are passed to the CM, which end up being used when > getting the modify QP attributes. Hmmmmm, so that goes into the kernel cm_format_req_event, which saves it for cm_init_qp_rts_attr to later recover. Gotcha. It is unfortunate that the RTS transition cannot set both initiator_depth and responder_resources; it makes this awkward in the ULP. > >Well, what I have been interested in (Hal - what is your interest > >here?) is to use the device maximum and get rid of the hard coded > >values for responder resources and initiator depth in the ULPs.
This > >would be to allow some devices to have higher responder resources, > >based on hardware capability. Limited responder resources cause huge > >performance problems on high-latency connections. > > To make it easier on the active side, we could allow the user to specify some > 'MAX_RDMA' value that either the rdma cm or ib cm can key off of. The cm could > then request initiator_depth and responder_resources based on the local HW > maximums. The passive side could also specify MAX_RDMA, which for IB would > negotiate down to the values in the REQ and the local HW resources. Just setting the value to maximum in the REQ is not enough without the passive side limiting it to the device capabilities. That is where I started - it is easy to query the device and get the maximum, but just putting those values in the REQ causes one side to try to use more responder resources than it has. (initiator depth is 128 and responder resources are 4 in my test HCAs here) I do think that a MAX_RDMA value for the rdmacm especially is a pretty good idea. The rdma cm is already holding onto the device attributes structure. It could also automatically limit it based on the sendq length. Jason From sean.hefty at intel.com Tue Apr 22 21:51:53 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 22 Apr 2008 21:51:53 -0700 Subject: [Fwd: [ofa-general] More responder_resources problems] In-Reply-To: <20080423042115.GC27470@obsidianresearch.com> References: <1208888819.689.38.camel@hrosenstock-ws.xsigo.com> <000801c8a4ad$791313d0$40fc070a@amr.corp.intel.com> <20080422210049.GA17925@obsidianresearch.com> <000101c8a4c7$7ed16b40$94248686@amr.corp.intel.com> <20080423004739.GB17925@obsidianresearch.com> <000001c8a4f6$45cad550$92fd070a@amr.corp.intel.com> <20080423042115.GC27470@obsidianresearch.com> Message-ID: <000601c8a4fd$c021a250$92fd070a@amr.corp.intel.com> >Well, what if we just made this simpler for the ULP?
The kernel, when >it receives and REQ will modify the values as it swaps them so they do >not exceed the device maximum. The ULP can then further modify them if >it wants, but does not have to do anything more than copy them into >the REP to get correct function. This seems to handle the ULPs I have >looked at.. I had thought about this, but I'm hesitant to mask the requested values that were specified by the remote ULP. (Maybe the ULP can connect on a different device?) This does seem like the simplest solution though, and I have to stretch to think of a ULP that wouldn't like this behavior. >Just setting the value to maximum in the REQ is not enough without the >passive side limiting it to the device capabilities. That is where I >started - it is easy to query to device and get the maximum, but just >putting those values in the REQ causes one side to try to use more >responder resources than it has. (initiator depth is 128 and responder >resources are 4 in my test HCAs here) I was suggesting that the passive side could also use MAX_RDMA, but that doesn't remove the requirement that the passive side figure out the correct responder_resources value in order to transition to RTR. - Sean From yevgenyp at mellanox.co.il Tue Apr 22 22:53:02 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 08:53:02 +0300 Subject: [ofa-general][PATCH] mlx4: Prereserved Qp regions (MP support, Patch4) In-Reply-To: <000201c8a4a6$334addd0$40fc070a@amr.corp.intel.com> References: <480D8803.1050404@mellanox.co.il> <000201c8a4a6$334addd0$40fc070a@amr.corp.intel.com> Message-ID: <480ECEBE.5030706@mellanox.co.il> Sean Hefty wrote: >> We reserve Qp ranges to be used by other modules in case >> the ports come up as Ethernet ports. >> The qps are reserved at the end of the QP table. >> (This way we assure that they are alligned to their size) > > Can you explain this in more detail? What are the 'other modules'? Are you > reserving specific QP numbers? 
Are the QPs only reserved when running over > Ethernet? Why is this done/needed exactly? > > I don't really understand the alignment comment, but that's a separate issue for > me. > > - Sean > > Those ranges are always reserved, because the port protocol can change at runtime. One example of this requirement is address steering: we need an RX queue for every combination of MAC and VLAN (a 128x128 table). The QPs are reserved at the end of the QP table. --Yevgeny From ogerlitz at voltaire.com Tue Apr 22 23:35:44 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 23 Apr 2008 09:35:44 +0300 Subject: [ofa-general] CM ID In-Reply-To: References: Message-ID: <480ED8C0.2020702@voltaire.com> Roland Dreier wrote: > It doesn't really make sense to use any verbs before you have resolved > the address, because you don't know which device will be used until the > address is used. Philip, Re the passive side: if your listener binds to a specific IP address, then after rdma_bind_address() returns the verbs pointer is in place to use. If you bind to IPADDR_ANY, then you would have to serve connection requests arriving from all active ports on this system, where for each one of them the rdma cm will create an ID which is associated with the (verbs) device through which this REQ arrived. As for the active side, after rdma_resolve_addr returns you can create the PD, MR, CQ, etc. resources or attach this session to ones used by other sessions. In case you use the connected service of the rdma cm, you must create a QP per connection. Or. From ogerlitz at voltaire.com Wed Apr 23 00:04:29 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 23 Apr 2008 10:04:29 +0300 Subject: [ofa-general][PATCH] mlx4: Moving db management to mlx4_core (MP support, Patch 1) In-Reply-To: <480D8660.3060001@mellanox.co.il> References: <480D8660.3060001@mellanox.co.il> Message-ID: <480EDF7D.4070103@voltaire.com> Yevgeny Petrilin wrote: > >From d0d0ac877ab47f3a8a5f1564e5c48f53245583b9 Mon Sep 17 00:00:00 2001 > From: Yevgeny Petrilin > Date: Mon, 21 Apr 2008 10:10:01 +0300 > Subject: [PATCH] mlx4: Moving db management to mlx4_core Hi Yevgeny, Can you use a [PATCH m/n v3] or similar syntax in the subject line of the patches? It would be much easier to review and work with your patch sets this (common) way. Also, I wasn't sure against what git tree/branch they are being generated; can you clarify that? Or.
From yevgenyp at mellanox.co.il Wed Apr 23 03:01:21 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 13:01:21 +0300 Subject: [ofa-general][PATCH] mlx4: Moving db management to mlx4_core (MP support, Patch 1) In-Reply-To: <480EDF7D.4070103@voltaire.com> References: <480D8660.3060001@mellanox.co.il> <480EDF7D.4070103@voltaire.com> Message-ID: <480F08F1.9070507@mellanox.co.il> Or Gerlitz wrote: > Yevgeny Petrilin wrote: >> >From d0d0ac877ab47f3a8a5f1564e5c48f53245583b9 Mon Sep 17 00:00:00 2001 >> From: Yevgeny Petrilin >> Date: Mon, 21 Apr 2008 10:10:01 +0300 >> Subject: [PATCH] mlx4: Moving db management to mlx4_core > Hi Yevgeny, > > Can you use a [PATCH m/n v3] or similar syntax in the subject line of the > patches? It would be much easier to review and work with your patch sets > this (common) way. > > Also, I wasn't sure against what git tree/branch they are being > generated; can you clarify that? > Or. > > Thanks for your comment, I will use that format. The patches are generated against the "for-2.6.26" branch. --Yevgeny From yevgenyp at mellanox.co.il Wed Apr 23 03:41:15 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 13:41:15 +0300 Subject: [ofa-general][PATCH] mlx4: Completion EQ per cpu (MP support, Patch 10) In-Reply-To: References: Message-ID: <480F124B.1050804@mellanox.co.il> Shirley Ma wrote: > Hello Yevgeny, > > Can you give more details of this patch? What's the relationship > between CQ, EQ, port? > I was thinking to implement it in the upper layer. Is it better to > implement it in the upper-layer protocol, rather than the device layer? > > thanks > Shirley Hi, We refer to EQs as interrupt vectors (each EQ is attached to an MSI-X vector). Creating multiple completion EQs helps us distribute the interrupt load (and the software interrupt handling associated with it) among all CPUs.
For example, distributing TCP flows among multiple cores is important for 10GE devices to sustain wire-speed with lots of connections. Each CQ is attached to an EQ and receives its completion interrupts from that EQ. CQ and EQ are not per port. Implementing this in the device layer allows all ULPs to use the feature. We do not expose an EQ allocation API, because there is no point creating more EQs than CPUs. --Yevgeny From ogerlitz at voltaire.com Wed Apr 23 03:52:44 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Wed, 23 Apr 2008 13:52:44 +0300 Subject: [ofa-general][PATCH] mlx4: Completion EQ per cpu (MP support, Patch 10) In-Reply-To: <480F124B.1050804@mellanox.co.il> References: <480F124B.1050804@mellanox.co.il> Message-ID: <480F14FC.30107@voltaire.com> Yevgeny Petrilin wrote: > For example, distributing TCP flows among multiple cores is important for > 10GE devices to sustain wire-speed with lots of connections. In that respect (distributing TCP flows among cores), is there anything special here which is related to 10GbE but not to IPoIB? > > Each CQ is attached to an EQ and receives its completion interrupts from that EQ. > > CQ and EQ are not per port. > > Implementing this in the device layer allows all ULPs to use the feature. > We do not expose an EQ allocation API, because there is no point creating more EQs > than CPUs. CQs are not per port, but netdevices are bound to a port (it's correct that a few of them can be bound to the same port, e.g. with different PKEYs or VLAN tags). Maybe it's worth thinking about an API that either lets the ULP dictate to which CPU/core the EQ serving its CQ should direct interrupts, or, if the ULP doesn't care, lets the driver allocate that in round-robin fashion. Shirley, assuming the ib core module would expose such a binding API, what's your idea of using it in IPoIB? Or.
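The round-robin fallback Or suggests for ULPs with no CPU preference is easy to sketch. This is a hypothetical helper, not the mlx4 code: it just hands out completion-vector (EQ) indices in rotation, one EQ per CPU:

```c
#include <assert.h>

/* Toy round-robin assignment of CQs to per-CPU completion EQ vectors,
 * used when the ULP expresses no affinity preference. */
struct eq_allocator {
    int num_eqs; /* e.g. one EQ per online CPU */
    int next;    /* next vector to hand out */
};

static int alloc_eq_vector(struct eq_allocator *a)
{
    int eq = a->next;

    a->next = (a->next + 1) % a->num_eqs;
    return eq;
}
```

With four EQs, successive CQ creations would be spread as 0, 1, 2, 3, 0, ... so completion interrupts (and their softirq handling) are distributed across cores rather than piling onto one.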
From ruimario at gmail.com Wed Apr 23 06:20:22 2008 From: ruimario at gmail.com (Rui Machado) Date: Wed, 23 Apr 2008 15:20:22 +0200 Subject: [ofa-general] beginner resources Message-ID: <6978b4af0804230620p560c33c5hfa8385a57bbed80c@mail.gmail.com> >>is this the right list to ask totally beginner questions >> (even code snippets) or is there any other resource for this matter? >Beginner questions are fine. But you may be directed to a spec, RFC, man page, >etc. > >Code examples are available with the userspace libraries (libibverbs, librdmacm) >that may help. The libraries also provide man pages for the various APIs. > >- Sean Redirection is fine as long as I can solve my problem :) and I can learn something. I had a look at the rping example and I'm trying to use Roland Dreier's examples. But my example simply doesn't work. I'm totally new to this so please bear with me. If someone has time to have a look at http://pastebin.com/m708b032c and http://pastebin.com/m13673097 it would be much appreciated, and any comments are welcome. I need all the feedback possible to start understanding things. Thanks a lot for the help. ./Rui -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrea at qumranet.com Wed Apr 23 06:33:03 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 15:33:03 +0200 Subject: [ofa-general] Re: [PATCH 00 of 12] mmu notifier #v13 In-Reply-To: References: <20080422182213.GS22493@sgi.com> <20080422184335.GN24536@duo.random> <20080422194223.GT22493@sgi.com> Message-ID: <20080423133303.GU24536@duo.random> On Tue, Apr 22, 2008 at 01:30:53PM -0700, Christoph Lameter wrote: > One solution would be to separate the invalidate_page() callout into a > patch at the very end that can be omitted. AFAICT there is no compelling > reason to have this callback and it complicates the API for the device > driver writers.
> Not having this callback makes the way that mmu notifiers > are called from the VM uniform which is a desirable goal. I agree that the invalidate_page optimization can be moved to a separate patch. That will be a patch that definitely alters the API in a non-backwards-compatible way (unlike 2-12 in my #v13, which are all backwards compatible in terms of the mmu notifier API). invalidate_page is beneficial to both mmu notifier users, and a bit beneficial to the do_wp_page users too. So there's no point in removing it from my mmu-notifier-core: as long as the mmu-notifier-core is 1/N in my patchset, and N/N in your patchset, the differences caused by that ordering difference are a bigger change than invalidate_page existing or not. As I expected, invalidate_page provided significant benefits (not just to GRU but to KVM too) without altering the locking scheme at all; this is because the page fault handler has to notice if begin->end both run anyway after follow_page/get_user_pages. So it's a no-brainer to keep, and my approach will avoid a non-backwards-compatible breakage of the API, IMHO. Not a big deal; nobody can care if the API will change, it will definitely change eventually, it's a kernel-internal one, but given I already have invalidate_page in my patch there's no reason to remove it as long as mmu-notifier-core remains N/N in your patchset.
From andrea at qumranet.com Wed Apr 23 06:36:19 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 15:36:19 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080422230727.GR30298@sgi.com> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> Message-ID: <20080423133619.GV24536@duo.random> On Tue, Apr 22, 2008 at 06:07:27PM -0500, Robin Holt wrote: > > The only other change I did has been to move mmu_notifier_unregister > > at the end of the patchset after getting more questions about its > > reliability and I documented a bit the rmmod requirements for > > ->release. we'll think later if it makes sense to add it, nobody's > > using it anyway. > > XPMEM is using it. GRU will be as well (probably already does). XPMEM requires more patches anyway. Note that in a previous email you told me you weren't using it. I think GRU can work fine on 2.6.26 without mmu_notifier_unregister, like KVM too. You simply have to unpin the module count in ->release. The most important bit is that you have to do that anyway in case mmu_notifier_unregister fails (and it can fail because of vmalloc space shortage because somebody loaded some framebuffer driver or whatever).
From erezz at Voltaire.COM Wed Apr 23 06:41:24 2008 From: erezz at Voltaire.COM (Erez Zilber) Date: Wed, 23 Apr 2008 16:41:24 +0300 Subject: [ofa-general] Re: [PATCH 1/3] iscsi iser: remove DMA restrictions In-Reply-To: <480C9BF8.9050401@Voltaire.COM> References: <20080212205252.GB13643@osc.edu> <20080212205403.GC13643@osc.edu><1202850645.3137.132.camel@localhost.localdomain><20080212214632.GA14397@osc.edu><1202853468.3137.148.camel@localhost.localdomain><20080213195912.GC7372@osc.edu> <480C9BF8.9050401@Voltaire.COM> Message-ID: <480F3C84.40606@Voltaire.COM> Erez Zilber wrote: > > Pete Wyckoff wrote: > > James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 > 15:57 -0600: > > > >> On Tue, 2008-02-12 at 16:46 -0500, Pete Wyckoff wrote: > >> > >>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 > 15:10 -0600: > >>> > >>>> On Tue, 2008-02-12 at 15:54 -0500, Pete Wyckoff wrote: > >>>> > >>>>> iscsi_iser does not have any hardware DMA restrictions. Add a > >>>>> slave_configure function to remove any DMA alignment restriction, > >>>>> allowing the use of direct IO from arbitrary offsets within a page. > >>>>> Also disable page bouncing; iser has no restrictions on which > pages it > >>>>> can address. 
> >>>>> > >>>>> Signed-off-by: Pete Wyckoff > >>>>> --- > >>>>> drivers/infiniband/ulp/iser/iscsi_iser.c | 8 ++++++++ > >>>>> 1 files changed, 8 insertions(+), 0 deletions(-) > >>>>> > >>>>> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c > b/drivers/infiniband/ulp/iser/iscsi_iser.c > >>>>> index be1b9fb..1b272a6 100644 > >>>>> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c > >>>>> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c > >>>>> @@ -543,6 +543,13 @@ iscsi_iser_ep_disconnect(__u64 ep_handle) > >>>>> iser_conn_terminate(ib_conn); > >>>>> } > >>>>> > >>>>> +static int iscsi_iser_slave_configure(struct scsi_device *sdev) > >>>>> +{ > >>>>> + blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY); > >>>>> > >>>> You really don't want to do this. That signals to the block > layer that > >>>> we have an iommu, although it's practically the same thing as a > 64 bit > >>>> DMA mask ... but I'd just leave it to the DMA mask to set this up > >>>> correctly. Anything else is asking for a subtle bug to turn up years > >>>> from now when something causes the mask and the limit to be > mismatched. > >>>> > >>> Oh. I decided to add that line for symmetry with TCP, and was > >>> convinced by the arguments here: > >>> > >>> commit b6d44fe9582b9d90a0b16f508ac08a90d899bf56 > >>> Author: Mike Christie > >>> Date: Thu Jul 26 12:46:47 2007 -0500 > >>> > >>> [SCSI] iscsi_tcp: Turn off bounce buffers > >>> > >>> It was found by LSI that on setups with large amounts of memory > >>> we were bouncing buffers when we did not need to. If the iscsi tcp > >>> code touches the data buffer (or a helper does), > >>> it will kmap the buffer. iscsi_tcp also does not interact with > hardware, > >>> so it does not have any hw dma restrictions. This patch sets > the bounce > >>> buffer settings for our device queue so buffers should not be > bounced > >>> because of a driver limit. 
> >>> > >>> I don't see a convenient place to callback into particular iscsi > >>> devices to set the DMA mask per-host. It has to go on the > >>> shost_gendev, right?, but only for TCP and iSER, not qla4xxx, which > >>> handles its DMA mask during device probe. > >>> > >> You should be taking your mask from the underlying infiniband device as > >> part of the setup, shouldn't you? > >> > > > > I think you're right about this. All the existing IB HW tries to > > set a 64-bit dma mask, but that's no reason to disable the mechanism > > entirely in iser. I'll remove that line that disables bouncing in > > my patch. Perhaps Mike will know if the iscsi_tcp usage is still > > appropriate. > > > > > > Let me make sure that I understand: you say that the IB HW driver (e.g. > ib_mthca) tries to set a 64-bit dma mask: > > err = pci_set_dma_mask(pdev, DMA_64BIT_MASK); > if (err) { > dev_warn(&pdev->dev, "Warning: couldn't set 64-bit PCI DMA > mask.\n"); > err = pci_set_dma_mask(pdev, DMA_32BIT_MASK); > if (err) { > dev_err(&pdev->dev, "Can't set PCI DMA mask, aborting.\n"); > goto err_free_res; > } > } > > So, in the example above, the driver will use a 64-bit mask or a 32-bit > mask (or fail). According to that, iSER (and SRP) needs to call > blk_queue_bounce_limit with the appropriate parameter, right? > Roland, James, I'm trying to fix this potential problem in iSER, and I have some questions about that. How can I get the DMA mask that the HCA driver is using (DMA_64BIT_MASK or DMA_32BIT_MASK)? Can I get it somehow from struct ib_device? Is it in ib_device->device? Another question is - after I get the DMA mask data from the HCA driver, I guess that I need to call blk_queue_bounce_limit with the appropriate parameter (BLK_BOUNCE_HIGH, BLK_BOUNCE_ANY or BLK_BOUNCE_ISA). Which value should iSER use according to the DMA mask info? For example, if the HCA driver sets DMA_64BIT_MASK, should iSER use BLK_BOUNCE_HIGH/BLK_BOUNCE_ANY/BLK_BOUNCE_ISA ? 
Thanks, Erez From andrea at qumranet.com Wed Apr 23 06:44:27 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 15:44:27 +0200 Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: References: <20080422224048.GR24536@duo.random> Message-ID: <20080423134427.GW24536@duo.random> On Tue, Apr 22, 2008 at 04:14:26PM -0700, Christoph Lameter wrote: > We want a full solution and this kind of patching makes the patches > difficult to review because later patches revert earlier ones. I know you would rather see KVM development stalled for more months than get a partial solution now that already covers KVM and GRU with the same API that XPMEM will also use later. It's very unfair on your side to expect to stall other people's development if what you need has stronger requirements and can't be merged immediately. This is especially true given it was publicly stated that XPMEM never passed all regression tests anyway, so you can't possibly be in such a hurry as we are; we can't progress without this. In fact we can, but it would be a huge effort, it would run _slower_, and it would all need to be deleted once mmu notifiers are in. Note that the only patch that you can avoid with your approach is mm_lock-rwsem; given that's software developed and not human developed I don't see a big deal of wasted effort. The main difference is the ordering. Most of the code is orthogonal so there's not much to revert.
From jlentini at netapp.com Wed Apr 23 06:50:32 2008 From: jlentini at netapp.com (James Lentini) Date: Wed, 23 Apr 2008 09:50:32 -0400 (EDT) Subject: [ofa-general] arp or ip patch to build a neigh permanent entry for IPoIB In-Reply-To: <15ddcffd0804221401j3d23576eq25304328c72efa15@mail.gmail.com> References: <1208812763.22166.4.camel@localhost.localdomain> <15ddcffd0804221401j3d23576eq25304328c72efa15@mail.gmail.com> Message-ID: On Tue, 22 Apr 2008, Or Gerlitz wrote: > On 4/22/08, Shirley Ma wrote: > > I am debugging an ipoib ping problem on a cluster. The arp, ip > command don't support using 20 bytes HW to build a permanent > entry manually. Can someone give me the pointer to the patch > if any? > > > > see http://lists.openfabrics.org/pipermail/general/2006-March/018487.html > > James, any news on this? is something need to be patched into ip/arp > to make this possible? The patch in my email was all I needed. I sent that patch to the iproute2 maintainer and it was accepted into the next version of the iproute2 release, see: http://git.kernel.org/?p=linux/kernel/git/shemminger/iproute2.git;a=commit;h=7b5657545dc246ae37690d660597e8fa37040205 Have you tried updating your ip command? From jlentini at netapp.com Wed Apr 23 06:56:50 2008 From: jlentini at netapp.com (James Lentini) Date: Wed, 23 Apr 2008 09:56:50 -0400 (EDT) Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets In-Reply-To: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> References: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> Message-ID: On Tue, 22 Apr 2008, Sean Hefty wrote: > I have a need to start looking at possible ways to map IP address to > GIDs when crossing IP (and IB) subnets. This would be in addition > to or replace the ARP use by the rdma_cm. Possibilities include: > > * Use some standard address mapping protocol that I'm not aware of. > * Use global IB service resolution. > * Define/extend an address resolution protocol that operates over IP. 
> * Define/extend an address resolution protocol that operates over UDP. > > I'm hoping that someone has a wonderfully brilliant idea for this > that would take about 1 day to implement. :) > > - Sean Is it time to bring back ATS? http://lists.openfabrics.org/pipermail/general/2005-August/010247.html From vlad at dev.mellanox.co.il Wed Apr 23 07:07:08 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Wed, 23 Apr 2008 17:07:08 +0300 Subject: [ofa-general] Re: [PATCH 1/1 v1] MLX4: Added resize_cq capability. In-Reply-To: References: <47E923CA.90804@dev.mellanox.co.il> <47F0A5A5.2010208@dev.mellanox.co.il> Message-ID: <480F428C.7080701@dev.mellanox.co.il> Hi Roland, Please apply the following patch that fixes resize CQ operation: From 36e7bf8a00f69abe1ad737c7976fd5f4f16c0851 Mon Sep 17 00:00:00 2001 From: Vladimir Sokolovsky Date: Wed, 23 Apr 2008 16:59:05 +0300 Subject: [PATCH] mlx4: The opcode modifier should be 0 for CQ resizing operation. Signed-off-by: Vladimir Sokolovsky --- drivers/net/mlx4/cq.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index caa5bcf..6fda0af 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -180,7 +180,7 @@ int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq, cq_context->mtt_base_addr_h = mtt_addr >> 32; cq_context->mtt_base_addr_l = cpu_to_be32(mtt_addr & 0xffffffff); - err = mlx4_MODIFY_CQ(dev, mailbox, cq->cqn, 1); + err = mlx4_MODIFY_CQ(dev, mailbox, cq->cqn, 0); mlx4_free_cmd_mailbox(dev, mailbox); return err; -- 1.5.4.2 From holt at sgi.com Wed Apr 23 07:47:47 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 23 Apr 2008 09:47:47 -0500 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423133619.GV24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423133619.GV24536@duo.random> Message-ID: <20080423144747.GU30298@sgi.com> On Wed, Apr 23, 
2008 at 03:36:19PM +0200, Andrea Arcangeli wrote: > On Tue, Apr 22, 2008 at 06:07:27PM -0500, Robin Holt wrote: > > > The only other change I did has been to move mmu_notifier_unregister > > > at the end of the patchset after getting more questions about its > > > reliability and I documented a bit the rmmod requirements for > > > ->release. we'll think later if it makes sense to add it, nobody's > > > using it anyway. > > > > XPMEM is using it. GRU will be as well (probably already does). > > XPMEM requires more patches anyway. Note that in previous email you > told me you weren't using it. I think GRU can work fine on 2.6.26 I said I could test without it. It is needed for the final version. It also makes the API consistent. What you are proposing is equivalent to having a file you can open but never close. This whole discussion seems ludicrous. You could refactor the code to get the sorted list of locks, pass that list into mm_lock to do the locking, do the register/unregister, then pass the same list into mm_unlock. If the allocation fails, you could fall back to the older slower method of repeatedly scanning the lists and acquiring locks in ascending order. > without mmu_notifier_unregister, like KVM too. You've simply to unpin > the module count in ->release. The most important bit is that you've > to do that anyway in case mmu_notifier_unregister fails (and it can If you are not going to provide the _unregister callout you need to change the API so I can scan the list of notifiers to see if my structures are already registered. We register our notifier structure at device open time. If we receive a _release callout, we mark our structure as unregistered. At device close time, if we have not been unregistered, we call _unregister. If you take away _unregister, I have an xpmem kernel structure in use _AFTER_ the device is closed with no indication that the process is using it. 
In that case, I need to get an extra reference to the module in my device open method and hold that reference until the _release callout. Additionally, if the user's program reopens the device, I need to scan the mmu_notifiers list to see if this task's notifier is already registered. I view _unregister as essential. Did I miss something? Thanks, Robin From yevgenyp at mellanox.co.il Wed Apr 23 07:49:30 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 17:49:30 +0300 Subject: [ofa-general][PATCH 0/12] mlx4: Multi Protocol support. Message-ID: <480F4C7A.4050005@mellanox.co.il> Multi Protocol gives the user the ability to run InfiniBand and Ethernet protocols on the same HCA (separately or at the same time). Main changes to mlx4:
1. The mlx4 device now holds the actual protocol for each port. The port types are determined through module parameters or through the sysfs interface. The requested types are verified against firmware capabilities in order to determine the actual port protocol.
2. The driver now manages the MAC and VLAN tables used by customers of the low level driver. Corresponding commands were added.
3. Completion EQs are created per CPU. Created CQs are attached to an EQ by a "Round Robin" algorithm, unless a specific EQ was requested.
4. Support for creating a collapsed CQ was added.
5. Additional reserved QP ranges were added. There is a range for the customers of the low level driver (IB, Ethernet, FCoE).
6. The QP allocation process changed: first a QP range should be reserved, then QPs can be allocated from that range. This supports the ability to allocate consecutive QPs. Appropriate changes were made in the allocation mechanism.
7. Actions common to all HW resource management (doorbell allocation, buffer allocation, MTT write) were moved to the low level driver.
8. Fibre Channel support was added.
Some of the patches were already sent; I am now resending all 12 patches.
Note: Patch 1/12 was already applied The patches that will be sent apply changes to mlx4_core and mlx4_ib modules, the mlx4_en module (ConnectX Ethernet driver) will be applied soon. --Yevgeny From yevgenyp at mellanox.co.il Wed Apr 23 07:51:33 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 17:51:33 +0300 Subject: [ofa-general][PATCH 1/12 v1] mlx4: Moving db management to mlx4_core Message-ID: <480F4CF5.3050709@mellanox.co.il> >From d0d0ac877ab47f3a8a5f1564e5c48f53245583b9 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 10:10:01 +0300 Subject: [PATCH] mlx4: Moving db management to mlx4_core mlx4_ib is no longer the only customer of mlx4_core. Thus the doorbell allocation was moved to the low level driver (same as buffer allocation). Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/cq.c | 6 +- drivers/infiniband/hw/mlx4/doorbell.c | 131 +-------------------------------- drivers/infiniband/hw/mlx4/main.c | 3 - drivers/infiniband/hw/mlx4/mlx4_ib.h | 33 +------- drivers/infiniband/hw/mlx4/qp.c | 6 +- drivers/infiniband/hw/mlx4/srq.c | 6 +- drivers/net/mlx4/alloc.c | 111 ++++++++++++++++++++++++++++ drivers/net/mlx4/main.c | 3 + drivers/net/mlx4/mlx4.h | 3 + include/linux/mlx4/device.h | 41 ++++++++++ 10 files changed, 175 insertions(+), 168 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 3557e7e..5e570bb 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -204,7 +204,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector uar = &to_mucontext(context)->uar; } else { - err = mlx4_ib_db_alloc(dev, &cq->db, 1); + err = mlx4_db_alloc(dev->dev, &cq->db, 1); if (err) goto err_cq; @@ -250,7 +250,7 @@ err_mtt: err_db: if (!context) - mlx4_ib_db_free(dev, &cq->db); + mlx4_db_free(dev->dev, &cq->db); err_cq: kfree(cq); @@ -435,7 +435,7 @@ int mlx4_ib_destroy_cq(struct ib_cq *cq) 
ib_umem_release(mcq->umem); } else { mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe + 1); - mlx4_ib_db_free(dev, &mcq->db); + mlx4_db_free(dev->dev, &mcq->db); } kfree(mcq); diff --git a/drivers/infiniband/hw/mlx4/doorbell.c b/drivers/infiniband/hw/mlx4/doorbell.c index 1c36087..d17b36b 100644 --- a/drivers/infiniband/hw/mlx4/doorbell.c +++ b/drivers/infiniband/hw/mlx4/doorbell.c @@ -34,135 +34,10 @@ #include "mlx4_ib.h" -struct mlx4_ib_db_pgdir { - struct list_head list; - DECLARE_BITMAP(order0, MLX4_IB_DB_PER_PAGE); - DECLARE_BITMAP(order1, MLX4_IB_DB_PER_PAGE / 2); - unsigned long *bits[2]; - __be32 *db_page; - dma_addr_t db_dma; -}; - -static struct mlx4_ib_db_pgdir *mlx4_ib_alloc_db_pgdir(struct mlx4_ib_dev *dev) -{ - struct mlx4_ib_db_pgdir *pgdir; - - pgdir = kzalloc(sizeof *pgdir, GFP_KERNEL); - if (!pgdir) - return NULL; - - bitmap_fill(pgdir->order1, MLX4_IB_DB_PER_PAGE / 2); - pgdir->bits[0] = pgdir->order0; - pgdir->bits[1] = pgdir->order1; - pgdir->db_page = dma_alloc_coherent(dev->ib_dev.dma_device, - PAGE_SIZE, &pgdir->db_dma, - GFP_KERNEL); - if (!pgdir->db_page) { - kfree(pgdir); - return NULL; - } - - return pgdir; -} - -static int mlx4_ib_alloc_db_from_pgdir(struct mlx4_ib_db_pgdir *pgdir, - struct mlx4_ib_db *db, int order) -{ - int o; - int i; - - for (o = order; o <= 1; ++o) { - i = find_first_bit(pgdir->bits[o], MLX4_IB_DB_PER_PAGE >> o); - if (i < MLX4_IB_DB_PER_PAGE >> o) - goto found; - } - - return -ENOMEM; - -found: - clear_bit(i, pgdir->bits[o]); - - i <<= o; - - if (o > order) - set_bit(i ^ 1, pgdir->bits[order]); - - db->u.pgdir = pgdir; - db->index = i; - db->db = pgdir->db_page + db->index; - db->dma = pgdir->db_dma + db->index * 4; - db->order = order; - - return 0; -} - -int mlx4_ib_db_alloc(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db, int order) -{ - struct mlx4_ib_db_pgdir *pgdir; - int ret = 0; - - mutex_lock(&dev->pgdir_mutex); - - list_for_each_entry(pgdir, &dev->pgdir_list, list) - if (!mlx4_ib_alloc_db_from_pgdir(pgdir, db, 
order)) - goto out; - - pgdir = mlx4_ib_alloc_db_pgdir(dev); - if (!pgdir) { - ret = -ENOMEM; - goto out; - } - - list_add(&pgdir->list, &dev->pgdir_list); - - /* This should never fail -- we just allocated an empty page: */ - WARN_ON(mlx4_ib_alloc_db_from_pgdir(pgdir, db, order)); - -out: - mutex_unlock(&dev->pgdir_mutex); - - return ret; -} - -void mlx4_ib_db_free(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db) -{ - int o; - int i; - - mutex_lock(&dev->pgdir_mutex); - - o = db->order; - i = db->index; - - if (db->order == 0 && test_bit(i ^ 1, db->u.pgdir->order0)) { - clear_bit(i ^ 1, db->u.pgdir->order0); - ++o; - } - - i >>= o; - set_bit(i, db->u.pgdir->bits[o]); - - if (bitmap_full(db->u.pgdir->order1, MLX4_IB_DB_PER_PAGE / 2)) { - dma_free_coherent(dev->ib_dev.dma_device, PAGE_SIZE, - db->u.pgdir->db_page, db->u.pgdir->db_dma); - list_del(&db->u.pgdir->list); - kfree(db->u.pgdir); - } - - mutex_unlock(&dev->pgdir_mutex); -} - -struct mlx4_ib_user_db_page { - struct list_head list; - struct ib_umem *umem; - unsigned long user_virt; - int refcnt; -}; - int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, unsigned long virt, - struct mlx4_ib_db *db) + struct mlx4_db *db) { - struct mlx4_ib_user_db_page *page; + struct mlx4_user_db_page *page; struct ib_umem_chunk *chunk; int err = 0; @@ -202,7 +77,7 @@ out: return err; } -void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_ib_db *db) +void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_db *db) { mutex_lock(&context->db_page_mutex); diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 136c76c..3c7f938 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -548,9 +548,6 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) goto err_uar; MLX4_INIT_DOORBELL_LOCK(&ibdev->uar_lock); - INIT_LIST_HEAD(&ibdev->pgdir_list); - mutex_init(&ibdev->pgdir_mutex); - ibdev->dev = dev; strlcpy(ibdev->ib_dev.name, 
"mlx4_%d", IB_DEVICE_NAME_MAX); diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 9e63732..5cf9947 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -43,24 +43,6 @@ #include #include -enum { - MLX4_IB_DB_PER_PAGE = PAGE_SIZE / 4 -}; - -struct mlx4_ib_db_pgdir; -struct mlx4_ib_user_db_page; - -struct mlx4_ib_db { - __be32 *db; - union { - struct mlx4_ib_db_pgdir *pgdir; - struct mlx4_ib_user_db_page *user_page; - } u; - dma_addr_t dma; - int index; - int order; -}; - struct mlx4_ib_ucontext { struct ib_ucontext ibucontext; struct mlx4_uar uar; @@ -88,7 +70,7 @@ struct mlx4_ib_cq { struct mlx4_cq mcq; struct mlx4_ib_cq_buf buf; struct mlx4_ib_cq_resize *resize_buf; - struct mlx4_ib_db db; + struct mlx4_db db; spinlock_t lock; struct mutex resize_mutex; struct ib_umem *umem; @@ -127,7 +109,7 @@ struct mlx4_ib_qp { struct mlx4_qp mqp; struct mlx4_buf buf; - struct mlx4_ib_db db; + struct mlx4_db db; struct mlx4_ib_wq rq; u32 doorbell_qpn; @@ -154,7 +136,7 @@ struct mlx4_ib_srq { struct ib_srq ibsrq; struct mlx4_srq msrq; struct mlx4_buf buf; - struct mlx4_ib_db db; + struct mlx4_db db; u64 *wrid; spinlock_t lock; int head; @@ -175,9 +157,6 @@ struct mlx4_ib_dev { struct mlx4_dev *dev; void __iomem *uar_map; - struct list_head pgdir_list; - struct mutex pgdir_mutex; - struct mlx4_uar priv_uar; u32 priv_pdn; MLX4_DECLARE_DOORBELL_LOCK(uar_lock); @@ -248,11 +227,9 @@ static inline struct mlx4_ib_ah *to_mah(struct ib_ah *ibah) return container_of(ibah, struct mlx4_ib_ah, ibah); } -int mlx4_ib_db_alloc(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db, int order); -void mlx4_ib_db_free(struct mlx4_ib_dev *dev, struct mlx4_ib_db *db); int mlx4_ib_db_map_user(struct mlx4_ib_ucontext *context, unsigned long virt, - struct mlx4_ib_db *db); -void mlx4_ib_db_unmap_user(struct mlx4_ib_ucontext *context, struct mlx4_ib_db *db); + struct mlx4_db *db); +void mlx4_ib_db_unmap_user(struct 
mlx4_ib_ucontext *context, struct mlx4_db *db); struct ib_mr *mlx4_ib_get_dma_mr(struct ib_pd *pd, int acc); int mlx4_ib_umem_write_mtt(struct mlx4_ib_dev *dev, struct mlx4_mtt *mtt, diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index b75efae..80ea8b9 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -514,7 +514,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, goto err; if (!init_attr->srq) { - err = mlx4_ib_db_alloc(dev, &qp->db, 0); + err = mlx4_db_alloc(dev->dev, &qp->db, 0); if (err) goto err; @@ -580,7 +580,7 @@ err_buf: err_db: if (!pd->uobject && !init_attr->srq) - mlx4_ib_db_free(dev, &qp->db); + mlx4_db_free(dev->dev, &qp->db); err: return err; @@ -666,7 +666,7 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, kfree(qp->rq.wrid); mlx4_buf_free(dev->dev, qp->buf_size, &qp->buf); if (!qp->ibqp.srq) - mlx4_ib_db_free(dev, &qp->db); + mlx4_db_free(dev->dev, &qp->db); } } diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c index beaa3b0..2046197 100644 --- a/drivers/infiniband/hw/mlx4/srq.c +++ b/drivers/infiniband/hw/mlx4/srq.c @@ -129,7 +129,7 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd, if (err) goto err_mtt; } else { - err = mlx4_ib_db_alloc(dev, &srq->db, 0); + err = mlx4_db_alloc(dev->dev, &srq->db, 0); if (err) goto err_srq; @@ -200,7 +200,7 @@ err_buf: err_db: if (!pd->uobject) - mlx4_ib_db_free(dev, &srq->db); + mlx4_db_free(dev->dev, &srq->db); err_srq: kfree(srq); @@ -267,7 +267,7 @@ int mlx4_ib_destroy_srq(struct ib_srq *srq) kfree(msrq->wrid); mlx4_buf_free(dev->dev, msrq->msrq.max << msrq->msrq.wqe_shift, &msrq->buf); - mlx4_ib_db_free(dev, &msrq->db); + mlx4_db_free(dev->dev, &msrq->db); } kfree(msrq); diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 75ef9d0..43c6d04 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -196,3 +196,114 @@ void 
mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf) } } EXPORT_SYMBOL_GPL(mlx4_buf_free); + +static struct mlx4_db_pgdir *mlx4_alloc_db_pgdir(struct device *dma_device) +{ + struct mlx4_db_pgdir *pgdir; + + pgdir = kzalloc(sizeof *pgdir, GFP_KERNEL); + if (!pgdir) + return NULL; + + bitmap_fill(pgdir->order1, MLX4_DB_PER_PAGE / 2); + pgdir->bits[0] = pgdir->order0; + pgdir->bits[1] = pgdir->order1; + pgdir->db_page = dma_alloc_coherent(dma_device, PAGE_SIZE, + &pgdir->db_dma, GFP_KERNEL); + if (!pgdir->db_page) { + kfree(pgdir); + return NULL; + } + + return pgdir; +} + +static int mlx4_alloc_db_from_pgdir(struct mlx4_db_pgdir *pgdir, + struct mlx4_db *db, int order) +{ + int o; + int i; + + for (o = order; o <= 1; ++o) { + i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); + if (i < MLX4_DB_PER_PAGE >> o) + goto found; + } + + return -ENOMEM; + +found: + clear_bit(i, pgdir->bits[o]); + + i <<= o; + + if (o > order) + set_bit(i ^ 1, pgdir->bits[order]); + + db->u.pgdir = pgdir; + db->index = i; + db->db = pgdir->db_page + db->index; + db->dma = pgdir->db_dma + db->index * 4; + db->order = order; + + return 0; +} + +int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_db_pgdir *pgdir; + int ret = 0; + + mutex_lock(&priv->pgdir_mutex); + + list_for_each_entry(pgdir, &priv->pgdir_list, list) + if (!mlx4_alloc_db_from_pgdir(pgdir, db, order)) + goto out; + + pgdir = mlx4_alloc_db_pgdir(&(dev->pdev->dev)); + if (!pgdir) { + ret = -ENOMEM; + goto out; + } + + list_add(&pgdir->list, &priv->pgdir_list); + + /* This should never fail -- we just allocated an empty page: */ + WARN_ON(mlx4_alloc_db_from_pgdir(pgdir, db, order)); + +out: + mutex_unlock(&priv->pgdir_mutex); + + return ret; +} +EXPORT_SYMBOL_GPL(mlx4_db_alloc); + +void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + int o; + int i; + + 
mutex_lock(&priv->pgdir_mutex); + + o = db->order; + i = db->index; + + if (db->order == 0 && test_bit(i ^ 1, db->u.pgdir->order0)) { + clear_bit(i ^ 1, db->u.pgdir->order0); + ++o; + } + i >>= o; + set_bit(i, db->u.pgdir->bits[o]); + + if (bitmap_full(db->u.pgdir->order1, MLX4_DB_PER_PAGE / 2)) { + dma_free_coherent(&(dev->pdev->dev), PAGE_SIZE, + db->u.pgdir->db_page, db->u.pgdir->db_dma); + list_del(&db->u.pgdir->list); + kfree(db->u.pgdir); + } + + mutex_unlock(&priv->pgdir_mutex); +} +EXPORT_SYMBOL_GPL(mlx4_db_free); diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 49a4aca..a6aa49f 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -798,6 +798,9 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) INIT_LIST_HEAD(&priv->ctx_list); spin_lock_init(&priv->ctx_lock); + INIT_LIST_HEAD(&priv->pgdir_list); + mutex_init(&priv->pgdir_mutex); + /* * Now reset the HCA before we touch the PCI capabilities or * attempt a firmware command, since a boot ROM may have left diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 7333681..a4023c2 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -257,6 +257,9 @@ struct mlx4_priv { struct list_head ctx_list; spinlock_t ctx_lock; + struct list_head pgdir_list; + struct mutex pgdir_mutex; + struct mlx4_fw fw; struct mlx4_cmd cmd; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index ff7df1a..9c87dd3 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -37,6 +37,8 @@ #include #include +#include + #include enum { @@ -208,6 +210,37 @@ struct mlx4_mtt { int page_shift; }; +enum { + MLX4_DB_PER_PAGE = PAGE_SIZE / 4 +}; + +struct mlx4_db_pgdir { + struct list_head list; + DECLARE_BITMAP(order0, MLX4_DB_PER_PAGE); + DECLARE_BITMAP(order1, MLX4_DB_PER_PAGE / 2); + unsigned long *bits[2]; + __be32 *db_page; + dma_addr_t db_dma; +}; + +struct mlx4_user_db_page { + struct list_head list; + 
struct ib_umem *umem; + unsigned long user_virt; + int refcnt; +}; + +struct mlx4_db { + __be32 *db; + union { + struct mlx4_db_pgdir *pgdir; + struct mlx4_user_db_page *user_page; + } u; + dma_addr_t dma; + int index; + int order; +}; + struct mlx4_mr { struct mlx4_mtt mtt; u64 iova; @@ -341,6 +374,9 @@ int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, struct mlx4_buf *buf); +int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order); +void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 07:53:51 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 17:53:51 +0300 Subject: [ofa-general][PATCH 2/12 v1] mlx4: HW queues resource management Message-ID: <480F4D7F.8000707@mellanox.co.il> >From 3b15a6bba9cb79805198f64985433a33a3a096dc Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 11:06:41 +0300 Subject: [PATCH] mlx4_core: HW queues resource management Added HW queues management API. Wraps buffer and doorbell allocation and mtt write. 
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/alloc.c | 44 +++++++++++++++++++++++++++++++++++++++++++ include/linux/mlx4/device.h | 11 ++++++++++ 2 files changed, 55 insertions(+), 0 deletions(-) diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 43c6d04..f36d79e 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -307,3 +307,47 @@ void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db) mutex_unlock(&priv->pgdir_mutex); } EXPORT_SYMBOL_GPL(mlx4_db_free); + +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + int size, int max_direct) +{ + int err; + + err = mlx4_db_alloc(dev, &wqres->db, 1); + if (err) + return err; + *wqres->db.db = 0; + + if (mlx4_buf_alloc(dev, size, max_direct, &wqres->buf)) { + err = -ENOMEM; + goto err_db; + } + + err = mlx4_mtt_init(dev, wqres->buf.npages, wqres->buf.page_shift, + &wqres->mtt); + if (err) + goto err_buf; + err = mlx4_buf_write_mtt(dev, &wqres->mtt, &wqres->buf); + if (err) + goto err_mtt; + + return 0; + +err_mtt: + mlx4_mtt_cleanup(dev, &wqres->mtt); +err_buf: + mlx4_buf_free(dev, size, &wqres->buf); +err_db: + mlx4_db_free(dev, &wqres->db); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_alloc_hwq_res); + +void mlx4_free_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + int size) +{ + mlx4_mtt_cleanup(dev, &wqres->mtt); + mlx4_buf_free(dev, size, &wqres->buf); + mlx4_db_free(dev, &wqres->db); +} +EXPORT_SYMBOL_GPL(mlx4_free_hwq_res); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index d5fb774..0505732 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -241,6 +241,12 @@ struct mlx4_db { int order; }; +struct mlx4_hwq_resources { + struct mlx4_db db; + struct mlx4_mtt mtt; + struct mlx4_buf buf; +}; + struct mlx4_mr { struct mlx4_mtt mtt; u64 iova; @@ -377,6 +383,11 @@ int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_db_alloc(struct mlx4_dev *dev, struct 
mlx4_db *db, int order); void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db); +int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, + int size, int max_direct); +void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, + int size); + int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 07:54:51 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 17:54:51 +0300 Subject: [ofa-general][PATCH 3/12 v1] Message-ID: <480F4DBB.20403@mellanox.co.il> >From 3978a59af72fddb9b98156a7ecf9018b8bf5b076 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 13:26:14 +0300 Subject: [PATCH] mlx4: Qp range reservation Prior to allocating a qp, one needs to reserve an aligned range of qps. The change is made to enable allocation of consecutive qps.
Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/qp.c | 9 +++++ drivers/net/mlx4/alloc.c | 77 ++++++++++++++++++++++++++++++++++++++- drivers/net/mlx4/mlx4.h | 2 + drivers/net/mlx4/qp.c | 44 ++++++++++++++++------- include/linux/mlx4/device.h | 5 ++- 5 files changed, 122 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 80ea8b9..88aae1b 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -544,6 +544,11 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, } } + if (!sqpn) + err = mlx4_qp_reserve_range(dev->dev, 1, 1, &sqpn); + if (err) + goto err_wrid; + err = mlx4_qp_alloc(dev->dev, sqpn, &qp->mqp); if (err) goto err_wrid; @@ -654,6 +659,10 @@ static void destroy_qp_common(struct mlx4_ib_dev *dev, struct mlx4_ib_qp *qp, mlx4_ib_unlock_cqs(send_cq, recv_cq); mlx4_qp_free(dev->dev, &qp->mqp); + + if (!is_sqp(dev, qp)) + mlx4_qp_release_range(dev->dev, qp->mqp.qpn, 1); + mlx4_mtt_cleanup(dev->dev, &qp->mtt); if (is_user) { diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index f36d79e..4601506 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -73,7 +73,82 @@ void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj) spin_unlock(&bitmap->lock); } -int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved) +static unsigned long find_aligned_range(unsigned long *bitmap, + u32 start, u32 nbits, + int len, int align) +{ + unsigned long end, i; + +again: + start = ALIGN(start, align); + while ((start < nbits) && test_bit(start, bitmap)) + start += align; + if (start >= nbits) + return -1; + + end = start+len; + if (end > nbits) + return -1; + for (i = start+1; i < end; i++) { + if (test_bit(i, bitmap)) { + start = i+1; + goto again; + } + } + return start; +} + +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align) +{ + u32 obj, i; + + if (likely(cnt == 1 && 
align == 1)) + return mlx4_bitmap_alloc(bitmap); + + spin_lock(&bitmap->lock); + + obj = find_aligned_range(bitmap->table, bitmap->last, + bitmap->max, cnt, align); + if (obj >= bitmap->max) { + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + obj = find_aligned_range(bitmap->table, 0, + bitmap->max, + cnt, align); + } + + if (obj < bitmap->max) { + for (i = 0; i < cnt; i++) + set_bit(obj+i, bitmap->table); + if (obj == bitmap->last) { + bitmap->last = (obj + cnt); + if (bitmap->last >= bitmap->max) + bitmap->last = 0; + } + obj |= bitmap->top; + } else + obj = -1; + + spin_unlock(&bitmap->lock); + + return obj; +} + +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt) +{ + u32 i; + + obj &= bitmap->max - 1; + + spin_lock(&bitmap->lock); + for (i = 0; i < cnt; i++) + clear_bit(obj+i, bitmap->table); + bitmap->last = min(bitmap->last, obj); + bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; + spin_unlock(&bitmap->lock); +} + +int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved) { int i; diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index a4023c2..89d4ccc 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -287,6 +287,8 @@ static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev) u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap); void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj); +u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align); +void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt); int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved); void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c index fa24e65..dff8e66 100644 --- a/drivers/net/mlx4/qp.c +++ b/drivers/net/mlx4/qp.c @@ -147,19 +147,42 @@ int mlx4_qp_modify(struct mlx4_dev *dev, struct mlx4_mtt *mtt, } EXPORT_SYMBOL_GPL(mlx4_qp_modify); -int 
mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp) +int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + int qpn; + + qpn = mlx4_bitmap_alloc_range(&qp_table->bitmap, cnt, align); + if (qpn == -1) + return -ENOMEM; + + *base = qpn; + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_reserve_range); + +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + struct mlx4_qp_table *qp_table = &priv->qp_table; + if (base_qpn < dev->caps.sqp_start + 8) + return; + + mlx4_bitmap_free_range(&qp_table->bitmap, base_qpn, cnt); +} +EXPORT_SYMBOL_GPL(mlx4_qp_release_range); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_qp_table *qp_table = &priv->qp_table; int err; - if (sqpn) - qp->qpn = sqpn; - else { - qp->qpn = mlx4_bitmap_alloc(&qp_table->bitmap); - if (qp->qpn == -1) - return -ENOMEM; - } + if (!qpn) + return -EINVAL; + + qp->qpn = qpn; err = mlx4_table_get(dev, &qp_table->qp_table, qp->qpn); if (err) @@ -208,9 +231,6 @@ err_put_qp: mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); err_out: - if (!sqpn) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); - return err; } EXPORT_SYMBOL_GPL(mlx4_qp_alloc); @@ -240,8 +260,6 @@ void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp) mlx4_table_put(dev, &qp_table->auxc_table, qp->qpn); mlx4_table_put(dev, &qp_table->qp_table, qp->qpn); - if (qp->qpn >= dev->caps.sqp_start + 8) - mlx4_bitmap_free(&qp_table->bitmap, qp->qpn); } EXPORT_SYMBOL_GPL(mlx4_qp_free); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 0505732..9c77bf3 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -392,7 +392,10 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct 
mlx4_cq *cq); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); -int mlx4_qp_alloc(struct mlx4_dev *dev, int sqpn, struct mlx4_qp *qp); +int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); +void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt); + +int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp); void mlx4_qp_free(struct mlx4_dev *dev, struct mlx4_qp *qp); int mlx4_srq_alloc(struct mlx4_dev *dev, u32 pdn, struct mlx4_mtt *mtt, -- 1.5.4
From yevgenyp at mellanox.co.il Wed Apr 23 07:58:32 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 17:58:32 +0300 Subject: [ofa-general][PATCH 4/12 v2] mlx4: Pre reserved Qp regions Message-ID: <480F4E98.7010803@mellanox.co.il> >From 2dd4f8abdedda736adca5818c98f7a67d339ba7e Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 14:39:27 +0300 Subject: [PATCH] mlx4: Pre reserved Qp regions. We reserve Qp ranges to be used by other modules in case the ports come up as Ethernet ports. The qps are reserved at the end of the QP table. (This way we assure that they are aligned to their size) We need to consider these reserved ranges in bitmap creation : The effective max parameter.
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/alloc.c | 38 ++++++++++++++++-------- drivers/net/mlx4/fw.c | 5 +++ drivers/net/mlx4/fw.h | 2 + drivers/net/mlx4/main.c | 65 +++++++++++++++++++++++++++++++++++++++---- drivers/net/mlx4/mlx4.h | 4 ++ drivers/net/mlx4/qp.c | 55 ++++++++++++++++++++++++++++++++++-- include/linux/mlx4/device.h | 19 ++++++++++++- include/linux/mlx4/qp.h | 4 ++ 8 files changed, 169 insertions(+), 23 deletions(-) diff --git a/drivers/net/mlx4/alloc.c b/drivers/net/mlx4/alloc.c index 4601506..4b6074d 100644 --- a/drivers/net/mlx4/alloc.c +++ b/drivers/net/mlx4/alloc.c @@ -44,15 +44,18 @@ u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap) spin_lock(&bitmap->lock); - obj = find_next_zero_bit(bitmap->table, bitmap->max, bitmap->last); - if (obj >= bitmap->max) { + obj = find_next_zero_bit(bitmap->table, bitmap->effective_max, + bitmap->last); + if (obj >= bitmap->effective_max) { bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; - obj = find_first_zero_bit(bitmap->table, bitmap->max); + obj = find_first_zero_bit(bitmap->table, bitmap->effective_max); } - if (obj < bitmap->max) { + if (obj < bitmap->effective_max) { set_bit(obj, bitmap->table); - bitmap->last = (obj + 1) & (bitmap->max - 1); + bitmap->last = (obj + 1); + if (bitmap->last == bitmap->effective_max) + bitmap->last = 0; obj |= bitmap->top; } else obj = -1; @@ -108,20 +111,20 @@ u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align) spin_lock(&bitmap->lock); obj = find_aligned_range(bitmap->table, bitmap->last, - bitmap->max, cnt, align); - if (obj >= bitmap->max) { + bitmap->effective_max, cnt, align); + if (obj >= bitmap->effective_max) { bitmap->top = (bitmap->top + bitmap->max) & bitmap->mask; obj = find_aligned_range(bitmap->table, 0, - bitmap->max, + bitmap->effective_max, cnt, align); } - if (obj < bitmap->max) { + if (obj < bitmap->effective_max) { for (i = 0; i < cnt; i++) set_bit(obj+i, bitmap->table); if (obj == bitmap->last) { bitmap->last 
= (obj + cnt); - if (bitmap->last >= bitmap->max) + if (bitmap->last >= bitmap->effective_max) bitmap->last = 0; } obj |= bitmap->top; @@ -147,8 +150,9 @@ void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt) spin_unlock(&bitmap->lock); } -int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, - u32 num, u32 mask, u32 reserved) +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max) { int i; @@ -160,6 +164,7 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, bitmap->top = 0; bitmap->max = num; bitmap->mask = mask; + bitmap->effective_max = effective_max; spin_lock_init(&bitmap->lock); bitmap->table = kzalloc(BITS_TO_LONGS(num) * sizeof (long), GFP_KERNEL); if (!bitmap->table) @@ -171,6 +176,13 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, return 0; } +int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved) +{ + return mlx4_bitmap_init_with_effective_max(bitmap, num, mask, + reserved, num); +} + void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap) { kfree(bitmap->table); diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index d82f275..b0ad0d1 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -325,6 +325,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) #define QUERY_PORT_MTU_OFFSET 0x01 #define QUERY_PORT_WIDTH_OFFSET 0x06 #define QUERY_PORT_MAX_GID_PKEY_OFFSET 0x07 +#define QUERY_PORT_MAX_MACVLAN_OFFSET 0x0a #define QUERY_PORT_MAX_VL_OFFSET 0x0b for (i = 1; i <= dev_cap->num_ports; ++i) { @@ -342,6 +343,10 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_pkeys[i] = 1 << (field & 0xf); MLX4_GET(field, outbox, QUERY_PORT_MAX_VL_OFFSET); dev_cap->max_vl[i] = field & 0xf; + MLX4_GET(field, outbox, QUERY_PORT_MAX_MACVLAN_OFFSET); + dev_cap->log_max_macs[i] = field & 0xf; + dev_cap->log_max_vlans[i] = field >> 4; + } } diff --git a/drivers/net/mlx4/fw.h 
b/drivers/net/mlx4/fw.h index 306cb9b..a2e827c 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -97,6 +97,8 @@ struct mlx4_dev_cap { u32 reserved_lkey; u64 max_icm_sz; int max_gso_sz; + u8 log_max_macs[MLX4_MAX_PORTS + 1]; + u8 log_max_vlans[MLX4_MAX_PORTS + 1]; }; struct mlx4_adapter { diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index a6aa49f..f309532 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -85,6 +85,22 @@ static struct mlx4_profile default_profile = { .num_mtt = 1 << 20, }; +static int num_mac = 1; +module_param_named(num_mac, num_mac, int, 0444); +MODULE_PARM_DESC(num_mac, "Maximum number of MACs per ETH port " + "(1-127, default 1)"); + +static int num_vlan; +module_param_named(num_vlan, num_vlan, int, 0444); +MODULE_PARM_DESC(num_vlan, "Maximum number of VLANs per ETH port " + "(0-126, default 0)"); + +static int use_prio; +module_param_named(use_prio, use_prio, bool, 0444); +MODULE_PARM_DESC(use_prio, "Enable steering by VLAN priority on ETH ports " + "(0/1, default 0)"); + + static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) { int err; @@ -134,7 +150,6 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.max_rq_sg = dev_cap->max_rq_sg; dev->caps.max_wqes = dev_cap->max_qp_sz; dev->caps.max_qp_init_rdma = dev_cap->max_requester_per_qp; - dev->caps.reserved_qps = dev_cap->reserved_qps; dev->caps.max_srq_wqes = dev_cap->max_srq_sz; dev->caps.max_srq_sge = dev_cap->max_rq_sg - 1; dev->caps.reserved_srqs = dev_cap->reserved_srqs; @@ -161,6 +176,39 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.stat_rate_support = dev_cap->stat_rate_support; dev->caps.max_gso_sz = dev_cap->max_gso_sz; + dev->caps.log_num_macs = ilog2(roundup_pow_of_two(num_mac + 1)); + dev->caps.log_num_vlans = ilog2(roundup_pow_of_two(num_vlan + 2)); + dev->caps.log_num_prios = use_prio ? 
3: 0; + + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (dev->caps.log_num_macs > dev_cap->log_max_macs[i]) { + dev->caps.log_num_macs = dev_cap->log_max_macs[i]; + mlx4_warn(dev, "Requested number of MACs is too much " + "for port %d, reducing to %d.\n", + i, 1 << dev->caps.log_num_macs); + } + if (dev->caps.log_num_vlans > dev_cap->log_max_vlans[i]) { + dev->caps.log_num_vlans = dev_cap->log_max_vlans[i]; + mlx4_warn(dev, "Requested number of VLANs is too much " + "for port %d, reducing to %d.\n", + i, 1 << dev->caps.log_num_vlans); + } + } + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW] = dev_cap->reserved_qps; + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_ETH_ADDR] = + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_ADDR] = + (1 << dev->caps.log_num_macs)* + (1 << dev->caps.log_num_vlans)* + (1 << dev->caps.log_num_prios)* + dev->caps.num_ports; + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH] = MLX4_NUM_FEXCH; + + dev->caps.reserved_qps = dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW] + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_ETH_ADDR] + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH] + + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH]; + return 0; } @@ -209,7 +257,8 @@ static int mlx4_init_cmpt_table(struct mlx4_dev *dev, u64 cmpt_base, ((u64) (MLX4_CMPT_TYPE_QP * cmpt_entry_sz) << MLX4_CMPT_SHIFT), cmpt_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) goto err; @@ -334,7 +383,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->qpc_base, dev_cap->qpc_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map QP context memory, aborting.\n"); goto err_unmap_dmpt; @@ -344,7 +394,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->auxc_base, dev_cap->aux_entry_sz, dev->caps.num_qps, - 
dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map AUXC context memory, aborting.\n"); goto err_unmap_qp; @@ -354,7 +405,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->altc_base, dev_cap->altc_entry_sz, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map ALTC context memory, aborting.\n"); goto err_unmap_auxc; @@ -364,7 +416,8 @@ static int mlx4_init_icm(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap, init_hca->rdmarc_base, dev_cap->rdmarc_entry_sz << priv->qp_table.rdmarc_shift, dev->caps.num_qps, - dev->caps.reserved_qps, 0, 0); + dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], + 0, 0); if (err) { mlx4_err(dev, "Failed to map RDMARC context memory, aborting\n"); goto err_unmap_altc; diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index 89d4ccc..b74405a 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -111,6 +111,7 @@ struct mlx4_bitmap { u32 last; u32 top; u32 max; + u32 effective_max; u32 mask; spinlock_t lock; unsigned long *table; @@ -290,6 +291,9 @@ void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj); u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align); void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt); int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, u32 mask, u32 reserved); +int mlx4_bitmap_init_with_effective_max(struct mlx4_bitmap *bitmap, + u32 num, u32 mask, u32 reserved, + u32 effective_max); void mlx4_bitmap_cleanup(struct mlx4_bitmap *bitmap); int mlx4_reset(struct mlx4_dev *dev); diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c index dff8e66..2d5be15 100644 --- a/drivers/net/mlx4/qp.c +++ b/drivers/net/mlx4/qp.c @@ -273,6 +273,7 @@ int mlx4_init_qp_table(struct mlx4_dev *dev) { struct mlx4_qp_table *qp_table = 
&mlx4_priv(dev)->qp_table; int err; + int reserved_from_top = 0; spin_lock_init(&qp_table->lock); INIT_RADIX_TREE(&dev->qp_table_tree, GFP_ATOMIC); @@ -282,9 +283,43 @@ int mlx4_init_qp_table(struct mlx4_dev *dev) * block of special QPs must be aligned to a multiple of 8, so * round up. */ - dev->caps.sqp_start = ALIGN(dev->caps.reserved_qps, 8); - err = mlx4_bitmap_init(&qp_table->bitmap, dev->caps.num_qps, - (1 << 24) - 1, dev->caps.sqp_start + 8); + dev->caps.sqp_start = + ALIGN(dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FW], 8); + + { + int sort[MLX4_QP_REGION_COUNT]; + int i, j, tmp; + int last_base = dev->caps.num_qps; + + for (i = 1; i < MLX4_QP_REGION_COUNT; ++i) + sort[i] = i; + + for (i = MLX4_QP_REGION_COUNT; i > 0; --i) { + for (j = 2; j < i; ++j) { + if (dev->caps.reserved_qps_cnt[sort[j]] > + dev->caps.reserved_qps_cnt[sort[j - 1]]) { + tmp = sort[j]; + sort[j] = sort[j - 1]; + sort[j - 1] = tmp; + } + } + } + + for (i = 1; i < MLX4_QP_REGION_COUNT; ++i) { + last_base -= dev->caps.reserved_qps_cnt[sort[i]]; + dev->caps.reserved_qps_base[sort[i]] = last_base; + reserved_from_top += + dev->caps.reserved_qps_cnt[sort[i]]; + } + + } + + err = mlx4_bitmap_init_with_effective_max(&qp_table->bitmap, + dev->caps.num_qps, + (1 << 23) - 1, + dev->caps.sqp_start + 8, + dev->caps.num_qps - + reserved_from_top); if (err) return err; @@ -297,6 +332,20 @@ void mlx4_cleanup_qp_table(struct mlx4_dev *dev) mlx4_bitmap_cleanup(&mlx4_priv(dev)->qp_table.bitmap); } +int mlx4_qp_get_region(struct mlx4_dev *dev, + enum qp_region region, + int *base_qpn, int *cnt) +{ + if ((region < 0) || (region >= MLX4_QP_REGION_COUNT)) + return -EINVAL; + + *base_qpn = dev->caps.reserved_qps_base[region]; + *cnt = dev->caps.reserved_qps_cnt[region]; + + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_get_region); + int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, struct mlx4_qp_context *context) { diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 
9c77bf3..955eeca 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -135,6 +135,18 @@ enum { MLX4_STAT_RATE_OFFSET = 5 }; +enum qp_region { + MLX4_QP_REGION_FW = 0, + MLX4_QP_REGION_ETH_ADDR, + MLX4_QP_REGION_FC_ADDR, + MLX4_QP_REGION_FC_EXCH, + MLX4_QP_REGION_COUNT +}; + +enum { + MLX4_NUM_FEXCH = 64 * 1024, +}; + static inline u64 mlx4_fw_ver(u64 major, u64 minor, u64 subminor) { return (major << 32) | (minor << 16) | subminor; @@ -159,7 +171,6 @@ struct mlx4_caps { int max_rq_desc_sz; int max_qp_init_rdma; int max_qp_dest_rdma; - int reserved_qps; int sqp_start; int num_srqs; int max_srq_wqes; @@ -189,6 +200,12 @@ struct mlx4_caps { u16 stat_rate_support; u8 port_width_cap[MLX4_MAX_PORTS + 1]; int max_gso_sz; + int reserved_qps_cnt[MLX4_QP_REGION_COUNT]; + int reserved_qps; + int reserved_qps_base[MLX4_QP_REGION_COUNT]; + int log_num_macs; + int log_num_vlans; + int log_num_prios; }; struct mlx4_buf_list { diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h index a5e43fe..5a02980 100644 --- a/include/linux/mlx4/qp.h +++ b/include/linux/mlx4/qp.h @@ -303,4 +316,8 @@ static inline struct mlx4_qp *__mlx4_qp_lookup(struct mlx4_dev *dev, u32 qpn) void mlx4_qp_remove(struct mlx4_dev *dev, struct mlx4_qp *qp); +int mlx4_qp_get_region(struct mlx4_dev *dev, + enum qp_region region, + int *base_qpn, int *cnt); + #endif /* MLX4_QP_H */ -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 08:00:14 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 18:00:14 +0300 Subject: [ofa-general][PATCH 5/12 v1] mlx4: Different port type support Message-ID: <480F4EFE.7020807@mellanox.co.il> >From 0d3da6ad682c4655cd909aefe5bc294c55f5f711 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Mon, 21 Apr 2008 17:40:57 +0300 Subject: [PATCH] mlx4: Different port type support Multi protocol supports different port types. The port types are delivered through module parameters, crossed with firmware capabilities. 
Each consumer of mlx4_core should query for supported port types, mlx4_ib can no longer assume that all physical ports belong to it. Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/mad.c | 6 +- drivers/infiniband/hw/mlx4/main.c | 12 ++++- drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 + drivers/net/mlx4/fw.c | 4 ++ drivers/net/mlx4/fw.h | 1 + drivers/net/mlx4/main.c | 84 ++++++++++++++++++++++++++++++++++ include/linux/mlx4/device.h | 32 +++++++++++++ 7 files changed, 136 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c index 4c1e72f..d91ba56 100644 --- a/drivers/infiniband/hw/mlx4/mad.c +++ b/drivers/infiniband/hw/mlx4/mad.c @@ -297,7 +297,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev) int p, q; int ret; - for (p = 0; p < dev->dev->caps.num_ports; ++p) + for (p = 0; p < dev->num_ports; ++p) for (q = 0; q <= 1; ++q) { agent = ib_register_mad_agent(&dev->ib_dev, p + 1, q ? IB_QPT_GSI : IB_QPT_SMI, @@ -313,7 +313,7 @@ int mlx4_ib_mad_init(struct mlx4_ib_dev *dev) return 0; err: - for (p = 0; p < dev->dev->caps.num_ports; ++p) + for (p = 0; p < dev->num_ports; ++p) for (q = 0; q <= 1; ++q) if (dev->send_agent[p][q]) ib_unregister_mad_agent(dev->send_agent[p][q]); @@ -326,7 +326,7 @@ void mlx4_ib_mad_cleanup(struct mlx4_ib_dev *dev) struct ib_mad_agent *agent; int p, q; - for (p = 0; p < dev->dev->caps.num_ports; ++p) { + for (p = 0; p < dev->num_ports; ++p) { for (q = 0; q <= 1; ++q) { agent = dev->send_agent[p][q]; dev->send_agent[p][q] = NULL; diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 3c7f938..507dbe3 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -549,11 +549,15 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) MLX4_INIT_DOORBELL_LOCK(&ibdev->uar_lock); ibdev->dev = dev; + ibdev->ports_map = mlx4_get_ports_of_type(dev, MLX4_PORT_TYPE_IB); strlcpy(ibdev->ib_dev.name, "mlx4_%d", IB_DEVICE_NAME_MAX); 
ibdev->ib_dev.owner = THIS_MODULE; ibdev->ib_dev.node_type = RDMA_NODE_IB_CA; - ibdev->ib_dev.phys_port_cnt = dev->caps.num_ports; + ibdev->num_ports = 0; + mlx4_foreach_port(i, ibdev->ports_map) + ibdev->num_ports++; + ibdev->ib_dev.phys_port_cnt = ibdev->num_ports; ibdev->ib_dev.num_comp_vectors = 1; ibdev->ib_dev.dma_device = &dev->pdev->dev; @@ -667,7 +671,7 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void *ibdev_ptr) struct mlx4_ib_dev *ibdev = ibdev_ptr; int p; - for (p = 1; p <= dev->caps.num_ports; ++p) + for (p = 1; p <= ibdev->num_ports; ++p) mlx4_CLOSE_PORT(dev, p); mlx4_ib_mad_cleanup(ibdev); @@ -682,6 +686,10 @@ static void mlx4_ib_event(struct mlx4_dev *dev, void *ibdev_ptr, enum mlx4_dev_event event, int port) { struct ib_event ibev; + struct mlx4_ib_dev *ibdev = to_mdev((struct ib_device *) ibdev_ptr); + + if (port > ibdev->num_ports) + return; switch (event) { case MLX4_DEV_EVENT_PORT_UP: diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 5cf9947..9d4f7a7 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -155,6 +155,8 @@ struct mlx4_ib_ah { struct mlx4_ib_dev { struct ib_device ib_dev; struct mlx4_dev *dev; + u32 ports_map; + int num_ports; void __iomem *uar_map; struct mlx4_uar priv_uar; diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index b0ad0d1..e875b08 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -322,6 +322,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev_cap->max_pkeys[i] = 1 << (field & 0xf); } } else { +#define QUERY_PORT_SUPPORTED_TYPE_OFFSET 0x00 #define QUERY_PORT_MTU_OFFSET 0x01 #define QUERY_PORT_WIDTH_OFFSET 0x06 #define QUERY_PORT_MAX_GID_PKEY_OFFSET 0x07 @@ -334,6 +335,9 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) if (err) goto out; + MLX4_GET(field, outbox, + QUERY_PORT_SUPPORTED_TYPE_OFFSET); + dev_cap->supported_port_types[i] = 
field & 3; MLX4_GET(field, outbox, QUERY_PORT_MTU_OFFSET); dev_cap->max_mtu[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_PORT_WIDTH_OFFSET); diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index a2e827c..50a6a7d 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -97,6 +97,7 @@ struct mlx4_dev_cap { u32 reserved_lkey; u64 max_icm_sz; int max_gso_sz; + u8 supported_port_types[MLX4_MAX_PORTS + 1]; u8 log_max_macs[MLX4_MAX_PORTS + 1]; u8 log_max_vlans[MLX4_MAX_PORTS + 1]; }; diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index f309532..1651d8e 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -100,11 +100,50 @@ module_param_named(use_prio, use_prio, bool, 0444); MODULE_PARM_DESC(use_prio, "Enable steering by VLAN priority on ETH ports " "(0/1, default 0)"); +static char *port_type_arr[MLX4_MAX_PORTS] = { [0 ... (MLX4_MAX_PORTS-1)] = "ib"}; +module_param_array_named(port_type, port_type_arr, charp, NULL, 0444); +MODULE_PARM_DESC(port_type, "Ports L2 type (ib/eth/auto, entry per port, " + "comma separated, default ib for all)"); + +static int mlx4_check_port_params(struct mlx4_dev *dev, + enum mlx4_port_type *port_type) +{ + if (port_type[0] != port_type[1] && + !(dev->caps.flags & MLX4_DEV_CAP_FLAG_DPDP)) { + mlx4_err(dev, "Only same port types supported " + "on this HCA, aborting.\n"); + return -EINVAL; + } + if ((port_type[0] == MLX4_PORT_TYPE_ETH) && + (port_type[1] == MLX4_PORT_TYPE_IB)) { + mlx4_err(dev, "eth-ib configuration is not supported.\n"); + return -EINVAL; + } + return 0; + } + +static void mlx4_str2port_type(char **port_str, + enum mlx4_port_type *port_type) +{ + int i; + + for (i = 0; i < MLX4_MAX_PORTS; i++) { + if (!strcmp(port_str[i], "eth")) + port_type[i] = MLX4_PORT_TYPE_ETH; + else + port_type[i] = MLX4_PORT_TYPE_IB; + } +} + + static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) { int err; int i; + enum mlx4_port_type port_type[MLX4_MAX_PORTS]; + + 
mlx4_str2port_type(port_type_arr, port_type); err = mlx4_QUERY_DEV_CAP(dev, dev_cap); if (err) { @@ -180,7 +219,24 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.log_num_vlans = ilog2(roundup_pow_of_two(num_vlan + 2)); dev->caps.log_num_prios = use_prio ? 3: 0; + err = mlx4_check_port_params(dev, port_type); + if (err) + return err; + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (!dev_cap->supported_port_types[i]) { + mlx4_warn(dev, "FW doesn't support Multi Protocol, " + "loading IB only\n"); + dev->caps.port_type[i] = MLX4_PORT_TYPE_IB; + continue; + } + if (port_type[i-1] & dev_cap->supported_port_types[i]) + dev->caps.port_type[i] = port_type[i-1]; + else { + mlx4_err(dev, "Requested port type for port %d " + "not supported by HW\n", i); + return -ENODEV; + } if (dev->caps.log_num_macs > dev_cap->log_max_macs[i]) { dev->caps.log_num_macs = dev_cap->log_max_macs[i]; mlx4_warn(dev, "Requested number of MACs is too much " @@ -1004,10 +1060,38 @@ static struct pci_driver mlx4_driver = { .remove = __devexit_p(mlx4_remove_one) }; +static int __init mlx4_verify_params(void) +{ + int i; + + for (i = 0; i < MLX4_MAX_PORTS; ++i) { + if (strcmp(port_type_arr[i], "eth") && + strcmp(port_type_arr[i], "ib")) { + printk(KERN_WARNING "mlx4_core: bad port_type for " + "port %d: %s\n", i, port_type_arr[i]); + return -1; + } + } + if ((num_mac < 1) || (num_mac > 127)) { + printk(KERN_WARNING "mlx4_core: bad num_mac: %d\n", num_mac); + return -1; + } + + if ((num_vlan < 0) || (num_vlan > 126)) { + printk(KERN_WARNING "mlx4_core: bad num_vlan: %d\n", num_vlan); + return -1; + } + + return 0; +} + static int __init mlx4_init(void) { int ret; + if (mlx4_verify_params()) + return -EINVAL; + ret = mlx4_catas_init(); if (ret) return ret; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 955eeca..4279b2f 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -62,6 +62,7 @@ enum { 
MLX4_DEV_CAP_FLAG_IPOIB_CSUM = 1 << 7, MLX4_DEV_CAP_FLAG_BAD_PKEY_CNTR = 1 << 8, MLX4_DEV_CAP_FLAG_BAD_QKEY_CNTR = 1 << 9, + MLX4_DEV_CAP_FLAG_DPDP = 1 << 12, MLX4_DEV_CAP_FLAG_MEM_WINDOW = 1 << 16, MLX4_DEV_CAP_FLAG_APM = 1 << 17, MLX4_DEV_CAP_FLAG_ATOMIC = 1 << 18, @@ -143,6 +144,11 @@ enum qp_region { MLX4_QP_REGION_COUNT }; +enum mlx4_port_type { + MLX4_PORT_TYPE_IB = 1 << 0, + MLX4_PORT_TYPE_ETH = 1 << 1, +}; + enum { MLX4_NUM_FEXCH = 64 * 1024, }; @@ -206,6 +212,7 @@ struct mlx4_caps { int log_num_macs; int log_num_vlans; int log_num_prios; + enum mlx4_port_type port_type[MLX4_MAX_PORTS + 1]; }; struct mlx4_buf_list { @@ -365,6 +372,31 @@ struct mlx4_init_port_param { u64 si_guid; }; +static inline void mlx4_query_steer_cap(struct mlx4_dev *dev, int *log_mac, + int *log_vlan, int *log_prio) +{ + *log_mac = dev->caps.log_num_macs; + *log_vlan = dev->caps.log_num_vlans; + *log_prio = dev->caps.log_num_prios; +} + +static inline u32 mlx4_get_ports_of_type(struct mlx4_dev *dev, + enum mlx4_port_type ptype) +{ + u32 ret = 0; + int i; + + for (i = 1; i <= dev->caps.num_ports; ++i) { + if (dev->caps.port_type[i] == ptype) + ret |= 1 << (i-1); + } + return ret; +} + +#define mlx4_foreach_port(port, bitmap) \ + for ((port) = 1; (port) <= MLX4_MAX_PORTS; (port)++) \ + if (bitmap & 1 << ((port)-1)) + int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct, struct mlx4_buf *buf); void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf); -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 08:02:09 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 18:02:09 +0300 Subject: [ofa-general][PATCH 6/12 1] mlx4: Port Ethernet mtu capabilities handle Message-ID: <480F4F71.2000707@mellanox.co.il> >From a37cec875c323ddebe4f0289e4bab774fd9ec0f4 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 13:25:19 +0300 Subject: [PATCH] mlx4: Port Ethernet mtu capabilities handle Ethernet max mtu and default Mac address 
are revealed through QUERY_DEV_CAP command. The received mtu is crossed with requested max mtu (passed by module parameter). Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/fw.c | 11 ++++++----- drivers/net/mlx4/fw.h | 4 +++- drivers/net/mlx4/main.c | 15 ++++++++++++++- include/linux/mlx4/device.h | 4 +++- 4 files changed, 26 insertions(+), 8 deletions(-) diff --git a/drivers/net/mlx4/fw.c b/drivers/net/mlx4/fw.c index e875b08..1cbc30f 100644 --- a/drivers/net/mlx4/fw.c +++ b/drivers/net/mlx4/fw.c @@ -314,7 +314,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) MLX4_GET(field, outbox, QUERY_DEV_CAP_VL_PORT_OFFSET); dev_cap->max_vl[i] = field >> 4; MLX4_GET(field, outbox, QUERY_DEV_CAP_MTU_WIDTH_OFFSET); - dev_cap->max_mtu[i] = field >> 4; + dev_cap->ib_mtu[i] = field >> 4; dev_cap->max_port_width[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_DEV_CAP_MAX_GID_OFFSET); dev_cap->max_gids[i] = 1 << (field & 0xf); @@ -339,7 +339,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) QUERY_PORT_SUPPORTED_TYPE_OFFSET); dev_cap->supported_port_types[i] = field & 3; MLX4_GET(field, outbox, QUERY_PORT_MTU_OFFSET); - dev_cap->max_mtu[i] = field & 0xf; + dev_cap->ib_mtu[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_PORT_WIDTH_OFFSET); dev_cap->max_port_width[i] = field & 0xf; MLX4_GET(field, outbox, QUERY_PORT_MAX_GID_PKEY_OFFSET); @@ -350,7 +350,8 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) MLX4_GET(field, outbox, QUERY_PORT_MAX_MACVLAN_OFFSET); dev_cap->log_max_macs[i] = field & 0xf; dev_cap->log_max_vlans[i] = field >> 4; - + dev_cap->eth_mtu[i] = be16_to_cpu(((u16 *) outbox)[1]); + dev_cap->def_mac[i] = be64_to_cpu(((u64 *) outbox)[2]); } } @@ -388,7 +389,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) mlx4_dbg(dev, "Max CQEs: %d, max WQEs: %d, max SRQ WQEs: %d\n", dev_cap->max_cq_sz, dev_cap->max_qp_sz, dev_cap->max_srq_sz); mlx4_dbg(dev, "Local 
CA ACK delay: %d, max MTU: %d, port width cap: %d\n", - dev_cap->local_ca_ack_delay, 128 << dev_cap->max_mtu[1], + dev_cap->local_ca_ack_delay, 128 << dev_cap->ib_mtu[1], dev_cap->max_port_width[1]); mlx4_dbg(dev, "Max SQ desc size: %d, max SQ S/G: %d\n", dev_cap->max_sq_desc_sz, dev_cap->max_sq_sg); @@ -796,7 +797,7 @@ int mlx4_INIT_PORT(struct mlx4_dev *dev, int port) flags |= (dev->caps.port_width_cap[port] & 0xf) << INIT_PORT_PORT_WIDTH_SHIFT; MLX4_PUT(inbox, flags, INIT_PORT_FLAGS_OFFSET); - field = 128 << dev->caps.mtu_cap[port]; + field = 128 << dev->caps.ib_mtu_cap[port]; MLX4_PUT(inbox, field, INIT_PORT_MTU_OFFSET); field = dev->caps.gid_table_len[port]; MLX4_PUT(inbox, field, INIT_PORT_MAX_GID_OFFSET); diff --git a/drivers/net/mlx4/fw.h b/drivers/net/mlx4/fw.h index 50a6a7d..ef964d5 100644 --- a/drivers/net/mlx4/fw.h +++ b/drivers/net/mlx4/fw.h @@ -61,11 +61,13 @@ struct mlx4_dev_cap { int local_ca_ack_delay; int num_ports; u32 max_msg_sz; - int max_mtu[MLX4_MAX_PORTS + 1]; + int ib_mtu[MLX4_MAX_PORTS + 1]; int max_port_width[MLX4_MAX_PORTS + 1]; int max_vl[MLX4_MAX_PORTS + 1]; int max_gids[MLX4_MAX_PORTS + 1]; int max_pkeys[MLX4_MAX_PORTS + 1]; + u64 def_mac[MLX4_MAX_PORTS + 1]; + int eth_mtu[MLX4_MAX_PORTS + 1]; u16 stat_rate_support; u32 flags; int reserved_uars; diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 1651d8e..754c07c 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -104,6 +104,11 @@ static struct mlx4_profile default_profile = { module_param_array_named(port_type, port_type_arr, charp, NULL, 0444); MODULE_PARM_DESC(port_type, "Ports L2 type (ib/eth/auto, entry per port, " "comma separated, default ib for all)"); + +static int port_mtu[MLX4_MAX_PORTS] = { [0 ... 
(MLX4_MAX_PORTS-1)] = 9600}; +module_param_array_named(port_mtu, port_mtu, int, NULL, 0444); +MODULE_PARM_DESC(port_mtu, "Ports max mtu in Bytes, entry per port, " + "comma separated, default 9600 for all"); static int mlx4_check_port_params(struct mlx4_dev *dev, enum mlx4_port_type *port_type) @@ -175,10 +180,12 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) dev->caps.num_ports = dev_cap->num_ports; for (i = 1; i <= dev->caps.num_ports; ++i) { dev->caps.vl_cap[i] = dev_cap->max_vl[i]; - dev->caps.mtu_cap[i] = dev_cap->max_mtu[i]; + dev->caps.ib_mtu_cap[i] = dev_cap->ib_mtu[i]; dev->caps.gid_table_len[i] = dev_cap->max_gids[i]; dev->caps.pkey_table_len[i] = dev_cap->max_pkeys[i]; dev->caps.port_width_cap[i] = dev_cap->max_port_width[i]; + dev->caps.eth_mtu_cap[i] = dev_cap->eth_mtu[i]; + dev->caps.def_mac[i] = dev_cap->def_mac[i]; } dev->caps.num_uars = dev_cap->uar_size / PAGE_SIZE; @@ -237,6 +244,12 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) "not supported by HW\n", i); return -ENODEV; } + if (port_mtu[i-1] <= dev->caps.eth_mtu_cap[i]) + dev->caps.eth_mtu_cap[i] = port_mtu[i-1]; + else + mlx4_warn(dev, "Requested mtu for port %d is larger " + "than supported, reducing to %d\n", + i, dev->caps.eth_mtu_cap[i]); if (dev->caps.log_num_macs > dev_cap->log_max_macs[i]) { dev->caps.log_num_macs = dev_cap->log_max_macs[i]; mlx4_warn(dev, "Requested number of MACs is too much " diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index b114ef3..4ca3a00 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -162,7 +162,9 @@ struct mlx4_caps { u64 fw_ver; int num_ports; int vl_cap[MLX4_MAX_PORTS + 1]; - int mtu_cap[MLX4_MAX_PORTS + 1]; + int ib_mtu_cap[MLX4_MAX_PORTS + 1]; + u64 def_mac[MLX4_MAX_PORTS + 1]; + int eth_mtu_cap[MLX4_MAX_PORTS + 1]; int gid_table_len[MLX4_MAX_PORTS + 1]; int pkey_table_len[MLX4_MAX_PORTS + 1]; int local_ca_ack_delay; -- 1.5.4 From
yevgenyp at mellanox.co.il Wed Apr 23 08:03:51 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 18:03:51 +0300 Subject: [ofa-general][PATCH 7/12 v1] mlx4: Mac Vlan Management Message-ID: <480F4FD7.4010706@mellanox.co.il> >From 93d41d72b8878bfd8d67b6a48b70c392f108fe58 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 14:28:36 +0300 Subject: [PATCH] mlx4: Mac Vlan Management mlx4_core is now responsible for managing Mac and Vlan filters for each port. It also notifies the FW which port type will be loaded, using the SET_PORT command Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/Makefile | 2 +- drivers/net/mlx4/main.c | 18 +++ drivers/net/mlx4/mlx4.h | 35 ++++++ drivers/net/mlx4/port.c | 278 +++++++++++++++++++++++++++++++++++++++++++ include/linux/mlx4/cmd.h | 9 ++ include/linux/mlx4/device.h | 6 + 6 files changed, 347 insertions(+), 1 deletions(-) create mode 100644 drivers/net/mlx4/port.c diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile index 0952a65..f4932d8 100644 --- a/drivers/net/mlx4/Makefile +++ b/drivers/net/mlx4/Makefile @@ -1,4 +1,4 @@ obj-$(CONFIG_MLX4_CORE) += mlx4_core.o mlx4_core-y := alloc.o catas.o cmd.o cq.o eq.o fw.o icm.o intf.o main.o mcg.o \ - mr.o pd.o profile.o qp.o reset.o srq.o + mr.o pd.o profile.o qp.o reset.o srq.o port.o diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index 754c07c..a528809 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -678,6 +678,7 @@ static int mlx4_setup_hca(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); int err; + int port; err = mlx4_init_uar_table(dev); if (err) { @@ -776,8 +777,25 @@ static int mlx4_setup_hca(struct mlx4_dev *dev) goto err_qp_table_free; } + for (port = 1; port <= dev->caps.num_ports; port++) { + err = mlx4_SET_PORT(dev, port); + if (err) { + mlx4_err(dev, "Failed to set port %d, aborting\n", + port); + goto err_mcg_table_free; + } + } + + for (port = 0; port < 
dev->caps.num_ports; port++) { + mlx4_init_mac_table(dev, port); + mlx4_init_vlan_table(dev, port); + } + return 0; +err_mcg_table_free: + mlx4_cleanup_mcg_table(dev); + err_qp_table_free: mlx4_cleanup_qp_table(dev); diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index b74405a..eff1c5a 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -251,6 +251,35 @@ struct mlx4_catas_err { struct list_head list; }; +struct mlx4_mac_table { +#define MLX4_MAX_MAC_NUM 128 +#define MLX4_MAC_MASK 0xffffffffffff +#define MLX4_MAC_VALID_SHIFT 63 +#define MLX4_MAC_TABLE_SIZE MLX4_MAX_MAC_NUM << 3 + __be64 entries[MLX4_MAX_MAC_NUM]; + int refs[MLX4_MAX_MAC_NUM]; + struct semaphore mac_sem; + int total; + int max; +}; + +struct mlx4_vlan_table { +#define MLX4_MAX_VLAN_NUM 126 +#define MLX4_VLAN_MASK 0xfff +#define MLX4_VLAN_VALID 1 << 31 +#define MLX4_VLAN_TABLE_SIZE MLX4_MAX_VLAN_NUM << 2 + __be32 entries[MLX4_MAX_VLAN_NUM]; + int refs[MLX4_MAX_VLAN_NUM]; + struct semaphore vlan_sem; + int total; + int max; +}; + +struct mlx4_port_info { + struct mlx4_mac_table mac_table; + struct mlx4_vlan_table vlan_table; +}; + struct mlx4_priv { struct mlx4_dev dev; @@ -279,6 +308,7 @@ struct mlx4_priv { struct mlx4_uar driver_uar; void __iomem *kar; + struct mlx4_port_info port[MLX4_MAX_PORTS]; }; static inline struct mlx4_priv *mlx4_priv(struct mlx4_dev *dev) @@ -351,4 +381,9 @@ void mlx4_srq_event(struct mlx4_dev *dev, u32 srqn, int event_type); void mlx4_handle_catas_err(struct mlx4_dev *dev); +void mlx4_init_mac_table(struct mlx4_dev *dev, u8 port); +void mlx4_init_vlan_table(struct mlx4_dev *dev, u8 port); + +int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port); + #endif /* MLX4_H */ diff --git a/drivers/net/mlx4/port.c b/drivers/net/mlx4/port.c new file mode 100644 index 0000000..910fc35 --- /dev/null +++ b/drivers/net/mlx4/port.c @@ -0,0 +1,278 @@ +/* + * Copyright (c) 2007 Mellanox Technologies. All rights reserved. 
+ * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#include +#include + +#include + +#include "mlx4.h" + +void mlx4_init_mac_table(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_mac_table *table = &mlx4_priv(dev)->port[port].mac_table; + int i; + + sema_init(&table->mac_sem, 1); + for (i = 0; i < MLX4_MAX_MAC_NUM; i++) { + table->entries[i] = 0; + table->refs[i] = 0; + } + table->max = 1 << dev->caps.log_num_macs; + table->total = 0; +} + +void mlx4_init_vlan_table(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port].vlan_table; + int i; + + sema_init(&table->vlan_sem, 1); + for (i = 0; i < MLX4_MAX_VLAN_NUM; i++) { + table->entries[i] = 0; + table->refs[i] = 0; + } + table->max = 1 << dev->caps.log_num_vlans; + table->total = 0; +} + +static int mlx4_SET_PORT_mac_table(struct mlx4_dev *dev, u8 port, + __be64 *entries) +{ + struct mlx4_cmd_mailbox *mailbox; + u32 in_mod; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memcpy(mailbox->buf, entries, MLX4_MAC_TABLE_SIZE); + + in_mod = MLX4_SET_PORT_MAC_TABLE << 8 | port; + err = mlx4_cmd(dev, mailbox->dma, in_mod, 1, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} + +int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *index) +{ + struct mlx4_mac_table *table = &mlx4_priv(dev)->port[port - 1].mac_table; + int i, err = 0; + int free = -1; + u64 valid = 1; + + mlx4_dbg(dev, "Registering mac: 0x%llx\n", mac); + down(&table->mac_sem); + for (i = 0; i < MLX4_MAX_MAC_NUM - 1; i++) { + if (free < 0 && !table->refs[i]) { + free = i; + continue; + } + + if (mac == (MLX4_MAC_MASK & be64_to_cpu(table->entries[i]))) { + /* Mac already registered, increase reference count */ + *index = i; + ++table->refs[i]; + goto out; + } + } + mlx4_dbg(dev, "Free mac index is %d\n", free); + + if (table->total == table->max) { + /* No free mac entries */ + err = -ENOSPC; + goto out; + } + + /* Register new MAC */ 
+ table->refs[free] = 1; + table->entries[free] = cpu_to_be64(mac | valid << MLX4_MAC_VALID_SHIFT); + + err = mlx4_SET_PORT_mac_table(dev, port, table->entries); + if (unlikely(err)) { + mlx4_err(dev, "Failed adding mac: 0x%llx\n", mac); + table->refs[free] = 0; + table->entries[free] = 0; + goto out; + } + + *index = free; + ++table->total; +out: + up(&table->mac_sem); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_register_mac); + +void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index) +{ + struct mlx4_mac_table *table = &mlx4_priv(dev)->port[port - 1].mac_table; + + down(&table->mac_sem); + if (!table->refs[index]) { + mlx4_warn(dev, "No mac entry for index %d\n", index); + goto out; + } + if (--table->refs[index]) { + mlx4_warn(dev, "Have more references for index %d," + "no need to modify mac table\n", index); + goto out; + } + table->entries[index] = 0; + mlx4_SET_PORT_mac_table(dev, port, table->entries); + --table->total; +out: + up(&table->mac_sem); +} +EXPORT_SYMBOL_GPL(mlx4_unregister_mac); + +static int mlx4_SET_PORT_vlan_table(struct mlx4_dev *dev, u8 port, + __be32 *entries) +{ + struct mlx4_cmd_mailbox *mailbox; + u32 in_mod; + int err; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memcpy(mailbox->buf, entries, MLX4_VLAN_TABLE_SIZE); + in_mod = MLX4_SET_PORT_VLAN_TABLE << 8 | port; + err = mlx4_cmd(dev, mailbox->dma, in_mod, 1, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + + return err; +} + +int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index) +{ + struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port - 1].vlan_table; + int i, err = 0; + int free = -1; + + down(&table->vlan_sem); + for (i = 0; i < MLX4_MAX_VLAN_NUM; i++) { + if (free < 0 && (table->refs[i] == 0)) { + free = i; + continue; + } + + if (table->refs[i] && + (vlan == (MLX4_VLAN_MASK & + be32_to_cpu(table->entries[i])))) { + /* Vlan already registered, increase 
reference count */ + *index = i; + ++table->refs[i]; + goto out; + } + } + + if (table->total == table->max) { + /* No free vlan entries */ + err = -ENOSPC; + goto out; + } + + /* Register new VLAN */ + table->refs[free] = 1; + table->entries[free] = cpu_to_be32(vlan | MLX4_VLAN_VALID); + + err = mlx4_SET_PORT_vlan_table(dev, port, table->entries); + if (unlikely(err)) { + mlx4_warn(dev, "Failed adding vlan: %u\n", vlan); + table->refs[free] = 0; + table->entries[free] = 0; + goto out; + } + + *index = free; + ++table->total; +out: + up(&table->vlan_sem); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_register_vlan); + +void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index) +{ + struct mlx4_vlan_table *table = &mlx4_priv(dev)->port[port - 1].vlan_table; + + down(&table->vlan_sem); + if (!table->refs[index]) { + mlx4_warn(dev, "No vlan entry for index %d\n", index); + goto out; + } + if (--table->refs[index]) { + mlx4_dbg(dev, "Have more references for index %d, " + "no need to modify vlan table\n", index); + goto out; + } + table->entries[index] = 0; + mlx4_SET_PORT_vlan_table(dev, port, table->entries); + --table->total; +out: + up(&table->vlan_sem); +} +EXPORT_SYMBOL_GPL(mlx4_unregister_vlan); + +int mlx4_SET_PORT(struct mlx4_dev *dev, u8 port) +{ + struct mlx4_cmd_mailbox *mailbox; + int err; + u8 is_eth = (dev->caps.port_type[port] == MLX4_PORT_TYPE_ETH) ? 
1 : 0; + + mailbox = mlx4_alloc_cmd_mailbox(dev); + if (IS_ERR(mailbox)) + return PTR_ERR(mailbox); + + memset(mailbox->buf, 0, 256); + if (is_eth) { + ((u8 *) mailbox->buf)[3] = 7; + ((__be16 *) mailbox->buf)[3] = + cpu_to_be16(dev->caps.eth_mtu_cap[port] + + ETH_HLEN + ETH_FCS_LEN); + ((__be16 *) mailbox->buf)[4] = cpu_to_be16(1 << 15); + ((__be16 *) mailbox->buf)[6] = cpu_to_be16(1 << 15); + } + err = mlx4_cmd(dev, mailbox->dma, port, is_eth, MLX4_CMD_SET_PORT, + MLX4_CMD_TIME_CLASS_B); + + mlx4_free_cmd_mailbox(dev, mailbox); + return err; +} diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h index 77323a7..cf9c679 100644 --- a/include/linux/mlx4/cmd.h +++ b/include/linux/mlx4/cmd.h @@ -132,6 +132,15 @@ enum { MLX4_MAILBOX_SIZE = 4096 }; +enum { + /* set port opcode modifiers */ + MLX4_SET_PORT_GENERAL = 0x0, + MLX4_SET_PORT_RQP_CALC = 0x1, + MLX4_SET_PORT_MAC_TABLE = 0x2, + MLX4_SET_PORT_VLAN_TABLE = 0x3, + MLX4_SET_PORT_PRIO_MAP = 0x4, +}; + struct mlx4_dev; struct mlx4_cmd_mailbox { diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index b114ef3..4ca3a00 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -461,6 +461,12 @@ int mlx4_CLOSE_PORT(struct mlx4_dev *dev, int port); int mlx4_multicast_attach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]); int mlx4_multicast_detach(struct mlx4_dev *dev, struct mlx4_qp *qp, u8 gid[16]); +int mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac, int *index); +void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index); + +int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, int *index); +void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index); + int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, int npages, u64 iova, u32 *lkey, u32 *rkey); int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages, -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 08:05:10 2008 From: 
yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 18:05:10 +0300 Subject: [ofa-general][PATCH 8/12 v1] mlx4: Dynamic port configuration Message-ID: <480F5026.9070400@mellanox.co.il> >From e13bef843cb2c7cee5a0ba388d97e21188087424 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 15:14:30 +0300 Subject: [PATCH] mlx4: Dynamic port configuration Port type can be set using sysfs interface when the low level driver is up. The low level driver unregisters all its customers and then registers them again with the new port types (which they query for in add_one) Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/main.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 97 insertions(+), 0 deletions(-) diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index a528809..e3fd4e9 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -281,6 +281,96 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) return 0; } +static int mlx4_change_port_types(struct mlx4_dev *dev, + enum mlx4_port_type *port_types) +{ + int i; + int err = 0; + int change = 0; + int port; + + for (i = 0; i < MLX4_MAX_PORTS; i++) { + if (port_types[i] != dev->caps.port_type[i + 1]) { + change = 1; + dev->caps.port_type[i + 1] = port_types[i]; + } + } + if (change) { + mlx4_unregister_device(dev); + for (port = 1; port <= dev->caps.num_ports; port++) { + mlx4_CLOSE_PORT(dev, port); + err = mlx4_SET_PORT(dev, port); + if (err) { + mlx4_err(dev, "Failed to set port %d, " + "aborting\n", port); + return err; + } + } + err = mlx4_register_device(dev); + } + return err; +} + +static ssize_t show_port_type(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct mlx4_dev *mdev = pci_get_drvdata(pdev); + int i; + + sprintf(buf, "Current port types:\n"); + for (i = 1; i <= MLX4_MAX_PORTS; i++) { + sprintf(buf, "%sPort%d: %s\n", buf, i, + 
(mdev->caps.port_type[i] == MLX4_PORT_TYPE_IB)? + "ib": "eth"); + } + return strlen(buf); +} + +static ssize_t set_port_type(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct pci_dev *pdev = to_pci_dev(dev); + struct mlx4_dev *mdev = pci_get_drvdata(pdev); + char *type; + enum mlx4_port_type port_types[MLX4_MAX_PORTS]; + char *loc_buf; + char *ptr; + int i; + int err = 0; + + loc_buf = kmalloc(count + 1, GFP_KERNEL); + if (!loc_buf) + return -ENOMEM; + + ptr = loc_buf; + memcpy(loc_buf, buf, count + 1); + for (i = 0; i < MLX4_MAX_PORTS; i++) { + type = strsep(&loc_buf, ","); + if (!strcmp(type, "ib")) + port_types[i] = MLX4_PORT_TYPE_IB; + else if (!strcmp(type, "eth")) + port_types[i] = MLX4_PORT_TYPE_ETH; + else { + dev_warn(dev, "%s is not acceptable port type " + "(use 'eth' or 'ib' only)\n", type); + err = -EINVAL; + goto out; + } + } + err = mlx4_check_port_params(mdev, port_types); + if (err) + goto out; + + err = mlx4_change_port_types(mdev, port_types); +out: + kfree(ptr); + return err ? 
err: count; +} +static DEVICE_ATTR(mlx4_port_type, S_IWUGO | S_IRUGO, show_port_type, set_port_type); + static int mlx4_load_fw(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -979,8 +1069,14 @@ static int __mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) pci_set_drvdata(pdev, dev); + if (device_create_file(&pdev->dev, &dev_attr_mlx4_port_type)) + goto err_sysfs; + return 0; +err_sysfs: + mlx4_unregister_device(dev); + err_cleanup: mlx4_cleanup_mcg_table(dev); mlx4_cleanup_qp_table(dev); @@ -1036,6 +1132,7 @@ static void mlx4_remove_one(struct pci_dev *pdev) int p; if (dev) { + device_remove_file(&pdev->dev, &dev_attr_mlx4_port_type); mlx4_unregister_device(dev); for (p = 1; p <= dev->caps.num_ports; ++p) -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 08:06:21 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 18:06:21 +0300 Subject: [ofa-general][PATCH 9/12 v1] mlx4: Collapsed CQ support Message-ID: <480F506D.9020202@mellanox.co.il> >From 749a2b62acc505a9ab2437eddb4cdd45503183d0 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 15:50:51 +0300 Subject: [PATCH] mlx4: Collapsed CQ support Changed cq creation API to support the creation of collapsed cqs. 
Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/net/mlx4/cq.c | 4 +++- include/linux/mlx4/device.h | 3 ++- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 5e570bb..63daf52 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -221,7 +221,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector } err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, - cq->db.dma, &cq->mcq); + cq->db.dma, &cq->mcq, 0); if (err) goto err_dbmap; diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index caa5bcf..d893cc1 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -188,7 +188,8 @@ int mlx4_cq_resize(struct mlx4_dev *dev, struct mlx4_cq *cq, EXPORT_SYMBOL_GPL(mlx4_cq_resize); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, - struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq) + struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, + int collapsed) { struct mlx4_priv *priv = mlx4_priv(dev); struct mlx4_cq_table *cq_table = &priv->cq_table; @@ -224,6 +225,7 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, cq_context = mailbox->buf; memset(cq_context, 0, sizeof *cq_context); + cq_context->flags = cpu_to_be32(!!collapsed << 18); cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index); cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP].eqn; cq_context->log_page_size = mtt->page_shift - MLX4_ICM_PAGE_SHIFT; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 4ca3a00..93c17aa 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -440,7 +440,8 @@ void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, int size); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, - struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq); 
+ struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, + int collapsed); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 08:07:50 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 18:07:50 +0300 Subject: [ofa-general][PATCH 10/12 v1] mlx4: Completion EQ per CPU Message-ID: <480F50C6.80109@mellanox.co.il> >From 2a2d22208f6fdba4c0c2afdf0ed12ef07b93d661 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Tue, 22 Apr 2008 16:39:47 +0300 Subject: [PATCH] mlx4: Completion EQ per cpu Completion eq's are created per cpu. Created cq's are attached to an eq by "Round Robin" algorithm, unless a specific eq was requested. Signed-off-by: Yevgeny Petrilin --- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/net/mlx4/cq.c | 19 ++++++++++++++++--- drivers/net/mlx4/eq.c | 39 ++++++++++++++++++++++++++------------- drivers/net/mlx4/main.c | 14 ++++++++------ drivers/net/mlx4/mlx4.h | 6 ++++-- include/linux/mlx4/device.h | 3 ++- 6 files changed, 57 insertions(+), 26 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 63daf52..732f812 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -221,7 +221,7 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev, int entries, int vector } err = mlx4_cq_alloc(dev->dev, entries, &cq->buf.mtt, uar, - cq->db.dma, &cq->mcq, 0); + cq->db.dma, &cq->mcq, vector, 0); if (err) goto err_dbmap; diff --git a/drivers/net/mlx4/cq.c b/drivers/net/mlx4/cq.c index d893cc1..bbb4c7b 100644 --- a/drivers/net/mlx4/cq.c +++ b/drivers/net/mlx4/cq.c @@ -189,7 +189,7 @@ EXPORT_SYMBOL_GPL(mlx4_cq_resize); int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, - int collapsed) + unsigned vector, int collapsed) { struct mlx4_priv *priv = 
mlx4_priv(dev); struct mlx4_cq_table *cq_table = &priv->cq_table; @@ -227,7 +227,20 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, cq_context->flags = cpu_to_be32(!!collapsed << 18); cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | uar->index); - cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP].eqn; + + if (vector > priv->eq_table.num_comp_eqs) { + err = -EINVAL; + goto err_radix; + } + + if (vector == 0) { + vector = priv->eq_table.last_comp_eq % + priv->eq_table.num_comp_eqs + 1; + priv->eq_table.last_comp_eq = vector; + } + cq->comp_eq_idx = MLX4_EQ_COMP_CPU0 + vector - 1; + cq_context->comp_eqn = priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + + vector - 1].eqn; cq_context->log_page_size = mtt->page_shift - MLX4_ICM_PAGE_SHIFT; mtt_addr = mlx4_mtt_addr(dev, mtt); @@ -276,7 +289,7 @@ void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq) if (err) mlx4_warn(dev, "HW2SW_CQ failed (%d) for CQN %06x\n", err, cq->cqn); - synchronize_irq(priv->eq_table.eq[MLX4_EQ_COMP].irq); + synchronize_irq(priv->eq_table.eq[cq->comp_eq_idx].irq); spin_lock_irq(&cq_table->lock); radix_tree_delete(&cq_table->tree, cq->cqn); diff --git a/drivers/net/mlx4/eq.c b/drivers/net/mlx4/eq.c index e141a15..b4676db 100644 --- a/drivers/net/mlx4/eq.c +++ b/drivers/net/mlx4/eq.c @@ -265,7 +265,7 @@ static irqreturn_t mlx4_interrupt(int irq, void *dev_ptr) writel(priv->eq_table.clr_mask, priv->eq_table.clr_int); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) work |= mlx4_eq_int(dev, &priv->eq_table.eq[i]); return IRQ_RETVAL(work); @@ -482,7 +482,7 @@ static void mlx4_free_irqs(struct mlx4_dev *dev) if (eq_table->have_irq) free_irq(dev->pdev->irq, dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + eq_table->num_comp_eqs; ++i) if (eq_table->eq[i].have_irq) free_irq(eq_table->eq[i].irq, eq_table->eq + i); } @@ -553,6 +553,7 @@ void mlx4_unmap_eq_icm(struct mlx4_dev 
*dev) int mlx4_init_eq_table(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); + int req_eqs; int err; int i; @@ -573,11 +574,22 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) priv->eq_table.clr_int = priv->clr_base + (priv->eq_table.inta_pin < 32 ? 4 : 0); - err = mlx4_create_eq(dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, - (dev->flags & MLX4_FLAG_MSI_X) ? MLX4_EQ_COMP : 0, - &priv->eq_table.eq[MLX4_EQ_COMP]); - if (err) - goto err_out_unmap; + priv->eq_table.num_comp_eqs = 0; + req_eqs = (dev->flags & MLX4_FLAG_MSI_X) ? num_online_cpus() : 1; + while (req_eqs) { + err = mlx4_create_eq( + dev, dev->caps.num_cqs + MLX4_NUM_SPARE_EQE, + (dev->flags & MLX4_FLAG_MSI_X) ? + (MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs) : 0, + &priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + + priv->eq_table.num_comp_eqs]); + if (err) + goto err_out_comp; + + priv->eq_table.num_comp_eqs++; + req_eqs--; + } + priv->eq_table.last_comp_eq = 0; err = mlx4_create_eq(dev, MLX4_NUM_ASYNC_EQE + MLX4_NUM_SPARE_EQE, (dev->flags & MLX4_FLAG_MSI_X) ? 
MLX4_EQ_ASYNC : 0, @@ -587,11 +599,12 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) if (dev->flags & MLX4_FLAG_MSI_X) { static const char *eq_name[] = { - [MLX4_EQ_COMP] = DRV_NAME " (comp)", + [MLX4_EQ_COMP_CPU0...MLX4_NUM_EQ] = "comp_" DRV_NAME, [MLX4_EQ_ASYNC] = DRV_NAME " (async)" }; - for (i = 0; i < MLX4_NUM_EQ; ++i) { + for (i = 0; i < MLX4_EQ_COMP_CPU0 + + priv->eq_table.num_comp_eqs; ++i) { err = request_irq(priv->eq_table.eq[i].irq, mlx4_msi_x_interrupt, 0, eq_name[i], priv->eq_table.eq + i); @@ -616,7 +629,7 @@ int mlx4_init_eq_table(struct mlx4_dev *dev) mlx4_warn(dev, "MAP_EQ for async EQ %d failed (%d)\n", priv->eq_table.eq[MLX4_EQ_ASYNC].eqn, err); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) eq_set_ci(&priv->eq_table.eq[i], 1); return 0; @@ -625,9 +638,9 @@ err_out_async: mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_ASYNC]); err_out_comp: - mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP]); + for (i = 0; i < priv->eq_table.num_comp_eqs; ++i) + mlx4_free_eq(dev, &priv->eq_table.eq[MLX4_EQ_COMP_CPU0 + i]); -err_out_unmap: mlx4_unmap_clr_int(dev); mlx4_free_irqs(dev); @@ -646,7 +659,7 @@ void mlx4_cleanup_eq_table(struct mlx4_dev *dev) mlx4_free_irqs(dev); - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < MLX4_EQ_COMP_CPU0 + priv->eq_table.num_comp_eqs; ++i) mlx4_free_eq(dev, &priv->eq_table.eq[i]); mlx4_unmap_clr_int(dev); diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index e3fd4e9..aecb1f2 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -922,22 +922,24 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev) { struct mlx4_priv *priv = mlx4_priv(dev); struct msix_entry entries[MLX4_NUM_EQ]; + int needed_vectors = MLX4_EQ_COMP_CPU0 + num_online_cpus(); int err; int i; if (msi_x) { - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) entries[i].entry = i; - err = pci_enable_msix(dev->pdev, entries, ARRAY_SIZE(entries)); + err 
= pci_enable_msix(dev->pdev, entries, needed_vectors); if (err) { if (err > 0) - mlx4_info(dev, "Only %d MSI-X vectors available, " - "not using MSI-X\n", err); + mlx4_info(dev, "Only %d MSI-X vectors " + "available, need %d. Not using MSI-X\n", + err, needed_vectors); goto no_msi; } - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) priv->eq_table.eq[i].irq = entries[i].vector; dev->flags |= MLX4_FLAG_MSI_X; @@ -945,7 +947,7 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev) } no_msi: - for (i = 0; i < MLX4_NUM_EQ; ++i) + for (i = 0; i < needed_vectors; ++i) priv->eq_table.eq[i].irq = dev->pdev->irq; } diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index eff1c5a..2201a99 100644 --- a/drivers/net/mlx4/mlx4.h +++ b/drivers/net/mlx4/mlx4.h @@ -64,8 +64,8 @@ enum { enum { MLX4_EQ_ASYNC, - MLX4_EQ_COMP, - MLX4_NUM_EQ + MLX4_EQ_COMP_CPU0, + MLX4_NUM_EQ = MLX4_EQ_COMP_CPU0 + NR_CPUS }; enum { @@ -211,6 +211,8 @@ struct mlx4_eq_table { void __iomem *uar_map[(MLX4_NUM_EQ + 6) / 4]; u32 clr_mask; struct mlx4_eq eq[MLX4_NUM_EQ]; + int num_comp_eqs; + int last_comp_eq; u64 icm_virt; struct page *icm_page; dma_addr_t icm_dma; diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 93c17aa..673462c 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -312,6 +312,7 @@ struct mlx4_cq { int arm_sn; int cqn; + int comp_eq_idx; atomic_t refcount; struct completion free; @@ -441,7 +442,7 @@ void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt, struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq, - int collapsed); + unsigned vector, int collapsed); void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq); int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base); -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 08:09:10 2008 From: yevgenyp at mellanox.co.il (Yevgeny 
Petrilin) Date: Wed, 23 Apr 2008 18:09:10 +0300 Subject: [ofa-general][PATCH 11/12 v1] mlx4: Fiber Channel support Message-ID: <480F5116.4040809@mellanox.co.il> >From ab14366d6cbf590c6a6a6a4d16e86a0d120facc6 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Wed, 23 Apr 2008 15:19:16 +0300 Subject: [PATCH] mlx4: Fiber Channel support As we did with QPs, some of the MPTs are pre-reserved (the MPTs that are mapped for FEXCHs, 2*64K of them). So we needed to split the operation of allocating an MPT in two: the allocation of a bit from the bitmap, and the actual creation of the entry (and its MTT). mr_alloc() is the part that allocates a number from the bitmap; mr_alloc_reserved() is the second part, where you already know which MPT number was allocated. Normal users keep using the original mr_alloc(). For FEXCH, when we know the pre-reserved MPT entry, we call mr_alloc_reserved() directly. The same goes for mr_free() and the corresponding mr_free_reserved(): the first just puts the bit back, the latter actually destroys the entry but leaves the bit set. map_phys_fmr_fbo() is very much like the original map_phys_fmr(), except that it:
- allows setting an FBO (First Byte Offset) for the MPT
- allows setting the data length for the MPT
- does not increase the higher bits of the key after every map.
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/main.c | 2 +- drivers/net/mlx4/mr.c | 131 +++++++++++++++++++++++++++++++++++++------ include/linux/mlx4/device.h | 18 ++++++ include/linux/mlx4/qp.h | 11 +++- 4 files changed, 142 insertions(+), 20 deletions(-) diff --git a/drivers/net/mlx4/main.c b/drivers/net/mlx4/main.c index aecb1f2..93a4e4b 100644 --- a/drivers/net/mlx4/main.c +++ b/drivers/net/mlx4/main.c @@ -81,7 +81,7 @@ static struct mlx4_profile default_profile = { .rdmarc_per_qp = 1 << 4, .num_cq = 1 << 16, .num_mcg = 1 << 13, - .num_mpt = 1 << 17, + .num_mpt = 1 << 18, .num_mtt = 1 << 20, }; diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c index 79b317b..ae376ae 100644 --- a/drivers/net/mlx4/mr.c +++ b/drivers/net/mlx4/mr.c @@ -52,7 +52,9 @@ struct mlx4_mpt_entry { __be64 length; __be32 lkey; __be32 win_cnt; - u8 reserved1[3]; + u8 reserved1; + u8 flags2; + u8 reserved2; u8 mtt_rep; __be64 mtt_seg; __be32 mtt_sz; @@ -68,6 +70,8 @@ struct mlx4_mpt_entry { #define MLX4_MTT_FLAG_PRESENT 1 +#define MLX4_MPT_FLAG2_FBO_EN (1 << 7) + #define MLX4_MPT_STATUS_SW 0xF0 #define MLX4_MPT_STATUS_HW 0x00 @@ -122,7 +126,7 @@ static void mlx4_buddy_free(struct mlx4_buddy *buddy, u32 seg, int order) spin_unlock(&buddy->lock); } -static int mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order) +static int __devinit mlx4_buddy_init(struct mlx4_buddy *buddy, int max_order) { int i, s; @@ -250,6 +254,21 @@ static int mlx4_HW2SW_MPT(struct mlx4_dev *dev, struct mlx4_cmd_mailbox *mailbox !mailbox, MLX4_CMD_HW2SW_MPT, MLX4_CMD_TIME_CLASS_B); } +int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, + u64 iova, u64 size, u32 access, int npages, + int page_shift, struct mlx4_mr *mr) +{ + mr->iova = iova; + mr->size = size; + mr->pd = pd; + mr->access = access; + mr->enabled = 0; + mr->key = hw_index_to_key(mridx); + + return mlx4_mtt_init(dev, npages, page_shift, &mr->mtt); +} +EXPORT_SYMBOL_GPL(mlx4_mr_alloc_reserved); + int mlx4_mr_alloc(struct 
mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, int npages, int page_shift, struct mlx4_mr *mr) { @@ -261,14 +280,8 @@ int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, if (index == -1) return -ENOMEM; - mr->iova = iova; - mr->size = size; - mr->pd = pd; - mr->access = access; - mr->enabled = 0; - mr->key = hw_index_to_key(index); - - err = mlx4_mtt_init(dev, npages, page_shift, &mr->mtt); + err = mlx4_mr_alloc_reserved(dev, index, pd, iova, size, + access, npages, page_shift, mr); if (err) mlx4_bitmap_free(&priv->mr_table.mpt_bitmap, index); @@ -276,9 +289,8 @@ int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, } EXPORT_SYMBOL_GPL(mlx4_mr_alloc); -void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr) +void mlx4_mr_free_reserved(struct mlx4_dev *dev, struct mlx4_mr *mr) { - struct mlx4_priv *priv = mlx4_priv(dev); int err; if (mr->enabled) { @@ -290,6 +302,13 @@ void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr) } mlx4_mtt_cleanup(dev, &mr->mtt); +} +EXPORT_SYMBOL_GPL(mlx4_mr_free_reserved); + +void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + mlx4_mr_free_reserved(dev, mr); mlx4_bitmap_free(&priv->mr_table.mpt_bitmap, key_to_hw_index(mr->key)); } EXPORT_SYMBOL_GPL(mlx4_mr_free); @@ -435,8 +454,15 @@ int mlx4_init_mr_table(struct mlx4_dev *dev) struct mlx4_mr_table *mr_table = &mlx4_priv(dev)->mr_table; int err; - err = mlx4_bitmap_init(&mr_table->mpt_bitmap, dev->caps.num_mpts, - ~0, dev->caps.reserved_mrws); + if (!is_power_of_2(dev->caps.num_mpts)) + return -EINVAL; + + dev->caps.reserved_fexch_mpts_base = dev->caps.num_mpts - + (2 * dev->caps.reserved_qps_cnt[MLX4_QP_REGION_FC_EXCH]); + err = mlx4_bitmap_init_with_effective_max(&mr_table->mpt_bitmap, + dev->caps.num_mpts, + ~0, dev->caps.reserved_mrws, + dev->caps.reserved_fexch_mpts_base); if (err) return err; @@ -500,8 +526,9 @@ static inline int mlx4_check_fmr(struct 
mlx4_fmr *fmr, u64 *page_list, return 0; } -int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, - int npages, u64 iova, u32 *lkey, u32 *rkey) +int mlx4_map_phys_fmr_fbo(struct mlx4_dev *dev, struct mlx4_fmr *fmr, + u64 *page_list, int npages, u64 iova, u32 fbo, + u32 len, u32 *lkey, u32 *rkey, int same_key) { u32 key; int i, err; @@ -513,7 +540,8 @@ int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list ++fmr->maps; key = key_to_hw_index(fmr->mr.key); - key += dev->caps.num_mpts; + if (same_key) + key += dev->caps.num_mpts; *lkey = *rkey = fmr->mr.key = hw_index_to_key(key); *(u8 *) fmr->mpt = MLX4_MPT_STATUS_SW; @@ -529,8 +557,10 @@ int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list fmr->mpt->key = cpu_to_be32(key); fmr->mpt->lkey = cpu_to_be32(key); - fmr->mpt->length = cpu_to_be64(npages * (1ull << fmr->page_shift)); + fmr->mpt->length = cpu_to_be64(len); fmr->mpt->start = cpu_to_be64(iova); + fmr->mpt->first_byte_offset = cpu_to_be32(fbo & 0x001fffff); + fmr->mpt->flags2 = (fbo ? 
MLX4_MPT_FLAG2_FBO_EN : 0); /* Make MTT entries are visible before setting MPT status */ wmb(); @@ -542,6 +572,16 @@ int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list return 0; } +EXPORT_SYMBOL_GPL(mlx4_map_phys_fmr_fbo); + +int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, + int npages, u64 iova, u32 *lkey, u32 *rkey) +{ + u32 len = npages * (1ull << fmr->page_shift); + + return mlx4_map_phys_fmr_fbo(dev, fmr, page_list, npages, iova, 0, + len, lkey, rkey, 1); +} EXPORT_SYMBOL_GPL(mlx4_map_phys_fmr); int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages, @@ -586,6 +626,49 @@ err_free: } EXPORT_SYMBOL_GPL(mlx4_fmr_alloc); +int mlx4_fmr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, + u32 pd, u32 access, int max_pages, + int max_maps, u8 page_shift, struct mlx4_fmr *fmr) +{ + struct mlx4_priv *priv = mlx4_priv(dev); + u64 mtt_seg; + int err = -ENOMEM; + + if (page_shift < 12 || page_shift >= 32) + return -EINVAL; + + /* All MTTs must fit in the same page */ + if (max_pages * sizeof *fmr->mtts > PAGE_SIZE) + return -EINVAL; + + fmr->page_shift = page_shift; + fmr->max_pages = max_pages; + fmr->max_maps = max_maps; + fmr->maps = 0; + + err = mlx4_mr_alloc_reserved(dev, mridx, pd, 0, 0, access, max_pages, + page_shift, &fmr->mr); + if (err) + return err; + + mtt_seg = fmr->mr.mtt.first_seg * dev->caps.mtt_entry_sz; + + fmr->mtts = mlx4_table_find(&priv->mr_table.mtt_table, + fmr->mr.mtt.first_seg, + &fmr->dma_handle); + if (!fmr->mtts) { + err = -ENOMEM; + goto err_free; + } + + return 0; + +err_free: + mlx4_mr_free_reserved(dev, &fmr->mr); + return err; +} +EXPORT_SYMBOL_GPL(mlx4_fmr_alloc_reserved); + int mlx4_fmr_enable(struct mlx4_dev *dev, struct mlx4_fmr *fmr) { struct mlx4_priv *priv = mlx4_priv(dev); @@ -634,6 +717,18 @@ int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr) } EXPORT_SYMBOL_GPL(mlx4_fmr_free); +int mlx4_fmr_free_reserved(struct mlx4_dev *dev, struct 
mlx4_fmr *fmr) +{ + if (fmr->maps) + return -EBUSY; + + fmr->mr.enabled = 0; + mlx4_mr_free_reserved(dev, &fmr->mr); + + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_fmr_free_reserved); + int mlx4_SYNC_TPT(struct mlx4_dev *dev) { return mlx4_cmd(dev, 0, 0, 0, MLX4_CMD_SYNC_TPT, 1000); diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index 673462c..e417673 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -215,6 +215,7 @@ struct mlx4_caps { int log_num_vlans; int log_num_prios; enum mlx4_port_type port_type[MLX4_MAX_PORTS + 1]; + int reserved_fexch_mpts_base; }; struct mlx4_buf_list { @@ -400,6 +401,12 @@ static inline u32 mlx4_get_ports_of_type(struct mlx4_dev *dev, for ((port) = 1; (port) <= MLX4_MAX_PORTS; ++(port)) \ if (bitmap & 1 << ((port)-1)) + +static inline int mlx4_get_fexch_mpts_base(struct mlx4_dev *dev) +{ + return dev->caps.reserved_fexch_mpts_base; +} + int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct, struct mlx4_buf *buf); void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf); @@ -423,8 +430,12 @@ int mlx4_mtt_init(struct mlx4_dev *dev, int npages, int page_shift, void mlx4_mtt_cleanup(struct mlx4_dev *dev, struct mlx4_mtt *mtt); u64 mlx4_mtt_addr(struct mlx4_dev *dev, struct mlx4_mtt *mtt); +int mlx4_mr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, + u64 iova, u64 size, u32 access, int npages, + int page_shift, struct mlx4_mr *mr); int mlx4_mr_alloc(struct mlx4_dev *dev, u32 pd, u64 iova, u64 size, u32 access, int npages, int page_shift, struct mlx4_mr *mr); +void mlx4_mr_free_reserved(struct mlx4_dev *dev, struct mlx4_mr *mr); void mlx4_mr_free(struct mlx4_dev *dev, struct mlx4_mr *mr); int mlx4_mr_enable(struct mlx4_dev *dev, struct mlx4_mr *mr); int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, @@ -469,13 +480,20 @@ void mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, int index); int mlx4_register_vlan(struct mlx4_dev *dev, u8 port, u16 vlan, 
int *index); void mlx4_unregister_vlan(struct mlx4_dev *dev, u8 port, int index); +int mlx4_map_phys_fmr_fbo(struct mlx4_dev *dev, struct mlx4_fmr *fmr, + u64 *page_list, int npages, u64 iova, u32 fbo, + u32 len, u32 *lkey, u32 *rkey, int same_key); int mlx4_map_phys_fmr(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u64 *page_list, int npages, u64 iova, u32 *lkey, u32 *rkey); +int mlx4_fmr_alloc_reserved(struct mlx4_dev *dev, u32 mridx, u32 pd, + u32 access, int max_pages, int max_maps, + u8 page_shift, struct mlx4_fmr *fmr); int mlx4_fmr_alloc(struct mlx4_dev *dev, u32 pd, u32 access, int max_pages, int max_maps, u8 page_shift, struct mlx4_fmr *fmr); int mlx4_fmr_enable(struct mlx4_dev *dev, struct mlx4_fmr *fmr); void mlx4_fmr_unmap(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u32 *lkey, u32 *rkey); +int mlx4_fmr_free_reserved(struct mlx4_dev *dev, struct mlx4_fmr *fmr); int mlx4_fmr_free(struct mlx4_dev *dev, struct mlx4_fmr *fmr); int mlx4_SYNC_TPT(struct mlx4_dev *dev); diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h index a5e43fe..d7c0227 100644 --- a/include/linux/mlx4/qp.h +++ b/include/linux/mlx4/qp.h @@ -151,7 +151,16 @@ struct mlx4_qp_context { u8 reserved4[2]; u8 mtt_base_addr_h; __be32 mtt_base_addr_l; - u32 reserved5[10]; + u8 VE; + u8 reserved5; + __be16 VFT_id_prio; + u8 reserved6; + u8 exch_size; + __be16 exch_base; + u8 VFT_hop_cnt; + u8 my_fc_id_idx; + __be16 reserved7; + u32 reserved8[7]; }; /* Which firmware version adds support for NEC (NoErrorCompletion) bit */ -- 1.5.4 From yevgenyp at mellanox.co.il Wed Apr 23 08:11:25 2008 From: yevgenyp at mellanox.co.il (Yevgeny Petrilin) Date: Wed, 23 Apr 2008 18:11:25 +0300 Subject: [ofa-general][PATCH 12/12 v1] mlx4: QP to ready Message-ID: <480F519D.6060101@mellanox.co.il> >From eda80652876695342a68fd2e47d45d1c57d8b511 Mon Sep 17 00:00:00 2001 From: Yevgeny Petrilin Date: Wed, 23 Apr 2008 16:20:42 +0300 Subject: [PATCH] mlx4: Qp to ready Added API to bring a QP from Reset to RTS state. 
Signed-off-by: Yevgeny Petrilin --- drivers/net/mlx4/qp.c | 30 ++++++++++++++++++++++++++++++ include/linux/mlx4/qp.h | 4 ++++ 2 files changed, 34 insertions(+), 0 deletions(-) diff --git a/drivers/net/mlx4/qp.c b/drivers/net/mlx4/qp.c index 2d5be15..a6ed9ca 100644 --- a/drivers/net/mlx4/qp.c +++ b/drivers/net/mlx4/qp.c @@ -366,3 +366,33 @@ int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, } EXPORT_SYMBOL_GPL(mlx4_qp_query); +int mlx4_qp_to_ready(struct mlx4_dev *dev, struct mlx4_mtt *mtt, + struct mlx4_qp_context *context, + struct mlx4_qp *qp, enum mlx4_qp_state *qp_state) +{ +#define CLEAR_STATE_MASK 0xfffffff + int err = 0; + int i; + enum mlx4_qp_state states[] = { + MLX4_QP_STATE_RST, + MLX4_QP_STATE_INIT, + MLX4_QP_STATE_RTR, + MLX4_QP_STATE_RTS + }; + + for (i = 0; i < ARRAY_SIZE(states) - 1; i++) { + context->flags &= cpu_to_be32(CLEAR_STATE_MASK); + context->flags |= cpu_to_be32(states[i+1] << 28); + err = mlx4_qp_modify(dev, mtt, states[i], + states[i+1], context, 0, 0, qp); + if (err) { + mlx4_err(dev, "Failed to bring qp to state:" + "%d with error: %d\n", + states[i+1], err); + return err; + } + *qp_state = states[i+1]; + } + return 0; +} +EXPORT_SYMBOL_GPL(mlx4_qp_to_ready); diff --git a/include/linux/mlx4/qp.h b/include/linux/mlx4/qp.h index d7c0227..96b0e1b 100644 --- a/include/linux/mlx4/qp.h +++ b/include/linux/mlx4/qp.h @@ -305,6 +305,10 @@ int mlx4_qp_modify(struct mlx4_dev *dev, struct mlx4_mtt *mtt, int mlx4_qp_query(struct mlx4_dev *dev, struct mlx4_qp *qp, struct mlx4_qp_context *context); +int mlx4_qp_to_ready(struct mlx4_dev *dev, struct mlx4_mtt *mtt, + struct mlx4_qp_context *context, + struct mlx4_qp *qp, enum mlx4_qp_state *qp_state); + static inline struct mlx4_qp *__mlx4_qp_lookup(struct mlx4_dev *dev, u32 qpn) { return radix_tree_lookup(&dev->qp_table_tree, qpn & (dev->caps.num_qps - 1)); -- 1.5.4 From holt at sgi.com Wed Apr 23 08:45:36 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 23 Apr 2008 10:45:36 -0500 
Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: <20080423134427.GW24536@duo.random> References: <20080422224048.GR24536@duo.random> <20080423134427.GW24536@duo.random> Message-ID: <20080423154536.GV30298@sgi.com>

On Wed, Apr 23, 2008 at 03:44:27PM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 22, 2008 at 04:14:26PM -0700, Christoph Lameter wrote:
> > We want a full solution and this kind of patching makes the patches
> > difficult to review because later patches revert earlier ones.
> I know you would rather see KVM development stalled for more months
> than get a partial solution now that already covers KVM and GRU
> with the same API that XPMEM will also use later. It's very unfair on
> your side to try to stall other people's development if what you
> need has stronger requirements and can't be merged immediately. This
> is especially true given it was publicly stated that XPMEM never
> passed all regression tests anyway, so you can't possibly be in such

XPMEM has passed all regression tests using your version 12 notifiers. I have a bug in xpmem which shows up on our 8x oversubscription tests, but that is clearly my bug to figure out. Unfortunately it only shows up on a 128-processor machine, so I have 1024 stack traces to sort through each time it fails. It does take a bit of time and a lot of concentration.

> a hurry like we are, we can't progress without this. In fact we can

SGI is under an equally strict timeline. We really needed the sleeping version in 2.6.26. We may still be able to get this accepted by vendor distros if we make 2.6.27.
Thanks, Robin From andrea at qumranet.com Wed Apr 23 08:59:40 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 17:59:40 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423144747.GU30298@sgi.com> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423133619.GV24536@duo.random> <20080423144747.GU30298@sgi.com> Message-ID: <20080423155940.GY24536@duo.random>

On Wed, Apr 23, 2008 at 09:47:47AM -0500, Robin Holt wrote:
> It also makes the API consistent. What you are proposing is equivalent
> to having a file you can open but never close.

That's not entirely true: you can close the file just fine by killing the tasks, leading to an mmput. From a user perspective in KVM terms, it won't make a difference, as /dev/kvm will remain open and it'll pin the module count until the kvm task is killed anyway; I assume for GRU it's similar. Until I had the idea of how to implement an mm_lock to ensure mmu_notifier_register could not miss a running invalidate_range_begin, it wasn't even possible to implement an mmu_notifier_unregister (see EMM patches), and it looked like you were OK with that API that missed _unregister...

> This whole discussion seems ludicrous. You could refactor the code to get
> the sorted list of locks, pass that list into mm_lock to do the locking,
> do the register/unregister, then pass the same list into mm_unlock.

Correct, but it will keep the vmalloc RAM pinned during the runtime. There's no reason to keep that RAM allocated per-VM while the VM runs. We only need it during startup and teardown.

> If the allocation fails, you could fall back to the older slower method
> of repeatedly scanning the lists and acquiring locks in ascending order.

Correct, I had already thought about that. This is exactly why I'm deferring this for later!
Otherwise this perfectionism, which isn't needed for KVM/GRU, will keep indefinitely delaying the part that has already converged and that is enough for KVM and GRU (and for this specific bit, actually enough for XPMEM as well). We can make a second version, mm_lock_slow, to use if mm_lock fails in mmu_notifier_unregister, with N^2 complexity, later, after the mmu-notifier-core is merged into mainline.

> If you are not going to provide the _unregister callout you need to change
> the API so I can scan the list of notifiers to see if my structures are
> already registered.

As I said, 1/N isn't enough for XPMEM anyway. 1/N has to include only the absolute minimum, zero-risk stuff that is enough for both KVM and GRU.

> We register our notifier structure at device open time. If we receive a
> _release callout, we mark our structure as unregistered. At device close
> time, if we have not been unregistered, we call _unregister. If you
> take away _unregister, I have an xpmem kernel structure in use _AFTER_
> the device is closed with no indication that the process is using it.
> In that case, I need to get an extra reference to the module in my device
> open method and hold that reference until the _release callout.

Yes, exactly, but you have to do that anyway if mmu_notifier_unregister fails because some driver already allocated all vmalloc space (even x86-64 doesn't have an unlimited amount of vmalloc space, because vmalloc sits at the end of the address space), unless we have an N^2 fallback. But the N^2 fallback will make the code more easily DoS-able and unkillable, so if I were an admin I'd prefer having to quickly kill -9 a task in O(N) over having to wait for some syscall that runs in O(N^2) to complete before the task quits. So the fallback to a slower algorithm isn't necessarily what will really happen after 2.6.26 is released; we'll see. Relying on ->release for the module unpin sounds preferable, and it's certainly the only reliable way to unregister that we'll provide in 2.6.26.
> Additionally, if the user's program reopens the device, I need to scan the
> mmu_notifiers list to see if this task's notifier is already registered.

But you don't need to browse the list for this: keep a flag in your structure after the mmu_notifier struct, set the bitflag after mmu_notifier_register returns, and clear the bitflag after ->release runs or after mmu_notifier_unregister returns success. What's the big deal in tracking whether you have to call mmu_notifier_register a second time or not? Or you can create a new structure every time somebody asks to reattach.

> I view _unregister as essential. Did I miss something?

We can add it later, and we can keep discussing what the best model to implement it is, for as long as you want, after 2.6.26 is released with mmu-notifier-core, so GRU/KVM are done. It's unlikely KVM will use mmu_notifier_unregister anyway, as we need it attached for the whole lifetime of the task, and only for the lifetime of the task. This is the patch to add it; as you can see, it's entirely orthogonal, backwards compatible with the previous API, and it doesn't duplicate or rewrite any code. Don't worry, any kernel after 2.6.26 will have unregister, but we can't focus on this for 2.6.26. We can also consider making mmu_notifier_register safe against double calls on the same structure, but again that's not something we should be doing in 1/N, and it can be done later in a backwards-compatible way (plus we're perfectly fine with the API having non-backwards-compatible changes as long as 2.6.26 can work for us). --------------------------------- Implement unregister, but it's not reliable; only ->release is reliable.
Signed-off-by: Andrea Arcangeli

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -119,6 +119,8 @@
 extern int mmu_notifier_register(struct mmu_notifier *mn,
				 struct mm_struct *mm);
+extern int mmu_notifier_unregister(struct mmu_notifier *mn,
+				   struct mm_struct *mm);
 extern void __mmu_notifier_release(struct mm_struct *mm);
 extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
					    unsigned long address);
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -106,3 +106,29 @@
 	return ret;
 }
 EXPORT_SYMBOL_GPL(mmu_notifier_register);
+
+/*
+ * mm_users can't go down to zero while mmu_notifier_unregister()
+ * runs or it can race with ->release. So a mm_users pin must
+ * be taken by the caller (if mm can be different from current->mm).
+ *
+ * This function can fail (for example during out of memory conditions
+ * or after vmalloc virtual range shortage), so the only reliable way
+ * to unregister is to wait release() to be called.
+ */
+int mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct mm_lock_data data;
+	int ret;
+
+	BUG_ON(!atomic_read(&mm->mm_users));
+
+	ret = mm_lock(mm, &data);
+	if (unlikely(ret))
+		goto out;
+	hlist_del(&mn->hlist);
+	mm_unlock(mm, &data);
+out:
+	return ret;
+}
+EXPORT_SYMBOL_GPL(mmu_notifier_unregister);

From michael.heinz at qlogic.com Wed Apr 23 09:08:28 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Wed, 23 Apr 2008 11:08:28 -0500 Subject: [ofa-general] Suggested patches to OFED RPM spec files Message-ID:

Installation of OFED 1.3.0.0.4 onto a Kusu/OCS cluster does not fully succeed because of some missing dependencies in the RPM spec files. This is because Kusu installs nodes over a network by presenting a pool of RPMs to be installed and letting RPM figure out the order to install them in.
Without the dependencies, we ended up with oddities like the kernel drivers being installed before the /usr/bin directory had been populated, causing the install script to fail. I was able to work around this by manually expanding some of the source RPM files, altering the spec file, and repackaging the source RPM. This allowed me to build binary RPMs (via the install script) that could be installed on a Kusu cluster.

Here are the proposed changes. If there is a better/preferred way of submitting this suggestion, please let me know.

--- ../../original/ib-bonding.spec	2008-04-22 12:54:12.000000000 -0400
+++ ib-bonding.spec	2008-04-22 12:43:07.000000000 -0400
@@ -20,6 +20,7 @@
 Group    : Applications/System
 License  : GPL
 BuildRoot: %{_tmppath}/%{name}-%{version}-root
+PreReq   : coreutils

 %description
 This package provides a bonding device which is capable of enslaving

--- ../../original/ofa_kernel.spec	2008-04-22 12:54:13.000000000 -0400
+++ ofa_kernel.spec	2008-04-22 12:45:40.000000000 -0400
@@ -111,6 +111,9 @@
 BuildRequires: sysfsutils-devel

 %package -n kernel-ib
+PreReq: coreutils
+PreReq: kernel
+PreReq: pciutils
 Version: %{_version}
 Release: %{krelver}
 Summary: Infiniband Driver and ULPs kernel modules
@@ -119,6 +122,10 @@
 Core, HW and ULPs kernel modules

 %package -n kernel-ib-devel
+PreReq: coreutils
+PreReq: kernel
+PreReq: pciutils
+Requires: kernel-ib
 Version: %{_version}
 Release: %{krelver}
 Summary: Infiniband Driver and ULPs kernel modules sources

--- ../../original/open-iscsi-generic.spec	2008-04-22 12:54:13.000000000 -0400
+++ open-iscsi-generic.spec	2008-04-22 12:42:33.000000000 -0400
@@ -21,6 +21,7 @@
 %define kversion $(uname -r | sed "s/-ppc64\|-smp//")

 %package -n iscsi-initiator-utils
+PreReq: coreutils
 Summary : iSCSI daemon and utility programs
 Group : System Environment/Daemons
 %description -n iscsi-initiator-utils
@@ -30,6 +31,7 @@
 Protocol networks.

 %package -n open-iscsi
+PreReq: coreutils
 Summary : Linux* Open-iSCSI Software Initiator
 Group : Productivity/Networking/Other
 %description -n open-iscsi

--
Michael Heinz
Principal Engineer, Qlogic Corporation
King of Prussia, Pennsylvania

From andrea at qumranet.com Wed Apr 23 09:15:45 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 18:15:45 +0200 Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: <20080423154536.GV30298@sgi.com> References: <20080422224048.GR24536@duo.random> <20080423134427.GW24536@duo.random> <20080423154536.GV30298@sgi.com> Message-ID: <20080423161544.GZ24536@duo.random>

On Wed, Apr 23, 2008 at 10:45:36AM -0500, Robin Holt wrote:
> XPMEM has passed all regression tests using your version 12 notifiers.

That's great news, thanks! I'd greatly appreciate it if you could test #v13 too, as I posted it. It has already passed the GRU and KVM regression tests and it should work fine for XPMEM too. You can ignore the purely cosmetic error I managed to introduce in mm_lock_cmp (I implemented a BUG_ON that would have triggered if it weren't a purely cosmetic issue, and it clearly doesn't trigger, so you can be sure it's only cosmetic ;). Once I get confirmation that everyone is OK with #v13, I'll push a #v14 before Saturday with that cosmetic error cleaned up and mmu_notifier_unregister moved to the end (XPMEM will have unregister, don't worry). I expect 1/13 of #v14 to go into -mm and then 2.6.26.

> I have a bug in xpmem which shows up on our 8x oversubscription tests,
> but that is clearly my bug to figure out. Unfortunately it only shows

This is what I meant.
We, in contrast, don't have any known bug left in this area; in fact, we need mmu notifiers to _fix_ issues I identified that can't be fixed efficiently without them, and we need the mmu notifiers to go productive ASAP.

> up on a 128 processor machine so I have 1024 stack traces to sort
> through each time it fails. Does take a bit of time and a lot of
> concentration.

Sure, hope you find it soon!

> SGI is under an equally strict timeline. We really needed the sleeping
> version into 2.6.26. We may still be able to get this accepted by
> vendor distros if we make 2.6.27.

I don't think vendor distros are any less likely to take patches 2-12 if 1/N (aka mmu-notifier-core) is merged in 2.6.26, especially in light of kABI.

From andrea at qumranet.com Wed Apr 23 09:26:29 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 18:26:29 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080422223545.GP24536@duo.random> Message-ID: <20080423162629.GB24536@duo.random>

On Tue, Apr 22, 2008 at 04:20:35PM -0700, Christoph Lameter wrote:
> I guess I have to prepare another patchset then?

If you want to embarrass yourself three times in a row, go ahead ;). I thought two failed takeovers were enough.
From michaelc at cs.wisc.edu Wed Apr 23 09:33:49 2008 From: michaelc at cs.wisc.edu (Mike Christie) Date: Wed, 23 Apr 2008 11:33:49 -0500 Subject: [ofa-general] Re: [PATCH 1/3] iscsi iser: remove DMA restrictions In-Reply-To: <480F3C84.40606@Voltaire.COM> References: <20080212205252.GB13643@osc.edu> <20080212205403.GC13643@osc.edu><1202850645.3137.132.camel@localhost.localdomain><20080212214632.GA14397@osc.edu><1202853468.3137.148.camel@localhost.localdomain><20080213195912.GC7372@osc.edu> <480C9BF8.9050401@Voltaire.COM> <480F3C84.40606@Voltaire.COM> Message-ID: <480F64ED.7010705@cs.wisc.edu> Erez Zilber wrote: > Erez Zilber wrote: >> Pete Wyckoff wrote: >>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 >> 15:57 -0600: >>> >>>> On Tue, 2008-02-12 at 16:46 -0500, Pete Wyckoff wrote: >>>> >>>>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 >> 15:10 -0600: >>>>> >>>>>> On Tue, 2008-02-12 at 15:54 -0500, Pete Wyckoff wrote: >>>>>> >>>>>>> iscsi_iser does not have any hardware DMA restrictions. Add a >>>>>>> slave_configure function to remove any DMA alignment restriction, >>>>>>> allowing the use of direct IO from arbitrary offsets within a page. >>>>>>> Also disable page bouncing; iser has no restrictions on which >> pages it >>>>>>> can address. 
>>>>>>> >>>>>>> Signed-off-by: Pete Wyckoff >>>>>>> --- >>>>>>> drivers/infiniband/ulp/iser/iscsi_iser.c | 8 ++++++++ >>>>>>> 1 files changed, 8 insertions(+), 0 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c >> b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>> index be1b9fb..1b272a6 100644 >>>>>>> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>> @@ -543,6 +543,13 @@ iscsi_iser_ep_disconnect(__u64 ep_handle) >>>>>>> iser_conn_terminate(ib_conn); >>>>>>> } >>>>>>> >>>>>>> +static int iscsi_iser_slave_configure(struct scsi_device *sdev) >>>>>>> +{ >>>>>>> + blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY); >>>>>>> >>>>>> You really don't want to do this. That signals to the block >> layer that >>>>>> we have an iommu, although it's practically the same thing as a >> 64 bit >>>>>> DMA mask ... but I'd just leave it to the DMA mask to set this up >>>>>> correctly. Anything else is asking for a subtle bug to turn up years >>>>>> from now when something causes the mask and the limit to be >> mismatched. >>>>>> >>>>> Oh. I decided to add that line for symmetry with TCP, and was >>>>> convinced by the arguments here: >>>>> >>>>> commit b6d44fe9582b9d90a0b16f508ac08a90d899bf56 >>>>> Author: Mike Christie >>>>> Date: Thu Jul 26 12:46:47 2007 -0500 >>>>> >>>>> [SCSI] iscsi_tcp: Turn off bounce buffers >>>>> >>>>> It was found by LSI that on setups with large amounts of memory >>>>> we were bouncing buffers when we did not need to. If the iscsi tcp >>>>> code touches the data buffer (or a helper does), >>>>> it will kmap the buffer. iscsi_tcp also does not interact with >> hardware, >>>>> so it does not have any hw dma restrictions. This patch sets >> the bounce >>>>> buffer settings for our device queue so buffers should not be >> bounced >>>>> because of a driver limit. 
>>>>> >>>>> I don't see a convenient place to callback into particular iscsi >>>>> devices to set the DMA mask per-host. It has to go on the >>>>> shost_gendev, right?, but only for TCP and iSER, not qla4xxx, which >>>>> handles its DMA mask during device probe. >>>>> >>>> You should be taking your mask from the underlying infiniband device as >>>> part of the setup, shouldn't you? >>>> >>> I think you're right about this. All the existing IB HW tries to >>> set a 64-bit dma mask, but that's no reason to disable the mechanism >>> entirely in iser. I'll remove that line that disables bouncing in >>> my patch. Perhaps Mike will know if the iscsi_tcp usage is still >>> appropriate. >>> >>> >> Let me make sure that I understand: you say that the IB HW driver (e.g. >> ib_mthca) tries to set a 64-bit dma mask: >> >> err = pci_set_dma_mask(pdev, DMA_64BIT_MASK); >> if (err) { >> dev_warn(&pdev->dev, "Warning: couldn't set 64-bit PCI DMA >> mask.\n"); >> err = pci_set_dma_mask(pdev, DMA_32BIT_MASK); >> if (err) { >> dev_err(&pdev->dev, "Can't set PCI DMA mask, aborting.\n"); >> goto err_free_res; >> } >> } >> >> So, in the example above, the driver will use a 64-bit mask or a 32-bit >> mask (or fail). According to that, iSER (and SRP) needs to call >> blk_queue_bounce_limit with the appropriate parameter, right? >> > > Roland, James, > > I'm trying to fix this potential problem in iSER, and I have some > questions about that. How can I get the DMA mask that the HCA driver is > using (DMA_64BIT_MASK or DMA_32BIT_MASK)? Can I get it somehow from > struct ib_device? Is it in ib_device->device? I think what Erez is asking, or maybe it is something I was wondering is, that scsi drivers like lpfc or qla2xxx will do something like: if (dma_set_mask(&scsi_host->pdev->dev, DMA_64BIT_MASK)) dma_set_mask(&scsi_host->pdev->dev, DMA_32BIT_MASK) And when __scsi_alloc_queue calls scsi_calculate_bounce_limit it checks the host's parent dma_mask and sets the bounce_limit for the driver. 
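The alternative being weighed here, computing the bounce limit in the driver itself instead of relying on the scsi_host's parent device, could look roughly like the following. This is a sketch only, using a hypothetical `iser_sdev_to_ib_device()` helper and circa-2008 kernel APIs; it is not actual iscsi_iser code:

```c
/* Sketch only: hypothetical helper, not actual iscsi_iser code. */
static int iscsi_iser_slave_configure(struct scsi_device *sdev)
{
	/* Reach the HCA's struct device, where the PCI driver stored
	 * the DMA mask via pci_set_dma_mask(). */
	struct ib_device *ib_dev = iser_sdev_to_ib_device(sdev);
	u64 mask = *ib_dev->dma_device->dma_mask;

	/* Mirror scsi_calculate_bounce_limit(): a full 64-bit mask
	 * means no page ever needs bouncing; anything narrower keeps
	 * the conservative highmem boundary. */
	blk_queue_bounce_limit(sdev->request_queue,
			       mask == DMA_64BIT_MASK ?
					BLK_BOUNCE_ANY : BLK_BOUNCE_HIGH);
	return 0;
}
```

Setting the scsi_host's parent to the ib_device instead would let the SCSI midlayer derive the same limit automatically, which is the trade-off discussed in the rest of the thread.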
Does srp/iser need to call the dma_set_mask functions, or does the ib_device's device already have the DMA info set up?

> Another question is - after I get the DMA mask data from the HCA driver,
> I guess that I need to call blk_queue_bounce_limit with the appropriate
> parameter (BLK_BOUNCE_HIGH, BLK_BOUNCE_ANY or BLK_BOUNCE_ISA). Which
> value should iSER use according to the DMA mask info? For example, if
> the HCA driver sets DMA_64BIT_MASK, should iSER use
> BLK_BOUNCE_HIGH/BLK_BOUNCE_ANY/BLK_BOUNCE_ISA ?

Have you seen how the scsi layer calls blk_queue_bounce_limit when you have a parent device that is set up?

In the bnx2i branch I modified iser to be more like srp and traditional drivers, because it accesses the ib_device similarly to how other drivers like lpfc or qla2xxx access their parent device for DMA functions, and when the underlying device is removed we now remove the sessions like other hotplug drivers do (we remove sessions from the ib_client remove callout, like srp). In the branch, iser allocates a scsi_host per ib_device, and the scsi_host's parent is the ib_device (..../ib_device/scsi_host/iscsi_session/scsi_target/scsi_device), so if the dma_mask is set right, then the bounce_limit will be set by scsi_calculate_bounce_limit? An alternative could be to keep allocating a host per session, but just call blk_queue_bounce_limit in the scsi_host_template->slave_alloc function?

From andrea at qumranet.com Wed Apr 23 09:37:13 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 18:37:13 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423002848.GA32618@sgi.com> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> Message-ID: <20080423163713.GC24536@duo.random>

On Tue, Apr 22, 2008 at 07:28:49PM -0500, Jack Steiner wrote:
> The GRU driver unregisters the notifier when all GRU mappings
> are unmapped.
> I could make it work either way - either with or without
> an unregister function. However, unregister is the most logical
> action to take when all mappings have been destroyed.

This is true for KVM as well: unregister would be the most logical action to take when the kvm device is closed and the vm is destroyed. However, we can't implement mm_lock in O(N*log(N)) without triggering RAM allocations, and the sizes of those RAM allocations are unknown at the time unregister runs (they also depend on the max_nr_vmas sysctl). So on second thought, not even passing the array from register to unregister would solve it (unless we allocate max_nr_vmas entries up front and block the sysctl from altering max_nr_vmas while not all unregisters have run yet). That's clearly unacceptable.

The only way to avoid failing because of vmalloc space shortage or oom would be to provide an O(N*N) fallback, but one that can't be interrupted by sigkill! Sigkill interruption was OK in #v12 because we didn't rely on mmu_notifier_unregister to succeed; it avoided any DoS, but it still couldn't provide a reliable unregister.

So in the end, unregistering with kill -9 leading to ->release in O(1) sounds like the safer solution for the long term. You can't loop when unregister fails and still pretend your module has no deadlocks. Yes, waiting for ->release adds a bit of complexity, but I think it's worth it, and there haven't been any brilliant ideas yet on how to avoid both the O(N*N) complexity and the allocations in mmu_notifier_unregister. Until such an idea materializes, we'll stick with ->release in O(1) as the only safe unregister, so we guarantee the admin stays in control of his hardware in O(1) with kill -9, no matter whether /dev/kvm and /dev/gru are owned by sillyuser. I'm afraid that if you don't want to worst-case unregister with ->release, you need a better idea than my mm_lock, and personally I can't see any other way than mm_lock to ensure we don't miss a range_begin...
All the above is in 2.6.27 context (for 2.6.26 ->release is the way, even if the genius idea would materialize). From steiner at sgi.com Wed Apr 23 10:09:09 2008 From: steiner at sgi.com (Jack Steiner) Date: Wed, 23 Apr 2008 12:09:09 -0500 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: Message-ID: <20080423170909.GA1459@sgi.com> You may have spotted this already. If so, just ignore this. It looks like there is a bug in copy_page_range() around line 667. It's possible to do a mmu_notifier_invalidate_range_start(), then return -ENOMEM w/o doing a corresponding mmu_notifier_invalidate_range_end(). --- jack From michaelc at cs.wisc.edu Wed Apr 23 10:16:10 2008 From: michaelc at cs.wisc.edu (Mike Christie) Date: Wed, 23 Apr 2008 12:16:10 -0500 Subject: [ofa-general] Re: [PATCH 1/3] iscsi iser: remove DMA restrictions In-Reply-To: <480F64ED.7010705@cs.wisc.edu> References: <20080212205252.GB13643@osc.edu> <20080212205403.GC13643@osc.edu><1202850645.3137.132.camel@localhost.localdomain><20080212214632.GA14397@osc.edu><1202853468.3137.148.camel@localhost.localdomain><20080213195912.GC7372@osc.edu> <480C9BF8.9050401@Voltaire.COM> <480F3C84.40606@Voltaire.COM> <480F64ED.7010705@cs.wisc.edu> Message-ID: <480F6EDA.9050004@cs.wisc.edu> Mike Christie wrote: > Erez Zilber wrote: >> Erez Zilber wrote: >>> Pete Wyckoff wrote: >>>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 >>> 15:57 -0600: >>>> >>>>> On Tue, 2008-02-12 at 16:46 -0500, Pete Wyckoff wrote: >>>>> >>>>>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008 >>> 15:10 -0600: >>>>>> >>>>>>> On Tue, 2008-02-12 at 15:54 -0500, Pete Wyckoff wrote: >>>>>>> >>>>>>>> iscsi_iser does not have any hardware DMA restrictions. Add a >>>>>>>> slave_configure function to remove any DMA alignment restriction, >>>>>>>> allowing the use of direct IO from arbitrary offsets within a page. 
>>>>>>>> Also disable page bouncing; iser has no restrictions on which >>> pages it >>>>>>>> can address. >>>>>>>> >>>>>>>> Signed-off-by: Pete Wyckoff >>>>>>>> --- >>>>>>>> drivers/infiniband/ulp/iser/iscsi_iser.c | 8 ++++++++ >>>>>>>> 1 files changed, 8 insertions(+), 0 deletions(-) >>>>>>>> >>>>>>>> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c >>> b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>>> index be1b9fb..1b272a6 100644 >>>>>>>> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>>> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>>> @@ -543,6 +543,13 @@ iscsi_iser_ep_disconnect(__u64 ep_handle) >>>>>>>> iser_conn_terminate(ib_conn); >>>>>>>> } >>>>>>>> >>>>>>>> +static int iscsi_iser_slave_configure(struct scsi_device *sdev) >>>>>>>> +{ >>>>>>>> + blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY); >>>>>>>> >>>>>>> You really don't want to do this. That signals to the block >>> layer that >>>>>>> we have an iommu, although it's practically the same thing as a >>> 64 bit >>>>>>> DMA mask ... but I'd just leave it to the DMA mask to set this up >>>>>>> correctly. Anything else is asking for a subtle bug to turn up >>>>>>> years >>>>>>> from now when something causes the mask and the limit to be >>> mismatched. >>>>>>> >>>>>> Oh. I decided to add that line for symmetry with TCP, and was >>>>>> convinced by the arguments here: >>>>>> >>>>>> commit b6d44fe9582b9d90a0b16f508ac08a90d899bf56 >>>>>> Author: Mike Christie >>>>>> Date: Thu Jul 26 12:46:47 2007 -0500 >>>>>> >>>>>> [SCSI] iscsi_tcp: Turn off bounce buffers >>>>>> >>>>>> It was found by LSI that on setups with large amounts of memory >>>>>> we were bouncing buffers when we did not need to. If the iscsi >>>>>> tcp >>>>>> code touches the data buffer (or a helper does), >>>>>> it will kmap the buffer. iscsi_tcp also does not interact with >>> hardware, >>>>>> so it does not have any hw dma restrictions. 
This patch sets >>> the bounce >>>>>> buffer settings for our device queue so buffers should not be >>> bounced >>>>>> because of a driver limit. >>>>>> >>>>>> I don't see a convenient place to callback into particular iscsi >>>>>> devices to set the DMA mask per-host. It has to go on the >>>>>> shost_gendev, right?, but only for TCP and iSER, not qla4xxx, which >>>>>> handles its DMA mask during device probe. >>>>>> >>>>> You should be taking your mask from the underlying infiniband >>>>> device as >>>>> part of the setup, shouldn't you? >>>>> >>>> I think you're right about this. All the existing IB HW tries to >>>> set a 64-bit dma mask, but that's no reason to disable the mechanism >>>> entirely in iser. I'll remove that line that disables bouncing in >>>> my patch. Perhaps Mike will know if the iscsi_tcp usage is still >>>> appropriate. >>>> >>>> >>> Let me make sure that I understand: you say that the IB HW driver (e.g. >>> ib_mthca) tries to set a 64-bit dma mask: >>> >>> err = pci_set_dma_mask(pdev, DMA_64BIT_MASK); >>> if (err) { >>> dev_warn(&pdev->dev, "Warning: couldn't set 64-bit PCI DMA >>> mask.\n"); >>> err = pci_set_dma_mask(pdev, DMA_32BIT_MASK); >>> if (err) { >>> dev_err(&pdev->dev, "Can't set PCI DMA mask, aborting.\n"); >>> goto err_free_res; >>> } >>> } >>> >>> So, in the example above, the driver will use a 64-bit mask or a 32-bit >>> mask (or fail). According to that, iSER (and SRP) needs to call >>> blk_queue_bounce_limit with the appropriate parameter, right? >>> >> >> Roland, James, >> >> I'm trying to fix this potential problem in iSER, and I have some >> questions about that. How can I get the DMA mask that the HCA driver is >> using (DMA_64BIT_MASK or DMA_32BIT_MASK)? Can I get it somehow from >> struct ib_device? Is it in ib_device->device? 
> I think what Erez is asking, or maybe it is something I was wondering
> is, that scsi drivers like lpfc or qla2xxx will do something like:
>
> if (dma_set_mask(&scsi_host->pdev->dev, DMA_64BIT_MASK))
>         dma_set_mask(&scsi_host->pdev->dev, DMA_32BIT_MASK)
>
> And when __scsi_alloc_queue calls scsi_calculate_bounce_limit it checks
> the host's parent dma_mask and sets the bounce_limit for the driver.
>
> Does srp/iser need to call the dma_set_mask functions or does the
> ib_device's device already have the dma info set up?

Never mind - I misread the mail. We know the IB HW driver sets the mask. I guess what we are debating is whether we should set the scsi_host's parent to the ib_device so the DMA mask is picked up, or whether we should just set the limit ourselves in our slave_configure by calling blk_queue_bounce_limit. And if we use the blk_queue_bounce_limit path, what function do we call to get the dma_mask?

I also modified iser to allocate a host per ib_device so it works like other scsi drivers, since we know the parent. Is this preferred over the host-per-session style? Does it matter? bnx2i works similarly to iser in that it uses libiscsi and does DMA against a real device. Should it do a host per session or a host per netdev? And if we do not allocate a host per ib_device/netdevice, what should we allocate per those structs? Should we create our own?

From andrea at qumranet.com Wed Apr 23 10:24:32 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 19:24:32 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423162629.GB24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080423162629.GB24536@duo.random> Message-ID: <20080423172432.GE24536@duo.random>

On Wed, Apr 23, 2008 at 06:26:29PM +0200, Andrea Arcangeli wrote:
> On Tue, Apr 22, 2008 at 04:20:35PM -0700, Christoph Lameter wrote:
> > I guess I have to prepare another patchset then?
Apologies for my previous not-too-polite comment in answer to the above, but I thought this double patchset was over now that you converged on #v12 and obsoleted EMM, and after the last private discussions. There's nothing personal here on my side, just a bit of general frustration on this matter.

I appreciate all the great contributions from you, most recently your idea to use sort(), but I can't really see any possible benefit or justification anymore in keeping two patchsets floating around, given we already converged on the mmu-notifier-core and given it's almost certain mmu-notifier-core will go into -mm in time for 2.6.26. Let's put it this way: if I fail to merge mmu-notifier-core into 2.6.26, I'll voluntarily give up my entire patchset and leave maintainership to you, so you can move 1/N to N/N and remove the mm_lock-sem patch (everything else can remain the same, as it's all orthogonal, so changing the order is a matter of minutes).

From rdreier at cisco.com Wed Apr 23 10:24:47 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 10:24:47 -0700 Subject: [ofa-general] Re: [PATCH 1/1 v1] MLX4: Added resize_cq capability. In-Reply-To: <480F428C.7080701@dev.mellanox.co.il> (Vladimir Sokolovsky's message of "Wed, 23 Apr 2008 17:07:08 +0300") References: <47E923CA.90804@dev.mellanox.co.il> <47F0A5A5.2010208@dev.mellanox.co.il> <480F428C.7080701@dev.mellanox.co.il> Message-ID:

Yikes - thanks, applied. Sorry for messing up your original patch. Which reminds me... I need to push out the libmlx4 side of things... I'll do that today; please test it when I do.
Thanks, Roland From sean.hefty at intel.com Wed Apr 23 10:25:44 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 23 Apr 2008 10:25:44 -0700 Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets In-Reply-To: References: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> Message-ID: <000001c8a567$0fbfed30$b037170a@amr.corp.intel.com> >> * Use some standard address mapping protocol that I'm not aware of. >> * Use global IB service resolution. >> * Define/extend an address resolution protocol that operates over IP. >> * Define/extend an address resolution protocol that operates over UDP. >> >> I'm hoping that someone has a wonderfully brilliant idea for this >> that would take about 1 day to implement. :) >> >> - Sean > >Is it time to bring back ATS? > >http://lists.openfabrics.org/pipermail/general/2005-August/010247.html That's one possibility (option 2 above). But this needs to be global, not per subnet, so my personal preference (as of right now) is to avoid it. - Sean From hrosenstock at xsigo.com Wed Apr 23 10:32:37 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 23 Apr 2008 10:32:37 -0700 Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets In-Reply-To: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> References: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> Message-ID: <1208971957.689.167.camel@hrosenstock-ws.xsigo.com> Sean, On Tue, 2008-04-22 at 15:46 -0700, Sean Hefty wrote: > I have a need to start looking at possible ways to map IP address to GIDs when > crossing IP (and IB) subnets. This would be in addition to or replace the ARP > use by the rdma_cm. Is this in the context of IB routers and/or RDMA gateways, or something else ? -- Hal > Possibilities include: > > * Use some standard address mapping protocol that I'm not aware of. > * Use global IB service resolution. > * Define/extend an address resolution protocol that operates over IP. 
> * Define/extend an address resolution protocol that operates over UDP. > > I'm hoping that someone has a wonderfully brilliant idea for this that would > take about 1 day to implement. :) > > - Sean > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Wed Apr 23 10:37:04 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 10:37:04 -0700 Subject: [ofa-general] [PATCH/RFC] RDMA/nes: Remove volatile qualifier from struct nes_hw_cq.cq_vbase Message-ID: Remove the volatile qualifier from the cq_vbase member of struct nes_hw_cq, and add an rmb() in the one place where it looks like access order might make a difference. As usual, removing a volatile qualifier in a declaration is actually a bug fix, since a volatile qualifier is not sufficient to make sure that aggressively out-of-order CPUs don't reorder things and cause incorrect results. For example, a CPU might speculatively execute reads of other cqe fields before the NIC hardware has written those fields and before it has set the NES_CQE_VALID bit (even though those reads come after the test of the NES_CQE_VALID bit in program order), but then when the CPU actually executes the conditional test of the NES_CQE_VALID, the bit has been set, and the CPU will proceed with the results of the earlier speculative execution and end up using bogus data. 
This also gets rid of the warning:

drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_destroy_cq':
drivers/infiniband/hw/nes/nes_verbs.c:1978: warning: passing argument 3 of 'pci_free_consistent' discards qualifiers from pointer target type

Signed-off-by: Roland Dreier
---
 drivers/infiniband/hw/nes/nes_hw.h    | 2 +-
 drivers/infiniband/hw/nes/nes_verbs.c | 8 +++++++-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h
index b7e2844..8f36e23 100644
--- a/drivers/infiniband/hw/nes/nes_hw.h
+++ b/drivers/infiniband/hw/nes/nes_hw.h
@@ -905,7 +905,7 @@ struct nes_hw_qp {
 };

 struct nes_hw_cq {
-	struct nes_hw_cqe volatile *cq_vbase;	/* PCI memory for host rings */
+	struct nes_hw_cqe *cq_vbase;	/* PCI memory for host rings */
 	void (*ce_handler)(struct nes_device *nesdev, struct nes_hw_cq *cq);
 	dma_addr_t cq_pbase;	/* PCI memory for host rings */
 	u16 cq_head;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index f9a5d43..ee74f7c 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -1976,7 +1976,7 @@ static int nes_destroy_cq(struct ib_cq *ib_cq)
 	if (nescq->cq_mem_size)
 		pci_free_consistent(nesdev->pcidev, nescq->cq_mem_size,
-				(void *)nescq->hw_cq.cq_vbase, nescq->hw_cq.cq_pbase);
+				nescq->hw_cq.cq_vbase, nescq->hw_cq.cq_pbase);
 	kfree(nescq);

 	return ret;
@@ -3610,6 +3610,12 @@ static int nes_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry)
 	while (cqe_count < num_entries) {
 		if (le32_to_cpu(nescq->hw_cq.cq_vbase[head].cqe_words[NES_CQE_OPCODE_IDX]) & NES_CQE_VALID) {
+			/*
+			 * Make sure we read CQ entry contents *after*
+			 * we've checked the valid bit.
+			 */
+			rmb();
+
 			cqe = nescq->hw_cq.cq_vbase[head];
 			nescq->hw_cq.cq_vbase[head].cqe_words[NES_CQE_OPCODE_IDX] = 0;
 			u32temp = le32_to_cpu(cqe.cqe_words[NES_CQE_COMP_COMP_CTX_LOW_IDX]);
--
1.5.5.1

From sean.hefty at intel.com Wed Apr 23 10:37:33 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 23 Apr 2008 10:37:33 -0700 Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets In-Reply-To: <1208971957.689.167.camel@hrosenstock-ws.xsigo.com> References: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> <1208971957.689.167.camel@hrosenstock-ws.xsigo.com> Message-ID: <000201c8a568$b6ae57c0$b037170a@amr.corp.intel.com>

>On Tue, 2008-04-22 at 15:46 -0700, Sean Hefty wrote:
>> I have a need to start looking at possible ways to map IP address to GIDs
>> when crossing IP (and IB) subnets. This would be in addition to or replace
>> the ARP use by the rdma_cm.
>
>Is this in the context of IB routers and/or RDMA gateways, or something
>else ?

IB routers

From ralph.campbell at qlogic.com Wed Apr 23 10:43:03 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 23 Apr 2008 10:43:03 -0700 Subject: [ofa-general] [PATCH] IB/core - reset to error state transition not allowed Message-ID: <1208972583.2232.107.camel@brick.pathscale.com>

I was reviewing the QP state transition diagram in the IB 1.2.1 spec and the code for qp_state_table[], and noticed that the code allows a QP to be modified from IB_QPS_RESET to IB_QPS_ERR, whereas the notes for figure 124 (pg 457) specifically say that this transition isn't allowed.
Signed-off-by: Ralph Campbell

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 0504208..379239f 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -317,7 +317,6 @@ static const struct {
 } qp_state_table[IB_QPS_ERR + 1][IB_QPS_ERR + 1] = {
 	[IB_QPS_RESET] = {
 		[IB_QPS_RESET] = { .valid = 1 },
-		[IB_QPS_ERR] = { .valid = 1 },
 		[IB_QPS_INIT] = {
 			.valid = 1,
 			.req_param = {

From michaelc at cs.wisc.edu Wed Apr 23 10:43:30 2008 From: michaelc at cs.wisc.edu (Mike Christie) Date: Wed, 23 Apr 2008 12:43:30 -0500 Subject: [ofa-general] Re: [PATCH 1/3] iscsi iser: remove DMA restrictions In-Reply-To: <480F6EDA.9050004@cs.wisc.edu> References: <20080212205252.GB13643@osc.edu> <20080212205403.GC13643@osc.edu><1202850645.3137.132.camel@localhost.localdomain><20080212214632.GA14397@osc.edu><1202853468.3137.148.camel@localhost.localdomain><20080213195912.GC7372@osc.edu> <480C9BF8.9050401@Voltaire.COM> <480F3C84.40606@Voltaire.COM> <480F64ED.7010705@cs.wisc.edu> <480F6EDA.9050004@cs.wisc.edu> Message-ID: <480F7542.4070000@cs.wisc.edu>

Mike Christie wrote:
> Mike Christie wrote:
>> Erez Zilber wrote:
>>> Erez Zilber wrote:
>>>> Pete Wyckoff wrote:
>>>>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008
>>>>> 15:57 -0600:
>>>>>
>>>>>> On Tue, 2008-02-12 at 16:46 -0500, Pete Wyckoff wrote:
>>>>>>
>>>>>>> James.Bottomley at HansenPartnership.com wrote on Tue, 12 Feb 2008
>>>>>>> 15:10 -0600:
>>>>>>>
>>>>>>>> On Tue, 2008-02-12 at 15:54 -0500, Pete Wyckoff wrote:
>>>>>>>>
>>>>>>>>> iscsi_iser does not have any hardware DMA restrictions. Add a
>>>>>>>>> slave_configure function to remove any DMA alignment restriction,
>>>>>>>>> allowing the use of direct IO from arbitrary offsets within a
>>>>>>>>> page.
>>>>>>>>> Also disable page bouncing; iser has no restrictions on which
>>>>>>>>> pages it
>>>>>>>>> can address.
>>>>>>>>> >>>>>>>>> Signed-off-by: Pete Wyckoff >>>>>>>>> --- >>>>>>>>> drivers/infiniband/ulp/iser/iscsi_iser.c | 8 ++++++++ >>>>>>>>> 1 files changed, 8 insertions(+), 0 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c >>>> b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>>>> index be1b9fb..1b272a6 100644 >>>>>>>>> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>>>> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c >>>>>>>>> @@ -543,6 +543,13 @@ iscsi_iser_ep_disconnect(__u64 ep_handle) >>>>>>>>> iser_conn_terminate(ib_conn); >>>>>>>>> } >>>>>>>>> >>>>>>>>> +static int iscsi_iser_slave_configure(struct scsi_device *sdev) >>>>>>>>> +{ >>>>>>>>> + blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY); >>>>>>>>> >>>>>>>> You really don't want to do this. That signals to the block >>>> layer that >>>>>>>> we have an iommu, although it's practically the same thing as a >>>> 64 bit >>>>>>>> DMA mask ... but I'd just leave it to the DMA mask to set this up >>>>>>>> correctly. Anything else is asking for a subtle bug to turn up >>>>>>>> years >>>>>>>> from now when something causes the mask and the limit to be >>>> mismatched. >>>>>>>> >>>>>>> Oh. I decided to add that line for symmetry with TCP, and was >>>>>>> convinced by the arguments here: >>>>>>> >>>>>>> commit b6d44fe9582b9d90a0b16f508ac08a90d899bf56 >>>>>>> Author: Mike Christie >>>>>>> Date: Thu Jul 26 12:46:47 2007 -0500 >>>>>>> >>>>>>> [SCSI] iscsi_tcp: Turn off bounce buffers >>>>>>> >>>>>>> It was found by LSI that on setups with large amounts of memory >>>>>>> we were bouncing buffers when we did not need to. If the >>>>>>> iscsi tcp >>>>>>> code touches the data buffer (or a helper does), >>>>>>> it will kmap the buffer. iscsi_tcp also does not interact with >>>> hardware, >>>>>>> so it does not have any hw dma restrictions. 
This patch sets >>>> the bounce >>>>>>> buffer settings for our device queue so buffers should not be >>>> bounced >>>>>>> because of a driver limit. >>>>>>> >>>>>>> I don't see a convenient place to callback into particular iscsi >>>>>>> devices to set the DMA mask per-host. It has to go on the >>>>>>> shost_gendev, right?, but only for TCP and iSER, not qla4xxx, which >>>>>>> handles its DMA mask during device probe. >>>>>>> >>>>>> You should be taking your mask from the underlying infiniband >>>>>> device as >>>>>> part of the setup, shouldn't you? >>>>>> >>>>> I think you're right about this. All the existing IB HW tries to >>>>> set a 64-bit dma mask, but that's no reason to disable the mechanism >>>>> entirely in iser. I'll remove that line that disables bouncing in >>>>> my patch. Perhaps Mike will know if the iscsi_tcp usage is still >>>>> appropriate. >>>>> >>>>> >>>> Let me make sure that I understand: you say that the IB HW driver (e.g. >>>> ib_mthca) tries to set a 64-bit dma mask: >>>> >>>> err = pci_set_dma_mask(pdev, DMA_64BIT_MASK); >>>> if (err) { >>>> dev_warn(&pdev->dev, "Warning: couldn't set 64-bit PCI DMA >>>> mask.\n"); >>>> err = pci_set_dma_mask(pdev, DMA_32BIT_MASK); >>>> if (err) { >>>> dev_err(&pdev->dev, "Can't set PCI DMA mask, aborting.\n"); >>>> goto err_free_res; >>>> } >>>> } >>>> >>>> So, in the example above, the driver will use a 64-bit mask or a 32-bit >>>> mask (or fail). According to that, iSER (and SRP) needs to call >>>> blk_queue_bounce_limit with the appropriate parameter, right? >>>> >>> >>> Roland, James, >>> >>> I'm trying to fix this potential problem in iSER, and I have some >>> questions about that. How can I get the DMA mask that the HCA driver is >>> using (DMA_64BIT_MASK or DMA_32BIT_MASK)? Can I get it somehow from >>> struct ib_device? Is it in ib_device->device? 
>> >> I think what Erez is asking, or maybe it is something I was wondering >> is, that scsi drivers like lpfc or qla2xxx will do something like: >> >> if (dma_set_mask(&scsi_host->pdev->dev, DMA_64BIT_MASK)) >> dma_set_mask(&scsi_host->pdev->dev, DMA_32BIT_MASK) >> >> And when __scsi_alloc_queue calls scsi_calculate_bounce_limit it >> checks the host's parent dma_mask and sets the bounce_limit for the >> driver. >> >> Does srp/iser need to call the dma_set_mask functions or does the >> ib_device's device already have the dma info set up? > > Nevermind. I misread the mail. We know the ib hw driver sets the mask. I > guess what we are debating is whether we should set the scsi_host's parent > to the ib_device so the dma mask is picked up, or whether we should just set > them in our slave_configure by calling blk_queue_bounce_limit. And if we > use the blk_queue_bounce_limit path, what function do we call to get the > dma_mask. > Oh man, I should have looked at the code before posting. For this last part, if we do not set a correct host parent, I guess we have to just duplicate what scsi_calculate_bounce_limit does. It would be a waste to copy that code for iser. I guess we could modify scsi_calculate_bounce_limit somehow. From andrea at qumranet.com Wed Apr 23 10:45:50 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 19:45:50 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423170909.GA1459@sgi.com> References: <20080423170909.GA1459@sgi.com> Message-ID: <20080423174550.GF24536@duo.random> On Wed, Apr 23, 2008 at 12:09:09PM -0500, Jack Steiner wrote: > > You may have spotted this already. If so, just ignore this. > > It looks like there is a bug in copy_page_range() around line 667. > It's possible to do a mmu_notifier_invalidate_range_start(), then > return -ENOMEM w/o doing a corresponding mmu_notifier_invalidate_range_end(). No, I didn't spot it yet. Great catch!! ;) Thanks a lot.
I think we can take example from Jack and use our energy to spot any bug in the mmu-notifier-core, like with his auditing effort above (I'm quite certain you didn't reproduce this with a real oom ;), so we get a rock-solid mmu-notifier implementation in 2.6.26, so XPMEM will also benefit later in 2.6.27, and I hope the last XPMEM internal bugs will also be fixed by that time. (For those not going to become mmu-notifier users, there is nothing to worry about: unless you used KVM or GRU actively with mmu-notifiers, this bug would be entirely harmless with both MMU_NOTIFIER=n and =y, as previously guaranteed.) Here is the still-untested fix for review. diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -597,6 +597,7 @@ unsigned long next; unsigned long addr = vma->vm_start; unsigned long end = vma->vm_end; + int ret; /* * Don't copy ptes where a page fault will fill them correctly. @@ -604,33 +605,39 @@ * readonly mappings. The tradeoff is that copy_page_range is more * efficient than faulting.
*/ + ret = 0; if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) { if (!vma->anon_vma) - return 0; + goto out; } - if (is_vm_hugetlb_page(vma)) - return copy_hugetlb_page_range(dst_mm, src_mm, vma); + if (unlikely(is_vm_hugetlb_page(vma))) { + ret = copy_hugetlb_page_range(dst_mm, src_mm, vma); + goto out; + } if (is_cow_mapping(vma->vm_flags)) mmu_notifier_invalidate_range_start(src_mm, addr, end); + ret = 0; dst_pgd = pgd_offset(dst_mm, addr); src_pgd = pgd_offset(src_mm, addr); do { next = pgd_addr_end(addr, end); if (pgd_none_or_clear_bad(src_pgd)) continue; - if (copy_pud_range(dst_mm, src_mm, dst_pgd, src_pgd, - vma, addr, next)) - return -ENOMEM; + if (unlikely(copy_pud_range(dst_mm, src_mm, dst_pgd, src_pgd, + vma, addr, next))) { + ret = -ENOMEM; + break; + } } while (dst_pgd++, src_pgd++, addr = next, addr != end); if (is_cow_mapping(vma->vm_flags)) mmu_notifier_invalidate_range_end(src_mm, - vma->vm_start, end); - - return 0; + vma->vm_start, end); +out: + return ret; } static unsigned long zap_pte_range(struct mmu_gather *tlb, From rdreier at cisco.com Wed Apr 23 10:55:42 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 10:55:42 -0700 Subject: [ofa-general] Re: [PATCH] IB/core - reset to error state transition not allowed In-Reply-To: <1208972583.2232.107.camel@brick.pathscale.com> (Ralph Campbell's message of "Wed, 23 Apr 2008 10:43:03 -0700") References: <1208972583.2232.107.camel@brick.pathscale.com> Message-ID: > I was reviewing the QP state transition diagram in the IB 1.2.1 > spec. and the code for qp_state_table[], and noticed that > the code allows a QP to be modified from IB_QPS_RESET to > IB_QPS_ERR whereas the notes for figure 124 (pg 457) > specifically says that this transition isn't allowed. This is a change from the 1.2 spec, which says: It is possible to transition from any state to either the Error state or the Reset state with the Modify QP/EE Verb. Does anyone know why this change was made? 
We specifically added code to some low-level drivers to handle RESET->ERROR transitions, so I guess someone cared (although maybe it was just for absolute spec compliance). - R. From clameter at sgi.com Wed Apr 23 11:02:18 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 23 Apr 2008 11:02:18 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: <20080423134427.GW24536@duo.random> References: <20080422224048.GR24536@duo.random> <20080423134427.GW24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > I know you would rather see KVM development stalled for more months > than get a partial solution now that already covers KVM and GRU > with the same API that XPMEM will also use later. It's very unfair on > your side to pretend to stall other people's development if what you > need has stronger requirements and can't be merged immediately. This > is especially true given it was publicly stated that XPMEM never > passed all regression tests anyway, so you can't possibly be in such > a hurry as we are; we can't progress without this. In fact we can, > but it would be a huge effort, it would run _slower_, and it would > all need to be deleted once mmu notifiers are in. We did this workaround effort years ago and have been suffering the ill effects of pinning for years. We had to deal with it again and again, so I guess we do not matter? Certainly we have no interest in stalling KVM development.
From clameter at sgi.com Wed Apr 23 11:09:35 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 23 Apr 2008 11:09:35 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423155940.GY24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423133619.GV24536@duo.random> <20080423144747.GU30298@sgi.com> <20080423155940.GY24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > Implement unregister but it's not reliable, only ->release is reliable. Why is there still the hlist stuff being used for the mmu notifier list? And why is this still unsafe? There are cases in which you do not take the reverse map locks or mmap_sem while traversing the notifier list? This hope for inclusion without proper review (first for .25 now for .26) seems to interfere with the patch cleanup work and cause delay after delay for getting the patch ready. On what basis do you think that there is a chance of any of these patches making it into 2.6.26 given that this patchset has never been vetted in Andrew's tree? From rdreier at cisco.com Wed Apr 23 11:14:37 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 11:14:37 -0700 Subject: [ofa-general] Re: [PATCH v1] libmlx4: Added resize CQ capability. In-Reply-To: <47F0A606.2060500@dev.mellanox.co.il> (Vladimir Sokolovsky's message of "Mon, 31 Mar 2008 11:51:18 +0300") References: <47E92539.7030908@dev.mellanox.co.il> <47F0A606.2060500@dev.mellanox.co.il> Message-ID: > + if ((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) == MLX4_CQE_OPCODE_RESIZE) > + goto repoll; seems like this can never happen in userspace, since we can hold the CQ lock the whole time the resize is in progress? > +int mlx4_get_outstanding_cqes(struct mlx4_cq *cq) > +{ > + int i; This needs to be unsigned I think to avoid undefined overflow issues... 
(although in practice I guess it probably doesn't matter) > + > + for (i = cq->cons_index; get_sw_cqe(cq, (i & cq->ibv_cq.cqe)); ++i) > + ; > + > + return i - cq->cons_index; > +} Anyway I deleted the changes to the polling path and updated the variable, and applied it. Please let me know if I messed something up... From clameter at sgi.com Wed Apr 23 11:15:16 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 23 Apr 2008 11:15:16 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423162629.GB24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080423162629.GB24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > On Tue, Apr 22, 2008 at 04:20:35PM -0700, Christoph Lameter wrote: > > I guess I have to prepare another patchset then? > > If you want to embarrass yourself three times in a row, go ahead ;). I > thought two failed takeovers were enough. Takeover? I'd be happy if I didn't have to deal with this issue. These patches were necessary because you were not listening to feedback, plus there is the issue that your patchsets were not easy to review or diff against. I had to merge several patches to get to a useful patch. You have always picked up lots of stuff from my patchsets. Lots of work that could have been avoided by proper patchsets in the first place. From andrea at qumranet.com Wed Apr 23 11:16:51 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 20:16:51 +0200 Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: References: <20080422224048.GR24536@duo.random> <20080423134427.GW24536@duo.random> Message-ID: <20080423181651.GH24536@duo.random> On Wed, Apr 23, 2008 at 11:02:18AM -0700, Christoph Lameter wrote: > We have had this workaround effort done years ago and have been > suffering the ill effects of pinning for years. Had to deal with Yes.
In addition to the pinning, there's a lot of additional tlb flushing work to do in kvm without mmu notifiers, as the swapcache could be freed by the vm the instruction after put_page unpins the page, for whatever reason. From clameter at sgi.com Wed Apr 23 11:19:26 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 23 Apr 2008 11:19:26 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423163713.GC24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > The only way to avoid failing because of vmalloc space shortage or > oom would be to provide an O(N*N) fallback. But one that can't be > interrupted by sigkill! sigkill interruption was ok in #v12 because we > didn't rely on mmu_notifier_unregister to succeed. So it avoided any > DoS but it still can't provide any reliable unregister. If unregister fails, then the driver should not detach from the address space immediately but wait until ->release is called. That may be a possible solution. It will be rare that the unregister fails. From andrea at qumranet.com Wed Apr 23 11:19:28 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 20:19:28 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423133619.GV24536@duo.random> <20080423144747.GU30298@sgi.com> <20080423155940.GY24536@duo.random> Message-ID: <20080423181928.GI24536@duo.random> On Wed, Apr 23, 2008 at 11:09:35AM -0700, Christoph Lameter wrote: > Why is there still the hlist stuff being used for the mmu notifier list? > And why is this still unsafe? What's the problem with hlist? It saves 8 bytes for each mm_struct; you should be using it too instead of list.
> There are cases in which you do not take the reverse map locks or mmap_sem > while traversing the notifier list? There aren't. > This hope for inclusion without proper review (first for .25 now for .26) > seems to interfere with the patch cleanup work and cause delay after delay > for getting the patch ready. On what basis do you think that there is a > chance of any of these patches making it into 2.6.26 given that this > patchset has never been vetted in Andrew's tree? Let's say I try to be optimistic and hope the right thing will happen given this is like a new driver that can't hurt anybody but KVM and GRU if there's any bug. But in my view what interfere with proper review for .26 are the endless discussions we're doing ;). From rdreier at cisco.com Wed Apr 23 11:21:28 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 11:21:28 -0700 Subject: [ofa-general] Re: [PATCH v1] libmlx4: Added resize CQ capability. In-Reply-To: <47F0A606.2060500@dev.mellanox.co.il> (Vladimir Sokolovsky's message of "Mon, 31 Mar 2008 11:51:18 +0300") References: <47E92539.7030908@dev.mellanox.co.il> <47F0A606.2060500@dev.mellanox.co.il> Message-ID: > + cqe = align_queue_size(cqe); Oh yeah... shouldn't this be cqe = align_queue_size(cqe + 1); to allow for resizing the CQ again later? I made that change when I applied. From clameter at sgi.com Wed Apr 23 11:21:49 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 23 Apr 2008 11:21:49 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423172432.GE24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080423162629.GB24536@duo.random> <20080423172432.GE24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > will go in -mm in time for 2.6.26. 
Let's put it this way, if I fail to > merge mmu-notifier-core into 2.6.26 I'll voluntarily give up my entire > patchset and leave maintainership to you so you move 1/N to N/N and > remove mm_lock-sem patch (everything else can remain the same as it's > all orthogonal so changing the order is a matter of minutes). No I really want you to do this. I have no interest in a takeover in the future and have done the EMM stuff only because I saw no other way forward. I just want this be done the right way for all parties with patches that are nice and mergeable. From andrea at qumranet.com Wed Apr 23 11:25:06 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 20:25:06 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> Message-ID: <20080423182506.GJ24536@duo.random> On Wed, Apr 23, 2008 at 11:19:26AM -0700, Christoph Lameter wrote: > If unregister fails then the driver should not detach from the address > space immediately but wait until -->release is called. That may be > a possible solution. It will be rare that the unregister fails. This is the current idea, exactly. Unless we find a way to replace mm_lock with something else, I don't see a way to make mmu_notifier_unregister reliable without wasting ram. 
From clameter at sgi.com Wed Apr 23 11:27:21 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 23 Apr 2008 11:27:21 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423181928.GI24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423133619.GV24536@duo.random> <20080423144747.GU30298@sgi.com> <20080423155940.GY24536@duo.random> <20080423181928.GI24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > On Wed, Apr 23, 2008 at 11:09:35AM -0700, Christoph Lameter wrote: > > Why is there still the hlist stuff being used for the mmu notifier list? > > And why is this still unsafe? > > What's the problem with hlist, it saves 8 bytes for each mm_struct, > you should be using it too instead of list. list heads in mm_struct and in the mmu_notifier struct seemed to be more consistent. We have no hash list after all. > > > There are cases in which you do not take the reverse map locks or mmap_sem > > while traversing the notifier list? > > There aren't. There is a potential issue in move_ptes where you call invalidate_range_end after dropping i_mmap_sem whereas my patches did the opposite. Mmap_sem saves you there? From andrea at qumranet.com Wed Apr 23 11:34:18 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 20:34:18 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080422223545.GP24536@duo.random> <20080423162629.GB24536@duo.random> <20080423172432.GE24536@duo.random> Message-ID: <20080423183418.GK24536@duo.random> On Wed, Apr 23, 2008 at 11:21:49AM -0700, Christoph Lameter wrote: > No I really want you to do this. I have no interest in a takeover in the Ok if you want me to do this, I definitely prefer the core to go in now. 
It's so much easier to concentrate on two problems at different times than to attack both problems at the same time, given they're mostly completely orthogonal problems. Given we already solved one problem, I'd like to close it before concentrating on the second problem. I already told you it was my interest to support XPMEM too. For example, it was me who noticed we couldn't possibly remove the can_sleep parameter from invalidate_range without altering the locking, as vmas were unstable outside of one of the three core vm locks. That finding resulted in much bigger patches than we hoped (like Andrew previously sort of predicted), and you did all great work to develop those. For my part, once the converged part is in, it'll be a lot easier to fully concentrate on the rest. My main focus right now is to produce an mmu-notifier-core that is entirely bug-free for .26. From andrea at qumranet.com Wed Apr 23 11:37:18 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Wed, 23 Apr 2008 20:37:18 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423133619.GV24536@duo.random> <20080423144747.GU30298@sgi.com> <20080423155940.GY24536@duo.random> <20080423181928.GI24536@duo.random> Message-ID: <20080423183718.GL24536@duo.random> On Wed, Apr 23, 2008 at 11:27:21AM -0700, Christoph Lameter wrote: > There is a potential issue in move_ptes where you call > invalidate_range_end after dropping i_mmap_sem whereas my patches did the > opposite. Mmap_sem saves you there? Yes, there's really no risk of races in this area after introducing mm_lock; any place that mangles over ptes and doesn't hold any of the three locks is buggy anyway. I appreciate the audit work (I also did it and couldn't find bugs, but the more eyes the better).
From clameter at sgi.com Wed Apr 23 11:46:30 2008 From: clameter at sgi.com (Christoph Lameter) Date: Wed, 23 Apr 2008 11:46:30 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423183718.GL24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423133619.GV24536@duo.random> <20080423144747.GU30298@sgi.com> <20080423155940.GY24536@duo.random> <20080423181928.GI24536@duo.random> <20080423183718.GL24536@duo.random> Message-ID: On Wed, 23 Apr 2008, Andrea Arcangeli wrote: > Yes, there's really no risk of races in this area after introducing > mm_lock, any place that mangles over ptes and doesn't hold any of the > three locks is buggy anyway. I appreciate the audit work (I also did > it and couldn't find bugs but the more eyes the better). I guess I would need to merge some patches together somehow to be able to review them properly like I did before . I have not reviewed the latest code completely. From gstreiff at NetEffect.com Wed Apr 23 11:49:37 2008 From: gstreiff at NetEffect.com (Glenn Streiff) Date: Wed, 23 Apr 2008 13:49:37 -0500 Subject: [ofa-general] RE: [PATCH/RFC] RDMA/nes: Use print_mac() to format ethernet addresses for printing In-Reply-To: Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC0795012D@venom2> Acked-by: Glenn Streiff thanks! 
> Removing open-coded MAC formats shrinks the source and the generated > code too, eg on x86-64: > > add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-103 (-103) > function old new delta > make_cm_node 932 912 -20 > nes_netdev_set_mac_address 427 406 -21 > nes_netdev_set_multicast_list 1148 1124 -24 > nes_probe 2349 2311 -38 > > Signed-off-by: Roland Dreier > --- > drivers/infiniband/hw/nes/nes.c | 10 ++++------ > drivers/infiniband/hw/nes/nes_cm.c | 8 +++----- > drivers/infiniband/hw/nes/nes_nic.c | 18 ++++++++---------- > 3 files changed, 15 insertions(+), 21 deletions(-) > > diff --git a/drivers/infiniband/hw/nes/nes.c > b/drivers/infiniband/hw/nes/nes.c > index b046262..c0671ad 100644 > --- a/drivers/infiniband/hw/nes/nes.c > +++ b/drivers/infiniband/hw/nes/nes.c > @@ -353,13 +353,11 @@ struct ib_qp *nes_get_qp(struct > ib_device *device, int qpn) > */ > static void nes_print_macaddr(struct net_device *netdev) > { > - nes_debug(NES_DBG_INIT, "%s: MAC %02X:%02X:%02X:%02X:%02X:%02X, IRQ %u\n", > - netdev->name, > - netdev->dev_addr[0], netdev->dev_addr[1], netdev->dev_addr[2], > - netdev->dev_addr[3], netdev->dev_addr[4], netdev->dev_addr[5], > - netdev->irq); > -} > + DECLARE_MAC_BUF(mac); > > + nes_debug(NES_DBG_INIT, "%s: %s, IRQ %u\n", > + netdev->name, print_mac(mac, netdev->dev_addr), netdev->irq); > +} > > ... From gstreiff at NetEffect.com Wed Apr 23 11:52:48 2008 From: gstreiff at NetEffect.com (Glenn Streiff) Date: Wed, 23 Apr 2008 13:52:48 -0500 Subject: [ofa-general] RE: [PATCH/RFC] RDMA/nes: Print IPv4 addresses in a readable format In-Reply-To: Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC0795012E@venom2> Acked-by: Glenn Streiff thanks! > Use NIPQUAD_FMT instead of printing raw 32-bit hex quantities in > debugging output. 
> > Signed-off-by: Roland Dreier > --- > drivers/infiniband/hw/nes/nes.c | 5 +++-- > drivers/infiniband/hw/nes/nes_cm.c | 13 +++++++------ > drivers/infiniband/hw/nes/nes_utils.c | 4 +++- > 3 files changed, 13 insertions(+), 9 deletions(-) > > diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c > index c0671ad..a4e9269 100644 > --- a/drivers/infiniband/hw/nes/nes.c > +++ b/drivers/infiniband/hw/nes/nes.c > @@ -139,8 +139,9 @@ static int nes_inetaddr_event(struct notifier_block *notifier, > > addr = ntohl(ifa->ifa_address); > mask = ntohl(ifa->ifa_mask); > - nes_debug(NES_DBG_NETDEV, "nes_inetaddr_event: ip address %08X, netmask %08X.\n", > - addr, mask); > + nes_debug(NES_DBG_NETDEV, "nes_inetaddr_event: ip address " NIPQUAD_FMT > + ", netmask " NIPQUAD_FMT ".\n", > + HIPQUAD(addr), HIPQUAD(mask)); > list_for_each_entry(nesdev, &nes_dev_list, list) { > nes_debug(NES_DBG_NETDEV, "Nesdev list entry = 0x%p. (%s)\n", > nesdev, nesdev->netdev[0]->name); > ... From ralph.campbell at qlogic.com Wed Apr 23 12:03:53 2008 From: ralph.campbell at qlogic.com (Ralph Campbell) Date: Wed, 23 Apr 2008 12:03:53 -0700 Subject: [ofa-general] Re: [PATCH] IB/core - reset to error state transition not allowed In-Reply-To: References: <1208972583.2232.107.camel@brick.pathscale.com> Message-ID: <1208977434.2232.112.camel@brick.pathscale.com> On Wed, 2008-04-23 at 10:55 -0700, Roland Dreier wrote: > > I was reviewing the QP state transition diagram in the IB 1.2.1 > > spec. and the code for qp_state_table[], and noticed that > > the code allows a QP to be modified from IB_QPS_RESET to > > IB_QPS_ERR whereas the notes for figure 124 (pg 457) > > specifically says that this transition isn't allowed. > > This is a change from the 1.2 spec, which says: > > It is possible to transition from any state to either the Error state > or the Reset state with the Modify QP/EE Verb. > > Does anyone know why this change was made? 
We specifically added code > to some low-level drivers to handle RESET->ERROR transitions, so I guess > someone cared (although maybe it was just for absolute spec compliance). > > - R. I didn't realize what a can of worms I opened :-) Personally, I don't think this will affect most applications either way. I posted the patch thinking it was an obvious bug. The only case that I think matters is some program which tries to verify the spec (pick one). From xavier at tddft.org Wed Apr 23 12:03:01 2008 From: xavier at tddft.org (Xavier Andrade) Date: Wed, 23 Apr 2008 21:03:01 +0200 (CEST) Subject: [ofa-general] Loading of ib_mthca fails Message-ID: Hi, I have the following problem with an InfiniBand adapter: when I try to load the kernel module ib_mthca I get the following error: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) ib_mthca: Initializing 0000:04:00.0 ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 17 PCI: Setting latency timer of device 0000:04:00.0 to 64 ib_mthca 0000:04:00.0: MAP_FA returned status 0xff, aborting. ib_mthca 0000:04:00.0: Failed to start FW, aborting. ACPI: PCI interrupt for device 0000:04:00.0 disabled ib_mthca: probe of 0000:04:00.0 failed with error -22 The kernel (and IB driver) is stock 2.6.24.2 x86_64 and the distribution is Debian 4.0. The card is an Intel InfiniBand I/O Expansion module (AXXIBIOMOD) installed in an Intel S5000PAL motherboard. This is the PCI info of the adapter: 04:00.0 InfiniBand [0c06]: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] [15b3:6274] (rev a0) Does anyone know where the problem might be?
Thanks, Xavier From rdreier at cisco.com Wed Apr 23 12:20:02 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 12:20:02 -0700 Subject: [ofa-general] Loading of ib_mthca fails In-Reply-To: (Xavier Andrade's message of "Wed, 23 Apr 2008 21:03:01 +0200 (CEST)") References: Message-ID: > ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) > ib_mthca: Initializing 0000:04:00.0 > ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 17 > PCI: Setting latency timer of device 0000:04:00.0 to 64 > ib_mthca 0000:04:00.0: MAP_FA returned status 0xff, aborting. > ib_mthca 0000:04:00.0: Failed to start FW, aborting. > ACPI: PCI interrupt for device 0000:04:00.0 disabled > ib_mthca: probe of 0000:04:00.0 failed with error -22 Strange, I'm not sure what's going on. Some firmware commands are succeeding and then one fails with a status that the firmware should never return. Taking a wild guess about what might be affecting this, how much memory does your system have installed? Can you make sure your kernel is built with CONFIG_INFINIBAND_MTHCA_DEBUG=y and then send the output of loading the driver with the debug_level module option set to 1? Thanks, Roland From holt at sgi.com Wed Apr 23 12:55:00 2008 From: holt at sgi.com (Robin Holt) Date: Wed, 23 Apr 2008 14:55:00 -0500 Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: <20080423161544.GZ24536@duo.random> References: <20080422224048.GR24536@duo.random> <20080423134427.GW24536@duo.random> <20080423154536.GV30298@sgi.com> <20080423161544.GZ24536@duo.random> Message-ID: <20080423195500.GW30298@sgi.com> On Wed, Apr 23, 2008 at 06:15:45PM +0200, Andrea Arcangeli wrote: > Once I get confirmation that everyone is ok with #v13 I'll push a #v14 > before Saturday with that cosmetical error cleaned up and > mmu_notifier_unregister moved at the end (XPMEM will have unregister > don't worry). 
I expect the 1/13 of #v14 to go in -mm and then 2.6.26. I think GRU needs _unregister as well. Thanks, Robin From weiny2 at llnl.gov Wed Apr 23 13:38:16 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Wed, 23 Apr 2008 13:38:16 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. Message-ID: <20080423133816.6c1b6315.weiny2@llnl.gov> Hey all, We have just started to experience a situation which I don't think is strictly a bug, but I think it could be fixed within the OFED software. The symptom is that nodes drop out of the IPoIB mcast group after a node temporarily goes catatonic. The details are: 1) Issues on a node cause a soft lockup of the node. 2) OpenSM does a normal light sweep. 3) MADs to the node time out since the node is in a "bad state". 4) OpenSM marks the node down and drops it from internal tables, including mcast groups. 5) Node recovers from the soft-lockup condition. 6) A subsequent sweep causes OpenSM to see the node and add it back to the fabric. 7) Node is fully functional on the verbs layer, but IPoIB never knew anything was wrong, so it does _not_ rejoin the mcast groups. (This is different from the condition where the link actually goes down.) As far as we can see there is nothing wrong with the node. It just went catatonic for a while. Obviously this is not a good condition; however, I was thinking of a couple of things which could be done to "fix" the above situation. I am writing here to see which solution might be best and accepted by the community. Alternatively this may have already been addressed. However, I don't see a bug in the bug list, nor do I find anything in the archive. Solutions I can think of are: A) Modify OpenSM to move the node to a "questionable" state for a period of X sweeps. If after X sweeps the node still does not respond, drop it. If the node does respond, return it to its original state.
B) When OpenSM queries the node as if it is new on the fabric and the SMA "thinks" it is not new, have the SMA detect this and notify the IPoIB layer (or ULPs in general) that something has gone wrong. The IPoIB layer could then check/rejoin the group. C) Put some code in IPoIB which might detect "lost cycles" and check/rejoin the mcast group. I have not worked out details for any solution. I believe that A and B are "outside the spec". However, I can see merit in A and B. Solution A would help if MADs are lost due to reasons other than node issues. (Perhaps a bad link, although I don't know of anyone having problems like that.) Solution B puts the solution closer to the original problem, but I am unsure how the SMA would know what is going on. Solution C is really close to the problem; however, I don't know how it would be done. I do think that this would be within the specification, as it really is the ULP's job to maintain its membership in the group. But how would it do this without help from the lower layers? (Of course it could poll for membership, but I think that is a bad idea.) Thoughts? 
Ira Weiny Lawrence Livermore National Lab weiny2 at llnl.gov From 12o3l at tiscali.nl Wed Apr 23 14:07:53 2008 From: 12o3l at tiscali.nl (Roel Kluin) Date: Wed, 23 Apr 2008 23:07:53 +0200 Subject: [ofa-general] [PATCH] ehca: ret is unsigned, ibmebus_request_irq() negative return ignored in hca_create_eq() Message-ID: <480FA529.2030800@tiscali.nl> diff --git a/drivers/infiniband/hw/ehca/ehca_eq.c b/drivers/infiniband/hw/ehca/ehca_eq.c index b4ac617..9727235 100644 --- a/drivers/infiniband/hw/ehca/ehca_eq.c +++ b/drivers/infiniband/hw/ehca/ehca_eq.c @@ -59,6 +59,7 @@ int ehca_create_eq(struct ehca_shca *shca, u32 i; void *vpage; struct ib_device *ib_dev = &shca->ib_device; + int ret2; spin_lock_init(&eq->spinlock); spin_lock_init(&eq->irq_spinlock); @@ -123,18 +124,18 @@ int ehca_create_eq(struct ehca_shca *shca, /* register interrupt handlers and initialize work queues */ if (type == EHCA_EQ) { - ret = ibmebus_request_irq(eq->ist, ehca_interrupt_eq, + ret2 = ibmebus_request_irq(eq->ist, ehca_interrupt_eq, IRQF_DISABLED, "ehca_eq", (void *)shca); - if (ret < 0) + if (ret2 < 0) ehca_err(ib_dev, "Can't map interrupt handler."); tasklet_init(&eq->interrupt_task, ehca_tasklet_eq, (long)shca); } else if (type == EHCA_NEQ) { - ret = ibmebus_request_irq(eq->ist, ehca_interrupt_neq, + ret2 = ibmebus_request_irq(eq->ist, ehca_interrupt_neq, IRQF_DISABLED, "ehca_neq", (void *)shca); - if (ret < 0) + if (ret2 < 0) ehca_err(ib_dev, "Can't map interrupt handler."); tasklet_init(&eq->interrupt_task, ehca_tasklet_neq, (long)shca); From avi at qumranet.com Wed Apr 23 14:05:45 2008 From: avi at qumranet.com (Avi Kivity) Date: Thu, 24 Apr 2008 00:05:45 +0300 Subject: [ofa-general] Re: [PATCH 04 of 12] Moves all mmu notifier methods outside the PT lock (first and not last In-Reply-To: <20080423154536.GV30298@sgi.com> References: <20080422224048.GR24536@duo.random> <20080423134427.GW24536@duo.random> <20080423154536.GV30298@sgi.com> Message-ID: <480FA4A9.4090403@qumranet.com> Robin Holt 
wrote: >> an hurry like we are, we can't progress without this. Infact we can >> > > SGI is under an equally strict timeline. We really needed the sleeping > version into 2.6.26. We may still be able to get this accepted by > vendor distros if we make 2.6.27. > The difference is that the non-sleeping variant can be shown not to affect stability or performance, even if configured in, as long as it's not used. The sleeping variant will raise performance and stability concerns. I have zero objections to sleeping mmu notifiers; I only object to tying the schedules of the two together. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. From xavier at tddft.org Wed Apr 23 14:42:34 2008 From: xavier at tddft.org (Xavier Andrade) Date: Wed, 23 Apr 2008 23:42:34 +0200 (CEST) Subject: [ofa-general] Loading of ib_mthca fails In-Reply-To: References: Message-ID: Hi Roland, Thanks for your answer, On Wed, 23 Apr 2008, Roland Dreier wrote: > > Strange, I'm not sure what's going on. Some firmware commands are > succeeding and then one fails with a status that the firmware should > never return. > > Taking a wild guess about what might be affecting this, how much memory > does your system have installed? > 16 gigabytes. > Can you make sure your kernel is built with CONFIG_INFINIBAND_MTHCA_DEBUG=y > and then send the output of loading the driver with the debug_level > module option set to 1? > This is the output with debug_level set to 1: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) ib_mthca: Initializing 0000:04:00.0 ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 17 PCI: Setting latency timer of device 0000:04:00.0 to 64 ib_mthca 0000:04:00.0: FW version 000000000000, max commands 1 ib_mthca 0000:04:00.0: Catastrophic error buffer at 0x0, size 0x0 ib_mthca 0000:04:00.0: FW size 0 KB ib_mthca 0000:04:00.0: Clear int @ 0, EQ arm @ 0, EQ set CI @ 0 Uhhuh. NMI received for unknown reason 31. 
Do you have a strange power saving mode enabled? Dazed and confused, but trying to continue ib_mthca 0000:04:00.0: No HCA-attached memory (running in MemFree mode) ib_mthca 0000:04:00.0: Mapped 0 chunks/0 KB for FW. ib_mthca 0000:04:00.0: MAP_FA returned status 0xff, aborting. ib_mthca 0000:04:00.0: Failed to start FW, aborting. ACPI: PCI interrupt for device 0000:04:00.0 disabled ib_mthca: probe of 0000:04:00.0 failed with error -22 There are some extra messages because I enabled NMIs in the BIOS setup. Does this mean that the adapter doesn't have firmware? Cheers, Xavier From rdreier at cisco.com Wed Apr 23 14:52:07 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 14:52:07 -0700 Subject: [ofa-general] Loading of ib_mthca fails In-Reply-To: (Xavier Andrade's message of "Wed, 23 Apr 2008 23:42:34 +0200 (CEST)") References: Message-ID: > ib_mthca 0000:04:00.0: FW version 000000000000, max commands 1 > ib_mthca 0000:04:00.0: Catastrophic error buffer at 0x0, size 0x0 This is really weird -- we're getting all 0s back, like the HCA didn't write the response back to the right place. > Uhhuh. NMI received for unknown reason 31. which might cause this if the DMA goes to the wrong place. > Does this mean that the adapter doesn't have firmware? It is possible that the FW image is screwed up. You could use the Mellanox FW tools to make sure you have the right FW installed. But this doesn't have the flavor of that. Could you send the output of lspci -vvvnn? - R. 
From andrea at qumranet.com Wed Apr 23 15:19:28 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 24 Apr 2008 00:19:28 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423163713.GC24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> Message-ID: <20080423221928.GV24536@duo.random> On Wed, Apr 23, 2008 at 06:37:13PM +0200, Andrea Arcangeli wrote: > I'm afraid if you don't want to worst-case unregister with ->release > you need to have a better idea than my mm_lock and personally I can't > see any other way than mm_lock to ensure not to miss range_begin... But wait, mmu_notifier_register absolutely requires mm_lock to ensure that when kvm->arch.mmu_notifier_invalidate_range_count is zero (a large variable name, it'll get shorter, but this is to explain), really no cpu is in the middle of a range_begin/end critical section. That's why we have to take all the mm locks. But we couldn't care less if we unregister in the middle; unregister only needs to be sure that no cpu could possibly still be using the ram of the notifier allocated by the driver before returning. So I'll implement unregister in O(1) and without ram allocations using srcu, and that'll fix all issues with unregister. It'll return "void" to make it crystal clear it can't fail. It turns out unregister will make life easier for kvm as well, mostly to simplify the teardown of the /dev/kvm closure. Given this can be considered a bugfix to mmu_notifier_unregister I'll apply it to 1/N and I'll release a new mmu-notifier-core patch for you to review before I resend to Andrew before Saturday. Thanks! 
From xavier at tddft.org Wed Apr 23 16:11:49 2008 From: xavier at tddft.org (Xavier Andrade) Date: Thu, 24 Apr 2008 01:11:49 +0200 (CEST) Subject: [ofa-general] Loading of ib_mthca fails In-Reply-To: References: Message-ID: On Wed, 23 Apr 2008, Roland Dreier wrote: > Could you send the output of lspci -vvvnn? > This is the part relevant to the card (I attach the full output in case you need it): 04:00.0 InfiniBand [0c06]: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] [15b3:6274] (rev a0) Subsystem: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] [15b3:6274] Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- From rdreier at cisco.com Wed Apr 23 16:33:07 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 23 Apr 2008 16:33:07 -0700 Subject: [ofa-general] Loading of ib_mthca fails In-Reply-To: (Xavier Andrade's message of "Thu, 24 Apr 2008 01:11:49 +0200 (CEST)") References: Message-ID: Hmm, not sure... let's see what the Mellanox guys say (they're mostly on vacation this week so it might be a few days). The only things I can think of to try are: - go to mellanox.com and get latest FW and make sure there's not anything strange about what's on your card (but given that it is seen by the driver, the FW must at least have a valid checksum I think) - if you're building your own kernel, try the Debian 2.6.24 generic amd64 image and see if that's any different, because I definitely have mt25204 HCAs working with that. - R. 
From hrosenstock at xsigo.com Wed Apr 23 17:05:14 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 23 Apr 2008 17:05:14 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <20080423133816.6c1b6315.weiny2@llnl.gov> References: <20080423133816.6c1b6315.weiny2@llnl.gov> Message-ID: <1208995514.689.210.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-23 at 13:38 -0700, Ira Weiny wrote: > Hey all, > > We have just started to experience a situation which I don't think is strictly > a bug but I think could be fixed within the OFED software. > > The symptom is that nodes drop out of the IPoIB mcast group after a node > temporarily goes catatonic. The details are: > > 1) Issues on a node cause a soft lockup of the node. > 2) OpenSM does a normal light sweep. > 3) MADs to the node time out since the node is in a "bad state" > 4) OpenSM marks the node down and drops it from internal tables, including > mcast groups. > 5) Node recovers from soft lock up condition. > 6) A subsequent sweep causes OpenSM see the node and add it back to the > fabric. > 7) Node is fully functional on the verbs layer but IPoIB never knew anything > was wrong so it does _not_ rejoin the mcast groups. (This is different > from the condition where the link actually goes down.) > > As far as we can see there is nothing wrong with the node. It just went > catatonic for a while. Obviously this is not a good condition, however, I was > thinking of a couple of things which could be done to "fix" the above > situation. I am writing here to see which solution might be best, and accepted > by the community. Alternatively this may have already been addressed. > However, I don't see a bug in the bug list, nor do I find anything in the > archive. > > Solutions I can think of are: > > A) Modify OpenSM to move the node to a "questionable" state for a period of X > sweeps. If after X sweeps the node still does not respond, drop it. 
If > the node does respond return it to it's original state. > B) When OpenSM queries the node as if it is new on the fabric and the SMA > "thinks" it is not new, have the SMA detect this and notify the IPoIB > layer (or ULPs in general) that something has gone wrong. The IPoIB > layer could then check/rejoin the group. > C) put some code in IPoIB which might detect "lost cycles" and check/rejoin > the mcast group. > > I have not worked out details for any solution. I believe that A and B are > "outside the spec". However, I can see merit in A and B. > > Solution A would help if MAD's are lost due to reasons other than node issues. > (Perhaps a bad link. Although I don't know of anyone having problems like > that.) > > Solution B puts the solution closer to the original problem but I am unsure how > the SMA would know what is going on. > > Solution C is really close to the problem however I don't know how it would be > done. I do think that this would be within the specification as it really is > the ULP's job to maintain its membership in the group. But how would it do > this without help from the lower layers. (Of course it could poll for > membership but I think that is a bad idea.) > Thoughts? Having OpenSM request client reregistration (used in other places by OpenSM) of such nodes will resolve this issue. As little or as much policy can be built into OpenSM in determining "such" nodes to scope down the application of this mechanism for this case. 
-- Hal > Ira Weiny > Lawrence Livermore National Lab > weiny2 at llnl.gov > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hrosenstock at xsigo.com Wed Apr 23 18:27:21 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Wed, 23 Apr 2008 18:27:21 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <1208995514.689.210.camel@hrosenstock-ws.xsigo.com> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <1208995514.689.210.camel@hrosenstock-ws.xsigo.com> Message-ID: <1209000441.689.216.camel@hrosenstock-ws.xsigo.com> On Wed, 2008-04-23 at 17:05 -0700, Hal Rosenstock wrote: > On Wed, 2008-04-23 at 13:38 -0700, Ira Weiny wrote: > > Hey all, > > > > We have just started to experience a situation which I don't think is strictly > > a bug but I think could be fixed within the OFED software. > > > > The symptom is that nodes drop out of the IPoIB mcast group after a node > > temporarily goes catatonic. The details are: > > > > 1) Issues on a node cause a soft lockup of the node. > > 2) OpenSM does a normal light sweep. > > 3) MADs to the node time out since the node is in a "bad state" > > 4) OpenSM marks the node down and drops it from internal tables, including > > mcast groups. > > 5) Node recovers from soft lock up condition. > > 6) A subsequent sweep causes OpenSM see the node and add it back to the > > fabric. > > 7) Node is fully functional on the verbs layer but IPoIB never knew anything > > was wrong so it does _not_ rejoin the mcast groups. (This is different > > from the condition where the link actually goes down.) > > > > As far as we can see there is nothing wrong with the node. It just went > > catatonic for a while. 
Obviously this is not a good condition, however, I was > > thinking of a couple of things which could be done to "fix" the above > > situation. I am writing here to see which solution might be best, and accepted > > by the community. Alternatively this may have already been addressed. > > However, I don't see a bug in the bug list, nor do I find anything in the > > archive. > > > > Solutions I can think of are: > > > > A) Modify OpenSM to move the node to a "questionable" state for a period of X > > sweeps. If after X sweeps the node still does not respond, drop it. If > > the node does respond return it to it's original state. > > B) When OpenSM queries the node as if it is new on the fabric and the SMA > > "thinks" it is not new, have the SMA detect this and notify the IPoIB > > layer (or ULPs in general) that something has gone wrong. The IPoIB > > layer could then check/rejoin the group. > > C) put some code in IPoIB which might detect "lost cycles" and check/rejoin > > the mcast group. > > > > I have not worked out details for any solution. I believe that A and B are > > "outside the spec". However, I can see merit in A and B. > > > > Solution A would help if MAD's are lost due to reasons other than node issues. > > (Perhaps a bad link. Although I don't know of anyone having problems like > > that.) > > > > Solution B puts the solution closer to the original problem but I am unsure how > > the SMA would know what is going on. > > > > Solution C is really close to the problem however I don't know how it would be > > done. I do think that this would be within the specification as it really is > > the ULP's job to maintain its membership in the group. But how would it do > > this without help from the lower layers. (Of course it could poll for > > membership but I think that is a bad idea.) > > > Thoughts? > > Having OpenSM request client reregistration (used in other places by > OpenSM) of such nodes will resolve this issue. 
As little or as much > policy can be built into OpenSM in determining "such" nodes to scope > down the application of this mechanism for this case. One side comment on the non OpenSM aspect of this: Why is the node temporarily unavailable ? There is a "contract" that the node makes with the SM that it clearly isn't honoring. Is any investigation going on relative to this aspect of the issue ? -- Hal > -- Hal > > > Ira Weiny > > Lawrence Livermore National Lab > > weiny2 at llnl.gov 
From jgunthorpe at obsidianresearch.com Wed Apr 23 22:42:35 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 23 Apr 2008 23:42:35 -0600 Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets In-Reply-To: References: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> Message-ID: <20080424054235.GA11416@obsidianresearch.com> On Wed, Apr 23, 2008 at 09:56:50AM -0400, James Lentini wrote: > > I'm hoping that someone has a wonderfully brilliant idea for this > > that would take about 1 day to implement. :) > > Is it time to bring back ATS? > > http://lists.openfabrics.org/pipermail/general/2005-August/010247.html Could you post this someplace where people who are not a member of the DAT group can access it? Thanks, Jason 
From andrea at qumranet.com Wed Apr 23 23:49:40 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 24 Apr 2008 08:49:40 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080423221928.GV24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> Message-ID: <20080424064753.GH24536@duo.random> On Thu, Apr 24, 2008 at 12:19:28AM +0200, Andrea Arcangeli wrote: > /dev/kvm closure. Given this can be a considered a bugfix to > mmu_notifier_unregister I'll apply it to 1/N and I'll release a new I'm not sure anymore this can be considered a bugfix, given how large a change this turned out to be in the locking and register/unregister/release behavior. Here is a full draft patch for review and testing. Works great with KVM so far at least... - mmu_notifier_register has to run on current->mm or on get_task_mm() (in the latter case it can mmput after mmu_notifier_register returns) - mmu_notifier_register in turn can't race against mmu_notifier_release as that runs in exit_mmap after the last mmput - mmu_notifier_unregister can run at any time, even after exit_mmap completed. No mm_count pin is required, it's taken automatically by register and released by unregister - mmu_notifier_unregister serializes against all mmu notifiers with srcu, and it serializes especially against a concurrent mmu_notifier_unregister with a mix of a spinlock and SRCU - the spinlock lets us keep track of who runs first between mmu_notifier_unregister and mmu_notifier_release; this makes life much easier for the driver to handle, as the driver is then guaranteed that ->release will run. 
- The first that runs executes the ->release method as well, after dropping the spinlock but before releasing the srcu lock - it was unsafe to unpin the module count from ->release, as release itself has to run the 'ret' instruction to return back to the mmu notifier code - the ->release method is mandatory, as it has to run before the pages are freed to zap all existing sptes - the one that arrives second between mmu_notifier_unregister and mmu_notifier_register waits for the first with srcu As said, this is a much larger change than I hoped, but as usual it can only affect KVM/GRU/XPMEM if something is wrong with this. I don't exclude that we'll have to back off to the previous mm_users model. The main issue with taking an mm_users pin is that filehandles associated with vmas aren't closed by exit() if the mm_users is pinned (that simply leaks ram with kvm). It looks more correct not to rely on the mm_users being >0 only in mmu_notifier_register. The other big change is that ->release is mandatory and always called by the first of mmu_notifier_unregister and mmu_notifier_release. Both mmu_notifier_unregister and mmu_notifier_release are slow paths, so taking a spinlock there is no big deal. Impact when the mmu notifiers are disarmed is unchanged. The interesting part of the kvm patch to test this change is below. After this last bit the KVM patch status is almost final; if this new mmu notifier update is remotely ok, I've another one that does the locking change to remove the page pin. 
+static void kvm_free_vcpus(struct kvm *kvm); +/* This must zap all the sptes because all pages will be freed then */ +static void kvm_mmu_notifier_release(struct mmu_notifier *mn, + struct mm_struct *mm) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + BUG_ON(mm != kvm->mm); + kvm_free_pit(kvm); + kfree(kvm->arch.vpic); + kfree(kvm->arch.vioapic); + kvm_free_vcpus(kvm); + kvm_free_physmem(kvm); + if (kvm->arch.apic_access_page) + put_page(kvm->arch.apic_access_page); +} + +static const struct mmu_notifier_ops kvm_mmu_notifier_ops = { + .release = kvm_mmu_notifier_release, + .invalidate_page = kvm_mmu_notifier_invalidate_page, + .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end, + .clear_flush_young = kvm_mmu_notifier_clear_flush_young, +}; + struct kvm *kvm_arch_create_vm(void) { struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); + int err; if (!kvm) return ERR_PTR(-ENOMEM); INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); + kvm->arch.mmu_notifier.ops = &kvm_mmu_notifier_ops; + err = mmu_notifier_register(&kvm->arch.mmu_notifier, current->mm); + if (err) { + kfree(kvm); + return ERR_PTR(err); + } + return kvm; } @@ -3899,13 +3967,12 @@ static void kvm_free_vcpus(struct kvm *kvm) void kvm_arch_destroy_vm(struct kvm *kvm) { - kvm_free_pit(kvm); - kfree(kvm->arch.vpic); - kfree(kvm->arch.vioapic); - kvm_free_vcpus(kvm); - kvm_free_physmem(kvm); - if (kvm->arch.apic_access_page) - put_page(kvm->arch.apic_access_page); + /* + * kvm_mmu_notifier_release() will be called before + * mmu_notifier_unregister returns, if it didn't run + * already. + */ + mmu_notifier_unregister(&kvm->arch.mmu_notifier, kvm->mm); kfree(kvm); } Let's call this mmu notifier #v14-test1. 
Signed-off-by: Andrea Arcangeli Signed-off-by: Nick Piggin Signed-off-by: Christoph Lameter diff --git a/include/linux/mm.h b/include/linux/mm.h --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1050,6 +1050,27 @@ unsigned long addr, unsigned long len, unsigned long flags, struct page **pages); +/* + * mm_lock will take mmap_sem writably (to prevent all modifications + * and scanning of vmas) and then also takes the mapping locks for + * each of the vma to lockout any scans of pagetables of this address + * space. This can be used to effectively holding off reclaim from the + * address space. + * + * mm_lock can fail if there is not enough memory to store a pointer + * array to all vmas. + * + * mm_lock and mm_unlock are expensive operations that may take a long time. + */ +struct mm_lock_data { + spinlock_t **i_mmap_locks; + spinlock_t **anon_vma_locks; + size_t nr_i_mmap_locks; + size_t nr_anon_vma_locks; +}; +extern int mm_lock(struct mm_struct *mm, struct mm_lock_data *data); +extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data); + extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -19,6 +19,7 @@ #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1)) struct address_space; +struct mmu_notifier_mm; #if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS typedef atomic_long_t mm_counter_t; @@ -225,6 +226,9 @@ #ifdef CONFIG_CGROUP_MEM_RES_CTLR struct mem_cgroup *mem_cgroup; #endif +#ifdef CONFIG_MMU_NOTIFIER + struct mmu_notifier_mm *mmu_notifier_mm; +#endif }; #endif /* _LINUX_MM_TYPES_H */ diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h new file mode 100644 --- /dev/null +++ b/include/linux/mmu_notifier.h @@ -0,0 +1,251 @@ +#ifndef 
_LINUX_MMU_NOTIFIER_H +#define _LINUX_MMU_NOTIFIER_H + +#include +#include +#include + +struct mmu_notifier; +struct mmu_notifier_ops; + +#ifdef CONFIG_MMU_NOTIFIER +#include + +struct mmu_notifier_mm { + struct hlist_head list; + struct srcu_struct srcu; + /* to serialize mmu_notifier_unregister against mmu_notifier_release */ + spinlock_t unregister_lock; +}; + +struct mmu_notifier_ops { + /* + * Called after all other threads have terminated and the executing + * thread is the only remaining execution thread. There are no + * users of the mm_struct remaining. + * + * If the methods are implemented in a module, the module + * can't be unloaded until release() is called. + */ + void (*release)(struct mmu_notifier *mn, + struct mm_struct *mm); + + /* + * clear_flush_young is called after the VM is + * test-and-clearing the young/accessed bitflag in the + * pte. This way the VM will provide proper aging to the + * accesses to the page through the secondary MMUs and not + * only to the ones through the Linux pte. + */ + int (*clear_flush_young)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * Before this is invoked any secondary MMU is still ok to + * read/write to the page previously pointed by the Linux pte + * because the old page hasn't been freed yet. If required + * set_page_dirty has to be called internally to this method. + */ + void (*invalidate_page)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address); + + /* + * invalidate_range_start() and invalidate_range_end() must be + * paired and are called only when the mmap_sem is held and/or + * the semaphores protecting the reverse maps. Both functions + * may sleep. The subsystem must guarantee that no additional + * references to the pages in the range established between + * the call to invalidate_range_start() and the matching call + * to invalidate_range_end(). 
+ * + * Invalidation of multiple concurrent ranges may be permitted + * by the driver or the driver may exclude other invalidation + * from proceeding by blocking on new invalidate_range_start() + * callback that overlap invalidates that are already in + * progress. Either way the establishment of sptes to the + * range can only be allowed if all invalidate_range_stop() + * function have been called. + * + * invalidate_range_start() is called when all pages in the + * range are still mapped and have at least a refcount of one. + * + * invalidate_range_end() is called when all pages in the + * range have been unmapped and the pages have been freed by + * the VM. + * + * The VM will remove the page table entries and potentially + * the page between invalidate_range_start() and + * invalidate_range_end(). If the page must not be freed + * because of pending I/O or other circumstances then the + * invalidate_range_start() callback (or the initial mapping + * by the driver) must make sure that the refcount is kept + * elevated. + * + * If the driver increases the refcount when the pages are + * initially mapped into an address space then either + * invalidate_range_start() or invalidate_range_end() may + * decrease the refcount. If the refcount is decreased on + * invalidate_range_start() then the VM can free pages as page + * table entries are removed. If the refcount is only + * droppped on invalidate_range_end() then the driver itself + * will drop the last refcount but it must take care to flush + * any secondary tlb before doing the final free on the + * page. Pages will no longer be referenced by the linux + * address space but may still be referenced by sptes until + * the last refcount is dropped. 
+ */ + void (*invalidate_range_start)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); + void (*invalidate_range_end)(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, unsigned long end); +}; + +/* + * The notifier chains are protected by mmap_sem and/or the reverse map + * semaphores. Notifier chains are only changed when all reverse maps and + * the mmap_sem locks are taken. + * + * Therefore notifier chains can only be traversed when either + * + * 1. mmap_sem is held. + * 2. One of the reverse map locks is held (i_mmap_sem or anon_vma->sem). + * 3. No other concurrent thread can access the list (release) + */ +struct mmu_notifier { + struct hlist_node hlist; + const struct mmu_notifier_ops *ops; +}; + +static inline int mm_has_notifiers(struct mm_struct *mm) +{ + return unlikely(mm->mmu_notifier_mm); +} + +extern int mmu_notifier_register(struct mmu_notifier *mn, + struct mm_struct *mm); +extern void mmu_notifier_unregister(struct mmu_notifier *mn, + struct mm_struct *mm); +extern void __mmu_notifier_mm_destroy(struct mm_struct *mm); +extern void __mmu_notifier_release(struct mm_struct *mm); +extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address); +extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end); +extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end); + + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_release(mm); +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + if (mm_has_notifiers(mm)) + return __mmu_notifier_clear_flush_young(mm, address); + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + 
unsigned long address) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_page(mm, address); +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_start(mm, start, end); +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_invalidate_range_end(mm, start, end); +} + +static inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ + mm->mmu_notifier_mm = NULL; +} + +static inline void mmu_notifier_mm_destroy(struct mm_struct *mm) +{ + if (mm_has_notifiers(mm)) + __mmu_notifier_mm_destroy(mm); +} + +#define ptep_clear_flush_notify(__vma, __address, __ptep) \ +({ \ + pte_t __pte; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __pte = ptep_clear_flush(___vma, ___address, __ptep); \ + mmu_notifier_invalidate_page(___vma->vm_mm, ___address); \ + __pte; \ +}) + +#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \ +({ \ + int __young; \ + struct vm_area_struct *___vma = __vma; \ + unsigned long ___address = __address; \ + __young = ptep_clear_flush_young(___vma, ___address, __ptep); \ + __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \ + ___address); \ + __young; \ +}) + +#else /* CONFIG_MMU_NOTIFIER */ + +static inline void mmu_notifier_release(struct mm_struct *mm) +{ +} + +static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + return 0; +} + +static inline void mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ +} + +static inline void mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ +} + +static 
inline void mmu_notifier_mm_init(struct mm_struct *mm) +{ +} + +static inline void mmu_notifier_mm_destroy(struct mm_struct *mm) +{ +} + +#define ptep_clear_flush_young_notify ptep_clear_flush_young +#define ptep_clear_flush_notify ptep_clear_flush + +#endif /* CONFIG_MMU_NOTIFIER */ + +#endif /* _LINUX_MMU_NOTIFIER_H */ diff --git a/kernel/fork.c b/kernel/fork.c --- a/kernel/fork.c +++ b/kernel/fork.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include @@ -362,6 +363,7 @@ if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; + mmu_notifier_mm_init(mm); return mm; } @@ -395,6 +397,7 @@ BUG_ON(mm == &init_mm); mm_free_pgd(mm); destroy_context(mm); + mmu_notifier_mm_destroy(mm); free_mm(mm); } EXPORT_SYMBOL_GPL(__mmdrop); diff --git a/mm/Kconfig b/mm/Kconfig --- a/mm/Kconfig +++ b/mm/Kconfig @@ -193,3 +193,7 @@ config VIRT_TO_BUS def_bool y depends on !ARCH_NO_VIRT_TO_BUS + +config MMU_NOTIFIER + def_bool y + bool "MMU notifier, for paging KVM/RDMA" diff --git a/mm/Makefile b/mm/Makefile --- a/mm/Makefile +++ b/mm/Makefile @@ -33,4 +33,5 @@ obj-$(CONFIG_SMP) += allocpercpu.o obj-$(CONFIG_QUICKLIST) += quicklist.o obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o +obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -194,7 +194,7 @@ if (pte) { /* Nuke the page table entry. 
*/ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); page_remove_rmap(page, vma); dec_mm_counter(mm, file_rss); BUG_ON(pte_dirty(pteval)); diff --git a/mm/fremap.c b/mm/fremap.c --- a/mm/fremap.c +++ b/mm/fremap.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -214,7 +215,9 @@ spin_unlock(&mapping->i_mmap_lock); } + mmu_notifier_invalidate_range_start(mm, start, start + size); err = populate_range(mm, vma, start, size, pgoff); + mmu_notifier_invalidate_range_end(mm, start, start + size); if (!err && !(flags & MAP_NONBLOCK)) { if (unlikely(has_write_lock)) { downgrade_write(&mm->mmap_sem); diff --git a/mm/hugetlb.c b/mm/hugetlb.c --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include @@ -799,6 +800,7 @@ BUG_ON(start & ~HPAGE_MASK); BUG_ON(end & ~HPAGE_MASK); + mmu_notifier_invalidate_range_start(mm, start, end); spin_lock(&mm->page_table_lock); for (address = start; address < end; address += HPAGE_SIZE) { ptep = huge_pte_offset(mm, address); @@ -819,6 +821,7 @@ } spin_unlock(&mm->page_table_lock); flush_tlb_range(vma, start, end); + mmu_notifier_invalidate_range_end(mm, start, end); list_for_each_entry_safe(page, tmp, &page_list, lru) { list_del(&page->lru); put_page(page); diff --git a/mm/memory.c b/mm/memory.c --- a/mm/memory.c +++ b/mm/memory.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -596,6 +597,7 @@ unsigned long next; unsigned long addr = vma->vm_start; unsigned long end = vma->vm_end; + int ret; /* * Don't copy ptes where a page fault will fill them correctly. @@ -603,25 +605,39 @@ * readonly mappings. The tradeoff is that copy_page_range is more * efficient than faulting. 
*/ + ret = 0; if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) { if (!vma->anon_vma) - return 0; + goto out; } - if (is_vm_hugetlb_page(vma)) - return copy_hugetlb_page_range(dst_mm, src_mm, vma); + if (unlikely(is_vm_hugetlb_page(vma))) { + ret = copy_hugetlb_page_range(dst_mm, src_mm, vma); + goto out; + } + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_start(src_mm, addr, end); + + ret = 0; dst_pgd = pgd_offset(dst_mm, addr); src_pgd = pgd_offset(src_mm, addr); do { next = pgd_addr_end(addr, end); if (pgd_none_or_clear_bad(src_pgd)) continue; - if (copy_pud_range(dst_mm, src_mm, dst_pgd, src_pgd, - vma, addr, next)) - return -ENOMEM; + if (unlikely(copy_pud_range(dst_mm, src_mm, dst_pgd, src_pgd, + vma, addr, next))) { + ret = -ENOMEM; + break; + } } while (dst_pgd++, src_pgd++, addr = next, addr != end); - return 0; + + if (is_cow_mapping(vma->vm_flags)) + mmu_notifier_invalidate_range_end(src_mm, + vma->vm_start, end); +out: + return ret; } static unsigned long zap_pte_range(struct mmu_gather *tlb, @@ -825,7 +841,9 @@ unsigned long start = start_addr; spinlock_t *i_mmap_lock = details? 
details->i_mmap_lock: NULL; int fullmm = (*tlbp)->fullmm; + struct mm_struct *mm = vma->vm_mm; + mmu_notifier_invalidate_range_start(mm, start_addr, end_addr); for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) { unsigned long end; @@ -876,6 +894,7 @@ } } out: + mmu_notifier_invalidate_range_end(mm, start_addr, end_addr); return start; /* which is now the end (or restart) address */ } @@ -1463,10 +1482,11 @@ { pgd_t *pgd; unsigned long next; - unsigned long end = addr + size; + unsigned long start = addr, end = addr + size; int err; BUG_ON(addr >= end); + mmu_notifier_invalidate_range_start(mm, start, end); pgd = pgd_offset(mm, addr); do { next = pgd_addr_end(addr, end); @@ -1474,6 +1494,7 @@ if (err) break; } while (pgd++, addr = next, addr != end); + mmu_notifier_invalidate_range_end(mm, start, end); return err; } EXPORT_SYMBOL_GPL(apply_to_page_range); @@ -1675,7 +1696,7 @@ * seen in the presence of one thread doing SMC and another * thread doing COW. */ - ptep_clear_flush(vma, address, page_table); + ptep_clear_flush_notify(vma, address, page_table); set_pte_at(mm, address, page_table, entry); update_mmu_cache(vma, address, entry); lru_cache_add_active(new_page); diff --git a/mm/mmap.c b/mm/mmap.c --- a/mm/mmap.c +++ b/mm/mmap.c @@ -26,6 +26,9 @@ #include #include #include +#include +#include +#include #include #include @@ -2038,6 +2041,7 @@ /* mm's last user has gone, and its about to be pulled down */ arch_exit_mmap(mm); + mmu_notifier_release(mm); lru_add_drain(); flush_cache_mm(mm); @@ -2242,3 +2246,144 @@ return 0; } + +static int mm_lock_cmp(const void *a, const void *b) +{ + unsigned long _a = (unsigned long)*(spinlock_t **)a; + unsigned long _b = (unsigned long)*(spinlock_t **)b; + + cond_resched(); + if (_a < _b) + return -1; + if (_a > _b) + return 1; + return 0; +} + +static unsigned long mm_lock_sort(struct mm_struct *mm, spinlock_t **locks, + int anon) +{ + struct vm_area_struct *vma; + size_t i = 0; + + for (vma = mm->mmap; vma; vma = 
vma->vm_next) { + if (anon) { + if (vma->anon_vma) + locks[i++] = &vma->anon_vma->lock; + } else { + if (vma->vm_file && vma->vm_file->f_mapping) + locks[i++] = &vma->vm_file->f_mapping->i_mmap_lock; + } + } + + if (!i) + goto out; + + sort(locks, i, sizeof(spinlock_t *), mm_lock_cmp, NULL); + +out: + return i; +} + +static inline unsigned long mm_lock_sort_anon_vma(struct mm_struct *mm, + spinlock_t **locks) +{ + return mm_lock_sort(mm, locks, 1); +} + +static inline unsigned long mm_lock_sort_i_mmap(struct mm_struct *mm, + spinlock_t **locks) +{ + return mm_lock_sort(mm, locks, 0); +} + +static void mm_lock_unlock(spinlock_t **locks, size_t nr, int lock) +{ + spinlock_t *last = NULL; + size_t i; + + for (i = 0; i < nr; i++) + /* Multiple vmas may use the same lock. */ + if (locks[i] != last) { + BUG_ON((unsigned long) last > (unsigned long) locks[i]); + last = locks[i]; + if (lock) + spin_lock(last); + else + spin_unlock(last); + } +} + +static inline void __mm_lock(spinlock_t **locks, size_t nr) +{ + mm_lock_unlock(locks, nr, 1); +} + +static inline void __mm_unlock(spinlock_t **locks, size_t nr) +{ + mm_lock_unlock(locks, nr, 0); +} + +/* + * This operation locks against the VM for all pte/vma/mm related + * operations that could ever happen on a certain mm. This includes + * vmtruncate, try_to_unmap, and all page faults. The holder + * must not hold any mm related lock. A single task can't take more + * than one mm lock in a row or it would deadlock. 
+ */ +int mm_lock(struct mm_struct *mm, struct mm_lock_data *data) +{ + spinlock_t **anon_vma_locks, **i_mmap_locks; + + down_write(&mm->mmap_sem); + if (mm->map_count) { + anon_vma_locks = vmalloc(sizeof(spinlock_t *) * mm->map_count); + if (unlikely(!anon_vma_locks)) { + up_write(&mm->mmap_sem); + return -ENOMEM; + } + + i_mmap_locks = vmalloc(sizeof(spinlock_t *) * mm->map_count); + if (unlikely(!i_mmap_locks)) { + up_write(&mm->mmap_sem); + vfree(anon_vma_locks); + return -ENOMEM; + } + + data->nr_anon_vma_locks = mm_lock_sort_anon_vma(mm, anon_vma_locks); + data->nr_i_mmap_locks = mm_lock_sort_i_mmap(mm, i_mmap_locks); + + if (data->nr_anon_vma_locks) { + __mm_lock(anon_vma_locks, data->nr_anon_vma_locks); + data->anon_vma_locks = anon_vma_locks; + } else + vfree(anon_vma_locks); + + if (data->nr_i_mmap_locks) { + __mm_lock(i_mmap_locks, data->nr_i_mmap_locks); + data->i_mmap_locks = i_mmap_locks; + } else + vfree(i_mmap_locks); + } + return 0; +} + +static void mm_unlock_vfree(spinlock_t **locks, size_t nr) +{ + __mm_unlock(locks, nr); + vfree(locks); +} + +/* avoid memory allocations for mm_unlock to prevent deadlock */ +void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data) +{ + if (mm->map_count) { + if (data->nr_anon_vma_locks) + mm_unlock_vfree(data->anon_vma_locks, + data->nr_anon_vma_locks); + if (data->i_mmap_locks) + mm_unlock_vfree(data->i_mmap_locks, + data->nr_i_mmap_locks); + } + up_write(&mm->mmap_sem); +} diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c new file mode 100644 --- /dev/null +++ b/mm/mmu_notifier.c @@ -0,0 +1,241 @@ +/* + * linux/mm/mmu_notifier.c + * + * Copyright (C) 2008 Qumranet, Inc. + * Copyright (C) 2008 SGI + * Christoph Lameter + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. 
+ */ + +#include +#include +#include +#include +#include +#include +#include + +/* + * This function can't run concurrently against mmu_notifier_register + * or any other mmu notifier method. mmu_notifier_register can only + * run with mm->mm_users > 0 (and exit_mmap runs only when mm_users is + * zero). All other tasks of this mm already quit so they can't invoke + * mmu notifiers anymore. This can run concurrently only against + * mmu_notifier_unregister and it serializes against it with the + * unregister_lock in addition to RCU. struct mmu_notifier_mm can't go + * away from under us as the exit_mmap holds a mm_count pin itself. + * + * The ->release method can't allow the module to be unloaded, the + * module can only be unloaded after mmu_notifier_unregister has run. This + * is because the release method has to run the ret instruction to + * return back here, and so it can't allow the ret instruction to be + * freed. + */ +void __mmu_notifier_release(struct mm_struct *mm) +{ + struct mmu_notifier *mn; + int srcu; + + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + spin_lock(&mm->mmu_notifier_mm->unregister_lock); + while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) { + mn = hlist_entry(mm->mmu_notifier_mm->list.first, + struct mmu_notifier, + hlist); + /* + * We arrived before mmu_notifier_unregister so + * mmu_notifier_unregister will do nothing other than + * wait for ->release to finish and + * mmu_notifier_unregister to return. + */ + hlist_del_init(&mn->hlist); + /* + * If ->release runs before mmu_notifier_unregister it + * must be handled as it's the only way for the driver + * to flush all existing sptes before the pages in the + * mm are freed. 
+ */ + spin_unlock(&mm->mmu_notifier_mm->unregister_lock); + /* SRCU will block mmu_notifier_unregister */ + mn->ops->release(mn, mm); + spin_lock(&mm->mmu_notifier_mm->unregister_lock); + } + spin_unlock(&mm->mmu_notifier_mm->unregister_lock); + srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); + + /* + * Wait for ->release if mmu_notifier_unregister ran list_del_rcu. + * srcu can't go away from under us because one mm_count is + * held by exit_mmap. + */ + synchronize_srcu(&mm->mmu_notifier_mm->srcu); +} + +/* + * If no young bitflag is supported by the hardware, ->clear_flush_young can + * unmap the address and return 1 or 0 depending on whether the mapping + * previously existed. + */ +int __mmu_notifier_clear_flush_young(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int young = 0, srcu; + + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist) { + if (mn->ops->clear_flush_young) + young |= mn->ops->clear_flush_young(mn, mm, address); + } + srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); + + return young; +} + +void __mmu_notifier_invalidate_page(struct mm_struct *mm, + unsigned long address) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int srcu; + + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist) { + if (mn->ops->invalidate_page) + mn->ops->invalidate_page(mn, mm, address); + } + srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); +} + +void __mmu_notifier_invalidate_range_start(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int srcu; + + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist) { + if (mn->ops->invalidate_range_start) + mn->ops->invalidate_range_start(mn, mm, start, end); + } + 
srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); +} + +void __mmu_notifier_invalidate_range_end(struct mm_struct *mm, + unsigned long start, unsigned long end) +{ + struct mmu_notifier *mn; + struct hlist_node *n; + int srcu; + + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist) { + if (mn->ops->invalidate_range_end) + mn->ops->invalidate_range_end(mn, mm, start, end); + } + srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); +} + +/* + * Must not hold mmap_sem nor any other VM related lock when calling + * this registration function. Must also ensure mm_users can't go down + * to zero while this runs to avoid races with mmu_notifier_release, + * so mm has to be current->mm or the mm should be pinned safely like + * with get_task_mm(). mmput can be called after mmu_notifier_register + * returns. mmu_notifier_unregister must be always called to + * unregister the notifier. mm_count is automatically pinned to allow + * mmu_notifier_unregister to safely run at any time later, before or + * after exit_mmap. ->release will always be called before exit_mmap + * frees the pages. 
+ */ +int mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm) +{ + struct mm_lock_data data; + int ret; + + BUG_ON(atomic_read(&mm->mm_users) <= 0); + + ret = mm_lock(mm, &data); + if (unlikely(ret)) + goto out; + + if (!mm_has_notifiers(mm)) { + mm->mmu_notifier_mm = kmalloc(sizeof(struct mmu_notifier_mm), + GFP_KERNEL); + ret = -ENOMEM; + if (unlikely(!mm_has_notifiers(mm))) + goto out_unlock; + + ret = init_srcu_struct(&mm->mmu_notifier_mm->srcu); + if (unlikely(ret)) { + kfree(mm->mmu_notifier_mm); + mmu_notifier_mm_init(mm); + goto out_unlock; + } + INIT_HLIST_HEAD(&mm->mmu_notifier_mm->list); + spin_lock_init(&mm->mmu_notifier_mm->unregister_lock); + } + atomic_inc(&mm->mm_count); + + hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier_mm->list); +out_unlock: + mm_unlock(mm, &data); +out: + BUG_ON(atomic_read(&mm->mm_users) <= 0); + return ret; +} +EXPORT_SYMBOL_GPL(mmu_notifier_register); + +/* this is called after the last mmu_notifier_unregister() returned */ +void __mmu_notifier_mm_destroy(struct mm_struct *mm) +{ + BUG_ON(!hlist_empty(&mm->mmu_notifier_mm->list)); + cleanup_srcu_struct(&mm->mmu_notifier_mm->srcu); + kfree(mm->mmu_notifier_mm); + mm->mmu_notifier_mm = LIST_POISON1; /* debug */ +} + +/* + * This releases the mm_count pin automatically and frees the mm + * structure if it was the last user of it. It serializes against + * running mmu notifiers with SRCU and against mmu_notifier_unregister + * with the unregister lock + SRCU. All sptes must be dropped before + * calling mmu_notifier_unregister. ->release or any other notifier + * method may be invoked concurrently with mmu_notifier_unregister, + * and only after mmu_notifier_unregister returned we're guaranteed + * that ->release or any other method can't run anymore. 
+ */ +void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) +{ + int before_release = 0, srcu; + + BUG_ON(atomic_read(&mm->mm_count) <= 0); + + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + spin_lock(&mm->mmu_notifier_mm->unregister_lock); + if (!hlist_unhashed(&mn->hlist)) { + hlist_del_rcu(&mn->hlist); + before_release = 1; + } + spin_unlock(&mm->mmu_notifier_mm->unregister_lock); + if (before_release) + /* + * exit_mmap will block in mmu_notifier_release to + * guarantee ->release is called before freeing the + * pages. + */ + mn->ops->release(mn, mm); + srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); + + /* wait any running method to finish, including ->release */ + synchronize_srcu(&mm->mmu_notifier_mm->srcu); + + BUG_ON(atomic_read(&mm->mm_count) <= 0); + + mmdrop(mm); +} +EXPORT_SYMBOL_GPL(mmu_notifier_unregister); diff --git a/mm/mprotect.c b/mm/mprotect.c --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -198,10 +199,12 @@ dirty_accountable = 1; } + mmu_notifier_invalidate_range_start(mm, start, end); if (is_vm_hugetlb_page(vma)) hugetlb_change_protection(vma, start, end, vma->vm_page_prot); else change_protection(vma, start, end, vma->vm_page_prot, dirty_accountable); + mmu_notifier_invalidate_range_end(mm, start, end); vm_stat_account(mm, oldflags, vma->vm_file, -nrpages); vm_stat_account(mm, newflags, vma->vm_file, nrpages); return 0; diff --git a/mm/mremap.c b/mm/mremap.c --- a/mm/mremap.c +++ b/mm/mremap.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -74,7 +75,11 @@ struct mm_struct *mm = vma->vm_mm; pte_t *old_pte, *new_pte, pte; spinlock_t *old_ptl, *new_ptl; + unsigned long old_start; + old_start = old_addr; + mmu_notifier_invalidate_range_start(vma->vm_mm, + old_start, old_end); if (vma->vm_file) { /* * Subtle point from Rajesh Venkatasubramanian: before @@ -116,6 +121,7 @@ pte_unmap_unlock(old_pte - 1, 
old_ptl); if (mapping) spin_unlock(&mapping->i_mmap_lock); + mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end); } #define LATENCY_LIMIT (64 * PAGE_SIZE) diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -49,6 +49,7 @@ #include #include #include +#include #include @@ -287,7 +288,7 @@ if (vma->vm_flags & VM_LOCKED) { referenced++; *mapcount = 1; /* break early from loop */ - } else if (ptep_clear_flush_young(vma, address, pte)) + } else if (ptep_clear_flush_young_notify(vma, address, pte)) referenced++; /* Pretend the page is referenced if the task has the @@ -456,7 +457,7 @@ pte_t entry; flush_cache_page(vma, address, pte_pfn(*pte)); - entry = ptep_clear_flush(vma, address, pte); + entry = ptep_clear_flush_notify(vma, address, pte); entry = pte_wrprotect(entry); entry = pte_mkclean(entry); set_pte_at(mm, address, pte, entry); @@ -717,14 +718,14 @@ * skipped over this mm) then we should reactivate it. */ if (!migration && ((vma->vm_flags & VM_LOCKED) || - (ptep_clear_flush_young(vma, address, pte)))) { + (ptep_clear_flush_young_notify(vma, address, pte)))) { ret = SWAP_FAIL; goto out_unmap; } /* Nuke the page table entry. */ flush_cache_page(vma, address, page_to_pfn(page)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* Move the dirty bit to the physical page now the pte is gone. */ if (pte_dirty(pteval)) @@ -849,12 +850,12 @@ page = vm_normal_page(vma, address, *pte); BUG_ON(!page || PageAnon(page)); - if (ptep_clear_flush_young(vma, address, pte)) + if (ptep_clear_flush_young_notify(vma, address, pte)) continue; /* Nuke the page table entry. */ flush_cache_page(vma, address, pte_pfn(*pte)); - pteval = ptep_clear_flush(vma, address, pte); + pteval = ptep_clear_flush_notify(vma, address, pte); /* If nonlinear, store the file page offset in the pte. 
*/ if (page->index != linear_page_index(vma, address)) From okir at lst.de Thu Apr 24 02:09:40 2008 From: okir at lst.de (Olaf Kirch) Date: Thu, 24 Apr 2008 11:09:40 +0200 Subject: [ofa-general] [PATCH 1/8]: RDS: Fix IB max_unacked_* sysctls In-Reply-To: <200804241106.57172.okir@lst.de> References: <200804241106.57172.okir@lst.de> Message-ID: <200804241109.41035.okir@lst.de> From 4c378d81c2348ac13300d033f306bfd20e65eb76 Mon Sep 17 00:00:00 2001 From: Olaf Kirch Date: Thu, 24 Apr 2008 00:27:05 -0700 Subject: [PATCH] RDS: Fix IB max_unacked_* sysctls The sysctl variables max_unacked_{bytes,packets} are defined as unsigned longs, but the sysctl table specifies proc_dointvec as the handler. Change the variables to unsigned ints - the type is big enough. Signed-off-by: Olaf Kirch --- net/rds/rds.h | 4 ++-- net/rds/sysctl.c | 8 ++++---- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/net/rds/rds.h b/net/rds/rds.h index 2d4600a..dc1ab4c 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -667,8 +667,8 @@ extern unsigned long rds_sysctl_sndbuf_default; extern unsigned long rds_sysctl_sndbuf_max; extern unsigned long rds_sysctl_reconnect_min_jiffies; extern unsigned long rds_sysctl_reconnect_max_jiffies; -extern unsigned long rds_sysctl_max_unacked_packets; -extern unsigned long rds_sysctl_max_unacked_bytes; +extern unsigned int rds_sysctl_max_unacked_packets; +extern unsigned int rds_sysctl_max_unacked_bytes; /* threads.c */ int __init rds_threads_init(void); diff --git a/net/rds/sysctl.c b/net/rds/sysctl.c index bb0fa46..5f7ce37 100644 --- a/net/rds/sysctl.c +++ b/net/rds/sysctl.c @@ -44,8 +44,8 @@ static unsigned long rds_sysctl_reconnect_max = ~0UL; unsigned long rds_sysctl_reconnect_min_jiffies; unsigned long rds_sysctl_reconnect_max_jiffies = HZ; -unsigned long rds_sysctl_max_unacked_packets = 16; -unsigned long rds_sysctl_max_unacked_bytes = (16 << 20); +unsigned int rds_sysctl_max_unacked_packets = 16; +unsigned int rds_sysctl_max_unacked_bytes = (16 << 
20); /* * These can change over time until they're official. Until that time we'll @@ -95,7 +95,7 @@ static ctl_table rds_sysctl_rds_table[] = { .ctl_name = 8, .procname = "max_unacked_packets", .data = &rds_sysctl_max_unacked_packets, - .maxlen = sizeof(unsigned long), + .maxlen = sizeof(int), .mode = 0644, .proc_handler = &proc_dointvec, }, @@ -103,7 +103,7 @@ static ctl_table rds_sysctl_rds_table[] = { .ctl_name = 9, .procname = "max_unacked_bytes", .data = &rds_sysctl_max_unacked_bytes, - .maxlen = sizeof(unsigned long), + .maxlen = sizeof(int), .mode = 0644, .proc_handler = &proc_dointvec, }, -- 1.5.4.rc3 -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From okir at lst.de Thu Apr 24 02:09:51 2008 From: okir at lst.de (Olaf Kirch) Date: Thu, 24 Apr 2008 11:09:51 +0200 Subject: [ofa-general] Re: [PATCH 2/8]: mthca/mlx4: avoid recycling old FMR R_Keys too soon In-Reply-To: <200804241108.58748.okir@lst.de> References: <200804241106.57172.okir@lst.de> <200804241108.58748.okir@lst.de> Message-ID: <200804241109.52448.okir@lst.de> From b1092d9002fec323aaaf42dcbff88b2f46d4f3d5 Mon Sep 17 00:00:00 2001 From: Olaf Kirch Date: Thu, 24 Apr 2008 00:27:34 -0700 Subject: [PATCH] mthca/mlx4: avoid recycling old FMR R_Keys too soon When a FMR is unmapped, mthca and mlx4 reset the map count to 0, and clear the upper part of the R_Key which is used as the sequence counter. This poses a problem for RDS, which uses ib_fmr_unmap as a fence operation. RDS assumes that after issuing an unmap, the old R_Keys will be invalid for a "reasonable" period of time. For instance, Oracle processes use shared memory buffers allocated from a pool of buffers. When a process dies, we want to reclaim these buffers - but we must make sure there are no pending RDMA operations to/from those buffers. The only way to achieve that is by using unmap and syncing the TPT. 
However, when the sequence count is reset on unmap, there is a high likelihood that a new mapping will be given the same R_Key that was issued a few milliseconds ago. To prevent this, we suggest to not reset the sequence count when unmapping a FMR. Signed-off-by: Olaf Kirch --- drivers/infiniband/hw/mthca/mthca_mr.c | 13 ------------- drivers/net/mlx4/mr.c | 6 ------ 2 files changed, 0 insertions(+), 19 deletions(-) diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c index aa6c70a..e4f83cb 100644 --- a/drivers/infiniband/hw/mthca/mthca_mr.c +++ b/drivers/infiniband/hw/mthca/mthca_mr.c @@ -814,15 +814,9 @@ int mthca_arbel_map_phys_fmr(struct ib_fmr *ibfmr, u64 *page_list, void mthca_tavor_fmr_unmap(struct mthca_dev *dev, struct mthca_fmr *fmr) { - u32 key; - if (!fmr->maps) return; - key = tavor_key_to_hw_index(fmr->ibmr.lkey); - key &= dev->limits.num_mpts - 1; - fmr->ibmr.lkey = fmr->ibmr.rkey = tavor_hw_index_to_key(key); - fmr->maps = 0; writeb(MTHCA_MPT_STATUS_SW, fmr->mem.tavor.mpt); @@ -830,16 +824,9 @@ void mthca_tavor_fmr_unmap(struct mthca_dev *dev, struct mthca_fmr *fmr) void mthca_arbel_fmr_unmap(struct mthca_dev *dev, struct mthca_fmr *fmr) { - u32 key; - if (!fmr->maps) return; - key = arbel_key_to_hw_index(fmr->ibmr.lkey); - key &= dev->limits.num_mpts - 1; - key = adjust_key(dev, key); - fmr->ibmr.lkey = fmr->ibmr.rkey = arbel_hw_index_to_key(key); - fmr->maps = 0; *(u8 *) fmr->mem.arbel.mpt = MTHCA_MPT_STATUS_SW; diff --git a/drivers/net/mlx4/mr.c b/drivers/net/mlx4/mr.c index 0c05a10..b9e57b0 100644 --- a/drivers/net/mlx4/mr.c +++ b/drivers/net/mlx4/mr.c @@ -602,15 +602,9 @@ EXPORT_SYMBOL_GPL(mlx4_fmr_enable); void mlx4_fmr_unmap(struct mlx4_dev *dev, struct mlx4_fmr *fmr, u32 *lkey, u32 *rkey) { - u32 key; - if (!fmr->maps) return; - key = key_to_hw_index(fmr->mr.key); - key &= dev->caps.num_mpts - 1; - *lkey = *rkey = fmr->mr.key = hw_index_to_key(key); - fmr->maps = 0; *(u8 *) fmr->mpt = 
MLX4_MPT_STATUS_SW; -- 1.5.4.rc3 -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From okir at lst.de Thu Apr 24 02:11:26 2008 From: okir at lst.de (Olaf Kirch) Date: Thu, 24 Apr 2008 11:11:26 +0200 Subject: [ofa-general] Re: [PATCH 4/8]: RDS: Increase the default number of WRs In-Reply-To: <200804241110.51026.okir@lst.de> References: <200804241106.57172.okir@lst.de> <200804241109.52448.okir@lst.de> <200804241110.51026.okir@lst.de> Message-ID: <200804241111.26726.okir@lst.de> From 8ee794c0530f6e5f5fe81bc78b5e09be8f4b1eda Mon Sep 17 00:00:00 2001 From: Olaf Kirch Date: Thu, 24 Apr 2008 00:27:35 -0700 Subject: [PATCH] RDS: Increase the default number of WRs The default number of send and receive WRs was way too low to be useful. Increment this to 256 send WRs and 1024 recv WRs. Signed-off-by: Olaf Kirch --- net/rds/ib.h | 3 +++ net/rds/ib_sysctl.c | 10 ++++------ 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/net/rds/ib.h b/net/rds/ib.h index fd0b2d8..2c6e809 100644 --- a/net/rds/ib.h +++ b/net/rds/ib.h @@ -13,6 +13,9 @@ #define RDS_IB_MAX_SGE 8 #define RDS_IB_RECV_SGE 2 +#define RDS_IB_DEFAULT_RECV_WR 1024 +#define RDS_IB_DEFAULT_SEND_WR 256 + /* * IB posts RDS_FRAG_SIZE fragments of pages to the receive queues to * try and minimize the amount of memory tied up both the device and diff --git a/net/rds/ib_sysctl.c b/net/rds/ib_sysctl.c index 813b1a6..b8a10fc 100644 --- a/net/rds/ib_sysctl.c +++ b/net/rds/ib_sysctl.c @@ -38,18 +38,16 @@ static struct ctl_table_header *rds_ib_sysctl_hdr; -/* default to what we hope will be order 0 allocations */ -unsigned long rds_ib_sysctl_max_send_wr = PAGE_SIZE / sizeof(struct ib_send_wr); -unsigned long rds_ib_sysctl_max_recv_wr = PAGE_SIZE / sizeof(struct ib_recv_wr); +unsigned long rds_ib_sysctl_max_send_wr = RDS_IB_DEFAULT_SEND_WR; +unsigned long rds_ib_sysctl_max_recv_wr = RDS_IB_DEFAULT_RECV_WR; unsigned long 
rds_ib_sysctl_max_recv_allocation = (128 * 1024 * 1024) / RDS_FRAG_SIZE; static unsigned long rds_ib_sysctl_max_wr_min = 1; /* hardware will fail CQ creation long before this */ static unsigned long rds_ib_sysctl_max_wr_max = (u32)~0; -/* default to rds_ib_sysctl_max_send_wr/4 */ -unsigned long rds_ib_sysctl_max_unsig_wrs = PAGE_SIZE / (4 * sizeof(struct ib_send_wr)); +unsigned long rds_ib_sysctl_max_unsig_wrs = 16; static unsigned long rds_ib_sysctl_max_unsig_wr_min = 1; -static unsigned long rds_ib_sysctl_max_unsig_wr_max = PAGE_SIZE / sizeof(struct ib_send_wr); +static unsigned long rds_ib_sysctl_max_unsig_wr_max = 64; unsigned long rds_ib_sysctl_max_unsig_bytes = (16 << 20); static unsigned long rds_ib_sysctl_max_unsig_bytes_min = 1; -- 1.5.4.rc3 -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From okir at lst.de Thu Apr 24 02:11:56 2008 From: okir at lst.de (Olaf Kirch) Date: Thu, 24 Apr 2008 11:11:56 +0200 Subject: [ofa-general] Re: [PATCH 5/8]: RDS: Two small code reorgs in the connection code In-Reply-To: <200804241111.26726.okir@lst.de> References: <200804241106.57172.okir@lst.de> <200804241110.51026.okir@lst.de> <200804241111.26726.okir@lst.de> Message-ID: <200804241111.56693.okir@lst.de> From 2962a7fd8472d068913d0de74a12159d5438f408 Mon Sep 17 00:00:00 2001 From: Olaf Kirch Date: Thu, 24 Apr 2008 00:27:35 -0700 Subject: [PATCH] RDS: Two small code reorgs in the connection code This changes two things in the connection code 1. When we create a new connection, we need to set various fields of struct rds_connection to 0. Instead of doing them one by one, use memset. 2. The code for destroying a connection is currently inside a loop in rds_conn_exit. Move it to a separate function, because it's needed by a subsequent patch. 
Signed-off-by: Olaf Kirch --- net/rds/connection.c | 89 ++++++++++++++++++++++++-------------------------- 1 files changed, 43 insertions(+), 46 deletions(-) diff --git a/net/rds/connection.c b/net/rds/connection.c index ecf71b9..585123a 100644 --- a/net/rds/connection.c +++ b/net/rds/connection.c @@ -149,6 +149,8 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, goto out; } + memset(conn, 0, sizeof(*conn)); + /* hash_node below */ conn->c_laddr = laddr; conn->c_faddr = faddr; @@ -156,21 +158,9 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, conn->c_next_tx_seq = 1; init_MUTEX(&conn->c_send_sem); - conn->c_xmit_rm = NULL; - conn->c_xmit_sg = 0; - conn->c_xmit_hdr_off = 0; - conn->c_xmit_data_off = 0; - INIT_LIST_HEAD(&conn->c_send_queue); INIT_LIST_HEAD(&conn->c_retrans); - conn->c_next_rx_seq = 0; - - conn->c_map_queued = 0; - conn->c_map_offset = 0; - conn->c_map_bytes = 0; - conn->c_version = 0; - ret = rds_cong_get_maps(conn); if (ret) { kmem_cache_free(rds_conn_slab, conn); @@ -240,6 +230,46 @@ struct rds_connection *rds_conn_create_outgoing(__be32 laddr, __be32 faddr, EXPORT_SYMBOL_GPL(rds_conn_create); EXPORT_SYMBOL_GPL(rds_conn_create_outgoing); +static void __rds_conn_destroy(struct rds_connection *conn) +{ + struct rds_message *rm, *rtmp; + + rdsdebug("freeing conn %p for %u.%u.%u.%u -> " + "%u.%u.%u.%u\n", conn, NIPQUAD(conn->c_laddr), + NIPQUAD(conn->c_faddr)); + + /* wait for the rds thread to shut it down */ + atomic_set(&conn->c_state, RDS_CONN_ERROR); + cancel_delayed_work(&conn->c_conn_w); + queue_work(rds_wq, &conn->c_down_w); + flush_workqueue(rds_wq); + + /* tear down queued messages */ + list_for_each_entry_safe(rm, rtmp, + &conn->c_send_queue, + m_conn_item) { + list_del_init(&rm->m_conn_item); + BUG_ON(!list_empty(&rm->m_sock_item)); + rds_message_put(rm); + } + if (conn->c_xmit_rm) + rds_message_put(conn->c_xmit_rm); + + conn->c_trans->conn_free(conn->c_transport_data); + + /* + * 
The congestion maps aren't freed up here. They're + * freed by rds_cong_exit() after all the connections + * have been freed. + */ + rds_cong_remove_conn(conn); + + BUG_ON(!list_empty(&conn->c_retrans)); + kmem_cache_free(rds_conn_slab, conn); + + rds_conn_count--; +} + static void rds_conn_message_info(struct socket *sock, unsigned int len, struct rds_info_iterator *iter, struct rds_info_lengths *lens, @@ -376,7 +406,6 @@ void __exit rds_conn_exit(void) struct hlist_head *head; struct hlist_node *pos, *tmp; struct rds_connection *conn; - struct rds_message *rm, *rtmp; size_t i; for (i = 0, head = rds_conn_hash; i < ARRAY_SIZE(rds_conn_hash); @@ -385,40 +414,8 @@ void __exit rds_conn_exit(void) /* the conn won't reconnect once it's unhashed */ hlist_del_init(&conn->c_hash_node); - rds_conn_count--; - - rdsdebug("freeing conn %p for %u.%u.%u.%u -> " - "%u.%u.%u.%u\n", conn, NIPQUAD(conn->c_laddr), - NIPQUAD(conn->c_faddr)); - - /* wait for the rds thread to shut it down */ - atomic_set(&conn->c_state, RDS_CONN_ERROR); - cancel_delayed_work(&conn->c_conn_w); - queue_work(rds_wq, &conn->c_down_w); - flush_workqueue(rds_wq); - - /* tear down queued messages */ - list_for_each_entry_safe(rm, rtmp, - &conn->c_send_queue, - m_conn_item) { - list_del_init(&rm->m_conn_item); - BUG_ON(!list_empty(&rm->m_sock_item)); - rds_message_put(rm); - } - if (conn->c_xmit_rm) - rds_message_put(conn->c_xmit_rm); - - conn->c_trans->conn_free(conn->c_transport_data); - - /* - * The congestion maps aren't freed up here. They're - * freed by rds_cong_exit() after all the connections - * have been freed. 
- */
-	rds_cong_remove_conn(conn);
-	BUG_ON(!list_empty(&conn->c_retrans));
-	kmem_cache_free(rds_conn_slab, conn);
+	__rds_conn_destroy(conn);
 	}
 }
-- 
1.5.4.rc3

-- 
Olaf Kirch  |  --- o ---   Nous sommes du soleil we love when we play
okir at lst.de |    / | \     sol.dhoop.naytheet.ah kin.ir.samse.qurax

From okir at lst.de  Thu Apr 24 02:12:19 2008
From: okir at lst.de (Olaf Kirch)
Date: Thu, 24 Apr 2008 11:12:19 +0200
Subject: [ofa-general] Re: [PATCH 6/8]: RDS: Use IB for loopback
In-Reply-To: <200804241111.56693.okir@lst.de>
References: <200804241106.57172.okir@lst.de> <200804241111.26726.okir@lst.de> <200804241111.56693.okir@lst.de>
Message-ID: <200804241112.19866.okir@lst.de>

From 2a91ce118f8d4e7e644ea849f61bd8953faaacc6 Mon Sep 17 00:00:00 2001
From: Olaf Kirch
Date: Thu, 24 Apr 2008 00:27:36 -0700
Subject: [PATCH] RDS: Use IB for loopback

Currently, when an application wants to send to an RDS port on the local host, RDS will create a connection using the special loopback transport. In order to be able to test RDS (and RDS over RDMA) faithfully on standalone machines, we want loopback traffic to use the IB transport if possible. This patch makes the necessary changes.

This turns out to be a little tricky, as we need two rds_connection objects with the same address pair. The current code doesn't really handle this, so we have to jump through some hoops:

- Loopback connections for IB are represented by two rds_connections: the "active" connection created when we initiate the connect, and a "passive" connection created when we accept the incoming RC.
- The active connection is used to transmit packets, which are then received by the passive conn.
- The passive conn is never added to the global hash table; instead it is kept in conn->c_passive.
Signed-off-by: Olaf Kirch --- net/rds/connection.c | 42 +++++++++++++++++++++++++++++++++++------- net/rds/rds.h | 3 +++ net/rds/tcp.c | 1 + net/rds/threads.c | 10 +++++++++- 4 files changed, 48 insertions(+), 8 deletions(-) diff --git a/net/rds/connection.c b/net/rds/connection.c index 585123a..5d7788e 100644 --- a/net/rds/connection.c +++ b/net/rds/connection.c @@ -130,15 +130,26 @@ void rds_conn_reset(struct rds_connection *conn) */ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, struct rds_transport *trans, gfp_t gfp, - int allow_loop_transport) + int is_outgoing) { - struct rds_connection *conn, *tmp; + struct rds_connection *conn, *tmp, *parent = NULL; struct hlist_head *head = rds_conn_bucket(laddr, faddr); unsigned long flags; int ret; spin_lock_irqsave(&rds_conn_lock, flags); conn = rds_conn_lookup(head, laddr, faddr, trans); + if (conn + && conn->c_loopback + && conn->c_trans != &rds_loop_transport + && !is_outgoing) { + /* This is a looped back IB connection, and we're + * called by the code handling the incoming connect. + * We need a second connection object into which we + * can stick the other QP. */ + parent = conn; + conn = parent->c_passive; + } spin_unlock_irqrestore(&rds_conn_lock, flags); if (conn) goto out; @@ -151,7 +162,7 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, memset(conn, 0, sizeof(*conn)); - /* hash_node below */ + INIT_HLIST_NODE(&conn->c_hash_node); conn->c_laddr = laddr; conn->c_faddr = faddr; spin_lock_init(&conn->c_lock); @@ -173,8 +184,16 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, * can bind to the destination address then we'd rather the messages * flow through loopback rather than either transport. 
*/ - if (allow_loop_transport && rds_trans_get_preferred(faddr)) - trans = &rds_loop_transport; + if (rds_trans_get_preferred(faddr)) { + conn->c_loopback = 1; + if (is_outgoing && trans->t_prefer_loopback) { + /* "outgoing" connection - and the transport + * says it wants the connection handled by the + * loopback transport. This is what TCP does. + */ + trans = &rds_loop_transport; + } + } conn->c_trans = trans; @@ -198,14 +217,21 @@ static struct rds_connection *__rds_conn_create(__be32 laddr, __be32 faddr, NIPQUAD(laddr), NIPQUAD(faddr)); spin_lock_irqsave(&rds_conn_lock, flags); - tmp = rds_conn_lookup(head, laddr, faddr, trans); + if (parent == NULL) { + tmp = rds_conn_lookup(head, laddr, faddr, trans); + if (tmp == NULL) + hlist_add_head(&conn->c_hash_node, head); + } else { + if ((tmp = parent->c_passive) == NULL) + parent->c_passive = conn; + } + if (tmp) { trans->conn_free(conn->c_transport_data); kmem_cache_free(rds_conn_slab, conn); conn = tmp; } else { rds_cong_add_conn(conn); - hlist_add_head(&conn->c_hash_node, head); rds_conn_count++; } @@ -415,6 +441,8 @@ void __exit rds_conn_exit(void) /* the conn won't reconnect once it's unhashed */ hlist_del_init(&conn->c_hash_node); + if (conn->c_passive) + __rds_conn_destroy(conn->c_passive); __rds_conn_destroy(conn); } } diff --git a/net/rds/rds.h b/net/rds/rds.h index dc1ab4c..d5a966d 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -121,6 +121,8 @@ struct rds_connection { struct hlist_node c_hash_node; __be32 c_laddr; __be32 c_faddr; + unsigned int c_loopback : 1; + struct rds_connection * c_passive; spinlock_t c_lock; struct rds_cong_map *c_lcong; @@ -342,6 +344,7 @@ struct rds_transport { struct list_head t_item; struct module *t_owner; char *t_name; + unsigned int t_prefer_loopback : 1; int (*laddr_check)(__be32 addr); int (*conn_alloc)(struct rds_connection *conn, gfp_t gfp); void (*conn_free)(void *data); diff --git a/net/rds/tcp.c b/net/rds/tcp.c index baf876e..f4e6fce 100644 --- a/net/rds/tcp.c +++ 
b/net/rds/tcp.c @@ -252,6 +252,7 @@ struct rds_transport rds_tcp_transport = { .exit = rds_tcp_exit, .t_owner = THIS_MODULE, .t_name = "tcp", + .t_prefer_loopback = 1, }; int __init rds_tcp_init(void) diff --git a/net/rds/threads.c b/net/rds/threads.c index 2a5dc0b..b86fbc3 100644 --- a/net/rds/threads.c +++ b/net/rds/threads.c @@ -178,6 +178,11 @@ void rds_shutdown_worker(struct work_struct *work) up(&conn->c_send_sem); if (!rds_conn_transition(conn, RDS_CONN_DISCONNECTING, RDS_CONN_DOWN)) { + /* This can happen - eg when we're in the middle of tearing + * down the connection, and someone unloads the rds module. + * Quite reproduceable with loopback connections. + * Mostly harmless. + */ rds_conn_error(conn, "%s: failed to transition to state DOWN, " "current state is %d\n", @@ -187,7 +192,10 @@ void rds_shutdown_worker(struct work_struct *work) } } - /* then reconnect if it's still live */ + /* Then reconnect if it's still live. + * The passive side of an IB loopback connection is never added + * to the conn hash, so we never trigger a reconnect on this + * conn - the reconnect is always triggered by the active peer. 
*/ cancel_delayed_work(&conn->c_conn_w); if (!hlist_unhashed(&conn->c_hash_node)) { rds_queue_reconnect(conn); -- 1.5.4.rc3 -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From okir at lst.de Thu Apr 24 02:13:08 2008 From: okir at lst.de (Olaf Kirch) Date: Thu, 24 Apr 2008 11:13:08 +0200 Subject: [ofa-general] Re: [PATCH 7/8]: RDS: Implement rds ping In-Reply-To: <200804241112.19866.okir@lst.de> References: <200804241106.57172.okir@lst.de> <200804241111.56693.okir@lst.de> <200804241112.19866.okir@lst.de> Message-ID: <200804241113.08841.okir@lst.de> From 24000a7c11fedb519aab11807703d91ae49ac421 Mon Sep 17 00:00:00 2001 From: Olaf Kirch Date: Thu, 24 Apr 2008 00:27:36 -0700 Subject: [PATCH] RDS: Implement rds ping Several people have asked for a way to test reachability of remote nodes via RDS. This is it - rds ping. RDS ping is implemented by sending packets to port 0. As a matter of simplicity, we do not handle packet payloads at this time - the ping response is always an empty packet. 
Signed-off-by: Olaf Kirch --- net/rds/cong.c | 2 +- net/rds/rds.h | 5 ++++ net/rds/recv.c | 6 +++++ net/rds/send.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++- net/rds/stats.c | 2 + net/rds/sysctl.c | 10 +++++++++ 6 files changed, 78 insertions(+), 3 deletions(-) diff --git a/net/rds/cong.c b/net/rds/cong.c index 2db2362..4ec85ce 100644 --- a/net/rds/cong.c +++ b/net/rds/cong.c @@ -348,7 +348,7 @@ int rds_cong_wait(struct rds_cong_map *map, __be16 port, int nonblock, struct rd if (!rds_cong_test_bit(map, port)) return 0; if (nonblock) { - if (rs->rs_cong_monitor) { + if (rs && rs->rs_cong_monitor) { unsigned long flags; /* It would have been nice to have an atomic set_bit on diff --git a/net/rds/rds.h b/net/rds/rds.h index d5a966d..a0fb20c 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -487,6 +487,7 @@ struct rds_statistics { unsigned long s_recv_delayed_retry; unsigned long s_recv_ack_required; unsigned long s_recv_rdma_bytes; + unsigned long s_recv_ping; unsigned long s_send_queue_empty; unsigned long s_send_queue_full; unsigned long s_send_sem_contention; @@ -497,6 +498,7 @@ struct rds_statistics { unsigned long s_send_ack_required; unsigned long s_send_rdma; unsigned long s_send_rdma_bytes; + unsigned long s_send_pong; unsigned long s_page_remainder_hit; unsigned long s_page_remainder_miss; unsigned long s_cong_update_queued; @@ -570,6 +572,7 @@ rds_conn_up(struct rds_connection *conn) } /* message.c */ +struct rds_message *rds_message_alloc(unsigned int nents, gfp_t gfp); struct rds_message *rds_message_copy_from_user(struct iovec *first_iov, size_t total_len); void rds_message_populate_header(struct rds_header *hdr, __be16 sport, @@ -641,6 +644,7 @@ void rds_send_drop_acked(struct rds_connection *conn, u64 ack, is_acked_func is_acked); int rds_send_acked_before(struct rds_connection *conn, u64 seq); void rds_send_remove_from_sock(struct list_head *messages, int status); +int rds_send_pong(struct rds_connection *conn, __be16 dport); /* rdma.c 
*/ void rds_rdma_unuse(struct rds_sock *rs, u32 r_key, int force); @@ -672,6 +676,7 @@ extern unsigned long rds_sysctl_reconnect_min_jiffies; extern unsigned long rds_sysctl_reconnect_max_jiffies; extern unsigned int rds_sysctl_max_unacked_packets; extern unsigned int rds_sysctl_max_unacked_bytes; +extern unsigned int rds_sysctl_ping_enable; /* threads.c */ int __init rds_threads_init(void); diff --git a/net/rds/recv.c b/net/rds/recv.c index 9adb24d..da3c879 100644 --- a/net/rds/recv.c +++ b/net/rds/recv.c @@ -196,6 +196,12 @@ void rds_recv_incoming(struct rds_connection *conn, __be32 saddr, __be32 daddr, } conn->c_next_rx_seq = be64_to_cpu(inc->i_hdr.h_sequence) + 1; + if (rds_sysctl_ping_enable && inc->i_hdr.h_dport == 0) { + rds_stats_inc(s_recv_ping); + rds_send_pong(conn, inc->i_hdr.h_sport); + goto out; + } + rs = rds_find_bound(daddr, inc->i_hdr.h_dport); if (rs == NULL) { rds_stats_inc(s_recv_drop_no_sock); diff --git a/net/rds/send.c b/net/rds/send.c index a2a5b2a..26e1e3e 100644 --- a/net/rds/send.c +++ b/net/rds/send.c @@ -700,8 +700,7 @@ int rds_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg, if (msg->msg_namelen) { /* XXX fail non-unicast destination IPs? */ - if (msg->msg_namelen < sizeof(*usin) || usin->sin_family != AF_INET || - usin->sin_port == 0) { + if (msg->msg_namelen < sizeof(*usin) || usin->sin_family != AF_INET) { ret = -EINVAL; goto out; } @@ -820,3 +819,56 @@ out: rds_message_put(rm); return ret; } + +/* + * Reply to a ping packet. + */ +int +rds_send_pong(struct rds_connection *conn, __be16 dport) +{ + struct rds_message *rm; + unsigned long flags; + int ret = 0; + + rm = rds_message_alloc(0, GFP_ATOMIC); + if (rm == NULL) { + ret = -ENOMEM; + goto out; + } + + rm->m_daddr = conn->c_faddr; + + /* If the connection is down, trigger a connect. We may + * have scheduled a delayed reconnect however - in this case + * we should not interfere. 
+ */ + if (rds_conn_state(conn) == RDS_CONN_DOWN + && !test_and_set_bit(RDS_RECONNECT_PENDING, &conn->c_flags)) + queue_delayed_work(rds_wq, &conn->c_conn_w, 0); + + ret = rds_cong_wait(conn->c_fcong, dport, 1, NULL); + if (ret) + goto out; + + spin_lock_irqsave(&conn->c_lock, flags); + list_add_tail(&rm->m_conn_item, &conn->c_send_queue); + set_bit(RDS_MSG_ON_CONN, &rm->m_flags); + rds_message_addref(rm); + rm->m_inc.i_conn = conn; + + rds_message_populate_header(&rm->m_inc.i_hdr, 0, dport, + conn->c_next_tx_seq); + conn->c_next_tx_seq++; + spin_unlock_irqrestore(&conn->c_lock, flags); + + rds_stats_inc(s_send_pong); + + queue_delayed_work(rds_wq, &conn->c_send_w, 0); + rds_message_put(rm); + return 0; + +out: + if (rm) + rds_message_put(rm); + return ret; +} diff --git a/net/rds/stats.c b/net/rds/stats.c index abf7103..0bd91fa 100644 --- a/net/rds/stats.c +++ b/net/rds/stats.c @@ -53,6 +53,7 @@ static char *rds_stat_names[] = { "recv_delayed_retry", "recv_ack_required", "recv_rdma_bytes", + "recv_ping", "send_queue_empty", "send_queue_full", "send_sem_contention", @@ -63,6 +64,7 @@ static char *rds_stat_names[] = { "send_ack_required", "send_rdma", "send_rdma_bytes", + "send_pong", "page_remainder_hit", "page_remainder_miss", "cong_update_queued", diff --git a/net/rds/sysctl.c b/net/rds/sysctl.c index 5f7ce37..7b18c0a 100644 --- a/net/rds/sysctl.c +++ b/net/rds/sysctl.c @@ -47,6 +47,8 @@ unsigned long rds_sysctl_reconnect_max_jiffies = HZ; unsigned int rds_sysctl_max_unacked_packets = 16; unsigned int rds_sysctl_max_unacked_bytes = (16 << 20); +unsigned int rds_sysctl_ping_enable = 1; + /* * These can change over time until they're official. Until that time we'll * give apps a way to figure out what the values are in a given machine. 
@@ -107,6 +109,14 @@ static ctl_table rds_sysctl_rds_table[] = { .mode = 0644, .proc_handler = &proc_dointvec, }, + { + .ctl_name = 10, + .procname = "ping_enable", + .data = &rds_sysctl_ping_enable, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec, + }, /* 100+ are reserved for transport subdirs */ { .ctl_name = 0} }; -- 1.5.4.rc3 -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From okir at lst.de Thu Apr 24 02:14:50 2008 From: okir at lst.de (Olaf Kirch) Date: Thu, 24 Apr 2008 11:14:50 +0200 Subject: [ofa-general] [PATCH 0/8] RDS patch set Message-ID: <200804241114.51260.okir@lst.de> Hi all, here's another set of patches related to RDS. The patches can be found in git://git.openfabrics.org/ofed_1_3/linux-2.6 and git://git.openfabrics.org/ofed_1_3/rds-tools There are seven kernel patches. I would very much like to see the first four of them in OFED 1.3.1 if possible. On the remaining 3, I'm not particularly religious - I'm fine if they make it into 1.3.* at a later time. RDS: Fix IB max_unacked_* sysctls Straightforward bugfix. mthca/mlx4: avoid recycling old FMR R_Keys too soon This is a re-run of a mthca patch I posted a while back; Jack Morgenstein requested that I should make the same change in the mlx4 driver. Here it is; review and feedback much appreciated. Reduce struct rds_ib_send_work size RDS: Increase the default number of WRs These two patches go together; they shrink the size of the send work entry we allocate in favor of allocating more of them. I would very much like to see these in OFED 1.3.1 RDS: Two small code reorgs in the connection code RDS: Use IB for loopback These also go together. For loopback traffic, we need to use IB if available, instead of the special loopback transport currently used. The reason is that lots of our tests run on single hosts over loopback, and we want to stress things like RDMA. 
RDS: Implement rds ping

This is really a new feature. Essentially, ping over RDS. There's a companion patch to rds-tools that implements the rds-ping user space utility that leverages the functionality added by the kernel patch above.

Olaf
-- 
Olaf Kirch  |  --- o ---   Nous sommes du soleil we love when we play
okir at lst.de |    / | \     sol.dhoop.naytheet.ah kin.ir.samse.qurax

From holt at sgi.com  Thu Apr 24 02:51:12 2008
From: holt at sgi.com (Robin Holt)
Date: Thu, 24 Apr 2008 04:51:12 -0500
Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers
In-Reply-To: <20080424064753.GH24536@duo.random>
References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random>
Message-ID: <20080424095112.GC30298@sgi.com>

I am not certain of this, but it seems like this patch leaves things in a somewhat asymmetric state. At the very least, I think that asymmetry should be documented in the comments of either mmu_notifier.h or .c.

Before I do the first mmu_notifier_register, all places that test for mm_has_notifiers(mm) will return false and take the fast path. After I do some mmu_notifier_register()s and their corresponding mmu_notifier_unregister()s, mm_has_notifiers(mm) will return true and the slow path will be taken, even though all registered notifiers have unregistered.

It seems to me the work done by mmu_notifier_mm_destroy should really be done inside the mm_lock()/mm_unlock area of mmu_unregister and mm_notifier_release when we have removed the last entry. That would give the user's job the same performance after they are done using the special device that they had prior to its use.

On Thu, Apr 24, 2008 at 08:49:40AM +0200, Andrea Arcangeli wrote:
...
> diff --git a/mm/memory.c b/mm/memory.c
> --- a/mm/memory.c
> +++ b/mm/memory.c
...
> @@ -603,25 +605,39 @@
> * readonly mappings.
The tradeoff is that copy_page_range is more > * efficient than faulting. > */ > + ret = 0; > if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) { > if (!vma->anon_vma) > - return 0; > + goto out; > } > > - if (is_vm_hugetlb_page(vma)) > - return copy_hugetlb_page_range(dst_mm, src_mm, vma); > + if (unlikely(is_vm_hugetlb_page(vma))) { > + ret = copy_hugetlb_page_range(dst_mm, src_mm, vma); > + goto out; > + } > > + if (is_cow_mapping(vma->vm_flags)) > + mmu_notifier_invalidate_range_start(src_mm, addr, end); > + > + ret = 0; I don't think this is needed. ... > +/* avoid memory allocations for mm_unlock to prevent deadlock */ > +void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data) > +{ > + if (mm->map_count) { > + if (data->nr_anon_vma_locks) > + mm_unlock_vfree(data->anon_vma_locks, > + data->nr_anon_vma_locks); > + if (data->i_mmap_locks) I think you really want data->nr_i_mmap_locks. ... > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c > new file mode 100644 > --- /dev/null > +++ b/mm/mmu_notifier.c ... > +/* > + * This function can't run concurrently against mmu_notifier_register > + * or any other mmu notifier method. mmu_notifier_register can only > + * run with mm->mm_users > 0 (and exit_mmap runs only when mm_users is > + * zero). All other tasks of this mm already quit so they can't invoke > + * mmu notifiers anymore. This can run concurrently only against > + * mmu_notifier_unregister and it serializes against it with the > + * unregister_lock in addition to RCU. struct mmu_notifier_mm can't go > + * away from under us as the exit_mmap holds a mm_count pin itself. > + * > + * The ->release method can't allow the module to be unloaded, the > + * module can only be unloaded after mmu_notifier_unregister run. This > + * is because the release method has to run the ret instruction to > + * return back here, and so it can't allow the ret instruction to be > + * freed. 
> + */ The second paragraph of this comment seems extraneous. ... > + /* > + * Wait ->release if mmu_notifier_unregister run list_del_rcu. > + * srcu can't go away from under us because one mm_count is > + * hold by exit_mmap. > + */ These two sentences don't make any sense to me. ... > +void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) > +{ > + int before_release = 0, srcu; > + > + BUG_ON(atomic_read(&mm->mm_count) <= 0); > + > + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); > + spin_lock(&mm->mmu_notifier_mm->unregister_lock); > + if (!hlist_unhashed(&mn->hlist)) { > + hlist_del_rcu(&mn->hlist); > + before_release = 1; > + } > + spin_unlock(&mm->mmu_notifier_mm->unregister_lock); > + if (before_release) > + /* > + * exit_mmap will block in mmu_notifier_release to > + * guarantee ->release is called before freeing the > + * pages. > + */ > + mn->ops->release(mn, mm); I am not certain about the need to do the release callout when the driver has already told this subsystem it is done. For XPMEM, this callout would immediately return. I would expect it to be the same or GRU. Thanks, Robin From jlentini at netapp.com Thu Apr 24 06:50:48 2008 From: jlentini at netapp.com (James Lentini) Date: Thu, 24 Apr 2008 09:50:48 -0400 (EDT) Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets In-Reply-To: <20080424054235.GA11416@obsidianresearch.com> References: <000401c8a4ca$c156a810$94248686@amr.corp.intel.com> <20080424054235.GA11416@obsidianresearch.com> Message-ID: On Wed, 23 Apr 2008, Jason Gunthorpe wrote: > On Wed, Apr 23, 2008 at 09:56:50AM -0400, James Lentini wrote: > > > I'm hoping that someone has a wonderfully brilliant idea for this > > > that would take about 1 day to implement. :) > > > > Is it time to bring back ATS? > > > > http://lists.openfabrics.org/pipermail/general/2005-August/010247.html > > Could you post this someplace where people who are not a member of the > DAT group can access it? 
Here's a publicly accessible link: http://www.datcollaborative.org/ATS_v1.pdf

From ogerlitz at voltaire.com  Thu Apr 24 06:52:07 2008
From: ogerlitz at voltaire.com (Or Gerlitz)
Date: Thu, 24 Apr 2008 16:52:07 +0300
Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.
In-Reply-To: <20080423133816.6c1b6315.weiny2@llnl.gov>
References: <20080423133816.6c1b6315.weiny2@llnl.gov>
Message-ID: <48109087.6030606@voltaire.com>

Ira Weiny wrote:
> The symptom is that nodes drop out of the IPoIB mcast group after a node
> temporarily goes catatonic. The details are:
>
> 1) Issues on a node cause a soft lockup of the node.
> 2) OpenSM does a normal light sweep.
> 3) MADs to the node time out since the node is in a "bad state"
> 4) OpenSM marks the node down and drops it from internal tables, including
> mcast groups.
> 5) Node recovers from soft lock up condition.
> 6) A subsequent sweep causes OpenSM see the node and add it back to the
> fabric.

As Hal noted, client reregister is the way to go. In a similar discussion in the past, the conclusion was that the SM should (maybe even according to the spec, but according to common sense as well, I think) set the re-register bit, in which case IPoIB rejoins and we are done.

At the time, I understood that openSM would do so (http://lists.openfabrics.org/pipermail/general/2007-September/041237.html). Am I wrong, or maybe the case brought up on that thread (a switch/port going down so that a whole sub-fabric is removed from the SM's point of view while the links remain up from the nodes' point of view) was different?

The basic point is a case where a node's link is UP and the SM lost this node for some time and now sees it again. We used to call it "the active/active" transition, and an SM may need special logic for it.

Or.
From okir at lst.de  Thu Apr 24 02:10:50 2008
From: okir at lst.de (Olaf Kirch)
Date: Thu, 24 Apr 2008 11:10:50 +0200
Subject: [ofa-general] ***SPAM*** Re: [PATCH 3/8]: RDS: Reduce struct rds_ib_send_work size
In-Reply-To: <200804241109.52448.okir@lst.de>
References: <200804241106.57172.okir@lst.de> <200804241108.58748.okir@lst.de> <200804241109.52448.okir@lst.de>
Message-ID: <200804241110.51026.okir@lst.de>

From 8fcaa7d5000c8e3b2b7db235d2c279ccb98a6dec Mon Sep 17 00:00:00 2001
From: Olaf Kirch
Date: Thu, 24 Apr 2008 00:27:35 -0700
Subject: [PATCH] RDS: reduce struct rds_ib_send_work size

Currently, struct rds_ib_send_work contains an array of 29 ib_sge's, making the total size of each entry around 512 bytes. This severely limits the maximum size of the send WQ, as we allocate the array of work entries via one kmalloc() call. Another problem with this approach is that SENDs never use more than 2 SGEs anyway, so the SGE array is only ever fully utilized by RDMA ops.

Change this to 8 SGEs, which seems to be a better balance. An alternative (but much more intrusive) patch would have been to replace the s_sge array with a pointer and allocate the sge array dynamically. For OFED 1.3.1 I chose the more conservative approach.
Signed-off-by: Olaf Kirch --- net/rds/ib.h | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/rds/ib.h b/net/rds/ib.h index eda4a68..fd0b2d8 100644 --- a/net/rds/ib.h +++ b/net/rds/ib.h @@ -10,7 +10,7 @@ #define RDS_FMR_SIZE 256 #define RDS_FMR_POOL_SIZE 2048 -#define RDS_IB_MAX_SGE 29 +#define RDS_IB_MAX_SGE 8 #define RDS_IB_RECV_SGE 2 /* -- 1.5.4.rc3 -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From okir at lst.de Thu Apr 24 02:14:03 2008 From: okir at lst.de (Olaf Kirch) Date: Thu, 24 Apr 2008 11:14:03 +0200 Subject: [ofa-general] ***SPAM*** Re: [PATCH 8/8]: rds-tools: add new rds-ping utility In-Reply-To: <200804241113.08841.okir@lst.de> References: <200804241106.57172.okir@lst.de> <200804241112.19866.okir@lst.de> <200804241113.08841.okir@lst.de> Message-ID: <200804241114.03810.okir@lst.de> From 01d43fd80fe8ca463ec01c073bf3d8c03c7daa26 Mon Sep 17 00:00:00 2001 From: Olaf Kirch Date: Thu, 24 Apr 2008 00:49:37 -0700 Subject: [PATCH] Add new rds-ping utility This adds a new utility that acts a lot like the traditional ping command, but uses RDS instead of ICMP. Its main purpose is to have a simple tool to check the reachability of remote nodes. The required kernel patch is posted separately. 
Signed-off-by: Olaf Kirch --- Makefile.in | 3 +- rds-ping.1 | 69 +++++++++++ rds-ping.c | 385 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 456 insertions(+), 1 deletions(-) create mode 100644 rds-ping.1 create mode 100644 rds-ping.c diff --git a/Makefile.in b/Makefile.in index 7cad5f1..363bb58 100644 --- a/Makefile.in +++ b/Makefile.in @@ -24,7 +24,7 @@ else COMMON_OBJECTS = $(subst .c,.o,$(filter-out pfhack.c,$(COMMON_SOURCES))) endif -PROGRAMS = rds-gen rds-sink rds-info rds-stress +PROGRAMS = rds-gen rds-sink rds-info rds-stress rds-ping all-programs: $(PROGRAMS) @@ -65,6 +65,7 @@ EXTRA_DIST := rds-info.1 \ rds-gen.1 \ rds-sink.1 \ rds-stress.1 \ + rds-ping.1 \ rds.7 \ rds-rdma.7 \ Makefile.in \ diff --git a/rds-ping.1 b/rds-ping.1 new file mode 100644 index 0000000..ae06787 --- /dev/null +++ b/rds-ping.1 @@ -0,0 +1,69 @@ +.Dd Apr 22, 2008 +.Dt RDS-PING 1 +.Os +.Sh NAME +.Nm rds-ping +.Nd test reachability of remote node over RDS +.Pp +.Sh SYNOPSIS +.Nm rds-ping +.Bk -words +.Op Fl c Ar count +.Op Fl i Ar interval +.Op Fl I Ar local_addr +.Ar remote_addr + +.Sh DESCRIPTION +.Nm rds-ping +is used to test whether a remote node is reachable over RDS. +Its interface is designed to operate pretty much the standard +.Xr ping 8 +utility, even though the way it works is pretty different. +.Pp +.Nm rds-ping +opens several RDS sockets and sends packets to port 0 on +the indicated host. This is a special port number to which +no socket is bound; instead, the kernel processes incoming +packets and responds to them. +.Sh OPTIONS +The following options are available for use on the command line: +.Bl -tag -width Ds +.It Fl c Ar count +Causes +.Nm rds-ping +to exit after sending (and receiving) the specified number of +packets. +.It Fl I Ar address +By default, +.Nm rds-ping +will pick the local source address for the RDS socket based +on routing information for the destination address (i.e. 
if +packets to the given destination would be routed through interface +.Nm ib0 , +then it will use the IP address of +.Nm ib0 +as source address). +Using the +.Fl I +option, you can override this choice. +.It Fl i Ar timeout +By default, +.Nm rds-ping +will wait for one second between sending packets. Use this option +to specified a different interval. The timeout value is given in +seconds, and can be a floating point number. Optionally, append +.Nm msec +or +.Nm usec +to specify a timeout in milliseconds or microseconds, respectively. +.It +Specifying a timeout considerably smaller than the packet round-trip +time will produce unexpected results. +.El +.Sh AUTHORS +.Nm rds-ping +was written by Olaf Kirch . +.Sh SEE ALSO +.Xr rds 7 , +.Xr rds-info 1 , +.Xr rds-stress 1 . diff --git a/rds-ping.c b/rds-ping.c new file mode 100644 index 0000000..e9c88fc --- /dev/null +++ b/rds-ping.c @@ -0,0 +1,385 @@ +/* + * rds-ping utility + * + * Test reachability of a remote RDS node by sending a packet to port 0. + * + * Copyright (C) 2008 Oracle. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "net/rds.h" + +#ifdef DYNAMIC_PF_RDS +#include "pfhack.h" +#endif + +#define die(fmt...) do { \ + fprintf(stderr, fmt); \ + exit(1); \ +} while (0) + +#define die_errno(fmt, args...) do { \ + fprintf(stderr, fmt ", errno: %d (%s)\n", ##args , errno,\ + strerror(errno)); \ + exit(1); \ +} while (0) + +static struct timeval opt_wait = { 1, 1 }; /* 1s */ +static unsigned long opt_count; +static struct in_addr opt_srcaddr; +static struct in_addr opt_dstaddr; + +/* For reasons of simplicity, RDS ping does not use a packet + * payload that is being echoed, the way ICMP does. + * Instead, we open a number of sockets on different ports, and + * match packet sequence numbers with ports. 
+ */ +#define NSOCKETS 8 + +struct socket { + int fd; + unsigned int sent_id; + struct timeval sent_ts; + unsigned int nreplies; +}; + + +static int do_ping(void); +static void report_packet(struct socket *sp, const struct timeval *now, + const struct in_addr *from, int err); +static void usage(const char *complaint); +static int rds_socket(struct in_addr *src, struct in_addr *dst); +static int parse_timeval(const char *, struct timeval *); +static int parse_long(const char *ptr, unsigned long *); +static int parse_addr(const char *ptr, struct in_addr *); + +int +main(int argc, char **argv) +{ + int c; + + while ((c = getopt(argc, argv, "c:i:I:")) != -1) { + switch (c) { + case 'c': + if (!parse_long(optarg, &opt_count)) + die("Bad packet count <%s>\n", optarg); + break; + + case 'I': + if (!parse_addr(optarg, &opt_srcaddr)) + die("Unknown source address <%s>\n", optarg); + break; + + case 'i': + if (!parse_timeval(optarg, &opt_wait)) + die("Bad wait time <%s>\n", optarg); + break; + + default: + usage("Unknown option"); + } + } + + if (optind + 1 != argc) + usage("Missing destination address"); + if (!parse_addr(argv[optind], &opt_dstaddr)) + die("Cannot parse destination address <%s>\n", argv[optind]); + + return do_ping(); +} + +/* returns a - b in usecs */ +static inline long +usec_sub(const struct timeval *a, const struct timeval *b) +{ + return ((long)(a->tv_sec - b->tv_sec) * 1000000UL) + a->tv_usec - b->tv_usec; +} + +static int +do_ping(void) +{ + struct sockaddr_in sin; + unsigned int sent = 0, recv = 0; + struct timeval next_ts; + struct socket socket[NSOCKETS]; + struct pollfd pfd[NSOCKETS]; + int i, next = 0; + + for (i = 0; i < NSOCKETS; ++i) { + int fd; + + fd = rds_socket(&opt_srcaddr, &opt_dstaddr); + + socket[i].fd = fd; + pfd[i].fd = fd; + pfd[i].events = POLLIN; + } + + memset(&sin, 0, sizeof(sin)); + sin.sin_family = AF_INET; + sin.sin_addr = opt_dstaddr; + + gettimeofday(&next_ts, NULL); + while (1) { + struct timeval now; + struct sockaddr_in 
from; + socklen_t alen = sizeof(from); + long deadline; + int ret; + + /* Fast way out - if we have received all packets, bail now. + * If we're still waiting for some to come back, we need + * to do the poll() below */ + if (opt_count && recv >= opt_count) + break; + + gettimeofday(&now, NULL); + if (timercmp(&now, &next_ts, >=)) { + struct socket *sp = &socket[next]; + int err = 0; + + if (opt_count && sent >= opt_count) + break; + + timeradd(&next_ts, &opt_wait, &next_ts); + if (sendto(sp->fd, NULL, 0, 0, (struct sockaddr *) &sin, sizeof(sin))) + err = errno; + sp->sent_id = ++sent; + sp->sent_ts = now; + sp->nreplies = 0; + next = (next + 1) % NSOCKETS; + + if (err) { + static unsigned int nerrs = 0; + + report_packet(sp, NULL, NULL, err); + if (err == EINVAL && nerrs++ == 0) + printf(" Maybe your kernel does not support rds ping yet\n"); + } + } + + deadline = usec_sub(&next_ts, &now); + ret = poll(pfd, NSOCKETS, deadline / 1000); + if (ret < 0) { + if (errno == EINTR) + continue; + die_errno("poll"); + } + if (ret == 0) + continue; + + for (i = 0; i < NSOCKETS; ++i) { + struct socket *sp = &socket[i]; + + if (!(pfd[i].revents & POLLIN)) + continue; + + ret = recvfrom(sp->fd, NULL, 0, MSG_DONTWAIT, + (struct sockaddr *) &from, &alen); + gettimeofday(&now, NULL); + + if (ret < 0) { + if (errno != EAGAIN && + errno != EINTR) + report_packet(sp, &now, NULL, errno); + } else { + report_packet(sp, &now, &from.sin_addr, 0); + recv++; + } + } + } + + /* Program exit code: signal success if we received any response. 
*/ + return recv == 0; +} + +static void +report_packet(struct socket *sp, const struct timeval *now, + const struct in_addr *from_addr, int err) +{ + printf(" %3u:", sp->sent_id); + if (now) + printf(" %ld usec", usec_sub(now, &sp->sent_ts)); + if (from_addr && from_addr->s_addr != opt_dstaddr.s_addr) + printf(" (%s)", inet_ntoa(*from_addr)); + if (sp->nreplies) + printf(" DUP!"); + if (err) + printf(" ERROR: %s", strerror(err)); + printf("\n"); + + sp->nreplies++; +} + +static int +rds_socket(struct in_addr *src, struct in_addr *dst) +{ + struct sockaddr_in sin; + int fd; + + memset(&sin, 0, sizeof(sin)); + sin.sin_family = AF_INET; + + fd = socket(PF_RDS, SOCK_SEQPACKET, 0); + if (fd < 0) + die_errno("unable to create RDS socket"); + + /* Guess the local source addr if not given. */ + if (src->s_addr == 0) { + socklen_t alen; + int ufd; + + ufd = socket(PF_INET, SOCK_DGRAM, 0); + if (ufd < 0) + die_errno("unable to create UDP socket"); + sin.sin_addr = *dst; + sin.sin_port = htons(1); + if (connect(ufd, (struct sockaddr *) &sin, sizeof(sin)) < 0) + die_errno("unable to connect to %s", + inet_ntoa(*dst)); + + alen = sizeof(sin); + if (getsockname(ufd, (struct sockaddr *) &sin, &alen) < 0) + die_errno("getsockname failed"); + + *src = sin.sin_addr; + close(ufd); + } + + sin.sin_addr = *src; + sin.sin_port = 0; + + if (bind(fd, (struct sockaddr *) &sin, sizeof(sin))) + die_errno("bind() failed"); + + return fd; +} + +static void +usage(const char *complaint) +{ + fprintf(stderr, + "%s\nUsage: rds-ping [options] dst_addr\n" + "Options:\n" + " -c count limit packet count\n" + " -i interval interval between packets (default 1 sec)\n" + " -I interface source IP address\n", + complaint); + exit(1); +} + +static int +parse_timeval(const char *ptr, struct timeval *ret) +{ + double seconds; + char *endptr; + + seconds = strtod(ptr, &endptr); + if (!strcmp(endptr, "ms") + || !strcmp(endptr, "msec")) { + seconds *= 1e-3; + } else + if (!strcmp(endptr, "us") + || !strcmp(endptr, "usec")) { + seconds *= 1e-6; + } else if (*endptr) + 
return 0; + + ret->tv_sec = (long) seconds; + seconds -= ret->tv_sec; + + ret->tv_usec = (long) (seconds * 1e6); + return 1; +} + +static int +parse_long(const char *ptr, unsigned long *ret) +{ + unsigned long long val; + char *endptr; + + val = strtoull(ptr, &endptr, 0); + switch (*endptr) { + case 'k': case 'K': + val <<= 10; + endptr++; + break; + + case 'm': case 'M': + val <<= 20; + endptr++; + break; + + case 'g': case 'G': + val <<= 30; + endptr++; + break; + } + + if (*endptr) + return 0; + + *ret = val; + return 1; +} + +static int +parse_addr(const char *ptr, struct in_addr *ret) +{ + struct hostent *hent; + + hent = gethostbyname(ptr); + if (hent && + hent->h_addrtype == AF_INET && hent->h_length == sizeof(*ret)) { + memcpy(ret, hent->h_addr, sizeof(*ret)); + return 1; + } + + return 0; +} + +/* + * These are completely stupid. options.c should be removed. + */ +void print_usage(int durr) { } +void print_version() { } -- 1.5.4.rc3 -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From andrea at qumranet.com Thu Apr 24 08:39:43 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 24 Apr 2008 17:39:43 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080424095112.GC30298@sgi.com> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> Message-ID: <20080424153943.GJ24536@duo.random> On Thu, Apr 24, 2008 at 04:51:12AM -0500, Robin Holt wrote: > It seems to me the work done by mmu_notifier_mm_destroy should really > be done inside the mm_lock()/mm_unlock area of mmu_unregister and There's no mm_lock/unlock for mmu_unregister anymore. That's the whole point of using srcu so it becomes reliable and quick.
> mm_notifier_release when we have removed the last entry. That would > give the users job the same performance after they are done using the > special device that they had prior to its use. That's not feasible. Otherwise mmu_notifier_mm could go away at any time, both under _release from exit_mmap and under _unregister. exit_mmap holds an implicit mm_count, so freeing mmu_notifier_mm after the last mmdrop makes it safe. mmu_notifier_unregister also holds the mm_count because mm_count was pinned by mmu_notifier_register. That solves the issue with mmu_notifier_mm going away from under mmu_notifier_unregister and _release, and that's why it can only be freed after mm_count == 0. There's at least one small issue I noticed so far: while _release doesn't need to care about _register, _unregister definitely needs to care about _register. I have to take the mmap_sem in addition to, or in place of, the unregister_lock. The srcu_read_lock can also likely be moved just before releasing the unregister_lock, but that's just a minor optimization to make the code more strict. > On Thu, Apr 24, 2008 at 08:49:40AM +0200, Andrea Arcangeli wrote: > ... > > diff --git a/mm/memory.c b/mm/memory.c > > --- a/mm/memory.c > > +++ b/mm/memory.c > ... > > @@ -603,25 +605,39 @@ > > * readonly mappings. The tradeoff is that copy_page_range is more > > * efficient than faulting. > > */ > > + ret = 0; > > if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) { > > if (!vma->anon_vma) > > - return 0; > > + goto out; > > } > > > > - if (is_vm_hugetlb_page(vma)) > > - return copy_hugetlb_page_range(dst_mm, src_mm, vma); > > + if (unlikely(is_vm_hugetlb_page(vma))) { > > + ret = copy_hugetlb_page_range(dst_mm, src_mm, vma); > > + goto out; > > + } > > > > + if (is_cow_mapping(vma->vm_flags)) > > + mmu_notifier_invalidate_range_start(src_mm, addr, end); > > + > > + ret = 0; > > I don't think this is needed.
It's not needed right, but I thought it was cleaner if they all use "ret" after I had to change the code at the end of the function. Anyway I'll delete this to make the patch shorter and only change the minimum, agreed. > ... > > +/* avoid memory allocations for mm_unlock to prevent deadlock */ > > +void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data) > > +{ > > + if (mm->map_count) { > > + if (data->nr_anon_vma_locks) > > + mm_unlock_vfree(data->anon_vma_locks, > > + data->nr_anon_vma_locks); > > + if (data->i_mmap_locks) > > I think you really want data->nr_i_mmap_locks. Indeed. It never happens that there are zero vmas with filebacked mappings, this is why this couldn't be triggered in practice, thanks! > The second paragraph of this comment seems extraneous. ok removed. > > + /* > > + * Wait ->release if mmu_notifier_unregister run list_del_rcu. > > + * srcu can't go away from under us because one mm_count is > > + * hold by exit_mmap. > > + */ > > These two sentences don't make any sense to me. Well that was a short explanation of why the mmu_notifier_mm structure can only be freed after the last mmdrop, which is what you asked at the top. I'll try to rephrase. > > +void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm) > > +{ > > + int before_release = 0, srcu; > > + > > + BUG_ON(atomic_read(&mm->mm_count) <= 0); > > + > > + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); > > + spin_lock(&mm->mmu_notifier_mm->unregister_lock); > > + if (!hlist_unhashed(&mn->hlist)) { > > + hlist_del_rcu(&mn->hlist); > > + before_release = 1; > > + } > > + spin_unlock(&mm->mmu_notifier_mm->unregister_lock); > > + if (before_release) > > + /* > > + * exit_mmap will block in mmu_notifier_release to > > + * guarantee ->release is called before freeing the > > + * pages. > > + */ > > + mn->ops->release(mn, mm); > > I am not certain about the need to do the release callout when the driver > has already told this subsystem it is done. 
For XPMEM, this callout > would immediately return. I would expect it to be the same for GRU. The point is that you don't want to run it twice. And without this you will have to serialize against ->release yourself in the driver. It's much more convenient if you know that ->release will be called just once, and before mmu_notifier_unregister returns. It could be called by _release even after you're already inside _unregister; _release may reach the spinlock before _unregister, and you won't notice the difference. Solving this race in the driver looked too complex, so I'd rather solve it once inside the mmu notifier code to be sure. Missing a release event is fatal because all sptes have to be dropped before _release returns. The requirement is the same for _unregister: all sptes have to be dropped before it returns. ->release should be able to sleep as long as it wants even with only 1/N applied. exit_mmap can sleep too, no problem. You can't unregister inside ->release, first of all because the 'ret' instruction must still be allocated to return to the mmu notifier code. From ogerlitz at voltaire.com Thu Apr 24 08:53:48 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Thu, 24 Apr 2008 18:53:48 +0300 Subject: [ofa-general] Re: [PATCH 6/8]: RDS: Use IB for loopback In-Reply-To: <200804241112.19866.okir@lst.de> References: <200804241106.57172.okir@lst.de> <200804241111.26726.okir@lst.de> <200804241111.56693.okir@lst.de> <200804241112.19866.okir@lst.de> Message-ID: <4810AD0C.9000305@voltaire.com> Olaf Kirch wrote: > In order to be able to test RDS (and RDS over RDMA) faithfully > on standalone machines, we want loopback traffic to use the IB > transport if possible. Olaf, Beyond the details of this patch, one thing which should be on the table here is the IB RC LID matching rules, which state that when an RC QP is configured it is set with the source and destination LIDs, and if a packet is received whose LRH does not match these LIDs, it is dropped.
In other words, if during the test the standalone machine becomes attached to an IB fabric and LIDs are assigned to the ports by the SM, the loopback connection would get broken. Generally speaking, with RDS this should not be a big deal, since it will reconnect, but I just wanted to make sure we are all aware of this. Or From xavier at tddft.org Thu Apr 24 09:46:27 2008 From: xavier at tddft.org (Xavier Andrade) Date: Thu, 24 Apr 2008 18:46:27 +0200 (CEST) Subject: [ofa-general] Loading of ib_mthca fails In-Reply-To: References: Message-ID: Hi, On Wed, 23 Apr 2008, Roland Dreier wrote: > Hmm, not sure... let's see what the Mellanox guys say (they're mostly on > vacation this week so it might be a few days). > > The only things I can think of to try are: > - go to mellanox.com and get latest FW and make sure there's not > anything strange about what's on your card (but given that it is seen > by the driver, the FW must at least have a valid checksum I think) > I can't locate the correct firmware; the PSID reported by mstflint corresponds to an Intel one: Image type: Failsafe I.S. Version: 1 Chip Revision: A0 Description: Node Port1 Sys image GUIDs: 0002c9020022baa4 0002c9020022baa5 0002c9020022baa7 Board ID: (INT0010000001) VSD: PSID: INT0010000001 But I haven't been able to find any firmware on Intel's webpage. Do you think that I could use a Mellanox firmware? Which one? There are three different ones for the MT25204. > - if you're building your own kernel, try the Debian 2.6.24 generic > amd64 image and see if that's any different, because I definitely > have mt25204 HCAs working with that. > I tried with 2.6.18 (the default etch kernel) and it gave the same problem. Finally, I upgraded the BIOS of the machine (to version 85 from 79) and now the module loads without problems and everything works correctly. So it was probably a motherboard issue.
Regards, Xavier From weiny2 at llnl.gov Thu Apr 24 09:57:52 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 24 Apr 2008 09:57:52 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <1209000441.689.216.camel@hrosenstock-ws.xsigo.com> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <1208995514.689.210.camel@hrosenstock-ws.xsigo.com> <1209000441.689.216.camel@hrosenstock-ws.xsigo.com> Message-ID: <20080424095752.416d5d55.weiny2@llnl.gov> On Wed, 23 Apr 2008 18:27:21 -0700 Hal Rosenstock wrote: > On Wed, 2008-04-23 at 17:05 -0700, Hal Rosenstock wrote: > > On Wed, 2008-04-23 at 13:38 -0700, Ira Weiny wrote: > > > Hey all, > > > > > > > > Thoughts? > > > > Having OpenSM request client reregistration (used in other places by > > OpenSM) of such nodes will resolve this issue. As little or as much > > policy can be built into OpenSM in determining "such" nodes to scope > > down the application of this mechanism for this case. > > One side comment on the non OpenSM aspect of this: > > Why is the node temporarily unavailable ? There is a "contract" that the > node makes with the SM that it clearly isn't honoring. Is any > investigation going on relative to this aspect of the issue ? > Yes, we are working on finding the root cause. I agree that the "contract" is not being honored. This is one of the reasons I was hesitant to implement any fix to be submitted. I don't think this is truly a bug in the stack. However, I could see this causing issues for people[*] and it might be nice to have a "fix". Ira [*] Particularly those who do not have any other connection to nodes other than IB. 
From michael.heinz at qlogic.com Thu Apr 24 10:15:30 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Thu, 24 Apr 2008 12:15:30 -0500 Subject: [ofa-general] [PATCH 1/1] RPM Spec files Message-ID: Installation of OFED 1.3.0.0.4 onto a Kusu/OCS cluster does not fully succeed because of some missing dependencies in the RPM spec files. This is because Kusu installs nodes over a network by presenting a pool of RPMs to be installed and letting RPM figure out the order to install them in. Without the dependencies we ended up with oddities like the kernel drivers being installed before the /usr/bin directory had been populated, causing the install script to fail. I was able to work around this by manually expanding some of the source RPM files, altering the spec file and repackaging the source RPM. This allowed me to build binary RPMs (via the install script) that could be installed on a Kusu cluster. Here are the proposed changes. If there is a better/preferred way of submitting this suggestion, please let me know. 
--- ../../original/ib-bonding.spec 2008-04-22 12:54:12.000000000 -0400 +++ ib-bonding.spec 2008-04-22 12:43:07.000000000 -0400 @@ -20,6 +20,7 @@ Group : Applications/System License : GPL BuildRoot: %{_tmppath}/%{name}-%{version}-root +PreReq : coreutils %description This package provides a bonding device which is capable of enslaving --- ../../original/ofa_kernel.spec 2008-04-22 12:54:13.000000000 -0400 +++ ofa_kernel.spec 2008-04-22 12:45:40.000000000 -0400 @@ -111,6 +111,9 @@ BuildRequires: sysfsutils-devel %package -n kernel-ib +PreReq: coreutils +PreReq: kernel +PreReq: pciutils Version: %{_version} Release: %{krelver} Summary: Infiniband Driver and ULPs kernel modules @@ -119,6 +122,10 @@ Core, HW and ULPs kernel modules %package -n kernel-ib-devel +PreReq: coreutils +PreReq: kernel +PreReq: pciutils +Requires: kernel-ib Version: %{_version} Release: %{krelver} Summary: Infiniband Driver and ULPs kernel modules sources --- ../../original/open-iscsi-generic.spec 2008-04-22 12:54:13.000000000 -0400 +++ open-iscsi-generic.spec 2008-04-22 12:42:33.000000000 -0400 @@ -21,6 +21,7 @@ %define kversion $(uname -r | sed "s/-ppc64\|-smp//") %package -n iscsi-initiator-utils +PreReq: coreutils Summary : iSCSI daemon and utility programs Group : System Environment/Daemons %description -n iscsi-initiator-utils @@ -30,6 +31,7 @@ Protocol networks. 
%package -n open-iscsi +PreReq: coreutils Summary : Linux* Open-iSCSI Software Initiator Group : Productivity/Networking/Other %description -n open-iscsi -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania From andrea at qumranet.com Thu Apr 24 10:41:45 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Thu, 24 Apr 2008 19:41:45 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080424153943.GJ24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> Message-ID: <20080424174145.GM24536@duo.random> On Thu, Apr 24, 2008 at 05:39:43PM +0200, Andrea Arcangeli wrote: > There's at least one small issue I noticed so far: while _release > doesn't need to care about _register, _unregister definitely needs to > care about _register. I have to take the mmap_sem in addition to, or in In the end the best approach is to use the spinlock around those list_add/list_del calls; they all run in O(1) with the hlist and take only a few asm insns. This also avoids taking the mmap_sem in exit_mmap; at exit_mmap time nobody should need to use mmap_sem anymore. It might work, but this looks cleaner. The lock is dynamically allocated only when the notifiers are registered, so the few bytes taken by it aren't relevant. A full new update will soon become visible here: http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14-pre3/ Please have a close look again. Your help is extremely appreciated and very helpful as usual! Thanks a lot.
diff -urN xxx/include/linux/mmu_notifier.h xx/include/linux/mmu_notifier.h --- xxx/include/linux/mmu_notifier.h 2008-04-24 19:41:15.000000000 +0200 +++ xx/include/linux/mmu_notifier.h 2008-04-24 19:38:37.000000000 +0200 @@ -15,7 +15,7 @@ struct hlist_head list; struct srcu_struct srcu; /* to serialize mmu_notifier_unregister against mmu_notifier_release */ - spinlock_t unregister_lock; + spinlock_t lock; }; struct mmu_notifier_ops { diff -urN xxx/mm/memory.c xx/mm/memory.c --- xxx/mm/memory.c 2008-04-24 19:41:15.000000000 +0200 +++ xx/mm/memory.c 2008-04-24 19:38:37.000000000 +0200 @@ -605,16 +605,13 @@ * readonly mappings. The tradeoff is that copy_page_range is more * efficient than faulting. */ - ret = 0; if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) { if (!vma->anon_vma) - goto out; + return 0; } - if (unlikely(is_vm_hugetlb_page(vma))) { - ret = copy_hugetlb_page_range(dst_mm, src_mm, vma); - goto out; - } + if (is_vm_hugetlb_page(vma)) + return copy_hugetlb_page_range(dst_mm, src_mm, vma); if (is_cow_mapping(vma->vm_flags)) mmu_notifier_invalidate_range_start(src_mm, addr, end); @@ -636,7 +633,6 @@ if (is_cow_mapping(vma->vm_flags)) mmu_notifier_invalidate_range_end(src_mm, vma->vm_start, end); -out: return ret; } diff -urN xxx/mm/mmap.c xx/mm/mmap.c --- xxx/mm/mmap.c 2008-04-24 19:41:15.000000000 +0200 +++ xx/mm/mmap.c 2008-04-24 19:38:37.000000000 +0200 @@ -2381,7 +2381,7 @@ if (data->nr_anon_vma_locks) mm_unlock_vfree(data->anon_vma_locks, data->nr_anon_vma_locks); - if (data->i_mmap_locks) + if (data->nr_i_mmap_locks) mm_unlock_vfree(data->i_mmap_locks, data->nr_i_mmap_locks); } diff -urN xxx/mm/mmu_notifier.c xx/mm/mmu_notifier.c --- xxx/mm/mmu_notifier.c 2008-04-24 19:41:15.000000000 +0200 +++ xx/mm/mmu_notifier.c 2008-04-24 19:31:23.000000000 +0200 @@ -24,22 +24,16 @@ * zero). All other tasks of this mm already quit so they can't invoke * mmu notifiers anymore. 
This can run concurrently only against * mmu_notifier_unregister and it serializes against it with the - * unregister_lock in addition to RCU. struct mmu_notifier_mm can't go - * away from under us as the exit_mmap holds a mm_count pin itself. - * - * The ->release method can't allow the module to be unloaded, the - * module can only be unloaded after mmu_notifier_unregister run. This - * is because the release method has to run the ret instruction to - * return back here, and so it can't allow the ret instruction to be - * freed. + * mmu_notifier_mm->lock in addition to RCU. struct mmu_notifier_mm + * can't go away from under us as exit_mmap holds a mm_count pin + * itself. */ void __mmu_notifier_release(struct mm_struct *mm) { struct mmu_notifier *mn; int srcu; - srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); - spin_lock(&mm->mmu_notifier_mm->unregister_lock); + spin_lock(&mm->mmu_notifier_mm->lock); while (unlikely(!hlist_empty(&mm->mmu_notifier_mm->list))) { mn = hlist_entry(mm->mmu_notifier_mm->list.first, struct mmu_notifier, @@ -52,23 +46,28 @@ */ hlist_del_init(&mn->hlist); /* + * SRCU here will block mmu_notifier_unregister until + * ->release returns. + */ + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + spin_unlock(&mm->mmu_notifier_mm->lock); + /* * if ->release runs before mmu_notifier_unregister it * must be handled as it's the only way for the driver - * to flush all existing sptes before the pages in the - * mm are freed. + * to flush all existing sptes and stop the driver + * from establishing any more sptes before all the + * pages in the mm are freed. 
*/ - spin_unlock(&mm->mmu_notifier_mm->unregister_lock); - /* SRCU will block mmu_notifier_unregister */ mn->ops->release(mn, mm); - spin_lock(&mm->mmu_notifier_mm->unregister_lock); + srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); + spin_lock(&mm->mmu_notifier_mm->lock); } - spin_unlock(&mm->mmu_notifier_mm->unregister_lock); - srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); + spin_unlock(&mm->mmu_notifier_mm->lock); /* - * Wait ->release if mmu_notifier_unregister run list_del_rcu. - * srcu can't go away from under us because one mm_count is - * hold by exit_mmap. + * Wait ->release if mmu_notifier_unregister is running it. + * The mmu_notifier_mm can't go away from under us because one + * mm_count is hold by exit_mmap. */ synchronize_srcu(&mm->mmu_notifier_mm->srcu); } @@ -177,11 +176,19 @@ goto out_unlock; } INIT_HLIST_HEAD(&mm->mmu_notifier_mm->list); - spin_lock_init(&mm->mmu_notifier_mm->unregister_lock); + spin_lock_init(&mm->mmu_notifier_mm->lock); } atomic_inc(&mm->mm_count); + /* + * Serialize the update against mmu_notifier_unregister. A + * side note: mmu_notifier_release can't run concurrently with + * us because we hold the mm_users pin (either implicitly as + * current->mm or explicitly with get_task_mm() or similar). + */ + spin_lock(&mm->mmu_notifier_mm->lock); hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier_mm->list); + spin_unlock(&mm->mmu_notifier_mm->lock); out_unlock: mm_unlock(mm, &data); out: @@ -215,23 +222,32 @@ BUG_ON(atomic_read(&mm->mm_count) <= 0); - srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); - spin_lock(&mm->mmu_notifier_mm->unregister_lock); + spin_lock(&mm->mmu_notifier_mm->lock); if (!hlist_unhashed(&mn->hlist)) { hlist_del_rcu(&mn->hlist); before_release = 1; } - spin_unlock(&mm->mmu_notifier_mm->unregister_lock); if (before_release) /* + * SRCU here will force exit_mmap to wait ->release to finish + * before freeing the pages. 
+ */ + srcu = srcu_read_lock(&mm->mmu_notifier_mm->srcu); + spin_unlock(&mm->mmu_notifier_mm->lock); + if (before_release) { + /* * exit_mmap will block in mmu_notifier_release to * guarantee ->release is called before freeing the * pages. */ mn->ops->release(mn, mm); - srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); + srcu_read_unlock(&mm->mmu_notifier_mm->srcu, srcu); + } - /* wait any running method to finish, including ->release */ + /* + * Wait any running method to finish, of course including + * ->release if it was run by mmu_notifier_relase instead of us. + */ synchronize_srcu(&mm->mmu_notifier_mm->srcu); BUG_ON(atomic_read(&mm->mm_count) <= 0); From hrosenstock at xsigo.com Thu Apr 24 12:07:03 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 24 Apr 2008 12:07:03 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <20080424095752.416d5d55.weiny2@llnl.gov> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <1208995514.689.210.camel@hrosenstock-ws.xsigo.com> <1209000441.689.216.camel@hrosenstock-ws.xsigo.com> <20080424095752.416d5d55.weiny2@llnl.gov> Message-ID: <1209064023.689.249.camel@hrosenstock-ws.xsigo.com> On Thu, 2008-04-24 at 09:57 -0700, Ira Weiny wrote: > > One side comment on the non OpenSM aspect of this: > > > > Why is the node temporarily unavailable ? There is a "contract" that the > > node makes with the SM that it clearly isn't honoring. Is any > > investigation going on relative to this aspect of the issue ? > > > > Yes, we are working on finding the root cause. I agree that the "contract" is > not being honored. This is one of the reasons I was hesitant to implement any > fix to be submitted. I think the two issues can be tackled in parallel. > I don't think this is truly a bug in the stack. Any ideas on what it is ? If not, would you be willing to try something assuming the end node issue is easily reproducible ? 
> However, I could see this causing issues for people[*] and it might be nice to > have a "fix". Sure; both are issues which should be understood better and fixed IMO. -- Hal > Ira > > [*] Particularly those who do not have any other connection to nodes other than > IB. From Brian.Murrell at Sun.COM Thu Apr 24 12:28:00 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Thu, 24 Apr 2008 15:28:00 -0400 Subject: [ofa-general] kernel-ib on rhel5 Message-ID: <1209065280.18036.216.camel@pc.ilinx> I wonder, what is the strategy for kernel-ib to exist on a machine with the standard RHEL5 kernel installed. The standard RHEL5 kernel of course includes an OFED release and as such has modules of the same name as the OFED ones. I can see that by default, the ofa_kernel.spec installs its modules into /lib/modules/%{KVERSION}/updates, but how does that ensure that when a kernel module is loaded with modprobe, the one in /lib/modules/%{KVERSION}/updates will be preferred over the one in /lib/modules/%{KVERSION}/ (i.e. the one provided by the RHEL5 kernel RPM)? Thanx, b. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From swise at opengridcomputing.com Thu Apr 24 12:57:30 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 24 Apr 2008 14:57:30 -0500 Subject: [ofa-general] dapl bug? Message-ID: <4810E62A.6070807@opengridcomputing.com> Hey Arlin, Have you ever seen this? I hit this 100% of the time trying the 1.2 version of dapltest on an ofed-1.3 system. The debug info below was obtained by building the src rpm with debug enabled...
> (gdb) r -T T -d -s vic11-10g -D chelsio -i 10 client SR 256 server SR > 256 client SR 256 server SR 256 > Starting program: /usr/bin/dapltest -T T -d -s vic11-10g -D chelsio -i > 10 client SR 256 server SR 256 client SR 256 server SR 256 > [Thread debugging using libthread_db enabled] > [New Thread 46912498371600 (LWP 6654)] > ------------------------------------- > TransCmd.server_name : vic11-10g > TransCmd.num_iterations : 10 > TransCmd.num_threads : 1 > TransCmd.eps_per_thread : 1 > TransCmd.validate : 0 > TransCmd.dapl_name : chelsio > TransCmd.num_ops : 4 > TransCmd.op[0].transfer_type : SEND_RECV (client) > TransCmd.op[0].seg_size : 256 > TransCmd.op[0].num_segs : 1 > TransCmd.op[0].reap_send_on_recv : 0 > TransCmd.op[1].transfer_type : SEND_RECV (server) > TransCmd.op[1].seg_size : 256 > TransCmd.op[1].num_segs : 1 > TransCmd.op[1].reap_send_on_recv : 0 > TransCmd.op[2].transfer_type : SEND_RECV (client) > TransCmd.op[2].seg_size : 256 > TransCmd.op[2].num_segs : 1 > TransCmd.op[2].reap_send_on_recv : 0 > TransCmd.op[3].transfer_type : SEND_RECV (server) > TransCmd.op[3].seg_size : 256 > TransCmd.op[3].num_segs : 1 > TransCmd.op[3].reap_send_on_recv : 0 > Server Name: vic11-10g > > Program received signal SIGSEGV, Segmentation fault. 
> [Switching to Thread 46912498371600 (LWP 6654)] > 0x00000032f04760b0 in strlen () from /lib64/libc.so.6 > (gdb) bt > #0 0x00000032f04760b0 in strlen () from /lib64/libc.so.6 > #1 0x00000032f044602b in vfprintf () from /lib64/libc.so.6 > #2 0x00000032f044bdea in printf () from /lib64/libc.so.6 > #3 0x0000000000403900 in DT_NetAddrLookupHostAddress > (to_netaddr=0x7e16f88, hostname=0x7e1658c "vic11-10g") at > cmd/dapl_netaddr.c:136 > #4 0x00000000004026cb in DT_Params_Parse (argc=, > argv=, params_ptr=0x7e16580) at cmd/dapl_params.c:205 > #5 0x000000000040211f in dapltest (argc=22, argv=0x7fff48e9b5f8) at > cmd/dapl_main.c:88 > #6 0x00000032f041d8a4 in __libc_start_main () from /lib64/libc.so.6 > #7 0x0000000000401f59 in _start () > (gdb) It's hurling in DT_Mdep_printf() here: > 134 /* Pull out IP address and print it as a sanity check */ > 135 DT_Mdep_printf ("Server Name: %s \n", hostname); > 136 DT_Mdep_printf ("Server Net Address: %s\n", > 137 inet_ntoa(((struct sockaddr_in > *)target->ai_addr)->sin_addr)); The ai_addr looks ok though: > (gdb) p/x *((struct sockaddr_in *)target->ai_addr) > $3 = {sin_family = 0x2, sin_port = 0x0, sin_addr = {s_addr = > 0x8846a8c0}, sin_zero = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}} > (gdb) > Ever seen this? Steve. From Jeffrey.C.Becker at nasa.gov Thu Apr 24 13:00:24 2008 From: Jeffrey.C.Becker at nasa.gov (Jeff Becker) Date: Thu, 24 Apr 2008 13:00:24 -0700 Subject: [ofa-general] Re: FW: [ewg] SPAM emails In-Reply-To: <55CE0347B98FCA468923E5FBC25CB4DC036FCB37@orsmsx413.amr.corp.intel.com> References: <55CE0347B98FCA468923E5FBC25CB4DC036FCB37@orsmsx413.amr.corp.intel.com> Message-ID: <4810E6D8.5020708@nasa.gov> I did see it, and have also been unhappy about the recent increase in spam. We do run spamassassin and amavis (for virus checking) on the server. However, configuring these is an ongoing project, as the attackers figure out how to get around whatever rules we do have.
It's also somewhat hit and miss, e.g., sometimes, I'll clamp down the rules and then perfectly valid and important posts get blocked. I'm happy to look at this again, but it might be useful to inquire at John Companies if they can supply a hardware firewall such as Barracuda. I think that's what NASA uses, as I see these spams in my quarantine daily. -jeff Ryan, Jim wrote: > > Not sure you would have seen this. The amount of spam has increased > dramatically. I wasn’t aware of viruses, but will trust HB’s judgment > on that. I know I get called on frequently to block email that’s spam. > I get several such requests per day > > > > Thanks, Jim > > > > ------------------------------------------------------------------------ > > *From:* ewg-bounces at lists.openfabrics.org > [mailto:ewg-bounces at lists.openfabrics.org] *On Behalf Of *Head Bubba > *Sent:* Thursday, April 24, 2008 10:37 AM > *To:* ewg at lists.openfabrics.org > *Subject:* [ewg] SPAM emails > > > > can we use a SPAM filter (two of the SPAMs, so far, had links to a > known virus that our internal email filtering caught, and 1 last week > was a virus that our internal email filtering also caught)- now that > virus are starting to be sent, can something be done before some one > puts in a legitimate subject line and sends a virus ? > > > > h.b. 
> > > > ============================================================================== > Please access the attached hyperlink for an important electronic communications disclaimer: > > http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html > ============================================================================== From PHF at zurich.ibm.com Thu Apr 24 13:05:02 2008 From: PHF at zurich.ibm.com (Philip Frey1) Date: Thu, 24 Apr 2008 22:05:02 +0200 Subject: [ofa-general] AE kernel messages decrypted (Chelsio RNIC T3) Message-ID: Hi, I sometimes see async events reported through /var/log/messages of the form post_qp_event - AE qpid 0x240 opcode 0 status 0x6 type 0 wrid.hi 0xff650000 wrid.lo 0x0 and the like. I am now looking for a more meaningful explanation of what is going wrong. After some grepping I ended up in ofa_kernel-1.3/drivers/infiniband/hw/cxgb3/iwch_ev.c where this message is written to the log. Since there is still no explanation of what status 0x6 means, I continued my search and found cxio_wr.h. Can you point out to me which enums are printed here? Talking about async events: What would be the recommended way of surfacing those AEs at the user application? Many thanks and best regards, Philip -------------- next part -------------- An HTML attachment was scrubbed... URL: From arlin.r.davis at intel.com Thu Apr 24 13:21:48 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Thu, 24 Apr 2008 13:21:48 -0700 Subject: [ofa-general] RE: dapl bug [PATCH] dapltest: include definitions for inet_ntoa. In-Reply-To: <4810E62A.6070807@opengridcomputing.com> References: <4810E62A.6070807@opengridcomputing.com> Message-ID: Steve, Sorry, this was fixed in the v2.0 library but apparently it didn't get pushed back to v1.2. dapltest: include definitions for inet_ntoa. At load time the symbol was resolved, but with the default return type of int instead of char*, it caused a segfault. Add the correct include files in dapl_mdep_user.h for linux.
Signed-off-by: Arlin Davis
---
 test/dapltest/mdep/linux/dapl_mdep_user.h |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/test/dapltest/mdep/linux/dapl_mdep_user.h b/test/dapltest/mdep/linux/dapl_mdep_user.h
index 7fadbea..16170a7 100755
--- a/test/dapltest/mdep/linux/dapl_mdep_user.h
+++ b/test/dapltest/mdep/linux/dapl_mdep_user.h
@@ -43,6 +43,11 @@
 #include
 #include

+/* inet_ntoa */
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+
 /* Default Device Name */
 #define DT_MdepDeviceName "OpenIB-cma"
--
1.5.2.5

>-----Original Message----- >From: Steve Wise [mailto:swise at opengridcomputing.com] >Sent: Thursday, April 24, 2008 12:58 PM >To: Arlin Davis >Cc: OpenFabrics General >Subject: dapl bug? > >Hey Arlin, > >Have you ever seen this? I hit this 100% of the time trying the 1.2 >version of dapltest on an ofed-1.3 system. The debug info below was >obtained by builting the src rpm with debug enabled... >> (gdb) r -T T -d -s vic11-10g -D chelsio -i 10 client SR 256 >server SR >> 256 client SR 256 server SR 256 >> Starting program: /usr/bin/dapltest -T T -d -s vic11-10g -D >chelsio -i >> 10 client SR 256 server SR 256 client SR 256 server SR 256 >> [Thread debugging using libthread_db enabled] >> [New Thread 46912498371600 (LWP 6654)] >> ------------------------------------- >> TransCmd.server_name : vic11-10g >> TransCmd.num_iterations : 10 >> TransCmd.num_threads : 1 >> TransCmd.eps_per_thread : 1 >> TransCmd.validate : 0 >> TransCmd.dapl_name : chelsio >> TransCmd.num_ops : 4 >> TransCmd.op[0].transfer_type : SEND_RECV (client) >> TransCmd.op[0].seg_size : 256 >> TransCmd.op[0].num_segs : 1 >> TransCmd.op[0].reap_send_on_recv : 0 >> TransCmd.op[1].transfer_type : SEND_RECV (server) >> TransCmd.op[1].seg_size : 256 >> TransCmd.op[1].num_segs : 1 >> TransCmd.op[1].reap_send_on_recv : 0 >> TransCmd.op[2].transfer_type : SEND_RECV (client) >> TransCmd.op[2].seg_size : 256 >> TransCmd.op[2].num_segs : 1 >> TransCmd.op[2].reap_send_on_recv : 0 >>
TransCmd.op[3].transfer_type : SEND_RECV (server) >> TransCmd.op[3].seg_size : 256 >> TransCmd.op[3].num_segs : 1 >> TransCmd.op[3].reap_send_on_recv : 0 >> Server Name: vic11-10g >> >> Program received signal SIGSEGV, Segmentation fault. >> [Switching to Thread 46912498371600 (LWP 6654)] >> 0x00000032f04760b0 in strlen () from /lib64/libc.so.6 >> (gdb) bt >> #0 0x00000032f04760b0 in strlen () from /lib64/libc.so.6 >> #1 0x00000032f044602b in vfprintf () from /lib64/libc.so.6 >> #2 0x00000032f044bdea in printf () from /lib64/libc.so.6 >> #3 0x0000000000403900 in DT_NetAddrLookupHostAddress >> (to_netaddr=0x7e16f88, hostname=0x7e1658c "vic11-10g") at >> cmd/dapl_netaddr.c:136 >> #4 0x00000000004026cb in DT_Params_Parse (argc=optimized out>, >> argv=, params_ptr=0x7e16580) at >cmd/dapl_params.c:205 >> #5 0x000000000040211f in dapltest (argc=22, argv=0x7fff48e9b5f8) at >> cmd/dapl_main.c:88 >> #6 0x00000032f041d8a4 in __libc_start_main () from /lib64/libc.so.6 >> #7 0x0000000000401f59 in _start () >> (gdb) > >Its hurling in DT_Mdep_printf() here: > >> 134 /* Pull out IP address and print it as a sanity check */ >> 135 DT_Mdep_printf ("Server Name: %s \n", hostname); >> 136 DT_Mdep_printf ("Server Net Address: %s\n", >> 137 inet_ntoa(((struct sockaddr_in >> *)target->ai_addr)->sin_addr)); > >The ai_addr looks ok though: >> (gdb) p/x *((struct sockaddr_in *)target->ai_addr) >> $3 = {sin_family = 0x2, sin_port = 0x0, sin_addr = {s_addr = >> 0x8846a8c0}, sin_zero = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}} >> (gdb) >> > >Ever seen this? > >Steve. > From hrosenstock at xsigo.com Thu Apr 24 14:02:59 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Thu, 24 Apr 2008 14:02:59 -0700 Subject: [ofa-general] kernel-ib on rhel5 In-Reply-To: <1209065280.18036.216.camel@pc.ilinx> References: <1209065280.18036.216.camel@pc.ilinx> Message-ID: <1209070979.689.258.camel@hrosenstock-ws.xsigo.com> On Thu, 2008-04-24 at 15:28 -0400, Brian J. 
Murrell wrote: > I wonder, what is the strategy for kernel-ib to exist on a machine with > the standard RHEL5 kernel installed. The standard RHEL5 kernel of > course includes an OFED release and as such modules of the same name as > the OFED ones. > > I can see that by default, the ofa_kernel.spec installs its modules > into /lib/modules/%{KVERSION}/updates but how does that ensure that when > a kernel module is loaded with modprobe that the one > in /lib/modules/%{KVERSION}/updates will be preferred over the one in > /lib/modules/%{KVERSION}/ (i.e. provided by the RHEL5 kernel RPM)? module-init-tools and modutils have supported this precedence for some time now. For modutils, see: https://rhn.redhat.com/errata/RHBA-2003-327.html -- Hal > Thanx, > b. > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From weiny2 at llnl.gov Thu Apr 24 14:31:25 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 24 Apr 2008 14:31:25 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <48109087.6030606@voltaire.com> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <48109087.6030606@voltaire.com> Message-ID: <20080424143125.2aad1db8.weiny2@llnl.gov> On Thu, 24 Apr 2008 16:52:07 +0300 Or Gerlitz wrote: > Ira Weiny wrote: > > The symptom is that nodes drop out of the IPoIB mcast group after a node > > temporarily goes catatonic. The details are: > > > > 1) Issues on a node cause a soft lockup of the node. > > 2) OpenSM does a normal light sweep. > > 3) MADs to the node time out since the node is in a "bad state" > > 4) OpenSM marks the node down and drops it from internal tables, including > > mcast groups. > > 5) Node recovers from soft lock up condition.
> > 6) A subsequent sweep causes OpenSM see the node and add it back to the > > fabric. > As Hal noted, client reregister is the way to go. > > In a similar discussion in the past the conclusion was that the SM > should (maybe even according to the spec, but according to common sense > is fine as well, I think) set the re-register bit where in that case > IPoIB rejoins and we are done. At the time, I understood that openSM > would do so > (http://lists.openfabrics.org/pipermail/general/2007-September/041237.html), > am I wrong, or maybe the case brought on that thread (switch/port going > down and a whole sub fabric is removed from the SM point of view where > the links remain up from the view point of the nodes) was different? the > basic point is a case where a node link is UP and the SM lost this node > for some time and now sees it again. We used to call it "the > active/active" transition and an SM maybe need special logic for it. >

I have set up the following as a test situation:

              switch B
             /        \  (link X)
     switch A          switch C
        /               /     \
    Node1           node2    node3 (SM)

When I down link X and re-enable it, node 2 and 3 do _not_ rejoin the mcast group. Debug output from OpenSM indicates it is setting the rereg bit but I don't see the rejoin in the debug output from the node 2's IPoIB mcast layer. Perhaps there is a bug to be squashed here? Just in case anyone is curious, this is with OFED 1.2.5 on a RHEL 5.1 based kernel, and OpenSM 3.2.1-8341058-dirty. I am in the process of tracking this down, Ira From swise at opengridcomputing.com Thu Apr 24 14:32:45 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Thu, 24 Apr 2008 16:32:45 -0500 Subject: [ofa-general] Re: dapl bug [PATCH] dapltest: include definitions for inet_ntoa. In-Reply-To: References: <4810E62A.6070807@opengridcomputing.com> Message-ID: <4810FC7D.7040005@opengridcomputing.com> Davis, Arlin R wrote: > Steve, > > Sorry, this was fixed in v2.0 library but apparently it didn't get > pushed back v1.2. > > No worries.
Glad you already had seen it. But should I really be using dapltest on 1.2? IE is it used by folks to regression test udapl and their provider? Steve. From weiny2 at llnl.gov Thu Apr 24 14:35:55 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 24 Apr 2008 14:35:55 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <1209064023.689.249.camel@hrosenstock-ws.xsigo.com> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <1208995514.689.210.camel@hrosenstock-ws.xsigo.com> <1209000441.689.216.camel@hrosenstock-ws.xsigo.com> <20080424095752.416d5d55.weiny2@llnl.gov> <1209064023.689.249.camel@hrosenstock-ws.xsigo.com> Message-ID: <20080424143555.1daf93fe.weiny2@llnl.gov> On Thu, 24 Apr 2008 12:07:03 -0700 Hal Rosenstock wrote: > On Thu, 2008-04-24 at 09:57 -0700, Ira Weiny wrote: > > > > One side comment on the non OpenSM aspect of this: > > > > > > Why is the node temporarily unavailable ? There is a "contract" that the > > > node makes with the SM that it clearly isn't honoring. Is any > > > investigation going on relative to this aspect of the issue ? > > > > > > > Yes, we are working on finding the root cause. I agree that the "contract" is > > not being honored. This is one of the reasons I was hesitant to implement any > > fix to be submitted. > > I think the two issues can be tackled in parallel. > > > I don't think this is truly a bug in the stack. > > Any ideas on what it is ? If not, would you be willing to try something > assuming the end node issue is easily reproducible ? The root cause is something to do with a users job causing this "soft lockup" in the kernel. We believe sometimes they will run the node (diskless/no swap) out of memory. Under the OOM condition I don't think the node can be trusted. Unfortunately, this is another case where we can't seem to reproduce the issue without the users job. 
:-( As per a previous email I was excited about Or mentioning perhaps another way to simulate this condition on the IB side. I have set that up and see some issues there. I will see what I can find. > > > However, I could see this causing issues for people[*] and it might be nice to > > have a "fix". > > Sure; both are issues which should be understood better and fixed IMO. Agreed, I have spoken with our other developer and he is still trying to get a reproducer. Ira From arlin.r.davis at intel.com Thu Apr 24 15:22:25 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Thu, 24 Apr 2008 15:22:25 -0700 Subject: [ofa-general] RE: dapl bug [PATCH] dapltest: include definitions for inet_ntoa. In-Reply-To: <4810FC7D.7040005@opengridcomputing.com> References: <4810E62A.6070807@opengridcomputing.com> <4810FC7D.7040005@opengridcomputing.com> Message-ID: >But should I really be using dapltest on 1.2? IE is it used >by folks to >regression test udapl and their provider? Most MPI vendors are still running on top of uDAPL 1.2 so you should continue to regression test using dapltest 1.2 for now. From or.gerlitz at gmail.com Thu Apr 24 15:23:56 2008 From: or.gerlitz at gmail.com (Or Gerlitz) Date: Fri, 25 Apr 2008 01:23:56 +0300 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <20080424143125.2aad1db8.weiny2@llnl.gov> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <48109087.6030606@voltaire.com> <20080424143125.2aad1db8.weiny2@llnl.gov> Message-ID: <15ddcffd0804241523p19559580vc3a1293c1fe097b1@mail.gmail.com> On 4/25/08, Ira Weiny wrote: > > When I down link X and re-enable it node 2 and 3 do _not_ rejoin the mcast > group. bad! Just in case anyone is curious, this is with OFED 1.2.5 on a RHEL 5.1 based > kernel, and OpenSM 3.2.1-8341058-dirty. and what is the hca device and fw version at the nodes? maybe you send the list ipoib (debug_level=1 && multicast_debug_level=1) debug output? Or. 
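For anyone reproducing this, the knobs Or refers to are ib_ipoib module parameters. A sketch of turning them on (parameter names taken verbatim from his mail; exact spelling and availability depend on the OFED build, and they only exist when IPoIB was compiled with its debug config option, so check `modinfo ib_ipoib` first):

```shell
# reload IPoIB with verbose debug and multicast debug enabled
modprobe -r ib_ipoib
modprobe ib_ipoib debug_level=1 multicast_debug_level=1

# then watch the mcast join/leave activity while the link is bounced
tail -f /var/log/messages | grep -i mcast
```

This is a config fragment, not a script to run blindly: unloading ib_ipoib drops the IPoIB interfaces, so do it from the console or an Ethernet-side session.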
-------------- next part -------------- An HTML attachment was scrubbed... URL: From weiny2 at llnl.gov Thu Apr 24 18:16:57 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Thu, 24 Apr 2008 18:16:57 -0700 Subject: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.) In-Reply-To: <15ddcffd0804241523p19559580vc3a1293c1fe097b1@mail.gmail.com> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <48109087.6030606@voltaire.com> <20080424143125.2aad1db8.weiny2@llnl.gov> <15ddcffd0804241523p19559580vc3a1293c1fe097b1@mail.gmail.com> Message-ID: <20080424181657.28d58a29.weiny2@llnl.gov> On Fri, 25 Apr 2008 01:23:56 +0300 "Or Gerlitz" wrote: > On 4/25/08, Ira Weiny wrote: > > > > When I down link X and re-enable it node 2 and 3 do _not_ rejoin the mcast > > group. > > > bad! > > Just in case anyone is curious, this is with OFED 1.2.5 on a RHEL 5.1 based > > kernel, and OpenSM 3.2.1-8341058-dirty. > > > and what is the hca device and fw version at the nodes? maybe you send the > list ipoib (debug_level=1 && multicast_debug_level=1) debug output? > I did not get any output with multicast_debug_level! But I added some more debugging and finally realized that the set was not being sent. :-( I put a debug statement in OpenSM where the flag was set and therefore thought that OpenSM had set the rereg bit. However, since no other data had changed the "set" MAD was not sent. (I am getting a bit tongue tied reading this back. I hope that all makes sense.) Here is a patch which fixes the problem. (At least with the partial sub-nets configuration I explained before.) I will have to verify this fixes the problem I originally reported. Ira >From 2e5511d6daf9c586c39698416e4bd36e24b13e62 Mon Sep 17 00:00:00 2001 From: Ira K. 
Weiny Date: Thu, 24 Apr 2008 18:05:01 -0700 Subject: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit

Signed-off-by: Ira K. Weiny
---
 opensm/opensm/osm_lid_mgr.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/opensm/opensm/osm_lid_mgr.c b/opensm/opensm/osm_lid_mgr.c
index ab23929..4d628d2 100644
--- a/opensm/opensm/osm_lid_mgr.c
+++ b/opensm/opensm/osm_lid_mgr.c
@@ -1099,9 +1099,14 @@ __osm_lid_mgr_set_physp_pi(IN osm_lid_mgr_t * const p_mgr,
 	if ((p_mgr->p_subn->first_time_master_sweep == TRUE || p_port->is_new) &&
 	    !p_mgr->p_subn->opt.no_clients_rereg &&
 	    ((p_old_pi->capability_mask & IB_PORT_CAP_HAS_CLIENT_REREG) !=
-	     0))
+	     0)) {
+		OSM_LOG(p_mgr->p_log, OSM_LOG_DEBUG,
+			"Setting client rereg on %s, port %d\n",
+			p_port->p_node->print_desc,
+			p_port->p_physp->port_num);
 		ib_port_info_set_client_rereg(p_pi, 1);
-	else
+		send_set = TRUE;
+	} else
 		ib_port_info_set_client_rereg(p_pi, 0);

 	/* We need to send the PortInfo Set request with the new sm_lid
--
1.5.1

From andrea at qumranet.com Fri Apr 25 09:56:40 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Fri, 25 Apr 2008 18:56:40 +0200 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <200804221506.26226.rusty@rustcorp.com.au> References: <200804221506.26226.rusty@rustcorp.com.au> Message-ID: <20080425165639.GA23300@duo.random> I somehow missed this email in my inbox, found it now because it was strangely still unread... Sorry for the late reply!
On Tue, Apr 22, 2008 at 03:06:24PM +1000, Rusty Russell wrote: > On Wednesday 09 April 2008 01:44:04 Andrea Arcangeli wrote: > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -1050,6 +1050,15 @@ > > unsigned long addr, unsigned long len, > > unsigned long flags, struct page **pages); > > > > +struct mm_lock_data { > > + spinlock_t **i_mmap_locks; > > + spinlock_t **anon_vma_locks; > > + unsigned long nr_i_mmap_locks; > > + unsigned long nr_anon_vma_locks; > > +}; > > +extern struct mm_lock_data *mm_lock(struct mm_struct * mm); > > +extern void mm_unlock(struct mm_struct *mm, struct mm_lock_data *data); > > As far as I can tell you don't actually need to expose this struct at all? Yes, it should be possible to only expose 'struct mm_lock_data;'. > > + data->i_mmap_locks = vmalloc(nr_i_mmap_locks * > > + sizeof(spinlock_t)); > > This is why non-typesafe allocators suck. You want 'sizeof(spinlock_t *)' > here. > > > + data->anon_vma_locks = vmalloc(nr_anon_vma_locks * > > + sizeof(spinlock_t)); > > and here. Great catch! (it was temporarily wasting some ram which isn't nice at all) > > + err = -EINTR; > > + i_mmap_lock_last = NULL; > > + nr_i_mmap_locks = 0; > > + for (;;) { > > + spinlock_t *i_mmap_lock = (spinlock_t *) -1UL; > > + for (vma = mm->mmap; vma; vma = vma->vm_next) { > ... > > + data->i_mmap_locks[nr_i_mmap_locks++] = i_mmap_lock; > > + } > > + data->nr_i_mmap_locks = nr_i_mmap_locks; > > How about you track your running counter in data->nr_i_mmap_locks, leave > nr_i_mmap_locks alone, and BUG_ON(data->nr_i_mmap_locks != nr_i_mmap_locks)? > > Even nicer would be to wrap this in a "get_sorted_mmap_locks()" function. I'll try to clean this up further and I'll make a further update for review. > Unfortunately, I just don't think we can fail locking like this. In your next > patch unregistering a notifier can fail because of it: that not usable. 
Fortunately I figured out we don't really need mm_lock in unregister, because it's ok to unregister in the middle of the range_begin/end critical section (that's definitely not ok for register; that's why register needs mm_lock). And it's perfectly ok to fail in register(). Also it wasn't ok to unpin the module count in ->release, as ->release needs to 'ret' to get back to the mmu notifier code. And without any unregister at all, the module can't be unloaded at all, which is quite unacceptable...

The logic is to prevent mmu_notifier_register from racing with mmu_notifier_release, because register takes the mm_users pin (implicit or explicit, with mmput just after mmu_notifier_register returns). Then _register serializes against all the mmu notifier methods (except ->release) with srcu (->release can't run thanks to the mm_users pin). The mmu_notifier_mm->lock then serializes the modification on the list (register vs unregister) and it ensures that one and only one of _unregister and _release calls ->release before _unregister returns. All other methods run freely with srcu.

Having the guarantee that ->release is called just before all pages are freed, or inside _unregister, allows the module to zap and freeze its secondary mmu inside ->release, with the race condition of exit() against mmu_notifier_unregister handled internally by the mmu notifier code and without dependency on exit_files/exit_mm ordering, regardless of whether the fd of the driver is open in the filetables or in the vma only. The mmu_notifier_mm can be reset to 0 only after the last mmdrop.

About the mm_count refcounting for _release and _unregister: no mmu notifier method, not even mmu_notifier_unregister and _release, can cope with the mmu_notifier_mm list and srcu structures going away out of order. exit_mmap is safe as it holds an mm_count implicitly, because mmdrop is run after exit_mmap returns. mmu_notifier_unregister is safe too, as _register takes the mm_count pin.
We can't prevent mmu_notifier_mm from going away by holding mm_users, as that would screw up the vma file-descriptor closure that only happens inside exit_mmap (a pinned mm_users prevents exit_mmap from running, and it can only be taken temporarily until _register returns). From andrea at qumranet.com Fri Apr 25 10:04:25 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Fri, 25 Apr 2008 19:04:25 +0200 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080425165639.GA23300@duo.random> References: <200804221506.26226.rusty@rustcorp.com.au> <20080425165639.GA23300@duo.random> Message-ID: <20080425170425.GB23300@duo.random> On Fri, Apr 25, 2008 at 06:56:39PM +0200, Andrea Arcangeli wrote: > > > + data->i_mmap_locks = vmalloc(nr_i_mmap_locks * > > > + sizeof(spinlock_t)); > > > > This is why non-typesafe allocators suck. You want 'sizeof(spinlock_t *)' > > here. > > > > > + data->anon_vma_locks = vmalloc(nr_anon_vma_locks * > > > + sizeof(spinlock_t)); > > > > and here. > > Great catch! (it was temporarily wasting some ram which isn't nice at all) As I went into the editor I just found the above already fixed in #v14-pre3. And I can't move the structure into the file anymore without kmallocing it. Exposing that structure avoids the ERR_PTR/PTR_ERR on the retvals and one kmalloc so I think it makes the code simpler in the end to keep it as it is now. I'd rather avoid further changes to the 1/N patch, as long as they don't make any difference at runtime and as long as they involve more than cut-and-pasting a structure from .h to .c file.
From holt at sgi.com Fri Apr 25 12:25:32 2008 From: holt at sgi.com (Robin Holt) Date: Fri, 25 Apr 2008 14:25:32 -0500 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080425165639.GA23300@duo.random> References: <200804221506.26226.rusty@rustcorp.com.au> <20080425165639.GA23300@duo.random> Message-ID: <20080425192532.GA19717@sgi.com> On Fri, Apr 25, 2008 at 06:56:40PM +0200, Andrea Arcangeli wrote: > Fortunately I figured out we don't really need mm_lock in unregister > because it's ok to unregister in the middle of the range_begin/end > critical section (that's definitely not ok for register that's why > register needs mm_lock). And it's perfectly ok to fail in register(). I think you still need mm_lock (unless I miss something). What happens when one callout is scanning mmu_notifier_invalidate_range_start() and you unlink? That replaces the list next pointer with LIST_POISON1, which is a really bad address for the processor to track. Maybe I misunderstood your description. Thanks, Robin From rdreier at cisco.com Fri Apr 25 14:30:33 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 25 Apr 2008 14:30:33 -0700 Subject: [ofa-general][PATCH 2/12 v1] mlx4: HW queues resource management In-Reply-To: <480F4D7F.8000707@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 23 Apr 2008 17:53:51 +0300") References: <480F4D7F.8000707@mellanox.co.il> Message-ID: thanks, applied. From rdreier at cisco.com Fri Apr 25 14:33:28 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 25 Apr 2008 14:33:28 -0700 Subject: [ofa-general] Re: [PATCH 2/8]: mthca/mlx4: avoid recycling old FMR R_Keys too soon In-Reply-To: <200804241109.52448.okir@lst.de> (Olaf Kirch's message of "Thu, 24 Apr 2008 11:09:51 +0200") References: <200804241106.57172.okir@lst.de> <200804241108.58748.okir@lst.de> <200804241109.52448.okir@lst.de> Message-ID: Looks mostly OK...
the only thing I worry about is in the Sinai optimization case, do we run into trouble with bits getting carried into the top bits of the key? Can someone from Mellanox review this more carefully? - R. From rdreier at cisco.com Fri Apr 25 14:53:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Fri, 25 Apr 2008 14:53:23 -0700 Subject: [ofa-general][PATCH 12/12 v1] mlx4: QP to ready In-Reply-To: <480F519D.6060101@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 23 Apr 2008 18:11:25 +0300") References: <480F519D.6060101@mellanox.co.il> Message-ID: thanks, applied From andrea at qumranet.com Fri Apr 25 17:57:26 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Sat, 26 Apr 2008 02:57:26 +0200 Subject: [ofa-general] Re: [PATCH 1 of 9] Lock the entire mm to prevent any mmu related operation to happen In-Reply-To: <20080425192532.GA19717@sgi.com> References: <200804221506.26226.rusty@rustcorp.com.au> <20080425165639.GA23300@duo.random> <20080425192532.GA19717@sgi.com> Message-ID: <20080426005726.GA9514@duo.random> On Fri, Apr 25, 2008 at 02:25:32PM -0500, Robin Holt wrote: > I think you still need mm_lock (unless I miss something). What happens > when one callout is scanning mmu_notifier_invalidate_range_start() and > you unlink. That list next pointer with LIST_POISON1 which is a really > bad address for the processor to track. Ok, _release list_del_init can't race with that because it happens in exit_mmap when no other mmu notifier can trigger anymore. _unregister can run concurrently but it does list_del_rcu, which only overwrites the pprev pointer with LIST_POISON2. The mmu_notifier_invalidate_range_start won't crash on LIST_POISON1 thanks to srcu. Actually I did more changes than necessary, for example I noticed that mmu_notifier_register can use a list_add_head instead of list_add_head_rcu. _register can't race against _release thanks to the mm_users temporary or implicit pin.
_register can't race against _unregister thanks to the mmu_notifier_mm->lock. And register can't race against all other mmu notifiers thanks to the mm_lock. At this time I've no other pending patches on top of v14-pre3 other than the below micro-optimizing cleanup. It'd be great to have confirmation that v14-pre3 passes GRU/XPMEM regression tests as well as my KVM testing already passed successfully on it. I'll forward v14-pre3 mmu-notifier-core plus the below to Andrew tomorrow, I'm trying to be optimistic here! ;)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -187,7 +187,7 @@ int mmu_notifier_register(struct mmu_not
 	 * current->mm or explicitly with get_task_mm() or similar).
 	 */
 	spin_lock(&mm->mmu_notifier_mm->lock);
-	hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier_mm->list);
+	hlist_add_head(&mn->hlist, &mm->mmu_notifier_mm->list);
 	spin_unlock(&mm->mmu_notifier_mm->lock);
 out_unlock:
 	mm_unlock(mm, &data);
From holt at sgi.com Sat Apr 26 06:17:34 2008 From: holt at sgi.com (Robin Holt) Date: Sat, 26 Apr 2008 08:17:34 -0500 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080424174145.GM24536@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> Message-ID: <20080426131734.GB19717@sgi.com> On Thu, Apr 24, 2008 at 07:41:45PM +0200, Andrea Arcangeli wrote: > A full new update will some become visible here: > > http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14-pre3/ I grabbed these and built them. Only change needed was another include. After that, everything built fine and xpmem regression tests ran through the first four sets. The fifth is the oversubscription test which trips my xpmem bug. This is as good as the v12 runs from before. Since this include and the one for mm_types.h both are build breakages for ia64, I think you need to apply your ia64_cpumask and the following (possibly as a single patch) first or in your patch 1. Without that, ia64 doing a git-bisect could hit a build failure.
Index: mmu_v14_pre3_xpmem_v003_v1/include/linux/srcu.h =================================================================== --- mmu_v14_pre3_xpmem_v003_v1.orig/include/linux/srcu.h 2008-04-26 06:41:54.000000000 -0500 +++ mmu_v14_pre3_xpmem_v003_v1/include/linux/srcu.h 2008-04-26 07:01:17.292071827 -0500 @@ -27,6 +27,8 @@ #ifndef _LINUX_SRCU_H #define _LINUX_SRCU_H +#include + struct srcu_struct_array { int c[2]; }; From andrea at qumranet.com Sat Apr 26 07:04:06 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Sat, 26 Apr 2008 16:04:06 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080426131734.GB19717@sgi.com> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> Message-ID: <20080426140406.GH9514@duo.random> On Sat, Apr 26, 2008 at 08:17:34AM -0500, Robin Holt wrote: > Since this include and the one for mm_types.h both are build breakages > for ia64, I think you need to apply your ia64_cpumask and the following > (possibly as a single patch) first or in your patch 1. Without that, > ia64 doing a git-bisect could hit a build failure. Agreed, so it doesn't risk to break ia64 compilation, thanks for the great XPMEM feedback! Also note, I figured out that mmu_notifier_release can actually run concurrently against other mmu notifiers in case there's a vmtruncate (->release could already run concurrently if invoked by _unregister, the only guarantee is that ->release will be called one time and only one time and that no mmu notifier will ever run after _unregister returns). In short I can't keep the list_del_init in _release and I need a list_del_init_rcu instead to fix this minor issue. 
So this won't really make much difference after all. I'll release #v14 with all this after a bit of kvm testing with it... diff --git a/include/linux/list.h b/include/linux/list.h --- a/include/linux/list.h +++ b/include/linux/list.h @@ -755,6 +755,14 @@ static inline void hlist_del_init(struct } } +static inline void hlist_del_init_rcu(struct hlist_node *n) +{ + if (!hlist_unhashed(n)) { + __hlist_del(n); + n->pprev = NULL; + } +} + /** * hlist_replace_rcu - replace old entry by new one * @old : the element to be replaced diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -22,7 +22,10 @@ struct mmu_notifier_ops { /* * Called either by mmu_notifier_unregister or when the mm is * being destroyed by exit_mmap, always before all pages are - * freed. It's mandatory to implement this method. + * freed. It's mandatory to implement this method. This can + * run concurrently to other mmu notifier methods and it + * should teardown all secondary mmu mappings and freeze the + * secondary mmu. */ void (*release)(struct mmu_notifier *mn, struct mm_struct *mm); diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c --- a/mm/mmu_notifier.c +++ b/mm/mmu_notifier.c @@ -19,12 +19,13 @@ /* * This function can't run concurrently against mmu_notifier_register - * or any other mmu notifier method. mmu_notifier_register can only - * run with mm->mm_users > 0 (and exit_mmap runs only when mm_users is - * zero). All other tasks of this mm already quit so they can't invoke - * mmu notifiers anymore. This can run concurrently only against - * mmu_notifier_unregister and it serializes against it with the - * mmu_notifier_mm->lock in addition to RCU. struct mmu_notifier_mm + * because mm->mm_users > 0 during mmu_notifier_register and exit_mmap + * runs with mm_users == 0. 
Other tasks may still invoke mmu notifiers + * in parallel despite there's no task using this mm anymore, through + * the vmas outside of the exit_mmap context, like with + * vmtruncate. This serializes against mmu_notifier_unregister with + * the mmu_notifier_mm->lock in addition to SRCU and it serializes + * against the other mmu notifiers with SRCU. struct mmu_notifier_mm * can't go away from under us as exit_mmap holds a mm_count pin * itself. */ @@ -44,7 +45,7 @@ void __mmu_notifier_release(struct mm_st * to wait ->release to finish and * mmu_notifier_unregister to return. */ - hlist_del_init(&mn->hlist); + hlist_del_init_rcu(&mn->hlist); /* * SRCU here will block mmu_notifier_unregister until * ->release returns. @@ -185,6 +186,8 @@ int mmu_notifier_register(struct mmu_not * side note: mmu_notifier_release can't run concurrently with * us because we hold the mm_users pin (either implicitly as * current->mm or explicitly with get_task_mm() or similar). + * We can't race against any other mmu notifiers either thanks + * to mm_lock(). */ spin_lock(&mm->mmu_notifier_mm->lock); hlist_add_head(&mn->hlist, &mm->mmu_notifier_mm->list); From andrea at qumranet.com Sat Apr 26 09:46:38 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Sat, 26 Apr 2008 18:46:38 +0200 Subject: [ofa-general] mmu notifier #v14 Message-ID: <20080426164511.GJ9514@duo.random> Hello everyone, here it is the mmu notifier #v14. http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14/ Please everyone involved review and (hopefully ;) ack that this is safe to go in 2.6.26, the most important is to verify that this is a noop when disarmed regardless of MMU_NOTIFIER=y or =n. http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14/mmu-notifier-core I'll be sending that patch to Andrew inbox. 
Signed-off-by: Andrea Arcangeli diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 8d45fab..ce3251c 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -21,6 +21,7 @@ config KVM tristate "Kernel-based Virtual Machine (KVM) support" depends on HAVE_KVM select PREEMPT_NOTIFIERS + select MMU_NOTIFIER select ANON_INODES ---help--- Support hosting fully virtualized guest machines using hardware diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 2ad6f54..853087a 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -663,6 +663,108 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn) account_shadowed(kvm, gfn); } +static void kvm_unmap_spte(struct kvm *kvm, u64 *spte) +{ + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT); + get_page(page); + rmap_remove(kvm, spte); + set_shadow_pte(spte, shadow_trap_nonpresent_pte); + kvm_flush_remote_tlbs(kvm); + put_page(page); +} + +static void kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp) +{ + u64 *spte, *curr_spte; + + spte = rmap_next(kvm, rmapp, NULL); + while (spte) { + BUG_ON(!(*spte & PT_PRESENT_MASK)); + rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte); + curr_spte = spte; + spte = rmap_next(kvm, rmapp, spte); + kvm_unmap_spte(kvm, curr_spte); + } +} + +void kvm_unmap_hva(struct kvm *kvm, unsigned long hva) +{ + int i; + + /* + * If mmap_sem isn't taken, we can look the memslots with only + * the mmu_lock by skipping over the slots with userspace_addr == 0. 
+ */ + for (i = 0; i < kvm->nmemslots; i++) { + struct kvm_memory_slot *memslot = &kvm->memslots[i]; + unsigned long start = memslot->userspace_addr; + unsigned long end; + + /* mmu_lock protects userspace_addr */ + if (!start) + continue; + + end = start + (memslot->npages << PAGE_SHIFT); + if (hva >= start && hva < end) { + gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT; + kvm_unmap_rmapp(kvm, &memslot->rmap[gfn_offset]); + } + } +} + +static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp) +{ + u64 *spte; + int young = 0; + + spte = rmap_next(kvm, rmapp, NULL); + while (spte) { + int _young; + u64 _spte = *spte; + BUG_ON(!(_spte & PT_PRESENT_MASK)); + _young = _spte & PT_ACCESSED_MASK; + if (_young) { + young = !!_young; + set_shadow_pte(spte, _spte & ~PT_ACCESSED_MASK); + } + spte = rmap_next(kvm, rmapp, spte); + } + return young; +} + +int kvm_age_hva(struct kvm *kvm, unsigned long hva) +{ + int i; + int young = 0; + + /* + * If mmap_sem isn't taken, we can look the memslots with only + * the mmu_lock by skipping over the slots with userspace_addr == 0. 
+ */ + spin_lock(&kvm->mmu_lock); + for (i = 0; i < kvm->nmemslots; i++) { + struct kvm_memory_slot *memslot = &kvm->memslots[i]; + unsigned long start = memslot->userspace_addr; + unsigned long end; + + /* mmu_lock protects userspace_addr */ + if (!start) + continue; + + end = start + (memslot->npages << PAGE_SHIFT); + if (hva >= start && hva < end) { + gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT; + young |= kvm_age_rmapp(kvm, &memslot->rmap[gfn_offset]); + } + } + spin_unlock(&kvm->mmu_lock); + + if (young) + kvm_flush_remote_tlbs(kvm); + + return young; +} + #ifdef MMU_DEBUG static int is_empty_shadow_page(u64 *spt) { @@ -1200,6 +1302,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) int r; int largepage = 0; pfn_t pfn; + int mmu_seq; down_read(&current->mm->mmap_sem); if (is_largepage_backed(vcpu, gfn & ~(KVM_PAGES_PER_HPAGE-1))) { @@ -1207,6 +1310,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) largepage = 1; } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, gfn); up_read(&current->mm->mmap_sem); @@ -1217,6 +1322,11 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) } spin_lock(&vcpu->kvm->mmu_lock); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + goto out_unlock; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != mmu_seq)) + goto out_unlock; kvm_mmu_free_some_pages(vcpu); r = __direct_map(vcpu, v, write, largepage, gfn, pfn, PT32E_ROOT_LEVEL); @@ -1224,6 +1334,11 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) return r; + +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return 0; } @@ -1355,6 +1470,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, int r; int largepage = 0; gfn_t gfn = gpa >> PAGE_SHIFT; + int mmu_seq; ASSERT(vcpu); 
ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa)); @@ -1368,6 +1484,8 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, gfn &= ~(KVM_PAGES_PER_HPAGE-1); largepage = 1; } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, gfn); up_read(&current->mm->mmap_sem); if (is_error_pfn(pfn)) { @@ -1375,12 +1493,22 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, return 1; } spin_lock(&vcpu->kvm->mmu_lock); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + goto out_unlock; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != mmu_seq)) + goto out_unlock; kvm_mmu_free_some_pages(vcpu); r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK, largepage, gfn, pfn, TDP_ROOT_LEVEL); spin_unlock(&vcpu->kvm->mmu_lock); return r; + +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return 0; } static void nonpaging_free(struct kvm_vcpu *vcpu) @@ -1643,11 +1771,11 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, int r; u64 gpte = 0; pfn_t pfn; - - vcpu->arch.update_pte.largepage = 0; + int mmu_seq; + int largepage; if (bytes != 4 && bytes != 8) - return; + goto out_lock; /* * Assume that the pte write on a page table of the same type @@ -1660,7 +1788,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, if ((bytes == 4) && (gpa % 4 == 0)) { r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8); if (r) - return; + goto out_lock; memcpy((void *)&gpte + (gpa % 8), new, 4); } else if ((bytes == 8) && (gpa % 8 == 0)) { memcpy((void *)&gpte, new, 8); @@ -1670,23 +1798,35 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, memcpy((void *)&gpte, new, 4); } if (!is_present_pte(gpte)) - return; + goto out_lock; gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; + largepage = 0; down_read(&current->mm->mmap_sem); if 
(is_large_pte(gpte) && is_largepage_backed(vcpu, gfn)) { gfn &= ~(KVM_PAGES_PER_HPAGE-1); - vcpu->arch.update_pte.largepage = 1; + largepage = 1; } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, gfn); up_read(&current->mm->mmap_sem); - if (is_error_pfn(pfn)) { - kvm_release_pfn_clean(pfn); - return; - } + if (is_error_pfn(pfn)) + goto out_release_and_lock; + + spin_lock(&vcpu->kvm->mmu_lock); + BUG_ON(!is_error_pfn(vcpu->arch.update_pte.pfn)); vcpu->arch.update_pte.gfn = gfn; vcpu->arch.update_pte.pfn = pfn; + vcpu->arch.update_pte.largepage = largepage; + vcpu->arch.update_pte.mmu_seq = mmu_seq; + return; + +out_release_and_lock: + kvm_release_pfn_clean(pfn); +out_lock: + spin_lock(&vcpu->kvm->mmu_lock); } void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, @@ -1711,7 +1851,6 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes); mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes); - spin_lock(&vcpu->kvm->mmu_lock); kvm_mmu_free_some_pages(vcpu); ++vcpu->kvm->stat.mmu_pte_write; kvm_mmu_audit(vcpu, "pre pte write"); @@ -1790,11 +1929,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, } } kvm_mmu_audit(vcpu, "post pte write"); - spin_unlock(&vcpu->kvm->mmu_lock); if (!is_error_pfn(vcpu->arch.update_pte.pfn)) { kvm_release_pfn_clean(vcpu->arch.update_pte.pfn); vcpu->arch.update_pte.pfn = bad_pfn; } + spin_unlock(&vcpu->kvm->mmu_lock); } int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 156fe10..4ac73a6 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -263,6 +263,12 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page, pfn = vcpu->arch.update_pte.pfn; if (is_error_pfn(pfn)) return; + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + 
return; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != + vcpu->arch.update_pte.mmu_seq)) + return; kvm_get_pfn(pfn); mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0, gpte & PT_DIRTY_MASK, NULL, largepage, gpte_to_gfn(gpte), @@ -380,6 +386,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, int r; pfn_t pfn; int largepage = 0; + int mmu_seq; pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code); kvm_mmu_audit(vcpu, "pre page fault"); @@ -413,6 +420,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, largepage = 1; } } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, walker.gfn); up_read(&current->mm->mmap_sem); @@ -424,6 +433,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, } spin_lock(&vcpu->kvm->mmu_lock); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + goto out_unlock; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != mmu_seq)) + goto out_unlock; kvm_mmu_free_some_pages(vcpu); shadow_pte = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault, largepage, &write_pt, pfn); @@ -439,6 +453,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, spin_unlock(&vcpu->kvm->mmu_lock); return write_pt; + +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return 0; } static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0ce5563..860559a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include @@ -3859,15 +3860,152 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) free_page((unsigned long)vcpu->arch.pio_data); } +static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn) +{ + struct kvm_arch *kvm_arch; + kvm_arch = 
container_of(mn, struct kvm_arch, mmu_notifier); + return container_of(kvm_arch, struct kvm, arch); +} + +static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + /* + * When ->invalidate_page runs, the linux pte has been zapped + * already but the page is still allocated until + * ->invalidate_page returns. So if we increase the sequence + * here the kvm page fault will notice if the spte can't be + * established because the page is going to be freed. If + * instead the kvm page fault establishes the spte before + * ->invalidate_page runs, kvm_unmap_hva will release it + * before returning. + + * No need of memory barriers as the sequence increase only + * need to be seen at spin_unlock time, and not at spin_lock + * time. + * + * Increasing the sequence after the spin_unlock would be + * unsafe because the kvm page fault could then establish the + * pte after kvm_unmap_hva returned, without noticing the page + * is going to be freed. + */ + atomic_inc(&kvm->arch.mmu_notifier_seq); + spin_lock(&kvm->mmu_lock); + kvm_unmap_hva(kvm, address); + spin_unlock(&kvm->mmu_lock); +} + +static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + + /* + * The count increase must become visible at unlock time as no + * spte can be established without taking the mmu_lock and + * count is also read inside the mmu_lock critical section. 
+ */ + atomic_inc(&kvm->arch.mmu_notifier_count); + + spin_lock(&kvm->mmu_lock); + for (; start < end; start += PAGE_SIZE) + kvm_unmap_hva(kvm, start); + spin_unlock(&kvm->mmu_lock); +} + +static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + /* + * + * This sequence increase will notify the kvm page fault that + * the page that is going to be mapped in the spte could have + * been freed. + * + * There's also an implicit mb() here in this comment, + * provided by the last PT lock taken to zap pagetables, and + * that the read side has to take too in follow_page(). The + * sequence increase in the worst case will become visible to + * the kvm page fault after the spin_lock of the last PT lock + * of the last PT-lock-protected critical section preceeding + * invalidate_range_end. So if the kvm page fault is about to + * establish the spte inside the mmu_lock, while we're freeing + * the pages, it will have to backoff and when it retries, it + * will have to take the PT lock before it can check the + * pagetables again. And after taking the PT lock it will + * re-establish the pte even if it will see the already + * increased sequence number before calling gfn_to_pfn. + */ + atomic_inc(&kvm->arch.mmu_notifier_seq); + /* + * The sequence increase must be visible before count + * decrease. The page fault has to read count before sequence + * for this write order to be effective. 
+ */ + wmb(); + atomic_dec(&kvm->arch.mmu_notifier_count); + BUG_ON(atomic_read(&kvm->arch.mmu_notifier_count) < 0); +} + +static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + return kvm_age_hva(kvm, address); +} + +static void kvm_free_vcpus(struct kvm *kvm); +/* This must zap all the sptes because all pages will be freed then */ +static void kvm_mmu_notifier_release(struct mmu_notifier *mn, + struct mm_struct *mm) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + BUG_ON(mm != kvm->mm); + + kvm_destroy_common_vm(kvm); + + kvm_free_pit(kvm); + kfree(kvm->arch.vpic); + kfree(kvm->arch.vioapic); + kvm_free_vcpus(kvm); + kvm_free_physmem(kvm); + if (kvm->arch.apic_access_page) + put_page(kvm->arch.apic_access_page); +} + +static const struct mmu_notifier_ops kvm_mmu_notifier_ops = { + .release = kvm_mmu_notifier_release, + .invalidate_page = kvm_mmu_notifier_invalidate_page, + .invalidate_range_start = kvm_mmu_notifier_invalidate_range_start, + .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end, + .clear_flush_young = kvm_mmu_notifier_clear_flush_young, +}; + struct kvm *kvm_arch_create_vm(void) { struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); + int err; if (!kvm) return ERR_PTR(-ENOMEM); INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); + kvm->arch.mmu_notifier.ops = &kvm_mmu_notifier_ops; + err = mmu_notifier_register(&kvm->arch.mmu_notifier, current->mm); + if (err) { + kfree(kvm); + return ERR_PTR(err); + } + return kvm; } @@ -3899,13 +4037,12 @@ static void kvm_free_vcpus(struct kvm *kvm) void kvm_arch_destroy_vm(struct kvm *kvm) { - kvm_free_pit(kvm); - kfree(kvm->arch.vpic); - kfree(kvm->arch.vioapic); - kvm_free_vcpus(kvm); - kvm_free_physmem(kvm); - if (kvm->arch.apic_access_page) - put_page(kvm->arch.apic_access_page); + /* + * kvm_mmu_notifier_release() will be called before + * mmu_notifier_unregister returns, if it didn't 
run + * already. + */ + mmu_notifier_unregister(&kvm->arch.mmu_notifier, kvm->mm); kfree(kvm); } diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h index 9d963cd..f07e321 100644 --- a/include/asm-x86/kvm_host.h +++ b/include/asm-x86/kvm_host.h @@ -13,6 +13,7 @@ #include #include +#include #include #include @@ -247,6 +248,7 @@ struct kvm_vcpu_arch { gfn_t gfn; /* presumed gfn during guest pte update */ pfn_t pfn; /* pfn corresponding to that gfn */ int largepage; + int mmu_seq; } update_pte; struct i387_fxsave_struct host_fx_image; @@ -314,6 +316,10 @@ struct kvm_arch{ struct page *apic_access_page; gpa_t wall_clock; + + struct mmu_notifier mmu_notifier; + atomic_t mmu_notifier_seq; + atomic_t mmu_notifier_count; }; struct kvm_vm_stat { @@ -434,6 +440,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu); int kvm_mmu_setup(struct kvm_vcpu *vcpu); void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte); +void kvm_unmap_hva(struct kvm *kvm, unsigned long hva); +int kvm_age_hva(struct kvm *kvm, unsigned long hva); int kvm_mmu_reset_context(struct kvm_vcpu *vcpu); void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot); void kvm_mmu_zap_all(struct kvm *kvm); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 4e16682..f089edc 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -267,6 +267,7 @@ void kvm_arch_check_processor_compat(void *rtn); int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu); void kvm_free_physmem(struct kvm *kvm); +void kvm_destroy_common_vm(struct kvm *kvm); struct kvm *kvm_arch_create_vm(void); void kvm_arch_destroy_vm(struct kvm *kvm); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f095b73..4beae7a 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -231,15 +231,19 @@ void kvm_free_physmem(struct kvm *kvm) kvm_free_physmem_slot(&kvm->memslots[i], NULL); } -static void kvm_destroy_vm(struct kvm *kvm) +void kvm_destroy_common_vm(struct kvm *kvm) { - struct 
mm_struct *mm = kvm->mm; - spin_lock(&kvm_lock); list_del(&kvm->vm_list); spin_unlock(&kvm_lock); kvm_io_bus_destroy(&kvm->pio_bus); kvm_io_bus_destroy(&kvm->mmio_bus); +} + +static void kvm_destroy_vm(struct kvm *kvm) +{ + struct mm_struct *mm = kvm->mm; + kvm_arch_destroy_vm(kvm); mmdrop(mm); } As usual you also need the kvm-mmu-notifier-lock patch to read the memslots with only the mmu_lock. Signed-off-by: Andrea Arcangeli diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c7ad235..8be6551 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3871,16 +3871,23 @@ int kvm_arch_set_memory_region(struct kvm *kvm, */ if (!user_alloc) { if (npages && !old.rmap) { + unsigned long userspace_addr; + down_write(&current->mm->mmap_sem); - memslot->userspace_addr = do_mmap(NULL, 0, - npages * PAGE_SIZE, - PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_ANONYMOUS, - 0); + userspace_addr = do_mmap(NULL, 0, + npages * PAGE_SIZE, + PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_ANONYMOUS, + 0); up_write(&current->mm->mmap_sem); - if (IS_ERR((void *)memslot->userspace_addr)) - return PTR_ERR((void *)memslot->userspace_addr); + if (IS_ERR((void *)userspace_addr)) + return PTR_ERR((void *)userspace_addr); + + /* set userspace_addr atomically for kvm_hva_to_rmapp */ + spin_lock(&kvm->mmu_lock); + memslot->userspace_addr = userspace_addr; + spin_unlock(&kvm->mmu_lock); } else { if (!old.user_alloc && old.rmap) { int ret; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 6a52c08..97bcc8d 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -342,7 +342,15 @@ int __kvm_set_memory_region(struct kvm *kvm, memset(new.rmap, 0, npages * sizeof(*new.rmap)); new.user_alloc = user_alloc; - new.userspace_addr = mem->userspace_addr; + /* + * hva_to_rmmap() serializes with the mmu_lock and to be + * safe it has to ignore memslots with !user_alloc && + * !userspace_addr. 
+ */ + if (user_alloc) + new.userspace_addr = mem->userspace_addr; + else + new.userspace_addr = 0; } if (npages && !new.lpage_info) { int largepages = npages / KVM_PAGES_PER_HPAGE; @@ -374,14 +382,18 @@ int __kvm_set_memory_region(struct kvm *kvm, memset(new.dirty_bitmap, 0, dirty_bytes); } + spin_lock(&kvm->mmu_lock); if (mem->slot >= kvm->nmemslots) kvm->nmemslots = mem->slot + 1; *memslot = new; + spin_unlock(&kvm->mmu_lock); r = kvm_arch_set_memory_region(kvm, mem, old, user_alloc); if (r) { + spin_lock(&kvm->mmu_lock); *memslot = old; + spin_unlock(&kvm->mmu_lock); goto out_free; } From aliguori at us.ibm.com Sat Apr 26 11:59:23 2008 From: aliguori at us.ibm.com (Anthony Liguori) Date: Sat, 26 Apr 2008 13:59:23 -0500 Subject: [ofa-general] Re: mmu notifier #v14 In-Reply-To: <20080426164511.GJ9514@duo.random> References: <20080426164511.GJ9514@duo.random> Message-ID: <48137B8B.7010202@us.ibm.com> Andrea Arcangeli wrote: > Hello everyone, > > here it is the mmu notifier #v14. > > http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14/ > > Please everyone involved review and (hopefully ;) ack that this is > safe to go in 2.6.26, the most important is to verify that this is a > noop when disarmed regardless of MMU_NOTIFIER=y or =n. > > http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14/mmu-notifier-core > > I'll be sending that patch to Andrew inbox. 
> > Signed-off-by: Andrea Arcangeli > > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig > index 8d45fab..ce3251c 100644 > --- a/arch/x86/kvm/Kconfig > +++ b/arch/x86/kvm/Kconfig > @@ -21,6 +21,7 @@ config KVM > tristate "Kernel-based Virtual Machine (KVM) support" > depends on HAVE_KVM > select PREEMPT_NOTIFIERS > + select MMU_NOTIFIER > select ANON_INODES > ---help--- > Support hosting fully virtualized guest machines using hardware > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index 2ad6f54..853087a 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -663,6 +663,108 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn) > account_shadowed(kvm, gfn); > } > > +static void kvm_unmap_spte(struct kvm *kvm, u64 *spte) > +{ > + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT); > + get_page(page); > You should not assume a struct page exists for any given spte. Instead, use kvm_get_pfn() and kvm_release_pfn_clean(). > static void nonpaging_free(struct kvm_vcpu *vcpu) > @@ -1643,11 +1771,11 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > int r; > u64 gpte = 0; > pfn_t pfn; > - > - vcpu->arch.update_pte.largepage = 0; > + int mmu_seq; > + int largepage; > > if (bytes != 4 && bytes != 8) > - return; > + goto out_lock; > > /* > * Assume that the pte write on a page table of the same type > @@ -1660,7 +1788,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > if ((bytes == 4) && (gpa % 4 == 0)) { > r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8); > if (r) > - return; > + goto out_lock; > memcpy((void *)&gpte + (gpa % 8), new, 4); > } else if ((bytes == 8) && (gpa % 8 == 0)) { > memcpy((void *)&gpte, new, 8); > @@ -1670,23 +1798,35 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > memcpy((void *)&gpte, new, 4); > } > if (!is_present_pte(gpte)) > - return; > + goto out_lock; > gfn = (gpte & PT64_BASE_ADDR_MASK) 
>> PAGE_SHIFT; > > + largepage = 0; > down_read(&current->mm->mmap_sem); > if (is_large_pte(gpte) && is_largepage_backed(vcpu, gfn)) { > gfn &= ~(KVM_PAGES_PER_HPAGE-1); > - vcpu->arch.update_pte.largepage = 1; > + largepage = 1; > } > + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); > + /* implicit mb(), we'll read before PT lock is unlocked */ > pfn = gfn_to_pfn(vcpu->kvm, gfn); > up_read(&current->mm->mmap_sem); > > - if (is_error_pfn(pfn)) { > - kvm_release_pfn_clean(pfn); > - return; > - } + > + if (is_error_pfn(pfn)) > + goto out_release_and_lock; > + > + spin_lock(&vcpu->kvm->mmu_lock); > + BUG_ON(!is_error_pfn(vcpu->arch.update_pte.pfn)); > vcpu->arch.update_pte.gfn = gfn; > vcpu->arch.update_pte.pfn = pfn; > + vcpu->arch.update_pte.largepage = largepage; > + vcpu->arch.update_pte.mmu_seq = mmu_seq; > + return; > + > +out_release_and_lock: > + kvm_release_pfn_clean(pfn); > +out_lock: > + spin_lock(&vcpu->kvm->mmu_lock); > } > Perhaps I just have a weak stomach but I am uneasy having a function that takes a lock on exit. I walked through the logic and it doesn't appear to be wrong but it also is pretty clear that you could defer the acquisition of the lock to the caller (in this case, kvm_mmu_pte_write) by moving the update_pte assignment into kvm_mmu_pte_write. > void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > @@ -1711,7 +1851,6 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, > > pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes); > mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes); > Worst case, you pass 4 more pointer arguments here, take the spin lock, and then depending on the result of mmu_guess_page_from_pte_write, update vcpu->arch.update_pte. 
> @@ -3899,13 +4037,12 @@ static void kvm_free_vcpus(struct kvm *kvm) > > void kvm_arch_destroy_vm(struct kvm *kvm) > { > - kvm_free_pit(kvm); > - kfree(kvm->arch.vpic); > - kfree(kvm->arch.vioapic); > - kvm_free_vcpus(kvm); > - kvm_free_physmem(kvm); > - if (kvm->arch.apic_access_page) > - put_page(kvm->arch.apic_access_page); > + /* > + * kvm_mmu_notifier_release() will be called before > + * mmu_notifier_unregister returns, if it didn't run > + * already. > + */ > + mmu_notifier_unregister(&kvm->arch.mmu_notifier, kvm->mm); > kfree(kvm); > } > Why move the destruction of the vm to the MMU notifier unregister hook? Does anything else ever call mmu_notifier_unregister that would implicitly destroy the VM? Regards, Anthony Liguori From dks at mediaweb.com Sat Apr 26 15:52:12 2008 From: dks at mediaweb.com (DK Smith) Date: Sat, 26 Apr 2008 15:52:12 -0700 Subject: [ofa-general] install.sh question In-Reply-To: <1207688301.1661.86.camel@localhost> References: <1207688301.1661.86.camel@localhost> Message-ID: <4813B21C.4020901@mediaweb.com> Bump +1 That's a good question. Frank Leers wrote: > Hi all, > > I'd like to be able to use the provided install.sh from cluster nodes to > install from a build which is shared over nfs, while utilizing an > ofed_net.conf The Install Guide talks about this, but I must be missing > something in the detail. > > Is there a way to not check if a build needs to be (re)done and simply > install the rpm's that were created during the original build, then > create the ifcfg-ib? devices based on the template file passed in with > -net ? I prefer not to have kernel sources, compiler, > etc. on these compute nodes, nor should I have to recompile for each > homogeneous node. 
> > thanks, > > -frank > Cheers, DK From andrea at qumranet.com Sat Apr 26 17:20:19 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Sun, 27 Apr 2008 02:20:19 +0200 Subject: [ofa-general] Re: mmu notifier #v14 In-Reply-To: <48137B8B.7010202@us.ibm.com> References: <20080426164511.GJ9514@duo.random> <48137B8B.7010202@us.ibm.com> Message-ID: <20080427002019.GL9514@duo.random> On Sat, Apr 26, 2008 at 01:59:23PM -0500, Anthony Liguori wrote: >> +static void kvm_unmap_spte(struct kvm *kvm, u64 *spte) >> +{ >> + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> >> PAGE_SHIFT); >> + get_page(page); >> > > You should not assume a struct page exists for any given spte. Instead, use > kvm_get_pfn() and kvm_release_pfn_clean(). Last email from muli at ibm in my inbox argues it's useless to build rmap on mmio regions, so the above is more efficient: put_page runs directly on the page without going back and forth between spte -> pfn -> page -> pfn -> page in a single function. Certainly if we start building rmap on mmio regions we'll have to change that. > Perhaps I just have a weak stomach but I am uneasy having a function that > takes a lock on exit. I walked through the logic and it doesn't appear to > be wrong but it also is pretty clear that you could defer the acquisition > of the lock to the caller (in this case, kvm_mmu_pte_write) by moving the > update_pte assignment into kvm_mmu_pte_write. I agree out_lock is an uncommon exit path, the problem is that the code was buggy, and I tried to fix it with the smallest possible change and that resulted in an out_lock. That section likely needs a refactoring; all those update_pte fields should at least be returned by the function guess_.... but I tried to reduce the changes to make the issue more readable; I didn't want to rewrite certain functions just to take a spinlock a few instructions ahead.
> Worst case, you pass 4 more pointer arguments here and, take the spin lock, > and then depending on the result of mmu_guess_page_from_pte_write, update > vcpu->arch.update_pte. Yes, that was my idea as well, but that's left for a later patch. Fixing this bug mixed with the mmu notifier patch was perhaps excessive already ;). > Why move the destruction of the vm to the MMU notifier unregister hook? > Does anything else ever call mmu_notifier_unregister that would implicitly > destroy the VM? mmu notifier ->release can run at any time before the filehandle is closed. ->release has to zap all sptes and freeze the mmu (hence all vcpus) to prevent any further page fault. After ->release returns all pages are freed (we'll never rely on the page pin to keep the rmap_remove put_page from being a relevant unpin event). So the idea is that I wanted to maintain the same ordering of the current code in the vm destroy event, I didn't want to leave a partially shutdown VM on the vmlist. If the ordering is entirely irrelevant and the kvm_arch_destroy_vm can run well before kvm_destroy_vm is called, then I can avoid changes to kvm_main.c, but I doubt it. I've done it in a way that archs not needing mmu notifiers like s390 can simply add the kvm_destroy_common_vm at the top of their kvm_arch_destroy_vm. All others using mmu_notifiers have to invoke kvm_destroy_common_vm in the ->release of the mmu notifiers. This will ensure that everything will be ok regardless of whether exit_mmap is called before/after exit_files, and it won't make a whole lot of difference anymore, if the driver fd is pinned through vmas->vm_file released in exit_mmap or through the task file descriptors released in exit_files etc... In fact this allows calling mmu_notifier_unregister at any time later after the task has already been killed, without any trouble (like if the mmu notifier owner isn't registering in current->mm but some other task's mm).
From anthony at codemonkey.ws Sat Apr 26 18:54:23 2008 From: anthony at codemonkey.ws (Anthony Liguori) Date: Sat, 26 Apr 2008 20:54:23 -0500 Subject: [ofa-general] Re: [kvm-devel] mmu notifier #v14 In-Reply-To: <20080427002019.GL9514@duo.random> References: <20080426164511.GJ9514@duo.random> <48137B8B.7010202@us.ibm.com> <20080427002019.GL9514@duo.random> Message-ID: <4813DCCF.3020201@codemonkey.ws> Andrea Arcangeli wrote: > On Sat, Apr 26, 2008 at 01:59:23PM -0500, Anthony Liguori wrote: > >>> +static void kvm_unmap_spte(struct kvm *kvm, u64 *spte) >>> +{ >>> + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> >>> PAGE_SHIFT); >>> + get_page(page); >>> >>> >> You should not assume a struct page exists for any given spte. Instead, use >> kvm_get_pfn() and kvm_release_pfn_clean(). >> > > Last email from muli at ibm in my inbox argues it's useless to build rmap > on mmio regions, so the above is more efficient so put_page runs > directly on the page without going back and forth between spte -> pfn > -> page -> pfn -> page in a single function. > Avi can correct me if I'm wrong, but I don't think the consensus of that discussion was that we're going to avoid putting mmio pages in the rmap. Practically speaking, replacing: + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT); + get_page(page); With: unsigned long pfn = (*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; kvm_get_pfn(pfn); Results in exactly the same code except the latter allows mmio pfns in the rmap. So ignoring the whole mmio thing, using accessors that are already there and used elsewhere seems like a good idea :-) > Certainly if we start building rmap on mmio regions we'll have to > change that. > > >> Perhaps I just have a weak stomach but I am uneasy having a function that >> takes a lock on exit.
I walked through the logic and it doesn't appear to >> be wrong but it also is pretty clear that you could defer the acquisition >> of the lock to the caller (in this case, kvm_mmu_pte_write) by moving the >> update_pte assignment into kvm_mmu_pte_write. >> > > I agree out_lock is an uncommon exit path, the problem is that the > code was buggy, and I tried to fix it with the smallest possible > change and that resulting in an out_lock. That section likely need a > refactoring, all those update_pte fields should be at least returned > by the function guess_.... but I tried to reduce the changes to make > the issue more readable, I didn't want to rewrite certain functions > just to take a spinlock a few instructions ahead. > I appreciate the desire to minimize changes, but taking a lock on return seems to take that to a bit of an extreme. It seems like a simple thing to fix though, no? >> Why move the destruction of the vm to the MMU notifier unregister hook? >> Does anything else ever call mmu_notifier_unregister that would implicitly >> destroy the VM? >> > > mmu notifier ->release can run at anytime before the filehandle is > closed. ->release has to zap all sptes and freeze the mmu (hence all > vcpus) to prevent any further page fault. After ->release returns all > pages are freed (we'll never relay on the page pin to avoid the > rmap_remove put_page to be a relevant unpin event). So the idea is > that I wanted to maintain the same ordering of the current code in the > vm destroy event, I didn't want to leave a partially shutdown VM on > the vmlist. If the ordering is entirely irrelevant and the > kvm_arch_destroy_vm can run well before kvm_destroy_vm is called, then > I can avoid changes to kvm_main.c but I doubt. > > I've done it in a way that archs not needing mmu notifiers like s390 > can simply add the kvm_destroy_common_vm at the top of their > kvm_arch_destroy_vm. 
All others using mmu_notifiers have to invoke > kvm_destroy_common_vm in the ->release of the mmu notifiers. > > This will ensure that everything will be ok regardless if exit_mmap is > called before/after exit_files, and it won't make a whole lot of > difference anymore, if the driver fd is pinned through vmas->vm_file > released in exit_mmap or through the task filedescriptors relased in > exit_files etc... Infact this allows to call mmu_notifier_unregister > at anytime later after the task has already been killed, without any > trouble (like if the mmu notifier owner isn't registering in > current->mm but some other tasks mm). > I see. It seems a little strange to me as a KVM guest isn't really tied to the current mm. It seems like the net effect of this is that we are now tying a KVM guest to an mm. For instance, if you create a guest, but didn't assign any memory to it, you could transfer the fd to another process and then close the fd (without destroying the guest). The other process then could assign memory to it and presumably run the guest. With your change, as soon as the first process exits, the guest will be destroyed. I'm not sure this behavioral difference really matters but it is a behavioral difference. Regards, Anthony Liguori > ------------------------------------------------------------------------- > This SF.net email is sponsored by the 2008 JavaOne(SM) Conference > Don't miss this year's exciting event. There's still time to save $100. > Use priority code J8TL2D2. 
> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone > _______________________________________________ > kvm-devel mailing list > kvm-devel at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/kvm-devel > From andrea at qumranet.com Sat Apr 26 20:05:14 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Sun, 27 Apr 2008 05:05:14 +0200 Subject: [ofa-general] Re: [kvm-devel] mmu notifier #v14 In-Reply-To: <4813DCCF.3020201@codemonkey.ws> References: <20080426164511.GJ9514@duo.random> <48137B8B.7010202@us.ibm.com> <20080427002019.GL9514@duo.random> <4813DCCF.3020201@codemonkey.ws> Message-ID: <20080427030514.GM9514@duo.random> On Sat, Apr 26, 2008 at 08:54:23PM -0500, Anthony Liguori wrote: > Avi can correct me if I'm wrong, but I don't think the consensus of that > discussion was that we're going to avoid putting mmio pages in the rmap. My first impression on that discussion was that pci-passthrough mmio can't be swapped, can't require write throttling etc.. ;). From a linux VM pagetable point of view rmap on mmio looks weird. However, thinking some more, it's not like in the linux kernel, where write protect through rmap is needed only for write-throttling MAP_SHARED (which clearly is strictly RAM); for sptes we need it for every cr3 touch too, to trap pagetable updates (think ioremap done by the guest kernel). So I think Avi's take that we need rmap for everything mapped by sptes is probably the only feasible way to go. > Practically speaking, replacing: > > + struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> > PAGE_SHIFT); > + get_page(page); > > > With: > > unsigned long pfn = (*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; > kvm_get_pfn(pfn); > > Results in exactly the same code except the later allows mmio pfns in the > rmap. So ignoring the whole mmio thing, using accessors that are already > there and used elsewhere seems like a good idea :-) Agreed, especially in light of the above.
I didn't actually touch that function for a while (I clearly wrote it before we started moving the kvm mmu code from page to pfn), and it was still safe to use to test the locking of the mmu notifier methods. My current main focus in the last few days was to get the locking right against the last mmu notifier code #v14 ;). Now that I look into it more closely, the get_page/put_page are unnecessary by now (they were only necessary with the older patches that didn't implement range_begin and that relied on page pinning). Not just in that function, but all reference counting inside kvm is now entirely useless and can be removed. NOTE: it is safe to flush the tlb outside the mmu_lock if done inside the mmu_notifier methods. But only mmu notifiers can defer the tlb flush after releasing mmu_lock, because the page can't be freed by the VM until we return. All other kvm code must instead definitely flush the tlb inside the mmu_lock, otherwise when the mmu notifier code runs, it will see the spte nonpresent and so the mmu notifier code will do nothing (it will not wait for kvm to drop the mmu_lock before allowing the main linux VM to free the page). The tlb flush must happen before the page is freed, and doing it inside mmu_lock everywhere (except in mmu-notifier context, where it can be done after releasing mmu_lock) guarantees it. The positive side of the tradeoff of having to do the tlb flush inside the mmu_lock is that KVM can now safely zap and unmap as many sptes as it wants and do a single tlb flush at the end. The pages can't be freed as long as the mmu_lock is held (this is why the tlb flush has to be done inside the mmu_lock). This model heavily reduces the tlb flush frequency for large spte-mangling, and tlb flushes here are quite expensive because of IPIs. > I appreciate the desire to minimize changes, but taking a lock on return > seems to take that to a bit of an extreme. It seems like a simple thing to > fix though, no?
I agree it needs to be rewritten as a cleaner fix, but probably in a separate patch (which has to be incremental, as that code will reject on the mmu notifier patch). I didn't see it as a big issue, however, to apply my quick fix first and clean up with an incremental update. > I see. It seems a little strange to me as a KVM guest isn't really tied to > the current mm. It seems like the net effect of this is that we are now > tying a KVM guest to an mm. > > For instance, if you create a guest, but didn't assign any memory to it, > you could transfer the fd to another process and then close the fd (without > destroying the guest). The other process then could assign memory to it > and presumably run the guest. Passing the anon kvm vm fd through unix sockets to another task is exactly why we need things like ->release not dependent on fd->release or vma->vm_file->release ordering in the do_exit path to tear down the VM. The guest itself is definitely tied to a "mm": the guest runs using get_user_pages, and get_user_pages is meaningless without an mm. But the fd where we run the ioctl isn't tied to the mm, it's just an fd that can be passed across tasks with unix sockets. > With your change, as soon as the first process exits, the guest will be > destroyed. I'm not sure this behavioral difference really matters but it > is a behavioral difference. The guest mode of the cpu can't run safely on any task but the one with the "mm" tracked by the mmu notifiers and where the memory is allocated from. The sptes point to the memory allocated in that "mm". It's definitely memory-corrupting to leave any spte established when the last thread of that "mm" exits, as the memory supposedly pointed by the orphaned sptes will go immediately to the freelist and be reused by the kernel. Keep in mind that there's no page pin on the memory pointed by the sptes.
The ioctl of the qemu userland could run in any other task with a mm different than the one of the guest, and ->release allows this to work fine without memory corruption and without requiring page pinning. As far as I can tell, your example explains why we need this fix ;). Here is an updated patch that passes my swap test (the only missing thing is the out_lock cleanup). Signed-off-by: Andrea Arcangeli diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 8d45fab..ce3251c 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -21,6 +21,7 @@ config KVM tristate "Kernel-based Virtual Machine (KVM) support" depends on HAVE_KVM select PREEMPT_NOTIFIERS + select MMU_NOTIFIER select ANON_INODES ---help--- Support hosting fully virtualized guest machines using hardware diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 2ad6f54..330eaed 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -663,6 +663,101 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn) account_shadowed(kvm, gfn); } +static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp) +{ + u64 *spte, *curr_spte; + int need_tlb_flush = 0; + + spte = rmap_next(kvm, rmapp, NULL); + while (spte) { + BUG_ON(!(*spte & PT_PRESENT_MASK)); + rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte); + curr_spte = spte; + spte = rmap_next(kvm, rmapp, spte); + rmap_remove(kvm, curr_spte); + set_shadow_pte(curr_spte, shadow_trap_nonpresent_pte); + need_tlb_flush = 1; + } + return need_tlb_flush; +} + +int kvm_unmap_hva(struct kvm *kvm, unsigned long hva) +{ + int i; + int need_tlb_flush = 0; + + /* + * If mmap_sem isn't taken, we can look the memslots with only + * the mmu_lock by skipping over the slots with userspace_addr == 0.
+ */ + for (i = 0; i < kvm->nmemslots; i++) { + struct kvm_memory_slot *memslot = &kvm->memslots[i]; + unsigned long start = memslot->userspace_addr; + unsigned long end; + + /* mmu_lock protects userspace_addr */ + if (!start) + continue; + + end = start + (memslot->npages << PAGE_SHIFT); + if (hva >= start && hva < end) { + gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT; + need_tlb_flush |= kvm_unmap_rmapp(kvm, + &memslot->rmap[gfn_offset]); + } + } + + return need_tlb_flush; +} + +static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp) +{ + u64 *spte; + int young = 0; + + spte = rmap_next(kvm, rmapp, NULL); + while (spte) { + int _young; + u64 _spte = *spte; + BUG_ON(!(_spte & PT_PRESENT_MASK)); + _young = _spte & PT_ACCESSED_MASK; + if (_young) { + young = !!_young; + set_shadow_pte(spte, _spte & ~PT_ACCESSED_MASK); + } + spte = rmap_next(kvm, rmapp, spte); + } + return young; +} + +int kvm_age_hva(struct kvm *kvm, unsigned long hva) +{ + int i; + int young = 0; + + /* + * If mmap_sem isn't taken, we can look the memslots with only + * the mmu_lock by skipping over the slots with userspace_addr == 0. 
+ */ + for (i = 0; i < kvm->nmemslots; i++) { + struct kvm_memory_slot *memslot = &kvm->memslots[i]; + unsigned long start = memslot->userspace_addr; + unsigned long end; + + /* mmu_lock protects userspace_addr */ + if (!start) + continue; + + end = start + (memslot->npages << PAGE_SHIFT); + if (hva >= start && hva < end) { + gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT; + young |= kvm_age_rmapp(kvm, &memslot->rmap[gfn_offset]); + } + } + + return young; +} + #ifdef MMU_DEBUG static int is_empty_shadow_page(u64 *spt) { @@ -1200,6 +1295,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) int r; int largepage = 0; pfn_t pfn; + int mmu_seq; down_read(&current->mm->mmap_sem); if (is_largepage_backed(vcpu, gfn & ~(KVM_PAGES_PER_HPAGE-1))) { @@ -1207,6 +1303,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) largepage = 1; } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, gfn); up_read(&current->mm->mmap_sem); @@ -1217,6 +1315,11 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) } spin_lock(&vcpu->kvm->mmu_lock); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + goto out_unlock; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != mmu_seq)) + goto out_unlock; kvm_mmu_free_some_pages(vcpu); r = __direct_map(vcpu, v, write, largepage, gfn, pfn, PT32E_ROOT_LEVEL); @@ -1224,6 +1327,11 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn) return r; + +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return 0; } @@ -1355,6 +1463,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, int r; int largepage = 0; gfn_t gfn = gpa >> PAGE_SHIFT; + int mmu_seq; ASSERT(vcpu); ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa)); @@ -1368,6 +1477,8 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu,
gva_t gpa, gfn &= ~(KVM_PAGES_PER_HPAGE-1); largepage = 1; } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, gfn); up_read(&current->mm->mmap_sem); if (is_error_pfn(pfn)) { @@ -1375,12 +1486,22 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, return 1; } spin_lock(&vcpu->kvm->mmu_lock); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + goto out_unlock; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != mmu_seq)) + goto out_unlock; kvm_mmu_free_some_pages(vcpu); r = __direct_map(vcpu, gpa, error_code & PFERR_WRITE_MASK, largepage, gfn, pfn, TDP_ROOT_LEVEL); spin_unlock(&vcpu->kvm->mmu_lock); return r; + +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return 0; } static void nonpaging_free(struct kvm_vcpu *vcpu) @@ -1643,11 +1764,11 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, int r; u64 gpte = 0; pfn_t pfn; - - vcpu->arch.update_pte.largepage = 0; + int mmu_seq; + int largepage; if (bytes != 4 && bytes != 8) - return; + goto out_lock; /* * Assume that the pte write on a page table of the same type @@ -1660,7 +1781,7 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, if ((bytes == 4) && (gpa % 4 == 0)) { r = kvm_read_guest(vcpu->kvm, gpa & ~(u64)7, &gpte, 8); if (r) - return; + goto out_lock; memcpy((void *)&gpte + (gpa % 8), new, 4); } else if ((bytes == 8) && (gpa % 8 == 0)) { memcpy((void *)&gpte, new, 8); @@ -1670,23 +1791,35 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, memcpy((void *)&gpte, new, 4); } if (!is_present_pte(gpte)) - return; + goto out_lock; gfn = (gpte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT; + largepage = 0; down_read(&current->mm->mmap_sem); if (is_large_pte(gpte) && is_largepage_backed(vcpu, gfn)) { gfn &= ~(KVM_PAGES_PER_HPAGE-1); - vcpu->arch.update_pte.largepage = 1; +
largepage = 1; } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, gfn); up_read(&current->mm->mmap_sem); - if (is_error_pfn(pfn)) { - kvm_release_pfn_clean(pfn); - return; - } + if (is_error_pfn(pfn)) + goto out_release_and_lock; + + spin_lock(&vcpu->kvm->mmu_lock); + BUG_ON(!is_error_pfn(vcpu->arch.update_pte.pfn)); vcpu->arch.update_pte.gfn = gfn; vcpu->arch.update_pte.pfn = pfn; + vcpu->arch.update_pte.largepage = largepage; + vcpu->arch.update_pte.mmu_seq = mmu_seq; + return; + +out_release_and_lock: + kvm_release_pfn_clean(pfn); +out_lock: + spin_lock(&vcpu->kvm->mmu_lock); } void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, @@ -1711,7 +1844,6 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes); mmu_guess_page_from_pte_write(vcpu, gpa, new, bytes); - spin_lock(&vcpu->kvm->mmu_lock); kvm_mmu_free_some_pages(vcpu); ++vcpu->kvm->stat.mmu_pte_write; kvm_mmu_audit(vcpu, "pre pte write"); @@ -1790,11 +1922,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, } } kvm_mmu_audit(vcpu, "post pte write"); - spin_unlock(&vcpu->kvm->mmu_lock); if (!is_error_pfn(vcpu->arch.update_pte.pfn)) { kvm_release_pfn_clean(vcpu->arch.update_pte.pfn); vcpu->arch.update_pte.pfn = bad_pfn; } + spin_unlock(&vcpu->kvm->mmu_lock); } int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 156fe10..4ac73a6 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -263,6 +263,12 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *page, pfn = vcpu->arch.update_pte.pfn; if (is_error_pfn(pfn)) return; + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + return; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != + vcpu->arch.update_pte.mmu_seq)) + return;
kvm_get_pfn(pfn); mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0, gpte & PT_DIRTY_MASK, NULL, largepage, gpte_to_gfn(gpte), @@ -380,6 +386,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, int r; pfn_t pfn; int largepage = 0; + int mmu_seq; pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code); kvm_mmu_audit(vcpu, "pre page fault"); @@ -413,6 +420,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, largepage = 1; } } + mmu_seq = atomic_read(&vcpu->kvm->arch.mmu_notifier_seq); + /* implicit mb(), we'll read before PT lock is unlocked */ pfn = gfn_to_pfn(vcpu->kvm, walker.gfn); up_read(&current->mm->mmap_sem); @@ -424,6 +433,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, } spin_lock(&vcpu->kvm->mmu_lock); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_count))) + goto out_unlock; + smp_rmb(); + if (unlikely(atomic_read(&vcpu->kvm->arch.mmu_notifier_seq) != mmu_seq)) + goto out_unlock; kvm_mmu_free_some_pages(vcpu); shadow_pte = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault, largepage, &write_pt, pfn); @@ -439,6 +453,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, spin_unlock(&vcpu->kvm->mmu_lock); return write_pt; + +out_unlock: + spin_unlock(&vcpu->kvm->mmu_lock); + kvm_release_pfn_clean(pfn); + return 0; } static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0ce5563..a026cb7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include @@ -3859,15 +3860,173 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) free_page((unsigned long)vcpu->arch.pio_data); } +static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn) +{ + struct kvm_arch *kvm_arch; + kvm_arch = container_of(mn, struct kvm_arch, mmu_notifier); + return container_of(kvm_arch, struct kvm, arch); +} + +static void
kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + int need_tlb_flush; + + /* + * When ->invalidate_page runs, the linux pte has been zapped + * already but the page is still allocated until + * ->invalidate_page returns. So if we increase the sequence + * here the kvm page fault will notice if the spte can't be + * established because the page is going to be freed. If + * instead the kvm page fault establishes the spte before + * ->invalidate_page runs, kvm_unmap_hva will release it + * before returning. + + * No need of memory barriers as the sequence increase only + * need to be seen at spin_unlock time, and not at spin_lock + * time. + * + * Increasing the sequence after the spin_unlock would be + * unsafe because the kvm page fault could then establish the + * pte after kvm_unmap_hva returned, without noticing the page + * is going to be freed. + */ + atomic_inc(&kvm->arch.mmu_notifier_seq); + spin_lock(&kvm->mmu_lock); + need_tlb_flush = kvm_unmap_hva(kvm, address); + spin_unlock(&kvm->mmu_lock); + + /* we've to flush the tlb before the pages can be freed */ + if (need_tlb_flush) + kvm_flush_remote_tlbs(kvm); + +} + +static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + int need_tlb_flush = 0; + + /* + * The count increase must become visible at unlock time as no + * spte can be established without taking the mmu_lock and + * count is also read inside the mmu_lock critical section. 
+ */ + atomic_inc(&kvm->arch.mmu_notifier_count); + + spin_lock(&kvm->mmu_lock); + for (; start < end; start += PAGE_SIZE) + need_tlb_flush |= kvm_unmap_hva(kvm, start); + spin_unlock(&kvm->mmu_lock); + + /* we've to flush the tlb before the pages can be freed */ + if (need_tlb_flush) + kvm_flush_remote_tlbs(kvm); +} + +static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long start, + unsigned long end) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + /* + * + * This sequence increase will notify the kvm page fault that + * the page that is going to be mapped in the spte could have + * been freed. + * + * There's also an implicit mb() here in this comment, + * provided by the last PT lock taken to zap pagetables, and + * that the read side has to take too in follow_page(). The + * sequence increase in the worst case will become visible to + * the kvm page fault after the spin_lock of the last PT lock + * of the last PT-lock-protected critical section preceeding + * invalidate_range_end. So if the kvm page fault is about to + * establish the spte inside the mmu_lock, while we're freeing + * the pages, it will have to backoff and when it retries, it + * will have to take the PT lock before it can check the + * pagetables again. And after taking the PT lock it will + * re-establish the pte even if it will see the already + * increased sequence number before calling gfn_to_pfn. + */ + atomic_inc(&kvm->arch.mmu_notifier_seq); + /* + * The sequence increase must be visible before count + * decrease. The page fault has to read count before sequence + * for this write order to be effective. 
+ */ + wmb(); + atomic_dec(&kvm->arch.mmu_notifier_count); + BUG_ON(atomic_read(&kvm->arch.mmu_notifier_count) < 0); +} + +static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn, + struct mm_struct *mm, + unsigned long address) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + int young; + + spin_lock(&kvm->mmu_lock); + young = kvm_age_hva(kvm, address); + spin_unlock(&kvm->mmu_lock); + + if (young) + kvm_flush_remote_tlbs(kvm); + + return young; +} + +static void kvm_free_vcpus(struct kvm *kvm); +/* This must zap all the sptes because all pages will be freed then */ +static void kvm_mmu_notifier_release(struct mmu_notifier *mn, + struct mm_struct *mm) +{ + struct kvm *kvm = mmu_notifier_to_kvm(mn); + BUG_ON(mm != kvm->mm); + + kvm_destroy_common_vm(kvm); + + kvm_free_pit(kvm); + kfree(kvm->arch.vpic); + kfree(kvm->arch.vioapic); + kvm_free_vcpus(kvm); + kvm_free_physmem(kvm); + if (kvm->arch.apic_access_page) + put_page(kvm->arch.apic_access_page); +} + +static const struct mmu_notifier_ops kvm_mmu_notifier_ops = { + .release = kvm_mmu_notifier_release, + .invalidate_page = kvm_mmu_notifier_invalidate_page, + .invalidate_range_start = kvm_mmu_notifier_invalidate_range_start, + .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end, + .clear_flush_young = kvm_mmu_notifier_clear_flush_young, +}; + struct kvm *kvm_arch_create_vm(void) { struct kvm *kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); + int err; if (!kvm) return ERR_PTR(-ENOMEM); INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); + kvm->arch.mmu_notifier.ops = &kvm_mmu_notifier_ops; + err = mmu_notifier_register(&kvm->arch.mmu_notifier, current->mm); + if (err) { + kfree(kvm); + return ERR_PTR(err); + } + return kvm; } @@ -3899,13 +4058,12 @@ static void kvm_free_vcpus(struct kvm *kvm) void kvm_arch_destroy_vm(struct kvm *kvm) { - kvm_free_pit(kvm); - kfree(kvm->arch.vpic); - kfree(kvm->arch.vioapic); - kvm_free_vcpus(kvm); - kvm_free_physmem(kvm); - if (kvm->arch.apic_access_page) - 
put_page(kvm->arch.apic_access_page); + /* + * kvm_mmu_notifier_release() will be called before + * mmu_notifier_unregister returns, if it didn't run + * already. + */ + mmu_notifier_unregister(&kvm->arch.mmu_notifier, kvm->mm); kfree(kvm); } diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h index 9d963cd..7b8deea 100644 --- a/include/asm-x86/kvm_host.h +++ b/include/asm-x86/kvm_host.h @@ -13,6 +13,7 @@ #include #include +#include #include #include @@ -247,6 +248,7 @@ struct kvm_vcpu_arch { gfn_t gfn; /* presumed gfn during guest pte update */ pfn_t pfn; /* pfn corresponding to that gfn */ int largepage; + int mmu_seq; } update_pte; struct i387_fxsave_struct host_fx_image; @@ -314,6 +316,10 @@ struct kvm_arch{ struct page *apic_access_page; gpa_t wall_clock; + + struct mmu_notifier mmu_notifier; + atomic_t mmu_notifier_seq; + atomic_t mmu_notifier_count; }; struct kvm_vm_stat { @@ -434,6 +440,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu); int kvm_mmu_setup(struct kvm_vcpu *vcpu); void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte); +int kvm_unmap_hva(struct kvm *kvm, unsigned long hva); +int kvm_age_hva(struct kvm *kvm, unsigned long hva); int kvm_mmu_reset_context(struct kvm_vcpu *vcpu); void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot); void kvm_mmu_zap_all(struct kvm *kvm); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 4e16682..f089edc 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -267,6 +267,7 @@ void kvm_arch_check_processor_compat(void *rtn); int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu); void kvm_free_physmem(struct kvm *kvm); +void kvm_destroy_common_vm(struct kvm *kvm); struct kvm *kvm_arch_create_vm(void); void kvm_arch_destroy_vm(struct kvm *kvm); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f095b73..4beae7a 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -231,15 +231,19 @@ void kvm_free_physmem(struct kvm *kvm) 
kvm_free_physmem_slot(&kvm->memslots[i], NULL); } -static void kvm_destroy_vm(struct kvm *kvm) +void kvm_destroy_common_vm(struct kvm *kvm) { - struct mm_struct *mm = kvm->mm; - spin_lock(&kvm_lock); list_del(&kvm->vm_list); spin_unlock(&kvm_lock); kvm_io_bus_destroy(&kvm->pio_bus); kvm_io_bus_destroy(&kvm->mmio_bus); +} + +static void kvm_destroy_vm(struct kvm *kvm) +{ + struct mm_struct *mm = kvm->mm; + kvm_arch_destroy_vm(kvm); mmdrop(mm); } From erezz at voltaire.com Sat Apr 26 23:20:55 2008 From: erezz at voltaire.com (Erez Zilber) Date: Sun, 27 Apr 2008 09:20:55 +0300 Subject: [ofa-general] [PATCH 1/1] RPM Spec files In-Reply-To: References: Message-ID: <48141B47.4080408@voltaire.com> Mike Heinz wrote: > Installation of OFED 1.3.0.0.4 onto a Kusu/OCS cluster does not fully > succeed because of some missing dependencies in the RPM spec files. This > is because Kusu installs nodes over a network by presenting a pool of > RPMs to be installed and letting RPM figure out the order to install > them in. Without the dependencies we ended up with oddities like the > kernel drivers being installed before the /usr/bin directory had been > populated, causing the install script to fail. > > I was able to work around this by manually expanding some of the source > RPM files, altering the spec file and repackaging the source RPM. This > allowed me to build binary RPMs (via the install script) that could be > installed on a Kusu cluster. > > Here are the proposed changes. If there is a better/preferred way of > submitting this suggestion, please let me know. > Some general comments: * OFED issues are discussed in the ewg list. You should send patches to that list. * You have patches for multiple git trees (bonding, open-iscsi etc). You should separate them to multiple patches. Each patch should have a separate e-mail message (and add the maintainer to the thread). The best thing to do is to create a patch set. * Please create the patches against the relevant git trees. 
It will make it easier to apply them. See more comments below. > > --- ../../original/ib-bonding.spec 2008-04-22 12:54:12.000000000 > -0400 > +++ ib-bonding.spec 2008-04-22 12:43:07.000000000 -0400 > @@ -20,6 +20,7 @@ > Group : Applications/System > License : GPL > BuildRoot: %{_tmppath}/%{name}-%{version}-root > +PreReq : coreutils > > %description > This package provides a bonding device which is capable of enslaving > --- ../../original/ofa_kernel.spec 2008-04-22 12:54:13.000000000 > -0400 > +++ ofa_kernel.spec 2008-04-22 12:45:40.000000000 -0400 > @@ -111,6 +111,9 @@ > BuildRequires: sysfsutils-devel > > %package -n kernel-ib > +PreReq: coreutils > +PreReq: kernel > +PreReq: pciutils > Version: %{_version} > Release: %{krelver} > Summary: Infiniband Driver and ULPs kernel modules > @@ -119,6 +122,10 @@ > Core, HW and ULPs kernel modules > > %package -n kernel-ib-devel > +PreReq: coreutils > +PreReq: kernel > +PreReq: pciutils > +Requires: kernel-ib > Version: %{_version} > Release: %{krelver} > Summary: Infiniband Driver and ULPs kernel modules sources > --- ../../original/open-iscsi-generic.spec 2008-04-22 > If this change is relevant for open-iscsi.git, it is also relevant for open-iscsi-rh4.git. BTW - you can see the list of git trees here: http://www.openfabrics.org/git/ Erez From vlad at dev.mellanox.co.il Sat Apr 26 23:35:45 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Sun, 27 Apr 2008 09:35:45 +0300 Subject: [ofa-general] install.sh question In-Reply-To: <1207688301.1661.86.camel@localhost> References: <1207688301.1661.86.camel@localhost> Message-ID: <48141EC1.7010801@dev.mellanox.co.il> Frank Leers wrote: > Hi all, > > I'd like to be able to use the provided install.sh from cluster nodes to > install from a build which is shared over nfs, while utilizing an > ofed_net.conf The Install Guide talks about this, but I must be missing > something in the detail. 
> > Is there a way to not check if a build needs to be (re)done and simply > install the rpm's that were created during the original build, then > create the ifcfg-ib? devices based on the template file passed in with > -net ? I prefer not to have kernel sources, compiler, > etc. on these compute nodes, nor should I have to recompile for each > homogeneous node. > > thanks, > > -frank > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > Hi Frank, install.sh checks if there are binary RPMs for all selected packages under the OFED-x.x.x/RPMS directory. If you have created binary RPMs on one of the nodes (by the install.sh script), then make sure that the OFED-x.x.x/ofed.conf file includes only these packages. Then run on all cluster nodes (no kernel sources, compilers, ... required on these nodes): > ./install.sh -c ofed.conf -net ofed_net.conf Note: If there are no RPMs for one or more of the packages selected (package_name=y) in the ofed.conf file, then install.sh will run the RPM build process. Regards, Vladimir
From vlad at dev.mellanox.co.il Sun Apr 27 01:07:11 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Sun, 27 Apr 2008 11:07:11 +0300 Subject: [ofa-general] [PATCH 1/1] RPM Spec files In-Reply-To: References: Message-ID: <4814342F.2050509@dev.mellanox.co.il> Mike Heinz wrote: ... > --- ../../original/ofa_kernel.spec 2008-04-22 12:54:13.000000000 > -0400 > +++ ofa_kernel.spec 2008-04-22 12:45:40.000000000 -0400 > @@ -111,6 +111,9 @@ > BuildRequires: sysfsutils-devel > > %package -n kernel-ib > +PreReq: coreutils > +PreReq: kernel > +PreReq: pciutils > Version: %{_version} > Release: %{krelver} > Summary: Infiniband Driver and ULPs kernel modules > @@ -119,6 +122,10 @@ > Core, HW and ULPs kernel modules > > %package -n kernel-ib-devel > +PreReq: coreutils > +PreReq: kernel > +PreReq: pciutils > +Requires: kernel-ib > Version: %{_version} > Release: %{krelver} > Summary: Infiniband Driver and ULPs kernel modules sources ... > Michael Heinz > Principal Engineer, Qlogic Corporation > King of Prussia, Pennsylvania > Applied to ofa_kernel.spec.
Regards, Vladimir From sashak at voltaire.com Sun Apr 27 04:38:01 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 27 Apr 2008 11:38:01 +0000 Subject: [ofa-general] madrpc_init and resetting performance counters In-Reply-To: <47FFCF16.6020302@isomerica.net> References: <200804101027456.SM08116@[66.94.32.4]> <1207837970.15625.626.camel@hrosenstock-ws.xsigo.com> <47FFCF16.6020302@isomerica.net> Message-ID: <20080427113801.GC22406@sashak.voltaire.com> Hi Dan, On 16:50 Fri 11 Apr , Dan Noe wrote: > > The solution Joel had mentioned was to use madrpc_init() and then call > port_performance_reset() to reset the port. But madrpc_init keeps a static > file descriptor (mad_portid) that is used for subsequent calls (such as is > eventually used when port_performance_reset() is called). And, there does > not seem to be any method to close this file descriptor. > > So, it is impossible to extend this method to multiple devices (or even > multiple ports). With a single call to madrpc_init one can perpetually > reset the performance counters in the polling loop but this approach > doesn't work with multiple devices. Why do you need to open multiple devices/ports? Are you using this tool to handle multiple IB subnets? > If madrpc_init is called more than > once, it leaks a file descriptor. Yes, madrpc_init() is old and it works that way. There are newer mad_rpc_open_port() and mad_rpc_close_port() functions in libibmad which support multiple devices/ports; you can use them instead. Sasha From ogerlitz at voltaire.com Sun Apr 27 01:47:54 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Sun, 27 Apr 2008 11:47:54 +0300 Subject: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.)
In-Reply-To: <20080424181657.28d58a29.weiny2@llnl.gov> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <48109087.6030606@voltaire.com> <20080424143125.2aad1db8.weiny2@llnl.gov> <15ddcffd0804241523p19559580vc3a1293c1fe097b1@mail.gmail.com> <20080424181657.28d58a29.weiny2@llnl.gov> Message-ID: <48143DBA.3080701@voltaire.com> Ira Weiny wrote: > > I did not get any output with multicast_debug_level! Why would you? From the node's point of view nothing has happened. (Also, the exact param name is mcast_debug_level.) > > Here is a patch which fixes the problem. (At least with the partial sub-nets > configuration I explained before.) I will have to verify this fixes the problem > I originally reported. OK, good. Does this problem exist in the released OpenSM? If yes, what would be the trigger for the SM to "really discover" (i.e. do a PortInfo SET on) this sub-fabric, and how much time would it take to reach this trigger in the worst case? The failure configuration you have set up to reproduce the problem is very atypical, I think. Under common Clos etc. topologies, which don't have a 1:n blocking nature, failure of such a link would cause a re-route etc. by the SM, which would not (and should not) be noticed by the nodes (I hope I am not falling into another problem here...) Or. From sashak at voltaire.com Sun Apr 27 06:53:36 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 27 Apr 2008 13:53:36 +0000 Subject: [ofa-general] Re: [PATCH] opensm/configure.in: Fix the QOS and prefix routes config file default locations In-Reply-To: <20080422140601.64764e18.weiny2@llnl.gov> References: <20080422140601.64764e18.weiny2@llnl.gov> Message-ID: <20080427135336.GH22406@sashak.voltaire.com> On 14:06 Tue 22 Apr , Ira Weiny wrote: > From ef37654c0917875129fa2bad2e8ee0dd0d3f8859 Mon Sep 17 00:00:00 2001 > From: Ira K.
Weiny > Date: Fri, 18 Apr 2008 15:51:58 -0700 > Subject: [PATCH] opensm/configure.in: Fix the QOS and prefix routes config file default > locations > > Signed-off-by: Ira K. Weiny Applied. Thanks. Sasha From andrea at qumranet.com Sun Apr 27 05:27:27 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Sun, 27 Apr 2008 14:27:27 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080426131734.GB19717@sgi.com> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> Message-ID: <20080427122727.GO9514@duo.random> On Sat, Apr 26, 2008 at 08:17:34AM -0500, Robin Holt wrote: > the first four sets. The fifth is the oversubscription test which trips > my xpmem bug. This is as good as the v12 runs from before. Now that mmu-notifier-core #v14 seems finished and hopefully will appear in 2.6.26 ;), I started exercising the kvm-mmu-notifier code more, with the full patchset applied and not only mmu-notifier-core. I soon found the full patchset has a swap deadlock bug. Then I tried without using kvm (so with the mmu notifier disarmed) and I could still reproduce the crashes. After grabbing a few stack traces I tracked it down to a bug in the i_mmap_lock->i_mmap_sem conversion. If your oversubscription test means swapping, you should retest with this applied on top of the #v14 i_mmap_sem patch, as it would eventually deadlock with all tasks allocating memory in D state without it. Now the full patchset is as rock solid as with only mmu-notifier-core applied. It's been swapping a 2G memhog on top of a 3G VM with 2G of ram for the last few hours without a problem. Everything is working great with KVM at least.
Talking about post 2.6.26: the refcount with rcu in the anon-vma conversion seems unnecessary and may explain part of the AIM slowdown too. The rest looks ok and probably we should switch the code to a compile-time decision between rwlock and rwsem (so obsoleting the current spinlock). diff --git a/mm/rmap.c b/mm/rmap.c --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1008,7 +1008,7 @@ static int try_to_unmap_file(struct page list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list) vma->vm_private_data = NULL; out: - up_write(&mapping->i_mmap_sem); + up_read(&mapping->i_mmap_sem); return ret; } From dorfman.eli at gmail.com Sun Apr 27 05:49:33 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Sun, 27 Apr 2008 15:49:33 +0300 Subject: [ofa-general] [PATCH 0/2] IB/iSER: Calculating the VA in iSER header Message-ID: <694d48600804270549p1945a618t9ff3aac21c9f6114@mail.gmail.com> The following patch set includes a bug fix for the VA value in the iSER header. The current value is incorrect according to the iSER spec. This patch set includes a bug fix for the initiator code that was made against the 2.6.26 branch and a fix for the iSER code in STGT. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dorfman.eli at gmail.com Sun Apr 27 05:53:19 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Sun, 27 Apr 2008 15:53:19 +0300 Subject: [ofa-general] [PATCH 1/2] IB/iSER: Do not add unsolicited data offset to VA in iSER header Message-ID: <694d48600804270553u36b776ame9695a8858dd278@mail.gmail.com> iSER initiator sends a VA (in the iSER header) which includes an offset for the unsolicited data (which is wrong according to the spec). 
Signed-off-by: Eli Dorfman Signed-off-by: Erez Zilber --- drivers/infiniband/ulp/iser/iser_initiator.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c index 08dc81c..5c2bbc6 100644 --- a/drivers/infiniband/ulp/iser/iser_initiator.c +++ b/drivers/infiniband/ulp/iser/iser_initiator.c @@ -154,12 +154,12 @@ iser_prepare_write_cmd(struct iscsi_cmd_task *ctask, if (unsol_sz < edtl) { hdr->flags |= ISER_WSV; hdr->write_stag = cpu_to_be32(regd_buf->reg.rkey); - hdr->write_va = cpu_to_be64(regd_buf->reg.va + unsol_sz); + hdr->write_va = cpu_to_be64(regd_buf->reg.va); iser_dbg("Cmd itt:%d, WRITE tags, RKEY:%#.4X " - "VA:%#llX + unsol:%d\n", + "VA:%#llX\n", ctask->itt, regd_buf->reg.rkey, - (unsigned long long)regd_buf->reg.va, unsol_sz); + (unsigned long long)regd_buf->reg.va); } if (imm_sz > 0) { -- 1.5.5 From dorfman.eli at gmail.com Sun Apr 27 05:55:00 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Sun, 27 Apr 2008 15:55:00 +0300 Subject: [ofa-general] [PATCH 2/2] IB/iSER: Use offset from r2t header for rdma Message-ID: <694d48600804270555i6ee55843x51c416294fec6397@mail.gmail.com> Use offset from r2t header for rdma instead of using internal offset counter. 
Signed-off-by: Eli Dorfman --- usr/iscsi/iscsi_rdma.c | 16 +++++----------- 1 files changed, 5 insertions(+), 11 deletions(-) diff --git a/usr/iscsi/iscsi_rdma.c b/usr/iscsi/iscsi_rdma.c index d46ddff..84f5949 100644 --- a/usr/iscsi/iscsi_rdma.c +++ b/usr/iscsi/iscsi_rdma.c @@ -1447,28 +1447,22 @@ static int iscsi_rdma_rdma_read(struct iscsi_connection *conn) struct iscsi_r2t_rsp *r2t = (struct iscsi_r2t_rsp *) &conn->rsp.bhs; uint8_t *buf; uint32_t len; + uint32_t offset; int ret; buf = (uint8_t *) task->data + task->offset; len = be32_to_cpu(r2t->data_length); + offset = be32_to_cpu(r2t->data_offset); - dprintf("len %u stag %x va %llx\n", + dprintf("len %u stag %x va %llx offset %x\n", len, itask->rem_write_stag, - (unsigned long long) itask->rem_write_va); + (unsigned long long) itask->rem_write_va, offset); ret = iser_post_rdma_wr(ci, task, buf, len, IBV_WR_RDMA_READ, - itask->rem_write_va, itask->rem_write_stag); + itask->rem_write_va + offset, itask->rem_write_stag); if (ret < 0) return ret; - /* - * Initiator registers the entire buffer, but gives us a VA that - * is advanced by immediate + unsolicited data amounts. Advance - * rem_va as we read, knowing that the target always grabs segments - * in order. - */ - itask->rem_write_va += len; - return 0; } -- 1.5.5 From sashak at voltaire.com Sun Apr 27 10:11:40 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Sun, 27 Apr 2008 17:11:40 +0000 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <20080423133816.6c1b6315.weiny2@llnl.gov> References: <20080423133816.6c1b6315.weiny2@llnl.gov> Message-ID: <20080427171140.GI22406@sashak.voltaire.com> Hi Ira, On 13:38 Wed 23 Apr , Ira Weiny wrote: > > The symptom is that nodes drop out of the IPoIB mcast group after a node > temporarily goes catatonic. The details are: > > 1) Issues on a node cause a soft lockup of the node. > 2) OpenSM does a normal light sweep. 
> 3) MADs to the node time out since the node is in a "bad state" Normally, during a light sweep, OpenSM will not query nodes. I think OpenSM should not detect such a soft lockup unless the IB link state changed and a heavy sweep was triggered. Is this the case? > 4) OpenSM marks the node down and drops it from internal tables, including > mcast groups. > 5) Node recovers from soft lock up condition. > 6) A subsequent sweep causes OpenSM see the node and add it back to the > fabric. > 7) Node is fully functional on the verbs layer but IPoIB never knew anything > was wrong so it does _not_ rejoin the mcast groups. (This is different > from the condition where the link actually goes down.) If my approach above is correct, it should be the same as port down/up handling. And as was already noted in this thread, OpenSM should ask for reregistration (by setting the client reregistration bit). I see your patch - it seems this part is buggy in OpenSM now; I will take a closer look. Sasha From tziporet at dev.mellanox.co.il Sun Apr 27 07:17:09 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Sun, 27 Apr 2008 17:17:09 +0300 Subject: [ewg] Re: [ofa-general] Agenda for the OFED meeting today In-Reply-To: <480E43C0.6080107@opengridcomputing.com> References: <6C2C79E72C305246B504CBA17B5500C903D375E4@mtlexch01.mtl.com> <480E43C0.6080107@opengridcomputing.com> Message-ID: <48148AE5.4020801@mellanox.co.il> Steve Wise wrote: > >> Note: daily builds of 1.3.1 are already available at: >> _http://www.openfabrics.org/builds/ofed-1.3.1_ >> > > Is there a new git repos for the 1.3.1 kernel? Or just using the 1.3 > repos?
> We use the same git tree as for 1.3 Tziporet From swise at opengridcomputing.com Sun Apr 27 08:54:56 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Sun, 27 Apr 2008 10:54:56 -0500 Subject: [ofa-general] [PATCH 2.6.26 0/3] RDMA/cxgb3: fixes and enhancements for 2.6.26 Message-ID: <20080427155456.31018.22282.stgit@dell3.ogc.int> The following series fixes some bugs as well as enabling peer-2-peer applications including OpenMPI and HPMPI. I hope this can make 2.6.26. NOTE: The changes in patch 3 require a new firmware version. I added the version change to drivers/net/cxgb3/version.h in this patch so that the changes that require the new firmware as well as the version bump are all in one git commit. This keeps things like 'git bisect' from leaving the driver broken. -- Steve. From swise at opengridcomputing.com Sun Apr 27 09:00:06 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Sun, 27 Apr 2008 11:00:06 -0500 Subject: [ofa-general] [PATCH 2.6.26 1/3] RDMA/cxgb3: Correctly serialize peer abort path. In-Reply-To: <20080427155456.31018.22282.stgit@dell3.ogc.int> References: <20080427155456.31018.22282.stgit@dell3.ogc.int> Message-ID: <20080427160006.31018.66715.stgit@dell3.ogc.int> OpenMPI and other stress testing exposed a few bad bugs in handling aborts in the middle of a normal close. - serialize abort reply and peer abort processing with disconnect processing - warn (and ignore) if ep timer is stopped when it wasn't running - cleaned up disconnect path to correctly deal with aborting and dead endpoints - in iwch_modify_qp(), add a ref to the ep before releasing the qp lock if iwch_ep_disconnect() will be called. The ref is dropped after calling disconnect.
Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/iwch_cm.c | 98 ++++++++++++++++++++++----------- drivers/infiniband/hw/cxgb3/iwch_cm.h | 1 drivers/infiniband/hw/cxgb3/iwch_qp.c | 6 ++ 3 files changed, 71 insertions(+), 34 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index 99f2f2a..1627bff 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -125,6 +125,12 @@ static void start_ep_timer(struct iwch_ep *ep) static void stop_ep_timer(struct iwch_ep *ep) { PDBG("%s ep %p\n", __FUNCTION__, ep); + if (!timer_pending(&ep->timer)) { + printk(KERN_ERR "%s timer stopped when its not running! ep %p state %u\n", + __FUNCTION__, ep, ep->com.state); + WARN_ON(1); + return; + } del_timer_sync(&ep->timer); put_ep(&ep->com); } @@ -1083,8 +1089,11 @@ static int tx_ack(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) static int abort_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) { struct iwch_ep *ep = ctx; + unsigned long flags; + int release = 0; PDBG("%s ep %p\n", __FUNCTION__, ep); + BUG_ON(!ep); /* * We get 2 abort replies from the HW. 
The first one must @@ -1095,9 +1104,22 @@ static int abort_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) return CPL_RET_BUF_DONE; } - close_complete_upcall(ep); - state_set(&ep->com, DEAD); - release_ep_resources(ep); + spin_lock_irqsave(&ep->com.lock, flags); + switch (ep->com.state) { + case ABORTING: + close_complete_upcall(ep); + __state_set(&ep->com, DEAD); + release = 1; + break; + default: + printk(KERN_ERR "%s ep %p state %d\n", + __FUNCTION__, ep, ep->com.state); + break; + } + spin_unlock_irqrestore(&ep->com.lock, flags); + + if (release) + release_ep_resources(ep); return CPL_RET_BUF_DONE; } @@ -1470,7 +1492,8 @@ static int peer_abort(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) struct sk_buff *rpl_skb; struct iwch_qp_attributes attrs; int ret; - int state; + int release = 0; + unsigned long flags; if (is_neg_adv_abort(req->status)) { PDBG("%s neg_adv_abort ep %p tid %d\n", __FUNCTION__, ep, @@ -1488,9 +1511,9 @@ static int peer_abort(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) return CPL_RET_BUF_DONE; } - state = state_read(&ep->com); - PDBG("%s ep %p state %u\n", __FUNCTION__, ep, state); - switch (state) { + spin_lock_irqsave(&ep->com.lock, flags); + PDBG("%s ep %p state %u\n", __FUNCTION__, ep, ep->com.state); + switch (ep->com.state) { case CONNECTING: break; case MPA_REQ_WAIT: @@ -1536,21 +1559,25 @@ static int peer_abort(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) break; case DEAD: PDBG("%s PEER_ABORT IN DEAD STATE!!!!\n", __FUNCTION__); + spin_unlock_irqrestore(&ep->com.lock, flags); return CPL_RET_BUF_DONE; default: BUG_ON(1); break; } dst_confirm(ep->dst); + if (ep->com.state != ABORTING) { + __state_set(&ep->com, DEAD); + release = 1; + } + spin_unlock_irqrestore(&ep->com.lock, flags); rpl_skb = get_skb(skb, sizeof(*rpl), GFP_KERNEL); if (!rpl_skb) { printk(KERN_ERR MOD "%s - cannot allocate skb!\n", __FUNCTION__); - dst_release(ep->dst); - l2t_release(L2DATA(ep->com.tdev), ep->l2t); - put_ep(&ep->com); - 
return CPL_RET_BUF_DONE; + release = 1; + goto out; } rpl_skb->priority = CPL_PRIORITY_DATA; rpl = (struct cpl_abort_rpl *) skb_put(rpl_skb, sizeof(*rpl)); @@ -1559,10 +1586,9 @@ static int peer_abort(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) OPCODE_TID(rpl) = htonl(MK_OPCODE_TID(CPL_ABORT_RPL, ep->hwtid)); rpl->cmd = CPL_ABORT_NO_RST; cxgb3_ofld_send(ep->com.tdev, rpl_skb); - if (state != ABORTING) { - state_set(&ep->com, DEAD); +out: + if (release) release_ep_resources(ep); - } return CPL_RET_BUF_DONE; } @@ -1661,15 +1687,18 @@ static void ep_timeout(unsigned long arg) struct iwch_ep *ep = (struct iwch_ep *)arg; struct iwch_qp_attributes attrs; unsigned long flags; + int abort=1; spin_lock_irqsave(&ep->com.lock, flags); PDBG("%s ep %p tid %u state %d\n", __FUNCTION__, ep, ep->hwtid, ep->com.state); switch (ep->com.state) { case MPA_REQ_SENT: + __state_set(&ep->com, ABORTING); connect_reply_upcall(ep, -ETIMEDOUT); break; case MPA_REQ_WAIT: + __state_set(&ep->com, ABORTING); break; case CLOSING: case MORIBUND: @@ -1679,13 +1708,17 @@ static void ep_timeout(unsigned long arg) ep->com.qp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 1); } + __state_set(&ep->com, ABORTING); break; default: - BUG(); + printk(KERN_ERR "%s unexpected state ep %p state %u\n", + __FUNCTION__, ep, ep->com.state); + WARN_ON(1); + abort=0; } - __state_set(&ep->com, CLOSING); spin_unlock_irqrestore(&ep->com.lock, flags); - abort_connection(ep, NULL, GFP_ATOMIC); + if (abort) + abort_connection(ep, NULL, GFP_ATOMIC); put_ep(&ep->com); } @@ -1968,34 +2001,33 @@ int iwch_ep_disconnect(struct iwch_ep *ep, int abrupt, gfp_t gfp) PDBG("%s ep %p state %s, abrupt %d\n", __FUNCTION__, ep, states[ep->com.state], abrupt); - if (ep->com.state == DEAD) { - PDBG("%s already dead ep %p\n", __FUNCTION__, ep); - goto out; - } - - if (abrupt) { - if (ep->com.state != ABORTING) { - ep->com.state = ABORTING; - close = 1; - } - goto out; - } - switch (ep->com.state) { case MPA_REQ_WAIT: case MPA_REQ_SENT: case 
MPA_REQ_RCVD: case MPA_REP_SENT: case FPDU_MODE: - start_ep_timer(ep); - ep->com.state = CLOSING; close = 1; + if (abrupt) + ep->com.state = ABORTING; + else { + ep->com.state = CLOSING; + start_ep_timer(ep); + } break; case CLOSING: - ep->com.state = MORIBUND; close = 1; + if (abrupt) { + stop_ep_timer(ep); + ep->com.state = ABORTING; + } else + ep->com.state = MORIBUND; break; case MORIBUND: + case ABORTING: + case DEAD: + PDBG("%s ignoring disconnect ep %p state %u\n", + __FUNCTION__, ep, ep->com.state); break; default: BUG(); diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.h b/drivers/infiniband/hw/cxgb3/iwch_cm.h index 6107e7c..a3fb959 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.h +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h @@ -56,6 +56,7 @@ #define put_ep(ep) { \ PDBG("put_ep (via %s:%u) ep %p refcnt %d\n", __FUNCTION__, __LINE__, \ ep, atomic_read(&((ep)->kref.refcount))); \ + WARN_ON(atomic_read(&((ep)->kref.refcount)) < 1); \ kref_put(&((ep)->kref), __free_ep); \ } diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index ea2cdd7..c02bb94 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -832,6 +832,7 @@ int iwch_modify_qp(struct iwch_dev *rhp, struct iwch_qp *qhp, abort=0; disconnect = 1; ep = qhp->ep; + get_ep(&ep->com); } flush_qp(qhp, &flag); break; @@ -848,6 +849,7 @@ int iwch_modify_qp(struct iwch_dev *rhp, struct iwch_qp *qhp, abort=1; disconnect = 1; ep = qhp->ep; + get_ep(&ep->com); } goto err; break; @@ -929,8 +931,10 @@ out: * on the EP. This can be a normal close (RTS->CLOSING) or * an abnormal close (RTS/CLOSING->ERROR). 
*/ - if (disconnect) + if (disconnect) { iwch_ep_disconnect(ep, abort, GFP_KERNEL); + put_ep(&ep->com); + } /* * If free is 1, then we've disassociated the EP from the QP From swise at opengridcomputing.com Sun Apr 27 09:00:08 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Sun, 27 Apr 2008 11:00:08 -0500 Subject: [ofa-general] [PATCH 2.6.26 2/3] RDMA/cxgb3: Correctly set the max_mr_size device attribute. In-Reply-To: <20080427155456.31018.22282.stgit@dell3.ogc.int> References: <20080427155456.31018.22282.stgit@dell3.ogc.int> Message-ID: <20080427160008.31018.15516.stgit@dell3.ogc.int> cxgb3 only supports 4GB memory regions. The lustre RDMA code uses this attribute and currently has to code around our bad setting. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/cxio_hal.h | 1 + drivers/infiniband/hw/cxgb3/iwch.c | 1 + drivers/infiniband/hw/cxgb3/iwch.h | 1 + drivers/infiniband/hw/cxgb3/iwch_provider.c | 2 +- 4 files changed, 4 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.h b/drivers/infiniband/hw/cxgb3/cxio_hal.h index 99543d6..2bcff7f 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.h +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.h @@ -53,6 +53,7 @@ #define T3_MAX_PBL_SIZE 256 #define T3_MAX_RQ_SIZE 1024 #define T3_MAX_NUM_STAG (1<<15) +#define T3_MAX_MR_SIZE 0x100000000ULL #define T3_STAG_UNSET 0xffffffff diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c index 0315c9d..98a768f 100644 --- a/drivers/infiniband/hw/cxgb3/iwch.c +++ b/drivers/infiniband/hw/cxgb3/iwch.c @@ -83,6 +83,7 @@ static void rnic_init(struct iwch_dev *rnicp) rnicp->attr.max_phys_buf_entries = T3_MAX_PBL_SIZE; rnicp->attr.max_pds = T3_MAX_NUM_PD - 1; rnicp->attr.mem_pgsizes_bitmask = 0x7FFF; /* 4KB-128MB */ + rnicp->attr.max_mr_size = T3_MAX_MR_SIZE; rnicp->attr.can_resize_wq = 0; rnicp->attr.max_rdma_reads_per_qp = 8; rnicp->attr.max_rdma_read_resources = diff --git a/drivers/infiniband/hw/cxgb3/iwch.h 
b/drivers/infiniband/hw/cxgb3/iwch.h index caf4e60..238c103 100644 --- a/drivers/infiniband/hw/cxgb3/iwch.h +++ b/drivers/infiniband/hw/cxgb3/iwch.h @@ -66,6 +66,7 @@ struct iwch_rnic_attributes { * size (4k)^i. Phys block list mode unsupported. */ u32 mem_pgsizes_bitmask; + u64 max_mr_size; u8 can_resize_wq; /* diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index b2ea921..f7df213 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -998,7 +998,7 @@ static int iwch_query_device(struct ib_device *ibdev, props->device_cap_flags = dev->device_cap_flags; props->vendor_id = (u32)dev->rdev.rnic_info.pdev->vendor; props->vendor_part_id = (u32)dev->rdev.rnic_info.pdev->device; - props->max_mr_size = ~0ull; + props->max_mr_size = dev->attr.max_mr_size; props->max_qp = dev->attr.max_qps; props->max_qp_wr = dev->attr.max_wrs; props->max_sge = dev->attr.max_sge_per_wr; From swise at opengridcomputing.com Sun Apr 27 09:00:10 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Sun, 27 Apr 2008 11:00:10 -0500 Subject: [ofa-general] [PATCH 2.6.26 3/3] RDMA/cxgb3: Support peer-2-peer connection setup. In-Reply-To: <20080427155456.31018.22282.stgit@dell3.ogc.int> References: <20080427155456.31018.22282.stgit@dell3.ogc.int> Message-ID: <20080427160010.31018.67436.stgit@dell3.ogc.int> Open MPI, Intel MPI and other applications don't support the iWARP requirement that the client side send the first RDMA message. This class of application connection setup is called peer-2-peer. Typically once the connection is setup, _both_ sides want to send data. This patch enables supporting peer-2-peer over the chelsio rnic by enforcing this iWARP requirement in the driver itself as part of RDMA connection setup. Connection setup is extended, when peer2peer is 1, such that the MPA initiator will send a 0B Read (the RTR) just after connection setup. 
The MPA responder will suspend SQ processing until the RTR message is received and replied to. Design: - Add a module option, peer2peer, to enable this mode. - New firmware support for peer-2-peer mode: - new bits in the rdma_init WR to tell it to do peer-2-peer and what form of RTR message to send or expect. - process _all_ preposted recvs before moving the connection into rdma mode. - passive side: defer completing the rdma_init WR until all pre-posted recvs are processed. Suspend SQ processing until the RTR is received. - active side: expect and process the 0B read WR on offload tx queue. Defer completing the rdma_init WR until all pre-posted recvs are processed. Suspend SQ processing until the 0B read WR is processed from the offload tx queue. - If peer2peer is set, driver posts 0B read request on offload tx queue just after posting the rdma_init wr to the offload tx queue. - Add cq poll logic to ignore unsolicited read responses. Signed-off-by: Steve Wise --- drivers/infiniband/hw/cxgb3/cxio_hal.c | 18 ++++++- drivers/infiniband/hw/cxgb3/cxio_wr.h | 21 +++++++- drivers/infiniband/hw/cxgb3/iwch_cm.c | 68 +++++++++++++++++++-------- drivers/infiniband/hw/cxgb3/iwch_cm.h | 1 drivers/infiniband/hw/cxgb3/iwch_provider.h | 3 + drivers/infiniband/hw/cxgb3/iwch_qp.c | 54 ++++++++++++++++++++- drivers/net/cxgb3/version.h | 2 - 7 files changed, 137 insertions(+), 30 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c b/drivers/infiniband/hw/cxgb3/cxio_hal.c index 03c5ff6..3de0fbf 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_hal.c +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c @@ -456,7 +456,8 @@ void cxio_count_scqes(struct t3_cq *cq, struct t3_wq *wq, int *count) ptr = cq->sw_rptr; while (!Q_EMPTY(ptr, cq->sw_wptr)) { cqe = cq->sw_queue + (Q_PTR2IDX(ptr, cq->size_log2)); - if ((SQ_TYPE(*cqe) || (CQE_OPCODE(*cqe) == T3_READ_RESP)) && + if ((SQ_TYPE(*cqe) || + ((CQE_OPCODE(*cqe) == T3_READ_RESP) && wq->oldest_read)) && (CQE_QPID(*cqe) == wq->qpid))
(*count)++; ptr++; @@ -829,7 +830,8 @@ int cxio_rdma_init(struct cxio_rdev *rdev_p, struct t3_rdma_init_attr *attr) wqe->mpaattrs = attr->mpaattrs; wqe->qpcaps = attr->qpcaps; wqe->ulpdu_size = cpu_to_be16(attr->tcp_emss); - wqe->flags = cpu_to_be32(attr->flags); + wqe->rqe_count = cpu_to_be16(attr->rqe_count); + wqe->flags_rtr_type = cpu_to_be16(attr->flags|V_RTR_TYPE(attr->rtr_type)); wqe->ord = cpu_to_be32(attr->ord); wqe->ird = cpu_to_be32(attr->ird); wqe->qp_dma_addr = cpu_to_be64(attr->qp_dma_addr); @@ -1135,6 +1137,18 @@ int cxio_poll_cq(struct t3_wq *wq, struct t3_cq *cq, struct t3_cqe *cqe, if (RQ_TYPE(*hw_cqe) && (CQE_OPCODE(*hw_cqe) == T3_READ_RESP)) { /* + * If this is an unsolicited read response, then the read + * was generated by the kernel driver as part of peer-2-peer + * connection setup. So ignore the completion. + */ + if (!wq->oldest_read) { + if (CQE_STATUS(*hw_cqe)) + wq->error = 1; + ret = -1; + goto skip_cqe; + } + + /* * Don't write to the HWCQ, so create a new read req CQE * in local memory. 
*/ diff --git a/drivers/infiniband/hw/cxgb3/cxio_wr.h b/drivers/infiniband/hw/cxgb3/cxio_wr.h index 969d4d9..f1a25a8 100644 --- a/drivers/infiniband/hw/cxgb3/cxio_wr.h +++ b/drivers/infiniband/hw/cxgb3/cxio_wr.h @@ -278,6 +278,17 @@ enum t3_qp_caps { uP_RI_QP_STAG0_ENABLE = 0x10 } __attribute__ ((packed)); +enum rdma_init_rtr_types { + RTR_READ = 1, + RTR_WRITE = 2, + RTR_SEND = 3, +}; + +#define S_RTR_TYPE 2 +#define M_RTR_TYPE 0x3 +#define V_RTR_TYPE(x) ((x) << S_RTR_TYPE) +#define G_RTR_TYPE(x) ((((x) >> S_RTR_TYPE)) & M_RTR_TYPE) + struct t3_rdma_init_attr { u32 tid; u32 qpid; @@ -293,7 +304,9 @@ struct t3_rdma_init_attr { u32 ird; u64 qp_dma_addr; u32 qp_dma_size; - u32 flags; + enum rdma_init_rtr_types rtr_type; + u16 flags; + u16 rqe_count; u32 irs; }; @@ -309,8 +322,8 @@ struct t3_rdma_init_wr { u8 mpaattrs; /* 5 */ u8 qpcaps; __be16 ulpdu_size; - __be32 flags; /* bits 31-1 - reservered */ - /* bit 0 - set if RECV posted */ + __be16 flags_rtr_type; + __be16 rqe_count; __be32 ord; /* 6 */ __be32 ird; __be64 qp_dma_addr; /* 7 */ @@ -324,7 +337,7 @@ struct t3_genbit { }; enum rdma_init_wr_flags { - RECVS_POSTED = (1<<0), + MPA_INITIATOR = (1<<0), PRIV_QP = (1<<1), }; diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c index 1627bff..f4f3c9e 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.c +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c @@ -63,6 +63,10 @@ static char *states[] = { NULL, }; +int peer2peer = 0; +module_param(peer2peer, int, 0644); +MODULE_PARM_DESC(peer2peer, "Support peer2peer ULPs (default=0)"); + static int ep_timeout_secs = 10; module_param(ep_timeout_secs, int, 0644); MODULE_PARM_DESC(ep_timeout_secs, "CM Endpoint operation timeout " @@ -514,7 +518,7 @@ static void send_mpa_req(struct iwch_ep *ep, struct sk_buff *skb) skb_reset_transport_header(skb); len = skb->len; req = (struct tx_data_wr *) skb_push(skb, sizeof(*req)); - req->wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA)); + req->wr_hi = 
htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA)|F_WR_COMPL); req->wr_lo = htonl(V_WR_TID(ep->hwtid)); req->len = htonl(len); req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) | @@ -565,7 +569,7 @@ static int send_mpa_reject(struct iwch_ep *ep, const void *pdata, u8 plen) set_arp_failure_handler(skb, arp_failure_discard); skb_reset_transport_header(skb); req = (struct tx_data_wr *) skb_push(skb, sizeof(*req)); - req->wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA)); + req->wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA)|F_WR_COMPL); req->wr_lo = htonl(V_WR_TID(ep->hwtid)); req->len = htonl(mpalen); req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) | @@ -617,7 +621,7 @@ static int send_mpa_reply(struct iwch_ep *ep, const void *pdata, u8 plen) skb_reset_transport_header(skb); len = skb->len; req = (struct tx_data_wr *) skb_push(skb, sizeof(*req)); - req->wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA)); + req->wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA)|F_WR_COMPL); req->wr_lo = htonl(V_WR_TID(ep->hwtid)); req->len = htonl(len); req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) | @@ -885,6 +889,7 @@ static void process_mpa_reply(struct iwch_ep *ep, struct sk_buff *skb) * the MPA header is valid. */ state_set(&ep->com, FPDU_MODE); + ep->mpa_attr.initiator = 1; ep->mpa_attr.crc_enabled = (mpa->flags & MPA_CRC) | crc_enabled ? 1 : 0; ep->mpa_attr.recv_marker_enabled = markers_enabled; ep->mpa_attr.xmit_marker_enabled = mpa->flags & MPA_MARKERS ? 
1 : 0; @@ -907,8 +912,14 @@ static void process_mpa_reply(struct iwch_ep *ep, struct sk_buff *skb) /* bind QP and TID with INIT_WR */ err = iwch_modify_qp(ep->com.qp->rhp, ep->com.qp, mask, &attrs, 1); - if (!err) - goto out; + if (err) + goto err; + + if (peer2peer && iwch_rqes_posted(ep->com.qp) == 0) { + iwch_post_zb_read(ep->com.qp); + } + + goto out; err: abort_connection(ep, skb, GFP_KERNEL); out: @@ -1001,6 +1012,7 @@ static void process_mpa_request(struct iwch_ep *ep, struct sk_buff *skb) * If we get here we have accumulated the entire mpa * start reply message including private data. */ + ep->mpa_attr.initiator = 0; ep->mpa_attr.crc_enabled = (mpa->flags & MPA_CRC) | crc_enabled ? 1 : 0; ep->mpa_attr.recv_marker_enabled = markers_enabled; ep->mpa_attr.xmit_marker_enabled = mpa->flags & MPA_MARKERS ? 1 : 0; @@ -1071,17 +1083,33 @@ static int tx_ack(struct t3cdev *tdev, struct sk_buff *skb, void *ctx) PDBG("%s ep %p credits %u\n", __FUNCTION__, ep, credits); - if (credits == 0) + if (credits == 0) { + PDBG(KERN_ERR "%s 0 credit ack ep %p state %u\n", + __FUNCTION__, ep, state_read(&ep->com)); return CPL_RET_BUF_DONE; + } + BUG_ON(credits != 1); - BUG_ON(ep->mpa_skb == NULL); - kfree_skb(ep->mpa_skb); - ep->mpa_skb = NULL; dst_confirm(ep->dst); - if (state_read(&ep->com) == MPA_REP_SENT) { - ep->com.rpl_done = 1; - PDBG("waking up ep %p\n", ep); - wake_up(&ep->com.waitq); + if (!ep->mpa_skb) { + PDBG("%s rdma_init wr_ack ep %p state %u\n", + __FUNCTION__, ep, state_read(&ep->com)); + if (ep->mpa_attr.initiator) { + PDBG("%s initiator ep %p state %u\n", + __FUNCTION__, ep, state_read(&ep->com)); + if (peer2peer) + iwch_post_zb_read(ep->com.qp); + } else { + PDBG("%s responder ep %p state %u\n", + __FUNCTION__, ep, state_read(&ep->com)); + ep->com.rpl_done = 1; + wake_up(&ep->com.waitq); + } + } else { + PDBG("%s lsm ack ep %p state %u freeing skb\n", + __FUNCTION__, ep, state_read(&ep->com)); + kfree_skb(ep->mpa_skb); + ep->mpa_skb = NULL; } return 
CPL_RET_BUF_DONE; } @@ -1795,16 +1823,19 @@ int iwch_accept_cr(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) if (err) goto err; + /* if needed, wait for wr_ack */ + if (iwch_rqes_posted(qp)) { + wait_event(ep->com.waitq, ep->com.rpl_done); + err = ep->com.rpl_err; + if (err) + goto err; + } + err = send_mpa_reply(ep, conn_param->private_data, conn_param->private_data_len); if (err) goto err; - /* wait for wr_ack */ - wait_event(ep->com.waitq, ep->com.rpl_done); - err = ep->com.rpl_err; - if (err) - goto err; state_set(&ep->com, FPDU_MODE); established_upcall(ep); @@ -2033,7 +2064,6 @@ int iwch_ep_disconnect(struct iwch_ep *ep, int abrupt, gfp_t gfp) BUG(); break; } -out: spin_unlock_irqrestore(&ep->com.lock, flags); if (close) { if (abrupt) diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.h b/drivers/infiniband/hw/cxgb3/iwch_cm.h index a3fb959..c0978a8 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_cm.h +++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h @@ -226,5 +226,6 @@ int iwch_ep_redirect(void *ctx, struct dst_entry *old, struct dst_entry *new, st int __init iwch_cm_init(void); void __exit iwch_cm_term(void); +extern int peer2peer; #endif /* _IWCH_CM_H_ */ diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h index 48833f3..ad77f05 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h @@ -118,6 +118,7 @@ enum IWCH_QP_FLAGS { }; struct iwch_mpa_attributes { + u8 initiator; u8 recv_marker_enabled; u8 xmit_marker_enabled; /* iWARP: enable inbound Read Resp. 
*/ u8 crc_enabled; @@ -322,6 +323,7 @@ enum iwch_qp_query_flags { IWCH_QP_QUERY_TEST_USERWRITE = 0x32 /* Test special */ }; +u16 iwch_rqes_posted(struct iwch_qp *qhp); int iwch_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, struct ib_send_wr **bad_wr); int iwch_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr, @@ -331,6 +333,7 @@ int iwch_bind_mw(struct ib_qp *qp, struct ib_mw_bind *mw_bind); int iwch_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc); int iwch_post_terminate(struct iwch_qp *qhp, struct respQ_msg_t *rsp_msg); +int iwch_post_zb_read(struct iwch_qp *qhp); int iwch_register_device(struct iwch_dev *dev); void iwch_unregister_device(struct iwch_dev *dev); int iwch_quiesce_qps(struct iwch_cq *chp); diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index c02bb94..b0e5aea 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -586,6 +586,36 @@ static inline void build_term_codes(struct respQ_msg_t *rsp_msg, } } +int iwch_post_zb_read(struct iwch_qp *qhp) +{ + union t3_wr *wqe; + struct sk_buff *skb; + u8 flit_cnt = sizeof(struct t3_rdma_read_wr) >> 3; + + PDBG("%s enter\n", __FUNCTION__); + skb = alloc_skb(40, GFP_KERNEL); + if (!skb) { + printk(KERN_ERR "%s cannot send zb_read!!\n", __FUNCTION__); + return -ENOMEM; + } + wqe = (union t3_wr *)skb_put(skb, sizeof(struct t3_rdma_read_wr)); + memset(wqe, 0, sizeof(struct t3_rdma_read_wr)); + wqe->read.rdmaop = T3_READ_REQ; + wqe->read.reserved[0] = 0; + wqe->read.reserved[1] = 0; + wqe->read.reserved[2] = 0; + wqe->read.rem_stag = cpu_to_be32(1); + wqe->read.rem_to = cpu_to_be64(1); + wqe->read.local_stag = cpu_to_be32(1); + wqe->read.local_len = cpu_to_be32(0); + wqe->read.local_to = cpu_to_be64(1); + wqe->send.wrh.op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(T3_WR_READ)); + wqe->send.wrh.gen_tid_len = cpu_to_be32(V_FW_RIWR_TID(qhp->ep->hwtid)| + V_FW_RIWR_LEN(flit_cnt)); + skb->priority = 
CPL_PRIORITY_DATA; + return cxgb3_ofld_send(qhp->rhp->rdev.t3cdev_p, skb); +} + /* * This posts a TERMINATE with layer=RDMA, type=catastrophic. */ @@ -671,11 +701,18 @@ static void flush_qp(struct iwch_qp *qhp, unsigned long *flag) /* - * Return non zero if at least one RECV was pre-posted. + * Return count of RECV WRs posted */ -static int rqes_posted(struct iwch_qp *qhp) +u16 iwch_rqes_posted(struct iwch_qp *qhp) { - return fw_riwrh_opcode((struct fw_riwrh *)qhp->wq.queue) == T3_WR_RCV; + union t3_wr *wqe = qhp->wq.queue; + u16 count = 0; + while ((count+1) != 0 && fw_riwrh_opcode((struct fw_riwrh *)wqe) == T3_WR_RCV) { + count++; + wqe++; + } + PDBG("%s qhp %p count %u\n", __FUNCTION__, qhp, count); + return count; } static int rdma_init(struct iwch_dev *rhp, struct iwch_qp *qhp, @@ -716,8 +753,17 @@ static int rdma_init(struct iwch_dev *rhp, struct iwch_qp *qhp, init_attr.ird = qhp->attr.max_ird; init_attr.qp_dma_addr = qhp->wq.dma_addr; init_attr.qp_dma_size = (1UL << qhp->wq.size_log2); - init_attr.flags = rqes_posted(qhp) ? RECVS_POSTED : 0; + init_attr.rqe_count = iwch_rqes_posted(qhp); + init_attr.flags = qhp->attr.mpa_attr.initiator ? MPA_INITIATOR : 0; init_attr.flags |= capable(CAP_NET_BIND_SERVICE) ? 
PRIV_QP : 0; + if (peer2peer) { + init_attr.rtr_type = RTR_READ; + if (init_attr.ord == 0 && qhp->attr.mpa_attr.initiator) + init_attr.ord = 1; + if (init_attr.ird == 0 && !qhp->attr.mpa_attr.initiator) + init_attr.ird = 1; + } else + init_attr.rtr_type = 0; init_attr.irs = qhp->ep->rcv_seq; PDBG("%s init_attr.rq_addr 0x%x init_attr.rq_size = %d " "flags 0x%x qpcaps 0x%x\n", __FUNCTION__, diff --git a/drivers/net/cxgb3/version.h b/drivers/net/cxgb3/version.h index 229303f..a0177fc 100644 --- a/drivers/net/cxgb3/version.h +++ b/drivers/net/cxgb3/version.h @@ -38,7 +38,7 @@ #define DRV_VERSION "1.0-ko" /* Firmware version */ -#define FW_VERSION_MAJOR 5 +#define FW_VERSION_MAJOR 6 #define FW_VERSION_MINOR 0 #define FW_VERSION_MICRO 0 #endif /* __CHELSIO_VERSION_H */ From rdreier at cisco.com Sun Apr 27 09:30:47 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 27 Apr 2008 09:30:47 -0700 Subject: [ofa-general] Re: [PATCH 1/2] IB/iSER: Do not add unsolicited data offset to VA in iSER header In-Reply-To: <694d48600804270553u36b776ame9695a8858dd278@mail.gmail.com> (Eli Dorfman's message of "Sun, 27 Apr 2008 15:53:19 +0300") References: <694d48600804270553u36b776ame9695a8858dd278@mail.gmail.com> Message-ID: So what was the conclusion on the right way to handle the change that affects on-the-wire data? Just have a flag day so targets either work with 2.6.25 and earlier initiators, or work with 2.6.26 and later initiators, and corrupt data if someone mixes things the wrong way? - R. 
From rdreier at cisco.com Sun Apr 27 09:31:08 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 27 Apr 2008 09:31:08 -0700 Subject: [ofa-general] [PATCH 2/2] IB/iSER: Use offset from r2t header for rdma In-Reply-To: <694d48600804270555i6ee55843x51c416294fec6397@mail.gmail.com> (Eli Dorfman's message of "Sun, 27 Apr 2008 15:55:00 +0300") References: <694d48600804270555i6ee55843x51c416294fec6397@mail.gmail.com> Message-ID: > usr/iscsi/iscsi_rdma.c | 16 +++++----------- I have no idea what tree this file lives in so I'll just ignore this patch, right? - R. From rdreier at cisco.com Sun Apr 27 09:34:00 2008 From: rdreier at cisco.com (Roland Dreier) Date: Sun, 27 Apr 2008 09:34:00 -0700 Subject: [ofa-general] [PATCH 2.6.26 3/3] RDMA/cxgb3: Support peer-2-peer connection setup. In-Reply-To: <20080427160010.31018.67436.stgit@dell3.ogc.int> (Steve Wise's message of "Sun, 27 Apr 2008 11:00:10 -0500") References: <20080427155456.31018.22282.stgit@dell3.ogc.int> <20080427160010.31018.67436.stgit@dell3.ogc.int> Message-ID: What are the interoperability implications of this? Looking closer I see that iw_nes has the send_first module parameter. How does this interact with that? I guess it's fine to apply this, but do we have a plan for how we want to handle this issue in the long-term? - R. From swise at opengridcomputing.com Sun Apr 27 09:44:43 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Sun, 27 Apr 2008 11:44:43 -0500 Subject: [ofa-general] [PATCH 2.6.26 3/3] RDMA/cxgb3: Support peer-2-peer connection setup. In-Reply-To: References: <20080427155456.31018.22282.stgit@dell3.ogc.int> <20080427160010.31018.67436.stgit@dell3.ogc.int> Message-ID: <4814AD7B.2060006@opengridcomputing.com> Roland Dreier wrote: > What are the interoperability implications of this? > > Looking closer I see that iw_nes has the send_first module parameter. > How does this interact with that? > It doesn't...yet. 
But we wanted to enable these applications for chelsio now and get the low-level fw and driver changes done first and tested. > I guess it's fine to apply this, but do we have a plan for how we want > to handle this issue in the long-term? > Yes! If you'll recall, we had a thread on the ofa general list discussing how to enhance the MPA negotiation so peers can indicate whether they want/need the RTR and what type of RTR (0B read, 0B write, or 0B send) should be sent. This will be done by standardizing a few bits of the private data in order to negotiate all this. The rdma-cma API will be extended so applications will have to request this peer-2-peer model, since it adds overhead to the connection setup. I plan to do this work for 2.6.27/ofed-1.4. I think it was listed in Felix's talk at Sonoma. This work (design, API, and code changes affecting core and placing requirements on iwarp providers) will be posted as RFC changes to get everyone's feedback as soon as I get something going. Does that sound ok? Steve. From erezz at voltaire.com Sun Apr 27 11:49:33 2008 From: erezz at voltaire.com (Erez Zilber) Date: Sun, 27 Apr 2008 21:49:33 +0300 Subject: [ofa-general] [PATCH 2/2] IB/iSER: Use offset from r2t header for rdma References: <694d48600804270555i6ee55843x51c416294fec6397@mail.gmail.com> Message-ID: <39C75744D164D948A170E9792AF8E7CAF60D35@exil.voltaire.com> > > usr/iscsi/iscsi_rdma.c | 16 +++++----------- > > I have no idea what tree this file lives in so I'll just ignore this > patch, right? As Eli mentioned in PATCH 0/2, the patch set contains a fix for the initiator side and another fix for the iSER code in stgt. That's why Fujita Tomonori (who maintains stgt) is on the thread. Although the 2 fixes are for separate trees, it's a single logical change.
Erez From erezz at voltaire.com Sun Apr 27 11:53:41 2008 From: erezz at voltaire.com (Erez Zilber) Date: Sun, 27 Apr 2008 21:53:41 +0300 Subject: [ofa-general] RE: [PATCH 1/2] IB/iSER: Do not add unsolicited data offset to VA in iSER header References: <694d48600804270553u36b776ame9695a8858dd278@mail.gmail.com> Message-ID: <39C75744D164D948A170E9792AF8E7CAF60D36@exil.voltaire.com> > So what was the conclusion on the right way to handle the change that > affects on-the-wire data? Just have a flag day so targets either work > with 2.6.25 and earlier initiators, or work with 2.6.26 and later > initiators, and corrupt data if someone mixes things the wrong way? See Eli's answer here: http://lists.openfabrics.org/pipermail/general/2008-April/049248.html
From ogerlitz at voltaire.com Mon Apr 28 00:09:56 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 28 Apr 2008 10:09:56 +0300 Subject: [ofa-general] Re: status of ofed ipoib changes which are not upstream In-Reply-To: <1209365678.22367.34.camel@mtls03> References: <1209286508.22367.5.camel@mtls03> <1209365678.22367.34.camel@mtls03> Message-ID: <48157844.6000909@voltaire.com> Eli Cohen wrote: >> ...looking closer, what happens if the send queue has less than 16 >> entries? (set with send_queue_size on module load) > I assumed that no one will want to use such a low number but surely > someone will do it ;-) How about using a set function for the module > parameter and allowing values >= 2 * MAX_SEND_CQE? > Or go simpler and have MAX_SEND_CQE be replaced by (say) MIN {16 , send_queue_size/4} Or From eli at mellanox.co.il Mon Apr 28 00:14:58 2008 From: eli at mellanox.co.il (Eli Cohen) Date: Mon, 28 Apr 2008 10:14:58 +0300 Subject: [ofa-general] Re: status of ofed ipoib changes which are not upstream In-Reply-To: <48157844.6000909@voltaire.com> References: <1209286508.22367.5.camel@mtls03> <1209365678.22367.34.camel@mtls03> <48157844.6000909@voltaire.com> Message-ID: <1209366898.22367.47.camel@mtls03> On Mon, 2008-04-28 at 10:09 +0300, Or Gerlitz wrote: > > > Or go simpler and have MAX_SEND_CQE be replaced by (say) MIN {16 , > send_queue_size/4} > But then I have to evaluate this expression in the fast path so I think we should put the limit at module initialization. 
From ogerlitz at voltaire.com Mon Apr 28 00:17:27 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Mon, 28 Apr 2008 10:17:27 +0300 Subject: [ofa-general] Re: split CQs for IPOIB UD In-Reply-To: <1209366898.22367.47.camel@mtls03> References: <1209286508.22367.5.camel@mtls03> <1209365678.22367.34.camel@mtls03> <48157844.6000909@voltaire.com> <1209366898.22367.47.camel@mtls03> Message-ID: <48157A07.1020905@voltaire.com> Eli Cohen wrote: >> Or go simpler and have MAX_SEND_CQE be replaced by (say) MIN {16 , >> send_queue_size/4} >> > But then I have to evaluate this expression in the fast path so I think > we should put the limit at module initialization. > Using a value V which equals MIN(A, B) which is not predefined does not mean you need to evaluate the MIN function each time you check if X > V; just compute V once and you are done. Or. From eli at mellanox.co.il Mon Apr 28 00:28:19 2008 From: eli at mellanox.co.il (Eli Cohen) Date: Mon, 28 Apr 2008 10:28:19 +0300 Subject: [ofa-general] Re: split CQs for IPOIB UD In-Reply-To: <48157A07.1020905@voltaire.com> References: <1209286508.22367.5.camel@mtls03> <1209365678.22367.34.camel@mtls03> <48157844.6000909@voltaire.com> <1209366898.22367.47.camel@mtls03> <48157A07.1020905@voltaire.com> Message-ID: <1209367699.22367.51.camel@mtls03> On Mon, 2008-04-28 at 10:17 +0300, Or Gerlitz wrote: > Eli Cohen wrote: > >> Or go simpler and have MAX_SEND_CQE be replaced by (say) MIN {16 , > >> send_queue_size/4} > >> > > But then I have to evaluate this expression in the fast path so I think > > we should put the limit at module initialization. > > > Using a value V which equals MIN(A, B) which is not predefined does > not mean you need to evaluate the MIN function each time you check if > X > V; just compute V once and you are done.
That's true, but then you have to read the calculated value from the memory location where you saved it, while in the case of a macro the compare value is placed in the code segment, at the same area where the code is read from. From eli at dev.mellanox.co.il Mon Apr 28 01:14:47 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Mon, 28 Apr 2008 11:14:47 +0300 Subject: [ofa-general] [PATCH v2] IB/ipoib: Split CQs for IPOIB UD Message-ID: <1209370487.11248.1.camel@mtls03> >From 3d87645b9209f95d374c455b3d7535673518b421 Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Thu, 20 Mar 2008 16:35:30 +0200 Subject: [PATCH] IB/ipoib: Split CQs for IPOIB UD Use a dedicated CQ for UD send. Also, do not arm the UD send CQ, thus reducing the number of interrupts generated by the HCA. This patch further reduces overhead by not calling poll CQ for every posted send WR - it does it only when there are 16 or more outstanding work requests. Signed-off-by: Eli Cohen --- changes since the last commit (v1): make sure the tx ring size is at least twice MAX_SEND_CQE to ensure polling the send CQ is done before the tx ring is exhausted.
drivers/infiniband/ulp/ipoib/ipoib.h | 9 ++++-- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 8 ++-- drivers/infiniband/ulp/ipoib/ipoib_etool.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 45 ++++++++++++++++------------ drivers/infiniband/ulp/ipoib/ipoib_main.c | 3 +- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 39 ++++++++++++++++-------- 6 files changed, 65 insertions(+), 41 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 43feffc..fb28f0b 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -95,6 +95,8 @@ enum { IPOIB_MCAST_FLAG_SENDONLY = 1, IPOIB_MCAST_FLAG_BUSY = 2, /* joining or already joined */ IPOIB_MCAST_FLAG_ATTACHED = 3, + + MAX_SEND_CQE = 16, }; #define IPOIB_OP_RECV (1ul << 31) @@ -285,7 +287,8 @@ struct ipoib_dev_priv { u16 pkey_index; struct ib_pd *pd; struct ib_mr *mr; - struct ib_cq *cq; + struct ib_cq *rcq; + struct ib_cq *scq; struct ib_qp *qp; u32 qkey; @@ -305,7 +308,8 @@ struct ipoib_dev_priv { struct ib_send_wr tx_wr; unsigned tx_outstanding; - struct ib_wc ibwc[IPOIB_NUM_WC]; + struct ib_wc ibwc[IPOIB_NUM_WC]; + struct ib_wc send_wc[MAX_SEND_CQE]; struct list_head dead_ahs; @@ -650,7 +654,6 @@ static inline int ipoib_register_debugfs(void) { return 0; } static inline void ipoib_unregister_debugfs(void) { } #endif - #define ipoib_printk(level, priv, format, arg...) \ printk(level "%s: " format, ((struct ipoib_dev_priv *) priv)->dev->name , ## arg) #define ipoib_warn(priv, format, arg...) 
\ diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c index 90ff2c9..dfabb38 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -249,8 +249,8 @@ static struct ib_qp *ipoib_cm_create_rx_qp(struct net_device *dev, struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_init_attr attr = { .event_handler = ipoib_cm_rx_event_handler, - .send_cq = priv->cq, /* For drain WR */ - .recv_cq = priv->cq, + .send_cq = priv->rcq, /* For drain WR */ + .recv_cq = priv->rcq, .srq = priv->cm.srq, .cap.max_send_wr = 1, /* For drain WR */ .cap.max_send_sge = 1, /* FIXME: 0 Seems not to work */ @@ -951,8 +951,8 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_ { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ib_qp_init_attr attr = { - .send_cq = priv->cq, - .recv_cq = priv->cq, + .send_cq = priv->rcq, + .recv_cq = priv->rcq, .srq = priv->cm.srq, .cap.max_send_wr = ipoib_sendq_size, .cap.max_send_sge = 1, diff --git a/drivers/infiniband/ulp/ipoib/ipoib_etool.c b/drivers/infiniband/ulp/ipoib/ipoib_etool.c index a3ac4cf..b4f4f0f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_etool.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_etool.c @@ -73,7 +73,7 @@ static int ipoib_set_coalesce(struct net_device *dev, coal->rx_max_coalesced_frames > 0xffff) return -EINVAL; - ret = ib_modify_cq(priv->cq, coal->rx_max_coalesced_frames, + ret = ib_modify_cq(priv->rcq, coal->rx_max_coalesced_frames, coal->rx_coalesce_usecs); if (ret) { ipoib_dbg(priv, "failed modifying CQ\n"); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 5c61a81..8222b50 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -311,7 +311,6 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) struct ipoib_dev_priv *priv = netdev_priv(dev); unsigned int wr_id = wc->wr_id; struct 
ipoib_tx_buf *tx_req; - unsigned long flags; ipoib_dbg_data(priv, "send completion: id %d, status: %d\n", wr_id, wc->status); @@ -331,13 +330,11 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) dev_kfree_skb_any(tx_req->skb); - spin_lock_irqsave(&priv->tx_lock, flags); ++priv->tx_tail; if (unlikely(--priv->tx_outstanding == ipoib_sendq_size >> 1) && netif_queue_stopped(dev) && test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) netif_wake_queue(dev); - spin_unlock_irqrestore(&priv->tx_lock, flags); if (wc->status != IB_WC_SUCCESS && wc->status != IB_WC_WR_FLUSH_ERR) @@ -346,6 +343,17 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc) wc->status, wr_id, wc->vendor_err); } +static int poll_tx(struct ipoib_dev_priv *priv) +{ + int n, i; + + n = ib_poll_cq(priv->scq, MAX_SEND_CQE, priv->send_wc); + for (i = 0; i < n; ++i) + ipoib_ib_handle_tx_wc(priv->dev, priv->send_wc + i); + + return n == MAX_SEND_CQE; +} + int ipoib_poll(struct napi_struct *napi, int budget) { struct ipoib_dev_priv *priv = container_of(napi, struct ipoib_dev_priv, napi); @@ -361,7 +369,7 @@ poll_more: int max = (budget - done); t = min(IPOIB_NUM_WC, max); - n = ib_poll_cq(priv->cq, t, priv->ibwc); + n = ib_poll_cq(priv->rcq, t, priv->ibwc); for (i = 0; i < n; i++) { struct ib_wc *wc = priv->ibwc + i; @@ -372,12 +380,8 @@ poll_more: ipoib_cm_handle_rx_wc(dev, wc); else ipoib_ib_handle_rx_wc(dev, wc); - } else { - if (wc->wr_id & IPOIB_OP_CM) - ipoib_cm_handle_tx_wc(dev, wc); - else - ipoib_ib_handle_tx_wc(dev, wc); - } + } else + ipoib_cm_handle_tx_wc(priv->dev, wc); } if (n != t) @@ -386,7 +390,7 @@ poll_more: if (done < budget) { netif_rx_complete(dev, napi); - if (unlikely(ib_req_notify_cq(priv->cq, + if (unlikely(ib_req_notify_cq(priv->rcq, IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)) && netif_rx_reschedule(dev, napi)) @@ -507,12 +511,17 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, address->last_send = priv->tx_head; 
++priv->tx_head; + skb_orphan(skb); if (++priv->tx_outstanding == ipoib_sendq_size) { ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); netif_stop_queue(dev); } } + + if (unlikely(priv->tx_outstanding > MAX_SEND_CQE)) + poll_tx(priv); + return; drop: @@ -665,7 +674,7 @@ void ipoib_drain_cq(struct net_device *dev) struct ipoib_dev_priv *priv = netdev_priv(dev); int i, n; do { - n = ib_poll_cq(priv->cq, IPOIB_NUM_WC, priv->ibwc); + n = ib_poll_cq(priv->rcq, IPOIB_NUM_WC, priv->ibwc); for (i = 0; i < n; ++i) { /* * Convert any successful completions to flush @@ -680,14 +689,12 @@ void ipoib_drain_cq(struct net_device *dev) ipoib_cm_handle_rx_wc(dev, priv->ibwc + i); else ipoib_ib_handle_rx_wc(dev, priv->ibwc + i); - } else { - if (priv->ibwc[i].wr_id & IPOIB_OP_CM) - ipoib_cm_handle_tx_wc(dev, priv->ibwc + i); - else - ipoib_ib_handle_tx_wc(dev, priv->ibwc + i); - } + } else + ipoib_cm_handle_tx_wc(dev, priv->ibwc + i); } } while (n == IPOIB_NUM_WC); + + while(poll_tx(priv)); } int ipoib_ib_dev_stop(struct net_device *dev, int flush) @@ -779,7 +786,7 @@ timeout: msleep(1); } - ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP); + ib_req_notify_cq(priv->rcq, IB_CQ_NEXT_COMP); return 0; } diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 2f8a07d..b0633dd 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -1291,7 +1291,8 @@ static int __init ipoib_init_module(void) ipoib_sendq_size = roundup_pow_of_two(ipoib_sendq_size); ipoib_sendq_size = min(ipoib_sendq_size, IPOIB_MAX_QUEUE_SIZE); - ipoib_sendq_size = max(ipoib_sendq_size, IPOIB_MIN_QUEUE_SIZE); + ipoib_sendq_size = max(ipoib_sendq_size, max(2 * MAX_SEND_CQE, + IPOIB_MIN_QUEUE_SIZE)); #ifdef CONFIG_INFINIBAND_IPOIB_CM ipoib_max_conn_qp = min(ipoib_max_conn_qp, IPOIB_CM_MAX_CONN_QP); #endif diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index 
1d59f27..b9e7eab 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -171,26 +171,33 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) goto out_free_pd; } - size = ipoib_sendq_size + ipoib_recvq_size + 1; + size = ipoib_recvq_size + 1; ret = ipoib_cm_dev_init(dev); if (!ret) { + size += ipoib_sendq_size; if (ipoib_cm_has_srq(dev)) size += ipoib_recvq_size + 1; /* 1 extra for rx_drain_qp */ else size += ipoib_recvq_size * ipoib_max_conn_qp; } - priv->cq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size, 0); - if (IS_ERR(priv->cq)) { - printk(KERN_WARNING "%s: failed to create CQ\n", ca->name); + priv->rcq = ib_create_cq(priv->ca, ipoib_ib_completion, NULL, dev, size, 0); + if (IS_ERR(priv->rcq)) { + printk(KERN_WARNING "%s: failed to create receive CQ\n", ca->name); goto out_free_mr; } - if (ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP)) - goto out_free_cq; + priv->scq = ib_create_cq(priv->ca, NULL, NULL, dev, ipoib_sendq_size, 0); + if (IS_ERR(priv->scq)) { + printk(KERN_WARNING "%s: failed to create send CQ\n", ca->name); + goto out_free_rcq; + } + + if (ib_req_notify_cq(priv->rcq, IB_CQ_NEXT_COMP)) + goto out_free_scq; - init_attr.send_cq = priv->cq; - init_attr.recv_cq = priv->cq; + init_attr.send_cq = priv->scq; + init_attr.recv_cq = priv->rcq; if (priv->hca_caps & IB_DEVICE_TCP_TSO) init_attr.create_flags = QP_CREATE_LSO; @@ -201,7 +208,7 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) priv->qp = ib_create_qp(priv->pd, &init_attr); if (IS_ERR(priv->qp)) { printk(KERN_WARNING "%s: failed to create QP\n", ca->name); - goto out_free_cq; + goto out_free_scq; } priv->dev->dev_addr[1] = (priv->qp->qp_num >> 16) & 0xff; @@ -217,8 +224,11 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) return 0; -out_free_cq: - ib_destroy_cq(priv->cq); +out_free_scq: + ib_destroy_cq(priv->scq); + +out_free_rcq: + 
ib_destroy_cq(priv->rcq); out_free_mr: ib_dereg_mr(priv->mr); @@ -241,8 +251,11 @@ void ipoib_transport_dev_cleanup(struct net_device *dev) clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags); } - if (ib_destroy_cq(priv->cq)) - ipoib_warn(priv, "ib_cq_destroy failed\n"); + if (ib_destroy_cq(priv->scq)) + ipoib_warn(priv, "ib_cq_destroy (send) failed\n"); + + if (ib_destroy_cq(priv->rcq)) + ipoib_warn(priv, "ib_cq_destroy (recv) failed\n"); ipoib_cm_dev_cleanup(dev); -- 1.5.5 From jackm at dev.mellanox.co.il Mon Apr 28 04:38:28 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Mon, 28 Apr 2008 14:38:28 +0300 Subject: [ofa-general] [PATCH] mlx4_core: enable changing default max HCA resource limits at run time -- reposting Message-ID: <200804281438.28417.jackm@dev.mellanox.co.il> mlx4-core: enable changing default max HCA resource limits. Enable module-initialization time modification of default HCA maximum resource limits via module parameters, as is done in mthca. Specify the log of the parameter value, rather than the value itself to avoid the hidden side-effect of rounding up values to next power-of-2. Signed-off-by: Jack Morgenstein --- Roland, This patch was first posted on Oct 16, 2007 (but got overlooked). I'm reposting its current incarnation, which applies to the OFED 1.4 driver as is currently on the OpenFabrics server (based on Kernel 2.6.25-rc7). Please queue up for kernel 2.6.26. Thanks! 
Jack Index: ofed_kernel/drivers/net/mlx4/main.c =================================================================== --- ofed_kernel.orig/drivers/net/mlx4/main.c 2007-10-29 10:22:34.771753000 +0200 +++ ofed_kernel/drivers/net/mlx4/main.c 2007-10-29 11:03:17.939875000 +0200 @@ -85,6 +85,56 @@ static struct mlx4_profile default_profi .num_mtt = 1 << 20, }; +static struct mlx4_profile mod_param_profile = { 0 }; + +module_param_named(log_num_qp, mod_param_profile.num_qp, int, 0444); +MODULE_PARM_DESC(log_num_qp, "log maximum number of QPs per HCA"); + +module_param_named(log_num_srq, mod_param_profile.num_srq, int, 0444); +MODULE_PARM_DESC(log_num_srq, "log maximum number of SRQs per HCA"); + +module_param_named(log_rdmarc_per_qp, mod_param_profile.rdmarc_per_qp, int, 0444); +MODULE_PARM_DESC(log_rdmarc_per_qp, "log number of RDMARC buffers per QP"); + +module_param_named(log_num_cq, mod_param_profile.num_cq, int, 0444); +MODULE_PARM_DESC(log_num_cq, "log maximum number of CQs per HCA"); + +module_param_named(log_num_mcg, mod_param_profile.num_mcg, int, 0444); +MODULE_PARM_DESC(log_num_mcg, "log maximum number of multicast groups per HCA"); + +module_param_named(log_num_mpt, mod_param_profile.num_mpt, int, 0444); +MODULE_PARM_DESC(log_num_mpt, + "log maximum number of memory protection table entries per HCA"); + +module_param_named(log_num_mtt, mod_param_profile.num_mtt, int, 0444); +MODULE_PARM_DESC(log_num_mtt, + "log maximum number of memory translation table segments per HCA"); + +static void process_mod_param_profile(void) +{ + default_profile.num_qp = (mod_param_profile.num_qp ? + 1 << mod_param_profile.num_qp : + default_profile.num_qp); + default_profile.num_srq = (mod_param_profile.num_srq ? + 1 << mod_param_profile.num_srq : + default_profile.num_srq); + default_profile.rdmarc_per_qp = (mod_param_profile.rdmarc_per_qp ? + 1 << mod_param_profile.rdmarc_per_qp : + default_profile.rdmarc_per_qp); + default_profile.num_cq = (mod_param_profile.num_cq ? 
+ 1 << mod_param_profile.num_cq : + default_profile.num_cq); + default_profile.num_mcg = (mod_param_profile.num_mcg ? + 1 << mod_param_profile.num_mcg : + default_profile.num_mcg); + default_profile.num_mpt = (mod_param_profile.num_mpt ? + 1 << mod_param_profile.num_mpt : + default_profile.num_mpt); + default_profile.num_mtt = (mod_param_profile.num_mtt ? + 1 << mod_param_profile.num_mtt : + default_profile.num_mtt); +} + static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) { int err; @@ -514,6 +564,7 @@ static int __devinit mlx4_init_hca(struc goto err_stop_fw; } + process_mod_param_profile(); profile = default_profile; icm_size = mlx4_make_profile(dev, &profile, &dev_cap, &init_hca); From tziporet at dev.mellanox.co.il Mon Apr 28 04:52:56 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Mon, 28 Apr 2008 14:52:56 +0300 Subject: [ofa-general] Loading of ib_mthca fails In-Reply-To: References: Message-ID: <4815BA98.8000802@mellanox.co.il> Xavier Andrade wrote: > On Wed, 23 Apr 2008, Roland Dreier wrote: > >> Hmm, not sure... let's see what the Mellanox guys say (they're mostly on >> vacation this week so it might be a few days). We are back :-) > > I can't locate the correct firmware, the PSID reported by mstflint > corresponds to an Intel one: > > Image type: Failsafe > I.S. Version: 1 > Chip Revision: A0 > Description: Node Port1 Sys image > GUIDs: 0002c9020022baa4 0002c9020022baa5 0002c9020022baa7 > Board ID: (INT0010000001) > VSD: > PSID: INT0010000001 > > But I haven't been able to find any firmware on Intel's webpage. > > Do you think that I could use a Mellanox firmware? Which one? There > are three different ones for the MT25204. > Attached is the ini file for this PSID. Please create a binary using the MFT package on our web site and try to burn it. If you have more issues please work with Todd, who is cc'd on this mail. Tziporet -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: D51144_A1-A3.ini URL: From dorfman.eli at gmail.com Mon Apr 28 05:01:33 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Mon, 28 Apr 2008 15:01:33 +0300 Subject: [ofa-general] [PATCH] IB/iSER: Move high-volume debug output to higher debug levels Message-ID: <694d48600804280501q3cf74a10p2e1b73b4ac0d3d27@mail.gmail.com> Add more levels for debug. Signed-off-by: Eli Dorfman --- drivers/infiniband/ulp/iser/iscsi_iser.c | 5 ++--- drivers/infiniband/ulp/iser/iscsi_iser.h | 7 +++++++ drivers/infiniband/ulp/iser/iser_memory.c | 7 +++++-- 3 files changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index be1b9fb..451e601 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -78,15 +78,14 @@ static unsigned int iscsi_max_lun = 512; module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO); int iser_debug_level = 0; +module_param_named(debug_level, iser_debug_level, int, S_IRUGO|S_IWUSR|S_IWGRP); +MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 (default:disabled)"); MODULE_DESCRIPTION("iSER (iSCSI Extensions for RDMA) Datamover " "v" DRV_VER " (" DRV_DATE ")"); MODULE_LICENSE("Dual BSD/GPL"); MODULE_AUTHOR("Alex Nezhinsky, Dan Bar Dov, Or Gerlitz"); -module_param_named(debug_level, iser_debug_level, int, 0644); -MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 (default:disabled)"); - struct iser_global ig; void diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index 1ee867b..a8c1b30 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -71,6 +71,13 @@ #define iser_dbg(fmt, arg...) \ do { \ + if (iser_debug_level > 1) \ + printk(KERN_DEBUG PFX "%s:" fmt,\ + __func__ , ## arg); \ + } while (0) + +#define iser_warn(fmt, arg...) 
\ + do { \ if (iser_debug_level > 0) \ printk(KERN_DEBUG PFX "%s:" fmt,\ __func__ , ## arg); \ diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index 4a17743..ee58199 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -334,8 +334,11 @@ static void iser_data_buf_dump(struct iser_data_buf *data, struct scatterlist *sg; int i; + if (iser_debug_level == 0) + return; + for_each_sg(sgl, sg, data->dma_nents, i) - iser_err("sg[%d] dma_addr:0x%lX page:0x%p " + iser_warn("sg[%d] dma_addr:0x%lX page:0x%p " "off:0x%x sz:0x%x dma_len:0x%x\n", i, (unsigned long)ib_sg_dma_address(ibdev, sg), sg_page(sg), sg->offset, @@ -434,7 +437,7 @@ int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *iser_ctask, aligned_len = iser_data_buf_aligned_len(mem, ibdev); if (aligned_len != mem->dma_nents) { - iser_err("rdma alignment violation %d/%d aligned\n", + iser_warn("rdma alignment violation %d/%d aligned\n", aligned_len, mem->size); iser_data_buf_dump(mem, ibdev); -- 1.5.5 This patch was made against 2.6.26 branch. Since this includes minor changes please try to push it to 2.6.26. Otherwise this can go to 2.6.27. From dorfman.eli at gmail.com Mon Apr 28 05:10:16 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Mon, 28 Apr 2008 15:10:16 +0300 Subject: [ofa-general] [PATCH] IB/iSER: Add module param to count alignment violations Message-ID: <694d48600804280510l25ee6f90t9eff86fd6743461@mail.gmail.com> Add read only module param to count alignment violations. In case of unaligned pages iSER allocates memory and copies the data to the new memory. 
Signed-off-by: Eli Dorfman --- drivers/infiniband/ulp/iser/iscsi_iser.c | 3 +++ drivers/infiniband/ulp/iser/iscsi_iser.h | 1 + drivers/infiniband/ulp/iser/iser_memory.c | 1 + 3 files changed, 5 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index 451e601..5181a1e 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -77,6 +77,9 @@ static unsigned int iscsi_max_lun = 512; module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO); +unsigned int iser_unaligned_cnt = 0; +module_param_named(unaligned_cnt, iser_unaligned_cnt, uint, S_IRUGO); + int iser_debug_level = 0; module_param_named(debug_level, iser_debug_level, int, S_IRUGO|S_IWUSR|S_IWGRP); MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 (default:disabled)"); diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index a8c1b30..4a39a38 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -294,6 +294,7 @@ struct iser_global { extern struct iser_global ig; extern int iser_debug_level; +extern unsigned int iser_unaligned_cnt; /* allocate connection resources needed for rdma functionality */ int iser_conn_set_full_featured_mode(struct iscsi_conn *conn); diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index ee58199..0f0fcb3 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -437,6 +437,7 @@ int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *iser_ctask, aligned_len = iser_data_buf_aligned_len(mem, ibdev); if (aligned_len != mem->dma_nents) { + iser_unaligned_cnt++; iser_warn("rdma alignment violation %d/%d aligned\n", aligned_len, mem->size); iser_data_buf_dump(mem, ibdev); -- 1.5.5 This patch was made against 2.6.26 branch. 
Since it includes minor changes, please try to push it to 2.6.26. Otherwise this can go to 2.6.27. From Arkady.Kanevsky at netapp.com Mon Apr 28 06:51:11 2008 From: Arkady.Kanevsky at netapp.com (Kanevsky, Arkady) Date: Mon, 28 Apr 2008 09:51:11 -0400 Subject: [ofa-general] [PATCH 2.6.26 3/3] RDMA/cxgb3: Support peer-2-peer connection setup. In-Reply-To: <4814AD7B.2060006@opengridcomputing.com> References: <20080427155456.31018.22282.stgit@dell3.ogc.int> <20080427160010.31018.67436.stgit@dell3.ogc.int> <4814AD7B.2060006@opengridcomputing.com> Message-ID: I expect it to be tested at the Sept interop event. If it works, then I will send a proposal to IETF for the MPA enhancement. Thanks, Arkady Kanevsky email: arkady at netapp.com Network Appliance Inc. phone: 781-768-5395 1601 Trapelo Rd. - Suite 16. Fax: 781-895-1195 Waltham, MA 02451 central phone: 781-768-5300 > -----Original Message----- > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Sunday, April 27, 2008 12:45 PM > To: Roland Dreier > Cc: netdev at vger.kernel.org; general at lists.openfabrics.org; > linux-kernel at vger.kernel.org; divy at chelsio.com > Subject: Re: [ofa-general] [PATCH 2.6.26 3/3] RDMA/cxgb3: > Support peer-2-peer connection setup. > > > > Roland Dreier wrote: > > What are the interoperability implications of this? > > > > Looking closer I see that iw_nes has the send_first module > parameter. > > How does this interact with that? > > > > It doesn't...yet. But we wanted to enable these applications > for chelsio now and get the low level fw and driver changes > done first and tested. > > > I guess it's fine to apply this, but do we have a plan for > how we want > > to handle this issue in the long-term? > > > > Yes! If you'll recall, we had a thread on the ofa general > list discussing how to enhance the MPA negotiation so peers > can indicate whether they want/need the RTR and what type of > RTR (0B read, 0B write, or 0B send) should be sent. 
This > will be done by standardizing a few bits of the private data > in order to negotiate all this. The rdma-cma API will be > extended so applications will have to request this > peer-2-peer model since it adds overhead to the connection setup. > > I plan to do this work for 2.6.27/ofed-1.4. I think it was > listed in Felix's talk at Sonoma. This work (design, API, > and code changes affecting core and placing requirements on > iwarp providers) will be posted as RFC changes to get > everyones feedback as soon as I get something going. > > Does that sound ok? > > > Steve. > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From mtosatti at redhat.com Mon Apr 28 07:48:55 2008 From: mtosatti at redhat.com (Marcelo Tosatti) Date: Mon, 28 Apr 2008 11:48:55 -0300 Subject: [ofa-general] Re: [kvm-devel] mmu notifier #v14 In-Reply-To: <20080427030514.GM9514@duo.random> References: <20080426164511.GJ9514@duo.random> <48137B8B.7010202@us.ibm.com> <20080427002019.GL9514@duo.random> <4813DCCF.3020201@codemonkey.ws> <20080427030514.GM9514@duo.random> Message-ID: <20080428144855.GA1702@dmt> Hi Andrea, Looks good. Acked-by: Marcelo Tosatti From hrosenstock at xsigo.com Mon Apr 28 08:02:55 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Mon, 28 Apr 2008 08:02:55 -0700 Subject: [ofa-general] [PATCH] mlx4_core: enable changing default max HCA resource limits at run time -- reposting In-Reply-To: <200804281438.28417.jackm@dev.mellanox.co.il> References: <200804281438.28417.jackm@dev.mellanox.co.il> Message-ID: <1209394975.689.386.camel@hrosenstock-ws.xsigo.com> Jack, On Mon, 2008-04-28 at 14:38 +0300, Jack Morgenstein wrote: > mlx4-core: enable changing default max HCA resource limits. 
> > Enable module-initialization time modification of default HCA > maximum resource limits via module parameters, as is done in mthca. > > Specify the log of the parameter value, rather than the value itself > to avoid the hidden side-effect of rounding up values to next power-of-2. This is much needed; thanks! One minor comment: In places where there are reserved resources (like qps, srqs, others ?), should it be ensured that the parameters set are above the logs of those amounts so the user doesn't shoot themselves in the foot by accident ? Or perhaps a little more on the ranges in the mod param descriptions ? -- Hal > Signed-off-by: Jack Morgenstein > > --- > > Roland, > This patch was first posted on Oct 16, 2007 (but got overlooked). > > I'm reposting its current incarnation, which applies to the OFED 1.4 driver > as is currently on the OpenFabrics server (based on Kernel 2.6.25-rc7). > > Please queue up for kernel 2.6.26. > Thanks! > Jack > > Index: ofed_kernel/drivers/net/mlx4/main.c > =================================================================== > --- ofed_kernel.orig/drivers/net/mlx4/main.c 2007-10-29 10:22:34.771753000 +0200 > +++ ofed_kernel/drivers/net/mlx4/main.c 2007-10-29 11:03:17.939875000 +0200 > @@ -85,6 +85,56 @@ static struct mlx4_profile default_profi > .num_mtt = 1 << 20, > }; > > +static struct mlx4_profile mod_param_profile = { 0 }; > + > +module_param_named(log_num_qp, mod_param_profile.num_qp, int, 0444); > +MODULE_PARM_DESC(log_num_qp, "log maximum number of QPs per HCA"); > + > +module_param_named(log_num_srq, mod_param_profile.num_srq, int, 0444); > +MODULE_PARM_DESC(log_num_srq, "log maximum number of SRQs per HCA"); > + > +module_param_named(log_rdmarc_per_qp, mod_param_profile.rdmarc_per_qp, int, 0444); > +MODULE_PARM_DESC(log_rdmarc_per_qp, "log number of RDMARC buffers per QP"); > + > +module_param_named(log_num_cq, mod_param_profile.num_cq, int, 0444); > +MODULE_PARM_DESC(log_num_cq, "log maximum number of CQs per HCA"); > 
+ > +module_param_named(log_num_mcg, mod_param_profile.num_mcg, int, 0444); > +MODULE_PARM_DESC(log_num_mcg, "log maximum number of multicast groups per HCA"); > + > +module_param_named(log_num_mpt, mod_param_profile.num_mpt, int, 0444); > +MODULE_PARM_DESC(log_num_mpt, > + "log maximum number of memory protection table entries per HCA"); > + > +module_param_named(log_num_mtt, mod_param_profile.num_mtt, int, 0444); > +MODULE_PARM_DESC(log_num_mtt, > + "log maximum number of memory translation table segments per HCA"); > + > +static void process_mod_param_profile(void) > +{ > + default_profile.num_qp = (mod_param_profile.num_qp ? > + 1 << mod_param_profile.num_qp : > + default_profile.num_qp); > + default_profile.num_srq = (mod_param_profile.num_srq ? > + 1 << mod_param_profile.num_srq : > + default_profile.num_srq); > + default_profile.rdmarc_per_qp = (mod_param_profile.rdmarc_per_qp ? > + 1 << mod_param_profile.rdmarc_per_qp : > + default_profile.rdmarc_per_qp); > + default_profile.num_cq = (mod_param_profile.num_cq ? > + 1 << mod_param_profile.num_cq : > + default_profile.num_cq); > + default_profile.num_mcg = (mod_param_profile.num_mcg ? > + 1 << mod_param_profile.num_mcg : > + default_profile.num_mcg); > + default_profile.num_mpt = (mod_param_profile.num_mpt ? > + 1 << mod_param_profile.num_mpt : > + default_profile.num_mpt); > + default_profile.num_mtt = (mod_param_profile.num_mtt ? 
> + 1 << mod_param_profile.num_mtt : > + default_profile.num_mtt); > +} > + > static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap) > { > int err; > @@ -514,6 +564,7 @@ static int __devinit mlx4_init_hca(struc > goto err_stop_fw; > } > > + process_mod_param_profile(); > profile = default_profile; > > icm_size = mlx4_make_profile(dev, &profile, &dev_cap, &init_hca); > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From David.Chevalier at ge.com Mon Apr 28 08:13:36 2008 From: David.Chevalier at ge.com (Chevalier, David (GE Healthcare)) Date: Mon, 28 Apr 2008 11:13:36 -0400 Subject: [ofa-general] SDP poll() behavior Message-ID: <68D58DEFB8673048A64DE1FBE56BEE1807CF50EC@CINMLVEM11.e2k.ad.ge.com> Hi SDP developers, I've noticed an apparent difference between SDP and TCP/IP handling of a certain scenario (OFED 1.3), not necessarily a bug, but wondering if it might be better to behave more like TCP/IP in this case: receiver and sender use non-blocking sockets (SDP) and monitor through poll() sender writes a known quantity of data through many calls to write(), then closes its side of the socket. receiver polls the socket, and reads the data through many calls to read(), then closes its socket. receiver is monitoring poll() revents for POLLERR, POLLHUP and POLLIN On the receiver's last expected pass through the poll() loop to read() the last remaining data, I'll often get revents of {POLLERR|POLLHUP|POLLIN}, likely due to the sender closing its socket after the last write(). If my poll() handling loop goes in this order: check/handle POLLERR check/handle POLLHUP check/handle POLLIN then it fails, because I don't expect to be able to read() data when poll() returns POLLERR or POLLHUP. 
If I change the order and handle POLLIN first, then read() works and gets the last data. I've never encountered this in TCP/IP - that is to say, for TCP/IP I first receive a clean POLLIN from poll(), then the next poll() (after I read() the data) returns POLLHUP (without the POLLERR). If I get POLLERR from poll(), I'd expect a subsequent call to read() to return an error, not valid data... While this is probably an "implementation defined" behavior, it seems like a good idea to try to behave the same as the TCP/IP sockets that SDP aims to replace... Regards, Dave From jackm at dev.mellanox.co.il Mon Apr 28 08:20:21 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Mon, 28 Apr 2008 18:20:21 +0300 Subject: [ofa-general] [PATCH] mlx4_core: enable changing default max HCA resource limits at run time -- reposting In-Reply-To: <1209394975.689.386.camel@hrosenstock-ws.xsigo.com> References: <200804281438.28417.jackm@dev.mellanox.co.il> <1209394975.689.386.camel@hrosenstock-ws.xsigo.com> Message-ID: <200804281820.21672.jackm@dev.mellanox.co.il> On Monday 28 April 2008 18:02, Hal Rosenstock wrote: > In places where there are reserved resources (like qps, srqs, others ?), > should it be ensured that the parameters set are above the logs of those > amounts so the user doesn't shoot themselves in the foot by accident ? > No worry there. The reserved resources are subtracted from the above log amounts (when expressed as a power-of-2: 1UL << log), and the resulting amounts ( - reserved) are returned to the user as the device limits. (check this out using ibv_devinfo) Thus, the user CANNOT "shoot themselves in the foot". - Jack (P.S. this patch is in OFED 1.3 -- do "modinfo mlx4_core" to see the above module parameters ). 
From olga.shern at gmail.com Mon Apr 28 08:22:23 2008 From: olga.shern at gmail.com (Olga Shern (Voltaire)) Date: Mon, 28 Apr 2008 18:22:23 +0300 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: Hi Tziporet, I was on vacation, therefore couldn't attend this meeting, and I want to update about Voltaire's plans for OFED 1.3.1. We are working on bug fixes for Bonding and HA , minimal impact on traffic, multicast and partitioning during SM failover. Also it is very important for us that IPoIB 2 kernel panics will be fixed ( https://bugs.openfabrics.org/show_bug.cgi?id=989, https://bugs.openfabrics.org/show_bug.cgi?id=985) Best Regards, Olga On 4/22/08, Tziporet Koren wrote: > > OFED April 21 meeting summary about 1.3.1 plans and OFED 1.4 development: > > 1. OFED 1.3.1: > > 1.1 Planned changes: > > ULPs changes: > > IB-bonding - done > SRP failover - on work > SDP crashes - on work > RDS fixes for RDMA API - done > librdmacm 1.0.7 - done > Open MPI 1.2.6 - done > uDAPL - on work > > Low level drivers: - each HW vendor should reply when the > changes will be ready > > nes - will be ready on first week of May > mlx4 - fixes are ready; changes to support Eth are under > review of the submission to kernel so not clear if they will make it on > time. > > cxgb3 - will be ready by middle of may. 
Majority of > changes should be submitted for RC1. > ipath - wait for update from Betsy > ehca - wait for update from Christoph > > 1.2 Schedule: we agreed that 2 release candidate should be > sufficient > > GA is planned for May-29 > - RC1 - May 6 > - RC2 - May 20 > > Note: daily builds of 1.3.1 are already available at: * > http://www.openfabrics.org/builds/ofed-1.3.1* > > > 2. OFED 1.4: > > Release features were presented at Sonoma (presentation available at > *http://www.openfabrics.org/archives/april2008sonoma.htm* > ) > > IPv6: Woody is looking for resources to add IPv6 support to the CMA. > Hal noted that it will require a change in opensm too. > > Xsigo Vnic & Vhba - Not clear if they will make it > > Kernel tree is under work at: git:// > git.openfabrics.org/ofed_1_4/linux-2.6.git branch ofed_kernel > We should try to get the kernel code to compile as soon as possible > so everybody will be able to contribute code. > > Schedule reminder: > ============== > Release: Oct 06, 2008 > Features freeze: Jun 25, 08 (kernel 2.6.26 based) > Alpha: Jul 9, 08 > Beta: Jul 30, 08 kernel 2.6.27-rcX (assuming it will be available) > RC1: Aug 13, 08 > RC2: Aug 27, 08 > RC3-RC5/6 – every 5-10 days > Latest RC to be used in OFA interop event > GA: Oct 06 08 > > > Tziporet > > > _______________________________________________ > ewg mailing list > ewg at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hrosenstock at xsigo.com Mon Apr 28 08:50:13 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Mon, 28 Apr 2008 08:50:13 -0700 Subject: [ofa-general] [PATCH] mlx4_core: enable changing default max HCA resource limits at run time -- reposting In-Reply-To: <200804281820.21672.jackm@dev.mellanox.co.il> References: <200804281438.28417.jackm@dev.mellanox.co.il> <1209394975.689.386.camel@hrosenstock-ws.xsigo.com> <200804281820.21672.jackm@dev.mellanox.co.il> Message-ID: <1209397813.689.395.camel@hrosenstock-ws.xsigo.com> On Mon, 2008-04-28 at 18:20 +0300, Jack Morgenstein wrote: > On Monday 28 April 2008 18:02, Hal Rosenstock wrote: > > In places where there are reserved resources (like qps, srqs, others ?), > > should it be ensured that the parameters set are above the logs of those > > amounts so the user doesn't shoot themselves in the foot by accident ? > > > No worry there. The reserved resources are subtracted from the above > log amounts (when expressed as a power-of-2: 1UL << log), and the > resulting amounts ( - reserved) returned to the user as > the device limits. > (check this out using ibv_devinfo) > > Thus, the user CANNOT "shoot themselves in the foot". Right; that accounts for the reserved ones but what happens if they mistakenly set something like log_num_qp = 5 where the total number is less than the reserved number ? Should this be protected against in some way (friendly error or bump to minimum needed) and/or indicate a minimum in the mod param description ? > - Jack > > (P.S. this patch is in OFED 1.3 Also, OFED 1.2.5.4 > -- do "modinfo mlx4_core" to see the above module parameters ). Thanks. 
-- Hal From rdreier at cisco.com Mon Apr 28 08:50:25 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 08:50:25 -0700 Subject: [ofa-general] Re: [PATCH] mlx4_core: enable changing default max HCA resource limits at run time -- reposting In-Reply-To: <200804281438.28417.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Mon, 28 Apr 2008 14:38:28 +0300") References: <200804281438.28417.jackm@dev.mellanox.co.il> Message-ID: Hmm... wouldn't it be better to follow the same interface as ib_mthca and have consumers pass in the numbers instead of the log sizes? Having two different ways of changing the same parameters seems pretty confusing. From rdreier at cisco.com Mon Apr 28 08:51:10 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 08:51:10 -0700 Subject: [ofa-general] [PATCH] IB/iSER: Add module param to count alignment violations In-Reply-To: <694d48600804280510l25ee6f90t9eff86fd6743461@mail.gmail.com> (Eli Dorfman's message of "Mon, 28 Apr 2008 15:10:16 +0300") References: <694d48600804280510l25ee6f90t9eff86fd6743461@mail.gmail.com> Message-ID: > Add read only module param to count alignment violations. I don't think a module parameter is the way to report statistics from the kernel. Can't you just add a device attribute or something? Or stick a file in debugfs? - R. From rdreier at cisco.com Mon Apr 28 08:57:12 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 08:57:12 -0700 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: (Olga Shern's message of "Mon, 28 Apr 2008 18:22:23 +0300") References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: > Also it is very important for us that IPoIB 2 kernel panics will be fixed ( > https://bugs.openfabrics.org/show_bug.cgi?id=989, > https://bugs.openfabrics.org/show_bug.cgi?id=985) Are either of these panics seen with upstream kernels? 
If we don't know then this points to a serious problem with the OFED model: we are diluting testing resources from the upstream kernel, which hurts the quality of the kernel that most users get from their distro. From olga.shern at gmail.com Mon Apr 28 09:14:39 2008 From: olga.shern at gmail.com (Olga Shern (Voltaire)) Date: Mon, 28 Apr 2008 19:14:39 +0300 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: On 4/28/08, Roland Dreier wrote: > > > Also it is very important for us that IPoIB 2 kernel panics will be > fixed ( > > https://bugs.openfabrics.org/show_bug.cgi?id=989, > > https://bugs.openfabrics.org/show_bug.cgi?id=985) > > Are either of these panics seen with upstream kernels? > > https://bugs.openfabrics.org/show_bug.cgi?id=989 is OFED bug https://bugs.openfabrics.org/show_bug.cgi?id=985 we will try to reproduce it on upstream kernel and let you know -------------- next part -------------- An HTML attachment was scrubbed... URL: From weiny2 at llnl.gov Mon Apr 28 09:19:23 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 28 Apr 2008 09:19:23 -0700 Subject: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.) In-Reply-To: <48143DBA.3080701@voltaire.com> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <48109087.6030606@voltaire.com> <20080424143125.2aad1db8.weiny2@llnl.gov> <15ddcffd0804241523p19559580vc3a1293c1fe097b1@mail.gmail.com> <20080424181657.28d58a29.weiny2@llnl.gov> <48143DBA.3080701@voltaire.com> Message-ID: <20080428091923.0abf9fb5.weiny2@llnl.gov> On Sun, 27 Apr 2008 11:47:54 +0300 Or Gerlitz wrote: > Ira Weiny wrote: > > > > I did not get any output with multicast_debug_level! 
> why should you, as from the node's point of view nothing has happened > (the exact param name is mcast_debug_level) > > > > Here is a patch which fixes the problem. (At least with the partial sub-nets > > configuration I explained before.) I will have to verify this fixes the problem > > I originally reported. > OK, good. Does this problem exist in the released openSM? if yes, what > would be the trigger for the SM to "really discover" (i.e do PortInfo > SET) this sub-fabric and how much time would it take to reach this > trigger, worst case wise? Yes, this is in the current released version of OpenSM, AFAICT. The trigger is: the single link separating the partial sub net will come up and that trap will cause OpenSM to resweep. I believe this will happen on the next resweep cycle which is by default 10 sec. (But this is configurable.) I don't think there is an issue with allowing OpenSM to resweep as designed. > > The failure configuration you have set to reproduce the problem is very > untypical, I think. I agree. I made a patch to turn off the processing of MAD's in the kernel to test my original theory, that the node is not responding to MAD's. Using this patch I have been able to verify that if a node stops responding that the rereg is sent by OpenSM when the node comes back. See my next email response to Sasha concerning the original issue. Ira > > Since under common clos etc topologies which don't > have a 1:n blocking nature, failure of such link would cause re-route > etc by the SM which would not (and should not) be noted by the nodes (I > hope I am not falling into another problem here...) > > Or. > > > From HNGUYEN at de.ibm.com Mon Apr 28 09:36:20 2008 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Mon, 28 Apr 2008 18:36:20 +0200 Subject: [ofa-general] Re: [PATCH] ehca: ret is unsigned, ibmebus_request_irq() negative return ignored in hca_create_eq() In-Reply-To: <480FA529.2030800@tiscali.nl> Message-ID: Hello Roel! Thanks for pointing this out. 
Will send another version with a more consistent naming convention for the return variable from firmware. We used to name it h_ret. Regards Nam Roel Kluin <12o3l at tiscali.nl> wrote on 23.04.2008 23:07:53: > diff --git a/drivers/infiniband/hw/ehca/ehca_eq.c > b/drivers/infiniband/hw/ehca/ehca_eq.c > index b4ac617..9727235 100644 > --- a/drivers/infiniband/hw/ehca/ehca_eq.c > +++ b/drivers/infiniband/hw/ehca/ehca_eq.c > @@ -59,6 +59,7 @@ int ehca_create_eq(struct ehca_shca *shca, > u32 i; > void *vpage; > struct ib_device *ib_dev = &shca->ib_device; > + int ret2; > > spin_lock_init(&eq->spinlock); > spin_lock_init(&eq->irq_spinlock); > @@ -123,18 +124,18 @@ int ehca_create_eq(struct ehca_shca *shca, > > /* register interrupt handlers and initialize work queues */ > if (type == EHCA_EQ) { > - ret = ibmebus_request_irq(eq->ist, ehca_interrupt_eq, > + ret2 = ibmebus_request_irq(eq->ist, ehca_interrupt_eq, > IRQF_DISABLED, "ehca_eq", > (void *)shca); > - if (ret < 0) > + if (ret2 < 0) > ehca_err(ib_dev, "Can't map interrupt handler."); > > tasklet_init(&eq->interrupt_task, ehca_tasklet_eq, (long)shca); > } else if (type == EHCA_NEQ) { > - ret = ibmebus_request_irq(eq->ist, ehca_interrupt_neq, > + ret2 = ibmebus_request_irq(eq->ist, ehca_interrupt_neq, > IRQF_DISABLED, "ehca_neq", > (void *)shca); > - if (ret < 0) > + if (ret2 < 0) > ehca_err(ib_dev, "Can't map interrupt handler."); > > tasklet_init(&eq->interrupt_task, ehca_tasklet_neq, (long)shca); From akepner at sgi.com Mon Apr 28 09:37:31 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Mon, 28 Apr 2008 09:37:31 -0700 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: <20080428163731.GL30919@sgi.com> On Mon, Apr 28, 2008 at 07:14:39PM +0300, Olga Shern (Voltaire) wrote: > ...
> https://bugs.openfabrics.org/show_bug.cgi?id=985 we will try to reproduce > it on upstream kernel and let you know I just saw this bug report today, but we've had similar crashes. Looks like the problem is that in ipoib_neigh_cleanup() this is done (no locking): neigh = *to_ipoib_neigh(n); then later: spin_lock_irqsave(&priv->lock, flags); if (neigh->ah) ah = neigh->ah; list_del(&neigh->list); <---- neigh may be stale now ipoib_neigh_free(n->dev, neigh); spin_unlock_irqrestore(&priv->lock, flags); neigh wasn't re-read after acquiring the lock, so it may point to an already freed data structure. In our crashes we had backtraces like: RIP: ib_ipoib:ipoib_neigh_cleanup+368 neigh_destroy+197 neigh_periodic_timer+249 neigh_periodic_timer+0 run_timer_softirq+348 __do_softirq+85 call_softirq+30 do_softirq+44 ..... And the following helpful hint: Unable to handle kernel paging request at 0000000000100108 ^^^^^^^^^^^^^^^^ LIST_POISON1 + 0x8 So we were dying in the midst of list_del(). -- Arthur From hnguyen at linux.vnet.ibm.com Mon Apr 28 09:47:44 2008 From: hnguyen at linux.vnet.ibm.com (Hoang-Nam Nguyen) Date: Mon, 28 Apr 2008 18:47:44 +0200 Subject: [ofa-general] IB/ehca: handle negative return value from ibmebus_request_irq() properly in ehca_create_eq() Message-ID: <200804281847.44968.hnguyen@linux.vnet.ibm.com> Signed-off-by: Hoang-Nam Nguyen --- drivers/infiniband/hw/ehca/ehca_eq.c | 35 ++++++++++++++++----------------- 1 files changed, 17 insertions(+), 18 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_eq.c b/drivers/infiniband/hw/ehca/ehca_eq.c index b4ac617..49660df 100644 --- a/drivers/infiniband/hw/ehca/ehca_eq.c +++ b/drivers/infiniband/hw/ehca/ehca_eq.c @@ -54,7 +54,8 @@ int ehca_create_eq(struct ehca_shca *shca, struct ehca_eq *eq, const enum ehca_eq_type type, const u32 length) { - u64 ret; + int ret; + u64 h_ret; u32 nr_pages; u32 i; void *vpage; @@ -73,15 +74,15 @@ int ehca_create_eq(struct ehca_shca *shca, return -EINVAL; } - ret = 
hipz_h_alloc_resource_eq(shca->ipz_hca_handle, - &eq->pf, - type, - length, - &eq->ipz_eq_handle, - &eq->length, - &nr_pages, &eq->ist); + h_ret = hipz_h_alloc_resource_eq(shca->ipz_hca_handle, + &eq->pf, + type, + length, + &eq->ipz_eq_handle, + &eq->length, + &nr_pages, &eq->ist); - if (ret != H_SUCCESS) { + if (h_ret != H_SUCCESS) { ehca_err(ib_dev, "Can't allocate EQ/NEQ. eq=%p", eq); return -EINVAL; } @@ -97,24 +98,22 @@ int ehca_create_eq(struct ehca_shca *shca, u64 rpage; vpage = ipz_qpageit_get_inc(&eq->ipz_queue); - if (!vpage) { - ret = H_RESOURCE; + if (!vpage) goto create_eq_exit2; - } rpage = virt_to_abs(vpage); - ret = hipz_h_register_rpage_eq(shca->ipz_hca_handle, - eq->ipz_eq_handle, - &eq->pf, - 0, 0, rpage, 1); + h_ret = hipz_h_register_rpage_eq(shca->ipz_hca_handle, + eq->ipz_eq_handle, + &eq->pf, + 0, 0, rpage, 1); if (i == (nr_pages - 1)) { /* last page */ vpage = ipz_qpageit_get_inc(&eq->ipz_queue); - if (ret != H_SUCCESS || vpage) + if (h_ret != H_SUCCESS || vpage) goto create_eq_exit2; } else { - if (ret != H_PAGE_REGISTERED || !vpage) + if (h_ret != H_PAGE_REGISTERED || !vpage) goto create_eq_exit2; } } -- 1.5.5 From arlin.r.davis at intel.com Mon Apr 28 10:13:40 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Mon, 28 Apr 2008 10:13:40 -0700 Subject: [ofa-general] [PATCH] [dat1.2] dapl cma: add check before destroying cm event channel in release Message-ID: library may be loaded and unloaded without calling open in which case the cm event channel is not created. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_util.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index 5f4fbd0..56c0a05 100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -189,7 +189,8 @@ int32_t dapls_ib_release(void) { dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " dapl_ib_release: \n"); dapli_ib_thread_destroy(); - rdma_destroy_event_channel(g_cm_events); + if (g_cm_events != NULL) + rdma_destroy_event_channel(g_cm_events); return 0; } -- 1.5.2.5 -------------- next part -------------- An HTML attachment was scrubbed... URL: From arlin.r.davis at intel.com Mon Apr 28 10:13:42 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 28 Apr 2008 10:13:42 -0700 Subject: [ofa-general] [PATCH] [master] dapl cma: add check before destroying cm event channel in release Message-ID: <002401c8a953$35b62b90$51fc070a@amr.corp.intel.com> library may be loaded and unloaded without calling open in which case the cm event channel is not created. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_util.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index a7ba3d6..1f41186 100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -178,7 +178,8 @@ int32_t dapls_ib_release(void) { dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " dapl_ib_release: \n"); dapli_ib_thread_destroy(); - rdma_destroy_event_channel(g_cm_events); + if (g_cm_events != NULL) + rdma_destroy_event_channel(g_cm_events); return 0; } -- 1.5.2.5 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arlin.r.davis at intel.com Mon Apr 28 10:17:06 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Mon, 28 Apr 2008 10:17:06 -0700 Subject: [ofa-general] [PATCH][master] dapl: add vendor_err with DTO error logging Message-ID: DAPL_GET_CQE_VENDOR_ERR added to get vendor_err via cq entry. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_evd_util.c | 16 ++++++++++------ dapl/openib_cma/dapl_ib_dto.h | 2 +- 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/dapl/common/dapl_evd_util.c b/dapl/common/dapl_evd_util.c index 32fbaba..293759f 100755 --- a/dapl/common/dapl_evd_util.c +++ b/dapl/common/dapl_evd_util.c @@ -543,9 +543,12 @@ bail: return dat_status; } -#if defined(DAPL_DBG) && !defined(DAPL_GET_CQE_OP_STR) +#if !defined(DAPL_GET_CQE_OP_STR) #define DAPL_GET_CQE_OP_STR(e) "Unknown CEQ OP String?" #endif +#if !defined(DAPL_GET_CQE_VENDOR_ERR) +#define DAPL_GET_CQE_VENDOR_ERR(e) 0 +#endif /* * dapli_evd_eh_print_cqe @@ -565,7 +568,6 @@ dapli_evd_eh_print_cqe ( IN ib_work_completion_t *cqe_ptr) { #ifdef DAPL_DBG - dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, "\t >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<\n"); dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, @@ -583,8 +585,9 @@ dapli_evd_eh_print_cqe ( DAPL_GET_CQE_BYTESNUM (cqe_ptr)); } dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t\t status %d\n", - DAPL_GET_CQE_STATUS (cqe_ptr)); + "\t\t status %d vendor_err 0x%x\n", + DAPL_GET_CQE_STATUS(cqe_ptr), + DAPL_GET_CQE_VENDOR_ERR(cqe_ptr)); dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, "\t >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<\n"); #endif @@ -1215,9 +1218,10 @@ dapli_evd_cqe_to_event ( } dapl_log(DAPL_DBG_TYPE_ERR, - "DTO completion ERR: status %d, opcode %s \n", + "DTO completion ERR: status %d, opcode %s, vendor_err 0x%x\n", DAPL_GET_CQE_STATUS(cqe_ptr), - DAPL_GET_CQE_OP_STR(cqe_ptr)); + DAPL_GET_CQE_OP_STR(cqe_ptr), + DAPL_GET_CQE_VENDOR_ERR(cqe_ptr)); } } diff --git a/dapl/openib_cma/dapl_ib_dto.h b/dapl/openib_cma/dapl_ib_dto.h 
index a90aea2..b111e5e 100644 --- a/dapl/openib_cma/dapl_ib_dto.h +++ b/dapl/openib_cma/dapl_ib_dto.h @@ -458,10 +458,10 @@ STATIC _INLINE_ int dapls_cqe_opcode(ib_work_completion_t *cqe_p) } } - #define DAPL_GET_CQE_OPTYPE(cqe_p) dapls_cqe_opcode(cqe_p) #define DAPL_GET_CQE_WRID(cqe_p) ((ib_work_completion_t*)cqe_p)->wr_id #define DAPL_GET_CQE_STATUS(cqe_p) ((ib_work_completion_t*)cqe_p)->status +#define DAPL_GET_CQE_VENDOR_ERR(cqe_p) ((ib_work_completion_t*)cqe_p)->vendor_err #define DAPL_GET_CQE_BYTESNUM(cqe_p) ((ib_work_completion_t*)cqe_p)->byte_len #define DAPL_GET_CQE_IMMED_DATA(cqe_p) ((ib_work_completion_t*)cqe_p)->imm_data -- 1.5.2.5 From arlin.r.davis at intel.com Mon Apr 28 10:17:07 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 28 Apr 2008 10:17:07 -0700 Subject: [ofa-general] [PATCH][dat1.2] dapl: add vendor_err with DTO error logging Message-ID: <002901c8a953$affe56c0$51fc070a@amr.corp.intel.com> DAPL_GET_CQE_VENDOR_ERR added to get vendor_err via cq entry. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/common/dapl_evd_util.c | 50 ++++++++++++++++++---------------------- dapl/openib_cma/dapl_ib_dto.h | 1 + 2 files changed, 24 insertions(+), 27 deletions(-) diff --git a/dapl/common/dapl_evd_util.c b/dapl/common/dapl_evd_util.c index 36b776c..2c95c6d 100644 --- a/dapl/common/dapl_evd_util.c +++ b/dapl/common/dapl_evd_util.c @@ -485,6 +485,12 @@ bail: return dat_status; } +#if !defined(DAPL_GET_CQE_OP_STR) +#define DAPL_GET_CQE_OP_STR(e) "Unknown CEQ OP String?" 
+#endif +#if !defined(DAPL_GET_CQE_VENDOR_ERR) +#define DAPL_GET_CQE_VENDOR_ERR(e) 0 +#endif /* * dapli_evd_eh_print_cqe @@ -504,39 +510,28 @@ dapli_evd_eh_print_cqe ( IN ib_work_completion_t *cqe_ptr) { #ifdef DAPL_DBG - static char *optable[] = - { - "OP_SEND", - "OP_RDMA_READ", - "OP_RDMA_WRITE", - "OP_COMP_AND_SWAP", - "OP_FETCH_AND_ADD", - "OP_RECEIVE", - "OP_BIND_MW", - 0 - }; - dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<\n"); + "\t >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<\n"); dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t dapl_evd_dto_callback : CQE \n"); + "\t dapl_evd_dto_callback : CQE \n"); dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t\t work_req_id %lli\n", - DAPL_GET_CQE_WRID (cqe_ptr)); + "\t\t work_req_id %lli\n", + DAPL_GET_CQE_WRID (cqe_ptr)); if (DAPL_GET_CQE_STATUS (cqe_ptr) == 0) { - dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t\t op_type: %s\n", - optable[DAPL_GET_CQE_OPTYPE (cqe_ptr)]); - dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t\t bytes_num %d\n", - DAPL_GET_CQE_BYTESNUM (cqe_ptr)); + dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, + "\t\t op_type: %s\n", + DAPL_GET_CQE_OP_STR(cqe_ptr)); + dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, + "\t\t bytes_num %d\n", + DAPL_GET_CQE_BYTESNUM (cqe_ptr)); } dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t\t status %d\n", - DAPL_GET_CQE_STATUS (cqe_ptr)); + "\t\t status %d vendor_err 0x%x\n", + DAPL_GET_CQE_STATUS(cqe_ptr), + DAPL_GET_CQE_VENDOR_ERR(cqe_ptr)); dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "\t >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<\n"); + "\t >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<\n"); #endif return; } @@ -1171,9 +1166,10 @@ dapli_evd_cqe_to_event ( } dapl_log(DAPL_DBG_TYPE_ERR, - "DTO completion ERR: status %d, opcode %s \n", + "DTO completion ERR: status %d, opcode %s, vendor_err 0x%x\n", DAPL_GET_CQE_STATUS(cqe_ptr), - DAPL_GET_CQE_OP_STR(cqe_ptr)); + DAPL_GET_CQE_OP_STR(cqe_ptr), + DAPL_GET_CQE_VENDOR_ERR(cqe_ptr)); } } diff --git a/dapl/openib_cma/dapl_ib_dto.h 
b/dapl/openib_cma/dapl_ib_dto.h index 1a83718..52b189b 100644 --- a/dapl/openib_cma/dapl_ib_dto.h +++ b/dapl/openib_cma/dapl_ib_dto.h @@ -272,6 +272,7 @@ STATIC _INLINE_ int dapls_cqe_opcode(ib_work_completion_t *cqe_p) #define DAPL_GET_CQE_OPTYPE(cqe_p) dapls_cqe_opcode(cqe_p) #define DAPL_GET_CQE_WRID(cqe_p) ((ib_work_completion_t*)cqe_p)->wr_id #define DAPL_GET_CQE_STATUS(cqe_p) ((ib_work_completion_t*)cqe_p)->status +#define DAPL_GET_CQE_VENDOR_ERR(cqe_p) ((ib_work_completion_t*)cqe_p)->vendor_err #define DAPL_GET_CQE_BYTESNUM(cqe_p) ((ib_work_completion_t*)cqe_p)->byte_len #define DAPL_GET_CQE_IMMED_DATA(cqe_p) ((ib_work_completion_t*)cqe_p)->imm_data -- 1.5.2.5 From weiny2 at llnl.gov Mon Apr 28 11:03:32 2008 From: weiny2 at llnl.gov (Ira Weiny) Date: Mon, 28 Apr 2008 11:03:32 -0700 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <20080427171140.GI22406@sashak.voltaire.com> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <20080427171140.GI22406@sashak.voltaire.com> Message-ID: <20080428110332.6fb8e1d8.weiny2@llnl.gov> On Sun, 27 Apr 2008 17:11:40 +0000 Sasha Khapyorsky wrote: > Hi Ira, > > On 13:38 Wed 23 Apr , Ira Weiny wrote: > > > > The symptom is that nodes drop out of the IPoIB mcast group after a node > > temporarily goes catatonic. The details are: > > > > 1) Issues on a node cause a soft lockup of the node. > > 2) OpenSM does a normal light sweep. > > 3) MADs to the node time out since the node is in a "bad state" > > Normally during light sweep OpenSM will not query nodes. I think OpenSM > should not detect such soft lockup unless ib link state was changed and > heavy sweep was triggered. Is this the case? Yes I agree. Per my previous mail to Or I found that light sweeps did not in fact notice the nodes were gone. Looking at the logs I am not sure what caused OpenSM to notice them. However, something must have triggered a heavy sweep when those nodes were catatonic. 
From the logs they were unresponsive for multiple seconds, some as long as 30s. It is still a bit of a mystery why OpenSM did a heavy sweep during this period, but I don't think it is unreasonable for it to do so. > > > 4) OpenSM marks the node down and drops it from internal tables, including > > mcast groups. > > 5) Node recovers from soft lockup condition. > > 6) A subsequent sweep causes OpenSM to see the node and add it back to the > > fabric. > > 7) Node is fully functional on the verbs layer but IPoIB never knew anything > > was wrong so it does _not_ rejoin the mcast groups. (This is different > > from the condition where the link actually goes down.) > > If my approach above is correct it should be the same as port down/up > handling. And as was noted already in this thread OpenSM should ask > for reregistration (by setting the client reregistration bit). > > I see your patch - seems this part is buggy in OpenSM now, will look > closer at this. > Yes, I believe this is all fixed. Thanks again for everyone's help on this, Ira From arlin.r.davis at intel.com Mon Apr 28 11:47:38 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Mon, 28 Apr 2008 11:47:38 -0700 Subject: [ofa-general] [PATCH][dat1.2] dapl: cma provider needs to support lower inline send default for iWARP Message-ID: IB and iWARP work best with different defaults. Add a transport check and set the default accordingly: 64 for iWARP, 200 for IB. The DAPL_MAX_INLINE environment variable is still used to override. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_util.c | 11 +++++++++-- dapl/openib_cma/dapl_ib_util.h | 3 ++- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index 56c0a05..4de5a2c 100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -274,8 +274,15 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_NAME hca_name, IN DAPL_HCA *hca_ptr) (unsigned long long)bswap_64(gid->global.interface_id)); /* set inline max with env or default, get local lid and gid 0 */ - hca_ptr->ib_trans.max_inline_send = - dapl_os_get_env_val("DAPL_MAX_INLINE", INLINE_SEND_DEFAULT); + if (hca_ptr->ib_hca_handle->device->transport_type + == IBV_TRANSPORT_IWARP) + hca_ptr->ib_trans.max_inline_send = + dapl_os_get_env_val("DAPL_MAX_INLINE", + INLINE_SEND_IWARP_DEFAULT); + else + hca_ptr->ib_trans.max_inline_send = + dapl_os_get_env_val("DAPL_MAX_INLINE", + INLINE_SEND_IB_DEFAULT); /* set CM timer defaults */ hca_ptr->ib_trans.max_cm_timeout = diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h index 93f4fde..1e464b2 100755 --- a/dapl/openib_cma/dapl_ib_util.h +++ b/dapl/openib_cma/dapl_ib_util.h @@ -122,7 +122,8 @@ typedef struct _ib_wait_obj_handle #define IB_INVALID_HANDLE NULL /* inline send rdma threshold */ -#define INLINE_SEND_DEFAULT 128 +#define INLINE_SEND_IWARP_DEFAULT 64 +#define INLINE_SEND_IB_DEFAULT 200 /* CM private data areas */ #define IB_MAX_REQ_PDATA_SIZE 48 -- 1.5.2.5 From arlin.r.davis at intel.com Mon Apr 28 11:47:41 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Mon, 28 Apr 2008 11:47:41 -0700 Subject: [ofa-general] [PATCH][master] dapl: cma provider needs to support lower inline send default for iWARP Message-ID: <002a01c8a960$56e964f0$51fc070a@amr.corp.intel.com> IB and iWARP work best with different defaults. Add transport check and set default accordingly. 64 for iWARP, 200 for IB. 
DAPL_MAX_INLINE environment variable is still used to override. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_util.c | 11 +++++++++-- dapl/openib_cma/dapl_ib_util.h | 3 ++- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_util.c b/dapl/openib_cma/dapl_ib_util.c index 1f41186..41986a3 100755 --- a/dapl/openib_cma/dapl_ib_util.c +++ b/dapl/openib_cma/dapl_ib_util.c @@ -270,8 +270,15 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_NAME hca_name, IN DAPL_HCA *hca_ptr) (unsigned long long)bswap_64(gid->global.interface_id)); /* set inline max with env or default, get local lid and gid 0 */ - hca_ptr->ib_trans.max_inline_send = - dapl_os_get_env_val("DAPL_MAX_INLINE", INLINE_SEND_DEFAULT); + if (hca_ptr->ib_hca_handle->device->transport_type + == IBV_TRANSPORT_IWARP) + hca_ptr->ib_trans.max_inline_send = + dapl_os_get_env_val("DAPL_MAX_INLINE", + INLINE_SEND_IWARP_DEFAULT); + else + hca_ptr->ib_trans.max_inline_send = + dapl_os_get_env_val("DAPL_MAX_INLINE", + INLINE_SEND_IB_DEFAULT); /* set CM timer defaults */ hca_ptr->ib_trans.max_cm_timeout = diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h index 71593fd..3368180 100755 --- a/dapl/openib_cma/dapl_ib_util.h +++ b/dapl/openib_cma/dapl_ib_util.h @@ -111,7 +111,8 @@ typedef struct _ib_wait_obj_handle #define IB_INVALID_HANDLE NULL /* inline send rdma threshold */ -#define INLINE_SEND_DEFAULT 64 +#define INLINE_SEND_IWARP_DEFAULT 64 +#define INLINE_SEND_IB_DEFAULT 200 /* CMA private data areas */ #define CMA_PDATA_HDR 36 -- 1.5.2.5 From jimmott at austin.rr.com Mon Apr 28 12:43:35 2008 From: jimmott at austin.rr.com (Jim Mott) Date: Mon, 28 Apr 2008 14:43:35 -0500 Subject: [ofa-general] SDP poll() behavior In-Reply-To: <68D58DEFB8673048A64DE1FBE56BEE1807CF50EC@CINMLVEM11.e2k.ad.ge.com> References: <68D58DEFB8673048A64DE1FBE56BEE1807CF50EC@CINMLVEM11.e2k.ad.ge.com> Message-ID: <005f01c8a968$262c9cd0$7285d670$@rr.com> I agree 
that SDP should have the same behavior as TCP in this situation. Bug 1020 has been opened so we can track the issue. Thanks David! -----Original Message----- From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Chevalier, David (GE Healthcare) Sent: Monday, April 28, 2008 10:14 AM To: general at lists.openfabrics.org Subject: [ofa-general] SDP poll() behavior Hi SDP developers, I've noticed an apparent difference between SDP and TCP/IP handling of a certain scenario (OFED 1.3), not necessarily a bug, but wondering if it might be better to behave more like TCP/IP in this case: the receiver and sender use non-blocking sockets (SDP) and monitor through poll(). The sender writes a known quantity of data through many calls to write(), then closes its side of the socket. The receiver polls the socket, and reads the data through many calls to read(), then closes its socket. The receiver monitors poll() revents for POLLERR, POLLHUP and POLLIN. On the receiver's last expected pass through the poll() loop to read() the last remaining data, I'll often get revents of {POLLERR|POLLHUP|POLLIN}, likely due to the sender closing its socket after the last write(). If my poll() handling loop goes in this order: check/handle POLLERR, check/handle POLLHUP, check/handle POLLIN, then it fails, because I don't expect to be able to read() data when poll() returns POLLERR or POLLHUP. If I change the order and handle POLLIN first, then read() works and gets the last data. I've never encountered this in TCP/IP - that is to say, for TCP/IP I first receive a clean POLLIN from poll(), then the next poll() (after I read() the data) returns POLLHUP (without the POLLERR). If I get POLLERR from poll(), I'd expect a subsequent call to read() to return an error, not valid data... While this is probably an "implementation defined" behavior, it seems like a good idea to try to behave the same as the TCP/IP sockets that SDP aims to replace... 
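David's ordering issue can be sketched in plain userspace C. This is only an illustrative, hypothetical receiver loop (a local socketpair stands in for the SDP socket): draining POLLIN before honoring POLLHUP/POLLERR recovers the final data even when all three bits are reported together.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Receiver loop: handle POLLIN before POLLHUP/POLLERR, since all
 * three may be reported together after the peer's close(). Returns
 * the total bytes read before honoring the hangup. */
static int drain_then_hangup(int fd)
{
    char buf[64];
    int total = 0;

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, 1000) <= 0)
            break;                      /* timeout or poll error */
        if (pfd.revents & POLLIN) {
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n > 0) {
                total += (int)n;
                continue;               /* keep draining */
            }
            if (n == 0)
                break;                  /* orderly EOF */
        }
        if (pfd.revents & (POLLHUP | POLLERR))
            break;                      /* peer gone, nothing left */
    }
    return total;
}

/* Simulate the scenario: the sender writes, then closes right away,
 * so the receiver's poll() can report POLLIN|POLLHUP in one revents. */
static int run_scenario(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    (void)write(sv[0], "0123456789", 10);
    close(sv[0]);                       /* close after the last write */
    int got = drain_then_hangup(sv[1]);
    close(sv[1]);
    return got;
}
```

With the buggy ordering (POLLERR/POLLHUP checked above the POLLIN branch), the loop would bail out first and drop the final bytes still queued in the socket.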
Regards, Dave _______________________________________________ general mailing list general at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Mon Apr 28 08:57:12 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 08:57:12 -0700 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: (Olga Shern's message of "Mon, 28 Apr 2008 18:22:23 +0300") References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: > Also it is very important for us that IPoIB 2 kernel panics will be fixed ( > https://bugs.openfabrics.org/show_bug.cgi?id=989, > https://bugs.openfabrics.org/show_bug.cgi?id=985) Are either of these panics seen with upstream kernels? If we don't know then this points to a serious problem with the OFED model: we are diluting testing resources from the upstream kernel, which hurts the quality of the kernel that most users get from their distro. 
_______________________________________________ ewg mailing list ewg at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From clameter at sgi.com Mon Apr 28 13:34:11 2008 From: clameter at sgi.com (Christoph Lameter) Date: Mon, 28 Apr 2008 13:34:11 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080427122727.GO9514@duo.random> References: <20080422223545.GP24536@duo.random> <20080422230727.GR30298@sgi.com> <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> Message-ID: On Sun, 27 Apr 2008, Andrea Arcangeli wrote: > Talking about post 2.6.26: the refcount with rcu in the anon-vma > conversion seems unnecessary and may explain part of the AIM slowdown > too. The rest looks ok and probably we should switch the code to a > compile-time decision between rwlock and rwsem (so obsoleting the > current spinlock). You are going to take a semaphore in an rcu section? Guess you did not activate all debugging options while testing? I was not aware that you can take a sleeping lock from a non-preemptible context.
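Christoph's objection can be made concrete with a kernel-style sketch. This is illustrative only, not compilable on its own; the `sem` field in the second half is hypothetical, standing in for an rwsem-converted anon_vma lock.

```c
/* What 2.6.25's page_lock_anon_vma() effectively does: a spinlock
 * nests safely inside the RCU read side, because it never sleeps. */
rcu_read_lock();                 /* may disable preemption */
anon_mapping = (unsigned long) page->mapping;
anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
spin_lock(&anon_vma->lock);      /* OK: non-sleeping lock */
/* ... walk the anon_vma's vma list ... */
spin_unlock(&anon_vma->lock);
rcu_read_unlock();

/* A naive rwsem conversion is the pattern Christoph objects to: */
rcu_read_lock();
down_read(&anon_vma->sem);       /* BUG: may sleep while preemption
                                    is disabled by rcu_read_lock() */
```

This is why a sleeping-lock conversion would need some other way to pin the anon_vma (such as the refcount Andrea mentions) before leaving the RCU section, rather than blocking inside it.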
From olga.shern at gmail.com Mon Apr 28 09:14:39 2008 From: olga.shern at gmail.com (Olga Shern (Voltaire)) Date: Mon, 28 Apr 2008 19:14:39 +0300 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: On 4/28/08, Roland Dreier wrote: > > > Also it is very important for us that IPoIB 2 kernel panics will be > fixed ( > > https://bugs.openfabrics.org/show_bug.cgi?id=989, > > https://bugs.openfabrics.org/show_bug.cgi?id=985) > > Are either of these panics seen with upstream kernels? > > https://bugs.openfabrics.org/show_bug.cgi?id=989 is OFED bug https://bugs.openfabrics.org/show_bug.cgi?id=985 we will try to reproduce it on upstream kernel and let you know -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- _______________________________________________ ewg mailing list ewg at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg From rdreier at cisco.com Mon Apr 28 14:45:21 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 14:45:21 -0700 Subject: [ofa-general] Re: [PATCH 1/2] IB/iSER: Do not add unsolicited data offset to VA in iSER header In-Reply-To: <39C75744D164D948A170E9792AF8E7CAF60D36@exil.voltaire.com> (Erez Zilber's message of "Sun, 27 Apr 2008 21:53:41 +0300") References: <694d48600804270553u36b776ame9695a8858dd278@mail.gmail.com> <39C75744D164D948A170E9792AF8E7CAF60D36@exil.voltaire.com> Message-ID: > See Eli's answer here: > > http://lists.openfabrics.org/pipermail/general/2008-April/049248.html Does everyone agree with that? Pete? stgt developers? - R. 
From rdreier at cisco.com Mon Apr 28 15:34:00 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 15:34:00 -0700 Subject: [ofa-general][PATCH 9/12 v1] mlx4: Collapsed CQ support In-Reply-To: <480F506D.9020202@mellanox.co.il> (Yevgeny Petrilin's message of "Wed, 23 Apr 2008 18:06:21 +0300") References: <480F506D.9020202@mellanox.co.il> Message-ID: thanks, applied From dks at mediaweb.com Mon Apr 28 15:43:02 2008 From: dks at mediaweb.com (DK Smith) Date: Mon, 28 Apr 2008 15:43:02 -0700 Subject: [ofa-general] install.sh question In-Reply-To: <48141EC1.7010801@dev.mellanox.co.il> References: <1207688301.1661.86.camel@localhost> <48141EC1.7010801@dev.mellanox.co.il> Message-ID: <481652F6.50008@mediaweb.com> > Hi Frank, > install.sh checks if there are binary RPMs for all selected packages > under the OFED-x.x.x/RPMS directory. > If you have created binary RPMs on one of the nodes (by the install.sh > script), then make sure that the OFED-x.x.x/ofed.conf file includes only > these packages. > Then run on all cluster nodes (no kernel sources, compilers, ... > required on these nodes): >> ./install.sh -c ofed.conf -net ofed_net.conf > > Note: If there are no RPMs for one or more of the packages selected > (package_name=y) in the ofed.conf file then install.sh will run the RPM > build process. > > Regards, > Vladimir Is the NEW & IMPROVED installer, install.pl, a drop-in replacement for build.sh? I recently wrote a set of build scripts that are used to build a distribution (kernel + modules + root file system) for deployment elsewhere (i.e. a non-native build of everything including OFED). In the OFED 1.2 installer, I used this method of invocation: ./build.sh -c wherein build.sh locates the config file, "ofed.conf", in the same directory. That worked. The statement about "run on all cluster nodes" appears to indicate a non-native build is no longer possible. 
Cheers, DK From rdreier at cisco.com Mon Apr 28 15:44:17 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 15:44:17 -0700 Subject: [ofa-general] [PATCH 2.6.26 1/3] RDMA/cxgb3: Correctly serialize peer abort path. In-Reply-To: <20080427160006.31018.66715.stgit@dell3.ogc.int> (Steve Wise's message of "Sun, 27 Apr 2008 11:00:06 -0500") References: <20080427155456.31018.22282.stgit@dell3.ogc.int> <20080427160006.31018.66715.stgit@dell3.ogc.int> Message-ID: OK, applied, with a few fixups based on checkpatch output -- mostly __FUNCTION__ -> __func__ (__FUNCTION__ is a deprecated gcc-specific extension, __func__ is standard), and also a couple "abort=0" -> "abort = 0". - R. From rdreier at cisco.com Mon Apr 28 15:45:40 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 15:45:40 -0700 Subject: [ofa-general] Re: [PATCH 2.6.26 2/3] RDMA/cxgb3: Correctly set the max_mr_size device attribute. In-Reply-To: <20080427160008.31018.15516.stgit@dell3.ogc.int> (Steve Wise's message of "Sun, 27 Apr 2008 11:00:08 -0500") References: <20080427155456.31018.22282.stgit@dell3.ogc.int> <20080427160008.31018.15516.stgit@dell3.ogc.int> Message-ID: thanks, applied From rdreier at cisco.com Mon Apr 28 15:47:27 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 15:47:27 -0700 Subject: [ofa-general] [PATCH 2.6.26 1/3] RDMA/cxgb3: Correctly serialize peer abort path. In-Reply-To: (Roland Dreier's message of "Mon, 28 Apr 2008 15:44:17 -0700") References: <20080427155456.31018.22282.stgit@dell3.ogc.int> <20080427160006.31018.66715.stgit@dell3.ogc.int> Message-ID: oh yeah, and I deleted an unused "out" label From rdreier at cisco.com Mon Apr 28 15:54:29 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 15:54:29 -0700 Subject: [ofa-general] [PATCH 2.6.26 3/3] RDMA/cxgb3: Support peer-2-peer connection setup. 
In-Reply-To: <20080427160010.31018.67436.stgit@dell3.ogc.int> (Steve Wise's message of "Sun, 27 Apr 2008 11:00:10 -0500") References: <20080427155456.31018.22282.stgit@dell3.ogc.int> <20080427160010.31018.67436.stgit@dell3.ogc.int> Message-ID: thanks applied (and I see you deleted the unused label in this patch, heh) From rdreier at cisco.com Mon Apr 28 16:00:04 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 16:00:04 -0700 Subject: [ofa-general] Re: IB/ehca: handle negative return value from ibmebus_request_irq() properly in ehca_create_eq() In-Reply-To: <200804281847.44968.hnguyen@linux.vnet.ibm.com> (Hoang-Nam Nguyen's message of "Mon, 28 Apr 2008 18:47:44 +0200") References: <200804281847.44968.hnguyen@linux.vnet.ibm.com> Message-ID: thanks, applied From rdreier at cisco.com Mon Apr 28 16:07:35 2008 From: rdreier at cisco.com (Roland Dreier) Date: Mon, 28 Apr 2008 16:07:35 -0700 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: <20080428163731.GL30919@sgi.com> (akepner@sgi.com's message of "Mon, 28 Apr 2008 09:37:31 -0700") References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> <20080428163731.GL30919@sgi.com> Message-ID: > I just saw this bug report today, but we've had similar crashes. > Looks like the problem is that in ipoib_neigh_cleanup() this is > done (no locking): > > neigh = *to_ipoib_neigh(n); > > then later: > > spin_lock_irqsave(&priv->lock, flags); > if (neigh->ah) > ah = neigh->ah; > list_del(&neigh->list); <---- neigh may be stale now > ipoib_neigh_free(n->dev, neigh); > spin_unlock_irqrestore(&priv->lock, flags); > > neigh wasn't re-read after acquiring the lock, so it may point > to an already freed data structure. Ugh, looks delicate to fix properly, since we don't have a lock to take until we find out whether the neighbour is attached to an IPoIB device. 
> Unable to handle kernel paging request at 0000000000100108 > ^^^^^^^^^^^^^^^^ > LIST_POISON1 + 0x8 strange that the ofa bugzilla entry has a different address it's crashing at. From andrea at qumranet.com Mon Apr 28 17:10:52 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 29 Apr 2008 02:10:52 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> Message-ID: <20080429001052.GA8315@duo.random> On Mon, Apr 28, 2008 at 01:34:11PM -0700, Christoph Lameter wrote: > On Sun, 27 Apr 2008, Andrea Arcangeli wrote: > > > Talking about post 2.6.26: the refcount with rcu in the anon-vma > > conversion seems unnecessary and may explain part of the AIM slowdown > > too. The rest looks ok and probably we should switch the code to a > > compile-time decision between rwlock and rwsem (so obsoleting the > > current spinlock). > > You are going to take a semphore in an rcu section? Guess you did not > activate all debugging options while testing? I was not aware that you can > take a sleeping lock from a non preemptible context. I'd hoped to discuss this topic after mmu-notifier-core was already merged, but let's do it anyway. My point of view is that there was no rcu when I wrote that code, yet there was no reference count and yet all locking looks still exactly the same as I wrote it. There's even still the page_table_lock to serialize threads taking the mmap_sem in read mode against the first vma->anon_vma = anon_vma during the page fault. Frankly I've absolutely no idea why rcu is needed in all rmap code when walking the page->mapping. 
Definitely the PG_locked is taken so there's no way page->mapping could possibly go away under the rmap code, hence the anon_vma can't go away as it's queued in the vma, and the vma has to go away before the page is zapped out of the pte. So there are some possible scenarios:

1) my original anon_vma code was buggy not taking the rcu_read_lock() and somebody fixed it (I tend to exclude it)

2) somebody has seen a race that doesn't exist and didn't bother to document it other than with this obscure comment

 * Getting a lock on a stable anon_vma from a page off the LRU is
 * tricky: page_lock_anon_vma rely on RCU to guard against the races.

I tend to exclude it too as VM folks are too smart for this to be the case.

3) somebody did some microoptimization using rcu and we surely can undo that microoptimization to get the code back to my original code that didn't need rcu even though it worked exactly the same, and that is going to be cheaper to use with semaphores than doubling the number of locked ops for every lock instruction.

Now the double atomic op may not be horrible when not contended, as it works on the same cacheline, but with cacheline bouncing under contention it sounds doubly horrible compared to a single cacheline bounce, and I don't see the point of it as you can't use rcu anyways, so you can't possibly take advantage of whatever microoptimization was done over the original locking.
From clameter at sgi.com Mon Apr 28 18:28:06 2008 From: clameter at sgi.com (Christoph Lameter) Date: Mon, 28 Apr 2008 18:28:06 -0700 (PDT) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080429001052.GA8315@duo.random> References: <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> <20080429001052.GA8315@duo.random> Message-ID: On Tue, 29 Apr 2008, Andrea Arcangeli wrote: > Frankly I've absolutely no idea why rcu is needed in all rmap code > when walking the page->mapping. Definitely the PG_locked is taken so > there's no way page->mapping could possibly go away under the rmap > code, hence the anon_vma can't go away as it's queued in the vma, and > the vma has to go away before the page is zapped out of the pte. zap_pte_range can race with the rmap code and it does not take the page lock. The page may not go away since a refcount was taken but the mapping can go away. Without RCU you have no guarantee that the anon_vma still exists when you take the lock. How long were you away from VM development? > Now the double atomic op may not be horrible when not contended, as it > works on the same cacheline but with cacheline bouncing with > contention it sounds doubly horrible compared to a single cacheline bounce > and I don't see the point of it as you can't use rcu anyways, so you > can't possibly take advantage of whatever microoptimization done over > the original locking. Cachelines are acquired for exclusive use for a minimum duration. Multiple atomic operations can be performed after a cacheline becomes exclusive without danger of bouncing.
From gstreiff at neteffect.com Mon Apr 28 21:24:04 2008 From: gstreiff at neteffect.com (Glenn Streiff) Date: Mon, 28 Apr 2008 23:24:04 -0500 Subject: [ofa-general] [ PATCH 1/3 ] RDMA/nes LRO enablement Message-ID: <200804290424.m3T4O4i4018169@velma.neteffect.com> From: Faisal Latif Adding Large Receive Offload (LRO) enablement to iw_nes module. Signed-off-by: Faisal Latif --- drivers/infiniband/hw/nes/Kconfig | 1 + drivers/infiniband/hw/nes/nes.c | 4 +++ drivers/infiniband/hw/nes/nes.h | 1 + drivers/infiniband/hw/nes/nes_hw.c | 53 ++++++++++++++++++++++++++++++----- drivers/infiniband/hw/nes/nes_hw.h | 11 ++++++- drivers/infiniband/hw/nes/nes_nic.c | 12 +++++++- 6 files changed, 70 insertions(+), 12 deletions(-) diff --git a/drivers/infiniband/hw/nes/Kconfig b/drivers/infiniband/hw/nes/Kconfig index 2aeb7ac..d449eb6 100644 --- a/drivers/infiniband/hw/nes/Kconfig +++ b/drivers/infiniband/hw/nes/Kconfig @@ -2,6 +2,7 @@ config INFINIBAND_NES tristate "NetEffect RNIC Driver" depends on PCI && INET && INFINIBAND select LIBCRC32C + select INET_LRO ---help--- This is a low-level driver for NetEffect RDMA enabled Network Interface Cards (RNIC). 
diff --git a/drivers/infiniband/hw/nes/nes.c b/drivers/infiniband/hw/nes/nes.c index a4e9269..9f7364a 100644 --- a/drivers/infiniband/hw/nes/nes.c +++ b/drivers/infiniband/hw/nes/nes.c @@ -91,6 +91,10 @@ unsigned int nes_debug_level = 0; module_param_named(debug_level, nes_debug_level, uint, 0644); MODULE_PARM_DESC(debug_level, "Enable debug output level"); +unsigned int nes_lro_max_aggr = NES_LRO_MAX_AGGR; +module_param(nes_lro_max_aggr, int, NES_LRO_MAX_AGGR); +MODULE_PARM_DESC(nes_mro_max_aggr, " nic LRO MAX packet aggregation"); + LIST_HEAD(nes_adapter_list); static LIST_HEAD(nes_dev_list); diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h index cdf2e9a..484b5e3 100644 --- a/drivers/infiniband/hw/nes/nes.h +++ b/drivers/infiniband/hw/nes/nes.h @@ -173,6 +173,7 @@ extern int disable_mpa_crc; extern unsigned int send_first; extern unsigned int nes_drv_opt; extern unsigned int nes_debug_level; +extern unsigned int nes_lro_max_aggr; extern struct list_head nes_adapter_list; diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 08964cc..197eee9 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -38,6 +38,7 @@ #include #include #include #include +#include #include "nes.h" @@ -1375,6 +1376,25 @@ static void nes_rq_wqes_timeout(unsigned } +static int nes_lro_get_skb_hdr(struct sk_buff *skb, void **iphdr, + void **tcph, u64 *hdr_flags, void *priv) +{ + unsigned int ip_len; + struct iphdr *iph; + skb_reset_network_header(skb); + iph = ip_hdr(skb); + if (iph->protocol != IPPROTO_TCP) + return -1; + ip_len = ip_hdrlen(skb); + skb_set_transport_header(skb, ip_len); + *tcph = tcp_hdr(skb); + + *hdr_flags = LRO_IPV4 | LRO_TCP; + *iphdr = iph; + return 0; +} + + /** * nes_init_nic_qp */ @@ -1592,15 +1612,21 @@ int nes_init_nic_qp(struct nes_device *n nesvnic->rq_wqes_timer.function = nes_rq_wqes_timeout; nesvnic->rq_wqes_timer.data = (unsigned long)nesvnic; 
nes_debug(NES_DBG_INIT, "NAPI support Enabled\n"); - if (nesdev->nesadapter->et_use_adaptive_rx_coalesce) { nes_nic_init_timer(nesdev); if (netdev->mtu > 1500) jumbomode = 1; - nes_nic_init_timer_defaults(nesdev, jumbomode); - } - + nes_nic_init_timer_defaults(nesdev, jumbomode); + } + nesvnic->lro_mgr.max_aggr = NES_LRO_MAX_AGGR; + nesvnic->lro_mgr.max_desc = NES_MAX_LRO_DESCRIPTORS; + nesvnic->lro_mgr.lro_arr = nesvnic->lro_desc; + nesvnic->lro_mgr.get_skb_header = nes_lro_get_skb_hdr; + nesvnic->lro_mgr.features = LRO_F_NAPI | LRO_F_EXTRACT_VLAN_ID; + nesvnic->lro_mgr.dev = netdev; + nesvnic->lro_mgr.ip_summed = CHECKSUM_UNNECESSARY; + nesvnic->lro_mgr.ip_summed_aggr = CHECKSUM_UNNECESSARY; return 0; } @@ -2254,10 +2280,13 @@ void nes_nic_ce_handler(struct nes_devic u16 pkt_type; u16 rqes_processed = 0; u8 sq_cqes = 0; + u8 nes_use_lro = 0; head = cq->cq_head; cq_size = cq->cq_size; cq->cqes_pending = 1; + if (nesvnic->netdev->features & NETIF_F_LRO) + nes_use_lro = 1; do { if (le32_to_cpu(cq->cq_vbase[head].cqe_words[NES_NIC_CQE_MISC_IDX]) & NES_NIC_CQE_VALID) { @@ -2379,9 +2408,16 @@ void nes_nic_ce_handler(struct nes_devic >> 16); nes_debug(NES_DBG_CQ, "%s: Reporting stripped VLAN packet. 
Tag = 0x%04X\n", nesvnic->netdev->name, vlan_tag); - nes_vlan_rx(rx_skb, nesvnic->vlan_grp, vlan_tag); + if (nes_use_lro) + lro_vlan_hwaccel_receive_skb(&nesvnic->lro_mgr, rx_skb, + nesvnic->vlan_grp, vlan_tag, NULL); + else + nes_vlan_rx(rx_skb, nesvnic->vlan_grp, vlan_tag); } else { - nes_netif_rx(rx_skb); + if (nes_use_lro) + lro_receive_skb(&nesvnic->lro_mgr, rx_skb, NULL); + else + nes_netif_rx(rx_skb); } } @@ -2413,13 +2449,14 @@ void nes_nic_ce_handler(struct nes_devic } while (1); + if (nes_use_lro) + lro_flush_all(&nesvnic->lro_mgr); if (sq_cqes) { barrier(); /* restart the queue if it had been stopped */ if (netif_queue_stopped(nesvnic->netdev)) netif_wake_queue(nesvnic->netdev); } - cq->cq_head = head; /* nes_debug(NES_DBG_CQ, "CQ%u Processed = %u cqes, new head = %u.\n", cq->cq_number, cqe_count, cq->cq_head); */ @@ -2432,7 +2469,7 @@ void nes_nic_ce_handler(struct nes_devic } if (atomic_read(&nesvnic->rx_skbs_needed)) nes_replenish_nic_rq(nesvnic); - } +} /** diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h index 8f36e23..1363995 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -33,6 +33,8 @@ #ifndef __NES_HW_H #define __NES_HW_H +#include + #define NES_PHY_TYPE_1G 2 #define NES_PHY_TYPE_IRIS 3 #define NES_PHY_TYPE_PUMA_10G 6 @@ -982,8 +984,10 @@ struct nes_hw_tune_timer { #define NES_TIMER_INT_LIMIT 2 #define NES_TIMER_INT_LIMIT_DYNAMIC 10 #define NES_TIMER_ENABLE_LIMIT 4 -#define NES_MAX_LINK_INTERRUPTS 128 -#define NES_MAX_LINK_CHECK 200 +#define NES_MAX_LINK_INTERRUPTS 128 +#define NES_MAX_LINK_CHECK 200 +#define NES_MAX_LRO_DESCRIPTORS 32 +#define NES_LRO_MAX_AGGR 64 struct nes_adapter { u64 fw_ver; @@ -1183,6 +1187,9 @@ struct nes_vnic { u8 of_device_registered; u8 rdma_enabled; u8 rx_checksum_disabled; + u32 lro_max_aggr; + struct net_lro_mgr lro_mgr; + struct net_lro_desc lro_desc[ NES_MAX_LRO_DESCRIPTORS ]; }; struct nes_ib_device { diff --git 
a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index e5366b0..6998af0 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -936,8 +936,7 @@ static int nes_netdev_change_mtu(struct return ret; } -#define NES_ETHTOOL_STAT_COUNT 55 -static const char nes_ethtool_stringset[NES_ETHTOOL_STAT_COUNT][ETH_GSTRING_LEN] = { +static const char nes_ethtool_stringset[][ETH_GSTRING_LEN] = { "Link Change Interrupts", "Linearized SKBs", "T/GSO Requests", @@ -993,8 +992,12 @@ static const char nes_ethtool_stringset[ "CQ Depth 32", "CQ Depth 128", "CQ Depth 256", + "LRO aggregated", + "LRO flushed", + "LRO no_desc", }; +#define NES_ETHTOOL_STAT_COUNT ARRAY_SIZE(nes_ethtool_stringset) /** * nes_netdev_get_rx_csum @@ -1189,6 +1192,9 @@ static void nes_netdev_get_ethtool_stats target_stat_values[52] = int_mod_cq_depth_32; target_stat_values[53] = int_mod_cq_depth_128; target_stat_values[54] = int_mod_cq_depth_256; + target_stat_values[55] = nesvnic->lro_mgr.stats.aggregated; + target_stat_values[56] = nesvnic->lro_mgr.stats.flushed; + target_stat_values[57] = nesvnic->lro_mgr.stats.no_desc; } @@ -1454,6 +1460,8 @@ static struct ethtool_ops nes_ethtool_op .set_sg = ethtool_op_set_sg, .get_tso = ethtool_op_get_tso, .set_tso = ethtool_op_set_tso, + .get_flags = ethtool_op_get_flags, + .set_flags = ethtool_op_set_flags, }; From gstreiff at neteffect.com Mon Apr 28 21:25:46 2008 From: gstreiff at neteffect.com (Glenn Streiff) Date: Mon, 28 Apr 2008 23:25:46 -0500 Subject: [ofa-general] [ PATCH 2/3 ] RDMA/nes SFP+ enablement Message-ID: <200804290425.m3T4PkKq018184@velma.neteffect.com> From: Eric Schneider This patch enables the iw_nes module for NetEffect RNICs to support additional PHYs including SFP+ optical transceivers (referred to as ARGUS in the code). 
Signed-off-by: Eric Schneider Signed-off-by: Glenn Streiff --- drivers/infiniband/hw/nes/nes.h | 4 - drivers/infiniband/hw/nes/nes_hw.c | 210 ++++++++++++++++++++++++++++----- drivers/infiniband/hw/nes/nes_hw.h | 6 + drivers/infiniband/hw/nes/nes_nic.c | 69 +++++++---- drivers/infiniband/hw/nes/nes_utils.c | 10 -- 5 files changed, 237 insertions(+), 62 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h index 484b5e3..1f9f7bf 100644 --- a/drivers/infiniband/hw/nes/nes.h +++ b/drivers/infiniband/hw/nes/nes.h @@ -536,8 +536,8 @@ int nes_register_ofa_device(struct nes_i int nes_read_eeprom_values(struct nes_device *, struct nes_adapter *); void nes_write_1G_phy_reg(struct nes_device *, u8, u8, u16); void nes_read_1G_phy_reg(struct nes_device *, u8, u8, u16 *); -void nes_write_10G_phy_reg(struct nes_device *, u16, u8, u16); -void nes_read_10G_phy_reg(struct nes_device *, u16, u8); +void nes_write_10G_phy_reg(struct nes_device *, u16, u8, u16, u16); +void nes_read_10G_phy_reg(struct nes_device *, u8, u8, u16); struct nes_cqp_request *nes_get_cqp_request(struct nes_device *); void nes_post_cqp_request(struct nes_device *, struct nes_cqp_request *, int); int nes_arp_table(struct nes_device *, u32, u8 *, u32); diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 197eee9..19f2a5b 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -1208,11 +1208,15 @@ int nes_init_phy(struct nes_device *nesd { struct nes_adapter *nesadapter = nesdev->nesadapter; u32 counter = 0; + u32 sds_common_control0; u32 mac_index = nesdev->mac_index; - u32 tx_config; + u32 tx_config = 0; u16 phy_data; + u32 temp_phy_data = 0; + u32 temp_phy_data2 = 0; + u32 i =0; - if (nesadapter->OneG_Mode) { + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { nes_debug(NES_DBG_PHY, "1G PHY, mac_index = %d.\n", mac_index); if (nesadapter->phy_type[mac_index] 
== NES_PHY_TYPE_1G) { printk(PFX "%s: Programming mdc config for 1G\n", __func__); @@ -1278,12 +1282,108 @@ int nes_init_phy(struct nes_device *nesd nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], &phy_data); nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], phy_data | 0x0300); } else { - if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_IRIS) { + if ((nesadapter->phy_type[mac_index] == NES_PHY_TYPE_IRIS) || (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_ARGUS)) { /* setup 10G MDIO operation */ tx_config = nes_read_indexed(nesdev, NES_IDX_MAC_TX_CONFIG); tx_config |= 0x14; nes_write_indexed(nesdev, NES_IDX_MAC_TX_CONFIG, tx_config); } + if ((nesadapter->phy_type[mac_index] == NES_PHY_TYPE_ARGUS)) { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + mdelay(10); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + temp_phy_data2 = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + + /* if firmware is already running (like from a driver un-load/load, don't do anything. 
*/ + if (temp_phy_data == temp_phy_data2) { + /* configure QT2505 AMCC PHY */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0x0000, 0x8000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc300, 0x0000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc302, 0x0044); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc318, 0x0052); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc319, 0x0008); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc31a, 0x0098); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0026, 0x0E00); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0027, 0x0000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0028, 0xA528); + + //remove micro from reset; chip boots from ROM, uploads EEPROM f/w image, uC executes f/w + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc300, 0x0002); + + //wait for heart beat to start to know loading is done + counter = 0; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if (counter++ > 1000) { + nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from heartbeat check \n"); + break; + } + mdelay(100); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + temp_phy_data2 = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + } while ( (temp_phy_data2 == temp_phy_data) ); + + + //wait for tracking to start to know f/w is good to go. 
+ counter = 0; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7fd); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if (counter++ > 1000) { + nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from status check \n"); + break; + } + mdelay(1000); +// nes_debug(NES_DBG_PHY, "AMCC PHY- phy_status not ready yet = 0x%02X\n", temp_phy_data); + } while ( ((temp_phy_data & 0xff) != 0x50) && ((temp_phy_data & 0xff) != 0x70) ); + + + + + //set LOS Control invert RXLOSB_I_PADINV + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd003, 0x0000); + //set LOS Control to mask of RXLOSB_I + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc314, 0x0042); + //set LED1 to input mode (LED1 and LED2 share same LED) + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd006, 0x0007); + //set LED2 to RX link_status and activity + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd007, 0x000A); + //set LED3 to RX link_status + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd008, 0x0009); + + // reset the res-calibration on t2 serdes, ensures it is stable after the amcc phy is stable. + + sds_common_control0 = nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0); + sds_common_control0 |= 0x1; + nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); + + //release the res-calibration reset. 
+ sds_common_control0 &= 0xfffffffe; + nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); + + + i=0; + while (((nes_read32(nesdev->regs+NES_SOFTWARE_RESET) & 0x00000040) != 0x00000040) && (i++ < 5000)) { + /* mdelay(1); */ + } + + + + // wait for link train done before moving on, or will get an interupt storm + counter = 0; + do { + temp_phy_data = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 +(0x200*(nesdev->mac_index&1) )); + if (counter++ > 1000) { + nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from link train wait \n"); + break; + } + mdelay(1); + } while ( ((temp_phy_data & 0x0f1f0000) != 0x0f0f0000) ); + } + } } return 0; } @@ -2107,6 +2207,8 @@ static void nes_process_mac_intr(struct u32 u32temp; u16 phy_data; u16 temp_phy_data; + u32 pcs_val = 0x0f0f0000; + u32 pcs_mask = 0x0f1f0000; spin_lock_irqsave(&nesadapter->phy_lock, flags); if (nesadapter->mac_sw_state[mac_number] != NES_MAC_SW_IDLE) { @@ -2170,13 +2272,29 @@ static void nes_process_mac_intr(struct nes_debug(NES_DBG_PHY, "Eth SERDES Common Status: 0=0x%08X, 1=0x%08X\n", nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_STATUS0), nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_STATUS0+0x200)); - pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); - pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); + + if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_PUMA_1G) { + switch (mac_index) { + case 1: + case 3: + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0 + 0x200); + break; + default: + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0); + break; + } + } else { + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); + } + 
nes_debug(NES_DBG_PHY, "PCS PHY Control/Status%u: 0x%08X\n", mac_index, pcs_control_status); - if (nesadapter->OneG_Mode) { + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { u32temp = 0x01010000; if (nesadapter->port_count > 2) { u32temp |= 0x02020000; @@ -2185,24 +2303,58 @@ static void nes_process_mac_intr(struct phy_data = 0; nes_debug(NES_DBG_PHY, "PCS says the link is down\n"); } - } else if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_IRIS) { - nes_read_10G_phy_reg(nesdev, 1, nesadapter->phy_index[mac_index]); - temp_phy_data = (u16)nes_read_indexed(nesdev, - NES_IDX_MAC_MDIO_CONTROL); - u32temp = 20; - do { - nes_read_10G_phy_reg(nesdev, 1, nesadapter->phy_index[mac_index]); - phy_data = (u16)nes_read_indexed(nesdev, - NES_IDX_MAC_MDIO_CONTROL); - if ((phy_data == temp_phy_data) || (!(--u32temp))) - break; - temp_phy_data = phy_data; - } while (1); - nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", - __func__, phy_data, nesadapter->mac_link_down ? "DOWN" : "UP"); - } else { - phy_data = (0x0f0f0000 == (pcs_control_status & 0x0f1f0000)) ? 4 : 0; + switch (nesadapter->phy_type[mac_index]) { + case NES_PHY_TYPE_IRIS: + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + u32temp = 20; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if ((phy_data == temp_phy_data) || (!(--u32temp))) + break; + temp_phy_data = phy_data; + } while (1); + nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", + __func__, phy_data, nesadapter->mac_link_down[mac_index] ? "DOWN" : "UP"); + break; + case NES_PHY_TYPE_ARGUS: + //clear the alarms. 
+ nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0x0008); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc001); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc002); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc005); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc006); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9003); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9004); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9005); + //check link status + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + u32temp = 100; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + + phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if ((phy_data == temp_phy_data) || (!(--u32temp))) + break; + temp_phy_data = phy_data; + } while (1); + nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", + __func__, phy_data, nesadapter->mac_link_down ? "DOWN" : "UP"); + break; + case NES_PHY_TYPE_PUMA_1G: + if (mac_index < 2) { + pcs_val = pcs_mask = 0x01010000; + } else { + pcs_val = pcs_mask = 0x02020000; + } + /* fall through */ + default: + phy_data = (pcs_val == (pcs_control_status & pcs_mask)) ? 0x4 : 0x0; + break; + } } if (phy_data & 0x0004) { @@ -2211,8 +2363,8 @@ static void nes_process_mac_intr(struct nes_debug(NES_DBG_PHY, "The Link is UP!!. 
linkup was %d\n", nesvnic->linkup); if (nesvnic->linkup == 0) { - printk(PFX "The Link is now up for port %u, netdev %p.\n", - mac_index, nesvnic->netdev); + printk(PFX "The Link is now up for port %s, netdev %p.\n", + nesvnic->netdev->name, nesvnic->netdev); if (netif_queue_stopped(nesvnic->netdev)) netif_start_queue(nesvnic->netdev); nesvnic->linkup = 1; @@ -2225,8 +2377,8 @@ static void nes_process_mac_intr(struct nes_debug(NES_DBG_PHY, "The Link is Down!!. linkup was %d\n", nesvnic->linkup); if (nesvnic->linkup == 1) { - printk(PFX "The Link is now down for port %u, netdev %p.\n", - mac_index, nesvnic->netdev); + printk(PFX "The Link is now down for port %s, netdev %p.\n", + nesvnic->netdev->name, nesvnic->netdev); if (!(netif_queue_stopped(nesvnic->netdev))) netif_stop_queue(nesvnic->netdev); nesvnic->linkup = 0; diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h index 1363995..7d47f92 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -35,8 +35,10 @@ #define __NES_HW_H #include -#define NES_PHY_TYPE_1G 2 -#define NES_PHY_TYPE_IRIS 3 +#define NES_PHY_TYPE_1G 2 +#define NES_PHY_TYPE_IRIS 3 +#define NES_PHY_TYPE_ARGUS 4 +#define NES_PHY_TYPE_PUMA_1G 5 #define NES_PHY_TYPE_PUMA_10G 6 #define NES_MULTICAST_PF_MAX 8 diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 6998af0..5ba9dd3 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -1377,21 +1377,31 @@ static int nes_netdev_get_settings(struc et_cmd->duplex = DUPLEX_FULL; et_cmd->port = PORT_MII; + if (nesadapter->OneG_Mode) { - et_cmd->supported = SUPPORTED_1000baseT_Full|SUPPORTED_Autoneg; - et_cmd->advertising = ADVERTISED_1000baseT_Full|ADVERTISED_Autoneg; et_cmd->speed = SPEED_1000; - nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], - &phy_data); - if (phy_data&0x1000) { - et_cmd->autoneg = AUTONEG_ENABLE; - } else { + if 
(nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) { + et_cmd->supported = SUPPORTED_1000baseT_Full; + et_cmd->advertising = ADVERTISED_1000baseT_Full; et_cmd->autoneg = AUTONEG_DISABLE; + et_cmd->transceiver = XCVR_INTERNAL; + et_cmd->phy_address = nesdev->mac_index; + } else { + et_cmd->supported = SUPPORTED_1000baseT_Full|SUPPORTED_Autoneg; + et_cmd->advertising = ADVERTISED_1000baseT_Full|ADVERTISED_Autoneg; + nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], + &phy_data); + if (phy_data&0x1000) { + et_cmd->autoneg = AUTONEG_ENABLE; + } else { + et_cmd->autoneg = AUTONEG_DISABLE; + } + et_cmd->transceiver = XCVR_EXTERNAL; + et_cmd->phy_address = nesadapter->phy_index[nesdev->mac_index]; } - et_cmd->transceiver = XCVR_EXTERNAL; - et_cmd->phy_address = nesadapter->phy_index[nesdev->mac_index]; } else { - if (nesadapter->phy_type[nesvnic->logical_port] == NES_PHY_TYPE_IRIS) { + if ( (nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_IRIS) || + (nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_ARGUS) ) { et_cmd->transceiver = XCVR_EXTERNAL; et_cmd->port = PORT_FIBRE; et_cmd->supported = SUPPORTED_FIBRE; @@ -1422,7 +1432,7 @@ static int nes_netdev_set_settings(struc struct nes_adapter *nesadapter = nesdev->nesadapter; u16 phy_data; - if (nesadapter->OneG_Mode) { + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[nesdev->mac_index] != NES_PHY_TYPE_PUMA_1G)) { nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], &phy_data); if (et_cmd->autoneg) { @@ -1615,27 +1625,34 @@ struct net_device *nes_netdev_init(struc list_add_tail(&nesvnic->list, &nesdev->nesadapter->nesvnic_list[nesdev->mac_index]); if ((nesdev->netdev_count == 0) && - (PCI_FUNC(nesdev->pcidev->devfn) == nesdev->mac_index)) { + ((PCI_FUNC(nesdev->pcidev->devfn) == nesdev->mac_index) || + ((nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) && + (((PCI_FUNC(nesdev->pcidev->devfn) == 1) && (nesdev->mac_index == 
2)) || + ((PCI_FUNC(nesdev->pcidev->devfn) == 2) && (nesdev->mac_index == 1)) ) ) ) ) { +/* PUMA HACK nes_debug(NES_DBG_INIT, "Setting up PHY interrupt mask. Using register index 0x%04X\n", NES_IDX_PHY_PCS_CONTROL_STATUS0+(0x200*(nesvnic->logical_port&1))); +*/ u32temp = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + - (0x200*(nesvnic->logical_port&1))); - u32temp |= 0x00200000; - nes_write_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + - (0x200*(nesvnic->logical_port&1)), u32temp); + (0x200*(nesdev->mac_index&1))); + if (nesdev->nesadapter->phy_type[nesdev->mac_index] != NES_PHY_TYPE_PUMA_1G) { + u32temp |= 0x00200000; + nes_write_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + + (0x200*(nesdev->mac_index&1)), u32temp); + } + u32temp = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + - (0x200*(nesvnic->logical_port&1)) ); + (0x200*(nesdev->mac_index&1)) ); + if ((u32temp&0x0f1f0000) == 0x0f0f0000) { - if (nesdev->nesadapter->phy_type[nesvnic->logical_port] == NES_PHY_TYPE_IRIS) { + if (nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_IRIS) { nes_init_phy(nesdev); - nes_read_10G_phy_reg(nesdev, 1, - nesdev->nesadapter->phy_index[nesvnic->logical_port]); + nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[nesdev->mac_index], 1, 1); temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); u32temp = 20; do { - nes_read_10G_phy_reg(nesdev, 1, - nesdev->nesadapter->phy_index[nesvnic->logical_port]); + nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[nesdev->mac_index], 1, 1); phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); if ((phy_data == temp_phy_data) || (!(--u32temp))) @@ -1652,6 +1669,14 @@ struct net_device *nes_netdev_init(struc nes_debug(NES_DBG_INIT, "The Link is UP!!.\n"); nesvnic->linkup = 1; } + } else if (nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) { + nes_debug(NES_DBG_INIT, "mac_index=%d, logical_port=%d, u32temp=0x%04X, 
PCI_FUNC=%d\n", + nesdev->mac_index, nesvnic->logical_port, u32temp, PCI_FUNC(nesdev->pcidev->devfn)); + if (((nesdev->mac_index < 2) && ((u32temp&0x01010000) == 0x01010000) ) || + ((nesdev->mac_index > 1) && ((u32temp&0x02020000) == 0x02020000) ) ) { + nes_debug(NES_DBG_INIT, "The Link is UP!!.\n"); + nesvnic->linkup = 1; + } } /* clear the MAC interrupt status, assumes direct logical to physical mapping */ u32temp = nes_read_indexed(nesdev, NES_IDX_MAC_INT_STATUS + (0x200 * nesdev->mac_index)); diff --git a/drivers/infiniband/hw/nes/nes_utils.c b/drivers/infiniband/hw/nes/nes_utils.c index c6d5631..fe83d1b 100644 --- a/drivers/infiniband/hw/nes/nes_utils.c +++ b/drivers/infiniband/hw/nes/nes_utils.c @@ -444,15 +444,13 @@ void nes_read_1G_phy_reg(struct nes_devi /** * nes_write_10G_phy_reg */ -void nes_write_10G_phy_reg(struct nes_device *nesdev, u16 phy_reg, - u8 phy_addr, u16 data) +void nes_write_10G_phy_reg(struct nes_device *nesdev, u16 phy_addr, u8 dev_addr, u16 phy_reg, + u16 data) { - u32 dev_addr; u32 port_addr; u32 u32temp; u32 counter; - dev_addr = 1; port_addr = phy_addr; /* set address */ @@ -492,14 +490,12 @@ void nes_write_10G_phy_reg(struct nes_de * This routine only issues the read, the data must be read * separately. */ -void nes_read_10G_phy_reg(struct nes_device *nesdev, u16 phy_reg, u8 phy_addr) +void nes_read_10G_phy_reg(struct nes_device *nesdev, u8 phy_addr, u8 dev_addr, u16 phy_reg) { - u32 dev_addr; u32 port_addr; u32 u32temp; u32 counter; - dev_addr = 1; port_addr = phy_addr; /* set address */ From gstreiff at neteffect.com Mon Apr 28 21:26:43 2008 From: gstreiff at neteffect.com (Glenn Streiff) Date: Mon, 28 Apr 2008 23:26:43 -0500 Subject: [ofa-general] [ PATCH 3/3 ] RDMA/nes SFP+ cleanup Message-ID: <200804290426.m3T4QhJl018196@velma.neteffect.com> Clean up the SFP+ patch. 
Signed-off-by: Glenn Streiff --- drivers/infiniband/hw/nes/nes_hw.c | 279 ++++++++++++++++++----------------- drivers/infiniband/hw/nes/nes_nic.c | 63 ++++---- 2 files changed, 178 insertions(+), 164 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 19f2a5b..dce2d66 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -1214,9 +1214,9 @@ int nes_init_phy(struct nes_device *nesd u16 phy_data; u32 temp_phy_data = 0; u32 temp_phy_data2 = 0; - u32 i =0; + u32 i = 0; - if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[ mac_index ] != NES_PHY_TYPE_PUMA_1G)) { nes_debug(NES_DBG_PHY, "1G PHY, mac_index = %d.\n", mac_index); if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_1G) { printk(PFX "%s: Programming mdc config for 1G\n", __func__); @@ -1225,17 +1225,17 @@ int nes_init_phy(struct nes_device *nesd nes_write_indexed(nesdev, NES_IDX_MAC_TX_CONFIG, tx_config); } - nes_read_1G_phy_reg(nesdev, 1, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 1, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 1 phy address %u = 0x%X.\n", - nesadapter->phy_index[mac_index], phy_data); - nes_write_1G_phy_reg(nesdev, 23, nesadapter->phy_index[mac_index], 0xb000); + nesadapter->phy_index[ mac_index ], phy_data); + nes_write_1G_phy_reg(nesdev, 23, nesadapter->phy_index[ mac_index ], 0xb000); /* Reset the PHY */ - nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], 0x8000); + nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ mac_index ], 0x8000); udelay(100); counter = 0; do { - nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0 = 0x%X.\n", phy_data); if 
(counter++ > 100) break; } while (phy_data & 0x8000); @@ -1243,145 +1243,156 @@ int nes_init_phy(struct nes_device *nesd /* Setting no phy loopback */ phy_data &= 0xbfff; phy_data |= 0x1140; - nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], phy_data); - nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], &phy_data); + nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ mac_index ], phy_data); + nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0 = 0x%X.\n", phy_data); - nes_read_1G_phy_reg(nesdev, 0x17, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 0x17, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x17 = 0x%X.\n", phy_data); - nes_read_1G_phy_reg(nesdev, 0x1e, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 0x1e, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x1e = 0x%X.\n", phy_data); /* Setting the interrupt mask */ - nes_read_1G_phy_reg(nesdev, 0x19, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 0x19, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x19 = 0x%X.\n", phy_data); - nes_write_1G_phy_reg(nesdev, 0x19, nesadapter->phy_index[mac_index], 0xffee); + nes_write_1G_phy_reg(nesdev, 0x19, nesadapter->phy_index[ mac_index ], 0xffee); - nes_read_1G_phy_reg(nesdev, 0x19, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 0x19, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x19 = 0x%X.\n", phy_data); /* turning on flow control */ - nes_read_1G_phy_reg(nesdev, 4, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 4, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x4 = 0x%X.\n", 
phy_data); - nes_write_1G_phy_reg(nesdev, 4, nesadapter->phy_index[mac_index], + nes_write_1G_phy_reg(nesdev, 4, nesadapter->phy_index[ mac_index ], (phy_data & ~(0x03E0)) | 0xc00); - /* nes_write_1G_phy_reg(nesdev, 4, nesadapter->phy_index[mac_index], - phy_data | 0xc00); */ - nes_read_1G_phy_reg(nesdev, 4, nesadapter->phy_index[mac_index], &phy_data); + + /* + * nes_write_1G_phy_reg(nesdev, 4, nesadapter->phy_index[ mac_index ], + * phy_data | 0xc00); + */ + nes_read_1G_phy_reg(nesdev, 4, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x4 = 0x%X.\n", phy_data); - nes_read_1G_phy_reg(nesdev, 9, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 9, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x9 = 0x%X.\n", phy_data); + /* Clear Half duplex */ - nes_write_1G_phy_reg(nesdev, 9, nesadapter->phy_index[mac_index], + nes_write_1G_phy_reg(nesdev, 9, nesadapter->phy_index[ mac_index ], phy_data & ~(0x0100)); - nes_read_1G_phy_reg(nesdev, 9, nesadapter->phy_index[mac_index], &phy_data); + nes_read_1G_phy_reg(nesdev, 9, nesadapter->phy_index[ mac_index ], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 0x9 = 0x%X.\n", phy_data); - nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], &phy_data); - nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], phy_data | 0x0300); + nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ mac_index ], &phy_data); + nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ mac_index ], phy_data | 0x0300); } else { - if ((nesadapter->phy_type[mac_index] == NES_PHY_TYPE_IRIS) || (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_ARGUS)) { + if ((nesadapter->phy_type[ mac_index ] == NES_PHY_TYPE_IRIS) || + (nesadapter->phy_type[ mac_index ] == NES_PHY_TYPE_ARGUS)) { /* setup 10G MDIO operation */ tx_config = nes_read_indexed(nesdev, NES_IDX_MAC_TX_CONFIG); tx_config |= 0x14; 
nes_write_indexed(nesdev, NES_IDX_MAC_TX_CONFIG, tx_config); } - if ((nesadapter->phy_type[mac_index] == NES_PHY_TYPE_ARGUS)) { - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + if ((nesadapter->phy_type[ mac_index ] == NES_PHY_TYPE_ARGUS)) { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0xd7ee); temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); mdelay(10); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0xd7ee); temp_phy_data2 = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); /* if firmware is already running (like from a driver un-load/load, don't do anything. */ if (temp_phy_data == temp_phy_data2) { /* configure QT2505 AMCC PHY */ - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0x0000, 0x8000); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc300, 0x0000); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc302, 0x0044); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc318, 0x0052); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc319, 0x0008); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc31a, 0x0098); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0026, 0x0E00); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0027, 0x0000); - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0028, 0xA528); - - //remove micro from reset; chip boots from ROM, uploads EEPROM f/w image, uC executes f/w - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc300, 0x0002); - - //wait for heart beat to start to know loading is done + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0x0000, 0x8000); + nes_write_10G_phy_reg(nesdev, 
nesadapter->phy_index[ mac_index ], 0x1, 0xc300, 0x0000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xc302, 0x0044); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xc318, 0x0052); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xc319, 0x0008); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xc31a, 0x0098); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0x0026, 0x0E00); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0x0027, 0x0000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0x0028, 0xA528); + + /* + * remove micro from reset; chip boots from ROM, + * uploads EEPROM f/w image, uC executes f/w + */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xc300, 0x0002); + + /* wait for heart beat to start to know loading is done */ counter = 0; do { - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0xd7ee); temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); if (counter++ > 1000) { nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from heartbeat check \n"); break; } mdelay(100); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0xd7ee); temp_phy_data2 = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); } while ( (temp_phy_data2 == temp_phy_data) ); - - //wait for tracking to start to know f/w is good to go. 
+ /* wait for tracking to start to know f/w is good to go */ counter = 0; do { - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7fd); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x3, 0xd7fd); temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); if (counter++ > 1000) { nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from status check \n"); break; } mdelay(1000); -// nes_debug(NES_DBG_PHY, "AMCC PHY- phy_status not ready yet = 0x%02X\n", temp_phy_data); - } while ( ((temp_phy_data & 0xff) != 0x50) && ((temp_phy_data & 0xff) != 0x70) ); + /* nes_debug(NES_DBG_PHY, "AMCC PHY- phy_status not ready yet = 0x%02X\n", temp_phy_data); */ + } while (((temp_phy_data & 0xff) != 0x50) && ((temp_phy_data & 0xff) != 0x70)); + + /* set LOS Control invert RXLOSB_I_PADINV */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xd003, 0x0000); + /* set LOS Control to mask of RXLOSB_I */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xc314, 0x0042); + /* set LED1 to input mode (LED1 and LED2 share same LED) */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xd006, 0x0007); + /* set LED2 to RX link_status and activity */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xd007, 0x000A); - //set LOS Control invert RXLOSB_I_PADINV - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd003, 0x0000); - //set LOS Control to mask of RXLOSB_I - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc314, 0x0042); - //set LED1 to input mode (LED1 and LED2 share same LED) - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd006, 0x0007); - //set LED2 to RX link_status and activity - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd007, 0x000A); - //set LED3 to RX link_status - nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd008, 0x0009); 
+ /* set LED3 to RX link_status */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 0x1, 0xd008, 0x0009); - // reset the res-calibration on t2 serdes, ensures it is stable after the amcc phy is stable. + /* + * reset the res-calibration on t2 serdes, ensures it is stable + * after the amcc phy is stable. + */ - sds_common_control0 = nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0); + sds_common_control0 = nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0); sds_common_control0 |= 0x1; nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); - //release the res-calibration reset. + /* release the res-calibration reset */ sds_common_control0 &= 0xfffffffe; nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); - - i=0; - while (((nes_read32(nesdev->regs+NES_SOFTWARE_RESET) & 0x00000040) != 0x00000040) && (i++ < 5000)) { + i = 0; + while (((nes_read32(nesdev->regs + NES_SOFTWARE_RESET) & 0x00000040) != 0x00000040) + && (i++ < 5000)) { /* mdelay(1); */ } - - - // wait for link train done before moving on, or will get an interupt storm + /* wait for link train done before moving on, or will get an interupt storm */ counter = 0; do { - temp_phy_data = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 +(0x200*(nesdev->mac_index&1) )); + temp_phy_data = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + + (0x200 * (nesdev->mac_index & 1))); if (counter++ > 1000) { - nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from link train wait \n"); + nes_debug(NES_DBG_PHY, + "AMCC PHY- breaking from link train wait \n"); break; } mdelay(1); - } while ( ((temp_phy_data & 0x0f1f0000) != 0x0f0f0000) ); + } while (((temp_phy_data & 0x0f1f0000) != 0x0f0f0000)); } } } @@ -2271,30 +2282,30 @@ static void nes_process_mac_intr(struct } nes_debug(NES_DBG_PHY, "Eth SERDES Common Status: 0=0x%08X, 1=0x%08X\n", nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_STATUS0), - 
nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_STATUS0+0x200)); + nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_STATUS0 + 0x200)); - if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_PUMA_1G) { + if (nesadapter->phy_type[ mac_index ] == NES_PHY_TYPE_PUMA_1G) { switch (mac_index) { - case 1: - case 3: - pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0 + 0x200); - break; - default: - pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0); - break; + case 1: + case 3: + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0 + 0x200); + break; + default: + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0); + break; } } else { pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); + NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index & 1) * 0x200)); pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); + NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index & 1) * 0x200)); } nes_debug(NES_DBG_PHY, "PCS PHY Control/Status%u: 0x%08X\n", mac_index, pcs_control_status); - if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[ mac_index ] != NES_PHY_TYPE_PUMA_1G)) { u32temp = 0x01010000; if (nesadapter->port_count > 2) { u32temp |= 0x02020000; @@ -2304,56 +2315,58 @@ static void nes_process_mac_intr(struct nes_debug(NES_DBG_PHY, "PCS says the link is down\n"); } } else { - switch (nesadapter->phy_type[mac_index]) { - case NES_PHY_TYPE_IRIS: - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); - temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); - u32temp = 20; - do { - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); - phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); - if ((phy_data == 
temp_phy_data) || (!(--u32temp))) - break; - temp_phy_data = phy_data; - } while (1); - nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", - __func__, phy_data, nesadapter->mac_link_down[mac_index] ? "DOWN" : "UP"); - break; - case NES_PHY_TYPE_ARGUS: - //clear the alarms. - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0x0008); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc001); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc002); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc005); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc006); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9003); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9004); - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9005); - //check link status + switch (nesadapter->phy_type[ mac_index ]) { + case NES_PHY_TYPE_IRIS: + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 1, 1); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + u32temp = 20; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 1, 1); + phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if ((phy_data == temp_phy_data) || (!(--u32temp))) + break; + temp_phy_data = phy_data; + } while (1); + nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", + __func__, phy_data, nesadapter->mac_link_down[mac_index] ? "DOWN" : "UP"); + break; + + case NES_PHY_TYPE_ARGUS: + //clear the alarms. 
+ nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 4, 0x0008); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 4, 0xc001); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 4, 0xc002); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 4, 0xc005); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 4, 0xc006); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 1, 0x9003); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 1, 0x9004); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 1, 0x9005); + //check link status + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[ mac_index ], 1, 1); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + u32temp = 100; + do { nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); - temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); - u32temp = 100; - do { - nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); - phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); - if ((phy_data == temp_phy_data) || (!(--u32temp))) - break; - temp_phy_data = phy_data; - } while (1); - nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", - __func__, phy_data, nesadapter->mac_link_down ? "DOWN" : "UP"); - break; - case NES_PHY_TYPE_PUMA_1G: - if (mac_index < 2) { - pcs_val = pcs_mask = 0x01010000; - } else { - pcs_val = pcs_mask = 0x02020000; - } - /* fall through */ - default: - phy_data = (pcs_val == (pcs_control_status & pcs_mask)) ? 0x4 : 0x0; - break; + phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if ((phy_data == temp_phy_data) || (!(--u32temp))) + break; + temp_phy_data = phy_data; + } while (1); + nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", + __func__, phy_data, nesadapter->mac_link_down ? 
"DOWN" : "UP"); + break; + + case NES_PHY_TYPE_PUMA_1G: + if (mac_index < 2) { + pcs_val = pcs_mask = 0x01010000; + } else { + pcs_val = pcs_mask = 0x02020000; + } + /* fall through */ + default: + phy_data = (pcs_val == (pcs_control_status & pcs_mask)) ? 0x4 : 0x0; + break; } } diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 5ba9dd3..939887a 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -1376,20 +1376,20 @@ static int nes_netdev_get_settings(struc u16 phy_data; et_cmd->duplex = DUPLEX_FULL; - et_cmd->port = PORT_MII; + et_cmd->port = PORT_MII; if (nesadapter->OneG_Mode) { et_cmd->speed = SPEED_1000; - if (nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) { - et_cmd->supported = SUPPORTED_1000baseT_Full; + if (nesadapter->phy_type[ nesdev->mac_index ] == NES_PHY_TYPE_PUMA_1G) { + et_cmd->supported = SUPPORTED_1000baseT_Full; et_cmd->advertising = ADVERTISED_1000baseT_Full; - et_cmd->autoneg = AUTONEG_DISABLE; + et_cmd->autoneg = AUTONEG_DISABLE; et_cmd->transceiver = XCVR_INTERNAL; et_cmd->phy_address = nesdev->mac_index; } else { - et_cmd->supported = SUPPORTED_1000baseT_Full|SUPPORTED_Autoneg; - et_cmd->advertising = ADVERTISED_1000baseT_Full|ADVERTISED_Autoneg; - nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], + et_cmd->supported = SUPPORTED_1000baseT_Full | SUPPORTED_Autoneg; + et_cmd->advertising = ADVERTISED_1000baseT_Full | ADVERTISED_Autoneg; + nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ nesdev->mac_index ], &phy_data); if (phy_data&0x1000) { et_cmd->autoneg = AUTONEG_ENABLE; @@ -1400,20 +1400,20 @@ static int nes_netdev_get_settings(struc et_cmd->phy_address = nesadapter->phy_index[nesdev->mac_index]; } } else { - if ( (nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_IRIS) || - (nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_ARGUS) ) { + if ((nesadapter->phy_type[ nesdev->mac_index ] == 
NES_PHY_TYPE_IRIS) || + (nesadapter->phy_type[ nesdev->mac_index ] == NES_PHY_TYPE_ARGUS)) { et_cmd->transceiver = XCVR_EXTERNAL; - et_cmd->port = PORT_FIBRE; - et_cmd->supported = SUPPORTED_FIBRE; + et_cmd->port = PORT_FIBRE; + et_cmd->supported = SUPPORTED_FIBRE; et_cmd->advertising = ADVERTISED_FIBRE; - et_cmd->phy_address = nesadapter->phy_index[nesdev->mac_index]; + et_cmd->phy_address = nesadapter->phy_index[ nesdev->mac_index ]; } else { et_cmd->transceiver = XCVR_INTERNAL; - et_cmd->supported = SUPPORTED_10000baseT_Full; + et_cmd->supported = SUPPORTED_10000baseT_Full; et_cmd->advertising = ADVERTISED_10000baseT_Full; et_cmd->phy_address = nesdev->mac_index; } - et_cmd->speed = SPEED_10000; + et_cmd->speed = SPEED_10000; et_cmd->autoneg = AUTONEG_DISABLE; } et_cmd->maxtxpkt = 511; @@ -1432,17 +1432,18 @@ static int nes_netdev_set_settings(struc struct nes_adapter *nesadapter = nesdev->nesadapter; u16 phy_data; - if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[nesdev->mac_index] != NES_PHY_TYPE_PUMA_1G)) { - nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], + if ((nesadapter->OneG_Mode) && + (nesadapter->phy_type[ nesdev->mac_index ] != NES_PHY_TYPE_PUMA_1G)) { + nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ nesdev->mac_index ], &phy_data); if (et_cmd->autoneg) { /* Turn on Full duplex, Autoneg, and restart autonegotiation */ phy_data |= 0x1300; } else { - // Turn off autoneg + /* Turn off autoneg */ phy_data &= ~0x1000; } - nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], + nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[ nesdev->mac_index ], phy_data); } @@ -1628,13 +1629,13 @@ struct net_device *nes_netdev_init(struc ((PCI_FUNC(nesdev->pcidev->devfn) == nesdev->mac_index) || ((nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) && (((PCI_FUNC(nesdev->pcidev->devfn) == 1) && (nesdev->mac_index == 2)) || - ((PCI_FUNC(nesdev->pcidev->devfn) == 2) && (nesdev->mac_index == 
1)) ) ) ) ) { -/* PUMA HACK - nes_debug(NES_DBG_INIT, "Setting up PHY interrupt mask. Using register index 0x%04X\n", - NES_IDX_PHY_PCS_CONTROL_STATUS0+(0x200*(nesvnic->logical_port&1))); -*/ + ((PCI_FUNC(nesdev->pcidev->devfn) == 2) && (nesdev->mac_index == 1)))))){ + /* + * nes_debug(NES_DBG_INIT, "Setting up PHY interrupt mask. Using register index 0x%04X\n", + * NES_IDX_PHY_PCS_CONTROL_STATUS0 + (0x200 * (nesvnic->logical_port & 1))); + */ u32temp = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + - (0x200*(nesdev->mac_index&1))); + (0x200*(nesdev->mac_index & 1))); if (nesdev->nesadapter->phy_type[nesdev->mac_index] != NES_PHY_TYPE_PUMA_1G) { u32temp |= 0x00200000; nes_write_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + @@ -1645,14 +1646,14 @@ struct net_device *nes_netdev_init(struc (0x200*(nesdev->mac_index&1)) ); if ((u32temp&0x0f1f0000) == 0x0f0f0000) { - if (nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_IRIS) { + if (nesdev->nesadapter->phy_type[ nesdev->mac_index ] == NES_PHY_TYPE_IRIS) { nes_init_phy(nesdev); - nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[nesdev->mac_index], 1, 1); + nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[ nesdev->mac_index ], 1, 1); temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); u32temp = 20; do { - nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[nesdev->mac_index], 1, 1); + nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[ nesdev->mac_index ], 1, 1); phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); if ((phy_data == temp_phy_data) || (!(--u32temp))) @@ -1669,11 +1670,11 @@ struct net_device *nes_netdev_init(struc nes_debug(NES_DBG_INIT, "The Link is UP!!.\n"); nesvnic->linkup = 1; } - } else if (nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) { + } else if (nesdev->nesadapter->phy_type[ nesdev->mac_index ] == NES_PHY_TYPE_PUMA_1G) { nes_debug(NES_DBG_INIT, "mac_index=%d, 
logical_port=%d, u32temp=0x%04X, PCI_FUNC=%d\n", nesdev->mac_index, nesvnic->logical_port, u32temp, PCI_FUNC(nesdev->pcidev->devfn)); - if (((nesdev->mac_index < 2) && ((u32temp&0x01010000) == 0x01010000) ) || - ((nesdev->mac_index > 1) && ((u32temp&0x02020000) == 0x02020000) ) ) { + if (((nesdev->mac_index < 2) && ((u32temp&0x01010000) == 0x01010000)) || + ((nesdev->mac_index > 1) && ((u32temp&0x02020000) == 0x02020000))) { nes_debug(NES_DBG_INIT, "The Link is UP!!.\n"); nesvnic->linkup = 1; } @@ -1683,7 +1684,7 @@ struct net_device *nes_netdev_init(struc nes_debug(NES_DBG_INIT, "Phy interrupt status = 0x%X.\n", u32temp); nes_write_indexed(nesdev, NES_IDX_MAC_INT_STATUS + (0x200 * nesdev->mac_index), u32temp); - if (nesdev->nesadapter->phy_type[nesdev->mac_index] != NES_PHY_TYPE_IRIS) + if (nesdev->nesadapter->phy_type[ nesdev->mac_index ] != NES_PHY_TYPE_IRIS) nes_init_phy(nesdev); } From sashak at voltaire.com Tue Apr 29 01:01:41 2008 From: sashak at voltaire.com (Sasha Khapyorsky) Date: Tue, 29 Apr 2008 08:01:41 +0000 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup. In-Reply-To: <20080428110332.6fb8e1d8.weiny2@llnl.gov> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <20080427171140.GI22406@sashak.voltaire.com> <20080428110332.6fb8e1d8.weiny2@llnl.gov> Message-ID: <20080429080141.GE20790@sashak.voltaire.com> On 11:03 Mon 28 Apr , Ira Weiny wrote: > > Yes I agree. Per my previous mail to Or I found that light sweeps did not in > fact notice the nodes were gone. Looking at the logs I am not sure what > caused OpenSM to notice them. However, something must have triggered a heavy > sweep when those nodes were catatonic. From the logs they were unresponsive > for multiple seconds, some as long as 30s. It is still a bit of a mystery why > OpenSM did a heavy sweep during this period but I don't think it is > unreasonable for it to do so. Could you send me log file? 
Sasha From jackm at dev.mellanox.co.il Mon Apr 28 23:40:37 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 29 Apr 2008 09:40:37 +0300 Subject: [ofa-general] Re: [PATCH] mlx4_core: enable changing default max HCA resource limits at run time -- reposting In-Reply-To: References: <200804281438.28417.jackm@dev.mellanox.co.il> Message-ID: <200804290940.37587.jackm@dev.mellanox.co.il> On Monday 28 April 2008 18:50, Roland Dreier wrote: > Hmm... wouldn't it be better to follow the same interface as ib_mthca > and have consumers pass in the numbers instead of the log sizes? Having > two different ways of changing the same parameters seems pretty confusing. Dotan also mentioned this. Our preference was to change ib_mthca to use logs as well, since this way the user knows exactly what the amounts will be (instead of having a hidden "round up to next or equal power of 2"). I just did not get around to doing this change for ib_mthca. I'll do this for the next release. - Jack P.S. BTW, I think there is a bug in the mthca driver, which messes things up if the profile numbers are NOT powers of 2 (from mthca_make_profile, in file mthca_profile.c): for (i = 0; i < MTHCA_RES_NUM; ++i) { profile[i].type = i; profile[i].log_num = max(ffs(profile[i].num) - 1, 0); profile[i].size *= profile[i].num; should be: for (i = 0; i < MTHCA_RES_NUM; ++i) { profile[i].type = i; profile[i].num = roundup_pow_of_two(profile[i].num); profile[i].log_num = ilog2(profile[i].num); profile[i].size *= profile[i].num; since later the procedure assumes that all sizes are powers of 2. 
From ogerlitz at voltaire.com Mon Apr 28 23:57:58 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 29 Apr 2008 09:57:58 +0300 Subject: [ofa-general] Nodes dropping out of IPoIB mcast group In-Reply-To: <20080428091923.0abf9fb5.weiny2@llnl.gov> References: <20080423133816.6c1b6315.weiny2@llnl.gov> <48109087.6030606@voltaire.com> <20080424143125.2aad1db8.weiny2@llnl.gov> <15ddcffd0804241523p19559580vc3a1293c1fe097b1@mail.gmail.com> <20080424181657.28d58a29.weiny2@llnl.gov> <48143DBA.3080701@voltaire.com> <20080428091923.0abf9fb5.weiny2@llnl.gov> Message-ID: <4816C6F6.6000602@voltaire.com> Ira Weiny wrote: >> OK, good. Does this problem exist in the released openSM? if yes, what >> would be the trigger for the SM to "really discover" (i.e do PortInfo >> SET) this sub-fabric and how much time would it take to reach this >> trigger, worst case wise? > Yes, this is in the current released version of OpenSM, AFAICT. The trigger > is: the single link separating the partial sub net will come up and that trap > will cause OpenSM to resweep. I believe this will happen on the next resweep > cycle which is by default 10 sec. (But this is configurable.) I don't think > there is an issue with allowing OpenSM to resweep as designed. And when OpenSM does the heavy sweep, which nodes will have their client rereg bit set, only the ones beyond the recovered link? Also, will OpenSM cycle the logical link state of those nodes (which is ACTIVE!) through ARMED and ACTIVE again, or will the only SET be for the rereg bit? Or. From jackm at dev.mellanox.co.il Mon Apr 28 23:59:41 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 29 Apr 2008 09:59:41 +0300 Subject: [ofa-general] Re: [PATCH 2/8]: mthca/mlx4: avoid recycling old FMR R_Keys too soon In-Reply-To: References: <200804241106.57172.okir@lst.de> <200804241109.52448.okir@lst.de> Message-ID: <200804290959.41881.jackm@dev.mellanox.co.il> On Saturday 26 April 2008 00:33, Roland Dreier wrote: > Looks mostly OK... 
the only thing I worry about is in the Sinai > optimization case, do we run into trouble with bits getting carried into > the top bits of the key? > > Can someone from Mellanox review this more carefully? Olaf submitted such a patch (as an RFC) in February, and we had a discussion thread then: http://lists.openfabrics.org/pipermail/general/2008-February/046863.html We concluded at that time that the patch was OK. I also reviewed the patch again (especially the Sinai optimization), and the patch is OK there, too: The key-adjustment games are all within the index portion of the key. The spare-bits portion is outside this range (i.e., the most significant byte only). For Sinai, the increment used on the key when remapping is 0x10000000 (low order bit of most significant byte), so there is no influence on the index portion (which therefore remains constant, as it should). As such, there is no need to call adjust-key() when unmapping an fmr -- since the index portion of the key is not touched -- neither in map, nor in unmap. - Jack From dorfman.eli at gmail.com Tue Apr 29 00:33:07 2008 From: dorfman.eli at gmail.com (Eli Dorfman) Date: Tue, 29 Apr 2008 10:33:07 +0300 Subject: [ofa-general] [PATCH] IB/iSER: Count fmr alignment violations per session Message-ID: <694d48600804290033k61f717f7ob97d33b27e4c236f@mail.gmail.com> Count fmr alignment violations per session as part of the iscsi statistics. 
Signed-off-by: Eli Dorfman --- drivers/infiniband/ulp/iser/iscsi_iser.c | 4 +++- drivers/infiniband/ulp/iser/iser_memory.c | 2 ++ include/scsi/libiscsi.h | 1 + 3 files changed, 6 insertions(+), 1 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c index 451e601..df44fa7 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.c +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c @@ -472,13 +472,15 @@ iscsi_iser_conn_get_stats(struct iscsi_cls_conn *cls_conn, struct iscsi_stats *s stats->r2t_pdus = conn->r2t_pdus_cnt; /* always 0 */ stats->tmfcmd_pdus = conn->tmfcmd_pdus_cnt; stats->tmfrsp_pdus = conn->tmfrsp_pdus_cnt; - stats->custom_length = 3; + stats->custom_length = 4; strcpy(stats->custom[0].desc, "qp_tx_queue_full"); stats->custom[0].value = 0; /* TB iser_conn->qp_tx_queue_full; */ strcpy(stats->custom[1].desc, "fmr_map_not_avail"); stats->custom[1].value = 0; /* TB iser_conn->fmr_map_not_avail */; strcpy(stats->custom[2].desc, "eh_abort_cnt"); stats->custom[2].value = conn->eh_abort_cnt; + strcpy(stats->custom[3].desc, "fmr_unalign_cnt"); + stats->custom[3].value = conn->fmr_unalign_cnt; } static int diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index ee58199..cac50c4 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -423,6 +423,7 @@ void iser_dma_unmap_task_data(struct iscsi_iser_cmd_task *iser_ctask) int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *iser_ctask, enum iser_data_dir cmd_dir) { + struct iscsi_conn *iscsi_conn = iser_ctask->iser_conn->iscsi_conn; struct iser_conn *ib_conn = iser_ctask->iser_conn->ib_conn; struct iser_device *device = ib_conn->device; struct ib_device *ibdev = device->ib_device; @@ -437,6 +438,7 @@ int iser_reg_rdma_mem(struct iscsi_iser_cmd_task *iser_ctask, aligned_len = iser_data_buf_aligned_len(mem, ibdev); if (aligned_len != mem->dma_nents) { + 
iscsi_conn->fmr_unalign_cnt++; iser_warn("rdma alignment violation %d/%d aligned\n", aligned_len, mem->size); iser_data_buf_dump(mem, ibdev); diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h index 7b90b63..cd3ca63 100644 --- a/include/scsi/libiscsi.h +++ b/include/scsi/libiscsi.h @@ -225,6 +225,7 @@ struct iscsi_conn { /* custom statistics */ uint32_t eh_abort_cnt; + uint32_t fmr_unalign_cnt; }; struct iscsi_pool { -- 1.5.5 From Jean-Francois.Neyroud at bull.net Tue Apr 29 01:17:38 2008 From: Jean-Francois.Neyroud at bull.net (Jean-Francois.Neyroud) Date: Tue, 29 Apr 2008 10:17:38 +0200 Subject: [ofa-general] perfquery causes kernel to be stuck in ib_unregister_mad_agent() function Message-ID: <4816D9A2.7040009@bull.net> If I attempt to query the performance counters on all nodes of a cluster (40 nodes) at the same time, perfquery causes the kernel to get stuck in the ib_unregister_mad_agent() function. Impossible to send CTRL-C or CTRL-Z to perfquery; it is stuck in the kernel. # pgrep perfquery 27578 # cat /proc/27578/wchan ib_unregister_mad_agent I have this problem with OFED-1.2.5 or 1.3 and with mthca or ConnectX; not tested with other HCAs or OFED versions. Reproducer with 2 nodes and without a switch: # for i in `seq 1 100`; do perfquery >/dev/null 2>&1 & done # pgrep perfquery | while read pid; do echo "$pid: `cat /proc/$pid/wchan`"; echo; done | dshbak -c ---------------- [14936,14938-15029] ---------------- 0 ---------------- ---------------- ---------------- 14937 ---------------- flush_cpu_workqueue Does anyone know this problem? Jean-Francois.
From eli at dev.mellanox.co.il Tue Apr 29 02:17:33 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 29 Apr 2008 12:17:33 +0300 Subject: [ofa-general] [PATCH] IB/ipoib: set child MTU as the parent's Message-ID: <1209460653.28929.1.camel@mtls03> >From 71e918e23f7f8815f3248c1089f69680ae6a203b Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Tue, 29 Apr 2008 11:48:09 +0300 Subject: [PATCH] IB/ipoib: set child MTU as the parent's When the child joins the broadcast group reset the mtu to the real one. Signed-off-by: Eli Cohen --- drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 431fdea..872b670 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -90,6 +90,9 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey) } priv->max_ib_mtu = ppriv->max_ib_mtu; + /* MTU will be reset when mcast join happens */ + priv->dev->mtu = IPOIB_UD_MTU(priv->max_ib_mtu); + priv->mcast_mtu = priv->admin_mtu = priv->dev->mtu; set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags); priv->pkey = pkey; -- 1.5.5 From vlad at dev.mellanox.co.il Tue Apr 29 03:40:55 2008 From: vlad at dev.mellanox.co.il (Vladimir Sokolovsky) Date: Tue, 29 Apr 2008 13:40:55 +0300 Subject: [ofa-general] [PATCH 0/8] RDS patch set In-Reply-To: <200804241114.51260.okir@lst.de> References: <200804241114.51260.okir@lst.de> Message-ID: <4816FB37.8010407@dev.mellanox.co.il> Olaf Kirch wrote: > Hi all, > > here's another set of patches related to RDS. The patches can be found > in git://git.openfabrics.org/ofed_1_3/linux-2.6 > and git://git.openfabrics.org/ofed_1_3/rds-tools > > There are seven kernel patches. I would very much like to see the first > four of them in OFED 1.3.1 if possible. On the remaining 3, I'm not > particularly religious - I'm fine if they make it into 1.3.* at a later > time. 
> > RDS: Fix IB max_unacked_* sysctls > Straightforward bugfix. > > mthca/mlx4: avoid recycling old FMR R_Keys too soon > This is a re-run of a mthca patch I posted a while back; Jack > Morgenstein requested that I should make the same change in the > mlx4 driver. Here it is; review and feedback much appreciated. > > Reduce struct rds_ib_send_work size > RDS: Increase the default number of WRs > These two patches go together; they shrink the size of the > send work entry we allocate in favor of allocating more of them. > I would very much like to see these in OFED 1.3.1 > > RDS: Two small code reorgs in the connection code > RDS: Use IB for loopback > These also go together. For loopback traffic, we need to use > IB if available, instead of the special loopback transport currently > used. The reason is that lots of our tests run on single hosts over > loopback, and we want to stress things like RDMA. > > RDS: Implement rds ping > This is really a new feature. Essentially, ping over RDS. > > There's a companion patch to rds-tools that implements the rds-ping > user space utility that leverages the functionality added by the kernel > patch above. > > Olaf Applied to ofed_1_3/linux-2.6.git and to ofed_1_3/rds-tools.git. The following OFED build includes these patches: http://www.openfabrics.org/builds/ofed-1.3.1/OFED-1.3.1-20080429-0110.tgz Regards, Vladimir From hrosenstock at xsigo.com Tue Apr 29 03:49:20 2008 From: hrosenstock at xsigo.com (Hal Rosenstock) Date: Tue, 29 Apr 2008 03:49:20 -0700 Subject: [ofa-general] perfquery causes kernel to be stuck in ib_unregister_mad_agent() function In-Reply-To: <4816D9A2.7040009@bull.net> References: <4816D9A2.7040009@bull.net> Message-ID: <1209466160.689.433.camel@hrosenstock-ws.xsigo.com> Hi Jean-Francois, On Tue, 2008-04-29 at 10:17 +0200, Jean-Francois.Neyroud wrote: > If I attemp to query at the same time the performance counters on all > nodes on a cluster ( 40 nodes) . 
> perfquery causes kernel to be stuck in ib_unregister_mad_agent() function. > > Impossible to send CTRL-C or CTRL-Z to perfquery, it is stuck in the kernel. > # pgrep perfquery > 27578 > # cat /proc/27578/wchan > ib_unregister_mad_agent > > I have this problem with OFED-1.2.5 or 1.3 and with mthca or ConnectX, > not tested with others HCA and OFED. > > Reproduceur with 2 nodes and without switch: > > # for i in `seq 1 100`; do perfquery >/dev/null 2>&1 & done > > # pgrep perfquery | while read pid; do echo "$pid: `cat /proc/$pid/wchan`"; echo; done | dshbak -c > ---------------- > [14936,14938-15029] > ---------------- > 0 > ---------------- > > ---------------- > ---------------- > 14937 > ---------------- > flush_cpu_workqueue > > > Does anyone know this problem ? This could be related to the lock dependency issue discussed in the following thread: http://lists.openfabrics.org/pipermail/general/2008-January/044723.html You might want to look to the following for the actual fix: commit 2fe7e6f7c9f55eac24c5b3cdf56af29ab9b0ca81 Author: Roland Dreier Date: Fri Jan 25 14:15:42 2008 -0800 IB/umad: Simplify and fix locking In addition to being overly complex, the locking in user_mad.c is broken: there were multiple reports of deadlocks and lockdep warnings. In particular it seems that a single thread may end up trying to take the same rwsem for reading more than once, which is explicitly forbidden in the comments in . To solve this, we change the locking to use plain mutexes instead of rwsems. There is one mutex per open file, which protects the contents of the struct ib_umad_file, including the array of agents and list of queued packets; and there is one mutex per struct ib_umad_port, which protects the contents, including the list of open files. 
We never hold the file mutex across calls to functions like ib_unregister_mad_agent() , which can call back into other ib_umad code to queue a packet, and we always hold the port mutex as long as we need to make sure that a device is not hot-unplugged from under us. This even makes things nicer for users of the -rt patch, since we remove calls to downgrade_write() (which is not implemented in -rt). Signed-off-by: Roland Dreier I don't think this change was incorporated into either OFED 1.2.5 or 1.3. -- Hal > > Jean-Francois. > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From hugh at veritas.com Tue Apr 29 03:49:11 2008 From: hugh at veritas.com (Hugh Dickins) Date: Tue, 29 Apr 2008 11:49:11 +0100 (BST) Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080429001052.GA8315@duo.random> References: <20080423002848.GA32618@sgi.com> <20080423163713.GC24536@duo.random> <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> <20080429001052.GA8315@duo.random> Message-ID: On Tue, 29 Apr 2008, Andrea Arcangeli wrote: > > My point of view is that there was no rcu when I wrote that code, yet > there was no reference count and yet all locking looks still exactly > the same as I wrote it. There's even still the page_table_lock to > serialize threads taking the mmap_sem in read mode against the first > vma->anon_vma = anon_vma during the page fault. > > Frankly I've absolutely no idea why rcu is needed in all rmap code > when walking the page->mapping. 
Definitely the PG_locked is taken so > there's no way page->mapping could possibly go away under the rmap > code, hence the anon_vma can't go away as it's queued in the vma, and > the vma has to go away before the page is zapped out of the pte. [I'm scarcely following the mmu notifiers to-and-fro, which seems to be in good hands, amongst faster thinkers than me: who actually need and can test this stuff. Don't let me slow you down; but I can quickly clarify on this history.] No, the locking was different as you had it, Andrea: there was an extra bitspin lock, carried over from the pte_chains days (maybe we changed the name, maybe we disagreed over the name, I forget), which mainly guarded the page->mapcount. I thought that was one lock more than we needed, and eliminated it in favour of atomic page->mapcount in 2.6.9. Here's the relevant extracts from ChangeLog-2.6.9: [PATCH] rmaplock: PageAnon in mapping First of a batch of five patches to eliminate rmap's page_map_lock, replace its trylocking by spinlocking, and use anon_vma to speed up swapoff. Patches updated from the originals against 2.6.7-mm7: nothing new so I won't spam the list, but including Manfred's SLAB_DESTROY_BY_RCU fixes, and omitting the unuse_process mmap_sem fix already in 2.6.8-rc3. This patch: Replace the PG_anon page->flags bit by setting the lower bit of the pointer in page->mapping when it's anon_vma: PAGE_MAPPING_ANON bit. We're about to eliminate the locking which kept the flags and mapping in synch: it's much easier to work on a local copy of page->mapping, than worry about whether flags and mapping are in synch (though I imagine it could be done, at greater cost, with some barriers). [PATCH] rmaplock: kill page_map_lock The pte_chains rmap used pte_chain_lock (bit_spin_lock on PG_chainlock) to lock its pte_chains. We kept this (as page_map_lock: bit_spin_lock on PG_maplock) when we moved to objrmap. 
But the file objrmap locks its vma tree with mapping->i_mmap_lock, and the anon objrmap locks its vma list with anon_vma->lock: so isn't the page_map_lock superfluous? Pretty much, yes. The mapcount was protected by it, and needs to become an atomic: starting at -1 like page _count, so nr_mapped can be tracked precisely up and down. The last page_remove_rmap can't clear anon page mapping any more, because of races with page_add_rmap; from which some BUG_ONs must go for the same reason, but they've served their purpose. vmscan decisions are naturally racy, little change there beyond removing page_map_lock/unlock. But to stabilize the file-backed page->mapping against truncation while acquiring i_mmap_lock, page_referenced_file now needs page lock to be held even for refill_inactive_zone. There's a similar issue in acquiring anon_vma->lock, where page lock doesn't help: which this patch pretends to handle, but actually it needs the next. Roughly 10% cut off lmbench fork numbers on my 2*HT*P4. Must confess my testing failed to show the races even while they were knowingly exposed: would benefit from testing on racier equipment. [PATCH] rmaplock: SLAB_DESTROY_BY_RCU With page_map_lock gone, how to stabilize page->mapping's anon_vma while acquiring anon_vma->lock in page_referenced_anon and try_to_unmap_anon? The page cannot actually be freed (vmscan holds reference), but however much we check page_mapped (which guarantees that anon_vma is in use - or would guarantee that if we added suitable barriers), there's no locking against page becoming unmapped the instant after, then anon_vma freed. It's okay to take anon_vma->lock after it's freed, so long as it remains a struct anon_vma (its list would become empty, or perhaps reused for an unrelated anon_vma: but no problem since we always check that the page located is the right one); but corruption if that memory gets reused for some other purpose. 
This is not unique: it's liable to be problem whenever the kernel tries to approach a structure obliquely. It's generally solved with an atomic reference count; but one advantage of anon_vma over anonmm is that it does not have such a count, and it would be a backward step to add one. Therefore... implement SLAB_DESTROY_BY_RCU flag, to guarantee that such a kmem_cache_alloc'ed structure cannot get freed to other use while the rcu_read_lock is held i.e. preempt disabled; and use that for anon_vma. Fix concerns raised by Manfred: this flag is incompatible with poisoning and destructor, and kmem_cache_destroy needs to synchronize_kernel. I hope SLAB_DESTROY_BY_RCU may be useful elsewhere; but though it's safe for little anon_vma, I'd be reluctant to use it on any caches whose immediate shrinkage under pressure is important to the system. [PATCH] rmaplock: mm lock ordering With page_map_lock out of the way, there's no need for page_referenced and try_to_unmap to use trylocks - provided we switch anon_vma->lock and mm->page_table_lock around in anon_vma_prepare. Though I suppose it's possible that we'll find that vmscan makes better progress with trylocks than spinning - we're free to choose trylocks again if so. Try to update the mm lock ordering documentation in filemap.c. But I still find it confusing, and I've no idea of where to stop. So add an mm lock ordering list I can understand to rmap.c. [The fifth patch was about using anon_vma in swapoff, not relevant here.] So, going back to what you wrote: holding the page lock there is not enough to prevent the struct anon_vma going away beneath us. 
Hugh From ogerlitz at voltaire.com Tue Apr 29 04:17:05 2008 From: ogerlitz at voltaire.com (Or Gerlitz) Date: Tue, 29 Apr 2008 14:17:05 +0300 Subject: [ofa-general] perfquery causes kernel to be stuck in ib_unregister_mad_agent() function In-Reply-To: <4816D9A2.7040009@bull.net> References: <4816D9A2.7040009@bull.net> Message-ID: <481703B1.60900@voltaire.com> Jean-Francois.Neyroud wrote: > If I attemp to query at the same time the performance counters on all > nodes on a cluster ( 40 nodes) . > perfquery causes kernel to be stuck in ib_unregister_mad_agent() > function. > Impossible to send CTRL-C or CTRL-Z to perfquery, it is stuck in the > kernel. maybe with $ dmesg -c $ echo 1 > /proc/sysrq-trigger $ echo t > /proc/sysrq-trigger and then looking on the related kernel threads stacks from the dmesg (eg of ib_madX threads, etc) you would get more info that you can share. Or. From tziporet at dev.mellanox.co.il Tue Apr 29 04:39:16 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 29 Apr 2008 14:39:16 +0300 Subject: [ofa-general] install.sh question In-Reply-To: <481652F6.50008@mediaweb.com> References: <1207688301.1661.86.camel@localhost> <48141EC1.7010801@dev.mellanox.co.il> <481652F6.50008@mediaweb.com> Message-ID: <481708E4.4070306@mellanox.co.il> DK Smith wrote: > > Is the NEW & IMPROVED installer, install.pl, a drop in replacement for > build.sh? > > I recently wrote a set of build scripts that are used to build a > distribution (kernel + modules + root file system) for deployment > elsewhere. (i.e. a non-native build of everything including OFED). > > In the OFED 1.2 installer, I used this method of invocation: > > /build.sh -c > > wherein, build.sh locates the config file, "ofed.conf" in the same > directory. That worked. > > The statement about "run on all cluster nodes" appears to indicate a > non-native build is no-longer possible. 
> > The build.sh was removed from OFED 1.3 and it is explained in the RN: 2.2 Package and install o There is a new install script. See OFED_Installation_Guide.txt for more details on the new installation and build procedures. o User space packages are now in different source RPMs (as opposed to one source RPM in previous OFED releases). o The option for a build without installing is not supported any more. o Added the script make-dist to generate tarball with kernel sources for each kernel. Tziporet From tziporet at dev.mellanox.co.il Tue Apr 29 05:00:44 2008 From: tziporet at dev.mellanox.co.il (Tziporet Koren) Date: Tue, 29 Apr 2008 15:00:44 +0300 Subject: [ofa-general] Re: [ewg] OFED April 21 meeting summary In-Reply-To: References: <458BC6B0F287034F92FE78908BD01CE831A08338@mtlexch01.mtl.com> <6C2C79E72C305246B504CBA17B5500C903DA9BAC@mtlexch01.mtl.com> Message-ID: <48170DEC.1020303@mellanox.co.il> Roland Dreier wrote: > > Also it is very important for us that IPoIB 2 kernel panics will be fixed ( > > https://bugs.openfabrics.org/show_bug.cgi?id=989, > > https://bugs.openfabrics.org/show_bug.cgi?id=985) > > > Both should not happen in upstream kernel: 989 - bug in a new optimization of OFED 1.3 (see bug report for details) 985 - bug in backports only (Eli will update the bug and resolution) Tziporet From dpn at isomerica.net Tue Apr 29 05:46:13 2008 From: dpn at isomerica.net (Dan Noe) Date: Tue, 29 Apr 2008 08:46:13 -0400 Subject: [ofa-general] perfquery causes kernel to be stuck in ib_unregister_mad_agent() function In-Reply-To: <481703B1.60900@voltaire.com> References: <4816D9A2.7040009@bull.net> <481703B1.60900@voltaire.com> Message-ID: <48171895.9050705@isomerica.net> Or Gerlitz wrote: > Jean-Francois.Neyroud wrote: >> If I attemp to query at the same time the performance counters on all >> nodes on a cluster ( 40 nodes) . >> perfquery causes kernel to be stuck in ib_unregister_mad_agent() >> function.
>> Impossible to send CTRL-C or CTRL-Z to perfquery, it is stuck in the >> kernel. > maybe with > $ dmesg -c > $ echo 1 > /proc/sysrq-trigger > $ echo t > /proc/sysrq-trigger Hi, Depending on how recent your kernel is, you can also echo d > /proc/sysrq-trigger which will show the state of currently held locks. Hope that helps. Cheers, Dan -- /--------------- - - - - - - | Dan Noe | http://isomerica.net/~dpn/ From Jean-Francois.Neyroud at bull.net Tue Apr 29 05:55:48 2008 From: Jean-Francois.Neyroud at bull.net (Jean-Francois.Neyroud) Date: Tue, 29 Apr 2008 14:55:48 +0200 Subject: [ofa-general] perfquery causes kernel to be stuck in ib_unregister_mad_agent() function In-Reply-To: <1209466160.689.433.camel@hrosenstock-ws.xsigo.com> References: <4816D9A2.7040009@bull.net> <1209466160.689.433.camel@hrosenstock-ws.xsigo.com> Message-ID: <48171AD4.2020900@bull.net> Thanks Hal with this fix it's OK. Jean-Francois. > This could be related to the lock dependency issue discussed in the > following thread: > > http://lists.openfabrics.org/pipermail/general/2008-January/044723.html > > You might want to look to the following for the actual fix: > > commit 2fe7e6f7c9f55eac24c5b3cdf56af29ab9b0ca81 > Author: Roland Dreier > Date: Fri Jan 25 14:15:42 2008 -0800 > > From jackm at dev.mellanox.co.il Tue Apr 29 06:31:56 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 29 Apr 2008 16:31:56 +0300 Subject: [ofa-general] Re: [PATCH] mlx4_core: enable changing default max HCA resource limits at run time -- reposting In-Reply-To: <200804290940.37587.jackm@dev.mellanox.co.il> References: <200804281438.28417.jackm@dev.mellanox.co.il> <200804290940.37587.jackm@dev.mellanox.co.il> Message-ID: <200804291631.56685.jackm@dev.mellanox.co.il> On Tuesday 29 April 2008 09:40, Jack Morgenstein wrote: > P.S. 
> BTW, I think there is a bug in the mthca driver, which messes things > up if the profile numbers are NOT powers of 2: > (from mthca_make_profile, in file mthca_profile.c): > for (i = 0; i < MTHCA_RES_NUM; ++i) { > profile[i].type = i; > profile[i].log_num = max(ffs(profile[i].num) - 1, 0); > profile[i].size *= profile[i].num; > > should be > for (i = 0; i < MTHCA_RES_NUM; ++i) { > profile[i].type = i; > profile[i].num = roundup_pow_of_two(profile[i].num); > profile[i].log_num = ilog2(profile[i].num); > profile[i].size *= profile[i].num; > > since later the procedure assumes that all sizes are powers of 2. I was wrong -- sorry about that, Roland. I missed the procedure __mthca_check_profile_val() in file mthca_main.c, which does raise the profile values to the next (or same) power-of-2 value, so there is no bug. Still, I feel that it is much cleaner to require the user to specify a power-of-2 directly, rather than correct cases in which the user did not do so. I'm working on a patch for ib_mthca now, on top of your 2.6.26 tree, which will do the job. - Jack From andrea at qumranet.com Tue Apr 29 06:32:35 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 29 Apr 2008 15:32:35 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> <20080429001052.GA8315@duo.random> Message-ID: <20080429133235.GC8315@duo.random> Hi Hugh!! On Tue, Apr 29, 2008 at 11:49:11AM +0100, Hugh Dickins wrote: > [I'm scarcely following the mmu notifiers to-and-fro, which seems > to be in good hands, amongst faster thinkers than me: who actually > need and can test this stuff. Don't let me slow you down; but I > can quickly clarify on this history.] 
Still I think it'd be great if you could review mmu-notifier-core v14. You and Nick are the core VM maintainers so it'd be great to hear any feedback about it. I think it's fairly easy to classify the patch as obviously safe as long as mmu notifiers are disarmed. Here is a link for your convenience. http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.25/mmu-notifier-v14/mmu-notifier-core > No, the locking was different as you had it, Andrea: there was an extra > bitspin lock, carried over from the pte_chains days (maybe we changed > the name, maybe we disagreed over the name, I forget), which mainly > guarded the page->mapcount. I thought that was one lock more than we > needed, and eliminated it in favour of atomic page->mapcount in 2.6.9. Thanks a lot for the explanation! From dks at mediaweb.com Tue Apr 29 06:50:52 2008 From: dks at mediaweb.com (DK Smith) Date: Tue, 29 Apr 2008 06:50:52 -0700 Subject: [ofa-general] install.sh question In-Reply-To: <481708E4.4070306@mellanox.co.il> References: <1207688301.1661.86.camel@localhost> <48141EC1.7010801@dev.mellanox.co.il> <481652F6.50008@mediaweb.com> <481708E4.4070306@mellanox.co.il> Message-ID: <481727BC.7000709@mediaweb.com> Hello, Thank you for the prompt response and the navigation to the Release Notes. > The build.sh was removed from OFED 1.3 and it is explained in the RN: > 2.2 Package and install > o There is a new install script. See OFED_Installation_Guide.txt for > more details on the new installation and build procedures. > o User space packages are now in different source RPMs (as > opposed to > one source RPM in previous OFED releases). > o The option for a build without installing is not supported any > more. > o Added the script make-dist to generate tarball with kernel sources > for each kernel. > > Tziporet > What was the reason for the decision to drop building without installation? I still have a consideration that remains ambiguous to me.
I believe that my scenario requires a separate build and installation ... and non-natively too. :) In version 1.2.*, I specified the location of the kernel source that I was building against. Then I took the resulting RPM package and installed it into a root file system that is subsequently rolled into a special boot disk which is installed into a Linux "appliance". Is there a way to accomplish this with the new installer? The new installer appears to be restricted to native installations. Is this the case? If so, isn't this also a problem for other people? Usage: ./install.pl [-c |--all|--hpc|--basic] [-n|--net ] -c|--config . Example of the config file can be found under docs. -l|--prefix Set installation prefix. -p|--print-available Print available packages for current platform. And create corresponding ofed.conf file. -k|--kernel . Default on this system: 9.2.2 -s|--kernel-sources . Default on this system: /lib/modules/9.2.2/build --build32 Build 32-bit libraries. Relevant for x86_64 and ppc64 platforms --without-depcheck Skip Distro's libraries check -v|-vv|-vvv. Set verbosity level -q. Set quiet - no messages will be printed --all|--hpc|--basic Install all,hpc or basic packages correspondingly Cheers, DK From Brian.Murrell at Sun.COM Tue Apr 29 06:58:30 2008 From: Brian.Murrell at Sun.COM (Brian J. Murrell) Date: Tue, 29 Apr 2008 09:58:30 -0400 Subject: [ofa-general] install.sh question In-Reply-To: <481727BC.7000709@mediaweb.com> References: <1207688301.1661.86.camel@localhost> <48141EC1.7010801@dev.mellanox.co.il> <481652F6.50008@mediaweb.com> <481708E4.4070306@mellanox.co.il> <481727BC.7000709@mediaweb.com> Message-ID: <1209477510.16768.45.camel@pc.ilinx> On Tue, 2008-04-29 at 06:50 -0700, DK Smith wrote: > > The new installer appears to be restricted to native installations. Is > this the case? If so, isn't this also a problem for other people? 
If by this you mean "isn't it a problem for other people that the mode of operations is that you have to build RPMs and then install them on the build system in order to build other RPMs", _absolutely_ this is a problem! I am not allowed to "pollute" the pristine build system by installing random RPMs into it. To that end, I have created a build system for OFED 1.3 that builds and installs (and builds and installs, and builds and installs) all of the RPMs into a "tree" that I am free to make in my $HOME. It does not work as smoothly as it could/should due to some (what I consider) breakage in the packages themselves, but I seem to have been able to work around the breakage that I have found. I don't yet build all RPMs either though. I've only built enough to get me the test tools I needed at the time. But IMHO, the OFED build system should be able to complete building all RPMs without a) needing to be root and b) having to install intermediate RPMs on the build system. AFAIK, it can/does not do this currently. Cheers, b. From dks at mediaweb.com Tue Apr 29 07:56:17 2008 From: dks at mediaweb.com (DK Smith) Date: Tue, 29 Apr 2008 07:56:17 -0700 Subject: [ofa-general] install.sh question In-Reply-To: <1209477510.16768.45.camel@pc.ilinx> References: <1207688301.1661.86.camel@localhost> <48141EC1.7010801@dev.mellanox.co.il> <481652F6.50008@mediaweb.com> <481708E4.4070306@mellanox.co.il> <1209477510.16768.45.camel@pc.ilinx> Message-ID: <48173711.9040007@mediaweb.com> Brian J. Murrell wrote: > On Tue, 2008-04-29 at 06:50 -0700, DK Smith wrote: >> The new installer appears to be restricted to native installations. Is >> this the case? If so, isn't this also a problem for other people?
> > But IMHO, the OFED build system should be able to complete building all > RPMs without a) needing to be root and b) having to install intermediate > RPMs on the build system. AFAIK, it can/does not do this currently. > Hi and thanks for adding your voice to the chorus, err, I mean, duet. I did not want to seem ungrateful by complaining, too, about the root user thing. But seriously, how dangerous is that? ... to run some massive perl script of unknown quality as root. My own paranoia was beginning to make me worried. Am I too paranoid? LOL! So I appreciate this being mentioned, explicitly. :) Personally, I do not have good experiences with some vendor-supplied installer programs that claim that they must be run as root. As an example, take QLogic's buggy installer for the QLAxxxx FC product. The utility of Make seems to be often overlooked and/or under-used or simply misused. I assume that writing a custom installer is a big decision and commitment. So then why not make such a large investment able to relocate the build output to another part of a file system? Or use the facilities that Make provides to design such functionality? The kernel build process seems to be a working model of such, that could be copied. A lot of flexibility can be achieved by parameterizing the K_VER and MODULES_INSTALL_DIR variables. Is there something specific to OFED that makes this sort of flexibility impossible? Cheers, DK From jackm at dev.mellanox.co.il Tue Apr 29 08:22:57 2008 From: jackm at dev.mellanox.co.il (Jack Morgenstein) Date: Tue, 29 Apr 2008 18:22:57 +0300 Subject: [ofa-general] [PATCH] ib_mthca: use log values instead of numeric values when specifying HCA resource maxes in module parameters Message-ID: <200804291822.57820.jackm@dev.mellanox.co.il> ib_mthca: change all HCA resource module parameters to be log values.
Module parameters for overriding driver default maximum HCA resource quantities should be log values, not numeric values -- since these quantities should all be powers-of-2 anyway. Signed-off-by: Jack Morgenstein --- Roland, This is for kernel 2.6.26. (I generated it against your for-2.6.26 git tree). I put a check in the patch for detecting if the user specified a log or not, to make the transition from the old method (of numbers instead of logs) easier. Maybe add such a check to the mlx4 version, too? Jack index 9ebadd6..c9f9bbe 100644 --- a/drivers/infiniband/hw/mthca/mthca_main.c +++ b/drivers/infiniband/hw/mthca/mthca_main.c @@ -99,37 +99,88 @@ static struct mthca_profile hca_profile = { .uarc_size = MTHCA_DEFAULT_NUM_UARC_SIZE, /* Arbel only */ }; -module_param_named(num_qp, hca_profile.num_qp, int, 0444); -MODULE_PARM_DESC(num_qp, "maximum number of QPs per HCA"); +static struct mthca_profile mod_param_profile = { 0 }; +module_param_named(num_qp, mod_param_profile.num_qp, int, 0444); +MODULE_PARM_DESC(num_qp, "log maximum number of QPs per HCA (default 16)"); -module_param_named(rdb_per_qp, hca_profile.rdb_per_qp, int, 0444); -MODULE_PARM_DESC(rdb_per_qp, "number of RDB buffers per QP"); +module_param_named(rdb_per_qp, mod_param_profile.rdb_per_qp, int, 0444); +MODULE_PARM_DESC(rdb_per_qp, "log number of RDB buffers per QP (default 2)"); -module_param_named(num_cq, hca_profile.num_cq, int, 0444); -MODULE_PARM_DESC(num_cq, "maximum number of CQs per HCA"); +module_param_named(num_cq, mod_param_profile.num_cq, int, 0444); +MODULE_PARM_DESC(num_cq, "log maximum number of CQs per HCA (default 16)"); -module_param_named(num_mcg, hca_profile.num_mcg, int, 0444); -MODULE_PARM_DESC(num_mcg, "maximum number of multicast groups per HCA"); +module_param_named(num_mcg, mod_param_profile.num_mcg, int, 0444); +MODULE_PARM_DESC(num_mcg, "log maximum number of multicast groups per HCA" + " (default 13)"); -module_param_named(num_mpt, hca_profile.num_mpt, int, 0444); 
+module_param_named(num_mpt, mod_param_profile.num_mpt, int, 0444); MODULE_PARM_DESC(num_mpt, - "maximum number of memory protection table entries per HCA"); + "log maximum number of memory protection table entries per HCA" + " (default 17)"); -module_param_named(num_mtt, hca_profile.num_mtt, int, 0444); +module_param_named(num_mtt, mod_param_profile.num_mtt, int, 0444); MODULE_PARM_DESC(num_mtt, - "maximum number of memory translation table segments per HCA"); + "log maximum number of memory translation table segments per" + " HCA (default 20)"); -module_param_named(num_udav, hca_profile.num_udav, int, 0444); -MODULE_PARM_DESC(num_udav, "maximum number of UD address vectors per HCA"); +module_param_named(num_udav, mod_param_profile.num_udav, int, 0444); +MODULE_PARM_DESC(num_udav, "log maximum number of UD address vectors per HCA" + " (default 15)"); -module_param_named(fmr_reserved_mtts, hca_profile.fmr_reserved_mtts, int, 0444); +module_param_named(fmr_reserved_mtts, mod_param_profile.fmr_reserved_mtts, + int, 0444); MODULE_PARM_DESC(fmr_reserved_mtts, - "number of memory translation table segments reserved for FMR"); + "log number of memory translation table segments reserved for" + " FMR (default 18)"); static char mthca_version[] __devinitdata = DRV_NAME ": Mellanox InfiniBand HCA driver v" DRV_VERSION " (" DRV_RELDATE ")\n"; +static void process_mod_param_profile(void) +{ + if (mod_param_profile.num_qp > 31 || + mod_param_profile.rdb_per_qp > 31 || + mod_param_profile.num_cq > 31 || + mod_param_profile.num_mcg > 31 || + mod_param_profile.num_mpt > 31 || + mod_param_profile.num_mtt > 31 || + mod_param_profile.num_udav > 31 || + mod_param_profile.fmr_reserved_mtts > 31) { + printk(KERN_WARNING PFX "Value of one or more HCA resource" + " module parameters exceeds 31.\n"); + printk(KERN_WARNING PFX "Are you specifying LOG values?\n"); + printk(KERN_WARNING PFX "Reverting to using max default values" + " for all HCA resources.\n"); + return; + } + + 
hca_profile.num_qp = (mod_param_profile.num_qp ? + 1 << mod_param_profile.num_qp : + hca_profile.num_qp); + hca_profile.rdb_per_qp = (mod_param_profile.rdb_per_qp ? + 1 << mod_param_profile.rdb_per_qp : + hca_profile.rdb_per_qp); + hca_profile.num_cq = (mod_param_profile.num_cq ? + 1 << mod_param_profile.num_cq : + hca_profile.num_cq); + hca_profile.num_mcg = (mod_param_profile.num_mcg ? + 1 << mod_param_profile.num_mcg : + hca_profile.num_mcg); + hca_profile.num_mpt = (mod_param_profile.num_mpt ? + 1 << mod_param_profile.num_mpt : + hca_profile.num_mpt); + hca_profile.num_mtt = (mod_param_profile.num_mtt ? + 1 << mod_param_profile.num_mtt : + hca_profile.num_mtt); + hca_profile.num_udav = (mod_param_profile.num_udav ? + 1 << mod_param_profile.num_udav : + hca_profile.num_udav); + hca_profile.fmr_reserved_mtts = (mod_param_profile.fmr_reserved_mtts ? + 1 << mod_param_profile.fmr_reserved_mtts : + hca_profile.fmr_reserved_mtts); +} + static int mthca_tune_pci(struct mthca_dev *mdev) { if (!tune_pci) @@ -1364,6 +1415,7 @@ static int __init mthca_init(void) { int ret; + process_mod_param_profile(); mthca_validate_profile(); ret = mthca_catas_init(); From andrea at qumranet.com Tue Apr 29 08:30:52 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 29 Apr 2008 17:30:52 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: References: <20080423221928.GV24536@duo.random> <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> <20080429001052.GA8315@duo.random> Message-ID: <20080429153052.GE8315@duo.random> On Mon, Apr 28, 2008 at 06:28:06PM -0700, Christoph Lameter wrote: > On Tue, 29 Apr 2008, Andrea Arcangeli wrote: > > > Frankly I've absolutely no idea why rcu is needed in all rmap code > > when walking the page->mapping. 
Definitely the PG_locked is taken so > there's no way page->mapping could possibly go away under the rmap > code, hence the anon_vma can't go away as it's queued in the vma, and > the vma has to go away before the page is zapped out of the pte. > > zap_pte_range can race with the rmap code and it does not take the page > lock. The page may not go away since a refcount was taken but the mapping > can go away. Without RCU you have no guarantee that the anon_vma is > existing when you take the lock. There's some room for improvement, like using down_read_trylock: if that succeeds we don't need to increase the refcount and we can keep the rcu_read_lock held instead. Secondly, we don't need to increase the refcount in fork() when we queue the vma-copy in the anon_vma. You should init the refcount to 1 when the anon_vma is allocated, remove the atomic_inc from all code (except when down_read_trylock fails) and then change anon_vma_unlink to: up_write(&anon_vma->sem); if (empty) put_anon_vma(anon_vma); While the down_read_trylock surely won't help in AIM, the second change will reduce a bit the overhead in the VM core fast paths by avoiding all refcounting changes, by checking list_empty the same way the current code does. I really like how I designed the garbage collection through list_empty; that's efficient and I'd like to keep it. I however doubt this will bring us back to the same performance as the current spinlock version, as the real overhead should come out of overscheduling in down_write at anon_vma_link. Here an initially spinning lock would help, but that's a gray area: it greatly depends on timings, and on very large systems where a cacheline wait with many cpus forking at the same time takes more than scheduling, a semaphore may not slow down performance that much.
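Andrea's refcount scheme can be sketched in plain C11. This is a userspace illustration, not kernel code: the names (`fake_anon_vma`, `get_anon_vma`, `put_anon_vma`) and the `freed` flag are invented stand-ins. The point of the scheme is that the object starts with a reference of 1 held by its VMA list, the `down_read_trylock` fast path never touches the count at all, and the final put in `anon_vma_unlink` (once the list is empty) frees the structure:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's anon_vma; the "freed" flag
 * replaces kmem_cache_free() so the lifecycle can be observed. */
struct fake_anon_vma {
    atomic_int refcount;  /* initialized to 1 at allocation */
    int freed;
};

static void anon_vma_ctor(struct fake_anon_vma *av)
{
    atomic_init(&av->refcount, 1);  /* reference held by the vma list */
    av->freed = 0;
}

/* Slow path only: taken when down_read_trylock() fails and the reader
 * must sleep, so it pins the object across rcu_read_unlock(). */
static void get_anon_vma(struct fake_anon_vma *av)
{
    atomic_fetch_add(&av->refcount, 1);
}

static void put_anon_vma(struct fake_anon_vma *av)
{
    /* atomic_fetch_sub returns the old value: 1 means last reference */
    if (atomic_fetch_sub(&av->refcount, 1) == 1)
        av->freed = 1;  /* stand-in for freeing the structure */
}
```

The fast path (trylock succeeding under rcu_read_lock) calls neither get nor put, which is exactly why it avoids the refcount cacheline traffic Andrea wants to keep out of the fork path.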
So I think the only way is a configuration option to switch the locking at compile time; XPMEM will then depend on that option being on. I don't see a big deal, and this guarantees embedded isn't screwed up by totally unnecessary locks on UP. From HNGUYEN at de.ibm.com Tue Apr 29 08:40:52 2008 From: HNGUYEN at de.ibm.com (Hoang-Nam Nguyen) Date: Tue, 29 Apr 2008 17:40:52 +0200 Subject: [ofa-general][PATCH] Re: mlx4: Completion EQ per cpu (MP support, Patch 10) Message-ID: Hi Roland! >> Each CQ is attached to an EQ and receives its completion interrupts from that EQ. >> >> CQ and EQ are not per port. >> >> Implementing this in the device layer allows all ULPs to use the feature. We do not expose an EQ allocation API, because there is no point creating more EQs >> than CPUs. >CQ are not per port but netdevices are bound to a port (it's correct that >a few of them can be bound to the same port, e.g. with different PKEYs or >VLAN tags); maybe it's worth thinking of an API that either lets the ULP >dictate to which CPU/core they want the EQ serving this CQ to direct its >interrupts, or, if the ULP doesn't care, lets the driver allocate that in >round-robin fashion. We've had some ehca code doing a round-robin scheme, which is an ehca-specific policy. Do you have any thoughts on the approach you want to pursue? Will it be 2.6.26 or 2.6.27 instead?
Thanks Nam From ossrosch at linux.vnet.ibm.com Tue Apr 29 08:44:15 2008 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Tue, 29 Apr 2008 17:44:15 +0200 Subject: [ofa-general] [PATCH] IB/ehca: Allocate event queue size depending on max number of CQs and QPs Message-ID: <200804291744.17235.ossrosch@linux.vnet.ibm.com> Signed-off-by: Stefan Roscher --- drivers/infiniband/hw/ehca/ehca_classes.h | 5 ++++ drivers/infiniband/hw/ehca/ehca_cq.c | 10 ++++++++ drivers/infiniband/hw/ehca/ehca_main.c | 36 +++++++++++++++++++++++++++- drivers/infiniband/hw/ehca/ehca_qp.c | 10 ++++++++ 4 files changed, 59 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index 3d6d946..00bab60 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -66,6 +66,7 @@ struct ehca_av; #include "ehca_irq.h" #define EHCA_EQE_CACHE_SIZE 20 +#define EHCA_MAX_NUM_QUEUES 0xffff struct ehca_eqe_cache_entry { struct ehca_eqe *eqe; @@ -127,6 +128,8 @@ struct ehca_shca { /* MR pgsize: bit 0-3 means 4K, 64K, 1M, 16M respectively */ u32 hca_cap_mr_pgsize; int max_mtu; + atomic_t num_cqs; + atomic_t num_qps; }; struct ehca_pd { @@ -344,6 +347,8 @@ extern int ehca_use_hp_mr; extern int ehca_scaling_code; extern int ehca_lock_hcalls; extern int ehca_nr_ports; +extern int ehca_max_cq; +extern int ehca_max_qp; struct ipzu_queue_resp { u32 qe_size; /* queue entry size */ diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c index ec0cfcf..5b4f9a3 100644 --- a/drivers/infiniband/hw/ehca/ehca_cq.c +++ b/drivers/infiniband/hw/ehca/ehca_cq.c @@ -132,6 +132,14 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, if (cqe >= 0xFFFFFFFF - 64 - additional_cqe) return ERR_PTR(-EINVAL); + if (atomic_read(&shca->num_cqs) >= ehca_max_cq) { + ehca_err(device, "Unable to create CQ, max number of %i " + "CQs reached.", ehca_max_cq); 
+ ehca_err(device, "To increase the maximum number of CQs " + "use the number_of_cqs module parameter.\n"); + return ERR_PTR(-ENOSPC); + } + my_cq = kmem_cache_zalloc(cq_cache, GFP_KERNEL); if (!my_cq) { ehca_err(device, "Out of memory for ehca_cq struct device=%p", @@ -286,6 +294,7 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, } } + atomic_inc(&shca->num_cqs); return cq; create_cq_exit4: @@ -359,6 +368,7 @@ int ehca_destroy_cq(struct ib_cq *cq) ipz_queue_dtor(NULL, &my_cq->ipz_queue); kmem_cache_free(cq_cache, my_cq); + atomic_dec(&shca->num_cqs); return 0; } diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 6504897..401907f 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -68,6 +68,8 @@ int ehca_port_act_time = 30; int ehca_static_rate = -1; int ehca_scaling_code = 0; int ehca_lock_hcalls = -1; +int ehca_max_cq = -1; +int ehca_max_qp = -1; module_param_named(open_aqp1, ehca_open_aqp1, bool, S_IRUGO); module_param_named(debug_level, ehca_debug_level, int, S_IRUGO); @@ -79,6 +81,8 @@ module_param_named(poll_all_eqs, ehca_poll_all_eqs, bool, S_IRUGO); module_param_named(static_rate, ehca_static_rate, int, S_IRUGO); module_param_named(scaling_code, ehca_scaling_code, bool, S_IRUGO); module_param_named(lock_hcalls, ehca_lock_hcalls, bool, S_IRUGO); +module_param_named(number_of_cqs, ehca_max_cq, int, S_IRUGO); +module_param_named(number_of_qps, ehca_max_qp, int, S_IRUGO); MODULE_PARM_DESC(open_aqp1, "Open AQP1 on startup (default: no)"); @@ -104,6 +108,12 @@ MODULE_PARM_DESC(scaling_code, MODULE_PARM_DESC(lock_hcalls, "Serialize all hCalls made by the driver " "(default: autodetect)"); +MODULE_PARM_DESC(number_of_cqs, + "Max number of CQs which can be allocated " + "(default: autodetect)"); +MODULE_PARM_DESC(number_of_qps, + "Max number of QPs which can be allocated " + "(default: autodetect)"); DEFINE_RWLOCK(ehca_qp_idr_lock); 
DEFINE_RWLOCK(ehca_cq_idr_lock); @@ -355,6 +365,25 @@ static int ehca_sense_attributes(struct ehca_shca *shca) if (rblock->memory_page_size_supported & pgsize_map[i]) shca->hca_cap_mr_pgsize |= pgsize_map[i + 1]; + /* Set maximum number of CQs and QPs to calculate EQ size */ + if (ehca_max_qp == -1) + ehca_max_qp = min_t(int, rblock->max_qp, EHCA_MAX_NUM_QUEUES); + else if (ehca_max_qp < 1 || ehca_max_qp > rblock->max_qp) { + ehca_gen_err("Requested number of QPs is out of range (1 - %i) " + "specified by HW", rblock->max_qp); + ret = -EINVAL; + goto sense_attributes1; + } + + if (ehca_max_cq == -1) + ehca_max_cq = min_t(int, rblock->max_cq, EHCA_MAX_NUM_QUEUES); + else if (ehca_max_cq < 1 || ehca_max_cq > rblock->max_cq) { + ehca_gen_err("Requested number of CQs is out of range (1 - %i) " + "specified by HW", rblock->max_cq); + ret = -EINVAL; + goto sense_attributes1; + } + /* query max MTU from first port -- it's the same for all ports */ port = (struct hipz_query_port *)rblock; h_ret = hipz_h_query_port(shca->ipz_hca_handle, 1, port); @@ -684,7 +713,7 @@ static int __devinit ehca_probe(struct of_device *dev, struct ehca_shca *shca; const u64 *handle; struct ib_pd *ibpd; - int ret, i; + int ret, i, eq_size; handle = of_get_property(dev->node, "ibm,hca-handle", NULL); if (!handle) { @@ -705,6 +734,8 @@ static int __devinit ehca_probe(struct of_device *dev, return -ENOMEM; } mutex_init(&shca->modify_mutex); + atomic_set(&shca->num_cqs, 0); + atomic_set(&shca->num_qps, 0); for (i = 0; i < ARRAY_SIZE(shca->sport); i++) spin_lock_init(&shca->sport[i].mod_sqp_lock); @@ -724,8 +755,9 @@ static int __devinit ehca_probe(struct of_device *dev, goto probe1; } + eq_size = 2 * ehca_max_cq + 4 * ehca_max_qp; /* create event queues */ - ret = ehca_create_eq(shca, &shca->eq, EHCA_EQ, 2048); + ret = ehca_create_eq(shca, &shca->eq, EHCA_EQ, eq_size); if (ret) { ehca_err(&shca->ib_device, "Cannot create EQ."); goto probe1; diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c 
b/drivers/infiniband/hw/ehca/ehca_qp.c index 57bef11..73d9c4a 100644 --- a/drivers/infiniband/hw/ehca/ehca_qp.c +++ b/drivers/infiniband/hw/ehca/ehca_qp.c @@ -421,6 +421,14 @@ static struct ehca_qp *internal_create_qp( u32 swqe_size = 0, rwqe_size = 0, ib_qp_num; unsigned long flags; + if (atomic_read(&shca->num_qps) >= ehca_max_qp) { + ehca_err(pd->device, "Unable to create QP, max number of %i " + "QPs reached.", ehca_max_qp); + ehca_err(pd->device, "To increase the maximum number of QPs " + "use the number_of_qps module parameter.\n"); + return ERR_PTR(-ENOSPC); + } + if (init_attr->create_flags) return ERR_PTR(-EINVAL); @@ -797,6 +805,7 @@ static struct ehca_qp *internal_create_qp( } } + atomic_inc(&shca->num_qps); return my_qp; create_qp_exit6: @@ -1948,6 +1957,7 @@ static int internal_destroy_qp(struct ib_device *dev, struct ehca_qp *my_qp, if (HAS_SQ(my_qp)) ipz_queue_dtor(my_pd, &my_qp->ipz_squeue); kmem_cache_free(qp_cache, my_qp); + atomic_dec(&shca->num_qps); return 0; } -- 1.5.5 From holt at sgi.com Tue Apr 29 08:50:30 2008 From: holt at sgi.com (Robin Holt) Date: Tue, 29 Apr 2008 10:50:30 -0500 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080429153052.GE8315@duo.random> References: <20080424064753.GH24536@duo.random> <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> <20080429001052.GA8315@duo.random> <20080429153052.GE8315@duo.random> Message-ID: <20080429155030.GB28944@sgi.com> > I however doubt this will bring us back to the same performance of the > current spinlock version, as the real overhead should come out of > overscheduling in down_write ai anon_vma_link. 
Here an initially > spinning lock would help but that's gray area, it greatly depends on > timings, and on very large systems where a cacheline wait with many > cpus forking at the same time takes more than scheduling a semaphore > may not slowdown performance that much. So I think the only way is a > configuration option to switch the locking at compile time, then XPMEM > will depend on that option to be on, I don't see a big deal and this > guarantees embedded isn't screwed up by totally unnecessary locks on UP. You have said this continually about a CONFIG option. I am unsure how that could be achieved. Could you provide a patch? Thanks, Robin From andrea at qumranet.com Tue Apr 29 09:03:40 2008 From: andrea at qumranet.com (Andrea Arcangeli) Date: Tue, 29 Apr 2008 18:03:40 +0200 Subject: [ofa-general] Re: [PATCH 01 of 12] Core of mmu notifiers In-Reply-To: <20080429155030.GB28944@sgi.com> References: <20080424095112.GC30298@sgi.com> <20080424153943.GJ24536@duo.random> <20080424174145.GM24536@duo.random> <20080426131734.GB19717@sgi.com> <20080427122727.GO9514@duo.random> <20080429001052.GA8315@duo.random> <20080429153052.GE8315@duo.random> <20080429155030.GB28944@sgi.com> Message-ID: <20080429160340.GG8315@duo.random> On Tue, Apr 29, 2008 at 10:50:30AM -0500, Robin Holt wrote: > You have said this continually about a CONFIG option. I am unsure how > that could be achieved. Could you provide a patch? I'm busy with the reserved ram patch against 2.6.25 and latest kvm.git that is moving from pages to pfn for pci passthrough (that change will also remove the page pin with mmu notifiers). Unfortunately reserved-ram bugs out again in the blk-settings.c on real hardware. 
The fix I pushed in .25 for it works when booting kvm (that's how I tested it), but on real hardware the sata b_pfn happens to be 1 page less than the result of the min comparison and I'll have to figure out what happens (only the .24 code works on real hardware...; at least my fix is surely better than the previous .25-pre code). I've other people waiting on that reserved-ram to be working, so once I've finished, I'll do the optimization to anon-vma (at least the removal of the unnecessary atomic_inc from fork) and add the config option. Christoph, if you're interested in evolving anon-vma-sem and i_mmap_sem yourself in this direction, you're very welcome to go ahead while I finish sorting out reserved-ram. If you do, please let me know so we don't duplicate effort, and it'd be absolutely great if the patches could be incremental with #v14 so I can merge them trivially later and upload a new patchset once you're finished (the only outstanding fix you have to apply on top of #v14 that is already integrated in my patchset is the i_mmap_sem deadlock fix I posted, which I'm sure you've already applied on top of #v14 before doing any more development on it). Thanks! From rdreier at cisco.com Tue Apr 29 09:36:32 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 09:36:32 -0700 Subject: [ofa-general] Re: [PATCH 2/8]: mthca/mlx4: avoid recycling old FMR R_Keys too soon In-Reply-To: <200804290959.41881.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Tue, 29 Apr 2008 09:59:41 +0300") References: <200804241106.57172.okir@lst.de> <200804241109.52448.okir@lst.de> <200804290959.41881.jackm@dev.mellanox.co.il> Message-ID: > We concluded at that time that the patch was OK. > > I also reviewed the patch again (especially the Sinai optimization), and the patch is OK > there, too: Thanks for the really detailed explanation.
I'll apply Olaf's patch for 2.6.26. From michael.heinz at qlogic.com Tue Apr 29 09:37:41 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue, 29 Apr 2008 11:37:41 -0500 Subject: [ofa-general] Can't Initialize an MT23108 HCA Message-ID: I thought I'd posted this question on the group before, but looking through my notes I couldn't find that I had. My apologies if this is a repeat. I installed OFED 1.3.0.0.4 on a system with an older MT23108 HCA running 3.05 firmware. The HCA is known to work with QuickSilver. When I rebooted I got this: Feb 18 13:06:02 newberry kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) Feb 18 13:06:02 newberry kernel: ib_mthca: Initializing 0000:04:00.0 Feb 18 13:06:02 newberry kernel: ACPI: PCI interrupt 0000:04:00.0[A] -> GSI 28 (level, low) -> IRQ 217 Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: PCI device did not come back after reset, aborting. Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: Failed to reset HCA, aborting. The system is running RHEL4, update 4, x86_64. Are older HCAs supported with OFED, or are only Arbel and Connect-X type HCAs usable? -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed... URL: From boris at mellanox.com Tue Apr 29 09:42:04 2008 From: boris at mellanox.com (Boris Shpolyansky) Date: Tue, 29 Apr 2008 09:42:04 -0700 Subject: [ofa-general] Can't Initialize an MT23108 HCA In-Reply-To: Message-ID: <1E3DCD1C63492545881FACB6063A57C10257C6DE@mtiexch01.mti.com> Hi Michael, MT23108 HCAs are supported. Please update the FW on your card to the latest version available from the Mellanox web site at http://www.mellanox.com/support/firmware_table_IH.php Follow the FW burning instructions provided there. Regards, Boris Shpolyansky Sr. Member of Technical Staff Applications Mellanox Technologies Inc.
2900 Stender Way Santa Clara, CA 95054 Tel.: (408) 916 0014 Fax: (408) 970 3403 Cell: (408) 834 9365 www.mellanox.com ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Mike Heinz Sent: Tuesday, April 29, 2008 9:38 AM To: general at lists.openfabrics.org Subject: [ofa-general] Can't Initialize an MT23108 HCA I thought I'd posted this question on the group before, but looking through my notes I couldn't find that I had. My apologies if this is a repeat. I installed OFED 1.3.0.0.4 on a system with an older, MT23108 HCA running 3.05 firmware. The HCA is known to work with QuickSilver. When I rebooted I got this: Feb 18 13:06:02 newberry kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) Feb 18 13:06:02 newberry kernel: ib_mthca: Initializing 0000:04:00.0 Feb 18 13:06:02 newberry kernel: ACPI: PCI interrupt 0000:04:00.0[A] -> GSI 28 (level, low) -> IRQ 217 Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: PCI device did not come back after reset, aborting. Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: Failed to reset HCA, aborting. The system is running RHEL4, update 4, x86_64. Are older HCAs supported with OFED or can only Arbel and Connect-X type HCAs usable? -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdreier at cisco.com Tue Apr 29 09:43:42 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 09:43:42 -0700 Subject: [ofa-general] Re: [PATCH] IB/ehca: Allocate event queue size depending on max number of CQs and QPs In-Reply-To: <200804291744.17235.ossrosch@linux.vnet.ibm.com> (Stefan Roscher's message of "Tue, 29 Apr 2008 17:44:15 +0200") References: <200804291744.17235.ossrosch@linux.vnet.ibm.com> Message-ID: > > Signed-off-by: Stefan Roscher Kind of an inadequate changelog ;) Is this a fix or an enhancement or what? > + if (atomic_read(&shca->num_cqs) >= ehca_max_cq) { > + if (atomic_read(&shca->num_qps) >= ehca_max_qp) { These are racy in the sense that multiple simultaneous calls to create_cq/create_qp might end up exceeding the ehca_max_cq limit. Is that an issue? You could close the race by using atomic_add_unless() and testing the return value (and being careful to do atomic_dec() on error paths after you bump num_cqs/num_qps). - R. From rdreier at cisco.com Tue Apr 29 09:48:25 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 09:48:25 -0700 Subject: [ofa-general] Re: [PATCH] ib_mthca: use log values instead of numeric values when specifiying HCA resource maxes in module parameters In-Reply-To: <200804291822.57820.jackm@dev.mellanox.co.il> (Jack Morgenstein's message of "Tue, 29 Apr 2008 18:22:57 +0300") References: <200804291822.57820.jackm@dev.mellanox.co.il> Message-ID: > Module parameters for overriding driver default maximum HCA resource > quantities should be log values, not numeric values -- since these > quantities should all be powers-of-2 anyway. Hmm, that's a creative answer to my objection about the mlx4 interface. given that mthca has had the old interface for nearly a year and a half, what do we gain from changing it now? > I put a check in the patch for detecting if the user specified a log or not, > to make the transition from the old method (of numbers instead of logs) > easier. 
Yes, that is nice. Would the plan be just to allow both methods? > Maybe add such a check to the mlx4 version, too? Definitely, and I think the mlx4 module parameter names should match too. But then it would make sense for mlx4 to allow setting parameter values by value and not by log, and then we end up with all the same code in both places, and so why not just have mlx4 set by value the same way as mthca? - R. From rdreier at cisco.com Tue Apr 29 10:02:44 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 10:02:44 -0700 Subject: [ofa-general] Re: [ PATCH 3/3 ] RDMA/nes SFP+ cleanup In-Reply-To: <200804290426.m3T4QhJl018196@velma.neteffect.com> (Glenn Streiff's message of "Mon, 28 Apr 2008 23:26:43 -0500") References: <200804290426.m3T4QhJl018196@velma.neteffect.com> Message-ID: > Clean up the SFP+ patch. Why send a patch and then immediately a cleanup? Why not just clean the original patch? > - if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { > + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[ mac_index ] != NES_PHY_TYPE_PUMA_1G)) { This type of change isn't a cleanup... kernel style prefers array[index] to array[ index ] and it seems most of this patch is making the change to the less-good way? - R. 
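To make the log-vs-numeric discussion in the mthca thread above concrete, the conversion reduces to a few lines. This is an illustrative userspace sketch, not the driver code: `param_to_count` and `MTHCA_MAX_LOG` are invented names, and where Jack's patch warns and reverts all resource values to defaults if any one parameter exceeds 31, this sketch handles a single parameter at a time:

```c
#include <assert.h>

/* A value above 31 cannot be a log2 of a 32-bit resource count, so it
 * is almost certainly an old-style absolute value given by mistake. */
#define MTHCA_MAX_LOG 31

/* Return the resource count for a log-valued module parameter:
 * 0 (or negative) means "unset, keep the driver default"; anything
 * above MTHCA_MAX_LOG falls back to the default as well. */
static int param_to_count(int log_param, int driver_default)
{
    if (log_param <= 0 || log_param > MTHCA_MAX_LOG)
        return driver_default;
    return 1 << log_param;  /* log value -> power-of-2 count */
}
```

The detection trick works only because every legal log value fits in [1, 31]; any old-style count (e.g. 65536 QPs) is far outside that range, so the transition check can tell the two conventions apart without a new parameter name.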
From pw at osc.edu Tue Apr 29 10:05:16 2008 From: pw at osc.edu (Pete Wyckoff) Date: Tue, 29 Apr 2008 13:05:16 -0400 Subject: [ofa-general] Re: [Ips] Calculating the VA in iSER header In-Reply-To: <694d48600804170413g4d54cd9g447abd345a1f6301@mail.gmail.com> References: <4804B03C.6060507@voltaire.com> <694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com> <20080416144830.GC23861@osc.edu> <694d48600804170413g4d54cd9g447abd345a1f6301@mail.gmail.com> Message-ID: <20080429170516.GA8857@osc.edu> dorfman.eli at gmail.com wrote on Thu, 17 Apr 2008 14:13 +0300: > On Wed, Apr 16, 2008 at 6:46 PM, Roland Dreier wrote: > > > Agree with the interpretation of the spec, and it's probably a bit > > > clearer that way too. But we have working initiators and targets > > > that do it the "wrong" way. > > > > Yes... I guess the key question is whether there are any initiators that > > do things the "right" way. > > > > > > > 1. Flag day: all initiators and targets change at the same time. > > > Will see data corruption if someone unluckily runs one or the other > > > using old non-fixed code. > > > > Seems unacceptable to me... it doesn't make sense at all to break every > > setup in the world just to be "right" according to the spec. > > This will break only when both initiator and target will use > InitialR2T=No, which means allow unsolicited data. > As far as I know, STGT is not very common (and its version in RHEL5.1 > is considered experimental). Its default is also InitialR2T=Yes. > Voltaire's iSCSI over iSER target also uses default InitialR2T=Yes. > So it seems that nothing will break. I finally got a chance to look at this just now. I think you mean default is InitialR2T=No above, which means no unsolicited data. That is the default case, and true, the two different meanings of the initiator-supplied VA coincide. But you missed the impact of immediate data. 
We run with the defaults (I think) that say the first write request packet should be filled with a bit of the coming data stream. From iscsid.conf: # To enable immediate data (i.e., the initiator sends unsolicited data # with the iSCSI command packet), uncomment the following line: # # The default is Yes node.session.iscsi.ImmediateData = Yes Looking at the offset printed out by your patch, it is indeed non-zero for the first RDMA read. Please correct me if I am mistaken about this---you must have tested all four variations of with and without the patches on initiator and target side, but I did not. Hence I am still a bit unhappy about having to deal with the fallout, with no way to detect it. For our local use, I'll keep an older version of stgt in use until we switch to a new kernel, then merge up the target side change. It is a bother, but I can deal with it. For other institutions, this lockstep upgrade requirement will not be obvious until they debug the resulting data corruption. Still, I do understand why it would be nice to conform to the spec, and it is maybe a bit cleaner that way too. Maybe you can help with the bug reports on stgt-devel during the transition, and maintain and publish a patch to let it work with old kernels. -- Pete From gstreiff at NetEffect.com Tue Apr 29 10:16:11 2008 From: gstreiff at NetEffect.com (Glenn Streiff) Date: Tue, 29 Apr 2008 12:16:11 -0500 Subject: [ofa-general] RE: [ PATCH 3/3 ] RDMA/nes SFP+ cleanup In-Reply-To: Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC0795015C@venom2> > > Clean up the SFP+ patch. > > Why send a patch and then immediately a cleanup? Why not > just clean the > original patch? > > > - if ((nesadapter->OneG_Mode) && > (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { > > + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[ > mac_index ] != NES_PHY_TYPE_PUMA_1G)) { > > This type of change isn't a cleanup... 
kernel style prefers > > array[index] > > to > > array[ index ] > > and it seems most of this patch is making the change to the > less-good way? > > - R. > My bad, on the array index idiom. I can redo. With regard to post-patch clean-ups, I recall you telling me that it was preferred to either front-load or back-load the cleanups in a patch series. I generally "cleaned-up" the entire functions rather than just the patched portion. If I do both together, then you'll get clean-up noise interspersed with functional deltas, making functional review somewhat annoying in my opinion. Will be happy to redo as a single SFP patch and drop the 3rd patch if that works better for you. In fact that is how I did it originally. :-) Glenn From rdreier at cisco.com Tue Apr 29 10:18:16 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 10:18:16 -0700 Subject: [ofa-general] Re: [ PATCH 3/3 ] RDMA/nes SFP+ cleanup In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC0795015C@venom2> (Glenn Streiff's message of "Tue, 29 Apr 2008 12:16:11 -0500") References: <5E701717F2B2ED4EA60F87C8AA57B7CC0795015C@venom2> Message-ID: > My bad, on the array index idiom. I can redo. Yes, please do resend without that. > With regard to post-patch clean-ups, I recall you telling me > that it was preferred to either front-load or back-load the > cleanups in a patch series. Yes, that is true. > I generally "cleaned-up" the entire functions rather than > just the patched portion. If I do both together, then you'll > get clean-up noise interspersed with functional deltas making > functional review somewhat annoying in my opinion. OK, got it. The changelog "Clean up the SFP+ patch." was misleading. - R.
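Roland's ehca review earlier in this digest suggested closing the create_cq/create_qp counter race with atomic_add_unless(). In C11 userspace terms that primitive is just a compare-and-swap loop; the names here (`add_unless`, `reserve_slot`) are illustrative stand-ins, not the kernel API:

```c
#include <assert.h>
#include <stdatomic.h>

/* Add `a` to *v unless *v == u; return 1 if the add happened.
 * Mirrors the semantics of the kernel's atomic_add_unless(). */
static int add_unless(atomic_int *v, int a, int u)
{
    int cur = atomic_load(v);
    while (cur != u) {
        /* On CAS failure (including spurious ones), cur is reloaded
         * with the current value and the loop retries. */
        if (atomic_compare_exchange_weak(v, &cur, cur + a))
            return 1;
    }
    return 0;  /* already at the forbidden value: nothing added */
}

/* Race-free limit check: either reserves a slot or fails, so two
 * concurrent callers can never both slip past the limit the way a
 * plain atomic_read() test can.  Error paths must undo a successful
 * reservation with atomic_fetch_sub(counter, 1). */
static int reserve_slot(atomic_int *counter, int max)
{
    return add_unless(counter, 1, max) ? 0 : -1;  /* -1 ~ -ENOSPC */
}
```

The key difference from the racy `if (atomic_read(&n) >= max)` pattern is that the test and the increment happen as one atomic step, so the counter can never exceed `max` no matter how many creators run concurrently.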
From gstreiff at NetEffect.com Tue Apr 29 10:23:58 2008 From: gstreiff at NetEffect.com (Glenn Streiff) Date: Tue, 29 Apr 2008 12:23:58 -0500 Subject: [ofa-general] RE: [ewg] RE: [ PATCH 3/3 ] RDMA/nes SFP+ cleanup In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC0795015C@venom2> Message-ID: <5E701717F2B2ED4EA60F87C8AA57B7CC0795015D@venom2> > > > Clean up the SFP+ patch. > > > > Why send a patch and then immediately a cleanup? Why not > > just clean the > > original patch? > > > > > - if ((nesadapter->OneG_Mode) && > > (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { > > > + if ((nesadapter->OneG_Mode) && (nesadapter->phy_type[ > > mac_index ] != NES_PHY_TYPE_PUMA_1G)) { > > > > This type of change isn't a cleanup... kernel style prefers > > > > array[index] > > > > to > > > > array[ index ] > > > > and it seems most of this patch is making the change to the > > less-good way? > > > > - R. > > > > My bad, on the array index idiom. I can redo. > > With regard to post-patch clean-ups, I recall you telling me > that it was preferred to either front-load or back-load the > cleanups in a patch series. > > I generally "cleaned-up" the entire functions rather than > just the patched portion. If I do both together, then you'll > get clean-up noise interspersed with functional deltas making > functional review somewhat annoying in my opinion. > Hmm...what I probably should have done was give a clean SFP patch and then add peripheral cleanups to the functions as a subsequent patch. I'll go down that path. Glenn From michael.heinz at qlogic.com Tue Apr 29 10:29:44 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue, 29 Apr 2008 12:29:44 -0500 Subject: [ofa-general] Can't Initialize an MT23108 HCA In-Reply-To: <1E3DCD1C63492545881FACB6063A57C10257C6DE@mtiexch01.mti.com> References: <1E3DCD1C63492545881FACB6063A57C10257C6DE@mtiexch01.mti.com> Message-ID: Hey, Boris, I am, indeed, running current firmware.
I'll try to isolate variables and see if I can focus on why this machine has a problem. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania ________________________________ From: Boris Shpolyansky [mailto:boris at mellanox.com] Sent: Tuesday, April 29, 2008 12:42 PM To: Mike Heinz; general at lists.openfabrics.org Subject: RE: [ofa-general] Can't Initialize an MT23108 HCA Hi Michael, MT23108 HCAs are supported. Please, update FW on your card to the latest version available from Mellanox web site at http://www.mellanox.com/support/firmware_table_IH.php Follow FW burning instructions provided there. Regards, Boris Shpolyansky Sr. Member of Technical Staff Applications Mellanox Technologies Inc. 2900 Stender Way Santa Clara, CA 95054 Tel.: (408) 916 0014 Fax: (408) 970 3403 Cell: (408) 834 9365 www.mellanox.com ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Mike Heinz Sent: Tuesday, April 29, 2008 9:38 AM To: general at lists.openfabrics.org Subject: [ofa-general] Can't Initialize an MT23108 HCA I thought I'd posted this question on the group before, but looking through my notes I couldn't find that I had. My apologies if this is a repeat. I installed OFED 1.3.0.0.4 on a system with an older, MT23108 HCA running 3.05 firmware. The HCA is known to work with QuickSilver. When I rebooted I got this: Feb 18 13:06:02 newberry kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) Feb 18 13:06:02 newberry kernel: ib_mthca: Initializing 0000:04:00.0 Feb 18 13:06:02 newberry kernel: ACPI: PCI interrupt 0000:04:00.0[A] -> GSI 28 (level, low) -> IRQ 217 Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: PCI device did not come back after reset, aborting. Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: Failed to reset HCA, aborting. The system is running RHEL4, update 4, x86_64. 
Are older HCAs supported with OFED, or are only Arbel and Connect-X type HCAs usable? -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Tue Apr 29 10:33:37 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 10:33:37 -0700 Subject: [ofa-general] Re: [PATCH] IB/iSER: Move high-volume debug output to higher debug levels In-Reply-To: <694d48600804280501q3cf74a10p2e1b73b4ac0d3d27@mail.gmail.com> (Eli Dorfman's message of "Mon, 28 Apr 2008 15:01:33 +0300") References: <694d48600804280501q3cf74a10p2e1b73b4ac0d3d27@mail.gmail.com> Message-ID: > +module_param_named(debug_level, iser_debug_level, int, > S_IRUGO|S_IWUSR|S_IWGRP); > +MODULE_PARM_DESC(debug_level, "Enable debug tracing if > 0 > (default:disabled)"); In addition to being line-wrapped this looks really funny... why add S_IWGRP? The ownership of parameter files is root:root, so what do you get from changing from the current 0644 permissions? I applied the patch without this change; if there is a reason for this, please send the permission change separately. - R. From boris at mellanox.com Tue Apr 29 10:33:41 2008 From: boris at mellanox.com (Boris Shpolyansky) Date: Tue, 29 Apr 2008 10:33:41 -0700 Subject: [ofa-general] Can't Initialize an MT23108 HCA In-Reply-To: Message-ID: <1E3DCD1C63492545881FACB6063A57C10257C6F4@mtiexch01.mti.com> Hi Michael, You mentioned FW version 3.05, while the latest is 3.5.0. Please, verify. Boris ________________________________ From: Mike Heinz [mailto:michael.heinz at qlogic.com] Sent: Tuesday, April 29, 2008 10:30 AM To: Boris Shpolyansky; general at lists.openfabrics.org Subject: RE: [ofa-general] Can't Initialize an MT23108 HCA Hey, Boris, I am, indeed, running current firmware. I'll try to isolate variables and see if I can focus on why this machine has a problem. 
-- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania ________________________________ From: Boris Shpolyansky [mailto:boris at mellanox.com] Sent: Tuesday, April 29, 2008 12:42 PM To: Mike Heinz; general at lists.openfabrics.org Subject: RE: [ofa-general] Can't Initialize an MT23108 HCA Hi Michael, MT23108 HCAs are supported. Please, update FW on your card to the latest version available from Mellanox web site at http://www.mellanox.com/support/firmware_table_IH.php Follow FW burning instructions provided there. Regards, Boris Shpolyansky Sr. Member of Technical Staff Applications Mellanox Technologies Inc. 2900 Stender Way Santa Clara, CA 95054 Tel.: (408) 916 0014 Fax: (408) 970 3403 Cell: (408) 834 9365 www.mellanox.com ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Mike Heinz Sent: Tuesday, April 29, 2008 9:38 AM To: general at lists.openfabrics.org Subject: [ofa-general] Can't Initialize an MT23108 HCA I thought I'd posted this question on the group before, but looking through my notes I couldn't find that I had. My apologies if this is a repeat. I installed OFED 1.3.0.0.4 on a system with an older, MT23108 HCA running 3.05 firmware. The HCA is known to work with QuickSilver. When I rebooted I got this: Feb 18 13:06:02 newberry kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) Feb 18 13:06:02 newberry kernel: ib_mthca: Initializing 0000:04:00.0 Feb 18 13:06:02 newberry kernel: ACPI: PCI interrupt 0000:04:00.0[A] -> GSI 28 (level, low) -> IRQ 217 Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: PCI device did not come back after reset, aborting. Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: Failed to reset HCA, aborting. The system is running RHEL4, update 4, x86_64. Are older HCAs supported with OFED or can only Arbel and Connect-X type HCAs usable? 
-- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Tue Apr 29 10:36:05 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 10:36:05 -0700 Subject: [ofa-general] Re: [PATCH] IB/iSER: Count fmr alignment violations per session In-Reply-To: <694d48600804290033k61f717f7ob97d33b27e4c236f@mail.gmail.com> (Eli Dorfman's message of "Tue, 29 Apr 2008 10:33:07 +0300") References: <694d48600804290033k61f717f7ob97d33b27e4c236f@mail.gmail.com> Message-ID: thanks, applied From michael.heinz at qlogic.com Tue Apr 29 10:46:20 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue, 29 Apr 2008 12:46:20 -0500 Subject: [ofa-general] Can't Initialize an MT23108 HCA In-Reply-To: <1E3DCD1C63492545881FACB6063A57C10257C6F4@mtiexch01.mti.com> References: <1E3DCD1C63492545881FACB6063A57C10257C6F4@mtiexch01.mti.com> Message-ID: Boris, The HCA is a Silverstorm version that reports to QuickSilver as "3.05.0000rc01" which is equivalent to Mellanox "3.5.000". The "rc01" means it was the first (and only) OEM build of that firmware. I just compared the MLX files for the QuickSilver firmware and one I just downloaded from Mellanox and they are identical. I'm looking for another suitable machine to see if I get the same behavior. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania ________________________________ From: Boris Shpolyansky [mailto:boris at mellanox.com] Sent: Tuesday, April 29, 2008 1:34 PM To: Mike Heinz; general at lists.openfabrics.org Subject: RE: [ofa-general] Can't Initialize an MT23108 HCA Hi Michael, You mentioned FW version 3.05, while the latest is 3.5.0. Please, verify. 
Boris ________________________________ From: Mike Heinz [mailto:michael.heinz at qlogic.com] Sent: Tuesday, April 29, 2008 10:30 AM To: Boris Shpolyansky; general at lists.openfabrics.org Subject: RE: [ofa-general] Can't Initialize an MT23108 HCA Hey, Boris, I am, indeed, running current firmware. I'll try to isolate variables and see if I can focus on why this machine has a problem. -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania ________________________________ From: Boris Shpolyansky [mailto:boris at mellanox.com] Sent: Tuesday, April 29, 2008 12:42 PM To: Mike Heinz; general at lists.openfabrics.org Subject: RE: [ofa-general] Can't Initialize an MT23108 HCA Hi Michael, MT23108 HCAs are supported. Please, update FW on your card to the latest version available from Mellanox web site at http://www.mellanox.com/support/firmware_table_IH.php Follow FW burning instructions provided there. Regards, Boris Shpolyansky Sr. Member of Technical Staff Applications Mellanox Technologies Inc. 2900 Stender Way Santa Clara, CA 95054 Tel.: (408) 916 0014 Fax: (408) 970 3403 Cell: (408) 834 9365 www.mellanox.com ________________________________ From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Mike Heinz Sent: Tuesday, April 29, 2008 9:38 AM To: general at lists.openfabrics.org Subject: [ofa-general] Can't Initialize an MT23108 HCA I thought I'd posted this question on the group before, but looking through my notes I couldn't find that I had. My apologies if this is a repeat. I installed OFED 1.3.0.0.4 on a system with an older, MT23108 HCA running 3.05 firmware. The HCA is known to work with QuickSilver. 
When I rebooted I got this: Feb 18 13:06:02 newberry kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006) Feb 18 13:06:02 newberry kernel: ib_mthca: Initializing 0000:04:00.0 Feb 18 13:06:02 newberry kernel: ACPI: PCI interrupt 0000:04:00.0[A] -> GSI 28 (level, low) -> IRQ 217 Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: PCI device did not come back after reset, aborting. Feb 18 13:06:02 newberry kernel: ib_mthca 0000:04:00.0: Failed to reset HCA, aborting. The system is running RHEL4, update 4, x86_64. Are older HCAs supported with OFED or can only Arbel and Connect-X type HCAs usable? -- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdreier at cisco.com Tue Apr 29 10:48:32 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 10:48:32 -0700 Subject: [ofa-general] Can't Initialize an MT23108 HCA In-Reply-To: (Mike Heinz's message of "Tue, 29 Apr 2008 12:46:20 -0500") References: <1E3DCD1C63492545881FACB6063A57C10257C6F4@mtiexch01.mti.com> Message-ID: > I'm looking for another suitable machine to see if I get the same > behavior. What are the details of the machine where you're seeing this problem? I seem to recall some ancient Dell systems had problems with PCI-X HCAs not reappearing on PCI after an HCA reset. Also it might be worth checking that your BIOS is up-to-date. - R. From michael.heinz at qlogic.com Tue Apr 29 10:51:10 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue, 29 Apr 2008 12:51:10 -0500 Subject: [ofa-general] Can't Initialize an MT23108 HCA In-Reply-To: References: <1E3DCD1C63492545881FACB6063A57C10257C6F4@mtiexch01.mti.com> Message-ID: It's an older Opteron box, circa 2004-2005 or so. I'm trying to find an Intel box I can test with. 
-- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -----Original Message----- From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Tuesday, April 29, 2008 1:49 PM To: Mike Heinz Cc: Boris Shpolyansky; general at lists.openfabrics.org Subject: Re: [ofa-general] Can't Initialize an MT23108 HCA > I'm looking for another suitable machine to see if I get the same > behavior. What are the details of the machine where you're seeing this problem? I seem to recall some ancient Dell systems had problems with PCI-X HCAs not reappearing on PCI after an HCA reset. Also it might be worth checking that your BIOS is up-to-date. - R. From rdreier at cisco.com Tue Apr 29 11:10:32 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 11:10:32 -0700 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: <1209370487.11248.1.camel@mtls03> (Eli Cohen's message of "Mon, 28 Apr 2008 11:14:47 +0300") References: <1209370487.11248.1.camel@mtls03> Message-ID: Thanks, applied, with some fixups -- this patch seemed to be against some tree I don't have, maybe OFED?? for example: > + if (unlikely(priv->tx_outstanding > MAX_SEND_CQE)) > + poll_tx(priv); > + > return; > > drop: I didn't see any version of ipoib_ib.c ever in the kernel tree that had a drop: label. - R. 
From rdreier at cisco.com Tue Apr 29 11:15:17 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 11:15:17 -0700 Subject: [ofa-general][PATCH] Re: mlx4: Completion EQ per cpu (MP support, Patch 10) In-Reply-To: (Hoang-Nam Nguyen's message of "Tue, 29 Apr 2008 17:40:52 +0200") References: Message-ID: > >CQs are not per port, but netdevices are bound to a port (it's correct that > >a few of them can be bound to the same port, eg with different PKEYs or > >VLAN tags). Maybe it's worth thinking of an API that either lets the ULP > >dictate to which CPU/core it wants the EQ serving this CQ to direct its > >interrupts, or, if the ULP doesn't care, lets the driver allocate that in > >round-robin fashion. > We've had some ehca code doing a round-robin scheme, which is an ehca > specific > policy. > Do you have any thoughts on the approach you want to pursue? I would just like to see an approach that is fully thought through and gives a way for applications/kernel drivers to choose a CQ vector based on some information about what CPU it will go to. If we want to add a way to allow a request for round-robin, that is fine, but I don't think we want to change the default to round-robin, unless someone can come up with a workload where it actually helps. > Will it be 2.6.26 or 2.6.27 instead? Given that we always seem to start this discussion at the end of the merge window, and then no one follows up, it may be never... certainly not 2.6.26 at this point. - R. 
From ossrosch at linux.vnet.ibm.com Tue Apr 29 11:15:36 2008 From: ossrosch at linux.vnet.ibm.com (Stefan Roscher) Date: Tue, 29 Apr 2008 20:15:36 +0200 Subject: [ofa-general] [REPOST][PATCH] IB/ehca: Allocate event queue size depending on max number of CQs and QPs In-Reply-To: References: <200804291744.17235.ossrosch@linux.vnet.ibm.com> Message-ID: <200804292015.38321.ossrosch@linux.vnet.ibm.com> If a lot of QPs fall into Error state at once and the EQ of the respective HCA is too small, it might overrun, causing the eHCA driver to stop processing completion events and call application software's completion handlers, effectively causing traffic to stop. Fix this by limiting available QPs and CQs to a customizable max count, and determining EQ size based on these counts and a worst-case assumption. Signed-off-by: Stefan Roscher --- Reposted based on Roland's comments: - use atomic_add_unless instead of atomic_read - inf% changelog increase ;) drivers/infiniband/hw/ehca/ehca_classes.h | 5 ++++ drivers/infiniband/hw/ehca/ehca_cq.c | 11 +++++++++ drivers/infiniband/hw/ehca/ehca_main.c | 36 +++++++++++++++++++++++++++- drivers/infiniband/hw/ehca/ehca_qp.c | 26 +++++++++++++++++++- 4 files changed, 74 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h index 3d6d946..00bab60 100644 --- a/drivers/infiniband/hw/ehca/ehca_classes.h +++ b/drivers/infiniband/hw/ehca/ehca_classes.h @@ -66,6 +66,7 @@ struct ehca_av; #include "ehca_irq.h" #define EHCA_EQE_CACHE_SIZE 20 +#define EHCA_MAX_NUM_QUEUES 0xffff struct ehca_eqe_cache_entry { struct ehca_eqe *eqe; @@ -127,6 +128,8 @@ struct ehca_shca { /* MR pgsize: bit 0-3 means 4K, 64K, 1M, 16M respectively */ u32 hca_cap_mr_pgsize; int max_mtu; + atomic_t num_cqs; + atomic_t num_qps; }; struct ehca_pd { @@ -344,6 +347,8 @@ extern int ehca_use_hp_mr; extern int ehca_scaling_code; extern int ehca_lock_hcalls; extern int ehca_nr_ports; +extern int ehca_max_cq; 
+extern int ehca_max_qp; struct ipzu_queue_resp { u32 qe_size; /* queue entry size */ diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c index ec0cfcf..5540b27 100644 --- a/drivers/infiniband/hw/ehca/ehca_cq.c +++ b/drivers/infiniband/hw/ehca/ehca_cq.c @@ -132,10 +132,19 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector, if (cqe >= 0xFFFFFFFF - 64 - additional_cqe) return ERR_PTR(-EINVAL); + if (!atomic_add_unless(&shca->num_cqs, 1, ehca_max_cq)) { + ehca_err(device, "Unable to create CQ, max number of %i " + "CQs reached.", ehca_max_cq); + ehca_err(device, "To increase the maximum number of CQs " + "use the number_of_cqs module parameter.\n"); + return ERR_PTR(-ENOSPC); + } + my_cq = kmem_cache_zalloc(cq_cache, GFP_KERNEL); if (!my_cq) { ehca_err(device, "Out of memory for ehca_cq struct device=%p", device); + atomic_dec(&shca->num_cqs); return ERR_PTR(-ENOMEM); } @@ -305,6 +314,7 @@ create_cq_exit2: create_cq_exit1: kmem_cache_free(cq_cache, my_cq); + atomic_dec(&shca->num_cqs); return cq; } @@ -359,6 +369,7 @@ int ehca_destroy_cq(struct ib_cq *cq) ipz_queue_dtor(NULL, &my_cq->ipz_queue); kmem_cache_free(cq_cache, my_cq); + atomic_dec(&shca->num_cqs); return 0; } diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c index 6504897..482103e 100644 --- a/drivers/infiniband/hw/ehca/ehca_main.c +++ b/drivers/infiniband/hw/ehca/ehca_main.c @@ -68,6 +68,8 @@ int ehca_port_act_time = 30; int ehca_static_rate = -1; int ehca_scaling_code = 0; int ehca_lock_hcalls = -1; +int ehca_max_cq = -1; +int ehca_max_qp = -1; module_param_named(open_aqp1, ehca_open_aqp1, bool, S_IRUGO); module_param_named(debug_level, ehca_debug_level, int, S_IRUGO); @@ -79,6 +81,8 @@ module_param_named(poll_all_eqs, ehca_poll_all_eqs, bool, S_IRUGO); module_param_named(static_rate, ehca_static_rate, int, S_IRUGO); module_param_named(scaling_code, ehca_scaling_code, bool, S_IRUGO); 
module_param_named(lock_hcalls, ehca_lock_hcalls, bool, S_IRUGO); +module_param_named(number_of_cqs, ehca_max_cq, int, S_IRUGO); +module_param_named(number_of_qps, ehca_max_qp, int, S_IRUGO); MODULE_PARM_DESC(open_aqp1, "Open AQP1 on startup (default: no)"); @@ -104,6 +108,12 @@ MODULE_PARM_DESC(scaling_code, MODULE_PARM_DESC(lock_hcalls, "Serialize all hCalls made by the driver " "(default: autodetect)"); +MODULE_PARM_DESC(number_of_cqs, + "Max number of CQs which can be allocated " + "(default: autodetect)"); +MODULE_PARM_DESC(number_of_qps, + "Max number of QPs which can be allocated " + "(default: autodetect)"); DEFINE_RWLOCK(ehca_qp_idr_lock); DEFINE_RWLOCK(ehca_cq_idr_lock); @@ -355,6 +365,25 @@ static int ehca_sense_attributes(struct ehca_shca *shca) if (rblock->memory_page_size_supported & pgsize_map[i]) shca->hca_cap_mr_pgsize |= pgsize_map[i + 1]; + /* Set maximum number of CQs and QPs to calculate EQ size */ + if (ehca_max_qp == -1) + ehca_max_qp = min_t(int, rblock->max_qp, EHCA_MAX_NUM_QUEUES); + else if (ehca_max_qp < 1 || ehca_max_qp > rblock->max_qp) { + ehca_gen_err("Requested number of QPs is out of range (1 - %i) " + "specified by HW", rblock->max_qp); + ret = -EINVAL; + goto sense_attributes1; + } + + if (ehca_max_cq == -1) + ehca_max_cq = min_t(int, rblock->max_cq, EHCA_MAX_NUM_QUEUES); + else if (ehca_max_cq < 1 || ehca_max_cq > rblock->max_cq) { + ehca_gen_err("Requested number of CQs is out of range (1 - %i) " + "specified by HW", rblock->max_cq); + ret = -EINVAL; + goto sense_attributes1; + } + /* query max MTU from first port -- it's the same for all ports */ port = (struct hipz_query_port *)rblock; h_ret = hipz_h_query_port(shca->ipz_hca_handle, 1, port); @@ -684,7 +713,7 @@ static int __devinit ehca_probe(struct of_device *dev, struct ehca_shca *shca; const u64 *handle; struct ib_pd *ibpd; - int ret, i; + int ret, i, eq_size; handle = of_get_property(dev->node, "ibm,hca-handle", NULL); if (!handle) { @@ -705,6 +734,8 @@ static int 
__devinit ehca_probe(struct of_device *dev, return -ENOMEM; } mutex_init(&shca->modify_mutex); + atomic_set(&shca->num_cqs, 0); + atomic_set(&shca->num_qps, 0); for (i = 0; i < ARRAY_SIZE(shca->sport); i++) spin_lock_init(&shca->sport[i].mod_sqp_lock); @@ -724,8 +755,9 @@ static int __devinit ehca_probe(struct of_device *dev, goto probe1; } + eq_size = 2 * ehca_max_cq + 4 * ehca_max_qp; /* create event queues */ - ret = ehca_create_eq(shca, &shca->eq, EHCA_EQ, 2048); + ret = ehca_create_eq(shca, &shca->eq, EHCA_EQ, eq_size); if (ret) { ehca_err(&shca->ib_device, "Cannot create EQ."); goto probe1; diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c index 57bef11..18fba92 100644 --- a/drivers/infiniband/hw/ehca/ehca_qp.c +++ b/drivers/infiniband/hw/ehca/ehca_qp.c @@ -421,8 +421,18 @@ static struct ehca_qp *internal_create_qp( u32 swqe_size = 0, rwqe_size = 0, ib_qp_num; unsigned long flags; - if (init_attr->create_flags) + if (!atomic_add_unless(&shca->num_qps, 1, ehca_max_qp)) { + ehca_err(pd->device, "Unable to create QP, max number of %i " + "QPs reached.", ehca_max_qp); + ehca_err(pd->device, "To increase the maximum number of QPs " + "use the number_of_qps module parameter.\n"); + return ERR_PTR(-ENOSPC); + } + + if (init_attr->create_flags) { + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); + } memset(&parms, 0, sizeof(parms)); qp_type = init_attr->qp_type; @@ -431,6 +441,7 @@ static struct ehca_qp *internal_create_qp( init_attr->sq_sig_type != IB_SIGNAL_ALL_WR) { ehca_err(pd->device, "init_attr->sg_sig_type=%x not allowed", init_attr->sq_sig_type); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } @@ -455,6 +466,7 @@ static struct ehca_qp *internal_create_qp( if (is_llqp && has_srq) { ehca_err(pd->device, "LLQPs can't have an SRQ"); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } @@ -466,6 +478,7 @@ static struct ehca_qp *internal_create_qp( ehca_err(pd->device, "no more than three SGEs " "supported 
for SRQ pd=%p max_sge=%x", pd, init_attr->cap.max_recv_sge); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } } @@ -477,6 +490,7 @@ static struct ehca_qp *internal_create_qp( qp_type != IB_QPT_SMI && qp_type != IB_QPT_GSI) { ehca_err(pd->device, "wrong QP Type=%x", qp_type); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } @@ -490,6 +504,7 @@ static struct ehca_qp *internal_create_qp( "or max_rq_wr=%x for RC LLQP", init_attr->cap.max_send_wr, init_attr->cap.max_recv_wr); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } break; @@ -497,6 +512,7 @@ static struct ehca_qp *internal_create_qp( if (!EHCA_BMASK_GET(HCA_CAP_UD_LL_QP, shca->hca_cap)) { ehca_err(pd->device, "UD LLQP not supported " "by this adapter"); + atomic_dec(&shca->num_qps); return ERR_PTR(-ENOSYS); } if (!(init_attr->cap.max_send_sge <= 5 @@ -508,20 +524,22 @@ static struct ehca_qp *internal_create_qp( "or max_recv_sge=%x for UD LLQP", init_attr->cap.max_send_sge, init_attr->cap.max_recv_sge); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } else if (init_attr->cap.max_send_wr > 255) { ehca_err(pd->device, "Invalid Number of " "max_send_wr=%x for UD QP_TYPE=%x", init_attr->cap.max_send_wr, qp_type); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } break; default: ehca_err(pd->device, "unsupported LL QP Type=%x", qp_type); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); - break; } } else { int max_sge = (qp_type == IB_QPT_UD || qp_type == IB_QPT_SMI @@ -533,6 +551,7 @@ static struct ehca_qp *internal_create_qp( "send_sge=%x recv_sge=%x max_sge=%x", init_attr->cap.max_send_sge, init_attr->cap.max_recv_sge, max_sge); + atomic_dec(&shca->num_qps); return ERR_PTR(-EINVAL); } } @@ -543,6 +562,7 @@ static struct ehca_qp *internal_create_qp( my_qp = kmem_cache_zalloc(qp_cache, GFP_KERNEL); if (!my_qp) { ehca_err(pd->device, "pd=%p not enough memory to alloc qp", pd); + atomic_dec(&shca->num_qps); return ERR_PTR(-ENOMEM); } @@ -823,6 +843,7 @@ create_qp_exit1: 
create_qp_exit0: kmem_cache_free(qp_cache, my_qp); + atomic_dec(&shca->num_qps); return ERR_PTR(ret); } @@ -1948,6 +1969,7 @@ static int internal_destroy_qp(struct ib_device *dev, struct ehca_qp *my_qp, if (HAS_SQ(my_qp)) ipz_queue_dtor(my_pd, &my_qp->ipz_squeue); kmem_cache_free(qp_cache, my_qp); + atomic_dec(&shca->num_qps); return 0; } -- 1.5.5 From rdreier at cisco.com Tue Apr 29 11:18:05 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 11:18:05 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipoib: set child MTU as the parent's In-Reply-To: <1209460653.28929.1.camel@mtls03> (Eli Cohen's message of "Tue, 29 Apr 2008 12:17:33 +0300") References: <1209460653.28929.1.camel@mtls03> Message-ID: > When the child joins the broadcast group reset the mtu to > the real one. This changelog is a little too short for me to understand what this is fixing. It seems that child devices are left with a bogus MTU until they complete their multicast join, is that it? > + priv->dev->mtu = IPOIB_UD_MTU(priv->max_ib_mtu); > + priv->mcast_mtu = priv->admin_mtu = priv->dev->mtu; Do child devices also need to copy over the checksum offload/LSO stuff from the parent? - R. From rdreier at cisco.com Tue Apr 29 11:20:28 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 11:20:28 -0700 Subject: [ofa-general] Re: [REPOST][PATCH] IB/ehca: Allocate event queue size depending on max number of CQs and QPs In-Reply-To: <200804292015.38321.ossrosch@linux.vnet.ibm.com> (Stefan Roscher's message of "Tue, 29 Apr 2008 20:15:36 +0200") References: <200804291744.17235.ossrosch@linux.vnet.ibm.com> <200804292015.38321.ossrosch@linux.vnet.ibm.com> Message-ID: thanks, makes sense, applied. 
fast turnaround too ;) From rdreier at cisco.com Tue Apr 29 11:25:49 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 11:25:49 -0700 Subject: [ofa-general] Re: [PATCH 2/8]: mthca/mlx4: avoid recycling old FMR R_Keys too soon In-Reply-To: <200804241109.52448.okir@lst.de> (Olaf Kirch's message of "Thu, 24 Apr 2008 11:09:51 +0200") References: <200804241106.57172.okir@lst.de> <200804241108.58748.okir@lst.de> <200804241109.52448.okir@lst.de> Message-ID: > Content-Transfer-Encoding: quoted-printable ugh, mangled patch. simple enough that I applied it by hand as separate patches to mthca and mlx4. - R. From michael.heinz at qlogic.com Tue Apr 29 11:59:28 2008 From: michael.heinz at qlogic.com (Mike Heinz) Date: Tue, 29 Apr 2008 13:59:28 -0500 Subject: [ofa-general] Can't Initialize an MT23108 HCA In-Reply-To: References: <1E3DCD1C63492545881FACB6063A57C10257C6F4@mtiexch01.mti.com> Message-ID: Roland, Boris, Good news for you, bad news for me. When I switched a different machine over to a Tavor HCA, the HCA came up as expected. So, the problem is either with the particular machine or particular HCA. I'll keep playing to see if I can isolate the important factor, but it doesn't look like an OFED problem. (unless it's a problem with early Opteron boxes or something like that...) 
-- Michael Heinz Principal Engineer, Qlogic Corporation King of Prussia, Pennsylvania -----Original Message----- From: Roland Dreier [mailto:rdreier at cisco.com] Sent: Tuesday, April 29, 2008 1:49 PM To: Mike Heinz Cc: Boris Shpolyansky; general at lists.openfabrics.org Subject: Re: [ofa-general] Can't Initialize an MT23108 HCA > I'm looking for another suitable machine to see if I get the same > behavior. What are the details of the machine where you're seeing this problem? I seem to recall some ancient Dell systems had problems with PCI-X HCAs not reappearing on PCI after an HCA reset. Also it might be worth checking that your BIOS is up-to-date. - R. From eli at dev.mellanox.co.il Tue Apr 29 12:25:18 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 29 Apr 2008 22:25:18 +0300 Subject: [ofa-general] Re: [PATCH] IB/ipoib: set child MTU as the parent's In-Reply-To: References: <1209460653.28929.1.camel@mtls03> Message-ID: <4e6a6b3c0804291225v465e02a4u725018431e94d038@mail.gmail.com> On Tue, Apr 29, 2008 at 9:18 PM, Roland Dreier wrote: > > This changelog is a little too short for me to understand what this is > fixing. It seems that child devices are left with a bogus MTU until > they complete their multicast join, is that it? The situation is even worse: even when the multicast join completes, the device's MTU will not be updated, since the statement dev->mtu = min(priv->mcast_mtu, priv->admin_mtu); at ipoib_mcast_join_task() yields zero because the admin mtu is zero. > > Do child devices also need to copy over the checksum offload/LSO stuff > from the parent? > I think they do, but it would require using two flag fields in the private data. priv->flags would save flags that relate to the state of the net device, and say, priv->cap_flags, to save stuff like LRO, checksum or any other stuff related to capabilities. What do you think? 
From rdreier at cisco.com Tue Apr 29 12:46:47 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 12:46:47 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipoib: set child MTU as the parent's In-Reply-To: <4e6a6b3c0804291225v465e02a4u725018431e94d038@mail.gmail.com> (Eli Cohen's message of "Tue, 29 Apr 2008 22:25:18 +0300") References: <1209460653.28929.1.camel@mtls03> <4e6a6b3c0804291225v465e02a4u725018431e94d038@mail.gmail.com> Message-ID: > I think they do but it would require using two fields for flags at the > private data. priv->flags would > save flags that relate to the state of the net device, and say, > priv->cap_flags, to save stuff like LRO, > checksum or any other stuff related to capabilities. What do you think? We could do that or just copy only the flags that should be copied when creating a child device. - R. From eli at dev.mellanox.co.il Tue Apr 29 12:47:12 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 29 Apr 2008 22:47:12 +0300 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: References: <1209370487.11248.1.camel@mtls03> Message-ID: <4e6a6b3c0804291247g5fc3cd6dw357d7a877f48ceee@mail.gmail.com> > > I didn't see any version of ipoib_ib.c ever in the kernel tree that had > a drop: label. > I must have had some patches stacked in my git tree. Thanks. From eli at dev.mellanox.co.il Tue Apr 29 12:50:19 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Tue, 29 Apr 2008 22:50:19 +0300 Subject: [ofa-general] Re: [PATCH] IB/ipoib: set child MTU as the parent's In-Reply-To: References: <1209460653.28929.1.camel@mtls03> <4e6a6b3c0804291225v465e02a4u725018431e94d038@mail.gmail.com> Message-ID: <4e6a6b3c0804291250p2e2cb4dfk4319f25f04bf13c7@mail.gmail.com> > > We could do that or just copy only the flags that should be copied when > creating a child device. > Or we could define a "clone" function that will have the wisdom of which flags to copy. 
From rdreier at cisco.com Tue Apr 29 13:19:22 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 13:19:22 -0700 Subject: [ofa-general] Re: [PATCH] IB/ipoib: set child MTU as the parent's In-Reply-To: <1209460653.28929.1.camel@mtls03> (Eli Cohen's message of "Tue, 29 Apr 2008 12:17:33 +0300") References: <1209460653.28929.1.camel@mtls03> Message-ID: anyway, I applied this at least. From gstreiff at neteffect.com Tue Apr 29 13:24:30 2008 From: gstreiff at neteffect.com (Glenn Streiff) Date: Tue, 29 Apr 2008 15:24:30 -0500 Subject: [ofa-general] [ PATCH 2/3 v2 ] RDMA/nes SFP+ enablement Message-ID: <200804292024.m3TKOU3w023065@velma.neteffect.com> From: Eric Schneider This patch enables the iw_nes module for NetEffect RNICs to support additional PHYs including SFP+ optical transceivers (referred to as ARGUS in the code). Signed-off-by: Eric Schneider Signed-off-by: Glenn Streiff --- Roland, here is the cleaned up sfp patch. drivers/infiniband/hw/nes/nes.h | 4 - drivers/infiniband/hw/nes/nes_hw.c | 221 +++++++++++++++++++++++++++++---- drivers/infiniband/hw/nes/nes_hw.h | 6 + drivers/infiniband/hw/nes/nes_nic.c | 72 +++++++---- drivers/infiniband/hw/nes/nes_utils.c | 10 - 5 files changed, 249 insertions(+), 64 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes.h b/drivers/infiniband/hw/nes/nes.h index 484b5e3..1f9f7bf 100644 --- a/drivers/infiniband/hw/nes/nes.h +++ b/drivers/infiniband/hw/nes/nes.h @@ -536,8 +536,8 @@ int nes_register_ofa_device(struct nes_i int nes_read_eeprom_values(struct nes_device *, struct nes_adapter *); void nes_write_1G_phy_reg(struct nes_device *, u8, u8, u16); void nes_read_1G_phy_reg(struct nes_device *, u8, u8, u16 *); -void nes_write_10G_phy_reg(struct nes_device *, u16, u8, u16); -void nes_read_10G_phy_reg(struct nes_device *, u16, u8); +void nes_write_10G_phy_reg(struct nes_device *, u16, u8, u16, u16); +void nes_read_10G_phy_reg(struct nes_device *, u8, u8, u16); struct nes_cqp_request 
*nes_get_cqp_request(struct nes_device *); void nes_post_cqp_request(struct nes_device *, struct nes_cqp_request *, int); int nes_arp_table(struct nes_device *, u32, u8 *, u32); diff --git a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 197eee9..0887ed5 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -1208,11 +1208,16 @@ int nes_init_phy(struct nes_device *nesd { struct nes_adapter *nesadapter = nesdev->nesadapter; u32 counter = 0; + u32 sds_common_control0; u32 mac_index = nesdev->mac_index; - u32 tx_config; + u32 tx_config = 0; u16 phy_data; + u32 temp_phy_data = 0; + u32 temp_phy_data2 = 0; + u32 i = 0; - if (nesadapter->OneG_Mode) { + if ((nesadapter->OneG_Mode) && + (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { nes_debug(NES_DBG_PHY, "1G PHY, mac_index = %d.\n", mac_index); if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_1G) { printk(PFX "%s: Programming mdc config for 1G\n", __func__); @@ -1278,12 +1283,116 @@ int nes_init_phy(struct nes_device *nesd nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], &phy_data); nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], phy_data | 0x0300); } else { - if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_IRIS) { + if ((nesadapter->phy_type[mac_index] == NES_PHY_TYPE_IRIS) || + (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_ARGUS)) { /* setup 10G MDIO operation */ tx_config = nes_read_indexed(nesdev, NES_IDX_MAC_TX_CONFIG); tx_config |= 0x14; nes_write_indexed(nesdev, NES_IDX_MAC_TX_CONFIG, tx_config); } + if ((nesadapter->phy_type[mac_index] == NES_PHY_TYPE_ARGUS)) { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + mdelay(10); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + temp_phy_data2 = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + + /* if 
firmware is already running (like from a driver un-load/load, don't do anything. */ + if (temp_phy_data == temp_phy_data2) { + /* configure QT2505 AMCC PHY */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0x0000, 0x8000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc300, 0x0000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc302, 0x0044); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc318, 0x0052); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc319, 0x0008); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc31a, 0x0098); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0026, 0x0E00); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0027, 0x0000); + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0x0028, 0xA528); + + /* + * remove micro from reset; chip boots from ROM, + * uploads EEPROM f/w image, uC executes f/w + */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc300, 0x0002); + + /* wait for heart beat to start to know loading is done */ + counter = 0; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if (counter++ > 1000) { + nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from heartbeat check \n"); + break; + } + mdelay(100); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7ee); + temp_phy_data2 = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + } while ((temp_phy_data2 == temp_phy_data)); + + /* wait for tracking to start to know f/w is good to go */ + counter = 0; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x3, 0xd7fd); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if (counter++ > 1000) { + 
nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from status check \n"); + break; + } + mdelay(1000); + /* + * nes_debug(NES_DBG_PHY, "AMCC PHY- phy_status not ready yet = 0x%02X\n", + * temp_phy_data); + */ + } while (((temp_phy_data & 0xff) != 0x50) && ((temp_phy_data & 0xff) != 0x70)); + + /* set LOS Control invert RXLOSB_I_PADINV */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd003, 0x0000); + /* set LOS Control to mask of RXLOSB_I */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xc314, 0x0042); + /* set LED1 to input mode (LED1 and LED2 share same LED) */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd006, 0x0007); + /* set LED2 to RX link_status and activity */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd007, 0x000A); + /* set LED3 to RX link_status */ + nes_write_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 0x1, 0xd008, 0x0009); + + /* + * reset the res-calibration on t2 serdes; + * ensures it is stable after the amcc phy is stable + */ + + sds_common_control0 = nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0); + sds_common_control0 |= 0x1; + nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); + + /* release the res-calibration reset */ + sds_common_control0 &= 0xfffffffe; + nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); + + i = 0; + while (((nes_read32(nesdev->regs+NES_SOFTWARE_RESET) & 0x00000040) != 0x00000040) + && (i++ < 5000)) { + /* mdelay(1); */ + } + + /* + * wait for link train done before moving on, + * or will get an interupt storm + */ + counter = 0; + do { + temp_phy_data = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + + (0x200 * (nesdev->mac_index & 1))); + if (counter++ > 1000) { + nes_debug(NES_DBG_PHY, "AMCC PHY- breaking from link train wait \n"); + break; + } + mdelay(1); + } while (((temp_phy_data & 0x0f1f0000) != 0x0f0f0000)); + } + 
} } return 0; } @@ -2107,6 +2216,8 @@ static void nes_process_mac_intr(struct u32 u32temp; u16 phy_data; u16 temp_phy_data; + u32 pcs_val = 0x0f0f0000; + u32 pcs_mask = 0x0f1f0000; spin_lock_irqsave(&nesadapter->phy_lock, flags); if (nesadapter->mac_sw_state[mac_number] != NES_MAC_SW_IDLE) { @@ -2170,13 +2281,30 @@ static void nes_process_mac_intr(struct nes_debug(NES_DBG_PHY, "Eth SERDES Common Status: 0=0x%08X, 1=0x%08X\n", nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_STATUS0), nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_STATUS0+0x200)); - pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); - pcs_control_status = nes_read_indexed(nesdev, - NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index&1)*0x200)); + + if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_PUMA_1G) { + switch (mac_index) { + case 1: + case 3: + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0 + 0x200); + break; + default: + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0); + break; + } + } else { + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index & 1) * 0x200)); + pcs_control_status = nes_read_indexed(nesdev, + NES_IDX_PHY_PCS_CONTROL_STATUS0 + ((mac_index & 1) * 0x200)); + } + nes_debug(NES_DBG_PHY, "PCS PHY Control/Status%u: 0x%08X\n", mac_index, pcs_control_status); - if (nesadapter->OneG_Mode) { + if ((nesadapter->OneG_Mode) && + (nesadapter->phy_type[mac_index] != NES_PHY_TYPE_PUMA_1G)) { u32temp = 0x01010000; if (nesadapter->port_count > 2) { u32temp |= 0x02020000; @@ -2185,24 +2313,59 @@ static void nes_process_mac_intr(struct phy_data = 0; nes_debug(NES_DBG_PHY, "PCS says the link is down\n"); } - } else if (nesadapter->phy_type[mac_index] == NES_PHY_TYPE_IRIS) { - nes_read_10G_phy_reg(nesdev, 1, nesadapter->phy_index[mac_index]); - temp_phy_data = (u16)nes_read_indexed(nesdev, - NES_IDX_MAC_MDIO_CONTROL); - u32temp = 20; 
- do { - nes_read_10G_phy_reg(nesdev, 1, nesadapter->phy_index[mac_index]); - phy_data = (u16)nes_read_indexed(nesdev, - NES_IDX_MAC_MDIO_CONTROL); - if ((phy_data == temp_phy_data) || (!(--u32temp))) - break; - temp_phy_data = phy_data; - } while (1); - nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", - __func__, phy_data, nesadapter->mac_link_down ? "DOWN" : "UP"); - } else { - phy_data = (0x0f0f0000 == (pcs_control_status & 0x0f1f0000)) ? 4 : 0; + switch (nesadapter->phy_type[mac_index]) { + case NES_PHY_TYPE_IRIS: + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + u32temp = 20; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if ((phy_data == temp_phy_data) || (!(--u32temp))) + break; + temp_phy_data = phy_data; + } while (1); + nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", + __func__, phy_data, nesadapter->mac_link_down[mac_index] ? 
"DOWN" : "UP"); + break; + + case NES_PHY_TYPE_ARGUS: + /* clear the alarms */ + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0x0008); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc001); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc002); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc005); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 4, 0xc006); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9003); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9004); + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 0x9005); + /* check link status */ + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + u32temp = 100; + do { + nes_read_10G_phy_reg(nesdev, nesadapter->phy_index[mac_index], 1, 1); + + phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); + if ((phy_data == temp_phy_data) || (!(--u32temp))) + break; + temp_phy_data = phy_data; + } while (1); + nes_debug(NES_DBG_PHY, "%s: Phy data = 0x%04X, link was %s.\n", + __func__, phy_data, nesadapter->mac_link_down ? "DOWN" : "UP"); + break; + + case NES_PHY_TYPE_PUMA_1G: + if (mac_index < 2) + pcs_val = pcs_mask = 0x01010000; + else + pcs_val = pcs_mask = 0x02020000; + /* fall through */ + default: + phy_data = (pcs_val == (pcs_control_status & pcs_mask)) ? 0x4 : 0x0; + break; + } } if (phy_data & 0x0004) { @@ -2211,8 +2374,8 @@ static void nes_process_mac_intr(struct nes_debug(NES_DBG_PHY, "The Link is UP!!. 
linkup was %d\n", nesvnic->linkup); if (nesvnic->linkup == 0) { - printk(PFX "The Link is now up for port %u, netdev %p.\n", - mac_index, nesvnic->netdev); + printk(PFX "The Link is now up for port %s, netdev %p.\n", + nesvnic->netdev->name, nesvnic->netdev); if (netif_queue_stopped(nesvnic->netdev)) netif_start_queue(nesvnic->netdev); nesvnic->linkup = 1; @@ -2225,8 +2388,8 @@ static void nes_process_mac_intr(struct nes_debug(NES_DBG_PHY, "The Link is Down!!. linkup was %d\n", nesvnic->linkup); if (nesvnic->linkup == 1) { - printk(PFX "The Link is now down for port %u, netdev %p.\n", - mac_index, nesvnic->netdev); + printk(PFX "The Link is now down for port %s, netdev %p.\n", + nesvnic->netdev->name, nesvnic->netdev); if (!(netif_queue_stopped(nesvnic->netdev))) netif_stop_queue(nesvnic->netdev); nesvnic->linkup = 0; diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h index 1363995..7d47f92 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -35,8 +35,10 @@ #define __NES_HW_H #include -#define NES_PHY_TYPE_1G 2 -#define NES_PHY_TYPE_IRIS 3 +#define NES_PHY_TYPE_1G 2 +#define NES_PHY_TYPE_IRIS 3 +#define NES_PHY_TYPE_ARGUS 4 +#define NES_PHY_TYPE_PUMA_1G 5 #define NES_PHY_TYPE_PUMA_10G 6 #define NES_MULTICAST_PF_MAX 8 diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index 6998af0..d65a846 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -1377,21 +1377,29 @@ static int nes_netdev_get_settings(struc et_cmd->duplex = DUPLEX_FULL; et_cmd->port = PORT_MII; + if (nesadapter->OneG_Mode) { - et_cmd->supported = SUPPORTED_1000baseT_Full|SUPPORTED_Autoneg; - et_cmd->advertising = ADVERTISED_1000baseT_Full|ADVERTISED_Autoneg; et_cmd->speed = SPEED_1000; - nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], - &phy_data); - if (phy_data&0x1000) { - et_cmd->autoneg = AUTONEG_ENABLE; + if 
(nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) { + et_cmd->supported = SUPPORTED_1000baseT_Full; + et_cmd->advertising = ADVERTISED_1000baseT_Full; + et_cmd->autoneg = AUTONEG_DISABLE; + et_cmd->transceiver = XCVR_INTERNAL; + et_cmd->phy_address = nesdev->mac_index; } else { - et_cmd->autoneg = AUTONEG_DISABLE; + et_cmd->supported = SUPPORTED_1000baseT_Full | SUPPORTED_Autoneg; + et_cmd->advertising = ADVERTISED_1000baseT_Full | ADVERTISED_Autoneg; + nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], &phy_data); + if (phy_data & 0x1000) + et_cmd->autoneg = AUTONEG_ENABLE; + else + et_cmd->autoneg = AUTONEG_DISABLE; + et_cmd->transceiver = XCVR_EXTERNAL; + et_cmd->phy_address = nesadapter->phy_index[nesdev->mac_index]; } - et_cmd->transceiver = XCVR_EXTERNAL; - et_cmd->phy_address = nesadapter->phy_index[nesdev->mac_index]; } else { - if (nesadapter->phy_type[nesvnic->logical_port] == NES_PHY_TYPE_IRIS) { + if ((nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_IRIS) || + (nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_ARGUS)) { et_cmd->transceiver = XCVR_EXTERNAL; et_cmd->port = PORT_FIBRE; et_cmd->supported = SUPPORTED_FIBRE; @@ -1422,7 +1430,8 @@ static int nes_netdev_set_settings(struc struct nes_adapter *nesadapter = nesdev->nesadapter; u16 phy_data; - if (nesadapter->OneG_Mode) { + if ((nesadapter->OneG_Mode) && + (nesadapter->phy_type[nesdev->mac_index] != NES_PHY_TYPE_PUMA_1G)) { nes_read_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], &phy_data); if (et_cmd->autoneg) { @@ -1615,27 +1624,34 @@ struct net_device *nes_netdev_init(struc list_add_tail(&nesvnic->list, &nesdev->nesadapter->nesvnic_list[nesdev->mac_index]); if ((nesdev->netdev_count == 0) && - (PCI_FUNC(nesdev->pcidev->devfn) == nesdev->mac_index)) { - nes_debug(NES_DBG_INIT, "Setting up PHY interrupt mask. 
Using register index 0x%04X\n", - NES_IDX_PHY_PCS_CONTROL_STATUS0+(0x200*(nesvnic->logical_port&1))); + ((PCI_FUNC(nesdev->pcidev->devfn) == nesdev->mac_index) || + ((nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) && + (((PCI_FUNC(nesdev->pcidev->devfn) == 1) && (nesdev->mac_index == 2)) || + ((PCI_FUNC(nesdev->pcidev->devfn) == 2) && (nesdev->mac_index == 1)))))) { + /* + * nes_debug(NES_DBG_INIT, "Setting up PHY interrupt mask. Using register index 0x%04X\n", + * NES_IDX_PHY_PCS_CONTROL_STATUS0 + (0x200 * (nesvnic->logical_port & 1))); + */ u32temp = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + - (0x200*(nesvnic->logical_port&1))); - u32temp |= 0x00200000; - nes_write_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + - (0x200*(nesvnic->logical_port&1)), u32temp); + (0x200 * (nesdev->mac_index & 1))); + if (nesdev->nesadapter->phy_type[nesdev->mac_index] != NES_PHY_TYPE_PUMA_1G) { + u32temp |= 0x00200000; + nes_write_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + + (0x200 * (nesdev->mac_index & 1)), u32temp); + } + u32temp = nes_read_indexed(nesdev, NES_IDX_PHY_PCS_CONTROL_STATUS0 + - (0x200*(nesvnic->logical_port&1)) ); + (0x200 * (nesdev->mac_index & 1))); + if ((u32temp&0x0f1f0000) == 0x0f0f0000) { - if (nesdev->nesadapter->phy_type[nesvnic->logical_port] == NES_PHY_TYPE_IRIS) { + if (nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_IRIS) { nes_init_phy(nesdev); - nes_read_10G_phy_reg(nesdev, 1, - nesdev->nesadapter->phy_index[nesvnic->logical_port]); + nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[nesdev->mac_index], 1, 1); temp_phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); u32temp = 20; do { - nes_read_10G_phy_reg(nesdev, 1, - nesdev->nesadapter->phy_index[nesvnic->logical_port]); + nes_read_10G_phy_reg(nesdev, nesdev->nesadapter->phy_index[nesdev->mac_index], 1, 1); phy_data = (u16)nes_read_indexed(nesdev, NES_IDX_MAC_MDIO_CONTROL); if ((phy_data == temp_phy_data) || 
(!(--u32temp))) @@ -1652,6 +1668,14 @@ struct net_device *nes_netdev_init(struc nes_debug(NES_DBG_INIT, "The Link is UP!!.\n"); nesvnic->linkup = 1; } + } else if (nesdev->nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_PUMA_1G) { + nes_debug(NES_DBG_INIT, "mac_index=%d, logical_port=%d, u32temp=0x%04X, PCI_FUNC=%d\n", + nesdev->mac_index, nesvnic->logical_port, u32temp, PCI_FUNC(nesdev->pcidev->devfn)); + if (((nesdev->mac_index < 2) && ((u32temp&0x01010000) == 0x01010000)) || + ((nesdev->mac_index > 1) && ((u32temp&0x02020000) == 0x02020000))) { + nes_debug(NES_DBG_INIT, "The Link is UP!!.\n"); + nesvnic->linkup = 1; + } } /* clear the MAC interrupt status, assumes direct logical to physical mapping */ u32temp = nes_read_indexed(nesdev, NES_IDX_MAC_INT_STATUS + (0x200 * nesdev->mac_index)); diff --git a/drivers/infiniband/hw/nes/nes_utils.c b/drivers/infiniband/hw/nes/nes_utils.c index c6d5631..fe83d1b 100644 --- a/drivers/infiniband/hw/nes/nes_utils.c +++ b/drivers/infiniband/hw/nes/nes_utils.c @@ -444,15 +444,13 @@ void nes_read_1G_phy_reg(struct nes_devi /** * nes_write_10G_phy_reg */ -void nes_write_10G_phy_reg(struct nes_device *nesdev, u16 phy_reg, - u8 phy_addr, u16 data) +void nes_write_10G_phy_reg(struct nes_device *nesdev, u16 phy_addr, u8 dev_addr, u16 phy_reg, + u16 data) { - u32 dev_addr; u32 port_addr; u32 u32temp; u32 counter; - dev_addr = 1; port_addr = phy_addr; /* set address */ @@ -492,14 +490,12 @@ void nes_write_10G_phy_reg(struct nes_de * This routine only issues the read, the data must be read * separately. 
*/ -void nes_read_10G_phy_reg(struct nes_device *nesdev, u16 phy_reg, u8 phy_addr) +void nes_read_10G_phy_reg(struct nes_device *nesdev, u8 phy_addr, u8 dev_addr, u16 phy_reg) { - u32 dev_addr; u32 port_addr; u32 u32temp; u32 counter; - dev_addr = 1; port_addr = phy_addr; /* set address */ From gstreiff at neteffect.com Tue Apr 29 13:25:01 2008 From: gstreiff at neteffect.com (Glenn Streiff) Date: Tue, 29 Apr 2008 15:25:01 -0500 Subject: [ofa-general] [ PATCH 3/3 v2 ] RDMA/nes Formatting cleanup Message-ID: <200804292025.m3TKP1im023075@velma.neteffect.com> Various cleanups: Change // to /* .. */ Place whitespace around binary operators. Trim down a few long lines. Some minor alignment formatting for better readability. Remove some silly tabs. Signed-off-by: Glenn Streiff --- Roland, this is the replacement patch for "RDMA/nes SFP+ cleanup". I've fixed the whitespace issue with the array indices and swept through a bit more code. Feelings will not be hurt if I still don't have it right...can always punt on this patch if necessary. 
Glenn drivers/infiniband/hw/nes/nes_cm.c | 8 +-- drivers/infiniband/hw/nes/nes_hw.c | 103 +++++++++++++++++---------------- drivers/infiniband/hw/nes/nes_hw.h | 2 - drivers/infiniband/hw/nes/nes_nic.c | 96 ++++++++++++++++--------------- drivers/infiniband/hw/nes/nes_verbs.c | 2 - 5 files changed, 109 insertions(+), 102 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c index d940fc2..9a4b40f 100644 --- a/drivers/infiniband/hw/nes/nes_cm.c +++ b/drivers/infiniband/hw/nes/nes_cm.c @@ -594,7 +594,7 @@ static void nes_cm_timer_tick(unsigned l continue; } /* this seems like the correct place, but leave send entry unprotected */ - // spin_unlock_irqrestore(&cm_node->retrans_list_lock, flags); + /* spin_unlock_irqrestore(&cm_node->retrans_list_lock, flags); */ atomic_inc(&send_entry->skb->users); cm_packets_retrans++; nes_debug(NES_DBG_CM, "Retransmitting send_entry %p for node %p," @@ -1335,7 +1335,7 @@ static int process_packet(struct nes_cm_ cm_node->loc_addr, cm_node->loc_port, cm_node->rem_addr, cm_node->rem_port, cm_node->state, atomic_read(&cm_node->ref_count)); - // create event + /* create event */ cm_node->state = NES_CM_STATE_CLOSED; create_event(cm_node, NES_CM_EVENT_ABORTED); @@ -1669,7 +1669,7 @@ static struct nes_cm_node *mini_cm_conne if (!cm_node) return NULL; - // set our node side to client (active) side + /* set our node side to client (active) side */ cm_node->tcp_cntxt.client = 1; cm_node->tcp_cntxt.rcv_wscale = NES_CM_DEFAULT_RCV_WND_SCALE; @@ -1694,7 +1694,7 @@ static struct nes_cm_node *mini_cm_conne loopbackremotenode->mpa_frame_size = mpa_frame_size - sizeof(struct ietf_mpa_frame); - // we are done handling this state, set node to a TSA state + /* we are done handling this state, set node to a TSA state */ cm_node->state = NES_CM_STATE_TSA; cm_node->tcp_cntxt.rcv_nxt = loopbackremotenode->tcp_cntxt.loc_seq_num; loopbackremotenode->tcp_cntxt.rcv_nxt = cm_node->tcp_cntxt.loc_seq_num; diff --git 
a/drivers/infiniband/hw/nes/nes_hw.c b/drivers/infiniband/hw/nes/nes_hw.c index 0887ed5..1c02639 100644 --- a/drivers/infiniband/hw/nes/nes_hw.c +++ b/drivers/infiniband/hw/nes/nes_hw.c @@ -833,7 +833,7 @@ static void nes_init_csr_ne020(struct ne nes_write_indexed(nesdev, 0x00000900, 0x20000001); nes_write_indexed(nesdev, 0x000060C0, 0x0000028e); nes_write_indexed(nesdev, 0x000060C8, 0x00000020); - // + nes_write_indexed(nesdev, 0x000001EC, 0x7b2625a0); /* nes_write_indexed(nesdev, 0x000001EC, 0x5f2625a0); */ @@ -1229,7 +1229,7 @@ int nes_init_phy(struct nes_device *nesd nes_read_1G_phy_reg(nesdev, 1, nesadapter->phy_index[mac_index], &phy_data); nes_debug(NES_DBG_PHY, "Phy data from register 1 phy address %u = 0x%X.\n", nesadapter->phy_index[mac_index], phy_data); - nes_write_1G_phy_reg(nesdev, 23, nesadapter->phy_index[mac_index], 0xb000); + nes_write_1G_phy_reg(nesdev, 23, nesadapter->phy_index[mac_index], 0xb000); /* Reset the PHY */ nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[mac_index], 0x8000); @@ -1363,7 +1363,7 @@ int nes_init_phy(struct nes_device *nesd * ensures it is stable after the amcc phy is stable */ - sds_common_control0 = nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0); + sds_common_control0 = nes_read_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0); sds_common_control0 |= 0x1; nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); @@ -1372,7 +1372,7 @@ int nes_init_phy(struct nes_device *nesd nes_write_indexed(nesdev, NES_IDX_ETH_SERDES_COMMON_CONTROL0, sds_common_control0); i = 0; - while (((nes_read32(nesdev->regs+NES_SOFTWARE_RESET) & 0x00000040) != 0x00000040) + while (((nes_read32(nesdev->regs + NES_SOFTWARE_RESET) & 0x00000040) != 0x00000040) && (i++ < 5000)) { /* mdelay(1); */ } @@ -1649,10 +1649,10 @@ int nes_init_nic_qp(struct nes_device *n } u64temp = (u64)nesvnic->nic.sq_pbase; - nic_context->context_words[NES_NIC_CTX_SQ_LOW_IDX] = cpu_to_le32((u32)u64temp); + 
nic_context->context_words[NES_NIC_CTX_SQ_LOW_IDX] = cpu_to_le32((u32)u64temp); nic_context->context_words[NES_NIC_CTX_SQ_HIGH_IDX] = cpu_to_le32((u32)(u64temp >> 32)); u64temp = (u64)nesvnic->nic.rq_pbase; - nic_context->context_words[NES_NIC_CTX_RQ_LOW_IDX] = cpu_to_le32((u32)u64temp); + nic_context->context_words[NES_NIC_CTX_RQ_LOW_IDX] = cpu_to_le32((u32)u64temp); nic_context->context_words[NES_NIC_CTX_RQ_HIGH_IDX] = cpu_to_le32((u32)(u64temp >> 32)); cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] = cpu_to_le32(NES_CQP_CREATE_QP | @@ -1704,7 +1704,7 @@ int nes_init_nic_qp(struct nes_device *n nic_rqe = &nesvnic->nic.rq_vbase[counter]; nic_rqe->wqe_words[NES_NIC_RQ_WQE_LENGTH_1_0_IDX] = cpu_to_le32(nesvnic->max_frame_size); nic_rqe->wqe_words[NES_NIC_RQ_WQE_LENGTH_3_2_IDX] = 0; - nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_LOW_IDX] = cpu_to_le32((u32)pmem); + nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_LOW_IDX] = cpu_to_le32((u32)pmem); nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_HIGH_IDX] = cpu_to_le32((u32)((u64)pmem >> 32)); nesvnic->nic.rx_skb[counter] = skb; } @@ -1728,13 +1728,13 @@ int nes_init_nic_qp(struct nes_device *n jumbomode = 1; nes_nic_init_timer_defaults(nesdev, jumbomode); } - nesvnic->lro_mgr.max_aggr = NES_LRO_MAX_AGGR; - nesvnic->lro_mgr.max_desc = NES_MAX_LRO_DESCRIPTORS; - nesvnic->lro_mgr.lro_arr = nesvnic->lro_desc; + nesvnic->lro_mgr.max_aggr = NES_LRO_MAX_AGGR; + nesvnic->lro_mgr.max_desc = NES_MAX_LRO_DESCRIPTORS; + nesvnic->lro_mgr.lro_arr = nesvnic->lro_desc; nesvnic->lro_mgr.get_skb_header = nes_lro_get_skb_hdr; - nesvnic->lro_mgr.features = LRO_F_NAPI | LRO_F_EXTRACT_VLAN_ID; - nesvnic->lro_mgr.dev = netdev; - nesvnic->lro_mgr.ip_summed = CHECKSUM_UNNECESSARY; + nesvnic->lro_mgr.features = LRO_F_NAPI | LRO_F_EXTRACT_VLAN_ID; + nesvnic->lro_mgr.dev = netdev; + nesvnic->lro_mgr.ip_summed = CHECKSUM_UNNECESSARY; nesvnic->lro_mgr.ip_summed_aggr = CHECKSUM_UNNECESSARY; return 0; } @@ -1755,8 +1755,8 @@ void nes_destroy_nic_qp(struct nes_vnic /* Free 
remaining NIC receive buffers */ while (nesvnic->nic.rq_head != nesvnic->nic.rq_tail) { - nic_rqe = &nesvnic->nic.rq_vbase[nesvnic->nic.rq_tail]; - wqe_frag = (u64)le32_to_cpu(nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_LOW_IDX]); + nic_rqe = &nesvnic->nic.rq_vbase[nesvnic->nic.rq_tail]; + wqe_frag = (u64)le32_to_cpu(nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_LOW_IDX]); wqe_frag |= ((u64)le32_to_cpu(nic_rqe->wqe_words[NES_NIC_RQ_WQE_FRAG0_HIGH_IDX])) << 32; pci_unmap_single(nesdev->pcidev, (dma_addr_t)wqe_frag, nesvnic->max_frame_size, PCI_DMA_FROMDEVICE); @@ -1839,17 +1839,17 @@ int nes_napi_isr(struct nes_device *nesd /* iff NIC, process here, else wait for DPC */ if ((int_stat) && ((int_stat & 0x0000ff00) == int_stat)) { nesdev->napi_isr_ran = 0; - nes_write32(nesdev->regs+NES_INT_STAT, - (int_stat & - ~(NES_INT_INTF|NES_INT_TIMER|NES_INT_MAC0|NES_INT_MAC1|NES_INT_MAC2|NES_INT_MAC3))); + nes_write32(nesdev->regs + NES_INT_STAT, + (int_stat & + ~(NES_INT_INTF | NES_INT_TIMER | NES_INT_MAC0 | NES_INT_MAC1 | NES_INT_MAC2 | NES_INT_MAC3))); /* Process the CEQs */ nes_process_ceq(nesdev, &nesdev->nesadapter->ceq[nesdev->nic_ceq_index]); if (unlikely((((nesadapter->et_rx_coalesce_usecs_irq) && - (!nesadapter->et_use_adaptive_rx_coalesce)) || - ((nesadapter->et_use_adaptive_rx_coalesce) && - (nesdev->deepcq_count > nesadapter->et_pkt_rate_low)))) ) { + (!nesadapter->et_use_adaptive_rx_coalesce)) || + ((nesadapter->et_use_adaptive_rx_coalesce) && + (nesdev->deepcq_count > nesadapter->et_pkt_rate_low))))) { if ((nesdev->int_req & NES_INT_TIMER) == 0) { /* Enable Periodic timer interrupts */ nesdev->int_req |= NES_INT_TIMER; @@ -1927,12 +1927,12 @@ void nes_dpc(unsigned long param) } if (int_stat) { - if (int_stat & ~(NES_INT_INTF|NES_INT_TIMER|NES_INT_MAC0| - NES_INT_MAC1|NES_INT_MAC2|NES_INT_MAC3)) { + if (int_stat & ~(NES_INT_INTF | NES_INT_TIMER | NES_INT_MAC0| + NES_INT_MAC1|NES_INT_MAC2 | NES_INT_MAC3)) { /* Ack the interrupts */ nes_write32(nesdev->regs+NES_INT_STAT, - 
(int_stat & ~(NES_INT_INTF|NES_INT_TIMER|NES_INT_MAC0| - NES_INT_MAC1|NES_INT_MAC2|NES_INT_MAC3))); + (int_stat & ~(NES_INT_INTF | NES_INT_TIMER | NES_INT_MAC0| + NES_INT_MAC1 | NES_INT_MAC2 | NES_INT_MAC3))); } temp_int_stat = int_stat; @@ -1997,8 +1997,8 @@ void nes_dpc(unsigned long param) } } /* Don't use the interface interrupt bit stay in loop */ - int_stat &= ~NES_INT_INTF|NES_INT_TIMER|NES_INT_MAC0| - NES_INT_MAC1|NES_INT_MAC2|NES_INT_MAC3; + int_stat &= ~NES_INT_INTF | NES_INT_TIMER | NES_INT_MAC0 | + NES_INT_MAC1 | NES_INT_MAC2 | NES_INT_MAC3; } while ((int_stat != 0) && (loop_counter++ < MAX_DPC_ITERATIONS)); if (timer_ints == 1) { @@ -2009,9 +2009,9 @@ void nes_dpc(unsigned long param) nesdev->timer_only_int_count = 0; nesdev->int_req &= ~NES_INT_TIMER; nes_write32(nesdev->regs + NES_INTF_INT_MASK, ~(nesdev->intf_int_req)); - nes_write32(nesdev->regs+NES_INT_MASK, ~nesdev->int_req); + nes_write32(nesdev->regs + NES_INT_MASK, ~nesdev->int_req); } else { - nes_write32(nesdev->regs+NES_INT_MASK, 0x0000ffff|(~nesdev->int_req)); + nes_write32(nesdev->regs+NES_INT_MASK, 0x0000ffff | (~nesdev->int_req)); } } else { if (unlikely(nesadapter->et_use_adaptive_rx_coalesce)) @@ -2019,7 +2019,7 @@ void nes_dpc(unsigned long param) nes_nic_init_timer(nesdev); } nesdev->timer_only_int_count = 0; - nes_write32(nesdev->regs+NES_INT_MASK, 0x0000ffff|(~nesdev->int_req)); + nes_write32(nesdev->regs+NES_INT_MASK, 0x0000ffff | (~nesdev->int_req)); } } else { nesdev->timer_only_int_count = 0; @@ -2068,7 +2068,7 @@ static void nes_process_ceq(struct nes_d do { if (le32_to_cpu(ceq->ceq_vbase[head].ceqe_words[NES_CEQE_CQ_CTX_HIGH_IDX]) & NES_CEQE_VALID) { - u64temp = (((u64)(le32_to_cpu(ceq->ceq_vbase[head].ceqe_words[NES_CEQE_CQ_CTX_HIGH_IDX])))<<32) | + u64temp = (((u64)(le32_to_cpu(ceq->ceq_vbase[head].ceqe_words[NES_CEQE_CQ_CTX_HIGH_IDX]))) << 32) | ((u64)(le32_to_cpu(ceq->ceq_vbase[head].ceqe_words[NES_CEQE_CQ_CTX_LOW_IDX]))); u64temp <<= 1; cq = *((struct nes_hw_cq 
**)&u64temp); @@ -2096,7 +2096,7 @@ static void nes_process_ceq(struct nes_d */ static void nes_process_aeq(struct nes_device *nesdev, struct nes_hw_aeq *aeq) { -// u64 u64temp; + /* u64 u64temp; */ u32 head; u32 aeq_size; u32 aeqe_misc; @@ -2115,8 +2115,10 @@ static void nes_process_aeq(struct nes_d if (aeqe_misc & (NES_AEQE_QP|NES_AEQE_CQ)) { if (aeqe_cq_id >= NES_FIRST_QPN) { /* dealing with an accelerated QP related AE */ -// u64temp = (((u64)(le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_HIGH_IDX])))<<32) | -// ((u64)(le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_LOW_IDX]))); + /* + * u64temp = (((u64)(le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_HIGH_IDX]))) << 32) | + * ((u64)(le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_LOW_IDX]))); + */ nes_process_iwarp_aeqe(nesdev, (struct nes_hw_aeqe *)aeqe); } else { /* TODO: dealing with a CQP related AE */ @@ -2464,8 +2466,10 @@ void nes_nic_ce_handler(struct nes_devic /* bump past the vlan tag */ wqe_fragment_length++; if (le16_to_cpu(wqe_fragment_length[wqe_fragment_index]) != 0) { - u64temp = (u64) le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_LOW_IDX+wqe_fragment_index*2]); - u64temp += ((u64)le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_HIGH_IDX+wqe_fragment_index*2]))<<32; + u64temp = (u64) le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_LOW_IDX + + wqe_fragment_index * 2]); + u64temp += ((u64)le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_HIGH_IDX + + wqe_fragment_index * 2])) << 32; bus_address = (dma_addr_t)u64temp; if (test_and_clear_bit(nesnic->sq_tail, nesnic->first_frag_overflow)) { pci_unmap_single(nesdev->pcidev, @@ -2475,8 +2479,10 @@ void nes_nic_ce_handler(struct nes_devic } for (; wqe_fragment_index < 5; wqe_fragment_index++) { if (wqe_fragment_length[wqe_fragment_index]) { - u64temp = le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_LOW_IDX+wqe_fragment_index*2]); - u64temp += 
((u64)le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_HIGH_IDX+wqe_fragment_index*2]))<<32; + u64temp = le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_LOW_IDX + + wqe_fragment_index * 2]); + u64temp += ((u64)le32_to_cpu(nic_sqe->wqe_words[NES_NIC_SQ_WQE_FRAG0_HIGH_IDX + + wqe_fragment_index * 2])) <<32; bus_address = (dma_addr_t)u64temp; pci_unmap_page(nesdev->pcidev, bus_address, @@ -2523,7 +2529,7 @@ void nes_nic_ce_handler(struct nes_devic if (atomic_read(&nesvnic->rx_skbs_needed) > (nesvnic->nic.rq_size>>1)) { nes_write32(nesdev->regs+NES_CQE_ALLOC, cq->cq_number | (cqe_count << 16)); -// nesadapter->tune_timer.cq_count += cqe_count; + /* nesadapter->tune_timer.cq_count += cqe_count; */ nesdev->currcq_count += cqe_count; cqe_count = 0; nes_replenish_nic_rq(nesvnic); @@ -2598,7 +2604,7 @@ void nes_nic_ce_handler(struct nes_devic /* Replenish Nic CQ */ nes_write32(nesdev->regs+NES_CQE_ALLOC, cq->cq_number | (cqe_count << 16)); -// nesdev->nesadapter->tune_timer.cq_count += cqe_count; + /* nesdev->nesadapter->tune_timer.cq_count += cqe_count; */ nesdev->currcq_count += cqe_count; cqe_count = 0; } @@ -2626,7 +2632,7 @@ void nes_nic_ce_handler(struct nes_devic cq->cqe_allocs_pending = cqe_count; if (unlikely(nesadapter->et_use_adaptive_rx_coalesce)) { -// nesdev->nesadapter->tune_timer.cq_count += cqe_count; + /* nesdev->nesadapter->tune_timer.cq_count += cqe_count; */ nesdev->currcq_count += cqe_count; nes_nic_tune_timer(nesdev); } @@ -2661,7 +2667,7 @@ static void nes_cqp_ce_handler(struct ne if (le32_to_cpu(cq->cq_vbase[head].cqe_words[NES_CQE_OPCODE_IDX]) & NES_CQE_VALID) { u64temp = (((u64)(le32_to_cpu(cq->cq_vbase[head]. - cqe_words[NES_CQE_COMP_COMP_CTX_HIGH_IDX])))<<32) | + cqe_words[NES_CQE_COMP_COMP_CTX_HIGH_IDX]))) << 32) | ((u64)(le32_to_cpu(cq->cq_vbase[head]. 
cqe_words[NES_CQE_COMP_COMP_CTX_LOW_IDX]))); cqp = *((struct nes_hw_cqp **)&u64temp); @@ -2678,7 +2684,7 @@ static void nes_cqp_ce_handler(struct ne } u64temp = (((u64)(le32_to_cpu(nesdev->cqp.sq_vbase[cqp->sq_tail]. - wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX])))<<32) | + wqe_words[NES_CQP_WQE_COMP_SCRATCH_HIGH_IDX]))) << 32) | ((u64)(le32_to_cpu(nesdev->cqp.sq_vbase[cqp->sq_tail]. wqe_words[NES_CQP_WQE_COMP_SCRATCH_LOW_IDX]))); cqp_request = *((struct nes_cqp_request **)&u64temp); @@ -2715,7 +2721,7 @@ static void nes_cqp_ce_handler(struct ne } else { nes_debug(NES_DBG_CQP, "CQP request %p (opcode 0x%02X) freed.\n", cqp_request, - le32_to_cpu(cqp_request->cqp_wqe.wqe_words[NES_CQP_WQE_OPCODE_IDX])&0x3f); + le32_to_cpu(cqp_request->cqp_wqe.wqe_words[NES_CQP_WQE_OPCODE_IDX]) & 0x3f); if (cqp_request->dynamic) { kfree(cqp_request); } else { @@ -2729,7 +2735,7 @@ static void nes_cqp_ce_handler(struct ne } cq->cq_vbase[head].cqe_words[NES_CQE_OPCODE_IDX] = 0; - nes_write32(nesdev->regs+NES_CQE_ALLOC, cq->cq_number | (1 << 16)); + nes_write32(nesdev->regs + NES_CQE_ALLOC, cq->cq_number | (1 << 16)); if (++cqp->sq_tail >= cqp->sq_size) cqp->sq_tail = 0; @@ -2798,13 +2804,13 @@ static void nes_process_iwarp_aeqe(struc nes_debug(NES_DBG_AEQ, "\n"); aeq_info = le32_to_cpu(aeqe->aeqe_words[NES_AEQE_MISC_IDX]); if ((NES_AEQE_INBOUND_RDMA&aeq_info) || (!(NES_AEQE_QP&aeq_info))) { - context = le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_LOW_IDX]); + context = le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_LOW_IDX]); context += ((u64)le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_HIGH_IDX])) << 32; } else { aeqe_context = le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_LOW_IDX]); aeqe_context += ((u64)le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_CTXT_HIGH_IDX])) << 32; context = (unsigned long)nesadapter->qp_table[le32_to_cpu( - aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX])-NES_FIRST_QPN]; + aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX]) - NES_FIRST_QPN]; BUG_ON(!context); 
} @@ -2817,7 +2823,6 @@ static void nes_process_iwarp_aeqe(struc le32_to_cpu(aeqe->aeqe_words[NES_AEQE_COMP_QP_CQ_ID_IDX]), aeqe, nes_tcp_state_str[tcp_state], nes_iwarp_state_str[iwarp_state]); - switch (async_event_id) { case NES_AEQE_AEID_LLP_FIN_RECEIVED: nesqp = *((struct nes_qp **)&context); @@ -3221,7 +3226,7 @@ void nes_manage_arp_cache(struct net_dev cqp_wqe->wqe_words[NES_CQP_WQE_OPCODE_IDX] |= cpu_to_le32(NES_CQP_ARP_VALID); cqp_wqe->wqe_words[NES_CQP_ARP_WQE_MAC_ADDR_LOW_IDX] = cpu_to_le32( (((u32)mac_addr[2]) << 24) | (((u32)mac_addr[3]) << 16) | - (((u32)mac_addr[4]) << 8) | (u32)mac_addr[5]); + (((u32)mac_addr[4]) << 8) | (u32)mac_addr[5]); cqp_wqe->wqe_words[NES_CQP_ARP_WQE_MAC_HIGH_IDX] = cpu_to_le32( (((u32)mac_addr[0]) << 16) | (u32)mac_addr[1]); } else { diff --git a/drivers/infiniband/hw/nes/nes_hw.h b/drivers/infiniband/hw/nes/nes_hw.h index 7d47f92..6e58c44 100644 --- a/drivers/infiniband/hw/nes/nes_hw.h +++ b/drivers/infiniband/hw/nes/nes_hw.h @@ -969,7 +969,7 @@ #define DEFAULT_JUMBO_NES_QL_HIGH 128 #define NES_NIC_CQ_DOWNWARD_TREND 16 struct nes_hw_tune_timer { - //u16 cq_count; + /* u16 cq_count; */ u16 threshold_low; u16 threshold_target; u16 threshold_high; diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c index d65a846..1b0938c 100644 --- a/drivers/infiniband/hw/nes/nes_nic.c +++ b/drivers/infiniband/hw/nes/nes_nic.c @@ -185,12 +185,13 @@ static int nes_netdev_open(struct net_de nic_active |= nic_active_bit; nes_write_indexed(nesdev, NES_IDX_NIC_BROADCAST_ON, nic_active); - macaddr_high = ((u16)netdev->dev_addr[0]) << 8; + macaddr_high = ((u16)netdev->dev_addr[0]) << 8; macaddr_high += (u16)netdev->dev_addr[1]; - macaddr_low = ((u32)netdev->dev_addr[2]) << 24; - macaddr_low += ((u32)netdev->dev_addr[3]) << 16; - macaddr_low += ((u32)netdev->dev_addr[4]) << 8; - macaddr_low += (u32)netdev->dev_addr[5]; + + macaddr_low = ((u32)netdev->dev_addr[2]) << 24; + macaddr_low += ((u32)netdev->dev_addr[3]) << 
16; + macaddr_low += ((u32)netdev->dev_addr[4]) << 8; + macaddr_low += (u32)netdev->dev_addr[5]; /* Program the various MAC regs */ for (i = 0; i < NES_MAX_PORT_COUNT; i++) { @@ -451,7 +452,7 @@ #define NES_MAX_TSO_FRAGS 18 __le16 *wqe_fragment_length; u32 nr_frags; u32 original_first_length; -// u64 *wqe_fragment_address; + /* u64 *wqe_fragment_address; */ /* first fragment (0) is used by copy buffer */ u16 wqe_fragment_index=1; u16 hoffset; @@ -461,11 +462,12 @@ #define NES_MAX_TSO_FRAGS 18 u32 old_head; u32 wqe_misc; - /* nes_debug(NES_DBG_NIC_TX, "%s Request to tx NIC packet length %u, headlen %u," - " (%u frags), tso_size=%u\n", - netdev->name, skb->len, skb_headlen(skb), - skb_shinfo(skb)->nr_frags, skb_is_gso(skb)); - */ + /* + * nes_debug(NES_DBG_NIC_TX, "%s Request to tx NIC packet length %u, headlen %u," + * " (%u frags), tso_size=%u\n", + * netdev->name, skb->len, skb_headlen(skb), + * skb_shinfo(skb)->nr_frags, skb_is_gso(skb)); + */ if (!netif_carrier_ok(netdev)) return NETDEV_TX_OK; @@ -795,12 +797,12 @@ static int nes_netdev_set_mac_address(st memcpy(netdev->dev_addr, mac_addr->sa_data, netdev->addr_len); printk(PFX "%s: Address length = %d, Address = %s\n", __func__, netdev->addr_len, print_mac(mac, mac_addr->sa_data)); - macaddr_high = ((u16)netdev->dev_addr[0]) << 8; + macaddr_high = ((u16)netdev->dev_addr[0]) << 8; macaddr_high += (u16)netdev->dev_addr[1]; - macaddr_low = ((u32)netdev->dev_addr[2]) << 24; - macaddr_low += ((u32)netdev->dev_addr[3]) << 16; - macaddr_low += ((u32)netdev->dev_addr[4]) << 8; - macaddr_low += (u32)netdev->dev_addr[5]; + macaddr_low = ((u32)netdev->dev_addr[2]) << 24; + macaddr_low += ((u32)netdev->dev_addr[3]) << 16; + macaddr_low += ((u32)netdev->dev_addr[4]) << 8; + macaddr_low += (u32)netdev->dev_addr[5]; for (i = 0; i < NES_MAX_PORT_COUNT; i++) { if (nesvnic->qp_nic_index[i] == 0xf) { @@ -881,12 +883,12 @@ static void nes_netdev_set_multicast_lis print_mac(mac, multicast_addr->dmi_addr), 
perfect_filter_register_address+(mc_index * 8), mc_nic_index); - macaddr_high = ((u16)multicast_addr->dmi_addr[0]) << 8; + macaddr_high = ((u16)multicast_addr->dmi_addr[0]) << 8; macaddr_high += (u16)multicast_addr->dmi_addr[1]; - macaddr_low = ((u32)multicast_addr->dmi_addr[2]) << 24; - macaddr_low += ((u32)multicast_addr->dmi_addr[3]) << 16; - macaddr_low += ((u32)multicast_addr->dmi_addr[4]) << 8; - macaddr_low += (u32)multicast_addr->dmi_addr[5]; + macaddr_low = ((u32)multicast_addr->dmi_addr[2]) << 24; + macaddr_low += ((u32)multicast_addr->dmi_addr[3]) << 16; + macaddr_low += ((u32)multicast_addr->dmi_addr[4]) << 8; + macaddr_low += (u32)multicast_addr->dmi_addr[5]; nes_write_indexed(nesdev, perfect_filter_register_address+(mc_index * 8), macaddr_low); @@ -910,23 +912,23 @@ static void nes_netdev_set_multicast_lis /** * nes_netdev_change_mtu */ -static int nes_netdev_change_mtu(struct net_device *netdev, int new_mtu) +static int nes_netdev_change_mtu(struct net_device *netdev, int new_mtu) { struct nes_vnic *nesvnic = netdev_priv(netdev); - struct nes_device *nesdev = nesvnic->nesdev; - int ret = 0; - u8 jumbomode=0; + struct nes_device *nesdev = nesvnic->nesdev; + int ret = 0; + u8 jumbomode = 0; - if ((new_mtu < ETH_ZLEN) || (new_mtu > max_mtu)) + if ((new_mtu < ETH_ZLEN) || (new_mtu > max_mtu)) return -EINVAL; - netdev->mtu = new_mtu; + netdev->mtu = new_mtu; nesvnic->max_frame_size = new_mtu + VLAN_ETH_HLEN; if (netdev->mtu > 1500) { jumbomode=1; } - nes_nic_init_timer_defaults(nesdev, jumbomode); + nes_nic_init_timer_defaults(nesdev, jumbomode); if (netif_running(netdev)) { nes_netdev_stop(netdev); @@ -1225,14 +1227,14 @@ static int nes_netdev_set_coalesce(struc struct ethtool_coalesce *et_coalesce) { struct nes_vnic *nesvnic = netdev_priv(netdev); - struct nes_device *nesdev = nesvnic->nesdev; + struct nes_device *nesdev = nesvnic->nesdev; struct nes_adapter *nesadapter = nesdev->nesadapter; struct nes_hw_tune_timer *shared_timer = 
&nesadapter->tune_timer; unsigned long flags; - spin_lock_irqsave(&nesadapter->periodic_timer_lock, flags); + spin_lock_irqsave(&nesadapter->periodic_timer_lock, flags); if (et_coalesce->rx_max_coalesced_frames_low) { - shared_timer->threshold_low = et_coalesce->rx_max_coalesced_frames_low; + shared_timer->threshold_low = et_coalesce->rx_max_coalesced_frames_low; } if (et_coalesce->rx_max_coalesced_frames_irq) { shared_timer->threshold_target = et_coalesce->rx_max_coalesced_frames_irq; @@ -1252,14 +1254,14 @@ static int nes_netdev_set_coalesce(struc nesadapter->et_rx_coalesce_usecs_irq = et_coalesce->rx_coalesce_usecs_irq; if (et_coalesce->use_adaptive_rx_coalesce) { nesadapter->et_use_adaptive_rx_coalesce = 1; - nesadapter->timer_int_limit = NES_TIMER_INT_LIMIT_DYNAMIC; + nesadapter->timer_int_limit = NES_TIMER_INT_LIMIT_DYNAMIC; nesadapter->et_rx_coalesce_usecs_irq = 0; if (et_coalesce->pkt_rate_low) { - nesadapter->et_pkt_rate_low = et_coalesce->pkt_rate_low; + nesadapter->et_pkt_rate_low = et_coalesce->pkt_rate_low; } } else { nesadapter->et_use_adaptive_rx_coalesce = 0; - nesadapter->timer_int_limit = NES_TIMER_INT_LIMIT; + nesadapter->timer_int_limit = NES_TIMER_INT_LIMIT; if (nesadapter->et_rx_coalesce_usecs_irq) { nes_write32(nesdev->regs+NES_PERIODIC_CONTROL, 0x80000000 | ((u32)(nesadapter->et_rx_coalesce_usecs_irq*8))); @@ -1276,28 +1278,28 @@ static int nes_netdev_get_coalesce(struc struct ethtool_coalesce *et_coalesce) { struct nes_vnic *nesvnic = netdev_priv(netdev); - struct nes_device *nesdev = nesvnic->nesdev; + struct nes_device *nesdev = nesvnic->nesdev; struct nes_adapter *nesadapter = nesdev->nesadapter; struct ethtool_coalesce temp_et_coalesce; struct nes_hw_tune_timer *shared_timer = &nesadapter->tune_timer; unsigned long flags; memset(&temp_et_coalesce, 0, sizeof(temp_et_coalesce)); - temp_et_coalesce.rx_coalesce_usecs_irq = nesadapter->et_rx_coalesce_usecs_irq; - temp_et_coalesce.use_adaptive_rx_coalesce = 
nesadapter->et_use_adaptive_rx_coalesce; - temp_et_coalesce.rate_sample_interval = nesadapter->et_rate_sample_interval; + temp_et_coalesce.rx_coalesce_usecs_irq = nesadapter->et_rx_coalesce_usecs_irq; + temp_et_coalesce.use_adaptive_rx_coalesce = nesadapter->et_use_adaptive_rx_coalesce; + temp_et_coalesce.rate_sample_interval = nesadapter->et_rate_sample_interval; temp_et_coalesce.pkt_rate_low = nesadapter->et_pkt_rate_low; spin_lock_irqsave(&nesadapter->periodic_timer_lock, flags); - temp_et_coalesce.rx_max_coalesced_frames_low = shared_timer->threshold_low; - temp_et_coalesce.rx_max_coalesced_frames_irq = shared_timer->threshold_target; + temp_et_coalesce.rx_max_coalesced_frames_low = shared_timer->threshold_low; + temp_et_coalesce.rx_max_coalesced_frames_irq = shared_timer->threshold_target; temp_et_coalesce.rx_max_coalesced_frames_high = shared_timer->threshold_high; - temp_et_coalesce.rx_coalesce_usecs_low = shared_timer->timer_in_use_min; + temp_et_coalesce.rx_coalesce_usecs_low = shared_timer->timer_in_use_min; temp_et_coalesce.rx_coalesce_usecs_high = shared_timer->timer_in_use_max; if (nesadapter->et_use_adaptive_rx_coalesce) { temp_et_coalesce.rx_coalesce_usecs_irq = shared_timer->timer_in_use; } spin_unlock_irqrestore(&nesadapter->periodic_timer_lock, flags); - memcpy(et_coalesce, &temp_et_coalesce, sizeof(*et_coalesce)); + memcpy(et_coalesce, &temp_et_coalesce, sizeof(*et_coalesce)); return 0; } @@ -1376,7 +1378,7 @@ static int nes_netdev_get_settings(struc u16 phy_data; et_cmd->duplex = DUPLEX_FULL; - et_cmd->port = PORT_MII; + et_cmd->port = PORT_MII; if (nesadapter->OneG_Mode) { et_cmd->speed = SPEED_1000; @@ -1401,13 +1403,13 @@ static int nes_netdev_get_settings(struc if ((nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_IRIS) || (nesadapter->phy_type[nesdev->mac_index] == NES_PHY_TYPE_ARGUS)) { et_cmd->transceiver = XCVR_EXTERNAL; - et_cmd->port = PORT_FIBRE; - et_cmd->supported = SUPPORTED_FIBRE; + et_cmd->port = PORT_FIBRE; + 
et_cmd->supported = SUPPORTED_FIBRE; et_cmd->advertising = ADVERTISED_FIBRE; et_cmd->phy_address = nesadapter->phy_index[nesdev->mac_index]; } else { et_cmd->transceiver = XCVR_INTERNAL; - et_cmd->supported = SUPPORTED_10000baseT_Full; + et_cmd->supported = SUPPORTED_10000baseT_Full; et_cmd->advertising = ADVERTISED_10000baseT_Full; et_cmd->phy_address = nesdev->mac_index; } @@ -1438,7 +1440,7 @@ static int nes_netdev_set_settings(struc /* Turn on Full duplex, Autoneg, and restart autonegotiation */ phy_data |= 0x1300; } else { - // Turn off autoneg + /* Turn off autoneg */ phy_data &= ~0x1000; } nes_write_1G_phy_reg(nesdev, 0, nesadapter->phy_index[nesdev->mac_index], diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index ee74f7c..3436430 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -1266,7 +1266,7 @@ static struct ib_qp *nes_create_qp(struc sq_size = init_attr->cap.max_send_wr; rq_size = init_attr->cap.max_recv_wr; - // check if the encoded sizes are OK or not... + /* check if the encoded sizes are OK or not... */ sq_encoded_size = nes_get_encoded_size(&sq_size); rq_encoded_size = nes_get_encoded_size(&rq_size); From rdreier at cisco.com Tue Apr 29 13:32:19 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 13:32:19 -0700 Subject: [ofa-general] Re: [ PATCH 3/3 v2 ] RDMA/nes Formatting cleanup In-Reply-To: <200804292025.m3TKP1im023075@velma.neteffect.com> (Glenn Streiff's message of "Tue, 29 Apr 2008 15:25:01 -0500") References: <200804292025.m3TKP1im023075@velma.neteffect.com> Message-ID: All looks fine, I applied all three of your patches. 
Thanks From rdreier at cisco.com Tue Apr 29 13:57:33 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 13:57:33 -0700 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get a last batch of changes before 2.6.26-rc1: Eli Cohen (2): IPoIB: Use separate CQ for UD send completions IPoIB: Copy child MTU from parent Eli Dorfman (2): IB/iser: Move high-volume debug output to higher debug level IB/iser: Count FMR alignment violations per session Eric Schneider (1): RDMA/nes: Add support for SFP+ PHY Faisal Latif (1): RDMA/nes: Use LRO Glenn Streiff (1): RDMA/nes: Formatting cleanup Hoang-Nam Nguyen (1): IB/ehca: handle negative return value from ibmebus_request_irq() properly Olaf Kirch (2): mlx4_core: Avoid recycling old FMR R_Keys too soon IB/mthca: Avoid recycling old FMR R_Keys too soon Roland Dreier (1): IB/mthca: Avoid changing userspace ABI to handle DMA write barrier attribute Stefan Roscher (1): IB/ehca: Allocate event queue size depending on max number of CQs and QPs Steve Wise (3): RDMA/cxgb3: Correctly serialize peer abort path RDMA/cxgb3: Set the max_mr_size device attribute correctly RDMA/cxgb3: Support peer-2-peer connection setup Yevgeny Petrilin (1): mlx4_core: Add a way to set the "collapsed" CQ flag drivers/infiniband/hw/cxgb3/cxio_hal.c | 18 ++- drivers/infiniband/hw/cxgb3/cxio_hal.h | 1 + drivers/infiniband/hw/cxgb3/cxio_wr.h | 21 ++- drivers/infiniband/hw/cxgb3/iwch.c | 1 + drivers/infiniband/hw/cxgb3/iwch.h | 1 + drivers/infiniband/hw/cxgb3/iwch_cm.c | 167 ++++++++---- drivers/infiniband/hw/cxgb3/iwch_cm.h | 2 + drivers/infiniband/hw/cxgb3/iwch_provider.c | 2 +- drivers/infiniband/hw/cxgb3/iwch_provider.h | 3 + drivers/infiniband/hw/cxgb3/iwch_qp.c | 60 ++++- 
drivers/infiniband/hw/ehca/ehca_classes.h | 5 + drivers/infiniband/hw/ehca/ehca_cq.c | 11 + drivers/infiniband/hw/ehca/ehca_eq.c | 35 ++-- drivers/infiniband/hw/ehca/ehca_main.c | 36 +++- drivers/infiniband/hw/ehca/ehca_qp.c | 26 ++- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/infiniband/hw/mthca/mthca_mr.c | 13 - drivers/infiniband/hw/mthca/mthca_provider.c | 14 +- drivers/infiniband/hw/mthca/mthca_provider.h | 1 + drivers/infiniband/hw/mthca/mthca_user.h | 10 +- drivers/infiniband/hw/nes/Kconfig | 1 + drivers/infiniband/hw/nes/nes.c | 4 + drivers/infiniband/hw/nes/nes.h | 5 +- drivers/infiniband/hw/nes/nes_cm.c | 8 +- drivers/infiniband/hw/nes/nes_hw.c | 371 ++++++++++++++++++++------ drivers/infiniband/hw/nes/nes_hw.h | 19 +- drivers/infiniband/hw/nes/nes_nic.c | 180 ++++++++----- drivers/infiniband/hw/nes/nes_utils.c | 10 +- drivers/infiniband/hw/nes/nes_verbs.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib.h | 7 +- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 8 +- drivers/infiniband/ulp/ipoib/ipoib_ethtool.c | 2 +- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 45 ++-- drivers/infiniband/ulp/ipoib/ipoib_main.c | 3 +- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 39 ++- drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 + drivers/infiniband/ulp/iser/iscsi_iser.c | 4 +- drivers/infiniband/ulp/iser/iscsi_iser.h | 7 + drivers/infiniband/ulp/iser/iser_memory.c | 9 +- drivers/net/cxgb3/version.h | 2 +- drivers/net/mlx4/cq.c | 4 +- drivers/net/mlx4/mr.c | 6 - include/linux/mlx4/device.h | 3 +- include/scsi/libiscsi.h | 1 + 44 files changed, 845 insertions(+), 327 deletions(-) From rdreier at cisco.com Tue Apr 29 14:41:48 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 14:41:48 -0700 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: (Roland Dreier's message of "Tue, 29 Apr 2008 11:10:32 -0700") References: <1209370487.11248.1.camel@mtls03> Message-ID: Umm... 
a little late now that I asked Linus to pull but I realized that this patch is somewhat buggy by design: You make the send CQ unsignaled, so you never get TX completion events. But this means that if the send queue ever fills up, we'll do netif_stop_queue() and then never reap a TX completion to restart the queue... I guess if we do netif_stop_queue() then we had better start a timer or something to kick us sometime in the future. Or we could request an event for the send CQ only when the send queue is full. But polling from either a timer or CQ event leads to locking issues against polling from the send path... - R. From rdreier at cisco.com Tue Apr 29 14:49:37 2008 From: rdreier at cisco.com (Roland Dreier) Date: Tue, 29 Apr 2008 14:49:37 -0700 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: (Roland Dreier's message of "Tue, 29 Apr 2008 14:41:48 -0700") References: <1209370487.11248.1.camel@mtls03> Message-ID: By the way, this isn't just theoretical -- I'm not smart enough to realize this except that I just saw: ib1: TX ring full, stopping kernel net queue NETDEV WATCHDOG: ib1: transmit timed out ib1: transmit timeout: latency 1240 msecs ib1: queue stopped 1, tx_head 5291313, tx_tail 5291255 and of course it never recovers. 
From akepner at sgi.com Tue Apr 29 15:16:22 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Tue, 29 Apr 2008 15:16:22 -0700 Subject: IPoIB - "TX ring full" (was: Re: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD) In-Reply-To: References: <1209370487.11248.1.camel@mtls03> Message-ID: <20080429221622.GL30919@sgi.com> On Tue, Apr 29, 2008 at 02:49:37PM -0700, Roland Dreier wrote: > By the way, this isn't just theoretical -- I'm not smart enough to > realize this except that I just saw: > > ib1: TX ring full, stopping kernel net queue > NETDEV WATCHDOG: ib1: transmit timed out > ib1: transmit timeout: latency 1240 msecs > ib1: queue stopped 1, tx_head 5291313, tx_tail 5291255 > It's very interesting to me that you mention this. I'm in the midst of debugging a similar problem, but with IPoIB circa OFED 1.2. Found 2 problems: 1) In connected mode it's possible to get into a situation where one (or more) IPoIB-CM send queues fill up (no completions ever happen for them for some reason), while all the other CM send queues are empty. Of course the empty TX queues don't generate completions either, so nothing ever restarts the xmit queue and one bad connection kills IPoIB. We have had IPoIB stuck "forever" in this situation. Simple, brutal fix is to do ipoib_flush_paths() in ipoib_timeout(). 2) We also see situations very similar to what you describe above. The IPoIB-UD send queue fills and never restarts. (Of course it's nothing to do with the patch that was being discussed in this thread, this is with OFED 1.2-rc2, and also OFED 1.2.) I don't see how case (2) is possible with circa OFED 1.2 code. Can anyone clue me in? 
-- Arthur From arlin.r.davis at intel.com Tue Apr 29 19:45:27 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Tue, 29 Apr 2008 19:45:27 -0700 Subject: [ofa-general] [PATCH 1/1][dat1.2] dat: cleanup error handling with static registry parsing of dat.conf Message-ID: <000001c8aa6c$41658020$daba020a@amr.corp.intel.com> change asserts to return codes, add log messages, and report errors via open instead of asserts during dat library load. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dat/udat/linux/dat_osd.c | 2 +- dat/udat/udat_sr_parser.c | 382 +++++++++++++++++---------------------------- 2 files changed, 144 insertions(+), 240 deletions(-) diff --git a/dat/udat/linux/dat_osd.c b/dat/udat/linux/dat_osd.c index d6a5747..e1725e5 100644 --- a/dat/udat/linux/dat_osd.c +++ b/dat/udat/linux/dat_osd.c @@ -76,7 +76,7 @@ typedef enum * * *********************************************************************/ -static DAT_OS_DBG_TYPE_VAL g_dbg_type = 0; +static DAT_OS_DBG_TYPE_VAL g_dbg_type = DAT_OS_DBG_TYPE_ERROR; static DAT_OS_DBG_DEST g_dbg_dest = DAT_OS_DBG_DEST_STDOUT; diff --git a/dat/udat/udat_sr_parser.c b/dat/udat/udat_sr_parser.c index 64c4114..3959268 100644 --- a/dat/udat/udat_sr_parser.c +++ b/dat/udat/udat_sr_parser.c @@ -293,7 +293,7 @@ dat_sr_load (void) sr_file = dat_os_fopen (sr_path); if ( sr_file == NULL ) { - return DAT_INTERNAL_ERROR; + goto bail; } for (;;) @@ -308,17 +308,22 @@ dat_sr_load (void) } else { - dat_os_assert (!"unable to parse static registry file"); - break; + goto cleanup; } } - if ( 0 != dat_os_fclose (sr_file) ) - { - return DAT_INTERNAL_ERROR; - } + if (0 != dat_os_fclose (sr_file)) + goto bail; return DAT_SUCCESS; + +cleanup: + dat_os_fclose(sr_file); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + "ERROR: unable to parse static registry file, dat.conf\n"); + return DAT_INTERNAL_ERROR; + } @@ -570,33 +575,22 @@ dat_sr_parse_ia_name ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - if ( 
DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->ia_name = token.value; - - status = DAT_SUCCESS; - } - - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); + if (DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } + entry->ia_name = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " ia_name, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -610,39 +604,26 @@ dat_sr_parse_api ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else if ( DAT_SUCCESS != dat_sr_convert_api ( - token.value, &entry->api_version) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - status = DAT_SUCCESS; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_api(token.value, &entry->api_version)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " api_ver, file 
offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -656,39 +637,27 @@ dat_sr_parse_thread_safety ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else if ( DAT_SUCCESS != dat_sr_convert_thread_safety ( - token.value, &entry->is_thread_safe) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - status = DAT_SUCCESS; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_thread_safety( + token.value, &entry->is_thread_safe)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " thread_safety, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -702,39 +671,26 @@ dat_sr_parse_default ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else if ( DAT_SUCCESS != dat_sr_convert_default ( - token.value, &entry->is_default) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - status = DAT_SUCCESS; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SUCCESS != status ) - { - 
DAT_RETURN status_success; + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_default(token.value, &entry->is_default)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " default section, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -748,33 +704,22 @@ dat_sr_parse_lib_path ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->lib_path = token.value; - - status = DAT_SUCCESS; - } - - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SUCCESS != dat_sr_get_token(file, &token)) + goto bail; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); + if (DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } + entry->lib_path = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " lib_path, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } /*********************************************************************** @@ -787,42 +732,29 @@ dat_sr_parse_provider_version ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else 
if ( DAT_SUCCESS != dat_sr_convert_provider_version ( - token.value, &entry->provider_version) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - - status = DAT_SUCCESS; - } + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_provider_version( + token.value, &entry->provider_version)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " provider_ver, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } - /*********************************************************************** * Function: dat_sr_parse_ia_params ***********************************************************************/ @@ -833,33 +765,23 @@ dat_sr_parse_ia_params ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->ia_params = token.value; + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - status = DAT_SUCCESS; + if (DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + entry->ia_params = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " ia_params, file 
offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -873,33 +795,23 @@ dat_sr_parse_platform_params ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->platform_params = token.value; + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - status = DAT_SUCCESS; + if (DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + entry->platform_params = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " platform_params, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -913,32 +825,23 @@ dat_sr_parse_eoe ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( (DAT_SR_TOKEN_EOF != token.type) && - (DAT_SR_TOKEN_EOR != token.type) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - status = DAT_SUCCESS; - } - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); + if ((DAT_SR_TOKEN_EOF != token.type) && + (DAT_SR_TOKEN_EOR != token.type)) { + dat_sr_put_token (file, &token); + goto bail; } + + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " EOR, EOF, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -954,7 +857,8 @@ dat_sr_convert_api ( 
int i; int minor_i; - dat_os_assert ( 0 < dat_os_strlen (str) ); + if (dat_os_strlen(str) <= 0) + return DAT_INTERNAL_ERROR; if ( 'u' == str[0] ) { @@ -1078,8 +982,8 @@ dat_sr_convert_provider_version ( int i; int decimal_i; - dat_os_assert ( 0 < dat_os_strlen (str) ); - dat_os_assert ( NULL == provider_version->id ); + if ((dat_os_strlen(str) <= 0) || (NULL != provider_version->id)) + return DAT_INTERNAL_ERROR; status = DAT_SUCCESS; -- 1.5.2.5 From arlin.r.davis at intel.com Tue Apr 29 19:45:30 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Tue, 29 Apr 2008 19:45:30 -0700 Subject: [ofa-general] [PATCH 1/1][v2.0] dat: cleanup error handling with static registry parsing of dat.conf Message-ID: change asserts to return codes, add log messages, and report errors via open instead of asserts during dat library load. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dat/udat/linux/dat_osd.c | 2 +- dat/udat/udat_sr_parser.c | 382 +++++++++++++++++---------------------------- 2 files changed, 144 insertions(+), 240 deletions(-) diff --git a/dat/udat/linux/dat_osd.c b/dat/udat/linux/dat_osd.c index fa76c12..7305168 100644 --- a/dat/udat/linux/dat_osd.c +++ b/dat/udat/linux/dat_osd.c @@ -76,7 +76,7 @@ typedef enum * * *********************************************************************/ -static DAT_OS_DBG_TYPE_VAL g_dbg_type = 0; +static DAT_OS_DBG_TYPE_VAL g_dbg_type = DAT_OS_DBG_TYPE_ERROR; static DAT_OS_DBG_DEST g_dbg_dest = DAT_OS_DBG_DEST_STDOUT; diff --git a/dat/udat/udat_sr_parser.c b/dat/udat/udat_sr_parser.c index 5761e3b..904acff 100644 --- a/dat/udat/udat_sr_parser.c +++ b/dat/udat/udat_sr_parser.c @@ -297,7 +297,7 @@ dat_sr_load (void) sr_file = dat_os_fopen (sr_path); if ( sr_file == NULL ) { - return DAT_INTERNAL_ERROR; + goto bail; } for (;;) @@ -312,17 +312,22 @@ dat_sr_load (void) } else { - dat_os_assert (!"unable to parse static registry file"); - break; + goto cleanup; } } - if ( 0 != dat_os_fclose (sr_file) ) - { - return 
DAT_INTERNAL_ERROR; - } + if (0 != dat_os_fclose (sr_file)) + goto bail; return DAT_SUCCESS; + +cleanup: + dat_os_fclose(sr_file); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + "ERROR: unable to parse static registry file, dat.conf\n"); + return DAT_INTERNAL_ERROR; + } @@ -574,33 +579,22 @@ dat_sr_parse_ia_name ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->ia_name = token.value; - - status = DAT_SUCCESS; - } - - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); + if (DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } + entry->ia_name = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " ia_name, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -614,39 +608,26 @@ dat_sr_parse_api ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else if ( DAT_SUCCESS != dat_sr_convert_api ( - token.value, &entry->api_version) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - status = DAT_SUCCESS; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( 
DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_api(token.value, &entry->api_version)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " api_ver, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -660,39 +641,27 @@ dat_sr_parse_thread_safety ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else if ( DAT_SUCCESS != dat_sr_convert_thread_safety ( - token.value, &entry->is_thread_safe) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - status = DAT_SUCCESS; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_thread_safety( + token.value, &entry->is_thread_safe)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " thread_safety, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -706,39 +675,26 @@ dat_sr_parse_default ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( DAT_SR_TOKEN_STRING != token.type 
) - { - status = DAT_INTERNAL_ERROR; - } - else if ( DAT_SUCCESS != dat_sr_convert_default ( - token.value, &entry->is_default) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - status = DAT_SUCCESS; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_default(token.value, &entry->is_default)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " default section, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -752,33 +708,22 @@ dat_sr_parse_lib_path ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->lib_path = token.value; - - status = DAT_SUCCESS; - } - - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SUCCESS != dat_sr_get_token(file, &token)) + goto bail; - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); + if (DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } + entry->lib_path = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " lib_path, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } 
/*********************************************************************** @@ -791,42 +736,29 @@ dat_sr_parse_provider_version ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else if ( DAT_SUCCESS != dat_sr_convert_provider_version ( - token.value, &entry->provider_version) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - dat_os_free (token.value, - (sizeof (char) * dat_os_strlen (token.value)) + 1); - - status = DAT_SUCCESS; - } + if (DAT_SR_TOKEN_STRING != token.type) + goto cleanup; - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + if (DAT_SUCCESS != dat_sr_convert_provider_version( + token.value, &entry->provider_version)) + goto cleanup; + + dat_os_free(token.value, (sizeof(char) * dat_os_strlen(token.value))+1); + return DAT_SUCCESS; - return status; +cleanup: + dat_sr_put_token (file, &token); +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " provider_ver, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } - /*********************************************************************** * Function: dat_sr_parse_ia_params ***********************************************************************/ @@ -837,33 +769,23 @@ dat_sr_parse_ia_params ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->ia_params = token.value; + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - status = DAT_SUCCESS; + if 
(DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + entry->ia_params = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " ia_params, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -877,33 +799,23 @@ dat_sr_parse_platform_params ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - if ( DAT_SR_TOKEN_STRING != token.type ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - entry->platform_params = token.value; + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - status = DAT_SUCCESS; + if (DAT_SR_TOKEN_STRING != token.type) { + dat_sr_put_token (file, &token); + goto bail; } - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; - - status_success = dat_sr_put_token (file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); - } + entry->platform_params = token.value; + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " platform_params, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -917,32 +829,23 @@ dat_sr_parse_eoe ( DAT_SR_CONF_ENTRY *entry) { DAT_SR_TOKEN token; - DAT_RETURN status; - - if ( DAT_SUCCESS != dat_sr_get_token (file, &token) ) - { - return DAT_INTERNAL_ERROR; - } - - if ( (DAT_SR_TOKEN_EOF != token.type) && - (DAT_SR_TOKEN_EOR != token.type) ) - { - status = DAT_INTERNAL_ERROR; - } - else - { - status = DAT_SUCCESS; - } - if ( DAT_SUCCESS != status ) - { - DAT_RETURN status_success; + if (DAT_SUCCESS != dat_sr_get_token (file, &token)) + goto bail; - status_success = dat_sr_put_token 
(file, &token); - dat_os_assert ( DAT_SUCCESS == status_success); + if ((DAT_SR_TOKEN_EOF != token.type) && + (DAT_SR_TOKEN_EOR != token.type)) { + dat_sr_put_token (file, &token); + goto bail; } + + return DAT_SUCCESS; - return status; +bail: + dat_os_dbg_print(DAT_OS_DBG_TYPE_ERROR, + " ERR: corrupt dat.conf entry field:" + " EOR, EOF, file offset=%ld\n", ftell(file)); + return DAT_INTERNAL_ERROR; } @@ -958,7 +861,8 @@ dat_sr_convert_api ( int i; int minor_i; - dat_os_assert ( 0 < dat_os_strlen (str) ); + if (dat_os_strlen(str) <= 0) + return DAT_INTERNAL_ERROR; if ( 'u' == str[0] ) { @@ -1082,8 +986,8 @@ dat_sr_convert_provider_version ( int i; int decimal_i; - dat_os_assert ( 0 < dat_os_strlen (str) ); - dat_os_assert ( NULL == provider_version->id ); + if ((dat_os_strlen(str) <= 0) || (NULL != provider_version->id)) + return DAT_INTERNAL_ERROR; status = DAT_SUCCESS; -- 1.5.2.5 From okir at lst.de Tue Apr 29 22:44:58 2008 From: okir at lst.de (Olaf Kirch) Date: Wed, 30 Apr 2008 07:44:58 +0200 Subject: [ofa-general] Re: [PATCH 2/8]: mthca/mlx4: avoid recycling old FMR R_Keys too soon In-Reply-To: References: <200804241106.57172.okir@lst.de> <200804241109.52448.okir@lst.de> Message-ID: <200804300744.59654.okir@lst.de> On Tuesday 29 April 2008 20:25:49 Roland Dreier wrote: > > Content-Transfer-Encoding: quoted-printable > > ugh, mangled patch. Argh, sorry. 
/me whacks his WIMPish mailer Olaf -- Olaf Kirch | --- o --- Nous sommes du soleil we love when we play okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax From erezz at voltaire.com Wed Apr 30 04:54:14 2008 From: erezz at voltaire.com (Erez Zilber) Date: Wed, 30 Apr 2008 14:54:14 +0300 Subject: [Stgt-devel] [ofa-general] Re: [Ips] Calculating the VA in iSER header References: <4804B03C.6060507@voltaire.com><694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com><20080416144830.GC23861@osc.edu> <694d48600804170413g4d54cd9g447abd345a1f6301@mail.gmail.com> <20080429170516.GA8857@osc.edu> Message-ID: <39C75744D164D948A170E9792AF8E7CAF60D50@exil.voltaire.com> > Hi all, > > It appears the current Linux iSER initiator does not send the HELLO message when the connection transits to full feature phase. The stgt target also ignores this message (if it were to appear). Both of these implementations use a non-conformant iSER header (they add write_va and read_va fields, which incidentally do not appear to be used). Are these changes documented anywhere in the IB domain, or are these variations needed for another reason? > > If these deviations from the RFC are not needed and were to be fixed (along with the offset fix), then these implementations can detect the current mode of operation by examining the size > of the iSER header received. The choice to proceed in the broken way, or to terminate the connection (with big loud error messages) is the implementor's choice. Either way, the issue is detected and corruption avoided. > > Thoughts? 
Take a look at the iSER for IB annex: http://www.infinibandta.org/members/spec/Annex_iSER.PDF Erez From kensandars at hotmail.com Wed Apr 30 00:43:19 2008 From: kensandars at hotmail.com (Ken Sandars) Date: Wed, 30 Apr 2008 17:43:19 +1000 Subject: ***SPAM*** RE: [Stgt-devel] [ofa-general] Re: [Ips] Calculating the VA in iSER header In-Reply-To: <20080429170516.GA8857@osc.edu> References: <4804B03C.6060507@voltaire.com> <694d48600804160122l1cc97b8aka8986ee6deb7dec8@mail.gmail.com> <20080416144830.GC23861@osc.edu> <694d48600804170413g4d54cd9g447abd345a1f6301@mail.gmail.com> <20080429170516.GA8857@osc.edu> Message-ID: Hi all, It appears the current Linux iSER initiator does not send the HELLO message when the connection transits to full feature phase. The stgt target also ignores this message (if it were to appear). Both of these implementations use a non-conformant iSER header (they add write_va and read_va fields, which incidentally do not appear to be used). Are these changes documented anywhere in the IB domain, or are these variations needed for another reason? If these deviations from the RFC are not needed and were to be fixed (along with the offset fix), then these implementations can detect the current mode of operation by examining the size of the iSER header received. The choice to proceed in the broken way, or to terminate the connection (with big loud error messages) is the implementor's choice. Either way, the issue is detected and corruption avoided. Thoughts? 
Cheers Ken > Date: Tue, 29 Apr 2008 13:05:16 -0400 > From: pw at osc.edu > To: dorfman.eli at gmail.com > CC: stgt-devel at lists.berlios.de; rdreier at cisco.com; general at lists.openfabrics.org; mako at almaden.ibm.com; ips at ietf.org; open-iscsi at googlegroups.com > Subject: Re: [Stgt-devel] [ofa-general] Re: [Ips] Calculating the VA in iSER header > > dorfman.eli at gmail.com wrote on Thu, 17 Apr 2008 14:13 +0300: > > On Wed, Apr 16, 2008 at 6:46 PM, Roland Dreier wrote: > > > > Agree with the interpretation of the spec, and it's probably a bit > > > > clearer that way too. But we have working initiators and targets > > > > that do it the "wrong" way. > > > > > > Yes... I guess the key question is whether there are any initiators that > > > do things the "right" way. > > > > > > > > > > 1. Flag day: all initiators and targets change at the same time. > > > > Will see data corruption if someone unluckily runs one or the other > > > > using old non-fixed code. > > > > > > Seems unacceptable to me... it doesn't make sense at all to break every > > > setup in the world just to be "right" according to the spec. > > > > This will break only when both initiator and target will use > > InitialR2T=No, which means allow unsolicited data. > > As far as I know, STGT is not very common (and its version in RHEL5.1 > > is considered experimental). Its default is also InitialR2T=Yes. > > Voltaire's iSCSI over iSER target also uses default InitialR2T=Yes. > > So it seems that nothing will break. > > I finally got a chance to look at this just now. I think you mean > default is InitialR2T=No above, which means no unsolicited data. > That is the default case, and true, the two different meanings > of the initiator-supplied VA coincide. > > But you missed the impact of immediate data. We run with the > defaults (I think) that say the first write request packet should be > filled with a bit of the coming data stream. 
From iscsid.conf: > > # To enable immediate data (i.e., the initiator sends unsolicited data > # with the iSCSI command packet), uncomment the following line: > # > # The default is Yes > node.session.iscsi.ImmediateData = Yes > > Looking at the offset printed out by your patch, it is indeed > non-zero for the first RDMA read. Please correct me if I am > mistaken about this---you must have tested all four variations of > with and without the patches on initiator and target side, but I did > not. > > Hence I am still a bit unhappy about having to deal with the > fallout, with no way to detect it. For our local use, I'll keep an > older version of stgt in use until we switch to a new kernel, then > merge up the target side change. It is a bother, but I can deal > with it. For other institutions, this lockstep upgrade requirement > will not be obvious until they debug the resulting data corruption. > > Still, I do understand why it would be nice to conform to the spec, > and it is maybe a bit cleaner that way too. Maybe you can help with > the bug reports on stgt-devel during the transition, and maintain > and publish a patch to let it work with old kernels. > > -- Pete > _______________________________________________ > Stgt-devel mailing list > Stgt-devel at lists.berlios.de > https://lists.berlios.de/mailman/listinfo/stgt-devel _________________________________________________________________ Find the job of your dreams before someone else does http://mycareer.com.au/?s_cid=596064 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pw at osc.edu Wed Apr 30 07:08:25 2008 From: pw at osc.edu (Pete Wyckoff) Date: Wed, 30 Apr 2008 10:08:25 -0400 Subject: [ofa-general] [PATCH] IB/iSER: Add module param to count alignment violations In-Reply-To: References: <694d48600804280510l25ee6f90t9eff86fd6743461@mail.gmail.com> Message-ID: <20080430140825.GB19339@osc.edu> rdreier at cisco.com wrote on Mon, 28 Apr 2008 08:51 -0700: > > Add read only module param to count alignment violations. > > I don't think a module parameter is the way to report statistics from > the kernel. Can't you just add a device attribute or something? Or > stick a file in debugfs? This is definitely a worthwhile change though. By monitoring this statistic we were able to get good insight into what our apps are doing to cause these alignment violations. I have a hacky patch that tries to export it via sysfs, but it doesn't clean up properly. The iscsi transport class defines the sysfs tree and doesn't give hooks to a particular device to add/change those entries, which is why this approach came out rather ugly. Hope Eli is willing to do this the right way; maybe debugfs is the way to go. -- Pete From monis at Voltaire.COM Wed Apr 30 07:12:42 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Wed, 30 Apr 2008 17:12:42 +0300 Subject: [ofa-general] [PATCH] IB/core: handle race between elements in work queues after event Message-ID: <48187E5A.7040809@Voltaire.COM> This patch solves a race between elements in work queues that are carried out after an event occurs. When the SM address handle becomes invalid and needs an update, it is set to NULL; until update_sm_ah() is called, any request that needs sm_ah is answered with an -EAGAIN return status.
Signed-off-by: Moni Levy Signed-off-by: Moni Shoua --- drivers/infiniband/core/sa_query.c | 28 ++++++++++++++++++++++++---- 1 file changed, 24 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index cf474ec..19439d8 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -407,15 +407,27 @@ static void update_sm_ah(struct work_str static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event) { + if (event->event == IB_EVENT_PORT_ERR || event->event == IB_EVENT_PORT_ACTIVE || event->event == IB_EVENT_LID_CHANGE || event->event == IB_EVENT_PKEY_CHANGE || event->event == IB_EVENT_SM_CHANGE || event->event == IB_EVENT_CLIENT_REREGISTER) { - struct ib_sa_device *sa_dev; - sa_dev = container_of(handler, typeof(*sa_dev), event_handler); - + unsigned long flags; + struct ib_sa_device *sa_dev = + container_of(handler, typeof(*sa_dev), event_handler); + struct ib_sa_port *port = + &sa_dev->port[event->element.port_num - sa_dev->start_port]; + struct ib_sa_sm_ah *sm_ah; + + spin_lock_irqsave(&port->ah_lock, flags); + sm_ah = port->sm_ah; + port->sm_ah = NULL; + spin_unlock_irqrestore(&port->ah_lock, flags); + + if (sm_ah) + kref_put(&sm_ah->ref, free_sm_ah); schedule_work(&sa_dev->port[event->element.port_num - sa_dev->start_port].update_task); } @@ -780,6 +792,10 @@ int ib_sa_service_rec_query(struct ib_sa return -ENODEV; port = &sa_dev->port[port_num - sa_dev->start_port]; + if (!port->sm_ah) { + return -EAGAIN; + } + agent = port->agent; if (method != IB_MGMT_METHOD_GET && @@ -877,8 +893,12 @@ int ib_sa_mcmember_rec_query(struct ib_s return -ENODEV; port = &sa_dev->port[port_num - sa_dev->start_port]; - agent = port->agent; + if (!port->sm_ah) { + return -EAGAIN; + } + agent = port->agent; + query = kmalloc(sizeof *query, gfp_mask); if (!query) return -ENOMEM; From ogerlitz at voltaire.com Wed Apr 30 07:16:40 2008 From: ogerlitz at voltaire.com (Or 
Gerlitz) Date: Wed, 30 Apr 2008 17:16:40 +0300 Subject: [ofa-general] Re: [PATCH] IB/core: handle race between elements in work queues after event In-Reply-To: <48187E5A.7040809@Voltaire.COM> References: <48187E5A.7040809@Voltaire.COM> Message-ID: <48187F48.1090701@voltaire.com> Moni Shoua wrote: > any request that needs sm_ah is replied with -EAGAIN return status. what about ib_sa_path_rec_get() Or. From monis at Voltaire.COM Wed Apr 30 07:37:06 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Wed, 30 Apr 2008 17:37:06 +0300 Subject: [ofa-general] Re: [PATCH] IB/core: handle race between elements in work queues after event In-Reply-To: <48187F48.1090701@voltaire.com> References: <48187E5A.7040809@Voltaire.COM> <48187F48.1090701@voltaire.com> Message-ID: <48188412.305@Voltaire.COM> Or Gerlitz wrote: > Moni Shoua wrote: >> any request that needs sm_ah is replied with -EAGAIN return status. > what about ib_sa_path_rec_get() Could you please be more specific? What did I miss? From monis at Voltaire.COM Wed Apr 30 07:43:49 2008 From: monis at Voltaire.COM (Moni Shoua) Date: Wed, 30 Apr 2008 17:43:49 +0300 Subject: [ofa-general] Re: [PATCH] IB/core: handle race between elements in work queues after event In-Reply-To: <48188412.305@Voltaire.COM> References: <48187E5A.7040809@Voltaire.COM> <48187F48.1090701@voltaire.com> <48188412.305@Voltaire.COM> Message-ID: <481885A5.3070001@Voltaire.COM>
thanks From daniela at georgex.org Wed Apr 30 07:44:57 2008 From: daniela at georgex.org (Daniela George) Date: Wed, 30 Apr 2008 07:44:57 -0700 Subject: [ofa-general] Re: HP PCI-X 2-port 4X Fabric (HPC) Adapter In-Reply-To: References: Message-ID: <1209566697.6137.32.camel@blue> David, The best place to address this question is to the OpenFabrics general list (general at lists.openfabrics.org). I have cc'd that list. Thanks, Daniela On Wed, 2008-04-30 at 09:17 -0400, Shue, David CTR USAF AFMC AFRL/RITB wrote: > I have used the OFED-1.3 software to communicate to the Mellanox HPC I > use. However, the OFED-1.3 does not appear to work with the subject > HPC card. The card is an HPC 380299-B21. Is there any information > you may provide in how to communicate to this card? > > > > Thank you. > > > > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > > David Shue > > Systems Specialist > > Computer Sciences Corporation > > <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< > > > > From rdreier at cisco.com Wed Apr 30 07:59:49 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 07:59:49 -0700 Subject: [ofa-general] Re: HP PCI-X 2-port 4X Fabric (HPC) Adapter In-Reply-To: <1209566697.6137.32.camel@blue> (Daniela George's message of "Wed, 30 Apr 2008 07:44:57 -0700") References: <1209566697.6137.32.camel@blue> Message-ID: > > I have used the OFED-1.3 software to communicate to the Mellanox HPC I > > use. However, the OFED-1.3 does not appear to work with the subject > > HPC card. The card is an HPC 380299-B21. Is there any information > > you may provide in how to communicate to this card? What does lspci -vvvnn show for this card? What do you mean by "does not appear to work"? How does it fail exactly? - R. 
From pw at osc.edu Wed Apr 30 08:01:16 2008 From: pw at osc.edu (Pete Wyckoff) Date: Wed, 30 Apr 2008 11:01:16 -0400 Subject: [ofa-general] [PATCH] IB/iSER: Count fmr alignment violations per session In-Reply-To: <694d48600804290033k61f717f7ob97d33b27e4c236f@mail.gmail.com> References: <694d48600804290033k61f717f7ob97d33b27e4c236f@mail.gmail.com> Message-ID: <20080430150116.GA22791@osc.edu> dorfman.eli at gmail.com wrote on Tue, 29 Apr 2008 10:33 +0300: > Count fmr alignment violations per session > as part of the iscsi statistics. > > Signed-off-by: Eli Dorfman Brilliant. Thanks for this. -- Pete From rdreier at cisco.com Wed Apr 30 08:07:23 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 08:07:23 -0700 Subject: [ofa-general] Re: [PATCH] IB/core: handle race between elements in work queues after event In-Reply-To: <48187E5A.7040809@Voltaire.COM> (Moni Shoua's message of "Wed, 30 Apr 2008 17:12:42 +0300") References: <48187E5A.7040809@Voltaire.COM> Message-ID: > This patch solves a race between elements in work queues that are > carried out after an event occurs. When SM address handle becomes > invalid and needs an update it is set to NULL and until update_sm_ah() > is called, any request that needs sm_ah is replied with -EAGAIN return > status. What is the race? What is the effect of the race? Don't expect me to be psychic and guess what you're fixing. And if there is more information in an email thread or bugzilla entry, please include a link to it. Can this race between work queue entries be solved in a simpler way just by using a single-threaded workqueue? Your patch doesn't seem to change any consumers of this code. How do they cope with a -EAGAIN return value? > static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event) > { > + > if (event->event == IB_EVENT_PORT_ERR || No need to add a blank line here. > + if (!port->sm_ah) { > + return -EAGAIN; > + } No need for braces here.
> + agent = port->agent; > + > query = kmalloc(sizeof *query, gfp_mask); blank line has trailing whitespace. Please investigate using checkpatch.pl. - R. From eli at dev.mellanox.co.il Wed Apr 30 09:06:26 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 30 Apr 2008 19:06:26 +0300 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: References: <1209370487.11248.1.camel@mtls03> Message-ID: <1209571586.1790.5.camel@mtls03> On Tue, 2008-04-29 at 14:49 -0700, Roland Dreier wrote: > By the way, this isn't just theoretical -- I'm not smart enough to > realize this except that I just saw: > > ib1: TX ring full, stopping kernel net queue > NETDEV WATCHDOG: ib1: transmit timed out > ib1: transmit timeout: latency 1240 msecs > ib1: queue stopped 1, tx_head 5291313, tx_tail 5291255 > > and of course it never recovers. I started working on a fix for this by arming the send CQ when the QP reaches 63 outstanding requests and draining the CQ at the completion handler while holding priv->tx_lock. But I had another strange problem that I don't understand. If I just load and unload ib_ipoib, the system crashes showing messages that appear like there has been a memory corruption. If I comment out destroying the send CQ at ipoib_transport_dev_cleanup() the crashes disappear. Do you see this as well? From rdreier at cisco.com Wed Apr 30 09:14:10 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 09:14:10 -0700 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: <1209571586.1790.5.camel@mtls03> (Eli Cohen's message of "Wed, 30 Apr 2008 19:06:26 +0300") References: <1209370487.11248.1.camel@mtls03> <1209571586.1790.5.camel@mtls03> Message-ID: > I started working on a fix for this by arming the send CQ when the QP > reaches 63 outstanding requests and draining the CQ at the completion > handler while holding priv->tx_lock. 
OK (I hope 63 is replaced with something that is computed based on other constants though -- like you could arm the CQ when you're about to do netif_stop_queue())... seems like it should work. > But I had another strange problem that I don't understand. If I just > load and unload ib_ipoib, the system crashes showing messages that > appear like there has been a memory corruption. If I comment out > destroying the send CQ at ipoib_transport_dev_cleanup() the crashes > disappear. Do you see this as well? Not here... what tree are you running? - R. From rdreier at cisco.com Wed Apr 30 09:15:32 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 09:15:32 -0700 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: <1209571586.1790.5.camel@mtls03> (Eli Cohen's message of "Wed, 30 Apr 2008 19:06:26 +0300") References: <1209370487.11248.1.camel@mtls03> <1209571586.1790.5.camel@mtls03> Message-ID: > But I had another strange problem that I don't understand. If I just > load and unload ib_ipoib, the system crashes showing messages that > appear like there has been a memory corruption. If I comment out > destroying the send CQ at ipoib_transport_dev_cleanup() the crashes > disappear. Do you see this as well? Actually maybe I just saw this happen -- it did look like memory corruption but it wasn't an immediate crash. Will try to investigate. - R. From rdreier at cisco.com Wed Apr 30 09:16:40 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 09:16:40 -0700 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: (Roland Dreier's message of "Wed, 30 Apr 2008 09:15:32 -0700") References: <1209370487.11248.1.camel@mtls03> <1209571586.1790.5.camel@mtls03> Message-ID: > Actually maybe I just saw this happen -- it did look like memory > corruption but it wasn't an immediate crash. By the way, what kind of HCA are you using in your system? mthca or mlx4? - R. 
From eli at dev.mellanox.co.il Wed Apr 30 09:22:30 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 30 Apr 2008 19:22:30 +0300 Subject: [ofa-general] Re: [PATCH v2] IB/ipoib: Split CQs for IPOIB UD In-Reply-To: References: <1209370487.11248.1.camel@mtls03> <1209571586.1790.5.camel@mtls03> Message-ID: <1209572550.1790.7.camel@mtls03> On Wed, 2008-04-30 at 09:16 -0700, Roland Dreier wrote: > > Actually maybe I just saw this happen -- it did look like memory > > corruption but it wasn't an immediate crash. > > By the way, what kind of HCA are you using in your system? mthca or > mlx4? > I have both ConnectX and Arbel. From michaelc at cs.wisc.edu Wed Apr 30 09:27:22 2008 From: michaelc at cs.wisc.edu (Mike Christie) Date: Wed, 30 Apr 2008 11:27:22 -0500 Subject: [ofa-general] [PATCH] IB/iSER: Add module param to count alignment violations In-Reply-To: <20080430140825.GB19339@osc.edu> References: <694d48600804280510l25ee6f90t9eff86fd6743461@mail.gmail.com> <20080430140825.GB19339@osc.edu> Message-ID: <48189DEA.50108@cs.wisc.edu> Pete Wyckoff wrote: > rdreier at cisco.com wrote on Mon, 28 Apr 2008 08:51 -0700: >> > Add read only module param to count alignment violations. >> >> I don't think a module parameter is the way to report statistics from >> the kernel. Can't you just add a device attribute or something? Or >> stick a file in debugfs? > > This is definitely a worthwhile change though. By monitoring this > statistic we were able to get good insight to what our apps are > doing to cause these alignment violations. > > I have a hacky patch that tries to export it via sysfs, but it > doesn't clean up properly. The iscsi transport class defines the > sysfs tree and doesn't give hooks to a particular device to > add/change those entries, which is why this approach came out rather > ugly. Hope Eli is willing to do this the right way; maybe debugfs > is the way to go. > We have iscsi stats already. I thought Eli sent a patch to put this there already? 
If not then put this in the get_stats callout as one of the iser custom values. From michaelc at cs.wisc.edu Wed Apr 30 09:30:23 2008 From: michaelc at cs.wisc.edu (Mike Christie) Date: Wed, 30 Apr 2008 11:30:23 -0500 Subject: [ofa-general] [PATCH] IB/iSER: Add module param to count alignment violations In-Reply-To: <48189DEA.50108@cs.wisc.edu> References: <694d48600804280510l25ee6f90t9eff86fd6743461@mail.gmail.com> <20080430140825.GB19339@osc.edu> <48189DEA.50108@cs.wisc.edu> Message-ID: <48189E9F.1020606@cs.wisc.edu> Mike Christie wrote: > Pete Wyckoff wrote: >> rdreier at cisco.com wrote on Mon, 28 Apr 2008 08:51 -0700: >>> > Add read only module param to count alignment violations. >>> >>> I don't think a module parameter is the way to report statistics from >>> the kernel. Can't you just add a device attribute or something? Or >>> stick a file in debugfs? >> This is definitely a worthwhile change though. By monitoring this >> statistic we were able to get good insight to what our apps are >> doing to cause these alignment violations. >> >> I have a hacky patch that tries to export it via sysfs, but it >> doesn't clean up properly. The iscsi transport class defines the >> sysfs tree and doesn't give hooks to a particular device to >> add/change those entries, which is why this approach came out rather >> ugly. Hope Eli is willing to do this the right way; maybe debugfs >> is the way to go. >> > > We have iscsi stats already. I thought Eli sent a patch to put this > there already? If not then put this in the get_stats callout as one of > the iser custom values. > Nevermind I see the other mail. 
From sean.hefty at intel.com Wed Apr 30 09:38:13 2008 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 30 Apr 2008 09:38:13 -0700 Subject: [ofa-general] beginner resources In-Reply-To: <6978b4af0804230620p560c33c5hfa8385a57bbed80c@mail.gmail.com> References: <6978b4af0804230620p560c33c5hfa8385a57bbed80c@mail.gmail.com> Message-ID: <000a01c8aae0$95d30550$f2d8180a@amr.corp.intel.com> I had a look at the rping example and I'm trying to use Roland Dreier's examples. But my example simply doesn't work. I'm totally new to this so please bear with me. If someone has time to have a look at http://pastebin.com/m708b032c and http://pastebin.com/m13673097 It would be helpful if you explained what the problem is, and post the relevant code directly to the list. - Sean -------------- next part -------------- An HTML attachment was scrubbed... URL: From swise at opengridcomputing.com Wed Apr 30 09:45:55 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 30 Apr 2008 11:45:55 -0500 Subject: [ofa-general] iwarp-specific async events Message-ID: <4818A243.1090201@opengridcomputing.com> Hey Roland, I'm looking for a good way to trigger iwarp QP flushing on a normal disconnect for user mode QPs. The async event notification provider ops function is one way I can do it easily with the current infrastructure, if we add some new event types. For example, if a fatal error occurs on a QP which causes the connection to be aborted, then the kernel driver will mark the user qp as "in error" and post a FATAL_QP event. When the app reaps that event, the libcxgb3 async event ops function will flush the user's qp. However, for a normal non-fatal close, no async event is posted. But one should be. The iWARP verbs specify many async event types that I think we need to add at some point. Case in point: LLP Close Complete (qp event) - The TCP connection completed and no SQ WQEs were flushed (normal close) There is a whole slew of other events.
The above event, however, is key in that libcxgb3 could trigger a qp flush when this event is reaped by the application. Currently, the flushing of the QP is only triggered by fatal connection errors as described above and/or if the application tries to post on a QP that has been marked in error by the kernel. However, if the app does neither, then the flush never happens. There are other ways to tackle this cxgb3 problem: - enabling the providers to get a callback on rdma-cm event reaping. So reaping the DISCONNECTED event would cause the qp to be flushed. - I could hack this into the cxgb3 provider kernel driver so it can mark a user mode CQ with state that tells it to go flush any QPs it owns that are in error. Thus the next time the application polls, the poll logic would go flush any qps in error. I'm opting for the simplest change, which I think is adding new async events and changing the iwarp driver to post them at the right times. Thoughts? Thanks, Steve. From swise at opengridcomputing.com Wed Apr 30 09:51:14 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 30 Apr 2008 11:51:14 -0500 Subject: [ofa-general] beginner resources In-Reply-To: <000a01c8aae0$95d30550$f2d8180a@amr.corp.intel.com> References: <6978b4af0804230620p560c33c5hfa8385a57bbed80c@mail.gmail.com> <000a01c8aae0$95d30550$f2d8180a@amr.corp.intel.com> Message-ID: <4818A382.9060601@opengridcomputing.com> Is it time for someone to write an RDMA programming book? Do we have enough buyers yet? :) Steve. Sean Hefty wrote: > I had a look at the rping example and I'm trying to use Roland Dreier's examples. > > But my example simply doesn't work. I'm totally new to this so please bear with me. > > If someone has time to have a look at http://pastebin.com/m708b032c and http://pastebin.com/m13673097 > > > > > It would be helpful if you explained what the problem is, and post the relevant code directly to the list.
> > - Sean > ------------------------------------------------------------------------ > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rdreier at cisco.com Wed Apr 30 10:13:15 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 10:13:15 -0700 Subject: [ofa-general] Re: iwarp-specific async events In-Reply-To: <4818A243.1090201@opengridcomputing.com> (Steve Wise's message of "Wed, 30 Apr 2008 11:45:55 -0500") References: <4818A243.1090201@opengridcomputing.com> Message-ID: > I'm looking for a good way to trigger iwarp QP flushing on a normal > disconnect for user mode QPs. The async event notification provider > ops function is one way I can do it easily with the currently > infrastructure, if we add some new event types. For example, if a > fatal error occurs on a QP which causes the connection to be aborted, > then the kernel driver will mark the user qp as "in error" and post a > FATAL_QP event. When the app reaps that event, the libcxgb3 async > event ops function will flush the user's qp. However for a normal non > fatal close, no async event is posted. But one should be. The iWARP > verbs specify many async event types that I think we need to add at > some point. Case in point: > > LLP Close Complete (qp event) - The TCP connection completed and no > SQ WQEs were flushed (normal close) Yeah, it makes sense just to add any iWARP events that make sense and don't fit the existing set of IB events. We already have IB-specific stuff for path migration etc. > There is a whole slew of other events. The above event, however, is > key in that libcxgb3 could trigger a qp flush when this event is > reaped by the application. 
Currently, the flushing of the QP is only > triggered by fatal connection errors as described above and/or if the > application tries to post on a QP that has been marked in error by the > kernel. However, if the app does neither, then the flush never > happens. On the other hand, how does cxgb3 know when an application has reaped the event? Do we need to add code to the uverbs module to know when an async event has reached userspace? - R. From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:15:52 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:45:52 +0530 Subject: [ofa-general] [PATCH 00/13] QLogic Virtual NIC (VNIC) Driver Message-ID: <20080430171028.31725.86190.stgit@localhost.localdomain> Roland, This is the QLogic Virtual NIC driver patch series which has been tested against your for-2.6.26 and for-2.6.27 branches. We intended these patches to make it to the 2.6.26 kernel, but if it is too late for the 2.6.26 merge window please consider them for 2.6.27. This patch series adds the QLogic Virtual NIC (VNIC) driver which works in conjunction with the QLogic Ethernet Virtual I/O Controller (EVIC) hardware. The VNIC driver, along with the QLogic EVIC's two 10 Gigabit Ethernet ports, enables InfiniBand clusters to connect to Ethernet networks. This driver also works with the earlier version of the I/O Controller, the VEx. The QLogic VNIC driver creates virtual Ethernet interfaces and tunnels the Ethernet data to/from the EVIC over InfiniBand using an InfiniBand reliable connection. The driver compiles cleanly with sparse endianness checking enabled. We have also tested the driver with lockdep checking enabled. We have run these patches through checkpatch.pl and the only warnings are related to lines slightly longer than 80 columns in some of the statements. The driver itself has been tested with long duration iperf, netperf TCP, UDP streams.
--- [PATCH 01/13] QLogic VNIC: Driver - netdev implementation [PATCH 02/13] QLogic VNIC: Netpath - abstraction of connection to EVIC/VEx [PATCH 03/13] QLogic VNIC: Implementation of communication protocol with EVIC/VEx [PATCH 04/13] QLogic VNIC: Implementation of Control path of communication protocol [PATCH 05/13] QLogic VNIC: Implementation of Data path of communication protocol [PATCH 06/13] QLogic VNIC: IB core stack interaction [PATCH 07/13] QLogic VNIC: Handling configurable parameters of the driver [PATCH 08/13] QLogic VNIC: sysfs interface implementation for the driver [PATCH 09/13] QLogic VNIC: IB Multicast for Ethernet broadcast/multicast [PATCH 10/13] QLogic VNIC: Driver Statistics collection [PATCH 11/13] QLogic VNIC: Driver utility file - implements various utility macros [PATCH 12/13] QLogic VNIC: Driver Kconfig and Makefile. [PATCH 13/13] QLogic VNIC: Modifications to IB Kconfig and Makefile drivers/infiniband/Kconfig | 2 drivers/infiniband/Makefile | 1 drivers/infiniband/ulp/qlgc_vnic/Kconfig | 28 drivers/infiniband/ulp/qlgc_vnic/Makefile | 13 drivers/infiniband/ulp/qlgc_vnic/vnic_config.c | 380 +++ drivers/infiniband/ulp/qlgc_vnic/vnic_config.h | 242 ++ drivers/infiniband/ulp/qlgc_vnic/vnic_control.c | 2288 ++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_control.h | 180 ++ .../infiniband/ulp/qlgc_vnic/vnic_control_pkt.h | 368 +++ drivers/infiniband/ulp/qlgc_vnic/vnic_data.c | 1473 +++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_data.h | 206 ++ drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c | 1046 +++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h | 206 ++ drivers/infiniband/ulp/qlgc_vnic/vnic_main.c | 1052 +++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_main.h | 167 + drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.c | 332 +++ drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.h | 76 + drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c | 112 + drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h | 80 + 
drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c | 234 ++ drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h | 497 ++++ drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c | 1127 ++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h | 62 + drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h | 103 + drivers/infiniband/ulp/qlgc_vnic/vnic_util.h | 251 ++ drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c | 1233 +++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h | 176 ++ 27 files changed, 11935 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/Kconfig create mode 100644 drivers/infiniband/ulp/qlgc_vnic/Makefile create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_config.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_config.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_control.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_control.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_control_pkt.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_data.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_data.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_main.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_main.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h create mode 100644 
drivers/infiniband/ulp/qlgc_vnic/vnic_util.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h -- Regards, Ram From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:16:24 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:46:24 +0530 Subject: [ofa-general] [PATCH 01/13] QLogic VNIC: Driver - netdev implementation In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171624.31725.98475.stgit@localhost.localdomain> From: Ramachandra K QLogic Virtual NIC Driver. This patch implements netdev registration, netdev functions and state maintenance of the QLogic Virtual NIC corresponding to the various events associated with the QLogic Ethernet Virtual I/O Controller (EVIC/VEx) connection. Signed-off-by: Poornima Kamath Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_main.c | 1052 ++++++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_main.h | 167 ++++ 2 files changed, 1219 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_main.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_main.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_main.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_main.c new file mode 100644 index 0000000..393c79a --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_main.c @@ -0,0 +1,1052 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "vnic_util.h" +#include "vnic_main.h" +#include "vnic_netpath.h" +#include "vnic_viport.h" +#include "vnic_ib.h" +#include "vnic_stats.h" + +#define MODULEVERSION "1.3.0.0.4" +#define MODULEDETAILS \ + "QLogic Corp. 
Virtual NIC (VNIC) driver version " MODULEVERSION + +MODULE_AUTHOR("QLogic Corp."); +MODULE_DESCRIPTION(MODULEDETAILS); +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_SUPPORTED_DEVICE("QLogic Ethernet Virtual I/O Controller"); + +u32 vnic_debug; + +module_param(vnic_debug, uint, 0444); +MODULE_PARM_DESC(vnic_debug, "Enable debug tracing if > 0"); + +LIST_HEAD(vnic_list); + +static DECLARE_WAIT_QUEUE_HEAD(vnic_npevent_queue); +static LIST_HEAD(vnic_npevent_list); +static DECLARE_COMPLETION(vnic_npevent_thread_exit); +static spinlock_t vnic_npevent_list_lock; +static struct task_struct *vnic_npevent_thread; +static int vnic_npevent_thread_end; + + +void vnic_connected(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_connected()\n"); + if (netpath->second_bias) + vnic_npevent_queue_evt(netpath, VNIC_SECNP_CONNECTED); + else + vnic_npevent_queue_evt(netpath, VNIC_PRINP_CONNECTED); + + vnic_connected_stats(vnic); +} + +void vnic_disconnected(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_disconnected()\n"); + if (netpath->second_bias) + vnic_npevent_queue_evt(netpath, VNIC_SECNP_DISCONNECTED); + else + vnic_npevent_queue_evt(netpath, VNIC_PRINP_DISCONNECTED); +} + +void vnic_link_up(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_link_up()\n"); + if (netpath->second_bias) + vnic_npevent_queue_evt(netpath, VNIC_SECNP_LINKUP); + else + vnic_npevent_queue_evt(netpath, VNIC_PRINP_LINKUP); +} + +void vnic_link_down(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_link_down()\n"); + if (netpath->second_bias) + vnic_npevent_queue_evt(netpath, VNIC_SECNP_LINKDOWN); + else + vnic_npevent_queue_evt(netpath, VNIC_PRINP_LINKDOWN); +} + +void vnic_stop_xmit(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_stop_xmit()\n"); + if (netpath == vnic->current_path) { + if (vnic->xmit_started) { + netif_stop_queue(vnic->netdevice); + vnic->xmit_started = 0; + } + + vnic_stop_xmit_stats(vnic); + 
} +} + +void vnic_restart_xmit(struct vnic *vnic, struct netpath *netpath) +{ + VNIC_FUNCTION("vnic_restart_xmit()\n"); + if (netpath == vnic->current_path) { + if (!vnic->xmit_started) { + netif_wake_queue(vnic->netdevice); + vnic->xmit_started = 1; + } + + vnic_restart_xmit_stats(vnic); + } +} + +void vnic_recv_packet(struct vnic *vnic, struct netpath *netpath, + struct sk_buff *skb) +{ + VNIC_FUNCTION("vnic_recv_packet()\n"); + if ((netpath != vnic->current_path) || !vnic->open) { + VNIC_INFO("tossing packet\n"); + dev_kfree_skb(skb); + return; + } + + vnic->netdevice->last_rx = jiffies; + skb->dev = vnic->netdevice; + skb->protocol = eth_type_trans(skb, skb->dev); + if (!vnic->config->use_rx_csum) + skb->ip_summed = CHECKSUM_NONE; + netif_rx(skb); + vnic_recv_pkt_stats(vnic); +} + +static struct net_device_stats *vnic_get_stats(struct net_device *device) +{ + struct vnic *vnic; + struct netpath *np; + + VNIC_FUNCTION("vnic_get_stats()\n"); + vnic = (struct vnic *)device->priv; + + np = vnic->current_path; + if (np && np->viport && !np->cleanup_started) + viport_get_stats(np->viport, &vnic->stats); + return &vnic->stats; +} + +static int vnic_open(struct net_device *device) +{ + struct vnic *vnic; + + VNIC_FUNCTION("vnic_open()\n"); + vnic = (struct vnic *)device->priv; + + vnic->open++; + vnic_npevent_queue_evt(&vnic->primary_path, VNIC_PRINP_SETLINK); + vnic->xmit_started = 1; + netif_start_queue(vnic->netdevice); + + return 0; +} + +static int vnic_stop(struct net_device *device) +{ + struct vnic *vnic; + int ret = 0; + + VNIC_FUNCTION("vnic_stop()\n"); + vnic = (struct vnic *)device->priv; + netif_stop_queue(device); + vnic->xmit_started = 0; + vnic->open--; + vnic_npevent_queue_evt(&vnic->primary_path, VNIC_PRINP_SETLINK); + + return ret; +} + +static int vnic_hard_start_xmit(struct sk_buff *skb, + struct net_device *device) +{ + struct vnic *vnic; + struct netpath *np; + cycles_t xmit_time; + int ret = -1; + + VNIC_FUNCTION("vnic_hard_start_xmit()\n"); + 
vnic = (struct vnic *)device->priv; + np = vnic->current_path; + + vnic_pre_pkt_xmit_stats(&xmit_time); + + if (np && np->viport) + ret = viport_xmit_packet(np->viport, skb); + + if (ret) { + vnic_xmit_fail_stats(vnic); + dev_kfree_skb_any(skb); + vnic->stats.tx_dropped++; + goto out; + } + + device->trans_start = jiffies; + vnic_post_pkt_xmit_stats(vnic, xmit_time); +out: + return 0; +} + +static void vnic_tx_timeout(struct net_device *device) +{ + struct vnic *vnic; + + VNIC_FUNCTION("vnic_tx_timeout()\n"); + vnic = (struct vnic *)device->priv; + device->trans_start = jiffies; + + if (vnic->current_path->viport) + viport_failure(vnic->current_path->viport); + + VNIC_ERROR("vnic_tx_timeout\n"); +} + +static void vnic_set_multicast_list(struct net_device *device) +{ + struct vnic *vnic; + unsigned long flags; + + VNIC_FUNCTION("vnic_set_multicast_list()\n"); + vnic = (struct vnic *)device->priv; + + spin_lock_irqsave(&vnic->lock, flags); + if (device->mc_count == 0) { + if (vnic->mc_list_len) { + vnic->mc_list_len = vnic->mc_count = 0; + kfree(vnic->mc_list); + } + } else { + struct dev_mc_list *mc_list = device->mc_list; + int i; + + if (device->mc_count > vnic->mc_list_len) { + if (vnic->mc_list_len) + kfree(vnic->mc_list); + vnic->mc_list_len = device->mc_count + 10; + vnic->mc_list = kmalloc(vnic->mc_list_len * + sizeof *mc_list, GFP_ATOMIC); + if (!vnic->mc_list) { + vnic->mc_list_len = vnic->mc_count = 0; + VNIC_ERROR("failed allocating mc_list\n"); + goto failure; + } + } + vnic->mc_count = device->mc_count; + for (i = 0; i < device->mc_count; i++) { + vnic->mc_list[i] = *mc_list; + vnic->mc_list[i].next = &vnic->mc_list[i + 1]; + mc_list = mc_list->next; + } + } + spin_unlock_irqrestore(&vnic->lock, flags); + + if (vnic->primary_path.viport) + viport_set_multicast(vnic->primary_path.viport, + vnic->mc_list, vnic->mc_count); + + if (vnic->secondary_path.viport) + viport_set_multicast(vnic->secondary_path.viport, + vnic->mc_list, vnic->mc_count); + + 
vnic_npevent_queue_evt(&vnic->primary_path, VNIC_PRINP_SETLINK); + return; +failure: + spin_unlock_irqrestore(&vnic->lock, flags); +} + +/** + * Following set of functions queues up the events for EVIC and the + * kernel thread queuing up the event might return. + */ +static int vnic_set_mac_address(struct net_device *device, void *addr) +{ + struct vnic *vnic; + struct sockaddr *sockaddr = addr; + u8 *address; + int ret = -1; + + VNIC_FUNCTION("vnic_set_mac_address()\n"); + vnic = (struct vnic *)device->priv; + + if (!is_valid_ether_addr(sockaddr->sa_data)) + return -EADDRNOTAVAIL; + + if (netif_running(device)) + return -EBUSY; + + memcpy(device->dev_addr, sockaddr->sa_data, ETH_ALEN); + address = sockaddr->sa_data; + + if (vnic->primary_path.viport) + ret = viport_set_unicast(vnic->primary_path.viport, + address); + + if (ret) + return ret; + + if (vnic->secondary_path.viport) + viport_set_unicast(vnic->secondary_path.viport, address); + + vnic->mac_set = 1; + return 0; +} + +static int vnic_change_mtu(struct net_device *device, int mtu) +{ + struct vnic *vnic; + int ret = 0; + int pri_max_mtu; + int sec_max_mtu; + + VNIC_FUNCTION("vnic_change_mtu()\n"); + vnic = (struct vnic *)device->priv; + + if (vnic->primary_path.viport) + pri_max_mtu = viport_max_mtu(vnic->primary_path.viport); + else + pri_max_mtu = MAX_PARAM_VALUE; + + if (vnic->secondary_path.viport) + sec_max_mtu = viport_max_mtu(vnic->secondary_path.viport); + else + sec_max_mtu = MAX_PARAM_VALUE; + + if ((mtu < pri_max_mtu) && (mtu < sec_max_mtu)) { + device->mtu = mtu; + vnic_npevent_queue_evt(&vnic->primary_path, + VNIC_PRINP_SETLINK); + vnic_npevent_queue_evt(&vnic->secondary_path, + VNIC_SECNP_SETLINK); + } else if (pri_max_mtu < sec_max_mtu) + printk(KERN_WARNING PFX "%s: Maximum " + "supported MTU size is %d. " + "Cannot set MTU to %d\n", + vnic->config->name, pri_max_mtu, mtu); + else + printk(KERN_WARNING PFX "%s: Maximum " + "supported MTU size is %d. 
" + "Cannot set MTU to %d\n", + vnic->config->name, sec_max_mtu, mtu); + + return ret; +} + +static int vnic_npevent_register(struct vnic *vnic, struct netpath *netpath) +{ + u8 *address; + int ret; + + if (!vnic->mac_set) { + /* if netpath == secondary_path, then the primary path isn't + * connected. MAC address will be set when the primary + * connects. + */ + netpath_get_hw_addr(netpath, vnic->netdevice->dev_addr); + address = vnic->netdevice->dev_addr; + + if (vnic->secondary_path.viport) + viport_set_unicast(vnic->secondary_path.viport, + address); + + vnic->mac_set = 1; + } + ret = register_netdev(vnic->netdevice); + if (ret) { + printk(KERN_ERR PFX "%s failed registering netdev " + "error %d - calling viport_failure\n", + config_viport_name(vnic->primary_path.viport->config), + ret); + vnic_free(vnic); + printk(KERN_ERR PFX "%s DELETED : register_netdev failure\n", + config_viport_name(vnic->primary_path.viport->config)); + return ret; + } + + vnic->state = VNIC_REGISTERED; + vnic->carrier = 2; /*special value to force netif_carrier_(on|off)*/ + return 0; +} + +static void vnic_npevent_dequeue_all(struct vnic *vnic) +{ + unsigned long flags; + struct vnic_npevent *npevt, *tmp; + + spin_lock_irqsave(&vnic_npevent_list_lock, flags); + if (list_empty(&vnic_npevent_list)) + goto out; + list_for_each_entry_safe(npevt, tmp, &vnic_npevent_list, + list_ptrs) { + if ((npevt->vnic == vnic)) { + list_del(&npevt->list_ptrs); + kfree(npevt); + } + } +out: + spin_unlock_irqrestore(&vnic_npevent_list_lock, flags); +} + +static void update_path_and_reconnect(struct netpath *netpath, + struct vnic *vnic) +{ + struct viport_config *config = netpath->viport->config; + int delay = 1; + + if (vnic_ib_get_path(netpath, vnic)) + return; + /* + * tell viport_connect to wait for default_no_path_timeout + * before connecting if we are retrying the same path index + * within default_no_path_timeout. 
+ * This prevents flooding connect requests to a path (or set + * of paths) that aren't successfully connecting for some reason. + */ + if (jiffies > netpath->connect_time + + vnic->config->no_path_timeout) { + netpath->path_idx = config->path_idx; + netpath->connect_time = jiffies; + netpath->delay_reconnect = 0; + delay = 0; + } else if (config->path_idx != netpath->path_idx) { + delay = netpath->delay_reconnect; + netpath->path_idx = config->path_idx; + netpath->delay_reconnect = 1; + } else + delay = 1; + viport_connect(netpath->viport, delay); +} + +static void vnic_set_uni_multicast(struct vnic *vnic, + struct netpath *netpath) +{ + unsigned long flags; + u8 *address; + + if (vnic->mac_set) { + address = vnic->netdevice->dev_addr; + + if (netpath->viport) + viport_set_unicast(netpath->viport, address); + } + spin_lock_irqsave(&vnic->lock, flags); + + if (vnic->mc_list && netpath->viport) + viport_set_multicast(netpath->viport, vnic->mc_list, + vnic->mc_count); + + spin_unlock_irqrestore(&vnic->lock, flags); + if (vnic->state == VNIC_REGISTERED) { + if (!netpath->viport) + return; + viport_set_link(netpath->viport, + vnic->netdevice->flags & ~IFF_UP, + vnic->netdevice->mtu); + } +} + +static void vnic_set_netpath_timers(struct vnic *vnic, + struct netpath *netpath) +{ + switch (netpath->timer_state) { + case NETPATH_TS_IDLE: + netpath->timer_state = NETPATH_TS_ACTIVE; + if (vnic->state == VNIC_UNINITIALIZED) + netpath_timer(netpath, + vnic->config-> + primary_connect_timeout); + else + netpath_timer(netpath, + vnic->config-> + primary_reconnect_timeout); + break; + case NETPATH_TS_ACTIVE: + /*nothing to do*/ + break; + case NETPATH_TS_EXPIRED: + if (vnic->state == VNIC_UNINITIALIZED) + vnic_npevent_register(vnic, netpath); + + break; + } +} + +static void vnic_check_primary_path_timer(struct vnic *vnic) +{ + switch (vnic->primary_path.timer_state) { + case NETPATH_TS_ACTIVE: + /* nothing to do. 
just wait */ + break; + case NETPATH_TS_IDLE: + netpath_timer(&vnic->primary_path, + vnic->config-> + primary_switch_timeout); + break; + case NETPATH_TS_EXPIRED: + printk(KERN_INFO PFX + "%s: switching to primary path\n", + vnic->config->name); + + vnic->current_path = &vnic->primary_path; + if (vnic->config->use_tx_csum + && netpath_can_tx_csum(vnic-> + current_path)) { + vnic->netdevice->features |= + NETIF_F_IP_CSUM; + } + break; + } +} + +static void vnic_carrier_loss(struct vnic *vnic, + struct netpath *last_path) +{ + if (vnic->primary_path.carrier) { + vnic->carrier = 1; + vnic->current_path = &vnic->primary_path; + + if (last_path && last_path != vnic->current_path) + printk(KERN_INFO PFX + "%s: failing over to primary path\n", + vnic->config->name); + else if (!last_path) + printk(KERN_INFO PFX "%s: using primary path\n", + vnic->config->name); + + if (vnic->config->use_tx_csum && + netpath_can_tx_csum(vnic->current_path)) + vnic->netdevice->features |= NETIF_F_IP_CSUM; + + } else if ((vnic->secondary_path.carrier) && + (vnic->secondary_path.timer_state != NETPATH_TS_ACTIVE)) { + vnic->carrier = 1; + vnic->current_path = &vnic->secondary_path; + + if (last_path && last_path != vnic->current_path) + printk(KERN_INFO PFX + "%s: failing over to secondary path\n", + vnic->config->name); + else if (!last_path) + printk(KERN_INFO PFX "%s: using secondary path\n", + vnic->config->name); + + if (vnic->config->use_tx_csum && + netpath_can_tx_csum(vnic->current_path)) + vnic->netdevice->features |= NETIF_F_IP_CSUM; + + } + +} + +static void vnic_handle_path_change(struct vnic *vnic, + struct netpath **path) +{ + struct netpath *last_path = *path; + + if (!last_path) { + if (vnic->current_path == &vnic->primary_path) + last_path = &vnic->secondary_path; + else + last_path = &vnic->primary_path; + + } + + if (vnic->current_path && vnic->current_path->viport) + viport_set_link(vnic->current_path->viport, + vnic->netdevice->flags, + vnic->netdevice->mtu); + + if 
(last_path->viport) + viport_set_link(last_path->viport, + vnic->netdevice->flags & + ~IFF_UP, vnic->netdevice->mtu); + + vnic_restart_xmit(vnic, vnic->current_path); +} + +static void vnic_report_path_change(struct vnic *vnic, + struct netpath *last_path, + int other_path_ok) +{ + if (!vnic->current_path) { + if (last_path == &vnic->primary_path) + printk(KERN_INFO PFX "%s: primary path lost, " + "no failover path available\n", + vnic->config->name); + else + printk(KERN_INFO PFX "%s: secondary path lost, " + "no failover path available\n", + vnic->config->name); + return; + } + + if (last_path != vnic->current_path) + return; + + if (vnic->current_path == &vnic->secondary_path) { + if (other_path_ok != vnic->primary_path.carrier) { + if (other_path_ok) + printk(KERN_INFO PFX "%s: primary path no" + " longer available for failover\n", + vnic->config->name); + else + printk(KERN_INFO PFX "%s: primary path now" + " available for failover\n", + vnic->config->name); + } + } else { + if (other_path_ok != vnic->secondary_path.carrier) { + if (other_path_ok) + printk(KERN_INFO PFX "%s: secondary path no" + " longer available for failover\n", + vnic->config->name); + else + printk(KERN_INFO PFX "%s: secondary path now" + " available for failover\n", + vnic->config->name); + } + } +} + +static void vnic_handle_free_vnic_evt(struct vnic *vnic) +{ + netpath_timer_stop(&vnic->primary_path); + netpath_timer_stop(&vnic->secondary_path); + vnic->current_path = NULL; + netpath_free(&vnic->primary_path); + netpath_free(&vnic->secondary_path); + if (vnic->state == VNIC_REGISTERED) { + unregister_netdev(vnic->netdevice); + free_netdev(vnic->netdevice); + } + vnic_npevent_dequeue_all(vnic); + kfree(vnic->config); + if (vnic->mc_list_len) { + vnic->mc_list_len = vnic->mc_count = 0; + kfree(vnic->mc_list); + } + + sysfs_remove_group(&vnic->dev_info.dev.kobj, + &vnic_dev_attr_group); + vnic_cleanup_stats_files(vnic); + device_unregister(&vnic->dev_info.dev); + 
wait_for_completion(&vnic->dev_info.released); +} + +static struct vnic *vnic_handle_npevent(struct vnic *vnic, + enum vnic_npevent_type npevt_type) +{ + struct netpath *netpath; + const char *netpath_str; + + if (npevt_type <= VNIC_PRINP_LASTTYPE) + netpath_str = netpath_to_string(vnic, &vnic->primary_path); + else if (npevt_type <= VNIC_SECNP_LASTTYPE) + netpath_str = netpath_to_string(vnic, &vnic->secondary_path); + else + netpath_str = netpath_to_string(vnic, vnic->current_path); + + VNIC_INFO("%s: processing %s, netpath=%s, carrier=%d\n", + vnic->config->name, vnic_npevent_str[npevt_type], + netpath_str, vnic->carrier); + + switch (npevt_type) { + case VNIC_PRINP_CONNECTED: + netpath = &vnic->primary_path; + if (vnic->state == VNIC_UNINITIALIZED) { + if (vnic_npevent_register(vnic, netpath)) + break; + } + vnic_set_uni_multicast(vnic, netpath); + break; + case VNIC_SECNP_CONNECTED: + vnic_set_uni_multicast(vnic, &vnic->secondary_path); + break; + case VNIC_PRINP_TIMEREXPIRED: + netpath = &vnic->primary_path; + netpath->timer_state = NETPATH_TS_EXPIRED; + if (!netpath->carrier) + update_path_and_reconnect(netpath, vnic); + break; + case VNIC_SECNP_TIMEREXPIRED: + netpath = &vnic->secondary_path; + netpath->timer_state = NETPATH_TS_EXPIRED; + if (!netpath->carrier) + update_path_and_reconnect(netpath, vnic); + else { + if (vnic->state == VNIC_UNINITIALIZED) + vnic_npevent_register(vnic, netpath); + } + break; + case VNIC_PRINP_LINKUP: + vnic->primary_path.carrier = 1; + break; + case VNIC_SECNP_LINKUP: + netpath = &vnic->secondary_path; + netpath->carrier = 1; + if (!vnic->carrier) + vnic_set_netpath_timers(vnic, netpath); + break; + case VNIC_PRINP_LINKDOWN: + vnic->primary_path.carrier = 0; + break; + case VNIC_SECNP_LINKDOWN: + if (vnic->state == VNIC_UNINITIALIZED) + netpath_timer_stop(&vnic->secondary_path); + vnic->secondary_path.carrier = 0; + break; + case VNIC_PRINP_DISCONNECTED: + netpath = &vnic->primary_path; + netpath_timer_stop(netpath); + 
netpath->carrier = 0; + update_path_and_reconnect(netpath, vnic); + break; + case VNIC_SECNP_DISCONNECTED: + netpath = &vnic->secondary_path; + netpath_timer_stop(netpath); + netpath->carrier = 0; + update_path_and_reconnect(netpath, vnic); + break; + case VNIC_PRINP_SETLINK: + netpath = vnic->current_path; + if (!netpath || !netpath->viport) + break; + viport_set_link(netpath->viport, + vnic->netdevice->flags, + vnic->netdevice->mtu); + break; + case VNIC_SECNP_SETLINK: + netpath = &vnic->secondary_path; + if (!netpath || !netpath->viport) + break; + viport_set_link(netpath->viport, + vnic->netdevice->flags, + vnic->netdevice->mtu); + break; + case VNIC_NP_FREEVNIC: + vnic_handle_free_vnic_evt(vnic); + kfree(vnic); + vnic = NULL; + break; + } + return vnic; +} + +static int vnic_npevent_statemachine(void *context) +{ + struct vnic_npevent *vnic_link_evt; + enum vnic_npevent_type npevt_type; + struct vnic *vnic; + int last_carrier; + int other_path_ok = 0; + struct netpath *last_path; + + while (!vnic_npevent_thread_end || + !list_empty(&vnic_npevent_list)) { + unsigned long flags; + + wait_event_interruptible(vnic_npevent_queue, + !list_empty(&vnic_npevent_list) + || vnic_npevent_thread_end); + spin_lock_irqsave(&vnic_npevent_list_lock, flags); + if (list_empty(&vnic_npevent_list)) { + spin_unlock_irqrestore(&vnic_npevent_list_lock, + flags); + VNIC_INFO("netpath statemachine wake" + " on empty list\n"); + continue; + } + + vnic_link_evt = list_entry(vnic_npevent_list.next, + struct vnic_npevent, + list_ptrs); + list_del(&vnic_link_evt->list_ptrs); + spin_unlock_irqrestore(&vnic_npevent_list_lock, flags); + vnic = vnic_link_evt->vnic; + npevt_type = vnic_link_evt->event_type; + kfree(vnic_link_evt); + + if (vnic->current_path == &vnic->secondary_path) + other_path_ok = vnic->primary_path.carrier; + else if (vnic->current_path == &vnic->primary_path) + other_path_ok = vnic->secondary_path.carrier; + + vnic = vnic_handle_npevent(vnic, npevt_type); + + if (!vnic) + 
continue; + + last_carrier = vnic->carrier; + last_path = vnic->current_path; + + if (!vnic->current_path || + !vnic->current_path->carrier) { + vnic->carrier = 0; + vnic->current_path = NULL; + vnic->netdevice->features &= ~NETIF_F_IP_CSUM; + } + + if (!vnic->carrier) + vnic_carrier_loss(vnic, last_path); + else if ((vnic->current_path != &vnic->primary_path) && + (vnic->config->prefer_primary) && + (vnic->primary_path.carrier)) + vnic_check_primary_path_timer(vnic); + + if (last_path) + vnic_report_path_change(vnic, last_path, + other_path_ok); + + VNIC_INFO("new netpath=%s, carrier=%d\n", + netpath_to_string(vnic, vnic->current_path), + vnic->carrier); + + if (vnic->current_path != last_path) + vnic_handle_path_change(vnic, &last_path); + + if (vnic->carrier != last_carrier) { + if (vnic->carrier) { + VNIC_INFO("netif_carrier_on\n"); + netif_carrier_on(vnic->netdevice); + vnic_carrier_loss_stats(vnic); + } else { + VNIC_INFO("netif_carrier_off\n"); + netif_carrier_off(vnic->netdevice); + vnic_disconn_stats(vnic); + } + + } + } + complete_and_exit(&vnic_npevent_thread_exit, 0); + return 0; +} + +void vnic_npevent_queue_evt(struct netpath *netpath, + enum vnic_npevent_type evt) +{ + struct vnic_npevent *npevent; + unsigned long flags; + + npevent = kmalloc(sizeof *npevent, GFP_ATOMIC); + if (!npevent) { + VNIC_ERROR("Could not allocate memory for vnic event\n"); + return; + } + npevent->vnic = netpath->parent; + npevent->event_type = evt; + INIT_LIST_HEAD(&npevent->list_ptrs); + spin_lock_irqsave(&vnic_npevent_list_lock, flags); + list_add_tail(&npevent->list_ptrs, &vnic_npevent_list); + spin_unlock_irqrestore(&vnic_npevent_list_lock, flags); + wake_up(&vnic_npevent_queue); +} + +void vnic_npevent_dequeue_evt(struct netpath *netpath, + enum vnic_npevent_type evt) +{ + unsigned long flags; + struct vnic_npevent *npevt, *tmp; + struct vnic *vnic = netpath->parent; + + spin_lock_irqsave(&vnic_npevent_list_lock, flags); + if (list_empty(&vnic_npevent_list)) + goto 
out; + list_for_each_entry_safe(npevt, tmp, &vnic_npevent_list, + list_ptrs) { + if ((npevt->vnic == vnic) && + (npevt->event_type == evt)) { + list_del(&npevt->list_ptrs); + kfree(npevt); + break; + } + } +out: + spin_unlock_irqrestore(&vnic_npevent_list_lock, flags); +} + +static int vnic_npevent_start(void) +{ + VNIC_FUNCTION("vnic_npevent_start()\n"); + + spin_lock_init(&vnic_npevent_list_lock); + vnic_npevent_thread = kthread_run(vnic_npevent_statemachine, NULL, + "qlgc_vnic_npevent_s_m"); + if (IS_ERR(vnic_npevent_thread)) { + printk(KERN_WARNING PFX "failed to create vnic npevent" + " thread; error %d\n", + (int) PTR_ERR(vnic_npevent_thread)); + vnic_npevent_thread = NULL; + return 1; + } + + return 0; +} + +void vnic_npevent_cleanup(void) +{ + if (vnic_npevent_thread) { + vnic_npevent_thread_end = 1; + wake_up(&vnic_npevent_queue); + wait_for_completion(&vnic_npevent_thread_exit); + vnic_npevent_thread = NULL; + } +} + +static void vnic_setup(struct net_device *device) +{ + ether_setup(device); + + /* ether_setup is used to fill + * device parameters for ethernet devices. + * We override some of the parameters + * which are specific to VNIC. + */ + device->get_stats = vnic_get_stats; + device->open = vnic_open; + device->stop = vnic_stop; + device->hard_start_xmit = vnic_hard_start_xmit; + device->tx_timeout = vnic_tx_timeout; + device->set_multicast_list = vnic_set_multicast_list; + device->set_mac_address = vnic_set_mac_address; + device->change_mtu = vnic_change_mtu; + device->watchdog_timeo = 10 * HZ; + device->features = 0; +} + +struct vnic *vnic_allocate(struct vnic_config *config) +{ + struct vnic *vnic = NULL; + + VNIC_FUNCTION("vnic_allocate()\n"); + vnic = kzalloc(sizeof *vnic, GFP_KERNEL); + if (!vnic) { + VNIC_ERROR("failed allocating vnic structure\n"); + return NULL; + } + + spin_lock_init(&vnic->lock); + vnic_alloc_stats(vnic); + vnic->state = VNIC_UNINITIALIZED; + vnic->config = config; + + /* Allocating a VNIC network device. 
+ * The private data structure for the VNIC is managed by the
+ * VNIC driver itself, hence the size of the private data area
+ * is set to 0.
+ */
+	vnic->netdevice = alloc_netdev((int) 0, config->name, vnic_setup);
+	if (!vnic->netdevice) {
+		VNIC_ERROR("failed allocating vnic netdevice\n");
+		kfree(vnic);
+		return NULL;
+	}
+	vnic->netdevice->priv = (void *)vnic;
+
+	netpath_init(&vnic->primary_path, vnic, 0);
+	netpath_init(&vnic->secondary_path, vnic, 1);
+
+	vnic->current_path = NULL;
+
+	list_add_tail(&vnic->list_ptrs, &vnic_list);
+
+	return vnic;
+}
+
+void vnic_free(struct vnic *vnic)
+{
+	VNIC_FUNCTION("vnic_free()\n");
+	list_del(&vnic->list_ptrs);
+	vnic_npevent_queue_evt(&vnic->primary_path, VNIC_NP_FREEVNIC);
+}
+
+static void __exit vnic_cleanup(void)
+{
+	VNIC_FUNCTION("vnic_cleanup()\n");
+
+	VNIC_INIT("unloading %s\n", MODULEDETAILS);
+
+	while (!list_empty(&vnic_list)) {
+		struct vnic *vnic =
+		    list_entry(vnic_list.next, struct vnic, list_ptrs);
+		vnic_free(vnic);
+	}
+
+	vnic_npevent_cleanup();
+	viport_cleanup();
+	vnic_ib_cleanup();
+}
+
+static int __init vnic_init(void)
+{
+	int ret;
+
+	VNIC_FUNCTION("vnic_init()\n");
+	VNIC_INIT("Initializing %s\n", MODULEDETAILS);
+
+	ret = config_start();
+	if (ret) {
+		VNIC_ERROR("config_start failed\n");
+		goto failure;
+	}
+
+	ret = vnic_ib_init();
+	if (ret) {
+		VNIC_ERROR("ib_start failed\n");
+		goto failure;
+	}
+
+	ret = viport_start();
+	if (ret) {
+		VNIC_ERROR("viport_start failed\n");
+		goto failure;
+	}
+
+	ret = vnic_npevent_start();
+	if (ret) {
+		VNIC_ERROR("vnic_npevent_start failed\n");
+		goto failure;
+	}
+
+	return 0;
+failure:
+	vnic_cleanup();
+	return ret;
+}
+
+module_init(vnic_init);
+module_exit(vnic_cleanup);
diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_main.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_main.h
new file mode 100644
index 0000000..c5ccd8b
--- /dev/null
+++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_main.h
@@ -0,0 +1,167 @@
+/*
+ * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */
+
+#ifndef VNIC_MAIN_H_INCLUDED
+#define VNIC_MAIN_H_INCLUDED
+
+#include
+#include
+#include
+#include
+
+#include "vnic_config.h"
+#include "vnic_netpath.h"
+
+extern u16 vnic_max_mtu;
+extern struct list_head vnic_list;
+extern struct attribute_group vnic_stats_attr_group;
+extern cycles_t recv_ref;
+
+enum vnic_npevent_type {
+	VNIC_PRINP_CONNECTED	= 0,
+	VNIC_PRINP_DISCONNECTED	= 1,
+	VNIC_PRINP_LINKUP	= 2,
+	VNIC_PRINP_LINKDOWN	= 3,
+	VNIC_PRINP_TIMEREXPIRED	= 4,
+	VNIC_PRINP_SETLINK	= 5,
+
+	/* used to figure out PRI vs SEC types for dbg msg */
+	VNIC_PRINP_LASTTYPE	= VNIC_PRINP_SETLINK,
+
+	VNIC_SECNP_CONNECTED	= 6,
+	VNIC_SECNP_DISCONNECTED	= 7,
+	VNIC_SECNP_LINKUP	= 8,
+	VNIC_SECNP_LINKDOWN	= 9,
+	VNIC_SECNP_TIMEREXPIRED	= 10,
+	VNIC_SECNP_SETLINK	= 11,
+
+	/* used to figure out PRI vs SEC types for dbg msg */
+	VNIC_SECNP_LASTTYPE	= VNIC_SECNP_SETLINK,
+
+	VNIC_NP_FREEVNIC	= 12,
+};
+
+/* This array should be kept next to enum above since a change to npevent_type
+   enum affects this array.
*/ +static const char *const vnic_npevent_str[] = { + "PRIMARY CONNECTED", + "PRIMARY DISCONNECTED", + "PRIMARY CARRIER", + "PRIMARY NO CARRIER", + "PRIMARY TIMER EXPIRED", + "PRIMARY SETLINK", + "SECONDARY CONNECTED", + "SECONDARY DISCONNECTED", + "SECONDARY CARRIER", + "SECONDARY NO CARRIER", + "SECONDARY TIMER EXPIRED", + "SECONDARY SETLINK", + "FREE VNIC", +}; + + +struct vnic_npevent { + struct list_head list_ptrs; + struct vnic *vnic; + enum vnic_npevent_type event_type; +}; + +void vnic_npevent_queue_evt(struct netpath *netpath, + enum vnic_npevent_type evt); +void vnic_npevent_dequeue_evt(struct netpath *netpath, + enum vnic_npevent_type evt); + +enum vnic_state { + VNIC_UNINITIALIZED = 0, + VNIC_REGISTERED = 1 +}; + +struct vnic { + struct list_head list_ptrs; + enum vnic_state state; + struct vnic_config *config; + struct netpath *current_path; + struct netpath primary_path; + struct netpath secondary_path; + int open; + int carrier; + int xmit_started; + int mac_set; + struct net_device_stats stats; + struct net_device *netdevice; + struct dev_info dev_info; + struct dev_mc_list *mc_list; + int mc_list_len; + int mc_count; + spinlock_t lock; +#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS + struct { + cycles_t start_time; + cycles_t conn_time; + cycles_t disconn_ref; /* intermediate time */ + cycles_t disconn_time; + u32 disconn_num; + cycles_t xmit_time; + u32 xmit_num; + u32 xmit_fail; + cycles_t recv_time; + u32 recv_num; + u32 multicast_recv_num; + cycles_t xmit_ref; /* intermediate time */ + cycles_t xmit_off_time; + u32 xmit_off_num; + cycles_t carrier_ref; /* intermediate time */ + cycles_t carrier_off_time; + u32 carrier_off_num; + } statistics; + struct dev_info stat_info; +#endif /* CONFIG_INFINIBAND_QLGC_VNIC_STATS */ +}; + +struct vnic *vnic_allocate(struct vnic_config *config); + +void vnic_free(struct vnic *vnic); + +void vnic_connected(struct vnic *vnic, struct netpath *netpath); +void vnic_disconnected(struct vnic *vnic, struct netpath 
*netpath); + +void vnic_link_up(struct vnic *vnic, struct netpath *netpath); +void vnic_link_down(struct vnic *vnic, struct netpath *netpath); + +void vnic_stop_xmit(struct vnic *vnic, struct netpath *netpath); +void vnic_restart_xmit(struct vnic *vnic, struct netpath *netpath); + +void vnic_recv_packet(struct vnic *vnic, struct netpath *netpath, + struct sk_buff *skb); +void vnic_npevent_cleanup(void); +void completion_callback_cleanup(struct vnic_ib_conn *ib_conn); +#endif /* VNIC_MAIN_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:17:24 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:47:24 +0530 Subject: [ofa-general] [PATCH 03/13] QLogic VNIC: Implementation of communication protocol with EVIC/VEx In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171724.31725.91243.stgit@localhost.localdomain> From: Poornima Kamath Implementation of the statemachine for the protocol used while communicating with the EVIC. The patch also implements the viport abstraction which represents the virtual ethernet port on EVIC. Signed-off-by: Ramachandra K Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c | 1233 ++++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h | 176 +++ 2 files changed, 1409 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c new file mode 100644 index 0000000..e44e31b --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.c @@ -0,0 +1,1233 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "vnic_util.h"
+#include "vnic_main.h"
+#include "vnic_viport.h"
+#include "vnic_netpath.h"
+#include "vnic_control.h"
+#include "vnic_data.h"
+#include "vnic_config.h"
+#include "vnic_control_pkt.h"
+
+#define VIPORT_DISCONN_TIMER		10000	/* 10 seconds */
+
+#define MAX_RETRY_INTERVAL		20000	/* 20 seconds */
+#define RETRY_INCREMENT			5000	/* 5 seconds */
+#define MAX_CONNECT_RETRY_TIMEOUT	600000	/* 10 minutes */
+
+static DECLARE_WAIT_QUEUE_HEAD(viport_queue);
+static LIST_HEAD(viport_list);
+static DECLARE_COMPLETION(viport_thread_exit);
+static spinlock_t viport_list_lock;
+
+static struct task_struct *viport_thread;
+static int viport_thread_end;
+
+static void viport_timer(struct viport *viport, int timeout);
+
+struct viport *viport_allocate(struct viport_config *config)
+{
+	struct viport *viport;
+
+	VIPORT_FUNCTION("viport_allocate()\n");
+	viport = kzalloc(sizeof *viport, GFP_KERNEL);
+	if (!viport) {
+		VIPORT_ERROR("failed allocating viport structure\n");
+		return NULL;
+	}
+
+	viport->state = VIPORT_DISCONNECTED;
+	viport->link_state = LINK_FIRSTCONNECT;
+	viport->new_mtu = 1500;
+	viport->new_flags = 0;
+	viport->config = config;
+	viport->connect = DELAY;
+	viport->data.max_mtu = vnic_max_mtu;
+	spin_lock_init(&viport->lock);
+	init_waitqueue_head(&viport->stats_queue);
+	init_waitqueue_head(&viport->disconnect_queue);
+	init_waitqueue_head(&viport->reference_queue);
+	INIT_LIST_HEAD(&viport->list_ptrs);
+
+	vnic_mc_init(viport);
+
+	return viport;
+}
+
+void viport_connect(struct viport *viport, int delay)
+{
+	VIPORT_FUNCTION("viport_connect()\n");
+
+	if (viport->connect != DELAY)
+		viport->connect = (delay) ?
DELAY : NOW; + if (viport->link_state == LINK_FIRSTCONNECT) { + u32 duration; + duration = (net_random() & 0x1ff); + if (!viport->parent->is_primary_path) + duration += 0x1ff; + viport->link_state = LINK_RETRYWAIT; + viport_timer(viport, duration); + } else + viport_kick(viport); +} + +void viport_disconnect(struct viport *viport) +{ + VIPORT_FUNCTION("viport_disconnect()\n"); + viport->disconnect = 1; + viport_failure(viport); + wait_event(viport->disconnect_queue, viport->disconnect == 0); +} + +void viport_free(struct viport *viport) +{ + VIPORT_FUNCTION("viport_free()\n"); + vnic_mc_uninit(viport); + viport_disconnect(viport); /* NOTE: this can sleep */ + kfree(viport->config); + kfree(viport); +} + +void viport_set_link(struct viport *viport, u16 flags, u16 mtu) +{ + unsigned long localflags; + int i; + + VIPORT_FUNCTION("viport_set_link()\n"); + if (mtu > data_max_mtu(&viport->data)) { + VIPORT_ERROR("configuration error." + " mtu of %d unsupported by %s\n", mtu, + config_viport_name(viport->config)); + goto failure; + } + + spin_lock_irqsave(&viport->lock, localflags); + flags &= IFF_UP | IFF_ALLMULTI | IFF_PROMISC; + if ((viport->new_flags != flags) + || (viport->new_mtu != mtu)) { + viport->new_flags = flags; + viport->new_mtu = mtu; + viport->updates |= NEED_LINK_CONFIG; + if (viport->features_supported & VNIC_FEAT_INBOUND_IB_MC) { + if (((viport->mtu <= MCAST_MSG_SIZE) && (mtu > MCAST_MSG_SIZE)) || + ((viport->mtu > MCAST_MSG_SIZE) && (mtu <= MCAST_MSG_SIZE))) { + /* + * MTU value will enable/disable the multicast. In + * either case, need to send the CMD_CONFIG_ADDRESS2 to + * EVIC. Hence, setting the NEED_ADDRESS_CONFIG flag. 
+ */ + viport->updates |= NEED_ADDRESS_CONFIG; + if (mtu <= MCAST_MSG_SIZE) { + VIPORT_PRINT("%s: MTU changed; " + "old:%d new:%d (threshold:%d);" + " MULTICAST will be enabled.\n", + config_viport_name(viport->config), + viport->mtu, mtu, + (int)MCAST_MSG_SIZE); + } else { + VIPORT_PRINT("%s: MTU changed; " + "old:%d new:%d (threshold:%d); " + "MULTICAST will be disabled.\n", + config_viport_name(viport->config), + viport->mtu, mtu, + (int)MCAST_MSG_SIZE); + } + /* When we resend these addresses, EVIC will + * send mgid=0 back in response. So no need to + * shutoff ib_multicast. + */ + for (i = MCAST_ADDR_START; i < viport->num_mac_addresses; i++) { + if (viport->mac_addresses[i].valid) + viport->mac_addresses[i].operation = VNIC_OP_SET_ENTRY; + } + } + } + viport_kick(viport); + } + + spin_unlock_irqrestore(&viport->lock, localflags); + return; +failure: + viport_failure(viport); +} + +int viport_set_unicast(struct viport *viport, u8 *address) +{ + unsigned long flags; + int ret = -1; + VIPORT_FUNCTION("viport_set_unicast()\n"); + spin_lock_irqsave(&viport->lock, flags); + + if (!viport->mac_addresses) + goto out; + + if (memcmp(viport->mac_addresses[UNICAST_ADDR].address, + address, ETH_ALEN)) { + memcpy(viport->mac_addresses[UNICAST_ADDR].address, + address, ETH_ALEN); + viport->mac_addresses[UNICAST_ADDR].operation + = VNIC_OP_SET_ENTRY; + viport->updates |= NEED_ADDRESS_CONFIG; + viport_kick(viport); + } + ret = 0; +out: + spin_unlock_irqrestore(&viport->lock, flags); + return ret; +} + +int viport_set_multicast(struct viport *viport, + struct dev_mc_list *mc_list, int mc_count) +{ + u32 old_update_list; + int i; + int ret = -1; + unsigned long flags; + + VIPORT_FUNCTION("viport_set_multicast()\n"); + spin_lock_irqsave(&viport->lock, flags); + + if (!viport->mac_addresses) + goto out; + + old_update_list = viport->updates; + if (mc_count > viport->num_mac_addresses - MCAST_ADDR_START) + viport->updates |= NEED_LINK_CONFIG | MCAST_OVERFLOW; + else { + if 
(mc_count == 0) { + ret = 0; + goto out; + } + if (viport->updates & MCAST_OVERFLOW) { + viport->updates &= ~MCAST_OVERFLOW; + viport->updates |= NEED_LINK_CONFIG; + } + for (i = MCAST_ADDR_START; i < mc_count + MCAST_ADDR_START; + i++, mc_list = mc_list->next) { + if (viport->mac_addresses[i].valid && + !memcmp(viport->mac_addresses[i].address, + mc_list->dmi_addr, ETH_ALEN)) + continue; + memcpy(viport->mac_addresses[i].address, + mc_list->dmi_addr, ETH_ALEN); + viport->mac_addresses[i].valid = 1; + viport->mac_addresses[i].operation = VNIC_OP_SET_ENTRY; + } + for (; i < viport->num_mac_addresses; i++) { + if (!viport->mac_addresses[i].valid) + continue; + viport->mac_addresses[i].valid = 0; + viport->mac_addresses[i].operation = VNIC_OP_SET_ENTRY; + } + if (mc_count) + viport->updates |= NEED_ADDRESS_CONFIG; + } + + if (viport->updates != old_update_list) + viport_kick(viport); + ret = 0; +out: + spin_unlock_irqrestore(&viport->lock, flags); + return ret; +} + +static inline void viport_disable_multicast(struct viport *viport) +{ + VIPORT_INFO("turned off IB_MULTICAST\n"); + viport->config->control_config.ib_multicast = 0; + viport->config->control_config.ib_config.conn_data.features_supported &= + __constant_cpu_to_be32((u32)~VNIC_FEAT_INBOUND_IB_MC); + viport->link_state = LINK_RESET; +} + +void viport_get_stats(struct viport *viport, + struct net_device_stats *stats) +{ + unsigned long flags; + + VIPORT_FUNCTION("viport_get_stats()\n"); + if (jiffies > viport->last_stats_time + + viport->config->stats_interval) { + + spin_lock_irqsave(&viport->lock, flags); + viport->updates |= NEED_STATS; + /* increment reference count which indicates + * that viport structure is being used, which + * prevents its freeing when this task sleeps + */ + viport->reference_count++; + spin_unlock_irqrestore(&viport->lock, flags); + viport_kick(viport); + wait_event(viport->stats_queue, + !(viport->updates & NEED_STATS)); + + if (viport->stats.ethernet_status) + 
vnic_link_up(viport->vnic, viport->parent); + else + vnic_link_down(viport->vnic, viport->parent); + + } else { + spin_lock_irqsave(&viport->lock, flags); + viport->reference_count++; + spin_unlock_irqrestore(&viport->lock, flags); + } + + stats->rx_packets = be64_to_cpu(viport->stats.if_in_ok); + stats->tx_packets = be64_to_cpu(viport->stats.if_out_ok); + stats->rx_bytes = be64_to_cpu(viport->stats.if_in_octets); + stats->tx_bytes = be64_to_cpu(viport->stats.if_out_octets); + stats->rx_errors = be64_to_cpu(viport->stats.if_in_errors); + stats->tx_errors = be64_to_cpu(viport->stats.if_out_errors); + stats->rx_dropped = 0; /* EIOC doesn't track */ + stats->tx_dropped = 0; /* EIOC doesn't track */ + stats->multicast = be64_to_cpu(viport->stats.if_in_nucast_pkts); + stats->collisions = 0; /* EIOC doesn't track */ + + spin_lock_irqsave(&viport->lock, flags); + viport->reference_count--; + spin_unlock_irqrestore(&viport->lock, flags); + wake_up(&viport->reference_queue); +} + +int viport_xmit_packet(struct viport *viport, struct sk_buff *skb) +{ + int status = -1; + unsigned long flags; + + VIPORT_FUNCTION("viport_xmit_packet()\n"); + spin_lock_irqsave(&viport->lock, flags); + if (viport->state == VIPORT_CONNECTED) + status = data_xmit_packet(&viport->data, skb); + spin_unlock_irqrestore(&viport->lock, flags); + + return status; +} + +void viport_kick(struct viport *viport) +{ + unsigned long flags; + + VIPORT_FUNCTION("viport_kick()\n"); + spin_lock_irqsave(&viport_list_lock, flags); + if (list_empty(&viport->list_ptrs)) { + list_add_tail(&viport->list_ptrs, &viport_list); + wake_up(&viport_queue); + } + spin_unlock_irqrestore(&viport_list_lock, flags); +} + +void viport_failure(struct viport *viport) +{ + unsigned long flags; + + VIPORT_FUNCTION("viport_failure()\n"); + spin_lock_irqsave(&viport_list_lock, flags); + viport->errored = 1; + if (list_empty(&viport->list_ptrs)) { + list_add_tail(&viport->list_ptrs, &viport_list); + wake_up(&viport_queue); + } + 
spin_unlock_irqrestore(&viport_list_lock, flags);
+}
+
+static void viport_timeout(unsigned long data)
+{
+	struct viport *viport;
+
+	VIPORT_FUNCTION("viport_timeout()\n");
+	viport = (struct viport *)data;
+	viport->timer_active = 0;
+	viport_kick(viport);
+}
+
+static void viport_timer(struct viport *viport, int timeout)
+{
+	VIPORT_FUNCTION("viport_timer()\n");
+	if (viport->timer_active)
+		del_timer(&viport->timer);
+	init_timer(&viport->timer);
+	viport->timer.expires = jiffies + timeout;
+	viport->timer.data = (unsigned long)viport;
+	viport->timer.function = viport_timeout;
+	viport->timer_active = 1;
+	add_timer(&viport->timer);
+}
+
+static void viport_timer_stop(struct viport *viport)
+{
+	VIPORT_FUNCTION("viport_timer_stop()\n");
+	if (viport->timer_active)
+		del_timer(&viport->timer);
+	viport->timer_active = 0;
+}
+
+static int viport_init_mac_addresses(struct viport *viport)
+{
+	struct vnic_address_op2 *temp;
+	unsigned long flags;
+	int i;
+
+	VIPORT_FUNCTION("viport_init_mac_addresses()\n");
+	temp = kzalloc(viport->num_mac_addresses * sizeof *temp,
+		       GFP_KERNEL);
+	if (!temp) {
+		VIPORT_ERROR("failed allocating MAC address table\n");
+		return -ENOMEM;
+	}
+
+	spin_lock_irqsave(&viport->lock, flags);
+	viport->mac_addresses = temp;
+	for (i = 0; i < viport->num_mac_addresses; i++) {
+		viport->mac_addresses[i].index = cpu_to_be16(i);
+		viport->mac_addresses[i].vlan =
+		    cpu_to_be16(viport->default_vlan);
+	}
+	memset(viport->mac_addresses[BROADCAST_ADDR].address,
+	       0xFF, ETH_ALEN);
+	viport->mac_addresses[BROADCAST_ADDR].valid = 1;
+	memcpy(viport->mac_addresses[UNICAST_ADDR].address,
+	       viport->hw_mac_address, ETH_ALEN);
+	viport->mac_addresses[UNICAST_ADDR].valid = 1;
+
+	spin_unlock_irqrestore(&viport->lock, flags);
+
+	return 0;
+}
+
+static inline void viport_match_mac_address(struct vnic *vnic,
+					    struct viport *viport)
+{
+	if (vnic && vnic->current_path &&
+	    viport == vnic->current_path->viport
&&
+	    vnic->mac_set &&
+	    memcmp(vnic->netdevice->dev_addr, viport->hw_mac_address, ETH_ALEN)) {
+		VIPORT_ERROR("*** ERROR MAC address mismatch; "
+			     "current = %02x:%02x:%02x:%02x:%02x:%02x "
+			     "From EVIC = %02x:%02x:%02x:%02x:%02x:%02x\n",
+			     vnic->netdevice->dev_addr[0],
+			     vnic->netdevice->dev_addr[1],
+			     vnic->netdevice->dev_addr[2],
+			     vnic->netdevice->dev_addr[3],
+			     vnic->netdevice->dev_addr[4],
+			     vnic->netdevice->dev_addr[5],
+			     viport->hw_mac_address[0],
+			     viport->hw_mac_address[1],
+			     viport->hw_mac_address[2],
+			     viport->hw_mac_address[3],
+			     viport->hw_mac_address[4],
+			     viport->hw_mac_address[5]);
+	}
+}
+
+static int viport_handle_init_states(struct viport *viport)
+{
+	enum link_state old_state;
+
+	do {
+		switch (old_state = viport->link_state) {
+		case LINK_UNINITIALIZED:
+			LINK_STATE("state LINK_UNINITIALIZED\n");
+			viport->updates = 0;
+			/* cleanup_started ensures that no more
+			 * get_stats requests will be sent.
+			 * Old stats will be returned.
+			 */
+			viport->parent->cleanup_started = 1;
+			wake_up(&viport->stats_queue);
+			spin_lock_irq(&viport_list_lock);
+			list_del_init(&viport->list_ptrs);
+			spin_unlock_irq(&viport_list_lock);
+			spin_lock_irq(&viport->lock);
+			if (viport->reference_count) {
+				spin_unlock_irq(&viport->lock);
+				wait_event(viport->reference_queue,
+					   viport->reference_count == 0);
+			} else
+				spin_unlock_irq(&viport->lock);
+			/* No more references to the viport structure,
+			 * so it is safe to delete it by waking the
+			 * disconnect queue.
+			 */
+
+			viport->disconnect = 0;
+			wake_up(&viport->disconnect_queue);
+			break;
+		case LINK_INITIALIZE:
+			LINK_STATE("state LINK_INITIALIZE\n");
+			viport->errored = 0;
+			viport->connect = WAIT;
+			viport->last_stats_time = 0;
+			if (viport->disconnect)
+				viport->link_state = LINK_UNINITIALIZED;
+			else
+				viport->link_state = LINK_INITIALIZECONTROL;
+			break;
+		case LINK_INITIALIZECONTROL:
+			LINK_STATE("state LINK_INITIALIZECONTROL\n");
+			viport->pd = ib_alloc_pd(viport->config->ibdev);
+			if (IS_ERR(viport->pd))
viport->link_state = LINK_DISCONNECTED; + else if (control_init(&viport->control, viport, + &viport->config->control_config, + viport->pd)) { + ib_dealloc_pd(viport->pd); + viport->link_state = LINK_DISCONNECTED; + + } else + viport->link_state = LINK_INITIALIZEDATA; + break; + case LINK_INITIALIZEDATA: + LINK_STATE("state LINK_INITIALIZEDATA\n"); + if (data_init(&viport->data, viport, + &viport->config->data_config, + viport->pd)) + viport->link_state = LINK_CLEANUPCONTROL; + else + viport->link_state = LINK_CONTROLCONNECT; + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_control_states(struct viport *viport) +{ + enum link_state old_state; + struct vnic *vnic; + + do { + switch (old_state = viport->link_state) { + case LINK_CONTROLCONNECT: + if (vnic_ib_cm_connect(&viport->control.ib_conn)) + viport->link_state = LINK_CLEANUPDATA; + else + viport->link_state = LINK_CONTROLCONNECTWAIT; + break; + case LINK_CONTROLCONNECTWAIT: + LINK_STATE("state LINK_CONTROLCONNECTWAIT\n"); + if (control_is_connected(&viport->control)) + viport->link_state = LINK_INITVNICREQ; + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_CONTROLDISCONNECT; + } + break; + case LINK_INITVNICREQ: + LINK_STATE("state LINK_INITVNICREQ\n"); + if (control_init_vnic_req(&viport->control)) + viport->link_state = LINK_RESETCONTROL; + else + viport->link_state = LINK_INITVNICRSP; + break; + case LINK_INITVNICRSP: + LINK_STATE("state LINK_INITVNICRSP\n"); + control_process_async(&viport->control); + + if (!control_init_vnic_rsp(&viport->control, + &viport->features_supported, + viport->hw_mac_address, + &viport->num_mac_addresses, + &viport->default_vlan)) { + if (viport_init_mac_addresses(viport)) + viport->link_state = + LINK_RESETCONTROL; + else { + viport->link_state = + LINK_BEGINDATAPATH; + /* + * Ensure that the current path's MAC + * address matches the one returned by + * EVIC - we've had cases 
of mismatch + * which then caused havoc. + */ + vnic = viport->parent->parent; + viport_match_mac_address(vnic, viport); + } + } + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESETCONTROL; + } + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_data_states(struct viport *viport) +{ + enum link_state old_state; + + do { + switch (old_state = viport->link_state) { + case LINK_BEGINDATAPATH: + LINK_STATE("state LINK_BEGINDATAPATH\n"); + viport->link_state = LINK_CONFIGDATAPATHREQ; + break; + case LINK_CONFIGDATAPATHREQ: + LINK_STATE("state LINK_CONFIGDATAPATHREQ\n"); + if (control_config_data_path_req(&viport->control, + data_path_id(&viport-> + data), + data_host_pool_max + (&viport->data), + data_eioc_pool_max + (&viport->data))) + viport->link_state = LINK_RESETCONTROL; + else + viport->link_state = LINK_CONFIGDATAPATHRSP; + break; + case LINK_CONFIGDATAPATHRSP: + LINK_STATE("state LINK_CONFIGDATAPATHRSP\n"); + control_process_async(&viport->control); + + if (!control_config_data_path_rsp(&viport->control, + data_host_pool + (&viport->data), + data_eioc_pool + (&viport->data), + data_host_pool_max + (&viport->data), + data_eioc_pool_max + (&viport->data), + data_host_pool_min + (&viport->data), + data_eioc_pool_min + (&viport->data))) + viport->link_state = LINK_DATACONNECT; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESETCONTROL; + } + break; + case LINK_DATACONNECT: + LINK_STATE("state LINK_DATACONNECT\n"); + if (data_connect(&viport->data)) + viport->link_state = LINK_RESETCONTROL; + else + viport->link_state = LINK_DATACONNECTWAIT; + break; + case LINK_DATACONNECTWAIT: + LINK_STATE("state LINK_DATACONNECTWAIT\n"); + control_process_async(&viport->control); + if (data_is_connected(&viport->data)) + viport->link_state = LINK_XCHGPOOLREQ; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + 
} + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_xchgpool_states(struct viport *viport) +{ + enum link_state old_state; + + do { + switch (old_state = viport->link_state) { + case LINK_XCHGPOOLREQ: + LINK_STATE("state LINK_XCHGPOOLREQ\n"); + if (control_exchange_pools_req(&viport->control, + data_local_pool_addr + (&viport->data), + data_local_pool_rkey + (&viport->data))) + viport->link_state = LINK_RESET; + else + viport->link_state = LINK_XCHGPOOLRSP; + break; + case LINK_XCHGPOOLRSP: + LINK_STATE("state LINK_XCHGPOOLRSP\n"); + control_process_async(&viport->control); + + if (!control_exchange_pools_rsp(&viport->control, + data_remote_pool_addr + (&viport->data), + data_remote_pool_rkey + (&viport->data))) + viport->link_state = LINK_INITIALIZED; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + case LINK_INITIALIZED: + LINK_STATE("state LINK_INITIALIZED\n"); + viport->state = VIPORT_CONNECTED; + printk(KERN_INFO PFX + "%s: connection established\n", + config_viport_name(viport->config)); + data_connected(&viport->data); + vnic_connected(viport->parent->parent, + viport->parent); + if (viport->features_supported & VNIC_FEAT_INBOUND_IB_MC) { + printk(KERN_INFO PFX "%s: Supports Inbound IB " + "Multicast\n", + config_viport_name(viport->config)); + if (mc_data_init(&viport->mc_data, viport, + &viport->config->data_config, + viport->pd)) { + viport_disable_multicast(viport); + break; + } + } + spin_lock_irq(&viport->lock); + viport->mtu = 1500; + viport->flags = 0; + if ((viport->mtu != viport->new_mtu) || + (viport->flags != viport->new_flags)) + viport->updates |= NEED_LINK_CONFIG; + spin_unlock_irq(&viport->lock); + viport->link_state = LINK_IDLE; + viport->retry_duration = 0; + viport->total_retry_duration = 0; + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int 
viport_handle_idle_states(struct viport *viport) +{ + enum link_state old_state; + int handle_mc_join_compl, handle_mc_join; + + do { + switch (old_state = viport->link_state) { + case LINK_IDLE: + LINK_STATE("state LINK_IDLE\n"); + if (viport->config->hb_interval) + viport_timer(viport, + viport->config->hb_interval); + viport->link_state = LINK_IDLING; + break; + case LINK_IDLING: + LINK_STATE("state LINK_IDLING\n"); + control_process_async(&viport->control); + if (viport->errored) { + viport_timer_stop(viport); + viport->errored = 0; + viport->link_state = LINK_RESET; + break; + } + + spin_lock_irq(&viport->lock); + handle_mc_join = (viport->updates & NEED_MCAST_JOIN); + handle_mc_join_compl = + (viport->updates & NEED_MCAST_COMPLETION); + /* + * Turn off both flags, the handler functions will + * rearm them if necessary. + */ + viport->updates &= ~(NEED_MCAST_JOIN | NEED_MCAST_COMPLETION); + + if (viport->updates & NEED_LINK_CONFIG) { + viport_timer_stop(viport); + viport->link_state = LINK_CONFIGLINKREQ; + } else if (viport->updates & NEED_ADDRESS_CONFIG) { + viport_timer_stop(viport); + viport->link_state = LINK_CONFIGADDRSREQ; + } else if (viport->updates & NEED_STATS) { + viport_timer_stop(viport); + viport->link_state = LINK_REPORTSTATREQ; + } else if (viport->config->hb_interval) { + if (!viport->timer_active) + viport->link_state = + LINK_HEARTBEATREQ; + } + spin_unlock_irq(&viport->lock); + if (handle_mc_join) { + if (vnic_mc_join(viport)) + viport_disable_multicast(viport); + } + if (handle_mc_join_compl) + vnic_mc_join_handle_completion(viport); + + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_config_states(struct viport *viport) +{ + enum link_state old_state; + int res; + + do { + switch (old_state = viport->link_state) { + case LINK_CONFIGLINKREQ: + LINK_STATE("state LINK_CONFIGLINKREQ\n"); + spin_lock_irq(&viport->lock); + viport->updates &= ~NEED_LINK_CONFIG; + 
viport->flags = viport->new_flags; + if (viport->updates & MCAST_OVERFLOW) + viport->flags |= IFF_ALLMULTI; + viport->mtu = viport->new_mtu; + spin_unlock_irq(&viport->lock); + if (control_config_link_req(&viport->control, + viport->flags, + viport->mtu)) + viport->link_state = LINK_RESET; + else + viport->link_state = LINK_CONFIGLINKRSP; + break; + case LINK_CONFIGLINKRSP: + LINK_STATE("state LINK_CONFIGLINKRSP\n"); + control_process_async(&viport->control); + + if (!control_config_link_rsp(&viport->control, + &viport->flags, + &viport->mtu)) + viport->link_state = LINK_IDLE; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + case LINK_CONFIGADDRSREQ: + LINK_STATE("state LINK_CONFIGADDRSREQ\n"); + + spin_lock_irq(&viport->lock); + res = control_config_addrs_req(&viport->control, + viport->mac_addresses, + viport-> + num_mac_addresses); + + if (res > 0) { + viport->updates &= ~NEED_ADDRESS_CONFIG; + viport->link_state = LINK_CONFIGADDRSRSP; + } else if (res == 0) + viport->link_state = LINK_CONFIGADDRSRSP; + else + viport->link_state = LINK_RESET; + spin_unlock_irq(&viport->lock); + break; + case LINK_CONFIGADDRSRSP: + LINK_STATE("state LINK_CONFIGADDRSRSP\n"); + control_process_async(&viport->control); + + if (!control_config_addrs_rsp(&viport->control)) + viport->link_state = LINK_IDLE; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_stat_states(struct viport *viport) +{ + enum link_state old_state; + + do { + switch (old_state = viport->link_state) { + case LINK_REPORTSTATREQ: + LINK_STATE("state LINK_REPORTSTATREQ\n"); + if (control_report_statistics_req(&viport->control)) + viport->link_state = LINK_RESET; + else + viport->link_state = LINK_REPORTSTATRSP; + break; + case LINK_REPORTSTATRSP: + LINK_STATE("state LINK_REPORTSTATRSP\n"); + 
control_process_async(&viport->control); + + spin_lock_irq(&viport->lock); + if (control_report_statistics_rsp(&viport->control, + &viport->stats) == 0) { + viport->updates &= ~NEED_STATS; + viport->last_stats_time = jiffies; + wake_up(&viport->stats_queue); + viport->link_state = LINK_IDLE; + } + + spin_unlock_irq(&viport->lock); + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_heartbeat_states(struct viport *viport) +{ + enum link_state old_state; + + do { + switch (old_state = viport->link_state) { + case LINK_HEARTBEATREQ: + LINK_STATE("state LINK_HEARTBEATREQ\n"); + if (control_heartbeat_req(&viport->control, + viport->config->hb_timeout)) + viport->link_state = LINK_RESET; + else + viport->link_state = LINK_HEARTBEATRSP; + break; + case LINK_HEARTBEATRSP: + LINK_STATE("state LINK_HEARTBEATRSP\n"); + control_process_async(&viport->control); + + if (!control_heartbeat_rsp(&viport->control)) + viport->link_state = LINK_IDLE; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_RESET; + } + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_reset_states(struct viport *viport) +{ + enum link_state old_state; + int handle_mc_join_compl = 0, handle_mc_join = 0; + + do { + switch (old_state = viport->link_state) { + case LINK_RESET: + LINK_STATE("state LINK_RESET\n"); + viport->errored = 0; + spin_lock_irq(&viport->lock); + viport->state = VIPORT_DISCONNECTED; + handle_mc_join = (viport->updates & NEED_MCAST_JOIN); + handle_mc_join_compl = + (viport->updates & NEED_MCAST_COMPLETION); + /* + * Turn off both flags, the handler functions will + * rearm them if necessary. + */ + viport->updates &= ~(NEED_MCAST_JOIN | NEED_MCAST_COMPLETION); + + spin_unlock_irq(&viport->lock); + vnic_link_down(viport->vnic, viport->parent); + printk(KERN_INFO PFX + "%s: connection lost\n", + config_viport_name(viport->config)); + if (handle_mc_join) {
+ if (vnic_mc_join(viport)) + viport_disable_multicast(viport); + } + if (handle_mc_join_compl) + vnic_mc_join_handle_completion(viport); + if (viport->features_supported & VNIC_FEAT_INBOUND_IB_MC) { + VIPORT_ERROR("calling vnic_mc_leave\n"); + vnic_mc_leave(viport); + VIPORT_ERROR("calling mc_data_cleanup\n"); + mc_data_cleanup(&viport->mc_data); + } + + if (control_reset_req(&viport->control)) + viport->link_state = LINK_DATADISCONNECT; + else + viport->link_state = LINK_RESETRSP; + break; + case LINK_RESETRSP: + LINK_STATE("state LINK_RESETRSP\n"); + control_process_async(&viport->control); + + if (!control_reset_rsp(&viport->control)) + viport->link_state = LINK_DATADISCONNECT; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_DATADISCONNECT; + } + break; + case LINK_RESETCONTROL: + LINK_STATE("state LINK_RESETCONTROL\n"); + if (control_reset_req(&viport->control)) + viport->link_state = LINK_CONTROLDISCONNECT; + else + viport->link_state = LINK_RESETCONTROLRSP; + break; + case LINK_RESETCONTROLRSP: + LINK_STATE("state LINK_RESETCONTROLRSP\n"); + control_process_async(&viport->control); + + if (!control_reset_rsp(&viport->control)) + viport->link_state = LINK_CONTROLDISCONNECT; + + if (viport->errored) { + viport->errored = 0; + viport->link_state = LINK_CONTROLDISCONNECT; + } + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_handle_disconn_states(struct viport *viport) +{ + enum link_state old_state; + + do { + switch (old_state = viport->link_state) { + case LINK_DATADISCONNECT: + LINK_STATE("state LINK_DATADISCONNECT\n"); + data_disconnect(&viport->data); + viport->link_state = LINK_CONTROLDISCONNECT; + break; + case LINK_CONTROLDISCONNECT: + LINK_STATE("state LINK_CONTROLDISCONNECT\n"); + viport->link_state = LINK_CLEANUPDATA; + break; + case LINK_CLEANUPDATA: + LINK_STATE("state LINK_CLEANUPDATA\n"); + data_cleanup(&viport->data); + viport->link_state = 
LINK_CLEANUPCONTROL; + break; + case LINK_CLEANUPCONTROL: + LINK_STATE("state LINK_CLEANUPCONTROL\n"); + spin_lock_irq(&viport->lock); + kfree(viport->mac_addresses); + viport->mac_addresses = NULL; + spin_unlock_irq(&viport->lock); + control_cleanup(&viport->control); + ib_dealloc_pd(viport->pd); + viport->link_state = LINK_DISCONNECTED; + break; + case LINK_DISCONNECTED: + LINK_STATE("state LINK_DISCONNECTED\n"); + vnic_disconnected(viport->parent->parent, + viport->parent); + if (viport->disconnect != 0) + viport->link_state = LINK_UNINITIALIZED; + else if (viport->retry == 1) { + viport->retry = 0; + /* + * The retry interval starts at 5 seconds and is + * incremented by 5 seconds up to a maximum of + * 20 seconds. Retries then continue at 20-second + * intervals until 10 minutes have elapsed, after + * which retrying is stopped. + */ + if (viport->retry_duration < MAX_RETRY_INTERVAL) + viport->retry_duration += + RETRY_INCREMENT; + + viport->total_retry_duration += + viport->retry_duration; + + if (viport->total_retry_duration >= + MAX_CONNECT_RETRY_TIMEOUT) { + viport->link_state = LINK_UNINITIALIZED; + printk(KERN_INFO PFX + "Timed out after retrying " + "for %d msecs\n", + viport->total_retry_duration); + } else { + viport->connect = DELAY; + viport->link_state = LINK_RETRYWAIT; + } + viport_timer(viport, + msecs_to_jiffies(viport->retry_duration)); + } else { + u32 duration = 5000 + ((net_random()) & 0x1FF); + if (!viport->parent->is_primary_path) + duration += 0x1ff; + viport_timer(viport, + msecs_to_jiffies(duration)); + viport->connect = DELAY; + viport->link_state = LINK_RETRYWAIT; + } + break; + case LINK_RETRYWAIT: + LINK_STATE("state LINK_RETRYWAIT\n"); + viport->stats.ethernet_status = 0; + viport->updates = 0; + wake_up(&viport->stats_queue); + if (viport->disconnect != 0) { + viport_timer_stop(viport); + viport->link_state = LINK_UNINITIALIZED; + } else if (viport->connect == DELAY) { + if (!viport->timer_active) +
viport->link_state = LINK_INITIALIZE; + } else if (viport->connect == NOW) { + viport_timer_stop(viport); + viport->link_state = LINK_INITIALIZE; + } + break; + case LINK_FIRSTCONNECT: + viport->stats.ethernet_status = 0; + viport->updates = 0; + wake_up(&viport->stats_queue); + if (viport->disconnect != 0) { + viport_timer_stop(viport); + viport->link_state = LINK_UNINITIALIZED; + } + + break; + default: + return -1; + } + } while (viport->link_state != old_state); + + return 0; +} + +static int viport_statemachine(void *context) +{ + struct viport *viport; + enum link_state old_link_state; + + VIPORT_FUNCTION("viport_statemachine()\n"); + while (!viport_thread_end || !list_empty(&viport_list)) { + wait_event_interruptible(viport_queue, + !list_empty(&viport_list) + || viport_thread_end); + spin_lock_irq(&viport_list_lock); + if (list_empty(&viport_list)) { + spin_unlock_irq(&viport_list_lock); + continue; + } + viport = list_entry(viport_list.next, struct viport, + list_ptrs); + list_del_init(&viport->list_ptrs); + spin_unlock_irq(&viport_list_lock); + + do { + old_link_state = viport->link_state; + + /* + * Optimize for the state machine steady state + * by checking for the most common states first. 
+ * + */ + if (viport_handle_idle_states(viport) == 0) + break; + if (viport_handle_heartbeat_states(viport) == 0) + break; + if (viport_handle_stat_states(viport) == 0) + break; + if (viport_handle_config_states(viport) == 0) + break; + + if (viport_handle_init_states(viport) == 0) + break; + if (viport_handle_control_states(viport) == 0) + break; + if (viport_handle_data_states(viport) == 0) + break; + if (viport_handle_xchgpool_states(viport) == 0) + break; + if (viport_handle_reset_states(viport) == 0) + break; + if (viport_handle_disconn_states(viport) == 0) + break; + } while (viport->link_state != old_link_state); + } + + complete_and_exit(&viport_thread_exit, 0); +} + +int viport_start(void) +{ + VIPORT_FUNCTION("viport_start()\n"); + + spin_lock_init(&viport_list_lock); + viport_thread = kthread_run(viport_statemachine, NULL, + "qlgc_vnic_viport_s_m"); + if (IS_ERR(viport_thread)) { + printk(KERN_WARNING PFX "Could not create viport_thread;" + " error %d\n", (int) PTR_ERR(viport_thread)); + viport_thread = NULL; + return 1; + } + + return 0; +} + +void viport_cleanup(void) +{ + VIPORT_FUNCTION("viport_cleanup()\n"); + if (viport_thread) { + viport_thread_end = 1; + wake_up(&viport_queue); + wait_for_completion(&viport_thread_exit); + viport_thread = NULL; + } +} diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h new file mode 100644 index 0000000..bb0e7e1 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_viport.h @@ -0,0 +1,176 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_VIPORT_H_INCLUDED +#define VNIC_VIPORT_H_INCLUDED + +#include "vnic_control.h" +#include "vnic_data.h" +#include "vnic_multicast.h" + +enum viport_state { + VIPORT_DISCONNECTED = 0, + VIPORT_CONNECTED = 1 +}; + +enum link_state { + LINK_UNINITIALIZED = 0, + LINK_INITIALIZE = 1, + LINK_INITIALIZECONTROL = 2, + LINK_INITIALIZEDATA = 3, + LINK_CONTROLCONNECT = 4, + LINK_CONTROLCONNECTWAIT = 5, + LINK_INITVNICREQ = 6, + LINK_INITVNICRSP = 7, + LINK_BEGINDATAPATH = 8, + LINK_CONFIGDATAPATHREQ = 9, + LINK_CONFIGDATAPATHRSP = 10, + LINK_DATACONNECT = 11, + LINK_DATACONNECTWAIT = 12, + LINK_XCHGPOOLREQ = 13, + LINK_XCHGPOOLRSP = 14, + LINK_INITIALIZED = 15, + LINK_IDLE = 16, + LINK_IDLING = 17, + LINK_CONFIGLINKREQ = 18, + LINK_CONFIGLINKRSP = 19, + LINK_CONFIGADDRSREQ = 20, + LINK_CONFIGADDRSRSP = 21, + LINK_REPORTSTATREQ = 22, + LINK_REPORTSTATRSP = 23, + LINK_HEARTBEATREQ = 24, + LINK_HEARTBEATRSP = 25, + LINK_RESET = 26, + LINK_RESETRSP = 27, + LINK_RESETCONTROL = 28, + LINK_RESETCONTROLRSP = 29, + LINK_DATADISCONNECT = 30, + LINK_CONTROLDISCONNECT = 31, + LINK_CLEANUPDATA = 32, + LINK_CLEANUPCONTROL = 33, + LINK_DISCONNECTED = 34, + LINK_RETRYWAIT = 35, + LINK_FIRSTCONNECT = 36 +}; + +enum { + BROADCAST_ADDR = 0, + UNICAST_ADDR = 1, + MCAST_ADDR_START = 2 +}; + +#define current_mac_address mac_addresses[UNICAST_ADDR].address + +enum { + NEED_STATS = 0x00000001, + NEED_ADDRESS_CONFIG = 0x00000002, + NEED_LINK_CONFIG = 0x00000004, + MCAST_OVERFLOW = 0x00000008, + NEED_MCAST_COMPLETION = 0x00000010, + NEED_MCAST_JOIN = 0x00000020 +}; + +struct viport { + struct list_head list_ptrs; + struct netpath *parent; + struct vnic *vnic; + struct viport_config *config; + struct control control; + struct data data; + spinlock_t lock; + struct ib_pd *pd; + enum viport_state state; + enum link_state link_state; + struct vnic_cmd_report_stats_rsp stats; + wait_queue_head_t stats_queue; + u32 last_stats_time; + u32 features_supported; + u8 hw_mac_address[ETH_ALEN]; 
+ u16 default_vlan; + u16 num_mac_addresses; + struct vnic_address_op2 *mac_addresses; + u32 updates; + u16 flags; + u16 new_flags; + u16 mtu; + u16 new_mtu; + u32 errored; + enum { WAIT, DELAY, NOW } connect; + u32 disconnect; + u32 retry; + wait_queue_head_t disconnect_queue; + int timer_active; + struct timer_list timer; + u32 retry_duration; + u32 total_retry_duration; + int reference_count; + wait_queue_head_t reference_queue; + struct mc_info mc_info; + struct mc_data mc_data; +}; + +int viport_start(void); +void viport_cleanup(void); + +struct viport *viport_allocate(struct viport_config *config); +void viport_free(struct viport *viport); + +void viport_connect(struct viport *viport, int delay); +void viport_disconnect(struct viport *viport); + +void viport_set_link(struct viport *viport, u16 flags, u16 mtu); +void viport_get_stats(struct viport *viport, + struct net_device_stats *stats); +int viport_xmit_packet(struct viport *viport, struct sk_buff *skb); +void viport_kick(struct viport *viport); + +void viport_failure(struct viport *viport); + +int viport_set_unicast(struct viport *viport, u8 *address); +int viport_set_multicast(struct viport *viport, + struct dev_mc_list *mc_list, + int mc_count); + +#define viport_max_mtu(viport) data_max_mtu(&(viport)->data) + +#define viport_get_hw_addr(viport, address) \ + memcpy(address, (viport)->hw_mac_address, ETH_ALEN) + +#define viport_features(viport) ((viport)->features_supported) + +#define viport_can_tx_csum(viport) \ + (((viport)->features_supported & \ + (VNIC_FEAT_IPV4_CSUM_TX | VNIC_FEAT_TCP_CSUM_TX | \ + VNIC_FEAT_UDP_CSUM_TX)) == (VNIC_FEAT_IPV4_CSUM_TX | \ + VNIC_FEAT_TCP_CSUM_TX | VNIC_FEAT_UDP_CSUM_TX)) + +#endif /* VNIC_VIPORT_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:17:54 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:47:54 +0530 Subject: [ofa-general] [PATCH 04/13] QLogic VNIC: Implementation of Control path of 
communication protocol In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171754.31725.77615.stgit@localhost.localdomain> From: Poornima Kamath This patch adds the files that define the control packet formats and implements various control messages that are exchanged as part of the communication protocol with the EVIC/VEx. Signed-off-by: Ramachandra K Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_control.c | 2288 ++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_control.h | 180 ++ .../infiniband/ulp/qlgc_vnic/vnic_control_pkt.h | 368 +++ 3 files changed, 2836 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_control.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_control.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_control_pkt.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_control.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_control.c new file mode 100644 index 0000000..470f22e --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_control.c @@ -0,0 +1,2288 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include + +#include "vnic_util.h" +#include "vnic_main.h" +#include "vnic_viport.h" +#include "vnic_control.h" +#include "vnic_control_pkt.h" +#include "vnic_stats.h" + +#define vnic_multicast_address(rsp2_address, index) \ + ((rsp2_address)->list_address_ops[index].address[0] & 0x01) + +static void control_log_control_packet(struct vnic_control_packet *pkt); + +static inline char *control_ifcfg_name(struct control *control) +{ + if (!control) + return "nctl"; + if (!control->parent) + return "np"; + if (!control->parent->parent) + return "npp"; + if (!control->parent->parent->parent) + return "nppp"; + if (!control->parent->parent->parent->config) + return "npppc"; + return (control->parent->parent->parent->config->name); +} + +static void control_recv(struct control *control, struct recv_io *recv_io) +{ + if (vnic_ib_post_recv(&control->ib_conn, &recv_io->io)) + viport_failure(control->parent); +} + +static void control_recv_complete(struct io *io) +{ + struct recv_io *recv_io = (struct recv_io *)io; + struct recv_io *last_recv_io; + struct control *control = &io->viport->control; + struct vnic_control_packet *pkt = control_packet(recv_io); + struct vnic_control_header *c_hdr = &pkt->hdr; + unsigned long flags; + cycles_t 
response_time; + + CONTROL_FUNCTION("%s: control_recv_complete() State=%d\n", + control_ifcfg_name(control), control->req_state); + + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + control_note_rsptime_stats(&response_time); + CONTROL_PACKET(pkt); + spin_lock_irqsave(&control->io_lock, flags); + if (c_hdr->pkt_type == TYPE_INFO) { + last_recv_io = control->info; + control->info = recv_io; + spin_unlock_irqrestore(&control->io_lock, flags); + viport_kick(control->parent); + if (last_recv_io) + control_recv(control, last_recv_io); + } else if (c_hdr->pkt_type == TYPE_RSP) { + u8 repost = 0; + u8 fail = 0; + u8 kick = 0; + + switch (control->req_state) { + case REQ_INACTIVE: + case RSP_RECEIVED: + case REQ_COMPLETED: + CONTROL_ERROR("%s: Unexpected control " + "response received: CMD = %d\n", + control_ifcfg_name(control), + c_hdr->pkt_cmd); + control_log_control_packet(pkt); + control->req_state = REQ_FAILED; + fail = 1; + break; + case REQ_POSTED: + case REQ_SENT: + if (c_hdr->pkt_cmd != control->last_cmd + || c_hdr->pkt_seq_num != control->seq_num) { + CONTROL_ERROR("%s: Incorrect Control Response " + "received\n", + control_ifcfg_name(control)); + CONTROL_ERROR("%s: Sent control request:\n", + control_ifcfg_name(control)); + control_log_control_packet(control_last_req(control)); + CONTROL_ERROR("%s: Received control response:\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + control->req_state = REQ_FAILED; + fail = 1; + } else { + control->response = recv_io; + control_update_rsptime_stats(control, + response_time); + if (control->req_state == REQ_POSTED) { + CONTROL_INFO("%s: Recv CMD RSP %d " + "before Send Completion\n", + control_ifcfg_name(control), + c_hdr->pkt_cmd); + control->req_state = RSP_RECEIVED; + } else { + control->req_state = REQ_COMPLETED; + kick = 1; + } + } + break; + case REQ_FAILED: + /* stay in REQ_FAILED state */ + repost = 1; + break; + } +
spin_unlock_irqrestore(&control->io_lock, flags); + /* we must do this outside the lock */ + if (kick) + viport_kick(control->parent); + if (repost || fail) { + control_recv(control, recv_io); + if (fail) + viport_failure(control->parent); + } + + } else { + list_add_tail(&recv_io->io.list_ptrs, + &control->failure_list); + spin_unlock_irqrestore(&control->io_lock, flags); + viport_kick(control->parent); + } + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); +} + +static void control_timeout(unsigned long data) +{ + struct control *control; + unsigned long flags; + u8 fail = 0; + u8 kick = 0; + + control = (struct control *)data; + CONTROL_FUNCTION("%s: control_timeout(), State=%d\n", + control_ifcfg_name(control), control->req_state); + control->timer_state = TIMER_EXPIRED; + + spin_lock_irqsave(&control->io_lock, flags); + switch (control->req_state) { + case REQ_INACTIVE: + kick = 1; + /* stay in REQ_INACTIVE state */ + break; + case REQ_POSTED: + case REQ_SENT: + control->req_state = REQ_FAILED; + CONTROL_ERROR("%s: No send completion for Cmd=%d\n", + control_ifcfg_name(control), control->last_cmd); + control_timeout_stats(control); + fail = 1; + break; + case RSP_RECEIVED: + control->req_state = REQ_FAILED; + CONTROL_ERROR("%s: No response received from EIOC for Cmd=%d\n", + control_ifcfg_name(control), control->last_cmd); + control_timeout_stats(control); + fail = 1; + break; + case REQ_COMPLETED: + /* stay in REQ_COMPLETED state */ + kick = 1; + break; + case REQ_FAILED: + /* stay in REQ_FAILED state */ + break; + } + spin_unlock_irqrestore(&control->io_lock, flags); + /* we must do this outside the lock */ + if (fail) + viport_failure(control->parent); + if (kick) + viport_kick(control->parent); + + return; +} + +static void control_timer(struct control *control, int timeout) +{ + CONTROL_FUNCTION("%s: control_timer()\n", + control_ifcfg_name(control)); + if (control->timer_state ==
TIMER_ACTIVE) + mod_timer(&control->timer, jiffies + timeout); + else { + init_timer(&control->timer); + control->timer.expires = jiffies + timeout; + control->timer.data = (unsigned long)control; + control->timer.function = control_timeout; + control->timer_state = TIMER_ACTIVE; + add_timer(&control->timer); + } +} + +static void control_timer_stop(struct control *control) +{ + CONTROL_FUNCTION("%s: control_timer_stop()\n", + control_ifcfg_name(control)); + if (control->timer_state == TIMER_ACTIVE) + del_timer_sync(&control->timer); + + control->timer_state = TIMER_IDLE; +} + +static int control_send(struct control *control, struct send_io *send_io) +{ + unsigned long flags; + int ret = -1; + u8 fail = 0; + struct vnic_control_packet *pkt = control_packet(send_io); + + CONTROL_FUNCTION("%s: control_send(), State=%d\n", + control_ifcfg_name(control), control->req_state); + spin_lock_irqsave(&control->io_lock, flags); + switch (control->req_state) { + case REQ_INACTIVE: + CONTROL_PACKET(pkt); + control_timer(control, control->config->rsp_timeout); + control_note_reqtime_stats(control); + if (vnic_ib_post_send(&control->ib_conn, &control->send_io.io)) { + CONTROL_ERROR("%s: Failed to post send\n", + control_ifcfg_name(control)); + /* stay in REQ_INACTIVE state */ + fail = 1; + } else { + control->last_cmd = pkt->hdr.pkt_cmd; + control->req_state = REQ_POSTED; + ret = 0; + } + break; + case REQ_POSTED: + case REQ_SENT: + case RSP_RECEIVED: + case REQ_COMPLETED: + CONTROL_ERROR("%s: Previous command is not completed. " + "New CMD: %d Last CMD: %d Seq: %d\n", + control_ifcfg_name(control), pkt->hdr.pkt_cmd, + control->last_cmd, control->seq_num); + + control->req_state = REQ_FAILED; + fail = 1; + break; + case REQ_FAILED: + /* this can occur after an error when the ViPort state machine + * attempts to reset the link. + */ + CONTROL_INFO("%s: Attempt to send in failed state. "
+			     "New CMD: %d Last CMD: %d\n",
+			     control_ifcfg_name(control), pkt->hdr.pkt_cmd,
+			     control->last_cmd);
+		/* stay in REQ_FAILED state */
+		break;
+	}
+	spin_unlock_irqrestore(&control->io_lock, flags);
+
+	/* we must do this outside the lock */
+	if (fail)
+		viport_failure(control->parent);
+	return ret;
+}
+
+static void control_send_complete(struct io *io)
+{
+	struct control *control = &io->viport->control;
+	unsigned long flags;
+	u8 fail = 0;
+	u8 kick = 0;
+
+	CONTROL_FUNCTION("%s: control_send_complete(), State=%d\n",
+			 control_ifcfg_name(control), control->req_state);
+	spin_lock_irqsave(&control->io_lock, flags);
+	switch (control->req_state) {
+	case REQ_INACTIVE:
+	case REQ_SENT:
+	case REQ_COMPLETED:
+		CONTROL_ERROR("%s: Unexpected control send completion\n",
+			      control_ifcfg_name(control));
+		fail = 1;
+		control->req_state = REQ_FAILED;
+		break;
+	case REQ_POSTED:
+		control->req_state = REQ_SENT;
+		break;
+	case RSP_RECEIVED:
+		control->req_state = REQ_COMPLETED;
+		kick = 1;
+		break;
+	case REQ_FAILED:
+		/* stay in REQ_FAILED state */
+		break;
+	}
+	spin_unlock_irqrestore(&control->io_lock, flags);
+	/* we must do this outside the lock */
+	if (fail)
+		viport_failure(control->parent);
+	if (kick)
+		viport_kick(control->parent);
+
+	return;
+}
+
+void control_process_async(struct control *control)
+{
+	struct recv_io *recv_io;
+	struct vnic_control_packet *pkt;
+	unsigned long flags;
+
+	CONTROL_FUNCTION("%s: control_process_async()\n",
+			 control_ifcfg_name(control));
+	ib_dma_sync_single_for_cpu(control->parent->config->ibdev,
+				   control->recv_dma, control->recv_len,
+				   DMA_FROM_DEVICE);
+
+	spin_lock_irqsave(&control->io_lock, flags);
+	recv_io = control->info;
+	if (recv_io) {
+		CONTROL_INFO("%s: processing info packet\n",
+			     control_ifcfg_name(control));
+		control->info = NULL;
+		spin_unlock_irqrestore(&control->io_lock, flags);
+		pkt = control_packet(recv_io);
+		if (pkt->hdr.pkt_cmd == CMD_REPORT_STATUS) {
+			u32 status;
+			status =
be32_to_cpu(pkt->cmd.report_status.status_number); + switch (status) { + case VNIC_STATUS_LINK_UP: + CONTROL_INFO("%s: link up\n", + control_ifcfg_name(control)); + vnic_link_up(control->parent->vnic, + control->parent->parent); + break; + case VNIC_STATUS_LINK_DOWN: + CONTROL_INFO("%s: link down\n", + control_ifcfg_name(control)); + vnic_link_down(control->parent->vnic, + control->parent->parent); + break; + default: + CONTROL_ERROR("%s: asynchronous status" + " received from EIOC\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + break; + } + } + if ((pkt->hdr.pkt_cmd != CMD_REPORT_STATUS) || + pkt->cmd.report_status.is_fatal) + viport_failure(control->parent); + + control_recv(control, recv_io); + spin_lock_irqsave(&control->io_lock, flags); + } + + while (!list_empty(&control->failure_list)) { + CONTROL_INFO("%s: processing error packet\n", + control_ifcfg_name(control)); + recv_io = (struct recv_io *) + list_entry(control->failure_list.next, struct io, + list_ptrs); + list_del(&recv_io->io.list_ptrs); + spin_unlock_irqrestore(&control->io_lock, flags); + pkt = control_packet(recv_io); + CONTROL_ERROR("%s: asynchronous error received from EIOC\n", + control_ifcfg_name(control)); + control_log_control_packet(pkt); + if ((pkt->hdr.pkt_type != TYPE_ERR) + || (pkt->hdr.pkt_cmd != CMD_REPORT_STATUS) + || pkt->cmd.report_status.is_fatal) + viport_failure(control->parent); + + control_recv(control, recv_io); + spin_lock_irqsave(&control->io_lock, flags); + } + spin_unlock_irqrestore(&control->io_lock, flags); + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + CONTROL_FUNCTION("%s: done control_process_async\n", + control_ifcfg_name(control)); +} + +static struct send_io *control_init_hdr(struct control *control, u8 cmd) +{ + struct control_config *config; + struct vnic_control_packet *pkt; + struct vnic_control_header *hdr; + + CONTROL_FUNCTION("control_init_hdr()\n"); + 
config = control->config;
+
+	pkt = control_packet(&control->send_io);
+	hdr = &pkt->hdr;
+
+	hdr->pkt_type = TYPE_REQ;
+	hdr->pkt_cmd = cmd;
+	control->seq_num++;
+	hdr->pkt_seq_num = control->seq_num;
+	hdr->pkt_retry_count = 0;
+
+	return &control->send_io;
+}
+
+static struct recv_io *control_get_rsp(struct control *control)
+{
+	struct recv_io *recv_io = NULL;
+	unsigned long flags;
+	u8 fail = 0;
+
+	CONTROL_FUNCTION("%s: control_get_rsp(), State=%d\n",
+			 control_ifcfg_name(control), control->req_state);
+	spin_lock_irqsave(&control->io_lock, flags);
+	switch (control->req_state) {
+	case REQ_INACTIVE:
+		CONTROL_ERROR("%s: Checked for response with no "
+			      "command pending\n",
+			      control_ifcfg_name(control));
+		control->req_state = REQ_FAILED;
+		fail = 1;
+		break;
+	case REQ_POSTED:
+	case REQ_SENT:
+	case RSP_RECEIVED:
+		/* no response available yet;
+		 * stay in present state
+		 */
+		break;
+	case REQ_COMPLETED:
+		recv_io = control->response;
+		if (!recv_io) {
+			control->req_state = REQ_FAILED;
+			fail = 1;
+			break;
+		}
+		control->response = NULL;
+		control->last_cmd = CMD_INVALID;
+		control_timer_stop(control);
+		control->req_state = REQ_INACTIVE;
+		break;
+	case REQ_FAILED:
+		control_timer_stop(control);
+		/* stay in REQ_FAILED state */
+		break;
+	}
+	spin_unlock_irqrestore(&control->io_lock, flags);
+	if (fail)
+		viport_failure(control->parent);
+	return recv_io;
+}
+
+int control_init_vnic_req(struct control *control)
+{
+	struct send_io *send_io;
+	struct control_config *config = control->config;
+	struct vnic_control_packet *pkt;
+	struct vnic_cmd_init_vnic_req *init_vnic_req;
+
+	ib_dma_sync_single_for_cpu(control->parent->config->ibdev,
+				   control->send_dma, control->send_len,
+				   DMA_TO_DEVICE);
+
+	send_io = control_init_hdr(control, CMD_INIT_VNIC);
+	if (!send_io)
+		goto failure;
+
+	pkt = control_packet(send_io);
+	init_vnic_req = &pkt->cmd.init_vnic_req;
+	init_vnic_req->vnic_major_version =
+		__constant_cpu_to_be16(VNIC_MAJORVERSION);
+
init_vnic_req->vnic_minor_version = + __constant_cpu_to_be16(VNIC_MINORVERSION); + init_vnic_req->vnic_instance = config->vnic_instance; + init_vnic_req->num_data_paths = 1; + init_vnic_req->num_address_entries = + cpu_to_be16(config->max_address_entries); + + control->last_cmd = pkt->hdr.pkt_cmd; + CONTROL_PACKET(pkt); + + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + return control_send(control, send_io); +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +static int control_chk_vnic_rsp_values(struct control *control, + u16 *num_addrs, + u8 num_data_paths, + u8 num_lan_switches, + u32 *features) +{ + + struct control_config *config = control->config; + + if ((control->maj_ver > VNIC_MAJORVERSION) + || ((control->maj_ver == VNIC_MAJORVERSION) + && (control->min_ver > VNIC_MINORVERSION))) { + CONTROL_ERROR("%s: unsupported version\n", + control_ifcfg_name(control)); + goto failure; + } + if (num_data_paths != 1) { + CONTROL_ERROR("%s: EIOC returned too many datapaths\n", + control_ifcfg_name(control)); + goto failure; + } + if (*num_addrs > config->max_address_entries) { + CONTROL_ERROR("%s: EIOC returned more address" + " entries than requested\n", + control_ifcfg_name(control)); + goto failure; + } + if (*num_addrs < config->min_address_entries) { + CONTROL_ERROR("%s: not enough address entries\n", + control_ifcfg_name(control)); + goto failure; + } + if (num_lan_switches < 1) { + CONTROL_ERROR("%s: EIOC returned no lan switches\n", + control_ifcfg_name(control)); + goto failure; + } + if (num_lan_switches > 1) { + CONTROL_ERROR("%s: EIOC returned multiple lan switches\n", + control_ifcfg_name(control)); + goto failure; + } + CONTROL_ERROR("%s checking features %x ib_multicast:%d\n", + control_ifcfg_name(control), + *features, config->ib_multicast); + if ((*features & 
VNIC_FEAT_INBOUND_IB_MC) && !config->ib_multicast) { + /* disable multicast if it is not on in the cfg file, or + if we turned it off because join failed */ + *features &= ~VNIC_FEAT_INBOUND_IB_MC; + } + + return 0; +failure: + return -1; +} + +int control_init_vnic_rsp(struct control *control, u32 *features, + u8 *mac_address, u16 *num_addrs, u16 *vlan) +{ + u8 num_data_paths; + u8 num_lan_switches; + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_init_vnic_rsp *init_vnic_rsp; + + + CONTROL_FUNCTION("%s: control_init_vnic_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_INIT_VNIC) + goto failure; + + init_vnic_rsp = &pkt->cmd.init_vnic_rsp; + control->maj_ver = be16_to_cpu(init_vnic_rsp->vnic_major_version); + control->min_ver = be16_to_cpu(init_vnic_rsp->vnic_minor_version); + num_data_paths = init_vnic_rsp->num_data_paths; + num_lan_switches = init_vnic_rsp->num_lan_switches; + *features = be32_to_cpu(init_vnic_rsp->features_supported); + *num_addrs = be16_to_cpu(init_vnic_rsp->num_address_entries); + + if (control_chk_vnic_rsp_values(control, num_addrs, + num_data_paths, + num_lan_switches, + features)) + goto failure; + + control->lan_switch.lan_switch_num = + init_vnic_rsp->lan_switch[0].lan_switch_num; + control->lan_switch.num_enet_ports = + init_vnic_rsp->lan_switch[0].num_enet_ports; + control->lan_switch.default_vlan = + init_vnic_rsp->lan_switch[0].default_vlan; + *vlan = be16_to_cpu(control->lan_switch.default_vlan); + memcpy(control->lan_switch.hw_mac_address, + init_vnic_rsp->lan_switch[0].hw_mac_address, ETH_ALEN); + memcpy(mac_address, init_vnic_rsp->lan_switch[0].hw_mac_address, + ETH_ALEN); + + control_recv(control, recv_io); + 
ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return 0; +failure: + viport_failure(control->parent); +out: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return -1; +} + +static void copy_recv_pool_config(struct vnic_recv_pool_config *src, + struct vnic_recv_pool_config *dst) +{ + dst->size_recv_pool_entry = src->size_recv_pool_entry; + dst->num_recv_pool_entries = src->num_recv_pool_entries; + dst->timeout_before_kick = src->timeout_before_kick; + dst->num_recv_pool_entries_before_kick = + src->num_recv_pool_entries_before_kick; + dst->num_recv_pool_bytes_before_kick = + src->num_recv_pool_bytes_before_kick; + dst->free_recv_pool_entries_per_update = + src->free_recv_pool_entries_per_update; +} + +static int check_recv_pool_config_value(__be32 *src, __be32 *dst, + __be32 *max, __be32 *min, + char *name) +{ + u32 value; + + value = be32_to_cpu(*src); + if (value > be32_to_cpu(*max)) { + CONTROL_ERROR("value %s too large\n", name); + return -1; + } else if (value < be32_to_cpu(*min)) { + CONTROL_ERROR("value %s too small\n", name); + return -1; + } + + *dst = cpu_to_be32(value); + return 0; +} + +static int check_recv_pool_config(struct vnic_recv_pool_config *src, + struct vnic_recv_pool_config *dst, + struct vnic_recv_pool_config *max, + struct vnic_recv_pool_config *min) +{ + if (check_recv_pool_config_value(&src->size_recv_pool_entry, + &dst->size_recv_pool_entry, + &max->size_recv_pool_entry, + &min->size_recv_pool_entry, + "size_recv_pool_entry") + || check_recv_pool_config_value(&src->num_recv_pool_entries, + &dst->num_recv_pool_entries, + &max->num_recv_pool_entries, + &min->num_recv_pool_entries, + "num_recv_pool_entries") + || check_recv_pool_config_value(&src->timeout_before_kick, + &dst->timeout_before_kick, + &max->timeout_before_kick, + &min->timeout_before_kick, + "timeout_before_kick") + || 
check_recv_pool_config_value(&src-> + num_recv_pool_entries_before_kick, + &dst-> + num_recv_pool_entries_before_kick, + &max-> + num_recv_pool_entries_before_kick, + &min-> + num_recv_pool_entries_before_kick, + "num_recv_pool_entries_before_kick") + || check_recv_pool_config_value(&src-> + num_recv_pool_bytes_before_kick, + &dst-> + num_recv_pool_bytes_before_kick, + &max-> + num_recv_pool_bytes_before_kick, + &min-> + num_recv_pool_bytes_before_kick, + "num_recv_pool_bytes_before_kick") + || check_recv_pool_config_value(&src-> + free_recv_pool_entries_per_update, + &dst-> + free_recv_pool_entries_per_update, + &max-> + free_recv_pool_entries_per_update, + &min-> + free_recv_pool_entries_per_update, + "free_recv_pool_entries_per_update")) + goto failure; + + if (!is_power_of2(be32_to_cpu(dst->num_recv_pool_entries))) { + CONTROL_ERROR("num_recv_pool_entries (%d)" + " must be power of 2\n", + dst->num_recv_pool_entries); + goto failure; + } + + if (!is_power_of2(be32_to_cpu(dst-> + free_recv_pool_entries_per_update))) { + CONTROL_ERROR("free_recv_pool_entries_per_update (%d)" + " must be power of 2\n", + dst->free_recv_pool_entries_per_update); + goto failure; + } + + if (be32_to_cpu(dst->free_recv_pool_entries_per_update) >= + be32_to_cpu(dst->num_recv_pool_entries)) { + CONTROL_ERROR("free_recv_pool_entries_per_update (%d) must" + " be less than num_recv_pool_entries (%d)\n", + dst->free_recv_pool_entries_per_update, + dst->num_recv_pool_entries); + goto failure; + } + + if (be32_to_cpu(dst->num_recv_pool_entries_before_kick) >= + be32_to_cpu(dst->num_recv_pool_entries)) { + CONTROL_ERROR("num_recv_pool_entries_before_kick (%d) must" + " be less than num_recv_pool_entries (%d)\n", + dst->num_recv_pool_entries_before_kick, + dst->num_recv_pool_entries); + goto failure; + } + + return 0; +failure: + return -1; +} + +int control_config_data_path_req(struct control *control, u64 path_id, + struct vnic_recv_pool_config *host, + struct vnic_recv_pool_config *eioc) +{ 
+ struct send_io *send_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_config_data_path *config_data_path; + + CONTROL_FUNCTION("%s: control_config_data_path_req()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_CONFIG_DATA_PATH); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + config_data_path = &pkt->cmd.config_data_path_req; + config_data_path->data_path = 0; + config_data_path->path_identifier = path_id; + copy_recv_pool_config(host, + &config_data_path->host_recv_pool_config); + copy_recv_pool_config(eioc, + &config_data_path->eioc_recv_pool_config); + CONTROL_PACKET(pkt); + + control->last_cmd = pkt->hdr.pkt_cmd; + + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + return control_send(control, send_io); +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +int control_config_data_path_rsp(struct control *control, + struct vnic_recv_pool_config *host, + struct vnic_recv_pool_config *eioc, + struct vnic_recv_pool_config *max_host, + struct vnic_recv_pool_config *max_eioc, + struct vnic_recv_pool_config *min_host, + struct vnic_recv_pool_config *min_eioc) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_config_data_path *config_data_path; + + CONTROL_FUNCTION("%s: control_config_data_path_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_CONFIG_DATA_PATH) + goto failure; + + config_data_path = &pkt->cmd.config_data_path_rsp; + if 
(config_data_path->data_path != 0) { + CONTROL_ERROR("%s: received CMD_CONFIG_DATA_PATH response" + " for wrong data path: %u\n", + control_ifcfg_name(control), + config_data_path->data_path); + goto failure; + } + + if (check_recv_pool_config(&config_data_path-> + host_recv_pool_config, + host, max_host, min_host) + || check_recv_pool_config(&config_data_path-> + eioc_recv_pool_config, + eioc, max_eioc, min_eioc)) { + goto failure; + } + + control_recv(control, recv_io); + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return 0; +failure: + viport_failure(control->parent); +out: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return -1; +} + +int control_exchange_pools_req(struct control *control, u64 addr, u32 rkey) +{ + struct send_io *send_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_exchange_pools *exchange_pools; + + CONTROL_FUNCTION("%s: control_exchange_pools_req()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_EXCHANGE_POOLS); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + exchange_pools = &pkt->cmd.exchange_pools_req; + exchange_pools->data_path = 0; + exchange_pools->pool_rkey = cpu_to_be32(rkey); + exchange_pools->pool_addr = cpu_to_be64(addr); + + control->last_cmd = pkt->hdr.pkt_cmd; + + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return control_send(control, send_io); +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +int control_exchange_pools_rsp(struct control *control, u64 *addr, + u32 *rkey) +{ + struct recv_io *recv_io; + struct 
vnic_control_packet *pkt; + struct vnic_cmd_exchange_pools *exchange_pools; + + CONTROL_FUNCTION("%s: control_exchange_pools_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_EXCHANGE_POOLS) + goto failure; + + exchange_pools = &pkt->cmd.exchange_pools_rsp; + *rkey = be32_to_cpu(exchange_pools->pool_rkey); + *addr = be64_to_cpu(exchange_pools->pool_addr); + + if (exchange_pools->data_path != 0) { + CONTROL_ERROR("%s: received CMD_EXCHANGE_POOLS response" + " for wrong data path: %u\n", + control_ifcfg_name(control), + exchange_pools->data_path); + goto failure; + } + + control_recv(control, recv_io); + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return 0; +failure: + viport_failure(control->parent); +out: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return -1; +} + +int control_config_link_req(struct control *control, u16 flags, u16 mtu) +{ + struct send_io *send_io; + struct vnic_cmd_config_link *config_link_req; + struct vnic_control_packet *pkt; + + CONTROL_FUNCTION("%s: control_config_link_req()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_CONFIG_LINK); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + config_link_req = &pkt->cmd.config_link_req; + config_link_req->lan_switch_num = + control->lan_switch.lan_switch_num; + config_link_req->cmd_flags = VNIC_FLAG_SET_MTU; + if (flags & IFF_UP) + config_link_req->cmd_flags |= VNIC_FLAG_ENABLE_NIC; + else + config_link_req->cmd_flags |= 
VNIC_FLAG_DISABLE_NIC; + if (flags & IFF_ALLMULTI) + config_link_req->cmd_flags |= VNIC_FLAG_ENABLE_MCAST_ALL; + else + config_link_req->cmd_flags |= VNIC_FLAG_DISABLE_MCAST_ALL; + if (flags & IFF_PROMISC) { + config_link_req->cmd_flags |= VNIC_FLAG_ENABLE_PROMISC; + /* the EIOU doesn't really do PROMISC mode. + * if PROMISC is set, it only receives unicast packets + * I also have to set MCAST_ALL if I want real + * PROMISC mode. + */ + config_link_req->cmd_flags &= ~VNIC_FLAG_DISABLE_MCAST_ALL; + config_link_req->cmd_flags |= VNIC_FLAG_ENABLE_MCAST_ALL; + } else + config_link_req->cmd_flags |= VNIC_FLAG_DISABLE_PROMISC; + + config_link_req->mtu_size = cpu_to_be16(mtu); + + control->last_cmd = pkt->hdr.pkt_cmd; + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return control_send(control, send_io); +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +int control_config_link_rsp(struct control *control, u16 *flags, u16 *mtu) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_config_link *config_link_rsp; + + CONTROL_FUNCTION("%s: control_config_link_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_CONFIG_LINK) + goto failure; + config_link_rsp = &pkt->cmd.config_link_rsp; + if (config_link_rsp->cmd_flags & VNIC_FLAG_ENABLE_NIC) + *flags |= IFF_UP; + if (config_link_rsp->cmd_flags & VNIC_FLAG_ENABLE_MCAST_ALL) + *flags |= IFF_ALLMULTI; + if (config_link_rsp->cmd_flags & VNIC_FLAG_ENABLE_PROMISC) + *flags |= IFF_PROMISC; + + *mtu = be16_to_cpu(config_link_rsp->mtu_size); + + if (control->parent->features_supported & 
VNIC_FEAT_INBOUND_IB_MC) {
+		/* features_supported might include INBOUND_IB_MC, but the
+		 * MTU might cause it to be auto-disabled at the embedded side
+		 */
+		if (config_link_rsp->cmd_flags & VNIC_FLAG_ENABLE_MCAST_ALL) {
+			union ib_gid mgid = config_link_rsp->allmulti_mgid;
+			if (mgid.raw[0] != 0xff) {
+				CONTROL_ERROR("%s: invalid format prefix "
+					      VNIC_GID_FMT "\n",
+					      control_ifcfg_name(control),
+					      VNIC_GID_RAW_ARG(mgid.raw));
+			} else {
+				/* rather than issuing the join here, which
+				 * might arrive at the SM before the EVIC
+				 * creates the MC group, postpone it.
+				 */
+				vnic_mc_join_setup(control->parent, &mgid);
+				CONTROL_INFO("join setup for ALL_MULTI\n");
+			}
+		}
+		/* we don't want to leave the mcast group if MCAST_ALL is
+		 * disabled, because there are no doubt multicast addresses
+		 * set and we want to stay joined so we can get that traffic
+		 * via the mcast group.
+		 */
+	}
+
+	control_recv(control, recv_io);
+	ib_dma_sync_single_for_device(control->parent->config->ibdev,
+				      control->recv_dma, control->recv_len,
+				      DMA_FROM_DEVICE);
+	return 0;
+failure:
+	viport_failure(control->parent);
+out:
+	ib_dma_sync_single_for_device(control->parent->config->ibdev,
+				      control->recv_dma, control->recv_len,
+				      DMA_FROM_DEVICE);
+	return -1;
+}
+
+/* control_config_addrs_req:
+ * return values:
+ *	-1: failure
+ *	 0: incomplete (successful operation, but more address
+ *	    table entries to be updated)
+ *	 1: complete
+ */
+int control_config_addrs_req(struct control *control,
+			     struct vnic_address_op2 *addrs, u16 num)
+{
+	u16 i;
+	u8 j;
+	int ret = 1;
+	struct send_io *send_io;
+	struct vnic_control_packet *pkt;
+	struct vnic_cmd_config_addresses *config_addrs_req;
+	struct vnic_cmd_config_addresses2 *config_addrs_req2;
+
+	CONTROL_FUNCTION("%s: control_config_addrs_req()\n",
+			 control_ifcfg_name(control));
+	ib_dma_sync_single_for_cpu(control->parent->config->ibdev,
+				   control->send_dma, control->send_len,
+				   DMA_TO_DEVICE);
+
+	if (control->parent->features_supported & VNIC_FEAT_INBOUND_IB_MC) {
+
CONTROL_INFO("Sending CMD_CONFIG_ADDRESSES2 %lx MAX:%d " + "sizes:%d %d(off:%d) sizes2:%d %d %d" + "(off:%d - %d %d %d %d %d %d %d)\n", jiffies, + (int)MAX_CONFIG_ADDR_ENTRIES2, + (int)sizeof(struct vnic_cmd_config_addresses), + (int)sizeof(struct vnic_address_op), + (int)offsetof(struct vnic_cmd_config_addresses, + list_address_ops), + (int)sizeof(struct vnic_cmd_config_addresses2), + (int)sizeof(struct vnic_address_op2), + (int)sizeof(union ib_gid), + (int)offsetof(struct vnic_cmd_config_addresses2, + list_address_ops), + (int)offsetof(struct vnic_address_op2, index), + (int)offsetof(struct vnic_address_op2, operation), + (int)offsetof(struct vnic_address_op2, valid), + (int)offsetof(struct vnic_address_op2, address), + (int)offsetof(struct vnic_address_op2, vlan), + (int)offsetof(struct vnic_address_op2, reserved), + (int)offsetof(struct vnic_address_op2, mgid) + ); + send_io = control_init_hdr(control, CMD_CONFIG_ADDRESSES2); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + config_addrs_req2 = &pkt->cmd.config_addresses_req2; + memset(pkt->cmd.cmd_data, 0, VNIC_MAX_CONTROLDATASZ); + config_addrs_req2->lan_switch_num = + control->lan_switch.lan_switch_num; + for (i = 0, j = 0; (i < num) && (j < MAX_CONFIG_ADDR_ENTRIES2); i++) { + if (!addrs[i].operation) + continue; + config_addrs_req2->list_address_ops[j].index = + cpu_to_be16(i); + config_addrs_req2->list_address_ops[j].operation = + VNIC_OP_SET_ENTRY; + config_addrs_req2->list_address_ops[j].valid = + addrs[i].valid; + memcpy(config_addrs_req2->list_address_ops[j].address, + addrs[i].address, ETH_ALEN); + config_addrs_req2->list_address_ops[j].vlan = + addrs[i].vlan; + addrs[i].operation = 0; + CONTROL_INFO("%s i=%d " + "addr[%d]=%02x:%02x:%02x:%02x:%02x:%02x " + "valid:%d\n", control_ifcfg_name(control), i, j, + addrs[i].address[0], addrs[i].address[1], + addrs[i].address[2], addrs[i].address[3], + addrs[i].address[4], addrs[i].address[5], + addrs[i].valid); + j++; + } + 
config_addrs_req2->num_address_ops = j; + } else { + send_io = control_init_hdr(control, CMD_CONFIG_ADDRESSES); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + config_addrs_req = &pkt->cmd.config_addresses_req; + config_addrs_req->lan_switch_num = + control->lan_switch.lan_switch_num; + for (i = 0, j = 0; (i < num) && (j < 16); i++) { + if (!addrs[i].operation) + continue; + config_addrs_req->list_address_ops[j].index = + cpu_to_be16(i); + config_addrs_req->list_address_ops[j].operation = + VNIC_OP_SET_ENTRY; + config_addrs_req->list_address_ops[j].valid = + addrs[i].valid; + memcpy(config_addrs_req->list_address_ops[j].address, + addrs[i].address, ETH_ALEN); + config_addrs_req->list_address_ops[j].vlan = + addrs[i].vlan; + addrs[i].operation = 0; + j++; + } + config_addrs_req->num_address_ops = j; + } + for (; i < num; i++) { + if (addrs[i].operation) { + ret = 0; + break; + } + } + + control->last_cmd = pkt->hdr.pkt_cmd; + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + if (control_send(control, send_io)) + return -1; + return ret; +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +static int process_cmd_config_address2_rsp(struct control *control, + struct vnic_control_packet *pkt, + struct recv_io *recv_io) +{ + struct vnic_cmd_config_addresses2 *config_addrs_rsp2; + int idx, mcaddrs, nomgid; + union ib_gid mgid, rsp_mgid; + + config_addrs_rsp2 = &pkt->cmd.config_addresses_rsp2; + CONTROL_INFO("%s rsp to CONFIG_ADDRESSES2\n", + control_ifcfg_name(control)); + + for (idx = 0, mcaddrs = 0, nomgid = 1; + idx < config_addrs_rsp2->num_address_ops; + idx++) { + if (!config_addrs_rsp2->list_address_ops[idx].valid) + continue; + + /* check if address is multicasts */ + if (!vnic_multicast_address(config_addrs_rsp2, idx)) + continue; + + mcaddrs++; + mgid = 
config_addrs_rsp2->list_address_ops[idx].mgid;
+		CONTROL_INFO("%s: got mgid " VNIC_GID_FMT
+			     " MCAST_MSG_SIZE:%d mtu:%d\n",
+			     control_ifcfg_name(control),
+			     VNIC_GID_RAW_ARG(mgid.raw),
+			     (int)MCAST_MSG_SIZE,
+			     control->parent->mtu);
+
+		/* Embedded should have turned off multicast
+		 * due to large MTU size; mgid had better be 0.
+		 */
+		if (control->parent->mtu > MCAST_MSG_SIZE) {
+			if ((mgid.global.subnet_prefix != 0) ||
+			    (mgid.global.interface_id != 0)) {
+				CONTROL_ERROR("%s: invalid mgid; "
+					      "expected 0 "
+					      VNIC_GID_FMT "\n",
+					      control_ifcfg_name(control),
+					      VNIC_GID_RAW_ARG(mgid.raw));
+			}
+			continue;
+		}
+		if (mgid.raw[0] != 0xff) {
+			CONTROL_ERROR("%s: invalid format prefix "
+				      VNIC_GID_FMT "\n",
+				      control_ifcfg_name(control),
+				      VNIC_GID_RAW_ARG(mgid.raw));
+			continue;
+		}
+		nomgid = 0;	/* got a valid mgid */
+
+		/* let's verify that all the mgids match this one */
+		for (; idx < config_addrs_rsp2->num_address_ops; idx++) {
+			if (!config_addrs_rsp2->list_address_ops[idx].valid)
+				continue;
+
+			/* check if the address is multicast */
+			if (!vnic_multicast_address(config_addrs_rsp2, idx))
+				continue;
+
+			rsp_mgid = config_addrs_rsp2->list_address_ops[idx].mgid;
+			if (memcmp(&mgid, &rsp_mgid, sizeof(union ib_gid)) == 0)
+				continue;
+
+			CONTROL_ERROR("%s: Multicast Group MGIDs not "
+				      "unique; mgids: " VNIC_GID_FMT
+				      " " VNIC_GID_FMT "\n",
+				      control_ifcfg_name(control),
+				      VNIC_GID_RAW_ARG(mgid.raw),
+				      VNIC_GID_RAW_ARG(rsp_mgid.raw));
+			return 1;
+		}
+
+		/* rather than issuing the join here, which might arrive
+		 * at the SM before the EVIC creates the MC group,
+		 * postpone it.
+		 */
+		vnic_mc_join_setup(control->parent, &mgid);
+
+		/* there is only one multicast group to join, so we're done */
+		break;
+	}
+
+	/* we sent at least one multicast address but got no MGID
+	 * back; so, if it is not the allmulti case, leave the group
+	 * we joined before.
(for allmulti case we have to stay + * joined) + */ + if ((config_addrs_rsp2->num_address_ops > 0) && (mcaddrs > 0) && + nomgid && !(control->parent->flags & IFF_ALLMULTI)) { + CONTROL_INFO("numaddrops:%d mcadrs:%d nomgid:%d\n", + config_addrs_rsp2->num_address_ops, + mcaddrs > 0, nomgid); + + vnic_mc_leave(control->parent); + } + + return 0; +} + +int control_config_addrs_rsp(struct control *control) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + + CONTROL_FUNCTION("%s: control_config_addrs_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if ((pkt->hdr.pkt_cmd != CMD_CONFIG_ADDRESSES) && + (pkt->hdr.pkt_cmd != CMD_CONFIG_ADDRESSES2)) + goto failure; + + if (((pkt->hdr.pkt_cmd == CMD_CONFIG_ADDRESSES2) && + !control->parent->features_supported & VNIC_FEAT_INBOUND_IB_MC) || + ((pkt->hdr.pkt_cmd == CMD_CONFIG_ADDRESSES) && + control->parent->features_supported & VNIC_FEAT_INBOUND_IB_MC)) { + CONTROL_ERROR("%s unexpected response pktCmd:%d flag:%x\n", + control_ifcfg_name(control), pkt->hdr.pkt_cmd, + control->parent->features_supported & + VNIC_FEAT_INBOUND_IB_MC); + goto failure; + } + + if (pkt->hdr.pkt_cmd == CMD_CONFIG_ADDRESSES2) { + if (process_cmd_config_address2_rsp(control, pkt, recv_io)) + goto failure; + } else { + struct vnic_cmd_config_addresses *config_addrs_rsp; + config_addrs_rsp = &pkt->cmd.config_addresses_rsp; + } + + control_recv(control, recv_io); + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return 0; +failure: + viport_failure(control->parent); +out: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return -1; +} + +int control_report_statistics_req(struct 
control *control) +{ + struct send_io *send_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_report_stats_req *report_statistics_req; + + CONTROL_FUNCTION("%s: control_report_statistics_req()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_REPORT_STATISTICS); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + report_statistics_req = &pkt->cmd.report_statistics_req; + report_statistics_req->lan_switch_num = + control->lan_switch.lan_switch_num; + + control->last_cmd = pkt->hdr.pkt_cmd; + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return control_send(control, send_io); +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +int control_report_statistics_rsp(struct control *control, + struct vnic_cmd_report_stats_rsp *stats) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_report_stats_rsp *rep_stat_rsp; + + CONTROL_FUNCTION("%s: control_report_statistics_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_REPORT_STATISTICS) + goto failure; + + rep_stat_rsp = &pkt->cmd.report_statistics_rsp; + + stats->if_in_broadcast_pkts = rep_stat_rsp->if_in_broadcast_pkts; + stats->if_in_multicast_pkts = rep_stat_rsp->if_in_multicast_pkts; + stats->if_in_octets = rep_stat_rsp->if_in_octets; + stats->if_in_ucast_pkts = rep_stat_rsp->if_in_ucast_pkts; + stats->if_in_nucast_pkts = rep_stat_rsp->if_in_nucast_pkts; + stats->if_in_underrun = 
rep_stat_rsp->if_in_underrun; + stats->if_in_errors = rep_stat_rsp->if_in_errors; + stats->if_out_errors = rep_stat_rsp->if_out_errors; + stats->if_out_octets = rep_stat_rsp->if_out_octets; + stats->if_out_ucast_pkts = rep_stat_rsp->if_out_ucast_pkts; + stats->if_out_multicast_pkts = rep_stat_rsp->if_out_multicast_pkts; + stats->if_out_broadcast_pkts = rep_stat_rsp->if_out_broadcast_pkts; + stats->if_out_nucast_pkts = rep_stat_rsp->if_out_nucast_pkts; + stats->if_out_ok = rep_stat_rsp->if_out_ok; + stats->if_in_ok = rep_stat_rsp->if_in_ok; + stats->if_out_ucast_bytes = rep_stat_rsp->if_out_ucast_bytes; + stats->if_out_multicast_bytes = rep_stat_rsp->if_out_multicast_bytes; + stats->if_out_broadcast_bytes = rep_stat_rsp->if_out_broadcast_bytes; + stats->if_in_ucast_bytes = rep_stat_rsp->if_in_ucast_bytes; + stats->if_in_multicast_bytes = rep_stat_rsp->if_in_multicast_bytes; + stats->if_in_broadcast_bytes = rep_stat_rsp->if_in_broadcast_bytes; + stats->ethernet_status = rep_stat_rsp->ethernet_status; + + control_recv(control, recv_io); + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + return 0; +failure: + viport_failure(control->parent); +out: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return -1; +} + +int control_reset_req(struct control *control) +{ + struct send_io *send_io; + struct vnic_control_packet *pkt; + + CONTROL_FUNCTION("%s: control_reset_req()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_RESET); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + + control->last_cmd = pkt->hdr.pkt_cmd; + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return 
control_send(control, send_io); +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +int control_reset_rsp(struct control *control) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + + CONTROL_FUNCTION("%s: control_reset_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_RESET) + goto failure; + + control_recv(control, recv_io); + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return 0; +failure: + viport_failure(control->parent); +out: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return -1; +} + +int control_heartbeat_req(struct control *control, u32 hb_interval) +{ + struct send_io *send_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_heartbeat *heartbeat_req; + + CONTROL_FUNCTION("%s: control_heartbeat_req()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + + send_io = control_init_hdr(control, CMD_HEARTBEAT); + if (!send_io) + goto failure; + + pkt = control_packet(send_io); + heartbeat_req = &pkt->cmd.heartbeat_req; + heartbeat_req->hb_interval = cpu_to_be32(hb_interval); + + control->last_cmd = pkt->hdr.pkt_cmd; + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return control_send(control, send_io); +failure: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + return -1; +} + +int 
control_heartbeat_rsp(struct control *control) +{ + struct recv_io *recv_io; + struct vnic_control_packet *pkt; + struct vnic_cmd_heartbeat *heartbeat_rsp; + + CONTROL_FUNCTION("%s: control_heartbeat_rsp()\n", + control_ifcfg_name(control)); + ib_dma_sync_single_for_cpu(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + + recv_io = control_get_rsp(control); + if (!recv_io) + goto out; + + pkt = control_packet(recv_io); + if (pkt->hdr.pkt_cmd != CMD_HEARTBEAT) + goto failure; + + heartbeat_rsp = &pkt->cmd.heartbeat_rsp; + + control_recv(control, recv_io); + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return 0; +failure: + viport_failure(control->parent); +out: + ib_dma_sync_single_for_device(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + return -1; +} + +static int control_init_recv_ios(struct control *control, + struct viport *viport, + struct vnic_control_packet *pkt) +{ + struct io *io; + struct ib_device *ibdev = viport->config->ibdev; + struct control_config *config = control->config; + dma_addr_t recv_dma; + unsigned int i; + + + control->recv_len = sizeof *pkt * config->num_recvs; + control->recv_dma = ib_dma_map_single(ibdev, + pkt, control->recv_len, + DMA_FROM_DEVICE); + + if (ib_dma_mapping_error(ibdev, control->recv_dma)) { + CONTROL_ERROR("control recv dma map error\n"); + goto failure; + } + + recv_dma = control->recv_dma; + for (i = 0; i < config->num_recvs; i++) { + io = &control->recv_ios[i].io; + io->viport = viport; + io->routine = control_recv_complete; + io->type = RECV; + + control->recv_ios[i].virtual_addr = (u8 *)pkt; + control->recv_ios[i].list.addr = recv_dma; + control->recv_ios[i].list.length = sizeof *pkt; + control->recv_ios[i].list.lkey = control->mr->lkey; + + recv_dma = recv_dma + sizeof *pkt; + pkt++; + + io->rwr.wr_id = (u64)io; + io->rwr.sg_list = 
&control->recv_ios[i].list; + io->rwr.num_sge = 1; + if (vnic_ib_post_recv(&control->ib_conn, io)) + goto unmap_recv; + } + + return 0; +unmap_recv: + ib_dma_unmap_single(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); +failure: + return -1; +} + +static int control_init_send_ios(struct control *control, + struct viport *viport, + struct vnic_control_packet *pkt) +{ + struct io *io; + struct ib_device *ibdev = viport->config->ibdev; + + control->send_io.virtual_addr = (u8 *)pkt; + control->send_len = sizeof *pkt; + control->send_dma = ib_dma_map_single(ibdev, pkt, + control->send_len, + DMA_TO_DEVICE); + if (ib_dma_mapping_error(ibdev, control->send_dma)) { + CONTROL_ERROR("control send dma map error\n"); + goto failure; + } + + io = &control->send_io.io; + io->viport = viport; + io->routine = control_send_complete; + + control->send_io.list.addr = control->send_dma; + control->send_io.list.length = sizeof *pkt; + control->send_io.list.lkey = control->mr->lkey; + + io->swr.wr_id = (u64)io; + io->swr.sg_list = &control->send_io.list; + io->swr.num_sge = 1; + io->swr.opcode = IB_WR_SEND; + io->swr.send_flags = IB_SEND_SIGNALED; + io->type = SEND; + + return 0; +failure: + return -1; +} + +int control_init(struct control *control, struct viport *viport, + struct control_config *config, struct ib_pd *pd) +{ + struct vnic_control_packet *pkt; + unsigned int sz; + + CONTROL_FUNCTION("%s: control_init()\n", + control_ifcfg_name(control)); + control->parent = viport; + control->config = config; + control->ib_conn.viport = viport; + control->ib_conn.ib_config = &config->ib_config; + control->ib_conn.state = IB_CONN_UNINITTED; + control->ib_conn.callback_thread = NULL; + control->ib_conn.callback_thread_end = 0; + control->req_state = REQ_INACTIVE; + control->last_cmd = CMD_INVALID; + control->seq_num = 0; + control->response = NULL; + control->info = NULL; + INIT_LIST_HEAD(&control->failure_list); + 
spin_lock_init(&control->io_lock);
+
+	if (vnic_ib_conn_init(&control->ib_conn, viport, pd,
+			      &config->ib_config)) {
+		CONTROL_ERROR("Control IB connection"
+			      " initialization failed\n");
+		goto failure;
+	}
+
+	control->mr = ib_get_dma_mr(pd, IB_ACCESS_LOCAL_WRITE);
+	if (IS_ERR(control->mr)) {
+		CONTROL_ERROR("%s: failed to register memory"
+			      " for control connection\n",
+			      control_ifcfg_name(control));
+		goto destroy_conn;
+	}
+
+	control->ib_conn.cm_id = ib_create_cm_id(viport->config->ibdev,
+						 vnic_ib_cm_handler,
+						 &control->ib_conn);
+	if (IS_ERR(control->ib_conn.cm_id)) {
+		CONTROL_ERROR("creating control CM ID failed\n");
+		goto destroy_mr;
+	}
+
+	sz = sizeof(struct recv_io) * config->num_recvs;
+	control->recv_ios = vmalloc(sz);
+
+	if (!control->recv_ios) {
+		CONTROL_ERROR("%s: failed allocating space for recv ios\n",
+			      control_ifcfg_name(control));
+		goto destroy_cm_id;
+	}
+
+	memset(control->recv_ios, 0, sz);
+	/* One send buffer and num_recvs recv buffers */
+	control->local_storage = kzalloc(sizeof *pkt *
+					 (config->num_recvs + 1),
+					 GFP_KERNEL);
+
+	if (!control->local_storage) {
+		CONTROL_ERROR("%s: failed allocating space"
+			      " for local storage\n",
+			      control_ifcfg_name(control));
+		goto free_recv_ios;
+	}
+
+	pkt = control->local_storage;
+	if (control_init_send_ios(control, viport, pkt))
+		goto free_storage;
+
+	pkt++;
+	if (control_init_recv_ios(control, viport, pkt))
+		goto unmap_send;
+
+	return 0;
+
+unmap_send:
+	ib_dma_unmap_single(control->parent->config->ibdev,
+			    control->send_dma, control->send_len,
+			    DMA_TO_DEVICE);
+free_storage:
+	kfree(control->local_storage);
+free_recv_ios:
+	vfree(control->recv_ios);
+destroy_cm_id:
+	ib_destroy_cm_id(control->ib_conn.cm_id);
+destroy_mr:
+	ib_dereg_mr(control->mr);
+destroy_conn:
+	ib_destroy_qp(control->ib_conn.qp);
+	ib_destroy_cq(control->ib_conn.cq);
+failure:
+	return -1;
+}
+
+void control_cleanup(struct control *control)
+{
+	CONTROL_FUNCTION("%s: control_cleanup()\n",
+
control_ifcfg_name(control)); + + if (ib_send_cm_dreq(control->ib_conn.cm_id, NULL, 0)) + printk(KERN_DEBUG "control CM DREQ sending failed\n"); + + control->ib_conn.state = IB_CONN_DISCONNECTED; + control_timer_stop(control); + control->req_state = REQ_INACTIVE; + control->response = NULL; + control->last_cmd = CMD_INVALID; + completion_callback_cleanup(&control->ib_conn); + ib_destroy_cm_id(control->ib_conn.cm_id); + ib_destroy_qp(control->ib_conn.qp); + ib_destroy_cq(control->ib_conn.cq); + ib_dereg_mr(control->mr); + ib_dma_unmap_single(control->parent->config->ibdev, + control->send_dma, control->send_len, + DMA_TO_DEVICE); + ib_dma_unmap_single(control->parent->config->ibdev, + control->recv_dma, control->recv_len, + DMA_FROM_DEVICE); + vfree(control->recv_ios); + kfree(control->local_storage); + +} + +static void control_log_report_status_pkt(struct vnic_control_packet *pkt) +{ + printk(KERN_INFO + " pkt_cmd = CMD_REPORT_STATUS\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + printk(KERN_INFO + " lan_switch_num = %u, is_fatal = %u\n", + pkt->cmd.report_status.lan_switch_num, + pkt->cmd.report_status.is_fatal); + printk(KERN_INFO + " status_number = %u, status_info = %u\n", + be32_to_cpu(pkt->cmd.report_status.status_number), + be32_to_cpu(pkt->cmd.report_status.status_info)); + pkt->cmd.report_status.file_name[31] = '\0'; + pkt->cmd.report_status.routine[31] = '\0'; + printk(KERN_INFO " filename = %s, routine = %s\n", + pkt->cmd.report_status.file_name, + pkt->cmd.report_status.routine); + printk(KERN_INFO + " line_num = %u, error_parameter = %u\n", + be32_to_cpu(pkt->cmd.report_status.line_num), + be32_to_cpu(pkt->cmd.report_status.error_parameter)); + pkt->cmd.report_status.desc_text[127] = '\0'; + printk(KERN_INFO " desc_text = %s\n", + pkt->cmd.report_status.desc_text); +} + +static void control_log_report_stats_pkt(struct vnic_control_packet *pkt) +{ + printk(KERN_INFO + " 
pkt_cmd = CMD_REPORT_STATISTICS\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + printk(KERN_INFO " lan_switch_num = %u\n", + pkt->cmd.report_statistics_req.lan_switch_num); + if (pkt->hdr.pkt_type == TYPE_REQ) + return; + printk(KERN_INFO " if_in_broadcast_pkts = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_broadcast_pkts)); + printk(" if_in_multicast_pkts = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_multicast_pkts)); + printk(KERN_INFO " if_in_octets = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_octets)); + printk(" if_in_ucast_pkts = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_ucast_pkts)); + printk(KERN_INFO " if_in_nucast_pkts = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_nucast_pkts)); + printk(" if_in_underrun = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_underrun)); + printk(KERN_INFO " if_in_errors = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_errors)); + printk(" if_out_errors = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_errors)); + printk(KERN_INFO " if_out_octets = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_octets)); + printk(" if_out_ucast_pkts = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_ucast_pkts)); + printk(KERN_INFO " if_out_multicast_pkts = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_multicast_pkts)); + printk(" if_out_broadcast_pkts = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_broadcast_pkts)); + printk(KERN_INFO " if_out_nucast_pkts = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. 
+ if_out_nucast_pkts)); + printk(" if_out_ok = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp.if_out_ok)); + printk(KERN_INFO " if_in_ok = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp.if_in_ok)); + printk(" if_out_ucast_bytes = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_ucast_bytes)); + printk(KERN_INFO " if_out_multicast_bytes = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_multicast_bytes)); + printk(" if_out_broadcast_bytes = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_out_broadcast_bytes)); + printk(KERN_INFO " if_in_ucast_bytes = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_ucast_bytes)); + printk(" if_in_multicast_bytes = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_multicast_bytes)); + printk(KERN_INFO " if_in_broadcast_bytes = %llu", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + if_in_broadcast_bytes)); + printk(" ethernet_status = %llu\n", + be64_to_cpu(pkt->cmd.report_statistics_rsp. + ethernet_status)); +} + +static void control_log_config_link_pkt(struct vnic_control_packet *pkt) +{ + printk(KERN_INFO + " pkt_cmd = CMD_CONFIG_LINK\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + printk(KERN_INFO " cmd_flags = %x\n", + pkt->cmd.config_link_req.cmd_flags); + if (pkt->cmd.config_link_req.cmd_flags & VNIC_FLAG_ENABLE_NIC) + printk(KERN_INFO + " VNIC_FLAG_ENABLE_NIC\n"); + if (pkt->cmd.config_link_req.cmd_flags & VNIC_FLAG_DISABLE_NIC) + printk(KERN_INFO + " VNIC_FLAG_DISABLE_NIC\n"); + if (pkt->cmd.config_link_req. + cmd_flags & VNIC_FLAG_ENABLE_MCAST_ALL) + printk(KERN_INFO + " VNIC_FLAG_ENABLE_" + "MCAST_ALL\n"); + if (pkt->cmd.config_link_req. + cmd_flags & VNIC_FLAG_DISABLE_MCAST_ALL) + printk(KERN_INFO + " VNIC_FLAG_DISABLE_" + "MCAST_ALL\n"); + if (pkt->cmd.config_link_req. 
+ cmd_flags & VNIC_FLAG_ENABLE_PROMISC) + printk(KERN_INFO + " VNIC_FLAG_ENABLE_" + "PROMISC\n"); + if (pkt->cmd.config_link_req. + cmd_flags & VNIC_FLAG_DISABLE_PROMISC) + printk(KERN_INFO + " VNIC_FLAG_DISABLE_" + "PROMISC\n"); + if (pkt->cmd.config_link_req.cmd_flags & VNIC_FLAG_SET_MTU) + printk(KERN_INFO + " VNIC_FLAG_SET_MTU\n"); + printk(KERN_INFO + " lan_switch_num = %x, mtu_size = %d\n", + pkt->cmd.config_link_req.lan_switch_num, + be16_to_cpu(pkt->cmd.config_link_req.mtu_size)); + if (pkt->hdr.pkt_type == TYPE_RSP) { + printk(KERN_INFO + " default_vlan = %u," + " hw_mac_address =" + " %02x:%02x:%02x:%02x:%02x:%02x\n", + be16_to_cpu(pkt->cmd.config_link_req. + default_vlan), + pkt->cmd.config_link_req.hw_mac_address[0], + pkt->cmd.config_link_req.hw_mac_address[1], + pkt->cmd.config_link_req.hw_mac_address[2], + pkt->cmd.config_link_req.hw_mac_address[3], + pkt->cmd.config_link_req.hw_mac_address[4], + pkt->cmd.config_link_req.hw_mac_address[5]); + } +} + +static void print_config_addr(struct vnic_address_op *list, + int num_address_ops, size_t mgidoff) +{ + int i = 0; + + while (i < num_address_ops && i < 16) { + printk(KERN_INFO " list_address_ops[%u].index" + " = %u\n", i, be16_to_cpu(list->index)); + switch (list->operation) { + case VNIC_OP_GET_ENTRY: + printk(KERN_INFO " list_address_ops[%u]." + "operation = VNIC_OP_GET_ENTRY\n", i); + break; + case VNIC_OP_SET_ENTRY: + printk(KERN_INFO " list_address_ops[%u]." + "operation = VNIC_OP_SET_ENTRY\n", i); + break; + default: + printk(KERN_INFO " list_address_ops[%u]." 
+ "operation = UNKNOWN(%d)\n", i, + list->operation); + break; + } + printk(KERN_INFO " list_address_ops[%u].valid" + " = %u\n", i, list->valid); + printk(KERN_INFO " list_address_ops[%u].address" + " = %02x:%02x:%02x:%02x:%02x:%02x\n", i, + list->address[0], list->address[1], + list->address[2], list->address[3], + list->address[4], list->address[5]); + printk(KERN_INFO " list_address_ops[%u].vlan" + " = %u\n", i, be16_to_cpu(list->vlan)); + if (mgidoff) { + printk(KERN_INFO + " list_address_ops[%u].mgid" + " = " VNIC_GID_FMT "\n", i, + VNIC_GID_RAW_ARG((char *)list + mgidoff)); + list = (struct vnic_address_op *) + ((char *)list + sizeof(struct vnic_address_op2)); + } else + list = (struct vnic_address_op *) + ((char *)list + sizeof(struct vnic_address_op)); + i++; + } +} + +static void control_log_config_addrs_pkt(struct vnic_control_packet *pkt, + u8 addresses2) +{ + struct vnic_address_op *list; + int no_address_ops; + + if (addresses2) + printk(KERN_INFO + " pkt_cmd = CMD_CONFIG_ADDRESSES2\n"); + else + printk(KERN_INFO + " pkt_cmd = CMD_CONFIG_ADDRESSES\n"); + printk(KERN_INFO " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, pkt->hdr.pkt_retry_count); + if (addresses2) { + printk(KERN_INFO " num_address_ops = %x," + " lan_switch_num = %d\n", + pkt->cmd.config_addresses_req2.num_address_ops, + pkt->cmd.config_addresses_req2.lan_switch_num); + list = (struct vnic_address_op *) + pkt->cmd.config_addresses_req2.list_address_ops; + no_address_ops = pkt->cmd.config_addresses_req2.num_address_ops; + print_config_addr(list, no_address_ops, + offsetof(struct vnic_address_op2, mgid)); + } else { + printk(KERN_INFO " num_address_ops = %x," + " lan_switch_num = %d\n", + pkt->cmd.config_addresses_req.num_address_ops, + pkt->cmd.config_addresses_req.lan_switch_num); + list = pkt->cmd.config_addresses_req.list_address_ops; + no_address_ops = pkt->cmd.config_addresses_req.num_address_ops; + print_config_addr(list, no_address_ops, 0); + } +} + 
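The `print_config_addr()` helper above walks one buffer that may hold either `vnic_address_op` entries or the larger `vnic_address_op2` entries (which append an `mgid`), choosing the stride from whether a non-zero `mgid` offset was passed in. A minimal standalone sketch of that variable-stride walk follows; `addr_op`, `addr_op2` and `sum_indices` are illustrative stand-ins, not the driver's actual structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Short layout: common leading fields only (hypothetical subset). */
struct addr_op {
	uint16_t index;
	uint8_t  address[6];
};

/* Extended "2" layout: same leading fields plus a trailing mgid. */
struct addr_op2 {
	uint16_t index;
	uint8_t  address[6];
	uint8_t  mgid[16];
};

/*
 * Walk a contiguous array whose element size depends on the layout:
 * mgidoff == 0 selects the short entries, non-zero the extended ones.
 * Only the common leading fields are touched, so casting each element
 * to the short layout is safe in both cases.
 */
static uint16_t sum_indices(const void *list, int n, size_t mgidoff)
{
	const char *p = list;
	size_t step = mgidoff ? sizeof(struct addr_op2)
			      : sizeof(struct addr_op);
	uint16_t sum = 0;
	int i;

	for (i = 0; i < n; i++, p += step)
		sum += ((const struct addr_op *)p)->index;
	return sum;
}
```

Passing `offsetof(struct addr_op2, mgid)` doubles as both the "extended layout" flag and the place to find the extra field, which is exactly the trick the driver's `mgidoff` parameter plays.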
+static void control_log_exch_pools_pkt(struct vnic_control_packet *pkt) +{ + printk(KERN_INFO + " pkt_cmd = CMD_EXCHANGE_POOLS\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + printk(KERN_INFO " datapath = %u\n", + pkt->cmd.exchange_pools_req.data_path); + printk(KERN_INFO " pool_rkey = %08x" + " pool_addr = %llx\n", + be32_to_cpu(pkt->cmd.exchange_pools_req.pool_rkey), + be64_to_cpu(pkt->cmd.exchange_pools_req.pool_addr)); +} + +static void control_log_data_path_pkt(struct vnic_control_packet *pkt) +{ + printk(KERN_INFO + " pkt_cmd = CMD_CONFIG_DATA_PATH\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + printk(KERN_INFO " path_identifier = %llx," + " data_path = %u\n", + pkt->cmd.config_data_path_req.path_identifier, + pkt->cmd.config_data_path_req.data_path); + printk(KERN_INFO + "host config size_recv_pool_entry = %u," + " num_recv_pool_entries = %u\n", + be32_to_cpu(pkt->cmd.config_data_path_req. + host_recv_pool_config.size_recv_pool_entry), + be32_to_cpu(pkt->cmd.config_data_path_req. + host_recv_pool_config.num_recv_pool_entries)); + printk(KERN_INFO + " timeout_before_kick = %u," + " num_recv_pool_entries_before_kick = %u\n", + be32_to_cpu(pkt->cmd.config_data_path_req. + host_recv_pool_config.timeout_before_kick), + be32_to_cpu(pkt->cmd.config_data_path_req. + host_recv_pool_config. + num_recv_pool_entries_before_kick)); + printk(KERN_INFO + " num_recv_pool_bytes_before_kick = %u," + " free_recv_pool_entries_per_update = %u\n", + be32_to_cpu(pkt->cmd.config_data_path_req. + host_recv_pool_config. + num_recv_pool_bytes_before_kick), + be32_to_cpu(pkt->cmd.config_data_path_req. + host_recv_pool_config. + free_recv_pool_entries_per_update)); + printk(KERN_INFO + "eioc config size_recv_pool_entry = %u," + " num_recv_pool_entries = %u\n", + be32_to_cpu(pkt->cmd.config_data_path_req. 
+ eioc_recv_pool_config.size_recv_pool_entry), + be32_to_cpu(pkt->cmd.config_data_path_req. + eioc_recv_pool_config.num_recv_pool_entries)); + printk(KERN_INFO + " timeout_before_kick = %u," + " num_recv_pool_entries_before_kick = %u\n", + be32_to_cpu(pkt->cmd.config_data_path_req. + eioc_recv_pool_config.timeout_before_kick), + be32_to_cpu(pkt->cmd.config_data_path_req. + eioc_recv_pool_config. + num_recv_pool_entries_before_kick)); + printk(KERN_INFO + " num_recv_pool_bytes_before_kick = %u," + " free_recv_pool_entries_per_update = %u\n", + be32_to_cpu(pkt->cmd.config_data_path_req. + eioc_recv_pool_config. + num_recv_pool_bytes_before_kick), + be32_to_cpu(pkt->cmd.config_data_path_req. + eioc_recv_pool_config. + free_recv_pool_entries_per_update)); +} + +static void control_log_init_vnic_pkt(struct vnic_control_packet *pkt) +{ + printk(KERN_INFO + " pkt_cmd = CMD_INIT_VNIC\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + printk(KERN_INFO + " vnic_major_version = %u," + " vnic_minor_version = %u\n", + be16_to_cpu(pkt->cmd.init_vnic_req.vnic_major_version), + be16_to_cpu(pkt->cmd.init_vnic_req.vnic_minor_version)); + if (pkt->hdr.pkt_type == TYPE_REQ) { + printk(KERN_INFO + " vnic_instance = %u," + " num_data_paths = %u\n", + pkt->cmd.init_vnic_req.vnic_instance, + pkt->cmd.init_vnic_req.num_data_paths); + printk(KERN_INFO + " num_address_entries = %u\n", + be16_to_cpu(pkt->cmd.init_vnic_req. + num_address_entries)); + } else { + printk(KERN_INFO + " num_lan_switches = %u," + " num_data_paths = %u\n", + pkt->cmd.init_vnic_rsp.num_lan_switches, + pkt->cmd.init_vnic_rsp.num_data_paths); + printk(KERN_INFO + " num_address_entries = %u," + " features_supported = %08x\n", + be16_to_cpu(pkt->cmd.init_vnic_rsp. + num_address_entries), + be32_to_cpu(pkt->cmd.init_vnic_rsp. 
+ features_supported)); + if (pkt->cmd.init_vnic_rsp.num_lan_switches != 0) { + printk(KERN_INFO + "lan_switch[0] lan_switch_num = %u," + " num_enet_ports = %08x\n", + pkt->cmd.init_vnic_rsp. + lan_switch[0].lan_switch_num, + pkt->cmd.init_vnic_rsp. + lan_switch[0].num_enet_ports); + printk(KERN_INFO + " default_vlan = %u," + " hw_mac_address =" + " %02x:%02x:%02x:%02x:%02x:%02x\n", + be16_to_cpu(pkt->cmd.init_vnic_rsp. + lan_switch[0].default_vlan), + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[0], + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[1], + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[2], + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[3], + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[4], + pkt->cmd.init_vnic_rsp.lan_switch[0]. + hw_mac_address[5]); + } + } +} + +static void control_log_control_packet(struct vnic_control_packet *pkt) +{ + switch (pkt->hdr.pkt_type) { + case TYPE_INFO: + printk(KERN_INFO "control_packet: pkt_type = TYPE_INFO\n"); + break; + case TYPE_REQ: + printk(KERN_INFO "control_packet: pkt_type = TYPE_REQ\n"); + break; + case TYPE_RSP: + printk(KERN_INFO "control_packet: pkt_type = TYPE_RSP\n"); + break; + case TYPE_ERR: + printk(KERN_INFO "control_packet: pkt_type = TYPE_ERR\n"); + break; + default: + printk(KERN_INFO "control_packet: pkt_type = UNKNOWN\n"); + } + + switch (pkt->hdr.pkt_cmd) { + case CMD_INIT_VNIC: + control_log_init_vnic_pkt(pkt); + break; + case CMD_CONFIG_DATA_PATH: + control_log_data_path_pkt(pkt); + break; + case CMD_EXCHANGE_POOLS: + control_log_exch_pools_pkt(pkt); + break; + case CMD_CONFIG_ADDRESSES: + control_log_config_addrs_pkt(pkt, 0); + break; + case CMD_CONFIG_ADDRESSES2: + control_log_config_addrs_pkt(pkt, 1); + break; + case CMD_CONFIG_LINK: + control_log_config_link_pkt(pkt); + break; + case CMD_REPORT_STATISTICS: + control_log_report_stats_pkt(pkt); + break; + case CMD_CLEAR_STATISTICS: + printk(KERN_INFO + " pkt_cmd = CMD_CLEAR_STATISTICS\n"); + 
printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + break; + case CMD_REPORT_STATUS: + control_log_report_status_pkt(pkt); + + break; + case CMD_RESET: + printk(KERN_INFO + " pkt_cmd = CMD_RESET\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + break; + case CMD_HEARTBEAT: + printk(KERN_INFO + " pkt_cmd = CMD_HEARTBEAT\n"); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + printk(KERN_INFO " hb_interval = %d\n", + be32_to_cpu(pkt->cmd.heartbeat_req.hb_interval)); + break; + default: + printk(KERN_INFO + " pkt_cmd = UNKNOWN (%u)\n", + pkt->hdr.pkt_cmd); + printk(KERN_INFO + " pkt_seq_num = %u," + " pkt_retry_count = %u\n", + pkt->hdr.pkt_seq_num, + pkt->hdr.pkt_retry_count); + break; + } +} diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_control.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_control.h new file mode 100644 index 0000000..3cf1fc0 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_control.h @@ -0,0 +1,180 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_CONTROL_H_INCLUDED +#define VNIC_CONTROL_H_INCLUDED + +#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS +#include +#include +#endif /* CONFIG_INFINIBAND_QLGC_VNIC_STATS */ + +#include "vnic_ib.h" +#include "vnic_control_pkt.h" + +enum control_timer_state { + TIMER_IDLE = 0, + TIMER_ACTIVE = 1, + TIMER_EXPIRED = 2 +}; + +enum control_request_state { + REQ_INACTIVE, /* quiet state, all previous operations done + * response is NULL + * last_cmd = CMD_INVALID + * timer_state = IDLE + */ + REQ_POSTED, /* REQ put on send Q + * response is NULL + * last_cmd = command issued + * timer_state = ACTIVE + */ + REQ_SENT, /* Send completed for REQ + * response is NULL + * last_cmd = command issued + * timer_state = ACTIVE + */ + RSP_RECEIVED, /* Received Resp, but no Send completion yet + * response is response buffer received + * last_cmd = command issued + * timer_state = ACTIVE + */ + REQ_COMPLETED, /* all processing for REQ completed, ready to be gotten + * response is response buffer received + * last_cmd = command issued + * timer_state = ACTIVE + */ + REQ_FAILED, /* processing of REQ/RSP failed. 
+ * response is NULL + * last_cmd = CMD_INVALID + * timer_state = IDLE or EXPIRED + * viport has been moved to error state to force + * recovery + */ +}; + +struct control { + struct viport *parent; + struct control_config *config; + struct ib_mr *mr; + struct vnic_ib_conn ib_conn; + struct vnic_control_packet *local_storage; + int send_len; + int recv_len; + u16 maj_ver; + u16 min_ver; + struct vnic_lan_switch_attribs lan_switch; + struct send_io send_io; + struct recv_io *recv_ios; + dma_addr_t send_dma; + dma_addr_t recv_dma; + enum control_timer_state timer_state; + enum control_request_state req_state; + struct timer_list timer; + u8 seq_num; + u8 last_cmd; + struct recv_io *response; + struct recv_io *info; + struct list_head failure_list; + spinlock_t io_lock; + struct completion done; +#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS + struct { + cycles_t request_time; /* intermediate value */ + cycles_t response_time; + u32 response_num; + cycles_t response_max; + cycles_t response_min; + u32 timeout_num; + } statistics; +#endif /* CONFIG_INFINIBAND_QLGC_VNIC_STATS */ +}; + +int control_init(struct control *control, struct viport *viport, + struct control_config *config, struct ib_pd *pd); + +void control_cleanup(struct control *control); + +void control_process_async(struct control *control); + +int control_init_vnic_req(struct control *control); +int control_init_vnic_rsp(struct control *control, u32 *features, + u8 *mac_address, u16 *num_addrs, u16 *vlan); + +int control_config_data_path_req(struct control *control, u64 path_id, + struct vnic_recv_pool_config *host, + struct vnic_recv_pool_config *eioc); +int control_config_data_path_rsp(struct control *control, + struct vnic_recv_pool_config *host, + struct vnic_recv_pool_config *eioc, + struct vnic_recv_pool_config *max_host, + struct vnic_recv_pool_config *max_eioc, + struct vnic_recv_pool_config *min_host, + struct vnic_recv_pool_config *min_eioc); + +int control_exchange_pools_req(struct control *control, 
+ u64 addr, u32 rkey); +int control_exchange_pools_rsp(struct control *control, + u64 *addr, u32 *rkey); + +int control_config_link_req(struct control *control, + u16 flags, u16 mtu); +int control_config_link_rsp(struct control *control, + u16 *flags, u16 *mtu); + +int control_config_addrs_req(struct control *control, + struct vnic_address_op2 *addrs, u16 num); +int control_config_addrs_rsp(struct control *control); + +int control_report_statistics_req(struct control *control); +int control_report_statistics_rsp(struct control *control, + struct vnic_cmd_report_stats_rsp *stats); + +int control_heartbeat_req(struct control *control, u32 hb_interval); +int control_heartbeat_rsp(struct control *control); + +int control_reset_req(struct control *control); +int control_reset_rsp(struct control *control); + + +#define control_packet(io) \ + (struct vnic_control_packet *)(io)->virtual_addr +#define control_is_connected(control) \ + (vnic_ib_conn_connected(&((control)->ib_conn))) + +#define control_last_req(control) control_packet(&(control)->send_io) +#define control_features(control) (control)->features_supported + +#define control_get_mac_address(control,addr) \ + memcpy(addr, (control)->lan_switch.hw_mac_address, ETH_ALEN) + +#endif /* VNIC_CONTROL_H_INCLUDED */ diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_control_pkt.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_control_pkt.h new file mode 100644 index 0000000..1fc62fb --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_control_pkt.h @@ -0,0 +1,368 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_CONTROL_PKT_H_INCLUDED +#define VNIC_CONTROL_PKT_H_INCLUDED + +#include +#include + +#define VNIC_MAX_NODENAME_LEN 64 + +struct vnic_connection_data { + u64 path_id; + u8 vnic_instance; + u8 path_num; + u8 nodename[VNIC_MAX_NODENAME_LEN + 1]; + u8 reserved; /* for alignment */ + __be32 features_supported; +}; + +struct vnic_control_header { + u8 pkt_type; + u8 pkt_cmd; + u8 pkt_seq_num; + u8 pkt_retry_count; + u32 reserved; /* for 64-bit alignment */ +}; + +/* pkt_type values */ +enum { + TYPE_INFO = 0, + TYPE_REQ = 1, + TYPE_RSP = 2, + TYPE_ERR = 3 +}; + +/* pkt_cmd values */ +enum { + CMD_INVALID = 0, + CMD_INIT_VNIC = 1, + CMD_CONFIG_DATA_PATH = 2, + CMD_EXCHANGE_POOLS = 3, + CMD_CONFIG_ADDRESSES = 4, + CMD_CONFIG_LINK = 5, + CMD_REPORT_STATISTICS = 6, + CMD_CLEAR_STATISTICS = 7, + CMD_REPORT_STATUS = 8, + CMD_RESET = 9, + CMD_HEARTBEAT = 10, + CMD_CONFIG_ADDRESSES2 = 11, +}; + +/* pkt_cmd CMD_INIT_VNIC, pkt_type TYPE_REQ data format */ +struct vnic_cmd_init_vnic_req { + __be16 vnic_major_version; + __be16 vnic_minor_version; + u8 vnic_instance; + u8 num_data_paths; + __be16 num_address_entries; +}; + +/* pkt_cmd CMD_INIT_VNIC, pkt_type TYPE_RSP subdata format */ +struct vnic_lan_switch_attribs { + u8 lan_switch_num; + u8 num_enet_ports; + __be16 default_vlan; + u8 hw_mac_address[ETH_ALEN]; +}; + +/* pkt_cmd CMD_INIT_VNIC, pkt_type TYPE_RSP data format */ +struct vnic_cmd_init_vnic_rsp { + __be16 vnic_major_version; + __be16 vnic_minor_version; + u8 num_lan_switches; + u8 num_data_paths; + __be16 num_address_entries; + __be32 features_supported; + struct vnic_lan_switch_attribs lan_switch[1]; +}; + +/* features_supported values */ +enum { + VNIC_FEAT_IPV4_HEADERS = 0x0001, + VNIC_FEAT_IPV6_HEADERS = 0x0002, + VNIC_FEAT_IPV4_CSUM_RX = 0x0004, + VNIC_FEAT_IPV4_CSUM_TX = 0x0008, + VNIC_FEAT_TCP_CSUM_RX = 0x0010, + VNIC_FEAT_TCP_CSUM_TX = 0x0020, + VNIC_FEAT_UDP_CSUM_RX = 0x0040, + VNIC_FEAT_UDP_CSUM_TX = 0x0080, + VNIC_FEAT_TCP_SEGMENT =
0x0100, + VNIC_FEAT_IPV4_IPSEC_OFFLOAD = 0x0200, + VNIC_FEAT_IPV6_IPSEC_OFFLOAD = 0x0400, + VNIC_FEAT_FCS_PROPAGATE = 0x0800, + VNIC_FEAT_PF_KICK = 0x1000, + VNIC_FEAT_PF_FORCE_ROUTE = 0x2000, + VNIC_FEAT_CHASH_OFFLOAD = 0x4000, + /* host send with immediate data */ + VNIC_FEAT_RDMA_IMMED = 0x8000, + /* host ignore inbound PF_VLAN_INSERT flag */ + VNIC_FEAT_IGNORE_VLAN = 0x10000, + /* host supports IB multicast for inbound Ethernet mcast traffic */ + VNIC_FEAT_INBOUND_IB_MC = 0x20000, +}; + +/* pkt_cmd CMD_CONFIG_DATA_PATH subdata format */ +struct vnic_recv_pool_config { + __be32 size_recv_pool_entry; + __be32 num_recv_pool_entries; + __be32 timeout_before_kick; + __be32 num_recv_pool_entries_before_kick; + __be32 num_recv_pool_bytes_before_kick; + __be32 free_recv_pool_entries_per_update; +}; + +/* pkt_cmd CMD_CONFIG_DATA_PATH data format */ +struct vnic_cmd_config_data_path { + u64 path_identifier; + u8 data_path; + u8 reserved[3]; + struct vnic_recv_pool_config host_recv_pool_config; + struct vnic_recv_pool_config eioc_recv_pool_config; +}; + +/* pkt_cmd CMD_EXCHANGE_POOLS data format */ +struct vnic_cmd_exchange_pools { + u8 data_path; + u8 reserved[3]; + __be32 pool_rkey; + __be64 pool_addr; +}; + +/* pkt_cmd CMD_CONFIG_ADDRESSES subdata format */ +struct vnic_address_op { + __be16 index; + u8 operation; + u8 valid; + u8 address[6]; + __be16 vlan; +}; + +/* pkt_cmd CMD_CONFIG_ADDRESSES2 subdata format */ +struct vnic_address_op2 { + __be16 index; + u8 operation; + u8 valid; + u8 address[6]; + __be16 vlan; + u32 reserved; /* for alignment */ + union ib_gid mgid; /* valid in rsp only if both ends support mcast */ +}; + +/* operation values */ +enum { + VNIC_OP_SET_ENTRY = 0x01, + VNIC_OP_GET_ENTRY = 0x02 +}; + +/* pkt_cmd CMD_CONFIG_ADDRESSES data format */ +struct vnic_cmd_config_addresses { + u8 num_address_ops; + u8 lan_switch_num; + struct vnic_address_op list_address_ops[1]; +}; + +/* pkt_cmd CMD_CONFIG_ADDRESSES2 data format */ +struct 
vnic_cmd_config_addresses2 { + u8 num_address_ops; + u8 lan_switch_num; + u8 reserved1; + u8 reserved2; + u8 reserved3; + struct vnic_address_op2 list_address_ops[1]; +}; + +/* CMD_CONFIG_LINK data format */ +struct vnic_cmd_config_link { + u8 cmd_flags; + u8 lan_switch_num; + __be16 mtu_size; + __be16 default_vlan; + u8 hw_mac_address[6]; + u32 reserved; /* for alignment */ + /* valid in rsp only if both ends support mcast */ + union ib_gid allmulti_mgid; +}; + +/* cmd_flags values */ +enum { + VNIC_FLAG_ENABLE_NIC = 0x01, + VNIC_FLAG_DISABLE_NIC = 0x02, + VNIC_FLAG_ENABLE_MCAST_ALL = 0x04, + VNIC_FLAG_DISABLE_MCAST_ALL = 0x08, + VNIC_FLAG_ENABLE_PROMISC = 0x10, + VNIC_FLAG_DISABLE_PROMISC = 0x20, + VNIC_FLAG_SET_MTU = 0x40 +}; + +/* pkt_cmd CMD_REPORT_STATISTICS, pkt_type TYPE_REQ data format */ +struct vnic_cmd_report_stats_req { + u8 lan_switch_num; +}; + +/* pkt_cmd CMD_REPORT_STATISTICS, pkt_type TYPE_RSP data format */ +struct vnic_cmd_report_stats_rsp { + u8 lan_switch_num; + u8 reserved[7]; /* for 64-bit alignment */ + __be64 if_in_broadcast_pkts; + __be64 if_in_multicast_pkts; + __be64 if_in_octets; + __be64 if_in_ucast_pkts; + __be64 if_in_nucast_pkts; /* if_in_broadcast_pkts + + if_in_multicast_pkts */ + __be64 if_in_underrun; /* (OID_GEN_RCV_NO_BUFFER) */ + __be64 if_in_errors; /* (OID_GEN_RCV_ERROR) */ + __be64 if_out_errors; /* (OID_GEN_XMIT_ERROR) */ + __be64 if_out_octets; + __be64 if_out_ucast_pkts; + __be64 if_out_multicast_pkts; + __be64 if_out_broadcast_pkts; + __be64 if_out_nucast_pkts; /* if_out_broadcast_pkts + + if_out_multicast_pkts */ + __be64 if_out_ok; /* if_out_nucast_pkts + + if_out_ucast_pkts(OID_GEN_XMIT_OK) */ + __be64 if_in_ok; /* if_in_nucast_pkts + + if_in_ucast_pkts(OID_GEN_RCV_OK) */ + __be64 if_out_ucast_bytes; /* (OID_GEN_DIRECTED_BYTES_XMT) */ + __be64 if_out_multicast_bytes; /* (OID_GEN_MULTICAST_BYTES_XMT) */ + __be64 if_out_broadcast_bytes; /* (OID_GEN_BROADCAST_BYTES_XMT) */ + __be64 if_in_ucast_bytes; /* 
(OID_GEN_DIRECTED_BYTES_RCV) */ + __be64 if_in_multicast_bytes; /* (OID_GEN_MULTICAST_BYTES_RCV) */ + __be64 if_in_broadcast_bytes; /* (OID_GEN_BROADCAST_BYTES_RCV) */ + __be64 ethernet_status; /* (OID_GEN_MEDIA_CONNECT_STATUS) */ +}; + +/* pkt_cmd CMD_CLEAR_STATISTICS data format */ +struct vnic_cmd_clear_statistics { + u8 lan_switch_num; +}; + +/* pkt_cmd CMD_REPORT_STATUS data format */ +struct vnic_cmd_report_status { + u8 lan_switch_num; + u8 is_fatal; + u8 reserved[2]; /* for 32-bit alignment */ + __be32 status_number; + __be32 status_info; + u8 file_name[32]; + u8 routine[32]; + __be32 line_num; + __be32 error_parameter; + u8 desc_text[128]; +}; + +/* pkt_cmd CMD_HEARTBEAT data format */ +struct vnic_cmd_heartbeat { + __be32 hb_interval; +}; + +enum { + VNIC_STATUS_LINK_UP = 1, + VNIC_STATUS_LINK_DOWN = 2, + VNIC_STATUS_ENET_AGGREGATION_CHANGE = 3, + VNIC_STATUS_EIOC_SHUTDOWN = 4, + VNIC_STATUS_CONTROL_ERROR = 5, + VNIC_STATUS_EIOC_ERROR = 6 +}; + +#define VNIC_MAX_CONTROLPKTSZ 256 +#define VNIC_MAX_CONTROLDATASZ \ + (VNIC_MAX_CONTROLPKTSZ - sizeof(struct vnic_control_header)) + +struct vnic_control_packet { + struct vnic_control_header hdr; + union { + struct vnic_cmd_init_vnic_req init_vnic_req; + struct vnic_cmd_init_vnic_rsp init_vnic_rsp; + struct vnic_cmd_config_data_path config_data_path_req; + struct vnic_cmd_config_data_path config_data_path_rsp; + struct vnic_cmd_exchange_pools exchange_pools_req; + struct vnic_cmd_exchange_pools exchange_pools_rsp; + struct vnic_cmd_config_addresses config_addresses_req; + struct vnic_cmd_config_addresses2 config_addresses_req2; + struct vnic_cmd_config_addresses config_addresses_rsp; + struct vnic_cmd_config_addresses2 config_addresses_rsp2; + struct vnic_cmd_config_link config_link_req; + struct vnic_cmd_config_link config_link_rsp; + struct vnic_cmd_report_stats_req report_statistics_req; + struct vnic_cmd_report_stats_rsp report_statistics_rsp; + struct vnic_cmd_clear_statistics clear_statistics_req; + struct
vnic_cmd_clear_statistics clear_statistics_rsp; + struct vnic_cmd_report_status report_status; + struct vnic_cmd_heartbeat heartbeat_req; + struct vnic_cmd_heartbeat heartbeat_rsp; + + char cmd_data[VNIC_MAX_CONTROLDATASZ]; + } cmd; +}; + +union ib_gid_cpu { + u8 raw[16]; + struct { + u64 subnet_prefix; + u64 interface_id; + } global; +}; + +static inline void bswap_ib_gid(union ib_gid *mgid1, union ib_gid_cpu *mgid2) +{ + /* swap hi & low */ + __be64 low = mgid1->global.subnet_prefix; + mgid2->global.subnet_prefix = be64_to_cpu(mgid1->global.interface_id); + mgid2->global.interface_id = be64_to_cpu(low); +} + +#define VNIC_GID_FMT "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x" + +#define VNIC_GID_RAW_ARG(gid) be16_to_cpu(*(__be16 *)&(gid)[0]), \ + be16_to_cpu(*(__be16 *)&(gid)[2]), \ + be16_to_cpu(*(__be16 *)&(gid)[4]), \ + be16_to_cpu(*(__be16 *)&(gid)[6]), \ + be16_to_cpu(*(__be16 *)&(gid)[8]), \ + be16_to_cpu(*(__be16 *)&(gid)[10]), \ + be16_to_cpu(*(__be16 *)&(gid)[12]), \ + be16_to_cpu(*(__be16 *)&(gid)[14]) + + +/* These defines are used to figure out how many address entries can be passed + * in config_addresses request. 
+ */ +#define MAX_CONFIG_ADDR_ENTRIES \ + ((VNIC_MAX_CONTROLDATASZ - (sizeof(struct vnic_cmd_config_addresses) \ + - sizeof(struct vnic_address_op)))/sizeof(struct vnic_address_op)) +#define MAX_CONFIG_ADDR_ENTRIES2 \ + ((VNIC_MAX_CONTROLDATASZ - (sizeof(struct vnic_cmd_config_addresses2) \ + - sizeof(struct vnic_address_op2)))/sizeof(struct vnic_address_op2)) + + +#endif /* VNIC_CONTROL_PKT_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:16:54 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:46:54 +0530 Subject: [ofa-general] [PATCH 02/13] QLogic VNIC: Netpath - abstraction of connection to EVIC/VEx In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171654.31725.5636.stgit@localhost.localdomain> From: Ramachandra K This patch implements the netpath layer of QLogic VNIC. Netpath is an abstraction of a connection to EVIC. It primarily includes the implementation which maintains the timers to monitor the status of the connection to EVIC/VEx. Signed-off-by: Poornima Kamath Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c | 112 +++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h | 80 ++++++++++++++++ 2 files changed, 192 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c new file mode 100644 index 0000000..820b996 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.c @@ -0,0 +1,112 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include + +#include "vnic_util.h" +#include "vnic_main.h" +#include "vnic_viport.h" +#include "vnic_netpath.h" + +static void vnic_npevent_timeout(unsigned long data) +{ + struct netpath *netpath = (struct netpath *)data; + + if (netpath->second_bias) + vnic_npevent_queue_evt(netpath, VNIC_SECNP_TIMEREXPIRED); + else + vnic_npevent_queue_evt(netpath, VNIC_PRINP_TIMEREXPIRED); +} + +void netpath_timer(struct netpath *netpath, int timeout) +{ + if (netpath->timer_state == NETPATH_TS_ACTIVE) + del_timer_sync(&netpath->timer); + if (timeout) { + init_timer(&netpath->timer); + netpath->timer_state = NETPATH_TS_ACTIVE; + netpath->timer.expires = jiffies + timeout; + netpath->timer.data = (unsigned long)netpath; + netpath->timer.function = vnic_npevent_timeout; + add_timer(&netpath->timer); + } else + vnic_npevent_timeout((unsigned long)netpath); +} + +void netpath_timer_stop(struct netpath *netpath) +{ + if (netpath->timer_state != NETPATH_TS_ACTIVE) + return; + del_timer_sync(&netpath->timer); + if (netpath->second_bias) + vnic_npevent_dequeue_evt(netpath, VNIC_SECNP_TIMEREXPIRED); + else + vnic_npevent_dequeue_evt(netpath, VNIC_PRINP_TIMEREXPIRED); + + netpath->timer_state = NETPATH_TS_IDLE; +} + +void netpath_free(struct netpath *netpath) +{ + if (!netpath->viport) + return; + viport_free(netpath->viport); + netpath->viport = NULL; + sysfs_remove_group(&netpath->dev_info.dev.kobj, + &vnic_path_attr_group); + device_unregister(&netpath->dev_info.dev); + wait_for_completion(&netpath->dev_info.released); +} + +void netpath_init(struct netpath *netpath, struct vnic *vnic, + int second_bias) +{ + netpath->parent = vnic; + netpath->carrier = 0; + netpath->viport = NULL; + netpath->second_bias = second_bias; + netpath->timer_state = NETPATH_TS_IDLE; + init_timer(&netpath->timer); +} + +const char *netpath_to_string(struct vnic *vnic, struct netpath *netpath) +{ + if (!netpath) + return "NULL"; + else if (netpath == &vnic->primary_path) + return "PRIMARY"; 
+ else if (netpath == &vnic->secondary_path) + return "SECONDARY"; + else + return "UNKNOWN"; +} diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h new file mode 100644 index 0000000..1259ae0 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_netpath.h @@ -0,0 +1,80 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_NETPATH_H_INCLUDED +#define VNIC_NETPATH_H_INCLUDED + +#include + +#include "vnic_sys.h" + +struct viport; +struct vnic; + +enum netpath_ts { + NETPATH_TS_IDLE = 0, + NETPATH_TS_ACTIVE = 1, + NETPATH_TS_EXPIRED = 2 +}; + +struct netpath { + int carrier; + struct vnic *parent; + struct viport *viport; + size_t path_idx; + u32 connect_time; + int second_bias; + u8 is_primary_path; + u8 delay_reconnect; + int cleanup_started; + struct timer_list timer; + enum netpath_ts timer_state; + struct dev_info dev_info; +}; + +void netpath_init(struct netpath *netpath, struct vnic *vnic, + int second_bias); +void netpath_free(struct netpath *netpath); + +void netpath_timer(struct netpath *netpath, int timeout); +void netpath_timer_stop(struct netpath *netpath); + +const char *netpath_to_string(struct vnic *vnic, struct netpath *netpath); + +#define netpath_get_hw_addr(netpath, address) \ + viport_get_hw_addr((netpath)->viport, address) +#define netpath_is_connected(netpath) \ + (netpath->state == NETPATH_CONNECTED) +#define netpath_can_tx_csum(netpath) \ + viport_can_tx_csum(netpath->viport) + +#endif /* VNIC_NETPATH_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:18:25 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:48:25 +0530 Subject: [ofa-general] [PATCH 05/13] QLogic VNIC: Implementation of Data path of communication protocol In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171824.31725.5212.stgit@localhost.localdomain> From: Ramachandra K This patch implements the actual data transfer part of the communication protocol with the EVIC/VEx. RDMA of ethernet packets is implemented in here. 
Signed-off-by: Poornima Kamath Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_data.c | 1473 +++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_data.h | 206 +++ drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h | 103 ++ 3 files changed, 1782 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_data.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_data.h create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_data.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_data.c new file mode 100644 index 0000000..599e716 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_data.c @@ -0,0 +1,1473 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include + +#include "vnic_util.h" +#include "vnic_viport.h" +#include "vnic_main.h" +#include "vnic_data.h" +#include "vnic_trailer.h" +#include "vnic_stats.h" + +static void data_received_kick(struct io *io); +static void data_xmit_complete(struct io *io); + +static void mc_data_recv_routine(struct io *io); +static void mc_data_post_recvs(struct mc_data *mc_data); +static void mc_data_recv_to_skbuff(struct viport *viport, struct sk_buff *skb, + struct viport_trailer *trailer); + +static u32 min_rcv_skb = 60; +module_param(min_rcv_skb, int, 0444); +MODULE_PARM_DESC(min_rcv_skb, "Packets of size (in bytes) less than" + " or equal to this value will be copied during receive." + " Default 60"); + +static u32 min_xmt_skb = 60; +module_param(min_xmt_skb, int, 0444); +MODULE_PARM_DESC(min_xmt_skb, "Packets of size (in bytes) less than" + " or equal to this value will be copied during transmit."
+ "Default 60"); + +int data_init(struct data *data, struct viport *viport, + struct data_config *config, struct ib_pd *pd) +{ + DATA_FUNCTION("data_init()\n"); + + data->parent = viport; + data->config = config; + data->ib_conn.viport = viport; + data->ib_conn.ib_config = &config->ib_config; + data->ib_conn.state = IB_CONN_UNINITTED; + data->ib_conn.callback_thread = NULL; + data->ib_conn.callback_thread_end = 0; + + if ((min_xmt_skb < 60) || (min_xmt_skb > 9000)) { + DATA_ERROR("min_xmt_skb (%d) must be between 60 and 9000\n", + min_xmt_skb); + goto failure; + } + if (vnic_ib_conn_init(&data->ib_conn, viport, pd, + &config->ib_config)) { + DATA_ERROR("Data IB connection initialization failed\n"); + goto failure; + } + data->mr = ib_get_dma_mr(pd, + IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_READ | + IB_ACCESS_REMOTE_WRITE); + if (IS_ERR(data->mr)) { + DATA_ERROR("failed to register memory for" + " data connection\n"); + goto destroy_conn; + } + + data->ib_conn.cm_id = ib_create_cm_id(viport->config->ibdev, + vnic_ib_cm_handler, + &data->ib_conn); + + if (IS_ERR(data->ib_conn.cm_id)) { + DATA_ERROR("creating data CM ID failed\n"); + goto dereg_mr; + } + + return 0; + +dereg_mr: + ib_dereg_mr(data->mr); +destroy_conn: + completion_callback_cleanup(&data->ib_conn); + ib_destroy_qp(data->ib_conn.qp); + ib_destroy_cq(data->ib_conn.cq); +failure: + return -1; +} + +static void data_post_recvs(struct data *data) +{ + unsigned long flags; + int i = 0; + + DATA_FUNCTION("data_post_recvs()\n"); + spin_lock_irqsave(&data->recv_ios_lock, flags); + while (!list_empty(&data->recv_ios)) { + struct io *io = list_entry(data->recv_ios.next, + struct io, list_ptrs); + struct recv_io *recv_io = (struct recv_io *)io; + + list_del(&recv_io->io.list_ptrs); + spin_unlock_irqrestore(&data->recv_ios_lock, flags); + if (vnic_ib_post_recv(&data->ib_conn, &recv_io->io)) { + viport_failure(data->parent); + return; + } + i++; + spin_lock_irqsave(&data->recv_ios_lock, flags); + } + 
spin_unlock_irqrestore(&data->recv_ios_lock, flags); + DATA_INFO("data posted %d %p\n", i, &data->recv_ios); +} + +static void data_init_pool_work_reqs(struct data *data, + struct recv_io *recv_io) +{ + struct recv_pool *recv_pool = &data->recv_pool; + struct xmit_pool *xmit_pool = &data->xmit_pool; + struct rdma_io *rdma_io; + struct rdma_dest *rdma_dest; + dma_addr_t xmit_dma; + u8 *xmit_data; + unsigned int i; + + INIT_LIST_HEAD(&data->recv_ios); + spin_lock_init(&data->recv_ios_lock); + spin_lock_init(&data->xmit_buf_lock); + for (i = 0; i < data->config->num_recvs; i++) { + recv_io[i].io.viport = data->parent; + recv_io[i].io.routine = data_received_kick; + recv_io[i].list.addr = data->region_data_dma; + recv_io[i].list.length = 4; + recv_io[i].list.lkey = data->mr->lkey; + + recv_io[i].io.rwr.wr_id = (u64)&recv_io[i].io; + recv_io[i].io.rwr.sg_list = &recv_io[i].list; + recv_io[i].io.rwr.num_sge = 1; + + list_add(&recv_io[i].io.list_ptrs, &data->recv_ios); + } + + INIT_LIST_HEAD(&recv_pool->avail_recv_bufs); + for (i = 0; i < recv_pool->pool_sz; i++) { + rdma_dest = &recv_pool->recv_bufs[i]; + list_add(&rdma_dest->list_ptrs, + &recv_pool->avail_recv_bufs); + } + + xmit_dma = xmit_pool->xmitdata_dma; + xmit_data = xmit_pool->xmit_data; + + for (i = 0; i < xmit_pool->num_xmit_bufs; i++) { + rdma_io = &xmit_pool->xmit_bufs[i]; + rdma_io->index = i; + rdma_io->io.viport = data->parent; + rdma_io->io.routine = data_xmit_complete; + + rdma_io->list[0].lkey = data->mr->lkey; + rdma_io->list[1].lkey = data->mr->lkey; + rdma_io->io.swr.wr_id = (u64)rdma_io; + rdma_io->io.swr.sg_list = rdma_io->list; + rdma_io->io.swr.num_sge = 2; + rdma_io->io.swr.opcode = IB_WR_RDMA_WRITE; + rdma_io->io.swr.send_flags = IB_SEND_SIGNALED; + rdma_io->io.type = RDMA; + + rdma_io->data = xmit_data; + rdma_io->data_dma = xmit_dma; + + xmit_data += ALIGN(min_xmt_skb, VIPORT_TRAILER_ALIGNMENT); + xmit_dma += ALIGN(min_xmt_skb, VIPORT_TRAILER_ALIGNMENT); + rdma_io->trailer = (struct 
viport_trailer *)xmit_data; + rdma_io->trailer_dma = xmit_dma; + xmit_data += sizeof(struct viport_trailer); + xmit_dma += sizeof(struct viport_trailer); + } + + xmit_pool->rdma_rkey = data->mr->rkey; + xmit_pool->rdma_addr = xmit_pool->buf_pool_dma; +} + +static void data_init_free_bufs_swrs(struct data *data) +{ + struct rdma_io *rdma_io; + struct send_io *send_io; + + rdma_io = &data->free_bufs_io; + rdma_io->io.viport = data->parent; + rdma_io->io.routine = NULL; + + rdma_io->list[0].lkey = data->mr->lkey; + + rdma_io->io.swr.wr_id = (u64)rdma_io; + rdma_io->io.swr.sg_list = rdma_io->list; + rdma_io->io.swr.num_sge = 1; + rdma_io->io.swr.opcode = IB_WR_RDMA_WRITE; + rdma_io->io.swr.send_flags = IB_SEND_SIGNALED; + rdma_io->io.type = RDMA; + + send_io = &data->kick_io; + send_io->io.viport = data->parent; + send_io->io.routine = NULL; + + send_io->list.addr = data->region_data_dma; + send_io->list.length = 0; + send_io->list.lkey = data->mr->lkey; + + send_io->io.swr.wr_id = (u64)send_io; + send_io->io.swr.sg_list = &send_io->list; + send_io->io.swr.num_sge = 1; + send_io->io.swr.opcode = IB_WR_SEND; + send_io->io.swr.send_flags = IB_SEND_SIGNALED; + send_io->io.type = SEND; +} + +static int data_init_buf_pools(struct data *data) +{ + struct recv_pool *recv_pool = &data->recv_pool; + struct xmit_pool *xmit_pool = &data->xmit_pool; + struct viport *viport = data->parent; + + recv_pool->buf_pool_len = + sizeof(struct buff_pool_entry) * recv_pool->eioc_pool_sz; + + recv_pool->buf_pool = kzalloc(recv_pool->buf_pool_len, GFP_KERNEL); + + if (!recv_pool->buf_pool) { + DATA_ERROR("failed allocating %d bytes" + " for recv pool bufpool\n", + recv_pool->buf_pool_len); + goto failure; + } + + recv_pool->buf_pool_dma = + ib_dma_map_single(viport->config->ibdev, + recv_pool->buf_pool, recv_pool->buf_pool_len, + DMA_TO_DEVICE); + + if (ib_dma_mapping_error(viport->config->ibdev, recv_pool->buf_pool_dma)) { + DATA_ERROR("xmit buf_pool dma map error\n"); + goto free_recv_pool; 
+ } + + xmit_pool->buf_pool_len = + sizeof(struct buff_pool_entry) * xmit_pool->pool_sz; + xmit_pool->buf_pool = kzalloc(xmit_pool->buf_pool_len, GFP_KERNEL); + + if (!xmit_pool->buf_pool) { + DATA_ERROR("failed allocating %d bytes" + " for xmit pool bufpool\n", + xmit_pool->buf_pool_len); + goto unmap_recv_pool; + } + + xmit_pool->buf_pool_dma = + ib_dma_map_single(viport->config->ibdev, + xmit_pool->buf_pool, xmit_pool->buf_pool_len, + DMA_FROM_DEVICE); + + if (ib_dma_mapping_error(viport->config->ibdev, xmit_pool->buf_pool_dma)) { + DATA_ERROR("xmit buf_pool dma map error\n"); + goto free_xmit_pool; + } + + xmit_pool->xmit_data = kzalloc(xmit_pool->xmitdata_len, GFP_KERNEL); + + if (!xmit_pool->xmit_data) { + DATA_ERROR("failed allocating %d bytes for xmit data\n", + xmit_pool->xmitdata_len); + goto unmap_xmit_pool; + } + + xmit_pool->xmitdata_dma = + ib_dma_map_single(viport->config->ibdev, + xmit_pool->xmit_data, xmit_pool->xmitdata_len, + DMA_TO_DEVICE); + + if (ib_dma_mapping_error(viport->config->ibdev, xmit_pool->xmitdata_dma)) { + DATA_ERROR("xmit data dma map error\n"); + goto free_xmit_data; + } + + return 0; + +free_xmit_data: + kfree(xmit_pool->xmit_data); +unmap_xmit_pool: + ib_dma_unmap_single(data->parent->config->ibdev, + xmit_pool->buf_pool_dma, + xmit_pool->buf_pool_len, DMA_FROM_DEVICE); +free_xmit_pool: + kfree(xmit_pool->buf_pool); +unmap_recv_pool: + ib_dma_unmap_single(data->parent->config->ibdev, + recv_pool->buf_pool_dma, + recv_pool->buf_pool_len, DMA_TO_DEVICE); +free_recv_pool: + kfree(recv_pool->buf_pool); +failure: + return -1; +} + +static void data_init_xmit_pool(struct data *data) +{ + struct xmit_pool *xmit_pool = &data->xmit_pool; + + xmit_pool->pool_sz = + be32_to_cpu(data->eioc_pool_parms.num_recv_pool_entries); + xmit_pool->buffer_sz = + be32_to_cpu(data->eioc_pool_parms.size_recv_pool_entry); + + xmit_pool->notify_count = 0; + xmit_pool->notify_bundle = data->config->notify_bundle; + xmit_pool->next_xmit_pool = 0; + 
+	xmit_pool->num_xmit_bufs = xmit_pool->notify_bundle * 2;
+	xmit_pool->next_xmit_buf = 0;
+	xmit_pool->last_comp_buf = xmit_pool->num_xmit_bufs - 1;
+	/* This assumes that data_init_recv_pool has been called
+	 * before.
+	 */
+	data->max_mtu = MAX_PAYLOAD(min((data)->recv_pool.buffer_sz,
+					(data)->xmit_pool.buffer_sz)) - VLAN_ETH_HLEN;
+
+	xmit_pool->kick_count = 0;
+	xmit_pool->kick_byte_count = 0;
+
+	xmit_pool->send_kicks =
+		be32_to_cpu(data->
+			    eioc_pool_parms.num_recv_pool_entries_before_kick)
+		|| be32_to_cpu(data->
+			       eioc_pool_parms.num_recv_pool_bytes_before_kick);
+	xmit_pool->kick_bundle =
+		be32_to_cpu(data->
+			    eioc_pool_parms.num_recv_pool_entries_before_kick);
+	xmit_pool->kick_byte_bundle =
+		be32_to_cpu(data->
+			    eioc_pool_parms.num_recv_pool_bytes_before_kick);
+
+	xmit_pool->need_buffers = 1;
+
+	xmit_pool->xmitdata_len =
+		BUFFER_SIZE(min_xmt_skb) * xmit_pool->num_xmit_bufs;
+}
+
+static void data_init_recv_pool(struct data *data)
+{
+	struct recv_pool *recv_pool = &data->recv_pool;
+
+	recv_pool->pool_sz = data->config->host_recv_pool_entries;
+	recv_pool->eioc_pool_sz =
+		be32_to_cpu(data->host_pool_parms.num_recv_pool_entries);
+	if (recv_pool->pool_sz > recv_pool->eioc_pool_sz)
+		recv_pool->pool_sz =
+			be32_to_cpu(data->host_pool_parms.num_recv_pool_entries);
+
+	recv_pool->buffer_sz =
+		be32_to_cpu(data->host_pool_parms.size_recv_pool_entry);
+
+	recv_pool->sz_free_bundle =
+		be32_to_cpu(data->
+			    host_pool_parms.free_recv_pool_entries_per_update);
+	recv_pool->num_free_bufs = 0;
+	recv_pool->num_posted_bufs = 0;
+
+	recv_pool->next_full_buf = 0;
+	recv_pool->next_free_buf = 0;
+	recv_pool->kick_on_free = 0;
+}
+
+int data_connect(struct data *data)
+{
+	struct xmit_pool *xmit_pool = &data->xmit_pool;
+	struct recv_pool *recv_pool = &data->recv_pool;
+	struct recv_io *recv_io;
+	unsigned int sz;
+	struct viport *viport = data->parent;
+
+	DATA_FUNCTION("data_connect()\n");
+
+	/* Do not interchange the order of the functions
+	 * called below as this will affect the MAX MTU
+	 * calculation
+	 */
+
+	data_init_recv_pool(data);
+	data_init_xmit_pool(data);
+
+	sz = sizeof(struct rdma_dest) * recv_pool->pool_sz +
+	     sizeof(struct recv_io) * data->config->num_recvs +
+	     sizeof(struct rdma_io) * xmit_pool->num_xmit_bufs;
+
+	data->local_storage = vmalloc(sz);
+
+	if (!data->local_storage) {
+		DATA_ERROR("failed allocating %d bytes"
+			   " local storage\n", sz);
+		goto out;
+	}
+
+	memset(data->local_storage, 0, sz);
+
+	recv_pool->recv_bufs = (struct rdma_dest *)data->local_storage;
+	sz = sizeof(struct rdma_dest) * recv_pool->pool_sz;
+
+	recv_io = (struct recv_io *)(data->local_storage + sz);
+	sz += sizeof(struct recv_io) * data->config->num_recvs;
+
+	xmit_pool->xmit_bufs = (struct rdma_io *)(data->local_storage + sz);
+	data->region_data = kzalloc(4, GFP_KERNEL);
+
+	if (!data->region_data) {
+		DATA_ERROR("failed to alloc memory for region data\n");
+		goto free_local_storage;
+	}
+
+	data->region_data_dma =
+		ib_dma_map_single(viport->config->ibdev,
+				  data->region_data, 4, DMA_BIDIRECTIONAL);
+
+	if (ib_dma_mapping_error(viport->config->ibdev, data->region_data_dma)) {
+		DATA_ERROR("region data dma map error\n");
+		goto free_region_data;
+	}
+
+	if (data_init_buf_pools(data))
+		goto unmap_region_data;
+
+	data_init_free_bufs_swrs(data);
+	data_init_pool_work_reqs(data, recv_io);
+
+	data_post_recvs(data);
+
+	if (vnic_ib_cm_connect(&data->ib_conn))
+		goto unmap_region_data;
+
+	return 0;
+
+unmap_region_data:
+	ib_dma_unmap_single(data->parent->config->ibdev,
+			    data->region_data_dma, 4, DMA_BIDIRECTIONAL);
+free_region_data:
+	kfree(data->region_data);
+free_local_storage:
+	vfree(data->local_storage);
+out:
+	return -1;
+}
+
+static void data_add_free_buffer(struct data *data, int index,
+				 struct rdma_dest *rdma_dest)
+{
+	struct recv_pool *pool = &data->recv_pool;
+	struct buff_pool_entry *bpe;
+	dma_addr_t vaddr_dma;
+
+	DATA_FUNCTION("data_add_free_buffer()\n");
+	rdma_dest->trailer->connection_hash_and_valid = 0;
+	ib_dma_sync_single_for_cpu(data->parent->config->ibdev,
+				   pool->buf_pool_dma, pool->buf_pool_len,
+				   DMA_TO_DEVICE);
+
+	bpe = &pool->buf_pool[index];
+	bpe->rkey = cpu_to_be32(data->mr->rkey);
+	vaddr_dma = ib_dma_map_single(data->parent->config->ibdev,
+				      rdma_dest->data, pool->buffer_sz,
+				      DMA_FROM_DEVICE);
+	if (ib_dma_mapping_error(data->parent->config->ibdev, vaddr_dma)) {
+		DATA_ERROR("rdma_dest->data dma map error\n");
+		goto failure;
+	}
+	bpe->remote_addr = cpu_to_be64(vaddr_dma);
+	bpe->valid = (u32) (rdma_dest - &pool->recv_bufs[0]) + 1;
+	++pool->num_free_bufs;
+failure:
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      pool->buf_pool_dma, pool->buf_pool_len,
+				      DMA_TO_DEVICE);
+}
+
+/* NOTE: this routine is not reentrant */
+static void data_alloc_buffers(struct data *data, int initial_allocation)
+{
+	struct recv_pool *pool = &data->recv_pool;
+	struct rdma_dest *rdma_dest;
+	struct sk_buff *skb;
+	int index;
+
+	DATA_FUNCTION("data_alloc_buffers()\n");
+	index = ADD(pool->next_free_buf, pool->num_free_bufs,
+		    pool->eioc_pool_sz);
+
+	while (!list_empty(&pool->avail_recv_bufs)) {
+		rdma_dest =
+			list_entry(pool->avail_recv_bufs.next,
+				   struct rdma_dest, list_ptrs);
+		if (!rdma_dest->skb) {
+			if (initial_allocation)
+				skb = alloc_skb(pool->buffer_sz + 2,
+						GFP_KERNEL);
+			else
+				skb = dev_alloc_skb(pool->buffer_sz + 2);
+			if (!skb)
+				break;
+			skb_reserve(skb, 2);
+			skb_put(skb, pool->buffer_sz);
+			rdma_dest->skb = skb;
+			rdma_dest->data = skb->data;
+			rdma_dest->trailer =
+				(struct viport_trailer *)(rdma_dest->data +
+							  pool->buffer_sz -
+							  sizeof(struct
+								 viport_trailer));
+		}
+		rdma_dest->trailer->connection_hash_and_valid = 0;
+
+		list_del_init(&rdma_dest->list_ptrs);
+
+		data_add_free_buffer(data, index, rdma_dest);
+		index = NEXT(index, pool->eioc_pool_sz);
+	}
+}
+
+static void data_send_kick_message(struct data *data)
+{
+	struct xmit_pool *pool = &data->xmit_pool;
+	DATA_FUNCTION("data_send_kick_message()\n");
+	/* stop timer for bundle_timeout */
+	if (data->kick_timer_on) {
+		del_timer(&data->kick_timer);
+		data->kick_timer_on = 0;
+	}
+	pool->kick_count = 0;
+	pool->kick_byte_count = 0;
+
+	/* TODO: keep track of when kick is outstanding, and
+	 * don't reuse until complete
+	 */
+	if (vnic_ib_post_send(&data->ib_conn, &data->free_bufs_io.io)) {
+		DATA_ERROR("failed to post send\n");
+		viport_failure(data->parent);
+	}
+}
+
+static void data_send_free_recv_buffers(struct data *data)
+{
+	struct recv_pool *pool = &data->recv_pool;
+	struct ib_send_wr *swr = &data->free_bufs_io.io.swr;
+
+	int bufs_sent = 0;
+	u64 rdma_addr;
+	u32 offset;
+	u32 sz;
+	unsigned int num_to_send, next_increment;
+
+	DATA_FUNCTION("data_send_free_recv_buffers()\n");
+
+	for (num_to_send = pool->sz_free_bundle;
+	     num_to_send <= pool->num_free_bufs;
+	     num_to_send += pool->sz_free_bundle) {
+		/* handle multiple bundles as one when possible. */
+		next_increment = num_to_send + pool->sz_free_bundle;
+		if ((next_increment <= pool->num_free_bufs)
+		    && (pool->next_free_buf + next_increment <=
+			pool->eioc_pool_sz))
+			continue;
+
+		offset = pool->next_free_buf *
+			 sizeof(struct buff_pool_entry);
+		sz = num_to_send * sizeof(struct buff_pool_entry);
+		rdma_addr = pool->eioc_rdma_addr + offset;
+		swr->sg_list->length = sz;
+		swr->sg_list->addr = pool->buf_pool_dma + offset;
+		swr->wr.rdma.remote_addr = rdma_addr;
+
+		if (vnic_ib_post_send(&data->ib_conn,
+				      &data->free_bufs_io.io)) {
+			DATA_ERROR("failed to post send\n");
+			viport_failure(data->parent);
+			return;
+		}
+		INC(pool->next_free_buf, num_to_send, pool->eioc_pool_sz);
+		pool->num_free_bufs -= num_to_send;
+		pool->num_posted_bufs += num_to_send;
+		bufs_sent = 1;
+	}
+
+	if (bufs_sent) {
+		if (pool->kick_on_free)
+			data_send_kick_message(data);
+	}
+	if (pool->num_posted_bufs == 0) {
+		struct vnic *vnic = data->parent->vnic;
+
+		if (vnic->current_path == &vnic->primary_path)
+			DATA_ERROR("%s: primary path: "
+				   "unable to allocate receive buffers\n",
+				   vnic->config->name);
+		else if (vnic->current_path == &vnic->secondary_path)
+			DATA_ERROR("%s: secondary path: "
+				   "unable to allocate receive buffers\n",
+				   vnic->config->name);
+		data->ib_conn.state = IB_CONN_ERRORED;
+		viport_failure(data->parent);
+	}
+}
+
+void data_connected(struct data *data)
+{
+	DATA_FUNCTION("data_connected()\n");
+	data->free_bufs_io.io.swr.wr.rdma.rkey =
+		data->recv_pool.eioc_rdma_rkey;
+	data_alloc_buffers(data, 1);
+	data_send_free_recv_buffers(data);
+	data->connected = 1;
+}
+
+void data_disconnect(struct data *data)
+{
+	struct xmit_pool *xmit_pool = &data->xmit_pool;
+	struct recv_pool *recv_pool = &data->recv_pool;
+	unsigned int i;
+
+	DATA_FUNCTION("data_disconnect()\n");
+
+	data->connected = 0;
+	if (data->kick_timer_on) {
+		del_timer_sync(&data->kick_timer);
+		data->kick_timer_on = 0;
+	}
+
+	if (ib_send_cm_dreq(data->ib_conn.cm_id, NULL, 0))
+		DATA_ERROR("data CM DREQ sending failed\n");
+	data->ib_conn.state = IB_CONN_DISCONNECTED;
+
+	completion_callback_cleanup(&data->ib_conn);
+
+	for (i = 0; i < xmit_pool->num_xmit_bufs; i++) {
+		if (xmit_pool->xmit_bufs[i].skb)
+			dev_kfree_skb(xmit_pool->xmit_bufs[i].skb);
+		xmit_pool->xmit_bufs[i].skb = NULL;
+
+	}
+	for (i = 0; i < recv_pool->pool_sz; i++) {
+		if (data->recv_pool.recv_bufs[i].skb)
+			dev_kfree_skb(recv_pool->recv_bufs[i].skb);
+		recv_pool->recv_bufs[i].skb = NULL;
+	}
+	vfree(data->local_storage);
+	if (data->region_data) {
+		ib_dma_unmap_single(data->parent->config->ibdev,
+				    data->region_data_dma, 4,
+				    DMA_BIDIRECTIONAL);
+		kfree(data->region_data);
+	}
+
+	if (recv_pool->buf_pool) {
+		ib_dma_unmap_single(data->parent->config->ibdev,
+				    recv_pool->buf_pool_dma,
+				    recv_pool->buf_pool_len, DMA_TO_DEVICE);
+		kfree(recv_pool->buf_pool);
+	}
+
+	if (xmit_pool->buf_pool) {
+		ib_dma_unmap_single(data->parent->config->ibdev,
+				    xmit_pool->buf_pool_dma,
+				    xmit_pool->buf_pool_len, DMA_FROM_DEVICE);
+		kfree(xmit_pool->buf_pool);
+	}
+
+	if (xmit_pool->xmit_data) {
+		ib_dma_unmap_single(data->parent->config->ibdev,
+				    xmit_pool->xmitdata_dma,
+				    xmit_pool->xmitdata_len, DMA_TO_DEVICE);
+		kfree(xmit_pool->xmit_data);
+	}
+}
+
+void data_cleanup(struct data *data)
+{
+	ib_destroy_cm_id(data->ib_conn.cm_id);
+
+	/* Completion callback cleanup called again.
+	 * This is to cleanup the threads in case there is an
+	 * error before state LINK_DATACONNECT due to which
+	 * data_disconnect is not called.
+	 */
+	completion_callback_cleanup(&data->ib_conn);
+	ib_destroy_qp(data->ib_conn.qp);
+	ib_destroy_cq(data->ib_conn.cq);
+	ib_dereg_mr(data->mr);
+
+}
+
+static int data_alloc_xmit_buffer(struct data *data, struct sk_buff *skb,
+				  struct buff_pool_entry **pp_bpe,
+				  struct rdma_io **pp_rdma_io,
+				  int *last)
+{
+	struct xmit_pool *pool = &data->xmit_pool;
+	unsigned long flags;
+	int ret;
+
+	DATA_FUNCTION("data_alloc_xmit_buffer()\n");
+
+	spin_lock_irqsave(&data->xmit_buf_lock, flags);
+	ib_dma_sync_single_for_cpu(data->parent->config->ibdev,
+				   pool->buf_pool_dma, pool->buf_pool_len,
+				   DMA_TO_DEVICE);
+	*last = 0;
+	*pp_rdma_io = &pool->xmit_bufs[pool->next_xmit_buf];
+	*pp_bpe = &pool->buf_pool[pool->next_xmit_pool];
+
+	if ((*pp_bpe)->valid && pool->next_xmit_buf !=
+	    pool->last_comp_buf) {
+		INC(pool->next_xmit_buf, 1, pool->num_xmit_bufs);
+		INC(pool->next_xmit_pool, 1, pool->pool_sz);
+		if (!pool->buf_pool[pool->next_xmit_pool].valid) {
+			DATA_INFO("just used the last EIOU"
+				  " receive buffer\n");
+			*last = 1;
+			pool->need_buffers = 1;
+			vnic_stop_xmit(data->parent->vnic,
+				       data->parent->parent);
+			data_kickreq_stats(data);
+		} else if (pool->next_xmit_buf == pool->last_comp_buf) {
+			DATA_INFO("just used our last xmit buffer\n");
+			pool->need_buffers = 1;
+			vnic_stop_xmit(data->parent->vnic,
+				       data->parent->parent);
+		}
+		(*pp_rdma_io)->skb = skb;
+		(*pp_bpe)->valid = 0;
+		ret = 0;
+	} else {
+		data_no_xmitbuf_stats(data);
+		DATA_ERROR("Out of xmit buffers\n");
+		vnic_stop_xmit(data->parent->vnic,
+			       data->parent->parent);
+		ret = -1;
+	}
+
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      pool->buf_pool_dma,
+				      pool->buf_pool_len, DMA_TO_DEVICE);
+	spin_unlock_irqrestore(&data->xmit_buf_lock, flags);
+	return ret;
+}
+
+static void data_rdma_packet(struct data *data, struct buff_pool_entry *bpe,
+			     struct rdma_io *rdma_io)
+{
+	struct ib_send_wr *swr;
+	struct sk_buff *skb;
+	dma_addr_t trailer_data_dma;
+	dma_addr_t skb_data_dma;
+	struct xmit_pool *xmit_pool = &data->xmit_pool;
+	struct viport *viport = data->parent;
+	u8 *d;
+	int len;
+	int fill_len;
+
+	DATA_FUNCTION("data_rdma_packet()\n");
+	swr = &rdma_io->io.swr;
+	skb = rdma_io->skb;
+	len = ALIGN(rdma_io->len, VIPORT_TRAILER_ALIGNMENT);
+	fill_len = len - skb->len;
+
+	ib_dma_sync_single_for_cpu(data->parent->config->ibdev,
+				   xmit_pool->xmitdata_dma,
+				   xmit_pool->xmitdata_len, DMA_TO_DEVICE);
+
+	d = (u8 *) rdma_io->trailer - fill_len;
+	trailer_data_dma = rdma_io->trailer_dma - fill_len;
+	memset(d, 0, fill_len);
+
+	swr->sg_list[0].length = skb->len;
+	if (skb->len <= min_xmt_skb) {
+		memcpy(rdma_io->data, skb->data, skb->len);
+		swr->sg_list[0].lkey = data->mr->lkey;
+		swr->sg_list[0].addr = rdma_io->data_dma;
+		dev_kfree_skb_any(skb);
+		rdma_io->skb = NULL;
+	} else {
+		swr->sg_list[0].lkey = data->mr->lkey;
+
+		skb_data_dma = ib_dma_map_single(viport->config->ibdev,
+						 skb->data, skb->len,
+						 DMA_TO_DEVICE);
+
+		if (ib_dma_mapping_error(viport->config->ibdev, skb_data_dma)) {
+			DATA_ERROR("skb data dma map error\n");
+			goto failure;
+		}
+
+		rdma_io->skb_data_dma = skb_data_dma;
+
+		swr->sg_list[0].addr = skb_data_dma;
+		skb_orphan(skb);
+	}
+	ib_dma_sync_single_for_cpu(data->parent->config->ibdev,
+				   xmit_pool->buf_pool_dma,
+				   xmit_pool->buf_pool_len, DMA_TO_DEVICE);
+
+	swr->sg_list[1].addr = trailer_data_dma;
+	swr->sg_list[1].length = fill_len + sizeof(struct viport_trailer);
+	swr->sg_list[0].lkey = data->mr->lkey;
+	swr->wr.rdma.remote_addr = be64_to_cpu(bpe->remote_addr);
+	swr->wr.rdma.remote_addr += data->xmit_pool.buffer_sz;
+	swr->wr.rdma.remote_addr -= (sizeof(struct viport_trailer) + len);
+	swr->wr.rdma.rkey = be32_to_cpu(bpe->rkey);
+
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      xmit_pool->buf_pool_dma,
+				      xmit_pool->buf_pool_len, DMA_TO_DEVICE);
+
+	/* If VNIC_FEAT_RDMA_IMMED is supported then change the work request
+	 * opcode to IB_WR_RDMA_WRITE_WITH_IMM
+	 */
+
+	if (data->parent->features_supported & VNIC_FEAT_RDMA_IMMED) {
+		swr->ex.imm_data = 0;
+		swr->opcode = IB_WR_RDMA_WRITE_WITH_IMM;
+	}
+
+	data->xmit_pool.notify_count++;
+	if (data->xmit_pool.notify_count >= data->xmit_pool.notify_bundle) {
+		data->xmit_pool.notify_count = 0;
+		swr->send_flags = IB_SEND_SIGNALED;
+	} else {
+		swr->send_flags = 0;
+	}
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      xmit_pool->xmitdata_dma,
+				      xmit_pool->xmitdata_len, DMA_TO_DEVICE);
+	if (vnic_ib_post_send(&data->ib_conn, &rdma_io->io)) {
+		DATA_ERROR("failed to post send for data RDMA write\n");
+		viport_failure(data->parent);
+		goto failure;
+	}
+
+	data_xmits_stats(data);
+failure:
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      xmit_pool->xmitdata_dma,
+				      xmit_pool->xmitdata_len, DMA_TO_DEVICE);
+}
+
+static void data_kick_timeout_handler(unsigned long arg)
+{
+	struct data *data = (struct data *)arg;
+
+	DATA_FUNCTION("data_kick_timeout_handler()\n");
+	data->kick_timer_on = 0;
+	data_send_kick_message(data);
+}
+
+int data_xmit_packet(struct data *data, struct sk_buff *skb)
+{
+	struct xmit_pool *pool = &data->xmit_pool;
+	struct rdma_io *rdma_io;
+	struct buff_pool_entry *bpe;
+	struct viport_trailer *trailer;
+	unsigned int sz = skb->len;
+	int last;
+
+	DATA_FUNCTION("data_xmit_packet()\n");
+	if (sz > pool->buffer_sz) {
+		DATA_ERROR("outbound packet too large, size = %d\n", sz);
+		return -1;
+	}
+
+	if (data_alloc_xmit_buffer(data, skb, &bpe, &rdma_io, &last)) {
+		DATA_ERROR("error in allocating data xmit buffer\n");
+		return -1;
+	}
+
+	ib_dma_sync_single_for_cpu(data->parent->config->ibdev,
+				   pool->xmitdata_dma, pool->xmitdata_len,
+				   DMA_TO_DEVICE);
+	trailer = rdma_io->trailer;
+
+	memset(trailer, 0, sizeof *trailer);
+	memcpy(trailer->dest_mac_addr, skb->data, ETH_ALEN);
+
+	if (skb->sk)
+		trailer->connection_hash_and_valid = 0x40 |
+			((be16_to_cpu(inet_sk(skb->sk)->sport) +
+			  be16_to_cpu(inet_sk(skb->sk)->dport)) & 0x3f);
+
+	trailer->connection_hash_and_valid |= CHV_VALID;
+
+	if ((sz > 16) && (*(__be16 *) (skb->data + 12) ==
+			  __constant_cpu_to_be16(ETH_P_8021Q))) {
+		trailer->vlan = *(__be16 *) (skb->data + 14);
+		memmove(skb->data + 4, skb->data, 12);
+		skb_pull(skb, 4);
+		sz -= 4;
+		trailer->pkt_flags |= PF_VLAN_INSERT;
+	}
+	if (last)
+		trailer->pkt_flags |= PF_KICK;
+	if (sz < ETH_ZLEN) {
+		/* EIOU requires all packets to be
+		 * of ethernet minimum packet size.
+		 */
+		trailer->data_length = __constant_cpu_to_be16(ETH_ZLEN);
+		rdma_io->len = ETH_ZLEN;
+	} else {
+		trailer->data_length = cpu_to_be16(sz);
+		rdma_io->len = sz;
+	}
+
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		trailer->tx_chksum_flags = TX_CHKSUM_FLAGS_CHECKSUM_V4
+			| TX_CHKSUM_FLAGS_IP_CHECKSUM
+			| TX_CHKSUM_FLAGS_TCP_CHECKSUM
+			| TX_CHKSUM_FLAGS_UDP_CHECKSUM;
+	}
+
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      pool->xmitdata_dma, pool->xmitdata_len,
+				      DMA_TO_DEVICE);
+	data_rdma_packet(data, bpe, rdma_io);
+
+	if (pool->send_kicks) {
+		/* EIOC needs kicks to inform it of sent packets */
+		pool->kick_count++;
+		pool->kick_byte_count += sz;
+		if ((pool->kick_count >= pool->kick_bundle)
+		    || (pool->kick_byte_count >= pool->kick_byte_bundle)) {
+			data_send_kick_message(data);
+		} else if (pool->kick_count == 1) {
+			init_timer(&data->kick_timer);
+			/* timeout_before_kick is in usec */
+			data->kick_timer.expires =
+				msecs_to_jiffies(be32_to_cpu(data->
+					eioc_pool_parms.timeout_before_kick) * 1000) +
+				jiffies;
+			data->kick_timer.data = (unsigned long)data;
+			data->kick_timer.function = data_kick_timeout_handler;
+			add_timer(&data->kick_timer);
+			data->kick_timer_on = 1;
+		}
+	}
+	return 0;
+}
+
+static void data_check_xmit_buffers(struct data *data)
+{
+	struct xmit_pool *pool = &data->xmit_pool;
+	unsigned long flags;
+
+	DATA_FUNCTION("data_check_xmit_buffers()\n");
+	spin_lock_irqsave(&data->xmit_buf_lock, flags);
+	ib_dma_sync_single_for_cpu(data->parent->config->ibdev,
+				   pool->buf_pool_dma, pool->buf_pool_len,
+				   DMA_TO_DEVICE);
+
+	if (data->xmit_pool.need_buffers
+	    && pool->buf_pool[pool->next_xmit_pool].valid
+	    && pool->next_xmit_buf != pool->last_comp_buf) {
+		data->xmit_pool.need_buffers = 0;
+		vnic_restart_xmit(data->parent->vnic,
+				  data->parent->parent);
+		DATA_INFO("there are free xmit buffers\n");
+	}
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      pool->buf_pool_dma, pool->buf_pool_len,
+				      DMA_TO_DEVICE);
+
+	spin_unlock_irqrestore(&data->xmit_buf_lock, flags);
+}
+
+static struct sk_buff *data_recv_to_skbuff(struct data *data,
+					   struct rdma_dest *rdma_dest)
+{
+	struct viport_trailer *trailer;
+	struct sk_buff *skb = NULL;
+	int start;
+	unsigned int len;
+	u8 rx_chksum_flags;
+
+	DATA_FUNCTION("data_recv_to_skbuff()\n");
+	trailer = rdma_dest->trailer;
+	start = data_offset(data, trailer);
+	len = data_len(data, trailer);
+
+	if (len <= min_rcv_skb)
+		skb = dev_alloc_skb(len + VLAN_HLEN + 2);
+	/* leave room for VLAN header and alignment */
+	if (skb) {
+		skb_reserve(skb, VLAN_HLEN + 2);
+		memcpy(skb->data, rdma_dest->data + start, len);
+		skb_put(skb, len);
+	} else {
+		skb = rdma_dest->skb;
+		rdma_dest->skb = NULL;
+		rdma_dest->trailer = NULL;
+		rdma_dest->data = NULL;
+		skb_pull(skb, start);
+		skb_trim(skb, len);
+	}
+
+	rx_chksum_flags = trailer->rx_chksum_flags;
+	DATA_INFO("rx_chksum_flags = %d, LOOP = %c, IP = %c,"
+		  " TCP = %c, UDP = %c\n",
+		  rx_chksum_flags,
+		  (rx_chksum_flags & RX_CHKSUM_FLAGS_LOOPBACK) ? 'Y' : 'N',
+		  (rx_chksum_flags & RX_CHKSUM_FLAGS_IP_CHECKSUM_SUCCEEDED) ? 'Y'
+		  : (rx_chksum_flags & RX_CHKSUM_FLAGS_IP_CHECKSUM_FAILED) ? 'N' :
+		  '-',
+		  (rx_chksum_flags & RX_CHKSUM_FLAGS_TCP_CHECKSUM_SUCCEEDED) ? 'Y'
+		  : (rx_chksum_flags & RX_CHKSUM_FLAGS_TCP_CHECKSUM_FAILED) ? 'N' :
+		  '-',
+		  (rx_chksum_flags & RX_CHKSUM_FLAGS_UDP_CHECKSUM_SUCCEEDED) ? 'Y'
+		  : (rx_chksum_flags & RX_CHKSUM_FLAGS_UDP_CHECKSUM_FAILED) ? 'N' :
+		  '-');
+
+	if ((rx_chksum_flags & RX_CHKSUM_FLAGS_LOOPBACK)
+	    || ((rx_chksum_flags & RX_CHKSUM_FLAGS_IP_CHECKSUM_SUCCEEDED)
+		&& ((rx_chksum_flags & RX_CHKSUM_FLAGS_TCP_CHECKSUM_SUCCEEDED)
+		    || (rx_chksum_flags &
+			RX_CHKSUM_FLAGS_UDP_CHECKSUM_SUCCEEDED))))
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+	else
+		skb->ip_summed = CHECKSUM_NONE;
+
+	if ((trailer->pkt_flags & PF_VLAN_INSERT) &&
+	    !(data->parent->features_supported & VNIC_FEAT_IGNORE_VLAN)) {
+		u8 *rv;
+
+		rv = skb_push(skb, 4);
+		memmove(rv, rv + 4, 12);
+		*(__be16 *) (rv + 12) = __constant_cpu_to_be16(ETH_P_8021Q);
+		if (trailer->pkt_flags & PF_PVID_OVERRIDDEN)
+			*(__be16 *) (rv + 14) = trailer->vlan &
+				__constant_cpu_to_be16(0xF000);
+		else
+			*(__be16 *) (rv + 14) = trailer->vlan;
+	}
+
+	return skb;
+}
+
+static int data_incoming_recv(struct data *data)
+{
+	struct recv_pool *pool = &data->recv_pool;
+	struct rdma_dest *rdma_dest;
+	struct viport_trailer *trailer;
+	struct buff_pool_entry *bpe;
+	struct sk_buff *skb;
+	dma_addr_t vaddr_dma;
+
+	DATA_FUNCTION("data_incoming_recv()\n");
+	if (pool->next_full_buf == pool->next_free_buf)
+		return -1;
+	bpe = &pool->buf_pool[pool->next_full_buf];
+	vaddr_dma = be64_to_cpu(bpe->remote_addr);
+	rdma_dest = &pool->recv_bufs[bpe->valid - 1];
+	trailer = rdma_dest->trailer;
+
+	if (!trailer
+	    || !(trailer->connection_hash_and_valid & CHV_VALID))
+		return -1;
+
+	/* received a packet */
+	if (trailer->pkt_flags & PF_KICK)
+		pool->kick_on_free = 1;
+
+	skb = data_recv_to_skbuff(data, rdma_dest);
+
+	if (skb) {
+		vnic_recv_packet(data->parent->vnic,
+				 data->parent->parent, skb);
+		list_add(&rdma_dest->list_ptrs, &pool->avail_recv_bufs);
+	}
+
+	ib_dma_unmap_single(data->parent->config->ibdev,
+			    vaddr_dma, pool->buffer_sz,
+			    DMA_FROM_DEVICE);
+	ib_dma_sync_single_for_cpu(data->parent->config->ibdev,
+				   pool->buf_pool_dma, pool->buf_pool_len,
+				   DMA_TO_DEVICE);
+
+	bpe->valid = 0;
+	ib_dma_sync_single_for_device(data->parent->config->ibdev,
+				      pool->buf_pool_dma, pool->buf_pool_len,
+				      DMA_TO_DEVICE);
+
+	INC(pool->next_full_buf, 1, pool->eioc_pool_sz);
+	pool->num_posted_bufs--;
+	data_recvs_stats(data);
+	return 0;
+}
+
+static void data_received_kick(struct io *io)
+{
+	struct data *data = &io->viport->data;
+	unsigned long flags;
+
+	DATA_FUNCTION("data_received_kick()\n");
+	data_note_kickrcv_time();
+	spin_lock_irqsave(&data->recv_ios_lock, flags);
+	list_add(&io->list_ptrs, &data->recv_ios);
+	spin_unlock_irqrestore(&data->recv_ios_lock, flags);
+	data_post_recvs(data);
+	data_rcvkicks_stats(data);
+	data_check_xmit_buffers(data);
+
+	while (!data_incoming_recv(data));
+
+	if (data->connected) {
+		data_alloc_buffers(data, 0);
+		data_send_free_recv_buffers(data);
+	}
+}
+
+static void data_xmit_complete(struct io *io)
+{
+	struct rdma_io *rdma_io = (struct rdma_io *)io;
+	struct data *data = &io->viport->data;
+	struct xmit_pool *pool = &data->xmit_pool;
+	struct sk_buff *skb;
+
+	DATA_FUNCTION("data_xmit_complete()\n");
+
+	if (rdma_io->skb)
+		ib_dma_unmap_single(data->parent->config->ibdev,
+				    rdma_io->skb_data_dma, rdma_io->skb->len,
+				    DMA_TO_DEVICE);
+
+	while (pool->last_comp_buf != rdma_io->index) {
+		INC(pool->last_comp_buf, 1, pool->num_xmit_bufs);
+		skb = pool->xmit_bufs[pool->last_comp_buf].skb;
+		if (skb)
+			dev_kfree_skb_any(skb);
+		pool->xmit_bufs[pool->last_comp_buf].skb = NULL;
+	}
+
+	data_check_xmit_buffers(data);
+}
+
+static int mc_data_alloc_skb(struct ud_recv_io *recv_io, u32 len,
+			     int initial_allocation)
+{
+	struct sk_buff *skb;
+	struct mc_data *mc_data = &recv_io->io.viport->mc_data;
+
+	DATA_FUNCTION("mc_data_alloc_skb\n");
+	if (initial_allocation)
+		skb = alloc_skb(len, GFP_KERNEL);
+	else
+		skb = alloc_skb(len, GFP_ATOMIC);
+	if (!skb) {
+		DATA_ERROR("failed to alloc MULTICAST skb\n");
+		return -1;
+	}
+	skb_put(skb, len);
+	recv_io->skb = skb;
+
+	recv_io->skb_data_dma = ib_dma_map_single(
+					recv_io->io.viport->config->ibdev,
+					skb->data, skb->len,
+					DMA_FROM_DEVICE);
+
+	if (ib_dma_mapping_error(recv_io->io.viport->config->ibdev,
+				 recv_io->skb_data_dma)) {
+		DATA_ERROR("skb data dma map error\n");
+		dev_kfree_skb(skb);
+		return -1;
+	}
+
+	recv_io->list[0].addr = recv_io->skb_data_dma;
+	recv_io->list[0].length = sizeof(struct ib_grh);
+	recv_io->list[0].lkey = mc_data->mr->lkey;
+
+	recv_io->list[1].addr = recv_io->skb_data_dma + sizeof(struct ib_grh);
+	recv_io->list[1].length = len - sizeof(struct ib_grh);
+	recv_io->list[1].lkey = mc_data->mr->lkey;
+
+	recv_io->io.rwr.wr_id = (u64)&recv_io->io;
+	recv_io->io.rwr.sg_list = recv_io->list;
+	recv_io->io.rwr.num_sge = 2;
+	recv_io->io.rwr.next = NULL;
+
+	return 0;
+}
+
+static int mc_data_alloc_buffers(struct mc_data *mc_data)
+{
+	unsigned int i, num;
+	struct ud_recv_io *bufs = NULL, *recv_io;
+
+	DATA_FUNCTION("mc_data_alloc_buffers\n");
+	if (!mc_data->skb_len) {
+		unsigned int len;
+		/* align multicast msg buffer on viport_trailer boundary */
+		len = (MCAST_MSG_SIZE + VIPORT_TRAILER_ALIGNMENT - 1) &
+		      (~((unsigned int)VIPORT_TRAILER_ALIGNMENT - 1));
+		/*
+		 * Add size of grh and trailer -
+		 * note, we don't need a + 4 for vlan because we have room in
+		 * netbuf for grh & trailer and we'll strip them both, so there
+		 * will be room enough to handle the 4 byte insertion for vlan.
+		 */
+		len += sizeof(struct ib_grh) +
+		       sizeof(struct viport_trailer);
+		mc_data->skb_len = len;
+		DATA_INFO("mc_data->skb_len %d (sizes:%d %d)\n",
+			  len, (int)sizeof(struct ib_grh),
+			  (int)sizeof(struct viport_trailer));
+	}
+	mc_data->recv_len = sizeof(struct ud_recv_io) * mc_data->num_recvs;
+	bufs = kmalloc(mc_data->recv_len, GFP_KERNEL);
+	if (!bufs) {
+		DATA_ERROR("failed to allocate MULTICAST buffers size:%d\n",
+			   mc_data->recv_len);
+		return -1;
+	}
+	DATA_INFO("allocated num_recvs:%d recv_len:%d \n",
+		  mc_data->num_recvs, mc_data->recv_len);
+	for (num = 0; num < mc_data->num_recvs; num++) {
+		recv_io = &bufs[num];
+		recv_io->len = mc_data->skb_len;
+		recv_io->io.type = RECV_UD;
+		recv_io->io.viport = mc_data->parent;
+		recv_io->io.routine = mc_data_recv_routine;
+
+		if (mc_data_alloc_skb(recv_io, mc_data->skb_len, 1)) {
+			for (i = 0; i < num; i++) {
+				recv_io = &bufs[i];
+				ib_dma_unmap_single(recv_io->io.viport->config->ibdev,
+						    recv_io->skb_data_dma,
+						    recv_io->skb->len,
+						    DMA_FROM_DEVICE);
+				dev_kfree_skb(recv_io->skb);
+			}
+			kfree(bufs);
+			return -1;
+		}
+		list_add_tail(&recv_io->io.list_ptrs,
+			      &mc_data->avail_recv_ios_list);
+	}
+	mc_data->recv_ios = bufs;
+	return 0;
+}
+
+void mc_data_cleanup(struct mc_data *mc_data)
+{
+	DATA_FUNCTION("mc_data_cleanup\n");
+	completion_callback_cleanup(&mc_data->ib_conn);
+	if (!IS_ERR(mc_data->ib_conn.qp)) {
+		ib_destroy_qp(mc_data->ib_conn.qp);
+		mc_data->ib_conn.qp = (struct ib_qp *)ERR_PTR(-EINVAL);
+	}
+	if (!IS_ERR(mc_data->ib_conn.cq)) {
+		ib_destroy_cq(mc_data->ib_conn.cq);
+		mc_data->ib_conn.cq = (struct ib_cq *)ERR_PTR(-EINVAL);
+	}
+	kfree(mc_data->recv_ios);
+	mc_data->recv_ios = (struct ud_recv_io *)NULL;
+	if (mc_data->mr) {
+		ib_dereg_mr(mc_data->mr);
+		mc_data->mr = (struct ib_mr *)NULL;
+	}
+	DATA_FUNCTION("mc_data_cleanup done\n");
+
+}
+
+int mc_data_init(struct mc_data *mc_data, struct viport *viport,
+		 struct data_config *config, struct ib_pd *pd)
+{
+	DATA_FUNCTION("mc_data_init()\n");
+
+	mc_data->num_recvs = viport->data.config->num_recvs;
+
+	INIT_LIST_HEAD(&mc_data->avail_recv_ios_list);
+	spin_lock_init(&mc_data->recv_lock);
+
+	mc_data->parent = viport;
+	mc_data->config = config;
+
+	mc_data->ib_conn.cm_id = NULL;
+	mc_data->ib_conn.viport = viport;
+	mc_data->ib_conn.ib_config = &config->ib_config;
+	mc_data->ib_conn.state = IB_CONN_UNINITTED;
+	mc_data->ib_conn.callback_thread = NULL;
+	mc_data->ib_conn.callback_thread_end = 0;
+
+	if (vnic_ib_mc_init(mc_data, viport, pd,
+			    &config->ib_config)) {
+		DATA_ERROR("vnic_ib_mc_init failed\n");
+		goto failure;
+	}
+	mc_data->mr = ib_get_dma_mr(pd,
+				    IB_ACCESS_LOCAL_WRITE |
+				    IB_ACCESS_REMOTE_WRITE);
+	if (IS_ERR(mc_data->mr)) {
+		DATA_ERROR("failed to register memory for"
+			   " mc_data connection\n");
+		goto destroy_conn;
+	}
+
+	if (mc_data_alloc_buffers(mc_data))
+		goto dereg_mr;
+
+	mc_data_post_recvs(mc_data);
+	if (vnic_ib_mc_mod_qp_to_rts(mc_data->ib_conn.qp))
+		goto dereg_mr;
+
+	return 0;
+
+dereg_mr:
+	ib_dereg_mr(mc_data->mr);
+	mc_data->mr = (struct ib_mr *)NULL;
+destroy_conn:
+	completion_callback_cleanup(&mc_data->ib_conn);
+	ib_destroy_qp(mc_data->ib_conn.qp);
+	mc_data->ib_conn.qp = (struct ib_qp *)ERR_PTR(-EINVAL);
+	ib_destroy_cq(mc_data->ib_conn.cq);
+	mc_data->ib_conn.cq = (struct ib_cq *)ERR_PTR(-EINVAL);
+failure:
+	return -1;
+}
+
+static void mc_data_post_recvs(struct mc_data *mc_data)
+{
+	unsigned long flags;
+	int i = 0;
+	DATA_FUNCTION("mc_data_post_recvs\n");
+	spin_lock_irqsave(&mc_data->recv_lock, flags);
+	while (!list_empty(&mc_data->avail_recv_ios_list)) {
+		struct io *io = list_entry(mc_data->avail_recv_ios_list.next,
+					   struct io, list_ptrs);
+		struct ud_recv_io *recv_io =
+			container_of(io, struct ud_recv_io, io);
+		list_del(&recv_io->io.list_ptrs);
+		spin_unlock_irqrestore(&mc_data->recv_lock, flags);
+		if (vnic_ib_mc_post_recv(mc_data, &recv_io->io)) {
+			viport_failure(mc_data->parent);
+			return;
+		}
+		spin_lock_irqsave(&mc_data->recv_lock, flags);
+		i++;
+	}
+	DATA_INFO("mcdata posted %d %p\n", i, &mc_data->avail_recv_ios_list);
+	spin_unlock_irqrestore(&mc_data->recv_lock, flags);
+}
+
+static void mc_data_recv_routine(struct io *io)
+{
+	struct sk_buff *skb;
+	struct ib_grh *grh;
+	struct viport_trailer *trailer;
+	struct mc_data *mc_data;
+	unsigned long flags;
+	struct ud_recv_io *recv_io = container_of(io, struct ud_recv_io, io);
+	union ib_gid_cpu sgid;
+
+	DATA_FUNCTION("mc_data_recv_routine\n");
+	skb = recv_io->skb;
+	grh = (struct ib_grh *)skb->data;
+	mc_data = &recv_io->io.viport->mc_data;
+
+	ib_dma_unmap_single(recv_io->io.viport->config->ibdev,
+			    recv_io->skb_data_dma, recv_io->skb->len,
+			    DMA_FROM_DEVICE);
+
+	/* first - check if we've got our own mc packet */
+	/* convert sgid from host to cpu form before comparing */
+	bswap_ib_gid(&grh->sgid, &sgid);
+	if (cpu_to_be64(sgid.global.interface_id) ==
+	    io->viport->config->path_info.path.sgid.global.interface_id) {
+		DATA_ERROR("dropping - our mc packet\n");
+		dev_kfree_skb(skb);
+	} else {
+		/* GRH is at head and trailer at end. Remove GRH from head. */
+		trailer = (struct viport_trailer *)
+			  (skb->data + recv_io->len -
+			   sizeof(struct viport_trailer));
+		skb_pull(skb, sizeof(struct ib_grh));
+		if (trailer->connection_hash_and_valid & CHV_VALID) {
+			mc_data_recv_to_skbuff(io->viport, skb, trailer);
+			vnic_recv_packet(io->viport->vnic, io->viport->parent,
+					 skb);
+			vnic_multicast_recv_pkt_stats(io->viport->vnic);
+		} else {
+			DATA_ERROR("dropping - no CHV_VALID in HashAndValid\n");
+			dev_kfree_skb(skb);
+		}
+	}
+	recv_io->skb = NULL;
+	if (mc_data_alloc_skb(recv_io, mc_data->skb_len, 0))
+		return;
+
+	spin_lock_irqsave(&mc_data->recv_lock, flags);
+	list_add_tail(&recv_io->io.list_ptrs, &mc_data->avail_recv_ios_list);
+	spin_unlock_irqrestore(&mc_data->recv_lock, flags);
+	mc_data_post_recvs(mc_data);
+	return;
+}
+
+static void mc_data_recv_to_skbuff(struct viport *viport, struct sk_buff *skb,
+				   struct viport_trailer *trailer)
+{
+	u8 rx_chksum_flags = trailer->rx_chksum_flags;
+
+	/* drop alignment bytes at start */
+	skb_pull(skb, trailer->data_alignment_offset);
+	/* drop excess from end */
+	skb_trim(skb, __be16_to_cpu(trailer->data_length));
+
+	if ((rx_chksum_flags & RX_CHKSUM_FLAGS_LOOPBACK)
+	    || ((rx_chksum_flags & RX_CHKSUM_FLAGS_IP_CHECKSUM_SUCCEEDED)
+		&& ((rx_chksum_flags & RX_CHKSUM_FLAGS_TCP_CHECKSUM_SUCCEEDED)
+		    || (rx_chksum_flags &
+			RX_CHKSUM_FLAGS_UDP_CHECKSUM_SUCCEEDED))))
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+	else
+		skb->ip_summed = CHECKSUM_NONE;
+
+	if ((trailer->pkt_flags & PF_VLAN_INSERT) &&
+	    !(viport->features_supported & VNIC_FEAT_IGNORE_VLAN)) {
+		u8 *rv;
+
+		/* insert VLAN id between source & length */
+		DATA_INFO("VLAN adjustment\n");
+		rv = skb_push(skb, 4);
+		memmove(rv, rv + 4, 12);
+		*(__be16 *) (rv + 12) = __constant_cpu_to_be16(ETH_P_8021Q);
+		if (trailer->pkt_flags & PF_PVID_OVERRIDDEN)
+			/*
+			 * Indicates VLAN is 0 but we keep the protocol id.
+			 */
+			*(__be16 *) (rv + 14) = trailer->vlan &
+				__constant_cpu_to_be16(0xF000);
+		else
+			*(__be16 *) (rv + 14) = trailer->vlan;
+		DATA_INFO("vlan:%x\n", *(int *)(rv+14));
+	}
+
+	return;
+}
diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_data.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_data.h
new file mode 100644
index 0000000..ad77aa9
--- /dev/null
+++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_data.h
@@ -0,0 +1,206 @@
+/*
+ * Copyright (c) 2006 QLogic, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *	- Redistributions of source code must retain the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer.
+ *
+ *	- Redistributions in binary form must reproduce the above
+ *	  copyright notice, this list of conditions and the following
+ *	  disclaimer in the documentation and/or other materials
+ *	  provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef VNIC_DATA_H_INCLUDED
+#define VNIC_DATA_H_INCLUDED
+
+#include
+
+#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS
+#include
+#endif	/* CONFIG_INFINIBAND_QLGC_VNIC_STATS */
+
+#include "vnic_ib.h"
+#include "vnic_control_pkt.h"
+#include "vnic_trailer.h"
+
+struct rdma_dest {
+	struct list_head list_ptrs;
+	struct sk_buff *skb;
+	u8 *data;
+	struct viport_trailer *trailer __attribute__((aligned(32)));
+};
+
+struct buff_pool_entry {
+	__be64 remote_addr;
+	__be32 rkey;
+	u32 valid;
+};
+
+struct recv_pool {
+	u32 buffer_sz;
+	u32 pool_sz;
+	u32 eioc_pool_sz;
+	u32 eioc_rdma_rkey;
+	u64 eioc_rdma_addr;
+	u32 next_full_buf;
+	u32 next_free_buf;
+	u32 num_free_bufs;
+	u32 num_posted_bufs;
+	u32 sz_free_bundle;
+	int kick_on_free;
+	struct buff_pool_entry *buf_pool;
+	dma_addr_t buf_pool_dma;
+	int buf_pool_len;
+	struct rdma_dest *recv_bufs;
+	struct list_head avail_recv_bufs;
+};
+
+struct xmit_pool {
+	u32 buffer_sz;
+	u32 pool_sz;
+	u32 notify_count;
+	u32 notify_bundle;
+	u32 next_xmit_buf;
+	u32 last_comp_buf;
+	u32 num_xmit_bufs;
+	u32 next_xmit_pool;
+	u32 kick_count;
+	u32 kick_byte_count;
+	u32 kick_bundle;
+	u32 kick_byte_bundle;
+	int need_buffers;
+	int send_kicks;
+	uint32_t rdma_rkey;
+	u64 rdma_addr;
+	struct buff_pool_entry *buf_pool;
+	dma_addr_t buf_pool_dma;
+	int buf_pool_len;
+	struct rdma_io *xmit_bufs;
+	u8 *xmit_data;
+	dma_addr_t xmitdata_dma;
+	int xmitdata_len;
+};
+
+struct data {
+	struct viport *parent;
+	struct data_config *config;
+	struct ib_mr *mr;
+	struct vnic_ib_conn ib_conn;
+	u8 *local_storage;
+	struct vnic_recv_pool_config host_pool_parms;
+	struct vnic_recv_pool_config eioc_pool_parms;
+	struct recv_pool recv_pool;
+	struct xmit_pool xmit_pool;
+	u8 *region_data;
+	dma_addr_t region_data_dma;
+	struct rdma_io free_bufs_io;
+	struct send_io kick_io;
+	struct list_head recv_ios;
+	spinlock_t recv_ios_lock;
+	spinlock_t xmit_buf_lock;
+	int kick_timer_on;
+	int connected;
+	u16 max_mtu;
+	struct timer_list kick_timer;
+	struct completion done;
+#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS
+	struct {
+		u32 xmit_num;
+		u32 recv_num;
+		u32 free_buf_sends;
+		u32 free_buf_num;
+		u32 free_buf_min;
+		u32 kick_recvs;
+		u32 kick_reqs;
+		u32 no_xmit_bufs;
+		cycles_t no_xmit_buf_time;
+	} statistics;
+#endif	/* CONFIG_INFINIBAND_QLGC_VNIC_STATS */
+};
+
+struct mc_data {
+	struct viport *parent;
+	struct data_config *config;
+	struct ib_mr *mr;
+	struct vnic_ib_conn ib_conn;
+
+	u32 num_recvs;
+	u32 skb_len;
+	spinlock_t recv_lock;
+	int recv_len;
+	struct ud_recv_io *recv_ios;
+	struct list_head avail_recv_ios_list;
+};
+
+int data_init(struct data *data, struct viport *viport,
+	      struct data_config *config, struct ib_pd *pd);
+
+int data_connect(struct data *data);
+void data_connected(struct data *data);
+void data_disconnect(struct data *data);
+
+int data_xmit_packet(struct data *data, struct sk_buff *skb);
+
+void data_cleanup(struct data *data);
+
+#define data_is_connected(data)		\
+	(vnic_ib_conn_connected(&((data)->ib_conn)))
+#define data_path_id(data)		(data)->config->path_id
+#define data_eioc_pool(data)		&(data)->eioc_pool_parms
+#define data_host_pool(data)		&(data)->host_pool_parms
+#define data_eioc_pool_min(data)	&(data)->config->eioc_min
+#define data_host_pool_min(data)	&(data)->config->host_min
+#define data_eioc_pool_max(data)	&(data)->config->eioc_max
+#define data_host_pool_max(data)	&(data)->config->host_max
+#define data_local_pool_addr(data)	(data)->xmit_pool.rdma_addr
+#define data_local_pool_rkey(data)	(data)->xmit_pool.rdma_rkey
+#define data_remote_pool_addr(data)	&(data)->recv_pool.eioc_rdma_addr
+#define data_remote_pool_rkey(data)	&(data)->recv_pool.eioc_rdma_rkey
+
+#define data_max_mtu(data)	(data)->max_mtu
+
+
+#define data_len(data, trailer)	be16_to_cpu(trailer->data_length)
+#define data_offset(data, trailer)	\
+	((data)->recv_pool.buffer_sz - sizeof(struct viport_trailer)	\
+	 - ALIGN(data_len((data), (trailer)), VIPORT_TRAILER_ALIGNMENT)	\
+	 +
(trailer->data_alignment_offset)) + +/* the following macros manipulate ring buffer indexes. + * the ring buffer size must be a power of 2. + */ +#define ADD(index, increment, size) (((index) + (increment))&((size) - 1)) +#define NEXT(index, size) ADD(index, 1, size) +#define INC(index, increment, size) (index) = ADD(index, increment, size) + +/* this is max multicast msg embedded will send */ +#define MCAST_MSG_SIZE \ + (2048 - sizeof(struct ib_grh) - sizeof(struct viport_trailer)) + +int mc_data_init(struct mc_data *mc_data, struct viport *viport, + struct data_config *config, + struct ib_pd *pd); + +void mc_data_cleanup(struct mc_data *mc_data); + +#endif /* VNIC_DATA_H_INCLUDED */ diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h new file mode 100644 index 0000000..dd8a073 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_trailer.h @@ -0,0 +1,103 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_TRAILER_H_INCLUDED +#define VNIC_TRAILER_H_INCLUDED + +/* pkt_flags values */ +enum { + PF_CHASH_VALID = 0x01, + PF_IPSEC_VALID = 0x02, + PF_TCP_SEGMENT = 0x04, + PF_KICK = 0x08, + PF_VLAN_INSERT = 0x10, + PF_PVID_OVERRIDDEN = 0x20, + PF_FCS_INCLUDED = 0x40, + PF_FORCE_ROUTE = 0x80 +}; + +/* tx_chksum_flags values */ +enum { + TX_CHKSUM_FLAGS_CHECKSUM_V4 = 0x01, + TX_CHKSUM_FLAGS_CHECKSUM_V6 = 0x02, + TX_CHKSUM_FLAGS_TCP_CHECKSUM = 0x04, + TX_CHKSUM_FLAGS_UDP_CHECKSUM = 0x08, + TX_CHKSUM_FLAGS_IP_CHECKSUM = 0x10 +}; + +/* rx_chksum_flags values */ +enum { + RX_CHKSUM_FLAGS_TCP_CHECKSUM_FAILED = 0x01, + RX_CHKSUM_FLAGS_UDP_CHECKSUM_FAILED = 0x02, + RX_CHKSUM_FLAGS_IP_CHECKSUM_FAILED = 0x04, + RX_CHKSUM_FLAGS_TCP_CHECKSUM_SUCCEEDED = 0x08, + RX_CHKSUM_FLAGS_UDP_CHECKSUM_SUCCEEDED = 0x10, + RX_CHKSUM_FLAGS_IP_CHECKSUM_SUCCEEDED = 0x20, + RX_CHKSUM_FLAGS_LOOPBACK = 0x40, + RX_CHKSUM_FLAGS_RESERVED = 0x80 +}; + +/* connection_hash_and_valid values */ +enum { + CHV_VALID = 0x80, + CHV_HASH_MASH = 0x7f +}; + +struct viport_trailer { + s8 data_alignment_offset; + u8 rndis_header_length; /* reserved for use by edp */ + __be16 data_length; + u8 pkt_flags; + u8 tx_chksum_flags; + u8 rx_chksum_flags; + u8 ip_sec_flags; + u32 tcp_seq_no; + u32 ip_sec_offload_handle; + u32 ip_sec_next_offload_handle; + u8 dest_mac_addr[6]; + __be16 vlan; + u16 time_stamp; + u8 origin; + u8 connection_hash_and_valid; +}; + +#define VIPORT_TRAILER_ALIGNMENT 32 + +#define BUFFER_SIZE(len) \ 
+ (sizeof(struct viport_trailer) + \ + ALIGN((len), VIPORT_TRAILER_ALIGNMENT)) + +#define MAX_PAYLOAD(len) \ + ALIGN_DOWN((len) - sizeof(struct viport_trailer), \ + VIPORT_TRAILER_ALIGNMENT) + +#endif /* VNIC_TRAILER_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:19:25 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:49:25 +0530 Subject: [ofa-general] [PATCH 07/13] QLogic VNIC: Handling configurable parameters of the driver In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171925.31725.22023.stgit@localhost.localdomain> From: Poornima Kamath This patch adds the files that handle the various configurable parameters of the VNIC driver: configuration of the virtual NIC, of the control and data connections to the EVIC, and of the general IB connection parameters. Signed-off-by: Ramachandra K Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_config.c | 380 ++++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_config.h | 242 +++++++++++++++ 2 files changed, 622 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_config.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_config.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_config.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_config.c new file mode 100644 index 0000000..86d99b6 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_config.c @@ -0,0 +1,380 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses.
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include +#include +#include + +#include + +#include "vnic_util.h" +#include "vnic_config.h" +#include "vnic_trailer.h" +#include "vnic_main.h" + +u16 vnic_max_mtu = MAX_MTU; + +static u32 default_no_path_timeout = DEFAULT_NO_PATH_TIMEOUT; +static u32 sa_path_rec_get_timeout = SA_PATH_REC_GET_TIMEOUT; +static u32 default_primary_reconnect_timeout = + DEFAULT_PRIMARY_RECONNECT_TIMEOUT; +static u32 default_primary_switch_timeout = DEFAULT_PRIMARY_SWITCH_TIMEOUT; +static int default_prefer_primary = DEFAULT_PREFER_PRIMARY; + +static int use_rx_csum = VNIC_USE_RX_CSUM; +static int use_tx_csum = VNIC_USE_TX_CSUM; + +static u32 control_response_timeout = CONTROL_RSP_TIMEOUT; +static u32 completion_limit = DEFAULT_COMPLETION_LIMIT; + +module_param(vnic_max_mtu, ushort, 0444); +MODULE_PARM_DESC(vnic_max_mtu, "Maximum MTU size (1500-9500). Default is 9500"); + +module_param(default_prefer_primary, bool, 0444); +MODULE_PARM_DESC(default_prefer_primary, "Determines if primary path is" + " preferred (1) or not (0). Defaults to 0"); +module_param(use_rx_csum, bool, 0444); +MODULE_PARM_DESC(use_rx_csum, "Determines if RX checksum is done on VEx (1)" + " or not (0). Defaults to 1"); +module_param(use_tx_csum, bool, 0444); +MODULE_PARM_DESC(use_tx_csum, "Determines if TX checksum is done on VEx (1)" + " or not (0). 
Defaults to 1"); +module_param(default_no_path_timeout, uint, 0444); +MODULE_PARM_DESC(default_no_path_timeout, "Time to wait in milliseconds" + " before reconnecting to VEx after connection loss"); +module_param(default_primary_reconnect_timeout, uint, 0444); +MODULE_PARM_DESC(default_primary_reconnect_timeout, "Time to wait in" + " milliseconds before reconnecting the" + " primary path to VEx"); +module_param(default_primary_switch_timeout, uint, 0444); +MODULE_PARM_DESC(default_primary_switch_timeout, "Time to wait before" + " switching back to primary path if" + " primary path is preferred"); +module_param(sa_path_rec_get_timeout, uint, 0444); +MODULE_PARM_DESC(sa_path_rec_get_timeout, "Time out value in milliseconds" + " for SA path record get queries"); + +module_param(control_response_timeout, uint, 0444); +MODULE_PARM_DESC(control_response_timeout, "Time out value in milliseconds" + " to wait for response to control requests"); + +module_param(completion_limit, uint, 0444); +MODULE_PARM_DESC(completion_limit, "Maximum completions to process" + " in a single completion callback invocation. 
Default is 100" + " Minimum value is 10"); + +static void config_control_defaults(struct control_config *control_config, + struct path_param *params) +{ + int len; + char *dot; + u64 sid; + + sid = (SST_AGN << 56) | (SST_OUI << 32) | (CONTROL_PATH_ID << 8) + | IOC_NUMBER(be64_to_cpu(params->ioc_guid)); + + control_config->ib_config.service_id = cpu_to_be64(sid); + control_config->ib_config.conn_data.path_id = 0; + control_config->ib_config.conn_data.vnic_instance = params->instance; + control_config->ib_config.conn_data.path_num = 0; + control_config->ib_config.conn_data.features_supported = + __constant_cpu_to_be32((u32) (VNIC_FEAT_IGNORE_VLAN | + VNIC_FEAT_RDMA_IMMED)); + dot = strchr(init_utsname()->nodename, '.'); + + if (dot) + len = dot - init_utsname()->nodename; + else + len = strlen(init_utsname()->nodename); + + if (len > VNIC_MAX_NODENAME_LEN) + len = VNIC_MAX_NODENAME_LEN; + + memcpy(control_config->ib_config.conn_data.nodename, + init_utsname()->nodename, len); + + if (params->ib_multicast == 1) + control_config->ib_multicast = 1; + else if (params->ib_multicast == 0) + control_config->ib_multicast = 0; + else { + /* parameter is not set - enable it by default */ + control_config->ib_multicast = 1; + CONFIG_ERROR("IOCGUID=%llx INSTANCE=%d IB_MULTICAST defaulted" + " to TRUE\n", params->ioc_guid, + (char)params->instance); + } + + if (control_config->ib_multicast) + control_config->ib_config.conn_data.features_supported |= + __constant_cpu_to_be32(VNIC_FEAT_INBOUND_IB_MC); + + control_config->ib_config.retry_count = RETRY_COUNT; + control_config->ib_config.rnr_retry_count = RETRY_COUNT; + control_config->ib_config.min_rnr_timer = MIN_RNR_TIMER; + + /* These values are not configurable*/ + control_config->ib_config.num_recvs = 5; + control_config->ib_config.num_sends = 1; + control_config->ib_config.recv_scatter = 1; + control_config->ib_config.send_gather = 1; + control_config->ib_config.completion_limit = completion_limit; + + control_config->num_recvs 
= control_config->ib_config.num_recvs; + + control_config->vnic_instance = params->instance; + control_config->max_address_entries = MAX_ADDRESS_ENTRIES; + control_config->min_address_entries = MIN_ADDRESS_ENTRIES; + control_config->rsp_timeout = msecs_to_jiffies(control_response_timeout); +} + +static void config_data_defaults(struct data_config *data_config, + struct path_param *params) +{ + u64 sid; + + sid = (SST_AGN << 56) | (SST_OUI << 32) | (DATA_PATH_ID << 8) + | IOC_NUMBER(be64_to_cpu(params->ioc_guid)); + + data_config->ib_config.service_id = cpu_to_be64(sid); + data_config->ib_config.conn_data.path_id = jiffies; /* random */ + data_config->ib_config.conn_data.vnic_instance = params->instance; + data_config->ib_config.conn_data.path_num = 0; + + data_config->ib_config.retry_count = RETRY_COUNT; + data_config->ib_config.rnr_retry_count = RETRY_COUNT; + data_config->ib_config.min_rnr_timer = MIN_RNR_TIMER; + + /* + * NOTE: the num_recvs size assumes that the EIOC could + * RDMA enough packets to fill all of the host recv + * pool entries, plus send a kick message after each + * packet, plus RDMA new buffers for the size of + * the EIOC recv buffer pool, plus send kick messages + * after each min_host_update_sz of new buffers all + * before the host can even pull off the first completed + * receive off the completion queue, and repost the + * receive. NOT LIKELY! 
+ */ + data_config->ib_config.num_recvs = HOST_RECV_POOL_ENTRIES + + (MAX_EIOC_POOL_SZ / MIN_HOST_UPDATE_SZ); + + data_config->ib_config.num_sends = (2 * NOTIFY_BUNDLE_SZ) + + (HOST_RECV_POOL_ENTRIES / MIN_EIOC_UPDATE_SZ) + 1; + + data_config->ib_config.recv_scatter = 1; /* not configurable */ + data_config->ib_config.send_gather = 2; /* not configurable */ + data_config->ib_config.completion_limit = completion_limit; + + data_config->num_recvs = data_config->ib_config.num_recvs; + data_config->path_id = data_config->ib_config.conn_data.path_id; + + + data_config->host_recv_pool_entries = HOST_RECV_POOL_ENTRIES; + + data_config->host_min.size_recv_pool_entry = + cpu_to_be32(BUFFER_SIZE(VLAN_ETH_HLEN + MIN_MTU)); + data_config->host_max.size_recv_pool_entry = + cpu_to_be32(BUFFER_SIZE(VLAN_ETH_HLEN + vnic_max_mtu)); + data_config->eioc_min.size_recv_pool_entry = + cpu_to_be32(BUFFER_SIZE(VLAN_ETH_HLEN + MIN_MTU)); + data_config->eioc_max.size_recv_pool_entry = + __constant_cpu_to_be32(MAX_PARAM_VALUE); + + data_config->host_min.num_recv_pool_entries = + __constant_cpu_to_be32(MIN_HOST_POOL_SZ); + data_config->host_max.num_recv_pool_entries = + __constant_cpu_to_be32(MAX_PARAM_VALUE); + data_config->eioc_min.num_recv_pool_entries = + __constant_cpu_to_be32(MIN_EIOC_POOL_SZ); + data_config->eioc_max.num_recv_pool_entries = + __constant_cpu_to_be32(MAX_EIOC_POOL_SZ); + + data_config->host_min.timeout_before_kick = + __constant_cpu_to_be32(MIN_HOST_KICK_TIMEOUT); + data_config->host_max.timeout_before_kick = + __constant_cpu_to_be32(MAX_HOST_KICK_TIMEOUT); + data_config->eioc_min.timeout_before_kick = 0; + data_config->eioc_max.timeout_before_kick = + __constant_cpu_to_be32(MAX_PARAM_VALUE); + + data_config->host_min.num_recv_pool_entries_before_kick = + __constant_cpu_to_be32(MIN_HOST_KICK_ENTRIES); + data_config->host_max.num_recv_pool_entries_before_kick = + __constant_cpu_to_be32(MAX_HOST_KICK_ENTRIES); + data_config->eioc_min.num_recv_pool_entries_before_kick = 0; 
+ data_config->eioc_max.num_recv_pool_entries_before_kick = + __constant_cpu_to_be32(MAX_PARAM_VALUE); + + data_config->host_min.num_recv_pool_bytes_before_kick = + __constant_cpu_to_be32(MIN_HOST_KICK_BYTES); + data_config->host_max.num_recv_pool_bytes_before_kick = + __constant_cpu_to_be32(MAX_HOST_KICK_BYTES); + data_config->eioc_min.num_recv_pool_bytes_before_kick = 0; + data_config->eioc_max.num_recv_pool_bytes_before_kick = + __constant_cpu_to_be32(MAX_PARAM_VALUE); + + data_config->host_min.free_recv_pool_entries_per_update = + __constant_cpu_to_be32(MIN_HOST_UPDATE_SZ); + data_config->host_max.free_recv_pool_entries_per_update = + __constant_cpu_to_be32(MAX_HOST_UPDATE_SZ); + data_config->eioc_min.free_recv_pool_entries_per_update = + __constant_cpu_to_be32(MIN_EIOC_UPDATE_SZ); + data_config->eioc_max.free_recv_pool_entries_per_update = + __constant_cpu_to_be32(MAX_EIOC_UPDATE_SZ); + + data_config->notify_bundle = NOTIFY_BUNDLE_SZ; +} + +static void config_path_info_defaults(struct viport_config *config, + struct path_param *params) +{ + int i; + ib_get_cached_gid(config->ibdev, config->port, 0, + &config->path_info.path.sgid); + for (i = 0; i < 16; i++) + config->path_info.path.dgid.raw[i] = params->dgid[i]; + + config->path_info.path.pkey = params->pkey; + config->path_info.path.numb_path = 1; + config->sa_path_rec_get_timeout = sa_path_rec_get_timeout; + +} + +static void config_viport_defaults(struct viport_config *config, + struct path_param *params) +{ + config->ibdev = params->ibdev; + config->port = params->port; + config->ioc_guid = params->ioc_guid; + config->stats_interval = msecs_to_jiffies(VIPORT_STATS_INTERVAL); + config->hb_interval = msecs_to_jiffies(VIPORT_HEARTBEAT_INTERVAL); + config->hb_timeout = VIPORT_HEARTBEAT_TIMEOUT * 1000; + /*hb_timeout needs to be in usec*/ + strcpy(config->ioc_string, params->ioc_string); + config_path_info_defaults(config, params); + + config_control_defaults(&config->control_config, params); + 
config_data_defaults(&config->data_config, params); +} + +static void config_vnic_defaults(struct vnic_config *config) +{ + config->no_path_timeout = msecs_to_jiffies(default_no_path_timeout); + config->primary_connect_timeout = + msecs_to_jiffies(DEFAULT_PRIMARY_CONNECT_TIMEOUT); + config->primary_reconnect_timeout = + msecs_to_jiffies(default_primary_reconnect_timeout); + config->primary_switch_timeout = + msecs_to_jiffies(default_primary_switch_timeout); + config->prefer_primary = default_prefer_primary; + config->use_rx_csum = use_rx_csum; + config->use_tx_csum = use_tx_csum; +} + +struct viport_config *config_alloc_viport(struct path_param *params) +{ + struct viport_config *config; + + config = kzalloc(sizeof *config, GFP_KERNEL); + if (!config) { + CONFIG_ERROR("could not allocate memory for" + " struct viport_config\n"); + return NULL; + } + + config_viport_defaults(config, params); + + return config; +} + +struct vnic_config *config_alloc_vnic(void) +{ + struct vnic_config *config; + + config = kzalloc(sizeof *config, GFP_KERNEL); + if (!config) { + CONFIG_ERROR("couldn't allocate memory for" + " struct vnic_config\n"); + + return NULL; + } + + config_vnic_defaults(config); + return config; +} + +char *config_viport_name(struct viport_config *config) +{ + /* function only called by one thread, can return a static string */ + static char str[64]; + + sprintf(str, "GUID %llx instance %d", + be64_to_cpu(config->ioc_guid), + config->control_config.vnic_instance); + return str; +} + +int config_start(void) +{ + vnic_max_mtu = min_t(u16, vnic_max_mtu, MAX_MTU); + vnic_max_mtu = max_t(u16, vnic_max_mtu, MIN_MTU); + + sa_path_rec_get_timeout = min_t(u32, sa_path_rec_get_timeout, + MAX_SA_TIMEOUT); + sa_path_rec_get_timeout = max_t(u32, sa_path_rec_get_timeout, + MIN_SA_TIMEOUT); + + control_response_timeout = min_t(u32, control_response_timeout, + MAX_CONTROL_RSP_TIMEOUT); + + control_response_timeout = max_t(u32, control_response_timeout, + 
MIN_CONTROL_RSP_TIMEOUT); + + completion_limit = max_t(u32, completion_limit, + MIN_COMPLETION_LIMIT); + + if (!default_no_path_timeout) + default_no_path_timeout = DEFAULT_NO_PATH_TIMEOUT; + + if (!default_primary_reconnect_timeout) + default_primary_reconnect_timeout = + DEFAULT_PRIMARY_RECONNECT_TIMEOUT; + + if (!default_primary_switch_timeout) + default_primary_switch_timeout = + DEFAULT_PRIMARY_SWITCH_TIMEOUT; + + return 0; + +} diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_config.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_config.h new file mode 100644 index 0000000..c5b00b9 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_config.h @@ -0,0 +1,242 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_CONFIG_H_INCLUDED +#define VNIC_CONFIG_H_INCLUDED + +#include +#include +#include + +#include "vnic_control.h" +#include "vnic_ib.h" + +#define SST_AGN 0x10ULL +#define SST_OUI 0x00066AULL + +enum { + CONTROL_PATH_ID = 0x0, + DATA_PATH_ID = 0x1 +}; + +#define IOC_NUMBER(GUID) (((GUID) >> 32) & 0xFF) + +enum { + VNIC_CLASS_SUBCLASS = 0x2000066A, + VNIC_PROTOCOL = 0, + VNIC_PROT_VERSION = 1 +}; + +enum { + MIN_MTU = 1500, /* minimum negotiated MTU size */ + MAX_MTU = 9500 /* jumbo frame */ +}; + +/* + * TODO: tune the pool parameter values + */ +enum { + MIN_ADDRESS_ENTRIES = 16, + MAX_ADDRESS_ENTRIES = 64 +}; + +enum { + HOST_RECV_POOL_ENTRIES = 512, + MIN_HOST_POOL_SZ = 64, + MIN_EIOC_POOL_SZ = 64, + MAX_EIOC_POOL_SZ = 256, + MIN_HOST_UPDATE_SZ = 8, + MAX_HOST_UPDATE_SZ = 32, + MIN_EIOC_UPDATE_SZ = 8, + MAX_EIOC_UPDATE_SZ = 32, + NOTIFY_BUNDLE_SZ = 32 +}; + +enum { + MIN_HOST_KICK_TIMEOUT = 10, /* in usec */ + MAX_HOST_KICK_TIMEOUT = 100 /* in usec */ +}; + +enum { + MIN_HOST_KICK_ENTRIES = 1, + MAX_HOST_KICK_ENTRIES = 128 +}; + +enum { + MIN_HOST_KICK_BYTES = 0, + MAX_HOST_KICK_BYTES = 5000 +}; + +enum { + DEFAULT_NO_PATH_TIMEOUT = 10000, + DEFAULT_PRIMARY_CONNECT_TIMEOUT = 10000, + DEFAULT_PRIMARY_RECONNECT_TIMEOUT = 10000, + DEFAULT_PRIMARY_SWITCH_TIMEOUT = 10000 +}; + +enum { + VIPORT_STATS_INTERVAL = 500, /* .5 sec */ + VIPORT_HEARTBEAT_INTERVAL = 1000, /* 1 second */ + VIPORT_HEARTBEAT_TIMEOUT = 64000 /* 64 sec */ +}; + +enum { + /* 5 sec increased for EVIC support for large number of + * host connections + */ + CONTROL_RSP_TIMEOUT = 5000, + MIN_CONTROL_RSP_TIMEOUT = 1000, /* 1 sec */ + MAX_CONTROL_RSP_TIMEOUT = 60000 /* 60 sec */ +}; + +/* Maximum number of 
completions to be processed + * during a single completion callback invocation + */ +enum { + DEFAULT_COMPLETION_LIMIT = 100, + MIN_COMPLETION_LIMIT = 10 +}; + +/* infiniband connection parameters */ +enum { + RETRY_COUNT = 3, + MIN_RNR_TIMER = 22, /* 20 ms */ + DEFAULT_PKEY = 0 /* pkey table index */ +}; + +enum { + SA_PATH_REC_GET_TIMEOUT = 1000, /* 1000 ms */ + MIN_SA_TIMEOUT = 100, /* 100 ms */ + MAX_SA_TIMEOUT = 20000 /* 20s */ +}; + +#define MAX_PARAM_VALUE 0x40000000 +#define VNIC_USE_RX_CSUM 1 +#define VNIC_USE_TX_CSUM 1 +#define DEFAULT_PREFER_PRIMARY 0 + +/* As per IBTA specification, IOCString Maximum length can be 512 bits. */ +#define MAX_IOC_STRING_LEN (512/8) + +struct path_param { + __be64 ioc_guid; + u8 ioc_string[MAX_IOC_STRING_LEN+1]; + u8 port; + u8 instance; + struct ib_device *ibdev; + struct vnic_ib_port *ibport; + char name[IFNAMSIZ]; + u8 dgid[16]; + __be16 pkey; + int rx_csum; + int tx_csum; + int heartbeat; + int ib_multicast; +}; + +struct vnic_ib_config { + __be64 service_id; + struct vnic_connection_data conn_data; + u32 retry_count; + u32 rnr_retry_count; + u8 min_rnr_timer; + u32 num_sends; + u32 num_recvs; + u32 recv_scatter; /* 1 */ + u32 send_gather; /* 1 or 2 */ + u32 completion_limit; +}; + +struct control_config { + struct vnic_ib_config ib_config; + u32 num_recvs; + u8 vnic_instance; + u16 max_address_entries; + u16 min_address_entries; + u32 rsp_timeout; + u32 ib_multicast; +}; + +struct data_config { + struct vnic_ib_config ib_config; + u64 path_id; + u32 num_recvs; + u32 host_recv_pool_entries; + struct vnic_recv_pool_config host_min; + struct vnic_recv_pool_config host_max; + struct vnic_recv_pool_config eioc_min; + struct vnic_recv_pool_config eioc_max; + u32 notify_bundle; +}; + +struct viport_config { + struct viport *viport; + struct control_config control_config; + struct data_config data_config; + struct vnic_ib_path_info path_info; + u32 sa_path_rec_get_timeout; + struct ib_device *ibdev; + u32 port; + u32 
stats_interval; + u32 hb_interval; + u32 hb_timeout; + __be64 ioc_guid; + u8 ioc_string[MAX_IOC_STRING_LEN+1]; + size_t path_idx; +}; + +/* + * primary_connect_timeout - if the secondary connects first, + * how long do we give the primary? + * primary_reconnect_timeout - same as above, but used when recovering + * from the case where both paths fail + * primary_switch_timeout - how long do we wait before switching to the + * primary when it comes back? + */ +struct vnic_config { + struct vnic *vnic; + char name[IFNAMSIZ]; + u32 no_path_timeout; + u32 primary_connect_timeout; + u32 primary_reconnect_timeout; + u32 primary_switch_timeout; + int prefer_primary; + int use_rx_csum; + int use_tx_csum; +}; + +int config_start(void); +struct viport_config *config_alloc_viport(struct path_param *params); +struct vnic_config *config_alloc_vnic(void); +char *config_viport_name(struct viport_config *config); + +#endif /* VNIC_CONFIG_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:18:55 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:48:55 +0530 Subject: [ofa-general] [PATCH 06/13] QLogic VNIC: IB core stack interaction In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171855.31725.89658.stgit@localhost.localdomain> From: Ramachandra K This patch implements the interaction of the QLogic VNIC driver with the underlying core InfiniBand stack.
Signed-off-by: Poornima Kamath Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c | 1046 ++++++++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h | 206 ++++++ 2 files changed, 1252 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c new file mode 100644 index 0000000..3bf6455 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.c @@ -0,0 +1,1046 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#include +#include +#include +#include +#include + +#include "vnic_util.h" +#include "vnic_data.h" +#include "vnic_config.h" +#include "vnic_ib.h" +#include "vnic_viport.h" +#include "vnic_sys.h" +#include "vnic_main.h" +#include "vnic_stats.h" + +static int vnic_ib_inited; +static void vnic_add_one(struct ib_device *device); +static void vnic_remove_one(struct ib_device *device); +static int vnic_defer_completion(void *ptr); + +static int vnic_ib_mc_init_qp(struct mc_data *mc_data, + struct vnic_ib_config *config, + struct ib_pd *pd, + struct viport_config *viport_config); + +static struct ib_client vnic_client = { + .name = "vnic", + .add = vnic_add_one, + .remove = vnic_remove_one +}; + +struct ib_sa_client vnic_sa_client; + +int vnic_ib_init(void) +{ + int ret = -1; + + IB_FUNCTION("vnic_ib_init()\n"); + + /* class has to be registered before + * calling ib_register_client() because, that call + * will trigger vnic_add_port() which will register + * class_device for the port with the parent class + * as vnic_class + */ + ret = class_register(&vnic_class); + if (ret) { + printk(KERN_ERR PFX "couldn't register class" + " infiniband_qlgc_vnic; error %d", ret); + goto out; + } + + ib_sa_register_client(&vnic_sa_client); + ret = ib_register_client(&vnic_client); + if (ret) { + printk(KERN_ERR PFX "couldn't register IB client;" + " error %d", ret); + goto err_ib_reg; + } + + interface_dev.dev.class = &vnic_class; + interface_dev.dev.release = vnic_release_dev; + snprintf(interface_dev.dev.bus_id, + BUS_ID_SIZE, "interfaces"); + init_completion(&interface_dev.released); + ret = device_register(&interface_dev.dev); + if (ret) { + printk(KERN_ERR PFX "couldn't register class interfaces;" + " error %d", ret); + goto err_class_dev; + } + ret = device_create_file(&interface_dev.dev, + &dev_attr_delete_vnic); + if (ret) { + printk(KERN_ERR PFX "couldn't create class file" + " 'delete_vnic'; error %d", ret); + goto err_class_file; + } + + vnic_ib_inited = 1; + + 
return ret; +err_class_file: + device_unregister(&interface_dev.dev); +err_class_dev: + ib_unregister_client(&vnic_client); +err_ib_reg: + ib_sa_unregister_client(&vnic_sa_client); + class_unregister(&vnic_class); +out: + return ret; +} + +static struct vnic_ib_port *vnic_add_port(struct vnic_ib_device *device, + u8 port_num) +{ + struct vnic_ib_port *port; + + port = kzalloc(sizeof *port, GFP_KERNEL); + if (!port) + return NULL; + + init_completion(&port->pdev_info.released); + port->dev = device; + port->port_num = port_num; + + port->pdev_info.dev.class = &vnic_class; + port->pdev_info.dev.parent = NULL; + port->pdev_info.dev.release = vnic_release_dev; + snprintf(port->pdev_info.dev.bus_id, BUS_ID_SIZE, + "vnic-%s-%d", device->dev->name, port_num); + + if (device_register(&port->pdev_info.dev)) + goto free_port; + + if (device_create_file(&port->pdev_info.dev, + &dev_attr_create_primary)) + goto err_class; + if (device_create_file(&port->pdev_info.dev, + &dev_attr_create_secondary)) + goto err_class; + + return port; +err_class: + device_unregister(&port->pdev_info.dev); +free_port: + kfree(port); + + return NULL; +} + +static void vnic_add_one(struct ib_device *device) +{ + struct vnic_ib_device *vnic_dev; + struct vnic_ib_port *port; + int s, e, p; + + vnic_dev = kmalloc(sizeof *vnic_dev, GFP_KERNEL); + if (!vnic_dev) + return; + + vnic_dev->dev = device; + INIT_LIST_HEAD(&vnic_dev->port_list); + + if (device->node_type == RDMA_NODE_IB_SWITCH) { + s = 0; + e = 0; + + } else { + s = 1; + e = device->phys_port_cnt; + + } + + for (p = s; p <= e; p++) { + port = vnic_add_port(vnic_dev, p); + if (port) + list_add_tail(&port->list, &vnic_dev->port_list); + } + + ib_set_client_data(device, &vnic_client, vnic_dev); + +} + +static void vnic_remove_one(struct ib_device *device) +{ + struct vnic_ib_device *vnic_dev; + struct vnic_ib_port *port, *tmp_port; + + vnic_dev = ib_get_client_data(device, &vnic_client); + list_for_each_entry_safe(port, tmp_port, + 
&vnic_dev->port_list, list) { + device_unregister(&port->pdev_info.dev); + /* + * wait for sysfs entries to go away, so that no new vnics + * are created + */ + wait_for_completion(&port->pdev_info.released); + kfree(port); + + } + kfree(vnic_dev); + + /* TODO Only those vnic interfaces associated with + * the HCA whose remove event is called should be freed + * Currently all the vnic interfaces are freed + */ + + while (!list_empty(&vnic_list)) { + struct vnic *vnic = + list_entry(vnic_list.next, struct vnic, list_ptrs); + vnic_free(vnic); + } + + vnic_npevent_cleanup(); + viport_cleanup(); + +} + +void vnic_ib_cleanup(void) +{ + IB_FUNCTION("vnic_ib_cleanup()\n"); + + if (!vnic_ib_inited) + return; + + device_unregister(&interface_dev.dev); + wait_for_completion(&interface_dev.released); + + ib_unregister_client(&vnic_client); + ib_sa_unregister_client(&vnic_sa_client); + class_unregister(&vnic_class); +} + +static void vnic_path_rec_completion(int status, + struct ib_sa_path_rec *pathrec, + void *context) +{ + struct vnic_ib_path_info *p = context; + p->status = status; + if (!status) + p->path = *pathrec; + + complete(&p->done); +} + +int vnic_ib_get_path(struct netpath *netpath, struct vnic *vnic) +{ + struct viport_config *config = netpath->viport->config; + int ret = 0; + + init_completion(&config->path_info.done); + IB_INFO("Using SA path rec get time out value of %d\n", + config->sa_path_rec_get_timeout); + config->path_info.path_query_id = + ib_sa_path_rec_get(&vnic_sa_client, + config->ibdev, + config->port, + &config->path_info.path, + IB_SA_PATH_REC_DGID | + IB_SA_PATH_REC_SGID | + IB_SA_PATH_REC_NUMB_PATH | + IB_SA_PATH_REC_PKEY, + config->sa_path_rec_get_timeout, + GFP_KERNEL, + vnic_path_rec_completion, + &config->path_info, + &config->path_info.path_query); + + if (config->path_info.path_query_id < 0) { + IB_ERROR("SA path record query failed; error %d\n", + config->path_info.path_query_id); + ret = config->path_info.path_query_id; + goto out; + } 
+ + wait_for_completion(&config->path_info.done); + + if (config->path_info.status < 0) { + printk(KERN_WARNING PFX "connection not available to dgid " + "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x", + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[0]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[2]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[4]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[6]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[8]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[10]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[12]), + (int)be16_to_cpu(*(__be16 *) &config->path_info.path. + dgid.raw[14])); + + if (config->path_info.status == -ETIMEDOUT) + printk(KERN_INFO " path query timed out\n"); + else if (config->path_info.status == -EIO) + printk(KERN_INFO " path query sending error\n"); + else + printk(KERN_INFO " error %d\n", + config->path_info.status); + + ret = config->path_info.status; + } +out: + if (ret) + netpath_timer(netpath, vnic->config->no_path_timeout); + + return ret; +} + +static inline void vnic_ib_handle_completions(struct ib_wc *wc, + struct vnic_ib_conn *ib_conn, + u32 *comp_num, + cycles_t *comp_time) +{ + struct io *io; + + io = (struct io *)(wc->wr_id); + vnic_ib_comp_stats(ib_conn, comp_num); + if (wc->status) { + IB_INFO("completion error wc.status %d" + " wc.opcode %d vendor err 0x%x\n", + wc->status, wc->opcode, wc->vendor_err); + } else if (io) { + vnic_ib_io_stats(io, ib_conn, *comp_time); + if (io->type == RECV_UD) { + struct ud_recv_io *recv_io = + container_of(io, struct ud_recv_io, io); + recv_io->len = wc->byte_len; + } + if (io->routine) + (*io->routine) (io); + } +} + +static void ib_qp_event(struct ib_event *event, void *context) +{ + IB_ERROR("QP event %d\n", event->event); +} + +static void vnic_ib_completion(struct ib_cq *cq, void *ptr) +{ + struct 
vnic_ib_conn *ib_conn = ptr; + unsigned long flags; + int compl_received; + struct ib_wc wc; + cycles_t comp_time; + u32 comp_num = 0; + + /* for multicast, cm_id is NULL, so skip that test */ + if (ib_conn->cm_id && + (ib_conn->state != IB_CONN_CONNECTED)) + return; + + /* Check if completion processing is taking place in thread + * If not then process completions in this handler, + * else set compl_received if not set, to indicate that + * there are more completions to process in thread. + */ + + spin_lock_irqsave(&ib_conn->compl_received_lock, flags); + compl_received = ib_conn->compl_received; + spin_unlock_irqrestore(&ib_conn->compl_received_lock, flags); + + if (ib_conn->in_thread || compl_received) { + if (!compl_received) { + spin_lock_irqsave(&ib_conn->compl_received_lock, flags); + ib_conn->compl_received = 1; + spin_unlock_irqrestore(&ib_conn->compl_received_lock, + flags); + } + wake_up(&(ib_conn->callback_wait_queue)); + } else { + vnic_ib_note_comptime_stats(&comp_time); + vnic_ib_callback_stats(ib_conn); + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); + while (ib_poll_cq(cq, 1, &wc) > 0) { + vnic_ib_handle_completions(&wc, ib_conn, &comp_num, + &comp_time); + if (ib_conn->cm_id && + ib_conn->state != IB_CONN_CONNECTED) + break; + + /* If we get more completions than the completion limit + * defer completion to the thread + */ + if ((!ib_conn->in_thread) && + (comp_num >= ib_conn->ib_config->completion_limit)) { + ib_conn->in_thread = 1; + spin_lock_irqsave( + &ib_conn->compl_received_lock, flags); + ib_conn->compl_received = 1; + spin_unlock_irqrestore( + &ib_conn->compl_received_lock, flags); + wake_up(&(ib_conn->callback_wait_queue)); + break; + } + + } + vnic_ib_maxio_stats(ib_conn, comp_num); + } +} + +static int vnic_ib_mod_qp_to_rts(struct ib_cm_id *cm_id, + struct vnic_ib_conn *ib_conn) +{ + int attr_mask = 0; + int ret; + struct ib_qp_attr *qp_attr = NULL; + + qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL); + if (!qp_attr) + return -ENOMEM; + + 
qp_attr->qp_state = IB_QPS_RTR; + + ret = ib_cm_init_qp_attr(cm_id, qp_attr, &attr_mask); + if (ret) + goto out; + + ret = ib_modify_qp(ib_conn->qp, qp_attr, attr_mask); + if (ret) + goto out; + + IB_INFO("QP RTR\n"); + + qp_attr->qp_state = IB_QPS_RTS; + + ret = ib_cm_init_qp_attr(cm_id, qp_attr, &attr_mask); + if (ret) + goto out; + + ret = ib_modify_qp(ib_conn->qp, qp_attr, attr_mask); + if (ret) + goto out; + + IB_INFO("QP RTS\n"); + + ret = ib_send_cm_rtu(cm_id, NULL, 0); + if (ret) + goto out; +out: + kfree(qp_attr); + return ret; +} + +int vnic_ib_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) +{ + struct vnic_ib_conn *ib_conn = cm_id->context; + struct viport *viport = ib_conn->viport; + int err = 0; + + switch (event->event) { + case IB_CM_REQ_ERROR: + IB_ERROR("sending CM REQ failed\n"); + err = 1; + viport->retry = 1; + break; + case IB_CM_REP_RECEIVED: + IB_INFO("CM REP recvd\n"); + if (vnic_ib_mod_qp_to_rts(cm_id, ib_conn)) + err = 1; + else { + ib_conn->state = IB_CONN_CONNECTED; + vnic_ib_connected_time_stats(ib_conn); + IB_INFO("RTU SENT\n"); + } + break; + case IB_CM_REJ_RECEIVED: + printk(KERN_ERR PFX " CM rejected control connection\n"); + if (event->param.rej_rcvd.reason == + IB_CM_REJ_INVALID_SERVICE_ID) + printk(KERN_ERR "reason: invalid service ID. 
" + "IOCGUID value specified may be incorrect\n"); + else + printk(KERN_ERR "reason code : 0x%x\n", + event->param.rej_rcvd.reason); + + err = 1; + viport->retry = 1; + break; + case IB_CM_MRA_RECEIVED: + IB_INFO("CM MRA received\n"); + break; + + case IB_CM_DREP_RECEIVED: + IB_INFO("CM DREP recvd\n"); + ib_conn->state = IB_CONN_DISCONNECTED; + break; + + case IB_CM_TIMEWAIT_EXIT: + IB_ERROR("CM timewait exit\n"); + err = 1; + break; + + default: + IB_INFO("unhandled CM event %d\n", event->event); + break; + + } + + if (err) { + ib_conn->state = IB_CONN_DISCONNECTED; + viport_failure(viport); + } + + viport_kick(viport); + return 0; +} + + +int vnic_ib_cm_connect(struct vnic_ib_conn *ib_conn) +{ + struct ib_cm_req_param *req = NULL; + struct viport *viport; + int ret = -1; + + if (!vnic_ib_conn_initted(ib_conn)) { + IB_ERROR("IB Connection out of state for CM connect (%d)\n", + ib_conn->state); + return -EINVAL; + } + + vnic_ib_conntime_stats(ib_conn); + req = kzalloc(sizeof *req, GFP_KERNEL); + if (!req) + return -ENOMEM; + + viport = ib_conn->viport; + + req->primary_path = &viport->config->path_info.path; + req->alternate_path = NULL; + req->qp_num = ib_conn->qp->qp_num; + req->qp_type = ib_conn->qp->qp_type; + req->service_id = ib_conn->ib_config->service_id; + req->private_data = &ib_conn->ib_config->conn_data; + req->private_data_len = sizeof(struct vnic_connection_data); + req->flow_control = 1; + + get_random_bytes(&req->starting_psn, 4); + req->starting_psn &= 0xffffff; + + /* + * Both responder_resources and initiator_depth are set to zero + * as we do not need RDMA read. + * + * They also must be set to zero, otherwise data connections + * are rejected by VEx. 
+ */ + req->responder_resources = 0; + req->initiator_depth = 0; + req->remote_cm_response_timeout = 20; + req->local_cm_response_timeout = 20; + req->retry_count = ib_conn->ib_config->retry_count; + req->rnr_retry_count = ib_conn->ib_config->rnr_retry_count; + req->max_cm_retries = 15; + + ib_conn->state = IB_CONN_CONNECTING; + + ret = ib_send_cm_req(ib_conn->cm_id, req); + + kfree(req); + + if (ret) { + IB_ERROR("CM REQ sending failed; error %d \n", ret); + ib_conn->state = IB_CONN_DISCONNECTED; + } + + return ret; +} + +static int vnic_ib_init_qp(struct vnic_ib_conn *ib_conn, + struct vnic_ib_config *config, + struct ib_pd *pd, + struct viport_config *viport_config) +{ + struct ib_qp_init_attr *init_attr; + struct ib_qp_attr *attr; + int ret; + + init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL); + if (!init_attr) + return -ENOMEM; + + init_attr->event_handler = ib_qp_event; + init_attr->cap.max_send_wr = config->num_sends; + init_attr->cap.max_recv_wr = config->num_recvs; + init_attr->cap.max_recv_sge = config->recv_scatter; + init_attr->cap.max_send_sge = config->send_gather; + init_attr->sq_sig_type = IB_SIGNAL_ALL_WR; + init_attr->qp_type = IB_QPT_RC; + init_attr->send_cq = ib_conn->cq; + init_attr->recv_cq = ib_conn->cq; + + ib_conn->qp = ib_create_qp(pd, init_attr); + + if (IS_ERR(ib_conn->qp)) { + ret = -1; + IB_ERROR("could not create QP\n"); + goto free_init_attr; + } + + attr = kmalloc(sizeof *attr, GFP_KERNEL); + if (!attr) { + ret = -ENOMEM; + goto destroy_qp; + } + + ret = ib_find_cached_pkey(viport_config->ibdev, + viport_config->port, + be16_to_cpu(viport_config->path_info.path. 
+ pkey), + &attr->pkey_index); + if (ret) { + printk(KERN_WARNING PFX "ib_find_cached_pkey() failed; " + "error %d\n", ret); + goto freeattr; + } + + attr->qp_state = IB_QPS_INIT; + attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE; + attr->port_num = viport_config->port; + + ret = ib_modify_qp(ib_conn->qp, attr, + IB_QP_STATE | + IB_QP_PKEY_INDEX | + IB_QP_ACCESS_FLAGS | IB_QP_PORT); + if (ret) { + printk(KERN_WARNING PFX "could not modify QP; error %d \n", + ret); + goto freeattr; + } + + kfree(attr); + kfree(init_attr); + return ret; + +freeattr: + kfree(attr); +destroy_qp: + ib_destroy_qp(ib_conn->qp); +free_init_attr: + kfree(init_attr); + return ret; +} + +int vnic_ib_conn_init(struct vnic_ib_conn *ib_conn, struct viport *viport, + struct ib_pd *pd, struct vnic_ib_config *config) +{ + struct viport_config *viport_config = viport->config; + int ret = -1; + unsigned int cq_size = config->num_sends + config->num_recvs; + + + if (!vnic_ib_conn_uninitted(ib_conn)) { + IB_ERROR("IB Connection out of state for init (%d)\n", + ib_conn->state); + return -EINVAL; + } + + ib_conn->cq = ib_create_cq(viport_config->ibdev, vnic_ib_completion, +#ifdef BUILD_FOR_OFED_1_2 + NULL, ib_conn, cq_size); +#else + NULL, ib_conn, cq_size, 0); +#endif + if (IS_ERR(ib_conn->cq)) { + IB_ERROR("could not create CQ\n"); + goto out; + } + + IB_INFO("cq created %p %d\n", ib_conn->cq, cq_size); + ib_req_notify_cq(ib_conn->cq, IB_CQ_NEXT_COMP); + init_waitqueue_head(&(ib_conn->callback_wait_queue)); + init_completion(&(ib_conn->callback_thread_exit)); + + spin_lock_init(&ib_conn->compl_received_lock); + + ib_conn->callback_thread = kthread_run(vnic_defer_completion, ib_conn, + "qlgc_vnic_def_compl"); + if (IS_ERR(ib_conn->callback_thread)) { + IB_ERROR("Could not create vnic_callback_thread;" + " error %d\n", (int) PTR_ERR(ib_conn->callback_thread)); + ib_conn->callback_thread = NULL; + goto destroy_cq; + } + + ret = vnic_ib_init_qp(ib_conn, config, pd, viport_config); + + if (ret) + goto 
destroy_thread; + + spin_lock_init(&ib_conn->conn_lock); + ib_conn->state = IB_CONN_INITTED; + + return ret; + +destroy_thread: + completion_callback_cleanup(ib_conn); +destroy_cq: + ib_destroy_cq(ib_conn->cq); +out: + return ret; +} + +int vnic_ib_post_recv(struct vnic_ib_conn *ib_conn, struct io *io) +{ + cycles_t post_time; + struct ib_recv_wr *bad_wr; + int ret = -1; + unsigned long flags; + + IB_FUNCTION("vnic_ib_post_recv()\n"); + + spin_lock_irqsave(&ib_conn->conn_lock, flags); + + if (!vnic_ib_conn_initted(ib_conn) && + !vnic_ib_conn_connected(ib_conn)) { + ret = -EINVAL; + goto out; + } + + vnic_ib_pre_rcvpost_stats(ib_conn, io, &post_time); + io->type = RECV; + ret = ib_post_recv(ib_conn->qp, &io->rwr, &bad_wr); + if (ret) { + IB_ERROR("error in posting rcv wr; error %d\n", ret); + ib_conn->state = IB_CONN_ERRORED; + goto out; + } + + vnic_ib_post_rcvpost_stats(ib_conn, post_time); +out: + spin_unlock_irqrestore(&ib_conn->conn_lock, flags); + return ret; + +} + +int vnic_ib_post_send(struct vnic_ib_conn *ib_conn, struct io *io) +{ + cycles_t post_time; + unsigned long flags; + struct ib_send_wr *bad_wr; + int ret = -1; + + IB_FUNCTION("vnic_ib_post_send()\n"); + + spin_lock_irqsave(&ib_conn->conn_lock, flags); + if (!vnic_ib_conn_connected(ib_conn)) { + IB_ERROR("IB Connection out of state for" + " posting sends (%d)\n", ib_conn->state); + goto out; + } + + vnic_ib_pre_sendpost_stats(io, &post_time); + if (io->swr.opcode == IB_WR_RDMA_WRITE) + io->type = RDMA; + else + io->type = SEND; + + ret = ib_post_send(ib_conn->qp, &io->swr, &bad_wr); + if (ret) { + IB_ERROR("error in posting send wr; error %d\n", ret); + ib_conn->state = IB_CONN_ERRORED; + goto out; + } + + vnic_ib_post_sendpost_stats(ib_conn, io, post_time); +out: + spin_unlock_irqrestore(&ib_conn->conn_lock, flags); + return ret; +} + +static int vnic_defer_completion(void *ptr) +{ + struct vnic_ib_conn *ib_conn = ptr; + struct ib_wc wc; + struct ib_cq *cq = ib_conn->cq; + cycles_t comp_time; + 
u32 comp_num = 0; + unsigned long flags; + + while (!ib_conn->callback_thread_end) { + wait_event_interruptible(ib_conn->callback_wait_queue, + ib_conn->compl_received || + ib_conn->callback_thread_end); + ib_conn->in_thread = 1; + spin_lock_irqsave(&ib_conn->compl_received_lock, flags); + ib_conn->compl_received = 0; + spin_unlock_irqrestore(&ib_conn->compl_received_lock, flags); + if (ib_conn->cm_id && + ib_conn->state != IB_CONN_CONNECTED) + goto out_thread; + + vnic_ib_note_comptime_stats(&comp_time); + vnic_ib_callback_stats(ib_conn); + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); + while (ib_poll_cq(cq, 1, &wc) > 0) { + vnic_ib_handle_completions(&wc, ib_conn, &comp_num, + &comp_time); + if (ib_conn->cm_id && + ib_conn->state != IB_CONN_CONNECTED) + break; + } + vnic_ib_maxio_stats(ib_conn, comp_num); +out_thread: + ib_conn->in_thread = 0; + } + complete_and_exit(&(ib_conn->callback_thread_exit), 0); + return 0; +} + +void completion_callback_cleanup(struct vnic_ib_conn *ib_conn) +{ + if (ib_conn->callback_thread) { + ib_conn->callback_thread_end = 1; + wake_up(&(ib_conn->callback_wait_queue)); + wait_for_completion(&(ib_conn->callback_thread_exit)); + ib_conn->callback_thread = NULL; + } +} + +int vnic_ib_mc_init(struct mc_data *mc_data, struct viport *viport, + struct ib_pd *pd, struct vnic_ib_config *config) +{ + struct viport_config *viport_config = viport->config; + int ret = -1; + unsigned int cq_size = config->num_recvs; /* recvs only */ + + IB_FUNCTION("vnic_ib_mc_init\n"); + + mc_data->ib_conn.cq = ib_create_cq(viport_config->ibdev, vnic_ib_completion, +#ifdef BUILD_FOR_OFED_1_2 + NULL, &mc_data->ib_conn, cq_size); +#else + NULL, &mc_data->ib_conn, cq_size, 0); +#endif + if (IS_ERR(mc_data->ib_conn.cq)) { + IB_ERROR("ib_create_cq failed\n"); + goto out; + } + IB_INFO("mc cq created %p %d\n", mc_data->ib_conn.cq, cq_size); + + ret = ib_req_notify_cq(mc_data->ib_conn.cq, IB_CQ_NEXT_COMP); + if (ret) { + IB_ERROR("ib_req_notify_cq failed %x \n", ret); + goto 
destroy_cq; + } + + init_waitqueue_head(&(mc_data->ib_conn.callback_wait_queue)); + init_completion(&(mc_data->ib_conn.callback_thread_exit)); + + spin_lock_init(&mc_data->ib_conn.compl_received_lock); + mc_data->ib_conn.callback_thread = kthread_run(vnic_defer_completion, + &mc_data->ib_conn, + "qlgc_vnic_mc_def_compl"); + if (IS_ERR(mc_data->ib_conn.callback_thread)) { + IB_ERROR("Could not create vnic_callback_thread for MULTICAST;" + " error %d\n", + (int) PTR_ERR(mc_data->ib_conn.callback_thread)); + mc_data->ib_conn.callback_thread = NULL; + goto destroy_cq; + } + IB_INFO("callback_thread created\n"); + + ret = vnic_ib_mc_init_qp(mc_data, config, pd, viport_config); + if (ret) + goto destroy_thread; + + spin_lock_init(&mc_data->ib_conn.conn_lock); + mc_data->ib_conn.state = IB_CONN_INITTED; /* stays in this state */ + + return ret; + +destroy_thread: + completion_callback_cleanup(&mc_data->ib_conn); +destroy_cq: + ib_destroy_cq(mc_data->ib_conn.cq); + mc_data->ib_conn.cq = (struct ib_cq *)ERR_PTR(-EINVAL); +out: + return ret; +} + +static int vnic_ib_mc_init_qp(struct mc_data *mc_data, + struct vnic_ib_config *config, + struct ib_pd *pd, + struct viport_config *viport_config) +{ + struct ib_qp_init_attr *init_attr; + struct ib_qp_attr *qp_attr; + int ret; + + IB_FUNCTION("vnic_ib_mc_init_qp\n"); + + if (!mc_data->ib_conn.cq) { + IB_ERROR("cq is null\n"); + return -ENOMEM; + } + + init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL); + if (!init_attr) { + IB_ERROR("failed to alloc init_attr\n"); + return -ENOMEM; + } + + init_attr->cap.max_recv_wr = config->num_recvs; + init_attr->cap.max_send_wr = 1; + init_attr->cap.max_recv_sge = 2; + init_attr->cap.max_send_sge = 1; + + /* Completion for all work requests. 
*/ + init_attr->sq_sig_type = IB_SIGNAL_ALL_WR; + + init_attr->qp_type = IB_QPT_UD; + + init_attr->send_cq = mc_data->ib_conn.cq; + init_attr->recv_cq = mc_data->ib_conn.cq; + + IB_INFO("creating qp %d \n", config->num_recvs); + + mc_data->ib_conn.qp = ib_create_qp(pd, init_attr); + + if (IS_ERR(mc_data->ib_conn.qp)) { + ret = -1; + IB_ERROR("could not create QP\n"); + goto free_init_attr; + } + + qp_attr = kzalloc(sizeof *qp_attr, GFP_KERNEL); + if (!qp_attr) { + ret = -ENOMEM; + goto destroy_qp; + } + + qp_attr->qp_state = IB_QPS_INIT; + qp_attr->port_num = viport_config->port; + qp_attr->qkey = IOC_NUMBER(be64_to_cpu(viport_config->ioc_guid)); + qp_attr->pkey_index = 0; + /* cannot set access flags for UD qp + qp_attr->qp_access_flags = IB_ACCESS_REMOTE_WRITE; */ + + IB_INFO("port_num:%d qkey:%d pkey:%d\n", qp_attr->port_num, + qp_attr->qkey, qp_attr->pkey_index); + ret = ib_modify_qp(mc_data->ib_conn.qp, qp_attr, + IB_QP_STATE | + IB_QP_PKEY_INDEX | + IB_QP_QKEY | + + /* cannot set this for UD + IB_QP_ACCESS_FLAGS | */ + + IB_QP_PORT); + if (ret) { + IB_ERROR("ib_modify_qp to INIT failed %d \n", ret); + goto free_qp_attr; + } + + kfree(qp_attr); + kfree(init_attr); + return ret; + +free_qp_attr: + kfree(qp_attr); +destroy_qp: + ib_destroy_qp(mc_data->ib_conn.qp); + mc_data->ib_conn.qp = ERR_PTR(-EINVAL); +free_init_attr: + kfree(init_attr); + return ret; +} + +int vnic_ib_mc_mod_qp_to_rts(struct ib_qp *qp) +{ + int ret; + struct ib_qp_attr *qp_attr = NULL; + + IB_FUNCTION("vnic_ib_mc_mod_qp_to_rts\n"); + qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL); + if (!qp_attr) + return -ENOMEM; + + memset(qp_attr, 0, sizeof *qp_attr); + qp_attr->qp_state = IB_QPS_RTR; + + ret = ib_modify_qp(qp, qp_attr, IB_QP_STATE); + if (ret) { + IB_ERROR("ib_modify_qp to RTR failed %d\n", ret); + goto out; + } + IB_INFO("MC QP RTR\n"); + + memset(qp_attr, 0, sizeof *qp_attr); + qp_attr->qp_state = IB_QPS_RTS; + qp_attr->sq_psn = 0; + + ret = ib_modify_qp(qp, qp_attr, IB_QP_STATE | 
IB_QP_SQ_PSN); + if (ret) { + IB_ERROR("ib_modify_qp to RTS failed %d\n", ret); + goto out; + } + IB_INFO("MC QP RTS\n"); + + ret = 0; + +out: + kfree(qp_attr); + return ret; +} + +int vnic_ib_mc_post_recv(struct mc_data *mc_data, struct io *io) +{ + cycles_t post_time; + struct ib_recv_wr *bad_wr; + int ret = -1; + + IB_FUNCTION("vnic_ib_mc_post_recv()\n"); + + vnic_ib_pre_rcvpost_stats(&mc_data->ib_conn, io, &post_time); + io->type = RECV_UD; + ret = ib_post_recv(mc_data->ib_conn.qp, &io->rwr, &bad_wr); + if (ret) { + IB_ERROR("error in posting rcv wr; error %d\n", ret); + goto out; + } + vnic_ib_post_rcvpost_stats(&mc_data->ib_conn, post_time); + +out: + return ret; +} diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h new file mode 100644 index 0000000..ebf9ef5 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_ib.h @@ -0,0 +1,206 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution.
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_IB_H_INCLUDED +#define VNIC_IB_H_INCLUDED + +#include +#include +#include +#include +#include +#include + +#include "vnic_sys.h" +#include "vnic_netpath.h" +#define PFX "qlgc_vnic: " + +struct io; +typedef void (comp_routine_t) (struct io *io); + +enum vnic_ib_conn_state { + IB_CONN_UNINITTED = 0, + IB_CONN_INITTED = 1, + IB_CONN_CONNECTING = 2, + IB_CONN_CONNECTED = 3, + IB_CONN_DISCONNECTED = 4, + IB_CONN_ERRORED = 5 +}; + +struct vnic_ib_conn { + struct viport *viport; + struct vnic_ib_config *ib_config; + spinlock_t conn_lock; + enum vnic_ib_conn_state state; + struct ib_qp *qp; + struct ib_cq *cq; + struct ib_cm_id *cm_id; + int callback_thread_end; + struct task_struct *callback_thread; + wait_queue_head_t callback_wait_queue; + u32 in_thread; + u32 compl_received; + struct completion callback_thread_exit; + spinlock_t compl_received_lock; +#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS + struct { + cycles_t connection_time; + cycles_t rdma_post_time; + u32 rdma_post_ios; + cycles_t rdma_comp_time; + u32 rdma_comp_ios; + cycles_t send_post_time; + u32 send_post_ios; + cycles_t send_comp_time; + u32 send_comp_ios; + cycles_t recv_post_time; + u32 recv_post_ios; + cycles_t recv_comp_time; + u32 recv_comp_ios; + u32 num_ios; + u32 num_callbacks; + u32 max_ios; + } statistics; +#endif /* CONFIG_INFINIBAND_QLGC_VNIC_STATS */ +}; + +struct vnic_ib_path_info { + struct ib_sa_path_rec path; + struct ib_sa_query *path_query; + int path_query_id; + int status; + 
struct completion done; +}; + +struct vnic_ib_device { + struct ib_device *dev; + struct list_head port_list; +}; + +struct vnic_ib_port { + struct vnic_ib_device *dev; + u8 port_num; + struct dev_info pdev_info; + struct list_head list; +}; + +struct io { + struct list_head list_ptrs; + struct viport *viport; + comp_routine_t *routine; + struct ib_recv_wr rwr; + struct ib_send_wr swr; +#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS + cycles_t time; +#endif /* CONFIG_INFINIBAND_QLGC_VNIC_STATS */ + enum {RECV, RDMA, SEND, RECV_UD} type; +}; + +struct rdma_io { + struct io io; + struct ib_sge list[2]; + u16 index; + u16 len; + u8 *data; + dma_addr_t data_dma; + struct sk_buff *skb; + dma_addr_t skb_data_dma; + struct viport_trailer *trailer; + dma_addr_t trailer_dma; +}; + +struct send_io { + struct io io; + struct ib_sge list; + u8 *virtual_addr; +}; + +struct recv_io { + struct io io; + struct ib_sge list; + u8 *virtual_addr; +}; + +struct ud_recv_io { + struct io io; + u16 len; + dma_addr_t skb_data_dma; + struct ib_sge list[2]; /* one for grh and other for rest of pkt. 
*/ + struct sk_buff *skb; +}; + +int vnic_ib_init(void); +void vnic_ib_cleanup(void); + +struct vnic; +int vnic_ib_get_path(struct netpath *netpath, struct vnic *vnic); +int vnic_ib_conn_init(struct vnic_ib_conn *ib_conn, struct viport *viport, + struct ib_pd *pd, struct vnic_ib_config *config); + +int vnic_ib_post_recv(struct vnic_ib_conn *ib_conn, struct io *io); +int vnic_ib_post_send(struct vnic_ib_conn *ib_conn, struct io *io); +int vnic_ib_cm_connect(struct vnic_ib_conn *ib_conn); +int vnic_ib_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event); + +#define vnic_ib_conn_uninitted(ib_conn) \ + ((ib_conn)->state == IB_CONN_UNINITTED) +#define vnic_ib_conn_initted(ib_conn) \ + ((ib_conn)->state == IB_CONN_INITTED) +#define vnic_ib_conn_connecting(ib_conn) \ + ((ib_conn)->state == IB_CONN_CONNECTING) +#define vnic_ib_conn_connected(ib_conn) \ + ((ib_conn)->state == IB_CONN_CONNECTED) +#define vnic_ib_conn_disconnected(ib_conn) \ + ((ib_conn)->state == IB_CONN_DISCONNECTED) + +#define MCAST_GROUP_INVALID 0x00 /* viport failed to join or left mc group */ +#define MCAST_GROUP_JOINING 0x01 /* wait for completion */ +#define MCAST_GROUP_JOINED 0x02 /* join process completed successfully */ + +/* vnic_sa_client is used to register with sa once. It is needed to join and + * leave multicast groups. + */ +extern struct ib_sa_client vnic_sa_client; + +/* The following functions are using initialize and handle multicast + * components. 
+ */ +struct mc_data; /* forward declaration */ +/* Initialize all necessary mc components */ +int vnic_ib_mc_init(struct mc_data *mc_data, struct viport *viport, + struct ib_pd *pd, struct vnic_ib_config *config); +/* Put multicast qp in RTS */ +int vnic_ib_mc_mod_qp_to_rts(struct ib_qp *qp); +/* Post multicast receive buffers */ +int vnic_ib_mc_post_recv(struct mc_data *mc_data, struct io *io); + +#endif /* VNIC_IB_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:19:55 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:49:55 +0530 Subject: [ofa-general] [PATCH 08/13] QLogic VNIC: sysfs interface implementation for the driver In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430171955.31725.7771.stgit@localhost.localdomain> From: Amar Mudrankit The sysfs interface for the QLogic VNIC driver is implemented through this patch. Signed-off-by: Ramachandra K Signed-off-by: Poornima Kamath --- drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c | 1127 +++++++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h | 62 + 2 files changed, 1189 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c new file mode 100644 index 0000000..7e70b0c --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.c @@ -0,0 +1,1127 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include + +#include "vnic_util.h" +#include "vnic_config.h" +#include "vnic_ib.h" +#include "vnic_viport.h" +#include "vnic_main.h" +#include "vnic_stats.h" + +/* + * target eiocs are added by writing + * + * ioc_guid=<ioc_guid>,dgid=<dgid>,pkey=<pkey>,name=<name> + * + * to the create_primary sysfs attribute.
+ */ +enum { + VNIC_OPT_ERR = 0, + VNIC_OPT_IOC_GUID = 1 << 0, + VNIC_OPT_DGID = 1 << 1, + VNIC_OPT_PKEY = 1 << 2, + VNIC_OPT_NAME = 1 << 3, + VNIC_OPT_INSTANCE = 1 << 4, + VNIC_OPT_RXCSUM = 1 << 5, + VNIC_OPT_TXCSUM = 1 << 6, + VNIC_OPT_HEARTBEAT = 1 << 7, + VNIC_OPT_IOC_STRING = 1 << 8, + VNIC_OPT_IB_MULTICAST = 1 << 9, + VNIC_OPT_ALL = (VNIC_OPT_IOC_GUID | + VNIC_OPT_DGID | VNIC_OPT_NAME | VNIC_OPT_PKEY), +}; + +static match_table_t vnic_opt_tokens = { + {VNIC_OPT_IOC_GUID, "ioc_guid=%s"}, + {VNIC_OPT_DGID, "dgid=%s"}, + {VNIC_OPT_PKEY, "pkey=%x"}, + {VNIC_OPT_NAME, "name=%s"}, + {VNIC_OPT_INSTANCE, "instance=%d"}, + {VNIC_OPT_RXCSUM, "rx_csum=%s"}, + {VNIC_OPT_TXCSUM, "tx_csum=%s"}, + {VNIC_OPT_HEARTBEAT, "heartbeat=%d"}, + {VNIC_OPT_IOC_STRING, "ioc_string=\"%s"}, + {VNIC_OPT_IB_MULTICAST, "ib_multicast=%s"}, + {VNIC_OPT_ERR, NULL} +}; + +void vnic_release_dev(struct device *dev) +{ + struct dev_info *dev_info = + container_of(dev, struct dev_info, dev); + + complete(&dev_info->released); + +} + +struct class vnic_class = { + .name = "infiniband_qlgc_vnic", + .dev_release = vnic_release_dev +}; + +struct dev_info interface_dev; + +DEVICE_ATTR(create_primary, S_IWUSR, NULL, vnic_create_primary); +DEVICE_ATTR(create_secondary, S_IWUSR, NULL, vnic_create_secondary); +DEVICE_ATTR(delete_vnic, S_IWUSR, NULL, vnic_delete); + +static int vnic_parse_options(const char *buf, struct path_param *param) +{ + char *options, *sep_opt; + char *p; + char dgid[3]; + substring_t args[MAX_OPT_ARGS]; + int opt_mask = 0; + int token; + int ret = -EINVAL; + int i, len; + + options = kstrdup(buf, GFP_KERNEL); + if (!options) + return -ENOMEM; + + sep_opt = options; + while ((p = strsep(&sep_opt, ",")) != NULL) { + if (!*p) + continue; + + token = match_token(p, vnic_opt_tokens, args); + opt_mask |= token; + + switch (token) { + case VNIC_OPT_IOC_GUID: + p = match_strdup(args); + param->ioc_guid = cpu_to_be64(simple_strtoull(p, NULL, + 16)); + kfree(p); + break; + + case 
VNIC_OPT_DGID: + p = match_strdup(args); + if (strlen(p) != 32) { + printk(KERN_WARNING PFX + "bad dest GID parameter '%s'\n", p); + kfree(p); + goto out; + } + + for (i = 0; i < 16; ++i) { + strlcpy(dgid, p + i * 2, 3); + param->dgid[i] = simple_strtoul(dgid, NULL, + 16); + + } + kfree(p); + break; + + case VNIC_OPT_PKEY: + if (match_hex(args, &token)) { + printk(KERN_WARNING PFX + "bad P_key parameter '%s'\n", p); + goto out; + } + param->pkey = cpu_to_be16(token); + break; + + case VNIC_OPT_NAME: + p = match_strdup(args); + if (strlen(p) >= IFNAMSIZ) { + printk(KERN_WARNING PFX + "interface name parameter too long\n"); + kfree(p); + goto out; + } + strcpy(param->name, p); + kfree(p); + break; + case VNIC_OPT_INSTANCE: + if (match_int(args, &token)) { + printk(KERN_WARNING PFX + "bad instance parameter '%s'\n", p); + goto out; + } + + if (token > 255 || token < 0) { + printk(KERN_WARNING PFX + "instance parameter must be" + " >= 0 and <= 255\n"); + goto out; + } + + param->instance = token; + break; + case VNIC_OPT_RXCSUM: + p = match_strdup(args); + if (!strncmp(p, "true", 4)) + param->rx_csum = 1; + else if (!strncmp(p, "false", 5)) + param->rx_csum = 0; + else { + printk(KERN_WARNING PFX + "bad rx_csum parameter." + " must be 'true' or 'false'\n"); + kfree(p); + goto out; + } + kfree(p); + break; + case VNIC_OPT_TXCSUM: + p = match_strdup(args); + if (!strncmp(p, "true", 4)) + param->tx_csum = 1; + else if (!strncmp(p, "false", 5)) + param->tx_csum = 0; + else { + printk(KERN_WARNING PFX + "bad tx_csum parameter." 
+ " must be 'true' or 'false'\n"); + kfree(p); + goto out; + } + kfree(p); + break; + case VNIC_OPT_HEARTBEAT: + if (match_int(args, &token)) { + printk(KERN_WARNING PFX + "bad heartbeat parameter '%s'\n", p); + goto out; + } + + if (token > 6000 || token <= 0) { + printk(KERN_WARNING PFX + "heartbeat parameter must be" + " > 0 and <= 6000\n"); + goto out; + } + param->heartbeat = token; + break; + case VNIC_OPT_IOC_STRING: + p = match_strdup(args); + len = strlen(p); + if (len > MAX_IOC_STRING_LEN) { + printk(KERN_WARNING PFX + "ioc string parameter too long\n"); + kfree(p); + goto out; + } + strcpy(param->ioc_string, p); + if (*(p + len - 1) != '\"') { + strcat(param->ioc_string, ","); + kfree(p); + p = strsep(&sep_opt, "\""); + strcat(param->ioc_string, p); + sep_opt++; + } else { + *(param->ioc_string + len - 1) = '\0'; + kfree(p); + } + break; + case VNIC_OPT_IB_MULTICAST: + p = match_strdup(args); + if (!strncmp(p, "true", 4)) + param->ib_multicast = 1; + else if (!strncmp(p, "false", 5)) + param->ib_multicast = 0; + else { + printk(KERN_WARNING PFX + "bad ib_multicast parameter." 
+ " must be 'true' or 'false'\n"); + kfree(p); + goto out; + } + kfree(p); + break; + default: + printk(KERN_WARNING PFX + "unknown parameter or missing value " + "'%s' in target creation request\n", p); + goto out; + } + + } + + if ((opt_mask & VNIC_OPT_ALL) == VNIC_OPT_ALL) + ret = 0; + else + for (i = 0; i < ARRAY_SIZE(vnic_opt_tokens); ++i) + if ((vnic_opt_tokens[i].token & VNIC_OPT_ALL) && + !(vnic_opt_tokens[i].token & opt_mask)) + printk(KERN_WARNING PFX + "target creation request is " + "missing parameter '%s'\n", + vnic_opt_tokens[i].pattern); + +out: + kfree(options); + return ret; + +} + +static ssize_t show_vnic_state(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, dev_info); + switch (vnic->state) { + case VNIC_UNINITIALIZED: + return sprintf(buf, "VNIC_UNINITIALIZED\n"); + case VNIC_REGISTERED: + return sprintf(buf, "VNIC_REGISTERED\n"); + default: + return sprintf(buf, "INVALID STATE\n"); + } + +} + +static DEVICE_ATTR(vnic_state, S_IRUGO, show_vnic_state, NULL); + +static ssize_t show_rx_csum(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, dev_info); + + if (vnic->config->use_rx_csum) + return sprintf(buf, "true\n"); + else + return sprintf(buf, "false\n"); +} + +static DEVICE_ATTR(rx_csum, S_IRUGO, show_rx_csum, NULL); + +static ssize_t show_tx_csum(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, dev_info); + + if (vnic->config->use_tx_csum) + return sprintf(buf, "true\n"); + else + return sprintf(buf, "false\n"); +} + +static DEVICE_ATTR(tx_csum, S_IRUGO, show_tx_csum, NULL); + +static ssize_t show_current_path(struct 
device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, dev_info); + + if (vnic->current_path == &vnic->primary_path) + return sprintf(buf, "primary path\n"); + else if (vnic->current_path == &vnic->secondary_path) + return sprintf(buf, "secondary path\n"); + else + return sprintf(buf, "none\n"); + +} + +static DEVICE_ATTR(current_path, S_IRUGO, show_current_path, NULL); + +static struct attribute *vnic_dev_attrs[] = { + &dev_attr_vnic_state.attr, + &dev_attr_rx_csum.attr, + &dev_attr_tx_csum.attr, + &dev_attr_current_path.attr, + NULL +}; + +struct attribute_group vnic_dev_attr_group = { + .attrs = vnic_dev_attrs, +}; + +static inline void print_dgid(u8 *dgid) +{ + int i; + + for (i = 0; i < 16; i += 2) + printk("%04x", be16_to_cpu(*(__be16 *)&dgid[i])); +} + +static inline int is_dgid_zero(u8 *dgid) +{ + int i; + + for (i = 0; i < 16; i++) { + if (dgid[i] != 0) + return 1; + } + return 0; +} + +static int create_netpath(struct netpath *npdest, + struct path_param *p_params) +{ + struct viport_config *viport_config; + struct viport *viport; + struct vnic *vnic; + struct list_head *ptr; + int ret = 0; + + list_for_each(ptr, &vnic_list) { + vnic = list_entry(ptr, struct vnic, list_ptrs); + if (vnic->primary_path.viport) { + viport_config = vnic->primary_path.viport->config; + if ((viport_config->ioc_guid == p_params->ioc_guid) + && (viport_config->control_config.vnic_instance + == p_params->instance) + && (be64_to_cpu(p_params->ioc_guid))) { + SYS_ERROR("GUID %llx," + " INSTANCE %d already in use\n", + be64_to_cpu(p_params->ioc_guid), + p_params->instance); + ret = -EINVAL; + goto out; + } + } + + if (vnic->secondary_path.viport) { + viport_config = vnic->secondary_path.viport->config; + if ((viport_config->ioc_guid == p_params->ioc_guid) + && (viport_config->control_config.vnic_instance + == p_params->instance) + && 
(be64_to_cpu(p_params->ioc_guid))) { + SYS_ERROR("GUID %llx," + " INSTANCE %d already in use\n", + be64_to_cpu(p_params->ioc_guid), + p_params->instance); + ret = -EINVAL; + goto out; + } + } + } + + if (npdest->viport) { + SYS_ERROR("create_netpath: path already exists\n"); + ret = -EINVAL; + goto out; + } + + viport_config = config_alloc_viport(p_params); + if (!viport_config) { + SYS_ERROR("create_netpath: failed creating viport config\n"); + ret = -1; + goto out; + } + + /*User specified heartbeat value is in 1/100s of a sec*/ + if (p_params->heartbeat != -1) { + viport_config->hb_interval = + msecs_to_jiffies(p_params->heartbeat * 10); + viport_config->hb_timeout = + (p_params->heartbeat << 6) * 10000; /* usec */ + } + + viport_config->path_idx = 0; + + viport = viport_allocate(viport_config); + if (!viport) { + SYS_ERROR("create_netpath: failed creating viport\n"); + kfree(viport_config); + ret = -1; + goto out; + } + + npdest->viport = viport; + viport->parent = npdest; + viport->vnic = npdest->parent; + + if (is_dgid_zero(p_params->dgid) && p_params->ioc_guid != 0 + && p_params->pkey != 0) { + viport_kick(viport); + vnic_disconnected(npdest->parent, npdest); + } else { + printk(KERN_WARNING "Specified parameters IOCGUID=%llx, " + "P_Key=%x, DGID=", be64_to_cpu(p_params->ioc_guid), + p_params->pkey); + print_dgid(p_params->dgid); + printk(" insufficient for establishing %s path for interface " + "%s. Hence, path will not be established.\n", + (npdest->second_bias ? 
"secondary" : "primary"), + p_params->name); + } +out: + return ret; +} + +static struct vnic *create_vnic(struct path_param *param) +{ + struct vnic_config *vnic_config; + struct vnic *vnic; + struct list_head *ptr; + + SYS_INFO("create_vnic: name = %s\n", param->name); + list_for_each(ptr, &vnic_list) { + vnic = list_entry(ptr, struct vnic, list_ptrs); + if (!strcmp(vnic->config->name, param->name)) { + SYS_ERROR("vnic %s already exists\n", + param->name); + return NULL; + } + } + + vnic_config = config_alloc_vnic(); + if (!vnic_config) { + SYS_ERROR("create_vnic: failed creating vnic config\n"); + return NULL; + } + + if (param->rx_csum != -1) + vnic_config->use_rx_csum = param->rx_csum; + + if (param->tx_csum != -1) + vnic_config->use_tx_csum = param->tx_csum; + + strcpy(vnic_config->name, param->name); + vnic = vnic_allocate(vnic_config); + if (!vnic) { + SYS_ERROR("create_vnic: failed allocating vnic\n"); + goto free_vnic_config; + } + + init_completion(&vnic->dev_info.released); + + vnic->dev_info.dev.class = NULL; + vnic->dev_info.dev.parent = &interface_dev.dev; + vnic->dev_info.dev.release = vnic_release_dev; + snprintf(vnic->dev_info.dev.bus_id, BUS_ID_SIZE, + vnic_config->name); + + if (device_register(&vnic->dev_info.dev)) { + SYS_ERROR("create_vnic: error in registering" + " vnic class dev\n"); + goto free_vnic; + } + + if (sysfs_create_group(&vnic->dev_info.dev.kobj, + &vnic_dev_attr_group)) { + SYS_ERROR("create_vnic: error in creating" + "vnic attr group\n"); + goto err_attr; + + } + + if (vnic_setup_stats_files(vnic)) + goto err_stats; + + return vnic; +err_stats: + sysfs_remove_group(&vnic->dev_info.dev.kobj, + &vnic_dev_attr_group); +err_attr: + device_unregister(&vnic->dev_info.dev); + wait_for_completion(&vnic->dev_info.released); +free_vnic: + list_del(&vnic->list_ptrs); + kfree(vnic); +free_vnic_config: + kfree(vnic_config); + return NULL; +} + +ssize_t vnic_delete(struct device *dev, struct device_attribute *dev_attr, + const char *buf, 
size_t count) +{ + struct vnic *vnic; + struct list_head *ptr; + int ret = -EINVAL; + + if (count > IFNAMSIZ) { + printk(KERN_WARNING PFX "invalid vnic interface name\n"); + return ret; + } + + SYS_INFO("vnic_delete: name = %s\n", buf); + list_for_each(ptr, &vnic_list) { + vnic = list_entry(ptr, struct vnic, list_ptrs); + if (!strcmp(vnic->config->name, buf)) { + vnic_free(vnic); + return count; + } + } + + printk(KERN_WARNING PFX "vnic interface '%s' does not exist\n", buf); + return ret; +} + +static ssize_t show_viport_state(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct netpath *path = container_of(info, struct netpath, dev_info); + switch (path->viport->state) { + case VIPORT_DISCONNECTED: + return sprintf(buf, "VIPORT_DISCONNECTED\n"); + case VIPORT_CONNECTED: + return sprintf(buf, "VIPORT_CONNECTED\n"); + default: + return sprintf(buf, "INVALID STATE\n"); + } + +} + +static DEVICE_ATTR(viport_state, S_IRUGO, show_viport_state, NULL); + +static ssize_t show_link_state(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct netpath *path = container_of(info, struct netpath, dev_info); + + switch (path->viport->link_state) { + case LINK_UNINITIALIZED: + return sprintf(buf, "LINK_UNINITIALIZED\n"); + case LINK_INITIALIZE: + return sprintf(buf, "LINK_INITIALIZE\n"); + case LINK_INITIALIZECONTROL: + return sprintf(buf, "LINK_INITIALIZECONTROL\n"); + case LINK_INITIALIZEDATA: + return sprintf(buf, "LINK_INITIALIZEDATA\n"); + case LINK_CONTROLCONNECT: + return sprintf(buf, "LINK_CONTROLCONNECT\n"); + case LINK_CONTROLCONNECTWAIT: + return sprintf(buf, "LINK_CONTROLCONNECTWAIT\n"); + case LINK_INITVNICREQ: + return sprintf(buf, "LINK_INITVNICREQ\n"); + case LINK_INITVNICRSP: + return sprintf(buf, "LINK_INITVNICRSP\n"); + case LINK_BEGINDATAPATH: + return sprintf(buf, 
"LINK_BEGINDATAPATH\n"); + case LINK_CONFIGDATAPATHREQ: + return sprintf(buf, "LINK_CONFIGDATAPATHREQ\n"); + case LINK_CONFIGDATAPATHRSP: + return sprintf(buf, "LINK_CONFIGDATAPATHRSP\n"); + case LINK_DATACONNECT: + return sprintf(buf, "LINK_DATACONNECT\n"); + case LINK_DATACONNECTWAIT: + return sprintf(buf, "LINK_DATACONNECTWAIT\n"); + case LINK_XCHGPOOLREQ: + return sprintf(buf, "LINK_XCHGPOOLREQ\n"); + case LINK_XCHGPOOLRSP: + return sprintf(buf, "LINK_XCHGPOOLRSP\n"); + case LINK_INITIALIZED: + return sprintf(buf, "LINK_INITIALIZED\n"); + case LINK_IDLE: + return sprintf(buf, "LINK_IDLE\n"); + case LINK_IDLING: + return sprintf(buf, "LINK_IDLING\n"); + case LINK_CONFIGLINKREQ: + return sprintf(buf, "LINK_CONFIGLINKREQ\n"); + case LINK_CONFIGLINKRSP: + return sprintf(buf, "LINK_CONFIGLINKRSP\n"); + case LINK_CONFIGADDRSREQ: + return sprintf(buf, "LINK_CONFIGADDRSREQ\n"); + case LINK_CONFIGADDRSRSP: + return sprintf(buf, "LINK_CONFIGADDRSRSP\n"); + case LINK_REPORTSTATREQ: + return sprintf(buf, "LINK_REPORTSTATREQ\n"); + case LINK_REPORTSTATRSP: + return sprintf(buf, "LINK_REPORTSTATRSP\n"); + case LINK_HEARTBEATREQ: + return sprintf(buf, "LINK_HEARTBEATREQ\n"); + case LINK_HEARTBEATRSP: + return sprintf(buf, "LINK_HEARTBEATRSP\n"); + case LINK_RESET: + return sprintf(buf, "LINK_RESET\n"); + case LINK_RESETRSP: + return sprintf(buf, "LINK_RESETRSP\n"); + case LINK_RESETCONTROL: + return sprintf(buf, "LINK_RESETCONTROL\n"); + case LINK_RESETCONTROLRSP: + return sprintf(buf, "LINK_RESETCONTROLRSP\n"); + case LINK_DATADISCONNECT: + return sprintf(buf, "LINK_DATADISCONNECT\n"); + case LINK_CONTROLDISCONNECT: + return sprintf(buf, "LINK_CONTROLDISCONNECT\n"); + case LINK_CLEANUPDATA: + return sprintf(buf, "LINK_CLEANUPDATA\n"); + case LINK_CLEANUPCONTROL: + return sprintf(buf, "LINK_CLEANUPCONTROL\n"); + case LINK_DISCONNECTED: + return sprintf(buf, "LINK_DISCONNECTED\n"); + case LINK_RETRYWAIT: + return sprintf(buf, "LINK_RETRYWAIT\n"); + default: + return 
sprintf(buf, "INVALID STATE\n"); + + } + +} +static DEVICE_ATTR(link_state, S_IRUGO, show_link_state, NULL); + +static ssize_t show_heartbeat(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + + struct netpath *path = container_of(info, struct netpath, dev_info); + + /* hb_interval is in jiffies, convert it back to + * 1/100ths of a second + */ + return sprintf(buf, "%d\n", + (jiffies_to_msecs(path->viport->config->hb_interval)/10)); +} + +static DEVICE_ATTR(heartbeat, S_IRUGO, show_heartbeat, NULL); + +static ssize_t show_ioc_guid(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + + struct netpath *path = container_of(info, struct netpath, dev_info); + + return sprintf(buf, "%llx\n", + __be64_to_cpu(path->viport->config->ioc_guid)); +} + +static DEVICE_ATTR(ioc_guid, S_IRUGO, show_ioc_guid, NULL); + +static inline void get_dgid_string(u8 *dgid, char *buf) +{ + int i; + char holder[5]; + + for (i = 0; i < 16; i += 2) { + sprintf(holder, "%04x", be16_to_cpu(*(__be16 *)&dgid[i])); + strcat(buf, holder); + } + + strcat(buf, "\n"); +} + +static ssize_t show_dgid(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + + struct netpath *path = container_of(info, struct netpath, dev_info); + + get_dgid_string(path->viport->config->path_info.path.dgid.raw, buf); + + return strlen(buf); +} + +static DEVICE_ATTR(dgid, S_IRUGO, show_dgid, NULL); + +static ssize_t show_pkey(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + + struct netpath *path = container_of(info, struct netpath, dev_info); + + return sprintf(buf, "%x\n", path->viport->config->path_info.path.pkey); +} + +static DEVICE_ATTR(pkey, S_IRUGO, show_pkey, NULL); + 
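The GID formatting used by get_dgid_string() above (a 16-byte GID rendered as 32 hex digits, two bytes at a time in big-endian groups) can be sketched as a standalone userspace function. This is an illustrative plain-C stand-in for the kernel version: the explicit byte shifts replace be16_to_cpu(), and gid_to_string() is a hypothetical name, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a 16-byte GID as 32 lowercase hex digits into buf
 * (caller provides at least 33 bytes).  Bytes are consumed in
 * big-endian 16-bit groups, matching the on-wire GID layout. */
static void gid_to_string(const unsigned char *gid, char *buf)
{
	int i;

	buf[0] = '\0';
	for (i = 0; i < 16; i += 2) {
		char holder[5];

		/* (gid[i] << 8) | gid[i+1] is the portable equivalent of
		 * be16_to_cpu(*(__be16 *)&gid[i]) in the kernel code */
		sprintf(holder, "%04x", (gid[i] << 8) | gid[i + 1]);
		strcat(buf, holder);
	}
}
```

Note the inverse transformation appears in vnic_parse_options(): the dgid= value is required to be exactly 32 hex characters and is converted back two characters at a time with simple_strtoul().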
+static ssize_t show_hca_info(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + + struct netpath *path = container_of(info, struct netpath, dev_info); + + return sprintf(buf, "vnic-%s-%d\n", path->viport->config->ibdev->name, + path->viport->config->port); +} + +static DEVICE_ATTR(hca_info, S_IRUGO, show_hca_info, NULL); + +static ssize_t show_ioc_string(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + + struct netpath *path = container_of(info, struct netpath, dev_info); + + return sprintf(buf, "%s\n", path->viport->config->ioc_string); +} + +static DEVICE_ATTR(ioc_string, S_IRUGO, show_ioc_string, NULL); + +static ssize_t show_multicast_state(struct device *dev, + struct device_attribute *dev_attr, + char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + + struct netpath *path = container_of(info, struct netpath, dev_info); + + if (!(path->viport->features_supported & VNIC_FEAT_INBOUND_IB_MC)) + return sprintf(buf, "feature not enabled\n"); + + switch (path->viport->mc_info.state) { + case MCAST_STATE_INVALID: + return sprintf(buf, "state=Invalid\n"); + case MCAST_STATE_JOINING: + return sprintf(buf, "state=Joining MGID:" VNIC_GID_FMT "\n", + VNIC_GID_RAW_ARG(path->viport->mc_info.mgid.raw)); + case MCAST_STATE_ATTACHING: + return sprintf(buf, "state=Attaching MGID:" VNIC_GID_FMT + " MLID:%X\n", + VNIC_GID_RAW_ARG(path->viport->mc_info.mgid.raw), + path->viport->mc_info.mlid); + case MCAST_STATE_JOINED_ATTACHED: + return sprintf(buf, + "state=Joined & Attached MGID:" VNIC_GID_FMT + " MLID:%X\n", + VNIC_GID_RAW_ARG(path->viport->mc_info.mgid.raw), + path->viport->mc_info.mlid); + case MCAST_STATE_DETACHING: + return sprintf(buf, "state=Detaching MGID: " VNIC_GID_FMT "\n", + VNIC_GID_RAW_ARG(path->viport->mc_info.mgid.raw)); + case MCAST_STATE_RETRIED: + 
return sprintf(buf, "state=Retries Exceeded\n"); + } + return sprintf(buf, "invalid state\n"); +} + +static DEVICE_ATTR(multicast_state, S_IRUGO, show_multicast_state, NULL); + +static struct attribute *vnic_path_attrs[] = { + &dev_attr_viport_state.attr, + &dev_attr_link_state.attr, + &dev_attr_heartbeat.attr, + &dev_attr_ioc_guid.attr, + &dev_attr_dgid.attr, + &dev_attr_pkey.attr, + &dev_attr_hca_info.attr, + &dev_attr_ioc_string.attr, + &dev_attr_multicast_state.attr, + NULL +}; + +struct attribute_group vnic_path_attr_group = { + .attrs = vnic_path_attrs, +}; + + +static int setup_path_class_files(struct netpath *path, char *name) +{ + init_completion(&path->dev_info.released); + + path->dev_info.dev.class = NULL; + path->dev_info.dev.parent = &path->parent->dev_info.dev; + path->dev_info.dev.release = vnic_release_dev; + snprintf(path->dev_info.dev.bus_id, BUS_ID_SIZE, name); + + if (device_register(&path->dev_info.dev)) { + SYS_ERROR("error in registering path class dev\n"); + goto out; + } + + if (sysfs_create_group(&path->dev_info.dev.kobj, + &vnic_path_attr_group)) { + SYS_ERROR("error in creating vnic path group attrs"); + goto err_path; + } + + return 0; + +err_path: + device_unregister(&path->dev_info.dev); + wait_for_completion(&path->dev_info.released); +out: + return -1; + +} + +static inline void update_dgids(u8 *old, u8 *new, char *vnic_name, + char *path_name) +{ + int i; + + if (!memcmp(old, new, 16)) + return; + + printk(KERN_INFO PFX "Changing dgid from 0x"); + print_dgid(old); + printk(" to 0x"); + print_dgid(new); + printk(" for %s path of %s\n", path_name, vnic_name); + for (i = 0; i < 16; i++) + old[i] = new[i]; +} + +static inline void update_ioc_guids(struct path_param *params, + struct netpath *path, + char *vnic_name, char *path_name) +{ + u64 sid; + + if (path->viport->config->ioc_guid == params->ioc_guid) + return; + + printk(KERN_INFO PFX "Changing IOC GUID from 0x%llx to 0x%llx " + "for %s path of %s\n", + 
__be64_to_cpu(path->viport->config->ioc_guid), + __be64_to_cpu(params->ioc_guid), path_name, vnic_name); + + path->viport->config->ioc_guid = params->ioc_guid; + + sid = (SST_AGN << 56) | (SST_OUI << 32) | (CONTROL_PATH_ID << 8) + | IOC_NUMBER(be64_to_cpu(params->ioc_guid)); + + path->viport->config->control_config.ib_config.service_id = + cpu_to_be64(sid); + + sid = (SST_AGN << 56) | (SST_OUI << 32) | (DATA_PATH_ID << 8) + | IOC_NUMBER(be64_to_cpu(params->ioc_guid)); + + path->viport->config->data_config.ib_config.service_id = + cpu_to_be64(sid); +} + +static inline void update_pkeys(__be16 *old, __be16 *new, char *vnic_name, + char *path_name) +{ + if (*old == *new) + return; + + printk(KERN_INFO PFX "Changing P_Key from 0x%x to 0x%x " + "for %s path of %s\n", *old, *new, + path_name, vnic_name); + *old = *new; +} + +static void update_ioc_strings(struct path_param *params, struct netpath *path, + char *path_name) +{ + if (!strcmp(params->ioc_string, path->viport->config->ioc_string)) + return; + + printk(KERN_INFO PFX "Changing ioc_string to %s for %s path of %s\n", + params->ioc_string, path_name, params->name); + + strcpy(path->viport->config->ioc_string, params->ioc_string); +} + +static void update_path_parameters(struct path_param *params, + struct netpath *path) +{ + update_dgids(path->viport->config->path_info.path.dgid.raw, + params->dgid, params->name, + (path->second_bias ? "secondary" : "primary")); + + update_ioc_guids(params, path, params->name, + (path->second_bias ? "secondary" : "primary")); + + update_pkeys(&path->viport->config->path_info.path.pkey, + ¶ms->pkey, params->name, + (path->second_bias ? "secondary" : "primary")); + + update_ioc_strings(params, path, + (path->second_bias ? 
"secondary" : "primary")); +} + +static ssize_t update_params_and_connect(struct path_param *params, + struct netpath *path, size_t count) +{ + if (is_dgid_zero(params->dgid) && params->ioc_guid != 0 && + params->pkey != 0) { + + if (!memcmp(path->viport->config->path_info.path.dgid.raw, + params->dgid, 16) && + params->ioc_guid == path->viport->config->ioc_guid && + params->pkey == path->viport->config->path_info.path.pkey) { + + printk(KERN_WARNING PFX "All of the dgid, ioc_guid and " + "pkeys are same as the existing" + " one. Not updating values.\n"); + return -EINVAL; + } else { + if (path->viport->state == VIPORT_CONNECTED) { + printk(KERN_WARNING PFX "%s path of %s " + "interface is already in connected " + "state. Not updating values.\n", + (path->second_bias ? "Secondary" : "Primary"), + path->parent->config->name); + return -EINVAL; + } else { + update_path_parameters(params, path); + viport_kick(path->viport); + vnic_disconnected(path->parent, path); + return count; + } + } + } else { + printk(KERN_WARNING PFX "Either dgid, iocguid, pkey is zero. 
" + "No update.\n"); + return -EINVAL; + } +} + +ssize_t vnic_create_primary(struct device *dev, + struct device_attribute *dev_attr, const char *buf, + size_t count) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic_ib_port *target = + container_of(info, struct vnic_ib_port, pdev_info); + + struct path_param param; + int ret = -EINVAL; + struct vnic *vnic; + struct list_head *ptr; + + param.instance = 0; + param.rx_csum = -1; + param.tx_csum = -1; + param.heartbeat = -1; + param.ib_multicast = -1; + *param.ioc_string = '\0'; + + ret = vnic_parse_options(buf, ¶m); + + if (ret) + goto out; + + list_for_each(ptr, &vnic_list) { + vnic = list_entry(ptr, struct vnic, list_ptrs); + if (!strcmp(vnic->config->name, param.name)) { + ret = update_params_and_connect(¶m, + &vnic->primary_path, + count); + goto out; + } + } + + param.ibdev = target->dev->dev; + param.ibport = target; + param.port = target->port_num; + + vnic = create_vnic(¶m); + if (!vnic) { + printk(KERN_ERR PFX "creating vnic failed\n"); + ret = -EINVAL; + goto out; + } + + if (create_netpath(&vnic->primary_path, ¶m)) { + printk(KERN_ERR PFX "creating primary netpath failed\n"); + goto free_vnic; + } + + if (setup_path_class_files(&vnic->primary_path, "primary_path")) + goto free_vnic; + + if (vnic && !vnic->primary_path.viport) { + printk(KERN_ERR PFX "no valid netpaths\n"); + goto free_vnic; + } + + return count; + +free_vnic: + vnic_free(vnic); + ret = -EINVAL; +out: + return ret; +} + +ssize_t vnic_create_secondary(struct device *dev, + struct device_attribute *dev_attr, + const char *buf, size_t count) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic_ib_port *target = + container_of(info, struct vnic_ib_port, pdev_info); + + struct path_param param; + struct vnic *vnic = NULL; + int ret = -EINVAL; + struct list_head *ptr; + int found = 0; + + param.instance = 0; + param.rx_csum = -1; + param.tx_csum = -1; + param.heartbeat = -1; + 
param.ib_multicast = -1; + *param.ioc_string = '\0'; + + ret = vnic_parse_options(buf, ¶m); + + if (ret) + goto out; + + list_for_each(ptr, &vnic_list) { + vnic = list_entry(ptr, struct vnic, list_ptrs); + if (!strncmp(vnic->config->name, param.name, IFNAMSIZ)) { + if (vnic->secondary_path.viport) { + ret = update_params_and_connect(¶m, + &vnic->secondary_path, + count); + goto out; + } + found = 1; + break; + } + } + + if (!found) { + printk(KERN_ERR PFX + "primary connection with name '%s' does not exist\n", + param.name); + ret = -EINVAL; + goto out; + } + + param.ibdev = target->dev->dev; + param.ibport = target; + param.port = target->port_num; + + if (create_netpath(&vnic->secondary_path, ¶m)) { + printk(KERN_ERR PFX "creating secondary netpath failed\n"); + ret = -EINVAL; + goto out; + } + + if (setup_path_class_files(&vnic->secondary_path, "secondary_path")) + goto free_vnic; + + return count; + +free_vnic: + vnic_free(vnic); + ret = -EINVAL; +out: + return ret; +} diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h new file mode 100644 index 0000000..b41e770 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_sys.h @@ -0,0 +1,62 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_SYS_H_INCLUDED +#define VNIC_SYS_H_INCLUDED + +struct dev_info { + struct device dev; + struct completion released; +}; + +extern struct class vnic_class; +extern struct dev_info interface_dev; +extern struct attribute_group vnic_dev_attr_group; +extern struct attribute_group vnic_path_attr_group; +extern struct device_attribute dev_attr_create_primary; +extern struct device_attribute dev_attr_create_secondary; +extern struct device_attribute dev_attr_delete_vnic; + +extern void vnic_release_dev(struct device *dev); + +extern ssize_t vnic_create_primary(struct device *dev, + struct device_attribute *dev_attr, + const char *buf, size_t count); + +extern ssize_t vnic_create_secondary(struct device *dev, + struct device_attribute *dev_attr, + const char *buf, size_t count); + +extern ssize_t vnic_delete(struct device *dev, + struct device_attribute *dev_attr, + const char *buf, size_t count); +#endif /*VNIC_SYS_H_INCLUDED*/ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:20:25 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:50:25 +0530 Subject: [ofa-general] [PATCH 09/13] QLogic VNIC: IB Multicast for Ethernet broadcast/multicast In-Reply-To: 
<20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430172025.31725.97795.stgit@localhost.localdomain> From: Usha Srinivasan Implementation of ethernet broadcasting and multicasting for QLogic VNIC interface by making use of underlying IB multicasting. Signed-off-by: Ramachandra K Signed-off-by: Poornima Kamath Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.c | 332 +++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.h | 76 +++++ 2 files changed, 408 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.c new file mode 100644 index 0000000..044d447 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.c @@ -0,0 +1,332 @@ +/* + * Copyright (c) 2008 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include +#include +#include "vnic_viport.h" +#include "vnic_netpath.h" +#include "vnic_main.h" +#include "vnic_config.h" +#include "vnic_control.h" +#include "vnic_util.h" +#include "vnic_ib.h" + +#define control_ifcfg_name(p)\ + ((p)->parent->parent->parent->config->name) + +#define SET_MCAST_STATE_INVALID \ +do { \ + viport->mc_info.state = MCAST_STATE_INVALID; \ + viport->mc_info.mc = NULL; \ + memset(&viport->mc_info.mgid, 0, sizeof(union ib_gid)); \ +} while (0); + +int vnic_mc_init(struct viport *viport) +{ + MCAST_FUNCTION("vnic_mc_init %p\n", viport); + SET_MCAST_STATE_INVALID; + viport->mc_info.retries = 0; + spin_lock_init(&viport->mc_info.lock); + + return 0; +} + +void vnic_mc_uninit(struct viport *viport) +{ + unsigned long flags; + MCAST_FUNCTION("vnic_mc_uninit %p\n", viport); + + spin_lock_irqsave(&viport->mc_info.lock, flags); + if ((viport->mc_info.state != MCAST_STATE_INVALID) && + (viport->mc_info.state != MCAST_STATE_RETRIED)) { + MCAST_ERROR("%s mcast state is not INVALID or RETRIED %d\n", + control_ifcfg_name(&viport->control), + viport->mc_info.state); + } + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + MCAST_FUNCTION("vnic_mc_uninit done\n"); +} + + +/* This function is called when NEED_MCAST_COMPLETION is set. + * It finishes off the join multicast work. 
+ */ +int vnic_mc_join_handle_completion(struct viport *viport) +{ + unsigned long flags; + unsigned int ret = 0; + + MCAST_FUNCTION("in vnic_mc_join_handle_completion\n"); + spin_lock_irqsave(&viport->mc_info.lock, flags); + if (viport->mc_info.state != MCAST_STATE_JOINING) { + MCAST_ERROR("%s unexpected mcast state in handle_completion: " + " %d\n", control_ifcfg_name(&viport->control), + viport->mc_info.state); + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + return -1; + } + viport->mc_info.state = MCAST_STATE_ATTACHING; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + MCAST_INFO("%s calling ib_attach_mcast %lx mgid:" + VNIC_GID_FMT " mlid:%x\n", + control_ifcfg_name(&viport->control), jiffies, + VNIC_GID_RAW_ARG(viport->mc_info.mgid.raw), + viport->mc_info.mlid); + ret = ib_attach_mcast(viport->mc_data.ib_conn.qp, &viport->mc_info.mgid, + viport->mc_info.mlid); + if (ret) { + MCAST_ERROR("%s attach mcast qp failed %d\n", + control_ifcfg_name(&viport->control), ret); + return -1; + } + MCAST_INFO("%s attached\n", + control_ifcfg_name(&viport->control)); + spin_lock_irqsave(&viport->mc_info.lock, flags); + viport->mc_info.state = MCAST_STATE_JOINED_ATTACHED; + MCAST_INFO("%s qp attached to mcast group\n", + control_ifcfg_name(&viport->control)); + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + return 0; +} + +/* NOTE: ib_sa.h says "returning a non-zero value from this callback will + * result in destroying the multicast tracking structure. 
+ */ +static int vnic_mc_join_complete(int status, + struct ib_sa_multicast *multicast) +{ + struct viport *viport = (struct viport *)multicast->context; + unsigned long flags; + + MCAST_FUNCTION("in vnic_mc_join_complete status:%x\n", status); + if (status) { + spin_lock_irqsave(&viport->mc_info.lock, flags); + if (status == -ENETRESET) { + SET_MCAST_STATE_INVALID; + viport->mc_info.retries = 0; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + MCAST_ERROR("%s got ENETRESET what's the right thing " + "to do?\n", + control_ifcfg_name(&viport->control)); + return status; + } + /* perhaps the mcgroup hasn't yet been created - retry */ + viport->mc_info.retries++; + viport->mc_info.mc = NULL; + if (viport->mc_info.retries > MAX_MCAST_JOIN_RETRIES) { + viport->mc_info.state = MCAST_STATE_RETRIED; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + MCAST_ERROR("%s join failed 0x%x - max retries:%d " + "exceeded\n", + control_ifcfg_name(&viport->control), + status, viport->mc_info.retries); + } else { + viport->mc_info.state = MCAST_STATE_INVALID; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + spin_lock_irqsave(&viport->lock, flags); + viport->updates |= NEED_MCAST_JOIN; + spin_unlock_irqrestore(&viport->lock, flags); + viport_kick(viport); + MCAST_ERROR("%s join failed 0x%x - retrying; " + "retries:%d\n", + control_ifcfg_name(&viport->control), + status, viport->mc_info.retries); + } + return status; + } + + /* finish join work from main state loop for viport - in case + * the work itself cannot be done in a callback environment */ + spin_lock_irqsave(&viport->lock, flags); + viport->mc_info.mlid = be16_to_cpu(multicast->rec.mlid); + viport->updates |= NEED_MCAST_COMPLETION; + spin_unlock_irqrestore(&viport->lock, flags); + viport_kick(viport); + MCAST_INFO("%s set NEED_MCAST_COMPLETION %x %x\n", + control_ifcfg_name(&viport->control), + multicast->rec.mlid, viport->mc_info.mlid); + return 0; +} + +void vnic_mc_join_setup(struct viport *viport, 
union ib_gid *mgid) +{ + unsigned long flags; + + MCAST_FUNCTION("in vnic_mc_join_setup\n"); + spin_lock_irqsave(&viport->mc_info.lock, flags); + if (viport->mc_info.state != MCAST_STATE_INVALID) { + if (viport->mc_info.state == MCAST_STATE_DETACHING) { + MCAST_ERROR("%s detach in progress\n", + control_ifcfg_name(&viport->control)); + } else if (viport->mc_info.state == MCAST_STATE_RETRIED) { + MCAST_ERROR("%s max join retries exceeded\n", + control_ifcfg_name(&viport->control)); + } else { + /* join/attach in progress or done */ + /* verify that the current mgid is same as prev mgid */ + if (memcmp(mgid, &viport->mc_info.mgid, sizeof(union ib_gid)) != 0) { + /* Separate MGID for each IOC */ + MCAST_ERROR("%s Multicast Group MGIDs not " + "unique; mgids: " VNIC_GID_FMT + " " VNIC_GID_FMT "\n", + control_ifcfg_name(&viport->control), + VNIC_GID_RAW_ARG(mgid->raw), + VNIC_GID_RAW_ARG(viport->mc_info.mgid.raw)); + } else + MCAST_INFO("%s join already issued: %d\n", + control_ifcfg_name(&viport->control), + viport->mc_info.state); + + } + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + return; + } + viport->mc_info.mgid = *mgid; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + spin_lock_irqsave(&viport->lock, flags); + viport->updates |= NEED_MCAST_JOIN; + spin_unlock_irqrestore(&viport->lock, flags); + viport_kick(viport); + MCAST_INFO("%s set NEED_MCAST_JOIN \n", + control_ifcfg_name(&viport->control)); +} + +int vnic_mc_join(struct viport *viport) +{ + struct ib_sa_mcmember_rec rec; + ib_sa_comp_mask comp_mask; + unsigned long flags; + + MCAST_FUNCTION("in vnic_mc_join\n"); + if (!viport->mc_data.ib_conn.qp) { + MCAST_ERROR("%s qp is NULL\n", + control_ifcfg_name(&viport->control)); + return -1; + } + spin_lock_irqsave(&viport->mc_info.lock, flags); + if (viport->mc_info.state != MCAST_STATE_INVALID) { + MCAST_INFO("%s join already issued: %d\n", + control_ifcfg_name(&viport->control), + viport->mc_info.state); + 
spin_unlock_irqrestore(&viport->mc_info.lock, flags); + return 0; + } + viport->mc_info.state = MCAST_STATE_JOINING; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + + memset(&rec, 0, sizeof(rec)); + rec.join_state = 2; /* bit 1 is Nonmember */ + rec.mgid = viport->mc_info.mgid; + rec.port_gid = viport->config->path_info.path.sgid; + + comp_mask = IB_SA_MCMEMBER_REC_MGID | + IB_SA_MCMEMBER_REC_PORT_GID | + IB_SA_MCMEMBER_REC_JOIN_STATE; + + MCAST_INFO("%s calling ib_sa_join_multicast %lx mgid:" + VNIC_GID_FMT " port_gid: " VNIC_GID_FMT "\n", + control_ifcfg_name(&viport->control), jiffies, + VNIC_GID_RAW_ARG(rec.mgid.raw), + VNIC_GID_RAW_ARG(rec.port_gid.raw)); + + viport->mc_info.mc = ib_sa_join_multicast(&vnic_sa_client, + viport->config->ibdev, viport->config->port, + &rec, comp_mask, GFP_KERNEL, + vnic_mc_join_complete, viport); + + if (IS_ERR(viport->mc_info.mc)) { + MCAST_ERROR("%s ib_sa_join_multicast failed " VNIC_GID_FMT + ".\n", + control_ifcfg_name(&viport->control), + VNIC_GID_RAW_ARG(rec.mgid.raw)); + spin_lock_irqsave(&viport->mc_info.lock, flags); + viport->mc_info.state = MCAST_STATE_INVALID; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + return -1; + } + MCAST_INFO("%s join issued ib_sa_join_multicast mgid:" + VNIC_GID_FMT " port_gid: " VNIC_GID_FMT "\n", + control_ifcfg_name(&viport->control), + VNIC_GID_RAW_ARG(rec.mgid.raw), + VNIC_GID_RAW_ARG(rec.port_gid.raw)); + + return 0; +} + +void vnic_mc_leave(struct viport *viport) +{ + unsigned long flags; + unsigned int ret; + struct ib_sa_multicast *mc; + + MCAST_FUNCTION("vnic_mc_leave \n"); + + spin_lock_irqsave(&viport->mc_info.lock, flags); + if ((viport->mc_info.state == MCAST_STATE_INVALID) || + (viport->mc_info.state == MCAST_STATE_RETRIED)) { + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + return; + } + + if (viport->mc_info.state == MCAST_STATE_JOINED_ATTACHED) { + + viport->mc_info.state = MCAST_STATE_DETACHING; + spin_unlock_irqrestore(&viport->mc_info.lock, 
flags); + ret = ib_detach_mcast(viport->mc_data.ib_conn.qp, + &viport->mc_info.mgid, + viport->mc_info.mlid); + if (ret) { + MCAST_ERROR("%s detach failed %d\n", + control_ifcfg_name(&viport->control), ret); + return; + } + MCAST_INFO("%s detached successfully\n", + control_ifcfg_name(&viport->control)); + spin_lock_irqsave(&viport->mc_info.lock, flags); + } + mc = viport->mc_info.mc; + SET_MCAST_STATE_INVALID; + viport->mc_info.retries = 0; + spin_unlock_irqrestore(&viport->mc_info.lock, flags); + + if (mc) { + MCAST_INFO("%s calling ib_sa_free_multicast\n", + control_ifcfg_name(&viport->control)); + ib_sa_free_multicast(mc); + } + MCAST_FUNCTION("vnic_mc_leave done\n"); + return; +} + + diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.h new file mode 100644 index 0000000..0e5499d --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_multicast.h @@ -0,0 +1,76 @@ +/* + * Copyright (c) 2008 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution.
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef __VNIC_MULTICAST_H__ +#define __VNIC_MULTICAST_H__ + +enum { + MCAST_STATE_INVALID = 0x00, /* join not attempted or failed */ + MCAST_STATE_JOINING = 0x01, /* join mcgroup in progress */ + MCAST_STATE_ATTACHING = 0x02, /* join completed with success, + * attach qp to mcgroup in progress + */ + MCAST_STATE_JOINED_ATTACHED = 0x03, /* join completed with success */ + MCAST_STATE_DETACHING = 0x04, /* detach qp in progress */ + MCAST_STATE_RETRIED = 0x05, /* retried join and failed */ +}; + +#define MAX_MCAST_JOIN_RETRIES 5 /* used to retry join */ + +struct mc_info { + u8 state; + spinlock_t lock; + union ib_gid mgid; + u16 mlid; + struct ib_sa_multicast *mc; + u8 retries; +}; + + +int vnic_mc_init(struct viport *viport); +void vnic_mc_uninit(struct viport *viport); + +/* This function is called when a viport gets a multicast mgid from EVIC + and must join the multicast group. It sets the NEED_MCAST_JOIN flag, which + results in vnic_mc_join being called later. */ +void vnic_mc_join_setup(struct viport *viport, union ib_gid *mgid); + +/* This function is called when the NEED_MCAST_JOIN flag is set. */ +int vnic_mc_join(struct viport *viport); + +/* This function is called when NEED_MCAST_COMPLETION is set. + It finishes off the join multicast work.
*/ +int vnic_mc_join_handle_completion(struct viport *viport); + +void vnic_mc_leave(struct viport *viport); + +#endif /* __VNIC_MULTICAST_H__ */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:20:55 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:50:55 +0530 Subject: [ofa-general] [PATCH 10/13] QLogic VNIC: Driver Statistics collection In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430172055.31725.70663.stgit@localhost.localdomain> From: Amar Mudrankit Collection of statistics about QLogic VNIC interfaces is implemented in this patch. Signed-off-by: Ramachandra K Signed-off-by: Poornima Kamath --- drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c | 234 ++++++++++++ drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h | 497 +++++++++++++++++++++++++ 2 files changed, 731 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c b/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c new file mode 100644 index 0000000..cebcc26 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.c @@ -0,0 +1,234 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include +#include +#include + +#include "vnic_main.h" + +cycles_t recv_ref; + +/* + * TODO: Statistics reporting for control path, data path, + * RDMA times, IOs etc + * + */ +static ssize_t show_lifetime(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + cycles_t time = get_cycles() - vnic->statistics.start_time; + + return sprintf(buf, "%llu\n", (unsigned long long)time); +} + +static DEVICE_ATTR(lifetime, S_IRUGO, show_lifetime, NULL); + +static ssize_t show_conntime(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + if (vnic->statistics.conn_time) + return sprintf(buf, "%llu\n", + (unsigned long long)vnic->statistics.conn_time); + return 0; +} + +static DEVICE_ATTR(connection_time, S_IRUGO, show_conntime, NULL); + +static ssize_t show_disconnects(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, 
stat_info); + u32 num; + + if (vnic->statistics.disconn_ref) + num = vnic->statistics.disconn_num + 1; + else + num = vnic->statistics.disconn_num; + + return sprintf(buf, "%d\n", num); +} + +static DEVICE_ATTR(disconnects, S_IRUGO, show_disconnects, NULL); + +static ssize_t show_total_disconn_time(struct device *dev, + struct device_attribute *dev_attr, + char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + cycles_t time; + + if (vnic->statistics.disconn_ref) + time = vnic->statistics.disconn_time + + get_cycles() - vnic->statistics.disconn_ref; + else + time = vnic->statistics.disconn_time; + + return sprintf(buf, "%llu\n", (unsigned long long)time); +} + +static DEVICE_ATTR(total_disconn_time, S_IRUGO, show_total_disconn_time, NULL); + +static ssize_t show_carrier_losses(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + u32 num; + + if (vnic->statistics.carrier_ref) + num = vnic->statistics.carrier_off_num + 1; + else + num = vnic->statistics.carrier_off_num; + + return sprintf(buf, "%d\n", num); +} + +static DEVICE_ATTR(carrier_losses, S_IRUGO, show_carrier_losses, NULL); + +static ssize_t show_total_carr_loss_time(struct device *dev, + struct device_attribute *dev_attr, + char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + cycles_t time; + + if (vnic->statistics.carrier_ref) + time = vnic->statistics.carrier_off_time + + get_cycles() - vnic->statistics.carrier_ref; + else + time = vnic->statistics.carrier_off_time; + + return sprintf(buf, "%llu\n", (unsigned long long)time); +} + +static DEVICE_ATTR(total_carrier_loss_time, S_IRUGO, + show_total_carr_loss_time, NULL); + +static ssize_t show_total_recv_time(struct 
device *dev, + struct device_attribute *dev_attr, + char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%llu\n", + (unsigned long long)vnic->statistics.recv_time); +} + +static DEVICE_ATTR(total_recv_time, S_IRUGO, show_total_recv_time, NULL); + +static ssize_t show_recvs(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%d\n", vnic->statistics.recv_num); +} + +static DEVICE_ATTR(recvs, S_IRUGO, show_recvs, NULL); + +static ssize_t show_multicast_recvs(struct device *dev, + struct device_attribute *dev_attr, + char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%d\n", vnic->statistics.multicast_recv_num); +} + +static DEVICE_ATTR(multicast_recvs, S_IRUGO, show_multicast_recvs, NULL); + +static ssize_t show_total_xmit_time(struct device *dev, + struct device_attribute *dev_attr, + char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%llu\n", + (unsigned long long)vnic->statistics.xmit_time); +} + +static DEVICE_ATTR(total_xmit_time, S_IRUGO, show_total_xmit_time, NULL); + +static ssize_t show_xmits(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info *info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%d\n", vnic->statistics.xmit_num); +} + +static DEVICE_ATTR(xmits, S_IRUGO, show_xmits, NULL); + +static ssize_t show_failed_xmits(struct device *dev, + struct device_attribute *dev_attr, char *buf) +{ + struct dev_info 
*info = container_of(dev, struct dev_info, dev); + struct vnic *vnic = container_of(info, struct vnic, stat_info); + + return sprintf(buf, "%d\n", vnic->statistics.xmit_fail); +} + +static DEVICE_ATTR(failed_xmits, S_IRUGO, show_failed_xmits, NULL); + +static struct attribute *vnic_stats_attrs[] = { + &dev_attr_lifetime.attr, + &dev_attr_xmits.attr, + &dev_attr_total_xmit_time.attr, + &dev_attr_failed_xmits.attr, + &dev_attr_recvs.attr, + &dev_attr_multicast_recvs.attr, + &dev_attr_total_recv_time.attr, + &dev_attr_connection_time.attr, + &dev_attr_disconnects.attr, + &dev_attr_total_disconn_time.attr, + &dev_attr_carrier_losses.attr, + &dev_attr_total_carrier_loss_time.attr, + NULL +}; + +struct attribute_group vnic_stats_attr_group = { + .attrs = vnic_stats_attrs, +}; diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h new file mode 100644 index 0000000..af77794 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_stats.h @@ -0,0 +1,497 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef VNIC_STATS_H_INCLUDED +#define VNIC_STATS_H_INCLUDED + +#include "vnic_main.h" +#include "vnic_ib.h" +#include "vnic_sys.h" + +#ifdef CONFIG_INFINIBAND_QLGC_VNIC_STATS + +static inline void vnic_connected_stats(struct vnic *vnic) +{ + if (vnic->statistics.conn_time == 0) { + vnic->statistics.conn_time = + get_cycles() - vnic->statistics.start_time; + } + + if (vnic->statistics.disconn_ref != 0) { + vnic->statistics.disconn_time += + get_cycles() - vnic->statistics.disconn_ref; + vnic->statistics.disconn_num++; + vnic->statistics.disconn_ref = 0; + } + +} + +static inline void vnic_stop_xmit_stats(struct vnic *vnic) +{ + if (vnic->statistics.xmit_ref == 0) + vnic->statistics.xmit_ref = get_cycles(); +} + +static inline void vnic_restart_xmit_stats(struct vnic *vnic) +{ + if (vnic->statistics.xmit_ref != 0) { + vnic->statistics.xmit_off_time += + get_cycles() - vnic->statistics.xmit_ref; + vnic->statistics.xmit_off_num++; + vnic->statistics.xmit_ref = 0; + } +} + +static inline void vnic_recv_pkt_stats(struct vnic *vnic) +{ + vnic->statistics.recv_time += get_cycles() - recv_ref; + vnic->statistics.recv_num++; +} + +static inline void vnic_multicast_recv_pkt_stats(struct vnic *vnic) +{ + vnic->statistics.multicast_recv_num++; +} + +static inline void vnic_pre_pkt_xmit_stats(cycles_t *time) +{ + *time = get_cycles(); +} + +static inline void vnic_post_pkt_xmit_stats(struct vnic *vnic, + cycles_t time) +{ + vnic->statistics.xmit_time += get_cycles() - time; + 
vnic->statistics.xmit_num++; + +} + +static inline void vnic_xmit_fail_stats(struct vnic *vnic) +{ + vnic->statistics.xmit_fail++; +} + +static inline void vnic_carrier_loss_stats(struct vnic *vnic) +{ + if (vnic->statistics.carrier_ref != 0) { + vnic->statistics.carrier_off_time += + get_cycles() - vnic->statistics.carrier_ref; + vnic->statistics.carrier_off_num++; + vnic->statistics.carrier_ref = 0; + } +} + +static inline int vnic_setup_stats_files(struct vnic *vnic) +{ + init_completion(&vnic->stat_info.released); + vnic->stat_info.dev.class = NULL; + vnic->stat_info.dev.parent = &vnic->dev_info.dev; + vnic->stat_info.dev.release = vnic_release_dev; + snprintf(vnic->stat_info.dev.bus_id, BUS_ID_SIZE, + "stats"); + + if (device_register(&vnic->stat_info.dev)) { + SYS_ERROR("create_vnic: error in registering" + " stat class dev\n"); + goto stats_out; + } + + if (sysfs_create_group(&vnic->stat_info.dev.kobj, + &vnic_stats_attr_group)) + goto err_stats_file; + + return 0; +err_stats_file: + device_unregister(&vnic->stat_info.dev); + wait_for_completion(&vnic->stat_info.released); +stats_out: + return -1; +} + +static inline void vnic_cleanup_stats_files(struct vnic *vnic) +{ + /* remove the group from the same kobject it was created on */ + sysfs_remove_group(&vnic->stat_info.dev.kobj, + &vnic_stats_attr_group); + device_unregister(&vnic->stat_info.dev); + wait_for_completion(&vnic->stat_info.released); +} + +static inline void vnic_disconn_stats(struct vnic *vnic) +{ + if (!vnic->statistics.disconn_ref) + vnic->statistics.disconn_ref = get_cycles(); + + if (vnic->statistics.carrier_ref == 0) + vnic->statistics.carrier_ref = get_cycles(); +} + +static inline void vnic_alloc_stats(struct vnic *vnic) +{ + vnic->statistics.start_time = get_cycles(); +} + +static inline void control_note_rsptime_stats(cycles_t *time) +{ + *time = get_cycles(); +} + +static inline void control_update_rsptime_stats(struct control *control, + cycles_t response_time) +{ + response_time -= control->statistics.request_time; +
control->statistics.response_time += response_time; + control->statistics.response_num++; + if (control->statistics.response_max < response_time) + control->statistics.response_max = response_time; + if ((control->statistics.response_min == 0) || + (control->statistics.response_min > response_time)) + control->statistics.response_min = response_time; + +} + +static inline void control_note_reqtime_stats(struct control *control) +{ + control->statistics.request_time = get_cycles(); +} + +static inline void control_timeout_stats(struct control *control) +{ + control->statistics.timeout_num++; +} + +static inline void data_kickreq_stats(struct data *data) +{ + data->statistics.kick_reqs++; +} + +static inline void data_no_xmitbuf_stats(struct data *data) +{ + data->statistics.no_xmit_bufs++; +} + +static inline void data_xmits_stats(struct data *data) +{ + data->statistics.xmit_num++; +} + +static inline void data_recvs_stats(struct data *data) +{ + data->statistics.recv_num++; +} + +static inline void data_note_kickrcv_time(void) +{ + recv_ref = get_cycles(); +} + +static inline void data_rcvkicks_stats(struct data *data) +{ + data->statistics.kick_recvs++; +} + + +static inline void vnic_ib_conntime_stats(struct vnic_ib_conn *ib_conn) +{ + ib_conn->statistics.connection_time = get_cycles(); +} + +static inline void vnic_ib_note_comptime_stats(cycles_t *time) +{ + *time = get_cycles(); +} + +static inline void vnic_ib_callback_stats(struct vnic_ib_conn *ib_conn) +{ + ib_conn->statistics.num_callbacks++; +} + +static inline void vnic_ib_comp_stats(struct vnic_ib_conn *ib_conn, + u32 *comp_num) +{ + ib_conn->statistics.num_ios++; + *comp_num = *comp_num + 1; + +} + +static inline void vnic_ib_io_stats(struct io *io, + struct vnic_ib_conn *ib_conn, + cycles_t comp_time) +{ + if ((io->type == RECV) || (io->type == RECV_UD)) + io->time = comp_time; + else if (io->type == RDMA) { + ib_conn->statistics.rdma_comp_time += comp_time - io->time; + 
ib_conn->statistics.rdma_comp_ios++; + } else if (io->type == SEND) { + ib_conn->statistics.send_comp_time += comp_time - io->time; + ib_conn->statistics.send_comp_ios++; + } +} + +static inline void vnic_ib_maxio_stats(struct vnic_ib_conn *ib_conn, + u32 comp_num) +{ + if (comp_num > ib_conn->statistics.max_ios) + ib_conn->statistics.max_ios = comp_num; +} + +static inline void vnic_ib_connected_time_stats(struct vnic_ib_conn *ib_conn) +{ + ib_conn->statistics.connection_time = + get_cycles() - ib_conn->statistics.connection_time; + +} + +static inline void vnic_ib_pre_rcvpost_stats(struct vnic_ib_conn *ib_conn, + struct io *io, + cycles_t *time) +{ + *time = get_cycles(); + if (io->time != 0) { + ib_conn->statistics.recv_comp_time += *time - io->time; + ib_conn->statistics.recv_comp_ios++; + } + +} + +static inline void vnic_ib_post_rcvpost_stats(struct vnic_ib_conn *ib_conn, + cycles_t time) +{ + ib_conn->statistics.recv_post_time += get_cycles() - time; + ib_conn->statistics.recv_post_ios++; +} + +static inline void vnic_ib_pre_sendpost_stats(struct io *io, + cycles_t *time) +{ + io->time = *time = get_cycles(); +} + +static inline void vnic_ib_post_sendpost_stats(struct vnic_ib_conn *ib_conn, + struct io *io, + cycles_t time) +{ + time = get_cycles() - time; + if (io->swr.opcode == IB_WR_RDMA_WRITE) { + ib_conn->statistics.rdma_post_time += time; + ib_conn->statistics.rdma_post_ios++; + } else { + ib_conn->statistics.send_post_time += time; + ib_conn->statistics.send_post_ios++; + } +} +#else /*CONFIG_INIFINIBAND_VNIC_STATS*/ + +static inline void vnic_connected_stats(struct vnic *vnic) +{ + ; +} + +static inline void vnic_stop_xmit_stats(struct vnic *vnic) +{ + ; +} + +static inline void vnic_restart_xmit_stats(struct vnic *vnic) +{ + ; +} + +static inline void vnic_recv_pkt_stats(struct vnic *vnic) +{ + ; +} + +static inline void vnic_multicast_recv_pkt_stats(struct vnic *vnic) +{ + ; +} + +static inline void vnic_pre_pkt_xmit_stats(cycles_t *time) +{ + ; +} 
+ +static inline void vnic_post_pkt_xmit_stats(struct vnic *vnic, + cycles_t time) +{ + ; +} + +static inline void vnic_xmit_fail_stats(struct vnic *vnic) +{ + ; +} + +static inline int vnic_setup_stats_files(struct vnic *vnic) +{ + return 0; +} + +static inline void vnic_cleanup_stats_files(struct vnic *vnic) +{ + ; +} + +static inline void vnic_carrier_loss_stats(struct vnic *vnic) +{ + ; +} + +static inline void vnic_disconn_stats(struct vnic *vnic) +{ + ; +} + +static inline void vnic_alloc_stats(struct vnic *vnic) +{ + ; +} + +static inline void control_note_rsptime_stats(cycles_t *time) +{ + ; +} + +static inline void control_update_rsptime_stats(struct control *control, + cycles_t response_time) +{ + ; +} + +static inline void control_note_reqtime_stats(struct control *control) +{ + ; +} + +static inline void control_timeout_stats(struct control *control) +{ + ; +} + +static inline void data_kickreq_stats(struct data *data) +{ + ; +} + +static inline void data_no_xmitbuf_stats(struct data *data) +{ + ; +} + +static inline void data_xmits_stats(struct data *data) +{ + ; +} + +static inline void data_recvs_stats(struct data *data) +{ + ; +} + +static inline void data_note_kickrcv_time(void) +{ + ; +} + +static inline void data_rcvkicks_stats(struct data *data) +{ + ; +} + +static inline void vnic_ib_conntime_stats(struct vnic_ib_conn *ib_conn) +{ + ; +} + +static inline void vnic_ib_note_comptime_stats(cycles_t *time) +{ + ; +} + +static inline void vnic_ib_callback_stats(struct vnic_ib_conn *ib_conn) + +{ + ; +} +static inline void vnic_ib_comp_stats(struct vnic_ib_conn *ib_conn, + u32 *comp_num) +{ + ; +} + +static inline void vnic_ib_io_stats(struct io *io, + struct vnic_ib_conn *ib_conn, + cycles_t comp_time) +{ + ; +} + +static inline void vnic_ib_maxio_stats(struct vnic_ib_conn *ib_conn, + u32 comp_num) +{ + ; +} + +static inline void vnic_ib_connected_time_stats(struct vnic_ib_conn *ib_conn) +{ + ; +} + +static inline void 
vnic_ib_pre_rcvpost_stats(struct vnic_ib_conn *ib_conn, + struct io *io, + cycles_t *time) +{ + ; +} + +static inline void vnic_ib_post_rcvpost_stats(struct vnic_ib_conn *ib_conn, + cycles_t time) +{ + ; +} + +static inline void vnic_ib_pre_sendpost_stats(struct io *io, + cycles_t *time) +{ + ; +} + +static inline void vnic_ib_post_sendpost_stats(struct vnic_ib_conn *ib_conn, + struct io *io, + cycles_t time) +{ + ; +} +#endif /* CONFIG_INFINIBAND_QLGC_VNIC_STATS */ + +#endif /* VNIC_STATS_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:21:26 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:51:26 +0530 Subject: [ofa-general] [PATCH 11/13] QLogic VNIC: Driver utility file - implements various utility macros In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430172126.31725.48554.stgit@localhost.localdomain> From: Poornima Kamath This patch adds the driver utility file, which mainly contains utility macros for debugging the QLogic VNIC driver. Signed-off-by: Ramachandra K Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/vnic_util.h | 251 ++++++++++++++++++++++ 1 files changed, 251 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/vnic_util.h diff --git a/drivers/infiniband/ulp/qlgc_vnic/vnic_util.h b/drivers/infiniband/ulp/qlgc_vnic/vnic_util.h new file mode 100644 index 0000000..4d7d540 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/vnic_util.h @@ -0,0 +1,251 @@ +/* + * Copyright (c) 2006 QLogic, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses.
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#ifndef VNIC_UTIL_H_INCLUDED +#define VNIC_UTIL_H_INCLUDED + +#define MODULE_NAME "QLGC_VNIC" + +#define VNIC_MAJORVERSION 1 +#define VNIC_MINORVERSION 1 + +#define is_power_of2(value) (((value) & ((value) - 1)) == 0) +#define ALIGN_DOWN(x, a) ((x)&(~((a)-1))) + +extern u32 vnic_debug; + +enum { + DEBUG_IB_INFO = 0x00000001, + DEBUG_IB_FUNCTION = 0x00000002, + DEBUG_IB_FSTATUS = 0x00000004, + DEBUG_IB_ASSERTS = 0x00000008, + DEBUG_CONTROL_INFO = 0x00000010, + DEBUG_CONTROL_FUNCTION = 0x00000020, + DEBUG_CONTROL_PACKET = 0x00000040, + DEBUG_CONFIG_INFO = 0x00000100, + DEBUG_DATA_INFO = 0x00001000, + DEBUG_DATA_FUNCTION = 0x00002000, + DEBUG_NETPATH_INFO = 0x00010000, + DEBUG_VIPORT_INFO = 0x00100000, + DEBUG_VIPORT_FUNCTION = 0x00200000, + DEBUG_LINK_STATE = 0x00400000, + DEBUG_VNIC_INFO = 0x01000000, + DEBUG_VNIC_FUNCTION = 0x02000000, + DEBUG_MCAST_INFO = 0x04000000, + DEBUG_MCAST_FUNCTION = 0x08000000, + DEBUG_SYS_INFO = 0x10000000, + DEBUG_SYS_VERBOSE = 0x40000000 +}; + +#ifdef CONFIG_INFINIBAND_QLGC_VNIC_DEBUG +#define PRINT(level, x, fmt, arg...) \ + printk(level "%s: %s: %s, line %d: " fmt, \ + MODULE_NAME, x, __FILE__, __LINE__, ##arg) + +#define PRINT_CONDITIONAL(level, x, condition, fmt, arg...) \ + do { \ + if (condition) \ + printk(level "%s: %s: %s, line %d: " fmt, \ + MODULE_NAME, x, __FILE__, __LINE__, \ + ##arg); \ + } while (0) +#else +#define PRINT(level, x, fmt, arg...) \ + printk(level "%s: " fmt, MODULE_NAME, ##arg) + +#define PRINT_CONDITIONAL(level, x, condition, fmt, arg...) \ + do { \ + if (condition) \ + printk(level "%s: %s: " fmt, \ + MODULE_NAME, x, ##arg); \ + } while (0) +#endif /*CONFIG_INFINIBAND_QLGC_VNIC_DEBUG*/ + +#define IB_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "IB", fmt, ##arg) +#define IB_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "IB", fmt, ##arg) + +#define IB_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "IB", \ + (vnic_debug & DEBUG_IB_FUNCTION), \ + fmt, ##arg) + +#define IB_INFO(fmt, arg...)
\ + PRINT_CONDITIONAL(KERN_INFO, \ + "IB", \ + (vnic_debug & DEBUG_IB_INFO), \ + fmt, ##arg) + +#define IB_ASSERT(x) \ + do { \ + if ((vnic_debug & DEBUG_IB_ASSERTS) && !(x)) \ + panic("%s assertion failed, file: %s," \ + " line %d: ", \ + MODULE_NAME, __FILE__, __LINE__); \ + } while (0) + +#define CONTROL_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "CONTROL", fmt, ##arg) +#define CONTROL_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "CONTROL", fmt, ##arg) + +#define CONTROL_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "CONTROL", \ + (vnic_debug & DEBUG_CONTROL_INFO), \ + fmt, ##arg) + +#define CONTROL_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "CONTROL", \ + (vnic_debug & DEBUG_CONTROL_FUNCTION), \ + fmt, ##arg) + +#define CONTROL_PACKET(pkt) \ + do { \ + if (vnic_debug & DEBUG_CONTROL_PACKET) \ + control_log_control_packet(pkt); \ + } while (0) + +#define CONFIG_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "CONFIG", fmt, ##arg) +#define CONFIG_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "CONFIG", fmt, ##arg) + +#define CONFIG_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "CONFIG", \ + (vnic_debug & DEBUG_CONFIG_INFO), \ + fmt, ##arg) + +#define DATA_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "DATA", fmt, ##arg) +#define DATA_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "DATA", fmt, ##arg) + +#define DATA_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "DATA", \ + (vnic_debug & DEBUG_DATA_INFO), \ + fmt, ##arg) + +#define DATA_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "DATA", \ + (vnic_debug & DEBUG_DATA_FUNCTION), \ + fmt, ##arg) + + +#define MCAST_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "MCAST", fmt, ##arg) +#define MCAST_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "MCAST", fmt, ##arg) + +#define MCAST_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "MCAST", \ + (vnic_debug & DEBUG_MCAST_INFO), \ + fmt, ##arg) + +#define MCAST_FUNCTION(fmt, arg...)
\ + PRINT_CONDITIONAL(KERN_INFO, \ + "MCAST", \ + (vnic_debug & DEBUG_MCAST_FUNCTION), \ + fmt, ##arg) + +#define NETPATH_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "NETPATH", fmt, ##arg) +#define NETPATH_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "NETPATH", fmt, ##arg) + +#define NETPATH_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "NETPATH", \ + (vnic_debug & DEBUG_NETPATH_INFO), \ + fmt, ##arg) + +#define VIPORT_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "VIPORT", fmt, ##arg) +#define VIPORT_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "VIPORT", fmt, ##arg) + +#define VIPORT_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "VIPORT", \ + (vnic_debug & DEBUG_VIPORT_INFO), \ + fmt, ##arg) + +#define VIPORT_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "VIPORT", \ + (vnic_debug & DEBUG_VIPORT_FUNCTION), \ + fmt, ##arg) + +#define LINK_STATE(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "LINK", \ + (vnic_debug & DEBUG_LINK_STATE), \ + fmt, ##arg) + +#define VNIC_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "NIC", fmt, ##arg) +#define VNIC_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "NIC", fmt, ##arg) +#define VNIC_INIT(fmt, arg...) \ + PRINT(KERN_INFO, "NIC", fmt, ##arg) + +#define VNIC_INFO(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "NIC", \ + (vnic_debug & DEBUG_VNIC_INFO), \ + fmt, ##arg) + +#define VNIC_FUNCTION(fmt, arg...) \ + PRINT_CONDITIONAL(KERN_INFO, \ + "NIC", \ + (vnic_debug & DEBUG_VNIC_FUNCTION), \ + fmt, ##arg) + +#define SYS_PRINT(fmt, arg...) \ + PRINT(KERN_INFO, "SYS", fmt, ##arg) +#define SYS_ERROR(fmt, arg...) \ + PRINT(KERN_ERR, "SYS", fmt, ##arg) + +#define SYS_INFO(fmt, arg...) 
\ + PRINT_CONDITIONAL(KERN_INFO, \ + "SYS", \ + (vnic_debug & DEBUG_SYS_INFO), \ + fmt, ##arg) + +#endif /* VNIC_UTIL_H_INCLUDED */ From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:21:56 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:51:56 +0530 Subject: [ofa-general] [PATCH 12/13] QLogic VNIC: Driver Kconfig and Makefile. In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430172156.31725.94843.stgit@localhost.localdomain> From: Ramachandra K Kconfig and Makefile for the QLogic VNIC driver. Signed-off-by: Poornima Kamath Signed-off-by: Amar Mudrankit --- drivers/infiniband/ulp/qlgc_vnic/Kconfig | 28 ++++++++++++++++++++++++++++ drivers/infiniband/ulp/qlgc_vnic/Makefile | 13 +++++++++++++ 2 files changed, 41 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/qlgc_vnic/Kconfig create mode 100644 drivers/infiniband/ulp/qlgc_vnic/Makefile diff --git a/drivers/infiniband/ulp/qlgc_vnic/Kconfig b/drivers/infiniband/ulp/qlgc_vnic/Kconfig new file mode 100644 index 0000000..6a08770 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/Kconfig @@ -0,0 +1,28 @@ +config INFINIBAND_QLGC_VNIC + tristate "QLogic VNIC - Support for QLogic Ethernet Virtual I/O Controller" + depends on INFINIBAND && NETDEVICES && INET + ---help--- + Support for the QLogic Ethernet Virtual I/O Controller + (EVIC). In conjunction with the EVIC, this provides virtual + ethernet interfaces and transports ethernet packets over + InfiniBand so that you can communicate with Ethernet networks + using your IB device. + +config INFINIBAND_QLGC_VNIC_DEBUG + bool "QLogic VNIC Verbose debugging" + depends on INFINIBAND_QLGC_VNIC + default n + ---help--- + This option causes verbose debugging code to be compiled + into the QLogic VNIC driver. The output can be turned on via the + vnic_debug module parameter. 
+ +config INFINIBAND_QLGC_VNIC_STATS + bool "QLogic VNIC Statistics" + depends on INFINIBAND_QLGC_VNIC + default n + ---help--- + This option compiles statistics collecting code into the + data path of the QLogic VNIC driver to help in profiling and fine + tuning. This adds some overhead in the interest of gathering + data. diff --git a/drivers/infiniband/ulp/qlgc_vnic/Makefile b/drivers/infiniband/ulp/qlgc_vnic/Makefile new file mode 100644 index 0000000..509dd67 --- /dev/null +++ b/drivers/infiniband/ulp/qlgc_vnic/Makefile @@ -0,0 +1,13 @@ +obj-$(CONFIG_INFINIBAND_QLGC_VNIC) += qlgc_vnic.o + +qlgc_vnic-y := vnic_main.o \ + vnic_ib.o \ + vnic_viport.o \ + vnic_control.o \ + vnic_data.o \ + vnic_netpath.o \ + vnic_config.o \ + vnic_sys.o \ + vnic_multicast.o + +qlgc_vnic-$(CONFIG_INFINIBAND_QLGC_VNIC_STATS) += vnic_stats.o From ramachandra.kuchimanchi at qlogic.com Wed Apr 30 10:22:26 2008 From: ramachandra.kuchimanchi at qlogic.com (Ramachandra K) Date: Wed, 30 Apr 2008 22:52:26 +0530 Subject: [ofa-general] [PATCH 13/13] QLogic VNIC: Modifications to IB Kconfig and Makefile In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: <20080430172226.31725.57890.stgit@localhost.localdomain> From: Ramachandra K This patch modifies the top-level InfiniBand Kconfig and Makefile to include the QLogic VNIC as a new ULP.
Signed-off-by: Poornima Kamath Signed-off-by: Amar Mudrankit --- drivers/infiniband/Kconfig | 2 ++ drivers/infiniband/Makefile | 1 + 2 files changed, 3 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index a5dc78a..0775df5 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -53,4 +53,6 @@ source "drivers/infiniband/ulp/srp/Kconfig" source "drivers/infiniband/ulp/iser/Kconfig" +source "drivers/infiniband/ulp/qlgc_vnic/Kconfig" + endif # INFINIBAND diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile index ed35e44..845271e 100644 --- a/drivers/infiniband/Makefile +++ b/drivers/infiniband/Makefile @@ -9,3 +9,4 @@ obj-$(CONFIG_INFINIBAND_NES) += hw/nes/ obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/ obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/ obj-$(CONFIG_INFINIBAND_ISER) += ulp/iser/ +obj-$(CONFIG_INFINIBAND_QLGC_VNIC) += ulp/qlgc_vnic/ From eli at dev.mellanox.co.il Wed Apr 30 10:39:16 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 30 Apr 2008 20:39:16 +0300 Subject: [ofa-general] [PATCH] IB/ipoib: fix net queue lockup Message-ID: <1209577156.1790.11.camel@mtls03> >From 1644c62982335b5cf67300ccba2533016e240d6a Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Wed, 30 Apr 2008 20:37:31 +0300 Subject: [PATCH] IB/ipoib: fix net queue lockup Fix the lockup of the net queue introduced in the split CQ patch. The idea is to arm the send CQ just before posting the last send request to the QP. When the completion handler is called, drain the CQ. Since not all the CQEs might already be in the CQ, verify that the net queue has been woken up. If not, arm a timer and drain again from the timer function. In order to reduce the number of cases in which the queue is stopped, we should use a larger tx queue. Roland, we have seen a few other cases where a large tx queue is needed. I think we should choose a larger default value than the current 64. How about 256?
--- drivers/infiniband/ulp/ipoib/ipoib.h | 2 + drivers/infiniband/ulp/ipoib/ipoib_ib.c | 47 +++++++++++++++++++++++++--- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 3 +- 3 files changed, 46 insertions(+), 6 deletions(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 9044f88..b46baf2 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -334,6 +334,7 @@ struct ipoib_dev_priv { #endif int hca_caps; struct ipoib_ethtool_st ethtool; + struct timer_list poll_timer; }; struct ipoib_ah { @@ -404,6 +405,7 @@ extern struct workqueue_struct *ipoib_workqueue; int ipoib_poll(struct napi_struct *napi, int budget); void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr); +void send_comp_handler(struct ib_cq *cq, void *dev_ptr); struct ipoib_ah *ipoib_create_ah(struct net_device *dev, struct ib_pd *pd, struct ib_ah_attr *attr); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 97b815c..e620a90 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -461,6 +461,26 @@ void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) netif_rx_schedule(dev, &priv->napi); } +static void drain_tx_cq(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + unsigned long flags; + + spin_lock_irqsave(&priv->tx_lock, flags); + while(poll_tx(priv)) + ; /* nothing */ + + if (netif_queue_stopped(dev)) + mod_timer(&priv->poll_timer, jiffies + 1); + + spin_unlock_irqrestore(&priv->tx_lock, flags); +} + +void send_comp_handler(struct ib_cq *cq, void *dev_ptr) +{ + drain_tx_cq((struct net_device *)dev_ptr); +} + static inline int post_send(struct ipoib_dev_priv *priv, unsigned int wr_id, struct ib_ah *address, u32 qpn, @@ -555,12 +575,22 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, else priv->tx_wr.send_flags &= ~IB_SEND_IP_CSUM; + if (++priv->tx_outstanding == 
ipoib_sendq_size) { + ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); + if (ib_req_notify_cq(priv->send_cq, IB_CQ_NEXT_COMP)) + ipoib_warn(priv, "request notify on send queue failed\n"); + netif_stop_queue(dev); + } + if (unlikely(post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), address->ah, qpn, tx_req, phead, hlen))) { ipoib_warn(priv, "post_send failed\n"); ++dev->stats.tx_errors; + --priv->tx_outstanding; ipoib_dma_unmap_tx(priv->ca, tx_req); dev_kfree_skb_any(skb); + if (netif_queue_stopped(dev)) + netif_wake_queue(dev); } else { dev->trans_start = jiffies; @@ -568,14 +598,11 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, ++priv->tx_head; skb_orphan(skb); - if (++priv->tx_outstanding == ipoib_sendq_size) { - ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); - netif_stop_queue(dev); - } } if (unlikely(priv->tx_outstanding > MAX_SEND_CQE)) - poll_tx(priv); + while(poll_tx(priv)) + ; /* nothing */ } static void __ipoib_reap_ah(struct net_device *dev) @@ -609,6 +636,11 @@ void ipoib_reap_ah(struct work_struct *work) round_jiffies_relative(HZ)); } +static void ipoib_ib_tx_timer_func(unsigned long ctx) +{ + drain_tx_cq((struct net_device *)ctx); +} + int ipoib_ib_dev_open(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -645,6 +677,10 @@ int ipoib_ib_dev_open(struct net_device *dev) queue_delayed_work(ipoib_workqueue, &priv->ah_reap_task, round_jiffies_relative(HZ)); + init_timer(&priv->poll_timer); + priv->poll_timer.function = ipoib_ib_tx_timer_func; + priv->poll_timer.data = (unsigned long)dev; + set_bit(IPOIB_FLAG_INITIALIZED, &priv->flags); return 0; @@ -810,6 +846,7 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush) ipoib_dbg(priv, "All sends and receives done.\n"); timeout: + del_timer_sync(&priv->poll_timer); qp_attr.qp_state = IB_QPS_RESET; if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE)) ipoib_warn(priv, "Failed to modify QP to RESET state\n"); diff --git 
a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index c1e7ece..706384d 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -187,7 +187,8 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) goto out_free_mr; } - priv->send_cq = ib_create_cq(priv->ca, NULL, NULL, dev, ipoib_sendq_size, 0); + priv->send_cq = ib_create_cq(priv->ca, send_comp_handler, NULL, dev, + ipoib_sendq_size, 0); if (IS_ERR(priv->send_cq)) { printk(KERN_WARNING "%s: failed to create send CQ\n", ca->name); goto out_free_recv_cq; -- 1.5.5 From swise at opengridcomputing.com Wed Apr 30 11:28:55 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 30 Apr 2008 13:28:55 -0500 Subject: [ofa-general] Re: iwarp-specific async events In-Reply-To: References: <4818A243.1090201@opengridcomputing.com> Message-ID: <4818BA67.9000201@opengridcomputing.com> Roland Dreier wrote: > > I'm looking for a good way to trigger iwarp QP flushing on a normal > > disconnect for user mode QPs. The async event notification provider > > ops function is one way I can do it easily with the currently > > infrastructure, if we add some new event types. For example, if a > > fatal error occurs on a QP which causes the connection to be aborted, > > then the kernel driver will mark the user qp as "in error" and post a > > FATAL_QP event. When the app reaps that event, the libcxgb3 async > > event ops function will flush the user's qp. However for a normal non > > fatal close, no async event is posted. But one should be. The iWARP > > verbs specify many async event types that I think we need to add at > > some point. Case in point: > > > > LLP Close Complete (qp event) - The TCP connection completed and no > > SQ WQEs were flushed (normal close) > > Yeah, it makes sense just to add any iWARP events that make sense and > don't fit the existing set of IB events. 
We already have IB-specific > stuff for path migration etc. > > > There is a whole slew of other events. The above event, however, is > key in that libcxgb3 could trigger a qp flush when this event is > reaped by the application. Currently, the flushing of the QP is only > triggered by fatal connections errors as described above and/or if the > application tries to post on a QP that has been marked in error by the > kernel. However, If the app does neither, then the flush never > happens. > > On the other hand, how does cxgb3 know when an application has reaped > the event? Do we need to add code to the uverbs module to know when an > async event has reached userspace? > > I meant libcxgb3, not the kernel modules. The kernel driver knows the connection went down and the qp needs flushing. That's who posted the async event. The driver just needs a way to kick the library to do the flush because the kernel driver cannot touch the user structs (without painful synchronization). So the library will discover this when the app reaps the async event via the context ops async_event function that libcxgb3 registers. Steve. From akepner at sgi.com Wed Apr 30 12:23:54 2008 From: akepner at sgi.com (akepner at sgi.com) Date: Wed, 30 Apr 2008 12:23:54 -0700 Subject: [ofa-general] IPoIB-UD TX timeouts (OFED 1.2) Message-ID: <20080430192354.GG26724@sgi.com> At a customer site running OFED 1.2 we are seeing the following - after ~10s of hours of stressing IPoIB, the card apparently stops generating TX completions. (These are MT25204 cards in x86_64 boxes, and we've seen this with a couple f/w versions, including the latest.) We get something like: kernel: NETDEV WATCHDOG: ib0: transmit timed out kernel: ib0: transmit timeout: latency 1972 msecs kernel: ib0: queue stopped 1, tx_head 3271, tx_tail 3207 and that repeats "forever". And to simplify things, we can produce this behavior in datagram mode.
As long as only datagram mode is in use, the TX code in the IPoIB driver seems quite straightforward. The only reason I can imagine that we'd fail to get a timely TX completion would be if link-level flow control were to throttle us. And I'd expect that to be a transient condition... Am I overlooking something? Anyone seen similar? Suggestions for debugging? -- Arthur From liranl at mellanox.co.il Wed Apr 30 12:56:28 2008 From: liranl at mellanox.co.il (Liran Liss) Date: Wed, 30 Apr 2008 22:56:28 +0300 Subject: [ofa-general][PATCH] Re: mlx4: Completion EQ per cpu (MP support, Patch 10) In-Reply-To: Message-ID: <40FA0A8088E8A441973D37502F00933E3A24@mtlexch01.mtl.com> > > I would just like to see an approach that is fully thought through and > gives a way for applications/kernel drivers to choose a CQ vector based > on some information about what CPU it will go to. > Isn't the decision of which CPU an MSI-X is routed to (and hence, to which CPU an EQ is bound to) determined by userspace? (either by the irq balancer process or by manually setting /proc/irq/<irq>/smp_affinity)? I am not sure we aren't better off leaving this to user-space: both application and interrupt affinity are administrative tasks. We can also use installation scripts to set a "default" configuration in which vector 0 is bound to cpu0, vector 1 is bound to cpu1, etc. > If we want to add a way to allow a request for round-robin, that is > fine, but I don't think we want to change the default to round-robin, > unless someone can come up with a workload where it actually helps. Several IPoIB partitions can easily saturate a single core if their Rx interrupts are not handled by several CPUs. This is not any different from multiple Ethernet NICs whose interrupts are balanced today by the irq balancer. We can argue that IPoIB can use a special "round-robin" vector while leaving the default vector fixed to a single EQ.
However, there is essentially no difference between IPoIB and other IB ULPs: an IB HCA is actually a platform for other services, each with its own queues that are directly accessed by HW, each with its own CQs and interrupt moderation. Putting all these ULPs on a single EQ will prevent interrupt balancing. What are we risking in making the default action to spread interrupts? --Liran From eli at dev.mellanox.co.il Wed Apr 30 13:00:55 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 30 Apr 2008 23:00:55 +0300 Subject: [ofa-general] IPoIB-UD TX timeouts (OFED 1.2) In-Reply-To: <20080430192354.GG26724@sgi.com> References: <20080430192354.GG26724@sgi.com> Message-ID: <4e6a6b3c0804301300q57b4b562r854e337ff8706222@mail.gmail.com> Artur, when it happens please: 1. Check the link error counters. 2. Disconnect and reconnect the cable and see if it recovers. On 4/30/08, akepner at sgi.com wrote: > > At a customer site running OFED 1.2 we are seeing the > following - after ~10s of hours of stressing IPoIB, > the card apparently stops generating TX completions. > (These are MT25204 cards in x86_64 boxes, and we've seen > this with a couple f/w versions, including the latest.) > > We get something like: > > kernel: NETDEV WATCHDOG: ib0: transmit timed out > kernel: ib0: transmit timeout: latency 1972 msecs > kernel: ib0: queue stopped 1, tx_head 3271, tx_tail 3207 > > and that repeats "forever". > > And to simplify things, we can produce this behavior in > datagram mode. > > As long as only datagram mode is in use, the TX code in the > IPoIB driver seems quite straightforward. The only reason I > can imagine that we'd fail to get a timely TX completion > would be if link-level flow control were to throttle us. And > I'd expect that to be a transient condition... Am I > ovelooking something? Anyone seen similar? Suggestions for > debugging? 
> > -- > Arthur > > _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From eli at dev.mellanox.co.il Wed Apr 30 13:02:34 2008 From: eli at dev.mellanox.co.il (Eli Cohen) Date: Wed, 30 Apr 2008 23:02:34 +0300 Subject: [ofa-general] IPoIB-UD TX timeouts (OFED 1.2) In-Reply-To: <4e6a6b3c0804301300q57b4b562r854e337ff8706222@mail.gmail.com> References: <20080430192354.GG26724@sgi.com> <4e6a6b3c0804301300q57b4b562r854e337ff8706222@mail.gmail.com> Message-ID: <4e6a6b3c0804301302i1fc42d90u9a0ac7be9048b8eb@mail.gmail.com> On 4/30/08, Eli Cohen wrote: > Artur, > when it happens please: > 1. Check the link error counters. > 2. Disconnect and reconnect the cable and see if it recovers. > Sorry for misspelling your name :-) From makc at sgi.com Wed Apr 30 13:59:47 2008 From: makc at sgi.com (Max Matveev) Date: Thu, 1 May 2008 06:59:47 +1000 Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets Message-ID: <18456.56771.908062.459625@kuku.melbourne.sgi.com> IB GID has the same format as IPv6 address, IPv6 addresses are resolvable via DNS' AAAA or A6 records, you can go from IPv4 to name to IPv6 address without reinventing the wheel. It would not help with replacing arp use in rdma_cm though. max From swise at opengridcomputing.com Wed Apr 30 14:21:09 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 30 Apr 2008 16:21:09 -0500 Subject: [ofa-general] [GIT PULL ofed-1.3.1] - chelsio changes for ofed-1.3.1 Message-ID: <4818E2C5.7060907@opengridcomputing.com> Vlad, Please pull from: git://git.openfabrics.org/~swise/ofed-1.3 ofed_kernel This will sync up ofed-1.3.1 with all the important upstream fixes since ofed-1.3.
The patch files added are: kernel_patches/fixes/iw_cxgb3_0080_Fail_Loopback_Connections.patch kernel_patches/fixes/iw_cxgb3_0090_Fix_shift_calc_in_build_phys_page_list_for_1-entry_page_lists.patch kernel_patches/fixes/iw_cxgb3_0100_Return_correct_max_inline_data_when_creating_a_QP.patch kernel_patches/fixes/iw_cxgb3_0110_Fix_iwch_create_cq_off-by-one_error.patch kernel_patches/fixes/iw_cxgb3_0120_Dont_access_a_cm_id_after_dropping_reference.patch kernel_patches/fixes/iw_cxgb3_0130_Correctly_set_the_max_mr_size_device_attribute.patch kernel_patches/fixes/iw_cxgb3_0140_Correctly_serialize_peer_abort_path.patch kernel_patches/fixes/iw_cxgb3_0150_Support_peer-2-peer_connection_setup.patch Thanks, Steve. From swise at opengridcomputing.com Wed Apr 30 14:23:40 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 30 Apr 2008 16:23:40 -0500 Subject: [ofa-general] [GIT PULL ofed-1.3.1] libcxgb3 version 1.2.0 Message-ID: <4818E35C.4050206@opengridcomputing.com> Vlad, Please pull in version 1.2.0 of libcxgb3. This is needed for the ofed-1.3.1 kernel drivers. Pull from: git://git.openfabrics.org/~swise/libcxgb3 ofed_1_3_1 Thanks, Steve. From jgunthorpe at obsidianresearch.com Wed Apr 30 14:30:51 2008 From: jgunthorpe at obsidianresearch.com (Jason Gunthorpe) Date: Wed, 30 Apr 2008 15:30:51 -0600 Subject: [ofa-general] mapping IP addresses to GIDs across IP subnets In-Reply-To: <18456.56771.908062.459625@kuku.melbourne.sgi.com> References: <18456.56771.908062.459625@kuku.melbourne.sgi.com> Message-ID: <20080430213051.GX24525@obsidianresearch.com> On Thu, May 01, 2008 at 06:59:47AM +1000, Max Matveev wrote: > IB GID has the same format as IPv6 address, IPv6 addresses are > resolvable via DNS' AAAA or A6 records, you can go from IPv4 to name > to IPv6 address without reinventing the wheel. Well, you can't just assume that a AAAA record associated with the reverse of a IPv4 is a GID - it could be a legitimate IPv6 address. 
The GID space and IPv6 space are completely distinct, despite the same format of the address. The only way I could see to do this with DNS is to introduce a new record type for GIDs.. Alternatively, you could use DNS to manage a mapping table, ala the reverse map: 1.0.0.10.ipv4.ibta-addr. AAAA fd83:609c:bdc8:1:213:72ff:fe29:e65d Jason From roland.list at gmail.com Wed Apr 30 15:21:17 2008 From: roland.list at gmail.com (Roland Dreier) Date: Wed, 30 Apr 2008 15:21:17 -0700 Subject: [ofa-general] [GIT PULL ofed-1.3.1] - chelsio changes for ofed-1.3.1 In-Reply-To: <4818E2C5.7060907@opengridcomputing.com> References: <4818E2C5.7060907@opengridcomputing.com> Message-ID: Steve -- did the IRD/ORD mixup fix get included? (It's 1f71f503 "RDMA/cxgb3: Program hardware IRD with correct value") in the upstream kernel On Wed, Apr 30, 2008 at 2:21 PM, Steve Wise wrote: > Vlad, > > Please pull from: > > git://git.openfabrics.org/~swise/ofed-1.3 ofed_kernel > > This will sync up ofed-1.3.1 with all the important upstream fixes since > ofed-1.3. The patch files added are: > > kernel_patches/fixes/iw_cxgb3_0080_Fail_Loopback_Connections.patch > > kernel_patches/fixes/iw_cxgb3_0090_Fix_shift_calc_in_build_phys_page_list_for_1-entry_page_lists.patch > > kernel_patches/fixes/iw_cxgb3_0100_Return_correct_max_inline_data_when_creating_a_QP.patch > > kernel_patches/fixes/iw_cxgb3_0110_Fix_iwch_create_cq_off-by-one_error.patch > > kernel_patches/fixes/iw_cxgb3_0120_Dont_access_a_cm_id_after_dropping_reference.patch > > kernel_patches/fixes/iw_cxgb3_0130_Correctly_set_the_max_mr_size_device_attribute.patch > > kernel_patches/fixes/iw_cxgb3_0140_Correctly_serialize_peer_abort_path.patch > > kernel_patches/fixes/iw_cxgb3_0150_Support_peer-2-peer_connection_setup.patch > > > Thanks, > > Steve. 
> _______________________________________________ > general mailing list > general at lists.openfabrics.org > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From rdreier at cisco.com Wed Apr 30 15:24:18 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 15:24:18 -0700 Subject: [ofa-general] Re: [ewg] [GIT PULL ofed-1.3.1] libcxgb3 version 1.2.0 In-Reply-To: <4818E35C.4050206@opengridcomputing.com> (Steve Wise's message of "Wed, 30 Apr 2008 16:23:40 -0500") References: <4818E35C.4050206@opengridcomputing.com> Message-ID: Steve -- If you put a tarball (from make dist ;) on openfabrics.org, I'll update the Debian packages. From swise at opengridcomputing.com Wed Apr 30 15:25:40 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 30 Apr 2008 17:25:40 -0500 Subject: [ofa-general] [GIT PULL ofed-1.3.1] - chelsio changes for ofed-1.3.1 In-Reply-To: References: <4818E2C5.7060907@opengridcomputing.com> Message-ID: <4818F1E4.1080202@opengridcomputing.com> Roland Dreier wrote: > Steve -- did the IRD/ORD mixup fix get included? (It's 1f71f503 > "RDMA/cxgb3: Program hardware IRD with correct value") in the upstream > kernel > > Oops. Good catch. No worries though, I've got another series to post (including the qp flush bug NFSRDMA found) for ofed-1.3.1 so i'll add this one. Thanks, Steve. > On Wed, Apr 30, 2008 at 2:21 PM, Steve Wise wrote: > >> Vlad, >> >> Please pull from: >> >> git://git.openfabrics.org/~swise/ofed-1.3 ofed_kernel >> >> This will sync up ofed-1.3.1 with all the important upstream fixes since >> ofed-1.3. 
The patch files added are: >> >> kernel_patches/fixes/iw_cxgb3_0080_Fail_Loopback_Connections.patch >> >> kernel_patches/fixes/iw_cxgb3_0090_Fix_shift_calc_in_build_phys_page_list_for_1-entry_page_lists.patch >> >> kernel_patches/fixes/iw_cxgb3_0100_Return_correct_max_inline_data_when_creating_a_QP.patch >> >> kernel_patches/fixes/iw_cxgb3_0110_Fix_iwch_create_cq_off-by-one_error.patch >> >> kernel_patches/fixes/iw_cxgb3_0120_Dont_access_a_cm_id_after_dropping_reference.patch >> >> kernel_patches/fixes/iw_cxgb3_0130_Correctly_set_the_max_mr_size_device_attribute.patch >> >> kernel_patches/fixes/iw_cxgb3_0140_Correctly_serialize_peer_abort_path.patch >> >> kernel_patches/fixes/iw_cxgb3_0150_Support_peer-2-peer_connection_setup.patch >> >> >> Thanks, >> >> Steve. >> _______________________________________________ >> general mailing list >> general at lists.openfabrics.org >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general >> >> To unsubscribe, please visit >> http://openib.org/mailman/listinfo/openib-general >> >> From rdreier at cisco.com Wed Apr 30 15:25:40 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 15:25:40 -0700 Subject: [ofa-general] Re: [PATCH 00/13] QLogic Virtual NIC (VNIC) Driver In-Reply-To: <20080430171028.31725.86190.stgit@localhost.localdomain> (Ramachandra K.'s message of "Wed, 30 Apr 2008 22:45:52 +0530") References: <20080430171028.31725.86190.stgit@localhost.localdomain> Message-ID: > This is the QLogic Virtual NIC driver patch series which has been tested > against your for-2.6.26 and for-2.6.27 branches. We intended these patches to > make it to the 2.6.26 kernel, but if it is too late for the 2.6.26 merge window > please consider them for 2.6.27. Yes, *WAY* too late for 2.6.26, given that today is the last day of the merge window, and that things that get merged need to be ready before the merge window opens. > The driver compiles cleanly with sparse endianness checking enabled. 
> We have > also tested the driver with lockdep checking enabled. > > We have run these patches through checkpatch.pl and the only warnings are > related to lines slightly longer than 80 columns in some of the statements. All good news. Will review and I hope get this into 2.6.27. - R. From swise at opengridcomputing.com Wed Apr 30 15:26:22 2008 From: swise at opengridcomputing.com (Steve Wise) Date: Wed, 30 Apr 2008 17:26:22 -0500 Subject: [ofa-general] Re: [ewg] [GIT PULL ofed-1.3.1] libcxgb3 version 1.2.0 In-Reply-To: References: <4818E35C.4050206@opengridcomputing.com> Message-ID: <4818F20E.3040500@opengridcomputing.com> Roland Dreier wrote: > Steve -- If you put a tarball (from make dist ;) on openfabrics.org, > I'll update the Debian packages. > I plan to do this soon. Steve. From rdreier at cisco.com Wed Apr 30 15:30:03 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 15:30:03 -0700 Subject: [ofa-general][PATCH] Re: mlx4: Completion EQ per cpu (MP support, Patch 10) In-Reply-To: <40FA0A8088E8A441973D37502F00933E3A24@mtlexch01.mtl.com> (Liran Liss's message of "Wed, 30 Apr 2008 22:56:28 +0300") References: <40FA0A8088E8A441973D37502F00933E3A24@mtlexch01.mtl.com> Message-ID: > > I would just like to see an approach that is fully thought through and > > gives a way for applications/kernel drivers to choose a CQ vector based > > on some information about what CPU it will go to. > Isn't the decision of which CPU an MSI-X is routed to (and hence, to > which CPU an EQ is bound) determined by userspace? (either by the irq > balancer process or by manually setting /proc/irq/<irq>/smp_affinity)? Yes, but how can anything tell which IRQ number corresponds to a given "CQ vector" number? (And don't be too stuck on MSI-X, since ehca uses some completely different GX-bus related thing to get multiple interrupts) > What are we risking in making the default action to spread interrupts?
There are fairly plausible scenarios like a multi-threaded app where each thread creates a send CQ and a receive CQ, which should both be bound to the same CPU as the thread. If we spread all CQs then it's impossible to get thread-locality. I'm not saying that round-robin is necessarily a bad default policy, but I do think there needs to be a complete picture of how that policy can be overridden before we go for multiple interrupt vectors. - R. From arlin.r.davis at intel.com Wed Apr 30 15:57:54 2008 From: arlin.r.davis at intel.com (Davis, Arlin R) Date: Wed, 30 Apr 2008 15:57:54 -0700 Subject: [ofa-general] [PATCH] [dat2.0] dapl: fix post_ext_send, post_send, post_recv to handle 0 byte's and NULL iov handles Message-ID: and return errno with verbs post failures. Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_dto.h | 20 ++++++++++++-------- dapl/openib_cma/dapl_ib_extensions.c | 3 --- 2 files changed, 12 insertions(+), 11 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_dto.h b/dapl/openib_cma/dapl_ib_dto.h index b111e5e..ffb5dca 100644 --- a/dapl/openib_cma/dapl_ib_dto.h +++ b/dapl/openib_cma/dapl_ib_dto.h @@ -124,7 +124,7 @@ dapls_ib_post_recv ( dapl_os_free(ds_array_start_p, segments * sizeof(ib_data_segment_t)); if (ret) - return( dapl_convert_errno(EFAULT,"ibv_recv") ); + return( dapl_convert_errno(errno,"ibv_recv") ); return DAT_SUCCESS; } @@ -202,7 +202,8 @@ dapls_ib_post_send ( if (cookie != NULL) cookie->val.dto.size = total_len; - if ((op_type == OP_RDMA_WRITE) || (op_type == OP_RDMA_READ)) { + if (wr.num_sge && + (op_type == OP_RDMA_WRITE || op_type == OP_RDMA_READ)) { wr.wr.rdma.remote_addr = remote_iov->virtual_address; wr.wr.rdma.rkey = remote_iov->rmr_context; dapl_dbg_log(DAPL_DBG_TYPE_EP, @@ -234,7 +235,7 @@ dapls_ib_post_send ( dapl_os_free(ds_array_start_p, segments * sizeof(ib_data_segment_t)); if (ret) - return( dapl_convert_errno(EFAULT,"ibv_send") ); + return( dapl_convert_errno(errno,"ibv_send") ); 
dapl_dbg_log(DAPL_DBG_TYPE_EP," post_snd: returned\n"); return DAT_SUCCESS; @@ -357,12 +358,15 @@ dapls_ib_post_ext_send ( /* OP_RDMA_WRITE)IMMED has direct IB wr_type mapping */ dapl_dbg_log(DAPL_DBG_TYPE_EP, " post_ext: rkey 0x%x va %#016Lx immed=0x%x\n", - remote_iov->rmr_context, - remote_iov->virtual_address, immed_data); + remote_iov?remote_iov->rmr_context:0, + remote_iov?remote_iov->virtual_address:0, + immed_data); wr.imm_data = immed_data; - wr.wr.rdma.remote_addr = remote_iov->virtual_address; - wr.wr.rdma.rkey = remote_iov->rmr_context; + if (wr.num_sge) { + wr.wr.rdma.remote_addr = remote_iov->virtual_address; + wr.wr.rdma.rkey = remote_iov->rmr_context; + } break; case OP_COMP_AND_SWAP: /* OP_COMP_AND_SWAP has direct IB wr_type mapping */ @@ -411,7 +415,7 @@ dapls_ib_post_ext_send ( dapl_os_free(ds_array_start_p, segments * sizeof(ib_data_segment_t)); if (ret) - return( dapl_convert_errno(EFAULT,"ibv_send") ); + return( dapl_convert_errno(errno,"ibv_send") ); dapl_dbg_log(DAPL_DBG_TYPE_EP," post_snd: returned\n"); return DAT_SUCCESS; diff --git a/dapl/openib_cma/dapl_ib_extensions.c b/dapl/openib_cma/dapl_ib_extensions.c index 3132ffb..52b238f 100755 --- a/dapl/openib_cma/dapl_ib_extensions.c +++ b/dapl/openib_cma/dapl_ib_extensions.c @@ -185,9 +185,6 @@ dapli_post_ext( IN DAT_EP_HANDLE ep_handle, if (DAPL_BAD_HANDLE(ep_handle, DAPL_MAGIC_EP)) return(DAT_ERROR(DAT_INVALID_HANDLE, DAT_INVALID_HANDLE_EP)); - if ((NULL == remote_iov) || (NULL == local_iov)) - return DAT_INVALID_PARAMETER; - ep_ptr = (DAPL_EP *) ep_handle; qp_ptr = ep_ptr->qp_handle; -- 1.5.2.5 From arlin.r.davis at intel.com Wed Apr 30 15:57:50 2008 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 30 Apr 2008 15:57:50 -0700 Subject: [ofa-general] [PATCH] [dat1.2] dapl: fix post_send, post_recv to handle 0 byte's and NULL iov handles Message-ID: <000901c8ab15$9d72f9c0$daba020a@amr.corp.intel.com> and return errno with verbs post failures. 
Signed-off by: Arlin Davis ardavis at ichips.intel.com --- dapl/openib_cma/dapl_ib_dto.h | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/dapl/openib_cma/dapl_ib_dto.h b/dapl/openib_cma/dapl_ib_dto.h index 52b189b..f45da35 100644 --- a/dapl/openib_cma/dapl_ib_dto.h +++ b/dapl/openib_cma/dapl_ib_dto.h @@ -120,7 +120,7 @@ dapls_ib_post_recv ( dapl_os_free(ds_array_start_p, segments * sizeof(ib_data_segment_t)); if (ret) - return( dapl_convert_errno(EFAULT,"ibv_recv") ); + return( dapl_convert_errno(errno,"ibv_recv") ); return DAT_SUCCESS; } @@ -199,7 +199,8 @@ dapls_ib_post_send ( if (cookie != NULL) cookie->val.dto.size = total_len; - if ((op_type == OP_RDMA_WRITE) || (op_type == OP_RDMA_READ)) { + if (wr.num_sge && + (op_type == OP_RDMA_WRITE || op_type == OP_RDMA_READ)) { wr.wr.rdma.remote_addr = remote_iov->target_address; wr.wr.rdma.rkey = remote_iov->rmr_context; dapl_dbg_log(DAPL_DBG_TYPE_EP, @@ -230,7 +231,7 @@ dapls_ib_post_send ( dapl_os_free(ds_array_start_p, segments * sizeof(ib_data_segment_t)); if (ret) - return( dapl_convert_errno(EFAULT,"ibv_send") ); + return( dapl_convert_errno(errno,"ibv_send") ); dapl_dbg_log(DAPL_DBG_TYPE_EP," post_snd: returned\n"); return DAT_SUCCESS; -- 1.5.2.5 From rdreier at cisco.com Wed Apr 30 19:55:24 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 19:55:24 -0700 Subject: [ofa-general] [PATCH] IB/ipoib: fix net queue lockup In-Reply-To: <1209577156.1790.11.camel@mtls03> (Eli Cohen's message of "Wed, 30 Apr 2008 20:39:16 +0300") References: <1209577156.1790.11.camel@mtls03> Message-ID: thanks, looks like a good solution, applied, just adding an ipoib_ prefix since > +void send_comp_handler(struct ib_cq *cq, void *dev_ptr) is too generic a name for a global symbol. 
By the way I figured out the crash on unload -- it was an mlx4 bug that I introduced, which is fixed by: IB/mlx4: Fix off-by-one errors in calls to mlx4_ib_free_cq_buf() When I merged bbf8eed1 ("IB/mlx4: Add support for resizing CQs") I changed things around so that mlx4_ib_alloc_cq_buf() and mlx4_ib_free_cq_buf() were used everywhere they could be. However, I screwed up the number of entries passed into mlx4_ib_alloc_cq_buf() in a couple places -- the function bumps the number of entries internally, so the caller shouldn't add 1 as well. Passing a too-big value for the number of entries to mlx4_ib_free_cq_buf() can cause the cleanup to go off the end of an array and corrupt allocator state in interesting ways. Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mlx4/cq.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 2f199c5..4521319 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -246,7 +246,7 @@ err_mtt: if (context) ib_umem_release(cq->umem); else - mlx4_ib_free_cq_buf(dev, &cq->buf, entries); + mlx4_ib_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); err_db: if (!context) @@ -434,7 +434,7 @@ int mlx4_ib_destroy_cq(struct ib_cq *cq) mlx4_ib_db_unmap_user(to_mucontext(cq->uobject->context), &mcq->db); ib_umem_release(mcq->umem); } else { - mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe + 1); + mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe); mlx4_db_free(dev->dev, &mcq->db); } -- 1.5.5.1 From rdreier at cisco.com Wed Apr 30 20:05:43 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 20:05:43 -0700 Subject: [ofa-general] [PATCH] IB/ipoib: fix net queue lockup In-Reply-To: <1209577156.1790.11.camel@mtls03> (Eli Cohen's message of "Wed, 30 Apr 2008 20:39:16 +0300") References: <1209577156.1790.11.camel@mtls03> Message-ID: > we have seen a few other cases where a large tx queue is needed.
I > think we should choose a larger default value than the current 64. maybe yes, maybe no... what are the cases where it is needed? The send queue is basically acting as a "shock absorber" for bursty traffic. If the queue is filling up because of a steady traffic rate, then making the queue bigger means it will just take a little longer to fill. The way a longer send queue helps I guess is if the send queue is emptying out before the transmit queue is woken up... with small packets I suppose it doesn't take long for the send queue to drain completely. - R. From rdreier at cisco.com Wed Apr 30 20:46:24 2008 From: rdreier at cisco.com (Roland Dreier) Date: Wed, 30 Apr 2008 20:46:24 -0700 Subject: [ofa-general] [GIT PULL] please pull infiniband.git Message-ID: Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will get a couple of fixes for medium-impact bugs. If they can go into -rc1, great; otherwise the world won't end if they end up in -rc2. 
Eli Cohen (1): IB/ipoib: Fix transmit queue stalling forever Roland Dreier (1): IB/mlx4: Fix off-by-one errors in calls to mlx4_ib_free_cq_buf() drivers/infiniband/hw/mlx4/cq.c | 4 +- drivers/infiniband/ulp/ipoib/ipoib.h | 2 + drivers/infiniband/ulp/ipoib/ipoib_ib.c | 47 +++++++++++++++++++++++++--- drivers/infiniband/ulp/ipoib/ipoib_verbs.c | 3 +- 4 files changed, 48 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 2f199c5..4521319 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -246,7 +246,7 @@ err_mtt: if (context) ib_umem_release(cq->umem); else - mlx4_ib_free_cq_buf(dev, &cq->buf, entries); + mlx4_ib_free_cq_buf(dev, &cq->buf, cq->ibcq.cqe); err_db: if (!context) @@ -434,7 +434,7 @@ int mlx4_ib_destroy_cq(struct ib_cq *cq) mlx4_ib_db_unmap_user(to_mucontext(cq->uobject->context), &mcq->db); ib_umem_release(mcq->umem); } else { - mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe + 1); + mlx4_ib_free_cq_buf(dev, &mcq->buf, cq->cqe); mlx4_db_free(dev->dev, &mcq->db); } diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 9044f88..ca126fc 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -334,6 +334,7 @@ struct ipoib_dev_priv { #endif int hca_caps; struct ipoib_ethtool_st ethtool; + struct timer_list poll_timer; }; struct ipoib_ah { @@ -404,6 +405,7 @@ extern struct workqueue_struct *ipoib_workqueue; int ipoib_poll(struct napi_struct *napi, int budget); void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr); +void ipoib_send_comp_handler(struct ib_cq *cq, void *dev_ptr); struct ipoib_ah *ipoib_create_ah(struct net_device *dev, struct ib_pd *pd, struct ib_ah_attr *attr); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 97b815c..f429bce 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ 
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -461,6 +461,26 @@ void ipoib_ib_completion(struct ib_cq *cq, void *dev_ptr) netif_rx_schedule(dev, &priv->napi); } +static void drain_tx_cq(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + unsigned long flags; + + spin_lock_irqsave(&priv->tx_lock, flags); + while (poll_tx(priv)) + ; /* nothing */ + + if (netif_queue_stopped(dev)) + mod_timer(&priv->poll_timer, jiffies + 1); + + spin_unlock_irqrestore(&priv->tx_lock, flags); +} + +void ipoib_send_comp_handler(struct ib_cq *cq, void *dev_ptr) +{ + drain_tx_cq((struct net_device *)dev_ptr); +} + static inline int post_send(struct ipoib_dev_priv *priv, unsigned int wr_id, struct ib_ah *address, u32 qpn, @@ -555,12 +575,22 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, else priv->tx_wr.send_flags &= ~IB_SEND_IP_CSUM; + if (++priv->tx_outstanding == ipoib_sendq_size) { + ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); + if (ib_req_notify_cq(priv->send_cq, IB_CQ_NEXT_COMP)) + ipoib_warn(priv, "request notify on send CQ failed\n"); + netif_stop_queue(dev); + } + if (unlikely(post_send(priv, priv->tx_head & (ipoib_sendq_size - 1), address->ah, qpn, tx_req, phead, hlen))) { ipoib_warn(priv, "post_send failed\n"); ++dev->stats.tx_errors; + --priv->tx_outstanding; ipoib_dma_unmap_tx(priv->ca, tx_req); dev_kfree_skb_any(skb); + if (netif_queue_stopped(dev)) + netif_wake_queue(dev); } else { dev->trans_start = jiffies; @@ -568,14 +598,11 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb, ++priv->tx_head; skb_orphan(skb); - if (++priv->tx_outstanding == ipoib_sendq_size) { - ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n"); - netif_stop_queue(dev); - } } if (unlikely(priv->tx_outstanding > MAX_SEND_CQE)) - poll_tx(priv); + while (poll_tx(priv)) + ; /* nothing */ } static void __ipoib_reap_ah(struct net_device *dev) @@ -609,6 +636,11 @@ void ipoib_reap_ah(struct work_struct *work) 
round_jiffies_relative(HZ)); } +static void ipoib_ib_tx_timer_func(unsigned long ctx) +{ + drain_tx_cq((struct net_device *)ctx); +} + int ipoib_ib_dev_open(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); @@ -645,6 +677,10 @@ int ipoib_ib_dev_open(struct net_device *dev) queue_delayed_work(ipoib_workqueue, &priv->ah_reap_task, round_jiffies_relative(HZ)); + init_timer(&priv->poll_timer); + priv->poll_timer.function = ipoib_ib_tx_timer_func; + priv->poll_timer.data = (unsigned long)dev; + set_bit(IPOIB_FLAG_INITIALIZED, &priv->flags); return 0; @@ -810,6 +846,7 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush) ipoib_dbg(priv, "All sends and receives done.\n"); timeout: + del_timer_sync(&priv->poll_timer); qp_attr.qp_state = IB_QPS_RESET; if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE)) ipoib_warn(priv, "Failed to modify QP to RESET state\n"); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c index c1e7ece..8766d29 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c @@ -187,7 +187,8 @@ int ipoib_transport_dev_init(struct net_device *dev, struct ib_device *ca) goto out_free_mr; } - priv->send_cq = ib_create_cq(priv->ca, NULL, NULL, dev, ipoib_sendq_size, 0); + priv->send_cq = ib_create_cq(priv->ca, ipoib_send_comp_handler, NULL, + dev, ipoib_sendq_size, 0); if (IS_ERR(priv->send_cq)) { printk(KERN_WARNING "%s: failed to create send CQ\n", ca->name); goto out_free_recv_cq;